CellarTracker Main Site

Outage

All Forums >> [Cellar Talk] >> Release Notes >> Outage

Outage - 10/7/2013 5:45:16 AM   
Eric

Posts: 17326
Joined: 10/10/2003
From: Seattle, WA
I am going to make this short, as I am 24 hours into this. We had a major hardware issue. It first cropped up Friday. We initiated a failover that we hoped would work. Sadly the failover completed Sunday at 6:50am and promptly took the site down. We have spent the last ~24 hours researching our options, rebuilding the environment, and restoring data from backups. Through some very unfortunate circumstances, we lost 12 hours of data on the main site. We also lost 1 week of Forum postings and uploaded images.

Please see more here: http://www.facebook.com/CellarTracker

We intend to learn a great deal from this experience. Despite taking many steps to have a highly redundant environment, we were still burned pretty badly. It is our plan to correct that and prevent this from ever happening again.

_____________________________

Cheers!
-Eric LeVine

http://twitter.com/cellartracker
http://facebook.com/cellartracker
Post #: 1
RE: Outage - 10/7/2013 5:54:06 AM   
Eric

I should add: if you lost a significant number of edits, or just feel heavily inconvenienced by the outage, please email me at eric@cellartracker.com. I will happily add a full year of premium support to your account.

(in reply to Eric)
Post #: 2
RE: Outage - 10/7/2013 6:11:15 AM   
crispino

Posts: 532
Joined: 3/8/2012
From: Allston Rock City (Boston), MA
I am soooooo glad I didn't start the official Secret Santa thread last week.

Sorry to hear about the troubles! When a couple hours went by yesterday and the site was still down, I knew it couldn't be good. Hope everything is back to normal for you soon.

- Chris

_____________________________

CT Secret Santa Compendium
CT Santa Profile

(in reply to Eric)
Post #: 3
RE: Outage - 10/7/2013 6:13:28 AM   
Eric

Not quite back to normal but hopefully getting there soon.

(in reply to crispino)
Post #: 4
RE: Outage - 10/7/2013 6:48:48 AM   
cigar52

Posts: 345
Joined: 8/15/2013
From: California... now Sarasota, FL
Eric, so glad you stay on top of things. I only had a few bottles to remove; otherwise all good. Thanks again for a great site!

(in reply to Eric)
Post #: 5
RE: Outage - 10/7/2013 8:05:57 AM   
JohnNezlek

Posts: 1191
Joined: 12/14/2006
From: Gloucester, VA
Eric,

Bad things happen to good systems all the time.

Nothing is perfect.

Most important, thank you for keeping us informed.

CT remains the best, hands down.

John


_____________________________

Too many wines, too little time.

(in reply to cigar52)
Post #: 6
RE: Outage - 10/7/2013 8:10:25 AM   
Eric

Thanks John.

We can do better. We need to do better.

(in reply to JohnNezlek)
Post #: 7
RE: Outage - 10/7/2013 9:32:35 AM   
dsGris

Posts: 4712
Joined: 8/31/2009
From: Portland, OR
This has been a minor inconvenience for most of us. Over the last several years, my wife has experienced numerous IT system and phone shutdowns at work while new systems were being integrated or implemented, and it is not pretty. Thanks again from all of us wine hobbyists.

_____________________________

DennisG
Granpa Wino

(in reply to Eric)
Post #: 8
RE: Outage - 10/7/2013 9:34:29 AM   
Eric

Many thanks Dennis.

(in reply to dsGris)
Post #: 9
RE: Outage - 10/7/2013 11:01:07 AM   
Eric

By the way, we are working on bringing up our 2nd and 3rd database servers (which serve all of the guest and crawler queries). Right now, to limit the load on the primary database, I am running one of those databases with fairly stale (10-day-old) data. However, I should soon have one of the replicas fully up to date and will switch to it then. The rub is that, if you are not logged in, the website is a bit stale right now.

(in reply to Eric)
Post #: 10
RE: Outage - 10/7/2013 12:53:00 PM   
Eric

FYI, just a reminder that we will have a brief, planned outage between 2:45 and 3:15pm PDT today. We should only be down 10-15 minutes in total during that window as we replace some defective hardware.

(in reply to Eric)
Post #: 11
RE: Outage - 10/7/2013 4:50:16 PM   
skinut

Posts: 7
Joined: 3/1/2005
From: DC Suburbs
Dianne was trying to log in yesterday and thought she was doing something wrong. When I tried, we realized it was a big deal (for you) but a really minor inconvenience for us. Really glad you were able to track it down. Sorry for those of you who lost data.
As long as you learned from the outage, things can only get better, right? Hang in there.

(in reply to Eric)
Post #: 12
RE: Outage - 10/7/2013 5:14:18 PM   
Eric

Yeah, it was pretty traumatic for us.

Hardware is fixed, and the SAN firmware is updated as well. Hopefully that closes out this chapter.

(in reply to skinut)
Post #: 13
RE: Outage - 10/7/2013 5:42:51 PM   
Zweder

Posts: 181
Joined: 10/8/2007
Of course this is NOT what you want to happen. But sometimes it does. Learn from it, but for the rest no big deal!

I have been a member since 2006. There were a few short outages before, but as far as I know never any loss of data. Now there was. An inconvenience for those involved, of course, but CT is normally so perfect that this mishap should be accepted as something that can just happen.

I think Eric, Dan and Andrew deserve a round of applause, a good night's sleep and a good bottle for their efforts over the last 30 (or so) hours.


I have learned from this that I should make a backup myself more frequently. Especially after I have entered a significant amount of data.

Cheers,
Zweder.


< Message edited by Zweder -- 10/7/2013 6:24:59 PM >

(in reply to Eric)
Post #: 14
RE: Outage - 10/8/2013 5:43:17 PM   
renzetti

Posts: 23
Joined: 6/10/2013
I agree with all the comments above and wish all the best to the CT team! It is really good to know you are closely following up on what happened and taking measures to fix it and prevent a recurrence.

I would like to ask Zweder how we can back up the data we have entered on the site ourselves. This may help us recover our data in the very unfortunate event that this happens again some day (which we all hope it does not). :-)

Cheers

Ricardo

(in reply to Zweder)
Post #: 15
RE: Outage - 10/8/2013 5:59:37 PM   
Eric

You can use this tool to back up your data: https://www.cellartracker.com/content.asp?iContent=38

(in reply to renzetti)
Post #: 16
RE: Outage - 10/8/2013 6:38:23 PM   
renzetti

Dear Eric,

Great! Thanks a lot!

Ricardo

(in reply to Eric)
Post #: 17
RE: Outage - 10/8/2013 9:44:51 PM   
eridan

Posts: 24
Joined: 9/29/2010
Eric,

The great thing is, the bottles I had during the outage now can be drunk a second time. That can't be bad.

Also I suddenly got some quality time with the kids...



// Erik

(in reply to Eric)
Post #: 18
RE: Outage - 10/9/2013 12:32:03 AM   
jpr142

Posts: 1
Joined: 1/31/2011
Let me join others in congratulating the CT team on excellent communications during the outage and for what I'm sure was an intense amount of work to get the site running again. Well done!

(in reply to eridan)
Post #: 19
RE: Outage - 10/9/2013 4:03:40 AM   
bacchus

Posts: 1136
Joined: 7/25/2004
From: Staten Island, New York
I guess we should be grateful this happened on a weekend, when most of us were drinking and not actively exercising the database.

I used to administer a database myself, so I know what it feels like when you copy bad data from one device to another. I once lost 30 days of data like this.

Kudos to the team for stepping up to the plate and doing everything necessary to address the situation.

In terms of compensation for inconvenience, instead of one year extension to a select few, wouldn't it be more fair to give a month extension to all?

At id=791, I have been a faithful and grateful user from the beginning.

_____________________________

A Country Gentleman

(in reply to Eric)
Post #: 20
RE: Outage - 3/20/2014 11:56:43 PM   
Eric

At this point this is quite an old thread.

However, I wanted to let you know that since this outage we have been VERY busy and have completed a number of very significant actions:

1) Dramatically revised the frequency and durability of our onsite backups. (We have moved from 2 per day to 20 per day, all now written directly to redundant, durable media.)
2) Added daily offsite database cloud backups.
3) Now have weekly offsite backups of every machine in our environment.
4) Have deployed an entirely new Flash SAN with significantly more redundancy than our prior device.
5) And are now using our former SAN as an additional backup and failover device.

At this point, we are more redundant (about 7 layers' worth) than at any point in the 10-year history of the site. There is always still a chance that a series of failures could cause a future outage or data loss event, but we learned a great deal from the issues we faced last October.

10/6/2013 was the worst day of my professional life. We have taken dozens of steps to ensure that it remains as such. The integrity of your data is our absolute highest priority.

(in reply to bacchus)
Post #: 21
RE: Outage - 3/21/2014 12:03:11 AM   
Eric

quote:

ORIGINAL: bacchus
in terms of compensation for inconvenience, instead of one year extension to a select few, wouldn't it be more fair to give a month extension to all?

Sorry I missed this.

In the end, 10 people contacted me (out of 120,000 active users and 300,000 registered users). I just added a year onto your current payment as the 11th user.

(in reply to bacchus)
Post #: 22