Data loss on db8 / db10 / db36

Some customers did not get the automated email that was sent out so I'm starting a new thread with all relevant info from Clicky here at the very top. This way I can share this thread with affected customers who did not get the email.

Please make sure the email in your account is valid and up to date! We have no way of contacting you otherwise. You can update your email address here:

https://clicky.com/user/edit


----

COPY OF EMAIL SENT OUT JANUARY 11:

On January 9 2015, Clicky experienced catastrophic data loss for databases 8, 10, and 36. These 3 databases were all hosted on the same physical server.

If you are getting this email, then you had at least one site whose data was stored on this server.

One of the hard drives in this server had been acting up for a few weeks, causing some lag in processing but no problems otherwise. We were at the data center to replace it on January 9. Before the RAID (data redundancy) could be fully rebuilt, another drive in the same server died, causing the RAID to completely fail. With 2 failed drives in a 4-drive RAID 10 setup, there's a 67% chance that the data is still good, but luck was not on our side.
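
The 67% figure can be checked by enumeration. This is only a sketch, assuming the 4-drive RAID 10 was built as two mirrored pairs and that the second failure was equally likely to hit any of the three remaining drives. The drive labels are made up for illustration.

```python
from fractions import Fraction

# Hypothetical layout: two mirrored pairs striped together (RAID 10).
MIRROR_PAIRS = [("a1", "a2"), ("b1", "b2")]


def survival_chance_after_second_failure(first_failed: str) -> Fraction:
    """Chance the array survives a second, uniformly random drive failure."""
    drives = [d for pair in MIRROR_PAIRS for d in pair]
    remaining = [d for d in drives if d != first_failed]
    survivable = 0
    for second_failed in remaining:
        failed = {first_failed, second_failed}
        # The array survives as long as no mirror pair has lost both drives.
        if all(not set(pair) <= failed for pair in MIRROR_PAIRS):
            survivable += 1
    return Fraction(survivable, len(remaining))


chance = survival_chance_after_second_failure("a1")
print(chance, f"= {float(chance):.0%}")  # 2/3 = 67%
```

Of the three drives that could fail second, only the mirror partner of the first failed drive takes the array down, which is where 2 out of 3 comes from.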

Everything possible was done to recover the data, but we were unsuccessful.

As compensation, we are refunding your money from the last 3 years, proportional to how many of the sites in your account were affected. For example, if you had paid $300 total for Clicky over the last 3 years and 50% of your sites were affected, your refund would be $150.
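
The refund arithmetic in the example above can be written out as follows. This is a rough sketch only: the function name, the use of site counts, and the rounding to cents are assumptions, not a description of how the refunds were actually computed.

```python
def proportional_refund(total_paid: float, affected_sites: int, total_sites: int) -> float:
    """Refund = amount paid over the window * share of sites affected."""
    if total_sites <= 0:
        raise ValueError("total_sites must be positive")
    # Rounding to cents is an assumption made for this illustration.
    return round(total_paid * affected_sites / total_sites, 2)


# $300 paid over 3 years, 1 of 2 sites (50%) affected.
print(proportional_refund(300, 1, 2))  # 150.0
```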

Refunds are being placed in your affiliate account, which you can access here:

https://clicky.com/user/affiliate

Affiliate money can be cashed out via Paypal or applied towards future upgrades. In this case, however, you will also be able to request a physical check if desired (this option will be available within 24 hours). For those of you paying with a credit card, our payment processor does not allow us to do refunds for payments older than 120 days, so we can't do refunds directly to your card for payments covering 3 years.

No excuse makes this loss acceptable and nothing could possibly make up for it, but we'll do everything we can to make sure it never happens again.



----

"YOU THOUGHT RAID WAS A BACKUP SOLUTION?!?"

No.

We always had backups. But when we virtualized everything (Summer 2012) we had to temporarily stop doing backups because the storage method on each server changed significantly. There was no longer a spare drive in each server dedicated just for storing backups. So we needed to come up with a centralized backup solution after this, but this was around the same time that the amount of data we had was just getting ridiculous, well over 100 billion rows of data at that point, which proves challenging ("BACKUPS ARE HARD", yeah, thanks). Getting backups online again was always in the back of my head, but before I knew it, "temporarily" had become two and a half years and disaster struck. Of course the 6 years we actually had backups, we never needed them. That's how these things work.

Our top priority is bringing data backups back to our service as soon as possible. The old backups we had are gone, as most parts from our old servers, including the hard drives, were recycled when we moved to a purely virtualized infrastructure.

We are also adding more in-depth disk monitoring. Currently we have alerts set up for when a drive in a RAID falls offline. We also monitor the real-time-ness of all servers throughout the day, which makes it obvious when a server is having issues, but this is a manual task. We are working on a script that will check for disk I/O errors in the dmesg log on all servers throughout the day, so we will know immediately if there is potentially any problem with any of the disks/SSDs in our data center and be able to replace them much more proactively than we have in the past.
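
As a rough illustration of the kind of check described above (not the actual script), the sketch below scans kernel log text for lines that look like disk I/O errors and groups them by device. The regular expressions are assumptions based on common Linux kernel wording, and the function name and sample log lines are hypothetical.

```python
import re

# Assumed patterns: common Linux kernel wording for block-device errors.
# A real script may need to match different or additional messages.
IO_ERROR_PATTERNS = [
    re.compile(r"I/O error, dev (?P<dev>\w+)"),
    re.compile(r"Buffer I/O error on dev(?:ice)? (?P<dev>\w+)"),
]


def find_disk_errors(dmesg_text: str) -> dict[str, list[str]]:
    """Group kernel log lines that look like disk I/O errors by device name."""
    errors: dict[str, list[str]] = {}
    for line in dmesg_text.splitlines():
        for pattern in IO_ERROR_PATTERNS:
            match = pattern.search(line)
            if match:
                errors.setdefault(match.group("dev"), []).append(line.strip())
                break
    return errors


# Made-up sample lines standing in for real dmesg output.
sample = """\
[1234.5] blk_update_request: I/O error, dev sdb, sector 4096
[1234.6] Buffer I/O error on dev sdb1, logical block 512
[1300.0] EXT4-fs (sda1): mounted filesystem with ordered data mode
"""
for dev, lines in find_disk_errors(sample).items():
    print(dev, len(lines))
```

On a live server the text could come from running `dmesg` (for example via `subprocess.run(["dmesg"], capture_output=True, text=True).stdout`), with an alert sent whenever the returned dictionary is non-empty.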

Obviously we regret not taking these steps sooner. We're offering apologies and refunds to all affected customers and hope that most of you accept those and stick with us.


-------

DATA RECOVERY

We are actually still working on trying to recover the data. I don't think it will be possible but we are trying.

In the meantime, what we did was recreate these virtual database servers (8/10/36) on a new machine and start logging data fresh there, so that you can still get new data for your affected sites. If we are able to recover the data, the challenge is going to be merging the two data sets. Our database design won't make that possible, so what we would likely do is split each affected site into two site registrations on Clicky: Old, and New. Then you can go to one of them to view your data prior to Jan 9, and the other one to view your data since then. Not ideal but at least the data would still be available. Regardless of whether or not this happens, we are giving refunds to everyone who wants one, you deserve it.


-------

OTHER

Approximately 5% of paying customers were affected by this loss. If everyone claims their refund it's going to total close to $100,000. That stings like hell, but it's the right thing to do.

Some have asked if this will put us out of business. No, it won't.

Those of you who have emailed us with empathy and words of encouragement, thank you, the world could use a little more of that.

The option to cash out your refund via physical check is not available yet, as we're getting slammed with emails and communication with affected customers is our top priority right now. But we will have that as an option soon, I promise.

Posted Tue Jan 13 2015 4:08p by Your Friendly Clicky Admin


Thank you for this post, because I haven't received any email. Today I was surprised to see no data before January for some websites. I appreciate the way you reacted.

Posted Tue Jan 13 2015 10:18p by adami***


The option to request a physical check is now live.

Posted Tue Jan 13 2015 11:36p by Your Friendly Clicky Admin


The first batch of refunds claimed via Paypal will hopefully be sent out on Friday. We're using Paypal's "mass pay" feature to send them in batches, as we do for normal affiliate payouts. Mass pay requires the full amount of money being sent to be in our Paypal account at that time. As we've never sent even close to this amount of money at once before, we weren't aware of this limitation and we did not have sufficient funds to process the payout that was planned yesterday (Wednesday). Money was transferred from our bank account to Paypal that day and that should clear Friday. But if not, Monday is a bank holiday unfortunately so it would be Tuesday, or possibly Wednesday at the very latest. Just know that as soon as the transfer is completed, all claimed refunds as of that point in time will be sent. Thereafter we will be processing refunds once a week, including the physical check option for those of you opting for that.

Posted Thu Jan 15 2015 8:23p by Your Friendly Clicky Admin


While it is unfortunate that data was lost, for someone like myself tracking metrics on non-commerce related websites/applications, I do not actually mind. You should give users the option of cashing out their refund, donating it to an organisation or not cashing out the refund at all. While I was one of the people affected, it honestly isn't a big deal to me whether or not I was offered a refund.

Thank you for being such a great company and offering refunds though. I am sure those affected who do care will appreciate the gesture and those affected who are not really fazed (like me) appreciate you taking accountability and taking it on the chin. The 100k figure is quite high, but it will undoubtedly result in higher user retention as opposed to not doing anything at all.

So, thank you for being such a great company. I might not be on a high monthly plan, but I do appreciate the gesture still.

Posted Thu Jan 15 2015 9:34p by dwayne***


I'm thrilled to hear that you're going to continue efforts to retrieve this data. My sites are my livelihood, so of course when this happened, I did some research to see if I'd be better off with another company. But the truth is, I've had worse things happen with other companies, and they didn't take any responsibility. So I decided to stick with Clicky, even though I believed that data was lost forever, because things go wrong at every company - it's how the company responds that matters, and you guys treat your clients the way I treat mine, which is the way I want to be treated. Knowing you're still working to retrieve the data just confirms to me that this was the right decision.

Posted Thu Jan 15 2015 10:23p by merono***


Like others, I am not thrilled at losing the data. However, the true test of customer service is how a company reacts when something goes wrong. Clicky has been open, proactive and offered a significant refund without (as far as I know) anyone having to ask for it. I'm actually more impressed with Clicky now than I was before. I'm staying.

Posted Fri Jan 16 2015 2:47a by sandstone***


Thank you everyone :)

Posted Fri Jan 16 2015 12:29p by Your Friendly Clicky Admin


Dang, the money did not make it to Paypal today. Should be Tuesday then.

Posted Fri Jan 16 2015 2:42p by Your Friendly Clicky Admin


My company lost roughly 50 million pageviews of extremely valuable data because of this. Only two of our twelve sites had data loss, and one of those was for our main product.

We were extremely upset when this happened, but being a software company, we are also empathetic to the situation. If this happened to us, we would be devastated.

And while we recognize that there really are only two things Clicky can do (work to try and recover the lost data and provide monetary compensation), a $300 refund does absolutely nothing for us as a company.

So, we will not be taking the refund and hope many others have the same thoughts about the situation as we do. It's a bad situation for everyone, but pulling ~$100k out of Clicky isn't going to help them potentially recover the data or get a backup strategy in place as quickly as possible.

At the very least, take some time to think before immediately requesting a payout of your refund.

Posted Sat Jan 17 2015 8:49a by limitedpres***


Hi Guys -

Thank you for being so transparent about this. I use Clicky for a very specific purpose: to issue certificates of completion to members who've taken all the lessons in their classes. With the loss of data, I can no longer verify they've taken the classes. I was also considering white labeling Clicky because it's such a SIMPLE solution, which is what my members want.

I've tried finding a replacement service for Clicky, but haven't found anything I'm happy with. Please get a backup solution online ASAP and I'll be happy to keep my business with you.

Thanks!

Posted Mon Jan 19 2015 1:29p by mariapeag***


Plans are already in place and code is being written. By the end of the month our backup service should be fully online and all databases will have a backup done by then and synced to at least one remote location as well.

Posted Mon Jan 19 2015 4:40p by Your Friendly Clicky Admin


what is the status of the data recovery? i just took my refund because some email told me to. we were buried on page two of google since 11/2013, we just hit spot #2 on page 1 5 weeks ago. the data that was lost while we were buried on page two for a whole year is extremely important to us.

i am familiar with RAID arrays and data recovery. In this day and age there is absolutely no reason for this. Please tell me your backup solutions. I assume you backed up to the cloud? Internal SAN? Also an old school tape backup as a super safe backup when all else fails? So my question is if you have two backup solutions in place for your data, why is this restore taking so long?

you can send each drive out to a recovery place probably for thousands of dollars per drive and they can absolutely recover the data, but how do you plan to get this data back online?

Posted Tue Jan 20 2015 8:10a by texmex1***


Yes, what is the status of the data recovery? It seems very unlikely that none of the data can be recovered unless your servers physically were on fire for so long that the components melted to a crisp. Partial data is better than no data.

I'm sure you're heartened by the kind words sent on this forum, but the reality is your company will not likely survive if you don't recover the data as the details of this situation will get out. And everything looks really bad to anyone who reads through the sequence of events prior, during, and after the incident.

Posted Tue Jan 20 2015 11:26a by storym***


hmmm only 5% of customers affected, over the various sites I have seems way more than 5% have data loss.
$32 refund....yeh uhm thanks.

Posted Wed Jan 21 2015 9:26a by DeanColl***


The money came through to Paypal finally so refunds will be sent out later today.

We are still working on the data recovery, obviously we will report anything relevant.

Posted Wed Jan 21 2015 10:23a by Your Friendly Clicky Admin


What is your plan for the data recovery? Maybe if you post that, some knowledgeable customers can assist or offer expertise.

Posted Thu Jan 22 2015 5:33p by storym***


We had one employee working on data recovery full time for over a week but he wasn't able to recover the data. We are now working with a third party company.

Posted Tue Jan 27 2015 1:14p by Your Friendly Clicky Admin


By the way our backup service is back online. By Saturday night all db servers will be backed up. We're storing backups locally on a single server that has over 5TB of space, which is enough for now, as well as syncing them all remotely to rsync.net (great service, btw).

Since we virtualize, disk space is a concern with multiple dbs sharing the same physical hardware, so we can't have the ideal solution of daily backups. Instead we're doing weekly backups: each night only one virtual server on any physical server will actually do its backup. But the 'master' backup server syncs all backups locally and remotely every single day.
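
The staggering described above can be sketched like this. It is a minimal illustration, assuming no physical server hosts more than seven virtual servers. The host names, the database names other than db8/db10/db36, and the function itself are hypothetical and not taken from the real setup.

```python
WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def weekly_backup_schedule(hosts: dict[str, list[str]]) -> dict[str, dict[str, str]]:
    """Give each virtual server on a physical host its own backup night."""
    schedule: dict[str, dict[str, str]] = {}
    for host, virtual_servers in hosts.items():
        if len(virtual_servers) > len(WEEKDAYS):
            raise ValueError(f"{host}: more than 7 virtual servers, cannot stagger weekly")
        # One virtual server per night, so no two on the same host overlap.
        schedule[host] = dict(zip(WEEKDAYS, virtual_servers))
    return schedule


# Hypothetical layout; the real host-to-database mapping is not given in this thread.
layout = {"host-a": ["db8", "db10", "db36"], "host-b": ["db1", "db2"]}
for host, nights in weekly_backup_schedule(layout).items():
    print(host, nights)
```

Each database still gets one backup per week, but the disk and I/O cost on any one physical machine is limited to a single backup per night.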

Posted Tue Jan 27 2015 1:19p by Your Friendly Clicky Admin


Good news, we have the RAID rebuilt and are copying the databases to a new server now. Likely there will be some corruption and repairs will need to be run on them, and we'll need to do thorough testing. It will be another day or two before anything is back online and available. We'll keep you updated.

Posted Thu Jan 29 2015 2:13p by Your Friendly Clicky Admin


Update on recovery, I'm using db8 as the test bed here. We ran repairs on all tables and almost everything seems perfect when I view reports for db8 sites with the data pulled from the recovery server instead of the new one that was created to replace it.

One table was irrecoverably messed up unfortunately. Luckily it wasn't any of the main ones, but those of you who use the Goals feature will care. This table stored goal sub-data, so when you viewed any report, say Countries, and it broke down the goal count for each country - that data will not be available. This table was also used for the main goal report: when viewing a goal, it would break down the two columns, "this visit" and "first visit", so that data is gone too. Everything else is working perfectly though. (Note, this data isn't available on your end just yet.)

Now that I have the process down, db10 and db36 will both be repaired by an automated process overnight tonight, so tomorrow I will look through all of the tables to check data integrity, run some tests, etc. That will take most of the day. If possible I will get the data online and available to you guys tomorrow night but it will likely leak into Wednesday.

Posted Mon Feb 2 2015 7:46p by Your Friendly Clicky Admin


There were some unexpected issues with the main summary table, which is the most important one, and the issue is on all 3 servers. We're going to need to run a special kind of repair on this table; we've seen this issue maybe two times total in the past. It's going to take quite a while. The data should be available sometime next week.

Posted Tue Feb 3 2015 11:50p by Your Friendly Clicky Admin


So those "special" repairs are done and great news, it worked. I'm looking at reports prior to Jan 9 for sites on all these servers and they're all working.

Only issue is with db8, it had some pretty bad corruption on the summary table that we couldn't fully repair. So it does not have any data after Sep 17, so basically Sep 18 - Jan 9 will have no data from this main summary table. You can still view individual visitors and actions from that date range though. Again this is only on db8. db10 and 36 do NOT have this issue.

This recovered data is not publicly available yet but I will do my best to make it available by Monday night (US time). I need to write some code to allow switching on the fly between the two database servers (old and new). Long term the goal will be to make it completely seamless so you won't know or care which server the data is coming from, but in the interest of letting you access your historical data ASAP, the first release will likely require switching between the two servers manually, or maybe we'll do it automatically based on the date/range that's being viewed. Dunno yet. Haven't had to do anything like this before so I'm not sure what will work best.

Posted Sun Feb 8 2015 7:06p by Your Friendly Clicky Admin


Ok the recovered data is now online. Here's how it works for now. You cannot view combined data from the new db8/10/36 and the old 'recovery' ones - for now. (Long term we want to do that).

Which server you are accessing when you make a request via the web UI or the API is determined by the date in the URL. If viewing a single date, if that date is <= Jan 9, it puts you on the old server automatically, otherwise it's the new one. For date *ranges*, we check the end of the date range, e.g. Jan 1-10, the end of that range is Jan 10 - you'd be on the new server so data from Jan 1-9 wouldn't be included in that request. If however the range was Jan 10-15, or Jan 1-8, you would get all data available from those ranges since the range does not span the Jan 9 D-day. I know that's a little confusing but just test it out, hopefully it will make sense. And long term when we make it more seamless so the two data sets can blend together, it won't matter anymore.
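
The rule above can be sketched in a few lines. This is an illustration only, not the production code: the function name and the "old"/"new" labels are hypothetical, and the Jan 9 cutoff is taken from the description above.

```python
from datetime import date
from typing import Optional

CUTOFF = date(2015, 1, 9)  # dates on or before this are served by the recovery server


def pick_server(start: date, end: Optional[date] = None) -> str:
    """Return "old" or "new" based on the single date, or the end of a range."""
    deciding_date = end if end is not None else start
    return "old" if deciding_date <= CUTOFF else "new"


print(pick_server(date(2015, 1, 5)))                      # old
print(pick_server(date(2015, 1, 1), date(2015, 1, 10)))   # new: Jan 1-9 is left out
print(pick_server(date(2015, 1, 1), date(2015, 1, 8)))    # old
print(pick_server(date(2015, 1, 10), date(2015, 1, 15)))  # new
```

Only the end of the range is consulted, which is why a range that straddles Jan 9 silently drops the older portion.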

Jan 8 and 9 are pretty much blank for all sites, but Jan 7 and before, almost all data is there.

The exception is sites on db8. The summary table, which stores things such as "top countries" or "top pages" for a given date, was very badly corrupted and no data between Sep 17 2014 and Jan 9 2015 was recoverable. So basically all reports other than the visitor and action logs will be blank if you are viewing anywhere between those dates. But data older than Sep 17 is still there.

Posted Wed Feb 11 2015 3:39p by Your Friendly Clicky Admin


Fantastic news! Thank you for the continued updates and for recovering this data and getting it back online for everyone affected.

Posted Wed Feb 11 2015 4p by limitedpres***


what's the deal here, i have been extremely upset about this data loss and cannot even understand how this shit wasn't retrievable through a variety of backup and/or data recovery services. is the data back online yet?

Posted Wed Feb 18 2015 5:48p by texmex1***


Yes, see my post 2 posts above yours.

Posted Wed Feb 18 2015 6:55p by Your Friendly Clicky Admin


thank you so much for working so hard on getting the recovered data back. It's very awesome that you guys are trying to turn this around.

Posted Wed Feb 25 2015 3:14p by quotient***

