db71 emergency maintenance

About 4:30pm PST on Saturday, Nov 6, the MySQL process for db71 started crashing repeatedly. Upon investigation, it was determined that the index file for a key database was corrupted. This used to be something that would happen once or twice a year back in the olden days, but it has never happened since we moved to ZFS and InnoDB back in mid-2016. These systems are very robust and we're happy as a peach with them.

But apparently they're not perfect. It could be a hardware issue; we can't be certain yet. More likely, though, it was just a random anomaly.

Anyway, we have a data purge process that removes old data for deleted and inactive sites as our disk space needs evolve. This process also completely rebuilds the database indexes, so we're running it on db71 now. Sadly, this is one of our biggest databases, so it will likely take close to 24 hours to finish. During this time, data processing is halted. Normally the database can stay online during this process, so you can still view your old reports, but in this case we need to take it fully offline to make sure no stray end-user queries crash it. It's extremely rare for this to be necessary, and we apologize. That said, it happening on a Saturday is pretty much the best timing possible, so that's nice at least.

We have a thread going on Twitter. I'll try to keep both the forum and Twitter updated, but Twitter is our usual go-to for this kind of thing, so make sure to follow us if you're not already.

Posted Sat Nov 6 2021 9:32p by Your Friendly Clicky Admin

Still chugging along. Looks like it's going to be at least another 12 hours. I'll update the ETA as it gets closer.

Posted Sun Nov 7 2021 5:15a by Your Friendly Clicky Admin

We're within about 2 hours of the database coming back online, assuming no issues as it finishes.

At this point it's more than 24 hours behind real time, and there are some very high-traffic sites on this server, so it will take a while to process the backlog: likely an additional 8-12 hours.

Again, we'll update as we know more.

Posted Sun Nov 7 2021 5:20p by Your Friendly Clicky Admin

It's done and we're testing things... hopefully we can bring it online for all of you shortly.

Posted Sun Nov 7 2021 6:48p by Your Friendly Clicky Admin

Everything looks good and the backlog is processing. It's currently about 29 hours behind real time but catching up faster than expected, so it should easily be done within 8 hours.

Posted Sun Nov 7 2021 9:38p by Your Friendly Clicky Admin

It's already 1/3 of the way through the backlog after just 1 hour, so it will actually be done pretty soon :)

Posted Sun Nov 7 2021 10:31p by Your Friendly Clicky Admin

Our stats don't work anymore after the maintenance. It still says no visitors.

Posted Sun Nov 7 2021 11:41p by TransportOn***

