Unplanned Service Outage [back up! Small revert]
Posted: Thu May 26, 2016 9:34 pm
by Red Squirrel
The main file server that houses everything on my network failed hard and will not come back up. It's not looking good. I will probably have to reinstall the OS on the file server, and on all the VMs too if they got corrupted in the process, which means reconfiguring everything, and that is long and tedious.
I am not sure what the extent of data loss is yet, if any. Posting from mobile as my entire network is down. I'm just going to go pass out in a corner and try to pretend this did not just happen.
Edit: To clarify on potential data loss: we do have backups, but any data loss will mean restoring them, which will take longer, and there will be somewhat of a revert (about 1 day) if we do use the backups.
Archived topic from AOV, old topic ID:6657, old post ID:38849
Unplanned Service Outage [back up! Small revert]
Posted: Thu May 26, 2016 10:53 pm
by Red Squirrel
I was able to boot with a rescue CD, and all 3 of the RAID arrays have been started and the data is visible. This does not mean the VMs are not corrupted, but it is a very good start as it's one less thing to worry about. One of the arrays did go into resync mode, so I'm going to just leave that alone until it finishes before I do anything further.
Meanwhile I will research to see if there is anything I can do to save the OS, as I really don't want to have to set everything up again.
If the VMs are not corrupt, I could have my whole network back online by as early as tomorrow night, and that includes the shard.
I will continue to post updates on progress.
Archived topic from AOV, old topic ID:6657, old post ID:38850
Unplanned Service Outage [back up! Small revert]
Posted: Thu May 26, 2016 11:32 pm
by Ixxie
I am useless with tech but I have cookies.
Archived topic from AOV, old topic ID:6657, old post ID:38851
Unplanned Service Outage [back up! Small revert]
Posted: Fri May 27, 2016 12:01 am
by Red Squirrel
I could use cookies.
I'm waiting for the resync to finish. ETA is like 550 minutes. I probably don't have to wait, but better safe than sorry I figure. Then tomorrow I will try to see if I can save the OS. Failing that I will have no choice but to reinstall. Good news is, since that is a file server, there is actually not THAT much to configure. I'm more worried about the VMs at this point, as some of them are complex and will take a while to set up. The shard VMs will most likely be fine, but I have tons of stuff of my own on there too. I think the worst case scenario for the shard is having to revert to a day old backup if the VM is corrupt.
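For anyone curious where an ETA like that comes from: a rough sketch, assuming the arrays are Linux software RAID (md), which this thread does not actually confirm. In that case the kernel reports resync progress in /proc/mdstat, and something like the Python below could pull out the percentage and the estimated minutes left. The function name and the sample array name are made up for illustration.

```python
import re

# Matches the progress line md prints while an array is busy, e.g.
#   [==>......]  resync = 12.6% (123/976) finish=550.3min speed=25000K/sec
PROGRESS_RE = re.compile(
    r"(?P<action>resync|recovery|reshape|check)\s*=\s*(?P<pct>[\d.]+)%.*?"
    r"finish=(?P<finish>[\d.]+)min"
)
# Matches the line that starts an array's entry, e.g. "md0 : active raid1 ..."
ARRAY_RE = re.compile(r"^(?P<name>md\d+)\s*:")


def resync_status(mdstat_text):
    """Return {array_name: (action, percent_done, minutes_left)} for busy arrays."""
    status = {}
    current = None
    for line in mdstat_text.splitlines():
        header = ARRAY_RE.match(line)
        if header:
            current = header.group("name")
            continue
        progress = PROGRESS_RE.search(line)
        if progress and current:
            status[current] = (
                progress.group("action"),
                float(progress.group("pct")),
                float(progress.group("finish")),
            )
    return status


if __name__ == "__main__":
    try:
        with open("/proc/mdstat") as f:
            text = f.read()
    except OSError:
        text = ""  # not Linux, or md is not in use on this machine
    for name, (action, pct, mins) in resync_status(text).items():
        print(f"{name}: {action} {pct:.1f}% done, about {mins:.0f} min left")
```

Arrays that are idle have no progress line, so they simply do not show up in the result.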
Archived topic from AOV, old topic ID:6657, old post ID:38852
Unplanned Service Outage [back up! Small revert]
Posted: Fri May 27, 2016 12:49 am
by Ixxie
i don't know what any of that is
But i can share with you my super secret cookie recipe. It pmuch rocks all manner of socks. And it sounds like you've got some time to kill.
Archived topic from AOV, old topic ID:6657, old post ID:38853
Unplanned Service Outage [back up! Small revert]
Posted: Fri May 27, 2016 10:48 am
by Ixxie
Progress? Are you still a ginger? Or did it all turn white/fall out
Archived topic from AOV, old topic ID:6657, old post ID:38854
Unplanned Service Outage [back up! Small revert]
Posted: Fri May 27, 2016 11:50 am
by Death
Ixxie wrote:Progress? Are you still a ginger? Or did it all turn white/fall out
Lmao why dont we have an upvote button
Archived topic from AOV, old topic ID:6657, old post ID:38855
Unplanned Service Outage [back up! Small revert]
Posted: Fri May 27, 2016 1:56 pm
by Red Squirrel
The RAID sync is now complete. Now I'm just doing more research to see if there is any way I can salvage the OS. Worst case scenario I may have to reinstall it. If I can't figure out how to salvage it by like midnight I'll probably just proceed with a reinstall.
I still don't know if my VMs are going to be corrupted, that is probably the most stressful part at this point.
Still a ginger, but some of the hairs are starting to turn white.
Archived topic from AOV, old topic ID:6657, old post ID:38856
Unplanned Service Outage [back up! Small revert]
Posted: Fri May 27, 2016 3:16 pm
by Red Squirrel
Small update: I managed to fix the bootloader on the server. It now sorta boots, but NFS gets hung up because it can't resolve hostnames. The DNS server is in a VM, the VM is on a datastore, and the datastore is on THAT server! Need to figure out the best course of action at this point. I may have to set up a temp wildcard DNS server just so the hostnames resolve to something and let the system boot.
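A rough sketch of how this kind of chicken-and-egg dependency could be spotted ahead of time, not what I actually ran: check which hostnames the file server depends on fail to resolve while DNS is down. The hostnames below are hypothetical placeholders, and `unresolved_hosts` is a made-up helper name. Note that `socket.gethostbyname` can block for as long as the system resolver's timeout when no DNS server answers.

```python
import socket


def unresolved_hosts(hostnames, resolve=socket.gethostbyname):
    """Return the hostnames that fail to resolve, in the order given."""
    missing = []
    for name in hostnames:
        try:
            resolve(name)
        except OSError:  # socket.gaierror is a subclass of OSError
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Hypothetical names: replace with whatever the NFS exports reference.
    needed = ["vmhost1.example.lan", "backup1.example.lan"]
    for name in unresolved_hosts(needed):
        print(f"cannot resolve {name}: add it to /etc/hosts or use an IP")
```

Anything this reports is a name the server needs before the DNS VM can be up. Listing those names in /etc/hosts on the file server, or using IP addresses instead, would be one way to break the loop, assuming the system consults the hosts file before DNS.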
Archived topic from AOV, old topic ID:6657, old post ID:38857
Unplanned Service Outage [back up! Small revert]
Posted: Fri May 27, 2016 3:54 pm
by Red Squirrel
Managed to get the server to boot. I got lucky in that my old DNS server is still running, so I restarted the DNS service on it and pointed the server to that IP. I was able to get the VMware server to refresh the storage and now I'm going through each VM one by one to see if there is any damage.
No ETA yet. If everything actually ends up working 100%, I could have things up in as little as a few hours, but it will probably go up and down a few times as I test reboots and such to ensure that next time the system actually comes back properly.
Archived topic from AOV, old topic ID:6657, old post ID:38858
Unplanned Service Outage [back up! Small revert]
Posted: Fri May 27, 2016 4:40 pm
by Red Squirrel
Things are looking half decent. I was able to start all my VMs back up and nothing stands out to me immediately, though it's still very early. I already did notice some services that had to be started manually and whatnot, but as far as I can tell all the data seems intact.
However, as far as the shard is concerned, things are not as great: the database is corrupted. I will have to restore a backup. Good news is I'm way further ahead than I figured I'd be at this point. I was really sure I'd have to reinstall the OS on every single VM.
Archived topic from AOV, old topic ID:6657, old post ID:38859
Unplanned Service Outage [back up! Small revert]
Posted: Fri May 27, 2016 4:52 pm
by Red Squirrel
I was able to restore a database from May 26 6:00am.
The shard is now back up!
It may go down again, as I need to recheck everything on my network and that may involve reboots to make sure stuff is running correctly.
Archived topic from AOV, old topic ID:6657, old post ID:38860
Unplanned Service Outage [back up! Small revert]
Posted: Fri May 27, 2016 6:18 pm
by Red Squirrel
Ok, I can't think of anything else I need to check. I can't actually do a test reboot of my environment until I fix my DNS situation, so that won't happen for a while. So I now consider the shard back up.
Archived topic from AOV, old topic ID:6657, old post ID:38861