TC1 HDD failure
Posted: Sun Feb 10, 2008 9:21 pm
by Red Squirrel
Drive 3 of my network's main server has failed. That drive houses all my production VMs, and one of those is TC1. I also lost all my pp2 downloads etc... but that's another story. (Nah, I don't back that stuff up as it's not considered critical, but it's still a bad loss.)
So TC1 will be unavailable for a while.
I will order another drive. First I'm doing a scan on the other drives, as I figure if by chance another is about to fail I may as well just replace them all at once.
The major issue now is that TC1 and live are not synced. In fact, some files on TC1 are newer, and some files on live are newer. So this will be a nightmare of a merge to do.
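A merge like this could start with a report of which side holds the newer copy of each file. Below is a minimal Python sketch, assuming both trees are reachable as ordinary directories. The function name, the one-second tolerance, and the example paths are hypothetical and are not the tooling actually used here.

```python
from pathlib import Path


def classify_by_mtime(dir_a, dir_b, tolerance=1.0):
    """Bucket relative file paths by which tree holds the newer copy.

    tolerance is the number of seconds within which two modification
    times are treated as equal (an assumed value, adjust as needed).
    """
    def index(root):
        root = Path(root)
        # Map "relative/path" -> modification time for every regular file.
        return {
            p.relative_to(root).as_posix(): p.stat().st_mtime
            for p in root.rglob("*")
            if p.is_file()
        }

    a, b = index(dir_a), index(dir_b)
    report = {
        "newer_in_a": [],
        "newer_in_b": [],
        "only_in_a": [],
        "only_in_b": [],
        "same_age": [],
    }
    for rel in sorted(a.keys() | b.keys()):
        if rel not in b:
            report["only_in_a"].append(rel)
        elif rel not in a:
            report["only_in_b"].append(rel)
        elif a[rel] - b[rel] > tolerance:
            report["newer_in_a"].append(rel)
        elif b[rel] - a[rel] > tolerance:
            report["newer_in_b"].append(rel)
        else:
            report["same_age"].append(rel)
    return report
```

Calling something like `classify_by_mtime("/path/to/tc1", "/path/to/live")` (placeholder paths) would give a to-do list for the merge. Modification time is only a hint: two files of the same age can still differ in content, so anything in `same_age` may still deserve a diff.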
This is just another rude reminder of how important it is to back up. I have 2 backups of TC1 on hand, though I still lost a lot since I did quite a lot today. I will be making another backup to my PC as well as a backup of the live shard, in case further disaster occurs.
Archived topic from AOV, old topic ID:2325, old post ID:14890
TC1 HDD failure
Posted: Mon Feb 11, 2008 9:21 am
by ninja2007
Dang, that sucks, man... Argh, again there is something wrong with TC1. Just don't make it public; people get on there, mess around, and crash it on purpose.
Archived topic from AOV, old topic ID:2325, old post ID:14901
TC1 HDD failure
Posted: Mon Feb 11, 2008 11:35 am
by Red Squirrel
People can't really crash the physical server. It's a physical disk failure.
If the shard itself gets trashed it takes 2 minutes to fix.
Archived topic from AOV, old topic ID:2325, old post ID:14903
TC1 HDD failure
Posted: Mon Feb 11, 2008 9:17 pm
by Red Squirrel
Wow, the damage is worse than I thought. For some stupid reason the backups have not run since about Feb 1, 2008, so I lost everything since then. Live may be newer, but everything that was too incomplete to put on live is lost. It may take months to recover from this data loss, and months before there is another update.
Also, for some weird reason my PC refuses to talk to the domain now (I got part of the server back up, minus the VM/SQL part of it), so I can't access anything from my own PC. Really sucks... I might end up having to trash the whole domain and just go to local user accounts, but that's just... gross. I'm reimaging my PC now in hopes it links back to the DC. At least that will be one less headache if I can map all my shares and stuff.
But yeah, this just totally sucks. I don't know why the backups randomly stopped running. I'm scared to see what the result of the SQL restore will be....
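A silent gap like this is the kind of thing a freshness check can catch. Below is a minimal Python sketch, assuming the backups land as files under a single directory; the function names and the 26-hour threshold (a daily job plus some slack) are assumptions, and this is not the actual fix that was applied.

```python
import time
from pathlib import Path


def newest_backup_age_hours(backup_dir, now=None):
    """Age in hours of the most recently modified file, or None if no files exist."""
    now = time.time() if now is None else now
    mtimes = [
        p.stat().st_mtime
        for p in Path(backup_dir).rglob("*")
        if p.is_file()
    ]
    if not mtimes:
        return None
    return (now - max(mtimes)) / 3600.0


def backup_is_stale(backup_dir, max_age_hours=26.0, now=None):
    """True when there is no backup at all or the newest one is too old."""
    age = newest_backup_age_hours(backup_dir, now=now)
    return age is None or age > max_age_hours
```

Run from a scheduled job on a different machine than the one doing the backups, a check like `backup_is_stale("/path/to/backups")` (placeholder path) could send a mail or page when it returns `True`, so a stopped backup gets noticed within a day rather than after a drive dies.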
Archived topic from AOV, old topic ID:2325, old post ID:14912
TC1 HDD failure
Posted: Tue Feb 12, 2008 11:05 pm
by Red Squirrel
I'm in the process of rebuilding my most critical VMs on a temp HDD. I have an 80GB external USB drive, so I just hooked it up to the server, mounted it as /data3, and started rebuilding the file structure.
I'm still going to wait till the actual replacement drive comes in before doing critical work though. I'm hoping to be back in business by Friday. I'll still have to go through all scripts to figure out what needs to be recoded so this may take a few weeks to recover from.
I also figured out why the backups were not running and rectified the issue.
I still have not checked the backup server; if I'm lucky those backups will be a bit more recent than Feb 1. What I'm worried about the most is the core changes, as those aren't on live (only the exe is) and there were a few recent critical security fixes put in.
Archived topic from AOV, old topic ID:2325, old post ID:14938
TC1 HDD failure
Posted: Wed Feb 13, 2008 8:26 pm
by Red Squirrel
The new drive came in, and it's big!
I hate how HDD manufacturers use the 1,000,000,000,000 bytes = 1TB calculation, since their 1TB is not really 1TB; it's less. But I expected that, as they all use this gimmick. Regardless, I still have a crapload of space:
Filesystem  Size  Used  Avail  Use%  Mounted on
/dev/hda2   9.5G  4.8G   4.3G   54%  /
/dev/hda1   145M  8.9M   128M    7%  /boot
/dev/hdb1   459G  242G   194G   56%  /data
/dev/hda5   100G  4.7G    90G    5%  /data2
tmpfs       1.8G     0   1.8G    0%  /dev/shm
/dev/sda1   917G  205M   871G    1%  /data3
The last one is the drive TC1 is going on. TC1 itself is on a virtual drive, mind you, so it's only about 40ish GB.
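The gap between the advertised 1TB and what the system reports comes down to decimal versus binary units. A quick Python sketch of the arithmetic (the helper name is made up for illustration):

```python
def decimal_bytes_to_binary_units(n_bytes):
    """Express a byte count in GiB and TiB (powers of 1024)."""
    return {"GiB": n_bytes / 2**30, "TiB": n_bytes / 2**40}


advertised = 10**12  # the manufacturer's "1TB": 1,000,000,000,000 bytes
units = decimal_bytes_to_binary_units(advertised)
print(f"{units['GiB']:.1f} GiB, {units['TiB']:.3f} TiB")  # 931.3 GiB, 0.909 TiB
```

`df -h` reports sizes in powers of 1024, so a drive of exactly 10^12 bytes can show at most about 931G. The further drop to the 917G listed for /data3 is presumably partitioning and filesystem overhead, though the thread does not say which filesystem is in use.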
I will start to rebuild my VMs as well as assess the damage to TC1 and see where I can start from to recode all that was lost. This alone may take a few days, as it will be hard to figure out what actually got done since the last good backup. The backups are quite old, so the loss will most likely be way more than I expect. A lot of stuff is on live though, so I can use live as a base backup source.
Archived topic from AOV, old topic ID:2325, old post ID:14947
TC1 HDD failure
Posted: Thu Feb 14, 2008 10:01 pm
by Red Squirrel
TC1 is back up and set up. Now I'm in the process of recoding everything that was lost in the crash. That may take about a week or so, then I'll be back to the same place I was on Sunday, and I can resume normal operations.
I will also make some major changes to the backup procedure to ensure this never happens again to this extent.
Archived topic from AOV, old topic ID:2325, old post ID:14985
TC1 HDD failure
Posted: Fri Feb 15, 2008 11:06 pm
by Red Squirrel
I'm now mostly back to square one, which means the cleanup process is complete. There's some stuff left to recode that I had already done, so I'm still playing catch-up, but in general everything is cleaned up now.
So normal production will be able to resume soon.
Archived topic from AOV, old topic ID:2325, old post ID:14996
TC1 HDD failure
Posted: Thu Feb 21, 2008 9:42 pm
by Red Squirrel
Wow, this blows. I'm still finding stuff I need to recode due to this. This has set me behind by so much, it's ridiculous. The worst part is I HAVE daily backups, but for almost a whole week they did not run, so I lost everything that was done the week of the failure.
Archived topic from AOV, old topic ID:2325, old post ID:15169