TC1 HDD failure

Stay up to date with shard happenings
Red Squirrel
Posts: 29209
Joined: Wed Dec 18, 2002 12:14 am
Location: Northern Ontario

Post by Red Squirrel »

Drive 3 of my network's main server has failed. That drive houses all my production VMs, and one of those is TC1. I also lost all my pp2 downloads, etc., but that's another story. (No, I don't back that stuff up, as it's not considered critical, but it's still a bad loss.)

So TC1 will be unavailable for a while.

I will order another drive. First I'm running a scan on the other drives; if by chance another one is about to fail, I may as well replace them all at once.

The major issue now is that TC1 and live are not synced. In fact, some files on TC1 are newer, and some files on live are newer. So this will be a nightmare of a merge to do.
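A merge like this usually starts with a list of which side holds the newer copy of each file. Below is a minimal sketch of such a comparison, assuming TC1 and live are both available as plain directory trees and that modification times are trustworthy. The function names and report keys are made up for illustration.

```python
from pathlib import Path


def list_files(root):
    """Map each file's path (relative to root) to its modification time."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): p.stat().st_mtime
        for p in root.rglob("*")
        if p.is_file()
    }


def compare_trees(tc1_root, live_root):
    """Sort every file into one of four buckets based on where it is newer."""
    tc1 = list_files(tc1_root)
    live = list_files(live_root)
    report = {
        "newer_on_tc1": [],
        "newer_on_live": [],
        "only_on_tc1": [],
        "only_on_live": [],
    }
    for name in sorted(tc1.keys() | live.keys()):
        if name not in live:
            report["only_on_tc1"].append(name)
        elif name not in tc1:
            report["only_on_live"].append(name)
        elif tc1[name] > live[name]:
            report["newer_on_tc1"].append(name)
        elif live[name] > tc1[name]:
            report["newer_on_live"].append(name)
        # Equal timestamps are treated as already in sync and left out.
    return report
```

Timestamps only show which copy was touched last, not whether both copies changed, so files in the two "newer" buckets would still need a manual diff before overwriting anything.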

This is just another rude reminder of how important it is to back up. I have 2 backups of TC1 on hand, though I still lost a lot since I did quite a lot of work today. I will be making another backup to my PC as well as a backup of the live shard, in case further disaster occurs.

Archived topic from AOV, old topic ID:2325, old post ID:14890
Honk if you love Jesus, text if you want to meet Him!
ninja2007
Posts: 412
Joined: Tue Aug 21, 2007 7:38 am

Post by ninja2007 »

Dang, that sucks, man... Argh, again there is something wrong with TC1. Just don't make it public. People get on there, mess around, and crash it on purpose. Just don't make it public.

Archived topic from AOV, old topic ID:2325, old post ID:14901

Post by Red Squirrel »

People can't really crash the physical server. It's a physical disk failure.

If the shard itself gets trashed it takes 2 minutes to fix.

Archived topic from AOV, old topic ID:2325, old post ID:14903

Post by Red Squirrel »

Wow, the damage is worse than I thought. For some stupid reason the backups have not run since about Feb 1, 08. So I lost everything since then. Live may be newer, but everything that was too incomplete to put on live is lost. It may take months to recover from this data loss, and months before there is another update.

Also, for some weird reason my PC refuses to talk to the domain now (I got part of the server back up, minus the VM/SQL part of it), so I can't access anything from my own PC. It really sucks... I might end up having to trash the whole domain and just go to local user accounts, but that's just... gross. I'm reimaging my PC now in hopes that it links back to the DC. At least that will be one less headache if I can map all my shares and stuff.

But yeah, this just totally sucks. I don't know why the backups randomly stopped running. I'm scared to see what the result of the SQL restore will be....

Archived topic from AOV, old topic ID:2325, old post ID:14912

Post by Red Squirrel »

I'm in the process of rebuilding my most critical VMs on a temp HDD. I have an 80GB external USB drive, so I just hooked it up to the server, mounted it as /data3, and started rebuilding the file structure.

I'm still going to wait till the actual replacement drive comes in before doing critical work though. I'm hoping to be back in business by Friday. I'll still have to go through all scripts to figure out what needs to be recoded so this may take a few weeks to recover from.

I also figured out why the backups were not running and rectified the issue.

I still have not checked the backup server; if I'm lucky, those backups will be a bit more recent than Feb 1. What I'm worried about the most is the core changes, as those aren't on live (only the exe is) and there were a few recent critical security fixes put in.

Archived topic from AOV, old topic ID:2325, old post ID:14938

Post by Red Squirrel »

The new drive came in, and it's big!

I hate how HDD manufacturers use the 1,000,000,000,000 bytes = 1TB calculation, since their 1TB is not really 1TB, it's less. But I expected that, as they all use this gimmick. Regardless, I still have a crapload of space:

Filesystem  Size  Used  Avail  Use%  Mounted on
/dev/hda2   9.5G  4.8G   4.3G   54%  /
/dev/hda1   145M  8.9M   128M    7%  /boot
/dev/hdb1   459G  242G   194G   56%  /data
/dev/hda5   100G  4.7G    90G    5%  /data2
tmpfs       1.8G     0   1.8G    0%  /dev/shm
/dev/sda1   917G  205M   871G    1%  /data3

The last one is the drive TC1 is going on. TC1 itself lives on a virtual drive, mind you, so it's about 40ish GB.
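The gap between the advertised 1TB and what the system reports comes down to units. A short sketch of the arithmetic, assuming the drive holds exactly 1,000,000,000,000 bytes and that the sizes shown are counted in powers of 1024:

```python
DRIVE_BYTES = 1_000_000_000_000  # 1 TB as marketed (decimal, powers of 1000)
GIB = 2**30                      # one binary gigabyte
TIB = 2**40                      # one binary terabyte


def to_binary_units(n_bytes):
    """Return the size expressed in GiB and in TiB."""
    return n_bytes / GIB, n_bytes / TIB


gib, tib = to_binary_units(DRIVE_BYTES)
print(f"{gib:.1f} GiB, {tib:.3f} TiB")  # prints "931.3 GiB, 0.909 TiB"
```

The 917G shown for /data3 is a bit lower again than 931; the remainder is presumably partitioning and filesystem overhead.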


I will start to rebuild my VMs as well as assess the damage to TC1 and see where I can start off from to recode all that was lost. This alone may take a few days, as it will be hard to figure out what actually got done since the last backup. The backups are quite old, so it will most likely be way more than I expect. A lot of stuff is on live though, so I can use live as a base backup source.

Archived topic from AOV, old topic ID:2325, old post ID:14947

Post by Red Squirrel »

TC1 is back up and set up. Now I'm in the process of recoding everything that was lost in the crash. That may take about a week or so, then I'll be back to the same place I was Sunday, and I can resume normal operations.

I will also do some major changes in the backup procedure to ensure this never happens again to this extent.
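One change that guards against backups silently stopping is a freshness check that complains when the newest backup is too old. This is only a sketch, assuming the backups land as files under a single directory; the function names and the 36-hour threshold are hypothetical.

```python
import time
from pathlib import Path


def newest_backup_age_hours(backup_dir, now=None):
    """Age in hours of the newest file under backup_dir, or None if there are no files."""
    now = time.time() if now is None else now
    mtimes = [p.stat().st_mtime for p in Path(backup_dir).rglob("*") if p.is_file()]
    if not mtimes:
        return None
    return (now - max(mtimes)) / 3600


def backup_is_stale(backup_dir, max_age_hours=36, now=None):
    """True if there is no backup at all or the newest one is older than the limit."""
    age = newest_backup_age_hours(backup_dir, now)
    return age is None or age > max_age_hours
```

Run from a scheduled job on a different machine than the one making the backups, a check like this would flag a missed daily run within a day or two instead of months later.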

Archived topic from AOV, old topic ID:2325, old post ID:14985

Post by Red Squirrel »

I'm now mostly back to square one, which means the cleanup process is complete. There's some stuff left to recode that I had already done, so I'm still playing catch-up, but in general everything is cleaned up now.

So normal production will be able to resume soon.

Archived topic from AOV, old topic ID:2325, old post ID:14996

Post by Red Squirrel »

Wow, this blows. I'm still finding stuff I need to recode due to this. This has set me behind by so much, it's ridiculous. The worst part is I HAVE daily backups, but for almost a whole week they did not run, so I lost everything that was done the week of the failure.

Archived topic from AOV, old topic ID:2325, old post ID:15169