Server Degradation - [solved]
Posted: Fri Dec 11, 2020 3:09 am
Everything is good and we are now back to full disk redundancy
----
Just a heads up I have 2 drives that have failed in one of the raid arrays on the NAS and it just so happens both the shard's VM and the database server are on that array.
I have 4 drives on order from 2 retailers to maximize chance of getting it sooner, and to have 2 spares after, since a couple of the other drives are throwing errors too. The drives in this array have around 60 thousand hours on them so I think it's just a matter of time until they all need to be swapped out.
In order to minimize load on this degraded array and reduce chance of another failure, I have decided to turn off the database server. The shard will continue to run, but if anything happens such as a crash, it will result in a revert to around Dec 11 2:45am ET. (no longer the case)
You can continue to play as normal, and if all goes well there will be no revert.
I'm on night shifts right now so I don't want to do anything too drastic at this point as I can't dedicate my focus 100% to it, but once I'm off again, I want to look at migrating the database server to another array. In theory I should be able to do that while the shard continues to run, and when I bring it back up, it will start to save again.
The shard itself does not really produce much disk IO so that will remain.
For your viewing pleasure, this is a shot of the carnage:
If I'm understanding this right, any of the B drives can fail and we will be safe, but if any of the A ones fail, then the entire array is lost. I do have backups but hope I don't need to use them as it's still a pain to rebuild everything.
Archived topic from AOV, old topic ID:6847, old post ID:39583
----
Just a heads up I have 2 drives that have failed in one of the raid arrays on the NAS and it just so happens both the shard's VM and the database server are on that array.
I have 4 drives on order from 2 retailers to maximize chance of getting it sooner, and to have 2 spares after, since a couple of the other drives are throwing errors too. The drives in this array have around 60 thousand hours on them so I think it's just a matter of time until they all need to be swapped out.
In order to minimize load on this degraded array and reduce chance of another failure, I have decided to turn off the database server. The shard will continue to run, but if anything happens such as a crash, it will result in a revert to around Dec 11 2:45am ET. (no longer the case)
You can continue to play as normal, and if all goes well there will be no revert.
I'm on night shifts right now so I don't want to do anything too drastic at this point as I can't dedicate my focus 100% to it, but once I'm off again, I want to look at migrating the database server to another array. In theory I should be able to do that while the shard continues to run, and when I bring it back up, it will start to save again.
The shard itself does not really produce much disk IO so that will remain.
For your viewing pleasure, this is a shot of the carnage:
If I'm understanding this right, any of the B drives can fail and we will be safe, but if any of the A ones fail, then the entire array is lost. I do have backups but hope I don't need to use them as it's still a pain to rebuild everything.
Archived topic from AOV, old topic ID:6847, old post ID:39583