Where possible, I like to give a post-mortem of any downtime we experience.
The root cause of this downtime was a bad code load that hadn't been properly tested. In local testing it performed fine, but when loaded to production it caused an untrapped error in a routine that is called very frequently. This resulted in the engine terminating. The issue was resolved by reverting the changes; because it was only a partial revert, it produced merge conflicts that had to be resolved manually.
In total, the downtime lasted for approximately 10 minutes. The amount of time shaved off of my life as I scrambled to resolve the issue is approximately 5 months. I've put some more rigorous test plans in place for code of this type in the future.
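For anyone curious what "untrapped error in a frequently called routine" means in practice, here is a minimal sketch in Python. It is not the engine's actual code or language; `trapped`, `heartbeat`, and `run_engine` are hypothetical names made up for illustration. The idea is that a hot routine wrapped in an error trap logs the failure and carries on, whereas an untrapped error would propagate up and stop the main loop.

```python
import functools
import logging

logger = logging.getLogger("engine")  # hypothetical logger name


def trapped(routine):
    """Wrap a routine so an exception is logged instead of propagating."""
    @functools.wraps(routine)
    def wrapper(*args, **kwargs):
        try:
            return routine(*args, **kwargs)
        except Exception:
            # Record the traceback, then return None so the caller keeps running.
            logger.exception("error in %s", routine.__name__)
            return None
    return wrapper


@trapped
def heartbeat(tick):
    # Hypothetical frequently called routine; raises ZeroDivisionError when tick is 0.
    return 100 // tick


def run_engine(ticks):
    # Stand-in for the main loop: without the trap, the bad tick would end the loop.
    return [heartbeat(tick) for tick in ticks]
```

Calling `run_engine([1, 0, 5])` gets through all three ticks, with `None` recorded for the failing one, rather than stopping at the second.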
@Eoghan, the item 'a mithril flute' doesn't work anymore since the game went down. Can you take a look at this? It's an artifact I use every single day that I log in and it sucks that it stopped working suddenly. I bugged it too under report 63833.
You say, "Oh crap."
You say, "My bottle is empty."
Jeremy raises an eyebrow questioningly.
Jeremy slaps you on the cheek.
@Eoghan, the item 'a mithril flute' doesn't work anymore since the game went down. Can you take a look at this? It's an artifact I use every single day that I log in and it sucks that it stopped working suddenly. I bugged it too under report 63833.
This was actually accidentally disabled along with some other code changes on the 27th, and has nothing to do with the downtime. It's fixed now.