Emergency Mode and Disaster Recovery
The one thing you should know about changing poker software platforms is that it’s very hard to do. That’s why it’s rarely done. Upgrading existing software is usually much easier, but changing over to a completely new platform is a whole other ballgame.
After two years of development, we finally decided it was time to take the plunge and make the move. As a business decision, it made sense: the new platform will allow us to develop products and features 10 times faster than its predecessor. Still, logic aside, it was a huge undertaking.
After extensive testing in a development environment produced strong results, we felt ready to make the switch. At first, everything seemed to be going well. Players were logging in, our new Blitz Poker product was flying, and things looked strong.
A few hours in, though, problems started to surface. The cashier wasn’t displaying player balances, players were being logged out sporadically, and we were forced to cancel several tournaments because of other recurring issues. More and more glitches were reported as the day wore on.
At this point, our company decided to switch into emergency mode and disaster recovery. I called a company-wide meeting, outlined the issues we were facing, and explained how we were going to proceed.
The first thing we needed to do was prioritize the issues. Our development team needed to know where to focus its efforts, because it was getting requests from every department.
On a blackboard, I had everyone help build a list of the issues that needed to be resolved, grouped in order of importance:
1. Air
2. Water
3. Food
4. Sleep
5. Pillows and footsie warmers
Critical items, such as the ability to download the poker client or to deposit and withdraw funds, were classified as “air,” whereas a report showing the open rate of an email was classified as “pillows and footsie warmers.”
As I’ve often mentioned, our company works under the Scrum/Agile framework. One of its core principles is to fail fast: we work in short iterations and stay in constant communication, regularly showing product and results. That way, if we’re off track or something needs to change, we find out quickly and don’t spend too much time or money on something we shouldn’t.
During emergency mode, we increased our company stand-up meetings to several times a day. This kept everyone informed of any progress made and any new issues reported, and it ensured we kept knocking items off our priority list while adjusting course based on new information.
Wash, rinse, repeat.
We worked our way down the list like this for days, until the remaining items became less and less urgent and much more manageable.
While the software move wasn’t perfect, I’m happy to report that we’re now down to “pillows and footsie warmers” problems. Once those are resolved, we intend to sit down as a company and hold a full retrospective on how the experience went and what we can improve for next time.
There is no doubt in my mind that if our teams had not been well practiced in the Scrum/Agile methodology, we’d be in much worse shape than we are now.
I’m proud of how we all came together.