It was only a few short months ago that a vulnerability in the Xen Hypervisor resulted in problems for Amazon’s EC2. We called it the Amazonian apocalypse then, and the time has come for its sequel.
Last time, as you may recall, servers were chunked into sections and given windows during which they needed to be rebooted. This time, the same thing is happening. Amazon says that it’s only a problem for 10% of EC2 instances, as it was last time. But we do know that it’s another Xen bug. Most likely, it’s some combination of five bugs that were listed on the Xen security advisory site yesterday. Those vulnerabilities won’t be made public until March 10, however.
That’s also the deadline to reboot your EC2 instance, if yours is, in fact, among the unlucky fewer than 10% that are affected. Amazon contacted us to inform us that not all of those affected will even need to reboot their servers by hand. They indicated that this was the case last time as well.
(UPDATE: Amazon contacted us Monday morning to let us know that they’ve figured out a workaround that allows them to take care of this reboot for 99.9% of affected customers. That means you probably will not have to reboot now.)
It’s a very difficult thing for us to gauge from the outside, but I can confirm that last time, despite it being only an under-10% problem, I personally knew of admins who were up overnight redeploying servers. This was, in almost all cases, entirely their own fault: They’d designed their systems in a way that made things break after patching, or that required a lot of manual labor for rebooting their EC2 instances.
Additionally, the under-10% figure describes EC2 as a whole, not any single customer’s share of it. Commenters online are already chiming in with their server numbers, and one Hacker News commenter cast aspersions on how accurate that 10% number really is:
Been there, done that. AWS re:Boot in September 2014 showed us how good it was to invest in Ansible roles for all parts of our infrastructure. Still, a lot of hassle for Ops Team, especially that it was done during DevOps Days Warsaw 😉 AWS also said “10%” then, but for us it was 81 out of ~300 instances.
What is sad is that we learned about it from Hacker News and not from AWS, even when we have premium support and our own account manager. :/