In today’s digital business world, disasters aren’t tolerated well.
When online offerings go down, organizations suffer considerable losses in revenue and reputation. For this reason, Disaster Recovery (DR) is too little, too late, and modern businesses must design for continuous availability with active/active architectures.
Along with meeting the new demands of “always on” business, continuous availability
models also avoid the inefficient economics of DR. Historically, active/active operations
created their own technical and economic obstacles, but new approaches mitigate many of those challenges.
Active/active operations provide a spectrum of advantages to enterprises, including:
• Lower total operational costs
• Improved asset utilization
• Seamless scalability
• Dramatically higher uptime
• Improved end user experience
• Superior workload performance
Limitations of Dark DR
Best practices dictate keeping data and redundant systems safely replicated at a
location far away from the primary data center in case of some regional disaster. This
model requires a regular flow of information between the two (or more) sites.
Replicating a copy of the data for backup purposes is fairly straightforward. Using those
secondary systems live along with the primary systems, however, has historically been
very challenging at the application layer. The dominant model has been to employ
a robust primary server able to handle the full workload, with plenty of overhead to
spare, backed by a remote secondary server. The two servers run replication software,
which sends a copy of the primary’s data to the secondary server to allow for
information recovery, but that secondary server is passive or “dark” – the
application doesn’t talk to it during normal operations.
The “dark DR” model suffers from several technical disadvantages. First, operations
suffer when the primary server fails, because the secondary server frequently lacks some of
the information, applications and customized code the primary server holds.
In addition, the primary server’s normal workflow must be redirected to the secondary server, which becomes, at least temporarily, the new primary server. This redirection can require significant amounts of manual configuration, with an IT team at each location working overtime to enable and troubleshoot the switch.
Similar reconfiguration applies to DNS, networking, replication topology, and other infrastructure elements. Testing requirements are massive, and additional IT staff must step into place at the secondary facility while the original IT team remains focused on trying to get the primary facility back online.
Dark Costs of DR
For those who follow the DR strategy, running a full-capacity redundant system in a secondary site represents a necessary yet considerable ongoing expense to the enterprise. Too often, executives see this investment as inescapable, so they turn a blind eye to the factors comprising that expense.
These costs include:
• Extra maintenance
• Extra staffing
• Downtime costs
• Lost business
These costs can be numerous and subtle, and they are especially hard to swallow when the expected returns on infrastructure investments prove elusive.
Advantages of Active/Active
Given the limitations of DR, businesses need an alternative, particularly as webscale IT practices filter down from the likes of Google and Facebook into mainstream enterprises. These organizations have not only introduced the world to new IT practices, but, more importantly, they have reset expectations among users. These “personal” apps perform so well and so consistently that enterprise users now apply the same standard to all applications they use, including enterprise applications.
The active/active model offers several notable technical advantages:
• It enables a smooth failover, meaning operations transition from the failing server to the other server(s) with no interruption in services.
• A team can perform maintenance on one system while the other stays active.
• Businesses can cut expenses by moving workloads in response to changing cost factors, such as local energy or real estate costs that impact a data center’s financial viability.
• Applications can handle more traffic because capacity scales across multiple active systems.
• Spreading the workload across locations lowers the load at each one, leaving headroom to absorb traffic growth.
• Security improves because IT can patch a vulnerability on demand rather than waiting for the next maintenance window.
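The failover and maintenance behavior described above can be illustrated with a minimal sketch. It assumes a simple round-robin policy and a manually set health flag; the `Site` and `ActiveActiveRouter` names and the "east"/"west" labels are hypothetical and stand in for whatever load balancer or traffic manager an organization actually uses.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class Site:
    name: str             # hypothetical site label
    healthy: bool = True  # in practice this would come from a health check


class ActiveActiveRouter:
    """Round-robin across all sites, skipping any site marked unhealthy."""

    def __init__(self, sites):
        self.sites = list(sites)
        self._ring = cycle(self.sites)

    def route(self):
        # Try each site at most once per request.
        for _ in range(len(self.sites)):
            site = next(self._ring)
            if site.healthy:
                return site.name
        raise RuntimeError("no healthy site available")


east, west = Site("east"), Site("west")
router = ActiveActiveRouter([east, west])

# Normal operation: both sites serve traffic.
normal = [router.route() for _ in range(4)]

# Simulate a failure or planned maintenance at one site.
east.healthy = False
degraded = [router.route() for _ in range(4)]

print(normal)
print(degraded)
```

Because both sites already serve live traffic, taking one out of rotation only changes where requests land. Nothing has to be promoted or reconfigured, which is the contrast with the dark DR switch-over.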
These technical advantages also pay dividends economically:
• By spreading the traffic load across multiple systems, organizations put less strain on servers, extending the functional life of hardware.
• Lower utilization at each site means lower hardware expenses. While a dark DR setup brings total cost to 2.5-3X that of a single data center, an active/active setup brings it to only 1.4 to 1.8X. That is because organizations don’t need as much hardware in each location.
• For many organizations, increased application performance leads directly to enhanced revenue, such as from speeding up transactions on e-commerce sites so that customers are less inclined to abandon shopping carts.
• Maintenance costs are lower because the tasks can be done during work hours rather than requiring a crew in the middle of the night. They also require fewer staff members because organizations can keep the application running during maintenance. This means that developers and other application specialists don’t need to be involved.
• With little to no downtime required for maintenance, organizations retain revenue that otherwise would have been lost.
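The hardware argument above can be made concrete with a rough sizing sketch. It assumes a common rule of thumb that is not stated in this paper: each active site is sized so that the surviving sites can still carry peak load after one site fails. The function names and the 1,000,000 baseline cost are hypothetical. The 2.5-3X and 1.4-1.8X multipliers are the ones quoted above, and they cover more than hardware, so the sizing rule is not expected to reproduce them exactly.

```python
def per_site_capacity(peak_load, sites, tolerated_failures=1):
    """Capacity each site needs so the surviving sites can carry peak_load."""
    survivors = sites - tolerated_failures
    if survivors < 1:
        raise ValueError("need at least one surviving site")
    return peak_load / survivors


def total_capacity_factor(sites, tolerated_failures=1):
    """Total provisioned capacity as a multiple of peak load."""
    return sites * per_site_capacity(1.0, sites, tolerated_failures)


def cost_range(single_site_cost, low, high):
    """Apply a low/high cost multiplier to a single-site baseline cost."""
    return single_site_cost * low, single_site_cost * high


# Provisioned capacity relative to peak load, tolerating one site failure.
for n in (2, 3, 4):
    print(n, "sites:", round(total_capacity_factor(n), 2), "x peak load")

baseline = 1_000_000  # hypothetical annual cost of a single data center
dark_dr = cost_range(baseline, 2.5, 3.0)        # multipliers quoted in the text
active_active = cost_range(baseline, 1.4, 1.8)  # multipliers quoted in the text
print("dark DR:", dark_dr)
print("active/active:", active_active)
```

Under this assumption, two sites must each be able to carry the full load, but the per-site requirement shrinks as more active sites share the work. That is one way to read the claim that organizations don’t need as much hardware in each location.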
Because the active/active model offers notable advantages, any enterprise can and should implement an active/active architecture and reap the benefits of the continuous availability model it enables.