Software development isn’t easy. And the bigger the software gets, the harder it is to build right from the ground up. A string of high-profile failures has given us a timely reminder of this. But let’s look at the HealthCare.gov fiasco in a little more detail. Yes, there were some user interface idiosyncrasies. Yes, the site wasn’t up to handling the traffic it received. Yes, users were turned away, certain plans or providers were not included, and many of those precious few applications that were submitted had unexplained errors. But if we look past the symptoms, much of this seemed to stem from serious problems with system integration.

This entire site was essentially a friendly face on a massive integration project, and it’s certainly not alone in that, in the healthcare space or elsewhere. But massive integration projects can be done well, even at the scale we’re talking about. Here are five things to consider to tip the odds in your favor:

1. Separate integration logic from application logic
A user-facing application shouldn’t be responsible for translating data formats, converting between synchronous and asynchronous requests, or retrying when a partner system is down. Nor should there be any need to redeploy an application just because a partner endpoint changes. Application frameworks and application developers are better at building applications, while integration frameworks and tools are better at handling integration challenges. It’s best to build an application to send and receive idealized formats, and to use external integration tools to handle detailed transformations and other processing.

Does this mean an ESB? Spring Integration or Apache Camel? An enterprise integration product? Just a pile of custom code? It’s hard to give a specific answer. We’ve seen solutions based on all of the above, from Groovy scripts to off-the-shelf products. The concept is the important part: building a service that can easily adapt to changing integration partners, formats, or endpoints, and leaving the application side of the interface fairly clean, as in the sketch below.
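To make that concrete, here is a minimal sketch using Apache Camel, one of the options mentioned above. It assumes Camel 3.x with the core and HTTP components on the classpath; the route name, the endpoint property and the toy translation methods are hypothetical placeholders, not anything specific to a real partner. The application only ever talks to “direct:eligibility” in its own idealized format; the partner-specific translation, the configurable endpoint and the retry policy all live in the route.

```java
import org.apache.camel.builder.RouteBuilder;

public class EligibilityRoute extends RouteBuilder {
    @Override
    public void configure() {
        // Retries for a flaky partner live here, not in the application.
        errorHandler(defaultErrorHandler()
                .maximumRedeliveries(3)
                .redeliveryDelay(2000));

        // The application only sees its own idealized format on "direct:eligibility".
        from("direct:eligibility")
                // idealized request -> the partner's format
                .process(exchange -> exchange.getMessage().setBody(
                        toPartnerFormat(exchange.getMessage().getBody(String.class))))
                // the partner URL comes from external configuration, so an
                // endpoint change never forces an application redeploy
                .to("{{partner.eligibility.endpoint}}")
                // the partner's reply -> the idealized format
                .process(exchange -> exchange.getMessage().setBody(
                        fromPartnerFormat(exchange.getMessage().getBody(String.class))));
    }

    // Toy translations; real ones would map fields, codes and structures.
    private String toPartnerFormat(String idealized) {
        return "<eligibilityRequest>" + idealized + "</eligibilityRequest>";
    }

    private String fromPartnerFormat(String partnerReply) {
        return partnerReply.replaceAll("</?eligibilityResponse>", "");
    }
}
```

If the partner changes its URL or message format, only this route and its configuration change; the application that sends to “direct:eligibility” is untouched.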

2. Use tools that make integration easy
When extensive integration is a given, it makes the most sense to select tools that make integration easy. This means easy to configure, easy to develop transformations and other logic, easy to test, easy to debug, and easy to deploy. It means supporting multiple options for building integration logic—perhaps avoiding ugly generated code in favor of dynamic inspection, perhaps avoiding complex GUIs in favor of simple blocks of code. It means making it easy to reject or queue requests when an endpoint is down, easy to update individual integration flows, and easy to scale out to handle additional load.

The problem is that most options look the same from a spec sheet. The only way to tell is with some hands-on experience. How easy is it for a developer to get some integration code and corresponding tests up and running on his or her machine? Can the configuration or code be sensibly version-controlled, and flow through already established channels for continuous integration and continuous deployment? How easy is it to deploy versioned integration logic, supporting multiple releases of a client or endpoint? Can it be secured conveniently if needed? Most likely a proof of concept will be the only way to answer these questions.
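As one example of the kind of hands-on check a proof of concept should answer, here is a minimal sketch of an integration test that runs entirely on a developer’s machine, using Apache Camel’s JUnit 5 test support (camel-test-junit5 is assumed to be on the classpath; the route, header and plan ID are hypothetical). The partner endpoint is replaced with a mock, so the flow and its assertions run locally and can live in version control alongside the rest of the CI pipeline.

```java
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.component.mock.MockEndpoint;
import org.apache.camel.test.junit5.CamelTestSupport;
import org.junit.jupiter.api.Test;

public class EnrollmentFlowTest extends CamelTestSupport {

    @Override
    protected RouteBuilder createRouteBuilder() {
        return new RouteBuilder() {
            @Override
            public void configure() {
                // In production this route would end at the partner system;
                // in the test it ends at a mock, so everything runs locally.
                from("direct:enroll")
                        .setHeader("planId", constant("H1234"))
                        .to("mock:partner");
            }
        };
    }

    @Test
    public void enrollmentReachesPartnerWithPlanId() throws Exception {
        MockEndpoint partner = getMockEndpoint("mock:partner");
        partner.expectedMessageCount(1);
        partner.expectedHeaderReceived("planId", "H1234");

        template.sendBody("direct:enroll", "{\"applicant\":\"Jane Doe\"}");

        partner.assertIsSatisfied();
    }
}
```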

3. Parallel implementations can work
Projects with many integration points tend to have a large number of interested parties, many of whom may be developing their parts of the system in parallel. Typically each group finishes and tests its components, then hooks them up to the rest for testing. When you put all these first drafts together late in the game, you typically run into a lot of last-minute problems. But there’s an easy way to avoid that. If everybody starts by providing simulated requests and replies, then testing can begin on day one and each group can phase in real implementations over time. Each system gets immediate feedback as to whether anything’s breaking, and there’s much less debugging required to fix small changes from a known working state.

The easiest way to start is by capturing a request or reply message (if there’s a working channel available, either production or test). Failing that, construct a simple message by hand. Every time the integration channel is invoked, return the same static message. Perhaps broaden out from there to a handful of messages, if there are different types of requests or replies. Then slowly start phasing in actual logic. That may mean substituting parts of the static messages with real content piece by piece, or using code to process the requests and replies but feeding it initially with largely static data. Over time, add the needed logic behind the scenes, populate the rest of the common requests or replies, and then work out toward the edge cases. If both sides can agree on an order of attack, so much the better.
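A day-one stub doesn’t need to be fancy. Here is a minimal sketch using only the JDK’s built-in HTTP server: every request to the (hypothetical) /eligibility path gets the same canned reply, which can later be broadened to a handful of replies and then to real logic.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

public class EligibilityStub {

    // A single captured (or hand-built) reply; broaden to a handful of
    // canned replies, then to real logic, as the project progresses.
    private static final String CANNED_REPLY =
            "{\"applicantId\":\"12345\",\"eligible\":true,\"plans\":[\"H1234\",\"H5678\"]}";

    public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(8099), 0);
        server.createContext("/eligibility", exchange -> {
            // Every request gets the same static message for now.
            byte[] body = CANNED_REPLY.getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().add("Content-Type", "application/json");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        System.out.println("Eligibility stub listening on http://localhost:8099/eligibility");
    }
}
```

Partners can point at something like this the day the interface is agreed upon, and swap in the real endpoint later without changing anything on their side.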

It still won’t help if one side takes the firewall approach, not releasing an iota of code until it’s “done.” But even if things look to be heading that way, telling your partners that your side of the pipe is in place and ready for testing on day one may encourage them to give it a go, and any feedback you get will help.

4. Build in monitoring from the start
Let’s face it: problems will happen. Partner systems will be down, data formats will change, delays will accumulate. Besides simply firing alerts when problems crop up, integrated monitoring can help reproduce and troubleshoot problems. Bad requests can be stashed for later use, or a long series of requests can be captured for playback in a load test. Timing requests and gathering statistics across request types can suggest when to separate certain flows onto different machines for performance. But all that aside, simply monitoring for failures in any of the integration points can mean the difference between proactive fixes and customers complaining that their requests were never completed.

It’s all well and good to build a system that has monitoring support, but take the opportunity to use it early. Build automated tests with messages captured via the monitoring channels. When integrating new partners, instead of manually inspecting early test responses, let the monitoring system inform you when a problem crops up. It may seem like more work up front, but having all the plumbing in place sure beats “monitoring support” that’s never been tried in practice.
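One way to get that plumbing in place from the start is to build capture and failure handling into the integration flow itself. Here is a minimal Apache Camel sketch (camel-core and camel-file are assumed to be available; the endpoint names and directories are hypothetical): every request is wire-tapped to a capture directory for later replay or load testing, and failed submissions land in a dead-letter directory instead of vanishing.

```java
import org.apache.camel.builder.RouteBuilder;

public class MonitoredSubmissionRoute extends RouteBuilder {
    @Override
    public void configure() {
        // Failed submissions are stashed in a dead-letter directory rather
        // than lost, so they can be inspected and replayed after a fix.
        errorHandler(deadLetterChannel("file:data/failed-submissions")
                .maximumRedeliveries(2));

        from("direct:submit")
                // copy every request aside for troubleshooting and load tests
                .wireTap("file:data/captured-requests")
                .to("{{partner.submission.endpoint}}");
    }
}
```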

5. Expect change
If there’s one certainty in the world of software, it’s change. Systems will be upgraded, formats will be revised, features will be added. The more systems are interconnected, the more an isolated change is likely to have unanticipated repercussions.

In some ways, all the prior points have been leading up to this. When one endpoint changes, it’s better if the integration logic is isolated from the related application logic, so accommodating the change doesn’t automatically require changes to additional applications. It’s better if the related integration logic is easy to update, test and deploy. It’s better if altered requests or responses are immediately flagged, and can be captured and replayed in testing.

And one step further: It’s best if all the systems have a comprehensive automated test suite. When these changes happen, it’s much easier to test a fix if the computer can run the entire scenario for you. Pop in the latest messages from the error queue and instantly reproduce the problem. That depends, of course, on having an adequate test environment, with real (cleansed) or realistically simulated production data, and a process to keep it up to date. It’s worth the investment, though: If change is a given, you’ll be leaning heavily on the test environment. With today’s tools, there’s no reason it can’t be straightforward to replicate the production stack where needed, whether on your own machine or on a temporary node in the cloud.
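Replaying those captured failures can be as simple as a small utility that feeds them back through the test environment. Here is a minimal sketch using only the JDK (11 or later); the capture directory and the test-environment URL are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ReplayFailedSubmissions {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        // Every message stashed by the monitoring or dead-letter channel...
        List<Path> captured;
        try (Stream<Path> files = Files.list(Path.of("data/failed-submissions"))) {
            captured = files.collect(Collectors.toList());
        }

        // ...is replayed against the test environment to reproduce the problem.
        for (Path message : captured) {
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("http://test-env.example.com/submissions"))
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofFile(message))
                    .build();
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(message.getFileName() + " -> " + response.statusCode());
        }
    }
}
```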

So the next time you kick off a big integration project, remember: It’s possible to do it right. When 10% of the applications submitted via HealthCare.gov have unrecoverable errors, we should all be embarrassed.

Aaron Mulder is CTO and director at Chariot Solutions, responsible for the design and deployment of technical development standards, policies and style guidelines. He has been working with Java technology since its inception. He has directly contributed to many open-source projects, including Apache projects such as Geronimo, ActiveMQ, ServiceMix, OpenEJB, and XBean, as well as other projects, including JBoss and PostgreSQL.
