In many, if not most, applications, a very small part of the code is responsible for nearly all of the response time. That is, the application spends almost all of its time executing a small fraction of the codebase.

In some cases, this small piece of code has been well optimized, and the application is as fast as can reasonably be expected. However, this is likely the exception rather than the rule. It might also be that the real delay happens in external code in a third-party application that your application depends on.

Regardless of where a performance bottleneck lies, half of the work in fixing it (or working around it) is usually spent identifying where it’s located.

One of the first things you must do to identify your pain points is to understand how your back end is being utilized. For example, if your application’s back-end functionality is exposed through a public API, you will want to know which API functions clients are calling and how often.
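As a minimal sketch of what that can look like in practice, the following Python snippet counts endpoint hits in a standard combined-format access log (the file name access.log is a placeholder for your real server log):

    import re
    from collections import Counter

    # Matches the request part of a common/combined log format line,
    # e.g. "GET /api/v1/users?page=2 HTTP/1.1"
    REQUEST_RE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+"')

    def count_endpoints(log_path):
        hits = Counter()
        with open(log_path) as log:
            for line in log:
                match = REQUEST_RE.search(line)
                if match:
                    # Drop the query string so /users?page=1 and ?page=2 count together
                    path = match.group("path").split("?", 1)[0]
                    hits[match.group("method") + " " + path] += 1
        return hits

    if __name__ == "__main__":
        # "access.log" is a placeholder; point this at your real server log
        for endpoint, count in count_endpoints("access.log").most_common(10):
            print(f"{count:8d}  {endpoint}")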

If you have a standard website, you will want to know what users tend to do on your site: Which pages do they load and how often? Luckily, for some websites, this can be as easy as checking Google Analytics for the 5-10 most commonly accessed pages.

The second and more important step is to combine performance testing with performance monitoring in order to nail down where the problems lie. When it comes to performance testing, it’s usually a matter of experimenting until you find the point at which things either start to fall apart, often indicated by transaction times suddenly increasing, or simply stop working.
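As a rough, hand-rolled illustration of that kind of experiment (a real load-testing tool does this far better; the URL below is a placeholder for the system under test), you can replay the same request at increasing concurrency levels and watch for the point where transaction times take off:

    import time
    import urllib.request
    from concurrent.futures import ThreadPoolExecutor

    URL = "http://localhost:8000/"  # placeholder: point at the system under test

    def timed_request(_):
        start = time.perf_counter()
        with urllib.request.urlopen(URL, timeout=30) as response:
            response.read()
        return time.perf_counter() - start

    # Step up the load until transaction times start climbing sharply
    for users in (1, 2, 5, 10, 20, 50):
        with ThreadPoolExecutor(max_workers=users) as pool:
            times = list(pool.map(timed_request, range(users)))
        print(f"{users:3d} concurrent users: "
              f"avg {sum(times) / len(times) * 1000:6.0f}ms, "
              f"max {max(times) * 1000:6.0f}ms")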

When you run a test and reach the point at which the system is clearly under stress, you can then start looking for the bottleneck(s). In many cases, the mere fact that the system is under stress can make it a lot easier to find the bottlenecks.

If you know or suspect your major bottlenecks to be in your own codebase, you can use performance-monitoring tools to find out exactly where the code latency is happening. By combining these performance-testing and performance-monitoring tools, you will be able to optimize the right parts of the code and improve actual scalability.

Let’s say you have a website that is accessed by browsers. The site infrastructure consists of a database (SQL) server and a Web server. When a user accesses your site, the Web server fetches data from the database server. Then it performs some fairly demanding calculations on the data before sending information back to the user’s browser.

Now, let’s say you’ve forgotten to create an important table index, a pretty common performance problem with SQL databases. In this case, if you only monitor your application components (the physical servers, the SQL server and the Web server) while a single user is accessing your site, you might see that the database takes 50ms to fetch the data while the calculations performed on the Web server take 100ms. This may lead you to start optimizing your Web server code, because it looks as if that is where the major performance bottleneck is.

However, if you submit the system to a performance test to simulate, say, 10 users loading your website at exactly the same time, you might see that the database server now takes 500ms to respond, while the calculations on the Web server take 250ms.

The problem in this example is that your database server has to perform a lot of disk operations because of the missing table index, and because the system has only one disk, those operations queue up behind one another: response times degrade linearly (at best) as usage increases. With 10 simultaneous queries each needing roughly 50ms of disk time, the last request in the queue waits for all of the others, which is how a 50ms query becomes a 500ms one.
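To make the index problem concrete, here is a small self-contained sketch using Python’s built-in SQLite module (the orders table and customer_id column are invented for illustration). The query planner reports a full table scan until the index exists:

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")

    query = "SELECT * FROM orders WHERE customer_id = ?"

    # Without an index, the planner has to scan every row in the table
    print(db.execute("EXPLAIN QUERY PLAN " + query, (42,)).fetchall())
    # -> the detail column reads something like "SCAN orders"

    # With the index, the planner can do a cheap lookup instead
    db.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
    print(db.execute("EXPLAIN QUERY PLAN " + query, (42,)).fetchall())
    # -> "SEARCH orders USING INDEX idx_orders_customer (customer_id=?)"

The fix itself is a one-line CREATE INDEX; the hard part, as argued above, is knowing to look for it.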

The calculations, on the other hand, each run on a single CPU core. That means a single user will always experience a calculation time of X (as fast as one core can perform the calculation), while multiple concurrent users can be served by separate CPU cores (often four or eight on a standard server) and each still experience the same calculation time, X.
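A quick way to observe this behavior (the calculation below is only a CPU-bound stand-in for the real work) is to run the same calculation for one user and for several users in parallel processes:

    import os
    import time
    from concurrent.futures import ProcessPoolExecutor

    def calculation(_):
        # Stand-in for the "fairly demanding" CPU-bound work
        return sum(i * i for i in range(2_000_000))

    if __name__ == "__main__":
        print(f"CPU cores available: {os.cpu_count()}")
        for users in (1, 4):
            start = time.perf_counter()
            with ProcessPoolExecutor(max_workers=users) as pool:
                list(pool.map(calculation, range(users)))
            elapsed = time.perf_counter() - start
            # Up to the core count, wall time (and thus per-user latency)
            # stays roughly flat even though several times more work was done
            print(f"{users} concurrent calculation(s): {elapsed:.2f}s")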

Caching is another potential scalability factor: if calculation results are cached, average transaction times for the calculations can actually decrease as the number of users grows, because a larger share of requests is answered straight from the cache.
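A minimal sketch of such caching, using Python’s standard memoization decorator (the calculation itself is again a placeholder):

    from functools import lru_cache

    @lru_cache(maxsize=1024)
    def calculate(key):
        # Placeholder for the expensive per-request calculation;
        # with the cache, each distinct key is computed only once
        return sum(i * i for i in range(2_000_000)) + key

    calculate(7)  # the first user to ask for a key pays the full cost ...
    calculate(7)  # ... later users asking for the same key get an instant answer
    print(calculate.cache_info())  # hits=1, misses=1

Whether this pays off depends on how often different users actually request the same inputs, and on whether slightly stale results are acceptable.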

The point of this example is that, until you submit a system to realistic heavy traffic, you really have no idea how it will perform when lots of people are using it. Put bluntly, optimizing the parts of the code that looked like bottlenecks under single-user monitoring may end up being a total waste of time. It’s the combination of monitoring and testing that delivers the information you need to scale properly.

Ragnar Lonn is founder and CEO of Load Impact, an online load-testing service.