How do we make our programs faster? How do we make anything faster?
My first co-op job was working in packaging. They had a small Industrial Engineering library and let me read from it at work. The first book I read changed my life and my way of thinking about problem solving: The Goal by Eliyahu M. Goldratt. Just about every time I work on performance, it's impossible for me to not make comparisons to The Goal in my head. In this post, we'll look at some stories from the book and how they apply to performance tuning in programming.
Things Worth Fixing
It may seem obvious, but before you fix something, you need to know if it's worth fixing. This is why we start with measurement. To improve a process, we need to identify bottlenecks. If we improve performance in one part of our process but we still hit our primary bottleneck, we might actually make things worse.
In The Goal, they talk about a scout troop going for a long hike. If you've hiked with a large number of people you'll know that when there is a tricky part, say a stream you must cross single file, there's usually a line of people waiting for their turn.
In the story, the troop has a scout that is slower than everyone else. Let's call him Schneems. He starts out somewhere in the middle of the group of scouts. As he hikes and comes across these obstacles, he must wait in line like everyone else. The troop leader notices that some of the members ahead of Schneems get WAY ahead. The first scout might be a full hour faster getting to the destination. Eventually people start passing Schneems. Now when he gets to obstacles, he has to wait behind more people.
This wouldn't cause much of a problem if the goal was to get each scout to the end as fast as possible. However that's not the case. The troop has to move as a group. They can't leave until they're all accounted for. That first scout has to sit waiting for hours while Schneems eventually makes his way to the destination. Everyone is exhausted. Is there any way to speed things up?
In this scenario, Schneems is the bottleneck. We could make the whole troop faster by asking him not to come, but that's not an option because the outdoors should be enjoyed by everyone. Since we can't eliminate the bottleneck, we can make sure that he is never waiting.
We want to make sure that any time he gets to a stream he doesn't have to stop. Every second he waits in a line is another second the whole troop waits, so we should use our resources to make sure the bottleneck is always moving at peak efficiency. A new rule is made - now Schneems walks in front and no one is allowed to pass him. The really fast kids get bunched up behind him but they actually get to go home earlier.
Resources and Tradeoffs
When we are optimizing for performance, we have a limited pool of resources. We can usually trade some resources to get another.
In the case of the hike, the fast kids have an excess of time at the end. We are trading that for a shorter overall time spent by the group hiking. In programming, our resources are usually CPU, RAM, and IO. For example we can trade an expensive calculation (CPU) for storing the result in memcache and fetching it when we need it (network IO).
Benchmarking and measuring aren't just for finding the slow case; they're about trying to better understand the loads you're placing on your system. While your program may take a bunch of inputs, there are likely some that are more common than others. If you're running a web app, each request might be to a different path or page, but every page loaded will require JS and CSS to be downloaded. We can optimize these by putting them behind CDNs.
In addition to the common cases, you also need to understand stress cases (i.e., edge cases). If you speed up your happy path by 1 percent but you slow down everything else by 90 percent, it might not be such a good change.
Transmissions and Heat Transfer
The next tale from The Goal happens in a factory. Parts are machined for a transmission (my father actually worked as a manager for a company that produced automatic transmissions), and in this factory, raw metal is cut to size for gears, then it's placed in milling machines. After the gears are milled, they're placed into a heat treatment furnace one at a time so that the parts are tempered. Once the furnace is full, it gets run. Then the parts are removed and go on to a separate finishing and installation process.
If you've been in a factory that has to do heat treatment, you'll see the problem already. Heat treating takes a long time. Once you get the materials to the specific (often high) temperature, you have to hold them there for a while, then let them slowly cool down. The equipment used for this is huge. Think of a pizza oven the size of a military barracks. It takes up a lot of factory floor space, and it's expensive to buy and operate. Buying another heat treatment furnace is rarely a cost effective option.
Let's say this plant has a huge order, and they have to finish on time. If they don't deliver by the due date, they'll lose their contract and the plant could go out of business. They have to increase their throughput.
Since the furnace is our bottleneck, we want to make sure it's never waiting. When it's not heating gears, it's not making money. But we can't put the furnace at the beginning of the line like we did with the slow scout. What can we do?
In the plant we have three resource machines:
the furnace
workers
factory space
We can leverage workers and factory space to make sure the machines are constantly running. Instead of loading gears one by one, we can buy racks and set them next to the furnaces. The racks get loaded with gears and are ready to go into the furnace as soon as the batch before that is done.
Industrial engineers call this process "accumulation." Yes, it's more effort for the workers; now they have to move the parts twice. Yes, it takes up much more space in the factory. But if your goal is to get the maximum amount of gears through the factory every day, then we have to make sure the furnace is always running. This solves that problem.
If you figure out which part of your code is holding up your response the longest, you can work to make sure that it's as fast as possible. If it can't be faster, make sure it's not stuck waiting behind other slow code.
99 Bottlenecks on the Wall
The problem with "fixing" a bottleneck is that it will always expose another. For example, maybe once the furnace is operating at near peak efficiency, it turns out that the milling machines can't actually produce gears fast enough to keep the furnace full. When you fix one performance part of your program, the bottleneck will move and you'll need to fix another. Today it may be SQL execution time; maybe tomorrow it's view rendering.
That might sound super discouraging, but I don't think it is. Performance tuning is not some magical art. It's a process.
When you're done with the process, you start the process over again. You start with a goal and state what you want to achieve. It could be "I want more throughput" or "I want lower response times." You then benchmark to understand your program.
Once you understand the happy paths and edge cases, you can start looking for bottlenecks. When you eliminate or optimize a bottleneck, you'll see progress toward your goal. If you've hit your goal, stop. If not, start the process over again. There's another bottleneck somewhere in there just waiting for you to find it.
It's easy to get stuck and not know where to go next. In the above stories, we stated the resources and simplified them. In programming and life, it's never quite so easy. Maybe you can't figure out how to benchmark something, or maybe you found the slow part but don't know how to optimize it. Sometimes it's already optimized, and you need to move on to something else. This all sounds scary, but it's not as bad if you break it down.
Benchmarking is a skill like any other. You can find blog posts, book, and other materials on it. If you don't know where to start, find someone who's doing performance tuning and look at how they do it. Copy their benchmarking techniques until you find other better practices.
Once you're comfortable benchmarking, the same learning path applies to optimizing. Find pull requests where people are doing performance work. Why is it faster? What are they talking about in the PR? If you don't understand it, pull in a co-worker or a friend and take turns trying to explain it to each other.
If you're at this stage, your goal is to learn how to benchmark and learn how to trade resources effectively for speed. If you're getting started, the bottleneck to making your code faster is your own learning curve. That's okay though; there's a recipe for that. Make a goal, take measurements, fix the bottleneck, then repeat.