Building software is ultimately a cycle of creating, maintaining, and decomposing ideas. We often focus on the creation part of this equation. That’s because it's exciting, fun, and full of new twists and turns. There’s something inherently beautiful about creating something out of nothing. Yet all that's new eventually gets older and older until it fades away.
Somewhere between new and decomposed applications exist the codebases that we tend to refer to as legacy. When we call something legacy, the word carries a connotation of either pride or burden. Legacy codebases are often what we find ourselves dealing with after the promise of "working on new and exciting products." We often have to extract and improve the better parts of the past to advance to the exciting future we’re heading toward.
I would dare to say that every mature company’s engineering roadmap factors in legacy code in some way. This legacy codebase could be a product that’s starting to sunset. Maybe it's something that was written in a version of your favorite framework that you haven’t updated in over four years. Legacy codebases are always going to be around in some form.
With this in mind, I want to provide some useful insights and strategies for dealing with legacy codebases. These ideas are not only concerned with tolerating legacy code, but with thriving when working with it. Being able to work effectively with legacy codebases is a crucial engineering skill that will benefit any engineer for their whole career. There are no silver bullets ahead, but I hope these ideas help kickstart your productivity with legacy code.
Complaints, Empathy, Documentation, and Testing
So much of our success with handling legacy code comes down to our attitude. It might seem a bit odd to mention attitude as a major factor in dealing with code, but it's a cornerstone in my strategy.
We do one of two things when working with legacy code:
complain
empathize
When we choose to complain, we often cast the authors before us as unwise and unskilled. It's a product of our frustration when dealing with something that we don’t quite understand yet.
Empathizing takes a different route. While we certainly want to find the weaknesses and problems in the codebase, we also want to understand what makes it good and how we can capitalize on those decisions.
If we approach it from an "everything about this isn’t right" point of view, we’re bound to make more mistakes than the original authors.
How do we empathize with code?
Everyone has their own way of understanding the design story behind a codebase. However, I’ve found that documentation and testing are the two strongest tools we have to better understand code. I’ve also found that many legacy codebases are missing one or both of them.
Documenting, inside or outside the codebase, ensures that previous experience is helpful down the road. If there’s not much guidance to go off of, make sure the next person that has to handle this codebase has a better idea of its purpose than you currently do.
Working effectively with legacy code means breaking the cycles of misinformation and lack of understanding that you inherited. I’ve found that I’m a lot less likely to refer to something as “legacy” if it has a healthy amount of churn and it's easy to work with.
Testing can be a bit more difficult. If there are existing tests, I like to use them to validate the documentation around the code. If there are no tests, we can write new tests to validate whether or not the documentation we wrote is correct. We can also use existing documentation to help us better understand what cases we should test for.
With documentation and testing, we should find ourselves with some baseline of understanding for how a piece of code functions and confidence that the use cases we test for are passing. Even if your codebase already has documentation and tests, consider expanding on them or verifying them by writing newer tests.
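One way to combine the two is a characterization test: a test that records what the code does today, so the documentation we write can be checked against it. Here is a minimal sketch in Python. The function `legacy_shipping_cost` and its pricing rules are invented purely for illustration and stand in for whatever undocumented code you inherited.

```python
def legacy_shipping_cost(weight_kg, express=False):
    """Hypothetical stand-in for an undocumented legacy function.

    Documented after the fact: base fee of 5.0 plus 1.5 per kg,
    doubled for express, plus a flat 10 surcharge above 20 kg.
    """
    cost = 5.0 + 1.5 * weight_kg
    if express:
        cost *= 2
    if weight_kg > 20:
        cost += 10  # Surcharge we only learned about by testing.
    return round(cost, 2)


# Characterization tests: they pin down current behavior so that the
# docstring above can be trusted. They do not judge whether it is "right".
def test_standard_parcel():
    assert legacy_shipping_cost(2) == 8.0


def test_express_doubles_the_cost():
    assert legacy_shipping_cost(2, express=True) == 16.0


def test_heavy_parcel_gets_flat_surcharge():
    assert legacy_shipping_cost(30) == 60.0
```

The test functions follow the `test_` naming convention, so a runner such as pytest can collect them, or they can simply be called directly. If one of them fails against the real code, either the documentation or our understanding is wrong, and that is exactly the gap we want to find before making changes.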
We will always need a meaningful way to communicate how code works, so improving on these elements is never a bad investment.
Keeping Design Patterns Consistent
Once we’re confident in our documentation and tests, we can move on to making meaningful changes in our legacy codebase. My biggest piece of advice when doing this is to keep design patterns consistent where possible.
Not all design patterns are as effective as others. There may be cases where you’re able to identify an optimization that’s way better than anything already in the app. If you choose to go ahead and replace one piece of code with this new design pattern, make sure you change other instances as well.
If we change only one instance, our optimization might make the code more efficient, but it also makes the codebase more inconsistent and confusing. I can tell you from experience that nothing is worse than dealing with a legacy codebase with numerous random design patterns.
The point of all this is that we need to understand the message and story that a codebase is trying to tell. We’re all humans writing human solutions. There are bound to be errors, perks, and quirks to our approaches.
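A small script can help check that a replacement was applied everywhere. The sketch below uses Python's standard `ast` module to list the call sites that still use an old helper. The names `old_fetch` and `fetch_with_retry` are hypothetical, and the approach assumes the old pattern shows up as a direct function call, which will not be true of every design pattern.

```python
import ast


def find_calls(source, function_names):
    """Return sorted (line number, name) pairs for direct calls to function_names.

    Only plain calls like old_fetch(x) are matched; method calls such as
    client.old_fetch(x) are ignored in this simplified sketch.
    """
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in function_names:
                hits.append((node.lineno, node.func.id))
    return sorted(hits)


# Hypothetical module where one call site was migrated and one was missed.
EXAMPLE_SOURCE = '''
def load_user(user_id):
    return fetch_with_retry(user_id)

def load_order(order_id):
    return old_fetch(order_id)
'''

remaining = find_calls(EXAMPLE_SOURCE, {"old_fetch"})
print(remaining)  # [(6, 'old_fetch')]
```

An empty result suggests the old pattern is gone from that file. Anything left over is a place where the codebase now tells two different stories.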
The Best Algorithms Hardly Ever Change
I remember hearing a talk about legacy code by Michael Feathers a few years ago. In a video of the talk, he shows a piece of code and asks: “What would you change? What’s wrong with it?” The whole ruse is that Michael is showing us a piece of code that’s been in production for twelve years with few to no errors. The code is suspiciously normal, so we think something needs to change about it.
One of the most common mistakes we make with legacy code is assuming that we need to completely change it. That’s why the empathy and understanding I've mentioned are so important.
We have to accept that we’re likely not the best person or team ever to come across this code. We’re going to make it better only if we understand where it came from. However, sometimes the best thing we can do with a piece of code is leave it alone or make really minimal changes.
Furthermore, legacy codebases are like hardened clay in a lot of ways. With water, heat, and enough molding, we can shape it into something different. However, it's a lot harder to make a change, and the changes will drastically affect how people have viewed or experienced it. Since we often didn’t build some or all of the codebase, we never fully understand the consequences of our changes or fixes. This is what makes refactoring legacy code a bit more dangerous than normal.
A refactor may be absolutely necessary. However, we should approach it with extra caution, as our new versions of and takes on algorithms might lack the robustness their predecessors earned in production.
Moving Forward
Approaching a legacy codebase with empathy, by documenting it, testing it, and learning about its history, will help us understand what exactly we should do to it. Some algorithms are best left unchanged. Others need to change, but in a way that stays consistent with the application's existing design patterns.
By making clearer and more consistent improvements to a legacy codebase, we leave it better than we found it. Maybe someday we'll find others taking great joy in working within it!