A Tale of Four Bugs

Stephen Connolly's picture

This post is a post about a recent chain of interconnected bugs and mistakes that we found. I feel there is learning in this tale of many interconnected bugs/mistakes…even if I cannot quite place my finger on what exactly that learning is.

So our story all beings with the great UI refactoring that is JENKINS-43507…

Ideally, any change should be small. Code review works best when the changes are small…but we also have to balance that with ensuring that the changes are complete, so while the refactoring started off as a series of small changes there reached a point of The Great Switcheroo™ where it was necessary to swap everything over with new tests to cover the switch over.

Ideally a lot of the preparation code could have been merged in small change requests one at a time, but it can be hard to test code until it can be used, and adding a change request that consists of new code that isn’t used (yet) and cannot be tested (yet) can make things hard to review… anyway that is my excuse for a collection of changes requests that clock in at nearly 40k LoC.

If we take the GitHub Branch Source PR as an example of the code: github-branch-source-plugin#141

GitHub reports this as:

GitHub diffstats for github-branch-source-plugin#141

This is the part of the story wherein I provide some excuses as to why the PR is not really that big, if you are not interested in excuses you can safely skip it!

Scary…well let’s take a look at that in more detail:

# number of lines of code in new files
$ for created in $(git diff master --name-status | sed -n -e 's/^A.//gp;') ; \
do git diff master -- "$created" | sed -n -e '/^+++ a/d;/^+/p' ; done | wc -l
   11343
# number of lines of code in deleted files
$ for removed in $(git diff master --name-status | sed -n -e 's/^D.//gp;') ; \
do git diff master -- "$removed" | sed -n -e '/^--- a/d;/^-/p' ; done | wc -l
     155
# number of lines of code removed from existing files
$ for modified in $(git diff master --name-status | sed -n -e 's/^M.//gp;') ; \
do git diff master -- "$modified" | sed -n -e '/^--- a/d;/^-/p' ; done | wc -l
    1320
# number of lines of code added to existing files
$ for modified in $(git diff master --name-status | sed -n -e 's/^M.//gp;') ; \
do git diff master -- "$modified" | sed -n -e '/^+++ a/d;/^+/p' ; done | wc -l
    2739

Still looking like a lot of code… but we can dig further

# number of lines of test code in new files
$ for created in $(git diff master --name-status | sed -n -e 's/^A.//gp;' | grep 'src/test') ; \
do git diff master -- "$created" | sed -n -e '/^+++ a/d;/^+/p' ; done | wc -l
    8405
# number of lines of test code in deleted files
$ for removed in $(git diff master --name-status | sed -n -e 's/^D.//gp;' | grep 'src/test') ; \
do git diff master -- "$removed" | sed -n -e '/^--- a/d;/^-/p' ; done | wc -l
       0
# number of lines of test code removed from existing files
$ for modified in $(git diff master --name-status | sed -n -e 's/^M.//gp;' | grep 'src/test') ; do \
git diff master -- "$modified" | sed -n -e '/^--- a/d;/^-/p' ; done | wc -l
       5
# number of lines of test code added to existing files
$ for modified in $(git diff master --name-status | sed -n -e 's/^M.//gp;' | grep 'src/test') ; \
do git diff master -- "$modified" | sed -n -e '/^+++ a/d;/^+/p' ; done | wc -l
      14

So of the (cli count) 14082 lines “added”, 8405 of those lines were test code and test data…

$ for created in $(git diff master --name-status | sed -n -e 's/^A.//gp;' | grep 'src/test/java'); \
do git diff master -- "$created" | sed -n -e '/^+++ a/d;/^+/p'  ; done | wc -l
    6355
$ for created in $(git diff master --name-status | sed -n -e 's/^A.//gp;' | grep 'src/test/resources"); \
do git diff master -- "$created" | sed -n -e '/^+++ a/d;/^+/p'  ; done | wc -l
    2050

OK, at least half of the new code is actually new tests. We can do a similar analysis on the production code new files:

$ for created in $(git diff master --name-status | sed -n -e 's/^A.//gp;' | grep 'src/main/java'); \
do git diff master -- "$created" | sed -n -e '/^+++ a/d;/^+/p' ; done | wc -l
    2735
$ for created in $(git diff master --name-status | sed -n -e 's/^A.//gp;' | grep 'src/main/resources'); \
do git diff master -- "$created" | sed -n -e '/^+++ a/d;/^+/p' ; done | wc -l
     203

I tend to add a lot of comments and Javadoc to production code… so what is left if we strip that out (and blank lines):

$ for created in $(git diff master --name-status | sed -n -e 's/^A.//gp;' | grep 'src/main/java'); \
do git diff master -- "$created" | sed -n -e '/^+++ a/d;/^+/p' ; done | sed -e 's/^+//' | \
sed -e '/^ *\*/d;/^ *\/\*/d;/^ *$/d;/^ *\/\//d' | wc -l
    1327

So more than half of the new production code that I wrote is actually comments…

What about the files that I changed:

# including comments (added lines)
$ for modified in $(git diff master --name-status | sed -n -e 's/^M.//gp;' | grep 'src/main/java'); \
do git diff master -- "$modified" | sed -n -e '/^+++ a/d;/^+/p'  ; done | sed -e 's/^+//' | wc -l
    2651
# including comments (removed lines)
$ for modified in $(git diff master --name-status | sed -n -e 's/^M.//gp;' | grep 'src/main/java'); \
do git diff master -- "$modified" | sed -n -e '/^--- a/d;/^-/p'  ; done | sed -e 's/^-//' | wc -l
    1214

# excluding comments (added lines)
$ for modified in $(git diff master --name-status | sed -n -e 's/^M.//gp;' | grep 'src/main/java'); \
do git diff master -- "$modified" | sed -n -e '/^+++ a/d;/^+/p'  ; done | sed -e 's/^+//' | \
sed -e '/^ *\*/d;/^ *\/\*/d;/^ *$/d;/^ *\/\//d' | wc -l
    1934
# excluding comments (removed lines)
$ for modified in $(git diff master --name-status | sed -n -e 's/^M.//gp;' | grep 'src/main/java'); \
do git diff master -- "$modified" | sed -n -e '/^--- a/d;/^-/p'  ; done | sed -e 's/^-//' | \
sed -e '/^ *\*/d;/^ *\/\*/d;/^ *$/d;/^ *\/\//d' | wc -l
    1040

So what this means is that the 13-14k LoC PR is about 3k of new production code (I would argue it is about 1k lines moved and 2k lines added)…which is a lot…but not as bad as the diffstat initially said…and we got about twice that amount of new tests.

So yeah, the pull request should not be that big, but we reached the point where small refactorings could not continue while being reviewed in context.

This is the end of the excuses.

First off, in this refactoring of the GitHub Branch Source I made a simple mistake:

Mistake 1

In all the changes in the PR, I was refactoring old methods such that they called through to the corresponding new methods (to retain binary API compatibility).

Pop-Quiz: Can you spot the mistake in maintaining binary API compatibility in this code?

New code with mistake

For comparison, here is what the old code looked like:

Old code

The mistake is that the effective behavior of the code is changed. I had maintained binary compatibility. I had deliberately not maintained source compatibility (so that when you update the dependency you are forced to switch to the new method) but I was missing behavioral compatibility.

The fix is to add these four lines:

Fixed code

So, I hear you ask, with 6k LoC of new tests, how come you didn’t catch that one?

Mistake 2

The existing tests all called the now deprecated setBuildOriginBranch(boolean), setBuildOriginBranchWithPR(boolean), etc methods in order to configure the branch and pull request discovery settings to those required for the test. Those methods were changed. Previously they were simple setters that just wrote to the backing boolean field. One of the points of this PR is to refactor away from 6 boolean fields with 64 combinations and replace them with more easily tested traits, so the setters will add, update the traits as necessary:

Legacy setter adds traits when missing

So because the tests were setting up the instance to test explicitly, they were not going to catch any issues with the legacy constructor’s default behavior settings, though they did catch some issues with my migration logic.

I used code coverage to verify that I had tests for all of the new methods containing logic…so of course I had added tests like:

Test of legacy setter

Which were checking the branch point in the constructor…so when self-reviewing the code I looked at the 100% code coverage for the method and said Woot! (This was Mistake 2)

Can't have low coverage... if you don't have the code for the missing tests

I had not got any tests that verified the behavioral contract of the legacy constructor.

Mistake 3

Now these plugins have a semi-close coupling with BlueOcean, so one of our critical acceptance criteria includes verification against BlueOcean.

Acceptance criteria

The first step in all that was to bump the dependency versions in BlueOcean to run the acceptance tests…

Now you may remember that I said that I had explicitly broken source compatibility for the legacy methods, this was in order to catch cases where people are assuming that the old getters / setters are exclusively the entirety of the configuration. If you are copying or re-creating a GitHubSCMNavigator instance via code and you use the legacy methods, the new instance will be invalid, your code needs to upgrade to the new traits based API to function correctly.

So when I bumped the dependencies in BlueOcean without changing the code, my expectation was that the build would blow up because of the source incompatibility and it would then be compiler assisted replacement of the legacy methods… oh but little did I count on this little subtle behavioural change between ASM4 and ASM5…

ASM5 needed a signature change so we can enforce @Restricted

Back in early May, Jesse spotted and fixed this issue with the ASM upgrade… 

Without blaming anyone, the mistake here is that the BlueOcean code had not picked up the fix, so there were no compiler errors. The code compiled correctly.

This turns out to be fortuitous…

Mistake 4

BlueOcean’s create flow for GitHub needs to reconfigure the GitHubSCMNavigator to add each repository that is “created” into the regex of named repositories to discover.

Now, in hindsight, there is a lot wrong with that… but the mistake was to recreate a new instance of the GitHubSCMNavigator each time, rather than reconfigure the instance.

In fact the original code even had a setter for the regex field:

Old code had setPattern(...)

So to some extent there was no need to replace the existing instance with every change:

Replacing the instance always (last line)

But, in principle there should be nothing wrong with replacing it each time…and in any case the new repository may require the credentialsId to be updated and the pre-JENKINS-43507 code used a final field and did not provide a setter…

The mistake here was not to replicate the rest of the configuration. In effect, every time you created a pipeline on GitHub using the BlueOcean creation flow, you blew away any configuration that had been applied via the classic UI: JENKINS-45058…the code should really have looked like this:

Pre-JENKINS-43507 fix for JENKINS-45058

So how did we discover all four of these mistakes?

Well my PR #1186 that bumped the plugin versions had test failures:

OMG! Failing tests, it is the end of days

The fact that the compilation succeeded rather than fail as expected (because of the @Restricted) annotation exposed Mistake 3

Then Mistake 4 actually exposed Mistake 1… if BlueOcean was preserving the configuration on creation these tests may not have failed…certainly a manual verification of the test scenario might have resulted in the test failure being chalked up as a bad test, but because the configuration was continually being reset to the constructor default, the manual verification forced Mistake 1 to the surface:

Manual verification escalates attention

So Mistake 1 would have been caught if we didn’t have Mistake 2…

Once you catch a mistake in production code, you typically add tests so part of fixing Mistake 1 was to also fix Mistake 2

If it were not for Mistake 3, running the BlueOcean tests with the updated plugin versions would have required code changes that would probably have bypassed Mistake 4 and we would have missed Mistake 1 in making those code changes…

Without Mistake 4 we might not have found Mistake 1…

Without Mistake 1 we might not have found Mistake 4…

Four interrelated mistakes and without anyone of them we might not have found any of the others.

As I said at the beginning, I feel there is learning in this tail of many interconnected bugs/mistakes…even if I cannot quite place my finger on what exactly that learning is.

Hopefully you have enjoyed reading this analysis!

 

Add new comment