How I sped up our Rails Test Suite by 267%

This is the first post in a series I’ll be doing here at Codeship on speeding up our test suite. I also released a gem that detects the slowdowns in this part for you.

Making test suites run fast has always been an obsession/passion of mine, in fact it all started back in 2010 when I spoke about faster testing in the talk Grease Your Suite. Since then, having a fast test suite is something I maintain on any project I work on. So, when I joined Codeship a few weeks ago their test suite naturally drew my attention. Today I'll detail techniques I used to discover slow points in the test suite along with their solutions. In the end I got the suite to run over 3x faster!

Something's not quite right

On my first day at Codeship I got there early. A little too early in fact, as no one was in the office yet. So, I sat down to get the project running on my laptop before everyone got in. I checked out the code, set up the database, and started running the tests. Then, Alex, who is responsible for our Product Experience, came in and said, "Oh, you don't want to run the tests on your laptop, they're too slow, we run them all on Codeship." That makes sense to me, the application is large and my laptop is very small, while Codeship's servers are big and fast (#shamelessplug).

But something wasn't quite right. In my taskbar I keep a little triple monitor plugin showing cpu %, memory %, and swap %. And while I was running the Codeship suite, the cpu % barely lifted off the floor! I knew something weird was happening.

Inspecting application status

By a long shot, the Cucumber suite was both the longest and the least CPU intensive. If you’re new to Cucumber, you can check out our other articles on setup, BDD, and continuous integration with cucumber). I picked out one cucumber feature at random: configuring a project. When I ran it, it would hang here for a while:

LANG:yaml $ cucumber features/configuring_a_project.feature Feature: Configuring should be highlighted in the configuration nav setup As a user I want to configure my project settings Background: # features/configuring_a_project.feature:5

It would sit there for 10 seconds or so, and my cpu meter was flat. The app was waiting for something, but I didn't know what. Maybe the browser was booting up? Maybe a network request? I didn't know. I tried ctrl-c but cucumber traps it and exits gracefully, so I couldn't get a stack trace. I needed to find out what the app was up to when it was hanging.

GDB

Enter GDB, the GNU Debugger. But wait, isn't that for C and C++? Well, it just so happens that Ruby's (MRI) interpreter is written in C, and from GDB we can call C functions, including those in the MRI interpreter.

First, we attach to the running ruby process. I ran my cucumber feature and while it was hanging I found the PID and attached GDB:

LANG:code $ ps ax | grep cucumber 9105 pts/6 Sl+ 0:00 ruby ./bin/cucumber features/configuring_a_project.feature 9107 ? Sl 0:00 cucumber features/configuring_a_project.feature 9126 pts/12 S+ 0:00 grep --color=auto cucumber $ sudo gdb -p 9107 # lots of gdb output (gdb) call rb_backtrace()

Since that function call is to the ruby interpreter, I have to look over at my ruby process to see the output:

LANG:yaml $ cucumber features/configuring_a_project.feature -t @javascript [Zonebie] Setting timezone: ZONEBIE_TZ="Hobart" Using the default profile... Feature: Configuring should be highlighted in the configuration nav setup As a user I want to configure my project settings Background: # features/configuring_a_project.feature:5 from .gems/gems/puffing-billy-0.2.3/lib/billy/proxy.rb:16:in `block in start' from .gems/gems/puffing-billy-0.2.3/lib/billy/proxy.rb:70:in `main_loop' from .gems/gems/eventmachine-1.0.3/lib/eventmachine.rb:187:in `run' from .gems/gems/eventmachine-1.0.3/lib/eventmachine.rb:187:in `run_machine'

This isn't helpful, because the first thread was just our http proxy waiting for input. We have to dump the stack trace for all the threads that are running to find the one that's actually our own code:

LANG:code (gdb) info threads Id Target Id Frame 6 Thread 0x7ff08de84700 (LWP 9218) "ruby-timer-thr" 0x00007ff08d02f39d in poll () at ../sysdeps/unix/syscall-template.S:81 5 Thread 0x7ff0793d2700 (LWP 9221) "ruby" 0x00007ff08d0311c3 in select () at ../sysdeps/unix/syscall-template.S:81 4 Thread 0x7ff0795d4700 (LWP 9222) "ruby" 0x00007ff08d0311c3 in select () at ../sysdeps/unix/syscall-template.S:81 3 Thread 0x7ff0892fa700 (LWP 9225) "ruby" pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238 2 Thread 0x7ff0798d7700 (LWP 9228) "ruby" 0x00007ff08da55dbd in read () at ../sysdeps/unix/syscall-template.S:81 * 1 Thread 0x7ff08de6a740 (LWP 9216) "ruby" pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238

I have no idea which one is the one we want, so let's dump them all!

LANG:dockerfile (gdb) thread 2 [Switching to thread 2 (Thread 0x7ff0798d7700 (LWP 9228))] #0 0x00007ff08da55dbd in read () at ../sysdeps/unix/syscall-template.S:81 81 ../sysdeps/unix/syscall-template.S: No such file or directory. (gdb) call rb_backtrace() # stack trace from /usr/local/lib/ruby/2.1.0/webrick/utils.rb:176:in `block in initialize' from /usr/local/lib/ruby/2.1.0/webrick/utils.rb:176:in `sleep'

Nope, that's our web server, not very helpful.

LANG:dockerfile (gdb) thread 3 [Switching to thread 3 (Thread 0x7ff0892fa700 (LWP 9225))] #0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238 238 ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S: No such file or directory. (gdb) call rb_backtrace() # stack trace is huge, and contains a bunch of cucumber stuff, jackpot! from -e:1:in `<main>' <redacted> from .gems/gems/cucumber-1.3.17/bin/cucumber:13:in `<top (required)>' from .gems/gems/cucumber-1.3.17/lib/cucumber/cli/main.rb:47:in `execute!' from .gems/gems/cucumber-1.3.17/lib/cucumber/runtime.rb:49:in `run!' from .gems/gems/cucumber-1.3.17/lib/cucumber/ast/tree_walker.rb:20:in `visit_features' from .gems/gems/cucumber-1.3.17/lib/cucumber/ast/tree_walker.rb:170:in `broadcast' from .gems/gems/cucumber-1.3.17/lib/cucumber/ast/tree_walker.rb:21:in `block in visit_features' <redacted> from .gems/gems/cucumber-1.3.17/lib/cucumber/core_ext/instance_exec.rb:48:in `block in cucumber_instance_exec' from .gems/gems/cucumber-1.3.17/lib/cucumber/core_ext/instance_exec.rb:48:in `instance_exec' from .gems/gems/cucumber-1.3.17/lib/cucumber/rb_support/rb_step_definition.rb:49:in `block in create_proc' from workspace/mothership/features/support/user_helper.rb:59:in `sign_in_as' from workspace/mothership/features/support/signed_in_user.rb:31:in `signed_in?' from .gems/gems/capybara-2.4.4/lib/capybara/session.rb:676:in `block (2 levels) in <class:Session>' from .gems/gems/capybara-2.4.4/lib/capybara/node/matchers.rb:39:in `has_selector?' <redacted> from .gems/gems/capybara-2.4.4/lib/capybara/node/base.rb:102:in `find_css' from .gems/gems/poltergeist-1.5.1/lib/capybara/poltergeist/driver.rb:122:in `find_css'

Boom! features/support/user_helper.rb:59's sign_in_as is hanging. So, what's going on here? Let's check out the code:

LANG:code current_user.sign_out if current_user.signed_in?

OK, that's reasonable, let's sign out if someone signs in when we need to log in as a user. Let's see what signed_in? looks like:

LANG:code def signed_in? page.has_selector? signed_in_selector end

And this, right here, is the most common and subtle performance bug in the behavior of the Capybara gem: auto-waiting finders.

Capybara finders

Let's do a little background on Capybara. Capybara is an awesome Ruby gem that drives a browser over your code so you can exercise your entire application. It has a variety of methods like visit(url) to visit a page, or click_link("Sign in") to click a link. It also has an amazing feature which is to retry a command when it's driving a real browser like Selenium or PhantomJS to handle asynchronous issues. So, if you say click_link("Sign in") but there's no link with the text "Sign in" then Capybara will try to find it over and over until it finds it or it reaches Capybara's default_wait_time.

This is great when we're working on a test and it's not passing yet because our code isn't done. We want it to try and fail if the link isn't there. And if that link pops up via a jQuery plugin, we want Capybara to wait for it to show up, not fail fast.

Wordpress import https://no-cache.hubspot.com/cta/default/1169977/964db6a6-69da-4366-afea-b129019aff07.png

Additional finders

In addition to these commands, Capybara also has finders like has_content?("Dashboard") which would check if "Dashboard" was on the page. But there's a trick: this command assumes "Dashboard" should be on the page, and waits for it to appear. This is supposed to be used with an assertion, like assert page.has_content?("Dashboard").

If you want the opposite you need to flip the finder, not the assertion. So, assert page.has_no_content?("Dashboard") will look to see that the page doesn't have "Dashboard". But if the page does have "Dashboard" it will wait until "Dashboard" leaves the page via ajax.

That brings us to a common mistake: flipping the assertion instead of the finder, like this: refute page.has_content?("Dashboard"). This will wait until the page has the content "Dashboard", which it won't, then return false, which passes the refute. So while this test passes, it always waits the full Capybara default wait time. Which means that this is 15 seconds (the default) slower than assert page.has_no_content?("Dashboard").

In summary, it's really important to use the capybara finder that lines up with what you expect on the page.

Back to our test

So, let's take a fresh look at our code:

LANG:code def signed_in? page.has_selector? signed_in_selector end

This code assumes that the signed_in_selector is present on the page. So if someone is signed in, this code returns true very quickly. If someone is not signed in, this code returns false in 15 seconds after waiting to see the selector show up. The huge problem with our code is that we use signed_in? in branches, so it could be either true or false in different test cases. Whenever it's false (no one is signed in) we lose 15 seconds.

Unfortunately for Codeship, this is almost always used in the first step of a feature: sign a user in (and sign anyone out if they are signed in). So thats 15 seconds times every time this is improperly used.

There are two ways to solve this:

Pass the intent to the helper
Use a non-waiting finder

Passing the intent to the helper means that we would pass our expectation in and then switch the finder we're using, like this:

LANG:code def signed_in?(expected: false) if expected page.has_selector? signed_in_selector else !page.has_no_selector? signed_in_selector end end signed_in?(expected: false) # I don't think they're signed in, are they? signed_in?(expected: true) # I think they're signed in, right?

But that requires us to change every usage, and change the cucumber steps, etc. A big pain. Let's look at #2: the non waiting finder:

LANG:code def signed_in? page.all(signed_in_selector).any? end

This time, we're asking for all the signed in selectors on the page, then using any? which returns true for arrays that aren't empty. This Capybara finder doesn't have an assumed result, it's just a query. So whether true or false, it will be fast.

Find them all

At this point, I realize that while this is a big improvement, there may be many more occurrences of slow finders in our test. So I defined what I was looking for: any time a waiting Capybara method reaches the timeout. In my opinion, if a Capybara finder hits the timeout, it should be replaced with either its double negative (like turning has_content to !has_no_content) for assertions or with a non-waiting finder for branches.

To do this, I made a little monkeypatch, which I later turned into a gem called capybara-slow_finder_errors:

LANG:python module Capybara class SlowFinderError < CapybaraError; end module Node class Base def synchronize_with_timeout_error(seconds=Capybara.default_wait_time, options = {}, &block) start_time = Time.now synchronize_without_timeout_error(seconds, options, &block) rescue Capybara::ElementNotFound => e if Time.now-start_time > seconds raise SlowFinderError end end alias_method :synchronize_without_timeout_error, :synchronize alias_method :synchronize, :synchronize_with_timeout_error end end end

We're extending Capybara's synchronize such that if it exceeds the timeout, it raises a Capybara::SlowFinderError which Capybara won’t catch and turn into a boolean (that’s what it does with ElementNotFound for many finders). Now, we just run our suite and follow every stack trace back to the offending code. Which is exactly what I did.

Results

When I started, the Cucumber suite ran in 9:30 (9 minutes, 30 seconds). After fixing all the slow finders, we got it down to 5:45, almost twice as fast!

One more thing

Previously, I had tried switching to Poltergeist (we were on Selenium) but didn't see a speed improvement and also saw a few errors. Now that I had removed slow finders I had a hunch that Poltergeist would be worth it. At the very least, we could stop Firefox from popping up when we ran our tests (I know, I know, I can use a virtual framebuffer, but it's still annoying).

I switched us over to Poltergeist (based on PhantomJS), and got a free speed boost! Now that the suite was actually running hard on the CPU, Poltergeist made a big difference. We went from 5:45 to 2:35! All in all, we went from 9:30 to 2:35, which is a speedup factor of x3.67!

This is one of the tricky parts of performance improvements: sometimes certain bottlenecks are blocking other improvements from being worthwhile. That's why you always have to fix the current bottleneck before a new performance tweak will have any effect.

So, with my caveat that it may not be your bottleneck, try out capybara-slow_finder_errors and see if you get any errors. At the very least, they're worth investigating.

I'm continuing my quest to discover more ways to speed up Codeship's test suite, and I will have more tips soon! Capybara is supported out of the box on Codeship. If you're interested you can learn more about it in our Codeship Ruby Documentation.

PS: If you liked this article you might also be interested in one of our free eBooks from our Codeship Resources Library. Download it here: Why Continuous Integration Is Important