Speed up ActiveRecord with a little tweaking

Written by: Justin Weiss

When you're building a new Rails app, ActiveRecord's defaults will take you far. Querying with .where, inserting with .save -- it's all so easy, and it’s fast enough.

But after a while -- when a page of mostly simple content takes a second or more to come back from the server, when you start seeing 504 Gateway Timeout errors coming back from nginx because it's taking too long to process the CSV you uploaded -- that's when you know you're going to have to spend some time on performance.

You could solve a lot of these problems with caching. But that adds a whole extra layer of complication. Between expiration, nesting partials, and bugs that only reproduce in production, it's a headache you don't need right now.

Instead, you can spend some time fixing the most common performance problem I've seen in Rails apps: hitting your database too much.

Even if you're running the database on the same machine, every query carries overhead that adds up. And if your database is on another machine, the extra round trips from fetching data that often will destroy you.

But you don't have to go too far from the simplicity of Rails to see drastic improvements in your app's response time.

Grab all the data at once in ActiveRecord

If you look at the logs in an unoptimized app, they'll probably look like this:

Processing by RestaurantsController#index as HTML
  Restaurant Load (1.6ms)  SELECT `restaurants`.* FROM `restaurants`
  Review Load (1.2ms)  SELECT `reviews`.* FROM `reviews` WHERE `reviews`.`restaurant_id` = 1
  Review Load (1.2ms)  SELECT `reviews`.* FROM `reviews` WHERE `reviews`.`restaurant_id` = 2
  Review Load (1.1ms)  SELECT `reviews`.* FROM `reviews` WHERE `reviews`.`restaurant_id` = 3
  Review Load (1.2ms)  SELECT `reviews`.* FROM `reviews` WHERE `reviews`.`restaurant_id` = 4
  Review Load (1.2ms)  SELECT `reviews`.* FROM `reviews` WHERE `reviews`.`restaurant_id` = 5
  Review Load (1.2ms)  SELECT `reviews`.* FROM `reviews` WHERE `reviews`.`restaurant_id` = 6
  Review Load (1.2ms)  SELECT `reviews`.* FROM `reviews` WHERE `reviews`.`restaurant_id` = 7
  Review Load (1.0ms)  SELECT `reviews`.* FROM `reviews` WHERE `reviews`.`restaurant_id` = 8
  Review Load (1.0ms)  SELECT `reviews`.* FROM `reviews` WHERE `reviews`.`restaurant_id` = 9
  Review Load (1.0ms)  SELECT `reviews`.* FROM `reviews` WHERE `reviews`.`restaurant_id` = 10

You're trying to find ten restaurants along with their reviews, and you're making eleven SQL calls!

This is called the "N+1 query problem": one query to load the restaurants, plus one more query for each restaurant's reviews. You can probably imagine how bad it gets the deeper you go. Imagine if you also wanted to grab each restaurant's address, as well as each address's phone number.

You'll run into this problem when you loop over a list of objects and try to query their associations:

app/views/restaurants/index.html.erb

<% @restaurants.each do |restaurant| %>
  <tr>
    <td><%= restaurant.name %></td>
    <td><%= restaurant.review_average %></td>
    ...
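
Here, review_average is whatever method computes the average rating. A hypothetical implementation might look like this -- the important part is that it reads the reviews association, so without eager loading it fires one extra query for every restaurant in the loop:

app/models/restaurant.rb

class Restaurant < ActiveRecord::Base
  has_many :reviews

  # Hypothetical example, just for illustration: it computes the average
  # from the loaded review records, so preloading the association with
  # .includes(:reviews) removes the extra per-restaurant queries.
  def review_average
    return 0 if reviews.empty?
    reviews.map(&:rating).sum / reviews.size.to_f
  end
end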

You don't need to hit the database N+1 times. You want to hit it at most twice: once for the restaurants you're trying to find, and once for all of the reviews associated with all of those restaurants.

This is called "eager loading," and you can do it really easily with .includes:

app/controllers/restaurants_controller.rb

def index
  @restaurants = Restaurant.all.includes(:reviews)
end

Or, if you want to do something more complicated, like preload all the addresses and the reviews' authors:

app/controllers/restaurants_controller.rb

def index
  @restaurants = Restaurant.all.includes([{:reviews => :author}, :address])
end

You have to specify the associations you want to preload, using that array and hash syntax. Rails will do its best to consolidate those calls:

Restaurant Load (1.2ms)  SELECT `restaurants`.* FROM `restaurants`
Review Load (3.0ms)  SELECT `reviews`.* FROM `reviews` WHERE `reviews`.`restaurant_id` IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

Much better.

If you're not the one fetching the parent records -- say, they were loaded somewhere else and handed to you -- it's a little more complicated. You have to do the preloading yourself, using ActiveRecord::Associations::Preloader:

ActiveRecord::Associations::Preloader.new.preload(@restaurants, [:reviews])

But it mostly works the same way.
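
It takes the same array and hash syntax, so preloading the deeper associations from the earlier example would look something like this (using the same Rails 4-era API shown above):

ActiveRecord::Associations::Preloader.new.preload(
  @restaurants, [{:reviews => :author}, :address])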

The N+1 problem is easy to fix, and it's pretty easy to avoid once you know the pattern. But it's also easy to miss, especially if your code is spread out across a number of partials.

The bullet gem will try to automatically detect N+1 queries as they happen, so you can chase them down and fix them. And if you're using skylight.io to monitor your app's performance in production, it will also detect and report them to you.
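
If you want to try bullet, the setup is small. Here's a minimal sketch for your development environment (double-check the gem's README for the options your version supports):

config/environments/development.rb

config.after_initialize do
  Bullet.enable       = true
  Bullet.rails_logger = true # log N+1 warnings alongside your normal Rails logs
  Bullet.add_footer   = true # show the warnings at the bottom of each page
end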

A little tip about preloading

Some apps have different ways to show the same data on the same page. Maybe you want to show addresses ordered by distance in one section, and only the addresses that have phone numbers in another.

But there's no point in defining and eager loading two separate associations for that. It's the same data, just presented in different ways. So for cases like this, I usually create a single unsorted_unfiltered_addresses association and preload that:

app/models/restaurant.rb

has_many :unsorted_unfiltered_addresses, :class_name => "Address"

Restaurant.includes(:unsorted_unfiltered_addresses)

Then, I'll do the rest of the sorting and filtering in Ruby code:

def addresses_sorted_by_distance(point)
  unsorted_unfiltered_addresses.sort_by do |address|
    address.distance_from(point)
  end
end

def addresses_with_phone_numbers
  unsorted_unfiltered_addresses.select do |address|
    address.phone_number.present?
  end
end

It saves round trips to the database but makes your Ruby code a little more complex. For us at Avvo, it's usually a win, but it's worth benchmarking to make sure.
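
A quick script with Ruby's Benchmark module is usually all it takes to check. Something like this sketch, which assumes Restaurant also has a plain addresses association:

require "benchmark"

Benchmark.bm(30) do |x|
  # One query up front, then filter the preloaded records in Ruby
  x.report("preload + filter in Ruby:") do
    Restaurant.includes(:unsorted_unfiltered_addresses).each do |restaurant|
      restaurant.addresses_with_phone_numbers
    end
  end

  # Roughly the same result, but one filtered query per restaurant
  x.report("filter in SQL, per restaurant:") do
    Restaurant.all.each do |restaurant|
      restaurant.addresses.where.not(:phone_number => nil).to_a
    end
  end
end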

Crafting your own SQL in ActiveRecord

What about grabbing the number of reviews for a restaurant? Or the review average, or the date of the earliest review? If you don't need any of the review data itself, loading all of the review objects just to calculate those numbers is a waste, especially since the database is great at doing that work for you.

Instead, use the select method to do those aggregations for you as you query:

app/controllers/restaurants_controller.rb

@restaurants = Restaurant.all
  .select("restaurants.*, AVG(reviews.rating) AS review_average")
  .joins(:reviews)
  .group("restaurants.id") 

Using SQL aggregations, you can easily calculate that information before it gets to your app. When you use AS in your select, Rails will magically create the right attributes for you:

@restaurants.first.review_average # => 2.3

They just show up as one-off properties on your model.

Going even further, you can drop entirely into SQL to save yourself even more calls. For example, what if you wanted to find only the first ten restaurants that had more than ten reviews in the last three months, along with a count of all those reviews?

You could do this with Rails:

@restaurants = Restaurant.all
  .select("restaurants.*, COUNT(reviews.id) AS review_count")
  .joins(:reviews)
  .group("restaurants.id")
  .where("reviews.created_at > ?", 3.months.ago)
  .having("COUNT(reviews.id) > 10")
  .limit(10)

However, as a query gets more complicated, I tend to use find_by_sql, so I don't have to remember the quirks of the querying methods:

@restaurants = Restaurant.find_by_sql(["
  SELECT  restaurants.*, COUNT(reviews.id) AS review_count
    FROM `restaurants`
      INNER JOIN `reviews` ON `reviews`.`restaurant_id` = `restaurants`.`id`
    WHERE (reviews.created_at > ?)
    GROUP BY restaurants.id
    HAVING COUNT(reviews.id) > 10
    LIMIT 10", 3.months.ago])

And just like before, Rails will create attributes for all of the columns you SELECT.
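
So the count from that query shows up just like review_average did (the value here is only illustrative):

@restaurants.first.review_count # => 12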

Using SQL like this is its own skill. It's something you'll have to study before you get great at it. But SQL is helpful for so many complicated data-processing situations. It's worth spending some time to learn it.

find_by_sql isn't something you'll reach for often. In our pretty large codebase at Avvo, we use it in fewer than five places. But in the right place, when you really care about how you fetch your data, it works extremely well.

Of course, having SQL strings littered throughout your codebase isn't always the most maintainable way to do things. If you find yourself writing the same kinds of aggregations and selects over and over again, you could create a SQL view instead.

So, let’s revisit my earlier example about restaurants. If you frequently also want a restaurant’s review average and review count, you could create a view with those extra columns:

db/migrate/20150501213118_create_enhanced_restaurants_view.rb

class CreateEnhancedRestaurantsView < ActiveRecord::Migration
  def up
    execute "
      CREATE VIEW enhanced_restaurants AS
      SELECT restaurants.*, COUNT(reviews.id) AS review_count, AVG(reviews.rating) AS average_rating
      FROM `restaurants`
      INNER JOIN `reviews` ON `reviews`.`restaurant_id` = `restaurants`.`id`
      GROUP BY restaurants.id"
  end

  def down
    execute "DROP VIEW enhanced_restaurants"
  end
end

Wrap it in an ActiveRecord model:

app/models/enhanced_restaurant.rb

class EnhancedRestaurant < ActiveRecord::Base
  self.primary_key = :id
end

and use it!

>> EnhancedRestaurant.first
EnhancedRestaurant Load (1.2ms)  SELECT  `enhanced_restaurants`.* FROM `enhanced_restaurants` LIMIT 1
=> #<EnhancedRestaurant id: nil, name: "Judd's Pub", created_at: "2015-04-26 03:33:21", updated_at: "2015-04-26 03:33:21", review_count: 14, average_rating: #<BigDecimal:7ff3180238d0,'0.29286E1',18(18)>>

The view works just like a SQL table, and it's automatically kept up to date as you change the data it depends on. All of your complex SQL code is in a migration, and you can treat the object backed by a SQL view as if it's any other ActiveRecord object. Well, almost:

>> EnhancedRestaurant.find(10).update(name: "New Name")
ActiveRecord::StatementInvalid: Mysql2::Error: The target table enhanced_restaurants of the UPDATE is not updatable: UPDATE `enhanced_restaurants` SET `name` = 'New Name' WHERE `enhanced_restaurants`.`id` = 10

Updating data through a view usually won't work out of the box.
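
If you'd rather get a clear failure than a database error, one option is to mark the model read-only, so ActiveRecord refuses the write before it even reaches MySQL. A sketch:

app/models/enhanced_restaurant.rb

class EnhancedRestaurant < ActiveRecord::Base
  self.primary_key = :id

  # ActiveRecord checks this before saving and raises
  # ActiveRecord::ReadOnlyRecord instead of sending the UPDATE.
  def readonly?
    true
  end
end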

If that's too much SQL for you, you're in luck. Now that Arel is part of Rails, you don't have to write any of your own SQL. If you want, you can write Ruby code to generate almost any SQL query you need. So you could turn that find_by_sql query into this:

reviews = Review.arel_table
restaurants = Restaurant.arel_table
sql = restaurants.project(Arel.star, reviews[:id].count.as("review_count"))
  .join(reviews, Arel::Nodes::InnerJoin)
  .on(restaurants[:id].eq(reviews[:restaurant_id]))
  .where(reviews[:created_at].gt(3.months.ago))
  .group(restaurants[:id])
  .having(reviews[:id].count.gt(10))
  .take(10)
  .to_sql
@restaurants = Restaurant.find_by_sql(sql)

To me, that's not as easy to read as the SQL. But because it's Ruby code, you can manipulate it more easily than a string of raw SQL, which can be very handy.
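
For example, you could build the query up in pieces and only add the HAVING clause when a minimum review count is given (min_reviews here is a hypothetical parameter, and reviews and restaurants are the Arel tables from above):

query = restaurants
  .project(Arel.star, reviews[:id].count.as("review_count"))
  .join(reviews, Arel::Nodes::InnerJoin)
  .on(restaurants[:id].eq(reviews[:restaurant_id]))
  .where(reviews[:created_at].gt(3.months.ago))
  .group(restaurants[:id])

# Only restrict by review count when a threshold was passed in
query = query.having(reviews[:id].count.gt(min_reviews)) if min_reviews

@restaurants = Restaurant.find_by_sql(query.take(10).to_sql)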

If you want to learn more about Arel, Jiří Pospíšil has a great guide to using Arel to generate and organize some even crazier queries.

Create tons of records, all at the same time

Between includes, select, and find_by_sql, you should be able to improve the performance of even your most data-filled pages. But many apps, especially as they get bigger, don't just select data -- they also process it.

Maybe you get CSV files from a data-processing platform that you regularly import into your own database. Or maybe you want to create a table, based on data you already have in your database but structured differently to make it faster to look things up.

Whatever the problem, you have to get that data into your database. And calling create a half million times is SLOW. If you're inserting objects with associations, you'll also run into the N+1 problem.

At Avvo, we've run into this problem a few times. When we do, we handle it with Zach Dennis' activerecord-import gem.

The gem makes importing and updating data in bulk incredibly easy. For example, this simple loop makes ten separate SQL calls:

# Probably loaded from a CSV or API
pricing_data = [
  ["New York", 130],
  ["Los Angeles", 130],
  ["Chicago", 120],
  ["Miami", 110],
  ["Dallas", 110],
  ["Seattle", 100],
  ["San Francisco", 150],
  ["Denver", 80],
  ["Boston", 120],
  ["Washington, D.C.", 100]]

pricing_data.each do |location, price|
  Inventory.create(location: location, price: price)
end

With activerecord-import, you'd just add your records to an array and call import:

records_to_import = pricing_data.map do |location, price|
  Inventory.new(location: location, price: price)
end

Inventory.import records_to_import

activerecord-import will figure out the best way to get that data into the database. Usually, it only takes a single SQL call.

If you're just parsing, modifying, and importing CSV data, you don't even need an object. You can bulk insert arrays of columns and values:

Inventory.import [:location, :price], pricing_data

This can be a lot more convenient. Importing data using activerecord-import has led us to order-of-magnitude improvements in speed:

                                  user     system      total        real
with activerecord-import      2.850000   0.050000   2.900000 (  2.989576)
without activerecord-import  16.030000   6.330000  22.360000 ( 34.894255)

Processes that used to take hours to run now take minutes.
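
The gem isn't limited to plain inserts, either. On MySQL, for example, you can pass on_duplicate_key_update to turn an import into a bulk upsert -- here's a sketch; check the activerecord-import documentation for what your database and version support:

# Update the price of any row whose unique key already exists,
# instead of inserting a duplicate.
Inventory.import [:location, :price], pricing_data,
  on_duplicate_key_update: [:price]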

Vanilla Rails will take you a long way. But in every app, there are places where speed is a big deal. And when you find those places, reducing the number of SQL calls is an easy place to start optimizing. Each remaining SQL call might take a little longer, but in my experience, making fewer of them has always been a drastic improvement.
