Writing Efficient Queries

Written by: Parth Modi

Writing efficient queries that balance performance and memory footprint is the most crucial part of writing good software. Doing this is not super easy, but keeping an eye on logs and how your queries are executed under the hood will definitely help you a lot.

But you know the situation: requirements keep changing, new useless features on top of changed requirements, unrealistic deadlines, and who cares, let's just get the feature working and we'll refactor it later. Which, of course, never happens.

So instead of procrastinating for "will-refactor-later," I started using a few simple rules that helps me create consistently high-quality code. Remember, good code lets you focus on additional performance-tuning whenever it's necessary without worrying about your code.

Here are the ways I keep my queries efficient.

Fetch Only Data You Need

Active Record provides a powerful interface for interacting with data. It's very easy to form relationships between models and fetch records, and following this kind of code is quite usual.

def get_last_pizza
  User.find(1).pizzas.last
end

It fetches the user and brings the last pizza for the fetched user. Pretty simple! But have you ever wondered if it's really necessary to bring the whole user object with all fields of user just to bring the last pizza related to user? Absolutely not.

Instead, just fetching the id field from the user table would be sufficient to bring the last pizza. The table has many pizzas associated with it. It just needs the id of the user to fetch the associated pizzas. Storing Ruby objects consumes memory. Active Record objects are heavy memory-wise.

Initially, when the user table doesn't have many fields, that's okay. Both queries won't show much performance or memory difference.

But as the table grows and more fields are added, as generally happens with any table, you'll thank yourself for taking care of it from the beginning. Improved code would be:

def get_last_pizza
  User.select(:id).find(1).pizzas.last
end

Seeing a query like select user.id, user.some_other_required_field from users ... is always better than select * from users ...

select or pluck?

select() fetches selected fields and returns a model object with which you can access all model methods and relations. pluck() also fetches selected fields, but it returns an array of values and uses less memory.

For example, User.select(:id, :pizza_id) would return a user object, with :id and :pizza_id fields on which you can call methods such as pizzas. But if you use User.pluck(:id, :pizza_id), it would result in an error.

Ruby Methods or Query Methods?

The simple habit of using Ruby methods instead of query methods can reduce unnecessary database calls. To count the total number of friends of a current user with current_user.friends.count is good and will result in a count query. But what about using this query to get count even after fetching all the friends? If friends are already loaded, you can just use current_user.friends.length; this will save you one database call.

The difference is that length is a Ruby method and returns the total number of objects, whereas count is a query method which fires a select count(*) from table; kind of query. The Active Record Querying guide lists query methods such as count, minimum, maximum, average, and sum that can help you with querying a better way.

The point is to know about and use alternative Ruby methods whenever possible to reduce the frequency of your database queries.

Use IN

A fairly common situation is that someone needs to fetch data from a column having a preferred set of values.

class Person < ActiveRecord::Base
  enum preferred_food: [:pizza, :pasta, :noodles, :burgers]
end

For the example above, finding people who prefer pasta and noodles can be done in the following way:

class Person < ActiveRecord::Base
  enum preferred_food: [:pizza, :pasta, :noodles, :burgers]
  def self.preferres_pasta_or_noodles
    where(preferred_food: [Person.preferred_foods[:pasta], Person.preferred_foods[:noodles] ])
    # or use following
    # where('preferred_food IN ( ? ) ', [1, 2])
  end
end

Apart from enum fields, IN can be used when you want to search records from a specific set of IDs or values.

Save Time With a Transaction

This gave me some difficulty. Updating multiple records with the same data, such as setting all records' or a few selected records' Boolean value to false after some migration is easy. You just have to write:

User.where(is_admin: true).update_all(is_admin: false)

And this will result in just one query.

But, when multiple records needs to be updated with different sets of values, the situation gets tough. Updating in such a case is not possible in a single query. However, wrapping them in a single transaction block will improve performance significantly.

The important thing to remember here is that wrapping updates in a transaction will not update all records at once, and it will still fire the same amount of queries, but it will commit the transaction only once.

To understand this behavior, when the code to update the title of multiple posts is not wrapped in a transaction:

20.times do |i|
  Post.update(title: "UPDATE_TITLE#{i}")
end

You will see the following queries:

But when we wrap the update in a transaction:

ActiveRecord::Base.transaction do
  20.times do |i|
    Post.update(title: "UPDATE_TITLE#{i}")
  end
end 

The queries you will see in this case are:

The log itself is self-explanatory. However, there are some alternatives, such as using activerecord-import.

Don't Be Afraid of Joins

Don't be. It won't help. However, learning how to use them properly certainly will.

I personally liked SQL joins only in books and never wanted to use them. But eventually I learned why joins are there and when they should be used. Active Record makes it dead simple to use them. It's easy after you understand the different types of joins.

Generally, INNER JOIN and LEFT OUTER JOIN are used. User.joins(:posts) lets you search users who have posts with a specified title.

User.joins(:posts).where('posts.title = ? ', "SOME TITLE")

Joins are extremely useful in such cases and can help you fetch only data you need, as you are just fetching users who have posts with a specified title. So, whenever you find yourself fetching data and then filtering records based on some particular field of associated records, it might be the right time to use joins.

You can read more about joins here.

Use Includes

You definitely have heard this before. Suppose you need to get the title of all posts made by the first ten authors. You quickly write the query below and go enjoy your weekend.

authors = Author.first(10)
authors.each do |author|
  author.posts.pluck(:title)
end

Good. Only the title is fetched. But, one query to fetch all authors and one query to fetch posts for each author results in a total of eleven queries. Check out the solution below which does the same thing, just in two queries:

authors = Author.first(10).includes(:posts)
authors.each do |author|
  author.posts.pluck(:title)
end

There's one little difference. Add includes(:posts) to your query, and problem solved. Quick, nice, and easy.

But don't just add includes in your query without understanding it properly. Using includes with joins might result in cross-joins depending on the situation, and you don't need that in most cases.

To read more about how and when to use includes with joins, read N + 1: When More Queries Is a Good Thing, where I explain this issue in detail.

Use find_each

Suppose you are building a social networking site and want to get friends for your current_user. It's difficult to assume how many friends a user can have, and writing something like the below might get you in trouble someday.

food_list = []
current_user.friends.each do |friend|
  food_list << friend.favourite_food
end

current_user.friends.each will try to load all the friends of the current user; it consumes more memory and reduces performance as more friends are added. This is where the application might face issues when it scales.

Instead of each, find_each fetches users in batches of 1000 by default. You can change batch size as per requirements, and you should. Look at the documentation to learn more.

Conclusion

There are several strategies and approaches for improving performance of Rails applications, but it's better to start with writing efficient code instead of relying on performance-tuning afterward.

Of course, there are other general rules, such as keep your controllers lean and clean, move time-consuming processes such as sending emails and SMS to background threads, implement caching on different levels, and much more.

On the other hand, there is no point in boosting performance when your queries are not efficient. Performance-tuning should never be an afterthought.

Interested in more tips? Rails has very nice documentation and a super useful guide.

Stay up to date

We'll never share your email address and you can opt out at any time, we promise.