Performance is a topic many developers value highly, with web frameworks, programming languages, databases, and various technologies all boasting about performance. However, this isn't usually the first consideration for a web application, as getting a product or service made more quickly is a higher priority.
We tend to think about performance once a product/service is in working order and we need to consider scaling for more users and more efficiently handling those we have.
In this article, we'll focus on performance metrics that matter more for our end goal, “the user experience,” from the Rails side of things.
Areas of Performance
There are several areas to consider when weighing efficiency with speed or resources.
The question of how to be efficient with system resources, such as how much memory the application will occupy, becomes increasingly important when the system requires application forking or additional server spin-ups.
A forking application in Ruby copies an instance of itself and runs that copy, sharing with the original any memory that neither process has changed. Whenever the forked application changes objects in memory, the affected memory is first copied into new memory (copy-on-write), so only the common, unchanged objects remain shared.
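The behavior described above can be observed with a minimal sketch. It assumes a platform that supports fork (Linux or macOS), and the settings hash is purely illustrative. It shows only the visible effect, that a change made in the child never reaches the parent; the sharing of memory pages is handled by the operating system.

```ruby
# Illustrative data that exists before the fork, so both processes start with it.
settings = { 'workers' => 2 }

# A pipe lets the child report what it sees back to the parent.
reader, writer = IO.pipe

pid = fork do
  reader.close
  # Changing the hash here only touches the child's private copy.
  settings['workers'] = 8
  writer.puts settings['workers']
  writer.close
end

writer.close
child_value = reader.gets.to_i
reader.close
Process.wait(pid)

puts "child saw #{child_value}, parent still sees #{settings['workers']}"
```

Running it prints "child saw 8, parent still sees 2".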
A multi-threaded application avoids some of the memory overhead of a forking application. But what you gain in memory, you may pay for with a system that is more difficult to debug. Thankfully, existing frameworks have done much of the work for us, so we don't have to think much about implementing multi-threaded systems.
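As a rough illustration of that trade-off, the sketch below has four threads update one shared counter. The counter and thread count are made up for the example. Because every thread works on the same memory, nothing is duplicated, but each update has to be guarded by a Mutex, which is the kind of coordination that makes threaded code harder to reason about.

```ruby
counter = 0
lock = Mutex.new

# All four threads share the same counter variable in memory.
threads = 4.times.map do
  Thread.new do
    1_000.times do
      # synchronize ensures only one thread updates the counter at a time.
      lock.synchronize { counter += 1 }
    end
  end
end

threads.each(&:join)
puts counter
```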
Application start-up time is felt most by developers and not end users. Even so, a slow start-up time can be an indicator of an application loading much more than it needs to.
This also affects the testing of the application; longer test times usually mean less time to be productive working on the application. But when the web service keeps your application actively running, this is not something your users will notice when they visit your site.
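One quick way to see what is slowing start-up is to time each require individually. This is a minimal sketch using Ruby's standard Benchmark module; the time_requires helper and the list of libraries are hypothetical, and in a real application you would pass in the gems you suspect.

```ruby
require 'benchmark'

# Hypothetical helper: times each require and returns the slowest first.
# A library that is already loaded returns almost immediately.
def time_requires(libraries)
  timings = libraries.map do |lib|
    seconds = Benchmark.realtime { require lib }
    [lib, seconds]
  end
  timings.sort_by { |_, seconds| -seconds }
end

time_requires(%w[json set time]).each do |lib, seconds|
  puts format('%-10s %.4fs', lib, seconds)
end
```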
Page-load time is where everything is felt by the end user. It is also the focus of many of the metric tools that help identify which parts of our applications are taking the most time. But we can do better than the “time-consuming items profile,” as this is more a big-picture issue and sometimes just a symptom and not the problem. We'll get more into that shortly.
Caching is more of a subcategory under page-load time; it directly improves many parts of a site's load time by providing a short-term saved copy of a previous result and therefore skipping a lot of code execution.
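To make the idea concrete, here is a minimal sketch of an in-memory cache with a time-to-live. TinyCache is a hypothetical class written for this illustration and is not the Rails cache API. The point is only that a second request for the same key skips the expensive block.

```ruby
# Hypothetical in-memory cache; entries expire after ttl_seconds.
class TinyCache
  def initialize(ttl_seconds)
    @ttl = ttl_seconds
    @store = {}
  end

  # Returns the saved value for key if it is still fresh;
  # otherwise runs the block, saves its result, and returns it.
  def fetch(key)
    now = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    entry = @store[key]
    return entry[:value] if entry && now - entry[:stored_at] < @ttl

    value = yield
    @store[key] = { value: value, stored_at: now }
    value
  end
end

cache = TinyCache.new(60)
runs = 0
2.times do
  cache.fetch(:report) do
    runs += 1 # stands in for slow code
    'expensive result'
  end
end
puts runs
```

The block runs once even though the value is requested twice.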
Of course, there's much more to caching than just the application side of things. There are caching proxies and reverse caching proxies that can greatly improve the speed of the site. This article won't be covering caching beyond this mention, though.
Effective Performance Improvement
Looking at page-load time through a profiler that shows which page resources take longest to load gives only a surface-level view of what's slow. Even after making simple performance improvements directly on the slower resources, there is still likely a more effective area to improve that isn't specific to any individual resource's load time.
Let's look at some numbers. In a Rails application of mine, I was doing some profiling and found that Pathname#chop_basename was the most time-consuming part of the profile. It said it gathered 1,894 samples at 26.4 percent for loading the main page.
This gave me a real-world value: this one method, and those associated with it, accounted for roughly a quarter of page-load time in my Rails 4.1 application. Further investigation revealed that, between the application start and the first page load, that method was called close to 25,000 times. And so the project FasterPath was born, rewriting that Ruby method in Rust and shaving off 83.4 percent of the time the former method took.
When considering the impact of the performance improvement you're after, you could focus on the more immediate code for your slow-loading resource. Or you could find a few methods your application calls many thousands of times and, by improving them, improve performance across the entire application.
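One way to find such heavily used methods is to count calls with Ruby's built-in TracePoint. The sketch below is an assumption about how you might do it, not how the numbers above were gathered; count_calls and SlugHelper are hypothetical names. Note that the :call event only covers methods defined in Ruby; methods implemented in C fire :c_call instead.

```ruby
# Hypothetical helper: counts calls to Ruby-defined methods made inside the block.
def count_calls
  counts = Hash.new(0)
  trace = TracePoint.new(:call) do |tp|
    counts["#{tp.defined_class}##{tp.method_id}"] += 1
  end
  trace.enable { yield }
  counts
end

# Hypothetical class standing in for application code.
class SlugHelper
  def slugify(text)
    text.downcase.strip.gsub(/\s+/, '-')
  end
end

counts = count_calls do
  helper = SlugHelper.new
  1_000.times { helper.slugify('Measuring Rails Performance') }
end

# Show the most frequently called methods first.
counts.sort_by { |_, n| -n }.first(5).each do |name, n|
  puts "#{n}\t#{name}"
end
```

Tracing every call slows the program down considerably, so this is something to run briefly in development rather than leave enabled.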
I have two go-to performance metric tools for my Rails applications: derailed_benchmarks and rack-mini-profiler.
Each of these tools builds on other libraries to give you metrics in each of the performance categories above.
You can get a full list of the options for the derailed command with derailed exec --help. The options include tools for measuring memory during loading, memory allocation, memory usage over time, garbage collection, and a couple of other performance items.
Of all the options derailed offers, my go-to command is derailed exec perf:stackprof. This gives a profile of the methods the application spent the most time in while loading the main page. In my opinion, this is a great hint as to what you need to focus on.
rack-mini-profiler integrates with any Rack-based web framework, which includes Rails, Sinatra, and Hanami, and helps provide many beautiful in-browser stats to analyze your application with. The most well known of these is the flamegraph.
The flamegraph is a client-side feature for your browser. It shows from the bottom up the starting point of the application through each method that executes during the process of the web page loading.
Of course, since the application's start point is where everything runs from, that bottom line will show about 100 percent of the time for the application. All the methods called from it are stacked above it in the graph, each sized by the percentage of time it takes to execute, so the measure of each method lower in the graph is the sum of the parts above it. Each item in the flamegraph is also interactive, with labels and more information available on click.
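The "sum of the parts above it" idea can be sketched with a few made-up stack samples. Everything here is hypothetical: the frame names, the sample data, and the inclusive_counts helper. A real profiler also keeps frames at different depths separate, whereas this sketch keys on the frame name alone.

```ruby
# Hypothetical sampled stacks, each listed from the entry point (bottom) to the leaf (top).
SAMPLES = [
  %w[app router render_page],
  %w[app router render_page],
  %w[app router find_asset],
  %w[app middleware]
].freeze

# Counts how many samples each frame appears in (its inclusive share).
def inclusive_counts(samples)
  samples.each_with_object(Hash.new(0)) do |stack, counts|
    stack.uniq.each { |frame| counts[frame] += 1 }
  end
end

counts = inclusive_counts(SAMPLES)
counts.each do |frame, count|
  puts format('%-12s %5.1f%%', frame, 100.0 * count / SAMPLES.size)
end
```

Here app appears in every sample, so it shows 100 percent, and router's share (three samples) equals render_page (two) plus find_asset (one).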
When you have rack-mini-profiler installed, you can put ?pp=help on the end of your URL in your browser to see the help page and what's available for you in this performance tool. This tool has many of the same features as the derailed_benchmarks gem.
There are several paid services you can integrate with your web application that can give you similar details.
One I've used before is New Relic; it's very robust in the details it provides, including a well-tooled dashboard to work with the information. Another one that looks like a great tool is Scout.
Each of these provides insights into where your application is taking too much time, among other details. They're simple to integrate and very convenient to have.
Bare-Bones Performance Considerations
If you're looking to squeeze out a considerable amount of performance by rewriting Ruby code in C or Rust (with Rutie or Helix), then give proper consideration to what you write in these languages. If you have a slow method used only once on one page of your site, then the benefit of this kind of change is likely to be minimal; a refactor within Ruby may be a much better idea. That also depends on how important that code is to you and how optimized it already is, so weigh it against your own case.
Ruby was written to make developers happy. As such, many of its conveniences come with a very minor performance cost. Ruby is by no means slow and is always getting faster. But when you need to eke out that extra performance, writing in C or Rust allows you to work outside of Ruby's object types, each of which carries a bit of performance overhead for the convenience it provides.
Personally I prefer to optimize a method that's used many thousands of times over one that's only used a few times. When the impact is felt globally, it's a bit more rewarding, in my humble opinion. But obviously you are more than free to optimize what you'd like and form your own opinions.
We want our websites, and we want them now. Nothing hinders growth for a website more than unreasonable load times. Caching can go a long way to speed up your site, as it was designed specifically to deal with loading performance. But that's not applicable everywhere, and it's a system designed as a work-around.
The ideal would be to build your product with ease, gain the performance with ease, and not need to implement extra systems because your product stands well enough on its own. In a perfect world, right?
So measure where things stand, prioritize the areas where you expect the most beneficial improvement, and measure each step of the way as you learn and achieve what you've set out to do. And once you've done that, please write about it and share so that others can learn from your experience.