Querying and serializing one million records in Ruby on Rails: Performance guide and best practices
I was recently tasked with querying and serializing one million rows for a live, production, public-facing web application. It had to be fast but maintainable.
- The naive solution
- What is ActiveModel and why does it have anything to do with this?
- Querying and serializations strategies
- Building the query
- The setup
- Check it out for yourself
The naive solution
Ruby on Rails allows you to easily query and serialize any number of records. Let’s pretend you’re Zillow and want to serialize
1_000_000 homes for your map —
class HomesController < ApplicationController def index homes = Home.all render json: homes end end
Problem? It’s slow. But why?
* (A) Time spent building the query * (B) Time spent in the database * (C) Time spent metaprogramming via ActiveModel and ActiveRecord * (D) Other
If you answered
(C) Time spent metaprogramming via ActiveModel and ActiveRecord, you are correct!
What is ActiveModel and why does it have anything to do with this?
[ActiveModel](https://api.rubyonrails.org/classes/ActiveModel/Model.html) is a bunch of methods that were extracted from the original ActiveRecord implementation that aren’t database-specific: things like validations and dirty-checking (
*_changed?). Coupled with
ActiveModel generates a number of methods for each column in your database. After
ActiveRecord retrieves the results, it passes each row to
ActiveModel to turn what was a hash into a pretty Ruby object for you to play with.
This post isn’t going to go into benchmarking
ActiveModel, I’ll save that for another day. Instead, we’re here to talk about various query and serialization strategies.
Querying and serializations strategies
We’re going to do some semi-scientific comparison of the following strategies — here are their descriptions, and what I’ll refer to them as for the rest of this post.
- Vanilla Rails (nothing)
- Vanilla Rails,
mapto a hash (nothing_map)
ActiveModelSerializersfor exec_query (asm_exec_query)
ActiveModelSerializersfor execute (asm_execute)
FastJSONAPIfor a hash via
FastJSONAPIfor a hash via execute (fast_execute)
And then all of these with
Oj — add a suffix of
_oj to the label. If you see a label with
_map in it, that means that the results from the query were mapped over.
Building the query
$ rails new $ rake db:setup $ rails g model Home latitude:decimal longitude:decimal $ rake db:migrate $ rake db:seed $ rails s
And then do a bunch of stuff in
homes = Home.select(:id, :latitude, :longitude).limit(1_000_000) # ... other stuff
Tested against Heroku Performance dyno with Standard 0 PostgreSQL database plan.
Heroku presents its own issue with the speed of its router but lots of real-world applications run on Heroku so I did it anyway.
|Name||total (ms)||views (ms)||db (ms)||allocations||retained memory (MB)||allocated memory (MB)|
Database is king
Letting your database do the work is always a good idea. It comes at the expense of writing SQL. For more advanced applications, this can get out of hand really quick. If you’ve never seen a 2,000-line query, consider yourself lucky. If you think that a 2,000-line query is unnecessary, trust me when I say sometimes there are no other options.
exec_query isn’t all that bad
You get to build your query with the familiar APIs and it’s memory managed unlike its brother
execute. I think that the
benefits here are obvious. It’s almost as fast at the expense of using more memory. It’s easier to build such query as well.
You could also drop down to Arel if you want but that wouldn’t necessarily net any meaningful increase since ActiveRecord
uses it under the hood anyway.
ActiveModel is expensive
Does that mean you shouldn’t use it? No.
For serializing objects, do you really need all the convenience of ActiveModel? No.
This will require some actual programming but looping through lists and matching up elements isn’t hard.
Check it out for yourself
- Wouldn’t you just cache these results anyway? Probably. But that’s not a blog post.
- But this is such a basic JSON response! Yeah, it is.
- This doesn’t reflect a real-world application. It does where I work.
- But you didn’t do this. Nope.