I was recently tasked with querying and serializing one million rows for a live, production, public-facing web application. It had to be fast but maintainable.
- The naive solution
- What is ActiveModel and why does it have anything to do with this?
- Querying and serializations strategies
- Building the query
- The setup
- Results
- Conclusions
- Check it out for yourself
- Disclaimer
The naive solution
Ruby on Rails allows you to easily query and serialize any number of records. Let’s pretend you’re Zillow and want to serialize 1_000_000
homes for your map — id
, latitude
, longitude
.
class HomesController < ApplicationController
def index
homes = Home.all
render json: homes
end
end
Bam, done.
Problem? It’s slow. But why?
* (A) Time spent building the query
* (B) Time spent in the database
* (C) Time spent metaprogramming via ActiveModel and ActiveRecord
* (D) Other
If you answered (C) Time spent metaprogramming via ActiveModel and ActiveRecord
, you are correct!
What is ActiveModel and why does it have anything to do with this?
[ActiveModel](https://api.rubyonrails.org/classes/ActiveModel/Model.html)
is a bunch of methods that were extracted from the original ActiveRecord implementation that aren’t database-specific: things like validations and dirty-checking (*_changed?
). Coupled with ActiveRecord
, ActiveModel
generates a number of methods for each column in your database. After ActiveRecord
retrieves the results, it passes each row to ActiveModel
to turn what was a hash into a pretty Ruby object for you to play with.
This post isn’t going to go into benchmarking ActiveModel
, I’ll save that for another day. Instead, we’re here to talk about various query and serialization strategies.
Querying and serializations strategies
We’re going to do some semi-scientific comparison of the following strategies — here are their descriptions, and what I’ll refer to them as for the rest of this post.
- Vanilla Rails (nothing)
- Vanilla Rails,
map
to a hash (nothing_map) ActiveModelSerializers
for anActiveModel
object (asm)ActiveModelSerializers
for exec_query (asm_exec_query)ActiveModelSerializers
for execute (asm_execute)FastJSONAPI
for anActiveModel
object (fast)FastJSONAPI
for a hash viaexec_query
(fast_exec_query)FastJSONAPI
for a hash via execute (fast_execute)ActiveRecord::Connection.execute#to_a
(execute)ActiveRecord::Connection.exec_query#to_a
(exec_query)- PostgreSQL
json_build_object
andjson_agg
(pg)
And then all of these with Oj
— add a suffix of _oj
to the label. If you see a label with _map
in it, that means that the results from the query were mapped over.
Building the query
Effectively,
$ rails new
$ rake db:setup
$ rails g model Home latitude:decimal longitude:decimal
$ rake db:migrate
$ rake db:seed
$ rails s
And then do a bunch of stuff in HomesController#index
:
homes = Home.select(:id, :latitude, :longitude).limit(1_000_000)
# ... other stuff
The setup
Tested against Heroku Performance dyno with Standard 0 PostgreSQL database plan.
Heroku presents its own issue with the speed of its router but lots of real-world applications run on Heroku so I did it anyway.
Results
Name | total (ms) | views (ms) | db (ms) | allocations | retained memory (MB) | allocated memory (MB) |
---|---|---|---|---|---|---|
nothing | 4531.0 | 770.64 | 16.08 | 2437625.6 | 182.0 | 5685.0 |
nothing_oj | 3506.2 | 0.24 | 14.6 | 1525946.8 | 182.0 | 4293.0 |
nothing_map | 3981.0 | 420.26 | 13.98 | 2056337.4 | 182.0 | 4749.0 |
nothing_map_oj | 3342.8 | 0.24 | 15.0 | 1475842.0 | 182.0 | 4038.0 |
asm | 5443.4 | 0.28 | 15.24 | 2567475.2 | 182.0 | 6037.0 |
asm_oj | 5358.2 | 0.24 | 28.38 | 2587581.8 | 182.0 | 6891.0 |
asm_exec_query | 3513.6 | 0.2 | 15.4 | 2037874.2 | 68.4 | 4484.4 |
asm_exec_query_oj | 2167.4 | 0.26 | 23.1 | 1116155.2 | 63.0 | 3239.0 |
asm_execute | 3380.0 | 412.02 | 20.8 | 2017063.2 | 63.0 | 4524.8 |
asm_execute_oj | 1938.2 | 0.22 | 15.94 | 1085954.4 | 63.0 | 3286.0 |
fast | 6109.2 | 0.22 | 18.04 | 3467958.8 | 231.6 | 7666.4 |
fast_oj | 3960.6 | 0.2 | 14.8 | 1826223.2 | 226.0 | 4673.0 |
fast_map | 8216.8 | 961.2 | 12.94 | 4236742.8 | 235.0 | 10046.0 |
fast_map_oj | 4355.6 | 0.2 | 13.48 | 2126065.2 | 235.0 | 5999.0 |
fast_exec_query | 2598.2 | 361.86 | 13.62 | 1446672.4 | 63.0 | 3006.0 |
fast_exec_query_oj | 6997.4 | 0.36 | 14.3 | 3567762.0 | 108.4 | 8671.0 |
fast_execute | 5980.8 | 0.22 | 13.7 | 3306151.2 | 103.0 | 8089.4 |
fast_execute_oj | 1460.6 | 0.2 | 12.06 | 835860.8 | 103.0 | 2662.0 |
execute | 2435.0 | 0.2 | 13.62 | 1375855.6 | 63.0 | 3046.0 |
execute_oj | 590.4 | 0.3 | 17.5 | 175599.0 | 63.0 | 484.0 |
execute_map | 2615.4 | 0.24 | 22.5 | 1495864.8 | 63.0 | 3405.0 |
execute_map_oj | 780.8 | 0.24 | 18.84 | 295621.2 | 63.0 | 843.0 |
exec_query | 2371.6 | 0.24 | 21.18 | 1406075.8 | 63.0 | 2998.0 |
exec_query_oj | 635.0 | 0.28 | 16.52 | 205839.0 | 63.0 | 436.0 |
exec_query_map | 2689.6 | 0.22 | 13.88 | 1526090.4 | 63.0 | 3357.0 |
exec_query_map_oj | 914.6 | 0.22 | 13.04 | 325869.4 | 63.0 | 795.0 |
pg | 545.0 | 0.4 | 69.26 | 5811.2 | 72.0 | 78.0 |
Conclusions
Database is king
Letting your database do the work is always a good idea. It comes at the expense of writing SQL. For more advanced applications, this can get out of hand really quick. If you’ve never seen a 2,000-line query, consider yourself lucky. If you think that a 2,000-line query is unnecessary, trust me when I say sometimes there are no other options.
exec_query isn’t all that bad
You get to build your query with the familiar APIs and it’s memory managed unlike its brother execute
. I think that the
benefits here are obvious. It’s almost as fast at the expense of using more memory. It’s easier to build such query as well.
You could also drop down to Arel if you want but that wouldn’t necessarily net any meaningful increase since ActiveRecord
uses it under the hood anyway.
ActiveModel is expensive
Does that mean you shouldn’t use it? No.
For serializing objects, do you really need all the convenience of ActiveModel? No.
Handling associations
This will require some actual programming but looping through lists and matching up elements isn’t hard.
Check it out for yourself
Check out the repo joshmn/json-benchmark.
Disclaimer
- Wouldn’t you just cache these results anyway? Probably. But that’s not a blog post.
- But this is such a basic JSON response! Yeah, it is.
- This doesn’t reflect a real-world application. It does where I work.
- But you didn’t do this. Nope.