You probably don’t need to #map or #pluck that
#map vs #pluck
When called on an ActiveRecord relation, `#map` and `#pluck` can return the same values. Consider the following models:
```ruby
# == Schema Information
#
# Table name: slugs
#
#  id             :integer          not null, primary key
#  sluggable_id   :integer
#  sluggable_type :string(255)
#  name           :string(255)      not null
#  created_at     :datetime
#  updated_at     :datetime
#
class Slug < ActiveRecord::Base
  belongs_to :sluggable, polymorphic: true

  def to_s
    name
  end
end

# == Schema Information
#
# Table name: posts
#
#  id         :integer          not null, primary key
#  title      :string(255)      not null
#  body       :string(65535)    not null
#  user_id    :integer          not null
#  created_at :datetime
#  updated_at :datetime
#
class Post < ActiveRecord::Base
  has_many :comments
  # Slug's belongs_to is polymorphic, so this side must name the interface with `as:`.
  has_one :slug, as: :sluggable

  def to_s
    title
  end
end

# == Schema Information
#
# Table name: comments
#
#  id         :integer          not null, primary key
#  post_id    :integer
#  body       :string(65535)    not null
#  user_id    :integer          not null
#  created_at :datetime
#  updated_at :datetime
#
class Comment < ActiveRecord::Base
  belongs_to :post
end
```
The following method calls return the same result:

```ruby
Post.all.map(&:id)   # => [1, 2, 3, ...]
Post.all.pluck(:id)  # => [1, 2, 3, ...]
```
But they generate different SQL.

`#map`:

```
Post Load (1.4ms) SELECT "posts".* FROM "posts"
```

`#pluck`:

```
Post Load (1.4ms) SELECT "posts"."id" FROM "posts"
```
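To make the difference concrete without a database, here is a plain-Ruby sketch that mimics the two strategies over an in-memory table. `COLUMNS`, `ROWS`, `FakePost`, `map_style` and `pluck_style` are hypothetical stand-ins, not ActiveRecord APIs. The map-style path builds a full object for every row and then reads `id`, while the pluck-style path only takes the one value it was asked for.

```ruby
# Hypothetical in-memory "posts" table: column names plus raw row values,
# roughly the shape a database adapter hands back.
COLUMNS = %w[id title body].freeze
ROWS = [
  [1, "Good title", "This is my post"],
  [2, "Second title", "Second post"],
  [3, "Third title", "Third post"]
].freeze

# Hypothetical stand-in for a model class.
FakePost = Struct.new(:id, :title, :body)

# Like SELECT "posts".* followed by #map: every row becomes a full object,
# then a single attribute is read from each one.
def map_style(rows)
  rows.map { |row| FakePost.new(*row) }.map(&:id)
end

# Like #pluck: only the requested column's value is taken from each row
# (a real SELECT "posts"."id" would not even fetch the other columns),
# and no objects are built.
def pluck_style(columns, rows, column)
  index = columns.index(column)
  rows.map { |row| row[index] }
end

p map_style(ROWS)                  # [1, 2, 3]
p pluck_style(COLUMNS, ROWS, "id") # [1, 2, 3]
```

Both return the same array; only the map-style path pays for one `FakePost` per row.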
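They also have vastly different memory profiles, as measured with the benchmark-memory script further down.

For `#map`:

```
1.624M memsize (53.799k retained)
10.379k objects (452.000 retained)
50.000 strings (50.000 retained)
```

And for `#pluck`:

```
68.088k memsize (488.000 retained)
1.100k objects (10.000 retained)
17.000 strings (5.000 retained)
```

That is roughly 24 times less memory and about 9 times fewer objects for `#pluck`.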
But why?
How `#map` works on an `ActiveRecord::Relation`
`ActiveRecord::Relation` acts like an `Enumerable` and delegates the `Enumerable` methods to the loaded array of results from the database.
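A toy model of that behavior, for illustration only: `LazyRelation` is hypothetical and much simpler than the real `ActiveRecord::Relation`. It runs its loader the first time it is enumerated, caches the array, and gets `#map`, `#select`, `#to_a` and the rest from `Enumerable` by defining `#each`.

```ruby
# Hypothetical toy stand-in for ActiveRecord::Relation: loads lazily on first
# enumeration, caches the loaded array, and relies on Enumerable for the rest.
class LazyRelation
  include Enumerable

  attr_reader :load_count

  def initialize(&loader)
    @loader = loader
    @records = nil
    @load_count = 0
  end

  # Every Enumerable method goes through #each, which delegates to the
  # (lazily) loaded array.
  def each(&block)
    return enum_for(:each) unless block

    loaded_records.each(&block)
    self
  end

  private

  def loaded_records
    @records ||= begin
      @load_count += 1 # stands in for running the SQL query
      @loader.call
    end
  end
end

posts = LazyRelation.new { [{ id: 1 }, { id: 2 }, { id: 3 }] }
puts posts.load_count                         # 0: nothing loaded yet
p posts.map { |post| post[:id] }              # [1, 2, 3]
p posts.select { |post| post[:id].odd? }.size # 2
puts posts.load_count                         # 1: the cached array was reused
```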
`ActiveRecord` itself does not turn each row from your `posts` table into a Ruby object (in this case, an instance of your `Post` model); that job is left to `ActiveModel` (as of Rails 4). Instead, `ActiveRecord` returns an `ActiveRecord::Result`, which looks something like this:
```
#<ActiveRecord::Result:0x00007fa74c443bf8
 @column_types={},
 @columns=["id", "title", "body", "user_id", "created_at", "updated_at"],
 @hash_rows=nil,
 @rows=[[1, "Good title", "This is my post", "2020-10-28 16:07:24.665399", "2020-10-28 16:07:24.665399"], ...]>
```
We get this via `ActiveRecord::Base.connection.exec_query(Post.all.to_sql)`.
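Note that the result keeps column names and row values apart. A minimal plain-Ruby sketch of turning that split into one hash per row, assuming it amounts to pairing each column name with the value at the same position (`hash_rows` is a hypothetical helper, not ActiveRecord's source). The sample uses only the five columns that have values in the example row.

```ruby
# Hypothetical helper: turn a column list plus row arrays (the shape held in
# @columns and @rows) into one Hash per row, keyed by column name.
def hash_rows(columns, rows)
  rows.map { |row| columns.zip(row).to_h }
end

columns = ["id", "title", "body", "created_at", "updated_at"]
rows = [[1, "Good title", "This is my post",
         "2020-10-28 16:07:24.665399", "2020-10-28 16:07:24.665399"]]

result = hash_rows(columns, rows)
p result.first["title"] # "Good title"
```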
Only when we enumerate this object does `ActiveRecord::Result` build anything from the raw rows. If we call `#to_a`, we get:
```
[{"id"=>1,
  "title"=>"Good title",
  "body"=>"This is my post",
  "created_at"=>"2020-10-28 16:07:24.665399",
  "updated_at"=>"2020-10-28 16:07:24.665399"}]
```
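We could work with this ourselves and instantiate our own objects with `Post.new`:

```ruby
ActiveRecord::Base.connection.exec_query(Post.all.to_sql).to_a.map { |post| Post.new(post) }
```

Objects built this way are flagged as new (unsaved) records; `Post.instantiate(attributes)` is the ActiveRecord class method that builds records marked as persisted.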
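To see why building objects is the costly step, here is a toy class that wraps each hash row in an object and generates one reader method per column the first time it sees that column. `MiniRecord` and `define_readers` are hypothetical; ActiveRecord's attribute-method generation is far more involved, so treat this only as a sketch of the idea.

```ruby
# Hypothetical toy model: keeps each row's attributes in a Hash and defines
# reader methods (e.g. #id, #title) on the class the first time a column is seen.
class MiniRecord
  def self.define_readers(column_names)
    column_names.each do |name|
      # Skip names that already exist, including ones inherited from Object.
      next if method_defined?(name)

      define_method(name) { @attributes[name] }
    end
  end

  def initialize(attributes)
    @attributes = attributes.dup # one extra Hash per record
    self.class.define_readers(@attributes.keys)
  end
end

rows = [
  { "id" => 1, "title" => "Good title" },
  { "id" => 2, "title" => "Another title" }
]

records = rows.map { |row| MiniRecord.new(row) }
p records.map(&:id)   # [1, 2]
p records.first.title # "Good title"
```

Even this toy allocates an object plus a Hash per row before `#map(&:id)` reads a single value.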
But where do all the methods get generated?
If we compare the cost of instantiating a model that has only an `id` column with the cost of `Object.new`, we see the following:
Calling `Model.new` on a model with only an `id` column (no timestamps):

```
800.000 memsize (0.000 retained)
7.000 objects (0.000 retained)
1.000 strings (0.000 retained)
```

Calling `Object.new`:

```
40.000 memsize (0.000 retained)
1.000 objects (0.000 retained)
0.000 strings (0.000 retained)
```

Even this nearly empty model costs 20 times the memory and 7 times the objects of a plain `Object`.
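Part of that gap is the extra objects each instance drags along. A rough way to observe this in plain Ruby is to read `GC.stat(:total_allocated_objects)` before and after a block. `allocations` and `AttributeBag` are hypothetical names for this sketch; `AttributeBag` only mimics a model keeping its attributes in a Hash and is far lighter than a real ActiveRecord model, so the counts will not match the numbers above.

```ruby
# Count how many Ruby objects a block allocates, using the GC's running total.
def allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

# Hypothetical "model-like" object: each instance also owns a Hash of attributes.
class AttributeBag
  def initialize(id)
    @attributes = { "id" => id }
  end
end

n = 1_000
puts "Object.new:       #{allocations { n.times { Object.new } }} objects"
puts "AttributeBag.new: #{allocations { n.times { AttributeBag.new(1) } }} objects"
```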
The benchmark script for the `#map` vs `#pluck` comparison. Set `SETUP=true` on the first run to create and seed `db.sqlite`:

```ruby
# frozen_string_literal: true

require "bundler/inline"

gemfile(true) do
  source "https://rubygems.org"

  git_source(:github) { |repo| "https://github.com/#{repo}.git" }

  gem "rails", github: "rails/rails"
  gem "sqlite3"
  gem "benchmark-memory"
  gem "pry"
end

require "active_record"
require "benchmark/memory"
require "logger"

# A file-backed SQLite database, so the seeded data survives between runs.
ActiveRecord::Base.establish_connection(adapter: "sqlite3", database: "db.sqlite")
ActiveRecord::Base.logger = Logger.new(STDOUT)

# The models must be defined before the seeding below uses them.
class Post < ActiveRecord::Base
  has_many :comments
end

class Comment < ActiveRecord::Base
  belongs_to :post
end

# Create the tables and seed 1,000 posts with 100 comments each.
if ENV["SETUP"] == "true"
  ActiveRecord::Schema.define do
    create_table :posts, force: true do |t|
    end

    create_table :comments, force: true do |t|
      t.integer :post_id
    end
  end

  1000.times do
    post = Post.create!
    100.times do
      Comment.create!(post: post)
    end
  end
end

# Alternative measurement without benchmark-memory: runs each labelled callable
# `times` times and averages the GC.stat deltas, wall time (ns) and RSS change (KB).
# Example: pp gc_diff(5, "#map" => -> { Post.all.map(&:id) }, "#pluck" => -> { Post.all.pluck(:id) })
def gc_diff(times = 1, comps = {})
  all_avgs = {}
  comps.each do |label, callable|
    results = []
    times.times do
      GC.start
      gc_start = GC.stat
      size_start = `ps -o rss= -p #{$$}`.to_i
      time_start = Process.clock_gettime(Process::CLOCK_MONOTONIC, :nanosecond)
      callable.call
      time_end = Process.clock_gettime(Process::CLOCK_MONOTONIC, :nanosecond)
      size_end = `ps -o rss= -p #{$$}`.to_i
      gc_end = GC.stat
      diff = {}
      gc_end.each do |k, v|
        val = v - gc_start[k]
        next if val.zero?

        diff[k] = val
      end
      diff[:time_in_nanoseconds] = time_end - time_start
      diff[:memory] = size_end - size_start
      results << diff
    end
    # Zero deltas were skipped above, so a key missing from a run counts as 0.
    avgs = {}
    results.flat_map(&:keys).uniq.each do |key|
      values = results.map { |result| result.fetch(key, 0) }
      avgs[key] = values.sum.fdiv(values.size)
    end
    all_avgs[label] = avgs
  end
  all_avgs
end

Benchmark.memory do |x|
  x.report("#map") { Post.all.map(&:id) }
  x.report("#pluck") { Post.all.pluck(:id) }
  x.compare!
end
```