November 5, 2020; in Ruby on Rails

You probably don't need to #map or #pluck that relation

#map vs #pluck

Called on an ActiveRecord::Relation, #map and #pluck can return the same values. Consider the following models:

# == Schema Information
#
# Table name: slugs
#
#  id             :integer          not null, primary key
#  sluggable_id   :integer
#  sluggable_type :string(255)
#  name           :string(255)      not null
#  created_at     :datetime
#  updated_at     :datetime
#
class Slug < ActiveRecord::Base 
  belongs_to :sluggable, polymorphic: true 

  def to_s
    name 
  end
end

# == Schema Information
#
# Table name: posts
#
#  id             :integer          not null, primary key
#  title          :string(255)      not null
#  body           :string(65535)    not null
#  user_id        :integer          not null
#  created_at     :datetime
#  updated_at     :datetime
#
class Post < ActiveRecord::Base
  has_many :comments
  has_one :slug, as: :sluggable

  def to_s
    title 
  end
end

# == Schema Information
#
# Table name: comments
#
#  id             :integer          not null, primary key
#  body           :string(65535)    not null
#  post_id        :integer
#  user_id        :integer          not null
#  created_at     :datetime
#  updated_at     :datetime
#
class Comment < ActiveRecord::Base
  belongs_to :post
end

The following method calls return the same result:

Post.all.map(&:id)  # => [1, 2, 3, ...]
Post.all.pluck(:id) # => [1, 2, 3, ...]

But they generate different SQL:

#map:

Post Load (1.4ms)  SELECT "posts".* FROM "posts"

#pluck:

Post Load (1.4ms)  SELECT "posts"."id" FROM "posts"

They also differ sharply in memory usage, as reported by the benchmark-memory gem:

For #map:

1.624M memsize ( 53.799k retained) 10.379k objects ( 452.000 retained) 50.000 strings ( 50.000 retained)

And for #pluck:

68.088k memsize ( 488.000 retained) 1.100k objects ( 10.000 retained) 17.000 strings ( 5.000 retained)

But why?

How #map works on an ActiveRecord::Relation

ActiveRecord::Relation includes Enumerable and delegates enumeration to the array of records it loads from the database, so calling #map on a relation first runs the query and instantiates every record.
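A stripped-down sketch of that idea in plain Ruby, assuming a hypothetical `SketchRelation` class (this is not Rails' code) and a lambda that stands in for running the SQL. Including Enumerable and defining #each is enough for #map to work, and the first enumeration is what triggers the load.

```ruby
# Hypothetical stand-in for ActiveRecord::Relation; not Rails' implementation.
class SketchRelation
  include Enumerable

  attr_reader :load_count

  def initialize(&query)
    @query = query   # stands in for the SQL the relation would run
    @records = nil   # nothing is loaded until we enumerate
    @load_count = 0
  end

  # Enumerable builds #map, #select, etc. on top of #each, so every one of
  # them goes through the fully loaded records.
  def each(&block)
    records.each(&block)
  end

  def loaded?
    !@records.nil?
  end

  private

  # Runs the "query" the first time records are needed and memoizes them.
  def records
    @records ||= begin
      @load_count += 1
      @query.call
    end
  end
end

Row = Struct.new(:id, :title)
relation = SketchRelation.new { [Row.new(1, "Good title"), Row.new(2, "Another")] }

relation.loaded?      # => false
relation.map(&:id)    # => [1, 2]
relation.loaded?      # => true
relation.map(&:title) # reuses the loaded records; load_count stays 1
```

Note that even though only `id` is wanted, the whole row object has to exist before #map can read it.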

The query itself does not turn each row from your posts table into a Ruby object (in this case, a Post). The connection first hands back the raw rows, and model instances are built from them in a separate step.

ActiveRecord returns an ActiveRecord::Result, which looks something like this:

#<ActiveRecord::Result:0x00007fa74c443bf8
 @column_types={},
 @columns=["id", "title", "body", "user_id", "created_at", "updated_at"],
 @hash_rows=nil,
 @rows=[[1, "Good title", "This is my post", "2020-10-28 16:07:24.665399", "2020-10-28 16:07:24.665399"], ...]>

We get this by running ActiveRecord::Base.connection.exec_query(Post.all.to_sql).

ActiveRecord::Result does nothing with these rows until we enumerate it. If we call #to_a, we get:

[{"id"=>1,
  "title"=>"Good title",
  "body"=>"This is my post",
  "created_at"=>"2020-10-28 16:07:24.665399",
  "updated_at"=>"2020-10-28 16:07:24.665399"}]
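That lazy conversion can be sketched in plain Ruby. `SketchResult` below is hypothetical and only mirrors the shape of the inspect output above (columns plus rows, with hash rows built on first use and memoized); the real ActiveRecord::Result does more, such as tracking column types.

```ruby
# Hypothetical stand-in for ActiveRecord::Result: raw columns and rows,
# with hash rows built lazily on first enumeration.
class SketchResult
  include Enumerable

  attr_reader :columns, :rows

  def initialize(columns, rows)
    @columns = columns
    @rows = rows
    @hash_rows = nil # mirrors @hash_rows=nil in the inspect output
  end

  def each(&block)
    hash_rows.each(&block)
  end

  def to_ary
    hash_rows
  end
  alias to_a to_ary

  def hash_rows_built?
    !@hash_rows.nil?
  end

  private

  # Allocates one new Hash per row, pairing each column name with its value.
  def hash_rows
    @hash_rows ||= rows.map { |row| columns.zip(row).to_h }
  end
end

result = SketchResult.new(["id", "title"], [[1, "Good title"], [2, "Another title"]])

result.hash_rows_built? # => false
result.to_a             # => [{"id"=>1, "title"=>"Good title"}, {"id"=>2, "title"=>"Another title"}]
result.hash_rows_built? # => true
```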

We could work with this ourselves and build a Post from each hash with Post.new (objects built this way are treated as new, unsaved records):

ActiveRecord::Base.connection.exec_query(Post.all.to_sql).to_ary.map { |post| Post.new(post) }
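Loosely, going from hashes to objects means defining reader methods for the columns once per class and then allocating one object, with its own copy of the attributes, per row. `SketchModel`, `SketchPost`, `define_readers` and `build_all` are hypothetical names for this sketch; ActiveRecord's own loading code does considerably more per object (type casting, dirty tracking, callbacks such as after_initialize).

```ruby
# Hypothetical, heavily simplified model base class; not ActiveRecord's code.
class SketchModel
  # Defines one reader per column, once per subclass, the first time rows arrive.
  def self.define_readers(columns)
    return if @readers_defined

    columns.each do |column|
      define_method(column) { @attributes[column] }
    end
    @readers_defined = true
  end

  # Builds one instance per row hash, like mapping Post.new over #to_ary.
  def self.build_all(hash_rows)
    define_readers(hash_rows.first.keys) unless hash_rows.empty?
    hash_rows.map { |attrs| new(attrs) }
  end

  def initialize(attributes)
    @attributes = attributes.dup # each object keeps its own copy of the row
  end
end

class SketchPost < SketchModel; end

rows = [{ "id" => 1, "title" => "Good title" }, { "id" => 2, "title" => "Another title" }]
posts = SketchPost.build_all(rows)
posts.map(&:title) # => ["Good title", "Another title"]
```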

But where do all the methods get generated?

Comparing the cost of instantiating a model that has only an id column with the cost of Object.new shows the following:

Calling Model.new on a model with only an id column (no timestamps):

800.000 memsize ( 0.000 retained) 7.000 objects ( 0.000 retained) 1.000 strings ( 0.000 retained)

Calling Object.new:

40.000 memsize ( 0.000 retained) 1.000 objects ( 0.000 retained) 0.000 strings ( 0.000 retained)
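A quick way to get a feel for this without benchmark-memory is to count allocations with `GC.stat(:total_allocated_objects)`. The `allocations` helper and the `AttributeBacked` class are hypothetical: the class only imitates a model copying its attributes into a fresh Hash, so the exact counts will differ from a real model and between Ruby versions.

```ruby
# Counts objects allocated while the block runs. Approximate: the VM may
# allocate a few objects of its own during the measurement.
def allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

# Hypothetical class that, like a model, stores its attributes in a new Hash
# with string keys instead of just holding a reference.
class AttributeBacked
  def initialize(attributes)
    @attributes = attributes.transform_keys(&:to_s)
  end
end

row = { id: 1 }

plain = allocations { Object.new }
backed = allocations { AttributeBacked.new(row) }

puts "Object.new:          #{plain} object(s)"
puts "AttributeBacked.new: #{backed} object(s)"
```

Every extra per-object allocation like this is multiplied by the number of rows a #map over a relation instantiates.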

# frozen_string_literal: true

require "bundler/inline"

gemfile(true) do
  source "https://rubygems.org"

  git_source(:github) { |repo| "https://github.com/#{repo}.git" }

  gem "rails", github: "rails/rails"
  gem "sqlite3"
  gem "benchmark-memory"
  gem "pry"
end

require "active_record"
require "benchmark/memory"
require "logger"

# File-backed SQLite database, so the seeded data survives between runs.
# Run once with SETUP=true to create the tables and seed 1,000 posts with
# 100 comments each; later runs go straight to the benchmark.
ActiveRecord::Base.establish_connection(adapter: "sqlite3", database: "db.sqlite")
ActiveRecord::Base.logger = Logger.new($stdout)

# The models must be defined before the seeding code below refers to them.
class Post < ActiveRecord::Base
  has_many :comments
end

class Comment < ActiveRecord::Base
  belongs_to :post
end

if ENV["SETUP"] == "true"
  ActiveRecord::Schema.define do
    create_table :posts, force: true

    create_table :comments, force: true do |t|
      t.integer :post_id
    end
  end

  1000.times do
    post = Post.create!
    100.times { Comment.create!(post: post) }
  end
end

# Runs each labelled callable `times` times and averages, per run, the
# GC.stat counters that changed, the elapsed time and the RSS change (KB).
# Not used by the Benchmark.memory comparison at the bottom.
def gc_diff(times = 1, comps = {})
  comps.to_h do |label, callable|
    results = Array.new(times) do
      GC.start
      gc_start = GC.stat
      rss_start = `ps -o rss= -p #{Process.pid}`.to_i
      time_start = Process.clock_gettime(Process::CLOCK_MONOTONIC, :nanosecond)

      callable.call

      time_end = Process.clock_gettime(Process::CLOCK_MONOTONIC, :nanosecond)
      rss_end = `ps -o rss= -p #{Process.pid}`.to_i
      gc_end = GC.stat

      diff = {}
      gc_end.each do |key, value|
        delta = value - gc_start.fetch(key, 0)
        diff[key] = delta unless delta.zero?
      end
      diff[:time_in_nanoseconds] = time_end - time_start
      diff[:memory] = rss_end - rss_start
      diff
    end

    # A key missing from a run (no change that time) counts as zero.
    keys = results.flat_map(&:keys).uniq
    averages = keys.to_h do |key|
      values = results.map { |result| result.fetch(key, 0) }
      [key, values.sum.fdiv(values.size)]
    end
    [label, averages]
  end
end

Benchmark.memory do |x|
  x.report("#map") { Post.all.map(&:id) }
  x.report("#pluck") { Post.all.pluck(:id) }

  x.compare!
end