Conversation

Contributor

@stefannibrasil stefannibrasil commented Jan 12, 2026

Motivation / Background

We want to run some experiments to improve the library's performance. Before making any changes, we need baseline stats so we can check that new code does not degrade performance. We also want the benchmark scripts to eventually be part of our CI.

To keep everything in a single place, I moved the previous benchmark tasks to a folder. The goal is to use the folder to document experiment results.

Closes #3159, #3160

Inspired by ruby/json#606.

Results from running the scripts (January 12th, 2026):

Require

faker % RUBYOPT="-W0" ruby benchmark/require.rb
took 119.12799999117851ms to load
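
For readers who want to reproduce a load-time measurement like the one above, here is a minimal sketch. It is not the contents of benchmark/require.rb. The helper names `measure_require_ms` and `assert_load_time!`, the 5000 ms limit, and the use of `Process.clock_gettime` are assumptions made for illustration, and `json` stands in for `faker` so the snippet runs without the gem installed. The guard shows one way a script could raise when loading exceeds a threshold.

```ruby
# Hypothetical helper: time a single `require` call in milliseconds,
# using the monotonic clock so wall-clock adjustments do not skew the result.
def measure_require_ms(library)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC, :float_millisecond)
  require library
  Process.clock_gettime(Process::CLOCK_MONOTONIC, :float_millisecond) - started
end

# Hypothetical guard: raise if loading took longer than the given threshold.
def assert_load_time!(library, threshold_ms:)
  elapsed = measure_require_ms(library)
  if elapsed > threshold_ms
    raise "#{library} took #{elapsed.round(2)}ms to load (limit: #{threshold_ms}ms)"
  end

  elapsed
end

# 'json' is a stand-in for 'faker'; the 5_000 ms limit is an arbitrary example value.
elapsed = assert_load_time!('json', threshold_ms: 5_000)
puts "took #{elapsed}ms to load"
```

Note that a library already loaded in the current process returns from `require` almost immediately, so a measurement like this is only meaningful in a fresh Ruby process.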

Load locales - YML vs JSON

faker % RUBYOPT="-W0" ruby benchmark/load_yml_vs_json.rb
ruby 4.0.0 (2025-12-25 revision 553f1675f3) +PRISM [arm64-darwin24]
Warming up --------------------------------------
                 YML    37.000 i/100ms
                JSON   953.000 i/100ms
Calculating -------------------------------------
                 YML    374.033 (± 0.5%) i/s    (2.67 ms/i) -      1.887k in   5.045222s
                JSON      9.691k (± 1.1%) i/s  (103.19 μs/i) -     48.603k in   5.015937s

Comparison:
                 YML:      374.0 i/s
                JSON:     9691.0 i/s - 25.91x  faster
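
A comparison of this kind can be sketched with the standard library alone. The snippet below is an illustration, not benchmark/load_yml_vs_json.rb: the sample `data` hash, the `time_ms` helper, and the iteration count are assumptions, and it parses in-memory strings rather than the locale fixture files, so its timings are not comparable to the numbers above.

```ruby
require 'json'
require 'yaml'

# Hypothetical sample data standing in for a locale file.
data = { 'en' => { 'faker' => { 'name' => { 'first_name' => %w[Ada Grace Linus] } } } }
yaml_text = YAML.dump(data)
json_text = JSON.generate(data)

# Hypothetical helper: run the block and return elapsed milliseconds.
def time_ms
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC, :float_millisecond)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC, :float_millisecond) - started
end

iterations = 200 # arbitrary example value
yaml_ms = time_ms { iterations.times { YAML.safe_load(yaml_text) } }
json_ms = time_ms { iterations.times { JSON.parse(json_text) } }

puts "YAML: #{yaml_ms.round(2)}ms for #{iterations} parses"
puts "JSON: #{json_ms.round(2)}ms for #{iterations} parses"
```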

Generators

faker % RUBYOPT="-W0" ruby benchmark/generators.rb
ruby 4.0.0 (2025-12-25 revision 553f1675f3) +PRISM [arm64-darwin24]
Warming up --------------------------------------
Number of generators: 659
                         1.000 i/100ms
Calculating -------------------------------------
Number of generators: 659
                         31.451 (± 9.5%) i/s   (31.80 ms/i) -    156.000 in   5.010814s

@stefannibrasil stefannibrasil force-pushed the sb-3159-benchmark-revamp branch 9 times, most recently from 8a7e4f2 to a937aac Compare January 12, 2026 23:32
@@ -0,0 +1,40 @@
# frozen_string_literal: true

Contributor Author

Moved from the previous benchmark rake task

@stefannibrasil stefannibrasil force-pushed the sb-3159-benchmark-revamp branch 2 times, most recently from 0d24c5e to ca35c75 Compare January 12, 2026 23:42
stefannibrasil and others added 2 commits January 12, 2026 16:49
Having these in a folder helps because
we can document experiment results in it as well.

And we can edit the require script to raise an error if it takes
longer than a threshold to load faker.

Co-Authored-By: Thiago Araujo <[email protected]>
    branches: [ main ]
  pull_request:
    branches: [ main ]

Contributor

maybe

Suggested change
permissions:
  contents: read

  eval("Faker::#{subclass}.public_methods(false) - Faker::Base.public_methods(false)").sort.map do |method|
    "Faker::#{subclass}.#{method}"
  end.sort
end
Contributor

@thdaraujo thdaraujo Jan 13, 2026

maybe we don't need eval here - we could loop over these constants, do a const_get to get the class, then list its public methods to get generators.

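def subclass_methods(subclass)
  klass = Faker.const_get(subclass)

  # Keep only the methods the generator class adds on top of Faker::Base.
  public_methods = klass.public_methods(false) - Faker::Base.public_methods(false)

  # Return [class, method] pairs so callers can invoke them without eval.
  public_methods.sort.map do |method|
    [klass, method]
  end
end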
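
To make the `const_get` idea concrete without depending on the faker gem, here is a self-contained sketch. The `Demo` namespace and its classes are hypothetical stand-ins for `Faker` and its generators, the extra `namespace` parameter exists only so the example can run against them, and `public_send` is used in place of `send` as a stylistic choice.

```ruby
# Hypothetical stand-in namespace so the sketch runs without faker installed.
module Demo
  class Base
    def self.helper
      :shared
    end
  end

  class Name < Base
    def self.first_name
      'Ada'
    end

    def self.last_name
      'Lovelace'
    end
  end
end

# Look the class up by name, then list the class methods it adds on top of Base.
def subclass_methods(namespace, subclass)
  klass = namespace.const_get(subclass)
  base = namespace.const_get(:Base)
  names = klass.public_methods(false) - base.public_methods(false)
  names.sort.map { |name| [klass, name] }
end

generators = subclass_methods(Demo, :Name)
# Each pair can be invoked directly, with no string building and no eval.
results = generators.map { |klass, name| klass.public_send(name) }
puts results.inspect
```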

Comment on lines +37 to +39
x.report("Number of generators: #{all_generators.count}") do
  all_generators.each { |generator| eval(generator) }
end
Contributor

we should build the list of generators outside so that we're only benchmarking generator execution

Suggested change
- x.report("Number of generators: #{all_generators.count}") do
-   all_generators.each { |generator| eval(generator) }
- end
+ generators = all_generators
+ x.report("Number of generators: #{all_generators.count}") do
+   generators.each { |klass, generator| klass.send(generator) }
+ end
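
The point of this suggestion, keeping setup out of the measured block, can be sketched as follows. This is an illustration only: `build_generators` and `time_ms` are hypothetical helpers, `Integer.sqrt` stands in for a generator call, and a plain monotonic timer replaces the benchmark harness used in the PR.

```ruby
# Hypothetical helper: run the block and return elapsed milliseconds.
def time_ms
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC, :float_millisecond)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC, :float_millisecond) - started
end

# Hypothetical stand-in for the discovery step that lists every generator.
def build_generators
  (1..1_000).map { |n| [Integer, :sqrt, n] }
end

# Setup happens once, outside the timed block, so it does not count.
generators = build_generators

# Only the calls themselves are measured.
execution_ms = time_ms do
  generators.each { |klass, name, arg| klass.public_send(name, arg) }
end

puts "ran #{generators.size} calls in #{execution_ms.round(3)}ms"
```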

  x.report('JSON') { JSON.load_file("#{File.dirname(__FILE__)}/../test/fixtures/locales/es-MX.json") }

  x.compare!(order: :baseline)
end
Contributor

Do we still want to keep this? Benchmarking json vs yaml load times is not really relevant to Faker, this was just an experiment.

  Description: The use of eval represents a serious security risk.
  Exclude:
    - 'lib/faker/default/json.rb'
    - 'benchmark/generators.rb'
Contributor

maybe we don't need eval on generators, see previous comment

Contributor

@thdaraujo thdaraujo left a comment

Nice work on this! Do you mind adding the machine you're using to the description so we can compare apples to apples?

e.g. Apple M1, 16GB memory, macOS X.Y.Z

Development

Successfully merging this pull request may close these issues.

Add a benchmark CI workflow

3 participants