Commit e75132e

Avoid dynamic parse method dispatch for faster access (#311)
On some benchmarks it makes a measurable difference:

- `quoted` from `benchmark/parse.yaml` (1.13x faster)
- `quote_char_nil` from `benchmark/parse_quote_char_nil.yaml` (1.35x faster)

The remaining cases differ by 1.03x or less in either direction.

```
N_ROWS=5000 rake benchmark:parse benchmark:parse_liberal_parsing benchmark:parse_quote_char_nil benchmark:parse_strip
```

```
RUBYLIB= BUNDLER_ORIG_RUBYLIB= /Users/vladimirkochnev/.asdf/installs/ruby/3.3.3/bin/ruby -v -S benchmark-driver /Users/vladimirkochnev/code/csv/benchmark/parse.yaml
ruby 3.3.3 (2024-06-12 revision f1c7b6f435) [arm64-darwin23]
Calculating -------------------------------------
                      csv 3.3.0      master
           unquoted      22.147      22.131 i/s -     100.000 times in 4.515187s 4.518589s
             quoted      11.517      12.997 i/s -     100.000 times in 8.682986s 7.694298s
              mixed      14.097      13.964 i/s -     100.000 times in 7.093660s 7.161389s
    include_col_sep       5.214       5.188 i/s -     100.000 times in 19.178537s 19.277059s
    include_row_sep       5.195       5.101 i/s -     100.000 times in 19.250419s 19.605061s
       encode_utf-8      16.030      15.984 i/s -     100.000 times in 6.238449s 6.256427s
        encode_sjis      16.546      16.376 i/s -     100.000 times in 6.043603s 6.106603s

Comparison:
                 unquoted
           csv 3.3.0:        22.1 i/s
              master:        22.1 i/s - 1.00x slower

                   quoted
              master:        13.0 i/s
           csv 3.3.0:        11.5 i/s - 1.13x slower

                    mixed
           csv 3.3.0:        14.1 i/s
              master:        14.0 i/s - 1.01x slower

          include_col_sep
           csv 3.3.0:         5.2 i/s
              master:         5.2 i/s - 1.01x slower

          include_row_sep
           csv 3.3.0:         5.2 i/s
              master:         5.1 i/s - 1.02x slower

             encode_utf-8
           csv 3.3.0:        16.0 i/s
              master:        16.0 i/s - 1.00x slower

              encode_sjis
           csv 3.3.0:        16.5 i/s
              master:        16.4 i/s - 1.01x slower
```

```
RUBYLIB= BUNDLER_ORIG_RUBYLIB= /Users/vladimirkochnev/.asdf/installs/ruby/3.3.3/bin/ruby -v -S benchmark-driver /Users/vladimirkochnev/code/csv/benchmark/parse_liberal_parsing.yaml
ruby 3.3.3 (2024-06-12 revision f1c7b6f435) [arm64-darwin23]
Calculating -------------------------------------
                                    csv 3.3.0      master
                         unquoted       8.132       8.250 i/s -     100.000 times in 12.297793s 12.121689s
         unquoted_backslash_quote       3.868       3.866 i/s -     100.000 times in 25.849956s 25.869413s
                           quoted       3.642       3.638 i/s -     100.000 times in 27.454032s 27.484247s
quoted_double_quote_outside_quote       2.277       2.202 i/s -     100.000 times in 43.921488s 45.419138s
           quoted_backslash_quote       1.801       1.803 i/s -     100.000 times in 55.522265s 55.464641s
                  include_col_sep       3.644       3.633 i/s -     100.000 times in 27.440353s 27.528626s
                  include_row_sep       3.629       3.614 i/s -     100.000 times in 27.559354s 27.670274s
                     encode_utf-8       8.149       8.136 i/s -     100.000 times in 12.270936s 12.290646s
                      encode_sjis       8.527       8.425 i/s -     100.000 times in 11.727969s 11.868855s

Comparison:
                         unquoted
              master:         8.2 i/s
           csv 3.3.0:         8.1 i/s - 1.01x slower

         unquoted_backslash_quote
           csv 3.3.0:         3.9 i/s
              master:         3.9 i/s - 1.00x slower

                           quoted
           csv 3.3.0:         3.6 i/s
              master:         3.6 i/s - 1.00x slower

quoted_double_quote_outside_quote
           csv 3.3.0:         2.3 i/s
              master:         2.2 i/s - 1.03x slower

           quoted_backslash_quote
              master:         1.8 i/s
           csv 3.3.0:         1.8 i/s - 1.00x slower

                  include_col_sep
           csv 3.3.0:         3.6 i/s
              master:         3.6 i/s - 1.00x slower

                  include_row_sep
           csv 3.3.0:         3.6 i/s
              master:         3.6 i/s - 1.00x slower

                     encode_utf-8
           csv 3.3.0:         8.1 i/s
              master:         8.1 i/s - 1.00x slower

                      encode_sjis
           csv 3.3.0:         8.5 i/s
              master:         8.4 i/s - 1.01x slower
```

```
RUBYLIB= BUNDLER_ORIG_RUBYLIB= /Users/vladimirkochnev/.asdf/installs/ruby/3.3.3/bin/ruby -v -S benchmark-driver /Users/vladimirkochnev/code/csv/benchmark/parse_quote_char_nil.yaml
ruby 3.3.3 (2024-06-12 revision f1c7b6f435) [arm64-darwin23]
Calculating -------------------------------------
                      csv 3.3.0      master
 without_quote_char      22.840      22.844 i/s -     100.000 times in 4.378284s 4.377488s
     quote_char_nil      32.370      43.729 i/s -     100.000 times in 3.089285s 2.286831s
      col_sep_space      12.135      12.106 i/s -     100.000 times in 8.240368s 8.260030s

Comparison:
       without_quote_char
              master:        22.8 i/s
           csv 3.3.0:        22.8 i/s - 1.00x slower

           quote_char_nil
              master:        43.7 i/s
           csv 3.3.0:        32.4 i/s - 1.35x slower

            col_sep_space
           csv 3.3.0:        12.1 i/s
              master:        12.1 i/s - 1.00x slower
```

```
RUBYLIB= BUNDLER_ORIG_RUBYLIB= /Users/vladimirkochnev/.asdf/installs/ruby/3.3.3/bin/ruby -v -S benchmark-driver /Users/vladimirkochnev/code/csv/benchmark/parse_strip.yaml
ruby 3.3.3 (2024-06-12 revision f1c7b6f435) [arm64-darwin23]
Calculating -------------------------------------
                      csv 3.3.0      master
            default      13.132      13.043 i/s -     100.000 times in 7.615051s 7.667227s
     no_quote_strip       8.955       8.957 i/s -     100.000 times in 11.167272s 11.164189s

Comparison:
                  default
           csv 3.3.0:        13.1 i/s
              master:        13.0 i/s - 1.01x slower

           no_quote_strip
              master:         9.0 i/s
           csv 3.3.0:         9.0 i/s - 1.00x slower
```
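The 1.13x and 1.35x figures in the comparison output can be re-derived from the raw timings. Each context runs 100 iterations (`100.000 times`), so the reported i/s value is 100 divided by the elapsed seconds, and the ratio of the two elapsed times is the speedup. A small Ruby check using the `quoted` and `quote_char_nil` timings reported above (the helper names `ips` and `speedup` are just for this illustration):

```ruby
LOOP_COUNT = 100 # every benchmark above runs 100 iterations

# Iterations per second, computed the way the benchmark output reports it.
def ips(seconds, loop_count = LOOP_COUNT)
  loop_count / seconds
end

# How many times faster master is than the csv 3.3.0 gem.
def speedup(gem_seconds, master_seconds)
  gem_seconds / master_seconds
end

# Elapsed seconds copied from the output: [csv 3.3.0, master].
TIMINGS = {
  "quoted"         => [8.682986, 7.694298],
  "quote_char_nil" => [3.089285, 2.286831],
}.freeze

TIMINGS.each do |name, (gem_seconds, master_seconds)|
  printf("%-15s csv 3.3.0: %6.3f i/s  master: %6.3f i/s  %.2fx faster\n",
         name, ips(gem_seconds), ips(master_seconds),
         speedup(gem_seconds, master_seconds))
end
```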
1 parent 4534f35 commit e75132e

9 files changed: +26 -13 lines

benchmark/convert_nil.yaml

Lines changed: 2 additions & 0 deletions
```diff
@@ -4,6 +4,8 @@ contexts:
       csv: 3.0.1
   - gems:
       csv: 3.0.2
+  - gems:
+      csv: 3.3.0
   - name: "master"
     prelude: |
       $LOAD_PATH.unshift(File.expand_path("lib"))
```

benchmark/parse.yaml

Lines changed: 2 additions & 0 deletions
```diff
@@ -4,6 +4,8 @@ contexts:
       csv: 3.0.1
   - gems:
       csv: 3.0.2
+  - gems:
+      csv: 3.3.0
   - name: "master"
     prelude: |
       $LOAD_PATH.unshift(File.expand_path("lib"))
```

benchmark/parse_liberal_parsing.yaml

Lines changed: 2 additions & 0 deletions
```diff
@@ -2,6 +2,8 @@ loop_count: 100
 contexts:
   - gems:
       csv: 3.0.2
+  - gems:
+      csv: 3.3.0
   - name: "master"
     prelude: |
       $LOAD_PATH.unshift(File.expand_path("lib"))
```

benchmark/parse_quote_char_nil.yaml

Lines changed: 2 additions & 0 deletions
```diff
@@ -1,5 +1,7 @@
 loop_count: 100
 contexts:
+  - gems:
+      csv: 3.3.0
   - name: "master"
     prelude: |
       $LOAD_PATH.unshift(File.expand_path("lib"))
```

benchmark/parse_strip.yaml

Lines changed: 2 additions & 0 deletions
```diff
@@ -1,5 +1,7 @@
 loop_count: 100
 contexts:
+  - gems:
+      csv: 3.3.0
   - name: "master"
     prelude: |
       $LOAD_PATH.unshift(File.expand_path("lib"))
```

benchmark/read.yaml

Lines changed: 2 additions & 0 deletions
```diff
@@ -4,6 +4,8 @@ contexts:
       csv: 3.0.1
   - gems:
       csv: 3.0.2
+  - gems:
+      csv: 3.3.0
   - name: "master"
     prelude: |
       $LOAD_PATH.unshift(File.expand_path("lib"))
```

benchmark/shift.yaml

Lines changed: 2 additions & 0 deletions
```diff
@@ -4,6 +4,8 @@ contexts:
       csv: 3.0.1
   - gems:
       csv: 3.0.2
+  - gems:
+      csv: 3.3.0
   - name: "master"
     prelude: |
       $LOAD_PATH.unshift(File.expand_path("lib"))
```

benchmark/write.yaml

Lines changed: 2 additions & 0 deletions
```diff
@@ -4,6 +4,8 @@ contexts:
       csv: 3.0.1
   - gems:
       csv: 3.0.2
+  - gems:
+      csv: 3.3.0
   - name: "master"
     prelude: |
       $LOAD_PATH.unshift(File.expand_path("lib"))
```
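Each benchmark YAML gains a context that pins the released `csv` 3.3.0 gem next to the `master` context, whose `prelude` puts the working tree's `lib` directory on `$LOAD_PATH`. As a sanity check of how the `contexts:` list is structured, the sketch below loads the `benchmark/parse_strip.yaml` hunk with Ruby's bundled `YAML.safe_load` (Psych). The two-space indentation is an assumption, and `context_label` is a hypothetical helper that imitates the column names seen in the benchmark output; it is not benchmark-driver's own code.

```ruby
require "yaml"

# Top of benchmark/parse_strip.yaml after this commit (indentation assumed;
# keys outside the hunk are omitted).
CONFIG_YAML = <<~YAML
  loop_count: 100
  contexts:
    - gems:
        csv: 3.3.0
    - name: "master"
      prelude: |
        $LOAD_PATH.unshift(File.expand_path("lib"))
YAML

# Hypothetical helper, for illustration only: a gem-pinned context is shown
# as "<gem> <version>", any other context by its `name`.
def context_label(context)
  if context["gems"]
    context["gems"].map { |gem_name, version| "#{gem_name} #{version}" }.join(", ")
  else
    context.fetch("name")
  end
end

CONFIG = YAML.safe_load(CONFIG_YAML)
LABELS = CONFIG.fetch("contexts").map { |context| context_label(context) }

puts LABELS.inspect
```

`LABELS` comes out as `["csv 3.3.0", "master"]`, matching the two columns in the `parse_quote_char_nil.yaml` and `parse_strip.yaml` output. Note that `3.3.0` loads as a String because it is not a valid YAML number.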

lib/csv/parser.rb

Lines changed: 10 additions & 13 deletions
```diff
@@ -409,13 +409,7 @@ def parse(&block)

       begin
         @scanner ||= build_scanner
-        if quote_character.nil?
-          parse_no_quote(&block)
-        elsif @need_robust_parsing
-          parse_quotable_robust(&block)
-        else
-          parse_quotable_loose(&block)
-        end
+        __send__(@parse_method, &block)
       rescue InvalidEncoding
         if @scanner
           ignore_broken_line
@@ -459,7 +453,6 @@ def prepare
     end

     def prepare_variable
-      @need_robust_parsing = false
       @encoding = @options[:encoding]
       liberal_parsing = @options[:liberal_parsing]
       if liberal_parsing
@@ -472,7 +465,6 @@ def prepare_variable
           @double_quote_outside_quote = false
           @backslash_quote = false
         end
-        @need_robust_parsing = true
       else
         @liberal_parsing = false
         @backslash_quote = false
@@ -554,15 +546,13 @@ def prepare_strip
           @rstrip_value = Regexp.new(@escaped_strip +
                                      "+\\z".encode(@encoding))
         end
-        @need_robust_parsing = true
       elsif @strip
         strip_values = " \t\f\v"
         @escaped_strip = strip_values.encode(@encoding)
         if @quote_character
           @strip_value = Regexp.new("[#{strip_values}]+".encode(@encoding))
           @rstrip_value = Regexp.new("[#{strip_values}]+\\z".encode(@encoding))
         end
-        @need_robust_parsing = true
       end
     end

@@ -808,6 +798,13 @@ def adjust_headers(headers, quoted_fields)

     def prepare_parser
       @may_quoted = may_quoted?
+      if @quote_character.nil?
+        @parse_method = :parse_no_quote
+      elsif @liberal_parsing or @strip
+        @parse_method = :parse_quotable_robust
+      else
+        @parse_method = :parse_quotable_loose
+      end
     end

     def may_quoted?
@@ -987,7 +984,7 @@ def parse_quotable_loose(&block)
           quoted_fields = []
         elsif line.include?(@cr) or line.include?(@lf)
           @scanner.keep_back
-          @need_robust_parsing = true
+          @parse_method = :parse_quotable_robust
           return parse_quotable_robust(&block)
         else
           row = line.split(@split_column_separator, -1)
@@ -1011,7 +1008,7 @@ def parse_quotable_loose(&block)
                 row[i] = column[1..-2]
               else
                 @scanner.keep_back
-                @need_robust_parsing = true
+                @parse_method = :parse_quotable_robust
                 return parse_quotable_robust(&block)
               end
               validate_field_size(row[i])
```
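The change in `lib/csv/parser.rb` picks the parse strategy once in `prepare_parser`, stores it as a method name in `@parse_method`, and has `parse` dispatch with `__send__` instead of re-evaluating a three-way branch on every call. The loose parser still upgrades itself to the robust one when it meets a line it cannot handle; before this commit that upgrade was recorded in `@need_robust_parsing`, now it is recorded by reassigning `@parse_method`. The class below is a hypothetical, heavily simplified stand-in that shows the same shape: `ToyParser`, its keyword options, and its comma-splitting rules are illustrative assumptions, not CSV's real parsing logic.

```ruby
# Hypothetical, simplified illustration of the dispatch pattern; not CSV's parser.
class ToyParser
  attr_reader :parse_method

  def initialize(quote_character: '"', liberal_parsing: false, strip: false)
    @quote_character = quote_character
    @liberal_parsing = liberal_parsing
    @strip = strip
    prepare_parser
  end

  # Dispatch to whichever strategy prepare_parser (or a later upgrade) chose.
  def parse(lines)
    lines.map { |line| __send__(@parse_method, line) }
  end

  private

  # Decide the strategy once, using the same branch order as the commit.
  def prepare_parser
    if @quote_character.nil?
      @parse_method = :parse_no_quote
    elsif @liberal_parsing or @strip
      @parse_method = :parse_quotable_robust
    else
      @parse_method = :parse_quotable_loose
    end
  end

  def parse_no_quote(line)
    line.split(",", -1)
  end

  # Fast path: only handles lines that contain no quote character.
  def parse_quotable_loose(line)
    if line.include?(@quote_character)
      # Remember the upgrade so later calls go straight to the robust parser.
      @parse_method = :parse_quotable_robust
      return parse_quotable_robust(line)
    end
    line.split(",", -1)
  end

  # Slow path: scans character by character so commas inside quotes stay in
  # the field. Doubled-quote escapes are not supported in this sketch.
  def parse_quotable_robust(line)
    fields = []
    field = +""
    in_quotes = false
    line.each_char do |char|
      if char == @quote_character
        in_quotes = !in_quotes
      elsif char == "," && !in_quotes
        fields << field
        field = +""
      else
        field << char
      end
    end
    fields << field
    fields.map!(&:strip) if @strip
    fields
  end
end
```

For example, `ToyParser.new.parse(['a,b', '"x,y",z', 'c,d'])` returns `[["a", "b"], ["x,y", "z"], ["c", "d"]]`, and afterwards `parse_method` is `:parse_quotable_robust`, so later calls skip the fast path, just as `@parse_method` stays switched in the real parser. Because `__send__` ignores visibility, the strategy methods can stay private.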
