Commit bb93c28

Don't build quoted_fields array when not needed (#312)
```
N_ROWS=5000 rake benchmark:write benchmark:parse benchmark:parse_liberal_parsing benchmark:parse_quote_char_nil benchmark:parse_strip
```

```
RUBYLIB= BUNDLER_ORIG_RUBYLIB= /Users/vladimirkochnev/.asdf/installs/ruby/3.3.3/bin/ruby -v -S benchmark-driver /Users/vladimirkochnev/code/csv/benchmark/write.yaml
ruby 3.3.3 (2024-06-12 revision f1c7b6f435) [arm64-darwin23]
Calculating -------------------------------------
                           csv 3.3.0      master
  generate_line: fields       29.211      29.755 i/s - 100.000 times in 3.423319s 3.360759s
     generate_line: Row       28.090      28.121 i/s - 100.000 times in 3.560007s 3.556013s
    generate_line: Hash       26.398      26.888 i/s - 100.000 times in 3.788145s 3.719147s
              << fields      130.692     142.421 i/s - 100.000 times in 0.765156s 0.702142s
                 << Row      103.416     107.886 i/s - 100.000 times in 0.966972s 0.926906s
                << Hash      109.760     114.806 i/s - 100.000 times in 0.911082s 0.871038s
<< fields: write headers     131.147     141.668 i/s - 100.000 times in 0.762501s 0.705878s
   << Row: write headers     102.956     108.919 i/s - 100.000 times in 0.971286s 0.918117s
  << Hash: write headers     109.498     115.403 i/s - 100.000 times in 0.913259s 0.866528s

Comparison:
generate_line: fields
     master:  29.8 i/s
  csv 3.3.0:  29.2 i/s - 1.02x slower

generate_line: Row
     master:  28.1 i/s
  csv 3.3.0:  28.1 i/s - 1.00x slower

generate_line: Hash
     master:  26.9 i/s
  csv 3.3.0:  26.4 i/s - 1.02x slower

<< fields
     master: 142.4 i/s
  csv 3.3.0: 130.7 i/s - 1.09x slower

<< Row
     master: 107.9 i/s
  csv 3.3.0: 103.4 i/s - 1.04x slower

<< Hash
     master: 114.8 i/s
  csv 3.3.0: 109.8 i/s - 1.05x slower

<< fields: write headers
     master: 141.7 i/s
  csv 3.3.0: 131.1 i/s - 1.08x slower

<< Row: write headers
     master: 108.9 i/s
  csv 3.3.0: 103.0 i/s - 1.06x slower

<< Hash: write headers
     master: 115.4 i/s
  csv 3.3.0: 109.5 i/s - 1.05x slower
```

```
RUBYLIB= BUNDLER_ORIG_RUBYLIB= /Users/vladimirkochnev/.asdf/installs/ruby/3.3.3/bin/ruby -v -S benchmark-driver /Users/vladimirkochnev/code/csv/benchmark/parse.yaml
ruby 3.3.3 (2024-06-12 revision f1c7b6f435) [arm64-darwin23]
Calculating -------------------------------------
                  csv 3.3.0      master
       unquoted      21.798      22.176 i/s - 100.000 times in 4.587570s 4.509469s
         quoted      11.580      12.896 i/s - 100.000 times in 8.635392s 7.754641s
          mixed      14.082      14.139 i/s - 100.000 times in 7.101360s 7.072725s
include_col_sep       5.206       5.191 i/s - 100.000 times in 19.209061s 19.265310s
include_row_sep       5.125       5.179 i/s - 100.000 times in 19.513305s 19.307953s
   encode_utf-8      16.247      16.221 i/s - 100.000 times in 6.154900s 6.165029s
    encode_sjis      16.811      16.442 i/s - 100.000 times in 5.948591s 6.082152s

Comparison:
unquoted
     master:  22.2 i/s
  csv 3.3.0:  21.8 i/s - 1.02x slower

quoted
     master:  12.9 i/s
  csv 3.3.0:  11.6 i/s - 1.11x slower

mixed
     master:  14.1 i/s
  csv 3.3.0:  14.1 i/s - 1.00x slower

include_col_sep
  csv 3.3.0:   5.2 i/s
     master:   5.2 i/s - 1.00x slower

include_row_sep
     master:   5.2 i/s
  csv 3.3.0:   5.1 i/s - 1.01x slower

encode_utf-8
  csv 3.3.0:  16.2 i/s
     master:  16.2 i/s - 1.00x slower

encode_sjis
  csv 3.3.0:  16.8 i/s
     master:  16.4 i/s - 1.02x slower
```

```
RUBYLIB= BUNDLER_ORIG_RUBYLIB= /Users/vladimirkochnev/.asdf/installs/ruby/3.3.3/bin/ruby -v -S benchmark-driver /Users/vladimirkochnev/code/csv/benchmark/parse_liberal_parsing.yaml
ruby 3.3.3 (2024-06-12 revision f1c7b6f435) [arm64-darwin23]
Calculating -------------------------------------
                                    csv 3.3.0      master
                         unquoted       8.135       8.270 i/s - 100.000 times in 12.291808s 12.091261s
         unquoted_backslash_quote       3.865       3.854 i/s - 100.000 times in 25.872675s 25.946134s
                           quoted       3.627       3.598 i/s - 100.000 times in 27.572211s 27.789378s
quoted_double_quote_outside_quote       2.260       2.216 i/s - 100.000 times in 44.241118s 45.117111s
           quoted_backslash_quote       1.795       1.789 i/s - 100.000 times in 55.721082s 55.903782s
                  include_col_sep       3.622       3.615 i/s - 100.000 times in 27.606966s 27.664617s
                  include_row_sep       3.575       3.611 i/s - 100.000 times in 27.970871s 27.694692s
                     encode_utf-8       8.041       8.175 i/s - 100.000 times in 12.436682s 12.232314s
                      encode_sjis       8.515       8.147 i/s - 100.000 times in 11.744171s 12.274468s

Comparison:
unquoted
     master:   8.3 i/s
  csv 3.3.0:   8.1 i/s - 1.02x slower

unquoted_backslash_quote
  csv 3.3.0:   3.9 i/s
     master:   3.9 i/s - 1.00x slower

quoted
  csv 3.3.0:   3.6 i/s
     master:   3.6 i/s - 1.01x slower

quoted_double_quote_outside_quote
  csv 3.3.0:   2.3 i/s
     master:   2.2 i/s - 1.02x slower

quoted_backslash_quote
  csv 3.3.0:   1.8 i/s
     master:   1.8 i/s - 1.00x slower

include_col_sep
  csv 3.3.0:   3.6 i/s
     master:   3.6 i/s - 1.00x slower

include_row_sep
     master:   3.6 i/s
  csv 3.3.0:   3.6 i/s - 1.01x slower

encode_utf-8
     master:   8.2 i/s
  csv 3.3.0:   8.0 i/s - 1.02x slower

encode_sjis
  csv 3.3.0:   8.5 i/s
     master:   8.1 i/s - 1.05x slower
```

```
RUBYLIB= BUNDLER_ORIG_RUBYLIB= /Users/vladimirkochnev/.asdf/installs/ruby/3.3.3/bin/ruby -v -S benchmark-driver /Users/vladimirkochnev/code/csv/benchmark/parse_quote_char_nil.yaml
ruby 3.3.3 (2024-06-12 revision f1c7b6f435) [arm64-darwin23]
Calculating -------------------------------------
                     csv 3.3.0      master
without_quote_char      22.806      22.552 i/s - 100.000 times in 4.384858s 4.434139s
    quote_char_nil      32.576      44.911 i/s - 100.000 times in 3.069777s 2.226621s
     col_sep_space      12.182      12.341 i/s - 100.000 times in 8.208668s 8.102909s

Comparison:
without_quote_char
  csv 3.3.0:  22.8 i/s
     master:  22.6 i/s - 1.01x slower

quote_char_nil
     master:  44.9 i/s
  csv 3.3.0:  32.6 i/s - 1.38x slower

col_sep_space
     master:  12.3 i/s
  csv 3.3.0:  12.2 i/s - 1.01x slower
```

```
RUBYLIB= BUNDLER_ORIG_RUBYLIB= /Users/vladimirkochnev/.asdf/installs/ruby/3.3.3/bin/ruby -v -S benchmark-driver /Users/vladimirkochnev/code/csv/benchmark/parse_strip.yaml
ruby 3.3.3 (2024-06-12 revision f1c7b6f435) [arm64-darwin23]
Calculating -------------------------------------
                 csv 3.3.0      master
       default      13.025      13.075 i/s - 100.000 times in 7.677346s 7.648185s
no_quote_strip       8.823       8.866 i/s - 100.000 times in 11.333992s 11.279182s

Comparison:
default
     master:  13.1 i/s
  csv 3.3.0:  13.0 i/s - 1.00x slower

no_quote_strip
     master:   8.9 i/s
  csv 3.3.0:   8.8 i/s - 1.00x slower
```
1 parent e75132e commit bb93c28

3 files changed: +15 -11 lines changed

lib/csv/fields_converter.rb

Lines changed: 8 additions & 1 deletion
@@ -4,6 +4,13 @@ class CSV
   # Note: Don't use this class directly. This is an internal class.
   class FieldsConverter
     include Enumerable
+
+    NO_QUOTED_FIELDS = [] # :nodoc:
+    def NO_QUOTED_FIELDS.[](_index)
+      false
+    end
+    NO_QUOTED_FIELDS.freeze
+
     #
     # A CSV::FieldsConverter is a data structure for storing the
     # fields converter properties to be passed as a parameter
@@ -44,7 +51,7 @@ def empty?
       @converters.empty?
     end
 
-    def convert(fields, headers, lineno, quoted_fields)
+    def convert(fields, headers, lineno, quoted_fields=NO_QUOTED_FIELDS)
       return fields unless need_convert?
 
       fields.collect.with_index do |field, index|
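
The `NO_QUOTED_FIELDS` constant above is a shared, frozen empty array whose `[]` is overridden to return `false` for any index, so it can stand in for a per-row `[false] * n` array without allocating one. A minimal sketch of that pattern follows; `NO_QUOTED` is a hypothetical stand-in name, not the library constant, and the snippet only illustrates the lookup behavior.

```ruby
# Hypothetical stand-in for the sentinel defined in the diff above.
NO_QUOTED = []
# Singleton method: only this one array object answers false for every index.
def NO_QUOTED.[](_index)
  false
end
NO_QUOTED.freeze

row = ["a", "b", "c"]
per_row = [false] * row.size # the kind of array the old code built per row

# For every valid index of the row, both objects give the same answer.
same_answers = row.each_index.all? { |i| NO_QUOTED[i] == per_row[i] }
puts same_answers

# Out of range, the sentinel still answers false; a plain array answers nil.
puts NO_QUOTED[100].inspect
puts per_row[100].inspect

# Freezing guards against a caller accidentally appending to the shared object.
puts NO_QUOTED.frozen?
```

Because the override takes a single argument, the sentinel only supports the `array[index]` form of lookup, which is all this sketch exercises.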

lib/csv/parser.rb

Lines changed: 6 additions & 8 deletions
@@ -757,7 +757,7 @@ def prepare_header
       case headers
       when Array
         @raw_headers = headers
-        quoted_fields = [false] * @raw_headers.size
+        quoted_fields = FieldsConverter::NO_QUOTED_FIELDS
         @use_headers = true
       when String
         @raw_headers, quoted_fields = parse_headers(headers)
@@ -941,11 +941,9 @@ def parse_no_quote(&block)
         if line.empty?
           next if @skip_blanks
           row = []
-          quoted_fields = []
         else
           line = strip_value(line)
           row = line.split(@split_column_separator, -1)
-          quoted_fields = [false] * row.size
           if @max_field_size
             row.each do |column|
               validate_field_size(column)
@@ -959,7 +957,7 @@ def parse_no_quote(&block)
           end
         end
         @last_line = original_line
-        emit_row(row, quoted_fields, &block)
+        emit_row(row, &block)
       end
     end
 
@@ -981,7 +979,7 @@ def parse_quotable_loose(&block)
             next
           end
           row = []
-          quoted_fields = []
+          quoted_fields = FieldsConverter::NO_QUOTED_FIELDS
         elsif line.include?(@cr) or line.include?(@lf)
           @scanner.keep_back
           @parse_method = :parse_quotable_robust
@@ -1043,13 +1041,13 @@ def parse_quotable_robust(&block)
           quoted_fields << @quoted_column_value
         elsif parse_row_end
           if row.empty? and value.nil?
-            emit_row([], [], &block) unless @skip_blanks
+            emit_row(row, &block) unless @skip_blanks
           else
             row << value
             quoted_fields << @quoted_column_value
             emit_row(row, quoted_fields, &block)
             row = []
-            quoted_fields = []
+            quoted_fields.clear
           end
           skip_needless_lines
           start_row
@@ -1254,7 +1252,7 @@ def start_row
       @scanner.keep_start
     end
 
-    def emit_row(row, quoted_fields, &block)
+    def emit_row(row, quoted_fields=FieldsConverter::NO_QUOTED_FIELDS, &block)
       @lineno += 1
 
       raw_row = row
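
In `parse_quotable_robust`, `quoted_fields = []` becomes `quoted_fields.clear`, which reuses one array across rows instead of allocating a new one per row. Reuse of this kind is only safe when nothing downstream keeps a reference to the array after `emit_row` returns; whether that holds is not visible in this diff, so the sketch below only illustrates the general difference between reassigning and clearing. The `keep` lambda and `retained` list are hypothetical and are not part of the csv library.

```ruby
# Hypothetical consumer that keeps a reference to whatever it is handed.
retained = []
keep = ->(flags) { retained << flags }

# Reassigning the local variable leaves the consumer's array untouched.
flags = [true, false]
keep.call(flags)
flags = []
after_reassign = retained.last.dup

# Clearing in place also empties the array the consumer still references.
flags = [true, false]
keep.call(flags)
flags.clear
after_clear = retained.last.dup

puts after_reassign.inspect
puts after_clear.inspect
```

If a consumer did need the flags beyond the call, it would have to copy them (for example with `dup`) before the parser clears the buffer.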

lib/csv/writer.rb

Lines changed: 1 addition & 2 deletions
@@ -40,8 +40,7 @@ def <<(row)
       @lineno += 1
 
       if @fields_converter
-        quoted_fields = [false] * row.size
-        row = @fields_converter.convert(row, nil, lineno, quoted_fields)
+        row = @fields_converter.convert(row, nil, lineno)
       end
 
       i = -1
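
The writer change relies on the new default value for the last parameter of `convert`: a caller with no quoting information omits the argument and no longer builds a `[false] * row.size` array for every row. A rough sketch of that calling convention, using hypothetical method and constant names rather than the library's own, might look like this:

```ruby
# Hypothetical stand-ins; not the csv library's actual methods or constants.
SHARED_NO_FLAGS = [].freeze

# Old shape: every caller must build and pass a flags array.
def convert_with_required_flags(fields, flags)
  fields.each_with_index.map { |field, i| flags[i] ? field : field.upcase }
end

# New shape: callers without flags omit the argument and share one object.
def convert_with_default_flags(fields, flags = SHARED_NO_FLAGS)
  fields.each_with_index.map { |field, i| flags[i] ? field : field.upcase }
end

row = %w[a b c]
old_result = convert_with_required_flags(row, [false] * row.size)
new_result = convert_with_default_flags(row)

puts old_result == new_result
```

This sketch uses a plain frozen array, whose lookups return `nil` rather than `false`; the commit's sentinel overrides `[]` to return `false`, so code that compares against `false` sees the same value as before.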
