Dictionary and dictionary reader #1

frolmr · 2018-06-28T06:54:00Z

Add a Dictionary class for dictionary objects and a DictionaryReader class for parsing .dic and .aff files and instantiating a Dictionary object.

zverok

(Left some comments; maybe some will be helpful. I understand it is a work in progress.)

zverok · 2018-09-27T17:21:48Z

.rubocop.yml

@@ -0,0 +1,18 @@
+Metrics/LineLength:
+  Max: 120


Too much for most eyes. 100 is good.

zverok · 2018-09-27T17:22:01Z

.rubocop.yml

+  Max: 100
+
+Style/Documentation:
+  Description: 'Document classes and non-namespace modules.'


Descriptions should not be here.

zverok · 2018-09-27T17:25:06Z

lib/parsers/aff_parser.rb

+    private
+
+    def parse_affix_line(aff_group)
+      header = aff_group.first


You can do header, *rule_lines = aff_group, and then you won't need the ugly [1..-1] below.
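A minimal illustration of that splat destructuring, using made-up Hunspell-style suffix lines (the sample data is hypothetical, not taken from the PR):

```ruby
# Made-up affix group: one header line followed by its rule lines.
aff_group = [
  "SFX A Y 2",
  "SFX A 0 able .",
  "SFX A e able e"
]

# The first element goes to header; the splat collects the rest.
header, *rule_lines = aff_group

puts header   # prints SFX A Y 2
p rule_lines  # prints ["SFX A 0 able .", "SFX A e able e"]
```

It gives the same values as aff_group.first and aff_group[1..-1], but states the "header plus rules" shape in one assignment.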

zverok · 2018-09-27T17:25:18Z

lib/parsers/aff_parser.rb

+      rules = aff_group[1..-1].each_with_object([]) do |el, arr|
+        _, _, stripping_rule, affixes, condition = el.split(/[\s*]/)
+        arr << { stripping_rule: stripping_rule, affixes: affixes, condition: condition }
+      end


That's just map (instead of each_with_object([]) and <<).
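A sketch comparing the two forms on hypothetical rule lines. It uses plain String#split, which splits on runs of whitespace, rather than the diff's /[\s*]/; that regex is a character class matching a single whitespace character or a literal asterisk, so consecutive spaces would produce empty fields.

```ruby
rule_lines = ["SFX A 0 able .", "SFX A e able e"]

# Original shape: build the array by hand.
with_each = rule_lines.each_with_object([]) do |line, arr|
  _, _, stripping_rule, affixes, condition = line.split
  arr << { stripping_rule: stripping_rule, affixes: affixes, condition: condition }
end

# map returns the new array directly: one line in, one rule out.
with_map = rule_lines.map do |line|
  _, _, stripping_rule, affixes, condition = line.split
  { stripping_rule: stripping_rule, affixes: affixes, condition: condition }
end

p with_map == with_each  # prints true
```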

zverok · 2018-09-27T17:28:10Z

lib/parsers/dic_parser.rb

+      @file.to_a
+           .map { |ln| ln.tr("\n\t", '') }
+           .yield_self do |cnt|
+             { approx_word_count: word_count_try(cnt) }.merge(fetch_words(cnt))


I believe there is a logic error here: if the first line is not a number, it is a proper word, but you just swallow it as unconvertible and never parse it as a word.

zverok · 2018-09-27T17:34:18Z

lib/dictionary_reader.rb

+    def read_files_from_arguments(args)
+      args.map do |arg|
+        File.read(arg)
+      end


BTW, you can write args.map(&File.method(:read)) ;)
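File.method(:read) returns a Method object, and & converts it into a block, so both forms below read the same files. A small self-contained check, using throwaway files in a temporary directory (the file names are made up):

```ruby
require "tmpdir"

# Dir.mktmpdir removes the directory afterwards and returns the block's value.
long_form, short_form = Dir.mktmpdir do |dir|
  paths = %w[sample.aff sample.dic].map { |name| File.join(dir, name) }
  paths.each { |path| File.write(path, "contents of #{File.basename(path)}") }

  [
    paths.map { |path| File.read(path) },  # explicit block
    paths.map(&File.method(:read))         # Method object turned into a block
  ]
end

p short_form  # prints ["contents of sample.aff", "contents of sample.dic"]
```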

zverok · 2018-09-27T17:35:34Z

lib/parsers/aff_parser.rb

+                             .grep(AFFIX_REGEX)
+                             .group_by { |el| el[AFFIX_GROUP_REGEX] }.values
+                             .map(&method(:parse_affix_line))
+      end


You probably don't need tap here ;)
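For context: Object#tap yields the receiver to the block and then returns the receiver itself, so it only helps for side effects; a plain method chain already returns its final value. A sketch of the grep → group_by → values chain, where AFFIX_REGEX and AFFIX_GROUP_REGEX are hypothetical stand-ins (the PR's actual constants aren't shown in this hunk):

```ruby
# Hypothetical stand-ins for the PR's constants.
AFFIX_REGEX = /\A(?:PFX|SFX)\s/            # any prefix/suffix line
AFFIX_GROUP_REGEX = /\A(?:PFX|SFX)\s+\S+/  # "SFX A", "PFX B", ...

lines = [
  "SET UTF-8",
  "SFX A Y 1",
  "SFX A 0 able .",
  "PFX B Y 1",
  "PFX B 0 re ."
]

# Keep affix lines, group them by type and flag, then drop the hash keys.
groups = lines
  .grep(AFFIX_REGEX)
  .group_by { |line| line[AFFIX_GROUP_REGEX] }
  .values

p groups
# prints [["SFX A Y 1", "SFX A 0 able ."], ["PFX B Y 1", "PFX B 0 re ."]]

# tap returns its receiver, so the mapped array here is thrown away.
same = lines.tap { |ls| ls.map(&:downcase) }
p same.equal?(lines)  # prints true
```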

zverok · 2018-10-09T09:10:08Z

lib/dictionary_reader.rb

+
+    def read_files_from_arguments(args)
+      args.map(&File.method(:new))
+    end


As for me, this method does nothing worth extracting into a separate method :)

zverok · 2018-10-09T09:12:23Z

lib/dictionary_reader.rb

+      args.map(&File.method(:new))
+    end
+
+    def get_data_from_dic_files(dic_files)


You don't need such long names for short internal methods. An internal name's role is just to show what the method does in this context, so it should be short (the whole main method's code should read like an explanation: args → read files → parse files). So "read" and "parse" are enough as names for the internal methods. But... read below.

zverok · 2018-10-09T09:16:05Z

lib/dictionary_reader.rb

+      Dictionary.new(data)
+    end
+  end
+end


I believe this "array of files" approach is wrong. It is never an array of files; it is exactly one .aff file and one .dic file. You needed the whole DictionaryParser with fancy metaprogramming just to "hide" this fact, which was unnecessary (and even harmful for readability). The main parser's logic could look like this:

Dictionary.new(aff: AffParser.parse(aff_path), dic: DicParser.parse(dic_path))

...in an abstract ideal world.
But if you look at the task closely, you'll see that dic parsing depends on aff parsing! (First, the encoding is defined in the aff file; then there's the flag format: flags can be "short", like M; "long" (I don't remember exactly, something like Mx); or "numeric", like 19234.) So the real logic is something more like:

aff = AffParser.parse(aff_path)
dic = DicParser.parse(dic_path, aff)
Dictionary.new(aff, dic)

↑ No abstract arrays, no metaprogramming: just 3 lines of logic that reflect the data structure.
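A minimal sketch of that three-line structure with the aff → dic dependency made explicit. Everything here is hypothetical (AffParser, DicParser, Aff, Dictionary and their fields are illustrative, not the PR's code). It assumes Hunspell's SET and FLAG directives, where FLAG long means two-character flags and FLAG num means comma-separated numeric flags, and it reads only those two directives from the .aff file:

```ruby
require "tmpdir"

# Hypothetical value objects; the PR's real Dictionary class may differ.
Aff = Struct.new(:encoding, :flag_format)
Dictionary = Struct.new(:aff, :words)

module AffParser
  # Reads only what the .dic parser depends on: SET and FLAG.
  def self.parse(path)
    encoding = Encoding::UTF_8  # assumed default for this sketch
    flag_format = "short"       # one character per flag
    # Directive names are ASCII, so read raw bytes before the encoding is known.
    File.binread(path).each_line do |line|
      key, value = line.split
      case key
      when "SET"  then encoding = Encoding.find(value)
      when "FLAG" then flag_format = value
      end
    end
    Aff.new(encoding, flag_format)
  end
end

module DicParser
  # Depends on aff: the encoding to read with and the flag format to split by.
  def self.parse(path, aff)
    first, *rest = File.read(path, encoding: aff.encoding).lines.map(&:chomp)
    # Skip the first line only if it really is the word count.
    lines = first.to_s.match?(/\A\d+\z/) ? rest : [first, *rest].compact
    lines.reject(&:empty?).map { |line| parse_word(line, aff.flag_format) }
  end

  # Does not handle escaped slashes or morphological fields.
  def self.parse_word(line, flag_format)
    word, flags = line.split("/", 2)
    { word: word, flags: split_flags(flags.to_s, flag_format) }
  end

  def self.split_flags(flags, flag_format)
    case flag_format
    when "long" then flags.scan(/../)  # two characters per flag
    when "num"  then flags.split(",")  # comma-separated numbers
    else flags.chars                   # single-character flags
    end
  end
end

# The three lines of logic from the comment above.
def read_dictionary(aff_path, dic_path)
  aff = AffParser.parse(aff_path)
  dic = DicParser.parse(dic_path, aff)
  Dictionary.new(aff, dic)
end

# Usage with throwaway sample files.
dictionary = Dir.mktmpdir do |dir|
  aff_path = File.join(dir, "sample.aff")
  dic_path = File.join(dir, "sample.dic")
  File.write(aff_path, "SET UTF-8\nFLAG long\n")
  File.write(dic_path, "2\nhello/AaBb\nworld\n")
  read_dictionary(aff_path, dic_path)
end
```

With these sample files, dictionary.words holds two entries: "hello" with flags "Aa" and "Bb", and "world" with no flags. The dic parser can only split "AaBb" correctly because it was handed the parsed aff.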

zverok · 2018-10-09T09:19:29Z

lib/parsers/aff_parser.rb

+    def parse_affix_line(aff_group)
+      header, *rule_lines = aff_group
+      name, flag, cross_product, line_count = header.split(/[\s*]/)
+      rules = fetch_rules(rule_lines)


I believe it would be better to have rules = rule_lines.map(&method(:make_rule)), because the method maps exactly one line to one rule, and the current call sequence hides this fact.
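A hypothetical sketch of that shape (AffixGroupParser and the hash keys are illustrative names): make_rule turns exactly one rule line into one rule, so map(&method(:make_rule)) states the 1:1 mapping directly. Object#method also finds private methods, so make_rule can stay private.

```ruby
class AffixGroupParser
  def parse(aff_group)
    header, *rule_lines = aff_group
    name, flag, cross_product, _line_count = header.split
    {
      name: name,  # "PFX" or "SFX"
      flag: flag,
      cross_product: cross_product == "Y",
      rules: rule_lines.map(&method(:make_rule))
    }
  end

  private

  # One rule line in, one rule out.
  def make_rule(line)
    _, _, stripping_rule, affixes, condition = line.split
    { stripping_rule: stripping_rule, affixes: affixes, condition: condition }
  end
end

group = AffixGroupParser.new.parse(["SFX A Y 1", "SFX A 0 able ."])
```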

zverok · 2018-10-09T09:28:42Z

lib/parsers/dic_parser.rb

+    end
+
+    def fetch_words(content)
+      content.yield_self { |cnt| cnt - [word_count_try(cnt).to_s] }


That's dirty :) You just repeat this try thing twice, in two different methods... I'd extract the first line into a separate variable in the main parse method, if it matches the "integer number" pattern.
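A sketch of that suggestion (DicParserSketch and COUNT_LINE are hypothetical names): the first line is checked once, in the main parse method, and is kept as a word when it is not a number, which also addresses the logic error noted earlier in this review.

```ruby
class DicParserSketch
  COUNT_LINE = /\A\d+\z/

  def parse(lines)
    lines = lines.map(&:chomp)
    # Look at the first line exactly once.
    approx_word_count = lines.first.to_i if lines.first&.match?(COUNT_LINE)
    # Skip the first line only when it really was the count.
    words = approx_word_count ? lines.drop(1) : lines
    { approx_word_count: approx_word_count, words: words.reject(&:empty?) }
  end
end

parser = DicParserSketch.new
with_count = parser.parse(["2\n", "hello\n", "world\n"])
without_count = parser.parse(["hello\n", "world\n"])
```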

frolmr added 18 commits June 28, 2018 09:20

dictiaonary parser, reader

7319c46

refactored dictionary parser, affix parser implemented

f68d3e4

refactored affix parser and reader

5f037d4

temporarily removed rubocop.yml

8e01948

basic spec config

1950567

dry-monads added to Gemfile

a5e5ec2

Total Refactoring: args_parser, dict_reader

97490c5

dry-validations gem, spec env, small fixes

885d946

refactoring continues: naming(namespaces), logic separation

a5cdaeb

specs for file_reader

1f93d23

spec for files_from_args_reader

8a83939

sh*t removed

b236713

dictionary_reader refactor

7b9f00a

refactor everything again

78737c9

refactor dic_parser

2b4153f

refactoring for affix files parser

96c8e1a

small refactor for dic parser

98cf2c5

refactor dictionary

a86f5e9

zverok reviewed Sep 27, 2018

frolmr added 3 commits October 8, 2018 14:35

specs for readers, parsers, dictionary

322fd1f

huge refactoring after comments

9dc52f1

one more small refactoring fix

e940f08

zverok reviewed Oct 10, 2018

frolmr added 2 commits October 11, 2018 17:38

refactoring for dictionary_reader and as a consequence for everything else

18f817c

remove commented code

67f6760
