-
Notifications
You must be signed in to change notification settings - Fork 2
Dictionary and dictionary reader #1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Dictionary and dictionary reader #1
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Left some comments, maybe some would be helpful. I understand it is work in progress.)
.rubocop.yml
Outdated
@@ -0,0 +1,18 @@ | |||
Metrics/LineLength: | |||
Max: 120 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Too much for most eyes. 100 is good.
.rubocop.yml
Outdated
Max: 100 | ||
|
||
Style/Documentation: | ||
Description: 'Document classes and non-namespace modules.' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Descriptions should not be here.
lib/parsers/aff_parser.rb
Outdated
private | ||
|
||
def parse_affix_line(aff_group) | ||
header = aff_group.first |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can do header, *rule_lines = aff_group
, then you'll not need ugly [1..-1]
below.
rules = aff_group[1..-1].each_with_object([]) do |el, arr| | ||
_, _, stripping_rule, affixes, condition = el.split(/[\s*]/) | ||
arr << { stripping_rule: stripping_rule, affixes: affixes, condition: condition } | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's just map (instead of each_with_object([])
and <<
)
lib/parsers/dic_parser.rb
Outdated
@file.to_a | ||
.map { |ln| ln.tr("\n\t", '') } | ||
.yield_self do |cnt| | ||
{ approx_word_count: word_count_try(cnt) }.merge(fetch_words(cnt)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe here is logic error (if the first line is not number → it is proper word, you just swallow it is unconvertible and never parse it as a word)
lib/dictionary_reader.rb
Outdated
def read_files_from_arguments(args) | ||
args.map do |arg| | ||
File.read(arg) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, you can args.map(&File.method(:read))
;)
.grep(AFFIX_REGEX) | ||
.group_by { |el| el[AFFIX_GROUP_REGEX] }.values | ||
.map(&method(:parse_affix_line)) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You probably don't need tap
here ;)
lib/dictionary_reader.rb
Outdated
|
||
def read_files_from_arguments(args) | ||
args.map(&File.method(:new)) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method does nothing worth extraction to the method, as for me :)
lib/dictionary_reader.rb
Outdated
args.map(&File.method(:new)) | ||
end | ||
|
||
def get_data_from_dic_files(dic_files) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need that long names for internal short methods. First, the internal names role is just to show what they do in this context, and it should be short (the whole main method's code should read like an explanation: args → read files → parse files). So the names for internal methods are enough to be "read" and "parse". But... Read below.
Dictionary.new(data) | ||
end | ||
end | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that this "array of files" approach is wrong. It is never array of files, it is exactly .aff
file and .dic
file. You needed the whole DictionaryParser
with fancy metaprogramming just to "hide" this fact, which was unnecessary (and even harmful for readability). This main parser's logic can look this way:
Dictionary.new(
aff: AffParser.parse(aff_path),
dic: DicParser.parse(dic_path)
)
...in abstract ideal world.
But, if you'll look at the task closely, you'll see that dic parsing depends on aff parsing! (first, encoding is defined in aff; then, suffix format: there could be "short", like M
, "long" -- I don't remember exactly, something like Mx
or something; and "numeric", like 19234). So the real logic of this is something more like
aff = AffParser.parse(aff_path)
dic = DicParser.parse(dic_path, aff)
Dictionary.new(aff, dic)
↑ no abstract arrays, no metaprogramming, just 3 lines of logic that reflects data structure.
lib/parsers/aff_parser.rb
Outdated
def parse_affix_line(aff_group) | ||
header, *rule_lines = aff_group | ||
name, flag, cross_product, line_count = header.split(/[\s*]/) | ||
rules = fetch_rules(rule_lines) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe it will be better to have rules = rule_lines.map(&method(:make_rule))
(because method map
s exactly 1 line to 1 rule, but current call sequence hides this fact)
lib/parsers/dic_parser.rb
Outdated
end | ||
|
||
def fetch_words(content) | ||
content.yield_self { |cnt| cnt - [word_count_try(cnt).to_s] } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's dirty :) You just repeat this try
thing two times in two different methods... I'd extract first line into a separate variable in the main parse
method, if it mathes the "integera number" pattern.
Add
class Dictionary
for dictionary objects,class DictionaryReader
for parsing .dic and .aff files and instantiating Dictionary object.