Skip to content

Dictionary and dictionary reader #1

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 23 commits into
base: master
Choose a base branch
from

Conversation

frolmr
Copy link

@frolmr frolmr commented Jun 28, 2018

Add class Dictionary for dictionary objects, class DictionaryReader for parsing .dic and .aff files and instantiating Dictionary object.

Copy link
Contributor

@zverok zverok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Left some comments, maybe some would be helpful. I understand it is work in progress.)

.rubocop.yml Outdated
@@ -0,0 +1,18 @@
Metrics/LineLength:
Max: 120
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Too much for most eyes. 100 is good.

.rubocop.yml Outdated
Max: 100

Style/Documentation:
Description: 'Document classes and non-namespace modules.'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Descriptions should not be here.

private

def parse_affix_line(aff_group)
header = aff_group.first
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can do header, *rule_lines = aff_group, then you'll not need ugly [1..-1] below.

rules = aff_group[1..-1].each_with_object([]) do |el, arr|
_, _, stripping_rule, affixes, condition = el.split(/[\s*]/)
arr << { stripping_rule: stripping_rule, affixes: affixes, condition: condition }
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's just map (instead of each_with_object([]) and <<)

@file.to_a
.map { |ln| ln.tr("\n\t", '') }
.yield_self do |cnt|
{ approx_word_count: word_count_try(cnt) }.merge(fetch_words(cnt))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe here is logic error (if the first line is not number → it is proper word, you just swallow it is unconvertible and never parse it as a word)

def read_files_from_arguments(args)
args.map do |arg|
File.read(arg)
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, you can args.map(&File.method(:read)) ;)

.grep(AFFIX_REGEX)
.group_by { |el| el[AFFIX_GROUP_REGEX] }.values
.map(&method(:parse_affix_line))
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably don't need tap here ;)


def read_files_from_arguments(args)
args.map(&File.method(:new))
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method does nothing worth extraction to the method, as for me :)

args.map(&File.method(:new))
end

def get_data_from_dic_files(dic_files)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need that long names for internal short methods. First, the internal names role is just to show what they do in this context, and it should be short (the whole main method's code should read like an explanation: args → read files → parse files). So the names for internal methods are enough to be "read" and "parse". But... Read below.

Dictionary.new(data)
end
end
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that this "array of files" approach is wrong. It is never array of files, it is exactly .aff file and .dic file. You needed the whole DictionaryParser with fancy metaprogramming just to "hide" this fact, which was unnecessary (and even harmful for readability). This main parser's logic can look this way:

Dictionary.new(
  aff: AffParser.parse(aff_path),
  dic: DicParser.parse(dic_path)
)

...in abstract ideal world.
But, if you'll look at the task closely, you'll see that dic parsing depends on aff parsing! (first, encoding is defined in aff; then, suffix format: there could be "short", like M, "long" -- I don't remember exactly, something like Mx or something; and "numeric", like 19234). So the real logic of this is something more like

aff = AffParser.parse(aff_path)
dic = DicParser.parse(dic_path, aff)
Dictionary.new(aff, dic)

↑ no abstract arrays, no metaprogramming, just 3 lines of logic that reflects data structure.

def parse_affix_line(aff_group)
header, *rule_lines = aff_group
name, flag, cross_product, line_count = header.split(/[\s*]/)
rules = fetch_rules(rule_lines)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it will be better to have rules = rule_lines.map(&method(:make_rule)) (because method maps exactly 1 line to 1 rule, but current call sequence hides this fact)

end

def fetch_words(content)
content.yield_self { |cnt| cnt - [word_count_try(cnt).to_s] }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's dirty :) You just repeat this try thing two times in two different methods... I'd extract first line into a separate variable in the main parse method, if it mathes the "integera number" pattern.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants