Skip to content

Dictionary and dictionary reader #1

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 23 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 21 commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 9 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
#Ignore editor backup and temporary files
*~
#*

#Ignore org files
*.org

#Ignore gemset file
.ruby-gemset
13 changes: 13 additions & 0 deletions .rubocop.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
require: rubocop-rspec

Metrics/LineLength:
Max: 100

Metrics/MethodLength:
Max: 20

Metrics/ClassLength:
Max: 100

Style/Documentation:
Enabled: false
1 change: 1 addition & 0 deletions .ruby-version
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
2.5.1
8 changes: 8 additions & 0 deletions Gemfile
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
# frozen_string_literal: true

source 'https://rubygems.org'

gem 'pry'
gem 'rspec'
gem 'rubocop', '~> 0.56.0', require: false
gem 'rubocop-rspec'
51 changes: 51 additions & 0 deletions Gemfile.lock
Original file line number Diff line number Diff line change
@@ -0,0 +1,51 @@
GEM
remote: https://rubygems.org/
specs:
ast (2.4.0)
coderay (1.1.2)
diff-lcs (1.3)
method_source (0.9.0)
parallel (1.12.1)
parser (2.5.1.0)
ast (~> 2.4.0)
powerpack (0.1.2)
pry (0.11.3)
coderay (~> 1.1.0)
method_source (~> 0.9.0)
rainbow (3.0.0)
rspec (3.7.0)
rspec-core (~> 3.7.0)
rspec-expectations (~> 3.7.0)
rspec-mocks (~> 3.7.0)
rspec-core (3.7.1)
rspec-support (~> 3.7.0)
rspec-expectations (3.7.0)
diff-lcs (>= 1.2.0, < 2.0)
rspec-support (~> 3.7.0)
rspec-mocks (3.7.0)
diff-lcs (>= 1.2.0, < 2.0)
rspec-support (~> 3.7.0)
rspec-support (3.7.1)
rubocop (0.56.0)
parallel (~> 1.10)
parser (>= 2.5)
powerpack (~> 0.1)
rainbow (>= 2.2.2, < 4.0)
ruby-progressbar (~> 1.7)
unicode-display_width (~> 1.0, >= 1.0.1)
rubocop-rspec (1.27.0)
rubocop (>= 0.56.0)
ruby-progressbar (1.9.0)
unicode-display_width (1.4.0)

PLATFORMS
ruby

DEPENDENCIES
pry
rspec
rubocop (~> 0.56.0)
rubocop-rspec

BUNDLED WITH
1.16.1
3 changes: 3 additions & 0 deletions dictionaries/personal.dic
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
foo
Foo/Simpson
*bar
13 changes: 13 additions & 0 deletions dictionaries/test.aff
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
SET UTF-8
TRY esianrtolcdugmphbyfvkwzESIANRTOLCDUGMPHBYFVKWZ'

REP 2
REP f ph
REP ph f

PFX A Y 1
PFX A 0 re .

SFX B Y 2
SFX B 0 ed [^y]
SFX B y ied y
4 changes: 4 additions & 0 deletions dictionaries/test.dic
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
3
hello
try/B
work/AB
Empty file added dictionaries/test.txt
Empty file.
6 changes: 6 additions & 0 deletions lib/dictionary.rb
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
# frozen_string_literal: true

require 'ostruct'

class Dictionary < OpenStruct
end
30 changes: 30 additions & 0 deletions lib/dictionary_reader.rb
Original file line number Diff line number Diff line change
@@ -0,0 +1,30 @@
# frozen_string_literal: true

require_relative 'parsers/dictionary_parser'
require_relative 'dictionary'

class DictionaryReader
class << self
def read(*args)
dic_files = read_files_from_arguments(args)
dic_files_data = get_data_from_dic_files(dic_files)
create_dictionary(dic_files_data)
end

private

def read_files_from_arguments(args)
args.map(&File.method(:new))
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method does nothing worth extraction to the method, as for me :)


def get_data_from_dic_files(dic_files)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need that long names for internal short methods. First, the internal names role is just to show what they do in this context, and it should be short (the whole main method's code should read like an explanation: args → read files → parse files). So the names for internal methods are enough to be "read" and "parse". But... Read below.

dic_files.inject({}) do |acc, file|
acc.merge(Parsers::DictionaryParser.parse_file(file))
end
end

def create_dictionary(data)
Dictionary.new(data)
end
end
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that this "array of files" approach is wrong. It is never array of files, it is exactly .aff file and .dic file. You needed the whole DictionaryParser with fancy metaprogramming just to "hide" this fact, which was unnecessary (and even harmful for readability). This main parser's logic can look this way:

Dictionary.new(
  aff: AffParser.parse(aff_path),
  dic: DicParser.parse(dic_path)
)

...in abstract ideal world.
But, if you'll look at the task closely, you'll see that dic parsing depends on aff parsing! (first, encoding is defined in aff; then, suffix format: there could be "short", like M, "long" -- I don't remember exactly, something like Mx or something; and "numeric", like 19234). So the real logic of this is something more like

aff = AffParser.parse(aff_path)
dic = DicParser.parse(dic_path, aff)
Dictionary.new(aff, dic)

↑ no abstract arrays, no metaprogramming, just 3 lines of logic that reflects data structure.

44 changes: 44 additions & 0 deletions lib/parsers/aff_parser.rb
Original file line number Diff line number Diff line change
@@ -0,0 +1,44 @@
# frozen_string_literal: true

module Parsers
class AffParser
AFFIX_REGEX = /^(PFX|SFX)(\s|\w|.)*/
AFFIX_GROUP_REGEX = /^(PFX|SFX)\s\w/

def initialize(file)
@file = file
end

def parse
{
affixes: @file.to_a
.map { |ln| ln.tr("\n\t", '') }
.grep(AFFIX_REGEX)
.group_by { |el| el[AFFIX_GROUP_REGEX] }.values
.map(&method(:parse_affix_line))
}
end

private

def parse_affix_line(aff_group)
header, *rule_lines = aff_group
name, flag, cross_product, line_count = header.split(/[\s*]/)
rules = fetch_rules(rule_lines)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it will be better to have rules = rule_lines.map(&method(:make_rule)) (because method maps exactly 1 line to 1 rule, but current call sequence hides this fact)

{
name: name,
flag: flag,
cross_product: cross_product,
line_count: line_count,
rules: rules
}
end

def fetch_rules(lines)
lines.map do |el|
_, _, stripping_rule, affixes, condition = el.split(/[\s*]/)
{ stripping_rule: stripping_rule, affixes: affixes, condition: condition }
end
end
end
end
37 changes: 37 additions & 0 deletions lib/parsers/dic_parser.rb
Original file line number Diff line number Diff line change
@@ -0,0 +1,37 @@
# frozen_string_literal: true

module Parsers
class DicParser
WORD_AFF_REGEX = %r{^(\*?)([^\/]+)(?:\/(.+))?$}

def initialize(file)
@file = file
end

def parse
@file.to_a
.map { |ln| ln.tr("\n\t", '') }
.yield_self do |cnt|
{ approx_word_count: word_count_try(cnt) }.merge(fetch_words(cnt))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe here is logic error (if the first line is not number → it is proper word, you just swallow it is unconvertible and never parse it as a word)

end
end

private

def word_count_try(content)
Integer(content.first)
rescue ArgumentError
nil
end

def fetch_words(content)
content.yield_self { |cnt| cnt - [word_count_try(cnt).to_s] }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's dirty :) You just repeat this try thing two times in two different methods... I'd extract first line into a separate variable in the main parse method, if it mathes the "integera number" pattern.

.map { |ln| ln.match(WORD_AFF_REGEX).values_at(1, 2, 3) }
.map { |star, word, affixes| star == '*' ? word : { word: word, affixes: affixes } }
.partition { |word| word.is_a?(String) }
.yield_self do |forbidden, regular|
{ words: regular, forbidden_words: forbidden }
end
end
end
end
14 changes: 14 additions & 0 deletions lib/parsers/dictionary_parser.rb
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
# frozen_string_literal: true

require_relative 'dic_parser'
require_relative 'aff_parser'

module Parsers
class DictionaryParser
def self.parse_file(file)
file_type = File.extname(file).delete('.')
class_name = 'Parsers::' + file_type.to_s.capitalize + 'Parser'
self.class.const_get(class_name).new(File.new(file)).parse
end
end
end
13 changes: 13 additions & 0 deletions spec/dictionaries/test_affix.aff
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
SET UTF-8
TRY esianrtolcdugmphbyfvkwzESIANRTOLCDUGMPHBYFVKWZ'

REP 2
REP f ph
REP ph f

PFX A Y 1
PFX A 0 re .

SFX B Y 2
SFX B 0 ed [^y]
SFX B y ied y
4 changes: 4 additions & 0 deletions spec/dictionaries/test_dict.dic
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
3
hello
try/B
work/AB
Empty file.
3 changes: 3 additions & 0 deletions spec/dictionaries/test_personal.dic
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
foo
Foo/Simpson
*bar
1 change: 1 addition & 0 deletions spec/dictionaries/unreadable_file
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
this file will be unreadable
57 changes: 57 additions & 0 deletions spec/dictionary_reader_spec.rb
Original file line number Diff line number Diff line change
@@ -0,0 +1,57 @@
require_relative '../lib/dictionary_reader'
require_relative './reference_parsers_results'

RSpec.describe DictionaryReader do
include_context 'reference_results'

subject { described_class.read(*args) }

context 'unsuccessfull cases:' do
context 'file doesn\'t exist' do
let(:args) { 'this/file/doesnt/exist' }

it 'returns no_such_file error' do
expect { subject }.to raise_error(Errno::ENOENT, /No such file or directory/)
end
end

context 'input wrong type' do
let(:args) { :some_symbol }

it 'returns type error' do
expect { subject }.to raise_error(TypeError, /no implicit conversion of/)
end
end

context 'file not readable' do
let(:args) { './spec/dictionaries/unreadable_file' }

before do
FileUtils.chmod 'u=,go=', args
end

after do
FileUtils.chmod 'u=wr,go=rr', args
end

it 'returns permission error' do
expect { subject }.to raise_error(Errno::EACCES, /Permission denied/)
end
end
end

context 'successfull cases' do
let(:args) { ['./spec/dictionaries/test_affix.aff', './spec/dictionaries/test_dict.dic'] }
let(:result) do
Dictionary.new(reference_reader_result)
end

it 'returns Dictionary type' do
expect(subject).to be_a(Dictionary)
end

it 'returns reference result' do
expect(subject).to eq result
end
end
end
9 changes: 9 additions & 0 deletions spec/dictionary_spec.rb
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
require_relative '../lib/dictionary'

RSpec.describe Dictionary do
subject { described_class }

it 'should be OpenStruct' do
expect(subject).to be < OpenStruct
end
end
18 changes: 18 additions & 0 deletions spec/parsers/aff_parser_spec.rb
Original file line number Diff line number Diff line change
@@ -0,0 +1,18 @@
require_relative '../../lib/parsers/aff_parser'
require_relative '../reference_parsers_results'

RSpec.describe Parsers::AffParser do
include_context 'reference_results'

let(:file) { File.new('./spec/dictionaries/test_affix.aff') }

subject { described_class.new(file).parse }

it 'returns hash' do
expect(subject).to be_a_kind_of(Hash)
end

it 'returns reference result' do
expect(subject).to eq reference_aff_result
end
end
26 changes: 26 additions & 0 deletions spec/parsers/dic_parser_spec.rb
Original file line number Diff line number Diff line change
@@ -0,0 +1,26 @@
require_relative '../../lib/parsers/dic_parser'
require_relative '../reference_parsers_results'

RSpec.describe Parsers::DicParser do
include_context 'reference_results'

let(:file) { File.new('./spec/dictionaries/test_dict.dic') }

subject { described_class.new(file).parse }

it 'returns hash' do
expect(subject).to be_a_kind_of(Hash)
end

it 'returns reference result' do
expect(subject).to eq reference_dic_result
end

context 'personal dictionary' do
let(:file) { File.new('./spec/dictionaries/test_personal.dic') }

it 'returns reference result' do
expect(subject).to eq reference_personal_dic_result
end
end
end
Loading