-
Notifications
You must be signed in to change notification settings - Fork 2
Dictionary and dictionary reader #1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Changes from 21 commits
7319c46
f68d3e4
5f037d4
8e01948
1950567
a5e5ec2
97490c5
885d946
a5cdaeb
1f93d23
8a83939
b236713
7b9f00a
78737c9
2b4153f
96c8e1a
98cf2c5
a86f5e9
322fd1f
9dc52f1
e940f08
18f817c
67f6760
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
#Ignore editor backup and temporary files | ||
*~ | ||
#* | ||
|
||
#Ignore org files | ||
*.org | ||
|
||
#Ignore gemset file | ||
.ruby-gemset |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
require: rubocop-rspec | ||
|
||
Metrics/LineLength: | ||
Max: 100 | ||
|
||
Metrics/MethodLength: | ||
Max: 20 | ||
|
||
Metrics/ClassLength: | ||
Max: 100 | ||
|
||
Style/Documentation: | ||
Enabled: false | ||
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
2.5.1 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
# frozen_string_literal: true | ||
|
||
source 'https://rubygems.org' | ||
|
||
gem 'pry' | ||
gem 'rspec' | ||
gem 'rubocop', '~> 0.56.0', require: false | ||
gem 'rubocop-rspec' |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,51 @@ | ||
GEM | ||
remote: https://rubygems.org/ | ||
specs: | ||
ast (2.4.0) | ||
coderay (1.1.2) | ||
diff-lcs (1.3) | ||
method_source (0.9.0) | ||
parallel (1.12.1) | ||
parser (2.5.1.0) | ||
ast (~> 2.4.0) | ||
powerpack (0.1.2) | ||
pry (0.11.3) | ||
coderay (~> 1.1.0) | ||
method_source (~> 0.9.0) | ||
rainbow (3.0.0) | ||
rspec (3.7.0) | ||
rspec-core (~> 3.7.0) | ||
rspec-expectations (~> 3.7.0) | ||
rspec-mocks (~> 3.7.0) | ||
rspec-core (3.7.1) | ||
rspec-support (~> 3.7.0) | ||
rspec-expectations (3.7.0) | ||
diff-lcs (>= 1.2.0, < 2.0) | ||
rspec-support (~> 3.7.0) | ||
rspec-mocks (3.7.0) | ||
diff-lcs (>= 1.2.0, < 2.0) | ||
rspec-support (~> 3.7.0) | ||
rspec-support (3.7.1) | ||
rubocop (0.56.0) | ||
parallel (~> 1.10) | ||
parser (>= 2.5) | ||
powerpack (~> 0.1) | ||
rainbow (>= 2.2.2, < 4.0) | ||
ruby-progressbar (~> 1.7) | ||
unicode-display_width (~> 1.0, >= 1.0.1) | ||
rubocop-rspec (1.27.0) | ||
rubocop (>= 0.56.0) | ||
ruby-progressbar (1.9.0) | ||
unicode-display_width (1.4.0) | ||
|
||
PLATFORMS | ||
ruby | ||
|
||
DEPENDENCIES | ||
pry | ||
rspec | ||
rubocop (~> 0.56.0) | ||
rubocop-rspec | ||
|
||
BUNDLED WITH | ||
1.16.1 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
foo | ||
Foo/Simpson | ||
*bar |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
SET UTF-8 | ||
TRY esianrtolcdugmphbyfvkwzESIANRTOLCDUGMPHBYFVKWZ' | ||
|
||
REP 2 | ||
REP f ph | ||
REP ph f | ||
|
||
PFX A Y 1 | ||
PFX A 0 re . | ||
|
||
SFX B Y 2 | ||
SFX B 0 ed [^y] | ||
SFX B y ied y |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
3 | ||
hello | ||
try/B | ||
work/AB |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
# frozen_string_literal: true | ||
|
||
require 'ostruct' | ||
|
||
class Dictionary < OpenStruct | ||
end |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,30 @@ | ||
# frozen_string_literal: true | ||
|
||
require_relative 'parsers/dictionary_parser' | ||
require_relative 'dictionary' | ||
|
||
class DictionaryReader | ||
class << self | ||
def read(*args) | ||
dic_files = read_files_from_arguments(args) | ||
dic_files_data = get_data_from_dic_files(dic_files) | ||
create_dictionary(dic_files_data) | ||
end | ||
|
||
private | ||
|
||
def read_files_from_arguments(args) | ||
args.map(&File.method(:new)) | ||
end | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This method does nothing worth extraction to the method, as for me :) |
||
|
||
def get_data_from_dic_files(dic_files) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You don't need that long names for internal short methods. First, the internal names role is just to show what they do in this context, and it should be short (the whole main method's code should read like an explanation: args → read files → parse files). So the names for internal methods are enough to be "read" and "parse". But... Read below. |
||
dic_files.inject({}) do |acc, file| | ||
acc.merge(Parsers::DictionaryParser.parse_file(file)) | ||
end | ||
zverok marked this conversation as resolved.
Show resolved
Hide resolved
|
||
end | ||
|
||
def create_dictionary(data) | ||
Dictionary.new(data) | ||
end | ||
end | ||
end | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I believe that this "array of files" approach is wrong. It is never array of files, it is exactly Dictionary.new(
aff: AffParser.parse(aff_path),
dic: DicParser.parse(dic_path)
) ...in abstract ideal world. aff = AffParser.parse(aff_path)
dic = DicParser.parse(dic_path, aff)
Dictionary.new(aff, dic) ↑ no abstract arrays, no metaprogramming, just 3 lines of logic that reflects data structure. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,44 @@ | ||
# frozen_string_literal: true | ||
|
||
module Parsers | ||
class AffParser | ||
AFFIX_REGEX = /^(PFX|SFX)(\s|\w|.)*/ | ||
AFFIX_GROUP_REGEX = /^(PFX|SFX)\s\w/ | ||
|
||
def initialize(file) | ||
@file = file | ||
end | ||
|
||
def parse | ||
{ | ||
affixes: @file.to_a | ||
.map { |ln| ln.tr("\n\t", '') } | ||
.grep(AFFIX_REGEX) | ||
.group_by { |el| el[AFFIX_GROUP_REGEX] }.values | ||
.map(&method(:parse_affix_line)) | ||
} | ||
end | ||
|
||
private | ||
|
||
def parse_affix_line(aff_group) | ||
header, *rule_lines = aff_group | ||
name, flag, cross_product, line_count = header.split(/[\s*]/) | ||
rules = fetch_rules(rule_lines) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I believe it will be better to have |
||
{ | ||
name: name, | ||
flag: flag, | ||
cross_product: cross_product, | ||
line_count: line_count, | ||
rules: rules | ||
} | ||
end | ||
|
||
def fetch_rules(lines) | ||
lines.map do |el| | ||
_, _, stripping_rule, affixes, condition = el.split(/[\s*]/) | ||
{ stripping_rule: stripping_rule, affixes: affixes, condition: condition } | ||
end | ||
end | ||
end | ||
end |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,37 @@ | ||
# frozen_string_literal: true | ||
|
||
module Parsers | ||
class DicParser | ||
WORD_AFF_REGEX = %r{^(\*?)([^\/]+)(?:\/(.+))?$} | ||
|
||
def initialize(file) | ||
@file = file | ||
end | ||
|
||
def parse | ||
@file.to_a | ||
.map { |ln| ln.tr("\n\t", '') } | ||
.yield_self do |cnt| | ||
{ approx_word_count: word_count_try(cnt) }.merge(fetch_words(cnt)) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I believe here is logic error (if the first line is not number → it is proper word, you just swallow it is unconvertible and never parse it as a word) |
||
end | ||
end | ||
|
||
private | ||
|
||
def word_count_try(content) | ||
Integer(content.first) | ||
rescue ArgumentError | ||
nil | ||
end | ||
|
||
def fetch_words(content) | ||
content.yield_self { |cnt| cnt - [word_count_try(cnt).to_s] } | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That's dirty :) You just repeat this |
||
.map { |ln| ln.match(WORD_AFF_REGEX).values_at(1, 2, 3) } | ||
.map { |star, word, affixes| star == '*' ? word : { word: word, affixes: affixes } } | ||
.partition { |word| word.is_a?(String) } | ||
.yield_self do |forbidden, regular| | ||
{ words: regular, forbidden_words: forbidden } | ||
end | ||
end | ||
zverok marked this conversation as resolved.
Show resolved
Hide resolved
|
||
end | ||
end |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
# frozen_string_literal: true | ||
|
||
require_relative 'dic_parser' | ||
require_relative 'aff_parser' | ||
|
||
module Parsers | ||
class DictionaryParser | ||
def self.parse_file(file) | ||
file_type = File.extname(file).delete('.') | ||
class_name = 'Parsers::' + file_type.to_s.capitalize + 'Parser' | ||
self.class.const_get(class_name).new(File.new(file)).parse | ||
end | ||
end | ||
end |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
SET UTF-8 | ||
TRY esianrtolcdugmphbyfvkwzESIANRTOLCDUGMPHBYFVKWZ' | ||
|
||
REP 2 | ||
REP f ph | ||
REP ph f | ||
|
||
PFX A Y 1 | ||
PFX A 0 re . | ||
|
||
SFX B Y 2 | ||
SFX B 0 ed [^y] | ||
SFX B y ied y |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
3 | ||
hello | ||
try/B | ||
work/AB |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
foo | ||
Foo/Simpson | ||
*bar |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
this file will be unreadable |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,57 @@ | ||
require_relative '../lib/dictionary_reader' | ||
require_relative './reference_parsers_results' | ||
|
||
RSpec.describe DictionaryReader do | ||
include_context 'reference_results' | ||
|
||
subject { described_class.read(*args) } | ||
|
||
context 'unsuccessfull cases:' do | ||
context 'file doesn\'t exist' do | ||
let(:args) { 'this/file/doesnt/exist' } | ||
|
||
it 'returns no_such_file error' do | ||
expect { subject }.to raise_error(Errno::ENOENT, /No such file or directory/) | ||
end | ||
end | ||
|
||
context 'input wrong type' do | ||
let(:args) { :some_symbol } | ||
|
||
it 'returns type error' do | ||
expect { subject }.to raise_error(TypeError, /no implicit conversion of/) | ||
end | ||
end | ||
|
||
context 'file not readable' do | ||
let(:args) { './spec/dictionaries/unreadable_file' } | ||
|
||
before do | ||
FileUtils.chmod 'u=,go=', args | ||
end | ||
|
||
after do | ||
FileUtils.chmod 'u=wr,go=rr', args | ||
end | ||
|
||
it 'returns permission error' do | ||
expect { subject }.to raise_error(Errno::EACCES, /Permission denied/) | ||
end | ||
end | ||
end | ||
|
||
context 'successfull cases' do | ||
let(:args) { ['./spec/dictionaries/test_affix.aff', './spec/dictionaries/test_dict.dic'] } | ||
let(:result) do | ||
Dictionary.new(reference_reader_result) | ||
end | ||
|
||
it 'returns Dictionary type' do | ||
expect(subject).to be_a(Dictionary) | ||
end | ||
|
||
it 'returns reference result' do | ||
expect(subject).to eq result | ||
end | ||
end | ||
end |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
require_relative '../lib/dictionary' | ||
|
||
RSpec.describe Dictionary do | ||
subject { described_class } | ||
|
||
it 'should be OpenStruct' do | ||
expect(subject).to be < OpenStruct | ||
end | ||
end |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,18 @@ | ||
require_relative '../../lib/parsers/aff_parser' | ||
require_relative '../reference_parsers_results' | ||
|
||
RSpec.describe Parsers::AffParser do | ||
include_context 'reference_results' | ||
|
||
let(:file) { File.new('./spec/dictionaries/test_affix.aff') } | ||
|
||
subject { described_class.new(file).parse } | ||
|
||
it 'returns hash' do | ||
expect(subject).to be_a_kind_of(Hash) | ||
end | ||
|
||
it 'returns reference result' do | ||
expect(subject).to eq reference_aff_result | ||
end | ||
end |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,26 @@ | ||
require_relative '../../lib/parsers/dic_parser' | ||
require_relative '../reference_parsers_results' | ||
|
||
RSpec.describe Parsers::DicParser do | ||
include_context 'reference_results' | ||
|
||
let(:file) { File.new('./spec/dictionaries/test_dict.dic') } | ||
|
||
subject { described_class.new(file).parse } | ||
|
||
it 'returns hash' do | ||
expect(subject).to be_a_kind_of(Hash) | ||
end | ||
|
||
it 'returns reference result' do | ||
expect(subject).to eq reference_dic_result | ||
end | ||
|
||
context 'personal dictionary' do | ||
let(:file) { File.new('./spec/dictionaries/test_personal.dic') } | ||
|
||
it 'returns reference result' do | ||
expect(subject).to eq reference_personal_dic_result | ||
end | ||
end | ||
end |
Uh oh!
There was an error while loading. Please reload this page.