Merge orthographic-variant handling into release/v0.2.0-rc.2 #545

Merged 14 commits on Nov 18, 2024
14 changes: 14 additions & 0 deletions core/src/adapter/orthographical_variant_adapter.rs
@@ -28,6 +28,13 @@ pub enum OrthographicalVariant {
濱,
祗,
曾,
國,
鉋,
鷆,
斑,
櫻,
櫟,
冨,
}

impl OrthographicalVariant {
@@ -61,6 +68,13 @@ impl OrthographicalVariant {
OrthographicalVariant::濱 => &['濱', '浜'],
OrthographicalVariant::祗 => &['祗', '祇'],
OrthographicalVariant::曾 => &['曾', '曽'],
OrthographicalVariant::國 => &['國', '国'],
OrthographicalVariant::鉋 => &['鉋', '飽'],
OrthographicalVariant::鷆 => &['鷆', '鷏'],
OrthographicalVariant::斑 => &['斑', '班'],
OrthographicalVariant::櫻 => &['櫻', '桜'],
OrthographicalVariant::櫟 => &['櫟', '擽'],
OrthographicalVariant::冨 => &['冨', '富'],
}
}
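
The variant groups above pair characters that should be treated as interchangeable when matching a town name. As a rough illustration of that idea, here is a minimal sketch. It is not the crate's `apply` implementation, which this diff does not show; `same_or_variant` and `match_with_variants` are hypothetical names, and the groups are passed in as plain slices rather than through the `OrthographicalVariant` enum.

```rust
/// Hypothetical helper: true when `a` and `b` are identical or both appear
/// in the same variant group (e.g. ['鉋', '飽']).
fn same_or_variant(a: char, b: char, groups: &[&[char]]) -> bool {
    a == b || groups.iter().any(|g| g.contains(&a) && g.contains(&b))
}

/// Hypothetical matcher: if `input` starts with `candidate`, allowing
/// variant substitutions character by character, return the candidate's
/// own spelling together with the unmatched remainder of `input`.
fn match_with_variants(
    input: &str,
    candidate: &str,
    groups: &[&[char]],
) -> Option<(String, String)> {
    let mut rest = input.chars();
    for c in candidate.chars() {
        // `?` bails out with None when the input is shorter than the candidate.
        let i = rest.next()?;
        if !same_or_variant(i, c, groups) {
            return None;
        }
    }
    // `Chars::as_str` yields the part of the input not yet consumed.
    Some((candidate.to_string(), rest.as_str().to_string()))
}
```

Under these assumptions, matching `川中島町上氷飽1368` against the candidate `川中島町上氷鉋` with the group `['鉋', '飽']` would return the candidate's spelling plus the remainder `1368`, which is the behaviour the new test rows expect.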

7 changes: 7 additions & 0 deletions core/src/tokenizer/read_town.rs
@@ -90,6 +90,13 @@ fn find_town(input: &str, candidates: &Vec<String>) -> Option<(String, String)>
OrthographicalVariant::濱,
OrthographicalVariant::祗,
OrthographicalVariant::曾,
OrthographicalVariant::國,
OrthographicalVariant::鉋,
OrthographicalVariant::鷆,
OrthographicalVariant::斑,
OrthographicalVariant::櫻,
OrthographicalVariant::櫟,
OrthographicalVariant::冨,
],
};
if let Some(result) = adapter.apply(input, candidate) {
@@ -3,3 +3,12 @@ address,prefecture,city,town,rest
神奈川県鎌倉市山ノ内189,神奈川県,鎌倉市,山ノ内,189
神奈川県鎌倉市山の内189,神奈川県,鎌倉市,山ノ内,189
神奈川県鎌倉市山之内189,神奈川県,鎌倉市,山ノ内,189
# Handle the orthographic variation between 「上氷鉋」 and 「上氷飽」
長野県長野市川中島町上氷鉋1368,長野県,長野市,川中島町上氷鉋,1368
長野県長野市川中島町上氷飽1368,長野県,長野市,川中島町上氷鉋,1368
# Handle the orthographic variation between 「斑目」 and 「班目」
神奈川県南足柄市班目639,神奈川県,南足柄市,班目,639
神奈川県南足柄市斑目639,神奈川県,南足柄市,班目,639
# Handle the orthographic variation between 「櫟」 and 「擽」
兵庫県南あわじ市松帆櫟田196,兵庫県,南あわじ市,松帆櫟田,196
兵庫県南あわじ市松帆擽田196,兵庫県,南あわじ市,松帆櫟田,196
12 changes: 12 additions & 0 deletions tests/test_data/異字体旧字体への対応.csv
@@ -84,3 +84,15 @@ address,prefecture,city,town,rest
# Handle the orthographic variation between 「小曾根」 and 「小曽根」
埼玉県熊谷市小曽根1220,埼玉県,熊谷市,小曽根,1220
埼玉県熊谷市小曾根1220,埼玉県,熊谷市,小曽根,1220
# Handle the orthographic variation between 「神代國衙」 and 「神代国衙」
兵庫県南あわじ市神代國衙1680,兵庫県,南あわじ市,神代國衙,1680
兵庫県南あわじ市神代国衙1680,兵庫県,南あわじ市,神代國衙,1680
# Handle the orthographic variation between 「鷏和」 and 「鷆和」
兵庫県赤穂市鷏和422,兵庫県,赤穂市,鷏和,422
兵庫県赤穂市鷆和422,兵庫県,赤穂市,鷏和,422
# Handle the orthographic variation between 「南桜」 and 「南櫻」
滋賀県野洲市南桜1792,滋賀県,野洲市,南櫻,1792
滋賀県野洲市南櫻1792,滋賀県,野洲市,南櫻,1792
# Handle the orthographic variation between 「富」 and 「冨」
兵庫県神崎郡神河町吉冨88番地10号,兵庫県,神崎郡神河町,吉冨,88番地10号
兵庫県神崎郡神河町吉富88番地10号,兵庫県,神崎郡神河町,吉冨,88番地10号
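
The test data files share one layout: a header row `address,prefecture,city,town,rest`, comment lines starting with `#`, and one expected parse per row. A minimal sketch of reading that layout follows. The `Case` struct and `parse_cases` function are hypothetical names and this is not the project's actual test harness, which the diff does not show. The sketch assumes that the first non-comment line is the header and that no field contains a comma.

```rust
/// Hypothetical record for one test row: the raw address and the expected
/// prefecture / city / town / rest after parsing.
#[derive(Debug, PartialEq)]
struct Case {
    address: String,
    prefecture: String,
    city: String,
    town: String,
    rest: String,
}

/// Hypothetical reader: skips blank lines and `#` comments, drops the header
/// row, and splits each remaining line into exactly five fields.
fn parse_cases(csv: &str) -> Vec<Case> {
    csv.lines()
        .map(str::trim)
        .filter(|line| !line.is_empty() && !line.starts_with('#'))
        .skip(1) // assumption: the first remaining line is the header
        .filter_map(|line| {
            let mut fields = line.splitn(5, ',');
            // Rows with fewer than five fields are silently skipped.
            Some(Case {
                address: fields.next()?.to_string(),
                prefecture: fields.next()?.to_string(),
                city: fields.next()?.to_string(),
                town: fields.next()?.to_string(),
                rest: fields.next()?.to_string(),
            })
        })
        .collect()
}
```

Each resulting `Case` could then be fed to the parser under test, comparing its output against the `prefecture`, `city`, `town`, and `rest` fields.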