japanese_numbers

A parser for Japanese numbers (kanji and Arabic numerals) in natural-language text.

The japanese_numbers module finds numbers in natural-language text and converts them to Arabic numerals. The following are examples of patterns that can be parsed.

二千万百一円
5百万
一を聞いて十を知る
五〇六号室
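To make the patterns above concrete, here is a minimal sketch of how a single kanji number can be evaluated. It is not the library's implementation: `kanji_to_int` and its tables are hypothetical names introduced only for illustration, and the sketch assumes the input contains nothing but the number itself (no suffix such as 円 or 号室).

```python
# Illustrative only: a hypothetical helper, not part of japanese_numbers.
DIGITS = {'〇': 0, '一': 1, '二': 2, '三': 3, '四': 4,
          '五': 5, '六': 6, '七': 7, '八': 8, '九': 9}
SMALL_UNITS = {'十': 10, '百': 100, '千': 1000}
LARGE_UNITS = {'万': 10 ** 4, '億': 10 ** 8, '兆': 10 ** 12}


def kanji_to_int(text):
    total = 0   # sum of the completed 万 / 億 / 兆 groups
    group = 0   # value of the current group (below 10,000)
    digits = 0  # run of plain digits not yet multiplied by a unit
    for ch in text:
        if ch in DIGITS:
            digits = digits * 10 + DIGITS[ch]
        elif ch in '0123456789':
            digits = digits * 10 + int(ch)
        elif ch in SMALL_UNITS:
            # A bare unit such as 百 means 1 x 100.
            group += (digits or 1) * SMALL_UNITS[ch]
            digits = 0
        elif ch in LARGE_UNITS:
            total += ((group + digits) or 1) * LARGE_UNITS[ch]
            group = 0
            digits = 0
        else:
            raise ValueError('not part of a number: %r' % ch)
    return total + group + digits


print(kanji_to_int('二千万百一'))  # 20000101
print(kanji_to_int('5百万'))       # 5000000
print(kanji_to_int('五〇六'))      # 506
```

The sketch treats 十, 百 and 千 as multipliers inside a group and 万, 億 and 兆 as group boundaries, which is why a positional run such as 五〇六 and a unit-based number such as 二千万百一 can share one loop.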

Installation

pip install japanese-numbers-python

Usage

The functions to_arabic and to_arabic_numbers are almost stable.

to_arabic returns a list of japanese_numbers.result.ParsedResult instances.

import japanese_numbers

japanese_numbers.to_arabic('銀河の向こう、六千三百二十一億千五百十一万二千百八十一光年彼方。')
# => [<ParsedResult 632115112181 : "六千三百二十一億千五百十一万二千百八十一" index=7>]

japanese_numbers.to_arabic('一を聞いて十を知る。')
# => [<ParsedResult 1 : "一" index=0>, <ParsedResult 10 : "十" index=5>]

You can then read the numeric value (and other attributes) from each ParsedResult instance, as follows:

result = japanese_numbers.to_arabic('一を聞いて十を知る。')

result[0].number # => 1
result[0].text   # => '一'
result[0].index  # => 0 (position where the number was found)

result[1].number # => 10
result[1].text   # => '十'
result[1].index  # => 5
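The number, text and index attributes are enough to rewrite the original sentence with Arabic numerals. The sketch below is an illustration rather than a library feature: `replace_numbers` is a hypothetical helper, and `Found` is a stand-in namedtuple with the same three attribute names so that the example runs without the package installed. It assumes index is a character offset into the input string, which is consistent with the outputs shown above.

```python
from collections import namedtuple

# Stand-in for ParsedResult (hypothetical; same attribute names only).
Found = namedtuple('Found', ['number', 'text', 'index'])


def replace_numbers(source, results):
    """Return source with each matched number replaced by its Arabic form."""
    out = source
    # Work from right to left so earlier offsets stay valid after a replacement.
    for r in sorted(results, key=lambda item: item.index, reverse=True):
        out = out[:r.index] + str(r.number) + out[r.index + len(r.text):]
    return out


sentence = '一を聞いて十を知る。'
results = [Found(1, '一', 0), Found(10, '十', 5)]
print(replace_numbers(sentence, results))  # 1を聞いて10を知る。
```

With the real package, the list returned by to_arabic could presumably be passed in place of the hand-built results list, since only the three attributes are used.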

to_arabic_numbers returns a tuple of numbers directly.

import japanese_numbers

japanese_numbers.to_arabic_numbers('一を聞いて十を知る。')
# => (1, 10)

Charsets

Both to_arabic_numbers and to_arabic accept an encode option that specifies the encoding of the input.

The default is utf8. If you pass a non-Unicode (byte) string to these functions, it is first decoded to Unicode using the given encode value.

japanese_numbers.to_arabic_numbers('一を聞いて十を知る。')  # utf8 by default
japanese_numbers.to_arabic('一を聞いて十を知る。', encode='eucjp')  # set another charset
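The decoding step described in this section can be sketched with the standard library alone. `ensure_text` is a hypothetical helper written for this illustration; it shows the general idea of decoding byte input with the given charset and is not taken from the library's source.

```python
def ensure_text(value, encode='utf8'):
    """Return value as a Unicode string, decoding bytes with the given charset."""
    if isinstance(value, bytes):
        return value.decode(encode)
    return value


sentence = '一を聞いて十を知る。'
raw_eucjp = sentence.encode('eucjp')  # bytes as they might arrive from an EUC-JP source

print(ensure_text(raw_eucjp, encode='eucjp') == sentence)  # True
print(ensure_text(sentence) == sentence)                   # True
```

Passing the wrong charset for byte input would either raise UnicodeDecodeError or silently produce different characters, so the encode value should match the actual source of the data.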

TODO

Support float/double values
Support negative numbers

Patch

Patches are welcome!
