Commit 46c9799

add optional tagger arg to compute_readability
1 parent a4d324c commit 46c9799

4 files changed: +107 -61 lines changed

README.md

+20 -4

@@ -49,7 +49,7 @@ print(score) # 5.596333333333334
 
 Note that this readability calculator is specifically for <u>non-native speakers</u> learning to read Japanese. This is not to be confused with something like grade level or other readability scores meant for native speakers.
 
-### Equation
+### Model
 
 ```
 readability = {mean number of words per sentence} * -0.056
@@ -62,8 +62,24 @@ readability = {mean number of words per sentence} * -0.056
 
 *\* "kango" (漢語) means Japanese word of Chinese origin while "wago" (和語) means native Japanese word.*
 
----
-
 #### Note on model consistency
 
-The readability scores produced by this python package tend to differ slightly from the scores produced on the official [jreadability website](https://jreadability.net/sys/en). This is likely due to the version difference in UniDic between these two implementations as this package uses UniDic 2.1.2 while theirs uses UniDic 2.2.0. This issue will hopefully be resolved in the future.
+The readability scores produced by this python package tend to differ slightly from the scores produced on the official [jreadability website](https://jreadability.net/sys/en). This is likely due to the version difference in UniDic between these two implementations as this package uses UniDic 2.1.2 while theirs uses UniDic 2.2.0. This issue will hopefully be resolved in the future.
+
+## Batch processing
+
+jreadability makes use of [fugashi](https://github.com/polm/fugashi)'s tagger under the hood and initializes a new tagger every time `compute_readability` is invoked. If you are processing a large number of texts, it is recommended to initialize the tagger first on your own, then pass it as an argument to each subsequent `compute_readability` call.
+
+```python
+from fugashi import Tagger
+
+texts = [...]
+
+tagger = Tagger()
+
+for text in texts:
+
+    score = compute_readability(text, tagger) # fast :D
+    #score = compute_readability(text) # slow :'(
+    ...
+```
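
The batch-processing advice in the README hunk above can be wrapped in a small helper. This is only a sketch: `score_batch` is a hypothetical name that is not part of jreadability, and it takes the tagger factory and the scoring function as arguments so that it runs without either library installed.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def score_batch(
    texts: Iterable[str],
    make_tagger: Callable[[], T],
    score: Callable[[str, T], float],
) -> List[float]:
    """Build one tagger, then reuse it for every text (hypothetical helper)."""
    tagger = make_tagger()  # constructed exactly once for the whole batch
    return [score(text, tagger) for text in texts]


# With the real libraries this would presumably be called as follows
# (assumes compute_readability is importable from the package top level):
#
#   from fugashi import Tagger
#   from jreadability import compute_readability
#   scores = score_batch(texts, Tagger, compute_readability)
```

Passing the callables in keeps the helper independent of fugashi, which also makes it easy to exercise with stand-ins in a unit test.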

pyproject.toml

+1 -1

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "jreadability"
-version = "1.0.1"
+version = "1.1.0"
 description = "Calculate readability scores for Japanese texts."
 readme = "README.md"
 authors = [{ name = "Joshua Hamilton", email = "[email protected]" }]

src/jreadability/jreadability.py

+7 -5

@@ -5,23 +5,25 @@
 There are no other public functions, classes or variables.
 """
 
-import fugashi
-from typing import List
+from fugashi import Tagger
+from typing import List, Optional
 from fugashi.fugashi import UnidicNode
 
-def compute_readability(text: str) -> float:
+def compute_readability(text: str, tagger: Optional[Tagger]=None) -> float:
     """
     Computes the readability of a Japanese text.
 
     Args:
         text (str): The text to be scored.
+        tagger (Optional[Tagger]): The fugashi parser used to parse the text.
 
     Returns:
         float: A float representing the readability score of the text.
     """
 
-    # initialize mecab parser
-    tagger = fugashi.Tagger()
+    if tagger is None:
+        # initialize mecab parser
+        tagger = Tagger()
 
     doc = tagger(text)
 
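
The change above uses a `None` default and constructs the tagger inside the function, so nothing is built when the function is defined and existing callers that pass only `text` keep working. The sketch below illustrates that pattern with stand-ins: `FakeTagger` and `count_tokens` are hypothetical names chosen for illustration and are not part of fugashi or jreadability.

```python
from typing import List, Optional


class FakeTagger:
    """Hypothetical stand-in for fugashi.Tagger that counts constructions."""

    instances = 0

    def __init__(self) -> None:
        FakeTagger.instances += 1

    def __call__(self, text: str) -> List[str]:
        # Whitespace splitting stands in for real morphological analysis.
        return text.split()


def count_tokens(text: str, tagger: Optional[FakeTagger] = None) -> int:
    """Mirror the signature pattern in the diff: build a tagger only if none is given."""
    if tagger is None:
        tagger = FakeTagger()
    return len(tagger(text))


samples = ["a b", "c d e", "f"]

# Defining the function constructs nothing.
constructed_at_definition = FakeTagger.instances

# Calls without a tagger build a fresh one each time.
for sample in samples:
    count_tokens(sample)
constructed_without_reuse = FakeTagger.instances - constructed_at_definition

# Calls sharing one tagger build exactly one in total.
before = FakeTagger.instances
shared = FakeTagger()
token_counts = [count_tokens(sample, shared) for sample in samples]
constructed_with_reuse = FakeTagger.instances - before
```

A default written as `tagger=FakeTagger()` would instead be evaluated once at definition time and shared across all calls, which is why the `None` sentinel is the usual choice for expensive or stateful defaults.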