Commit

Merge #284
284: Update README.md r=irevoire a=ManyTheFish



Co-authored-by: Many the fish <[email protected]>
meili-bors[bot] and ManyTheFish authored Apr 18, 2024
2 parents 6cf2af4 + 0547264 commit 5a64163
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion charabia/README.md
@@ -19,7 +19,7 @@ Charabia provides a simple API to segment, normalize, or tokenize (segment + nor
| **Latin** | ✅ CamelCase segmentation |[compatibility decomposition](https://unicode.org/reports/tr15/) + lowercase + [nonspacing-marks](https://www.compart.com/en/unicode/category/Mn) removal + `Ð vs Đ` spoofing normalization | 🟩 ~23MiB/sec | 🟨 ~9MiB/sec |
| **Greek** ||[compatibility decomposition](https://unicode.org/reports/tr15/) + lowercase + final sigma normalization | 🟩 ~27MiB/sec | 🟨 ~8MiB/sec |
| **Cyrillic** - **Georgian** ||[compatibility decomposition](https://unicode.org/reports/tr15/) + lowercase | 🟩 ~27MiB/sec | 🟨 ~9MiB/sec |
-| **Chinese** **CMN** 🇨🇳 |[jieba](https://github.com/messense/jieba-rs) |[compatibility decomposition](https://unicode.org/reports/tr15/) + pinyin conversion | 🟨 ~10MiB/sec | 🟧 ~5MiB/sec |
+| **Chinese** **CMN** 🇨🇳 |[jieba](https://github.com/messense/jieba-rs) |[compatibility decomposition](https://unicode.org/reports/tr15/) + kvariant conversion | 🟨 ~10MiB/sec | 🟧 ~5MiB/sec |
| **Hebrew** 🇮🇱 ||[compatibility decomposition](https://unicode.org/reports/tr15/) + [nonspacing-marks](https://www.compart.com/en/unicode/category/Mn) removal | 🟩 ~33MiB/sec | 🟨 ~11MiB/sec |
| **Arabic** |`ال` segmentation |[compatibility decomposition](https://unicode.org/reports/tr15/) + [nonspacing-marks](https://www.compart.com/en/unicode/category/Mn) removal + [Tatweel, Alef, Yeh, and Taa Marbuta normalization] | 🟩 ~36MiB/sec | 🟨 ~11MiB/sec |
| **Japanese** 🇯🇵 |[lindera](https://github.com/lindera-morphology/lindera) IPA-dict |[compatibility decomposition](https://unicode.org/reports/tr15/) | 🟧 ~3MiB/sec | 🟧 ~3MiB/sec |
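
The normalization column in the rows above chains several Unicode steps. As a rough illustration of what the Latin row describes (compatibility decomposition, lowercasing, nonspacing-mark removal), here is a minimal Python sketch using only the standard library. It is not Charabia's Rust implementation: the function name `normalize_latin` is hypothetical, and the `Ð vs Đ` spoofing step is deliberately left out.

```python
import unicodedata


def normalize_latin(text: str) -> str:
    """Illustrative only: approximates the Latin row of the table."""
    # Compatibility decomposition (NFKD) splits accented letters into a
    # base letter plus combining marks and expands ligatures such as "ﬁ".
    decomposed = unicodedata.normalize("NFKD", text)
    # Lowercase the decomposed text.
    lowered = decomposed.lower()
    # Drop nonspacing marks (Unicode general category Mn), e.g. accents.
    return "".join(ch for ch in lowered if unicodedata.category(ch) != "Mn")
```

For example, `normalize_latin("Éléphant")` yields `"elephant"`, since each `é` decomposes into `e` plus a combining acute accent that is then removed.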