Incorrect sentence parsing using ja_core_news_trf #12106
I'm not sure if this is the proper place to report this, and this is the first time that I've seen something like this, but I wanted to create an issue in case this was something that should in fact be reported.
How to reproduce the behaviour
outputs:
Your Environment
Replies: 1 comment
In general, issues like this fall under #3052, which basically amounts to "the models make mistakes sometimes". If the mistake is common and follows a clear pattern, that might point to a fixable issue. In this case, there does seem to be something weird about how compound verbs are handled, so we'll take a closer look at that.
Note that if your goal is actually just sentence segmentation for Japanese, you should get high quality results with a punctuation-based sentencizer instead of relying on the default sentence boundaries, which are based on the parse tree.
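To illustrate what punctuation-based segmentation means here, the following is a minimal plain-Python sketch of the idea, not spaCy's implementation. In spaCy itself the equivalent is the built-in `sentencizer` component, added with `nlp.add_pipe("sentencizer")`. The function name `split_sentences`, the set of terminators, and the sample sentences are all assumptions made for illustration.

```python
import re

# Sentence-final punctuation often seen in Japanese text.
# This set is an assumption for the sketch; extend it for your own data.
SENTENCE_END = "。！？!?"

# One sentence = a run of non-terminators, then any terminators,
# then any closing quotes/brackets that directly follow them.
_SENTENCE = re.compile(rf"[^{SENTENCE_END}]+[{SENTENCE_END}]*[」』）)]*")


def split_sentences(text: str) -> list[str]:
    """Split text into sentences using punctuation only (no parse tree)."""
    sentences = []
    for match in _SENTENCE.finditer(text):
        sentence = match.group().strip()
        if sentence:  # skip whitespace-only matches
            sentences.append(sentence)
    return sentences


if __name__ == "__main__":
    # Hypothetical sample text, not the text from the original report.
    for s in split_sentences("今日は晴れです。明日は雨です。"):
        print(s)
```

Because the boundaries depend only on punctuation, a parser error such as a mishandled compound verb cannot move them. The trade-off is that text without sentence-final punctuation is not split at all.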