detect_multiple_languages_of() does not work at all for mixed English, Chinese and Japanese

You can easily test out sentences like the following:

`The name of that celebrity is 王菲`
The whole sentence is classified as English. You can try any Chinese name or any English prefix sentence; the result is always wrong.

`这首歌的名字叫 ロミオとシンデレラ`
`The name of the song is ロミオとシンデレラ`
In both cases the whole sentence is classified as Japanese. FYI, the Japanese character set does include almost all traditional Chinese characters. However, a person or a language model can easily tell the boundary between the two chunks, especially when the first chunk is in English.
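
To illustrate how visible the chunk boundary is, here is a minimal standard-library sketch that splits text into runs by a coarse Unicode script bucket. This is not how the library works internally and is not a proposed fix; `script_of` and `split_by_script` are hypothetical helper names, and the buckets (`LATIN`, `HAN`, `KANA`) are a simplification based on Unicode character names.

```python
import unicodedata


def script_of(ch: str) -> str:
    """Return a coarse script bucket for one character (illustrative only)."""
    if not ch.isalpha():
        # Spaces, digits and punctuation do not decide a boundary.
        return "COMMON"
    name = unicodedata.name(ch, "")
    if name.startswith(("HIRAGANA", "KATAKANA")):
        return "KANA"
    if name.startswith("CJK"):
        return "HAN"
    if name.startswith("LATIN"):
        return "LATIN"
    return "OTHER"


def split_by_script(text: str) -> list[tuple[str, str]]:
    """Split text into (script, chunk) runs; COMMON characters join the current run."""
    runs = []
    current = None
    buf = []
    for ch in text:
        script = script_of(ch)
        # A new run starts when a letter from a different script appears.
        if script != "COMMON" and current is not None and script != current:
            runs.append((current, "".join(buf).strip()))
            buf = []
        if script != "COMMON":
            current = script
        buf.append(ch)
    if buf and current is not None:
        runs.append((current, "".join(buf).strip()))
    return runs


for sentence in (
    "The name of that celebrity is 王菲",
    "这首歌的名字叫 ロミオとシンデレラ",
    "The name of the song is ロミオとシンデレラ",
):
    print(split_by_script(sentence))
```

For the three sentences above, this yields a `LATIN` or `HAN` run followed by a `HAN` or `KANA` run, which matches the boundary a reader would draw. It is only a rough heuristic: a `HAN` run alone cannot tell Chinese from Japanese, and ordinary Japanese text that mixes kanji and kana would be split into separate runs.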

detect_multiple_languages_of() does not work at all for mixed English, Chinese and Japanese #219
