wrong content resegmented #18

@drupchen

bug

Here is the Tibetan input: "ཨ་ར།"
What seems to happen is this: when updating with the new content, the first token gets deleted (here). Yet when the new content is segmented (here), no new content is given to the tokenizer. Instead, the content of the first remaining token is given to be resegmented.

This is how we end up with two punctuation tokens.
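
To make the suspected sequence concrete, here is a minimal sketch in Python. It is not the project's code: `toy_tokenize`, `update_buggy` and `update_fixed` are hypothetical names, the toy tokenizer only splits off the shad (།), and it assumes the new content is the replacement text for the deleted first token.

```python
SHAD = "།"  # Tibetan punctuation mark (U+0F0D)


def toy_tokenize(text):
    """Hypothetical stand-in tokenizer: each shad is its own token,
    any other run of characters is kept together as one token."""
    tokens = []
    current = ""
    for ch in text:
        if ch == SHAD:
            if current:
                tokens.append(current)
                current = ""
            tokens.append(ch)
        else:
            current += ch
    if current:
        tokens.append(current)
    return tokens


def update_buggy(tokens, new_content):
    """Assumed faulty behaviour: new_content is ignored and the first
    remaining token is resegmented instead.
    (The sketch assumes at least two tokens.)"""
    remaining = tokens[1:]           # the first token gets deleted
    to_segment = remaining[0]        # wrong source for resegmentation
    return toy_tokenize(to_segment) + remaining


def update_fixed(tokens, new_content):
    """Expected behaviour: the new content itself is segmented."""
    remaining = tokens[1:]
    return toy_tokenize(new_content) + remaining


tokens = toy_tokenize("ཨ་ར།")
buggy = update_buggy(tokens, "ཨ་ར")
fixed = update_fixed(tokens, "ཨ་ར")
print(buggy)
print(fixed)
```

In this sketch the buggy path yields two shad tokens and loses "ཨ་ར", while the fixed path keeps one word token followed by a single shad.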
