UnicodeEncodeError: 'charmap' codec can't encode characters in position 578-694: character maps to <undefined> #429

@ryntml

I am trying to train a model on Ottoman Turkish, which is written with a mixture of the Arabic and Persian alphabets. I created all the datasets, but as soon as I run train.py I get the following error:

Screenshot 2024-10-04 212521

A small example from labels.txt:

Screenshot 2024-10-04 213535

Even though I use UTF-8 encoding, I still get the error.
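
For context, the 'charmap' codec named in the error is how Python reports legacy single-byte code pages such as cp1252, which is often the default on Windows when a file is opened or text is printed without an explicit encoding. The sketch below reproduces the same kind of failure and shows that passing `encoding="utf-8"` explicitly avoids it. It assumes the error comes from a file write or print somewhere in train.py, which the screenshot alone does not confirm; the file name `labels_demo.txt` and the sample text are made up for illustration.

```python
import os
import tempfile

# Hypothetical sample label text in Arabic script (not taken from labels.txt).
sample = "نون"


def encodes_with(text: str, encoding: str) -> bool:
    """Return True if every character in text can be encoded with the codec."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# cp1252 has no Arabic letters, so encoding fails with a 'charmap' error.
cp1252_ok = encodes_with(sample, "cp1252")
# UTF-8 can represent every Unicode character.
utf8_ok = encodes_with(sample, "utf-8")

# Writing and reading with an explicit encoding does not depend on the
# platform default, so the text survives a round trip.
path = os.path.join(tempfile.mkdtemp(), "labels_demo.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write(sample + "\n")
with open(path, "r", encoding="utf-8") as f:
    round_trip = f.read().strip()
```

If this is the cause, saving labels.txt as UTF-8 would not be enough on its own: every `open()` call that reads or writes the text would also need the explicit encoding.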

One possible issue with the characters:

Like Arabic, this script writes each letter differently at the beginning, middle, and end of a word, so I listed every form of every character.
For example, I added three spellings of the letter Noon.
Could this be causing the problem? Does anyone know?
Thank you.
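
A quick way to inspect the different spellings is to print their code points. This is only a sketch: it assumes the positional variants were entered as Unicode Arabic Presentation Forms (for example U+FEE7 for initial Noon) rather than as the base letter U+0646, which is not confirmed here. Note that a 'charmap' error is raised for any character the codec lacks, so the number of spellings alone would not explain it.

```python
import unicodedata

NOON_BASE = "\u0646"  # ARABIC LETTER NOON (base letter)

# Presentation-form code points for Noon, keyed by position in the word.
noon_forms = {
    "isolated": "\uFEE5",
    "final": "\uFEE6",
    "initial": "\uFEE7",
    "medial": "\uFEE8",
}

# Show each variant's code point and official Unicode name.
for position, ch in noon_forms.items():
    print(position, f"U+{ord(ch):04X}", unicodedata.name(ch))

# NFKC normalization folds presentation forms back to the base letter.
normalized = {
    position: unicodedata.normalize("NFKC", ch)
    for position, ch in noon_forms.items()
}
all_fold_to_base = all(ch == NOON_BASE for ch in normalized.values())
```

Whether the character list should keep the separate forms or the normalized base letters depends on how the training code builds its character set, which is not shown here.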
