Non-ASCII phrases aren't classified correctly #6

@Tarnak-public

Description

When using a custom model with non-English phrases (specifically Polish words with accented characters), is_spam() did not classify texts correctly.
As a workaround, I strip accents both during training and when checking (code: https://gist.github.com/AdoHaha/a76157c6de5155bf6b0adc77988724d9), which works great.
So, could you add a normalization parameter to the code, or otherwise fix the handling of accented characters?
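
To illustrate the kind of normalization meant here, below is a minimal sketch of accent removal using only the Python standard library. It is not the code from the linked gist, and the name strip_accents is hypothetical; it only assumes that the same function is applied to every text both before training and before calling is_spam(). Note that Polish ł/Ł has no Unicode decomposition, so the sketch maps it by hand.

```python
import unicodedata

# "ł" and "Ł" have no canonical decomposition, so NFKD leaves them unchanged.
# Map them explicitly (assumption: plain "l"/"L" is an acceptable replacement).
_NO_DECOMPOSITION = str.maketrans({"ł": "l", "Ł": "L"})


def strip_accents(text: str) -> str:
    """Return text with diacritics removed (hypothetical helper)."""
    # Split each accented character into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks and keep the base letters.
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return without_marks.translate(_NO_DECOMPOSITION)


print(strip_accents("Zażółć gęślą jaźń"))  # prints "Zazolc gesla jazn"
```

The important part is consistency: if training texts are normalized but checked texts are not (or the other way round), accented and unaccented spellings end up as different tokens again.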
