Tatar language data quality issues

Hello, 

Where does the data for the Tatar language come from?

I see a lot of garbage there; I could barely find any Tatar words [here](https://github.com/tesseract-ocr/langdata_lstm/blob/main/tat/tat.wordlist).
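To put a rough number on this, here is a minimal sketch that estimates what share of wordlist entries are written purely in the Tatar Cyrillic alphabet (the Russian letters plus ә, ө, ү, җ, ң, һ). The function names `looks_tatar` and `tatar_share` are hypothetical helpers, not part of any Tesseract tooling, and the check only tests the script: Russian words pass it too, so it gives an upper bound on how much of the list could be Tatar rather than a real language check.

```python
# Tatar Cyrillic alphabet: the 33 Russian letters plus six additional letters.
TATAR_LETTERS = set("абвгдеёжзийклмнопрстуфхцчшщъыьэюя" "әөүҗңһ")


def looks_tatar(word: str) -> bool:
    """Return True if every character is a Tatar Cyrillic letter.

    Hyphens are ignored so that compound words are not rejected.
    This is a script check only, not a dictionary lookup.
    """
    letters = word.lower().replace("-", "")
    return bool(letters) and all(ch in TATAR_LETTERS for ch in letters)


def tatar_share(words) -> float:
    """Return the fraction of non-empty entries that pass looks_tatar."""
    entries = [w.strip() for w in words if w.strip()]
    if not entries:
        return 0.0
    return sum(looks_tatar(w) for w in entries) / len(entries)
```

Assuming a local copy of the file saved as `tat.wordlist`, it could be run with `tatar_share(open("tat.wordlist", encoding="utf-8"))`, since iterating over the file yields one entry per line.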

I would like to improve this.
1. Do you have a page with guidance on how to train the model?
2. Once I train it, should I create a PR with just the model itself to that repo? Where are you storing the raw data for training?

Follow-up to https://github.com/tesseract-ocr/langdata/issues/305
