Can pydictor find and remove non-UTF-8 characters? #34
Pydictor does not currently include this function, but you can use another tool to do it, such as:
Thanks for the answer. I applied the iconv tool with this command: iconv -f utf-8 -t utf-8 -c test.txt -o clean_test.txt. But I got the following error:
I have just added a printable-character filter tool to pydictor.
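For readers who want to do this kind of cleanup outside pydictor, the following Python sketch combines the two steps discussed in this thread: dropping byte sequences that are not valid UTF-8 (similar to what `iconv -c` does) and keeping only printable characters. It is an illustration, not pydictor's implementation. The function names are hypothetical, and "printable" is assumed to mean Python's ASCII-only `string.printable`, so valid non-ASCII characters such as `é` are also removed by the second step.

```python
import string

# Assumption for illustration: "printable" means Python's string.printable
# (ASCII letters, digits, punctuation and whitespace, including "\n").
# pydictor's own filter may use a different rule.
PRINTABLE = set(string.printable)


def drop_invalid_utf8(raw: bytes) -> str:
    """Decode bytes as UTF-8, silently dropping invalid byte sequences."""
    return raw.decode("utf-8", errors="ignore")


def keep_printable(text: str) -> str:
    """Remove every character that is not in PRINTABLE."""
    return "".join(ch for ch in text if ch in PRINTABLE)


def clean_file(src: str, dst: str) -> None:
    # Read raw bytes so invalid UTF-8 can never raise a decode error.
    # Splitting on b"\n" is safe: multi-byte UTF-8 sequences never contain 0x0A.
    with open(src, "rb") as fin, open(dst, "w", encoding="utf-8", newline="") as fout:
        for raw_line in fin:
            fout.write(keep_printable(drop_invalid_utf8(raw_line)))
```

For example, `clean_file("test.txt", "clean_test.txt")` performs roughly the same cleanup as the iconv command quoted earlier in this thread, then additionally strips non-printable characters.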
Thank you for your help.
I have fixed the bug.
Try:
That is caused by the "memory remove duplicate file lines by preserving order" step.
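A step like this typically works by remembering every line it has already seen. As a minimal sketch of that general technique (not pydictor's actual code; the function name is hypothetical), an order-preserving de-duplication in Python looks like this, and the `seen` set is what grows with the number of unique lines:

```python
from typing import Iterable, Iterator


def dedupe_preserving_order(lines: Iterable[str]) -> Iterator[str]:
    """Yield each line the first time it appears, keeping the original order.

    Every distinct line is stored in `seen`, so memory use grows with the
    number of unique lines; on a very large wordlist this can exhaust RAM.
    """
    seen = set()
    for line in lines:
        if line not in seen:
            seen.add(line)
            yield line
```

If the original order does not matter, an on-disk sort such as `sort -u` avoids holding every unique line in RAM, at the cost of reordering the output.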
Maybe you can try version
Download the latest version.
OK. I'm experimenting with the variables. Is there a way for pydictor to show its progress, either as a percentage or as the number of lines processed? Any indication of pydictor's progress would be very helpful for troubleshooting and for benchmarking how different variables affect performance :)
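Pydictor's internals are not shown in this thread, so the following is only a sketch of how a long line-by-line job could report progress, assuming the input is a file whose size is known up front. The function and its `every` parameter are hypothetical. The percentage is based on bytes read, which avoids a separate pass just to count lines.

```python
import os
import sys
from typing import Callable, TextIO


def process_with_progress(path: str, handle_line: Callable[[str], None],
                          every: int = 100_000, out: TextIO = sys.stderr) -> int:
    """Call handle_line on each line of the file at `path`.

    Every `every` lines, write the number of lines processed and the
    percentage of the file's bytes read so far to `out`.
    Returns the total number of lines processed.
    """
    total_bytes = os.path.getsize(path)
    bytes_read = 0
    count = 0

    def percent() -> float:
        # An empty file counts as fully processed.
        return 100.0 * bytes_read / total_bytes if total_bytes else 100.0

    with open(path, "rb") as f:
        for raw in f:
            bytes_read += len(raw)
            count += 1
            # Decode leniently so a stray invalid byte does not stop the run.
            handle_line(raw.decode("utf-8", errors="ignore"))
            if count % every == 0:
                # "\r" redraws the same terminal line instead of scrolling.
                out.write(f"\r{count} lines, {percent():.1f}%")
                out.flush()
    out.write(f"\r{count} lines, {percent():.1f}%\n")
    return count
```

Reporting only every `every` lines keeps the overhead of writing to the terminal small compared with the work done per line.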
So after a lot of testing, reducing this variable
Could you add a feature that limits the number of lines the generated dictionary contains?
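One way such a limit could work, sketched in Python under the assumption that words are produced lazily by a generator: `itertools.islice` stops consuming the generator after `max_lines` items, so the remaining combinations are never computed. Both `generate_words` and `limit_lines` are hypothetical names for illustration, not pydictor functions.

```python
from itertools import islice, product
from typing import Iterable, Iterator


def generate_words(charset: str, length: int) -> Iterator[str]:
    """Hypothetical generator: every string of `length` characters from `charset`."""
    for combo in product(charset, repeat=length):
        yield "".join(combo)


def limit_lines(words: Iterable[str], max_lines: int) -> Iterator[str]:
    """Yield at most max_lines items; later items are never generated."""
    return islice(words, max_lines)
```

For example, `list(limit_lines(generate_words("ab", 3), 5))` returns the first five of the eight three-character combinations: `aaa`, `aab`, `aba`, `abb`, `baa`.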