Newly merged code shows as Chinese text #59
Comments
Yes, you're right, it's related to Propeller Tool saving data as UTF-16. That's because my judgement call 15 years ago was wrong - I predicted that the world would settle on UTF-16, or at least support it naturally, as seemed to be happening already with simple tools like Windows Notepad. GitHub doesn't seem to process UTF-16 in its quick view, but does seem to in its RAW view. Also, it's difficult (in my experience) to get Git to play nicely with UTF-16; I haven't found a way to tell Git that all .spin files are always text files that may be encoded in either ANSI or UTF-16. Unless we can find a proven-to-work-in-all-cases solution, I think the best option (since I'm building newer Propeller Tool versions right now to support P2) is to store files in either ANSI or UTF-8 format (as needed) by default, and to automatically open UTF-16 files and convert them to UTF-8 (with a prompt on saving). It would mean that future .spin files may not be readable by old Propeller Tool versions, and everyone would need to update if they want to continue with Propeller Tool. What do you think?
My thoughts in a nutshell: You can add a .gitattributes file in the root of the propeller repo and have it handle the conversion automatically, as is done with line endings. I'm not sure whether GitHub already supports this. I have not tried propTool (I cannot install it on the only Windows machine I have, which is supplied by work), but do you save P2 files as .spin2 and P1 files as .spin? WRT propTool:
EDIT: the rationale for UTF-8 is to still support the Propeller font used in a lot of files for simple circuit/signal drawings.
Thank you, @rosco-pc. I'm considering this and will experiment. I've used the working-tree-encoding feature and found that it didn't solve the problem in the way I expected; specifically (if I'm remembering right), it treats all files that match the expression (i.e. *.spin) as UTF-16 that needs to be encoded internally as UTF-8 and re-converted to UTF-16 upon checkout, but that damages ANSI-encoded .spin files (as .spin files are ANSI unless they contain a non-ANSI character). I think what you are saying either already acknowledges that, or accommodates it, by suggesting that all .spin entries in this repo be converted to UTF-16 for now and future use, along with the working-tree-encoding attributes added to the repo (and tested with GitHub), which would make it a smooth operation for GitHub viewing and Propeller Tool use. Ideally, this would be assisted by a custom smudge filter (I've never written one) that would do the conversion automatically to ensure UTF-16 .spin files are input. Actually... that may be the best solution overall... a custom smudge filter that understands that .spin files could be either ANSI or UTF-16 and that detects and converts as necessary in both directions. This would handle the situation seamlessly and could even be made to detect UTF-8 .spin and .spin2 files as well.
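As a rough illustration of the detect-and-convert idea above, here is a minimal Python sketch of the direction going into the repository (what Git calls a clean filter; a smudge filter runs the other way, on checkout). The function name `spin_to_utf8` is hypothetical, and treating "ANSI" as Windows-1252 is an assumption, not something confirmed for Propeller Tool.

```python
import codecs


def spin_to_utf8(data: bytes) -> bytes:
    """Normalize the bytes of a .spin file to UTF-8 without a BOM.

    Assumption: "ANSI" means Windows-1252 (cp1252); adjust if the
    tool actually uses a different code page.
    """
    if data.startswith(codecs.BOM_UTF16_LE) or data.startswith(codecs.BOM_UTF16_BE):
        # The generic "utf-16" codec reads the BOM to pick the byte
        # order and drops it from the decoded text.
        text = data.decode("utf-16")
    elif data.startswith(codecs.BOM_UTF8):
        # "utf-8-sig" strips the UTF-8 BOM.
        text = data.decode("utf-8-sig")
    else:
        try:
            # Plain ASCII and BOM-less UTF-8 both decode here.
            text = data.decode("utf-8")
        except UnicodeDecodeError:
            # Fall back to the assumed ANSI code page; bytes that are
            # undefined in cp1252 become U+FFFD instead of raising.
            text = data.decode("cp1252", errors="replace")
    return text.encode("utf-8")
```

A real filter would read the file from `sys.stdin.buffer`, write the result to `sys.stdout.buffer`, and be registered through Git's `filter.<driver>.clean` setting. BOM-less UTF-16 is not detected by this sketch.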
I'm no expert on Git or anything like it, but would it be possible to detect the byte order mark? ANSI/ASCII have no BOM, UTF-16
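Checking for a byte order mark could be sketched in Python roughly as follows. The function name `detect_bom` and the `"no-bom"` return value are hypothetical, chosen only for illustration. A file without a BOM may still be ANSI, ASCII, or BOM-less UTF, so this check alone cannot tell those apart.

```python
import codecs


def detect_bom(data: bytes) -> str:
    """Return a codec name based on the leading BOM, or "no-bom".

    UTF-32 LE shares the FF FE prefix with UTF-16 LE and is not
    handled here.
    """
    if data.startswith(codecs.BOM_UTF8):       # EF BB BF
        return "utf-8-sig"
    if data.startswith(codecs.BOM_UTF16_LE):   # FF FE
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):   # FE FF
        return "utf-16-be"
    return "no-bom"
```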
I'm far from a Git expert as well; I use it for work, and now and then I still need to start with a fresh checkout because I messed up :P. UTF files can be stored with or without a BOM. UTF-16 files without a BOM will be treated as binary files by Git, though. However, I'm not sure this is needed, as the Git attribute discussed above seems to do the right thing.
I tested it with a test repo: https://github.com/rosco-pc/test-utf-handling.git, which keeps the files in the right format and does not display them as something else. Checking out the repo keeps the format, although checking out on Windows seems to add a BOM to the non-BOM UTF-16 file (attachment: original.zip). Edit 2: Hmm, downloading the zip file shows all files as UTF-16LE + BOM. But I do not see any corruption, and the file still compiles with openSpin.
Community Libraries that were merged in May show up as Chinese text in GitHub's code view.

Looking at the raw file, it looks OK (apart from not having the Propeller font selected).

I assume this is related to propTool saving data in UTF-16.