This repository has been archived by the owner on Jun 16, 2021. It is now read-only.

handicap=int(sgf_prop(props.get('HA', [0]))) ValueError: invalid literal for int() with base 10: '吴受先' #17

Open
greatken999 opened this issue Feb 15, 2017 · 5 comments

Comments

@greatken999

When I run "python3 main.py preprocess data/other/tmp/wuqingyuan/", I get this error:
366 sgfs found.
Estimated number of chunks: 17
Traceback (most recent call last):
  File "main.py", line 94, in <module>
    argh.dispatch(parser)
  File "/usr/local/lib/python3.5/dist-packages/argh/dispatching.py", line 174, in dispatch
    for line in lines:
  File "/usr/local/lib/python3.5/dist-packages/argh/dispatching.py", line 277, in _execute_command
    for line in result:
  File "/usr/local/lib/python3.5/dist-packages/argh/dispatching.py", line 260, in _call
    result = function(*positional, **keywords)
  File "main.py", line 49, in preprocess
    test_chunk, training_chunks = parse_data_sets(*data_sets)
  File "/mnt/ken-volume/MuGo/load_data_sets.py", line 140, in parse_data_sets
    test_chunk, training_chunks = split_test_training(positions_w_context, est_num_positions)
  File "/mnt/ken-volume/MuGo/load_data_sets.py", line 60, in split_test_training
    positions_w_context = list(positions_w_context)
  File "/mnt/ken-volume/MuGo/load_data_sets.py", line 52, in get_positions_from_sgf
    for position_w_context in replay_sgf(f.read()):
  File "/mnt/ken-volume/MuGo/sgf_wrapper.py", line 124, in replay_sgf
    handicap=int(sgf_prop(props.get('HA', [0]))),
ValueError: invalid literal for int() with base 10: '吴受先'

It looks like, for some SGF files, props.get('HA', [0]) returns a string, not an int.

@brilee
Owner

brilee commented Feb 22, 2017

Can you give me an example of the sgf file that it's running into issues on?

I suspect it's an sgf file that violates the standards, so having the file itself would be useful to be able to reproduce and verify the fix.

@greatken999
Author

Most SGF files in China use the gb18030 codec, so I changed line 48 of load_data_sets.py (example files attached):

wqy00.zip

    # with open(file) as f:
    with open(file, 'rt', encoding='gb18030', errors='ignore') as f:

to fix this error:
366 sgfs found.
Estimated number of chunks: 17
Traceback (most recent call last):
  File "main.py", line 94, in <module>
    argh.dispatch(parser)
  File "/usr/lib/python3.5/site-packages/argh/dispatching.py", line 174, in dispatch
    for line in lines:
  File "/usr/lib/python3.5/site-packages/argh/dispatching.py", line 277, in _execute_command
    for line in result:
  File "/usr/lib/python3.5/site-packages/argh/dispatching.py", line 260, in _call
    result = function(*positional, **keywords)
  File "main.py", line 49, in preprocess
    test_chunk, training_chunks = parse_data_sets(*data_sets)
  File "/home/ken/ai/go/MuGo/load_data_sets.py", line 140, in parse_data_sets
    test_chunk, training_chunks = split_test_training(positions_w_context, est_num_positions)
  File "/home/ken/ai/go/MuGo/load_data_sets.py", line 60, in split_test_training
    positions_w_context = list(positions_w_context)
  File "/home/ken/ai/go/MuGo/load_data_sets.py", line 52, in get_positions_from_sgf
    for position_w_context in replay_sgf(f.read()):
  File "/usr/lib64/python3.5/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xce in position 5: invalid continuation byte

@brilee
Owner

brilee commented Feb 24, 2017

Oh.. ugh, this makes me sad.
So, the SGF file should declare that its encoding is GB18030; I can't just assume it. Most western-generated SGFs assume UTF-8, so putting in this new assumption would just break the other half of SGFs.

The other issue is that the HA property should be a number http://www.red-bean.com/sgf/go.html#types , not "Wu played first", even though that was the convention back then. I can't really ask you to go fix whatever SGF editor created these files, though, so I think the best I could do is just have a try-except to try different encodings.
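The try-except idea above could be sketched roughly as follows. This is only an illustration, not MuGo code: the helper name decode_sgf and the fallback order are assumptions. It first honours a charset declared through the SGF CA property, if the file has one, and then tries each fallback encoding in turn. UTF-8 is tried before GB18030 because it rejects most non-UTF-8 byte sequences, while GB18030 accepts a much wider range of input.

```python
import re

# Hypothetical helper, not part of MuGo. Matches a CA[...] property in the raw
# bytes; the lookbehind avoids matching the tail of a longer property name.
_CA_PATTERN = re.compile(rb'(?<![A-Za-z])CA\[([^\]]*)\]')


def decode_sgf(raw, fallbacks=('utf-8', 'gb18030')):
    """Decode raw SGF bytes, preferring the declared charset if there is one."""
    candidates = []
    match = _CA_PATTERN.search(raw)
    if match:
        declared = match.group(1).decode('ascii', errors='ignore').strip()
        if declared:
            candidates.append(declared)
    candidates.extend(fallbacks)

    for encoding in candidates:
        try:
            return raw.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            # Wrong guess or unknown charset name: try the next candidate.
            continue
    raise ValueError('could not decode SGF with any of: ' + ', '.join(candidates))
```

The caller would open the file in binary mode and pass f.read() to this function instead of relying on the platform default encoding. Note that this simple regex can also match the text "CA[...]" inside a comment, so a real fix would read the property from the parsed root node.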

@greatken999
Author

Yes, I fixed this bug by changing sgf_wrapper.py to:

    import traceback

    try:
        metadata = GameMetadata(
            result=sgf_prop(props.get('RE')),
            handicap=int(sgf_prop(props.get('HA', [0]))),
            board_size=19)
    except ValueError:
        # HA is not a number (e.g. a note such as '吴受先'): assume no handicap
        metadata = GameMetadata(
            result=sgf_prop(props.get('RE')),
            handicap=0,
            board_size=19)
        # Append the traceback to a log so the offending files can be reviewed
        with open("./error.txt", 'a') as f:
            traceback.print_exc(file=f)
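A more compact variant of the same idea, shown only as a sketch: the fallback could live in a small helper so that GameMetadata is constructed once. The name parse_handicap is hypothetical and does not exist in MuGo; in replay_sgf it would presumably wrap the existing sgf_prop(props.get('HA', [0])) expression.

```python
def parse_handicap(raw_value, default=0):
    """Return the handicap as an int, or `default` if the value is not numeric.

    Hypothetical helper for illustration; some SGF files put a free-text note
    in HA, so int() may raise ValueError (or TypeError for a missing value).
    """
    try:
        return int(raw_value)
    except (TypeError, ValueError):
        return default
```

For example, parse_handicap('2') gives 2, while parse_handicap('吴受先') falls back to 0.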

@greatken999
Author

greatken999 commented Feb 24, 2017

Hi brilee,

The encoding bug is fixed; I tested it successfully with both UTF-8 and GB18030 SGF files.
You need to run "pip3 install cchardet" to install the cchardet module first.

Change load_data_sets.py line 48 to:

    import cchardet as chardet

    def get_positions_from_sgf(file):
        # Detect the encoding from the raw bytes first
        with open(file, 'rb') as f:
            result = chardet.detect(f.read())['encoding']
        # Reopen in text mode with the detected encoding
        with open(file, 'rt', encoding=result, errors='ignore') as f:
