This repository has been archived by the owner on Jun 9, 2021. It is now read-only.

Only Apple's Tensorflow: Segmentation fault 11 / Abort trap 6 when running KataGo's training loop #258

Open
MarkTakken opened this issue May 12, 2021 · 1 comment

Comments

@MarkTakken

Hi. I have upgraded the source code of KataGo (a Go-playing program, github.com/lightvector/KataGo) to TensorFlow 2, and the resulting code (github.com/MarkTakken/KataGoTF2MacOS) works fine with the official TensorFlow 2.4.0. However, when I run the training loop with Apple's TensorFlow (tensorflow_macos), it crashes with "Segmentation fault: 11" or, occasionally, "Abort trap: 6". To reproduce the crash, run the following in a terminal (it runs without errors under the official TensorFlow but crashes under Apple's TensorFlow):

git clone https://github.com/MarkTakken/KataGoTF2MacOS.git
cd KataGoTF2MacOS
python/selfplay/train.sh TestRun testruntraining b6c96 128 trainonly

I have also found that the crash occurs at line 725 of train.py, that is, when the actual training begins. Since this appears to be an issue solely with tensorflow_macos, I would greatly appreciate it if you could help identify and fix the problem. Thank you in advance.
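For anyone trying to narrow this down further, one option is to have Python dump its traceback at the moment of the crash. The sketch below is not part of the repository: `run_with_faulthandler` is a hypothetical helper that runs a command with the standard `PYTHONFAULTHANDLER` environment variable set, which enables the built-in `faulthandler` module in any Python interpreter started underneath it. It assumes that train.sh passes its environment through to the Python process it launches, which I have not verified.

```python
import os
import subprocess


def run_with_faulthandler(cmd):
    """Run cmd with PYTHONFAULTHANDLER=1 set in its environment.

    With this variable set, a Python interpreter started by cmd enables
    faulthandler at startup and writes the Python traceback to stderr
    on fatal signals such as SIGSEGV or SIGABRT.
    """
    env = dict(os.environ, PYTHONFAULTHANDLER="1")
    return subprocess.run(cmd, env=env)


# Hypothetical usage, from the root of the KataGoTF2MacOS checkout
# (assumes train.sh does not reset the environment):
# run_with_faulthandler(
#     ["python/selfplay/train.sh", "TestRun", "testruntraining",
#      "b6c96", "128", "trainonly"]
# )
```

The traceback only shows the Python frames that were active when the signal arrived; the fault itself is presumably inside native code, so this would at best confirm which call at line 725 triggers it.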

@MarkTakken
Author

P.S. The test data (produced from selfplay) that this trains on is at TestRun/shuffleddata/current/train/data0.tfrecord.
