BrokenPipeError: [Errno 32] Broken pipe #184
It seems that when running `python main.py --worker` and `python main.py --train-server` on the same computer, the worker process takes up so much memory that this error occurs. In my last run, the problem appeared after 129400 episodes; the worker was using about 50GB of memory and the train-server about 10GB.
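As a rough back-of-the-envelope check on these numbers (assuming, which this thread does not confirm, that memory grows roughly linearly with the number of stored episodes):

```python
# Rough per-episode memory estimate from the figures reported above.
# Assumption: worker memory grows roughly linearly with stored episodes.
reported_bytes = 50 * 1024**3      # ~50GB reported for the worker process
episodes = 129400                  # episodes completed when the error occurred

per_episode = reported_bytes / episodes
print(f"~{per_episode / 1024**2:.2f} MB per episode")

# Projected usage if storage keeps growing up to maximum_episodes
# (250000 in the config posted later in this thread).
projected = per_episode * 250000
print(f"~{projected / 1024**3:.1f} GB at 250k episodes")
```

At roughly 0.4 MB per episode, reaching the configured 250k-episode cap would project to close to 100GB, which is consistent with the worker running out of memory first.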
Thank you very much for your report. We are considering making the training scheme selectable and avoiding storing unused models.
Thanks for your reply! But I just used the original GeeseNet...
Thanks. That's a strange case... There's nothing wrong with 129k episodes occupying 10GB in the trainer process.
Hi @han-x ! Could you give me some information like the following, so I can consider this from various perspectives?
Thanks!
I have noticed a possible cause from your stack traces. Are you using the code from the current master branch? I think there are some differences between your script and the script in the master branch. A similar error happened before, and we fixed it in #145. Could you check, and update your code if you are using the old version? Thanks.
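For context on the error itself (this is the general mechanism, not the specific HandyRL fix in #145): `BrokenPipeError` with errno 32 is raised when a process writes to a pipe or connection whose reading end has already gone away, typically because the peer process exited or crashed. A minimal, self-contained illustration on a POSIX system:

```python
import os

def write_to_closed_pipe():
    """Writing to a pipe with no reader raises BrokenPipeError (errno 32)."""
    read_fd, write_fd = os.pipe()
    os.close(read_fd)              # the reading end disappears (peer died)
    try:
        os.write(write_fd, b'episode data')
    except BrokenPipeError as e:
        return e.errno             # 32 == errno.EPIPE on Linux
    finally:
        os.close(write_fd)
    return None
```

In a worker/train-server setup, the same thing happens when one side dies (e.g. from running out of memory) while the other is still sending data, so the `BrokenPipeError` in the surviving process is often a symptom of the peer's crash rather than the root cause.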
When training reached epoch 189, it was interrupted on a server.
It seems OK on my own computer with the same config.
```yaml
train_args:
    turn_based_training: False
    observation: True
    gamma: 0.8
    forward_steps: 32
    compress_steps: 4
    entropy_regularization: 2.0e-3
    entropy_regularization_decay: 0.3
    update_episodes: 300
    batch_size: 400
    minimum_episodes: 10000
    maximum_episodes: 250000
    num_batchers: 7
    eval_rate: 0.1
    worker:
        num_parallel: 6
    lambda: 0.7
    policy_target: 'UPGO' # 'UPGO' 'VTRACE' 'TD' 'MC'
    value_target: 'TD' # 'VTRACE' 'TD' 'MC'
    seed: 0
    restart_epoch: 0

worker_args:
    server_address: ''
    num_parallel: 6
```
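The `compress_steps: 4` entry in the config above controls how many consecutive steps of an episode are compressed together before being stored, which is what keeps per-episode memory down. A minimal sketch of that idea (the function names and episode layout here are hypothetical illustrations, not HandyRL's actual API):

```python
import bz2
import pickle

def compress_chunks(steps, compress_steps=4):
    """Pickle and bz2-compress every `compress_steps` consecutive steps."""
    return [bz2.compress(pickle.dumps(steps[i:i + compress_steps]))
            for i in range(0, len(steps), compress_steps)]

def decompress_chunks(chunks):
    """Restore the original step list from compressed chunks."""
    steps = []
    for chunk in chunks:
        steps.extend(pickle.loads(bz2.decompress(chunk)))
    return steps

# Hypothetical 32-step episode with a dummy 100-float observation per step.
episode = [{'turn': t, 'observation': [0.0] * 100} for t in range(32)]
chunks = compress_chunks(episode)

raw_size = len(pickle.dumps(episode))
packed_size = sum(len(c) for c in chunks)
print(f"raw: {raw_size} bytes, compressed: {packed_size} bytes")
```

Compressing in small chunks rather than whole episodes trades a little compression ratio for the ability to decompress only the steps a batcher actually needs.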