CPU memory keeps increasing, and an error occurs when it is full #189

Open
@nizhihao

Hi, when I train this code with python main.py --train on my local computer, CPU memory (RAM) usage keeps increasing until it is full, and then the errors below are reported. I think the problem is caused by running out of memory.

Traceback (most recent call last):
  File "/home/user/.conda/envs/spinningup_nzh/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/home/user/.conda/envs/spinningup_nzh/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/home/user/nzh_projects/kaggle-environments-master/kaggle_agent/HandyRL/handyrl/connection.py", line 190, in _receiver
    data, cnt = conn.recv()
  File "/home/user/.conda/envs/spinningup_nzh/lib/python3.6/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/home/user/.conda/envs/spinningup_nzh/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/home/user/.conda/envs/spinningup_nzh/lib/python3.6/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError

Exception in thread Thread-4:
Traceback (most recent call last):
  File "/home/user/.conda/envs/spinningup_nzh/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/home/user/.conda/envs/spinningup_nzh/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/home/user/nzh_projects/kaggle-environments-master/kaggle_agent/HandyRL/handyrl/connection.py", line 175, in _sender
    conn.send(next(self.send_generator))
  File "/home/user/.conda/envs/spinningup_nzh/lib/python3.6/multiprocessing/connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/home/user/.conda/envs/spinningup_nzh/lib/python3.6/multiprocessing/connection.py", line 397, in _send_bytes
    self._send(header)
  File "/home/user/.conda/envs/spinningup_nzh/lib/python3.6/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe
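
For context, an EOFError from recv() and a broken-pipe error from send() are what a multiprocessing connection typically raises once the other end has gone away. That would be consistent with a process having exited when memory ran out, although the log above does not show which process that was, so this is an assumption. A minimal standalone sketch (plain multiprocessing, not HandyRL code; the function name is made up for illustration) that shows the same kind of failure on a Unix-like system:

```python
from multiprocessing import Pipe


def errors_after_peer_closed():
    """Return the names of the exceptions raised once the other end is gone."""
    ours, theirs = Pipe()  # duplex connection pair
    theirs.close()         # stand-in for the peer process exiting
    seen = []
    try:
        ours.recv()        # nothing left to read: the connection reports EOF
    except EOFError as exc:
        seen.append(type(exc).__name__)
    try:
        ours.send("episode")
    except OSError as exc:  # BrokenPipeError is a subclass of OSError
        seen.append(type(exc).__name__)
    return seen


if __name__ == "__main__":
    print(errors_after_peer_closed())
```

If this reading is right, the two tracebacks are a symptom of the memory problem rather than a separate bug in the connection code.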

Because my machine has only 32 GB of memory, I tried setting max_episodes to 200000. When the number of samples reaches 200,000, memory usage is about 95%, but the code keeps occupying more CPU memory as sampling continues. In my opinion, once the number of samples exceeds max_episodes, the replay buffer should delete the oldest samples, so I don't know why memory still increases.
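
To make the expectation concrete, here is a minimal sketch of the behavior I assumed for a buffer capped at max_episodes. It is not HandyRL's actual implementation; the class name BoundedEpisodeBuffer and the episode dicts are hypothetical and only illustrate the idea that the oldest episode is dropped once the cap is reached.

```python
from collections import deque


class BoundedEpisodeBuffer:
    """Hypothetical replay buffer that keeps at most max_episodes episodes."""

    def __init__(self, max_episodes):
        # A deque with maxlen discards the oldest item on append when full.
        self.episodes = deque(maxlen=max_episodes)

    def add(self, episode):
        self.episodes.append(episode)

    def __len__(self):
        return len(self.episodes)


buf = BoundedEpisodeBuffer(max_episodes=3)
for i in range(10):
    buf.add({"id": i})  # placeholder episode data

print(len(buf))                         # number of episodes currently held
print([e["id"] for e in buf.episodes])  # ids of the episodes that remain
```

With a buffer like this, the episode count stops growing at max_episodes, so I would expect memory use to level off unless individual episodes get larger or something else holds references to the old ones.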

Could you give me some ideas? How can I solve this problem without having to restart training? Thanks very much!
