feature(yzj): add multi-agent and structured observation env (GoBigger) #39
jayyoung0802 wants to merge 59 commits into opendilab:main from
Conversation
    discrete_action_encoding_type: str = 'one_hot',
    norm_type: Optional[str] = 'BN',
    res_connection_in_dynamics: bool = False,
    state_encoder=None,
Add a type hint for `state_encoder` and a corresponding argument comment.
You can refer to the prompts at https://aicarrier.feishu.cn/wiki/N4bqwLRO5iyQcAkb4HCcflbgnpR to improve the docstrings.
    beg_index = observation_shape * step_i
    end_index = observation_shape * (step_i + self._cfg.model.frame_stack_num)
    obs_target_batch_new[k] = v[:, beg_index:end_index]
    network_output = self._learn_model.initial_inference(obs_target_batch_new)
The handling of structured observations above could perhaps be extracted into a function.
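A minimal sketch of such a helper, assuming the batch is a dict of `(B, obs_dim * steps)` arrays; the function name `slice_stacked_obs` is hypothetical:

```python
import numpy as np

def slice_stacked_obs(obs_batch, observation_shape, step_i, frame_stack_num):
    # Select the frame-stacked window for unroll step `step_i` out of every
    # entry of a structured observation batch (dict of (B, obs_dim * steps) arrays).
    beg_index = observation_shape * step_i
    end_index = observation_shape * (step_i + frame_stack_num)
    return {k: v[:, beg_index:end_index] for k, v in obs_batch.items()}

# Tiny demo on a fake structured batch: 2 samples, obs_dim 3, 4 stacked steps.
batch = {'agent_state': np.arange(24).reshape(2, 12)}
window = slice_stacked_obs(batch, observation_shape=3, step_i=1, frame_stack_num=2)
```

The same helper could then serve both the target-batch preparation and any other place that slices stacked observations.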
zoo/petting_zoo/model/model.py
Outdated
    self.encoder = FCEncoder(obs_shape=18, hidden_size_list=[256, 256], activation=nn.ReLU(), norm_type=None)

    def forward(self, x):
        x = x['agent_state']
Add a comment: why `agent_state`? Which keys does `x` contain, and what does each of them mean?
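A sketch of what such a documented encoder could look like. The key set (`agent_state`, `global_state`, `action_mask`) and the class name `AgentEncoder` are assumptions here, not confirmed from the PR:

```python
import torch
import torch.nn as nn

class AgentEncoder(nn.Module):
    """Encode only the per-agent local observation.

    ``x`` is assumed to be the dict produced by the env wrapper, with keys:
      - 'agent_state':  (B, 18) local features of the current agent (used here);
      - 'global_state': shared global features (ignored by this local encoder);
      - 'action_mask':  legal-action mask (consumed later by MCTS, not here).
    """

    def __init__(self):
        super().__init__()
        # Plain MLP stand-in for FCEncoder(obs_shape=18, hidden_size_list=[256, 256]).
        self.encoder = nn.Sequential(nn.Linear(18, 256), nn.ReLU(), nn.Linear(256, 256), nn.ReLU())

    def forward(self, x):
        # Only the agent-local slice feeds the policy/value trunk.
        return self.encoder(x['agent_state'])

out = AgentEncoder()({
    'agent_state': torch.zeros(4, 18),
    'global_state': torch.zeros(4, 30),
    'action_mask': torch.ones(4, 5),
})
```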
    from pettingzoo.mpe._mpe_utils.simple_env import SimpleEnv, make_env
    from pettingzoo.mpe.simple_spread.simple_spread import Scenario
    from PIL import Image
    import pygame
    tmp[k] = v[i]
    tmp['action_mask'] = [1 for _ in range(*self._action_dim)]
    ret_transform.append(tmp)
    return {'observation': ret_transform, 'action_mask': action_mask, 'to_play': to_play}
Put the detailed comment about 'observation' in the overview of the `_process_obs()` method.
    last_game_priorities = [[None for _ in range(agent_num)] for _ in range(env_nums)]
    # for priorities in self-play
    search_values_lst = [[[] for _ in range(agent_num)] for _ in range(env_nums)]
    pred_values_lst = [[[] for _ in range(agent_num)] for _ in range(env_nums)]
Code segments like this that appear multiple times could perhaps be abstracted into a utility function of the class.
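One way to factor out the repeated nested-list initialization, as a sketch; the name `init_per_agent_lists` is hypothetical:

```python
def init_per_agent_lists(env_nums, agent_num, fill=list):
    # Build an env_nums x agent_num nested structure, calling `fill` once per
    # slot so that mutable entries (e.g. lists) are not shared between slots.
    return [[fill() for _ in range(agent_num)] for _ in range(env_nums)]

search_values_lst = init_per_agent_lists(3, 2)                    # fresh [] per slot
last_game_priorities = init_per_agent_lists(3, 2, fill=lambda: None)
```

Calling `fill()` per slot (rather than reusing one default value) matters for the list case: it avoids the classic aliasing bug where every agent appends into the same shared list.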
zoo/petting_zoo/config/__init__.py
Outdated
    @@ -0,0 +1 @@
    from .ptz_simple_spread_ez_config import main_config, create_config
Replacing `petting_zoo` with `pettingzoo` throughout lz would perhaps be more concise.
    self.base_idx = 0
    self.clear_time = 0

    self.tmp_obs = None  # for value obs list [46 + 4(td_step)] not < 50(game_segment)
    m_obs = value_obs_list[beg_index:end_index]
    m_obs = sum(m_obs, [])
    m_obs = default_collate(m_obs)
    m_obs = to_device(m_obs, self._cfg.device)
    discrete_action_encoding_type: str = 'one_hot',
    norm_type: Optional[str] = 'BN',
    res_connection_in_dynamics: bool = False,
    state_encoder=None,
You can refer to the prompts at https://aicarrier.feishu.cn/wiki/N4bqwLRO5iyQcAkb4HCcflbgnpR to improve the docstrings.
    """
    Overview:
        The policy class for Multi Agent EfficientZero.
    """
Explain how the current multi-agent algorithm differs from the single-agent algorithm, and give an overview of the current independent-learning implementation.
    )
    # NOTE: Convert the ``action_index_in_legal_action_set`` to the corresponding ``action`` in the entire action set.
    action = np.where(action_mask[i] == 1.0)[0][action_index_in_legal_action_set]
    output[i // agent_num]['action'].append(action)
    """
    Overview:
        The policy class for Multi Agent MuZero.
    """
    from ding.utils import ENV_REGISTRY, deep_merge_dicts
    import math
    from easydict import EasyDict
    try:
Could you add a link to the original GoBigger repository, and describe how this version differs from it?
Added the link in the try/except block.
    main_config = dict(
        exp_name=
        f'data_mz_ctree/{env_name}_muzero_ns{num_simulations}_upc{update_per_collect}_rr{reanalyze_ratio}_seed{seed}',
How does `ptz_simple_spread_mz` perform at the moment? If the performance is not good, please remove the ptz-related parts first.
    max_env_step: Optional[int] = int(1e10),
    ) -> 'Policy':  # noqa
    """
    Overview:
Because the encoder needs to be passed in separately.
    @@ -47,12 +47,12 @@ def train_muzero(
    """
Please merge the main branch and add the relevant mz/ez baseline results to the PR description. Then, once polished, create a new `multi-agent` branch, push it to opendilab/lightzero, and note in this PR that the latest stable code lives on the `multi-agent` branch.
No description provided.