
Commit ef2a358

Jinyu-W, ysqyang, yaqiu, and Jinyu Wang
authored
V0.3: Upgrade RL Workflow; Add RL Benchmarks; Update Package Version (#588)
* call policy update only for AbsCorePolicy * add limitation of AbsCorePolicy in Actor.collect() * refined actor to return only experiences for policies that received new experiences * fix MsgKey issue in rollout_manager * fix typo in learner * call exit function for parallel rollout manager * update supply chain example distributed training scripts * 1. moved exploration scheduling to rollout manager; 2. fixed bug in lr schedule registration in core model; 3. added parallel policy manager prorotype * reformat render * fix supply chain business engine action type problem * reset supply chain example render figsize from 4 to 3 * Add render to all modes of supply chain example * fix or policy typos * 1. added parallel policy manager prototype; 2. used training ep for evaluation episodes * refined parallel policy manager * updated rl/__init__/py * fixed lint issues and CIM local learner bugs * deleted unwanted supply_chain test files * revised default config for cim-dqn * removed test_store.py as it is no longer needed * 1. changed Actor class to rollout_worker function; 2. renamed algorithm to algorithms * updated figures * removed unwanted import * refactored CIM-DQN example * added MultiProcessRolloutManager and MultiProcessTrainingManager * updated doc * lint issue fix * lint issue fix * fixed import formatting * [Feature] Prioritized Experience Replay (#355) * added prioritized experience replay * deleted unwanted supply_chain test files * fixed import order * import fix * fixed lint issues * fixed import formatting * added note in docstring that rank-based PER has yet to be implemented Co-authored-by: ysqyang <[email protected]> * rm AbsDecisionGenerator * small fixes * bug fix * reorganized training folder structure * fixed lint issues * fixed lint issues * policy manager refined * lint fix * restructured CIM-dqn sync code * added policy version index and used it as a measure of experience staleness * lint issue fix * lint issue fix * switched log_dir and proxy_kwargs order * cim example refinement * eval schedule sorted only when it's a list * eval schedule sorted only when it's a list * update sc env wrapper * added docker scripts for cim-dqn * refactored example folder structure and added workflow templates * fixed lint issues * fixed lint issues * fixed template bugs * removed unused imports * refactoring sc in progress * simplified cim meta * fixed build.sh path bug * template refinement * deleted obsolete svgs * updated learner logs * minor edits * refactored templates for easy merge with async PR * added component names for rollout manager and policy manager * fixed incorrect position to add last episode to eval schedule * added max_lag option in templates * formatting edit in docker_compose_yml script * moved local learner and early stopper outside sync_tools * refactored rl toolkit folder structure * refactored rl toolkit folder structure * moved env_wrapper and agent_wrapper inside rl/learner * refined scripts * fixed typo in script * changes needed for running sc * removed unwanted imports * config change for testing sc scenario * changes for perf testing * Asynchronous Training (#364) * remote inference code draft * changed actor to rollout_worker and updated init files * removed unwanted import * updated inits * more async code * added async scripts * added async training code & scripts for CIM-dqn * changed async to async_tools to avoid conflict with python keyword * reverted unwanted change to dockerfile * added doc for policy server * addressed PR comments and fixed a 
bug in docker_compose_yml.py * fixed lint issue * resolved PR comment * resolved merge conflicts * added async templates * added proxy.close() for actor and policy_server * fixed incorrect position to add last episode to eval schedule * reverted unwanted changes * added missing async files * rm unwanted echo in kill.sh Co-authored-by: ysqyang <[email protected]> * renamed sync to synchronous and async to asynchronous to avoid conflict with keyword * added missing policy version increment in LocalPolicyManager * refined rollout manager recv logic * removed a debugging print * added sleep in distributed launcher to avoid hanging * updated api doc and rl toolkit doc * refined dynamic imports using importlib * 1. moved policy update triggers to policy manager; 2. added version control in policy manager * fixed a few bugs and updated cim RL example * fixed a few more bugs * added agent wrapper instantiation to workflows * added agent wrapper instantiation to workflows * removed abs_block and added max_prob option for DiscretePolicyNet and DiscreteACNet * fixed incorrect get_ac_policy signature for CIM * moved exploration inside core policy * added state to exploration call to support context-dependent exploration * separated non_rl_policy_index and rl_policy_index in workflows * modified sc example code according to workflow changes * modified sc example code according to workflow changes * added replay_agent_ids parameter to get_env_func for RL examples * fixed a few bugs * added maro/simulator/scenarios/supply_chain as bind mount * added post-step, post-collect, post-eval and post-update callbacks * fixed lint issues * fixed lint issues * moved instantiation of policy manager inside simple learner * fixed env_wrapper get_reward signature * minor edits * removed get_eperience kwargs from env_wrapper * 1. renamed step_callback to post_step in env_wrapper; 2. added get_eval_env_func to RL workflows * added rollout exp disribution option in RL examples * removed unwanted files * 1. made logger internal in learner; 2 removed logger creation in abs classes * checked out supply chain test files from v0.2_sc * 1. added missing model.eval() to choose_action; 2.added entropy features to AC * fixed a bug in ac entropy * abbreviated coefficient to coeff * removed -dqn from job name in rl example config * added tmp patch to dev.df * renamed image name for running rl examples * added get_loss interface for core policies * added policy manager in rl_toolkit.rst * 1. env_wrapper bug fix; 2. policy manager update logic refinement * refactored policy and algorithms * policy interface redesigned * refined policy interfaces * fixed typo * fixed bugs in refactored policy interface * fixed some bugs * refactoring in progress * policy interface and policy manager redesigned * 1. fixed bugs in ac and pg; 2. fixed bugs rl workflow scripts * fixed bug in distributed policy manager * fixed lint issues * fixed lint issues * added scipy in setup * 1. trimmed rollout manager code; 2. added option to docker scripts * updated api doc for policy manager * 1. simplified rl/learning code structure; 2. fixed bugs in rl example docker script * 1. simplified rl example structure; 2. 
fixed lint issues * further rl toolkit code simplifications * more numpy-based optimization in RL toolkit * moved replay buffer inside policy * bug fixes * numpy optimization and associated refactoring * extracted shaping logic out of env_sampler * fixed bug in CIM shaping and lint issues * preliminary implemetation of parallel batch inference * fixed bug in ddpg transition recording * put get_state, get_env_actions, get_reward back in EnvSampler * simplified exploration and core model interfaces * bug fixes and doc update * added improve() interface for RLPolicy for single-thread support * fixed simple policy manager bug * updated doc, rst, notebook * updated notebook * fixed lint issues * fixed entropy bugs in ac.py * reverted to simple policy manager as default * 1. unified single-thread and distributed mode in learning_loop.py; 2. updated api doc for algorithms and rst for rl toolkit * fixed lint issues and updated rl toolkit images * removed obsolete images * added back agent2policy for general workflow use * V0.2 rl refinement dist (#377) * Support `slice` operation in ExperienceSet * Support naive distributed policy training by proxy * Dynamically allocate trainers according to number of experience * code check * code check * code check * Fix a bug in distributed trianing with no gradient * Code check * Move Back-Propagation from trainer to policy_manager and extract trainer-allocation strategy * 1.call allocate_trainer() at first of update(); 2.refine according to code review * Code check * Refine code with new interface * Update docs of PolicyManger and ExperienceSet * Add images for rl_toolkit docs * Update diagram of PolicyManager * Refine with new interface * Extract allocation strategy into `allocation_strategy.py` * add `distributed_learn()` in policies for data-parallel training * Update doc of RL_toolkit * Add gradient workers for data-parallel * Refine code and update docs * Lint check * Refine by comments * Rename `trainer` to `worker` * Rename `distributed_learn` to `learn_with_data_parallel` * Refine allocator and remove redundant code in policy_manager * remove arugments in allocate_by_policy and so on * added checkpointing for simple and multi-process policy managers * 1. bug fixes in checkpointing; 2. removed version and max_lag in rollout manager * added missing set_state and get_state for CIM policies * removed blank line * updated RL workflow README * Integrate `data_parallel` arguments into `worker_allocator` (#402) * 1. simplified workflow config; 2. added comments to CIM shaping * lint issue fix * 1. added algorithm type setting in CIM config; 2. added try-except clause for initial policy state loading * 1. moved post_step callback inside env sampler; 2. 
updated README for rl workflows * refined READEME for CIM * VM scheduling with RL (#375) * added part of vm scheduling RL code * refined vm env_wrapper code style * added DQN * added get_experiences func for ac in vm scheduling * added post_step callback to env wrapper * moved Aiming's tracking and plotting logic into callbacks * added eval env wrapper * renamed AC config variable name for VM * vm scheduling RL code finished * updated README * fixed various bugs and hard coding for vm_scheduling * uncommented callbacks for VM scheduling * Minor revision for better code style * added part of vm scheduling RL code * refined vm env_wrapper code style * vm scheduling RL code finished * added config.py for vm scheduing * vm example refactoring * fixed bugs in vm_scheduling * removed unwanted files from cim dir * reverted to simple policy manager as default * added part of vm scheduling RL code * refined vm env_wrapper code style * vm scheduling RL code finished * added config.py for vm scheduing * resolved rebase conflicts * fixed bugs in vm_scheduling * added get_state and set_state to vm_scheduling policy models * updated README for vm_scheduling with RL Co-authored-by: yaqiu <[email protected]> Co-authored-by: Huoran Li <[email protected]> * SC refinement (#397) * Refine test scripts & pending_order_daily logic * Refactor code for better code style: complete type hint, correct typos, remove unused items. Refactor code for better code style: complete type hint, correct typos, remove unused items. * Polish test_supply_chain.py * update import format * Modify vehicle steps logic & remove outdated test case * Optimize imports * Optimize imports * Lint error * Lint error * Lint error * Add SupplyChainAction * Lint error Co-authored-by: Jinyu Wang <[email protected]> * refined workflow scripts * fixed bug in ParallelAgentWrapper * 1. fixed lint issues; 2. refined main script in workflows * lint issue fix * restored default config for rl example * Update rollout.py * refined env var processing in policy manager workflow * added hasattr check in agent wrapper * updated docker_compose_yml.py * Minor refinement * Minor PR. Prepare to merge latest master branch into v0.3 branch. 
(#412) * Prepare to merge master_mirror * Lint error * Minor * Merge latest master into v0.3 (#426) * update docker hub init (#367) * update docker hub init * replace personal account with maro-team * update hello files for CIM * update docker repository name * update docker file name * fix bugs in notebook, rectify docs * fix doc build issue * remove docs from playground; fix citibike lp example Event issue * update the exampel for vector env * update vector env example * update README due to PR comments * add link to playground above MARO installation in README * fix some typos Co-authored-by: Jinyu Wang <[email protected]> * update package version * update README for package description * update image links for pypi package description * update image links for pypi package description * change the input topology schema for CIM real data mode (#372) * change the input topology schema for CIM real data mode * remove unused importing * update test config file correspondingly * add Exception for env test * add cost factors to cim data dump * update CimDataCollection field name * update field name of data collection related code * update package version * adjust interface to reflect actual signature (#374) Co-authored-by: Jeremy Reynolds <[email protected]> * update dataclasses requirement to setup * fix: fixing spelling grammarr * fix: fix typo spelling code commented and data_model.rst * Fix Geo vis IP address & SQL logic bugs. (#383) Fix Geo vis IP address & SQL logic bugs (issue [352](https://github.com/microsoft/maro/issues/352) and [314](https://github.com/microsoft/maro/issues/314)). * Fix the "Wrong future stop tick predictions" bug (#386) * Propose my new solution Refine to the pre-process version . * Optimize import * Fix reset random seed bug (#387) * update the reset interface of Env and BE * Try to fix reset routes generation seed issue * Refine random related logics. * Minor refinement * Test check * Minor * Remove unused functions so far * Minor Co-authored-by: Jinyu Wang <[email protected]> * update package version * Add _init_vessel_plans in business_engine.reset (#388) * update package version * change the default solver used in Citibike OnlineLP example, from GLPK to CBC (#391) Co-authored-by: Jinyu Wang <[email protected]> * Refine `event_buffer/` module (#389) * Core & Business Engine code refinement (#392) * First version * Optimize imports * Add typehint * Lint check * Lint check * add higher python version (#398) * add higher python version * update pytorch version * update torchvision version Co-authored-by: Jinyu Wang <[email protected]> * CIM scenario refinement (#400) * Cim scenario refinement (#394) * CIM refinement * Fix lint error * Fix lint error * Cim test coverage (#395) * Enrich tests * Refactor CimDataGenerator * Refactor CIM parsers * Minor refinement * Fix lint error * Fix lint error * Fix lint error * Minor refactor * Type * Add two test file folders. Make a slight change to CIM BE. * Lint error * Lint error * Remove unnecessary public interfaces of CIM BE * Cim disable auto action type detection (#399) * Haven't been tested * Modify document * Add ActionType checking * Minor * Lint error * Action quantity should be a position number * Modify related docs & notebooks * Minor * Change test file name. Prepare to merge into master. * . 
* Minor test patch * Add `clear()` function to class `SimRandom` (#401) * Add SimRandom.clear() * Minor * Remove commented codes * Lint error * update package version * Minor * Remove docs/source/examples/multi_agent_dqn_cim.rst * Update .gitignore * Update .gitignore Co-authored-by: Jinyu-W <[email protected]> Co-authored-by: Jinyu Wang <[email protected]> Co-authored-by: Jinyu Wang <[email protected]> Co-authored-by: Jeremy Reynolds <[email protected]> Co-authored-by: Jeremy Reynolds <[email protected]> Co-authored-by: slowy07 <[email protected]> * Change `Env.set_seed()` logic (#456) * Change Env.set_seed() logic * Redesign CIM reset logic; fix lint issues; * Lint * Seed type assertion * Remove all SC related files (#473) * RL Toolkit V3 (#471) * added daemon=True for multi-process rollout, policy manager and inference * removed obsolete files * [REDO][PR#406]V0.2 rl refinement taskq (#408) * Add a usable task_queue * Rename some variables * 1. Add ; 2. Integrate related files; 3. Remove * merge `data_parallel` and `num_grad_workers` into `data_parallelism` * Fix bugs in docker_compose_yml.py and Simple/Multi-process mode. * Move `grad_worker` into marl/rl/workflows * 1.Merge data_parallel and num_workers into data_parallelism in config; 2.Assign recently used workers as possible in task_queue. * Refine code and update docs of `TaskQueue` * Support priority for tasks in `task_queue` * Update diagram of policy manager and task queue. * Add configurable `single_task_limit` and correct docstring about `data_parallelism` * Fix lint errors in `supply chain` * RL policy redesign (V2) (#405) * Drafi v2.0 for V2 * Polish models with more comments * Polish policies with more comments * Lint * Lint * Add developer doc for models. * Add developer doc for policies. * Remove policy manager V2 since it is not used and out-of-date * Lint * Lint * refined messy workflow code * merged 'scenario_dir' and 'scenario' in rl config * 1. refined env_sampler and agent_wrapper code; 2. added docstrings for env_sampler methods * 1. temporarily renamed RLPolicy from polivy_v2 to RLPolicyV2; 2. merged env_sampler and env_sampler_v2 * merged cim and cim_v2 * lint issue fix * refined logging logic * lint issue fix * reversed unwanted changes * . . . . ReplayMemory & IndexScheduler ReplayMemory & IndexScheduler . MultiReplayMemory get_actions_with_logps EnvSampler on the road EnvSampler Minor * LearnerManager * Use batch to transfer data & add SHAPE_CHECK_FLAG * Rename learner to trainer * Add property for policy._is_exploring * CIM test scenario for V3. Manual test passed. Next step: run it, make it works. * env_sampler.py could run * env_sampler refine on the way * First runnable version done * AC could run, but the result is bad. Need to check the logic * Refine abstract method & shape check error info. * Docs * Very detailed compare. Try again. * AC done * DQN check done * Minor * DDPG, not tested * Minors * A rough draft of MAAC * Cannot use CIM as the multi-agent scenario. * Minor * MAAC refinement on the way * Remove ActionWithAux * Refine batch & memory * MAAC example works * Reproduce-able fix. Policy share between env_sampler and trainer_manager. * Detail refinement * Simplify the user configed workflow * Minor * Refine example codes * Minor polishment * Migrate rollout_manager to V3 * Error on the way * Redesign torch.device management * Rl v3 maddpg (#418) * Add MADDPG trainer * Fit independent critics and shared critic modes. 
* Add a new property: num_policies * Lint * Fix a bug in `sum(rewards)` * Rename `MADDPG` to `DiscreteMADDPG` and fix type hint. * Rename maddpg in examples. * Preparation for data parallel (#420) * Preparation for data parallel * Minor refinement & lint fix * Lint * Lint * rename atomic_get_batch_grad to get_batch_grad * Fix a unexpected commit * distributed maddpg * Add critic worker * Minor * Data parallel related minorities * Refine code structure for trainers & add more doc strings * Revert a unwanted change * Use TrainWorker to do the actual calculations. * Some minor redesign of the worker's abstraction * Add set/get_policy_state_dict back * Refine set/get_policy_state_dict * Polish policy trainers move train_batch_size to abs trainer delete _train_step_impl() remove _record_impl remove unused methods a minor bug fix in maddpg * Rl v3 data parallel grad worker (#432) * Fit new `trainer_worker` in `grad_worker` and `task_queue`. * Add batch dispatch * Add `tensor_dict` for task submit interface * Move `_remote_learn` to `AbsTrainWorker`. * Complement docstring for task queue and trainer. * Rename train worker to train ops; add placeholder for abstract methods; * Lint Co-authored-by: GQ.Chen <[email protected]> * [DRAFT] distributed training pipeline based on RL Toolkit V3 (#450) * Preparation for data parallel * Minor refinement & lint fix * Lint * Lint * rename atomic_get_batch_grad to get_batch_grad * Fix a unexpected commit * distributed maddpg * Add critic worker * Minor * Data parallel related minorities * Refine code structure for trainers & add more doc strings * Revert a unwanted change * Use TrainWorker to do the actual calculations. * Some minor redesign of the worker's abstraction * Add set/get_policy_state_dict back * Refine set/get_policy_state_dict * Polish policy trainers move train_batch_size to abs trainer delete _train_step_impl() remove _record_impl remove unused methods a minor bug fix in maddpg * Rl v3 data parallel grad worker (#432) * Fit new `trainer_worker` in `grad_worker` and `task_queue`. * Add batch dispatch * Add `tensor_dict` for task submit interface * Move `_remote_learn` to `AbsTrainWorker`. * Complement docstring for task queue and trainer. * dsitributed training pipeline draft * added temporary test files for review purposes * Several code style refinements (#451) * Polish rl_v3/utils/ * Polish rl_v3/distributed/ * Polish rl_v3/policy_trainer/abs_trainer.py * fixed merge conflicts * unified sync and async interfaces * refactored rl_v3; refinement in progress * Finish the runnable pipeline under new design * Remove outdated files; refine class names; optimize imports; * Lint * Minor maddpg related refinement * Lint Co-authored-by: Default <[email protected]> Co-authored-by: Huoran Li <[email protected]> Co-authored-by: GQ.Chen <[email protected]> Co-authored-by: ysqyang <[email protected]> * Miner bug fix * Coroutine-related bug fix ("get_policy_state") (#452) * fixed rebase conflicts * renamed get_policy_func_dict to policy_creator * deleted unwanted folder * removed unwanted changes * resolved PR452 comments Co-authored-by: ysqyang <[email protected]> * Quick fix * Redesign experience recording logic (#453) * Two not important fix * Temp draft. 
Prepare to WFH * Done * Lint * Lint * Calculating advantages / returns (#454) * V1.0 * Complete DDPG * Rl v3 hanging issue fix (#455) * fixed rebase conflicts * renamed get_policy_func_dict to policy_creator * unified worker interfaces * recovered some files * dist training + cli code move * fixed bugs * added retry logic to client * 1. refactored CIM with various algos; 2. lint * lint * added type hint * removed some logs * lint * Make main.py more IDE friendly * Make main.py more IDE friendly * Lint * Final test & format. Ready to merge. Co-authored-by: ysqyang <[email protected]> Co-authored-by: yaqiu <[email protected]> Co-authored-by: Huoran Li <[email protected]> * Rl v3 parallel rollout (#457) * fixed rebase conflicts * renamed get_policy_func_dict to policy_creator * unified worker interfaces * recovered some files * dist training + cli code move * fixed bugs * added retry logic to client * 1. refactored CIM with various algos; 2. lint * lint * added type hint * removed some logs * lint * Make main.py more IDE friendly * Make main.py more IDE friendly * Lint * load balancing dispatcher * added parallel rollout * lint * Tracker variable type issue; rename to env_sampler_creator; * Rl v3 parallel rollout follow ups (#458) * AbsWorker & AbsDispatcher * Pass env idx to AbsTrainer.record() method, and let the trainer to decide how to record experiences sampled from different worlds. * Fix policy_creator reuse bug * Format code * Merge AbsTrainerManager & SimpleTrainerManager * AC test passed * Lint * Remove AbsTrainer.build() method. Put all initialization operations into __init__ * Redesign AC preprocess batches logic Co-authored-by: ysqyang <[email protected]> Co-authored-by: yaqiu <[email protected]> Co-authored-by: Huoran Li <[email protected]> * MADDPG performance bug fix (#459) * Fix MARL (MADDPG) terminal recording bug; some other minor refinements; * Restore Trainer.build() method * Calculate latest action in the get_actor_grad method in MADDPG. * Share critic bug fix * Rl v3 example update (#461) * updated vm_scheduling example and cim notebook * fixed bugs in vm_scheduling * added local train method * bug fix * modified async client logic to fix hidden issue * reverted to default config * fixed PR comments and some bugs * removed hardcode Co-authored-by: ysqyang <[email protected]> Co-authored-by: yaqiu <[email protected]> * Done (#462) * Rl v3 load save (#463) * added load/save feature * fixed some bugs * reverted unwanted changes * lint * fixed PR comments Co-authored-by: ysqyang <[email protected]> Co-authored-by: yaqiu <[email protected]> * RL Toolkit data parallelism revamp & config utils (#464) * added load/save feature * fixed some bugs * reverted unwanted changes * lint * fixed PR comments * 1. fixed data parallelism issue; 2. added config validator; 3. refactored cli local * 1. fixed rollout exit issue; 2. refined config * removed config file from example * fixed lint issues * fixed lint issues * added main.py under examples/rl * fixed lint issues Co-authored-by: ysqyang <[email protected]> Co-authored-by: yaqiu <[email protected]> * RL doc string (#465) * First rough draft * Minors * Reformat * Lint * Resolve PR comments * Rl type specific env getter (#466) * 1. type-sensitive env variable getter; 2. updated READMEs for examples * fixed bugs * fixed bugs * bug fixes * lint Co-authored-by: ysqyang <[email protected]> Co-authored-by: yaqiu <[email protected]> * Example bug fix * Optimize parser.py * Resolve PR comments * Rl config doc (#467) * 1. 
type-sensitive env variable getter; 2. updated READMEs for examples * added detailed doc * lint * wording refined * resolved some PR comments * resolved more PR comments * typo fix Co-authored-by: ysqyang <[email protected]> Co-authored-by: ysqyang <[email protected]> Co-authored-by: ysqyang <[email protected]> Co-authored-by: yaqiu <[email protected]> * RL online doc (#469) * Model, policy, trainer * RL workflows and env sampler doc in RST (#468) * First rough draft * Minors * Reformat * Lint * Resolve PR comments * 1. type-sensitive env variable getter; 2. updated READMEs for examples * Rl type specific env getter (#466) * 1. type-sensitive env variable getter; 2. updated READMEs for examples * fixed bugs * fixed bugs * bug fixes * lint Co-authored-by: ysqyang <[email protected]> Co-authored-by: yaqiu <[email protected]> * Example bug fix * Optimize parser.py * Resolve PR comments * added detailed doc * lint * wording refined * resolved some PR comments * rewriting rl toolkit rst * resolved more PR comments * typo fix * updated rst Co-authored-by: Huoran Li <[email protected]> Co-authored-by: Default <[email protected]> Co-authored-by: ysqyang <[email protected]> Co-authored-by: yaqiu <[email protected]> * Finish docs/source/key_components/rl_toolkit.rst * API doc * RL online doc image fix (#470) * resolved some PR comments * fix * fixed PR comments * added numfig=True setting in conf.py for sphinx Co-authored-by: ysqyang <[email protected]> * Resolve PR comments * Add example github link Co-authored-by: ysqyang <[email protected]> Co-authored-by: ysqyang <[email protected]> Co-authored-by: yaqiu <[email protected]> * Rl v3 pr comment resolution (#474) * added load/save feature * 1. resolved pr comments; 2. reverted maro/cli/k8s * fixed some bugs Co-authored-by: ysqyang <[email protected]> Co-authored-by: yaqiu <[email protected]> Co-authored-by: yaqiu <[email protected]> Co-authored-by: GQ.Chen <[email protected]> Co-authored-by: ysqyang <[email protected]> Co-authored-by: ysqyang <[email protected]> * RL renaming v2 (#476) * Change all Logger in RL to LoggerV2 * TrainerManager => TrainingManager * Add Trainer suffix to all algorithms * Finish docs * Update interface names * Minor fix * Cherry pick latest RL (#498) * Cherry pick * Remove SC related files * Cherry pick RL changes from `sc_refinement` (latest commit: `2a4869`) (#509) * Cherry pick RL changes from sc_refinement (2a4869) * Limit time display precision * RL incremental refactor (#501) * Refactor rollout logic. Allow multiple sampling in one epoch, so that we can generate more data for training. AC & PPO for continuous action policy; refine AC & PPO logic. Cherry pick RL changes from GYM-DDPG Cherry pick RL changes from GYM-SAC Minor error in doc string * Add min_n_sample in template and parser * Resolve PR comments. Fix a minor issue in SAC. * RL component bundle (#513) * CIM passed * Update workers * Refine annotations * VM passed * Code formatting. 
* Minor import loop issue * Pass batch in PPO again * Remove Scenario * Complete docs * Minor * Remove segment * Optimize logic in RLComponentBundle * Resolve PR comments * Move 'post methods from RLComponenetBundle to EnvSampler * Add method to get mapping of available tick to frame index (#415) * add method to get mapping of available tick to frame index * fix lint issue * fix naming issue * Cherry pick from sc_refinement (#527) * Cherry pick from sc_refinement * Cherry pick from sc_refinement * Refine `terminal` / `next_agent_state` logic (#531) * Optimize RL toolkit * Fix bug in terminal/next_state generation * Rewrite terminal/next_state logic again * Minor renaming * Minor bug fix * Resolve PR comments * Merge master into v0.3 (#536) * update docker hub init (#367) * update docker hub init * replace personal account with maro-team * update hello files for CIM * update docker repository name * update docker file name * fix bugs in notebook, rectify docs * fix doc build issue * remove docs from playground; fix citibike lp example Event issue * update the exampel for vector env * update vector env example * update README due to PR comments * add link to playground above MARO installation in README * fix some typos Co-authored-by: Jinyu Wang <[email protected]> * update package version * update README for package description * update image links for pypi package description * update image links for pypi package description * change the input topology schema for CIM real data mode (#372) * change the input topology schema for CIM real data mode * remove unused importing * update test config file correspondingly * add Exception for env test * add cost factors to cim data dump * update CimDataCollection field name * update field name of data collection related code * update package version * adjust interface to reflect actual signature (#374) Co-authored-by: Jeremy Reynolds <[email protected]> * update dataclasses requirement to setup * fix: fixing spelling grammarr * fix: fix typo spelling code commented and data_model.rst * Fix Geo vis IP address & SQL logic bugs. (#383) Fix Geo vis IP address & SQL logic bugs (issue [352](https://github.com/microsoft/maro/issues/352) and [314](https://github.com/microsoft/maro/issues/314)). * Fix the "Wrong future stop tick predictions" bug (#386) * Propose my new solution Refine to the pre-process version . * Optimize import * Fix reset random seed bug (#387) * update the reset interface of Env and BE * Try to fix reset routes generation seed issue * Refine random related logics. 
* Minor refinement * Test check * Minor * Remove unused functions so far * Minor Co-authored-by: Jinyu Wang <[email protected]> * update package version * Add _init_vessel_plans in business_engine.reset (#388) * update package version * change the default solver used in Citibike OnlineLP example, from GLPK to CBC (#391) Co-authored-by: Jinyu Wang <[email protected]> * Refine `event_buffer/` module (#389) * Core & Business Engine code refinement (#392) * First version * Optimize imports * Add typehint * Lint check * Lint check * add higher python version (#398) * add higher python version * update pytorch version * update torchvision version Co-authored-by: Jinyu Wang <[email protected]> * CIM scenario refinement (#400) * Cim scenario refinement (#394) * CIM refinement * Fix lint error * Fix lint error * Cim test coverage (#395) * Enrich tests * Refactor CimDataGenerator * Refactor CIM parsers * Minor refinement * Fix lint error * Fix lint error * Fix lint error * Minor refactor * Type * Add two test file folders. Make a slight change to CIM BE. * Lint error * Lint error * Remove unnecessary public interfaces of CIM BE * Cim disable auto action type detection (#399) * Haven't been tested * Modify document * Add ActionType checking * Minor * Lint error * Action quantity should be a position number * Modify related docs & notebooks * Minor * Change test file name. Prepare to merge into master. * . * Minor test patch * Add `clear()` function to class `SimRandom` (#401) * Add SimRandom.clear() * Minor * Remove commented codes * Lint error * update package version * add branch v0.3 to github workflow * update github test workflow * Update requirements.dev.txt (#444) Added the versions of dependencies and resolve some conflicts occurs when installing. By adding these version number it will tell you the exact. * Bump ipython from 7.10.1 to 7.16.3 in /notebooks (#460) Bumps [ipython](https://github.com/ipython/ipython) from 7.10.1 to 7.16.3. - [Release notes](https://github.com/ipython/ipython/releases) - [Commits](https://github.com/ipython/ipython/compare/7.10.1...7.16.3) --- updated-dependencies: - dependency-name: ipython dependency-type: direct:production ... 
Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Add & sort requirements.dev.txt Co-authored-by: Jinyu-W <[email protected]> Co-authored-by: Jinyu Wang <[email protected]> Co-authored-by: Jinyu Wang <[email protected]> Co-authored-by: Jeremy Reynolds <[email protected]> Co-authored-by: Jeremy Reynolds <[email protected]> Co-authored-by: slowy07 <[email protected]> Co-authored-by: solosilence <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Merge master into v0.3 (#545) * update docker hub init (#367) * update docker hub init * replace personal account with maro-team * update hello files for CIM * update docker repository name * update docker file name * fix bugs in notebook, rectify docs * fix doc build issue * remove docs from playground; fix citibike lp example Event issue * update the exampel for vector env * update vector env example * update README due to PR comments * add link to playground above MARO installation in README * fix some typos Co-authored-by: Jinyu Wang <[email protected]> * update package version * update README for package description * update image links for pypi package description * update image links for pypi package description * change the input topology schema for CIM real data mode (#372) * change the input topology schema for CIM real data mode * remove unused importing * update test config file correspondingly * add Exception for env test * add cost factors to cim data dump * update CimDataCollection field name * update field name of data collection related code * update package version * adjust interface to reflect actual signature (#374) Co-authored-by: Jeremy Reynolds <[email protected]> * update dataclasses requirement to setup * fix: fixing spelling grammarr * fix: fix typo spelling code commented and data_model.rst * Fix Geo vis IP address & SQL logic bugs. (#383) Fix Geo vis IP address & SQL logic bugs (issue [352](https://github.com/microsoft/maro/issues/352) and [314](https://github.com/microsoft/maro/issues/314)). * Fix the "Wrong future stop tick predictions" bug (#386) * Propose my new solution Refine to the pre-process version . * Optimize import * Fix reset random seed bug (#387) * update the reset interface of Env and BE * Try to fix reset routes generation seed issue * Refine random related logics. * Minor refinement * Test check * Minor * Remove unused functions so far * Minor Co-authored-by: Jinyu Wang <[email protected]> * update package version * Add _init_vessel_plans in business_engine.reset (#388) * update package version * change the default solver used in Citibike OnlineLP example, from GLPK to CBC (#391) Co-authored-by: Jinyu Wang <[email protected]> * Refine `event_buffer/` module (#389) * Core & Business Engine code refinement (#392) * First version * Optimize imports * Add typehint * Lint check * Lint check * add higher python version (#398) * add higher python version * update pytorch version * update torchvision version Co-authored-by: Jinyu Wang <[email protected]> * CIM scenario refinement (#400) * Cim scenario refinement (#394) * CIM refinement * Fix lint error * Fix lint error * Cim test coverage (#395) * Enrich tests * Refactor CimDataGenerator * Refactor CIM parsers * Minor refinement * Fix lint error * Fix lint error * Fix lint error * Minor refactor * Type * Add two test file folders. Make a slight change to CIM BE. 
* Lint error * Lint error * Remove unnecessary public interfaces of CIM BE * Cim disable auto action type detection (#399) * Haven't been tested * Modify document * Add ActionType checking * Minor * Lint error * Action quantity should be a position number * Modify related docs & notebooks * Minor * Change test file name. Prepare to merge into master. * . * Minor test patch * Add `clear()` function to class `SimRandom` (#401) * Add SimRandom.clear() * Minor * Remove commented codes * Lint error * update package version * add branch v0.3 to github workflow * update github test workflow * Update requirements.dev.txt (#444) Added the versions of dependencies and resolve some conflicts occurs when installing. By adding these version number it will tell you the exact. * Bump ipython from 7.10.1 to 7.16.3 in /notebooks (#460) Bumps [ipython](https://github.com/ipython/ipython) from 7.10.1 to 7.16.3. - [Release notes](https://github.com/ipython/ipython/releases) - [Commits](https://github.com/ipython/ipython/compare/7.10.1...7.16.3) --- updated-dependencies: - dependency-name: ipython dependency-type: direct:production ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * update github woorkflow config * MARO v0.3: a new design of RL Toolkit, CLI refactorization, and corresponding updates. (#539) * refined proxy coding style * updated images and refined doc * updated images * updated CIM-AC example * refined proxy retry logic * call policy update only for AbsCorePolicy * add limitation of AbsCorePolicy in Actor.collect() * refined actor to return only experiences for policies that received new experiences * fix MsgKey issue in rollout_manager * fix typo in learner * call exit function for parallel rollout manager * update supply chain example distributed training scripts * 1. moved exploration scheduling to rollout manager; 2. fixed bug in lr schedule registration in core model; 3. added parallel policy manager prorotype * reformat render * fix supply chain business engine action type problem * reset supply chain example render figsize from 4 to 3 * Add render to all modes of supply chain example * fix or policy typos * 1. added parallel policy manager prototype; 2. used training ep for evaluation episodes * refined parallel policy manager * updated rl/__init__/py * fixed lint issues and CIM local learner bugs * deleted unwanted supply_chain test files * revised default config for cim-dqn * removed test_store.py as it is no longer needed * 1. changed Actor class to rollout_worker function; 2. 
renamed algorithm to algorithms * updated figures * removed unwanted import * refactored CIM-DQN example * added MultiProcessRolloutManager and MultiProcessTrainingManager * updated doc * lint issue fix * lint issue fix * fixed import formatting * [Feature] Prioritized Experience Replay (#355) * added prioritized experience replay * deleted unwanted supply_chain test files * fixed import order * import fix * fixed lint issues * fixed import formatting * added note in docstring that rank-based PER has yet to be implemented Co-authored-by: ysqyang <[email protected]> * rm AbsDecisionGenerator * small fixes * bug fix * reorganized training folder structure * fixed lint issues * fixed lint issues * policy manager refined * lint fix * restructured CIM-dqn sync code * added policy version index and used it as a measure of experience staleness * lint issue fix * lint issue fix * switched log_dir and proxy_kwargs order * cim example refinement * eval schedule sorted only when it's a list * eval schedule sorted only when it's a list * update sc env wrapper * added docker scripts for cim-dqn * refactored example folder structure and added workflow templates * fixed lint issues * fixed lint issues * fixed template bugs * removed unused imports * refactoring sc in progress * simplified cim meta * fixed build.sh path bug * template refinement * deleted obsolete svgs * updated learner logs * minor edits * refactored templates for easy merge with async PR * added component names for rollout manager and policy manager * fixed incorrect position to add last episode to eval schedule * added max_lag option in templates * formatting edit in docker_compose_yml script * moved local learner and early stopper outside sync_tools * refactored rl toolkit folder structure * refactored rl toolkit folder structure * moved env_wrapper and agent_wrapper inside rl/learner * refined scripts * fixed typo in script * changes needed for running sc * removed unwanted imports * config change for testing sc scenario * changes for perf testing * Asynchronous Training (#364) * remote inference code draft * changed actor to rollout_worker and updated init files * removed unwanted import * updated inits * more async code * added async scripts * added async training code & scripts for CIM-dqn * changed async to async_tools to avoid conflict with python keyword * reverted unwanted change to dockerfile * added doc for policy server * addressed PR comments and fixed a bug in docker_compose_yml.py * fixed lint issue * resolved PR comment * resolved merge conflicts * added async templates * added proxy.close() for actor and policy_server * fixed incorrect position to add last episode to eval schedule * reverted unwanted changes * added missing async files * rm unwanted echo in kill.sh Co-authored-by: ysqyang <[email protected]> * renamed sync to synchronous and async to asynchronous to avoid conflict with keyword * added missing policy version increment in LocalPolicyManager * refined rollout manager recv logic * removed a debugging print * added sleep in distributed launcher to avoid hanging * updated api doc and rl toolkit doc * refined dynamic imports using importlib * 1. moved policy update triggers to policy manager; 2. 
added version control in policy manager * fixed a few bugs and updated cim RL example * fixed a few more bugs * added agent wrapper instantiation to workflows * added agent wrapper instantiation to workflows * removed abs_block and added max_prob option for DiscretePolicyNet and DiscreteACNet * fixed incorrect get_ac_policy signature for CIM * moved exploration inside core policy * added state to exploration call to support context-dependent exploration * separated non_rl_policy_index and rl_policy_index in workflows * modified sc example code according to workflow changes * modified sc example code according to workflow changes * added replay_agent_ids parameter to get_env_func for RL examples * fixed a few bugs * added maro/simulator/scenarios/supply_chain as bind mount * added post-step, post-collect, post-eval and post-update callbacks * fixed lint issues * fixed lint issues * moved instantiation of policy manager inside simple learner * fixed env_wrapper get_reward signature * minor edits * removed get_eperience kwargs from env_wrapper * 1. renamed step_callback to post_step in env_wrapper; 2. added get_eval_env_func to RL workflows * added rollout exp disribution option in RL examples * removed unwanted files * 1. made logger internal in learner; 2 removed logger creation in abs classes * checked out supply chain test files from v0.2_sc * 1. added missing model.eval() to choose_action; 2.added entropy features to AC * fixed a bug in ac entropy * abbreviated coefficient to coeff * removed -dqn from job name in rl example config * added tmp patch to dev.df * renamed image name for running rl examples * added get_loss interface for core policies * added policy manager in rl_toolkit.rst * 1. env_wrapper bug fix; 2. policy manager update logic refinement * refactored policy and algorithms * policy interface redesigned * refined policy interfaces * fixed typo * fixed bugs in refactored policy interface * fixed some bugs * refactoring in progress * policy interface and policy manager redesigned * 1. fixed bugs in ac and pg; 2. fixed bugs rl workflow scripts * fixed bug in distributed policy manager * fixed lint issues * fixed lint issues * added scipy in setup * 1. trimmed rollout manager code; 2. added option to docker scripts * updated api doc for policy manager * 1. simplified rl/learning code structure; 2. fixed bugs in rl example docker script * 1. simplified rl example structure; 2. fixed lint issues * further rl toolkit code simplifications * more numpy-based optimization in RL toolkit * moved replay buffer inside policy * bug fixes * numpy optimization and associated refactoring * extracted shaping logic out of env_sampler * fixed bug in CIM shaping and lint issues * preliminary implemetation of parallel batch inference * fixed bug in ddpg transition recording * put get_state, get_env_actions, get_reward back in EnvSampler * simplified exploration and core model interfaces * bug fixes and doc update * added improve() interface for RLPolicy for single-thread support * fixed simple policy manager bug * updated doc, rst, notebook * updated notebook * fixed lint issues * fixed entropy bugs in ac.py * reverted to simple policy manager as default * 1. unified single-thread and distributed mode in learning_loop.py; 2. 
updated api doc for algorithms and rst for rl toolkit * fixed lint issues and updated rl toolkit images * removed obsolete images * added back agent2policy for general workflow use * V0.2 rl refinement dist (#377) * Support `slice` operation in ExperienceSet * Support naive distributed policy training by proxy * Dynamically allocate trainers according to number of experience * code check * code check * code check * Fix a bug in distributed trianing with no gradient * Code check * Move Back-Propagation from trainer to policy_manager and extract trainer-allocation strategy * 1.call allocate_trainer() at first of update(); 2.refine according to code review * Code check * Refine code with new interface * Update docs of PolicyManger and ExperienceSet * Add images for rl_toolkit docs * Update diagram of PolicyManager * Refine with new interface * Extract allocation strategy into `allocation_strategy.py` * add `distributed_learn()` in policies for data-parallel training * Update doc of RL_toolkit * Add gradient workers for data-parallel * Refine code and update docs * Lint check * Refine by comments * Rename `trainer` to `worker` * Rename `distributed_learn` to `learn_with_data_parallel` * Refine allocator and remove redundant code in policy_manager * remove arugments in allocate_by_policy and so on * added checkpointing for simple and multi-process policy managers * 1. bug fixes in checkpointing; 2. removed version and max_lag in rollout manager * added missing set_state and get_state for CIM policies * removed blank line * updated RL workflow README * Integrate `data_parallel` arguments into `worker_allocator` (#402) * 1. simplified workflow config; 2. added comments to CIM shaping * lint issue fix * 1. added algorithm type setting in CIM config; 2. added try-except clause for initial policy state loading * 1. moved post_step callback inside env sampler; 2. updated README for rl workflows * refined READEME for CIM * VM scheduling with RL (#375) * added part of vm scheduling RL code * refined vm env_wrapper code style * added DQN * added get_experiences func for ac in vm scheduling * added post_step callback to env wrapper * moved Aiming's tracking and plotting logic into callbacks * added eval env wrapper * renamed AC config variable name for VM * vm scheduling RL code finished * updated README * fixed various bugs and hard coding for vm_scheduling * uncommented callbacks for VM scheduling * Minor revision for better code style * added part of vm scheduling RL code * refined vm env_wrapper code style * vm scheduling RL code finished * added config.py for vm scheduing * vm example refactoring * fixed bugs in vm_scheduling * removed unwanted files from cim dir * reverted to simple policy manager as default * added part of vm scheduling RL code * refined vm env_wrapper code style * vm scheduling RL code finished * added config.py for vm scheduing * resolved rebase conflicts * fixed bugs in vm_scheduling * added get_state and set_state to vm_scheduling policy models * updated README for vm_scheduling with RL Co-authored-by: yaqiu <[email protected]> Co-authored-by: Huoran Li <[email protected]> * SC refinement (#397) * Refine test scripts & pending_order_daily logic * Refactor code for better code style: complete type hint, correct typos, remove unused items. Refactor code for better code style: complete type hint, correct typos, remove unused items. 
* Polish test_supply_chain.py * update import format * Modify vehicle steps logic & remove outdated test case * Optimize imports * Optimize imports * Lint error * Lint error * Lint error * Add SupplyChainAction * Lint error Co-authored-by: Jinyu Wang <[email protected]> * refined workflow scripts * fixed bug in ParallelAgentWrapper * 1. fixed lint issues; 2. refined main script in workflows * lint issue fix * restored default config for rl example * Update rollout.py * refined env var processing in policy manager workflow * added hasattr check in agent wrapper * updated docker_compose_yml.py * Minor refinement * Minor PR. Prepare to merge latest master branch into v0.3 branch. (#412) * Prepare to merge master_mirror * Lint error * Minor * Merge latest master into v0.3 (#426) * update docker hub init (#367) * update docker hub init * replace personal account with maro-team * update hello files for CIM * update docker repository name * update docker file name * fix bugs in notebook, rectify docs * fix doc build issue * remove docs from playground; fix citibike lp example Event issue * update the exampel for vector env * update vector env example * update README due to PR comments * add link to playground above MARO installation in README * fix some typos Co-authored-by: Jinyu Wang <[email protected]> * update package version * update README for package description * update image links for pypi package description * update image links for pypi package description * change the input topology schema for CIM real data mode (#372) * change the input topology schema for CIM real data mode * remove unused importing * update test config file correspondingly * add Exception for env test * add cost factors to cim data dump * update CimDataCollection field name * update field name of data collection related code * update package version * adjust interface to reflect actual signature (#374) Co-authored-by: Jeremy Reynolds <[email protected]> * update dataclasses requirement to setup * fix: fixing spelling grammarr * fix: fix typo spelling code commented and data_model.rst * Fix Geo vis IP address & SQL logic bugs. (#383) Fix Geo vis IP address & SQL logic bugs (issue [352](https://github.com/microsoft/maro/issues/352) and [314](https://github.com/microsoft/maro/issues/314)). * Fix the "Wrong future stop tick predictions" bug (#386) * Propose my new solution Refine to the pre-process version . * Optimize import * Fix reset random seed bug (#387) * update the reset interface of Env and BE * Try to fix reset routes generation seed issue * Refine random related logics. 
* Minor refinement * Test check * Minor * Remove unused functions so far * Minor Co-authored-by: Jinyu Wang <[email protected]> * update package version * Add _init_vessel_plans in business_engine.reset (#388) * update package version * change the default solver used in Citibike OnlineLP example, from GLPK to CBC (#391) Co-authored-by: Jinyu Wang <[email protected]> * Refine `event_buffer/` module (#389) * Core & Business Engine code refinement (#392) * First version * Optimize imports * Add typehint * Lint check * Lint check * add higher python version (#398) * add higher python version * update pytorch version * update torchvision version Co-authored-by: Jinyu Wang <[email protected]> * CIM scenario refinement (#400) * Cim scenario refinement (#394) * CIM refinement * Fix lint error * Fix lint error * Cim test coverage (#395) * Enrich tests * Refactor CimDataGenerator * Refactor CIM parsers * Minor refinement * Fix lint error * Fix lint error * Fix lint error * Minor refactor * Type * Add two test file folders. Make a slight change to CIM BE. * Lint error * Lint error * Remove unnecessary public interfaces of CIM BE * Cim disable auto action type detection (#399) * Haven't been tested * Modify document * Add ActionType checking * Minor * Lint error * Action quantity should be a position number * Modify related docs & notebooks * Minor * Change test file name. Prepare to merge into master. * . * Minor test patch * Add `clear()` function to class `SimRandom` (#401) * Add SimRandom.clear() * Minor * Remove commented codes * Lint error * update package version * Minor * Remove docs/source/examples/multi_agent_dqn_cim.rst * Update .gitignore * Update .gitignore Co-authored-by: Jinyu-W <[email protected]> Co-authored-by: Jinyu Wang <[email protected]> Co-authored-by: Jinyu Wang <[email protected]> Co-authored-by: Jeremy Reynolds <[email protected]> Co-authored-by: Jeremy Reynolds <[email protected]> Co-authored-by: slowy07 <[email protected]> * Change `Env.set_seed()` logic (#456) * Change Env.set_seed() logic * Redesign CIM reset logic; fix lint issues; * Lint * Seed type assertion * Remove all SC related files (#473) * RL Toolkit V3 (#471) * added daemon=True for multi-process rollout, policy manager and inference * removed obsolete files * [REDO][PR#406]V0.2 rl refinement taskq (#408) * Add a usable task_queue * Rename some variables * 1. Add ; 2. Integrate related files; 3. Remove * merge `data_parallel` and `num_grad_workers` into `data_parallelism` * Fix bugs in docker_compose_yml.py and Simple/Multi-process mode. * Move `grad_worker` into marl/rl/workflows * 1.Merge data_parallel and num_workers into data_parallelism in config; 2.Assign recently used workers as possible in task_queue. * Refine code and update docs of `TaskQueue` * Support priority for tasks in `task_queue` * Update diagram of policy manager and task queue. * Add configurable `single_task_limit` and correct docstring about `data_parallelism` * Fix lint errors in `supply chain` * RL policy redesign (V2) (#405) * Drafi v2.0 for V2 * Polish models with more comments * Polish policies with more comments * Lint * Lint * Add developer doc for models. * Add developer doc for policies. * Remove policy manager V2 since it is not used and out-of-date * Lint * Lint * refined messy workflow code * merged 'scenario_dir' and 'scenario' in rl config * 1. refined env_sampler and agent_wrapper code; 2. added docstrings for env_sampler methods * 1. temporarily renamed RLPolicy from polivy_v2 to RLPolicyV2; 2. 
merged env_sampler and env_sampler_v2 * merged cim and cim_v2 * lint issue fix * refined logging logic * lint issue fix * reversed unwanted changes * . . . . ReplayMemory & IndexScheduler ReplayMemory & IndexScheduler . MultiReplayMemory get_actions_with_logps EnvSampler on the road EnvSampler Minor * LearnerManager * Use batch to transfer data & add SHAPE_CHECK_FLAG * Rename learner to trainer * Add property for policy._is_exploring * CIM test scenario for V3. Manual test passed. Next step: run it, make it works. * env_sampler.py could run * env_sampler refine on the way * First runnable version done * AC could run, but the result is bad. Need to check the logic * Refine abstract method & shape check error info. * Docs * Very detailed compare. Try again. * AC done * DQN check done * Minor * DDPG, not tested * Minors * A rough draft of MAAC * Cannot use CIM as the multi-agent scenario. * Minor * MAAC refinement on the way * Remove ActionWithAux * Refine batch & memory * MAAC example works * Reproduce-able fix. Policy share between env_sampler and trainer_manager. * Detail refinement * Simplify the user configed workflow * Minor * Refine example codes * Minor polishment * Migrate rollout_manager to V3 * Error on the way * Redesign torch.device management * Rl v3 maddpg (#418) * Add MADDPG trainer * Fit independent critics and shared critic modes. * Add a new property: num_policies * Lint * Fix a bug in `sum(rewards)` * Rename `MADDPG` to `DiscreteMADDPG` and fix type hint. * Rename maddpg in examples. * Preparation for data parallel (#420) * Preparation for data parallel * Minor refinement & lint fix * Lint * Lint * rename atomic_get_batch_grad to get_batch_grad * Fix a unexpected commit * distributed maddpg * Add critic worker * Minor * Data parallel related minorities * Refine code structure for trainers & add more doc strings * Revert a unwanted change * Use TrainWorker to do the actual calculations. * Some minor redesign of the worker's abstraction * Add set/get_policy_state_dict back * Refine set/get_policy_state_dict * Polish policy trainers move train_batch_size to abs trainer delete _train_step_impl() remove _record_impl remove unused methods a minor bug fix in maddpg * Rl v3 data parallel grad worker (#432) * Fit new `trainer_worker` in `grad_worker` and `task_queue`. * Add batch dispatch * Add `tensor_dict` for task submit interface * Move `_remote_learn` to `AbsTrainWorker`. * Complement docstring for task queue and trainer. * Rename train worker to train ops; add placeholder for abstract methods; * Lint Co-authored-by: GQ.Chen <[email protected]> * [DRAFT] distributed training pipeline based on RL Toolkit V3 (#450) * Preparation for data parallel * Minor refinement & lint fix * Lint * Lint * rename atomic_get_batch_grad to get_batch_grad * Fix a unexpected commit * distributed maddpg * Add critic worker * Minor * Data parallel related minorities * Refine code structure for trainers & add more doc strings * Revert a unwanted change * Use TrainWorker to do the actual calculations. * Some minor redesign of the worker's abstraction * Add set/get_policy_state_dict back * Refine set/get_policy_state_dict * Polish policy trainers move train_batch_size to abs trainer delete _train_step_impl() remove _record_impl remove unused methods a minor bug fix in maddpg * Rl v3 data parallel grad worker (#432) * Fit new `trainer_worker` in `grad_worker` and `task_queue`. * Add batch dispatch * Add `tensor_dict` for task submit interface * Move `_remote_learn` to `AbsTrainWorker`. 
* Complement docstring for task queue and trainer. * dsitributed training pipeline draft * added temporary test files for review purposes * Several code style refinements (#451) * Polish rl_v3/utils/ * Polish rl_v3/distributed/ * Polish rl_v3/policy_trainer/abs_trainer.py * fixed merge conflicts * unified sync and async interfaces * refactored rl_v3; refinement in progress * Finish the runnable pipeline under new design * Remove outdated files; refine class names; optimize imports; * Lint * Minor maddpg related refinement * Lint Co-authored-by: Default <[email protected]> Co-authored-by: Huoran Li <[email protected]> Co-authored-by: GQ.Chen <[email protected]> Co-authored-by: ysqyang <[email protected]> * Miner bug fix * Coroutine-related bug fix ("get_policy_state") (#452) * fixed rebase conflicts * renamed get_policy_func_dict to policy_creator * deleted unwanted folder * removed unwanted changes * resolved PR452 comments Co-authored-by: ysqyang <[email protected]> * Quick fix * Redesign experience recording logic (#453) * Two not important fix * Temp draft. Prepare to WFH * Done * Lint * Lint * Calculating advantages / returns (#454) * V1.0 * Complete DDPG * Rl v3 hanging issue fix (#455) * fixed rebase conflicts * renamed get_policy_func_dict to policy_creator * unified worker interfaces * recovered some files * dist training …
1 parent 94548c7 commit ef2a358


76 files changed: +2223 additions, −550 deletions

examples/cim/rl/algorithms/ac.py

Lines changed: 2 additions & 0 deletions
@@ -11,6 +11,7 @@
 actor_net_conf = {
     "hidden_dims": [256, 128, 64],
     "activation": torch.nn.Tanh,
+    "output_activation": torch.nn.Tanh,
     "softmax": True,
     "batch_norm": False,
     "head": True,
@@ -19,6 +20,7 @@
     "hidden_dims": [256, 128, 64],
     "output_dim": 1,
     "activation": torch.nn.LeakyReLU,
+    "output_activation": torch.nn.LeakyReLU,
     "softmax": False,
     "batch_norm": True,
     "head": True,

examples/cim/rl/algorithms/dqn.py

Lines changed: 1 addition & 0 deletions
@@ -12,6 +12,7 @@
 q_net_conf = {
     "hidden_dims": [256, 128, 64, 32],
     "activation": torch.nn.LeakyReLU,
+    "output_activation": torch.nn.LeakyReLU,
     "softmax": False,
     "batch_norm": True,
     "skip_connection": False,

examples/cim/rl/algorithms/maddpg.py

Lines changed: 2 additions & 0 deletions
@@ -14,6 +14,7 @@
 actor_net_conf = {
     "hidden_dims": [256, 128, 64],
     "activation": torch.nn.Tanh,
+    "output_activation": torch.nn.Tanh,
     "softmax": True,
     "batch_norm": False,
     "head": True,
@@ -22,6 +23,7 @@
     "hidden_dims": [256, 128, 64],
     "output_dim": 1,
     "activation": torch.nn.LeakyReLU,
+    "output_activation": torch.nn.LeakyReLU,
     "softmax": False,
     "batch_norm": True,
     "head": True,

examples/cim/rl/env_sampler.py

Lines changed: 20 additions & 6 deletions
@@ -90,11 +90,25 @@ def post_collect(self, info_list: list, ep: int) -> None:
         for info in info_list:
             print(f"env summary (episode {ep}): {info['env_metric']}")
 
-        # print the average env metric
-        if len(info_list) > 1:
-            metric_keys, num_envs = info_list[0]["env_metric"].keys(), len(info_list)
-            avg_metric = {key: sum(info["env_metric"][key] for info in info_list) / num_envs for key in metric_keys}
-            print(f"average env summary (episode {ep}): {avg_metric}")
+        # average env metric
+        metric_keys, num_envs = info_list[0]["env_metric"].keys(), len(info_list)
+        avg_metric = {key: sum(info["env_metric"][key] for info in info_list) / num_envs for key in metric_keys}
+        print(f"average env summary (episode {ep}): {avg_metric}")
+
+        self.metrics.update(avg_metric)
+        self.metrics = {k: v for k, v in self.metrics.items() if not k.startswith("val/")}
 
     def post_evaluate(self, info_list: list, ep: int) -> None:
-        self.post_collect(info_list, ep)
+        # print the env metric from each rollout worker
+        for info in info_list:
+            print(f"env summary (episode {ep}): {info['env_metric']}")
+
+        # average env metric
+        metric_keys, num_envs = info_list[0]["env_metric"].keys(), len(info_list)
+        avg_metric = {key: sum(info["env_metric"][key] for info in info_list) / num_envs for key in metric_keys}
+        print(f"average env summary (episode {ep}): {avg_metric}")
+
+        self.metrics.update({"val/" + k: v for k, v in avg_metric.items()})
+
+    def monitor_metrics(self) -> float:
+        return -self.metrics["val/container_shortage"]
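The reworked hooks above cache averaged metrics on the sampler: training metrics are stored as-is, evaluation metrics under a `val/` prefix, and `monitor_metrics()` exposes a single scalar (negated container shortage, so larger is better) for the workflow to track. A hypothetical driver loop, purely for illustration (the method names and checkpoint logic here are assumptions, not MARO's actual workflow code):

def training_loop(env_sampler, num_episodes: int, eval_schedule: int) -> None:
    best_score = float("-inf")
    for ep in range(1, num_episodes + 1):
        env_sampler.sample()  # roll-out; policy updates happen elsewhere
        if ep % eval_schedule == 0:
            env_sampler.eval()  # runs post_evaluate, which fills the "val/" metrics
            score = env_sampler.monitor_metrics()  # e.g. -container_shortage for CIM
            if score > best_score:
                best_score = score
                print(f"new best model at episode {ep}: {score:.4f}")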

examples/cim/rl/rl_component_bundle.py

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@
 
 # Environments
 learn_env = Env(**env_conf)
-test_env = learn_env
+test_env = Env(**env_conf)
 
 # Agent, policy, and trainers
 num_agents = len(learn_env.agent_idx_list)

examples/rl/README.md

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ This folder contains scenarios that employ reinforcement learning. MARO's RL too
 The entrance of a RL workflow is a YAML config file. For readers' convenience, we call this config file `config.yml` in the rest part of this doc. `config.yml` specifies the path of all necessary resources, definitions, and configurations to run the job. MARO provides a comprehensive template of the config file with detailed explanations (`maro/maro/rl/workflows/config/template.yml`). Meanwhile, MARO also provides several simple examples of `config.yml` under the current folder.
 
 There are two ways to start the RL job:
-- If you only need to have a quick look and try to start an out-of-box workflow, just run `python .\examples\rl\run_rl_example.py PATH_TO_CONFIG_YAML`. For example, `python .\examples\rl\run_rl_example.py .\examples\rl\cim.yml` will run the complete example RL training workflow of CIM scenario. If you only want to run the evaluation workflow, you could start the job with `--evaluate_only`.
+- If you only need to have a quick look and try to start an out-of-box workflow, just run `python .\examples\rl\run.py PATH_TO_CONFIG_YAML`. For example, `python .\examples\rl\run.py .\examples\rl\cim.yml` will run the complete example RL training workflow of CIM scenario. If you only want to run the evaluation workflow, you could start the job with `--evaluate_only`.
 - (**Require install MARO from source**) You could also start the job through MARO CLI. Use the command `maro local run [-c] path/to/your/config` to run in containerized (with `-c`) or non-containerized (without `-c`) environments. Similar, you could add `--evaluate_only` if you only need to run the evaluation workflow.
 
 ## Create Your Own Scenarios

examples/rl/cim.yml

Lines changed: 5 additions & 4 deletions
@@ -5,16 +5,17 @@
 # Please refer to `maro/rl/workflows/config/template.yml` for the complete template and detailed explanations.
 
 # Run this workflow by executing one of the following commands:
-# - python .\examples\rl\run_rl_example.py .\examples\rl\cim.yml
-# - (Requires installing MARO from source) maro local run .\examples\rl\cim.yml
+# - python ./examples/rl/run.py ./examples/rl/cim.yml
+# - (Requires installing MARO from source) maro local run ./examples/rl/cim.yml
 
 job: cim_rl_workflow
 scenario_path: "examples/cim/rl"
-log_path: "log/rl_job/cim.txt"
+log_path: "log/cim_rl/"
 main:
   num_episodes: 30 # Number of episodes to run. Each episode is one cycle of roll-out and training.
   num_steps: null
   eval_schedule: 5
+  early_stop_patience: 5
   logging:
     stdout: INFO
     file: DEBUG
@@ -27,7 +28,7 @@ training:
   load_path: null
   load_episode: null
   checkpointing:
-    path: "checkpoint/rl_job/cim"
+    path: "log/cim_rl/checkpoints"
     interval: 5
   logging:
     stdout: INFO
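The new `early_stop_patience: 5` entry presumably stops the job when the monitored metric (see `monitor_metrics()` in the env sampler above) has not improved for five consecutive evaluations. A minimal sketch of such patience logic, assuming that interpretation:

class EarlyStopper:
    """Stop when the monitored score has not improved for `patience` evaluations."""

    def __init__(self, patience: int) -> None:
        self._patience = patience
        self._best = float("-inf")
        self._bad_count = 0

    def should_stop(self, score: float) -> bool:
        if score > self._best:
            self._best, self._bad_count = score, 0  # improvement resets the counter
        else:
            self._bad_count += 1
        return self._bad_count >= self._patience


stopper = EarlyStopper(patience=5)  # matches early_stop_patience: 5 above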

examples/rl/cim_distributed.yml

Lines changed: 5 additions & 5 deletions
@@ -1,16 +1,16 @@
 # Copyright (c) Microsoft Corporation.
 # Licensed under the MIT license.
 
-# Example RL config file for CIM scenario.
+# Example RL config file for CIM scenario (distributed version).
 # Please refer to `maro/rl/workflows/config/template.yml` for the complete template and detailed explanations.
 
 # Run this workflow by executing one of the following commands:
-# - python .\examples\rl\run_rl_example.py .\examples\rl\cim.yml
-# - (Requires installing MARO from source) maro local run .\examples\rl\cim.yml
+# - python ./examples/rl/run.py ./examples/rl/cim_distributed.yml
+# - (Requires installing MARO from source) maro local run ./examples/rl/cim_distributed.yml
 
 job: cim_rl_workflow
 scenario_path: "examples/cim/rl"
-log_path: "log/rl_job/cim.txt"
+log_path: "log/cim_rl/"
 main:
   num_episodes: 30 # Number of episodes to run. Each episode is one cycle of roll-out and training.
   num_steps: null
@@ -35,7 +35,7 @@ training:
   load_path: null
   load_episode: null
   checkpointing:
-    path: "checkpoint/rl_job/cim"
+    path: "log/cim_rl/checkpoints"
     interval: 5
   proxy:
     host: "127.0.0.1"
File renamed without changes.

examples/rl/vm_scheduling.yml

Lines changed: 4 additions & 4 deletions
@@ -5,12 +5,12 @@
 # Please refer to `maro/rl/workflows/config/template.yml` for the complete template and detailed explanations.
 
 # Run this workflow by executing one of the following commands:
-# - python .\examples\rl\run_rl_example.py .\examples\rl\vm_scheduling.yml
-# - (Requires installing MARO from source) maro local run .\examples\rl\vm_scheduling.yml
+# - python ./examples/rl/run.py ./examples/rl/vm_scheduling.yml
+# - (Requires installing MARO from source) maro local run ./examples/rl/vm_scheduling.yml
 
 job: vm_scheduling_rl_workflow
 scenario_path: "examples/vm_scheduling/rl"
-log_path: "log/rl_job/vm_scheduling.txt"
+log_path: "log/vm_rl/"
 main:
   num_episodes: 30 # Number of episodes to run. Each episode is one cycle of roll-out and training.
   num_steps: null
@@ -27,7 +27,7 @@ training:
   load_path: null
   load_episode: null
   checkpointing:
-    path: "checkpoint/rl_job/vm_scheduling"
+    path: "log/vm_rl/checkpoints"
     interval: 5
   logging:
     stdout: INFO

examples/vm_scheduling/rl/algorithms/ac.py

Lines changed: 2 additions & 0 deletions
@@ -11,6 +11,7 @@
 actor_net_conf = {
     "hidden_dims": [64, 32, 32],
     "activation": torch.nn.LeakyReLU,
+    "output_activation": torch.nn.LeakyReLU,
     "softmax": True,
     "batch_norm": False,
     "head": True,
@@ -19,6 +20,7 @@
 critic_net_conf = {
     "hidden_dims": [256, 128, 64],
     "activation": torch.nn.LeakyReLU,
+    "output_activation": torch.nn.LeakyReLU,
     "softmax": False,
     "batch_norm": False,
     "head": True,

examples/vm_scheduling/rl/algorithms/dqn.py

Lines changed: 1 addition & 0 deletions
@@ -14,6 +14,7 @@
 q_net_conf = {
     "hidden_dims": [64, 128, 256],
     "activation": torch.nn.LeakyReLU,
+    "output_activation": torch.nn.LeakyReLU,
     "softmax": False,
     "batch_norm": False,
     "skip_connection": False,

maro/__misc__.py

Lines changed: 1 addition & 1 deletion
@@ -2,6 +2,6 @@
 # Licensed under the MIT license.
 
 
-__version__ = "0.3.1a2"
+__version__ = "0.3.2a1"
 
 __data_version__ = "0.2"

maro/cli/data_pipeline/citi_bike.py

Lines changed: 2 additions & 3 deletions
@@ -8,7 +8,6 @@
 from enum import Enum
 
 import geopy.distance
-import numpy as np
 import pandas as pd
 from yaml import safe_load
 
@@ -320,7 +319,7 @@ def _process_distance(self, station_info: pd.DataFrame):
             0,
             index=station_info["station_index"],
             columns=station_info["station_index"],
-            dtype=np.float,
+            dtype=float,
         )
         look_up_df = station_info[["latitude", "longitude"]]
         return distance_adj.apply(
@@ -617,7 +616,7 @@ def _gen_distance(self, station_init: pd.DataFrame):
             0,
             index=station_init["station_index"],
             columns=station_init["station_index"],
-            dtype=np.float,
+            dtype=float,
         )
         look_up_df = station_init[["latitude", "longitude"]]
         distance_df = distance_adj.apply(
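The `dtype=np.float` replacements above are a compatibility fix: `np.float` was only an alias of the builtin `float`, deprecated in NumPy 1.20 and removed in 1.24, so the alias raises an error on recent NumPy versions. The builtin is a drop-in replacement:

# Using the builtin float as the dtype still yields float64 columns.
import numpy as np
import pandas as pd

df = pd.DataFrame(0, index=range(3), columns=range(3), dtype=float)
assert df.dtypes.iloc[0] == np.float64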

maro/cli/local/commands.py

Lines changed: 3 additions & 2 deletions
@@ -61,7 +61,7 @@ def get_redis_conn(port=None):
 
 
 # Functions executed on CLI commands
-def run(conf_path: str, containerize: bool = False, evaluate_only: bool = False, **kwargs):
+def run(conf_path: str, containerize: bool = False, seed: int = None, evaluate_only: bool = False, **kwargs):
     # Load job configuration file
     parser = ConfigParser(conf_path)
     if containerize:
@@ -71,13 +71,14 @@ def run(conf_path: str, containerize: bool = False, evaluate_only: bool = False,
                 LOCAL_MARO_ROOT,
                 DOCKERFILE_PATH,
                 DOCKER_IMAGE_NAME,
+                seed=seed,
                 evaluate_only=evaluate_only,
             )
         except KeyboardInterrupt:
             stop_rl_job_with_docker_compose(parser.config["job"], LOCAL_MARO_ROOT)
     else:
         try:
-            start_rl_job(parser, LOCAL_MARO_ROOT, evaluate_only=evaluate_only)
+            start_rl_job(parser, LOCAL_MARO_ROOT, seed=seed, evaluate_only=evaluate_only)
         except KeyboardInterrupt:
             sys.exit(1)
 
maro/cli/local/utils.py

Lines changed: 9 additions & 3 deletions
@@ -4,7 +4,7 @@
 import os
 import subprocess
 from copy import deepcopy
-from typing import List
+from typing import List, Optional
 
 import docker
 import yaml
@@ -110,12 +110,15 @@ def exec(cmd: str, env: dict, debug: bool = False) -> subprocess.Popen:
 def start_rl_job(
     parser: ConfigParser,
     maro_root: str,
+    seed: Optional[int],
     evaluate_only: bool,
     background: bool = False,
 ) -> List[subprocess.Popen]:
     procs = [
         exec(
-            f"python {script}" + ("" if not evaluate_only else " --evaluate_only"),
+            f"python {script}"
+            + ("" if not evaluate_only else " --evaluate_only")
+            + ("" if seed is None else f" --seed {seed}"),
             format_env_vars({**env, "PYTHONPATH": maro_root}, mode="proc"),
             debug=not background,
         )
@@ -169,6 +172,7 @@ def start_rl_job_with_docker_compose(
     context: str,
     dockerfile_path: str,
     image_name: str,
+    seed: Optional[int],
     evaluate_only: bool,
 ) -> None:
     common_spec = {
@@ -185,7 +189,9 @@ def start_rl_job_with_docker_compose(
             **deepcopy(common_spec),
             **{
                 "container_name": component,
-                "command": f"python3 {script}" + ("" if not evaluate_only else " --evaluate_only"),
+                "command": f"python3 {script}"
+                + ("" if not evaluate_only else " --evaluate_only")
+                + ("" if seed is None else f" --seed {seed}"),
                 "environment": format_env_vars(env, mode="docker-compose"),
             },
         }
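With the changes above, `maro local run` can forward a seed to every job component by appending a `--seed` flag to the launch command. A hypothetical sketch of the receiving side, i.e. how a workflow entry script might parse the flag and seed the common RNGs (illustrative only; MARO's actual entry point may differ):

# Hypothetical receiving side of the "--seed" flag; names are assumptions.
import argparse
import random

import numpy as np
import torch


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser()
    parser.add_argument("--seed", type=int, default=None)
    parser.add_argument("--evaluate_only", action="store_true")
    return parser.parse_args()


def set_seed(seed: int) -> None:
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)  # seeds the CPU (and CUDA, if present) generators


if __name__ == "__main__":
    args = parse_args()
    if args.seed is not None:
        set_seed(args.seed)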

maro/rl/model/abs_net.py

Lines changed: 7 additions & 1 deletion
@@ -4,7 +4,7 @@
 from __future__ import annotations
 
 from abc import ABCMeta
-from typing import Any, Dict
+from typing import Any, Dict, Optional
 
 import torch.nn
 from torch.optim import Optimizer
@@ -18,6 +18,8 @@ class AbsNet(torch.nn.Module, metaclass=ABCMeta):
     def __init__(self) -> None:
         super(AbsNet, self).__init__()
 
+        self._device: Optional[torch.device] = None
+
     @property
     def optim(self) -> Optimizer:
         optim = getattr(self, "_optim", None)
@@ -119,3 +121,7 @@ def unfreeze_all_parameters(self) -> None:
         """Unfreeze all parameters."""
         for p in self.parameters():
             p.requires_grad = True
+
+    def to_device(self, device: torch.device) -> None:
+        self._device = device
+        self.to(device)
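`AbsNet` now remembers the device it was placed on via `to_device()`, so later tensor operations can be routed to the same device. A small self-contained sketch of the same pattern (a stand-in module, not an actual `AbsNet` subclass):

import torch


class TinyNet(torch.nn.Module):
    """Stand-in for an AbsNet subclass, mirroring the to_device() bookkeeping above."""

    def __init__(self) -> None:
        super().__init__()
        self._device = None
        self._fc = torch.nn.Linear(4, 2)

    def to_device(self, device: torch.device) -> None:
        self._device = device
        self.to(device)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self._fc(x.to(self._device))  # inputs follow the net to its device


net = TinyNet()
net.to_device(torch.device("cuda" if torch.cuda.is_available() else "cpu"))
out = net(torch.rand(8, 4))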

maro/rl/model/algorithm_nets/ac_based.py

Lines changed: 13 additions & 4 deletions
@@ -43,14 +43,23 @@ class ContinuousACBasedNet(ContinuousPolicyNet, metaclass=ABCMeta):
     - set_state(self, net_state: dict) -> None:
     """
 
-    def _get_actions_impl(self, states: torch.Tensor, exploring: bool) -> torch.Tensor:
-        actions, _ = self._get_actions_with_logps_impl(states, exploring)
+    def _get_actions_impl(self, states: torch.Tensor, exploring: bool, **kwargs) -> torch.Tensor:
+        actions, _ = self._get_actions_with_logps_impl(states, exploring, **kwargs)
         return actions
 
-    def _get_actions_with_probs_impl(self, states: torch.Tensor, exploring: bool) -> Tuple[torch.Tensor, torch.Tensor]:
+    def _get_actions_with_probs_impl(
+        self,
+        states: torch.Tensor,
+        exploring: bool,
+        **kwargs,
+    ) -> Tuple[torch.Tensor, torch.Tensor]:
         # Not used in Actor-Critic or PPO
         pass
 
-    def _get_states_actions_probs_impl(self, states: torch.Tensor, actions: torch.Tensor) -> torch.Tensor:
+    def _get_states_actions_probs_impl(self, states: torch.Tensor, actions: torch.Tensor, **kwargs) -> torch.Tensor:
+        # Not used in Actor-Critic or PPO
+        pass
+
+    def _get_random_actions_impl(self, states: torch.Tensor, **kwargs) -> torch.Tensor:
         # Not used in Actor-Critic or PPO
         pass
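With the stubs above, an Actor-Critic/PPO policy net only needs to implement `_get_actions_with_logps_impl` (plus the state getters/setters listed in the docstring). A hedged sketch of what such an implementation could look like for a Gaussian policy head; the class and attribute names are assumptions, not the nets shipped with the CIM or VM scheduling examples:

from typing import Tuple

import torch
from torch.distributions import Normal


class MyGaussianPolicyNet(torch.nn.Module):  # would inherit from ContinuousACBasedNet in MARO
    def __init__(self, state_dim: int, action_dim: int) -> None:
        super().__init__()
        self._mu_net = torch.nn.Sequential(
            torch.nn.Linear(state_dim, 64), torch.nn.Tanh(), torch.nn.Linear(64, action_dim),
        )
        self._log_std = torch.nn.Parameter(torch.zeros(action_dim))

    def _get_actions_with_logps_impl(
        self, states: torch.Tensor, exploring: bool, **kwargs,
    ) -> Tuple[torch.Tensor, torch.Tensor]:
        dist = Normal(self._mu_net(states), self._log_std.exp())
        actions = dist.sample() if exploring else dist.mean  # sample only when exploring
        logps = dist.log_prob(actions).sum(dim=-1)           # joint log-prob per sample
        return actions, logps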

maro/rl/model/algorithm_nets/ddpg.py

Lines changed: 18 additions & 4 deletions
@@ -25,18 +25,32 @@ class ContinuousDDPGNet(ContinuousPolicyNet, metaclass=ABCMeta):
     - set_state(self, net_state: dict) -> None:
     """
 
-    def _get_actions_with_probs_impl(self, states: torch.Tensor, exploring: bool) -> Tuple[torch.Tensor, torch.Tensor]:
+    def _get_actions_with_probs_impl(
+        self,
+        states: torch.Tensor,
+        exploring: bool,
+        **kwargs,
+    ) -> Tuple[torch.Tensor, torch.Tensor]:
         # Not used in DDPG
         pass
 
-    def _get_actions_with_logps_impl(self, states: torch.Tensor, exploring: bool) -> Tuple[torch.Tensor, torch.Tensor]:
+    def _get_actions_with_logps_impl(
+        self,
+        states: torch.Tensor,
+        exploring: bool,
+        **kwargs,
+    ) -> Tuple[torch.Tensor, torch.Tensor]:
         # Not used in DDPG
         pass
 
-    def _get_states_actions_probs_impl(self, states: torch.Tensor, actions: torch.Tensor) -> torch.Tensor:
+    def _get_states_actions_probs_impl(self, states: torch.Tensor, actions: torch.Tensor, **kwargs) -> torch.Tensor:
         # Not used in DDPG
         pass
 
-    def _get_states_actions_logps_impl(self, states: torch.Tensor, actions: torch.Tensor) -> torch.Tensor:
+    def _get_states_actions_logps_impl(self, states: torch.Tensor, actions: torch.Tensor, **kwargs) -> torch.Tensor:
+        # Not used in DDPG
+        pass
+
+    def _get_random_actions_impl(self, states: torch.Tensor, **kwargs) -> torch.Tensor:
         # Not used in DDPG
         pass
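Since the probability-related methods are unused for DDPG, a concrete `ContinuousDDPGNet` effectively implements a deterministic actor plus exploration noise in `_get_actions_impl`. A hedged sketch under those assumptions (the class name and noise scale are illustrative, not MARO's shipped nets):

import torch


class MyDDPGPolicyNet(torch.nn.Module):  # would inherit from ContinuousDDPGNet in MARO
    def __init__(self, state_dim: int, action_dim: int, noise_std: float = 0.1) -> None:
        super().__init__()
        self._actor = torch.nn.Sequential(
            torch.nn.Linear(state_dim, 64), torch.nn.ReLU(), torch.nn.Linear(64, action_dim), torch.nn.Tanh(),
        )
        self._noise_std = noise_std

    def _get_actions_impl(self, states: torch.Tensor, exploring: bool, **kwargs) -> torch.Tensor:
        actions = self._actor(states)
        if exploring:
            actions = actions + self._noise_std * torch.randn_like(actions)  # additive Gaussian noise
        return actions.clamp(-1.0, 1.0)  # keep actions in the Tanh output range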
