-
Notifications
You must be signed in to change notification settings - Fork 177
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DuelingDQN #127
base: master
Are you sure you want to change the base?
DuelingDQN #127
Conversation
Just so you know, I had in mind to implement rainbow: DLR-RM/stable-baselines3#622 btw, double dqn is implemented here: https://github.com/osigaud/stable-baselines3/blob/feat/double-dqn/stable_baselines3/dqn/dqn.py#L170-L176 Related: DLR-RM/stable-baselines3#487 |
Agree, it would be much better
Thanks for the ref! |
@araffin I want your opinion on this: |
"MultiInputPolicy": MultiInputPolicy, | ||
} | ||
|
||
def __init__( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if init is the same as DQN, I guess you can drop it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only (but necessary) diff is the policy type hint.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see...
you mean DQN a special case of DuelingDQN?
you are mainly changing the policy, no? |
I'm changing only the policy. |
Yes, it makes more sense in that order. But this would require putting DuelingDQN in the main repo.
I'll make a draft PR in SB3 |
yes, so for now, I would keep it as is. My goal with Vanilla DQN in SB3 was to keep it as simple as possible to be a reference when learning or implementing other things. |
On second thought, it might be more practical to work in reverse. I mean, instead of implementing RAINBOW all at once, I was thinking of implementing the features one by one upon DQN, disabling them by default. And at the renaming DQN to RAINBOW, changing the defaults. Roughly we would have: Currently: class DQN:
def __init__(self, policy, env, ...): ... We implement dueling: class DQN:
def __init__(self, policy, env, ..., dueling=False): ...
class DuelingDQN(DQN):
def __init__(self, policy, env, ...):
super().__init__(policy, env, ..., dueling=True) Then we implement PER class DQN:
def __init__(self, policy, env, ..., dueling=False, per=False): ...
class DuelingDQN(DQN):
def __init__(self, policy, env, ..., per=False):
super().__init__(policy, env, ..., dueling=True, per=per)
class PERDQN(DQN):
def __init__(self, policy, env, ..., dueling=False):
super().__init__(policy, env, ..., dueling=dueling, per=True) And so on. Once all the features have been implemented, we can rename DQN to RAINBOW, changing only the default values, and add a DQN class that inherits from RAINBOW: class RAINBOW:
def __init__(self, policy, env, ..., dueling=True, per=True): ...
class DuelingDQN(RAINBOW):
def __init__(self, policy, env, ..., per=False):
super().__init__(policy, env, ..., dueling=True, per=per)
class PERDQN(RAINBOW):
def __init__(self, policy, env, ..., dueling=False):
super().__init__(policy, env, ..., dueling=dueling, per=True)
class DQN(RAINBOW):
def __init__(self, policy, env, ...):
super().__init__(policy, env, ..., dueling=False, per=False) The main constraint is that when renaming DQN -> RAINBOW, and then creating a DQN class that inherits from RAINBOW, we introduce a breaking change, since arguments from DQN are removed (per, dueling, etc.). They would have to be deprecated first. |
why not implement all at once? Compared to DDPG vs TD3, obtaining DQN from Rainbow is not trivial (correct me if I'm wrong). As DQN is a key algorithm in RL, I would really like to keep it simple so people can study it (even if it means duplicated code when implementing DQN extensions). What you propose is what I had in mind (so Dueling special case of RAINBOW) except for DQN. |
We could. I don't know if the same reasoning applies to other rainbow features.
Right now I imagine that it should not be complicated, but I may be wrong, I would have to look into it in detail.
So DQN would be a special case for which we would keep a readable implementation by not deriving it from RAINBOW, right? It seems legitimate. |
Description
TODOs
Naming convention
Method : Dueling DQN
Class name : DuelingDQN
Lowercase : dueling_dqn
branch : dueling-dqn
Context
Types of changes
Checklist:
make format
(required)make check-codestyle
andmake lint
(required)make pytest
andmake type
both pass. (required)Note: we are using a maximum length of 127 characters per line