
DuelingDQN #127

Draft · wants to merge 16 commits into master
Conversation

@qgallouedec (Contributor) commented Dec 10, 2022

Description

TODOs

  • Make learning curves
  • Complete the DuelingDQN short presentation
  • Fill the results table
  • Validate results against another implementation
  • Create a branch in rl-zoo3
  • Check that the command lines in the doc work

Naming convention

Method: Dueling DQN
Class name: DuelingDQN
Lowercase: dueling_dqn
Branch: dueling-dqn

Context

  • I have raised an issue to propose this change (required)

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (update in the documentation)

Checklist:

  • I've read the CONTRIBUTION guide (required)
  • The functionality/performance matches that of the source (required for new training algorithms or training-related features).
  • I have updated the tests accordingly (required for a bug fix or a new feature).
  • I have included an example of using the feature (required for new features).
  • I have included baseline results (required for new training algorithms or training-related features).
  • I have updated the documentation accordingly.
  • I have updated the changelog accordingly (required).
  • I have reformatted the code using make format (required)
  • I have checked the codestyle using make check-codestyle and make lint (required)
  • I have ensured make pytest and make type both pass. (required)

Note: we are using a maximum length of 127 characters per line

@qgallouedec mentioned this pull request Dec 10, 2022
@araffin (Member) commented Dec 10, 2022

Just so you know, I had in mind to implement Rainbow: DLR-RM/stable-baselines3#622
and make Double DQN / Dueling DQN special cases of it.
But so far I've had no time, so I think it's good to have a reference implementation here =)

btw, double dqn is implemented here: https://github.com/osigaud/stable-baselines3/blob/feat/double-dqn/stable_baselines3/dqn/dqn.py#L170-L176
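For readers unfamiliar with the trick, the core of the Double DQN target can be sketched in a few lines. This is a hedged illustration, not the linked SB3 code; all names (`q_online`, `q_target`, etc.) are stand-ins, with NumPy arrays playing the role of network outputs:

```python
# Illustrative sketch of the Double DQN target (not the linked SB3 code).
import numpy as np

rng = np.random.default_rng(0)
n_batch, n_actions = 4, 3

# Q-values predicted for the next observations by each network (stand-ins).
q_online = rng.normal(size=(n_batch, n_actions))   # online network
q_target = rng.normal(size=(n_batch, n_actions))   # target network

rewards = np.ones(n_batch)
dones = np.zeros(n_batch)
gamma = 0.99

# Double DQN: the online network *selects* the action,
# while the target network *evaluates* it.
best_actions = q_online.argmax(axis=1)
next_q = q_target[np.arange(n_batch), best_actions]
td_target = rewards + (1.0 - dones) * gamma * next_q
```

Decoupling selection from evaluation is what reduces the overestimation bias of vanilla DQN, which uses `q_target.max(axis=1)` for both.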

Related: DLR-RM/stable-baselines3#487

@qgallouedec (Author)

Just so you know, I had in mind to implement Rainbow: DLR-RM/stable-baselines3#622
and make Double DQN / Dueling DQN special cases of it.

Agreed, it would be much better.

btw, double dqn is implemented here: https://github.com/osigaud/stable-baselines3/blob/feat/double-dqn/stable_baselines3/dqn/dqn.py#L170-L176

Thanks for the ref!

@qgallouedec (Author)

@araffin I want your opinion on this:
Wouldn't it make more sense to modify DQN so that DuelingDQN becomes a special case of DQN, rather than rewriting DQN as I am doing in this PR?
We would end up with the same structure as DDPG/TD3.
Moreover, the code would be more easily reusable for Rainbow.

"MultiInputPolicy": MultiInputPolicy,
}

def __init__(
@araffin (Member):

if init is the same as DQN, I guess you can drop it.

@qgallouedec (Author):

The only (but necessary) diff is the policy type hint.
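A hedged sketch of what such a type-hint-only override could look like. These class names and signatures are illustrative assumptions, not the actual SB3 code:

```python
# Illustrative sketch only: DuelingDQN keeps DQN's __init__ body and
# narrows just the policy type hint. All names are assumptions.
from typing import Type, Union

class DQNPolicy: ...
class DuelingDQNPolicy(DQNPolicy): ...

class DQN:
    def __init__(self, policy: Union[str, Type[DQNPolicy]], env: str):
        self.policy = policy
        self.env = env

class DuelingDQN(DQN):
    # Identical signature and body; only the policy type hint changes.
    def __init__(self, policy: Union[str, Type[DuelingDQNPolicy]], env: str):
        super().__init__(policy, env)
```

Since type hints carry no runtime behavior, the override exists purely for documentation and static type checking, which is why dropping it is tempting but loses the narrower hint.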

@araffin (Member):

I see...

@araffin (Member) commented Dec 12, 2022

Wouldn't it make more sense to modify DQN to make DuelingDQN a special case of DQN?

You mean DQN a special case of DuelingDQN?
What would that look like?

Rather than rewriting DQN, as I am doing on this PR.

you are mainly changing the policy, no?

@qgallouedec (Author)

you are mainly changing the policy, no?

I'm changing only the policy.
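For context, the policy change in question is the dueling head, which splits the Q-network into a state-value stream and an advantage stream. A minimal NumPy sketch of how the two streams recombine (the function name and sample values are made up for illustration):

```python
# Illustrative sketch of a dueling head: Q(s, a) is recomposed from a
# state-value stream V(s) and an advantage stream A(s, a).
import numpy as np

def dueling_q_values(value: np.ndarray, advantage: np.ndarray) -> np.ndarray:
    """Combine the two streams; subtracting the mean advantage keeps the
    V/A decomposition identifiable (Wang et al., 2016)."""
    return value + advantage - advantage.mean(axis=1, keepdims=True)

value = np.array([[1.0], [2.0]])                 # V(s), shape (batch, 1)
advantage = np.array([[1.0, 3.0], [0.0, 2.0]])   # A(s, a), shape (batch, n_actions)
q = dueling_q_values(value, advantage)
# q = V + A - mean(A): row 0 -> [0.0, 2.0], row 1 -> [1.0, 3.0]
```

Note that the combination leaves the greedy action unchanged (V is constant per state), so only the network/policy needs to change, not the DQN training loop.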

@qgallouedec (Author)

you mean DQN a special case of DuelingDQN?

Yes, it makes more sense in that order. But this would require putting DuelingDQN in the main repo.
That said, the cleanest way for future implementation would be that DuelingDQN is a special case of RAINBOW (in which all tricks except dueling are disabled), and DQN is another special case of RAINBOW (in which all tricks are disabled).

What would that look like?

I'll make a draft PR in SB3

@araffin (Member) commented Dec 16, 2022

That said, the cleanest way for future implementation would be that DuelingDQN is a special case of RAINBOW

yes, so for now, I would keep it as is. My goal with Vanilla DQN in SB3 was to keep it as simple as possible to be a reference when learning or implementing other things.

@qgallouedec (Author)

On second thought, it might be more practical to work in reverse. I mean, instead of implementing RAINBOW all at once, I was thinking of implementing the features one by one on top of DQN, disabling them by default. Then, when renaming DQN to RAINBOW, we change the defaults. Roughly, we would have:

Currently:

class DQN:
    def __init__(self, policy, env, ...): ...

We implement dueling:

class DQN:
    def __init__(self, policy, env, ..., dueling=False): ...

class DuelingDQN(DQN):
    def __init__(self, policy, env, ...):
        super().__init__(policy, env, ..., dueling=True)

Then we implement PER:

class DQN:
    def __init__(self, policy, env, ..., dueling=False, per=False): ...

class DuelingDQN(DQN):
    def __init__(self, policy, env, ..., per=False):
        super().__init__(policy, env, ..., dueling=True, per=per)

class PERDQN(DQN):
    def __init__(self, policy, env, ..., dueling=False):
        super().__init__(policy, env, ..., dueling=dueling, per=True)

And so on.

Once all the features have been implemented, we can rename DQN to RAINBOW, changing only the default values, and add a DQN class that inherits from RAINBOW:

class RAINBOW:
    def __init__(self, policy, env, ..., dueling=True, per=True): ...

class DuelingDQN(RAINBOW):
    def __init__(self, policy, env, ..., per=False):
        super().__init__(policy, env, ..., dueling=True, per=per)

class PERDQN(RAINBOW):
    def __init__(self, policy, env, ..., dueling=False):
        super().__init__(policy, env, ..., dueling=dueling, per=True)

class DQN(RAINBOW):
    def __init__(self, policy, env, ...):
        super().__init__(policy, env, ..., dueling=False, per=False)

The main constraint is that when renaming DQN -> RAINBOW and then creating a DQN class that inherits from RAINBOW, we introduce a breaking change, since arguments are removed from DQN (per, dueling, etc.). They would have to be deprecated first.
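A hedged sketch of what that deprecation period could look like. The class names follow the snippets above, but the flag handling and warning text are assumptions, not an agreed design:

```python
# Illustrative sketch: during the transition, DQN temporarily keeps the
# extension flags but emits a DeprecationWarning before they are removed.
import warnings

class RAINBOW:
    def __init__(self, policy, env, dueling=True, per=True):
        self.dueling = dueling
        self.per = per

class DQN(RAINBOW):
    def __init__(self, policy, env, dueling=False, per=False):
        if dueling or per:
            warnings.warn(
                "The `dueling` and `per` arguments of DQN are deprecated; "
                "use RAINBOW (or DuelingDQN / PERDQN) instead.",
                DeprecationWarning,
            )
        super().__init__(policy, env, dueling=dueling, per=per)
```

After one or two releases with the warning in place, the flags could be dropped from DQN's signature without surprising users.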

@araffin (Member) commented Jan 13, 2023

I mean, instead of implementing RAINBOW all at once, I was thinking of implementing the features one by one on top of DQN, disabling them by default.

why not implement all at once?

Compared to DDPG vs TD3, obtaining DQN from Rainbow is not trivial (correct me if I'm wrong). As DQN is a key algorithm in RL, I would really like to keep it simple so people can study it (even if it means duplicated code when implementing DQN extensions).
And I would see two "families" of algorithms: DQN vs RAINBOW (with more or fewer extensions), so PERDQN, DuelingDQN, ... would all derive from RAINBOW but not from DQN.

What you propose is what I had in mind (so Dueling as a special case of RAINBOW), except for DQN.

@qgallouedec (Author) commented Jan 13, 2023

why not implement all at once?

We could.
I don't know exactly what changes will be needed in the library to implement RAINBOW, but let's take PER as an example. I think it would be best to implement it like HER, i.e. as a subclass of ReplayBuffer. To make PER compatible with all off-policy algorithms, we will have to find a way to update the weights after the TD-error calculation. We will probably have to add, in every algorithm, a call to self.replay_buffer.update_weight(td_errors) (or self.replay_buffer.update_weight(idxs, td_errors), depending on how we go about it), and thus add this method to the parent class ReplayBuffer.
All in all, it shouldn't be complicated, but it concerns a lot of files, so I think it would be much easier to work on it in a dedicated PR.
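A minimal sketch of that shape. The method name follows the message above, but the proportional-priority scheme and everything else here is an assumption, not SB3 code:

```python
# Illustrative sketch: a prioritized buffer exposing update_weight(idxs,
# td_errors), with a no-op default on the base class so uniform replay
# keeps working unchanged. Names and scheme are assumptions.
import numpy as np

class ReplayBuffer:
    def update_weight(self, idxs, td_errors):
        # Base class: no-op, so algorithms can call it unconditionally.
        pass

class PrioritizedReplayBuffer(ReplayBuffer):
    def __init__(self, size: int, alpha: float = 0.6):
        self.alpha = alpha
        self.priorities = np.ones(size)  # new transitions start at max priority

    def sample_idxs(self, batch_size: int, rng=np.random.default_rng()):
        probs = self.priorities ** self.alpha
        probs /= probs.sum()
        return rng.choice(len(self.priorities), size=batch_size, p=probs)

    def update_weight(self, idxs, td_errors):
        # Priority proportional to the magnitude of the TD error.
        self.priorities[idxs] = np.abs(td_errors) + 1e-6
```

With the no-op on the parent class, each off-policy algorithm can call `self.replay_buffer.update_weight(idxs, td_errors)` after computing the loss, regardless of which buffer is plugged in.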

I don't know if the same reasoning applies to other rainbow features.

Compared to DDPG vs TD3, obtaining DQN from Rainbow is not trivial (correct me if I'm wrong).

Right now I imagine that it should not be complicated, but I may be wrong; I would have to look into it in detail.

As DQN is a key algorithm in RL, I would really like to keep it simple so people can study it (even if it means duplicated code when implementing DQN extensions). And I would see two "families" of algorithms: DQN vs RAINBOW (with more or fewer extensions), so PERDQN, DuelingDQN, ... would all derive from RAINBOW but not from DQN.

So DQN would be a special case for which we would keep a readable implementation by not deriving it from RAINBOW, right? That seems legitimate.
