Implement JSRL-like training strategy #1

Open
@masus04

Description

This approach uses a prior strategy to play the game up to a time step t, after which the exploration strategy being trained takes over. As the exploration strategy improves, t is gradually reduced, so that it has to play an ever larger part of the game on its own.
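The schedule for t could look roughly like the sketch below. It assumes that "improves" is measured as a win rate over recent training games and that t shrinks by a fixed step once a threshold is cleared; the class name `HandoverCurriculum`, the threshold, and the step size are all hypothetical and not part of any existing code.

```python
from dataclasses import dataclass


@dataclass
class HandoverCurriculum:
    """Tracks the time step t at which the exploration strategy takes over."""

    t: int                            # current handover time (moves played by the prior)
    step: int = 1                     # assumed amount by which t shrinks per stage
    win_rate_threshold: float = 0.55  # assumed criterion for "has improved"

    def update(self, win_rate: float) -> int:
        """Shrink t by `step` (never below 0) once the win rate clears the threshold."""
        if win_rate >= self.win_rate_threshold:
            self.t = max(0, self.t - self.step)
        return self.t


curriculum = HandoverCurriculum(t=10, step=4)
curriculum.update(0.40)  # below threshold: t is unchanged
curriculum.update(0.60)  # above threshold: t is reduced by one step
```

Once t reaches 0, the exploration strategy plays the whole game from the initial state.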

To generate the starting states game_state(t), we intend to perform the following steps:

  • Choose a prior strategy that can be configured to play either deterministically or non-deterministically
  • Play the non-deterministic version of the prior strategy, either against itself or against a deterministic version of itself, up to time t
  • Determine which player is favoured in the resulting state according to the deterministic prior strategy
  • Play the exploration strategy against the deterministic prior strategy, with the exploration strategy playing as the favoured player so that it is guaranteed a chance of winning.
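The steps above can be sketched on a toy game. Everything here is an assumption made for illustration: a small Nim variant stands in for the real game, the prior strategy and its epsilon-random variant are invented, and "favoured" is interpreted as the winner of a deterministic prior-versus-prior playout from game_state(t).

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class NimState:
    """Toy stand-in game: take 1-3 stones; whoever takes the last stone wins."""
    stones: int
    to_move: int  # 0 or 1

    def legal_moves(self):
        return list(range(1, min(3, self.stones) + 1))

    def play(self, move):
        return NimState(self.stones - move, 1 - self.to_move)

    def is_over(self):
        return self.stones == 0

    def winner(self):
        # The player who just moved took the last stone.
        return 1 - self.to_move


def prior_deterministic(state):
    """Hypothetical prior: leave a multiple of 4 if possible, else take 1."""
    return state.stones % 4 or 1


def make_prior_stochastic(rng, epsilon=0.3):
    """Non-deterministic variant: a random legal move with probability epsilon."""
    def strategy(state):
        if rng.random() < epsilon:
            return rng.choice(state.legal_moves())
        return prior_deterministic(state)
    return strategy


def rollout(state, strategies, max_moves=None):
    """Let strategies[player] move in turn until the game ends or max_moves is hit."""
    moves = 0
    while not state.is_over() and (max_moves is None or moves < max_moves):
        state = state.play(strategies[state.to_move](state))
        moves += 1
    return state


def generate_start(initial, t, rng):
    """Bullets 2 and 3: unroll to time t, then find the favoured player."""
    stochastic = make_prior_stochastic(rng)
    state_t = rollout(initial, (stochastic, prior_deterministic), max_moves=t)
    # Favoured player = winner when the deterministic prior plays both sides.
    final = rollout(state_t, (prior_deterministic, prior_deterministic))
    return state_t, final.winner()


def training_episode(initial, t, exploration, rng):
    """Bullet 4: the exploration strategy plays the favoured side from game_state(t)."""
    state_t, favoured = generate_start(initial, t, rng)
    strategies = [prior_deterministic, prior_deterministic]
    strategies[favoured] = exploration
    final = rollout(state_t, strategies)
    return final.winner() == favoured  # True if the exploration strategy won
```

Because the favoured player is defined by a deterministic playout, an exploration strategy that simply copies the deterministic prior always wins from game_state(t) in this sketch, which is one way to read "has a chance of winning".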

Blog post: https://ai.googleblog.com/2022/04/efficiently-initializing-reinforcement.html?m=1
