This repository has been archived by the owner on May 21, 2022. It is now read-only.

API proposal #2

Open
sglyon opened this issue Aug 5, 2016 · 27 comments

Comments

@sglyon
Contributor

sglyon commented Aug 5, 2016

After a chat with @tbreloff, we roughly decided on the following core API methods:

  • actions(::AbstractEnvironment, s::State) -> A(s): This would return the set A(s) that contains all valid actions from the state s
  • step(::AbstractEnvironment, a::Action) -> s', r: This method returns the next state s' and the reward r associated with taking action a. Note that the current state s is a field of the AbstractEnvironment. This method is also responsible for setting this state field to be s'
  • action(::AbstractPolicy, s'::State, r::Reward, A(s')::Actions) -> a': This method should be implemented by each subtype of AbstractPolicy. It should observe a state transition (the result of calling the step method with the previous action a) and output the next action a'.
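
As a rough sketch of how an environment and policy might implement these three methods (ChainEnv and RandomPolicy are hypothetical names, purely for illustration):

```julia
# Hypothetical toy environment: a walk on states 1..N,
# with a reward for reaching the right edge.
type ChainEnv <: AbstractEnvironment
    state::Int   # the current state s lives on the environment
    N::Int
end

# A(s): can't step left of 1 or right of N.
actions(env::ChainEnv, s::Int) = filter(a -> 1 <= s + a <= env.N, [-1, 1])

# Returns (s′, r) and stores s′ back into the environment.
function step(env::ChainEnv, a::Int)
    s′ = env.state + a
    r = s′ == env.N ? 1.0 : 0.0
    env.state = s′
    s′, r
end

# A policy just picks one of the valid actions it is handed.
type RandomPolicy <: AbstractPolicy end
action(::RandomPolicy, s′, r, A) = rand(A)
```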

@tbreloff also mentioned an episode(::AbstractEnvironment) method, but I wasn't clear on its purpose so I'll let him fill in the blanks there.

@sglyon
Contributor Author

sglyon commented Aug 5, 2016

We should also have a reset!(::AbstractEnvironment) method.

Also we should discuss "mandatory" fields for an environment.

@Evizero
Member

Evizero commented Aug 8, 2016

Do you guys know about https://github.com/JuliaPOMDP/POMDPs.jl ? POMDPs are a pretty related problem to RL. Could be some synergy there.

@tbreloff
Member

tbreloff commented Aug 8, 2016

Thanks... I didn't know about that org. Certainly related... I'll have to review it.

@tbreloff
Member

tbreloff commented Aug 10, 2016

After reviewing what's in JuliaPOMDP, I'd like to move forward with what we've already discussed, and possibly write some (conditionally included) code to link our abstractions to the JuliaPOMDP world. They have a large web of interdependencies, most of which are not registered packages, plus their own package manager to handle it. I'm wary of depending on that, and I'd prefer to make our own path.

In addition, they have much more of a focus (seemingly) on finite, table-based, offline solvers, and I don't know if there's really that much overlap beyond some of the verbs and test environments.

tl;dr Someday we should revisit whether we can merge, but for now I think it's easier to stay at arm's length.

@tbreloff
Member

@spencerlyon2 I'm thinking about how/where we could convey that there are no more steps to take (i.e. the episode is done). I wonder if we can fit environments into the iteration pattern, or maybe that an "episode" is an iterator/wrapper around an environment and policy:

immutable Episode{E,P}
    env::E
    policy::P
    total_reward::Float64
    niter::Int
end

# todo: start, next, done definitions

for (s,a,r) in Episode(env, policy)
    # do something, or not?
end
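
A minimal sketch of those definitions under the Julia 0.5 iteration protocol (state and reward are assumed accessors on the environment, and Episode would need to be a mutable type rather than immutable for the bookkeeping fields):

```julia
# Sketch only: assumes state(env), reward(env), actions(env, s),
# step!(env, s, a), and done(env) exist as discussed above.
Base.start(ep::Episode) = 1

function Base.next(ep::Episode, i)
    s = state(ep.env)
    a = action(ep.policy, reward(ep.env), s, actions(ep.env, s))
    r, s′ = step!(ep.env, s, a)
    ep.total_reward += r   # requires a mutable Episode
    ep.niter = i
    (s, a, r), i + 1
end

Base.done(ep::Episode, i) = done(ep.env)
```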

@tbreloff
Member

Also thinking we need the state as input to step, and that it should be mutating:

r, s′ = step!(env, s, a)

I'd like for a "sarsa" update to make sense:

# assume we have (s,a) for this iteration already (it's last iteration's a′)
r, s′ = step!(env, s, a)
done(env) && break # ??
a′ = action(policy, r, s′, actions(env, s′))

@sglyon
Contributor Author

sglyon commented Aug 11, 2016

step(env, s, a) sounds good to me.

What would mutate in a call to step!?

@zahachtah

zahachtah commented Aug 11, 2016

I just wanted to say that I am hugely excited about the prospect of a machine learning API with different backends, similar to the absolutely amazing Plots.jl. I am not good enough to contribute substantially to this except from a user's point of view, but hugs to you all for this :-)

@tbreloff
Member

Presumably the call to step! is expected to mutate the environment in some way. There may be some cases where that's not true, but it's probably the exception.

@tbreloff
Member

I just wanted to say that I am hugely excited about the prospect of a machine learning API with different backends, similar to the absolutely amazing Plots.jl.

Thanks... yeah I hope all works out well.

I am not good enough to contribute substantially to this except from a user's point of view

If you have time to contribute, we could use help with QA... either writing tests, or just using and reporting back on problems/suggestions!

@tbreloff
Member

I made many of the changes we discussed, and wrote out a bare minimum readme. Please review when you have time @spencerlyon2. No hurry.

@pkofod

pkofod commented Sep 15, 2016

FWIW I think the current API is very easy to understand, and I think it will be quite easy for users to use it for their own extensions. One thing I am not quite sure about is check_constraints at https://github.com/tbreloff/Reinforce.jl/blob/a9218995afb699846f48dcf06695d0156278f47e/src/episodes.jl#L31. Maybe I'm just not used to this, but what if the constraints are not satisfied? Shouldn't a new action be chosen? Or am I misunderstanding the function names here?

@tbreloff
Member

@pkofod I agree. maybe this line could be a = check_constraints(...) instead??

@pkofod

pkofod commented Sep 15, 2016

Yes, something like that. Would check_constraints then potentially be recursive? I mean, a new a could violate the constraints as well, and then check_constraints would have to be called within check_constraints, or?

Edit: alternatively something like

function Base.next(ep::Episode, i)
    env = ep.env
    s = state(env)
    local a          # declare outside the loop so it is visible after it
    passed = false
    while !passed
        a = action(ep.policy, reward(env), s, actions(env))
        passed = check_constraints(env, s, a)
    end
    r, s′ = step!(env, s, a)
    ep.total_reward += r
    ep.niter = i + 1
    (s, a, r, s′), i + 1
end

@tbreloff
Member

I'd be ok throwing an error too. In theory the agent should be choosing something from the valid set of actions.

@sglyon
Contributor Author

sglyon commented Sep 16, 2016

When I wrote that bit I was thinking that we should throw an error.

Really I want a way to communicate that the action space may be current-state dependent.

@tbreloff
Member

The agent is given a valid action set in the line before, so I think it's fine to throw an error.

@pkofod

pkofod commented Sep 16, 2016

Of course. My question is then, is it really necessary? Is it just "to be sure"? I mean, it will only error if the action method for the subtype of AbstractPolicy is incorrectly implemented.

@sglyon
Contributor Author

sglyon commented Sep 16, 2016

@tbreloff I'm probably missing something obvious here, but in that routine I don't see where we get an action set based on the current state.

What I have in mind is that each period s \in \mathcal{S} and the agent must pick a \in A(s) \subseteq \mathcal{A} (a la Barto & Sutton).

I think that a better way to do this would be to define actions(::Env, ::State) so that the action set passed to the actor is always valid given the current state. Then there would be no need for the check_constraints routine.
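
One way this could look (GridEnv and MOVES are made-up names, just to illustrate the dispatch):

```julia
# Hypothetical grid world where walls make some moves invalid.
immutable GridEnv <: AbstractEnvironment
    walls::Set{Tuple{Int,Int}}
end

const MOVES = Dict(:up => (0,1), :down => (0,-1), :left => (-1,0), :right => (1,0))

# A(s): only the moves that don't land on a wall from state s.
function actions(env::GridEnv, s::Tuple{Int,Int})
    [a for (a, d) in MOVES if !((s[1]+d[1], s[2]+d[2]) in env.walls)]
end
```

Then whatever the policy picks from A(s) is valid by construction, given the state.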

Thoughts?

@pkofod

pkofod commented Sep 16, 2016

Makes sense to have env in actions to constrain the actions before picking one.

@tbreloff
Member

In my mind that's already true because I assume the environment knows the state, so actions(env) can be based on the state if appropriate. But I agree it would be more general (and allow for dispatch) if the call was 'A = actions(env, state)' instead.

The check_constraints could just '@assert a in A'.

Also I can't remember if I switched it yet but I added "math sets" to LearnBase and we should be returning those from 'actions'.

@sglyon
Contributor Author

sglyon commented Sep 16, 2016

Ok, I didn't understand that. We should make a note somewhere so that this is very clear to users (and devs).

I can't recall, do we already have a way to describe the action space for the env (the \mathcal{A} I talked about earlier)?

@sglyon
Contributor Author

sglyon commented Sep 16, 2016

Note that the action space is different from the set of appropriate actions given the state. Sorry for the double message; I just didn't think I made that clear.

@tbreloff
Member

I don't remember if we distinguish actions from action_space, but that certainly makes sense.

@tbreloff
Member

I'm going to make the following API changes:

actions(env) --> A
done(env) --> bool

# becomes:

actions(env, s) --> A
finished(env, s′) --> bool

and I'll change check_constraints into an assert.
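
Under those names, an episode's driver loop might look like this (a sketch; the state and reward accessors are assumed):

```julia
# Hypothetical episode loop with the revised API.
s = state(env)
while true
    A = actions(env, s)                   # state-dependent action set
    a = action(policy, reward(env), s, A)
    @assert a in A                        # the check_constraints replacement
    r, s′ = step!(env, s, a)
    finished(env, s′) && break
    s = s′
end
```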

@tbreloff
Member

ref: f21b5cf

@tbreloff
Member

I want to link the solvers to the work I'm doing in StochasticOptimization, so there's still a lot of changes to be made here, but hopefully one could develop the environments without too much changing.
