API proposal #2

sglyon · 2016-08-05T08:05:59Z

After a chat with @tbreloff, we roughly decided on the following core API methods:

actions(::AbstractEnvironment, s::State) -> A(s): This would return the set A(s) that contains all valid actions from the state s
step(::AbstractEnvironment, a::Action) -> s', r: This method returns the next state s' and the reward r associated with taking action a. Note that the current state s is a field of the AbstractEnvironment. This method is also responsible for setting this state field to be s'
action(::AbstractPolicy, s'::State, r::Reward, A(s')::Actions) -> a': This method should be implemented by each subtype of AbstractPolicy and should observe a state transition (result of the calling step method with the previous action a) and output the next action a'.

@tbreloff also mentioned an episode(::AbstractEnvironment) method, but I wasn't clear on its purpose so I'll let him fill in the blanks there.

The text was updated successfully, but these errors were encountered:

sglyon · 2016-08-05T09:52:18Z

We should also have a reset!(::AbstractEnvironment) method.

Also we should discuss "mandatory" fields for an environment.

Evizero · 2016-08-08T07:47:33Z

Do you guys know about https://github.com/JuliaPOMDP/POMDPs.jl ? POMDPs are a pretty related problem to RL. could be some synergy there

tbreloff · 2016-08-08T11:20:40Z

Thanks... I didn't know about that org. Certainly related... I'll have to
review it.

On Monday, August 8, 2016, Christof Stocker [email protected]
wrote:

Do you guys know about https://github.com/JuliaPOMDP/POMDPs.jl ? POMDPs
are a pretty related problem to RL. could be some synergy there

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#2 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AA492oJdKCUsrw-6tbPVUJJFu8BO0tANks5qdt8VgaJpZM4Jdcsq
.

tbreloff · 2016-08-10T14:02:33Z

After reviewing what's in JuliaPOMDP, I'd like to move forward with what we've already discussed, and possibly write some (conditionally included) code to link our abstractions to the JuliaPOMDP world. They have large web of interdependencies, most of which are not registered packages, and their own package manager to handle it. I'm weary of depending on that, and I'd prefer to make our own path.

In addition, they have much more of a focus (seemingly) on finite, table-based, offline solvers, and I don't know if there's really that much overlap beyond some of the verbs and test environments.

tl;dr Someday we should revisit whether we can merge, but for now I think it's easier to stay arms-length.

tbreloff · 2016-08-10T15:00:08Z

@spencerlyon2 I'm thinking about how/where we could convey that there are no more steps to take (i.e. the episode is done). I wonder if we can fit environments into the iteration pattern, or maybe that an "episode" is a iterator/wrapper around an environment and policy:

immutable Episode{E,P}
    env::E
    policy::P
    total_reward::Float64
    niter::Int
end

# todo: start, next, done definitions

for (s,a,r) in Episode(env, policy)
    # do something, or not?
end

tbreloff · 2016-08-10T15:04:35Z

Also thinking we need the state as input to step, and that it should be mutating:

r, s′ = step!(env, s, a)

I'd like for a "sarsa" update to make sense:

# assume we have (s,a) for this iteration already (it's last iteration's a′)
r, s′ = step!(env, s, a)
done(env) && break # ??
a′ = action(policy, r, s′, actions(env, s′)

sglyon · 2016-08-11T07:56:04Z

step(env, s, a) sounds good to me.

What would mutate in a call to step!?

zahachtah · 2016-08-11T08:30:57Z

I just wanted to say that I am hugely excited about the prospect of machine learning API with different backends similar to the absolutely amazing Plots.jl. I am not good enough to contribute substantially to this except from a users point of view, but hugs to you all for this :-)

tbreloff · 2016-08-11T17:20:45Z

Presumably the call to step! is expected to mutate the environment in some way. There may be some cases where that's not true, but it's probably the exception.

tbreloff · 2016-08-11T17:22:46Z

I just wanted to say that I am hugely excited about the prospect of machine learning API with different backends similar to the absolutely amazing Plots.jl.

Thanks... yeah I hope all works out well.

I am not good enough to contribute substantially to this except from a users point of view

If you have time to contribute, we could use help with QA... either writing tests, or just using and reporting back on problems/suggestions!

tbreloff · 2016-08-12T15:17:10Z

I made many of the changes we discussed, and wrote out a bare minimum readme. Please review when you have time @spencerlyon2. No hurry.

pkofod · 2016-09-15T18:13:17Z

FWIW I think the current API is very easy to understand, and I think it will be quite easy for users to use it for their own extensions. One thing I am not quite sure of is that of check_constraints at https://github.com/tbreloff/Reinforce.jl/blob/a9218995afb699846f48dcf06695d0156278f47e/src/episodes.jl#L31. Maybe I'm just not used to this, but what if the constraints are not satisfied? Shouldn't a new action be chosen? Or am I misunderstanding the function names here?

tbreloff · 2016-09-15T18:35:18Z

@pkofod I agree. maybe this line could be a = check_constraints(...) instead??

pkofod · 2016-09-15T22:15:01Z

Yes, something like that. Would check_constraints then potentially be a recursion, or? I mean a new a could violate the constraints as well, and then check_constraints would have to be called in check_constraints, or?

Edit: alternatively something like

function Base.next(ep::Episode, i)
    env = ep.env
    s = state(env)
    passed = false
    while !passed
        a = action(ep.policy, reward(env), s, actions(env))
        passed = check_constraints(env, s, a)
    end
    r, s′ = step!(env, s, a)
    ep.total_reward += r
    ep.niter = i + 1
    (s, a, r, s′), i+1
end

tbreloff · 2016-09-15T22:20:33Z

I'd be ok throwing an error too. In theory the agent should be choosing
something from the valid set of actions.

On Thursday, September 15, 2016, Patrick Kofod Mogensen <
[email protected]> wrote:

Yes, something like that. Would check_constraints then potentially be a
recursion, or? I mean a new a could violate the constraints as well, and
then check_constraints would have to be called in check_constraints, or?

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#2 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AA492jHy9hqyMfLJJ7kvMSxeSPqHtwFrks5qqcNlgaJpZM4Jdcsq
.

sglyon · 2016-09-16T02:46:08Z

When I wrote that bit I was thinking that we we should throw an error.

Really I want a way to communicate that the action space maybe current state dependent.

tbreloff · 2016-09-16T02:59:36Z

The agent is given a valid action set in the line before, so I think it's
fine to throw an error.

On Thursday, September 15, 2016, Spencer Lyon [email protected]
wrote:

When I wrote that bit I was thinking that we we should throw an error.

Really I want a way to communicate that the action space maybe current
state dependent.

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#2 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AA492ovMetfjhdNNwhZCRgLhgFbQFfEpks5qqgLxgaJpZM4Jdcsq
.

pkofod · 2016-09-16T06:36:11Z

Of course. My question is then, is it really necessary? Is it just "to be sure"? I mean, it will only error if the actions method for the subtype of AbstractPolicy is incorrectly implemented.

sglyon · 2016-09-16T09:58:43Z

@tbreloff I'm probably missing something obvious here, but In that routine I don't see where we get an action set based on the current state.

What I have in mind is that each period s \in \mathcal{S} and the agent must pick a = A(s) \subseteq \mathcal{A} (a la Barto & Sutton).

I think that a better way to do this would be to define actions(::Env, ::State) so that the action set passed to the actor is always valid given the current state. Then there would be no need for the check_constraints routine.

Thoughts?

pkofod · 2016-09-16T10:51:22Z

Makes sense to have env in actions to constrain the actions before picking one .

tbreloff · 2016-09-16T10:54:54Z

In my mind that's already true because I assume the environment knows the
state, so actions(env) can be based on the state if appropriate. But I
agree it would be more general (and allow for dispatch) if the call was 'A
= actions(env, state)' instead.

The check_constraints could just '@Assert a in A'

Also I can't remember if I switched it yet but I added "math sets" to
LearnBase and we should be returning those from 'actions'.

On Friday, September 16, 2016, Spencer Lyon [email protected]
wrote:

@tbreloff https://github.com/tbreloff I'm probably missing something
obvious here, but In that routine I don't see where we get an action set
based on the current state.

What I have in mind is that each period s \in \mathcal{S} and the agent
must pick a = A(s) \subseteq \mathcal{A} (a la Barto & Sutton).

I think that a better way to do this would be to define actions(::Env,
::State) so that the action set passed to the actor is always valid given
the current state. Then there would be no need for the check_constraints
routine.

Thoughts?

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#2 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AA492pwwvLAXc912GVPmfbDKoUyNqViWks5qqmhTgaJpZM4Jdcsq
.

sglyon · 2016-09-16T10:56:52Z

Ok I didn't understand that. We should make a note somewhere so that is very clear to users (and devs).

I can't recall, do we already have a way to describe the action space for the env (the \mathcal{A} I talked about earlier)?

sglyon · 2016-09-16T10:58:22Z

Note that the action space is different from the set of appropriate actions, given the state. Sorry for the double message, I just didn't think I made that clear

tbreloff · 2016-09-16T11:00:55Z

I don't remember if we distinguish actions from action_space, but that
certainly makes sense.

On Friday, September 16, 2016, Spencer Lyon [email protected]
wrote:

Note that the action space is different from the set of appropriate
actions, given the state. Sorry for the double message, I just didn't think
I made that clear

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#2 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AA492vGQA0sh0Eav99ccllRQQBS5KxScks5qqnZPgaJpZM4Jdcsq
.

tbreloff · 2016-09-16T13:19:27Z

I'm going to make the following API changes:

actions(env) --> A
done(env) --> bool

# becomes:

actions(env, s) --> A
finished(env, s′) --> bool

and I'll change check_constraints into an assert.

tbreloff · 2016-09-16T13:42:08Z

ref: f21b5cf

tbreloff · 2016-09-16T13:43:51Z

I want to link the solvers to the work I'm doing in StochasticOptimization, so there's still a lot of changes to be made here, but hopefully one could develop the environments without too much changing.

etotheipluspi mentioned this issue Jan 5, 2017

Reinforcement Learning Interface JuliaPOMDP/POMDPs.jl#126

Closed

tbreloff mentioned this issue Jan 5, 2017

Is the state part of the environment? #5

Closed

CarloLucibello mentioned this issue Nov 16, 2017

saying hi CarloLucibello/DeepRLexamples.jl#1

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

API proposal #2

API proposal #2

sglyon commented Aug 5, 2016

sglyon commented Aug 5, 2016

Evizero commented Aug 8, 2016

tbreloff commented Aug 8, 2016

tbreloff commented Aug 10, 2016 •

edited

Loading

tbreloff commented Aug 10, 2016

tbreloff commented Aug 10, 2016

sglyon commented Aug 11, 2016

zahachtah commented Aug 11, 2016 •

edited

Loading

tbreloff commented Aug 11, 2016

tbreloff commented Aug 11, 2016

tbreloff commented Aug 12, 2016

pkofod commented Sep 15, 2016

tbreloff commented Sep 15, 2016

pkofod commented Sep 15, 2016 •

edited

Loading

tbreloff commented Sep 15, 2016

sglyon commented Sep 16, 2016

tbreloff commented Sep 16, 2016

pkofod commented Sep 16, 2016

sglyon commented Sep 16, 2016

pkofod commented Sep 16, 2016

tbreloff commented Sep 16, 2016

sglyon commented Sep 16, 2016 •

edited

Loading

sglyon commented Sep 16, 2016

tbreloff commented Sep 16, 2016

tbreloff commented Sep 16, 2016

tbreloff commented Sep 16, 2016

tbreloff commented Sep 16, 2016

API proposal #2

API proposal #2

Comments

sglyon commented Aug 5, 2016

sglyon commented Aug 5, 2016

Evizero commented Aug 8, 2016

tbreloff commented Aug 8, 2016

tbreloff commented Aug 10, 2016 • edited Loading

tbreloff commented Aug 10, 2016

tbreloff commented Aug 10, 2016

sglyon commented Aug 11, 2016

zahachtah commented Aug 11, 2016 • edited Loading

tbreloff commented Aug 11, 2016

tbreloff commented Aug 11, 2016

tbreloff commented Aug 12, 2016

pkofod commented Sep 15, 2016

tbreloff commented Sep 15, 2016

pkofod commented Sep 15, 2016 • edited Loading

tbreloff commented Sep 15, 2016

sglyon commented Sep 16, 2016

tbreloff commented Sep 16, 2016

pkofod commented Sep 16, 2016

sglyon commented Sep 16, 2016

pkofod commented Sep 16, 2016

tbreloff commented Sep 16, 2016

sglyon commented Sep 16, 2016 • edited Loading

sglyon commented Sep 16, 2016

tbreloff commented Sep 16, 2016

tbreloff commented Sep 16, 2016

tbreloff commented Sep 16, 2016

tbreloff commented Sep 16, 2016

tbreloff commented Aug 10, 2016 •

edited

Loading

zahachtah commented Aug 11, 2016 •

edited

Loading

pkofod commented Sep 15, 2016 •

edited

Loading

sglyon commented Sep 16, 2016 •

edited

Loading