Reinforcement Learning RFC #69
base: alpha
Conversation
This RFC proposes how Hypha can be improved to support Reinforcement Learning.
> The second requirement will be satisfied by improving the Scheduler to redirect Worker requests for data to different RL Data Nodes. Thus, the Scheduler needs to balance latency between RL Data Nodes and Workers as well as sampling and processing speed.
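A minimal sketch of what such a latency-aware selection inside the Scheduler could look like; the `DataNodeStats` type, the `pick_data_node` helper, and the weighting scheme are illustrative assumptions, not existing Hypha APIs:

```python
from dataclasses import dataclass

@dataclass
class DataNodeStats:
    node_id: str
    latency_ms: float        # measured round-trip time from the requesting Worker
    samples_per_sec: float   # current sampling/processing throughput of the node

def pick_data_node(candidates: list[DataNodeStats], latency_weight: float = 0.5) -> str:
    """Pick the RL Data Node with the best trade-off between latency and throughput.

    Lower latency and higher throughput both reduce the score; latency_weight
    controls how much connection latency matters relative to sampling speed.
    """
    def score(stats: DataNodeStats) -> float:
        # Convert throughput into a cost so both terms are minimized together.
        throughput_cost = 1.0 / max(stats.samples_per_sec, 1e-6)
        return latency_weight * stats.latency_ms + (1 - latency_weight) * throughput_cost

    return min(candidates, key=score).node_id
```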
I wonder whether we should model this not in the scheduler but via the connector/bridge, much like the stochastic wiring described in the SWARM learning paper. We already have many-references and different selection strategies that let us point from one worker to many data nodes, so we would only need to extend this with a strategy that considers connection and delivery speed (latency, bandwidth, generation) to optimally connect workers with data nodes.
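As a rough sketch of how that could live on the connector/bridge side rather than in the Scheduler, assuming a hypothetical selection-strategy interface and per-link metrics that are not part of Hypha today:

```python
import random

class LinkAwareSelection:
    """Hypothetical selection strategy for a many-reference from a Worker to Data Nodes.

    Instead of the Scheduler deciding, the reference weights its targets by
    connection quality (latency, bandwidth) and samples stochastically, similar
    in spirit to the stochastic wiring described in the SWARM learning paper.
    """

    def __init__(self, link_metrics: dict[str, dict[str, float]]):
        # link_metrics: node_id -> {"latency_ms": ..., "bandwidth_mbps": ...}
        self.link_metrics = link_metrics

    def select(self) -> str:
        nodes = list(self.link_metrics)
        # Higher bandwidth and lower latency yield a higher selection weight.
        weights = [
            m["bandwidth_mbps"] / max(m["latency_ms"], 1e-6)
            for m in self.link_metrics.values()
        ]
        return random.choices(nodes, weights=weights, k=1)[0]
```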
I think this would be possible. However, I would like to avoid mixing concepts. We decided to go with DiLoCo and a centralized scheduler. If we now start to loosen this by introducing a form of decentralized scheduling, it will complicate things more than it will help.
Well, it would be super interesting to benchmark one approach against the other, no matter which one we adopt as the standard moving forward.
I fully agree, but I would rather have a working baseline and improve from there.
Yeah, let's start with the scheduler approach then.
We should probably do #80 first.