Evaluate ability to optimize user environment for use with Ray #92
Possibly relevant: AWS appears to support respecting the shm flag vis-à-vis Docker containers. They recommend using at least 30% of RAM, so for our massive high-memory node that works out to about 300 GB, although 350 GB is probably safer to pad a bit.
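For a sense of scale, here is a minimal sketch of how that sizing could be applied when starting Ray. `object_store_memory` is a real `ray.init` parameter, but the 350 GB figure is just the padding suggested above, and it assumes the container's /dev/shm has actually been enlarged to match:

```python
import ray

# Ray's object store is backed by /dev/shm on Linux, so the container's
# shared-memory segment must be at least this large, or Ray warns at
# startup and falls back to slower disk-backed storage.
SHM_BYTES = 350 * 1024**3  # ~350 GiB, padded from the ~30%-of-RAM guidance above

ray.init(object_store_memory=SHM_BYTES)
```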
One more link for this, on the Kubernetes side: it seems this is possible but not well documented at the moment.
Ray has always worked locally on the JupyterHub instance, but if we're evaluating it as a viable replacement for Dask, it would be helpful to try it as a proper cluster. @consideRatio, can we look into a basic configuration to set this up for Ray? There's some documentation on the Helm setup here: https://docs.ray.io/en/latest/cluster/kubernetes.html
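For context, here is what the "works locally" case amounts to: the default single-node entrypoint, where all Ray workers are processes inside the user's own pod. The remote-function example is illustrative, not from this thread:

```python
import ray

# Single-node Ray: this is what already works on the JupyterHub instance.
# Nothing here exercises a real multi-node cluster.
ray.init()

@ray.remote
def square(x: int) -> int:
    return x * x

print(ray.get([square.remote(i) for i in range(4)]))  # [0, 1, 4, 9]
```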
@espg thanks for the follow-up, and sorry for being slow to respond on this. I've worked on dask-gateway recently and learned a lot that gives me a better overview for navigating the distributed ecosystem of tools. I've arrived at a classification of sorts for distributed setups, and for what can be sustainable to push development efforts towards.
For the Ray ecosystem, there is a "ray operator" mapping to level 3, and that may be sufficient. But the ray operator is only one part: is there also a Ray client that can interact with it effectively? Operators or controllers in k8s define a custom resource kind, and when someone creates/edits/deletes these resources the controller acts on that, such as creating 1 scheduler and X workers if a DaskCluster resource is created. But is there software that helps users avoid working with these k8s resources directly? I'll set out to learn whether there is a Ray client that can work against a k8s cluster with a Ray operator running within it. If there is such a client, I'll then also try to verify that it can be adjusted to help shut down the workers when the Jupyter server that created them is shut down, or that there is some other protection mechanism against starting workers, forgetting to stop them, and having them live on forever.
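To make the operator pattern concrete, here is a hypothetical sketch using the official `kubernetes` Python client to create such a custom resource. The `DaskCluster` group, version, and spec below are assumptions for illustration, not a documented schema:

```python
from kubernetes import client, config

# The operator pattern in one call: the user (or a client library) only
# creates a custom resource; the controller watching that resource kind
# then creates the actual scheduler and worker pods on the user's behalf.
config.load_kube_config()  # or load_incluster_config() inside a pod
api = client.CustomObjectsApi()

api.create_namespaced_custom_object(
    group="kubernetes.dask.org",   # assumed CRD group; check the operator's docs
    version="v1",                  # assumed version
    namespace="user-namespace",
    plural="daskclusters",
    body={
        "apiVersion": "kubernetes.dask.org/v1",
        "kind": "DaskCluster",
        "metadata": {"name": "example"},
        "spec": {"worker": {"replicas": 3}},  # schematic spec, not exact
    },
)
```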
...perhaps this is something that one of the Ray core devs can answer for us? @robertnishihara @ericl @richardliaw, the context here is that we're looking into replacing Dask with Ray as the cluster compute backend for Jupyter meets the Earth, as part of the 2i2c project. Any insight on @consideRatio's k8s scaling questions?
Hey @espg @consideRatio, indeed "level 3" is supported via the KubeRay project: https://github.com/ray-project/kuberay
We also have a Ray client available that I think could be used to implement "level 4": https://docs.ray.io/en/latest/cluster/ray-client.html
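For illustration, connecting through the Ray client could look roughly like this; the service name below is a placeholder for whatever the operator creates, and 10001 is the Ray client's default port:

```python
import ray

# "Level 4" style usage: the notebook process stays thin and connects to a
# remote Ray head via the Ray client protocol instead of running workers
# locally. The address is a placeholder, not a real service name.
ray.init(address="ray://example-cluster-head-svc:10001")

@ray.remote
def hello() -> str:
    return "running on the remote cluster"

print(ray.get(hello.remote()))
```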
There isn't anything "out of the box" that does this, but you could potentially track whether there is an active connection from the client, or use an API to detect that. Hope this helps! If you have more questions about how to do this, the https://discuss.ray.io/c/ray-clusters/13 forum is fairly responsive.
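To sketch the "protection mechanism" idea from above: nothing here is an official API, but a small hypothetical reaper could periodically delete any RayCluster resource whose owning Jupyter pod is gone. The group/version, label selector, and `owner-pod` label are all assumptions that would need to match the actual deployment:

```python
from kubernetes import client, config

# Hypothetical garbage collector: delete any RayCluster custom resource
# whose owning Jupyter server pod no longer exists, so forgotten workers
# cannot live on forever.
config.load_incluster_config()
core = client.CoreV1Api()
crds = client.CustomObjectsApi()

NAMESPACE = "jupyter"

clusters = crds.list_namespaced_custom_object(
    group="ray.io", version="v1alpha1", namespace=NAMESPACE, plural="rayclusters"
)

live_pods = {
    p.metadata.name
    for p in core.list_namespaced_pod(
        NAMESPACE, label_selector="component=singleuser-server"  # assumed label
    ).items
}

for rc in clusters["items"]:
    owner = rc["metadata"].get("labels", {}).get("owner-pod")  # assumed label
    if owner and owner not in live_pods:
        crds.delete_namespaced_custom_object(
            group="ray.io", version="v1alpha1", namespace=NAMESPACE,
            plural="rayclusters", name=rc["metadata"]["name"],
        )
```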
@espg asked me on Slack:
I need to look into this and evaluate whether it is sensible to try to resolve in our k8s context, and if so, how to do it in a sensible way.