In this talk, attendees will get an introduction to Dask, a distributed computing framework in the PyData ecosystem. The first half of the talk will describe the current state of the project and its ecosystem, including distributed data collections, cloud deployment options, distributed machine learning projects, and workflow orchestration. The second half of the talk will be a live demo showing the programming model for machine learning on Dask, with specific examples using LightGBM.
NOTE: Versions of this talk have also been given under the title "Scaling LightGBM with Python and Dask".
The demo code from this talk is available at https://github.com/jameslamb/lightgbm-dask-testing.
If you'd prefer not to build Docker images and run containers yourself, you can also try the LightGBM quickstart in Saturn Cloud Community: https://www.saturncloud.io/s/
- (virtual) Chicago Cloud Conference, September 2020 (slides | video)
- (virtual) PyData Montreal, January 2021 (slides | video)
- (virtual) Chicago ML, January 2021 (slides | video)
- (virtual) Orlando ML & DS, February 2021 (slides)
- (virtual) DataDays 2021, March 2021 (slides)
- (virtual) ODSC East 2021, March 2021 (slides | video)