Migrate cuml exploration notebook #153
Closed
jacobtomlinson wants to merge 5 commits into rapidsai:main from …s-dask-cuml-exploration

Conversation
Member, Author
This notebook is ready for review when someone has a minute.
Member
Beyond the above comments, the notebook runs and the prose reads well.
Contributor
It doesn't seem like this notebook is live anymore. I'll close this PR as not planned.
Leverages the `docref` admonition added in #152. Reviewers may want to just look at 756b9cf or view the notebook on its own in ReviewNB.

Closes rapidsai/cloud-ml-examples#207
The notebook seems to be reasonably useful. I updated the instructions from Dask Kubernetes Classic to the Dask Kubernetes Operator and added some GKE instructions to get folks going. I've also added more prose and tweaked headings to make the navigation look sensible.
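For reference, the sweep helpers named in the traceback below (`SimpleTimer`, `collect_func_time_samples`) follow a common timing pattern. This is a minimal reconstruction, with names taken from the traceback but the bodies assumed — the notebook's real implementations are not shown in this PR:

```python
import time

class SimpleTimer:
    """Context-manager timer; a reconstruction of the notebook's helper."""
    def __enter__(self):
        self._start = time.perf_counter()
        return self

    def __exit__(self, *exc):
        self.elapsed = time.perf_counter() - self._start
        return False  # never swallow exceptions raised in the body

def collect_func_time_samples(func, count):
    """Call `func` `count` times and record each wall-clock duration."""
    timings = []
    for _ in range(count):
        with SimpleTimer() as timer:
            func()
        timings.append(timer.elapsed)
    return timings

# Example: time a trivial workload 5 times.
samples = collect_func_time_samples(lambda: sum(range(100_000)), count=5)
```

Note that because the fit call happens inside the timed `func()`, any exception raised by the model (as in the traceback below) surfaces directly through `collect_func_time_samples`.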
However, it seems the performance sweeps fail with a `StopIteration` exception and I'm a little out of my depth in debugging it.

Full traceback:
```
Starting weak-scaling performance sweep for:
    model       : <class 'cuml.dask.ensemble.randomforestregressor.RandomForestRegressor'>
    data loader : <function <lambda> at 0x7fb02fac8f70>.
Configuration
==========================
Worker counts       : [8]
Fit/Predict samples : 5
Data load samples   : 1
- Max data fraction : 1.0
Model fit           : X ~ y
- Response DType    : <class 'numpy.int32'>
Writing results to  : ./taxi_large_random_forest_regression.csv
- Method            : append

Sampling <1> load times with 8 workers.
100%|██████████| 1/1 [00:00<00:00, 11522.81it/s]
Finished loading <1>, samples, to <8> workers with a mean time of 0.0000 sec.
Sweeping <class 'cuml.dask.ensemble.randomforestregressor.RandomForestRegressor'> 'fit' with <8> workers. Sampling <5> times.
  0%|          | 0/5 [00:06<?, ?it/s]
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
/opt/conda/envs/rapids/lib/python3.9/site-packages/cuml/dask/common/part_utils.py in _extract_partitions(dask_obj, client)
    161
--> 162     raise gen.Return([(first(who_has[key]), part)
    163                       for key, part in key_to_part])

/opt/conda/envs/rapids/lib/python3.9/site-packages/cuml/dask/common/part_utils.py in <listcomp>(.0)
    161
--> 162     raise gen.Return([(first(who_has[key]), part)
    163                       for key, part in key_to_part])

/opt/conda/envs/rapids/lib/python3.9/site-packages/toolz/itertoolz.py in first(seq)
    374     """
--> 375     return next(iter(seq))
    376

StopIteration:

The above exception was the direct cause of the following exception:

RuntimeError                              Traceback (most recent call last)
/tmp/ipykernel_280/2978372879.py in <module>
      8 rf_csv_path = f"./{out_prefix}_random_forest_regression.csv"
      9
---> 10 performance_sweep(client=client, model=RandomForestRegressor,
     11                   **sweep_kwargs,
     12                   out_path=rf_csv_path,

/tmp/ipykernel_280/1927266979.py in performance_sweep(client, model, data_loader, hardware_type, worker_counts, samples, load_samples, max_data_frac, predict_frac, scaling_type, xy_fit, fit_requires_compute, update_workers_in_kwargs, response_dtype, out_path, append_to_existing, model_name, fit_func_id, predict_func_id, scaling_denom, model_args, model_kwargs)
    186     m = model(*model_args, **model_kwargs)
    187     if (fit_func_id):
--> 188         fit_timings = sweep_fit_func(model=m, func_id=fit_func_id,
    189                                      require_compute=fit_requires_compute,
    190                                      X=X, y=y, xy_fit=xy_fit, count=samples)

/tmp/ipykernel_280/1927266979.py in sweep_fit_func(model, func_id, require_compute, X, y, xy_fit, count)
     49     fit_func = partial(_fit_func_attr, X)
     50
---> 51     return collect_func_time_samples(func=fit_func, count=count)
     52
     53

/tmp/ipykernel_280/1927266979.py in collect_func_time_samples(func, count, verbose)
     30     for k in tqdm(range(count)):
     31         with SimpleTimer() as timer:
---> 32             func()
     33         timings.append(timer.elapsed)
     34

/opt/conda/envs/rapids/lib/python3.9/site-packages/cuml/dask/ensemble/randomforestregressor.py in fit(self, X, y, convert_dtype, broadcast_data)
    251         """
    252         self.internal_model = None
--> 253         self._fit(model=self.rfs,
    254                   dataset=(X, y),
    255                   convert_dtype=convert_dtype,

/opt/conda/envs/rapids/lib/python3.9/site-packages/cuml/dask/ensemble/base.py in _fit(self, model, dataset, convert_dtype, broadcast_data)
     99
    100     def _fit(self, model, dataset, convert_dtype, broadcast_data):
--> 101         data = DistributedDataHandler.create(dataset, client=self.client)
    102         self.active_workers = data.workers
    103         self.datatype = data.datatype

/opt/conda/envs/rapids/lib/python3.9/site-packages/cuml/dask/common/input_utils.py in create(cls, data, client)
    103         datatype, multiple = _get_datatype_from_inputs(data)
    104
--> 105         gpu_futures = client.sync(_extract_partitions, data, client)
    106
    107         workers = tuple(set(map(lambda x: x[0], gpu_futures)))

/opt/conda/envs/rapids/lib/python3.9/site-packages/distributed/utils.py in sync(self, func, asynchronous, callback_timeout, *args, **kwargs)
    337             return future
    338         else:
--> 339             return sync(
    340                 self.loop, func, *args, callback_timeout=callback_timeout, **kwargs
    341             )

/opt/conda/envs/rapids/lib/python3.9/site-packages/distributed/utils.py in sync(loop, func, callback_timeout, *args, **kwargs)
    404     if error:
    405         typ, exc, tb = error
--> 406         raise exc.with_traceback(tb)
    407     else:
    408         return result

/opt/conda/envs/rapids/lib/python3.9/site-packages/distributed/utils.py in f()
    377             future = asyncio.wait_for(future, callback_timeout)
    378             future = asyncio.ensure_future(future)
--> 379             result = yield future
    380         except Exception:
    381             error = sys.exc_info()

/opt/conda/envs/rapids/lib/python3.9/site-packages/tornado/gen.py in run(self)
    760
    761     try:
--> 762         value = future.result()
    763     except Exception:
    764         exc_info = sys.exc_info()

/opt/conda/envs/rapids/lib/python3.9/site-packages/tornado/gen.py in run(self)
    773         exc_info = None
    774     else:
--> 775         yielded = self.gen.send(value)
    776
    777     except (StopIteration, Return) as e:

RuntimeError: generator raised StopIteration
```
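The final `RuntimeError: generator raised StopIteration` is PEP 479 behavior: in the traceback, `first(who_has[key])` hits an empty sequence inside cuml's `_extract_partitions` generator (no worker reports holding that key), and on Python 3.7+ a `StopIteration` that escapes a generator frame is converted to a `RuntimeError`. A toy sketch of the mechanism (the `who_has` contents here are made up for illustration):

```python
def extract_partitions_sketch():
    """Toy generator mimicking the failure mode: taking first() of an
    empty sequence (an empty who_has entry) raises StopIteration
    inside the generator body."""
    who_has = {"key-0": []}  # no worker reports holding the key
    yield "setup"
    # toolz.first(seq) is essentially next(iter(seq)):
    next(iter(who_has["key-0"]))  # StopIteration escapes the generator

try:
    list(extract_partitions_sketch())
except RuntimeError as e:
    # PEP 479 (Python 3.7+): a StopIteration leaking out of a generator
    # frame is re-raised as RuntimeError, as in the traceback above.
    print(e)  # generator raised StopIteration
```

So the `StopIteration` is a symptom, not the cause: the real question is why `who_has` has no entry for a partition key — typically the underlying futures were never scattered to (or were lost from) the workers before `fit` was called.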