Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add WholeGraph Support #6

Closed
wants to merge 24 commits into from

Conversation

alexbarghi-nv
Copy link
Owner

@alexbarghi-nv alexbarghi-nv commented Jan 5, 2024

Adds WholeGraph support for the performance tests. Also adds some fixes to cuGraph-PyG found whole testing WG.

alexbarghi-nv and others added 20 commits January 5, 2024 10:51
Removes experimental wrappers from GNN code in `cuGraph`, `cuGraph-PyG`, and `cuGraph-DGL`.

Authors:
  - Alex Barghi (https://github.com/alexbarghi-nv)

Approvers:
  - Rick Ratzel (https://github.com/rlratzel)
  - Vibhu Jawa (https://github.com/VibhuJawa)

URL: rapidsai#4070
…ai#4035)

Hi! I choose to go further with some simple work other than docs. This PR is going to close rapidsai#3218.

Here is what I have done in this PR:
1. Added `get_dask_edgelist()` and `get_dask_graph()` (and another internal helper function `__download_dask_csv()`) to  Dataset API.
2. Executed all necessary tests for these new functions.
3. Improved existing functions in the Dataset API and conducted tests to verify improvements.

Here are some additional details regarding this PR:
1. The building and testing were conducted using version 23.12 instead of the default 24.02. Since Cugraph-ops library is no longer open, I failed to build from source using version 24.02. I built and tested the code in version 23.12 and then transferred the updated file to 24.02 before creating this PR. (I would appreciate any guidance on how to build from version 24.02 for external contributors).
2.  All tests from the test file have passed, but some warnings remain, as shown below
```bash
============================================================ warnings summary ============================================================
cugraph/tests/utils/test_dataset.py::test_get_dask_graph[dataset0]
cugraph/tests/utils/test_dataset.py::test_get_dask_graph[dataset0]
cugraph/tests/utils/test_dataset.py::test_get_dask_graph[dataset0]
cugraph/tests/utils/test_dataset.py::test_weights_dask[dataset0]
cugraph/tests/utils/test_dataset.py::test_weights_dask[dataset0]
cugraph/tests/utils/test_dataset.py::test_weights_dask[dataset0]
cugraph/tests/utils/test_dataset.py::test_weights_dask[dataset0]
cugraph/tests/utils/test_dataset.py::test_weights_dask[dataset0]
cugraph/tests/utils/test_dataset.py::test_weights_dask[dataset0]
  /home/ubuntu/miniconda3/envs/cugraph_dev/lib/python3.10/site-packages/cudf/core/index.py:3284: FutureWarning: cudf.StringIndex is deprecated and will be removed from cudf in a future version. Use cudf.Index with the appropriate dtype instead.
    warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
```
I think above warnings came from the function call `from_dask_cudf_edgelist` but currently I have no idea how to remove them. I will do my best to address it if anyone has any ideas about it.

 3. The `get_edgelist()` function returns a deep copy of the object, but this is not supported for `get_dask_edgelist()` since only shallow copy is allowed for Dask cuDF dataframe (see [docs](https://docs.rapids.ai/api/dask-cudf/legacy/api/#dask_cudf.DataFrame.copy)). This will lead to a problem where if a user modifies the dataframe, the changes will be reflected in the internal `self._edgelist` object. So when `get_dask_graph()` is called later, the resulting graph will differ from the one directly constructed from the data file.
 4. I am uncertain about the requirements for (1) Identifying datasets and (2) Adding them to Dataset. If there is a need to add another function for determining whether a dataset requires MG handling based on its size, or to tag the dataset metadata (.yaml file) to indicate the necessity for MG processing, please let me know. Also, I welcome any suggestions for further features.
 5. When I ran pytest on other test files, the most common warnings were
 ```bash
/home/ubuntu/miniconda3/envs/cugraph_dev/lib/python3.10/site-packages/dask_cudf/io/csv.py:79: FutureWarning: `chunksize` is deprecated and will be removed in the future. Please use `blocksize` instead.
 ```
 The keyword `chunksize` is no longer in use (check [docs](https://docs.rapids.ai/api/dask-cudf/legacy/api/#dask_cudf.read_csv) here). I have checked all related functions in the repository and found that they currently use `chunksize`. If there is a need to change them to `blocksize`, I will create another PR to address this issue.

Any comments and suggestions are welcome!

Authors:
  - Huiyu Xie (https://github.com/huiyuxie)
  - Rick Ratzel (https://github.com/rlratzel)

Approvers:
  - Rick Ratzel (https://github.com/rlratzel)

URL: rapidsai#4035
MG weighted similarity tests assume symmetric graphs (undirected). When we remove multi-edges, we pick arbitrary edges and this can lead to an asymmetry in edge weights. This PR adds an additional flag to keep minimum value edges in `remove_multi_edges` and use this function if the input graph is symmetric to maintain weight symmetry.

Applying the non-breaking label as this PR does not change existing code behavior if `keep_min_value_edge` is not provided.

Authors:
  - Seunghwa Kang (https://github.com/seunghwak)

Approvers:
  - Chuck Hastings (https://github.com/ChuckHastings)
  - Naim (https://github.com/naimnv)

URL: rapidsai#4054
…atz_centrality, degree_centrality, eigenvector_centrality) (rapidsai#4065)

closes rapidsai/graph_dl#404

* Adds benchmarks for algos added in the 23.12 release
  * SSSP
  * pagerank
  * hits
  * katz_centrality
  * degree_centrality
  * eigenvector_centrality
* Refactors fixtures for easier usage and mainentance
* uses `benchmark.pedantic` instead of `benchmark` to provide complete control over how to benchmark the algos, since `benchmark` will result in no fewer than 3 runs (calibrate timer, timed run, run to generate func return value) which can be too time consuming for slower runs.
* Removes code to create `Dataset` objects for larger datasets and replaces with the equivalent objects that are now part of `cugraph.datasets`

Authors:
  - Rick Ratzel (https://github.com/rlratzel)

Approvers:
  - Erik Welch (https://github.com/eriknw)

URL: rapidsai#4065
…rapidsai#4083)

This is an oversight from rapidsai#4075. With this fix, the generated wheels should have proper a CUDA suffix (e.g. `cugraph-pyg-cu12`).

Edit: This PR also fixes rapidsai/graph_dl#421 
The mg tests were not skipped as expected due to wrong relative path in `test_wheel_cugraph-pyg.sh`.

Authors:
  - Tingyu Wang (https://github.com/tingyu66)

Approvers:
  - Ray Douglass (https://github.com/raydouglass)
  - Alex Barghi (https://github.com/alexbarghi-nv)

URL: rapidsai#4083
Contributes to rapidsai/build-planning#7.

Proposes splitting the `cuda-version` dependency in `dependencies.yaml` out to its own thing, separate from the bits of the CUDA Toolkit this project needs.

### Benefits of this change

* prevents accidental inclusion of multiple `cuda-version` version in environments
* reduces update effort (via enabling more use of globs like `"12.*"`)
* improves the chance that errors like "`conda` recipe is missing a dependency" are caught in CI

Authors:
  - James Lamb (https://github.com/jameslamb)
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Vyas Ramasubramani (https://github.com/vyasr)
  - Bradley Dice (https://github.com/bdice)
  - Ray Douglass (https://github.com/raydouglass)

URL: rapidsai#4084
…apidsai#4069)

Pretty simple PR. I would like for us to use this metadata when creating tables of supported algorithms.

Authors:
  - Erik Welch (https://github.com/eriknw)

Approvers:
  - Rick Ratzel (https://github.com/rlratzel)

URL: rapidsai#4069
Adds performance benchmarking scripts for testing MNMG cuGraph GNN workflows.
This branch is the head branch for the cuGraph benchmarking effort.  All work supporting the benchmarks should be merged into this branch.  It will be merged into branch-24.02 once all features are ready.

Includes patches to cuGraph-PyG required for the latest DLFW container.

To-Do:

- [x] Refactor for branch-24.02
- [x] <s>Add WholeGraph training portion</s> Deferred to future PR (see #6)
- [x] <s>Add WholeGraph generators</s> Included in above
- [x] <s>Support DGL</s> Deferred to future PR
- [x] <s>Use appropriate docker containers</s> Deferred, waiting on DLFW release

Closes rapidsai#3839

Authors:
  - Alex Barghi (https://github.com/alexbarghi-nv)
  - Seunghwa Kang (https://github.com/seunghwak)

Approvers:
  - Vibhu Jawa (https://github.com/VibhuJawa)
  - Rick Ratzel (https://github.com/rlratzel)
  - Chuck Hastings (https://github.com/ChuckHastings)

URL: rapidsai#3584
…valid vertex (rapidsai#4077)

* Added error check to be sure that the source vertex is a valid vertex in the graph.
* Updated `nx_cugraph.Graph` class to create PLC graphs using `vertices_array` in order to include isolated vertices. This is now needed since the error check added in this PR prevents NetworkX tests from passing if isolated vertices are treated as invalid, so this change prevents that.
* This resolves the problem that required the test workarounds done [here](rapidsai#4029 (comment)) in [4029](rapidsai#4029), so those workarounds have been removed in this PR.

Closes rapidsai#4067

Authors:
  - Chuck Hastings (https://github.com/ChuckHastings)
  - Rick Ratzel (https://github.com/rlratzel)

Approvers:
  - Seunghwa Kang (https://github.com/seunghwak)
  - Ray Douglass (https://github.com/raydouglass)
  - Erik Welch (https://github.com/eriknw)

URL: rapidsai#4077
ECG based on Louvain.

Authors:
  - Naim (https://github.com/naimnv)

Approvers:
  - Seunghwa Kang (https://github.com/seunghwak)
  - Chuck Hastings (https://github.com/ChuckHastings)

URL: rapidsai#4030
Reference: rapidsai/ops#2766

Replace rapids-env-update with rapids-configure-conda-channels,
rapids-configure-sccache, and rapids-date-string.

Authors:
  - Kyle Edwards (https://github.com/KyleFromNVIDIA)
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - AJ Schmidt (https://github.com/ajschmidt8)

URL: rapidsai#4090
…apidsai#4092)

Hooray for removing and cleaning code! Tests also added (we already tested isolated nodes for Louvain).

nx-cugraph was updated to handle isolated nodes by passing the node set to PLC in rapidsai#4077

Authors:
  - Erik Welch (https://github.com/eriknw)

Approvers:
  - Rick Ratzel (https://github.com/rlratzel)

URL: rapidsai#4092
This doesn't currently work, because `plc.weakly_connected_components` only works on symmetric graphs (so it's not actually performing wcc now is it?):

> RuntimeError: non-success value returned from cugraph_weakly_connected_components: CUGRAPH_UNKNOWN_ERROR cuGraph failure at file=[...]/cugraph/cpp/src/components/weakly_connected_components_impl.cuh line=283: Invalid input argument: input graph should be symmetric for weakly connected components.

_These are high-priority algorithms for `nx-cugraph`, because they are widely used by networkx dependents._

Authors:
  - Erik Welch (https://github.com/eriknw)

Approvers:
  - Rick Ratzel (https://github.com/rlratzel)

URL: rapidsai#4071
This PR fixes up cuGraph to avoid usage that will soon be deprecated in RMM.

Depends on rapidsai/rmm#1417

Fixes rapidsai#4066

Authors:
  - Mark Harris (https://github.com/harrism)

Approvers:
  - Chuck Hastings (https://github.com/ChuckHastings)

URL: rapidsai#4086
…apidsai#4063)

Getting `G.to_undirected` to work was more involved than I expected, but at least we got two algorithms "for free" out of the effort!

We raise `NotImplementedError` for `multidigraph.to_undirected()` for now.

I would say that understanding the reciprocity algorithms is the first step to understanding `to_undirected`.

Authors:
  - Erik Welch (https://github.com/eriknw)
  - Rick Ratzel (https://github.com/rlratzel)

Approvers:
  - Rick Ratzel (https://github.com/rlratzel)

URL: rapidsai#4063
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.