Here are the Paper and Slides at SC'18.
In a directed graph, an SCC is a maximal subset of the vertices that every vertex has a directed path to all the others.
SCC detection will find all the SCCs in the directed graph. Each shaded area in the picture is an SCC.
Most real-world graphs have one large SCC that contains the majority of the vertices, as well as many small SCCs whose sizes are reversely proportional to the frequency of their occurrences. For both types of SCCs, current approaches that rely on depth or breadth first search (DFS and BFS) face the challenges of both strict synchronization requirement and high computation cost.
Motivated, we advocate a new paradigm of identifying SCCs with simple spanning trees, since SCC detection requires only the knowledge of connectivity among the vertices. We have developed a prototype called iSpan, which consists of parallel, relaxed synchronization construction of spanning trees for detecting the large and small SCCs, combined with fast trims for small SCCs.
We further scale iSpan to distributed memory system by applying different distribution strategies to the data and task parallel jobs.
The evaluations show that iSpan is able to significantly outperform current state-of-the-art DFS and BFS-based methods by average 18x and 4x, respectively.
You can find the source code of the shared-memory version in "src/", the source code of the distribute-memory version in "src_mpi", some useful scripts in "script/", and some test result under "result/"
More tutorials will be released soon.
The following software are required, but the versions do not have to be the same. The versions listed are used in our experiments.
GCC-4.8.5
OpenMP-3.1
Open MPI-2.1.1
Makefile
Get into the source code directory, "src" for shared-memory, "src_mpi" for distributed-memory, then compile the source code with Makefile,
cd src/
make
If the prerequisites are correct, the make process should be good. You will get the executable file "ispan". Run "ispan" to see the parameters and use the correct ones. For simplicity, you can change the "bash_one.sh" file and run it.
./bash_one.sh
For the distributed-memory version, we used the clusters of GWU Colonial One, and MGHPCC. There is a script for running jobs on GWU Colonial One, named "run_batch.sh". You should write your own script for running on a different cluster.
We are using compressed sparse row (CSR) format stored in binaries. We provide a converter from regular text edge list to our CSR binary format. One can find the converter under "graph_converter/".
Yuede Ji, email: [email protected]
Hang Liu, email: [email protected]
H. Howie Huang, email: [email protected]
If you use iSpan in your project, please cite the following paper.
@inproceedings{ji2018s,
title={iSpan: parallel identification of strongly connected components with spanning trees},
author={Ji, Yuede and Liu, Hang and Huang, H Howie},
booktitle={Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis},
pages={58},
year={2018},
organization={IEEE Press}
}