| 1 | +## Introduction |
| 2 | + |
| 3 | +This package provides data structures and algorithms for network analysis in Java. |
| 4 | +Currently, the focus of the package is restricted to clustering (or community detection) in networks. |
| 5 | +In particular, the package contains an implementation of the [Leiden algorithm](https://arxiv.org/abs/xxx.xxxx) and the [Louvain algorithm](https://arxiv.org/abs/0803.0476). |
| 6 | +Only undirected networks are supported. |
| 7 | + |
| 8 | +## Usage |
| 9 | + |
| 10 | +To run the clustering algorithms, the command-line tool `RunNetworkClustering` is provided. |
| 11 | +The latest version of the tool is available as a pre-compiled `jar` file in the GitHub [release](https://github.com/CWTSLeiden/networkanalysis/releases/latest). |
| 12 | +The source code is also available in this repository. |
| 13 | +You can [compile](#compilation) it yourself. |
| 14 | +The `.jar` file can be executed as follows: |
| 15 | + |
| 16 | +``` |
| 17 | +java -jar RunNetworkClustering.jar |
| 18 | +``` |
| 19 | + |
| 20 | +If no further arguments are provided, the following usage notice will be displayed: |
| 21 | + |
| 22 | +``` |
| 23 | +Usage: RunNetworkClustering [options] <filename> |
| 24 | + |
| 25 | +Identify clusters (also known as communities) in a network, using either the |
| 26 | +Leiden or the Louvain algorithm. |
| 27 | + |
| 28 | +The file in <filename> is expected to contain a tab-separated edge list |
| 29 | +(without a header line). Nodes are represented by zero-index integer numbers. |
| 30 | +Only undirected networks are supported. Each edge should be included only once |
| 31 | +in the file. |
| 32 | + |
| 33 | +Options: |
| 34 | +-q --quality-function {CPM|modularity} (default: CPM) |
| 35 | + Quality function to be optimized. Either the CPM (constant Potts model) or |
| 36 | + the modularity quality function can be used. |
| 37 | +-r --resolution <resolution> (default: 1.0) |
| 38 | + Resolution parameter of the quality function. |
| 39 | +-a --algorithm {Leiden|Louvain} (default: Leiden) |
| 40 | + Algorithm for optimizing the quality function. Either the Leiden or the |
| 41 | + Louvain algorithm can be used. |
| 42 | +-s --random-starts <random starts> (default: 1) |
| 43 | + Number of random starts of the algorithm. |
| 44 | +-i --iterations <iterations> (default: 10) |
| 45 | + Number of iterations of the algorithm. |
| 46 | +--randomness <randomness> (default: 0.01) |
| 47 | + Randomness parameter of the Leiden algorithm. |
| 48 | +--seed <seed> (default: random) |
| 49 | + Seed of the random number generator. |
| 50 | +-w --weighted-edges |
| 51 | + Indicates that the edge list file has a third column containing edge |
| 52 | + weights. |
| 53 | +--sorted-edge-list |
| 54 | + Indicates that the edge list file is sorted. The file should be sorted based |
| 55 | + on the nodes in the first column, followed by the nodes in the second |
| 56 | + column. Each edge should be included in both directions in the file. |
| 57 | +--input-clustering <filename> (default: singleton clustering) |
| 58 | + Read the initial clustering from the specified file. The file is expected to |
| 59 | + contain two tab-separated columns (without a header line), first a column of |
| 60 | + nodes and then a column of clusters. Nodes and clusters are both represented |
| 61 | + by zero-index integer numbers. If no file is specified, a singleton |
| 62 | + clustering (in which each node has its own cluster) is used as the initial |
| 63 | + clustering. |
| 64 | +-o --output-clustering <filename> (default: standard output) |
| 65 | + Write the final clustering to the specified file. If no file is specified, |
| 66 | + the standard output is used. |
| 67 | +``` |
| 68 | + |
| 69 | +To run the clustering algorithms, you need Java 1.8.0 or higher. |
| 70 | + |
| 71 | +### Example |
| 72 | + |
| 73 | +The following example illustrates the use of the `RunNetworkClustering` tool. |
| 74 | +Consider this network: |
| 75 | + |
| 76 | +```text |
| 77 | + 0-----1 |
| 78 | + \ / |
| 79 | + \ / |
| 80 | + 2 |
| 81 | + | |
| 82 | + 3 |
| 83 | + / \ |
| 84 | + / \ |
| 85 | + 4-----5 |
| 86 | +``` |
| 87 | + |
| 88 | +The network is encoded as an edge list that is saved in a tab-separated text file: |
| 89 | + |
| 90 | +```text |
| 91 | +0 1 |
| 92 | +1 2 |
| 93 | +2 0 |
| 94 | +2 3 |
| 95 | +3 5 |
| 96 | +5 4 |
| 97 | +4 3 |
| 98 | +``` |
| 99 | + |
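The input rules stated in the usage notice (tab-separated columns, non-negative integer nodes, each edge listed only once) can be checked before running the tool. The following is a minimal sketch of such a check. The class `EdgeListChecker` and its method `findProblems` are hypothetical helpers written for illustration and are not part of this package. The sketch also assumes that edge weights, if present, are decimal numbers.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration; not part of the package.
final class EdgeListChecker {

    // Returns one message per problem found; an empty list means no problem was found.
    // Intended for unsorted edge lists, in which each edge appears only once.
    static List<String> findProblems(List<String> lines, boolean weighted) {
        List<String> problems = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        int expectedColumns = weighted ? 3 : 2;
        for (int i = 0; i < lines.size(); i++) {
            String where = "line " + (i + 1) + ": ";
            String[] cols = lines.get(i).split("\t");
            if (cols.length != expectedColumns) {
                problems.add(where + "expected " + expectedColumns + " tab-separated columns");
                continue;
            }
            try {
                int a = Integer.parseInt(cols[0].trim());
                int b = Integer.parseInt(cols[1].trim());
                if (weighted) {
                    // Assumption: weights are decimal numbers.
                    Double.parseDouble(cols[2].trim());
                }
                if (a < 0 || b < 0) {
                    problems.add(where + "nodes must be integers starting from 0");
                    continue;
                }
                // The network is undirected, so "0 1" and "1 0" are the same edge.
                String key = Math.min(a, b) + "-" + Math.max(a, b);
                if (!seen.add(key)) {
                    problems.add(where + "edge is listed more than once");
                }
            } catch (NumberFormatException e) {
                problems.add(where + "could not parse a number");
            }
        }
        return problems;
    }
}
```

For the edge list shown above, `findProblems` returns an empty list. Adding a line `1	0` would be reported as a repeated edge.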
| 100 | +Nodes must be represented by integer numbers starting from 0. |
| 101 | +Assuming that the edge list has been saved in the file `network.txt`, the `RunNetworkClustering` tool can be run as follows: |
| 102 | + |
| 103 | +``` |
| 104 | +java -jar RunNetworkClustering.jar -r 0.2 -o clusters.txt network.txt |
| 105 | +``` |
| 106 | + |
| 107 | +In this case, clusters are identified using the Leiden algorithm based on the CPM quality function with a value of `0.2` for the resolution parameter. |
| 108 | +The resulting clustering is saved in the text file `clusters.txt`: |
| 109 | + |
| 110 | +```text |
| 111 | +0 |
| 112 | +0 |
| 113 | +0 |
| 114 | +1 |
| 115 | +1 |
| 116 | +1 |
| 117 | +``` |
| 118 | + |
| 119 | +The file `clusters.txt` shows that two clusters have been identified. |
| 120 | +Cluster 0 includes nodes 0, 1, and 2. |
| 121 | +Cluster 1 includes nodes 3, 4, and 5. |
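The grouping described above can also be derived programmatically. The sketch below assumes the layout shown in the listing, in which line `i` (counting from 0) holds the cluster of node `i`. As a fallback, it reads lines with two tab-separated columns as a node followed by a cluster, which is the layout described for `--input-clustering`. The class `ClusterFileReader` is a hypothetical helper and is not part of this package.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration; not part of the package.
final class ClusterFileReader {

    // Maps each cluster to the list of nodes assigned to it, ordered by cluster number.
    static Map<Integer, List<Integer>> groupByCluster(List<String> lines) {
        Map<Integer, List<Integer>> nodesPerCluster = new TreeMap<>();
        int lineIndex = 0;
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // Skip blank lines, such as a trailing newline.
            }
            String[] cols = line.trim().split("\t");
            // Assumption: with one column, the line position is the node number.
            int node = cols.length > 1 ? Integer.parseInt(cols[0].trim()) : lineIndex;
            int cluster = Integer.parseInt(cols[cols.length - 1].trim());
            nodesPerCluster.computeIfAbsent(cluster, k -> new ArrayList<>()).add(node);
            lineIndex++;
        }
        return nodesPerCluster;
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        for (Map.Entry<Integer, List<Integer>> entry : groupByCluster(lines).entrySet()) {
            System.out.println("Cluster " + entry.getKey() + ": " + entry.getValue());
        }
    }
}
```

Run on the `clusters.txt` listing above, the grouping contains nodes 0, 1, and 2 for cluster 0 and nodes 3, 4, and 5 for cluster 1.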
| 122 | +In the above example, the edges in the file `network.txt` have not been sorted. |
| 123 | +To provide a sorted edge list as input, include the edges in both directions and use the option ``--sorted-edge-list``. |
| 124 | +Furthermore, edge weights can be provided by adding a third column to the file `network.txt` and by using the option ``--weighted-edges``. |
| 125 | + |
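An unsorted edge list can be converted into the layout that the usage notice describes for `--sorted-edge-list`: every edge in both directions, sorted by the first column and then by the second column. The following is a minimal sketch of that conversion. The class `SortedEdgeListWriter` is a hypothetical helper and is not part of this package. The sketch assumes that the input lists each edge only once and contains no self-loops, and it keeps any further columns (such as a weight) with both copies of the edge.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper for illustration; not part of the package.
final class SortedEdgeListWriter {

    // Returns each edge in both directions, sorted by first node, then by second node.
    static List<String> toSortedEdgeList(List<String> lines) {
        List<String[]> rows = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // Skip blank lines.
            }
            String[] cols = line.split("\t");
            rows.add(cols);
            // Add the same edge in the opposite direction, keeping extra columns.
            String[] reversed = cols.clone();
            reversed[0] = cols[1];
            reversed[1] = cols[0];
            rows.add(reversed);
        }
        rows.sort(Comparator
                .comparingInt((String[] r) -> Integer.parseInt(r[0].trim()))
                .thenComparingInt(r -> Integer.parseInt(r[1].trim())));
        List<String> sorted = new ArrayList<>();
        for (String[] row : rows) {
            sorted.add(String.join("\t", row));
        }
        return sorted;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("Usage: SortedEdgeListWriter <input file> <output file>");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        Files.write(Paths.get(args[1]), toSortedEdgeList(lines), StandardCharsets.UTF_8);
    }
}
```

For the seven edges of the example network, the result has fourteen lines, starting with `0	1` and `0	2` and ending with `5	3` and `5	4`.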
| 126 | +## Compilation |
| 127 | + |
| 128 | +The source code can be compiled as follows: |
| 129 | + |
| 130 | +``` |
| 131 | +javac -d build src/cwts/networkanalysis/*.java src/cwts/networkanalysis/run/*.java src/cwts/util/*.java |
| 132 | +``` |
| 133 | + |
| 134 | +The compiled `class` files will be output to the directory `build`. |
| 135 | +There are no external dependencies. |
| 136 | +The `main` method is provided in the class `cwts.networkanalysis.run.RunNetworkClustering`. |
| 137 | +After the code has been compiled, the `RunNetworkClustering` tool can be run as follows: |
| 138 | + |
| 139 | +``` |
| 140 | +java -cp build cwts.networkanalysis.run.RunNetworkClustering |
| 141 | +``` |
| 142 | + |
| 143 | +The latest stable version of the code is available from the [`master`](https://github.com/CWTSLeiden/networkanalysis/tree/master) branch on GitHub. |
| 144 | +The most recent code, which may be under development, is available from the [`develop`](https://github.com/CWTSLeiden/networkanalysis/tree/develop) branch. |
| 145 | + |
| 146 | +## Issues |
| 147 | + |
| 148 | +If you encounter any issues, please report them using the [issue tracker](https://github.com/CWTSLeiden/networkanalysis/issues). |
| 149 | +Before submitting, please check whether the issue has already been reported. |
| 150 | + |
| 151 | +## Documentation |
| 152 | + |
| 153 | +Documentation of the source code is provided in the code in `javadoc` format. |
| 154 | +The documentation is also available in a [compiled format](https://CWTSLeiden.github.io/networkanalysis). |
| 155 | + |
| 156 | +## Contribution |
| 157 | + |
| 158 | +You are welcome to contribute to this package. |
| 159 | +Please follow the typical GitHub workflow: fork from this repository and make a pull request to submit your changes. |
| 160 | +Continuous integration has not been set up yet, so please make sure that any proposed pull request compiles and functions correctly. |
| 161 | + |
| 162 | +## License |
| 163 | + |
| 164 | +This package is distributed under the MIT License. |
| 165 | +Please refer to the [`LICENSE`](LICENSE) file for further details. |