This is an open-source framework for self-supervised/unsupervised graph embedding implemented in PyTorch, migrated from the earlier version implemented in TensorFlow.
- A unified framework: We provide a unified framework for self-supervised/unsupervised node representation learning. Our models include unsupervised network embedding (NE) methods (DeepWalk, Node2vec, HOPE, GraRep, LLE, Lap, TADW, GF, LINE, SDNE) and recent self-supervised graph embedding methods (GAE, VGAE).
- More datasets: We provide both unattributed datasets (Wiki, BlogCatalog, Flickr, Wikipedia, PPI) and attributed datasets (Cora, CiteSeer, Pubmed) of all sizes.
- Efficiency: We provide faster, more efficient models and better default hyper-parameter settings than the previous version. The table below shows the performance of OpenNE-PyTorch models on the Cora dataset compared with the previous version, where "F1/Accuracy" refers to accuracy for GCN and micro F1-score for the other models, and "Time" refers to training time. Hyperparameters are set to default values unless specified in "Remarks". We also list results of our new models, GAE and VGAE.
Method | Time (OpenNE old) | Time (OpenNE-PyTorch) | F1/Accuracy (OpenNE old) | F1/Accuracy (OpenNE-PyTorch) | Remarks |
---|---|---|---|---|---|
DeepWalk | 85.85 | 74.98 | .832 | .832 | - |
Node2vec | 143.67 | 38.18 | .814 | .807 | - |
HOPE | 2.66 | 2.45 | .634 | .743 | - |
GraRep | 44.27 | 4.04 | .770 | .776 | - |
TADW | 43.42 | 59.12 | .852 | .843 | - |
GF | 15.01 | 19.53 | .546 | .775 | default # epochs changed |
LINE | 86.75 | 98.69 | .417 | .722 | default # epochs changed |
SDNE | 195.02 | 10.22 | .532 | .742 | - |
GCN | 17.4 | 11.22 | .857 | .861 | --sparse |
GAE | - | 55.97 | - | .788 | - |
VGAE | - | 124.03 | - | .809 | - |
See Experimental Results for performances on Wiki and BlogCatalog.
- Modularity: We divide the code into three parts: Dataloader, Model and Task. Users can easily customize datasets, methods and tasks, and can also define their own datasets and methods.
We plan to add more models and tasks to the framework. Our future plans include:
- More self-supervised models such as ARGA/ARVGA, GALA and AGE.
- New tasks for link prediction, graph clustering and graph classification.
You are welcome to add your own datasets and methods by submitting pull requests.
- Clone this repo.
- Enter the directory where you cloned it, and run the following commands:
pip install -r requirements.txt
cd src
- You can start using OpenNE by simply changing directory to OpenNE/src.
If instead you want to install OpenNE as a site-package, run the following command in OpenNE/src:
python setup.py install
It is easy to get started with OpenNE. Here are some commands for basic usage with default values:
python -m openne --model gf --dataset blogcatalog
python -m openne --model gcn --dataset cora
Parameters like `--sparse` have action `store_true`, which means they are `False` by default and should be specified if you want to set them to `True`. Run GCN with sparse matrices with the following command:
python -m openne --model gcn --dataset cora --sparse
You can use `store_false` parameters, e.g. `--no-save`, in a similar way:
python -m openne --model gcn --dataset cora --sparse --no-save
OpenNE saves your models and training results to file by default, which may take extra time. Use the above command if you do not wish to save the results.
Use `--local-dataset` (which is also a `store_true` parameter!) and specify `--root-dir`, `--edgefile`/`--adjfile`, `--labelfile`, `--features` and `--status` to import a dataset from file.
Optionally, specify the `store_true` parameters `--weighted` and `--directed` to view the graph as weighted and/or directed.
If you wish to use your dataset in "~/mydataset", which includes edges.txt, an edgelist file, and labels.txt, a label file, input the following:
python -m openne --model gf --local-dataset --root-dir ~/mydataset --edgefile edges.txt --labelfile labels.txt
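For reference, a minimal sketch of what such files might contain is shown below. The exact formats are an assumption here (whitespace-separated "source target" pairs for the edgelist and "node label" pairs for the label file), so consult the Dataloader code for the authoritative specification.
edges.txt (one edge per line):
0 1
0 2
1 2
labels.txt (one node and its label per line):
0 A
1 A
2 B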
While all parameter names must be provided in lower case, string input values are case insensitive:
python -m openne --model SDnE --dataset coRA
To provide a Python list (as for `--encoder-layer-list` in SDNE and `--hiddens` in GCN), input each element separated by spaces:
python -m openne --model sdne --dataset cora --encoder-layer-list 1000 128
OpenNE uses CUDA by default if `torch.cuda.is_available() == True`. To disable CUDA, use `--cpu`.
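If you are unsure whether CUDA will be used, you can check what PyTorch reports before deciding on `--cpu`:
python -c "import torch; print(torch.cuda.is_available())"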
When CUDA is enabled, you can select multiple GPU devices with `--devices [device_ids]`, where `[device_ids]` is a list of integers; your model and input are stored on the first device. Use `--data-parallel` to utilize data parallelism on the chosen devices.
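As an illustration, a multi-GPU run might look like the following, assuming device ids are passed as space-separated integers (the same convention as the list parameters described above):
python -m openne --model line --dataset cora --devices 0 1 --data-parallel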
You can check out the other options available in OpenNE using:
python -m openne --help
- `--model {deepwalk, line, node2vec, grarep, tadw, gcn, lap, gf, hope, sdne}`, the specified NE model;
- `--dataset {ppi, wikipedia, flickr, blogcatalog, wiki, pubmed, cora, citeseer}`, a standard dataset provided by OpenNE;
If instead you want to create a dataset from file, you can provide your own graph by using the switch `--local-dataset` (action `store_true`; mutually exclusive with `--dataset`) and the following arguments:

- `--root-dir`, root directory of input files. If empty, you should provide absolute paths for graph files;
- `--edgefile`, description of input graph in edgelist format;
- `--adjfile`, description of input graph in adjlist format (mutually exclusive with `--edgefile`);
- `--label-file`, node label file;
- `--features`, node feature file for certain models (optional);
- `--name`, dataset name, "SelfDefined" by default;
- `--weighted`, view graph as weighted (action `store_true`);
- `--directed`, view graph as directed (action `store_true`);
For general training options:
- `--dim`, dimension of node representation, 128 by default;
- `--clf-ratio`, the ratio of training data for node classification, 0.5 by default;
- `--no-save`, choose not to save the result (action `store_false`, dest=save);
- `--output`, output file for vectors, which will be saved to "results" by default;
- `--sparse`, calculate by sparse matrices (action `store_true`) (only supports lle & gcn);
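As an illustration, the general options can be combined with any model; the values below are arbitrary examples rather than recommended settings:
python -m openne --model deepwalk --dataset wiki --dim 64 --clf-ratio 0.8 --no-save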
For models with multiple epochs:
- `--epochs`, number of epochs;
- `--validate`, `True` if validation is needed; by default it is `False` except with GCN;
- `--validation-interval`, number of epochs between two validations, 5 by default;
- `--debug-output-interval`, number of epochs between two debug outputs, 5 by default;
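For example, to train LINE for 40 epochs with less frequent debug output (the values are illustrative):
python -m openne --model line --dataset cora --epochs 40 --debug-output-interval 10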
For device options:
- `--cpu`, force OpenNE to run on CPU. Ignored if `torch.cuda.is_available() == False`.
- `--devices`, specify CUDA devices for OpenNE to run on (default 0). Devices other than `device_id[0]` are ignored except with `--data-parallel`. Ignored if `torch.cuda.is_available() == False`.
- `--data-parallel`, split input batch and perform data parallelism (action `store_true`). Only works for methods with `--batch-size` (i.e. line, sdne).
GraphFactorization:
- `--weight-decay`, weight for l2-loss of embedding matrix (1.0 by default);
- `--lr`, learning rate (0.003 by default)
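An illustrative GraphFactorization run with these options written out explicitly (the values are simply the documented defaults):
python -m openne --model gf --dataset wiki --lr 0.003 --weight-decay 1.0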
GraRep:
- `--kstep`, use k-step transition probability matrix (requires `dim % kstep == 0`).
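For example, the following choice satisfies the constraint, since 128 is divisible by 4 (the value 4 itself is only illustrative):
python -m openne --model grarep --dataset wiki --dim 128 --kstep 4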
HOPE:
- `--measurement {katz, cn, rpr, aa}`, measurement matrix, `katz` by default;
- `--beta`, parameter for the katz measurement, 0.02 by default;
- `--alpha`, parameter for the rpr measurement, 0.5 by default;
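An illustrative HOPE run using the katz measurement with its documented default `--beta`:
python -m openne --model hope --dataset wiki --measurement katz --beta 0.02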
LINE:
- `--lr`, learning rate, 0.001 by default;
- `--batch-size`, 1024 by default;
- `--negative-ratio`, 5 by default;
- `--order`, 1 for the 1st-order, 2 for the 2nd-order and 3 for 1st + 2nd; 3 by default;
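An illustrative LINE run with the documented defaults written out explicitly:
python -m openne --model line --dataset cora --order 3 --negative-ratio 5 --batch-size 1024 --lr 0.001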
SDNE:
- `--encoder-layer-list`, list of neuron numbers at each encoder layer. In SDNE, the last number of `--encoder-layer-list`, instead of `--dim`, is the dimension of the output node representation; [128] by default;
- `--alpha`, parameter that controls the first-order proximity loss, 1e-6 by default;
- `--beta`, parameter used to construct matrix B, 5 by default;
- `--nu1`, parameter that controls the l1-loss of the autoencoder weights, 1e-8 by default;
- `--nu2`, parameter that controls the l2-loss of the autoencoder weights, 1e-5 by default;
- `--bs`, batch size, 200 by default;
- `--lr`, learning rate, 0.001 by default;
- `--decay`, allow decay in learning rate (action `store_true`);
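An illustrative SDNE run; the encoder layer sizes are arbitrary examples (the last value, 128, becomes the embedding dimension):
python -m openne --model sdne --dataset cora --encoder-layer-list 512 128 --bs 200 --decay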
TADW: (requires an attributed graph, e.g. cora, pubmed, citeseer)

- `--lamb`, parameter that controls the weight of regularization terms, 0.4 by default;
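An illustrative TADW run on an attributed dataset with the documented default `--lamb`:
python -m openne --model tadw --dataset cora --lamb 0.4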
GCN: (requires attributed graph)
- `--lr`, learning rate, 0.01 by default;
- `--dropout`, dropout rate, 0.5 by default;
- `--weight-decay`, weight for l2-loss of embedding matrix, 0.0001 by default;
- `--hiddens`, list of neuron numbers in each hidden layer, [16] by default;
- `--max-degree`, maximum Chebyshev polynomial degree, 0 (disable Chebyshev polynomials) by default;
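An illustrative GCN run with the documented defaults written out and sparse matrices enabled:
python -m openne --model gcn --dataset citeseer --lr 0.01 --dropout 0.5 --hiddens 16 --sparse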
GAE and VGAE: (requires attributed graph) share the same parameter list as GCN.

- `--lr`, 0.01 by default;
- `--dropout`, 0.0 by default;
- `--weight-decay`, 1e-4 by default;
- `--early-stopping`, 100 by default;
- `--hiddens`, [32] by default;
- `--max-degree`, 0 by default;
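An illustrative VGAE run, assuming gae and vgae are selected through `--model` in the same way as the other methods:
python -m openne --model vgae --dataset cora --hiddens 32 --dropout 0.0 --early-stopping 100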
DeepWalk and node2vec:
- `--num-paths`, number of random walks starting at each node, 10 by default;
- `--path-length`, length of each random walk, 80 by default;
- `--window`, window size of the skip-gram model, 10 by default;
- `--q` (node2vec only), 1.0 by default;
- `--p` (node2vec only), 1.0 by default.
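An illustrative node2vec run; the `--p` and `--q` values here are arbitrary examples rather than tuned settings:
python -m openne --model node2vec --dataset blogcatalog --num-paths 10 --path-length 80 --window 10 --p 1.0 --q 0.25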
We provide experimental results of OpenNE models on the Wiki and BlogCatalog datasets. For performance on Cora, see the section "Overview" - "New Features".
Wiki
Algorithm | F1-micro | F1-macro | Time | Remarks |
---|---|---|---|---|
HOPE | 0.613 | 0.432 | 1.89 | - |
GF | 0.618 | 0.432 | 61.22 | - |
GraRep | 0.608 | 0.42 | 4.33 | - |
Node2vec | 0.656 | 0.535 | 49.18 | - |
DeepWalk | 0.662 | 0.522 | 97.47 | - |
SDNE | 0.655 | 0.522 | 81.19 | - |
LINE | 0.631 | 0.488 | 234.12 | epochs=40 |
BlogCatalog
Algorithm | F1-micro | F1-macro | Time | Remarks |
---|---|---|---|---|
HOPE | 0.336 | 0.157 | 96.63 | - |
GF | 0.235 | 0.066 | 800.02 | - |
GraRep | 0.399 | 0.233 | 103.27 | - |
Node2Vec | 0.396 | 0.26 | 1962.93 | - |
DeepWalk | 0.398 | 0.261 | 516.64 | - |
SDNE | 0.372 | 0.232 | 1323.93 | - |
LINE | 0.384 | 0.235 | 4739.79 | - |
If you find OpenNE useful for your research, please consider citing the following papers:
@InProceedings{perozzi2014deepwalk,
Title = {Deepwalk: Online learning of social representations},
Author = {Perozzi, Bryan and Al-Rfou, Rami and Skiena, Steven},
Booktitle = {Proceedings of KDD},
Year = {2014},
Pages = {701--710}
}
@InProceedings{tang2015line,
Title = {Line: Large-scale information network embedding},
Author = {Tang, Jian and Qu, Meng and Wang, Mingzhe and Zhang, Ming and Yan, Jun and Mei, Qiaozhu},
Booktitle = {Proceedings of WWW},
Year = {2015},
Pages = {1067--1077}
}
@InProceedings{grover2016node2vec,
Title = {node2vec: Scalable feature learning for networks},
Author = {Grover, Aditya and Leskovec, Jure},
Booktitle = {Proceedings of KDD},
Year = {2016},
Pages = {855--864}
}
@article{kipf2016semi,
Title = {Semi-Supervised Classification with Graph Convolutional Networks},
Author = {Kipf, Thomas N and Welling, Max},
journal = {arXiv preprint arXiv:1609.02907},
Year = {2016}
}
@InProceedings{cao2015grarep,
Title = {Grarep: Learning graph representations with global structural information},
Author = {Cao, Shaosheng and Lu, Wei and Xu, Qiongkai},
Booktitle = {Proceedings of CIKM},
Year = {2015},
Pages = {891--900}
}
@InProceedings{yang2015network,
Title = {Network representation learning with rich text information},
Author = {Yang, Cheng and Liu, Zhiyuan and Zhao, Deli and Sun, Maosong and Chang, Edward},
Booktitle = {Proceedings of IJCAI},
Year = {2015}
}
@Article{tu2017network,
Title = {Network representation learning: an overview},
Author = {TU, Cunchao and YANG, Cheng and LIU, Zhiyuan and SUN, Maosong},
Journal = {SCIENTIA SINICA Informationis},
Volume = {47},
Number = {8},
Pages = {980--996},
Year = {2017}
}
@inproceedings{ou2016asymmetric,
title = {Asymmetric transitivity preserving graph embedding},
author = {Ou, Mingdong and Cui, Peng and Pei, Jian and Zhang, Ziwei and Zhu, Wenwu},
booktitle = {Proceedings of the 22nd ACM SIGKDD},
pages = {1105--1114},
year = {2016},
organization = {ACM}
}
@inproceedings{belkin2002laplacian,
title = {Laplacian eigenmaps and spectral techniques for embedding and clustering},
author = {Belkin, Mikhail and Niyogi, Partha},
booktitle = {Advances in neural information processing systems},
pages = {585--591},
year = {2002}
}
@inproceedings{ahmed2013distributed,
title = {Distributed large-scale natural graph factorization},
author = {Ahmed, Amr and Shervashidze, Nino and Narayanamurthy, Shravan and Josifovski, Vanja and Smola, Alexander J},
booktitle = {Proceedings of the 22nd international conference on World Wide Web},
pages = {37--48},
year = {2013},
organization = {ACM}
}
@inproceedings{wang2016structural,
title = {Structural deep network embedding},
author = {Wang, Daixin and Cui, Peng and Zhu, Wenwu},
booktitle = {Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining},
pages = {1225--1234},
year = {2016},
organization = {ACM}
}
@inproceedings{kipf2016variational,
title = {Variational graph auto-encoders},
author = {Kipf, Thomas N and Welling, Max},
booktitle = {NIPS Workshop on Bayesian Deep Learning},
numpages = {3},
year = {2016}
}
The OpenNE-pytorch Project is contributed by Yufeng Du, Ganqu Cui and Jie Zhou.
- Zhiyuan Liu, Tsinghua University. Homepage
- Cheng Yang, Beijing University of Posts and Telecommunications. Homepage
This research is supported by Tencent.