The code to run NodePiece on the OGB WikiKG 2 dataset.
- OGB repo: github
- Decoder models taken from the AutoSF repo: github
- Experiments executed on: Tesla V100 32GB, 64GB RAM
NodePiece dramatically reduces the entity embedding matrix and can be used with standard decoders from the OGB Leaderboard. Paired with the AutoSF scoring function, NodePiece yields the following performance while being 70-180x smaller in the number of parameters (cf. the OGB WikiKG 2 Leaderboard):
| Model | Vocabulary Size | Parameters | Validation MRR | Test MRR |
|---|---|---|---|---|
| NodePiece + AutoSF | 20k | 6,860,602 | 0.5806 ± 0.0047 | 0.5703 ± 0.0035 |
| AutoSF | 2.5M | 500,227,800 | 0.5510 ± 0.0063 | 0.5458 ± 0.0052 |
| TransE (500d) | 2.5M | 1,250,569,500 | 0.4272 ± 0.0030 | 0.4256 ± 0.0030 |
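
To make the parameter gap concrete, here is a back-of-the-envelope check. This is a sketch under stated assumptions rather than code from this repository: it assumes the shallow baselines store one d-dimensional embedding per entity and per relation, and that ogbl-wikikg2 has roughly 2.5M entities and 535 relations.

```python
# Rough parameter counts for the shallow baselines in the table above.
# Assumptions (not taken from this repo): ogbl-wikikg2 has 2,500,604 entities
# and 535 relations, and each baseline keeps one dense embedding per entity
# and per relation.
NUM_ENTITIES, NUM_RELATIONS = 2_500_604, 535

def shallow_param_count(dim: int) -> int:
    """Entity embedding matrix + relation embedding matrix."""
    return (NUM_ENTITIES + NUM_RELATIONS) * dim

print(f"TransE (500d): {shallow_param_count(500):,}")  # 1,250,569,500
print(f"AutoSF (200d): {shallow_param_count(200):,}")  # 500,227,800
```

The 500d and 200d counts line up with the TransE and AutoSF rows, whereas NodePiece only embeds the 20k-anchor vocabulary (plus relations and a small encoder), which is where the 70-180x reduction comes from.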
- We have pre-computed a vocabulary of 20k anchor nodes (~910 MB). Download it using the `download.sh` script:

  ```bash
  sh download.sh
  ```
- Install the requirements from `requirements.txt`
- Run the code with the best hyperparameters using the main script:

  ```bash
  sh run_ogb.sh
  ```
Alternatively, you can create a new vocabulary and run the tokenization with any number of anchors. For that, we improved the anchor mining procedure to be parallelizable and work in a batched fashion:
- We first run the METIS partitioning available in PyTorch-Geometric so that anchor mining is done independently for each partition. The number of partitions `--part` should ideally be equal to or twice the `--cpu_num` parameter.
- Then we tokenize `--tkn_batch` nodes in each iteration (e.g., 500K nodes will create 5000 batches); see the sketch after this list.
- Note that this process might be RAM-hungry. Approximate requirements would be a server with 8 CPUs and 64 GB RAM.
- Tokenization time depends on several parameters (cf. the main codebase). For the above server configuration, mining the vocabulary of 20k anchors might take 2-8 hours in multiprocessing mode.
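
As a rough illustration of the batched tokenization step above, here is a minimal sketch (not the repository's actual code): for each batch of nodes it computes unweighted shortest-path distances with SciPy and keeps the nearest anchors. The function name `tokenize_batch`, the `topk` parameter, and the toy graph are illustrative assumptions; the real codebase also records anchor distances and relational context, and runs such batches over the METIS partitions in a multiprocessing pool with `--cpu_num` workers.

```python
# Hedged sketch of batched anchor tokenization: for every node in a batch,
# find its `topk` nearest anchors by unweighted shortest-path distance (BFS).
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import dijkstra

def tokenize_batch(adj: csr_matrix, batch_nodes: np.ndarray,
                   anchors: np.ndarray, topk: int = 20):
    # distances from every node in the batch to all nodes (unweighted == BFS)
    dist = dijkstra(adj, directed=False, unweighted=True, indices=batch_nodes)
    dist_to_anchors = dist[:, anchors]                   # (batch, num_anchors)
    nearest = np.argsort(dist_to_anchors, axis=1)[:, :topk]
    return {int(n): anchors[idx].tolist()                # node -> its top-k anchor ids
            for n, idx in zip(batch_nodes, nearest)}

# Toy usage on a 5-node path graph with two assumed anchors (0 and 4)
if __name__ == "__main__":
    edges = np.array([[0, 1], [1, 2], [2, 3], [3, 4]])
    n = 5
    adj = csr_matrix((np.ones(len(edges)), (edges[:, 0], edges[:, 1])), shape=(n, n))
    print(tokenize_batch(adj, np.arange(n), anchors=np.array([0, 4]), topk=2))
```

Processing `--tkn_batch` nodes at a time keeps the distance matrix small, which is what makes the procedure fit into the RAM budget mentioned above.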