This is the PyTorch implementation for our SIGIR 2020 paper:
Xiangnan He, Kuan Deng, Xiang Wang, Yan Li, Yongdong Zhang, Meng Wang (2020). LightGCN: Simplifying and Powering Graph Convolution Network for Recommendation. SIGIR 2020. Paper in arXiv.
Author: Prof. Xiangnan He (staff.ustc.edu.cn/~hexn/)
(Also see the TensorFlow implementation)
In this work, we aim to simplify the design of GCN to make it more concise and appropriate for recommendation. We propose a new model named LightGCN, which includes only the most essential component of GCN, neighborhood aggregation, for collaborative filtering.
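To make "neighborhood aggregation only" concrete, here is a minimal sketch of the propagation rule described in the paper. Each layer replaces a node's embedding with the symmetrically normalized sum of its neighbors' embeddings, and the final embedding is the uniform average over all layers. It is written in plain Python so it has no dependencies. The function name `light_gcn_propagate` and its arguments are illustrative assumptions, not the API of `model.py`, which works on PyTorch sparse tensors.

```python
import math


def light_gcn_propagate(interactions, user_emb, item_emb, n_layers=3):
    """Illustrative LightGCN-style propagation (hypothetical helper).

    interactions: iterable of (user, item) index pairs.
    user_emb / item_emb: lists of embedding vectors (lists of floats).
    Returns the layer-averaged user and item embeddings.
    """
    n_users, n_items, dim = len(user_emb), len(item_emb), len(user_emb[0])
    edges = sorted(set(interactions))  # each edge counts once
    user_deg, item_deg = [0] * n_users, [0] * n_items
    for u, i in edges:
        user_deg[u] += 1
        item_deg[i] += 1

    users = [list(e) for e in user_emb]
    items = [list(e) for e in item_emb]
    # Running sums over layers, starting with layer 0 (the raw embeddings).
    user_sum = [list(e) for e in users]
    item_sum = [list(e) for e in items]

    for _ in range(n_layers):
        new_users = [[0.0] * dim for _ in range(n_users)]
        new_items = [[0.0] * dim for _ in range(n_items)]
        for u, i in edges:
            # Symmetric normalization 1 / sqrt(|N_u| * |N_i|); there is no
            # weight matrix and no nonlinearity, only aggregation.
            w = 1.0 / math.sqrt(user_deg[u] * item_deg[i])
            for d in range(dim):
                new_users[u][d] += w * items[i][d]
                new_items[i][d] += w * users[u][d]
        users, items = new_users, new_items
        for total, layer in ((user_sum, users), (item_sum, items)):
            for row_total, row in zip(total, layer):
                for d in range(dim):
                    row_total[d] += row[d]

    scale = 1.0 / (n_layers + 1)  # uniform layer weights
    final_users = [[x * scale for x in row] for row in user_sum]
    final_items = [[x * scale for x in row] for row in item_sum]
    return final_users, final_items


# Toy graph: one user connected to one item, two propagation layers.
toy_users, toy_items = light_gcn_propagate(
    [(0, 0)], [[1.0, 0.0]], [[0.0, 1.0]], n_layers=2
)
```

A user's score for an item would then be the inner product of the two final embeddings.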
pip install -r requirements.txt
We provide three processed datasets (Gowalla, Yelp2018 and Amazon-book) and one small dataset, LastFM.
See dataloader.py for more details.
Run LightGCN on the Gowalla dataset:
- command
cd code && python main.py --decay=1e-4 --lr=0.001 --layer=3 --seed=2020 --dataset="gowalla" --topks="[20]" --recdim=64
- log output
...
======================
EPOCH[5/1000]
BPR[sample time][16.2=15.84+0.42]
[saved][[BPR[aver loss1.128e-01]]
[TEST]
{'precision': array([0.03315359]), 'recall': array([0.10711388]), 'ndcg': array([0.08940792])}
[TOTAL TIME] 35.9975962638855
...
======================
EPOCH[116/1000]
BPR[sample time][16.9=16.60+0.45]
[saved][[BPR[aver loss2.056e-02]]
[TOTAL TIME] 30.99874997138977
...
NOTE:
- Even though we offer code to split the user-item matrix for matrix multiplication, we strongly suggest that you do not enable it, since it slows training down considerably.
- If you feel the test process is slow, try increasing `testbatch` and enabling `multicore` (Windows systems may encounter problems with the `multicore` option enabled).
- Consider enabling the `tensorboard` option; it is useful for monitoring training.
- Since we fix the seed (`--seed=2020`) of `numpy` and `torch` at the beginning, running the command above should give you exactly the same output log, apart from the running times (check your output for epoch 5 and epoch 116).
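The reproducibility note above depends on seeding every random number generator before any data is sampled. A minimal sketch of that idea follows. The helper name `set_seed` is chosen here for illustration and may not match what the repository does internally. The NumPy and PyTorch imports are guarded only so the snippet also runs where they are not installed.

```python
import random


def set_seed(seed):
    """Seed Python, NumPy and PyTorch generators (illustrative helper)."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed; nothing to seed
    try:
        import torch
        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass  # PyTorch not installed; nothing to seed


# Re-seeding with the same value reproduces the same random draws.
set_seed(2020)
first_draw = random.random()
set_seed(2020)
second_draw = random.random()
```

Even with fixed seeds, some GPU kernels may be non-deterministic, so small differences across hardware are possible.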
The code structure is shown below.
code
├── parse.py
├── Procedure.py
├── dataloader.py
├── main.py
├── model.py
├── utils.py
└── world.py
If you want to run LightGCN on your own dataset, go to dataloader.py and implement a dataloader.
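As a starting point, the sketch below shows the kind of information a custom dataloader typically has to expose: the number of users and items, each user's positive items, and the list of interaction pairs. Everything here is an assumption made for illustration: the class name `SimpleInteractionData`, its method names, and the input format (one line per user, `user_id item_id item_id ...`). The interface that the training code actually expects is defined in dataloader.py, so check it there before adapting this.

```python
from collections import defaultdict


class SimpleInteractionData:
    """Hypothetical minimal container for implicit-feedback interactions."""

    def __init__(self, lines):
        # Assumed format: "user_id item_id item_id ..." on each line.
        self.user_pos = defaultdict(list)
        for line in lines:
            fields = line.split()
            if len(fields) < 2:
                continue  # skip blank lines and users without items
            user = int(fields[0])
            self.user_pos[user].extend(int(x) for x in fields[1:])
        # IDs are assumed to be contiguous and zero-based.
        self.n_users = max(self.user_pos) + 1 if self.user_pos else 0
        self.m_items = max(
            (max(items) for items in self.user_pos.values()), default=-1
        ) + 1

    def get_user_pos_items(self, users):
        """Return the list of positive items for each requested user."""
        return [list(self.user_pos.get(u, [])) for u in users]

    def pairs(self):
        """Return all (user, item) interactions, sorted by user."""
        return [(u, i) for u, items in sorted(self.user_pos.items()) for i in items]


# Usage with in-memory lines; a real loader would read them from a file.
data = SimpleInteractionData(["0 1 2", "1 0"])
```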
All metrics are computed at top-20.
PyTorch version results (stopped at 1000 epochs):
(for seed=2020)
- gowalla:
Layers | Recall | ndcg | precision |
---|---|---|---|
layer=1 | 0.1687 | 0.1417 | 0.05106 |
layer=2 | 0.1786 | 0.1524 | 0.05456 |
layer=3 | 0.1824 | 0.1547 | 0.05589 |
layer=4 | 0.1825 | 0.1537 | 0.05576 |
NOTE: for layer=4, we use seed=1000 to attain better performance.
- yelp2018:
Layers | Recall | ndcg | precision |
---|---|---|---|
layer=1 | 0.05604 | 0.04557 | 0.02519 |
layer=2 | 0.05988 | 0.04956 | 0.0271 |
layer=3 | 0.06347 | 0.05238 | 0.0285 |
layer=4 | 0.06515 | 0.05325 | 0.02917 |
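For readers unfamiliar with the three metrics in these tables, the sketch below computes precision, recall and NDCG at a cutoff `k` for a single user, using the common binary-relevance definitions. It is an illustration rather than the repository's evaluation code. The function name `topk_metrics` is hypothetical, and the reported numbers are presumably averages of such per-user values over all test users.

```python
import math


def topk_metrics(ranked_items, relevant_items, k=20):
    """Precision@k, recall@k and NDCG@k for one user (illustrative)."""
    relevant = set(relevant_items)
    # 1 where the recommended item is a held-out positive, else 0.
    hits = [1 if item in relevant else 0 for item in ranked_items[:k]]
    n_hits = sum(hits)

    precision = n_hits / k
    recall = n_hits / len(relevant) if relevant else 0.0

    # DCG discounts a hit at 0-based rank r by log2(r + 2).
    dcg = sum(h / math.log2(rank + 2) for rank, h in enumerate(hits))
    # Ideal DCG: all relevant items (up to k) ranked first.
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(min(len(relevant), k)))
    ndcg = dcg / idcg if idcg > 0 else 0.0

    return {"precision": precision, "recall": recall, "ndcg": ndcg}


# One user: three recommended items, two held-out positives, cutoff 2.
example = topk_metrics([3, 1, 2], [1, 5], k=2)
```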
For those who want the well-trained models, please e-mail me (gusye AT mail.ustc.edu.cn).