- Pretrained model weights: config.json, pytorch_model.bin (also available on Hugging Face: malteos/scincl-wol)
- Tokenizer: see the w/ leakage release
- Triples (query, positive, negative) and paper metadata: train_triples.csv.gz, train_metadata.jsonl.gz
- Corpus and query papers: s2orc_paper_ids.seed_0.json, query_s2orc_paper_ids.seed_0.json
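
The triples and metadata files can be read with the Python standard library alone. The following is a minimal sketch, not part of the release: it assumes the CSV stores the query, positive, and negative paper IDs in its first three columns, that a header row is present (set `has_header=False` otherwise), and that each JSONL record carries its ID under a key named `paper_id`. The function names and the `paper_id` key are hypothetical, so check them against the actual files before use.

```python
import csv
import gzip
import json
from typing import Dict, Iterator, Tuple


def read_triples(path: str, has_header: bool = True) -> Iterator[Tuple[str, str, str]]:
    """Yield (query, positive, negative) ID triples from a gzipped CSV file.

    Assumption: the three IDs are the first three columns, in that order.
    """
    with gzip.open(path, "rt", encoding="utf-8", newline="") as f:
        reader = csv.reader(f)
        if has_header:
            next(reader, None)  # skip the header row (assumed to exist)
        for row in reader:
            if len(row) < 3:
                continue  # skip blank or malformed rows
            yield row[0], row[1], row[2]


def read_metadata(path: str, id_field: str = "paper_id") -> Dict[str, dict]:
    """Map paper ID -> metadata record from a gzipped JSON Lines file.

    Assumption: `id_field` ("paper_id" by default) is a hypothetical key name.
    """
    records: Dict[str, dict] = {}
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate empty lines
            record = json.loads(line)
            records[str(record[id_field])] = record
    return records
```

For example, `triples = list(read_triples("train_triples.csv.gz"))` and `metadata = read_metadata("train_metadata.jsonl.gz")` load both files into memory. Keep `read_triples` as a generator instead if the triples file is too large for that.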