This repository contains the code, data, and instructions to reproduce the experiments from our EMNLP 2025 paper "Schema Generation for Large Knowledge Graphs Using Large Language Models" [arXiv].
conda create -n shapespresso python=3.11
conda activate shapespresso
pip install -r requirements.txt
Note: We recommend using a locally running or otherwise stable SPARQL endpoint to avoid timeout errors. If you do not have one, you can set one up with qEndpoint.
Setting up Wikidata Endpoint
docker run -p 1234:1234 --name qendpoint-wikidata qacompany/qendpoint-wikidata
Setting up YAGO Endpoint
docker run -p 1234:1234 --name qendpoint-yago qacompany/qendpoint
After setting up the YAGO endpoint, upload the YAGO 4.5 triples from here.
The endpoint URL will then be accessible at: http://localhost:1234/api/endpoint/sparql.
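Before running the pipeline, it can help to confirm that the endpoint answers queries. The following is a minimal sketch using only the Python standard library. It assumes the endpoint accepts standard SPARQL 1.1 Protocol GET requests and returns SPARQL JSON results; the helper names `build_sparql_request` and `endpoint_is_up` are illustrative and not part of this repository.

```python
import http.client
import json
import urllib.parse
import urllib.request

# Endpoint URL as given above.
ENDPOINT_URL = "http://localhost:1234/api/endpoint/sparql"


def build_sparql_request(query: str, endpoint_url: str = ENDPOINT_URL) -> urllib.request.Request:
    """Build a GET request that passes the query as a URL parameter.

    Assumption: the endpoint follows the SPARQL 1.1 Protocol.
    """
    url = endpoint_url + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})


def endpoint_is_up(endpoint_url: str = ENDPOINT_URL, timeout: float = 10.0) -> bool:
    """Return True if a trivial ASK query succeeds and finds at least one triple."""
    request = build_sparql_request("ASK { ?s ?p ?o }", endpoint_url)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return bool(json.load(response).get("boolean", False))
    except (OSError, ValueError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error, or a non-JSON response body.
        return False


if __name__ == "__main__":
    print("endpoint reachable:", endpoint_is_up())
```

A `False` result can mean either that the endpoint is not reachable or that no triples have been loaded yet.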
Example 1: Generate Prompts (Local Setting, WES Dataset)
python main.py --task prompt \
--dataset wes \
--output_dir output/prompts/local/wes/entity_id/5 \
--mode local \
--num_instances 5 \
--sort_by entity_id \
--few_shot \
--few_shot_example_path dataset/wes/Q4220917.shex \
--save_log
Example 2: Generate Schema (Local Setting, WES Dataset)
python main.py --task generate \
--model_name gpt-4o-mini \
--dataset wes \
--mode local \
--output_dir output/results/local/gpt-4o-mini/wes/entity_id/5 \
--prompts_dir output/prompts/local/wes/entity_id/5 \
--num_instances 5 \
--sort_by entity_id \
--few_shot \
--few_shot_example_path dataset/wes/Q4220917.shex \
--graph_info_path resources/wes_predicate_count_instances.json \
--save_log
Example 3: Evaluate (Classification Metrics, Exact Matching)
python evaluate.py --dataset wes \
--ground_truth_dir dataset/wes \
--predictions_dir output/results/local/gpt-4o-mini/wes/entity_id/5 \
--node_constraint_matching_level exact \
--cardinality_matching_level exact \
--classification
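To give an intuition for what classification metrics under exact matching measure, here is a minimal sketch. It is not the implementation in `evaluate.py`: it assumes each schema can be reduced to a set of (property, node constraint, cardinality) tuples and that a predicted constraint counts as correct only when all three parts equal a ground-truth constraint. The function name and the example constraints are hypothetical.

```python
def precision_recall_f1(predicted: set, gold: set) -> tuple[float, float, float]:
    """Compute precision, recall, and F1 over two sets of constraints.

    Assumption: set membership (exact tuple equality) defines a match.
    """
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical constraints: (property, node constraint, cardinality).
gold = {
    ("wdt:P31", "[wd:Q5]", "1"),
    ("wdt:P569", "xsd:dateTime", "?"),
    ("wdt:P19", "IRI", "?"),
}
predicted = {
    ("wdt:P31", "[wd:Q5]", "1"),        # exact match
    ("wdt:P569", "xsd:dateTime", "*"),  # cardinality differs, so no exact match
}

precision, recall, f1 = precision_recall_f1(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this sketch, a looser matching level would amount to comparing only part of each tuple, for example ignoring the cardinality.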
Example 4: Evaluate (Similarity Metrics)
python evaluate.py --dataset wes \
--ground_truth_dir dataset/wes \
--predictions_dir output/results/local/gpt-4o-mini/wes/entity_id/5 \
--similarity
List of few-shot example files and graph information files:
| mode | dataset | few_shot_example_path | graph_info_path |
|---|---|---|---|
| local | WES | dataset/wes/Q4220917.shex | resources/wes_predicate_count_instances.json |
| local | YAGOS | dataset/yagos/Scientist.shex | resources/yagos_predicate_count_instances.json |
| global | WES | resources/wes_global_few_shot_examples.toml | resources/wikidata_property_information.json |
| global | YAGOS | resources/yagos_global_few_shot_examples.toml | / |
| triples | WES | dataset/wes/Q4220917.shex | resources/wes_predicate_count_instances.json |
| triples | YAGOS | dataset/yagos/Scientist.shex | resources/yagos_predicate_count_instances.json |
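When scripting several runs, the table above can be encoded as a lookup so that each mode and dataset pair always receives the matching files. This is a sketch only: the `Resources` type and `resource_args` helper are hypothetical, the `/` entry is interpreted as "no graph information file", and the lowercase dataset keys are a convention of this sketch rather than a confirmed CLI value for YAGOS.

```python
from typing import NamedTuple, Optional


class Resources(NamedTuple):
    few_shot_example_path: str
    graph_info_path: Optional[str]  # None where the table shows "/"


# Transcribed from the table above; keys are (mode, dataset) in lowercase.
RESOURCES = {
    ("local", "wes"): Resources("dataset/wes/Q4220917.shex",
                                "resources/wes_predicate_count_instances.json"),
    ("local", "yagos"): Resources("dataset/yagos/Scientist.shex",
                                  "resources/yagos_predicate_count_instances.json"),
    ("global", "wes"): Resources("resources/wes_global_few_shot_examples.toml",
                                 "resources/wikidata_property_information.json"),
    ("global", "yagos"): Resources("resources/yagos_global_few_shot_examples.toml", None),
    ("triples", "wes"): Resources("dataset/wes/Q4220917.shex",
                                  "resources/wes_predicate_count_instances.json"),
    ("triples", "yagos"): Resources("dataset/yagos/Scientist.shex",
                                    "resources/yagos_predicate_count_instances.json"),
}


def resource_args(mode: str, dataset: str) -> list[str]:
    """Return the few-shot and graph-info flags for a mode/dataset pair."""
    res = RESOURCES[(mode.lower(), dataset.lower())]
    args = ["--few_shot", "--few_shot_example_path", res.few_shot_example_path]
    if res.graph_info_path is not None:
        args += ["--graph_info_path", res.graph_info_path]
    return args


if __name__ == "__main__":
    print(" ".join(resource_args("global", "YAGOS")))
```

The returned list can be appended to the other `main.py` arguments, for example when building a command for `subprocess.run`.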
The dataset is available on Zenodo and in the dataset folder of this repository.
If you find this repository useful, please cite our paper:
@inproceedings{zhang-et-al-2025-schema,
title = "Schema Generation for Large Knowledge Graphs Using Large Language Models",
author = "Zhang, Bohui and
He, Yuan and
Pintscher, Lydia and
Mero{\~n}o-Pe{\~n}uela, Albert and
Simperl, Elena",
editor = "Christodoulopoulos, Christos and
Chakraborty, Tanmoy and
Rose, Carolyn and
Peng, Violet",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2025",
month = nov,
year = "2025",
address = "Suzhou, China",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.findings-emnlp.671/",
doi = "10.18653/v1/2025.findings-emnlp.671",
pages = "12561--12580",
ISBN = "979-8-89176-335-7",
}
For questions, reach out to [email protected].