shapespresso ☕️

This repository contains the code, data, and instructions to reproduce the experiments from our EMNLP 2025 paper "Schema Generation for Large Knowledge Graphs Using Large Language Models" [arXiv].

Setup Environment

conda create -n shapespresso python=3.11
conda activate shapespresso
pip install -r requirements.txt

Note: We recommend using a locally running or otherwise stable SPARQL endpoint to avoid timeout errors. If you do not have one, you can set one up with qEndpoint.

Setting up Wikidata Endpoint

docker run -p 1234:1234 --name qendpoint-wikidata qacompany/qendpoint-wikidata

Setting up YAGO Endpoint

docker run -p 1234:1234 --name qendpoint-yago qacompany/qendpoint

After setting up the YAGO endpoint, upload the YAGO 4.5 triples from here.

The SPARQL endpoint will then be available at http://localhost:1234/api/endpoint/sparql.
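
Before launching experiments, it can help to confirm that the endpoint responds. The sketch below is not part of this repository: it uses only the Python standard library and assumes the endpoint accepts standard SPARQL-protocol GET requests with a `query` parameter. The helper names `build_sparql_request` and `endpoint_is_up` are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# URL taken from the setup step above.
ENDPOINT = "http://localhost:1234/api/endpoint/sparql"


def build_sparql_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL-protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def endpoint_is_up(endpoint: str = ENDPOINT, timeout: float = 10.0) -> bool:
    """Return True if the endpoint answers and the store holds at least one triple."""
    request = build_sparql_request(endpoint, "ASK { ?s ?p ?o }")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return bool(json.load(response).get("boolean", False))
    except (OSError, ValueError, AttributeError):
        # Connection failures, timeouts, or a non-JSON reply all count as "not up".
        return False


if __name__ == "__main__":
    print("endpoint reachable:", endpoint_is_up())
```

If the check fails, verify that the Docker container is running and that port 1234 is not already in use by another container.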

Running Experiments

Example 1: Generate Prompts (Local Setting, WES Dataset)

python main.py --task prompt \
               --dataset wes \
               --output_dir output/prompts/local/wes/entity_id/5 \
               --mode local \
               --num_instances 5 \
               --sort_by entity_id \
               --few_shot \
               --few_shot_example_path dataset/wes/Q4220917.shex \
               --save_log

Example 2: Generate Schema (Local Setting, WES Dataset)

python main.py --task generate \
               --model_name gpt-4o-mini \
               --dataset wes \
               --mode local \
               --output_dir output/results/local/gpt-4o-mini/wes/entity_id/5 \
               --prompts_dir output/prompts/local/wes/entity_id/5 \
               --num_instances 5 \
               --sort_by entity_id \
               --few_shot \
               --few_shot_example_path dataset/wes/Q4220917.shex \
               --graph_info_path resources/wes_predicate_count_instances.json \
               --save_log

Example 3: Evaluate (Classification Metrics, Exact Matching)

python evaluate.py --dataset wes \
                   --ground_truth_dir dataset/wes \
                   --predictions_dir output/results/local/gpt-4o-mini/wes/entity_id/5 \
                   --node_constraint_matching_level exact \
                   --cardinality_matching_level exact \
                   --classification
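
As a rough illustration of what exact matching implies for classification metrics, the sketch below computes precision, recall, and F1 over sets of constraints that must match exactly. It does not reproduce `evaluate.py`: the function name `exact_match_prf` and the `(predicate, node constraint, cardinality)` tuple representation are assumptions made for this example only.

```python
def exact_match_prf(gold: set, predicted: set) -> tuple[float, float, float]:
    """Precision, recall, and F1 where a prediction counts only if it equals a gold item."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical constraints: (predicate, node constraint, cardinality).
gold = {
    ("wdt:P31", "[wd:Q5]", "1"),
    ("wdt:P569", "xsd:dateTime", "?"),
}
predicted = {
    ("wdt:P31", "[wd:Q5]", "1"),
    ("wdt:P569", "xsd:dateTime", "*"),  # wrong cardinality, so no exact match
}

print(exact_match_prf(gold, predicted))  # (0.5, 0.5, 0.5)
```

Under exact matching, a constraint with the right predicate and node constraint but a different cardinality counts as both a false positive and a false negative.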

Example 4: Evaluate (Similarity Metrics)

python evaluate.py --dataset wes \
                   --ground_truth_dir dataset/wes \
                   --predictions_dir output/results/local/gpt-4o-mini/wes/entity_id/5 \
                   --similarity

Resources

List of few-shot example files and graph information files:

`mode`	`dataset`	`few_shot_example_path`	`graph_info_path`
local	WES	dataset/wes/Q4220917.shex	resources/wes_predicate_count_instances.json
local	YAGOS	dataset/yagos/Scientist.shex	resources/yagos_predicate_count_instances.json
global	WES	resources/wes_global_few_shot_examples.toml	resources/wikidata_property_information.json
global	YAGOS	resources/yagos_global_few_shot_examples.toml	/
triples	WES	dataset/wes/Q4220917.shex	resources/wes_predicate_count_instances.json
triples	YAGOS	dataset/yagos/Scientist.shex	resources/yagos_predicate_count_instances.json
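
When scripting several runs, the table above can be kept as a lookup so that each mode and dataset pair receives matching files. This is a minimal sketch rather than repository code: `RESOURCES` and `resource_args` are hypothetical names, and treating the `/` entry as "no graph information file" is an assumption.

```python
# Paths copied from the table above. None stands for the "/" entry,
# assumed here to mean that no graph information file is needed.
RESOURCES = {
    ("local", "WES"): ("dataset/wes/Q4220917.shex",
                       "resources/wes_predicate_count_instances.json"),
    ("local", "YAGOS"): ("dataset/yagos/Scientist.shex",
                         "resources/yagos_predicate_count_instances.json"),
    ("global", "WES"): ("resources/wes_global_few_shot_examples.toml",
                        "resources/wikidata_property_information.json"),
    ("global", "YAGOS"): ("resources/yagos_global_few_shot_examples.toml", None),
    ("triples", "WES"): ("dataset/wes/Q4220917.shex",
                         "resources/wes_predicate_count_instances.json"),
    ("triples", "YAGOS"): ("dataset/yagos/Scientist.shex",
                           "resources/yagos_predicate_count_instances.json"),
}


def resource_args(mode: str, dataset: str) -> list[str]:
    """Return the few-shot and graph-info command-line arguments for one setting."""
    few_shot, graph_info = RESOURCES[(mode.lower(), dataset.upper())]
    args = ["--few_shot", "--few_shot_example_path", few_shot]
    if graph_info is not None:
        args += ["--graph_info_path", graph_info]
    return args


print(" ".join(resource_args("local", "wes")))
```

The returned list can be appended to the remaining `main.py` arguments, for example when building a command for `subprocess.run`.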

Dataset

The dataset is available on Zenodo and in the dataset folder of this repository.

Citation

If you find this repository useful, please cite our paper:

@inproceedings{zhang-et-al-2025-schema,
    title = "Schema Generation for Large Knowledge Graphs Using Large Language Models",
    author = "Zhang, Bohui  and
      He, Yuan  and
      Pintscher, Lydia  and
      Mero{\~n}o-Pe{\~n}uela, Albert  and
      Simperl, Elena",
    editor = "Christodoulopoulos, Christos  and
      Chakraborty, Tanmoy  and
      Rose, Carolyn  and
      Peng, Violet",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2025",
    month = nov,
    year = "2025",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.findings-emnlp.671/",
    doi = "10.18653/v1/2025.findings-emnlp.671",
    pages = "12561--12580",
    ISBN = "979-8-89176-335-7",
}

Contact

For questions, reach out to [email protected].
