
King-s-Knowledge-Graph-Lab/shapespresso


shapespresso ☕️


This repository contains the code, data, and instructions to reproduce the experiments from our Findings of EMNLP 2025 paper "Schema Generation for Large Knowledge Graphs Using Large Language Models" [arXiv].

Setup Environment

conda create -n shapespresso python=3.11
conda activate shapespresso
pip install -r requirements.txt

Note: A locally running (or otherwise stable) SPARQL endpoint is recommended to avoid timeout errors. If you do not have one, you can set one up with qEndpoint.

Setting up Wikidata Endpoint

docker run -p 1234:1234 --name qendpoint-wikidata qacompany/qendpoint-wikidata

Setting up YAGO Endpoint

docker run -p 1234:1234 --name qendpoint-yago qacompany/qendpoint

After setting up the YAGO endpoint, upload the YAGO 4.5 triples from here.

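The SPARQL endpoint is then available at http://localhost:1234/api/endpoint/sparql. Both containers publish port 1234, so run only one at a time, or map a different host port (for example, -p 1235:1234) and adjust the URL accordingly.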
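To check that the endpoint is reachable before running experiments, the Python sketch below builds a standard SPARQL 1.1 Protocol GET request against the URL above. It assumes qEndpoint accepts the query as a GET parameter and can return application/sparql-results+json; the helper names build_sparql_request and run_query are hypothetical and not part of this repository.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Default local endpoint from the setup steps above.
ENDPOINT = "http://localhost:1234/api/endpoint/sparql"


def build_sparql_request(query: str, endpoint: str = ENDPOINT) -> Request:
    """Build a SPARQL 1.1 Protocol GET request (query sent as the 'query' parameter)."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    # Assumption: the endpoint can serve SPARQL JSON results.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def run_query(query: str, endpoint: str = ENDPOINT, timeout: float = 60.0) -> list:
    """Send the query and return the result bindings (needs a running endpoint)."""
    with urlopen(build_sparql_request(query, endpoint), timeout=timeout) as response:
        return json.load(response)["results"]["bindings"]
```

With a container running, run_query("SELECT * WHERE { ?s ?p ?o } LIMIT 1") returns at most one binding; a connection error or timeout usually means the endpoint is not up yet.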

Running Experiments

Example 1: Generate Prompts (Local Setting, WES Dataset)

python main.py --task prompt \
               --dataset wes \
               --output_dir output/prompts/local/wes/entity_id/5 \
               --mode local \
               --num_instances 5 \
               --sort_by entity_id \
               --few_shot \
               --few_shot_example_path dataset/wes/Q4220917.shex \
               --save_log
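When sweeping over several settings, it can help to build the Example 1 command programmatically. The sketch below mirrors the flags and the output/prompts/<mode>/<dataset>/<sort_by>/<num_instances> layout from Example 1; prompt_command is a hypothetical helper, and instance counts other than 5 are an assumption about what --num_instances accepts.

```python
from pathlib import PurePosixPath


def prompt_command(dataset: str, mode: str, num_instances: int, sort_by: str,
                   few_shot_example_path: str) -> list:
    """Build the argv for 'main.py --task prompt' using Example 1's directory layout."""
    # output/prompts/<mode>/<dataset>/<sort_by>/<num_instances>, as in Example 1.
    output_dir = PurePosixPath("output", "prompts", mode, dataset, sort_by, str(num_instances))
    return [
        "python", "main.py", "--task", "prompt",
        "--dataset", dataset,
        "--output_dir", str(output_dir),
        "--mode", mode,
        "--num_instances", str(num_instances),
        "--sort_by", sort_by,
        "--few_shot",
        "--few_shot_example_path", few_shot_example_path,
        "--save_log",
    ]


# Hypothetical sweep: one prompt directory per instance count (1 and 3 are assumed values).
sweep = [
    prompt_command("wes", "local", n, "entity_id", "dataset/wes/Q4220917.shex")
    for n in (1, 3, 5)
]
```

Each list can be run from the repository root with subprocess.run(cmd, check=True).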

Example 2: Generate Schema (Local Setting, WES Dataset)

python main.py --task generate \
               --model_name gpt-4o-mini \
               --dataset wes \
               --mode local \
               --output_dir output/results/local/gpt-4o-mini/wes/entity_id/5 \
               --prompts_dir output/prompts/local/wes/entity_id/5 \
               --num_instances 5 \
               --sort_by entity_id \
               --few_shot \
               --few_shot_example_path dataset/wes/Q4220917.shex \
               --graph_info_path resources/wes_predicate_count_instances.json \
               --save_log
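Generation and evaluation must agree on where the generated schemas live. The hypothetical helper below builds both commands from one set of settings, assuming generated schemas should be written to the output/results/... directory that the evaluation examples read via --predictions_dir. The resource_flags argument should carry the --few_shot, --few_shot_example_path and --graph_info_path values listed under Resources for the chosen mode and dataset.

```python
from pathlib import PurePosixPath


def results_dir(mode: str, model_name: str, dataset: str, sort_by: str, num_instances: int) -> str:
    # Layout taken from --predictions_dir in the evaluation examples:
    # output/results/<mode>/<model_name>/<dataset>/<sort_by>/<num_instances>
    return str(PurePosixPath("output", "results", mode, model_name, dataset,
                             sort_by, str(num_instances)))


def generate_then_evaluate(model_name: str, dataset: str, mode: str, prompts_dir: str,
                           sort_by: str, num_instances: int, resource_flags: list) -> list:
    """Return argv lists for generation and exact-match evaluation sharing one results dir."""
    out = results_dir(mode, model_name, dataset, sort_by, num_instances)
    generate = [
        "python", "main.py", "--task", "generate",
        "--model_name", model_name, "--dataset", dataset, "--mode", mode,
        "--output_dir", out, "--prompts_dir", prompts_dir,
        "--num_instances", str(num_instances), "--sort_by", sort_by,
        *resource_flags, "--save_log",
    ]
    evaluate = [
        "python", "evaluate.py", "--dataset", dataset,
        "--ground_truth_dir", f"dataset/{dataset}",
        "--predictions_dir", out,  # same directory the generate step writes to
        "--node_constraint_matching_level", "exact",
        "--cardinality_matching_level", "exact",
        "--classification",
    ]
    return [generate, evaluate]
```

Running the two lists in order with subprocess.run(cmd, check=True) keeps the evaluation pointed at the schemas that were just generated.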

Example 3: Evaluate (Classification Metrics, Exact Matching)

python evaluate.py --dataset wes \
                   --ground_truth_dir dataset/wes \
                   --predictions_dir output/results/local/gpt-4o-mini/wes/entity_id/5 \
                   --node_constraint_matching_level exact \
                   --cardinality_matching_level exact \
                   --classification
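The classification evaluation compares predicted constraints against the ground truth. As a rough illustration of what exact matching means, the sketch below treats each triple constraint as a (predicate, node constraint, cardinality) tuple and gives credit only when all three fields match. This is a toy, micro-averaged set comparison over made-up constraints, not the logic of evaluate.py, whose other matching levels and score aggregation are not described here.

```python
from typing import NamedTuple


class Constraint(NamedTuple):
    """One triple constraint of a shape: predicate, node constraint, cardinality."""
    predicate: str
    node_constraint: str
    cardinality: str


def exact_match_scores(predicted: list, ground_truth: list) -> dict:
    """Precision/recall/F1 where a prediction counts only if all three fields match exactly."""
    pred, gold = set(predicted), set(ground_truth)
    true_positives = len(pred & gold)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy shape for "human" (illustrative values, not taken from the WES ground truth).
gold = [
    Constraint("wdt:P31", "[wd:Q5]", "{1}"),
    Constraint("wdt:P569", "xsd:dateTime", "?"),
    Constraint("wdt:P106", "IRI", "*"),
]
pred = [
    Constraint("wdt:P31", "[wd:Q5]", "{1}"),      # exact match
    Constraint("wdt:P569", "xsd:dateTime", "*"),  # cardinality differs, so no credit
    Constraint("wdt:P27", "IRI", "*"),            # property not in the ground truth
]
scores = exact_match_scores(pred, gold)  # precision = recall = F1 = 1/3
```

Under a looser cardinality matching level, the P569 constraint might earn credit; with exact matching it counts as both a false positive and a false negative.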

Example 4: Evaluate (Similarity Metrics)

python evaluate.py --dataset wes \
                   --ground_truth_dir dataset/wes \
                   --predictions_dir output/results/local/gpt-4o-mini/wes/entity_id/5 \
                   --similarity

Resources

Few-shot example files and graph information files to pass for each combination of --mode and --dataset:

| mode | dataset | few_shot_example_path | graph_info_path |
| --- | --- | --- | --- |
| local | WES | dataset/wes/Q4220917.shex | resources/wes_predicate_count_instances.json |
| local | YAGOS | dataset/yagos/Scientist.shex | resources/yagos_predicate_count_instances.json |
| global | WES | resources/wes_global_few_shot_examples.toml | resources/wikidata_property_information.json |
| global | YAGOS | resources/yagos_global_few_shot_examples.toml | (none) |
| triples | WES | dataset/wes/Q4220917.shex | resources/wes_predicate_count_instances.json |
| triples | YAGOS | dataset/yagos/Scientist.shex | resources/yagos_predicate_count_instances.json |
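The resource table can also be kept as a lookup in scripts that loop over settings. The sketch below transcribes it into a dictionary and returns the matching CLI flags; it assumes datasets are passed in lowercase (as in --dataset wes) and that --graph_info_path is simply omitted where the table has no graph information file (the global YAGOS row). RESOURCES and resource_flags are hypothetical names.

```python
from typing import Dict, Optional, Tuple

# Few-shot example file and graph information file per (mode, dataset),
# transcribed from the Resources table; None means no graph information file is listed.
RESOURCES: Dict[Tuple[str, str], Tuple[str, Optional[str]]] = {
    ("local", "wes"): ("dataset/wes/Q4220917.shex", "resources/wes_predicate_count_instances.json"),
    ("local", "yagos"): ("dataset/yagos/Scientist.shex", "resources/yagos_predicate_count_instances.json"),
    ("global", "wes"): ("resources/wes_global_few_shot_examples.toml", "resources/wikidata_property_information.json"),
    ("global", "yagos"): ("resources/yagos_global_few_shot_examples.toml", None),
    ("triples", "wes"): ("dataset/wes/Q4220917.shex", "resources/wes_predicate_count_instances.json"),
    ("triples", "yagos"): ("dataset/yagos/Scientist.shex", "resources/yagos_predicate_count_instances.json"),
}


def resource_flags(mode: str, dataset: str) -> list:
    """Return the few-shot / graph-info CLI flags for one mode and dataset."""
    few_shot_path, graph_info_path = RESOURCES[(mode, dataset.lower())]
    flags = ["--few_shot", "--few_shot_example_path", few_shot_path]
    if graph_info_path is not None:  # assumption: omit the flag when no file is listed
        flags += ["--graph_info_path", graph_info_path]
    return flags
```

For example, resource_flags("global", "YAGOS") yields only the --few_shot and --few_shot_example_path flags.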

Dataset

The dataset is available on Zenodo and in the dataset folder of this repository.

Citation

If you find this repository useful, please cite our paper:

@inproceedings{zhang-et-al-2025-schema,
    title = "Schema Generation for Large Knowledge Graphs Using Large Language Models",
    author = "Zhang, Bohui  and
      He, Yuan  and
      Pintscher, Lydia  and
      Mero{\~n}o-Pe{\~n}uela, Albert  and
      Simperl, Elena",
    editor = "Christodoulopoulos, Christos  and
      Chakraborty, Tanmoy  and
      Rose, Carolyn  and
      Peng, Violet",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2025",
    month = nov,
    year = "2025",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.findings-emnlp.671/",
    doi = "10.18653/v1/2025.findings-emnlp.671",
    pages = "12561--12580",
    ISBN = "979-8-89176-335-7",
}

Contact

For questions, reach out to [email protected].
