This repository contains the code implementation for the TaxoRankConstruct methodology, designed to construct, refine, and expand taxonomies using Large Language Models (LLMs). The process is iterative, ensuring a balance between creativity, diversity, and accuracy. The repository utilizes three distinct models for generating, refining, and verifying taxonomical concepts.
- Overview
- Model Initialization
- Taxonomy Generation
- Iterative Concept Expansion
- Exporting and Evaluating the Taxonomy
- Usage
- Customization
- License
TaxoRankConstruct is a flexible methodology that leverages LLMs to build hierarchical taxonomies. By combining the strengths of three distinct models, we achieve robust concept generation and refinement. The process consists of the following key steps:
- Model Initialization
- Taxonomy Generation
- Iterative Concept Expansion
- Export and Evaluation
The methodology begins by initializing three models, each with specific hyperparameters tailored to different stages of the taxonomy construction process:
-
Verification Model (
model_verify
)
Based on thegpt-4o-mini
architecture, this model ensures the accuracy and validity of generated concepts. It is configured with:temperature
: 0.9top_p
: 0.90presence_penalty
: 1.00frequency_penalty
: 0.00
-
Re-Generation Model (
model_re_generate
)
Also based on thegpt-4o-mini
architecture, this model focuses on regenerating and refining concepts. Configuration:temperature
: 1.40top_p
: 0.85presence_penalty
: 0.50frequency_penalty
: 1.00
-
New Concept Generation Model (
model_generate_new
)
Using thegpt-4o
architecture, this model generates creative new concepts for expanding the taxonomy. Configuration:temperature
: 1.40top_p
: 0.98presence_penalty
: 1.30frequency_penalty
: 1.40
These models are initialized with the init_models
function and a session is started using the start_session
function. You are encouraged to experiment with different hyperparameters to optimize performance for specific tasks.
After model initialization, the taxonomy generation begins. The root concept (e.g., "Art") is defined, and the system constructs a taxonomy around it. Key parameters include:
definition_amount
: Number of definitions to generate.definition_max_words
: Maximum word count for each definition.
Taxonomy generation is performed via the construct_taxonomy
function, which uses the three initialized models to iteratively build the hierarchy.
Once the base taxonomy is built, the next step is iterative expansion. This phase generates new subconcepts and integrates them into the taxonomy. Expansion is guided by:
iteration_amm
: Maximum number of attempts to generate valid concepts.max_words_context
: Maximum word count for the contextual description.max_subconcept_length
: Maximum length of generated subconcepts.
The iterate_level
function manages this process, ensuring that new concepts are both relevant and cohesive within the taxonomy structure.
When the taxonomy is sufficiently developed, it can be exported to the OWL (Web Ontology Language) format for use in external tools and applications. This is done using the export_to_owl
function. Additionally, the taxonomy structure and its content can be evaluated using the info
method, which provides detailed insights into the taxonomy.
To use the code for your own taxonomy construction tasks:
- Clone this repository.
git clone https://github.com/your-username/TaxoRankConstruct.git
- Initialize the models and start a session:
from src.core.helper_functions import start_session, init_models log = start_session(api_key = api_key) model_generate_new, model_re_generate, model_verify = init_models(log)
- Generate the taxonomy:
from src.core.taxonomy_construction import construct_taxonomy tax_t = construct_taxonomy(root_concept, model_generate_new, model_re_generate, model_verify, log = log, check_existance = False)
- Expand the taxonomy:
from src.core.taxonomy_construction import iterate_level tax_t = iterate_level(tax_t, 0, model_generate_new, model_re_generate, model_verify, log = log, iteration_amm = iteration_amm, max_words_context = max_words_context, max_subconcept_lenght = max_subconcept_lenght)
- Export the final taxonomy:
tax_t.export_to_owl()
- Load taxonomy:
from src.core.helper_functions import load_taxonomy, Taxonomy loaded_tax = load_taxonomy(path)
- Visualize the taxonomy:
from src.visualisation import visualize_taxonomy_as_graph_spaced generated_graph = visualize_taxonomy_as_graph_spaced(tax_t)
You can modify various hyperparameters for the models and taxonomy generation process to suit your specific needs. Adjusting parameters like temperature, top_p, and penalty settings can lead to more creative or conservative outputs, depending on the task at hand.
This project is licensed under the GPL-3.0 License. See the LICENSE file for details.