LLM Metadata Repository

This repository provides structured, machine-readable metadata for a wide range of large language models (LLMs). It is designed to support tools and applications that need detailed information about model capabilities, configurations, and supported parameters. It is used by BasiliskLLM and the OpenAI NVDA Add-on.

This metadata enables:

Dynamic population of model selection UIs
Feature-aware prompting and parameter tuning
Compatibility and capability checks for downstream tools

The metadata is stored in JSON format and is inspired by the OpenRouter API model listing schema.

Each JSON file in the data/ directory contains a list of model objects with the following structure:

{
  "id": "gpt-5",
  "name": "GPT-5",
  "description": "OpenAI’s most advanced model...",
  "created": 1754587413,
  "context_length": 400000,
  "architecture": {
    "modality": "text+image->text",
    "input_modalities": ["text", "image", "file"],
    "output_modalities": ["text"],
    "tokenizer": "GPT",
    "instruct_type": null
  },
  "top_provider": {
    "context_length": 400000,
    "max_completion_tokens": 128000,
    "is_moderated": true
  },
  "supported_parameters": [
    "max_tokens",
    "temperature",
    "response_format",
    "structured_outputs"
  ]
}
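As a rough illustration of how a downstream tool might consume such a file, the sketch below parses a list of model objects and builds labels for a model selection UI. It assumes the top level of each file is a JSON array, as described above; the function names `parse_models`, `load_models`, and `selection_labels` are hypothetical and not part of this repository.

```python
import json
from pathlib import Path


def parse_models(text):
    """Parse JSON text and return the list of model objects it contains."""
    models = json.loads(text)
    # Assumption: the top level is a JSON array of model objects.
    if not isinstance(models, list):
        raise ValueError("expected a JSON list of model objects")
    return models


def load_models(path):
    """Read one metadata file (for example, a file under data/) from disk."""
    return parse_models(Path(path).read_text(encoding="utf-8"))


def selection_labels(models):
    """Build (id, label) pairs that a model picker could display."""
    return [
        (m["id"], f'{m["name"]} ({m["context_length"]:,} tokens)')
        for m in models
    ]
```

A caller would pass the path of whichever data file it needs to `load_models` and feed the result to `selection_labels`.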

Metadata Fields

Top-Level Fields

id: Unique model identifier (e.g., gpt-4-turbo)
name: Human-readable model name
description: Summary of model capabilities and use cases
created: Unix timestamp of model release
context_length: Maximum input context length (in tokens)
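
Because created is a Unix timestamp, a consumer will usually convert it before display. A minimal sketch, assuming the value is in seconds and interpreted as UTC; `release_date` is a hypothetical helper name:

```python
from datetime import datetime, timezone


def release_date(model):
    """Convert the model's `created` field (Unix seconds) to an aware UTC datetime."""
    return datetime.fromtimestamp(model["created"], tz=timezone.utc)


# The `created` value from the example object above.
example = {"id": "gpt-5", "created": 1754587413}
print(release_date(example).date().isoformat())  # prints 2025-08-07
```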

architecture

modality: Overall input/output modality (e.g., text->text, text+image->text)
input_modalities: Supported input types (e.g., text, image, file)
output_modalities: Supported output types (e.g., text)
tokenizer: Tokenizer type used by the model (e.g., GPT)
instruct_type: Instruction tuning format (e.g., chatml, alpaca), or null if not applicable
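
The modality string and the two modality lists can both drive capability checks. The sketch below splits a modality string into its input and output parts and tests whether a model accepts a given input type. It prefers input_modalities when present, since the example above lists "file" there but not in modality. The helper names are hypothetical.

```python
def parse_modality(modality):
    """Split a string such as 'text+image->text' into (inputs, outputs) lists."""
    inputs, _, outputs = modality.partition("->")
    return inputs.split("+"), outputs.split("+")


def accepts_input(model, kind):
    """Return True if the model declares `kind` as a supported input type."""
    arch = model.get("architecture", {})
    declared = arch.get("input_modalities")
    if declared is None:
        # Fall back to the coarser modality string when the list is absent.
        declared, _ = parse_modality(arch.get("modality", ""))
    return kind in declared
```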

top_provider

context_length: Maximum context length supported by the top provider
max_completion_tokens: Maximum number of tokens in a single response
is_moderated: Indicates whether the model is subject to content moderation
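
One way a client might use max_completion_tokens is to cap a requested output length before sending a request. This is a sketch only: it assumes the field may be missing or null for some models, in which case the request is passed through unchanged, and `clamp_max_tokens` is a hypothetical name.

```python
def clamp_max_tokens(model, requested):
    """Cap `requested` at the model's max_completion_tokens, if one is declared."""
    limit = model.get("top_provider", {}).get("max_completion_tokens")
    if limit is None:
        # Assumption: a missing or null limit means no known cap.
        return requested
    return min(requested, limit)
```

With the example object above, a request for 200000 tokens would be reduced to 128000.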

supported_parameters

A list of tunable parameters supported by the model, such as:

max_tokens
temperature
top_p
frequency_penalty
presence_penalty
tools
seed
response_format
structured_outputs
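
A tool can use this list to drop parameters a model does not accept before building a request. The following sketch shows one possible approach; `filter_parameters` is a hypothetical helper, and whether to drop or reject unsupported parameters is a design choice left to the caller.

```python
def filter_parameters(model, params):
    """Split `params` into those the model supports and the names that were dropped."""
    supported = set(model.get("supported_parameters", []))
    kept = {name: value for name, value in params.items() if name in supported}
    dropped = sorted(set(params) - supported)
    return kept, dropped


model = {"supported_parameters": ["max_tokens", "temperature"]}
kept, dropped = filter_parameters(model, {"temperature": 0.2, "top_p": 0.9})
print(kept)     # prints {'temperature': 0.2}
print(dropped)  # prints ['top_p']
```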

Contributing

Contributions are welcome! To add or update metadata for a model:

Fork the repository
Add or edit the appropriate JSON file in the data/ directory
Submit a pull request

Please ensure your JSON is valid and follows the schema outlined above.
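
Before opening a pull request, a contributor could run a quick local check along these lines. It is a sketch, not the repository's own tooling: the set of required fields and their types is inferred from the example object above, and `validate_models` is a hypothetical name.

```python
import json

# Assumption: these fields and types are inferred from the example object,
# not taken from an official schema.
REQUIRED_FIELDS = {
    "id": str,
    "name": str,
    "description": str,
    "created": int,
    "context_length": int,
    "architecture": dict,
    "top_provider": dict,
    "supported_parameters": list,
}


def validate_models(text):
    """Return a list of problems found in the JSON text; empty means none found."""
    try:
        models = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(models, list):
        return ["top level must be a list of model objects"]
    problems = []
    for index, model in enumerate(models):
        if not isinstance(model, dict):
            problems.append(f"entry {index}: not an object")
            continue
        for field, expected in REQUIRED_FIELDS.items():
            if field not in model:
                problems.append(f"entry {index}: missing '{field}'")
            elif not isinstance(model[field], expected):
                problems.append(f"entry {index}: '{field}' should be {expected.__name__}")
    return problems
```

An empty result means the file parsed and every entry carried the expected fields; it does not check field values beyond their basic types.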

