
spraakbanken/metadata-api


Språkbanken Text Metadata API

The Språkbanken Text Metadata API is a RESTful web service that provides access to metadata for various resources maintained by Språkbanken Text, including corpora, lexicons, models, analyses, and utilities. The metadata is stored in YAML files in a separate metadata repository.

For more technical details, please refer to the developer documentation.

API Usage

Available API calls (note that the URL path includes the API version, e.g. /v3 or /dev):

Endpoint                                                    Description
📁 /                                                         List all resources.
📁 /?resource-type=[resource-type]                           List all resources of a specific type. Available types:
                                                            corpus, lexicon, model, analysis, utility, collection.
📁 /list-ids                                                 List all existing resource IDs.
🔍 /?resource=saldo                                          Retrieve a specific resource (here saldo) and its
                                                            description, if available.
🔍 /bibtex?resource=[resource-id]                            Return a BibTeX citation for the specified resource.
🔍 /check-id-availability?id=[resource-id]                   Check whether a given resource ID is available.
🔧 /renew-cache                                              Update all metadata files from Git, reprocess the JSON,
                                                            and update the cache.
🔧 /renew-cache?resource-paths=[resource-type]/[resource-id] Update the cache for specific resources only, given as a
                                                            comma-separated list, e.g.
                                                            resource-paths=corpus/attasidor,lexicon/saldo
📘 /schema                                                   Return the JSON schema for resources.
📘 /openapi.json                                             Serve the API documentation as JSON.
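The table shows every call as a URL with a query string, so, assuming the endpoints are plain HTTP GET requests, any HTTP client will do. The following minimal sketch builds such URLs with Python's standard library. The base URL is an assumption (the run.py default host and port described below); whether a version prefix such as /v3 belongs in the path depends on the deployment, so include it in base_url if needed. The helper names endpoint_url, renew_cache_url and fetch_text are hypothetical, and fetch_text assumes UTF-8 responses.

```python
from typing import List
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: a local test server started with run.py on its default
# host/port. Append a version prefix such as /v3 if the deployment uses one.
BASE_URL = "http://127.0.0.1:8000"


def endpoint_url(path: str, base_url: str = BASE_URL, **params: str) -> str:
    """Build the URL for an endpoint; keyword args become query parameters.

    Underscores in keyword names are turned into hyphens, so
    resource_type="lexicon" becomes resource-type=lexicon.
    """
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    if params:
        query = {key.replace("_", "-"): value for key, value in params.items()}
        # Keep "/" and "," unescaped so resource-paths stays readable.
        url += "?" + urlencode(query, safe="/,")
    return url


def renew_cache_url(resource_paths: List[str], base_url: str = BASE_URL) -> str:
    """Build a /renew-cache URL restricted to the given type/id paths."""
    return endpoint_url("renew-cache", base_url, resource_paths=",".join(resource_paths))


def fetch_text(url: str, timeout: float = 10.0) -> str:
    """GET a URL and return the body, assuming a UTF-8 response."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Print a few example URLs; no request is sent.
    print(endpoint_url("", resource_type="lexicon"))
    print(endpoint_url("bibtex", resource="saldo"))
    print(endpoint_url("check-id-availability", id="my-new-corpus"))  # hypothetical ID
    print(renew_cache_url(["corpus/attasidor", "lexicon/saldo"]))
```

With a server running, fetch_text(endpoint_url("bibtex", resource="saldo")) would request the BibTeX citation for saldo.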

Requirements

Installation

To install the dependencies, we recommend using uv.

  1. Install uv if you don't have it already.

  2. While in the metadata-api directory, run:

    uv sync --no-install-project

    This will create a virtual environment in the .venv directory and install the dependencies listed in pyproject.toml.

Alternatively, you can set up a virtual environment manually using Python's built-in venv module and install the dependencies using pip:

python3 -m venv .venv
source .venv/bin/activate
pip install -e .

Configuration

The default configuration is specified in metadata_api/settings.py. You can override these settings using environment variables or by creating a local .env file in the project's root directory. Common configuration options include:

  • LOG_LEVEL (default: INFO)
  • LOG_TO_FILE (default: True): Logs always go to stdout; if True, they are also saved to logs/metadata_api_<DATE>.log.
  • ROOT_PATH: The root path for the API, e.g., "/metadata-api" if served from a subpath.
  • METADATA_DIR: Absolute path to the directory containing the metadata YAML files.
  • CELERY_BROKER_URL: URL for the Celery broker used for background tasks.
  • MEMCACHED_SERVER: Host and port of the Memcached server (e.g. localhost:11211), or the path to its socket file. Set to None to disable caching.
  • SLACK_WEBHOOK: URL to a Slack webhook for error notifications (optional).
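MEMCACHED_SERVER accepts two kinds of values. A hypothetical helper that tells them apart might look like the sketch below. The function name is invented, and treating the string "None" (as in the example .env file below) or an empty value as "caching disabled" is an assumption based on that example's comment, not on the project's code.

```python
from typing import Optional, Tuple, Union

# Hypothetical result type: either a (host, port) pair or a Unix socket path.
MemcachedAddress = Union[Tuple[str, int], str]


def parse_memcached_server(value: Optional[str]) -> Optional[MemcachedAddress]:
    """Interpret a MEMCACHED_SERVER value.

    - None, "" or "None"          -> None (caching disabled)
    - "/path/to/memcached.sock"   -> the socket path as a string
    - "host:port"                 -> (host, port)
    """
    if value is None or value.strip() in ("", "None"):
        return None
    value = value.strip()
    if value.startswith("/"):
        # Absolute path: assume it points to a Unix domain socket.
        return value
    host, sep, port = value.rpartition(":")
    if not sep or not host:
        raise ValueError(f"expected host:port or a socket path, got {value!r}")
    return host, int(port)
```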

Example .env file:

LOG_LEVEL=DEBUG
LOG_TO_FILE=False
ROOT_PATH="/metadata-api"
METADATA_DIR="/path-to-metadata-dir"
CELERY_BROKER_URL="redis://localhost:6379/1"
MEMCACHED_SERVER="localhost:11211"  # Set to None to disable caching
SLACK_WEBHOOK="https://hooks.slack.com/services/..."
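Settings can come from environment variables or from a .env file. The sketch below is not the project's loader (metadata_api/settings.py is not shown here); it only illustrates how a file like the example above is read, assuming that environment variables take precedence over .env values, which is how common loaders such as pydantic-settings behave. The function names are hypothetical, and the parser deliberately ignores escapes, multi-line values and variable expansion.

```python
import os
from typing import Dict, Mapping, Optional


def parse_env_file(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines such as the example .env file.

    Handles blank lines, full-line comments, optional surrounding quotes and
    a trailing "# ..." comment after the value.
    """
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        if value[:1] in ('"', "'"):
            # Quoted value: keep everything up to the matching closing quote.
            quote = value[0]
            end = value.find(quote, 1)
            value = value[1:end] if end != -1 else value[1:]
        else:
            # Unquoted value: drop an inline " # comment".
            value = value.split(" #", 1)[0].strip()
        values[key.strip()] = value
    return values


def effective_settings(env_file_text: str,
                       environ: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return the .env settings, with environment variables overriding them."""
    environ = os.environ if environ is None else environ
    merged = parse_env_file(env_file_text)
    for key in merged:
        if key in environ:
            merged[key] = environ[key]
    return merged
```

For example, running with LOG_LEVEL=INFO set in the shell would, under this assumption, override the LOG_LEVEL=DEBUG line from the file.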

Running a test server

For testing, you can start the app with the run.py script, either inside an activated virtual environment or by prefixing the command with uv run. When started via run.py, the defaults are:

  • Host/port: 127.0.0.1:8000
  • ENV=development
  • LOG_LEVEL=DEBUG
  • LOG_TO_FILE=False (logs to console only)
  • reload=True (auto-restart on code changes)

python run.py [--host HOST] [--port PORT] [--log-level LOG_LEVEL]
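The contents of run.py are not reproduced here. As a rough illustration of how a launcher with these options and defaults could be built, the following hypothetical sketch combines argparse with uvicorn.run. The names parse_args and main are invented, the app import string is taken from the uvicorn command below, and the ENV and LOG_TO_FILE settings listed above are not modelled.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Command-line options mirroring the documented run.py usage and defaults."""
    parser = argparse.ArgumentParser(description="Run a development server.")
    parser.add_argument("--host", default="127.0.0.1")
    parser.add_argument("--port", type=int, default=8000)
    parser.add_argument(
        "--log-level",
        default="DEBUG",
        type=str.upper,  # accept "debug" as well as "DEBUG"
        choices=["CRITICAL", "ERROR", "WARNING", "INFO", "DEBUG"],
    )
    return parser.parse_args(argv)


def main(argv: Optional[List[str]] = None) -> None:
    args = parse_args(argv)
    # Imported lazily so that parse_args works without uvicorn installed.
    import uvicorn

    # reload=True requires the app as an import string, not an app object.
    uvicorn.run(
        "metadata_api.main:app",
        host=args.host,
        port=args.port,
        log_level=args.log_level.lower(),  # uvicorn uses lower-case level names
        reload=True,
    )
```

In an actual launcher script, main() would be called under the usual if __name__ == "__main__": guard.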

If you prefer to run the app with uvicorn, you can use the following command:

uvicorn metadata_api.main:app

Background tasks also need a running Celery worker, which you can start with:

celery -A metadata_api.tasks worker --loglevel=INFO

Note that Celery needs a running Redis server to work; in the example configuration, Redis is the broker set in CELERY_BROKER_URL.
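Before starting the worker, it can help to check that the broker is reachable. This stdlib-only sketch parses a redis:// URL such as the CELERY_BROKER_URL example above and sends a Redis PING command. The function names are hypothetical, and password-protected or TLS (rediss://) servers are not handled; an authentication error reply is simply reported as "not up".

```python
import socket
from typing import Tuple
from urllib.parse import urlparse


def parse_redis_url(url: str) -> Tuple[str, int, int]:
    """Split a redis:// URL into host, port and database number."""
    parsed = urlparse(url)
    if parsed.scheme != "redis":
        raise ValueError(f"not a redis:// URL: {url!r}")
    host = parsed.hostname or "localhost"
    port = parsed.port or 6379  # Redis default port
    db = int(parsed.path.lstrip("/") or 0)
    return host, port, db


def redis_is_up(url: str, timeout: float = 2.0) -> bool:
    """Send PING to the server and report whether it answered +PONG."""
    host, port, _ = parse_redis_url(url)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # RESP encoding of the one-element command array ["PING"].
            sock.sendall(b"*1\r\n$4\r\nPING\r\n")
            reply = sock.recv(64)
    except OSError:
        # Connection refused, timeout, unknown host, ...
        return False
    return reply.startswith(b"+PONG")
```

For example, redis_is_up("redis://localhost:6379/1") returns True only if a Redis server is listening locally and accepts unauthenticated commands.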

About

REST API that serves metadata for Språkbanken's corpora and lexicons
