Use Pydantic for validation of env config #21

@tomschr

Situation

Currently, the docbuild environment configuration (env config) is not validated, so errors in it go undetected.
The corresponding issue for the docbuild app config is #88.

Use Case

A mechanism for validating the content of the env config helps to discover configuration problems such as typos or wrong types.

Possible Implementation

Use Pydantic and create a hierarchy of classes. Each class represents a "section" (table) in the TOML file. Its attributes are the respective keys with their values (see the draft below).

  • Prefer Field() for each attribute.
  • Use title, description, and possible examples as arguments in Field. They are used in the documentation.
  • Add an additional docstring after Field() (yes, it's a weird requirement for our documentation build tool)

Draft of the env config model:

"""Pydantic models for application and environment configuration."""

from pathlib import Path
from typing import Annotated

from pydantic import BaseModel, Field, HttpUrl, IPvAnyAddress

from ..models.language import LanguageCode

# A type for domain names, validated with a regex.
# This ensures the string looks like a valid domain, but doesn't do a DNS lookup.
# TODO: Add title, description, examples
DomainName = Annotated[
    str,
    Field(
        pattern=r"^(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,63}$"
    ),
]


class Env_Server(BaseModel):
    """Defines server settings."""

    # TODO: Add Field and title, description, examples
    name: str
    role: str
    host: IPvAnyAddress | DomainName
    port: int | None = None
    enable_mail: bool


class Env_GeneralConfig(BaseModel):
    """Defines general configuration."""

    # TODO: Add Field and title, description, examples
    default_lang: LanguageCode
    languages: list[LanguageCode]
    canonical_url_domain: HttpUrl


class Env_TmpPaths(BaseModel):
    """Defines temporary paths."""

    # TODO: Add Field and title, description, examples
    tmp_base_path: Path
    tmp_path: Path
    tmp_deliverable_path: Path
    tmp_build_dir: Path
    tmp_out_path: Path
    log_path: Path
    tmp_deliverable_name: str


class Env_TargetPaths(BaseModel):
    """Defines target paths."""

    # TODO: Add Field and title, description, examples
    target_path: str
    backup_path: Path


class Env_PathsConfig(BaseModel):
    """Defines various application paths."""

    # TODO: Add Field and title, description, examples
    config_dir: Path
    repo_dir: Path
    temp_repo_dir: Path
    base_cache_dir: Path
    cache_dir: Path
    meta_cache_dir: Path
    tmp: Env_TmpPaths
    target: Env_TargetPaths


class EnvConfig(BaseModel):
    """Root model for the environment configuration (env.toml)."""

    # TODO: Add Field and title, description, examples
    server: Env_Server
    config: Env_GeneralConfig
    paths: Env_PathsConfig
    xslt_params: dict[str, str | int] = Field(alias='xslt-params')

Additionally, we need to rethink how we want to deal with the placeholders. This can be done in two ways:

  1. Load the raw data, replace the placeholders, and pass the structure to EnvConfig.
  2. Let a Pydantic model_validator handle the replacement inside the model.

For the second option, add the following to EnvConfig:

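from copy import deepcopy
from typing import Any

from pydantic import BaseModel, Field, HttpUrl, IPvAnyAddress, model_validator

# replace_placeholders must also be imported from its module in the package.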

class EnvConfig(BaseModel):
    # ...
    @model_validator(mode='before')
    @classmethod
    def _resolve_placeholders(cls, data: Any) -> Any:
        """Resolve placeholders before any other validation."""
        if isinstance(data, dict):
            # Make a deep copy to avoid modifying the original input data,
            # as replace_placeholders works in-place.
            return replace_placeholders(deepcopy(data))
        return data

The execution flow looks like this:

  1. You call EnvConfig.model_validate(raw_config_dict).
  2. Pydantic sees the @model_validator(mode='before') on EnvConfig.
  3. It calls our _resolve_placeholders method, passing it the raw_config_dict.
  4. Our method calls replace_placeholders() on a deep copy of raw_config_dict. The function walks the entire dictionary and resolves all placeholders in one pass, leaving the original input untouched.
  5. Pydantic then takes the processed dictionary (with all placeholders resolved) and begins its normal validation, creating instances of Env_Server etc. from the clean data.
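
The flow above can be tried out with a small self-contained sketch. The replace_placeholders function here is a hypothetical stand-in, not the project's real helper: it assumes a {key} placeholder syntax that refers to a sibling key in the same table, and it works in place like the real function is described to. The models are reduced to two path fields, and the path values are made up.

```python
import re
from copy import deepcopy
from pathlib import Path
from typing import Any

from pydantic import BaseModel, model_validator

# Assumed placeholder syntax for this sketch: {key}
_PLACEHOLDER = re.compile(r"\{(\w+)\}")


def replace_placeholders(data: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical stand-in: expand {key} from sibling keys, modifying data in place."""
    for key, value in data.items():
        if isinstance(value, dict):
            replace_placeholders(value)
        elif isinstance(value, str):
            # Unknown placeholders are left as they are.
            data[key] = _PLACEHOLDER.sub(
                lambda m: str(data.get(m.group(1), m.group(0))), value
            )
    return data


class Env_PathsConfig(BaseModel):
    base_cache_dir: Path
    cache_dir: Path


class EnvConfig(BaseModel):
    paths: Env_PathsConfig

    @model_validator(mode="before")
    @classmethod
    def _resolve_placeholders(cls, data: Any) -> Any:
        """Resolve placeholders before any other validation."""
        if isinstance(data, dict):
            # Deep copy so the caller's dictionary is not modified.
            return replace_placeholders(deepcopy(data))
        return data


raw_config_dict = {
    "paths": {
        "base_cache_dir": "/var/cache/docbuild",
        "cache_dir": "{base_cache_dir}/doc-example-com",
    }
}
config = EnvConfig.model_validate(raw_config_dict)

if __name__ == "__main__":
    print(config.paths.cache_dir)
    print(raw_config_dict["paths"]["cache_dir"])  # still contains the placeholder
```

Because the validator runs in 'before' mode on the root model only, the nested models never see unresolved placeholders, and the caller's raw dictionary stays unchanged.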
