plumberlama

It's lama with one l! Generate documentation for repeated cross-sectional surveys (anonymous participants) created with LamaPoll and process results to simplify self-service data analysis and visualization.

Deployment

Option 1: Install as Package

Install plumberlama as a Python package, for example in a uv project:

# From GitHub
uv pip install "git+https://github.com/correlaid/plumberlama.git"

# Create .env file with configuration (see Configuration section below), then set environment with
set -a && source .env && set +a

# Optionally, start a local database
docker compose -f docker-compose.example.yml up -d postgres

# Run the ETL pipeline (requires a running database)
uv run plumberlama etl

# Generate documentation (requires metadata to be loaded to database)
uv run plumberlama docs

You can then serve the generated site, for example with the following command (requires busybox utilities to be installed on your OS):

busybox httpd -f -vv -p 1102 -h /tmp/site  # Use the SITE_OUTPUT_DIR you configured
# Then open http://localhost:1102 in your browser

Option 2: Use containerized pipeline

See the example docker compose file and Dockerfile for how this could work. The Dockerfile in this repository installs the Python code from the local source; see the comment in it for how to install from the GitHub repository instead.

 docker compose -f docker-compose.example.yml up

This will:

  • Start a PostgreSQL database
  • Run the ETL pipeline to fetch and process survey data
  • Generate documentation as a static MkDocs site
  • Serve the documentation at http://localhost:8080

The pipeline runs the ETL step first, then generates the documentation; once both steps complete, you can browse the site at the address above.

Configuration

Create a .env file with your configuration:

# Survey Configuration
SURVEY_ID=my_survey                    # Stable identifier across poll iterations
LP_POLL_ID=1850964                     # LamaPoll poll ID
LP_API_TOKEN=your_token_here           # LamaPoll API token
LP_API_BASE_URL=https://app.lamapoll.de/api/v2

# LLM Configuration (for variable naming)
LLM_MODEL=openrouter/anthropic/claude-3.5-sonnet
OR_KEY=your_openrouter_key
LLM_BASE_URL=https://openrouter.ai/api/v1

# Documentation Configuration
SITE_OUTPUT_DIR=/tmp/site              # Directory for built HTML files
MKDOCS_SITE_NAME=My Survey Documentation
MKDOCS_SITE_AUTHOR=Survey Team
MKDOCS_REPO_URL=https://github.com/yourorg/yourrepo
MKDOCS_LOGO_URL=https://example.com/logo.svg

# Database Configuration
DB_HOST=postgres
DB_PORT=5432
DB_NAME=survey_data
DB_USER=plumberlama
DB_PASSWORD=plumberlama_dev
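
These variables are read into an immutable configuration object at startup (config.py). A minimal sketch of the pattern, with illustrative field names rather than the project's actual dataclass:

import os
from dataclasses import dataclass

@dataclass(frozen=True)
class PipelineConfig:
    survey_id: str
    lp_poll_id: int
    db_host: str
    db_port: int

def load_config() -> PipelineConfig:
    # Assumes the .env file has been exported into the environment
    # (set -a && source .env && set +a)
    return PipelineConfig(
        survey_id=os.environ["SURVEY_ID"],
        lp_poll_id=int(os.environ["LP_POLL_ID"]),
        db_host=os.environ["DB_HOST"],
        db_port=int(os.environ["DB_PORT"]),
    )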

Development

For contributing or local development:

# Clone the repository
git clone <repo-url>
cd plumberlama

# Install dependencies and set up environment
uv sync

# Set up pre-commit hooks
uv run pre-commit install

# Run tests after making changes, e.g. the unit tests
uv run pytest tests/unit/ -s -vv

Project Structure

plumberlama/
├── src/plumberlama/
│   ├── cli.py                      # Command-line interface
│   ├── config.py                   # Configuration dataclass
│   ├── states.py                   # Immutable state objects
│   ├── transitions.py              # State transition functions
│   ├── validation_schemas.py       # Pandera validation schemas
│   ├── generated_api_models.py     # Pydantic API models (auto-generated)
│   ├── parse_metadata.py           # Question parsing and type inference
│   ├── documentation.py            # MkDocs generation
│   ├── type_mapping.py             # Polars ↔ String type conversion
│   ├── logging_config.py           # Logging configuration
│   ├── extract/
│   │   └── question_type.py        # Question type extraction and inference
│   ├── transform/
│   │   ├── cast_types.py           # Type casting
│   │   ├── decode.py               # Choice decoding
│   │   ├── llm.py                  # LLM integration
│   │   ├── rename_results_columns.py # Column renaming
│   │   └── variable_naming.py      # Semantic variable naming
│   └── io/
│       ├── api.py                  # LamaPoll API client
│       ├── database.py             # Database operations
│       └── database_queries.py     # SQL query templates
├── scripts/
│   ├── generate_api_models.py      # Generate Pydantic models from OpenAPI
│   └── query_db.py                 # Database query utility
├── tests/
│   ├── unit/                       # Unit tests
│   ├── integration/                # Integration tests
│   ├── e2e/                        # End-to-end tests
│   ├── conftest.py                 # Pytest configuration
│   └── docker-compose.test.yml     # Test database setup
├── docker-compose.example.yml      # Example deployment setup
├── Dockerfile                      # Container image definition
└── pyproject.toml                  # Project dependencies and metadata

How It Works

The pipeline is built using explicit state transitions following functional programming principles. Each transition is a pure function that takes the current state and returns a new state.
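
The real state and transition types live in src/plumberlama/states.py and src/plumberlama/transitions.py; the following is a minimal, hypothetical sketch of the pattern (names and fields are illustrative, not the actual API):

from dataclasses import dataclass

import polars as pl

@dataclass(frozen=True)
class MetadataFetchedState:
    # Immutable snapshot of the pipeline after fetching poll metadata
    survey_id: str
    raw_metadata: dict

@dataclass(frozen=True)
class MetadataParsedState:
    # Immutable snapshot after parsing metadata into a DataFrame
    survey_id: str
    metadata: pl.DataFrame

def parse_metadata(state: MetadataFetchedState) -> MetadataParsedState:
    # Pure transition: consumes one state, returns the next, no side effects
    metadata = pl.DataFrame(state.raw_metadata["questions"])  # "questions" key is assumed
    return MetadataParsedState(survey_id=state.survey_id, metadata=metadata)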

Pipeline Architecture

The Mermaid source below summarizes the flow:

flowchart TD
    Config["Config<br/><small>SURVEY_ID + LP_POLL_ID</small>"]

    Config --> FetchMeta["Fetch Metadata<br/><small>from LP_POLL_ID</small>"]

    FetchMeta --> ParseMeta[Parse Metadata<br/>Extract Variables]
    ParseMeta --> ProcessMeta[Process Metadata<br/>Variable Renaming etc.]

    ProcessMeta --> PreloadCheck{"Preload Check<br/><small>Query {SURVEY_ID}_metadata</small>"}

    PreloadCheck -->|"✓ No tables<br/>load_counter=0<br/>CREATE"| FetchResults["Fetch Results<br/><small>from LP_POLL_ID</small>"]
    PreloadCheck -->|"✓ Match<br/>load_counter>0<br/>APPEND"| FetchResults
    PreloadCheck -->|"✗ Mismatch<br/>STOP"| Stop["❌ Aborted<br/>"]

    FetchResults --> ProcessResults[Process Results<br/>Transform Data]

    ProcessResults --> LoadData["Load Data<br/><small>to {SURVEY_ID}_{results&metadata}</small>"]

    LoadData -.->|Optional:<br/>plumberlama docs| Document["Documentation<br/><small>from {SURVEY_ID}_metadata</small>"]

    style Config fill:#e1f5ff,stroke:#333,stroke-width:2px,color:#000
    style FetchMeta fill:#fff4e1,stroke:#333,stroke-width:2px,color:#000
    style FetchResults fill:#fff4e1,stroke:#333,stroke-width:2px,color:#000
    style ParseMeta fill:#f0e1ff,stroke:#333,stroke-width:2px,color:#000
    style ProcessMeta fill:#f0e1ff,stroke:#333,stroke-width:2px,color:#000
    style PreloadCheck fill:#ffeb3b,stroke:#333,stroke-width:3px,color:#000
    style ProcessResults fill:#e1ffe1,stroke:#333,stroke-width:2px,color:#000
    style LoadData fill:#ffe1e1,stroke:#333,stroke-width:2px,color:#000
    style Document fill:#ffe1f5,stroke:#333,stroke-width:2px,color:#000
    style Stop fill:#ff5252,stroke:#333,stroke-width:2px,color:#fff

Survey Identity & Cross-Sectional Data

  • SURVEY_ID: Stable identifier for the cross-sectional survey. Names database tables ({survey_id}_metadata, {survey_id}_results)
  • LP_POLL_ID: LamaPoll poll ID, can change between waves. Data from different polls with identical structure is appended to the same SURVEY_ID tables
  • load_counter: Tracks which waves data came from (0=first load/CREATE, >0=subsequent loads/APPEND)

Example: Three yearly waves with different LP_POLL_IDs but same SURVEY_ID=yearly_feedback → all stored in yearly_feedback_* tables with load_counter 0, 1, 2.
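
The preload check that guards this behaviour could look roughly like the following (a simplified sketch; the real comparison lives in the pipeline's database layer, and the "variable_name" column is assumed):

import polars as pl

def decide_load_action(existing_metadata: pl.DataFrame | None, new_metadata: pl.DataFrame) -> str:
    # existing_metadata is None when {survey_id}_metadata does not exist yet
    if existing_metadata is None:
        return "CREATE"  # first wave: load_counter = 0
    if existing_metadata["variable_name"].to_list() == new_metadata["variable_name"].to_list():
        return "APPEND"  # same structure: appended with the next load_counter
    raise ValueError("Poll structure differs from stored metadata; aborting")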

Question Type Inference

LamaPoll's native question types are refined based on structure:

LamaPoll Type   Groups   Variables                Inferred Type           Schema
INPUT           1        1                        input_single_<type>     String/Int64
INPUT           >1       1 per group (>1 total)   input_multiple_<type>   Multiple String/Int64
CHOICE          1        1                        single_choice           String (Enum)
CHOICE          1        >1                       multiple_choice         Multiple Boolean
CHOICE          2        >1                       multiple_choice_other   Boolean + String
SCALE           1        1                        scale                   Int64 with range
MATRIX          1        >1                       matrix                  Multiple Int64 with range

See src/plumberlama/parse_metadata.py for full inference logic.
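
As a rough illustration of the rules in the table above (simplified: the per-variable <type> suffix for INPUT questions is omitted):

def infer_question_type(lamapoll_type: str, n_groups: int, n_variables: int) -> str:
    # Refine a LamaPoll question type based on its structure, per the table above
    if lamapoll_type == "INPUT":
        return "input_single" if n_groups == 1 else "input_multiple"
    if lamapoll_type == "CHOICE":
        if n_groups == 2:
            return "multiple_choice_other"
        return "single_choice" if n_variables == 1 else "multiple_choice"
    if lamapoll_type == "SCALE":
        return "scale"
    if lamapoll_type == "MATRIX":
        return "matrix"
    raise ValueError(f"Unknown LamaPoll type: {lamapoll_type}")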

Design Principles

Functional Programming:

  • Pure functions with no side effects
  • Immutable state objects (frozen dataclasses)
  • Explicit data flow through state transitions
  • Declarative style

Contract Programming:

  • Preconditions and postconditions enforced by state validation
  • Type annotations guarantee correct data flow
  • Pandera schemas enforce DataFrame structure invariants (see the sketch below)
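
For instance, a Pandera schema might pin down the metadata table roughly like this (illustrative columns only, assuming pandera's Polars integration; the project's actual schemas are in src/plumberlama/validation_schemas.py):

import pandera.polars as pa
import polars as pl

# Hypothetical invariants: unique variable names, known question types
metadata_schema = pa.DataFrameSchema(
    {
        "variable_name": pa.Column(str, unique=True),
        "question_type": pa.Column(
            str,
            pa.Check.isin(["single_choice", "multiple_choice", "scale", "matrix"]),
        ),
    }
)

metadata_df = pl.DataFrame(
    {"variable_name": ["q1", "q2"], "question_type": ["scale", "matrix"]}
)
validated = metadata_schema.validate(metadata_df)  # raises a schema error on violation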

Data-Oriented Programming:

  • Separate data from code
  • Generic data structures (DataFrames) over custom classes
  • Immutable by default
  • Schema separated from representation

Querying the Database

After running the ETL pipeline, you can query the PostgreSQL database using predefined query functions:

# List available query functions
uv run plumberlama query --list

# Use query functions (table_prefix automatically set from SURVEY_ID in .env)
uv run plumberlama query get_question_metadata 27937539
uv run plumberlama query get_frequency_distribution Q5

The command automatically loads database credentials and survey ID from your .env file. See src/plumberlama/io/database_queries.py for all available query functions.
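
For ad-hoc analysis you can also read the tables directly, for example with Polars (a sketch assuming the .env values above and that a connectorx or ADBC driver is installed for pl.read_database_uri):

import os

import polars as pl

uri = (
    f"postgresql://{os.environ['DB_USER']}:{os.environ['DB_PASSWORD']}"
    f"@{os.environ['DB_HOST']}:{os.environ['DB_PORT']}/{os.environ['DB_NAME']}"
)
survey_id = os.environ["SURVEY_ID"]

# Read the full results table for the survey into a DataFrame
results = pl.read_database_uri(f"SELECT * FROM {survey_id}_results", uri)
print(results.head())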
