It's lama with one l! plumberlama generates documentation for repeated cross-sectional surveys (anonymous participants) created with LamaPoll and processes the results to simplify self-service data analysis and visualization.
Install plumberlama as a Python package, for example in a uv project:
# From GitHub
uv pip install "git+https://github.com/correlaid/plumberlama.git"
# Create a .env file with your configuration (see the Configuration section below), then export it into the environment
set -a && source .env && set +a
# Optionally start a local database
docker compose -f docker-compose.example.yml up -d postgres
# Run the ETL pipeline (requires a running database)
uv run plumberlama etl
# Generate documentation (requires metadata to be loaded into the database)
uv run plumberlama docs
You can then serve the generated site, for example with the following command (requires busybox utilities to be installed on your OS):
busybox httpd -f -vv -p 1102 -h /tmp/site # Use the SITE_OUTPUT_DIR you configured
# See http://localhost:1102
See the example docker compose file and Dockerfile for how this could work. The Dockerfile in this repository installs the Python code from the local source; see the comment in it for how to install from the GitHub repository instead.
docker compose -f docker-compose.example.yml up
This will:
- Start a PostgreSQL database
- Run the ETL pipeline to fetch and process survey data
- Generate documentation as a static MkDocs site
- Serve the documentation at http://localhost:8080
The pipeline runs ETL first, then generates the documentation; once both steps complete, the site is available in your browser at http://localhost:8080.
Create a .env file with your configuration:
# Survey Configuration
SURVEY_ID=my_survey # Stable identifier across poll iterations
LP_POLL_ID=1850964 # LamaPoll poll ID
LP_API_TOKEN=your_token_here # LamaPoll API token
LP_API_BASE_URL=https://app.lamapoll.de/api/v2
# LLM Configuration (for variable naming)
LLM_MODEL=openrouter/anthropic/claude-3.5-sonnet
OR_KEY=your_openrouter_key
LLM_BASE_URL=https://openrouter.ai/api/v1
# Documentation Configuration
SITE_OUTPUT_DIR=/tmp/site # Directory for built HTML files
MKDOCS_SITE_NAME=My Survey Documentation
MKDOCS_SITE_AUTHOR=Survey Team
MKDOCS_REPO_URL=https://github.com/yourorg/yourrepo
MKDOCS_LOGO_URL=https://example.com/logo.svg
# Database Configuration
DB_HOST=postgres
DB_PORT=5432
DB_NAME=survey_data
DB_USER=plumberlama
DB_PASSWORD=plumberlama_dev
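As a rough sketch of how these variables might be consumed (field and function names here are illustrative, not the actual config.py API), the configuration can be read from the environment into an immutable dataclass:

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class SurveyConfig:
    """Illustrative subset of the configuration; field names are assumptions."""
    survey_id: str
    lp_poll_id: int
    db_host: str
    db_port: int


def load_config() -> SurveyConfig:
    """Build an immutable config object from environment variables."""
    return SurveyConfig(
        survey_id=os.environ["SURVEY_ID"],
        lp_poll_id=int(os.environ["LP_POLL_ID"]),
        db_host=os.environ.get("DB_HOST", "localhost"),
        db_port=int(os.environ.get("DB_PORT", "5432")),
    )
```

Because the dataclass is frozen, downstream pipeline steps can rely on the configuration never changing mid-run.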
For contributing or local development:
# Clone the repository
git clone <repo-url>
cd plumberlama
# Install dependencies and set up environment
uv sync
# Set up pre-commit hooks
uv run pre-commit install
# Run tests (e.g. the unit tests) after making changes
uv run pytest tests/unit/ -s -vv
plumberlama/
├── src/plumberlama/
│   ├── cli.py                        # Command-line interface
│   ├── config.py                     # Configuration dataclass
│   ├── states.py                     # Immutable state objects
│   ├── transitions.py                # State transition functions
│   ├── validation_schemas.py         # Pandera validation schemas
│   ├── generated_api_models.py       # Pydantic API models (auto-generated)
│   ├── parse_metadata.py             # Question parsing and type inference
│   ├── documentation.py              # MkDocs generation
│   ├── type_mapping.py               # Polars ↔ String type conversion
│   ├── logging_config.py             # Logging configuration
│   ├── extract/
│   │   └── question_type.py          # Question type extraction and inference
│   ├── transform/
│   │   ├── cast_types.py             # Type casting
│   │   ├── decode.py                 # Choice decoding
│   │   ├── llm.py                    # LLM integration
│   │   ├── rename_results_columns.py # Column renaming
│   │   └── variable_naming.py        # Semantic variable naming
│   └── io/
│       ├── api.py                    # LamaPoll API client
│       ├── database.py               # Database operations
│       └── database_queries.py       # SQL query templates
├── scripts/
│   ├── generate_api_models.py        # Generate Pydantic models from OpenAPI
│   └── query_db.py                   # Database query utility
├── tests/
│   ├── unit/                         # Unit tests
│   ├── integration/                  # Integration tests
│   ├── e2e/                          # End-to-end tests
│   ├── conftest.py                   # Pytest configuration
│   └── docker-compose.test.yml       # Test database setup
├── docker-compose.example.yml        # Example deployment setup
├── Dockerfile                        # Container image definition
└── pyproject.toml                    # Project dependencies and metadata
The pipeline is built using explicit state transitions following functional programming principles. Each transition is a pure function that takes the current state and returns a new state.
flowchart TD
Config["Config<br/><small>SURVEY_ID + LP_POLL_ID</small>"]
Config --> FetchMeta["Fetch Metadata<br/><small>from LP_POLL_ID</small>"]
FetchMeta --> ParseMeta[Parse Metadata<br/>Extract Variables]
ParseMeta --> ProcessMeta[Process Metadata<br/>Variable Renaming etc.]
ProcessMeta --> PreloadCheck{"Preload Check<br/><small>Query {SURVEY_ID}_metadata</small>"}
PreloadCheck -->|"✓ No tables<br/>load_counter=0<br/>CREATE"| FetchResults["Fetch Results<br/><small>from LP_POLL_ID</small>"]
PreloadCheck -->|"✓ Match<br/>load_counter>0<br/>APPEND"| FetchResults
PreloadCheck -->|"✗ Mismatch<br/>STOP"| Stop["❌ Aborted<br/>"]
FetchResults --> ProcessResults[Process Results<br/>Transform Data]
ProcessResults --> LoadData["Load Data<br/><small>to {SURVEY_ID}_{results&metadata}</small>"]
LoadData -.->|Optional:<br/>plumberlama docs| Document["Documentation<br/><small>from {SURVEY_ID}_metadata</small>"]
style Config fill:#e1f5ff,stroke:#333,stroke-width:2px,color:#000
style FetchMeta fill:#fff4e1,stroke:#333,stroke-width:2px,color:#000
style FetchResults fill:#fff4e1,stroke:#333,stroke-width:2px,color:#000
style ParseMeta fill:#f0e1ff,stroke:#333,stroke-width:2px,color:#000
style ProcessMeta fill:#f0e1ff,stroke:#333,stroke-width:2px,color:#000
style PreloadCheck fill:#ffeb3b,stroke:#333,stroke-width:3px,color:#000
style ProcessResults fill:#e1ffe1,stroke:#333,stroke-width:2px,color:#000
style LoadData fill:#ffe1e1,stroke:#333,stroke-width:2px,color:#000
style Document fill:#ffe1f5,stroke:#333,stroke-width:2px,color:#000
style Stop fill:#ff5252,stroke:#333,stroke-width:2px,color:#fff
SURVEY_ID
: Stable identifier for the cross-sectional survey. Names the database tables ({survey_id}_metadata, {survey_id}_results).

LP_POLL_ID
: LamaPoll poll ID; can change between waves. Data from different polls with identical structure is appended to the same SURVEY_ID tables.

load_counter
: Tracks which wave data came from (0 = first load/CREATE, >0 = subsequent loads/APPEND).

Example: Three yearly waves with different LP_POLL_IDs but the same SURVEY_ID=yearly_feedback → all stored in yearly_feedback_* tables with load_counter 0, 1, 2.
LamaPoll's native question types are refined based on structure:
| LamaPoll Type | Groups | Variables | Inferred Type | Schema |
|---|---|---|---|---|
| INPUT | 1 | 1 | input_single_&lt;type&gt; | String/Int64 |
| INPUT | >1 | 1 per group (>1 total) | input_multiple_&lt;type&gt; | Multiple String/Int64 |
| CHOICE | 1 | 1 | single_choice | String (Enum) |
| CHOICE | 1 | >1 | multiple_choice | Multiple Boolean |
| CHOICE | 2 | >1 | multiple_choice_other | Boolean + String |
| SCALE | 1 | 1 | scale | Int64 with range |
| MATRIX | 1 | >1 | matrix | Multiple Int64 with range |
See src/plumberlama/parse_metadata.py for the full inference logic.
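As a rough restatement of the decision table above (the real logic in parse_metadata.py handles more cases and edge conditions), the inference boils down to a dispatch on type, group count, and variable count:

```python
def infer_question_type(lamapoll_type: str, n_groups: int, n_vars: int,
                        input_type: str = "string") -> str:
    """Simplified sketch of the type-inference table; illustrative only."""
    if lamapoll_type == "INPUT":
        kind = "single" if n_groups == 1 else "multiple"
        return f"input_{kind}_{input_type}"
    if lamapoll_type == "CHOICE":
        if n_vars == 1:
            return "single_choice"
        # A second group on a multi-variable CHOICE holds the free-text "other" field
        return "multiple_choice_other" if n_groups == 2 else "multiple_choice"
    if lamapoll_type == "SCALE":
        return "scale"
    if lamapoll_type == "MATRIX":
        return "matrix"
    raise ValueError(f"unknown LamaPoll type: {lamapoll_type}")
```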
Functional Programming:
- Pure functions with no side effects
- Immutable state objects (frozen dataclasses)
- Explicit data flow through state transitions
- Declarative style
Contract Programming:
- Preconditions and postconditions enforced by state validation
- Type annotations make the expected data flow explicit
- Pandera schemas enforce DataFrame structure invariants
Data-Oriented Programming:
- Separate data from code
- Generic data structures (DataFrames) over custom classes
- Immutable by default
- Schema separated from representation
After running the ETL pipeline, you can query the PostgreSQL database using predefined query functions:
# List available query functions
uv run plumberlama query --list
# Use query functions (table_prefix automatically set from SURVEY_ID in .env)
uv run plumberlama query get_question_metadata 27937539
uv run plumberlama query get_frequency_distribution Q5
The command automatically loads the database credentials and survey ID from your .env file. See src/plumberlama/io/database_queries.py for all available query functions.
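To illustrate how the SURVEY_ID-derived table prefix flows into such a query, here is a hypothetical sketch of a query builder; the function name and SQL are assumptions, not the actual contents of database_queries.py:

```python
def build_frequency_query(table_prefix: str, variable: str) -> str:
    """Hypothetical sketch of filling in a query template with the
    table prefix derived from SURVEY_ID."""
    table = f"{table_prefix}_results"
    # In a real implementation the variable name would be validated against
    # the metadata table before interpolation, and values would use bind
    # parameters rather than string formatting.
    return (
        f"SELECT {variable}, COUNT(*) AS n "
        f"FROM {table} GROUP BY {variable} ORDER BY n DESC"
    )
```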