Run your data pipelines in Python or the browser.
AnkaFlow is a YAML + SQL-powered data pipeline engine that works in local Python, JupyterLite, or fully in-browser via Pyodide.
- Run pipelines using DuckDB with SQL and optional Python
- Supports Parquet, REST APIs, BigQuery, ClickHouse (server only)
- Browser-compatible: works in JupyterLite, GitHub Pages, VS Code Web, and more (see the in-browser sketch below)
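
Because AnkaFlow targets Pyodide, it can be bootstrapped directly in a browser notebook. Here is a minimal sketch, assuming the published `ankaflow` wheel is installable via Pyodide's `micropip`; the bootstrap itself is an illustration, only the imports come from this README:

```python
# Runs in a Pyodide / JupyterLite cell, where top-level await is allowed.
# Assumption: ankaflow publishes a wheel that micropip can install.
import micropip

await micropip.install("ankaflow")

# These imports mirror the Python usage shown below.
from ankaflow import ConnectionConfiguration, Stages, Flow
```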
Install from PyPI:

```bash
# Server
pip install ankaflow[server]

# Dev
pip install -e .[dev,server]
```

Run a pipeline from the command line:

```bash
ankaflow /path/to/stages.yaml
```

Or drive it from Python:

```python
from ankaflow import (
    ConnectionConfiguration,
    Stages,
    Flow,
)

connections = ConnectionConfiguration()
stages = Stages.load("path/to/stages.yaml")
flow = Flow(stages, connections)
flow.run()
```

`Stages` is the object that holds your pipeline definition parsed from a YAML file.
Each stage is one of: `tap`, `transform`, or `sink`. For example:
```yaml
- name: Extract Data
  kind: tap
  connection:
    kind: Parquet
    locator: input.parquet

- name: Transform Data
  kind: transform
  query: SELECT * FROM "Extract Data" WHERE "amount" > 100

- name: Load Data
  kind: sink
  connection:
    kind: Parquet
    locator: output.parquet
```
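
Note that the transform's query refers to the tap stage by its name, "Extract Data", as a DuckDB relation. To see the whole flow end to end, here is a hedged sketch that generates the input file, runs the pipeline above, and inspects the output; the pandas steps are illustrative test scaffolding, not part of AnkaFlow:

```python
import pandas as pd

from ankaflow import ConnectionConfiguration, Stages, Flow

# Illustrative input for the tap stage (assumes pyarrow or fastparquet
# is installed for pandas' Parquet support).
pd.DataFrame({"amount": [50, 150, 300]}).to_parquet("input.parquet")

# Load and run the three-stage pipeline defined above (saved as stages.yaml).
connections = ConnectionConfiguration()
stages = Stages.load("stages.yaml")
flow = Flow(stages, connections)
flow.run()

# The sink stage should have written the rows with amount > 100.
print(pd.read_parquet("output.parquet"))
```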