A production-grade dbt project that classifies stock market volatility regimes using the CBOE Volatility Index (VIX) and analyzes stock performance across different market conditions.
This project implements a multi-layered data pipeline that:
- Classifies market volatility into four distinct regimes based on VIX levels
- Tracks regime periods over time using temporal joins and window functions
- Enriches stock price data with volatility classifications
- Calculates RSI technical indicators using both SMA and Wilder smoothing methods
- Maintains historical versions of data using SCD-2 snapshots
- Risk Assessment: Identify market stress periods for portfolio adjustments
- Performance Analysis: Understand how stocks behave during different volatility regimes
- Technical Analysis: Generate trading signals using RSI indicators calibrated to market conditions
- Historical Tracking: Maintain complete audit trail of regime classifications and price changes
Pipeline Flow:
- Python (yfinance/FRED): Extract stock and VIX data via APIs
- Data Sources: Yahoo Finance for stocks, FRED for VIX
- dbt + MotherDuck: Transform raw data into analytics-ready models
- GitHub Actions: Automated testing and deployment on every commit
Raw Sources (Yahoo Finance, FRED)
β
Bronze Layer (src_*)
β
Silver Layer (stg_*, int_*)
β
Gold Layer (fct_*, mrt_*)
β
Snapshots (SCD-2)
VIX Regime Classification:
src_vix β stg__vix_with_versions β int__regime_classification β int__regime_changes β mrt__stocks_with_regimes
RSI Calculation:
src_stocks β stg__stocks β int__price_changes β int__rsi_sma β fct__rsi
β int__rsi_wilder β
- Temporal Joins: Version-aware threshold selection using effective dates
- Window Functions: Running sum grouping for regime period detection
- Recursive CTEs: Wilder smoothing implementation for RSI calculation
- Incremental Models: Efficient daily updates with merge strategy
- 111 Automated Tests: Comprehensive validation across all layers
- Contract Enforcement: Type-safe schemas with explicit constraints
- Business Logic Validation: OHLC consistency, RSI bounds, threshold ordering
- Referential Integrity: Cross-model relationship testing
- SCD-2 Snapshots: Historical tracking of regime classifications and stock prices
- Versioned Configuration: Reproducible regime definitions via seed files
- Incremental Processing: 15% performance improvement over full refresh
- Comprehensive Documentation: Model descriptions, column definitions, grain specifications
portfolio_project_2/
βββ assets/
β βββ data_ingestion.ipynb # Python scripts for fetching Yahoo/FRED data
βββ models/
β βββ bronze/ # Source models (raw data views)
β β βββ src_stocks.sql
β β βββ src_vix.sql
β βββ silver/
β β βββ stg/ # Staging models (cleaned, typed)
β β β βββ stg__stocks.sql
β β β βββ stg__vix_with_versions.sql
β β βββ int/ # Intermediate models (business logic)
β β βββ int__regime_classification.sql
β β βββ int__regime_changes.sql
β β βββ int__price_changes.sql
β β βββ int__rsi_sma.sql
β β βββ int__rsi_wilder.sql
β βββ gold/ # Mart models (analytics-ready)
β βββ fct__rsi.sql
β βββ mrt__stocks_with_regimes.sql
βββ seeds/
β βββ regime_thresholds.csv # Versioned VIX thresholds
βββ snapshots/
β βββ snapshots.yml # SCD-2 configurations
βββ tests/
β βββ assert_regime_threshold_ordering.sql
β βββ wilder_seed_equals_sma.sql
β βββ TESTING_RESULTS.md # Detailed test documentation
βββ macros/
β βββ ingest_raw.sql # Data loading utilities
βββ README.md
Assigns volatility regime to each VIX trading day using versioned thresholds.
Grain: One row per trading date
Regime Definitions:
| Regime | VIX Range | Market Condition |
|---|---|---|
| Complacent | < 12 | Very low volatility |
| Normal | 12-20 | Typical conditions |
| Elevated | 20-30 | Increased uncertainty |
| Crisis | 30+ | Extreme volatility |
Tracks consecutive trading day periods where VIX maintained the same regime.
Grain: One row per regime period
Key Columns:
regime_period_id: Sequential period identifierstart_date/end_date: Period boundariesdays_in_regime: Duration of regime period
Fact table enriching daily stock prices with VIX regime classifications.
Grain: One row per ticker per trading date
Materialization: Incremental (merge strategy)
Key Features:
- OHLC price data
- Volume metrics
- VIX regime classification
- Regime period identifier
Relative Strength Index calculations using both SMA and Wilder smoothing methods.
Grain: One row per ticker per trading date
Indicators:
sma_rsi: 14-day RSI using Simple Moving Averagewilder_rsi: 14-day RSI using Wilder's exponential smoothingsma_rs/wilder_rs: Relative strength ratios
| Category | Count | Purpose |
|---|---|---|
| Data Quality | 60 | not_null, unique, relationships |
| Business Logic | 51 | OHLC consistency, RSI validation, threshold ordering |
| Constraints | 9 | Contract enforcement on mart models |
| Snapshots | 2 | SCD-2 temporal tracking |
| Total | 122 | Comprehensive validation across all layers |
- Financial Data Integrity: OHLC consistency, positive prices, volume validation
- RSI Calculation Accuracy: Gain/loss logic, precision checks (1e-9 tolerance), Wilder initialization
- Regime Classification: Threshold ordering, accepted values, temporal versioning
- Referential Integrity: Cross-model relationships, date consistency
- Temporal Accuracy: Date bounds, timestamp validation, period tracking
See Testing Documentation for detailed test scenarios and results.
- dbt Core 1.10.13 or higher
- DuckDB with MotherDuck connection
- Python 3.8+ (for data ingestion scripts)
- Required libraries:
yfinance,pandas(seeassets/data_ingestion.ipynb)
- Required libraries:
# Clone the repository
git clone <repository-url>
cd portfolio_project_2
# Install dbt dependencies
dbt deps
# Configure profiles.yml with your MotherDuck credentials
# See: https://docs.getdbt.com/docs/core/connect-data-platform/motherduck-setupWhile dbt handles the transformation layer, the data extraction is performed using Python. I have included a Jupyter Notebook that demonstrates how raw financial data is fetched, cleaned, and prepared for the bronze layer.
File Location: assets/data_ingestion.ipynb
Workflow:
- Extract: Fetches daily OHLC data using the
yfinancelibrary and VIX data from CSV/FRED. - Format: Stacks and renames columns to match the target schema for DuckDB/MotherDuck.
- Load: Exports standardized CSVs to the raw data directory for
dbt seedor direct ingestion.
# Load seed data
dbt seed
# Run all models
dbt run
# Run tests
dbt test
# Generate documentation
dbt docs generate
dbt docs serve# Daily incremental update
dbt run --select mrt__stocks_with_regimes fct__rsi
# Run snapshots to capture SCD-2 history
dbt snapshotThis project uses GitHub Actions for continuous integration and deployment of dbt models.
Triggered on pushes to dev or any feature/** branch.
Validates code changes by:
- Spinning up a temporary DuckDB/MotherDuck dev environment
- Running
dbt build --defer --state prod_state --select "state:modified+"
(only builds modified models vs production state) - Downloads production
manifest.jsonfrom latest successful CD run for state comparison - Cost Optimization: Utilizing state:modified+ and --defer ensures only changed models and their downstream dependencies are processed, significantly reducing compute costs and build time compared to full refreshes.
- Ensures no regressions or test failures before merge.
Triggered on merge/push to main.
Performs a full production build and artifact upload:
- Installs dependencies and configures MotherDuck prod profile
- Runs
dbt build --target prodto rebuild analytics marts - Uploads
manifest.jsonandrun_results.jsonas build artifacts (prod-state) for CI reuse
Together, these workflows enforce deployment hygiene, schema consistency, and safe promotion from dev β prod.
SELECT
regime,
ticker,
AVG(close) as avg_price,
STDDEV(close) as price_volatility,
COUNT(*) as trading_days
FROM analytics_dev.mrt__stocks_with_regimes
GROUP BY regime, ticker
ORDER BY ticker, regime;SELECT
regime_period_id,
regime,
start_date,
end_date,
days_in_regime
FROM analytics_dev.int__regime_changes
WHERE regime IN ('elevated', 'crisis')
ORDER BY start_date DESC
LIMIT 10;SELECT
s.trade_date,
s.ticker,
s.close,
s.regime,
r.sma_rsi,
r.wilder_rsi,
CASE
WHEN r.sma_rsi > 70 THEN 'Overbought'
WHEN r.sma_rsi < 30 THEN 'Oversold'
ELSE 'Neutral'
END as signal
FROM analytics_dev.mrt__stocks_with_regimes s
JOIN analytics_dev.fct__rsi r
ON s.trade_date = r.trade_date
AND s.ticker = r.ticker
WHERE s.regime IN ('elevated', 'crisis')
ORDER BY s.trade_date DESC;Regime thresholds are managed via seed file for reproducibility:
version_id,version_name,complacent_max,normal_max,elevated_max,effective_date,notes
1,initial,12,20,30,2024-11-01,Initial thresholds based on historical VIX averagesTo update thresholds:
- Add new row to
seeds/regime_thresholds.csv - Run
dbt seed --full-refresh - Run
dbt run --select stg__vix_with_versions+
Historical data will automatically reclassify using appropriate version based on effective_date.
- Stock Prices: Yahoo Finance via yFinance Python package
- VIX Data: Federal Reserve Economic Data (FRED) API
- Time Period: January 2024 - November 2025
- Tickers: AAPL, GOOGL, MSFT, JNJ, NVDA, SPY, TSLA, XLP
This project demonstrates proficiency in:
- dbt Core: Incremental models, snapshots, seeds, tests, contracts
- Advanced SQL: Window functions, CTEs, recursive queries, temporal joins
- Data Modeling: Dimensional modeling, SCD-2, fact/dimension tables
- Data Quality: Comprehensive testing strategy, business logic validation
- Production Patterns: Versioning, incremental processing, performance optimization
- Technical Writing: Documentation, testing methodology, runbooks
David - Analytics Engineer
LinkedIn | GitHub | Email
This project is part of a personal portfolio and is available for review and educational purposes under the MIT License.
Note: The underlying dataset used in this project is derived from publicly available sources including the Yahoo Finance API and FRED CBOE Volatility (VIXCLS) data. This repository is for educational and demonstration purposes only and does not distribute or claim ownership over any Yahoo or FRED data.
- dbt Labs for the incredible dbt framework
- MotherDuck for cloud DuckDB capabilities
- Yahoo Finance and FRED for financial data
Built with β€οΈ using dbt, DuckDB, and MotherDuck
