LaunchDarkly AI Config CI/CD Pipeline

Automated validation and testing for LaunchDarkly AI Configs. Catch broken configs before they reach production.

👉 Get started in 5 minutes - see installation and examples below


What It Does

Prevents broken AI Config deployments with automated checks:

  • Validates AI Configs exist and are properly configured
  • Tests quality with LLM-as-judge evaluation
  • Syncs production defaults for fallback behavior
  • Blocks bad deployments in CI/CD

Installation

# From GitHub (testing branch - use this until merged)
pip install git+https://github.com/launchdarkly-labs/ld-aic-cicd.git@feature/user-friendly-setup

# From PyPI (coming soon)
pip install ld-aic-cicd

Quick Example

# Setup
export LD_SDK_KEY=sdk-xxxxx
export LD_API_KEY=api-xxxxx
export LD_PROJECT_KEY=your-project

# Validate your AI Configs
ld-aic validate

# Run quality tests
ld-aic test --evaluation-dataset test_data.yaml

# Sync production defaults
ld-aic sync --generate-module

Usage

1. Validate AI Configs

Scans code for AI Config references and verifies they exist in LaunchDarkly:

ld-aic validate --fail-on-error
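
For reference, the validator is looking for AI Config keys used in your application code. A minimal sketch of the kind of reference it should pick up (hypothetical module; how the scanner actually detects keys and how they are wired into the LaunchDarkly AI SDK are outside this example):

# support_bot.py (hypothetical application module)
# The string literals are AI Config keys; ld-aic validate checks that every key
# it finds in your code actually exists in your LaunchDarkly project.
AI_CONFIG_KEYS = {
    "support": "support-agent",
    "sales": "sales-agent",
}

def config_key_for(agent: str) -> str:
    # Your app would pass this key to the LaunchDarkly AI SDK when fetching a config
    return AI_CONFIG_KEYS[agent]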

2. Test Quality with LLM Judge

Evaluates AI responses using GPT-4o or Claude as a judge:

ld-aic test \
  --evaluation-dataset test_data.yaml \
  --config-keys "support-agent,sales-agent"

Choosing an Evaluator

The framework provides two evaluators with different testing scopes:

Direct Evaluator (Unit Testing)

Tests individual AI configs in isolation by calling the LaunchDarkly SDK directly.

  • What it tests: Individual AI config variations (model, prompt, tools)
  • What it doesn't test: Your application code, routing logic, API endpoints
  • Best for: Single-config apps, config changes, fast CI checks
  • Advantage: No server needed, faster execution
  • Usage: ld-aic test --evaluator direct

HTTP Evaluator (Integration Testing)

Tests your full AI application by making HTTP requests to your running API server.

  • What it tests: Complete system including routing, multi-agent workflows, API endpoints
  • What it doesn't test: N/A; it exercises the full stack
  • Best for: Multi-agent systems, supervisor routing, production-like validation
  • Requirement: Your API server must be running
  • Usage: ld-aic test --evaluator http --api-url http://localhost:8000

When to use which:

  • Use Direct if you have a simple app with a single AI config
  • Use HTTP if you have:
    • Multi-agent systems with supervisor routing
    • Complex application logic between user request and AI config
    • Custom middleware, authentication, or request processing
    • A need to verify that routing selects the correct AI Config

Example: Multi-Agent System

# Requires HTTP evaluator to test this routing logic:
User Request → API → Supervisor Agent → Routes to:
                                       ├─ Security Agent (config: security-agent)
                                       └─ Support Agent (config: support-agent)

The Direct evaluator would only test the security-agent and support-agent configs individually, missing the critical supervisor routing logic.

Test data format (standardized criteria for all tests):

default_evaluation_criteria:
  - name: Relevance
    description: "Does it address the question?"
    weight: 2.0
  - name: Accuracy
    description: "Is information correct?"
    weight: 2.0

cases:
  - id: test_1
    input: "How do I reset my password?"
    context:
      user_type: "customer"
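
To sanity-check a dataset before handing it to ld-aic test, a quick local check with PyYAML is enough. This sketch only verifies the fields shown in the example above; the tool's full schema may accept more:

import yaml

with open("test_data.yaml") as f:
    dataset = yaml.safe_load(f)

criteria = dataset.get("default_evaluation_criteria", [])
cases = dataset.get("cases", [])

# Each criterion needs a name, description, and weight
for criterion in criteria:
    assert {"name", "description", "weight"} <= criterion.keys()

# Each case needs an id and an input; context is optional
for case in cases:
    assert "id" in case and "input" in case

print(f"{len(cases)} cases, {len(criteria)} default criteria")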

3. Sync Production Defaults

Pulls default config values for runtime fallbacks:

ld-aic sync --generate-module

Creates .ai_config_defaults.json for your app to use when LaunchDarkly is unavailable.
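
How your application reads that file is up to you. One possible fallback pattern, as a sketch (the load_default helper and the per-key lookup are illustrative, not an API provided by this package):

import json
from pathlib import Path

DEFAULTS_PATH = Path(".ai_config_defaults.json")

def load_default(config_key: str):
    """Return the synced default for an AI Config key, or None if unavailable."""
    if not DEFAULTS_PATH.exists():
        return None
    defaults = json.loads(DEFAULTS_PATH.read_text())
    return defaults.get(config_key)

# Use only when the LaunchDarkly evaluation itself fails or times out
fallback = load_default("support-agent")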

GitHub Actions Integration

Minimal workflow for PR validation:

name: AI Config Validation
on: [pull_request]

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Install
        run: pip install git+https://github.com/launchdarkly-labs/ld-aic-cicd.git@feature/user-friendly-setup

      - name: Validate
        env:
          LD_SDK_KEY: ${{ secrets.LD_SDK_KEY }}
          LD_API_KEY: ${{ secrets.LD_API_KEY }}
          LD_PROJECT_KEY: ${{ secrets.LD_PROJECT_KEY }}
        run: ld-aic validate --fail-on-error

Required secrets: LD_SDK_KEY, LD_API_KEY, LD_PROJECT_KEY

See examples/ for complete workflow templates.

Configuration

Environment Variables

# Required
LD_SDK_KEY=sdk-xxxxx              # LaunchDarkly SDK key
LD_API_KEY=api-xxxxx              # LaunchDarkly API token
LD_PROJECT_KEY=your-project       # Your project key

# Optional (for testing)
OPENAI_API_KEY=sk-xxxxx           # For GPT-4o judge
ANTHROPIC_API_KEY=sk-ant-xxxxx    # For Claude judge

Custom Evaluators

For custom AI systems (agents, RAG, etc.), subclass LocalEvaluator:

import time

from ld_aic_cicd.evaluator import LocalEvaluator, EvaluationResult

class MyEvaluator(LocalEvaluator):
    async def evaluate_case(self, config_key, test_input, context_attributes):
        # Call your AI system (my_ai_system is a placeholder for your own client)
        start = time.perf_counter()
        response = await my_ai_system.chat(test_input, context_attributes)
        latency_ms = (time.perf_counter() - start) * 1000

        return EvaluationResult(
            response=response,
            latency_ms=latency_ms,
            variation="my-variation",
            config_key=config_key
        )

Use it:

ld-aic test \
  --evaluator my_evaluator:MyEvaluator \
  --evaluation-dataset test_data.yaml

Architecture

Code Changes → Validate Configs → Test Quality → Sync Defaults → Deploy ✅
                     ↓                 ↓               ↓
                  Pass/Fail        Pass/Fail       Drift Check
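
Outside GitHub Actions, the same pipeline can be driven by a small script that chains the CLI commands shown above; with check=True, any failing stage aborts the run:

import subprocess

# Each stage exits non-zero on failure, so check=True stops the pipeline early
subprocess.run(["ld-aic", "validate", "--fail-on-error"], check=True)
subprocess.run(
    ["ld-aic", "test", "--evaluation-dataset", "test_data.yaml"],
    check=True,
)
subprocess.run(["ld-aic", "sync", "--generate-module"], check=True)

print("All checks passed - safe to deploy")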

Troubleshooting

"No configs found": Use --config-keys to specify explicitly

"Module not found": Ensure ld-aic-cicd is installed: pip list | grep ld-aic-cicd

Judge evaluation fails: Set OPENAI_API_KEY or ANTHROPIC_API_KEY

For more help, see the full troubleshooting guide.

Contributing

Issues and PRs welcome! See full documentation for development setup.
