LaunchDarkly AI Config CI/CD Pipeline

Automated validation and testing for LaunchDarkly AI Configs. Catch broken configs before they reach production.

👉 Get started in 5 minutes - see installation and examples below


What It Does

Prevents broken AI Config deployments with automated checks:

  • Validates AI Configs exist and are properly configured
  • Tests quality with LLM-as-judge evaluation
  • Syncs production defaults for fallback behavior
  • Blocks bad deployments in CI/CD

Installation

# From GitHub (testing branch - use this until merged)
pip install git+https://github.com/launchdarkly-labs/ld-aic-cicd.git@feature/user-friendly-setup

# From PyPI (coming soon)
pip install ld-aic-cicd

Quick Example

# Setup
export LD_SDK_KEY=sdk-xxxxx
export LD_API_KEY=api-xxxxx
export LD_PROJECT_KEY=your-project

# Validate your AI Configs
ld-aic validate

# Run quality tests
ld-aic test --evaluation-dataset test_data.yaml

# Sync production defaults
ld-aic sync --generate-module

Usage

1. Validate AI Configs

Scans code for AI Config references and verifies they exist in LaunchDarkly:

ld-aic validate --fail-on-error
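
For reference, the validator is looking for AI Config keys used in your application code. A minimal sketch of the kind of reference it should pick up (hypothetical module; how the scanner actually detects keys and how they are wired into the LaunchDarkly AI SDK are outside this example):

# support_bot.py (hypothetical application module)
# The string literals are AI Config keys; ld-aic validate checks that every key
# it finds in your code actually exists in your LaunchDarkly project.
AI_CONFIG_KEYS = {
    "support": "support-agent",
    "sales": "sales-agent",
}

def config_key_for(agent: str) -> str:
    # Your app would pass this key to the LaunchDarkly AI SDK when fetching a config
    return AI_CONFIG_KEYS[agent]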

2. Test Quality with LLM Judge

Evaluates AI responses using GPT-4o or Claude as a judge:

ld-aic test \
  --evaluation-dataset test_data.yaml \
  --config-keys "support-agent,sales-agent"

Choosing an Evaluator

The framework provides two evaluators with different testing scopes:

Direct Evaluator (Unit Testing)

Tests individual AI configs in isolation by calling the LaunchDarkly SDK directly.

  • What it tests: Individual AI config variations (model, prompt, tools)
  • What it doesn't test: Your application code, routing logic, API endpoints
  • Best for: Single-config apps, config changes, fast CI checks
  • Advantage: No server needed, faster execution
  • Usage: ld-aic test --evaluator direct

HTTP Evaluator (Integration Testing)

Tests your full AI application by making HTTP requests to your running API server.

  • What it tests: Complete system including routing, multi-agent workflows, API endpoints
  • What it doesn't test: N/A; it exercises the full stack
  • Best for: Multi-agent systems, supervisor routing, production-like validation
  • Requirement: Your API server must be running
  • Usage: ld-aic test --evaluator http --api-url http://localhost:8000

When to use which:

  • Use Direct if you have a simple app with a single AI config
  • Use HTTP if you have:
    • Multi-agent systems with supervisor routing
    • Complex application logic between user request and AI config
    • Custom middleware, authentication, or request processing
    • A need to verify that routing selects the correct AI Config

Example: Multi-Agent System

# Requires HTTP evaluator to test this routing logic:
User Request → API → Supervisor Agent → Routes to:
                                       ├─ Security Agent (config: security-agent)
                                       └─ Support Agent (config: support-agent)

The Direct evaluator would only test the security-agent and support-agent configs individually, missing the critical supervisor routing logic.

Test data format (standardized criteria for all tests):

default_evaluation_criteria:
  - name: Relevance
    description: "Does it address the question?"
    weight: 2.0
  - name: Accuracy
    description: "Is information correct?"
    weight: 2.0

cases:
  - id: test_1
    input: "How do I reset my password?"
    context:
      user_type: "customer"
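
To sanity-check a dataset before handing it to ld-aic test, a quick local check with PyYAML is enough. This sketch only verifies the fields shown in the example above; the tool's full schema may accept more:

import yaml

with open("test_data.yaml") as f:
    dataset = yaml.safe_load(f)

criteria = dataset.get("default_evaluation_criteria", [])
cases = dataset.get("cases", [])

# Each criterion needs a name, description, and weight
for criterion in criteria:
    assert {"name", "description", "weight"} <= criterion.keys()

# Each case needs an id and an input; context is optional
for case in cases:
    assert "id" in case and "input" in case

print(f"{len(cases)} cases, {len(criteria)} default criteria")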

3. Sync Production Defaults

Pulls default config values for runtime fallbacks:

ld-aic sync --generate-module

Creates .ai_config_defaults.json for your app to use when LaunchDarkly is unavailable.
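
How your application reads that file is up to you. One possible fallback pattern, as a sketch (the load_default helper and the per-key lookup are illustrative, not an API provided by this package):

import json
from pathlib import Path

DEFAULTS_PATH = Path(".ai_config_defaults.json")

def load_default(config_key: str):
    """Return the synced default for an AI Config key, or None if unavailable."""
    if not DEFAULTS_PATH.exists():
        return None
    defaults = json.loads(DEFAULTS_PATH.read_text())
    return defaults.get(config_key)

# Use only when the LaunchDarkly evaluation itself fails or times out
fallback = load_default("support-agent")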

GitHub Actions Integration

Minimal workflow for PR validation:

name: AI Config Validation
on: [pull_request]

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Install
        run: pip install git+https://github.com/launchdarkly-labs/ld-aic-cicd.git@feature/user-friendly-setup

      - name: Validate
        env:
          LD_SDK_KEY: ${{ secrets.LD_SDK_KEY }}
          LD_API_KEY: ${{ secrets.LD_API_KEY }}
          LD_PROJECT_KEY: ${{ secrets.LD_PROJECT_KEY }}
        run: ld-aic validate --fail-on-error

Required secrets: LD_SDK_KEY, LD_API_KEY, LD_PROJECT_KEY

See examples/ for complete workflow templates.

Configuration

Environment Variables

# Required
LD_SDK_KEY=sdk-xxxxx              # LaunchDarkly SDK key
LD_API_KEY=api-xxxxx              # LaunchDarkly API token
LD_PROJECT_KEY=your-project       # Your project key

# Optional (for testing)
OPENAI_API_KEY=sk-xxxxx           # For GPT-4o judge
ANTHROPIC_API_KEY=sk-ant-xxxxx    # For Claude judge

Custom Evaluators

For custom AI systems (agents, RAG, etc.), subclass LocalEvaluator:

import time

from ld_aic_cicd.evaluator import LocalEvaluator, EvaluationResult

class MyEvaluator(LocalEvaluator):
    async def evaluate_case(self, config_key, test_input, context_attributes):
        # Call your AI system (my_ai_system is a placeholder for your own client)
        start = time.perf_counter()
        response = await my_ai_system.chat(test_input, context_attributes)
        latency_ms = (time.perf_counter() - start) * 1000

        return EvaluationResult(
            response=response,
            latency_ms=latency_ms,
            variation="my-variation",
            config_key=config_key
        )

Use it:

ld-aic test \
  --evaluator my_evaluator:MyEvaluator \
  --evaluation-dataset test_data.yaml

Architecture

Code Changes → Validate Configs → Test Quality → Sync Defaults → Deploy ✅
                     ↓                 ↓               ↓
                  Pass/Fail        Pass/Fail       Drift Check
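
Outside GitHub Actions, the same pipeline can be driven by a small script that chains the CLI commands shown above; with check=True, any failing stage aborts the run:

import subprocess

# Each stage exits non-zero on failure, so check=True stops the pipeline early
subprocess.run(["ld-aic", "validate", "--fail-on-error"], check=True)
subprocess.run(
    ["ld-aic", "test", "--evaluation-dataset", "test_data.yaml"],
    check=True,
)
subprocess.run(["ld-aic", "sync", "--generate-module"], check=True)

print("All checks passed - safe to deploy")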

Troubleshooting

"No configs found": Use --config-keys to specify explicitly

"Module not found": Ensure ld-aic-cicd is installed: pip list | grep ld-aic-cicd

Judge evaluation fails: Set OPENAI_API_KEY or ANTHROPIC_API_KEY

For more help, see the full troubleshooting guide.

Contributing

Issues and PRs welcome! See full documentation for development setup.
