Automated validation and testing for LaunchDarkly AI Configs. Catch broken configs before they reach production.
👉 Get started in 5 minutes - see installation and examples below
Prevents broken AI Config deployments with automated checks:
- ✅ Validates AI Configs exist and are properly configured
- ✅ Tests quality with LLM-as-judge evaluation
- ✅ Syncs production defaults for fallback behavior
- ✅ Blocks bad deployments in CI/CD
```bash
# From GitHub (testing branch - use this until merged)
pip install git+https://github.com/launchdarkly-labs/ld-aic-cicd.git@feature/user-friendly-setup

# From PyPI (coming soon)
pip install ld-aic-cicd
```
```bash
# Setup
export LD_SDK_KEY=sdk-xxxxx
export LD_API_KEY=api-xxxxx
export LD_PROJECT_KEY=your-project

# Validate your AI Configs
ld-aic validate

# Run quality tests
ld-aic test --evaluation-dataset test_data.yaml

# Sync production defaults
ld-aic sync --generate-module
```

Scans code for AI Config references and verifies they exist in LaunchDarkly:
```bash
ld-aic validate --fail-on-error
```
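If scanning misses references, config keys can be listed explicitly (per the "No configs found" troubleshooting note below; passing `--config-keys` to `validate` the same way as to `test` is an assumption):

```bash
ld-aic validate --fail-on-error --config-keys "support-agent,sales-agent"
```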
Evaluates AI responses using GPT-4o or Claude as a judge:

```bash
ld-aic test \
  --evaluation-dataset test_data.yaml \
  --config-keys "support-agent,sales-agent"
```

The framework provides two evaluators with different testing scopes:
**Direct evaluator**: tests individual AI configs in isolation by calling the LaunchDarkly SDK directly.
- What it tests: Individual AI config variations (model, prompt, tools)
- What it doesn't test: Your application code, routing logic, API endpoints
- Best for: Single-config apps, config changes, fast CI checks
- Advantage: No server needed, faster execution
- Usage:

```bash
ld-aic test --evaluator direct
```
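In CI the direct evaluator is typically combined with a dataset and explicit config keys; a sketch using flags shown elsewhere in this README (combining them in one invocation is an assumption):

```bash
ld-aic test \
  --evaluator direct \
  --evaluation-dataset test_data.yaml \
  --config-keys "support-agent"
```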
**HTTP evaluator**: tests your full AI application by making HTTP requests to your running API server.
- What it tests: Complete system including routing, multi-agent workflows, API endpoints
- What it doesn't test: N/A - tests the full stack
- Best for: Multi-agent systems, supervisor routing, production-like validation
- Requirement: Your API server must be running
- Usage:

```bash
ld-aic test --evaluator http --api-url http://localhost:8000
```
When to use which:
- Use Direct if you have a simple app with a single AI config
- Use HTTP if you have:
  - Multi-agent systems with supervisor routing
  - Complex application logic between user request and AI config
  - Custom middleware, authentication, or request processing
  - A need to verify that routing selects the correct AI config
Example: Multi-Agent System
```
# Requires HTTP evaluator to test this routing logic:
User Request → API → Supervisor Agent → Routes to:
                          ├─ Security Agent (config: security-agent)
                          └─ Support Agent (config: support-agent)
```

The direct evaluator would only test the security-agent and support-agent configs individually, missing the critical supervisor routing logic.
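To exercise the routing above end to end, point the HTTP evaluator at the running API and list both downstream configs; a sketch combining the flags shown in this README (the URL and dataset name are placeholders):

```bash
# Start your API server, then run the full-stack evaluation:
ld-aic test \
  --evaluator http \
  --api-url http://localhost:8000 \
  --evaluation-dataset test_data.yaml \
  --config-keys "security-agent,support-agent"
```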
Test data format (standardized criteria for all tests):

```yaml
default_evaluation_criteria:
  - name: Relevance
    description: "Does it address the question?"
    weight: 2.0
  - name: Accuracy
    description: "Is information correct?"
    weight: 2.0

cases:
  - id: test_1
    input: "How do I reset my password?"
    context:
      user_type: "customer"
```
Pulls default config values for runtime fallbacks:

```bash
ld-aic sync --generate-module
```

Creates `.ai_config_defaults.json` for your app to use when LaunchDarkly is unavailable.
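At runtime, your application can read this file as a local fallback. A minimal sketch, assuming the JSON maps config keys to default values (the exact file structure isn't documented here, and `get_ai_config`/`fetch_live` are hypothetical names):

```python
import json
from pathlib import Path

DEFAULTS_PATH = Path(".ai_config_defaults.json")

def load_local_defaults() -> dict:
    """Read the defaults file generated by `ld-aic sync --generate-module`."""
    if DEFAULTS_PATH.exists():
        return json.loads(DEFAULTS_PATH.read_text())
    return {}

def get_ai_config(config_key: str, fetch_live=None):
    """Prefer the live LaunchDarkly value; fall back to the synced local default.

    `fetch_live` is whatever callable your app normally uses to read the
    AI Config from the LaunchDarkly SDK (hypothetical parameter).
    """
    if fetch_live is not None:
        try:
            return fetch_live(config_key)
        except Exception:
            pass  # LaunchDarkly unavailable; use the synced default below
    return load_local_defaults().get(config_key)
```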
Minimal workflow for PR validation:
```yaml
name: AI Config Validation
on: [pull_request]

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - name: Install
        run: pip install git+https://github.com/launchdarkly-labs/ld-aic-cicd.git@feature/user-friendly-setup
      - name: Validate
        env:
          LD_SDK_KEY: ${{ secrets.LD_SDK_KEY }}
          LD_API_KEY: ${{ secrets.LD_API_KEY }}
          LD_PROJECT_KEY: ${{ secrets.LD_PROJECT_KEY }}
        run: ld-aic validate --fail-on-error
```

Required secrets: LD_SDK_KEY, LD_API_KEY, LD_PROJECT_KEY
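To also gate on quality tests, a second job can be added under the same `jobs:` block; a sketch, assuming an OPENAI_API_KEY secret for the GPT-4o judge and a test_data.yaml committed to the repo:

```yaml
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - name: Install
        run: pip install git+https://github.com/launchdarkly-labs/ld-aic-cicd.git@feature/user-friendly-setup
      - name: Test quality
        env:
          LD_SDK_KEY: ${{ secrets.LD_SDK_KEY }}
          LD_API_KEY: ${{ secrets.LD_API_KEY }}
          LD_PROJECT_KEY: ${{ secrets.LD_PROJECT_KEY }}
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: ld-aic test --evaluation-dataset test_data.yaml
```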
See examples/ for complete workflow templates.
```bash
# Required
LD_SDK_KEY=sdk-xxxxx              # LaunchDarkly SDK key
LD_API_KEY=api-xxxxx              # LaunchDarkly API token
LD_PROJECT_KEY=your-project       # Your project key

# Optional (for testing)
OPENAI_API_KEY=sk-xxxxx           # For GPT-4o judge
ANTHROPIC_API_KEY=sk-ant-xxxxx    # For Claude judge
```

For custom AI systems (agents, RAG, etc.), implement LocalEvaluator:
```python
import time

from ld_aic_cicd.evaluator import LocalEvaluator, EvaluationResult

class MyEvaluator(LocalEvaluator):
    async def evaluate_case(self, config_key, test_input, context_attributes):
        # Call your AI system (`my_ai_system` stands in for your own client)
        # and time the round trip so latency can be reported
        start = time.perf_counter()
        response = await my_ai_system.chat(test_input, context_attributes)
        latency_ms = (time.perf_counter() - start) * 1000

        return EvaluationResult(
            response=response,
            latency_ms=latency_ms,
            variation="my-variation",
            config_key=config_key,
        )
```

Use it:
```bash
ld-aic test \
  --evaluator my_evaluator:MyEvaluator \
  --evaluation-dataset test_data.yaml
```

```
Code Changes → Validate Configs → Test Quality → Sync Defaults → Deploy ✅
                      ↓                ↓              ↓
                  Pass/Fail        Pass/Fail     Drift Check
```
- docs/tutorial.md - Step-by-step guide
- docs/complete-reference.md - Full documentation
- examples/ - Sample code and workflows
"No configs found": Use --config-keys to specify explicitly
"Module not found": Ensure ld-aic-cicd is installed: pip list | grep ld-aic-cicd
Judge evaluation fails: Set OPENAI_API_KEY or ANTHROPIC_API_KEY
For more help, see full troubleshooting guide
Issues and PRs welcome! See full documentation for development setup.