ReplicantX is an end-to-end testing harness for AI agents that communicate via web service APIs. It runs comprehensive test scenarios against live HTTP APIs, with support for multiple authentication methods and detailed reporting.
- Two Test Levels:
  - Level 1 (Basic): Fixed user messages with deterministic assertions
  - Level 2 (Agent): Intelligent Replicant agent with configurable facts and conversation goals
- Pydantic-Based Replicant Agent: Smart conversational agent that acts like a real user
- Configurable Facts & Behavior: Agents can have knowledge (name, preferences) and custom personalities
- Real-time Monitoring: Watch mode (`--watch`) for live conversation monitoring
- Technical Debugging: Debug mode (`--debug`) with detailed HTTP, validation, and AI processing logs
- Multiple Authentication: Supabase email+password, custom JWT, or no-auth
- CLI Interface: Easy-to-use command-line interface with `replicantx run`
- Parallel Execution: Run multiple test scenarios concurrently for faster execution
- Automatic .env Loading: No manual environment variable sourcing required
- GitHub Actions Ready: Built-in workflow for PR testing with Render preview URLs
- Rich Reporting: Markdown and JSON reports with timing and assertion results
- Retry & Backoff: Robust HTTP client with automatic retry logic
pip install replicantx[cli]
- Create a test scenario YAML file:
Basic Scenario (Level 1):
# tests/basic_test.yaml
name: "Test AI Agent Conversation"
base_url: https://your-api.com/api/chat
auth:
provider: noop # or supabase, jwt
level: basic
steps:
- user: "Hello, I need help with booking a flight"
expect_contains: ["flight", "booking"]
- user: "I want to go to Paris"
expect_regex: "(?i)paris.*available"
Agent Scenario (Level 2):
Generic Customer Support Example:
# tests/support_test.yaml
name: "Customer Support - Account Issue"
base_url: https://your-api.com/api/support
auth:
provider: noop
level: agent
replicant:
goal: "Get help with account access issue"
facts:
name: "Alex Chen"
email: "[email protected]"
account_id: "ACC-123456"
issue_type: "login_problem"
last_login: "2 weeks ago"
system_prompt: |
You are a customer seeking help with an account issue. You have the
necessary information but don't provide all details upfront.
Answer questions based on your available facts.
initial_message: "Hi, I'm having trouble accessing my account."
max_turns: 12
completion_keywords: ["resolved", "ticket created", "issue fixed"]
Travel Booking Example:
# tests/travel_test.yaml
name: "Travel Booking - Flight Reservation"
base_url: https://your-api.com/api/chat
auth:
provider: noop
level: agent
replicant:
goal: "Book a business class flight to Paris"
facts:
name: "Sarah Johnson"
email: "[email protected]"
travel_class: "business"
destination: "Paris"
budget: "$2000"
system_prompt: |
You are a customer trying to book a flight. You have the
necessary information but don't provide all details upfront.
Answer questions based on your available facts.
initial_message: "Hi, I'd like to book a flight to Paris."
max_turns: 15
completion_keywords: ["booked", "confirmed", "reservation number"]
- Run the test:
replicantx run tests/my_test.yaml --report report.md
- View the generated report in `report.md`
ReplicantX automatically detects environment variables from your system, .env
files, and CI/CD environments. No special configuration needed when installed as a dependency!
When you install ReplicantX in your project:
# Your project setup
pip install replicantx[cli]
# Your environment variables (any of these methods work)
export OPENAI_API_KEY=sk-your-key # Shell environment
echo "OPENAI_API_KEY=sk-key" > .env # .env file
# OR set in your CI/CD platform
# ReplicantX automatically finds them!
replicantx run tests/*.yaml
Essential variables for different use cases:
# LLM Integration (PydanticAI auto-detects these)
export OPENAI_API_KEY=sk-your-openai-key
export ANTHROPIC_API_KEY=sk-ant-your-anthropic-key
# Supabase Authentication
export SUPABASE_URL=https://your-project.supabase.co
export SUPABASE_ANON_KEY=your-supabase-anon-key
# Target API Configuration
export REPLICANTX_TARGET=your-api-domain.com
# Custom Authentication
export JWT_TOKEN=your-jwt-token
export MY_API_KEY=your-custom-api-key
Local Development:
# Create .env file (ReplicantX automatically loads it!)
cat > .env << 'EOF'
OPENAI_API_KEY=sk-dev-key
REPLICANTX_TARGET=dev-api.example.com
SUPABASE_URL=https://your-project.supabase.co
SUPABASE_ANON_KEY=your-supabase-key
EOF
# Just run tests - no need to source .env!
replicantx run tests/*.yaml
# Or export manually (old way still works)
export OPENAI_API_KEY=sk-dev-key
replicantx run tests/*.yaml
GitHub Actions (No .env files needed!):
# .github/workflows/test-api.yml
jobs:
test:
runs-on: ubuntu-latest
env:
# GitHub Secrets → Environment Variables
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
REPLICANTX_TARGET: ${{ secrets.API_TARGET_URL }}
steps:
- run: pip install replicantx[cli]
- run: replicantx run tests/*.yaml --ci
# ReplicantX automatically finds the variables!
Reference variables with `{{ env.VARIABLE_NAME }}` syntax:
name: "API Test"
base_url: "https://{{ env.REPLICANTX_TARGET }}/api/chat"
auth:
provider: supabase
project_url: "{{ env.SUPABASE_URL }}"
api_key: "{{ env.SUPABASE_ANON_KEY }}"
level: agent
replicant:
facts:
api_key: "{{ env.MY_API_KEY }}"
llm:
model: "openai:gpt-4o" # Uses OPENAI_API_KEY automatically
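Under the hood, this is straightforward string templating. Here's a minimal sketch of how `{{ env.VARIABLE_NAME }}` placeholders can be expanded (illustrative only; the function and pattern names are hypothetical, not ReplicantX internals):

```python
import os
import re

# Matches "{{ env.NAME }}" with optional whitespace inside the braces.
ENV_PATTERN = re.compile(r"\{\{\s*env\.([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")

def expand_env(value: str) -> str:
    """Replace each {{ env.NAME }} placeholder with os.environ['NAME']."""
    return ENV_PATTERN.sub(lambda m: os.environ.get(m.group(1), ""), value)

# With REPLICANTX_TARGET=api.example.com set:
# expand_env("https://{{ env.REPLICANTX_TARGET }}/api/chat")
# -> "https://api.example.com/api/chat"
```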
- Go to Repository Settings → Secrets and Variables → Actions
- Add secrets:
  - `OPENAI_API_KEY=sk-your-openai-key`
  - `SUPABASE_URL=https://your-project.supabase.co`
  - `SUPABASE_ANON_KEY=your-supabase-key`
  - `REPLICANTX_TARGET=api.yourproject.com`
- Use in workflow:
  env:
    OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
    REPLICANTX_TARGET: ${{ secrets.REPLICANTX_TARGET }}
💡 Key Benefits:
- ✅ Automatic .env loading - Just create a .env file and run tests
- ✅ Zero configuration - ReplicantX finds variables automatically
- ✅ Works everywhere - local, CI/CD, Docker, cloud platforms
- ✅ Secure by default - no hardcoded secrets in code
- ✅ Standard patterns - uses industry-standard environment variable detection
Note: Create a `.env.example` file in your project to document which variables are needed. See our comprehensive environment variable guide in the LLM Integration section.
ReplicantX automatically loads environment variables from `.env` files using python-dotenv. No manual sourcing required!
# Create .env file in your project root
cat > .env << 'EOF'
# LLM API Keys
OPENAI_API_KEY=sk-your-openai-key
ANTHROPIC_API_KEY=sk-ant-your-anthropic-key
# Target API
REPLICANTX_TARGET=https://api.yourproject.com
# Supabase Authentication
SUPABASE_URL=https://your-project.supabase.co
SUPABASE_ANON_KEY=your-supabase-anon-key
[email protected]
TEST_USER_PASSWORD=testpassword123
# JWT Authentication
JWT_TOKEN=your-jwt-token
EOF
# Just run - ReplicantX finds .env automatically!
replicantx run tests/*.yaml
# Validate test files
replicantx validate tests/*.yaml
# Generate reports
replicantx run tests/*.yaml --report report.md
- Automatic Discovery: ReplicantX looks for `.env` files in the current directory and parent directories
- Non-intrusive: If no `.env` file exists, it continues normally
- Environment Priority: Existing environment variables take precedence over `.env` values
- Secure: `.env` files should be added to `.gitignore` to avoid committing secrets
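These rules match python-dotenv's defaults, which ReplicantX uses under the hood; a minimal sketch of the equivalent loading call:

```python
from dotenv import find_dotenv, load_dotenv

# find_dotenv() walks up from the current directory looking for a .env file;
# override=False keeps existing environment variables as the higher priority.
# If no .env file is found, load_dotenv() is simply a no-op.
load_dotenv(find_dotenv(), override=False)
```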
# Add .env to .gitignore
echo ".env" >> .gitignore
# Create .env.example for documentation
cat > .env.example << 'EOF'
# Copy this file to .env and fill in your values
OPENAI_API_KEY=sk-your-openai-key-here
REPLICANTX_TARGET=https://your-api-domain.com
SUPABASE_URL=https://your-project.supabase.co
SUPABASE_ANON_KEY=your-supabase-anon-key-here
EOF
✅ No more manual environment variable management!
Basic scenarios use fixed user messages with deterministic assertions:
name: "Basic Test Scenario"
base_url: "https://api.example.com/chat"
auth:
provider: noop
level: basic
steps:
- user: "User message"
expect_contains: ["expected", "text"]
expect_regex: "regex_pattern"
expect_equals: "exact_match"
expect_not_contains: ["forbidden", "text"]
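To make the assertion semantics concrete, here is an illustrative check for the four assertion types (a sketch, not ReplicantX internals; case-insensitive matching for the contains checks is an assumption):

```python
import re

def check_step(response: str, step: dict) -> bool:
    """Return True if the response satisfies every assertion in the step."""
    if "expect_contains" in step:  # every listed string must appear
        if not all(s.lower() in response.lower() for s in step["expect_contains"]):
            return False
    if "expect_not_contains" in step:  # none of the listed strings may appear
        if any(s.lower() in response.lower() for s in step["expect_not_contains"]):
            return False
    if "expect_regex" in step and not re.search(step["expect_regex"], response):
        return False
    if "expect_equals" in step and response != step["expect_equals"]:
        return False
    return True
```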
Agent scenarios use intelligent Replicant agents that converse naturally:
name: "Agent Test Scenario"
base_url: "https://api.example.com/chat"
auth:
provider: supabase
email: [email protected]
password: password123
project_url: "{{ env.SUPABASE_URL }}"
api_key: "{{ env.SUPABASE_ANON_KEY }}"
level: agent
validate_politeness: false # Optional: validate conversational tone (default: false)
replicant:
goal: "Description of what the agent should achieve"
facts:
name: "User Name"
email: "[email protected]"
# Add any facts the agent should know
system_prompt: |
Customize the agent's personality and behavior.
This prompt defines how the agent should act.
initial_message: "Starting message for the conversation"
max_turns: 20
completion_keywords:
- "success"
- "completed"
- "finished"
fullconversation: true # Send full conversation history with each request
By default, ReplicantX focuses on functional API validation. However, you can optionally enable politeness/conversational tone validation:
# Disable politeness validation (default) - focuses on functional responses
validate_politeness: false
# Enable politeness validation - also checks for conversational tone
validate_politeness: true
When to use politeness validation:
- ✅ Customer-facing APIs where tone matters
- ✅ Chatbots and conversational AI services
- ✅ User experience testing scenarios
When to skip politeness validation:
- ❌ Internal APIs focused on functionality
- ❌ Data APIs returning structured responses
- ❌ Technical integrations where tone is irrelevant
Note: Politeness validation is subjective and based on common conversational patterns. It looks for polite phrases like "please", "thank you", "how can I help", question patterns, and helpful language.
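As a rough illustration, such a heuristic might look like the sketch below (the phrase list is an assumption, not the one ReplicantX ships):

```python
import re

# Hypothetical politeness markers: polite phrases, offers of help, questions.
POLITE_PATTERNS = [r"\bplease\b", r"\bthank you\b", r"\bhow can i help\b", r"\?"]

def looks_polite(response: str) -> bool:
    """Heuristic tone check: any polite marker counts as conversational."""
    text = response.lower()
    return any(re.search(pattern, text) for pattern in POLITE_PATTERNS)
```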
# Run all tests in a directory
replicantx run tests/*.yaml --report report.md
# Run with CI mode (exits 1 on failure)
replicantx run tests/*.yaml --report report.md --ci
# Run specific test file
replicantx run tests/specific_test.yaml
# Real-time conversation monitoring
replicantx run tests/*.yaml --watch
# Technical debugging with detailed logs
replicantx run tests/*.yaml --debug
# Combined monitoring and debugging
replicantx run tests/*.yaml --debug --watch
# Run tests in parallel for faster execution
replicantx run tests/*.yaml --parallel
# Run with limited concurrency to prevent API overload
replicantx run tests/*.yaml --parallel --max-concurrent 3
# Validate test files without running
replicantx validate tests/*.yaml --verbose
ReplicantX provides comprehensive monitoring and debugging capabilities to help you understand what's happening during test execution.
Real-time conversation monitoring for observing test execution as it happens:
replicantx run tests/agent_test.yaml --watch
What you see:
- 📥 Live conversation setup with goal and facts
- 🤖 Replicant messages as they're sent (with timestamps)
- ⏱️ Response waiting indicators
- 🤖 Agent responses as received
- ✅/❌ Step results with pass/fail status and timing
- 📊 Final summary with success rate, duration, goal achievement
Perfect for:
- ✅ Live demos - Show clients real AI conversations
- ✅ Test monitoring - Watch long-running tests progress
- ✅ User experience validation - See conversation flow
- ✅ Performance monitoring - Track response times
Example Output:
[22:04:42] 📥 LIVE CONVERSATION - Starting agent scenario
[22:04:42] 🎯 Goal: Book a business class flight to Paris
[22:04:42] 🤖 Replicant: Hi, I'd like to book a flight to Paris.
[22:04:52] ✅ Step 1: PASSED (10.2s)
[22:04:52] 🤖 Agent: What cabin class would you prefer?
[22:04:53] 🤖 Replicant: Business class, please.
[22:05:03] ✅ Step 2: PASSED (9.8s)
Technical deep-dive with detailed system information:
replicantx run tests/agent_test.yaml --debug
What you see:
- 🔍 HTTP client setup (URL, timeout, auth provider, headers)
- 🔍 Replicant agent initialization (goal, facts, AI model settings)
- 🔍 HTTP requests (payload details, conversation history)
- 🔍 API responses (status codes, latency, content preview)
- 🔍 Response validation (assertion counts, individual results)
- 🔍 AI processing (response parsing, message generation)
Perfect for:
- 🔍 Troubleshooting - Diagnose failed assertions
- 🔍 Performance tuning - Analyze HTTP latency and bottlenecks
- 🔍 Integration debugging - Check payload formats and API calls
- 🔍 AI behavior analysis - Understand PydanticAI decision making
Example Output:
🔍 DEBUG HTTP Client initialized
├─ base_url: https://api.example.com/chat
├─ timeout: 120s
├─ auth_provider: supabase
└─ auth_headers: 2 headers
🔍 DEBUG HTTP request payload
├─ message: Hi, I'd like to book a flight to Paris.
├─ conversation_history_length: 1
└─ payload_size: 229 chars
🔍 DEBUG Response validation completed
├─ total_assertions: 2
├─ passed_assertions: 2
└─ overall_passed: True
Get both real-time conversation flow and technical details:
replicantx run tests/agent_test.yaml --debug --watch
Perfect for:
- 🎯 Development - Full visibility during feature building
- 🎯 Complex debugging - When you need everything
- 🎯 Training - Teaching others how the system works
- 🎯 Comprehensive analysis - Complete test execution insight
For Long-running Tests:
# Watch progress while generating a report
replicantx run tests/*.yaml --watch --report detailed.md
For CI/CD Debugging:
# Debug mode with CI exit codes
replicantx run tests/*.yaml --debug --ci
For Performance Analysis:
# Combined with verbose output
replicantx run tests/*.yaml --debug --verbose --report performance.json
ReplicantX supports parallel execution of test scenarios for significantly faster test runs, especially when testing multiple scenarios against the same API.
Run all scenarios in parallel (overrides individual scenario settings):
# Run all tests in parallel
replicantx run tests/*.yaml --parallel
# Run with limited concurrency to prevent API overload
replicantx run tests/*.yaml --parallel --max-concurrent 3
Control parallel execution at the scenario level:
# tests/parallel_scenario.yaml
name: "Parallel Test Scenario"
base_url: "https://api.example.com/chat"
auth:
provider: noop
level: basic
parallel: true # Enable parallel execution for this scenario
steps:
- user: "Hello, test message"
expect_contains: ["response"]
# tests/sequential_scenario.yaml
name: "Sequential Test Scenario"
base_url: "https://api.example.com/chat"
auth:
provider: noop
level: basic
parallel: false # Run sequentially (default)
steps:
- user: "Hello, test message"
expect_contains: ["response"]
Automatic Detection:
- If any scenario has `parallel: true`, all scenarios run in parallel
- If the `--parallel` flag is used, all scenarios run in parallel (overrides individual settings)
- Otherwise, scenarios run sequentially
Mixed Execution:
# Some scenarios parallel, some sequential - all run in parallel
replicantx run tests/parallel_*.yaml tests/sequential_*.yaml
Unlimited Concurrency (Default):
replicantx run tests/*.yaml --parallel
Limited Concurrency:
# Limit to 3 concurrent scenarios
replicantx run tests/*.yaml --parallel --max-concurrent 3
# Limit to 1 (effectively sequential but with parallel infrastructure)
replicantx run tests/*.yaml --parallel --max-concurrent 1
Example: 10 scenarios, each taking 5 seconds
| Mode | Duration | Speed Improvement |
|---|---|---|
| Sequential | ~50 seconds | 1x |
| Parallel (unlimited) | ~5 seconds | 10x |
| Parallel (max 3) | ~17 seconds | 3x |
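Conceptually, `--max-concurrent` behaves like a semaphore around scenario execution. A minimal asyncio sketch (with `run_scenario` stubbed out as a placeholder):

```python
import asyncio

async def run_scenario(scenario: str) -> str:
    """Placeholder for executing one YAML scenario."""
    await asyncio.sleep(0.1)  # stands in for real HTTP traffic
    return f"{scenario}: passed"

async def run_all(scenarios: list[str], max_concurrent: int = 3) -> list[str]:
    sem = asyncio.Semaphore(max_concurrent)

    async def run_one(scenario: str) -> str:
        async with sem:  # at most max_concurrent scenarios in flight
            return await run_scenario(scenario)

    return await asyncio.gather(*(run_one(s) for s in scenarios))

# asyncio.run(run_all([f"tests/test_{i}.yaml" for i in range(10)]))
```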
API Rate Limits:
- Use `--max-concurrent` to avoid overwhelming your API
- Monitor API response times during parallel execution
- Consider your API's rate limiting policies
Resource Usage:
- Parallel execution uses more memory and network connections
- Monitor system resources during large parallel test runs
Test Dependencies:
- Tests that depend on execution order should use `parallel: false`
- Consider using sequential execution for tests that modify shared state
Debugging:
- Parallel execution may make debugging more complex
- Use `--verbose` to see detailed output from all scenarios
- Consider running problematic tests sequentially for debugging
auth:
provider: supabase
email: [email protected]
password: password123
project_url: "{{ env.SUPABASE_URL }}"
api_key: "{{ env.SUPABASE_ANON_KEY }}"
auth:
provider: jwt
token: "{{ env.JWT_TOKEN }}"
auth:
provider: noop
The Replicant agent is a Pydantic-based intelligent conversational agent that:
- Fact-Based Responses: Uses configured facts to answer API questions intelligently
- Natural Conversation: Acts like a real user who doesn't provide all information upfront
- Customizable Behavior: System prompts allow different personalities and response patterns
- Goal-Oriented: Works toward specific objectives with completion detection
- Context Awareness: Maintains conversation history and state
The agent intelligently uses configured facts through LLM integration:
- Context-aware: LLMs understand when facts are relevant to questions
- Natural integration: Facts are woven naturally into conversation responses
- Smart timing: Agent knows when to volunteer information vs. wait to be asked
- Conversation memory: Recent chat history provides context for fact usage
ReplicantX provides flexible conversation state management to handle different API architectures:
Sends the complete conversation history (including all responses) with each request:
replicant:
fullconversation: true # Default behavior
Benefits:
- ✅ Complete Context: API receives full conversation state
- ✅ Stateless APIs: Works with APIs that don't maintain session state
- ✅ Microservices: Suitable for distributed systems
- ✅ Testing Realism: Mimics real-world stateless interactions
Sends only the last 10 messages for performance:
replicant:
fullconversation: false
Use Cases:
- 🔧 Performance Testing: Reduce payload size for high-volume testing
- 🔧 Legacy APIs: Compatible with APIs expecting limited context
- 🔧 Memory Constraints: When the API has payload size limitations
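The windowing itself is just a slice over the message list; a sketch of the behavior described above (the helper name is hypothetical):

```python
WINDOW = 10  # "last 10 messages", per the description above

def recent_history(messages: list[dict]) -> list[dict]:
    """Return only the most recent window of the conversation."""
    return messages[-WINDOW:]
```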
ReplicantX supports multiple API payload formats for maximum compatibility with any conversational API:
Industry-standard OpenAI chat completion format:
replicant:
payload_format: openai # Default behavior
Payload Structure:
{
"messages": [
{"role": "user", "content": "Hello"},
{"role": "assistant", "content": "Hi there!"},
{"role": "user", "content": "How are you?"}
]
}
Best For:
- ✅ OpenAI APIs and compatible services
- ✅ Industry standard - widely supported
- ✅ Full conversation context with message arrays
- ✅ Modern conversational AI platforms
Minimal message-only format for basic APIs:
replicant:
payload_format: simple
Payload Structure:
{
"message": "Hello, how are you?"
}
Best For:
- ✅ Simple APIs that only need the current message
- ✅ Performance-critical scenarios
- ✅ Legacy systems with minimal payload requirements
- ✅ Testing basic functionality without conversation context
Anthropic Claude-compatible format:
replicant:
payload_format: anthropic
Payload Structure:
{
"messages": [
{"role": "user", "content": "Hello"},
{"role": "assistant", "content": "Hi there!"}
]
}
Best For:
- ✅ Anthropic Claude APIs and compatible services
- ✅ Claude-based applications and integrations
- ✅ Conversational AI platforms using Claude models
Original ReplicantX format for backward compatibility:
replicant:
payload_format: legacy
Payload Structure:
{
"message": "Hello, how are you?",
"timestamp": "2025-07-09T10:30:00",
"conversation_history": [
{"role": "user", "content": "Hello"},
{"role": "assistant", "content": "Hi there!"}
]
}
Best For:
- 🔧 Existing ReplicantX integrations (backward compatibility)
- 🔧 Custom APIs expecting the original format
- 🔧 Migration scenarios when transitioning to new formats
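For comparison, here is one illustrative builder that produces the payload shapes shown above (a sketch under the assumption that the current user message is appended to the history; not the actual ReplicantX code):

```python
from datetime import datetime, timezone

def build_payload(fmt: str, message: str, history: list[dict]) -> dict:
    """Map a payload_format value to the request body structures shown above."""
    if fmt in ("openai", "anthropic"):  # both use a messages array
        return {"messages": history + [{"role": "user", "content": message}]}
    if fmt == "simple":  # current message only
        return {"message": message}
    if fmt == "legacy":  # original ReplicantX shape
        return {
            "message": message,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "conversation_history": history,
        }
    raise ValueError(f"unknown payload_format: {fmt}")
```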
ReplicantX supports session management to maintain conversation state across multiple API requests without sending the full conversation history each time. This is more efficient and realistic for production APIs.
replicant:
# Session management configuration
session_mode: auto # disabled, auto, fixed, or env
session_id: null # Required when session_mode is "fixed"
session_timeout: 300 # Session timeout in seconds (default: 5 minutes)
session_format: uuid # replicantx or uuid (default: uuid)
session_placement: body # header, body, or url (default: body)
session_variable_name: session_id # Custom name for session variable
Session Modes:
- `disabled` (default): No session management; uses full conversation history
- `auto`: Auto-generate a unique session ID for each test run
- `fixed`: Use a specific session ID (requires the `session_id` field)
- `env`: Use the session ID from the `REPLICANTX_SESSION_ID` environment variable
Session Formats:
- `uuid` (default): Standard UUID format (e.g., `550e8400-e29b-41d4-a716-446655440000`)
- `replicantx`: ReplicantX format (e.g., `replicantx_c8ff856c`)
Session Placement:
- `body` (default): Session ID in the request body/payload
- `header`: Session ID in HTTP headers
- `url`: Session ID in the URL path (RESTful)
Session Variable Name:
- Custom name for the session variable in headers or body
- Examples: `session_id`, `conversation_id`, `x-conversation-id`, `chat_id`
Session-Aware Payload Formats:
- `openai_session`: OpenAI-compatible format with session support
- `simple_session`: Simple format with session support
- `restful_session`: RESTful format with session support
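To make the placement options concrete, a hedged httpx sketch of where the session ID ends up in each case (the `/chat` and `/conversations/{id}/messages` paths are assumptions based on the examples below):

```python
import httpx

def send(client: httpx.Client, base_url: str, placement: str,
         session_id: str, payload: dict, var_name: str = "session_id") -> httpx.Response:
    """Attach the session ID per session_placement: header, body, or url."""
    if placement == "header":
        return client.post(f"{base_url}/chat", json=payload,
                           headers={var_name: session_id})
    if placement == "body":
        return client.post(f"{base_url}/chat",
                           json={**payload, var_name: session_id})
    if placement == "url":  # RESTful: /conversations/{session_id}/messages
        return client.post(f"{base_url}/conversations/{session_id}/messages",
                           json=payload)
    raise ValueError(f"unknown session_placement: {placement}")
```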
Auto-Generated UUID Session (Recommended):
replicant:
session_mode: auto
session_format: uuid # Standard UUID format
session_placement: body
session_variable_name: session_id
payload_format: openai_session
fullconversation: false
session_timeout: 600 # 10 minutes
ReplicantX Format with Header:
replicant:
session_mode: auto
session_format: replicantx # ReplicantX format
session_placement: header
session_variable_name: x-conversation-id
payload_format: openai_session
fullconversation: false
Fixed Session ID with Custom Variable:
replicant:
session_mode: fixed
session_id: "test_session_12345"
session_format: uuid
session_placement: body
session_variable_name: conversation_id
payload_format: simple_session
fullconversation: false
RESTful Session (Session ID in URL):
replicant:
session_mode: auto
session_format: uuid
session_placement: url
payload_format: restful_session
fullconversation: false
# Results in requests to: /conversations/{session_id}/messages
Environment-Based Session with Header:
# Set environment variable
export REPLICANTX_SESSION_ID="prod_session_abc123"
# Use in YAML
replicant:
session_mode: env
session_format: uuid
session_placement: header
session_variable_name: x-chat-id
payload_format: openai_session
fullconversation: false
✅ Efficiency:
- Reduces payload size significantly
- Faster API requests
- Lower bandwidth usage
✅ Realistic Testing:
- Matches production API patterns
- Tests session handling logic
- Validates conversation continuity
✅ Flexibility:
- Multiple session ID strategies
- Configurable timeouts
- Different payload formats
✅ Debugging:
- Session information in debug logs
- Session lifecycle tracking
- Timeout monitoring
Use session management when:
- ✅ Your API supports conversation IDs
- ✅ You want to test session handling
- ✅ Payload size is a concern
- ✅ Testing production-like scenarios
Use full conversation when:
- ✅ Your API doesn't support sessions
- ✅ Testing legacy endpoints
- ✅ You need complete conversation context
- ✅ Debugging conversation flow
name: "Universal API Test"
base_url: https://api.example.com/chat
auth:
provider: noop
level: agent
replicant:
goal: "Test API with OpenAI-compatible format"
facts:
name: "Test User"
email: "[email protected]"
system_prompt: |
You are a helpful user testing an API.
initial_message: "Hello, I'm testing the API."
max_turns: 10
completion_keywords: ["complete", "finished", "done"]
fullconversation: true # Send full conversation history
payload_format: openai # Use OpenAI-compatible format
llm:
model: "test"
temperature: 0.7
max_tokens: 150
UUID Format with Header (Recommended for APIs):
name: "Flight Booking with UUID Header"
base_url: "https://api.example.com/chat"
auth:
provider: "noop"
level: "agent"
replicant:
goal: "Book a flight from London to Paris"
facts:
name: "Alex Johnson"
email: "[email protected]"
departure: "London"
destination: "Paris"
date: "next Friday"
system_prompt: "You are a customer booking a flight. Be polite and provide details when asked."
initial_message: "Hi, I need to book a flight from London to Paris for next Friday."
max_turns: 15
session_mode: auto
session_format: uuid # Standard UUID format
session_placement: header # In HTTP headers
session_variable_name: x-conversation-id # Custom header name
payload_format: openai_session
fullconversation: false
session_timeout: 600
llm:
model: "test"
ReplicantX Format with Body:
name: "Customer Support with ReplicantX Body"
base_url: "https://api.example.com/support"
auth:
provider: "noop"
level: "agent"
replicant:
goal: "Get help with billing issue"
facts:
name: "Sarah Chen"
account_id: "ACC-789456"
issue: "billing"
system_prompt: "You are a customer with a billing question."
initial_message: "Hello, I have a billing question."
max_turns: 10
session_mode: auto
session_format: replicantx # ReplicantX format
session_placement: body # In request body
session_variable_name: conversation_id # Custom variable name
payload_format: simple_session
fullconversation: false
session_timeout: 300
llm:
model: "test"
UUID Format with URL (RESTful):
name: "Pizza Order with UUID URL"
base_url: "https://api.example.com"
auth:
provider: "noop"
level: "agent"
replicant:
goal: "Order a pizza for delivery"
facts:
name: "Mike Rodriguez"
address: "123 Main St, NYC"
phone: "+1 555 123 4567"
system_prompt: "You are ordering a pizza. Be friendly and provide your details."
initial_message: "Hi, I'd like to order a pizza for delivery."
max_turns: 8
session_mode: auto
session_format: uuid # UUID format
session_placement: url # In URL path
payload_format: restful_session
fullconversation: false
# Results in requests to: /conversations/{uuid}/messages
llm:
model: "test"
From Legacy to OpenAI Format:
# Old configuration (still works)
replicant:
payload_format: legacy # or omit entirely
# New recommended configuration
replicant:
payload_format: openai # More compatible with modern APIs
For Simple APIs:
replicant:
payload_format: simple
fullconversation: false # Not needed for simple format
For Anthropic APIs:
replicant:
payload_format: anthropic
fullconversation: true # Maintain conversation context
Helpful User:
system_prompt: |
You are a polite user trying to achieve your goal. You have the
necessary information but need prompting to remember details.
Forgetful Customer:
system_prompt: |
You are a customer who sometimes forgets details and needs
multiple prompts. You're friendly but can be a bit scattered.
Demanding User:
system_prompt: |
You are an impatient user who wants quick results. You provide
information when asked but expect efficient service.
ReplicantX provides intelligent goal evaluation to accurately determine when conversation objectives have been achieved, solving the common problem of false positives with simple keyword matching.
Traditional keyword-based completion detection can produce false positives:
# Problematic scenario
completion_keywords: ["confirmed", "booked"]
# False positive examples:
# β "I'll let you know when your booking has been confirmed" (contains "confirmed")
# β "Have you booked with us before?" (contains "booked")
# β "Your booking confirmation is pending" (contains "booking")
Simple substring matching - the original behavior:
replicant:
goal: "Book a flight to Paris"
goal_evaluation_mode: "keywords" # Default
completion_keywords: ["confirmed", "booked", "reservation number"]
Use when:
- ✅ Maintaining existing test compatibility
- ✅ Simple scenarios with clear completion signals
- ✅ Performance is critical (no LLM calls)
Reporting: Simplified reporting that focuses on keyword matching:
- Watch mode shows: `🔍 Keyword matched: (matched: confirmed, booked)` or `🔍 No completion keywords found`
- Justification shows: `Goal achieved via keyword matching: Keyword evaluation: Found completion keywords (matched: confirmed, booked)`
- JSON reports include a simple `keyword_result` field instead of detailed reasoning
LLM-powered analysis that understands context and intent:
replicant:
goal: "Book a business class flight to Paris"
goal_evaluation_mode: "intelligent"
goal_evaluation_model: "openai:gpt-4o-mini" # Optional: separate model for evaluation
completion_keywords: ["confirmed", "booked"] # Still required for compatibility
Benefits:
- ✅ Context-aware: Distinguishes promises from accomplishments
- ✅ False positive reduction: "I'll confirm later" ≠ "Your booking is confirmed"
- ✅ Intent understanding: Recognizes goal completion without exact keywords
- ✅ Reasoning provided: Detailed explanation of evaluation decisions
Reporting: Detailed reporting with full analysis:
- Watch mode shows: `🧠 Evaluation method: intelligent`, `📊 Confidence: 0.85`, `📝 Reasoning: Detailed explanation...`
- Justification includes confidence scores and detailed reasoning
- JSON reports include full evaluation metadata (confidence, reasoning, fallback_used)
Attempts LLM evaluation first, falls back to keywords if uncertain:
replicant:
goal: "Get help with billing issue"
goal_evaluation_mode: "hybrid"
goal_evaluation_model: "openai:gpt-4o-mini"
completion_keywords: ["resolved", "ticket created", "issue closed"]
Benefits:
- ✅ Smart evaluation when LLM is confident
- ✅ Reliable fallback when LLM is uncertain
- ✅ Cost-effective for mixed scenarios
- ✅ Production-ready with built-in safety net
For domain-specific scenarios, customize the evaluation logic:
replicant:
goal: "Complete a customer support ticket"
goal_evaluation_mode: "intelligent"
goal_evaluation_prompt: |
Evaluate if the customer support goal is achieved. Look for:
1. Issue resolution confirmation from the agent
2. Ticket number or reference provided
3. Customer satisfaction or acknowledgment
4. Clear closure statements
Goal: {goal}
User Facts: {facts}
Recent Conversation: {conversation}
Respond exactly:
RESULT: [ACHIEVED or NOT_ACHIEVED]
CONFIDENCE: [0.0 to 1.0]
REASONING: [Brief explanation]
completion_keywords: ["resolved", "ticket created"]
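A small sketch of how a reply in that `RESULT`/`CONFIDENCE`/`REASONING` shape could be parsed (illustrative; not necessarily ReplicantX's own parser):

```python
import re

def parse_evaluation(text: str) -> tuple[bool, float, str]:
    """Extract (achieved, confidence, reasoning) from the structured reply."""
    result = re.search(r"RESULT:\s*(ACHIEVED|NOT_ACHIEVED)", text)
    confidence = re.search(r"CONFIDENCE:\s*([01](?:\.\d+)?)", text)
    reasoning = re.search(r"REASONING:\s*(.+)", text)
    return (
        bool(result) and result.group(1) == "ACHIEVED",
        float(confidence.group(1)) if confidence else 0.0,
        reasoning.group(1).strip() if reasoning else "",
    )
```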
name: "Smart Flight Booking Test"
base_url: "https://api.example.com/chat"
auth:
provider: noop
level: agent
replicant:
goal: "Book a round-trip business class flight to Paris"
facts:
name: "Sarah Johnson"
email: "[email protected]"
travel_class: "business"
destination: "Paris"
departure_city: "New York"
travel_date: "next Friday"
return_date: "following Monday"
budget: "$3000"
system_prompt: |
You are a customer booking a flight. Provide information when asked
but don't volunteer everything upfront. Be conversational and natural.
initial_message: "Hi, I'd like to book a flight to Paris."
max_turns: 15
# Intelligent goal evaluation
goal_evaluation_mode: "intelligent"
goal_evaluation_model: "openai:gpt-4o-mini" # Fast, cost-effective model
# Still needed for fallback/compatibility
completion_keywords: ["booked", "confirmed", "reservation number"]
llm:
model: "openai:gpt-4o"
temperature: 0.7
max_tokens: 150
The watch mode now shows detailed evaluation information:
📊 CONVERSATION COMPLETE
📊 Status: ✅ SUCCESS
🎯 Goal achieved: Yes
🧠 Evaluation method: intelligent
📊 Confidence: 0.89
📝 Reasoning: The flight has been successfully booked with confirmation number ABC123 provided
Phase 1: Test Intelligent Mode
# Update specific tests to use intelligent evaluation
goal_evaluation_mode: "intelligent"
Phase 2: Adopt Hybrid Mode
# Use hybrid for safety while gaining intelligence
goal_evaluation_mode: "hybrid"
Phase 3: Gradual Rollout
# Eventually make intelligent/hybrid the default for new tests
goal_evaluation_mode: "intelligent"
| Mode | Use Case | Pros | Cons |
|---|---|---|---|
| keywords | Legacy tests, simple APIs | Fast, deterministic | False positives |
| intelligent | Modern apps, complex goals | Accurate, context-aware | Requires LLM |
| hybrid | Production, mixed scenarios | Smart + safe fallback | Slightly more complex |
Recommendation: Start with `hybrid` mode for new tests to get the benefits of intelligent evaluation with keyword fallback safety.
See a complete example that demonstrates false positive prevention:
# Download the example test
curl -O https://raw.githubusercontent.com/helixtechnologies/replicantx/main/tests/intelligent_evaluation_example.yaml
# Run with intelligent evaluation
replicantx run intelligent_evaluation_example.yaml --watch
# Compare with keyword-only mode by changing goal_evaluation_mode to "keywords"
This example shows how intelligent evaluation distinguishes between:
- ❌ "I'll create a ticket for your issue" (promise)
- ✅ "Your refund has been processed, reference #REF123" (completion)
ReplicantX uses PydanticAI for powerful LLM integration with multiple providers:
- OpenAI: GPT-4, GPT-4o, and other OpenAI models
- Anthropic: Claude Sonnet, Claude Haiku, and other Claude models
- Google: Gemini models via Google AI and VertexAI
- Groq: Fast inference with Llama, Mixtral, and other models
- Ollama: Local LLM deployment
- Test: Built-in test model for development (no API keys needed)
Add LLM configuration to your agent scenarios using PydanticAI model strings:
Technical Support Example:
level: agent
replicant:
goal: "Get technical support for my account"
facts:
name: "Jordan Smith"
# ... other facts
system_prompt: |
You are a customer seeking help with a technical issue.
Use your available facts to answer questions naturally.
# ... other config
llm:
model: "openai:gpt-4.1-mini" # PydanticAI model string
temperature: 0.7 # Response creativity (0.0-1.0)
max_tokens: 150 # Maximum response length
Flight Booking Example:
level: agent
replicant:
goal: "Book a business class flight to Paris"
facts:
name: "Sarah Johnson"
destination: "Paris"
travel_class: "business"
# ... other facts
system_prompt: |
You are a customer trying to book a flight. You have the
necessary information but don't provide all details upfront.
# ... other config
llm:
model: "anthropic:claude-3-5-sonnet-latest" # PydanticAI model string
temperature: 0.8 # Response creativity (0.0-1.0)
max_tokens: 200 # Maximum response length
# OpenAI models
model: "openai:gpt-4o"
model: "openai:gpt-4.1-mini"
model: "openai:gpt-4.1-nano"
# Anthropic models
model: "anthropic:claude-3-5-sonnet-latest"
model: "anthropic:claude-3-haiku-20240307"
# Google models
model: "gemini-1.5-pro"
model: "gemini-1.5-flash"
# Groq models
model: "groq:llama-3.1-8b-instant"
model: "groq:mixtral-8x7b-32768"
# Test model (no API key needed)
model: "test"
PydanticAI automatically detects API keys from environment variables:
# OpenAI
export OPENAI_API_KEY=sk-your-api-key
# Anthropic
export ANTHROPIC_API_KEY=sk-ant-your-api-key
# Google AI
export GOOGLE_API_KEY=your-google-api-key
# Groq
export GROQ_API_KEY=your-groq-api-key
# Install with all LLM providers
pip install replicantx[all]
# Install with specific providers
pip install replicantx[openai]
pip install replicantx[anthropic]
# Core installation (includes PydanticAI with test model)
pip install replicantx
- Smart Prompting: System prompts are enhanced with available facts and conversation context
- Natural Responses: LLMs generate contextually appropriate responses based on user personas
- Fact Integration: Available facts are automatically included in prompts for relevant responses
- Graceful Fallback: If LLM calls fail, the system falls back to rule-based responses
- Conversation Memory: Recent conversation history is maintained for context
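A minimal PydanticAI sketch of this flow, using the built-in test model so no API key is needed (fact handling is simplified here, and the rule-based fallback path is omitted):

```python
from pydantic_ai import Agent

# Facts are folded into the system prompt, mirroring the fact-integration
# behaviour described above.
facts = {"name": "Alex Chen", "account_number": "ACC-12345"}
agent = Agent(
    "test",  # swap for e.g. "openai:gpt-4o" with OPENAI_API_KEY set
    system_prompt=f"You are a customer with a billing question. Known facts: {facts}",
)

result = agent.run_sync("Hi, I have a question about my recent bill.")
print(result.output)  # .data on older pydantic-ai releases
```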
Customer Support Example:
name: "Customer Support - Billing Issue"
base_url: https://api.example.com/support
auth:
provider: noop
level: agent
replicant:
goal: "Get customer support for billing issue"
facts:
name: "Alex Chen"
account_number: "ACC-12345"
issue_type: "billing"
last_payment: "$99.99 on Jan 15th"
system_prompt: |
You are a customer who is polite but slightly frustrated about
a billing issue. You have the necessary account information but
may need prompting to remember specific details.
initial_message: "Hi, I have a question about my recent bill."
max_turns: 12
completion_keywords: ["resolved", "ticket created", "issue closed"]
fullconversation: true # Send full conversation history with each request
payload_format: openai # Use OpenAI-compatible format
llm:
model: "openai:gpt-4o" # PydanticAI model string
temperature: 0.8
max_tokens: 120
Flight Booking Example:
name: "Travel Booking - Flight to Paris"
base_url: https://api.example.com/chat
auth:
provider: supabase
project_url: "{{ env.SUPABASE_URL }}"
api_key: "{{ env.SUPABASE_ANON_KEY }}"
email: "{{ env.TEST_USER_EMAIL }}"
password: "{{ env.TEST_USER_PASSWORD }}"
level: agent
replicant:
goal: "Book a business class flight to Paris for next weekend"
facts:
name: "Sarah Johnson"
email: "[email protected]"
travel_class: "business"
destination: "Paris"
departure_city: "New York"
travel_date: "next Friday"
passengers: 1
budget: "$3000"
preferences: "aisle seat, vegetarian meal"
system_prompt: |
You are a helpful user trying to book a flight. You have all the
necessary information but you're a typical user who doesn't
provide all details upfront.
initial_message: "Hi, I'd like to book a flight to Paris for next weekend."
max_turns: 15
completion_keywords: ["booked", "confirmed", "reservation number", "booking complete"]
fullconversation: true # Send full conversation history with each request
payload_format: openai # Use OpenAI-compatible format
llm:
model: "openai:gpt-4o"
temperature: 0.7
max_tokens: 150
These examples enable much more natural and contextually aware conversations compared to rule-based responses.
Add this workflow to `.github/workflows/replicantx.yml`:
name: ReplicantX E2E Tests
on:
pull_request: { types: [opened, synchronize, reopened] }
jobs:
replicantx:
runs-on: ubuntu-latest
env:
SUPABASE_URL: ${{ secrets.SUPABASE_URL }}
SUPABASE_ANON_KEY: ${{ secrets.SUPABASE_ANON_KEY }}
REPLICANTX_TARGET: pr-${{ github.event.pull_request.number }}-helix-api.onrender.com
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with: { python-version: "3.11" }
- run: pip install "replicantx[cli]"
- run: |
until curl -sf "https://$REPLICANTX_TARGET/health"; do
echo "Waiting for previewβ¦"; sleep 5; done
- run: replicantx run tests/*.yaml --report report.md --ci
- uses: marocchino/sticky-pull-request-comment@v2
if: always()
with: { path: report.md }