Voice Hackathon - Customer Support Agent

What is this?

A voice AI customer support agent built with Pipecat, Gemini, and Cartesia, aimed specifically at LTL (less-than-truckload) freight carriers. LTL carriers handle support for thousands of customers, with backlogs that can take weeks to resolve. Automating simple tasks lets support teams focus on complex issues while giving customers instant answers.

Demo video

https://screen.studio/share/Doeue1WL

How did we use Gemini and Pipecat?

We built a voice AI customer support agent with Gemini's powerful function calling at the core:

Architecture: Three-Model Cascade 🎼

Deepgram Flux STT → Gemini LLM (with function calling) → Cartesia TTS

Gemini for LLM + Function Calling 🧠

Gemini 2.5 Flash powers the intelligence, providing:

  • Natural conversation understanding - Interprets customer requests with context awareness
  • Dynamic function calling - Automatically invokes the right tools (shipment tracking, appointment scheduling, FAQ search, ticket creation, escalation) based on conversation flow
  • Custom function handlers - 5 handlers registered with Gemini for LTL freight operations
  • Real-time decision making - Decides when to call functions, when to ask clarifying questions, and when to escalate to humans
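
As an illustration, here is a minimal sketch of wiring one function handler into the Gemini LLM service via Pipecat. Import paths and handler signatures vary slightly across pipecat-ai versions, and the SHIPMENTS lookup and argument names are illustrative placeholders, not the project's real data (the real handlers live in support_tools.py):

import os

from pipecat.services.google.llm import GoogleLLMService     # import path may differ by pipecat-ai version
from pipecat.services.llm_service import FunctionCallParams  # ditto

llm = GoogleLLMService(api_key=os.getenv("GOOGLE_API_KEY"), model="gemini-2.5-flash")

# Illustrative in-memory data for the sketch only.
SHIPMENTS = {"1324": {"status": "Out for delivery", "eta": "Today, 2-4 PM"}}

async def track_shipment(params: FunctionCallParams):
    """Invoked by Gemini when the caller asks about a shipment."""
    pro_number = str(params.arguments.get("pro_number", ""))
    result = SHIPMENTS.get(pro_number, {"status": "No shipment found for that PRO number"})
    await params.result_callback(result)

# The tool's name/description/parameter schema is declared to Gemini separately
# (via the tools passed with the LLM context); this call binds the Python handler to that tool name.
llm.register_function("track_shipment", track_shipment)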

Pipecat Orchestration 🎵

Pipecat provides the real-time voice pipeline framework:

  • Pipeline architecture - Chains STT → LLM → TTS processors seamlessly
  • Frame-based processing - Efficient streaming of audio and LLM responses
  • Transport abstraction - Supports SmallWebRTC (browser), Twilio (phone), and Daily.co
  • Context aggregation - Manages conversation history for Gemini
  • Transcript logging - Captures full conversation flow
  • RTVI integration - Real-time voice interface for browser testing
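
To make those bullets concrete, here is a rough sketch of how the cascade is assembled with Pipecat. This is a simplified excerpt rather than the exact code in bot.py; import paths differ slightly across pipecat-ai versions, and transport construction is left to the caller:

import os

from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.google.llm import GoogleLLMService

def build_task(transport):
    """Assemble the cascade; `transport` is a SmallWebRTC, Twilio, or Daily transport instance."""
    stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))  # Flux configuration omitted here
    llm = GoogleLLMService(api_key=os.getenv("GOOGLE_API_KEY"), model="gemini-2.5-flash")
    tts = CartesiaTTSService(api_key=os.getenv("CARTESIA_API_KEY"),
                             voice_id=os.getenv("CARTESIA_VOICE_ID"))

    context = OpenAILLMContext(
        [{"role": "system", "content": "You are an LTL freight customer support agent."}]
    )
    context_aggregator = llm.create_context_aggregator(context)

    pipeline = Pipeline([
        transport.input(),               # audio frames in
        stt,                             # Deepgram speech-to-text
        context_aggregator.user(),       # append user turns to the Gemini context
        llm,                             # Gemini 2.5 Flash with function calling
        tts,                             # Cartesia text-to-speech
        transport.output(),              # audio frames out
        context_aggregator.assistant(),  # append assistant turns to the context
    ])
    return PipelineTask(pipeline, params=PipelineParams(allow_interruptions=True))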

Other tools we used

  • Deepgram Flux - Ultra-low latency STT optimized for conversational AI
  • Cartesia - High-quality natural voice synthesis
  • Twilio - Phone integration for production voice calls
  • Redis - Real-time tool configuration and feature flags
  • Next.js - Admin dashboard for dynamic tool management

New things we tried

This entire project was built today. It was our first time creating a voice AI agent and our first time using Pipecat.

  • Deepgram Flux - Ultra-low latency STT optimized for conversational AI
  • Redis Tool Configuration - Real-time admin dashboard to enable/disable tools without a restart (see the sketch after this list)
  • Pipecat Cloud - Production deployment
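
Here is a small sketch of how a Redis-backed feature-flag check for tools might look. The key layout and the REDIS_URL variable are assumptions for illustration; the real wiring lives in the admin dashboard and bot code:

import os
import redis

r = redis.Redis.from_url(os.getenv("REDIS_URL", "redis://localhost:6379"))

def tool_enabled(tool_name: str) -> bool:
    """Return True unless the admin dashboard has explicitly disabled the tool."""
    value = r.get(f"tool:{tool_name}:enabled")  # hypothetical key naming
    return value is None or value == b"1"

# Example: only register a handler when its flag allows it
# if tool_enabled("track_shipment"):
#     llm.register_function("track_shipment", track_shipment)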

Feedback for tools we used

Gemini Live - We initially tried using Gemini's native audio model (Gemini Live) to simplify the architecture into a single model. While the native audio processing was impressive, we found that function calling wasn't reliable enough with Gemini Live. The model would sometimes fail to invoke tools when needed or call them at inappropriate times. We ended up going back to the three-model cascade approach (Deepgram Flux STT → Gemini 2.5 Flash LLM → Cartesia TTS) which gave us much more reliable and consistent function calling behavior.

We tried using Gemini TTS, but the voice didn't sound as natural as we wanted.

Pipecat was great to use. It was our first time using it and getting it up and running was a breeze.

🚀 Try It Live!

🚀 Call the Agent

844-996-0993

Pick up your phone and call our live customer support agent! Try asking about shipment status or scheduling delivery appointments.

🎛️ Admin Dashboard

Dashboard: https://voice-agent-green.vercel.app/

Configure which tools are enabled/disabled in real-time using our Redis-powered admin panel.

💡 Quick Test Guide

Available PRO numbers: 1324, 1589, 2401, 2750, 4012, 4678

Try these prompts:

  • "What's the status of shipment 1324?"
  • "I need to schedule a delivery appointment for PRO number 2401"
  • "Can you help me change my delivery time?"
  • "What are your business hours?"

Additional Documentation

Two Implementations Available

This project includes two versions of the customer support bot:

  1. Cascade Model (bot.py) - Uses Deepgram Flux STT, Gemini LLM, and Cartesia TTS
  2. Gemini Live (bot_gemini_live.py) - Uses Gemini's native audio model for end-to-end processing

Which One Should I Use?

Use Gemini Live (bot_gemini_live.py) if you want:

  • ✅ Simpler architecture (fewer components)
  • ✅ More natural conversation flow
  • ✅ Lower latency
  • ✅ Native audio understanding (better at detecting emotion, tone)

Use Cascade Model (bot.py) if you want:

  • ✅ More control over individual components
  • ✅ Ability to swap STT/TTS providers
  • ✅ Custom voice configurations
  • ✅ Separate tuning of speech recognition vs synthesis

Features

  • Voice Interaction: Real-time voice conversations using Google AI
  • Phone Support: Optional Twilio integration for telephone-based support
  • Intelligent Support: Powered by Gemini 2.5 Flash for natural conversations
  • LTL Logistics Skills: Built-in demo data for PRO tracking and delivery appointments
  • FAQ Search: Automated answers to common questions
  • Ticket Creation: Automatic support ticket generation
  • Human Escalation: Seamless handoff to human agents when needed
  • Transcript Logging: Full conversation logging for quality assurance

Prerequisites

  • Python 3.12+
  • uv package manager
  • For both bots: Google API key for Gemini LLM
  • For cascade model (bot.py) only: Deepgram + Cartesia API keys
  • For phone support (optional): Twilio account and phone number

Setup

1. Install Dependencies

# Install dependencies with uv
uv sync

2. Configure Environment Variables

# Copy the example environment file
cp .env.example .env

# Edit .env and add your API keys

3. Get Your API Keys

Google API Key (required for both bots):

  1. Visit Google AI Studio
  2. Sign in with your Google account
  3. Click "Create API Key"
  4. Copy the key and add it to your .env file

Deepgram API Key (required for bot.py only):

  1. Visit Deepgram Console
  2. Sign up for a free account
  3. Create a new API key
  4. Copy the key and add it to your .env file

Cartesia API Key (required for bot.py only):

  1. Visit Cartesia
  2. Sign up for an account
  3. Generate an API key
  4. Copy the key and add it to your .env file
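
Once the keys are in place, your .env should contain entries along these lines (values are placeholders; see .env.example for the authoritative variable list):

# Required for both bots
GOOGLE_API_KEY=your_google_api_key_here

# Required for the cascade model (bot.py) only
DEEPGRAM_API_KEY=your_deepgram_api_key_here
CARTESIA_API_KEY=your_cartesia_api_key_here

# Optional: phone support via Twilio
TWILIO_ACCOUNT_SID=your_twilio_account_sid_here
TWILIO_AUTH_TOKEN=your_twilio_auth_token_here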

Running the Bot

Local Development with SmallWebRTC

Option 1: Run the Gemini Live version (recommended)

uv run bot_gemini_live.py

Option 2: Run the Cascade Model version

uv run bot.py

The bot will start and display a URL (typically http://localhost:7860). Open this URL in your browser to interact with the voice agent using SmallWebRTC.

Running with Twilio (Phone Support)

To enable phone-based interactions through Twilio:

1. Get Twilio Credentials

  1. Sign up at Twilio
  2. Get your TWILIO_ACCOUNT_SID and TWILIO_AUTH_TOKEN from the console
  3. Add them to your .env file:
    TWILIO_ACCOUNT_SID=your_twilio_account_sid_here
    TWILIO_AUTH_TOKEN=your_twilio_auth_token_here

2. Purchase a Phone Number

  1. In the Twilio console, navigate to Phone Numbers → Manage → Buy a number
  2. Purchase a phone number with Voice capabilities
  3. Note the phone number for testing

3. Set Up Local Testing with ngrok

For local development, you'll need to expose your bot to the internet:

# Install ngrok (if you haven't already)
brew install ngrok  # macOS
# or download from ngrok.com

# Start your bot with Twilio transport
uv run bot.py

# In another terminal, expose it with ngrok
ngrok http 7860

Copy the ngrok HTTPS URL (e.g., https://abc123.ngrok.io)

4. Configure Your Phone Number

  1. Go to Phone Numbers → Manage → Active Numbers in Twilio console
  2. Click your phone number
  3. Under "Voice Configuration" → "A call comes in":
    • Select Webhook
    • Enter your ngrok URL: https://your-ngrok-url.ngrok.io
    • Set HTTP method to POST
  4. Click Save

5. Test Your Bot

Call your Twilio phone number to speak with your customer support agent!

Note: For production deployments, configure a TwiML Bin instead of a webhook. See the Twilio WebSocket documentation for more details.
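
For reference, a TwiML Bin for a streaming voice agent typically looks like the snippet below. The WebSocket path (/ws here) is an assumption and must match whatever endpoint your deployment actually exposes:

<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Connect>
    <Stream url="wss://your-production-host/ws" />
  </Connect>
</Response>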

Project Structure

voice-hackathon/
├── bot.py                     # Cascade model implementation (STT → LLM → TTS)
├── bot_gemini_live.py         # Gemini Live implementation (native audio)
├── support_tools.py           # Customer support functions (FAQ, tickets, escalation)
├── pyproject.toml             # Project dependencies
├── .env.example               # Environment variables template
└── README.md                  # This file

How It Works

Cascade Model Architecture (bot.py)

  1. Speech-to-Text (STT): Deepgram Flux converts customer speech to text with ultra-low latency
  2. Language Model (LLM): Gemini 2.5 Flash processes requests and generates responses
  3. Text-to-Speech (TTS): Cartesia converts responses back to speech
  4. Transport: SmallWebRTC handles real-time audio streaming

Why Flux? Deepgram Flux is optimized specifically for real-time conversational AI, providing faster recognition and better handling of natural speech patterns.

Gemini Live Architecture (bot_gemini_live.py)

  1. Native Audio Processing: Gemini Live handles speech input and output natively
  2. Function Calling: Same customer support tools (FAQ, tickets, escalation)
  3. LTL Tools: Includes authentication, shipment lookup, and delivery appointment management
  4. Transport: SmallWebRTC handles real-time audio streaming

Key Difference: Gemini Live uses a single model for everything, eliminating the need for separate STT/TTS services.
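
For comparison with the cascade setup, a minimal sketch of the Gemini Live service configuration might look like this. The class name and parameters follow recent pipecat-ai releases and may differ in yours; check bot_gemini_live.py for the real values:

import os

from pipecat.services.gemini_multimodal_live.gemini import GeminiMultimodalLiveLLMService

llm = GeminiMultimodalLiveLLMService(
    api_key=os.getenv("GOOGLE_API_KEY"),
    model=os.getenv("GEMINI_LIVE_MODEL", "gemini-2.5-flash-native-audio-preview-09-2025"),
    voice_id=os.getenv("GEMINI_VOICE_ID", "Charon"),
    system_instruction="You are a friendly LTL freight customer support agent.",
)
# Function handlers are registered the same way as in the cascade bot,
# e.g. llm.register_function("track_shipment", track_shipment)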

Customer Support Features

1. FAQ Search

The bot can search a knowledge base for common questions:

  • Business hours
  • Shipping information
  • Return policy
  • Payment methods
  • Account management

2. Ticket Creation

For issues requiring follow-up, the bot creates support tickets with:

  • Ticket ID
  • Priority level
  • Issue description
  • Estimated response time
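
A minimal mock of what such a handler might return (field names are illustrative; the real implementation is in support_tools.py):

import uuid
from datetime import datetime, timezone

def create_ticket(description: str, priority: str = "normal") -> dict:
    """Mock ticket creation; replace with a real ticketing API in production."""
    return {
        "ticket_id": f"TKT-{uuid.uuid4().hex[:8].upper()}",
        "priority": priority,
        "description": description,
        "created_at": datetime.now(timezone.utc).isoformat(),
        "estimated_response_time": "4 hours" if priority == "high" else "24 hours",
    }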

3. Human Escalation

The bot escalates to human agents when:

  • Customer explicitly requests human assistance
  • Issue is complex or beyond bot capabilities
  • Customer appears frustrated
  • Multiple resolution attempts fail

LTL Demo Dataset

  • Sample customers and shipments live in data/ltl_dataset.json
  • Agents can search shipments by customer ID, company name, or directly by PRO number (sample IDs: CUST-1001, CUST-2002, CUST-3003)
  • Each shipment tracks status, ETA, and delivery appointment date/time windows
  • Modify this JSON file to tailor the demo experience or plug in live data sources
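
The dataset is roughly shaped like the following trimmed sketch. Field names and values here are illustrative placeholders; see data/ltl_dataset.json for the actual structure:

{
  "customers": [
    { "customer_id": "CUST-1001", "company_name": "Example Freight Co." }
  ],
  "shipments": [
    {
      "pro_number": "1324",
      "customer_id": "CUST-1001",
      "status": "In transit",
      "eta": "2025-01-15",
      "delivery_appointment": { "date": "2025-01-16", "window": "09:00-12:00" }
    }
  ]
}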

Customization

Adding FAQ Entries

Edit support_tools.py and add entries to the FAQ_DATABASE dictionary:

FAQ_DATABASE = {
    "your_key": {
        "question": "Your question here?",
        "answer": "Your answer here.",
    },
    # ... more entries
}

Modifying the System Prompt

For Cascade Model (bot.py): Edit the system prompt around line 197:

messages = [
    {
        "role": "system",
        "content": """Your custom instructions here...""",
    },
]

For Gemini Live (bot_gemini_live.py): Edit the instructions variable around line 172:

instructions = """Your custom instructions here..."""

Changing Voice & Model Settings

All voice and model settings are now configurable via environment variables in your .env file!

For Cascade Model (bot.py):

Set in .env:

# Change the Gemini model
GEMINI_MODEL=gemini-2.5-flash

# Change the Cartesia voice
CARTESIA_VOICE_ID=f786b574-daa5-4673-aa0c-cbe3e8534c02

Available Cartesia voices:

  • f786b574-daa5-4673-aa0c-cbe3e8534c02 - Friendly Australian Man (default)
  • 71a7ad14-091c-4e8e-a314-022ece01c121 - British Reading Lady
  • a0e99841-438c-4a64-b679-ae501e7d6091 - Barbershop Man
  • 694f9389-aac1-45b6-b726-9d9369183238 - Classy British Man
  • Find more at: Cartesia Voice Library

For Gemini Live (bot_gemini_live.py):

Set in .env:

# Change the Gemini Live model
GEMINI_LIVE_MODEL=gemini-2.5-flash-native-audio-preview-09-2025

# Change the voice
GEMINI_VOICE_ID=Charon

Available Gemini Live voices:

  • Puck - Energetic and youthful
  • Charon - Deep and authoritative (default)
  • Kore - Warm and friendly
  • Fenrir - Powerful and commanding
  • Aoede - Melodic and expressive
  • Leda - Calm and professional
  • Orus - Wise and mature
  • Zephyr - Light and breezy

Troubleshooting

API Key Issues

"Missing GOOGLE_API_KEY"

  • Make sure you've created a .env file (copy from .env.example)
  • Verify your API key is correctly set in the .env file

"Missing DEEPGRAM_API_KEY" (bot.py only)

"Missing CARTESIA_API_KEY" (bot.py only)

Connection Issues

  • Check that port 7860 is not being used by another application
  • Try accessing http://localhost:7860 in a different browser
  • Ensure your microphone permissions are enabled in the browser

Audio Issues

  • Verify your microphone is working and selected in browser settings
  • Check browser console for WebRTC errors
  • Try using Chrome or Edge (best WebRTC support)

Model-Specific Issues

Gemini Live (bot_gemini_live.py):

  • Ensure you're using a recent version of pipecat-ai (>= 0.0.90)
  • If you get model errors, verify the model name is correct
  • Check that your Google API key has access to Gemini Live models

Cascade Model (bot.py):

  • Deepgram errors usually indicate API key issues or rate limits
  • Cartesia errors may indicate voice ID problems - check the voice library
  • If STT isn't working, verify your Deepgram account has credits

Development

Running with Custom Port

# Set custom port
uv run bot.py --port 8080
# or
uv run bot_gemini_live.py --port 8080

Viewing Logs

The bot uses loguru for logging. All logs are printed to the console, including:

  • Client connections/disconnections
  • Transcripts of conversations
  • Function calls (FAQ searches, ticket creation, etc.)
  • Errors and warnings
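
If you add your own tools, logging follows the same pattern. A quick sketch with loguru (the message format is just an example):

from loguru import logger

def log_function_call(name: str, args: dict) -> None:
    """Emit the same kind of per-call log line the built-in tools produce."""
    logger.info("Function call: {} args={}", name, args)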

Next Steps

  • Integrate Real Backend: Replace mock implementations in support_tools.py with real APIs
  • Add Authentication: Verify customer identity before accessing account information
  • Expand FAQ: Add more entries to the knowledge base
  • Analytics: Track common issues and bot performance
  • Multi-language: Add support for multiple languages
  • Sentiment Analysis: Detect customer emotion and adjust responses accordingly

Resources

License

MIT
