Voice Hackathon - Customer Support Agent

What is this?

A voice AI customer support agent built with Pipecat, Gemini, and Cartesia, aimed specifically at LTL (less-than-truckload) freight carriers. LTL carriers handle support for thousands of customers, with backlogs that can take weeks to resolve. Automating simple tasks lets support teams focus on complex issues while giving customers instant answers.

Demo video

https://screen.studio/share/Doeue1WL

How did we use Gemini and Pipecat?

We built a voice AI customer support agent with Gemini's powerful function calling at the core:

Architecture: Three-Model Cascade 🎼

Deepgram Flux STT → Gemini LLM (with function calling) → Cartesia TTS

Gemini for LLM + Function Calling 🧠

Gemini 2.5 Flash powers the intelligence, providing:

  • Natural conversation understanding - Interprets customer requests with context awareness
  • Dynamic function calling - Automatically invokes the right tools (shipment tracking, appointment scheduling, FAQ search, ticket creation, escalation) based on conversation flow
  • Custom function handlers - 5 handlers registered with Gemini for LTL freight operations
  • Real-time decision making - Decides when to call functions, when to ask clarifying questions, and when to escalate to humans
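
As an illustration, here is a minimal sketch of wiring one function handler into the Gemini LLM service via Pipecat. Import paths and handler signatures vary slightly across pipecat-ai versions, and the SHIPMENTS lookup and argument names are illustrative placeholders, not the project's real data (the real handlers live in support_tools.py):

import os

from pipecat.services.google.llm import GoogleLLMService     # import path may differ by pipecat-ai version
from pipecat.services.llm_service import FunctionCallParams  # ditto

llm = GoogleLLMService(api_key=os.getenv("GOOGLE_API_KEY"), model="gemini-2.5-flash")

# Illustrative in-memory data for the sketch only.
SHIPMENTS = {"1324": {"status": "Out for delivery", "eta": "Today, 2-4 PM"}}

async def track_shipment(params: FunctionCallParams):
    """Invoked by Gemini when the caller asks about a shipment."""
    pro_number = str(params.arguments.get("pro_number", ""))
    result = SHIPMENTS.get(pro_number, {"status": "No shipment found for that PRO number"})
    await params.result_callback(result)

# The tool's name/description/parameter schema is declared to Gemini separately
# (via the tools passed with the LLM context); this call binds the Python handler to that tool name.
llm.register_function("track_shipment", track_shipment)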

Pipecat Orchestration 🎵

Pipecat provides the real-time voice pipeline framework:

  • Pipeline architecture - Chains STT → LLM → TTS processors seamlessly
  • Frame-based processing - Efficient streaming of audio and LLM responses
  • Transport abstraction - Supports SmallWebRTC (browser), Twilio (phone), and Daily.co
  • Context aggregation - Manages conversation history for Gemini
  • Transcript logging - Captures full conversation flow
  • RTVI integration - Real-time voice interface for browser testing
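
To make those bullets concrete, here is a rough sketch of how the cascade is assembled with Pipecat. This is a simplified excerpt rather than the exact code in bot.py; import paths differ slightly across pipecat-ai versions, and transport construction is left to the caller:

import os

from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.google.llm import GoogleLLMService

def build_task(transport):
    """Assemble the cascade; `transport` is a SmallWebRTC, Twilio, or Daily transport instance."""
    stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))  # Flux configuration omitted here
    llm = GoogleLLMService(api_key=os.getenv("GOOGLE_API_KEY"), model="gemini-2.5-flash")
    tts = CartesiaTTSService(api_key=os.getenv("CARTESIA_API_KEY"),
                             voice_id=os.getenv("CARTESIA_VOICE_ID"))

    context = OpenAILLMContext(
        [{"role": "system", "content": "You are an LTL freight customer support agent."}]
    )
    context_aggregator = llm.create_context_aggregator(context)

    pipeline = Pipeline([
        transport.input(),               # audio frames in
        stt,                             # Deepgram speech-to-text
        context_aggregator.user(),       # append user turns to the Gemini context
        llm,                             # Gemini 2.5 Flash with function calling
        tts,                             # Cartesia text-to-speech
        transport.output(),              # audio frames out
        context_aggregator.assistant(),  # append assistant turns to the context
    ])
    return PipelineTask(pipeline, params=PipelineParams(allow_interruptions=True))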

Other tools we used

  • Deepgram Flux - Ultra-low latency STT optimized for conversational AI
  • Cartesia - High-quality natural voice synthesis
  • Twilio - Phone integration for production voice calls
  • Redis - Real-time tool configuration and feature flags
  • Next.js - Admin dashboard for dynamic tool management

New things we tried

This entire project was built today. It was our first time creating a voice AI agent and our first time using Pipecat.

  • Deepgram Flux - Ultra-low latency STT optimized for conversational AI
  • Redis Tool Configuration - Real-time admin dashboard to enable/disable tools without a restart (see the sketch after this list)
  • Pipecat Cloud - Production deployment
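
Here is a small sketch of how a Redis-backed feature-flag check for tools might look. The key layout and the REDIS_URL variable are assumptions for illustration; the real wiring lives in the admin dashboard and bot code:

import os
import redis

r = redis.Redis.from_url(os.getenv("REDIS_URL", "redis://localhost:6379"))

def tool_enabled(tool_name: str) -> bool:
    """Return True unless the admin dashboard has explicitly disabled the tool."""
    value = r.get(f"tool:{tool_name}:enabled")  # hypothetical key naming
    return value is None or value == b"1"

# Example: only register a handler when its flag allows it
# if tool_enabled("track_shipment"):
#     llm.register_function("track_shipment", track_shipment)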

Feedback for tools we used

Gemini Live - We initially tried using Gemini's native audio model (Gemini Live) to simplify the architecture into a single model. While the native audio processing was impressive, we found that function calling wasn't reliable enough with Gemini Live. The model would sometimes fail to invoke tools when needed or call them at inappropriate times. We ended up going back to the three-model cascade approach (Deepgram Flux STT → Gemini 2.5 Flash LLM → Cartesia TTS) which gave us much more reliable and consistent function calling behavior.

We tried using Gemini TTS, but the voice didn't sound as natural as we wanted.

Pipecat was great to use. It was our first time using it and getting it up and running was a breeze.

🚀 Try It Live!

🚀 Call the Agent

844-996-0993

Pick up your phone and call our live customer support agent! Try asking about shipment status or scheduling delivery appointments.

🎛️ Admin Dashboard

Dashboard: https://voice-agent-green.vercel.app/

Configure which tools are enabled/disabled in real-time using our Redis-powered admin panel.

💡 Quick Test Guide

Available PRO numbers: 1324, 1589, 2401, 2750, 4012, 4678

Try these prompts:

  • "What's the status of shipment 1324?"
  • "I need to schedule a delivery appointment for PRO number 2401"
  • "Can you help me change my delivery time?"
  • "What are your business hours?"

Additional Documentation

Two Implementations Available

This project includes two versions of the customer support bot:

  1. Cascade Model (bot.py) - Uses Deepgram Flux STT, Gemini LLM, and Cartesia TTS
  2. Gemini Live (bot_gemini_live.py) - Uses Gemini's native audio model for end-to-end processing

Which One Should I Use?

Use Gemini Live (bot_gemini_live.py) if you want:

  • ✅ Simpler architecture (fewer components)
  • ✅ More natural conversation flow
  • ✅ Lower latency
  • ✅ Native audio understanding (better at detecting emotion, tone)

Use Cascade Model (bot.py) if you want:

  • ✅ More control over individual components
  • ✅ Ability to swap STT/TTS providers
  • ✅ Custom voice configurations
  • ✅ Separate tuning of speech recognition vs synthesis

Features

  • Voice Interaction: Real-time voice conversations using Google AI
  • Phone Support: Optional Twilio integration for telephone-based support
  • Intelligent Support: Powered by Gemini 2.5 Flash for natural conversations
  • LTL Logistics Skills: Built-in demo data for PRO tracking and delivery appointments
  • FAQ Search: Automated answers to common questions
  • Ticket Creation: Automatic support ticket generation
  • Human Escalation: Seamless handoff to human agents when needed
  • Transcript Logging: Full conversation logging for quality assurance

Prerequisites

  • Python 3.12+
  • uv package manager
  • For both bots: Google API key for Gemini LLM
  • For cascade model (bot.py) only: Deepgram + Cartesia API keys
  • For phone support (optional): Twilio account and phone number

Setup

1. Install Dependencies

# Install dependencies with uv
uv sync

2. Configure Environment Variables

# Copy the example environment file
cp .env.example .env

# Edit .env and add your API keys

3. Get Your API Keys

Google API Key (required for both bots):

  1. Visit Google AI Studio
  2. Sign in with your Google account
  3. Click "Create API Key"
  4. Copy the key and add it to your .env file

Deepgram API Key (required for bot.py only):

  1. Visit Deepgram Console
  2. Sign up for a free account
  3. Create a new API key
  4. Copy the key and add it to your .env file

Cartesia API Key (required for bot.py only):

  1. Visit Cartesia
  2. Sign up for an account
  3. Generate an API key
  4. Copy the key and add it to your .env file
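
Once the keys are in place, your .env should contain entries along these lines (values are placeholders; see .env.example for the authoritative variable list):

# Required for both bots
GOOGLE_API_KEY=your_google_api_key_here

# Required for the cascade model (bot.py) only
DEEPGRAM_API_KEY=your_deepgram_api_key_here
CARTESIA_API_KEY=your_cartesia_api_key_here

# Optional: phone support via Twilio
TWILIO_ACCOUNT_SID=your_twilio_account_sid_here
TWILIO_AUTH_TOKEN=your_twilio_auth_token_here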

Running the Bot

Local Development with SmallWebRTC

Option 1: Run the Gemini Live version (recommended)

uv run bot_gemini_live.py

Option 2: Run the Cascade Model version

uv run bot.py

The bot will start and display a URL (typically http://localhost:7860). Open this URL in your browser to interact with the voice agent using SmallWebRTC.

Running with Twilio (Phone Support)

To enable phone-based interactions through Twilio:

1. Get Twilio Credentials

  1. Sign up at Twilio
  2. Get your TWILIO_ACCOUNT_SID and TWILIO_AUTH_TOKEN from the console
  3. Add them to your .env file:
    TWILIO_ACCOUNT_SID=your_twilio_account_sid_here
    TWILIO_AUTH_TOKEN=your_twilio_auth_token_here

2. Purchase a Phone Number

  1. In the Twilio console, navigate to Phone Numbers → Manage → Buy a number
  2. Purchase a phone number with Voice capabilities
  3. Note the phone number for testing

3. Set Up Local Testing with ngrok

For local development, you'll need to expose your bot to the internet:

# Install ngrok (if you haven't already)
brew install ngrok  # macOS
# or download from ngrok.com

# Start your bot with Twilio transport
uv run bot.py

# In another terminal, expose it with ngrok
ngrok http 7860

Copy the ngrok HTTPS URL (e.g., https://abc123.ngrok.io)

4. Configure Your Phone Number

  1. Go to Phone Numbers → Manage → Active Numbers in Twilio console
  2. Click your phone number
  3. Under "Voice Configuration" → "A call comes in":
    • Select Webhook
    • Enter your ngrok URL: https://your-ngrok-url.ngrok.io
    • Set HTTP method to POST
  4. Click Save

5. Test Your Bot

Call your Twilio phone number to speak with your customer support agent!

Note: For production deployments, configure a TwiML Bin instead of a webhook. See the Twilio WebSocket documentation for more details.
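
For reference, a TwiML Bin for a streaming voice agent typically looks like the snippet below. The WebSocket path (/ws here) is an assumption and must match whatever endpoint your deployment actually exposes:

<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Connect>
    <Stream url="wss://your-production-host/ws" />
  </Connect>
</Response>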

Project Structure

voice-hackathon/
├── bot.py                     # Cascade model implementation (STT → LLM → TTS)
├── bot_gemini_live.py         # Gemini Live implementation (native audio)
├── support_tools.py           # Customer support functions (FAQ, tickets, escalation)
├── pyproject.toml             # Project dependencies
├── .env.example               # Environment variables template
└── README.md                  # This file

How It Works

Cascade Model Architecture (bot.py)

  1. Speech-to-Text (STT): Deepgram Flux converts customer speech to text with ultra-low latency
  2. Language Model (LLM): Gemini 2.5 Flash processes requests and generates responses
  3. Text-to-Speech (TTS): Cartesia converts responses back to speech
  4. Transport: SmallWebRTC handles real-time audio streaming

Why Flux? Deepgram Flux is optimized specifically for real-time conversational AI, providing faster recognition and better handling of natural speech patterns.

Gemini Live Architecture (bot_gemini_live.py)

  1. Native Audio Processing: Gemini Live handles speech input and output natively
  2. Function Calling: Same customer support tools (FAQ, tickets, escalation)
  3. LTL Tools: Includes authentication, shipment lookup, and delivery appointment management
  4. Transport: SmallWebRTC handles real-time audio streaming

Key Difference: Gemini Live uses a single model for everything, eliminating the need for separate STT/TTS services.
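
For comparison with the cascade setup, a minimal sketch of the Gemini Live service configuration might look like this. The class name and parameters follow recent pipecat-ai releases and may differ in yours; check bot_gemini_live.py for the real values:

import os

from pipecat.services.gemini_multimodal_live.gemini import GeminiMultimodalLiveLLMService

llm = GeminiMultimodalLiveLLMService(
    api_key=os.getenv("GOOGLE_API_KEY"),
    model=os.getenv("GEMINI_LIVE_MODEL", "gemini-2.5-flash-native-audio-preview-09-2025"),
    voice_id=os.getenv("GEMINI_VOICE_ID", "Charon"),
    system_instruction="You are a friendly LTL freight customer support agent.",
)
# Function handlers are registered the same way as in the cascade bot,
# e.g. llm.register_function("track_shipment", track_shipment)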

Customer Support Features

1. FAQ Search

The bot can search a knowledge base for common questions:

  • Business hours
  • Shipping information
  • Return policy
  • Payment methods
  • Account management

2. Ticket Creation

For issues requiring follow-up, the bot creates support tickets with:

  • Ticket ID
  • Priority level
  • Issue description
  • Estimated response time
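
A minimal mock of what such a handler might return (field names are illustrative; the real implementation is in support_tools.py):

import uuid
from datetime import datetime, timezone

def create_ticket(description: str, priority: str = "normal") -> dict:
    """Mock ticket creation; replace with a real ticketing API in production."""
    return {
        "ticket_id": f"TKT-{uuid.uuid4().hex[:8].upper()}",
        "priority": priority,
        "description": description,
        "created_at": datetime.now(timezone.utc).isoformat(),
        "estimated_response_time": "4 hours" if priority == "high" else "24 hours",
    }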

3. Human Escalation

The bot escalates to human agents when:

  • Customer explicitly requests human assistance
  • Issue is complex or beyond bot capabilities
  • Customer appears frustrated
  • Multiple resolution attempts fail

LTL Demo Dataset

  • Sample customers and shipments live in data/ltl_dataset.json
  • Agents can search shipments by customer ID, company name, or directly by PRO number (sample IDs: CUST-1001, CUST-2002, CUST-3003)
  • Each shipment tracks status, ETA, and delivery appointment date/time windows
  • Modify this JSON file to tailor the demo experience or plug in live data sources
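
The dataset is roughly shaped like the following trimmed sketch. Field names and values here are illustrative placeholders; see data/ltl_dataset.json for the actual structure:

{
  "customers": [
    { "customer_id": "CUST-1001", "company_name": "Example Freight Co." }
  ],
  "shipments": [
    {
      "pro_number": "1324",
      "customer_id": "CUST-1001",
      "status": "In transit",
      "eta": "2025-01-15",
      "delivery_appointment": { "date": "2025-01-16", "window": "09:00-12:00" }
    }
  ]
}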

Customization

Adding FAQ Entries

Edit support_tools.py and add entries to the FAQ_DATABASE dictionary:

FAQ_DATABASE = {
    "your_key": {
        "question": "Your question here?",
        "answer": "Your answer here.",
    },
    # ... more entries
}

Modifying the System Prompt

For Cascade Model (bot.py): Edit the system prompt around line 197:

messages = [
    {
        "role": "system",
        "content": """Your custom instructions here...""",
    },
]

For Gemini Live (bot_gemini_live.py): Edit the instructions variable around line 172:

instructions = """Your custom instructions here..."""

Changing Voice & Model Settings

All voice and model settings are now configurable via environment variables in your .env file!

For Cascade Model (bot.py):

Set in .env:

# Change the Gemini model
GEMINI_MODEL=gemini-2.5-flash

# Change the Cartesia voice
CARTESIA_VOICE_ID=f786b574-daa5-4673-aa0c-cbe3e8534c02

Available Cartesia voices:

  • f786b574-daa5-4673-aa0c-cbe3e8534c02 - Friendly Australian Man (default)
  • 71a7ad14-091c-4e8e-a314-022ece01c121 - British Reading Lady
  • a0e99841-438c-4a64-b679-ae501e7d6091 - Barbershop Man
  • 694f9389-aac1-45b6-b726-9d9369183238 - Classy British Man
  • Find more at: Cartesia Voice Library

For Gemini Live (bot_gemini_live.py):

Set in .env:

# Change the Gemini Live model
GEMINI_LIVE_MODEL=gemini-2.5-flash-native-audio-preview-09-2025

# Change the voice
GEMINI_VOICE_ID=Charon

Available Gemini Live voices:

  • Puck - Energetic and youthful
  • Charon - Deep and authoritative (default)
  • Kore - Warm and friendly
  • Fenrir - Powerful and commanding
  • Aoede - Melodic and expressive
  • Leda - Calm and professional
  • Orus - Wise and mature
  • Zephyr - Light and breezy

Troubleshooting

API Key Issues

"Missing GOOGLE_API_KEY"

  • Make sure you've created a .env file (copy from .env.example)
  • Verify your API key is correctly set in the .env file

"Missing DEEPGRAM_API_KEY" (bot.py only)

"Missing CARTESIA_API_KEY" (bot.py only)

Connection Issues

  • Check that port 7860 is not being used by another application
  • Try accessing http://localhost:7860 in a different browser
  • Ensure your microphone permissions are enabled in the browser

Audio Issues

  • Verify your microphone is working and selected in browser settings
  • Check browser console for WebRTC errors
  • Try using Chrome or Edge (best WebRTC support)

Model-Specific Issues

Gemini Live (bot_gemini_live.py):

  • Ensure you're using a recent version of pipecat-ai (>= 0.0.90)
  • If you get model errors, verify the model name is correct
  • Check that your Google API key has access to Gemini Live models

Cascade Model (bot.py):

  • Deepgram errors usually indicate API key issues or rate limits
  • Cartesia errors may indicate voice ID problems - check the voice library
  • If STT isn't working, verify your Deepgram account has credits

Development

Running with Custom Port

# Set custom port
uv run bot.py --port 8080
# or
uv run bot_gemini_live.py --port 8080

Viewing Logs

The bot uses loguru for logging. All logs are printed to the console, including:

  • Client connections/disconnections
  • Transcripts of conversations
  • Function calls (FAQ searches, ticket creation, etc.)
  • Errors and warnings
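
If you add your own tools, logging follows the same pattern. A quick sketch with loguru (the message format is just an example):

from loguru import logger

def log_function_call(name: str, args: dict) -> None:
    """Emit the same kind of per-call log line the built-in tools produce."""
    logger.info("Function call: {} args={}", name, args)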

Next Steps

  • Integrate Real Backend: Replace mock implementations in support_tools.py with real APIs
  • Add Authentication: Verify customer identity before accessing account information
  • Expand FAQ: Add more entries to the knowledge base
  • Analytics: Track common issues and bot performance
  • Multi-language: Add support for multiple languages
  • Sentiment Analysis: Detect customer emotion and adjust responses accordingly

Resources

License

MIT
