A voice AI customer support agent built with Pipecat, Gemini, and Cartesia, specifically targeting LTL (less-than-truckload) freight carriers. LTL carriers handle support for thousands of customers, with backlogs that can take weeks to resolve. Automating simple tasks lets teams focus on complex issues while giving customers instant answers.
https://screen.studio/share/Doeue1WL
We built a voice AI customer support agent with Gemini's powerful function calling at the core:
Deepgram Flux STT → Gemini LLM (with function calling) → Cartesia TTS
Gemini 2.5 Flash powers the intelligence, providing:
- Natural conversation understanding - Interprets customer requests with context awareness
- Dynamic function calling - Automatically invokes the right tools (shipment tracking, appointment scheduling, FAQ search, ticket creation, escalation) based on conversation flow
- 5 custom function handlers registered with Gemini for LTL freight operations
- Real-time decision making - Decides when to call functions, when to ask clarifying questions, and when to escalate to humans
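To make the function-calling flow concrete, here is a minimal sketch of one of the five handlers. The handler name, PRO numbers, and demo data are illustrative, not the project's actual implementation, and it is shown synchronous for brevity (real Pipecat handlers are async and registered with the LLM service):

```python
# Illustrative demo data; the real project uses data/ltl_dataset.json.
DEMO_SHIPMENTS = {
    "1324": {"status": "In transit", "eta": "2025-01-17"},
    "2401": {"status": "Out for delivery", "eta": "2025-01-15"},
}

def track_shipment(pro_number: str) -> dict:
    """Tool handler Gemini can invoke when a caller asks about a PRO number."""
    pro = str(pro_number).strip()
    shipment = DEMO_SHIPMENTS.get(pro)
    if shipment is None:
        # Returning an error payload lets the LLM ask a clarifying
        # question instead of guessing.
        return {"error": f"No shipment found for PRO {pro}"}
    return {"pro_number": pro, **shipment}
```

Gemini decides when to call this based on the conversation; the bot never hard-codes when tracking happens.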
Pipecat provides the real-time voice pipeline framework:
- Pipeline architecture - Chains STT → LLM → TTS processors seamlessly
- Frame-based processing - Efficient streaming of audio and LLM responses
- Transport abstraction - Supports SmallWebRTC (browser), Twilio (phone), and Daily.co
- Context aggregation - Manages conversation history for Gemini
- Transcript logging - Captures full conversation flow
- RTVI integration - Real-time voice interface for browser testing
- Deepgram Flux - Ultra-low latency STT optimized for conversational AI
- Cartesia - High-quality natural voice synthesis
- Twilio - Phone integration for production voice calls
- Redis - Real-time tool configuration and feature flags
- Next.js - Admin dashboard for dynamic tool management
This entire project was built today. It's the first time we've created a voice AI agent and our first time using Pipecat.
- Deepgram Flux - Ultra-low latency STT optimized for conversational AI
- Redis Tool Configuration - Real-time admin dashboard to enable/disable tools without restart
- Pipecat Cloud - Production deployment
Gemini Live - We initially tried using Gemini's native audio model (Gemini Live) to simplify the architecture into a single model. While the native audio processing was impressive, we found that function calling wasn't reliable enough with Gemini Live. The model would sometimes fail to invoke tools when needed or call them at inappropriate times. We ended up going back to the three-model cascade approach (Deepgram Flux STT → Gemini 2.5 Flash LLM → Cartesia TTS) which gave us much more reliable and consistent function calling behavior.
We tried using Gemini TTS, but the voice didn't sound as natural as we wanted.
Pipecat was great to use. It was our first time using it and getting it up and running was a breeze.
844-996-0993
Pick up your phone and call our live customer support agent! Try asking about shipment status or scheduling delivery appointments.
Dashboard: https://voice-agent-green.vercel.app/
Configure which tools are enabled/disabled in real-time using our Redis-powered admin panel.
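The enable/disable pattern behind the admin panel can be sketched as follows. A plain dict stands in for the Redis hash here, and the class and method names are assumptions for illustration, not the project's actual code:

```python
class ToolFlags:
    """In-memory stand-in for the Redis-backed tool feature flags."""

    def __init__(self):
        self._flags = {}  # in the real setup, backed by a Redis hash

    def set_enabled(self, tool: str, enabled: bool) -> None:
        self._flags[tool] = enabled

    def is_enabled(self, tool: str) -> bool:
        # Unknown tools default to enabled so new tools work without setup.
        return self._flags.get(tool, True)

flags = ToolFlags()
flags.set_enabled("ticket_creation", False)
```

Because the bot checks flags at call time rather than at startup, toggling a tool in the dashboard takes effect without a restart.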
Available PRO numbers: 1324, 1589, 2401, 2750, 4012, 4678
Try these prompts:
- "What's the status of shipment 1324?"
- "I need to schedule a delivery appointment for PRO number 2401"
- "Can you help me change my delivery time?"
- "What are your business hours?"
This project includes two versions of the customer support bot:
- Cascade Model (bot.py) - Uses Deepgram Flux STT, Gemini LLM, and Cartesia TTS
- Gemini Live (bot_gemini_live.py) - Uses Gemini's native audio model for end-to-end processing
Use Gemini Live (bot_gemini_live.py) if you want:
- ✅ Simpler architecture (fewer components)
- ✅ More natural conversation flow
- ✅ Lower latency
- ✅ Native audio understanding (better at detecting emotion, tone)
Use Cascade Model (bot.py) if you want:
- ✅ More control over individual components
- ✅ Ability to swap STT/TTS providers
- ✅ Custom voice configurations
- ✅ Separate tuning of speech recognition vs synthesis
- Voice Interaction: Real-time voice conversations using Google AI
- Phone Support: Optional Twilio integration for telephone-based support
- Intelligent Support: Powered by Gemini 2.5 Flash for natural conversations
- LTL Logistics Skills: Built-in demo data for PRO tracking and delivery appointments
- FAQ Search: Automated answers to common questions
- Ticket Creation: Automatic support ticket generation
- Human Escalation: Seamless handoff to human agents when needed
- Transcript Logging: Full conversation logging for quality assurance
- Python 3.12+
- uv package manager
- For both bots: Google API key for Gemini LLM
- For cascade model (bot.py) only: Deepgram + Cartesia API keys
- For phone support (optional): Twilio account and phone number
```bash
# Install dependencies with uv
uv sync

# Copy the example environment file
cp .env.example .env

# Edit .env and add your API keys
```

Google API Key (required for both bots):
- Visit Google AI Studio
- Sign in with your Google account
- Click "Create API Key"
- Copy the key and add it to your .env file
Deepgram API Key (required for bot.py only):
- Visit Deepgram Console
- Sign up for a free account
- Create a new API key
- Copy the key and add it to your .env file
Cartesia API Key (required for bot.py only):
- Visit Cartesia
- Sign up for an account
- Generate an API key
- Copy the key and add it to your .env file
Option 1: Run the Gemini Live version (recommended)

```bash
uv run bot_gemini_live.py
```

Option 2: Run the Cascade Model version

```bash
uv run bot.py
```

The bot will start and display a URL (typically http://localhost:7860). Open this URL in your browser to interact with the voice agent using SmallWebRTC.
To enable phone-based interactions through Twilio:
- Sign up at Twilio
- Get your TWILIO_ACCOUNT_SID and TWILIO_AUTH_TOKEN from the console
- Add them to your .env file:

```bash
TWILIO_ACCOUNT_SID=your_twilio_account_sid_here
TWILIO_AUTH_TOKEN=your_twilio_auth_token_here
```
- In the Twilio console, navigate to Phone Numbers → Manage → Buy a number
- Purchase a phone number with Voice capabilities
- Note the phone number for testing
For local development, you'll need to expose your bot to the internet:
```bash
# Install ngrok (if you haven't already)
brew install ngrok  # macOS
# or download from ngrok.com

# Start your bot with Twilio transport
uv run bot.py

# In another terminal, expose it with ngrok
ngrok http 7860
```

Copy the ngrok HTTPS URL (e.g., https://abc123.ngrok.io)
- Go to Phone Numbers → Manage → Active Numbers in Twilio console
- Click your phone number
- Under "Voice Configuration" → "A call comes in":
- Select Webhook
- Enter your ngrok URL: https://your-ngrok-url.ngrok.io
- Set HTTP method to POST
- Click Save
Call your Twilio phone number to speak with your customer support agent!
Note: For production deployments, configure a TwiML Bin instead of a webhook. See the Twilio WebSocket documentation for more details.
voice-hackathon/
├── bot.py # Cascade model implementation (STT → LLM → TTS)
├── bot_gemini_live.py # Gemini Live implementation (native audio)
├── support_tools.py # Customer support functions (FAQ, tickets, escalation)
├── pyproject.toml # Project dependencies
├── .env.example # Environment variables template
└── README.md # This file
- Speech-to-Text (STT): Deepgram Flux converts customer speech to text with ultra-low latency
- Language Model (LLM): Gemini 2.5 Flash processes requests and generates responses
- Text-to-Speech (TTS): Cartesia converts responses back to speech
- Transport: SmallWebRTC handles real-time audio streaming
Why Flux? Deepgram Flux is optimized specifically for real-time conversational AI, providing faster recognition and better handling of natural speech patterns.
- Native Audio Processing: Gemini Live handles speech input and output natively
- Function Calling: Same customer support tools (FAQ, tickets, escalation)
- LTL Tools: Includes authentication, shipment lookup, and delivery appointment management
- Transport: SmallWebRTC handles real-time audio streaming
Key Difference: Gemini Live uses a single model for everything, eliminating the need for separate STT/TTS services.
The bot can search a knowledge base for common questions:
- Business hours
- Shipping information
- Return policy
- Payment methods
- Account management
For issues requiring follow-up, the bot creates support tickets with:
- Ticket ID
- Priority level
- Issue description
- Estimated response time
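A ticket factory covering those four fields could look like this. The ID format, priority levels, and response times are made-up demo values, not the project's real ones:

```python
import itertools

# Monotonic counter stands in for a real ticketing backend's ID generator.
_ticket_counter = itertools.count(1001)

def create_ticket(description: str, priority: str = "normal") -> dict:
    """Build a support ticket with the fields listed above."""
    response_times = {"high": "4 hours", "normal": "24 hours", "low": "3 days"}
    return {
        "ticket_id": f"TICKET-{next(_ticket_counter)}",
        "priority": priority,
        "description": description,
        "estimated_response": response_times.get(priority, "24 hours"),
    }
```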
The bot escalates to human agents when:
- Customer explicitly requests human assistance
- Issue is complex or beyond bot capabilities
- Customer appears frustrated
- Multiple resolution attempts fail
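The escalation decision implied by these criteria can be sketched as below. The keyword list and attempt threshold are assumptions for illustration; the real logic (including frustration detection) lives with the LLM and the tools:

```python
# Hypothetical trigger words for an explicit human-agent request.
HUMAN_KEYWORDS = ("human", "agent", "representative", "person")

def should_escalate(utterance: str, failed_attempts: int) -> bool:
    """Escalate on an explicit request or after repeated failed resolutions."""
    text = utterance.lower()
    if any(keyword in text for keyword in HUMAN_KEYWORDS):
        return True  # customer explicitly asked for a human
    return failed_attempts >= 2  # multiple resolution attempts failed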
- Sample customers and shipments live in data/ltl_dataset.json
- Agents can search shipments by customer ID, company name, or directly by PRO number (sample IDs: CUST-1001, CUST-2002, CUST-3003)
- Each shipment tracks status, ETA, and delivery appointment date/time windows
- Modify this JSON file to tailor the demo experience or plug in live data sources
Edit support_tools.py and add entries to the FAQ_DATABASE dictionary:
```python
FAQ_DATABASE = {
    "your_key": {
        "question": "Your question here?",
        "answer": "Your answer here.",
    },
    # ... more entries
}
```

For Cascade Model (bot.py):
Edit the system prompt around line 197:
```python
messages = [
    {
        "role": "system",
        "content": """Your custom instructions here...""",
    },
]
```

For Gemini Live (bot_gemini_live.py):
Edit the instructions variable around line 172:
```python
instructions = """Your custom instructions here..."""
```

All voice and model settings are now configurable via environment variables in your .env file!
For Cascade Model (bot.py):
Set in .env:
```bash
# Change the Gemini model
GEMINI_MODEL=gemini-2.5-flash

# Change the Cartesia voice
CARTESIA_VOICE_ID=f786b574-daa5-4673-aa0c-cbe3e8534c02
```

Available Cartesia voices:
- f786b574-daa5-4673-aa0c-cbe3e8534c02 - Friendly Australian Man (default)
- 71a7ad14-091c-4e8e-a314-022ece01c121 - British Reading Lady
- a0e99841-438c-4a64-b679-ae501e7d6091 - Barbershop Man
- 694f9389-aac1-45b6-b726-9d9369183238 - Classy British Man
- Find more at: Cartesia Voice Library
For Gemini Live (bot_gemini_live.py):
Set in .env:
```bash
# Change the Gemini Live model
GEMINI_LIVE_MODEL=gemini-2.5-flash-native-audio-preview-09-2025

# Change the voice
GEMINI_VOICE_ID=Charon
```

Available Gemini Live voices:
- Puck - Energetic and youthful
- Charon - Deep and authoritative (default)
- Kore - Warm and friendly
- Fenrir - Powerful and commanding
- Aoede - Melodic and expressive
- Leda - Calm and professional
- Orus - Wise and mature
- Zephyr - Light and breezy
"Missing GOOGLE_API_KEY"
- Make sure you've created a .env file (copy from .env.example)
- Verify your API key is correctly set in the .env file
"Missing DEEPGRAM_API_KEY" (bot.py only)
- Add your Deepgram API key to .env
- Verify you've signed up at console.deepgram.com
"Missing CARTESIA_API_KEY" (bot.py only)
- Add your Cartesia API key to .env
- Verify you've signed up at play.cartesia.ai
- Check that port 7860 is not being used by another application
- Try accessing http://localhost:7860 in a different browser
- Ensure your microphone permissions are enabled in the browser
- Verify your microphone is working and selected in browser settings
- Check browser console for WebRTC errors
- Try using Chrome or Edge (best WebRTC support)
Gemini Live (bot_gemini_live.py):
- Ensure you're using a recent version of pipecat-ai (>= 0.0.90)
- If you get model errors, verify the model name is correct
- Check that your Google API key has access to Gemini Live models
Cascade Model (bot.py):
- Deepgram errors usually indicate API key issues or rate limits
- Cartesia errors may indicate voice ID problems - check the voice library
- If STT isn't working, verify your Deepgram account has credits
```bash
# Set custom port
uv run bot.py --port 8080
# or
uv run bot_gemini_live.py --port 8080
```

The bot uses loguru for logging. All logs are printed to the console, including:
- Client connections/disconnections
- Transcripts of conversations
- Function calls (FAQ searches, ticket creation, etc.)
- Errors and warnings
- Integrate Real Backend: Replace mock implementations in support_tools.py with real APIs
- Add Authentication: Verify customer identity before accessing account information
- Expand FAQ: Add more entries to the knowledge base
- Analytics: Track common issues and bot performance
- Multi-language: Add support for multiple languages
- Sentiment Analysis: Detect customer emotion and adjust responses accordingly
- Pipecat Documentation
- Google AI Studio
- Gemini API Docs
- Gemini Live Documentation
- Deepgram Documentation
- Cartesia Documentation
- Twilio Voice Documentation
- Twilio WebSocket Streams
- SmallWebRTC
MIT