Architecture
Overview
This platform runs configurable AI voice agents that handle inbound and outbound phone calls. It processes speech in real time using a modular pipeline, supports multi-step conversation flows, and persists call artifacts for review.
System Diagram
┌─────────────────────────────────────────────────────────────────────┐
│                         Telephony Providers                         │
│                       Twilio · Exotel · Vobiz                       │
└──────────┬──────────────────────────────────────────┬───────────────┘
           │ HTTP webhooks                            │ WebSocket (audio)
           ▼                                          ▼
┌─────────────────────────────────────────────────────────────────────┐
│                           FastAPI Backend                           │
│                                                                     │
│  ┌──────────────┐  ┌──────────────┐  ┌───────────────────────────┐  │
│  │   REST API   │  │   Webhooks   │  │    WebSocket Handlers     │  │
│  │    /api/*    │  │ /telephony/  │  │  /ws/{provider}/{agent}   │  │
│  └──────┬───────┘  └──────┬───────┘  └──────────┬────────────────┘  │
│         │                 │                     │                   │
│         ▼                 ▼                     ▼                   │
│  ┌──────────────────────────────────────────────────────────────┐   │
│  │                   Voice Pipeline (Pipecat)                   │   │
│  │                                                              │   │
│  │       Audio In → STT → LLM (+ tools) → TTS → Audio Out       │   │
│  │                         ↑                                    │   │
│  │       Flow Engine (optional multi-node state machine)        │   │
│  └──────────────────────────────────────────────────────────────┘   │
│         │                                                           │
│  ┌──────┴───────┐  ┌──────────────┐  ┌──────────────┐               │
│  │  PostgreSQL  │  │    Redis     │  │  Filesystem  │               │
│  │ calls,agents │  │   sessions   │  │  recordings  │               │
│  │ events,flows │  │              │  │              │               │
│  └──────────────┘  └──────────────┘  └──────────────┘               │
└─────────────────────────────────────────────────────────────────────┘
                                   │
                                   ▼
┌─────────────────────────────────────────────────────────────────────┐
│                     React Dashboard (admin UI)                      │
│           Agents · Calls · Flow Editor · Tools · Settings           │
└─────────────────────────────────────────────────────────────────────┘
Component Summary
Backend API (api/)
FastAPI application providing:
- Agent CRUD and AI-powered agent generation
- Call history, events, and recording retrieval
- Outbound call initiation (provider-agnostic)
- Inbound telephony webhooks (TwiML, ExoML, Vobiz XML)
- WebSocket endpoints for real-time audio streaming
- Flow node management (visual editor backend)
- Webhook tool management
- Telephony settings configuration
Voice Runtime (agent/)
Real-time voice processing built on Pipecat:
- pipeline.py — Assembles the STT → LLM → TTS chain with turn-taking strategies, idle detection, and recording
- flow_engine.py — State machine that drives multi-node conversations with per-node instructions, tools, and transitions
- outbound.py — Provider-agnostic outbound call initiation
- recording.py — Captures raw audio frames and writes WAV files
- processors.py — Custom frame processors for transcript collection and event logging
- trace.py — Fire-and-forget async event persistence
Telephony Layer (telephony/)
Provider abstraction supporting three telephony backends:
| Provider | Audio Format | Serializer |
|---|---|---|
| Twilio | mu-law 8 kHz | TwilioFrameSerializer |
| Exotel | PCM 8 kHz | ExotelFrameSerializer |
| Vobiz | mu-law 8 kHz | VobizFrameSerializer |
Each provider implements: transport creation, webhook response building, and outbound call initiation.
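One way to express the three responsibilities listed above is an abstract base class plus a lookup by provider name. The class name, method names, and the `FakeProvider` below are assumptions made for illustration; the actual interface in telephony/ may look different.

```python
from abc import ABC, abstractmethod


class TelephonyProvider(ABC):
    """Hypothetical provider interface; names are illustrative."""

    name: str
    audio_format: str

    @abstractmethod
    def create_transport(self, websocket: object) -> object:
        """Wrap the provider WebSocket with the matching frame serializer."""

    @abstractmethod
    def build_webhook_response(self, stream_url: str) -> str:
        """Return the provider-specific XML that points at stream_url."""

    @abstractmethod
    def initiate_outbound_call(self, to_number: str, callback_url: str) -> dict:
        """Ask the provider's REST API to place a call."""


class FakeProvider(TelephonyProvider):
    """Toy provider used only to show the shape of an implementation."""

    name = "fake"
    audio_format = "mu-law 8 kHz"

    def create_transport(self, websocket):
        return {"websocket": websocket, "serializer": "FakeFrameSerializer"}

    def build_webhook_response(self, stream_url):
        return f'<Response><Stream url="{stream_url}"/></Response>'

    def initiate_outbound_call(self, to_number, callback_url):
        return {"to": to_number, "callback_url": callback_url, "status": "queued"}


PROVIDERS: dict[str, TelephonyProvider] = {"fake": FakeProvider()}


def get_provider(name: str) -> TelephonyProvider:
    """Resolve the {provider} path segment to an implementation."""
    try:
        return PROVIDERS[name]
    except KeyError:
        raise ValueError(f"unknown telephony provider: {name}") from None
```

With this shape, the webhook and WebSocket handlers only ever call `get_provider(...)`, so adding a fourth backend would not touch the routing code.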
AI Services (services/)
Pluggable factories for each AI capability:
- STT — Deepgram, Sarvam, OpenAI, ElevenLabs
- LLM — OpenAI (GPT-4.1 family), Google Gemini, Grok (xAI)
- TTS — Cartesia, ElevenLabs, Sarvam, Deepgram, OpenAI
Agents choose their provider and model per-service through database configuration.
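A pluggable factory of this kind is often built as a registry keyed by provider name. The sketch below assumes a hypothetical `ServiceConfig` record and returns plain dicts where a real factory would construct a Pipecat service; the model string in the usage note is a placeholder, not a real model name.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ServiceConfig:
    """Per-agent choice for one capability, as it might be read from the database."""
    provider: str
    model: str


# Hypothetical registry: provider name -> factory function.
STT_FACTORIES: dict[str, Callable[[ServiceConfig], dict]] = {}


def register_stt(provider: str):
    """Decorator that records a factory under its provider name."""
    def decorator(factory):
        STT_FACTORIES[provider] = factory
        return factory
    return decorator


@register_stt("deepgram")
def make_deepgram_stt(cfg: ServiceConfig) -> dict:
    # A real factory would construct the STT service object here.
    return {"kind": "stt", "provider": "deepgram", "model": cfg.model}


@register_stt("sarvam")
def make_sarvam_stt(cfg: ServiceConfig) -> dict:
    return {"kind": "stt", "provider": "sarvam", "model": cfg.model}


def create_stt(cfg: ServiceConfig) -> dict:
    """Look up the factory for the configured provider and build the service."""
    factory = STT_FACTORIES.get(cfg.provider)
    if factory is None:
        raise ValueError(f"unsupported STT provider: {cfg.provider}")
    return factory(cfg)
```

Calling `create_stt(ServiceConfig("deepgram", "example-model"))` selects the Deepgram factory. The same registry shape would repeat for LLM and TTS.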
Tool System (tools/)
Function calling via webhook-based tools:
- Tools are stored in the database with HTTP endpoint, parameters schema, and headers
- The LLM calls tools during conversation; results are passed back as context
- Flow nodes can scope which tools are available at each step
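To make the webhook tool idea concrete, the sketch below turns a stored tool record and the LLM's arguments into an HTTP request, and filters tools by a node's tool IDs. The `Tool` dataclass, its field names, and the JSON body format are assumptions. The request is only constructed here, not sent.

```python
import json
from dataclasses import dataclass, field
from urllib.request import Request


@dataclass
class Tool:
    """Hypothetical shape of a stored tool row."""
    name: str
    endpoint: str
    method: str = "POST"
    headers: dict[str, str] = field(default_factory=dict)
    timeout: float = 10.0


def build_tool_request(tool: Tool, arguments: dict) -> Request:
    """Encode the LLM-supplied arguments as a JSON body for the tool endpoint."""
    body = json.dumps(arguments).encode("utf-8")
    headers = {"Content-Type": "application/json", **tool.headers}
    return Request(tool.endpoint, data=body, headers=headers, method=tool.method)


def scoped_tools(all_tools: dict[str, Tool], node_tool_ids: list[str]) -> list[Tool]:
    """Return only the tools a flow node is allowed to expose to the LLM."""
    return [all_tools[tool_id] for tool_id in node_tool_ids if tool_id in all_tools]
```

Sending the request would be a call such as `urllib.request.urlopen(request, timeout=tool.timeout)`, with the decoded response handed back to the LLM as the tool result.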
Persistence
PostgreSQL stores agents, calls, call events, flow nodes, tools, and telephony settings.
Redis provides session caching with TTL-based expiry.
Filesystem stores WAV recordings organized as recordings/YYYY/MM/{call_id}.wav.
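The recording layout above maps directly to a small path helper. The function name and the choice to derive year and month from the call's start time are assumptions.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath


def recording_path(call_id: str, started_at: datetime, root: str = "recordings") -> PurePosixPath:
    """Build recordings/YYYY/MM/{call_id}.wav from a call ID and timestamp."""
    return PurePosixPath(root) / f"{started_at:%Y}" / f"{started_at:%m}" / f"{call_id}.wav"


example = recording_path("abc123", datetime(2025, 3, 9, tzinfo=timezone.utc))
# str(example) == "recordings/2025/03/abc123.wav"
```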
Background Workers (workers/)
APScheduler runs on app startup:
- Recording cleanup — Daily at 3 AM, deletes recordings older than the retention threshold
- Stale call cleanup — Every 5 minutes, marks calls stuck in IN_PROGRESS for more than 30 minutes as completed
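The stale call rule can be written as a pure function that a scheduled job would call. This sketch makes several assumptions: the in-memory `Call` record, the `COMPLETED` status string, and measuring staleness from the call's start time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(minutes=30)


@dataclass
class Call:
    """Hypothetical minimal call record."""
    id: str
    status: str
    started_at: datetime


def mark_stale_calls(calls: list[Call], now: datetime) -> list[str]:
    """Mark calls stuck in IN_PROGRESS past the threshold; return their IDs."""
    cutoff = now - STALE_AFTER
    updated = []
    for call in calls:
        if call.status == "IN_PROGRESS" and call.started_at < cutoff:
            call.status = "COMPLETED"  # assumed status value
            updated.append(call.id)
    return updated


now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
calls = [
    Call("old", "IN_PROGRESS", now - timedelta(minutes=45)),
    Call("fresh", "IN_PROGRESS", now - timedelta(minutes=5)),
    Call("done", "COMPLETED", now - timedelta(hours=2)),
]
stale_ids = mark_stale_calls(calls, now)  # only "old" crosses the threshold
```

Passing `now` in as a parameter keeps the rule testable without waiting on a real clock.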
Dashboard (dashboard/)
React 19 + TypeScript admin interface:
- Agent management and configuration
- Call history with transcript and recording playback
- Visual flow editor (XYFlow canvas)
- Webhook tool management
- Telephony settings
Call Lifecycle
Inbound
- Provider sends HTTP request to POST /telephony/{provider}/inbound/{agent_name}
- Backend returns provider-specific XML response with WebSocket stream URL
- Provider opens WebSocket to /ws/{provider}/{agent_name}
- WebSocket handler creates/updates call record, starts voice pipeline
- Pipeline runs: STT → LLM → TTS with real-time audio streaming
- Transcript, events, and recording are persisted throughout the call
- On disconnect, call is marked completed or failed
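For the Twilio case, the XML returned in the second step can be sketched with the standard library. `<Response>`, `<Connect>`, and `<Stream url=...>` are standard TwiML verbs; the function name and host below are hypothetical, and the ExoML and Vobiz XML formats are not shown because this document does not describe them.

```python
from xml.etree import ElementTree as ET


def build_stream_twiml(host: str, provider: str, agent_name: str) -> str:
    """Return TwiML that tells Twilio to open a media stream to our WebSocket."""
    stream_url = f"wss://{host}/ws/{provider}/{agent_name}"
    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    ET.SubElement(connect, "Stream", url=stream_url)
    return ET.tostring(response, encoding="unicode")


twiml = build_stream_twiml("voice.example.com", "twilio", "support")
```

The stream URL carries the same {provider} and {agent_name} segments as the webhook path, so the WebSocket handler can locate the agent without extra state.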
Outbound
- Client calls POST /api/calls/outbound with phone number, agent, and optional context
- Backend creates call record in RINGING state
- Provider REST API initiates the call with callback URL
- When answered, the provider connects the WebSocket and the same pipeline runs
- Call context (custom parameters) is injected into the LLM system prompt
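The last step, injecting call context into the system prompt, might look like the sketch below. The function name and the "Call context:" block format are assumptions; the document does not specify how the parameters are rendered.

```python
def inject_call_context(system_prompt: str, context: dict[str, str]) -> str:
    """Append outbound call parameters to the prompt as a short key/value list."""
    if not context:
        return system_prompt
    lines = [f"- {key}: {value}" for key, value in sorted(context.items())]
    return system_prompt + "\n\nCall context:\n" + "\n".join(lines)


prompt = inject_call_context(
    "You are a friendly scheduling assistant.",
    {"customer_name": "Asha", "appointment": "Tuesday 10:00"},
)
```

Sorting the keys keeps the prompt stable across calls with identical context, which helps when comparing transcripts.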
Data Model
Agent
Named voice persona with full configuration:
- System prompt, greeting (supports templating)
- STT, LLM, and TTS provider/model/settings
- Pipeline settings (VAD, turn detection, idle thresholds)
- Pre-call tool IDs, context variable schema
- Associated flow nodes
Call
Individual conversation record:
- Phone number, direction (inbound/outbound), status
- Timestamps, duration, transcript
- Recording path, provider metadata
- Call context (outbound parameters)
CallEvent
Time-ordered event log per call:
- call_started, call_ended
- user_spoke, agent_spoke
- tool_called, tool_result
- node_entered, node_transition
- context_injected, error
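The event types above fit naturally into an enum, with a small record type that sorts by timestamp. The class names and fields are illustrative assumptions. Only the ten event names come from this document.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class EventType(str, Enum):
    CALL_STARTED = "call_started"
    CALL_ENDED = "call_ended"
    USER_SPOKE = "user_spoke"
    AGENT_SPOKE = "agent_spoke"
    TOOL_CALLED = "tool_called"
    TOOL_RESULT = "tool_result"
    NODE_ENTERED = "node_entered"
    NODE_TRANSITION = "node_transition"
    CONTEXT_INJECTED = "context_injected"
    ERROR = "error"


@dataclass(order=True)
class CallEvent:
    """Hypothetical event row; ordering compares timestamps only."""
    timestamp: datetime
    call_id: str = field(compare=False)
    type: EventType = field(compare=False)
    payload: dict = field(default_factory=dict, compare=False)


events = [
    CallEvent(datetime(2025, 1, 1, 0, 0, 2, tzinfo=timezone.utc), "c1", EventType.CALL_ENDED),
    CallEvent(datetime(2025, 1, 1, 0, 0, 0, tzinfo=timezone.utc), "c1", EventType.CALL_STARTED),
]
timeline = sorted(events)  # time-ordered log for one call
```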
FlowNode
Conversation step within an agent's flow:
- Node key, position, initial/terminal flags
- Role messages (persona) and task messages (instructions)
- Transition functions (route to next node)
- Tool IDs and pre-actions
- Visual editor position
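A flow built from such nodes behaves like a small state machine: the LLM calls a transition function, and the engine moves to the node that function routes to. The sketch below is a simplified assumption of how that could work; the field names and the `FlowEngine` class are hypothetical and omit role messages, pre-actions, and editor position.

```python
from dataclasses import dataclass, field


@dataclass
class FlowNode:
    key: str
    task_message: str
    # Transition function name -> key of the next node (assumed representation).
    transitions: dict[str, str] = field(default_factory=dict)
    tool_ids: list[str] = field(default_factory=list)
    is_initial: bool = False
    is_terminal: bool = False


class FlowEngine:
    """Minimal state machine over FlowNode records."""

    def __init__(self, nodes: list[FlowNode]) -> None:
        self.nodes = {node.key: node for node in nodes}
        initial = [node for node in nodes if node.is_initial]
        if len(initial) != 1:
            raise ValueError("a flow needs exactly one initial node")
        self.current = initial[0]

    def transition(self, function_name: str) -> FlowNode:
        """Follow the named transition from the current node."""
        if self.current.is_terminal:
            raise RuntimeError("flow has already reached a terminal node")
        try:
            next_key = self.current.transitions[function_name]
        except KeyError:
            raise ValueError(f"no transition {function_name!r} from {self.current.key!r}") from None
        self.current = self.nodes[next_key]
        return self.current


engine = FlowEngine([
    FlowNode("greet", "Greet the caller.", {"caller_identified": "book"}, is_initial=True),
    FlowNode("book", "Book the appointment.", {"booking_done": "bye"}, tool_ids=["calendar"]),
    FlowNode("bye", "Say goodbye.", is_terminal=True),
])
```

On each transition, the engine would swap in the new node's instructions and tool list before the next LLM turn.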
Tool
Webhook-based external integration:
- Name, description, parameter schema
- HTTP endpoint, method, headers, timeout
TelephonySettings
Provider credentials and concurrency limits (singleton row).
Design Constraints
- Telephony audio is 8 kHz — the pipeline matches this to avoid transcoding overhead
- WebSocket calls are stateful — production deployments need sticky routing
- Recordings are local files — multi-instance deployments need shared storage
- Each call runs its own Pipecat pipeline — capacity scales with CPU, network, and provider limits
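The first constraint can be illustrated with simple frame-size arithmetic. The 20 ms frame duration and the 16-bit sample width for linear PCM are assumptions chosen for the example, not values taken from this document.

```python
def bytes_per_frame(sample_rate_hz: int, bytes_per_sample: int, frame_ms: int = 20) -> int:
    """Payload size of one audio frame for a mono stream."""
    return sample_rate_hz * bytes_per_sample * frame_ms // 1000


MULAW_8K = bytes_per_frame(8000, 1)     # mu-law: 1 byte per sample -> 160 bytes
PCM16_8K = bytes_per_frame(8000, 2)     # 16-bit PCM at 8 kHz -> 320 bytes
PCM16_16K = bytes_per_frame(16000, 2)   # 16-bit PCM at 16 kHz -> 640 bytes
```

Running the pipeline at the telephony rate means no frame has to be resampled between the provider and the pipeline, which keeps per-call CPU cost lower under the assumptions above.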