# Operations

## Runtime Services
Production deployments require:
- Application container (Python backend)
- PostgreSQL 16+
- Redis 7+
- Persistent storage for recordings
- Public HTTPS URL with WebSocket support
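Since all of these services must be reachable before the app can serve calls, it helps to fail fast on missing settings at startup. A minimal sketch (the variable names here are illustrative assumptions; the authoritative list is in the Configuration reference):

```python
import os

# Illustrative names only; consult the Configuration reference for the
# actual required settings.
REQUIRED_VARS = ("DATABASE_URL", "REDIS_URL", "RECORDINGS_DIR", "PUBLIC_BASE_URL")

def missing_vars(env: dict) -> list:
    """Return the required settings that are absent or empty in `env`."""
    return [name for name in REQUIRED_VARS if not env.get(name)]

# Typical startup check: missing_vars(dict(os.environ)) should return [].
```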
## Deployment

### Docker Compose

```bash
docker compose up --build
```

This starts PostgreSQL, Redis, and the backend. Run migrations before first use:

```bash
docker compose exec app alembic upgrade head
docker compose exec app python scripts/seed.py
```

### Manual Deployment
- Provision PostgreSQL and Redis
- Deploy the application container
- Set all environment variables (see Configuration)
- Run `alembic upgrade head`
- Optionally seed: `python scripts/seed.py`
- Ensure WebSocket traffic is routed correctly (no buffering, sticky sessions)
### Dashboard

Build the dashboard and let the backend serve it:

```bash
cd dashboard && npm install && npm run build
```

The built files in `dashboard/dist/` are mounted at `/` by the FastAPI app.
## Database Migrations

Migrations are managed by Alembic. The current migration chain:
| Version | Description |
|---|---|
| 001 | Initial schema (calls, agents, events) |
| 002 | Agent service fields (STT/LLM/TTS config) |
| 003 | Agent provider settings (JSON) |
| 004 | Pipeline settings (VAD, turn detection) |
| 005 | Tools and flow nodes |
| 006 | Telephony settings (multi-provider) |
| 007 | Telephony concurrency limits |
| 008 | Flow node tool_ids |
| 009 | Agent pre-call tool IDs |
| 010 | Vobiz telephony provider |
| 011 | Agent context variables |
| 012 | Extended event types |
Apply all migrations:

```bash
alembic upgrade head
```

Check the current version:

```bash
alembic current
```

## Logging
The backend uses structlog for structured logging. Key events:
| Event | When |
|---|---|
| `call_started` | Pipeline begins |
| `call_connected` | WebSocket connected |
| `call_disconnected` | WebSocket closed |
| `call_completed` | Call finalized |
| `user_spoke` | User transcription captured |
| `agent_spoke` | Agent response generated |
| `tool_called` | Tool function invoked |
| `recording_saved` | WAV file written |
| `cleanup_completed` | Scheduled cleanup finished |
| `node_entered` | Flow engine entered a new node |
| `node_transition` | Flow engine transitioned between nodes |
Set the log level via the `LOG_LEVEL` environment variable.
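As a rough illustration of what these events look like when rendered, here is a stdlib-only sketch that emits an event as a JSON line (the app itself uses structlog; this is not its actual logging configuration):

```python
import json
import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.INFO, format="%(message)s")
log = logging.getLogger("voice")

def log_event(event: str, **fields) -> str:
    """Render an event as a JSON line, structlog-style, and log it."""
    line = json.dumps({"event": event, **fields})
    log.info(line)
    return line

# e.g. log_event("call_started", call_id="abc123", agent_id=7)
```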
## Background Jobs

APScheduler starts automatically with the application:
| Job | Schedule | Description |
|---|---|---|
| Recording cleanup | Daily at 3 AM | Deletes recordings older than RECORDING_RETENTION_DAYS |
| Stale call cleanup | Every 5 minutes | Marks calls stuck in IN_PROGRESS for >30 min as completed |
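The stale-call rule above can be sketched in plain Python (the real job queries PostgreSQL on an APScheduler interval; the dict rows here are a simplified stand-in for database rows):

```python
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(minutes=30)

def find_stale(calls: list, now: datetime) -> list:
    """Return IN_PROGRESS calls whose start time exceeds the stale threshold."""
    return [
        c for c in calls
        if c["status"] == "IN_PROGRESS" and now - c["started_at"] > STALE_AFTER
    ]

now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
calls = [
    {"id": 1, "status": "IN_PROGRESS", "started_at": now - timedelta(minutes=45)},
    {"id": 2, "status": "IN_PROGRESS", "started_at": now - timedelta(minutes=5)},
    {"id": 3, "status": "COMPLETED", "started_at": now - timedelta(hours=2)},
]
# Only call 1 exceeds the 30-minute threshold.
```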
## Concurrency

### WebSocket Sessions
Each active call holds one WebSocket connection and runs one Pipecat pipeline. Capacity depends on:
- CPU (pipeline processing)
- Network bandwidth (audio streaming)
- Provider API rate limits (STT/LLM/TTS)
- Telephony provider concurrency limits
### Per-Provider Limits

Each telephony provider has a configurable `max_concurrent` setting (default: 10). It is enforced at:
- WebSocket accept (returns 503 if exceeded)
- Outbound call API (returns error if exceeded)
Configure via `PUT /api/settings/telephony`.
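The counting behind `max_concurrent` can be sketched as follows (an illustration of the idea only, not the app's actual implementation; in the app the checks run at WebSocket accept and in the outbound call API):

```python
import threading

class ConcurrencyGuard:
    """Per-provider active-call counter (illustrative sketch)."""

    def __init__(self, max_concurrent: int = 10):
        self.max_concurrent = max_concurrent
        self.active = 0
        self._lock = threading.Lock()

    def try_acquire(self) -> bool:
        """Reserve a slot; False maps to a 503 / API error in the app."""
        with self._lock:
            if self.active >= self.max_concurrent:
                return False
            self.active += 1
            return True

    def release(self) -> None:
        """Free a slot when the call ends; never goes below zero."""
        with self._lock:
            self.active = max(0, self.active - 1)
```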
### Database Connections
The SQLAlchemy async pool defaults are suitable for moderate load. For higher concurrency:
- Use managed PostgreSQL or PgBouncer
- Monitor connection counts
- Tune the pool size in `database.py`
## Scaling

### Sticky Sessions
WebSocket calls are stateful. Production load balancers must pin WebSocket connections to a single instance. Configure session affinity or use a WebSocket-aware proxy.
### Recordings
Recordings are written to local disk by default. Multi-instance deployments need shared storage (NFS, S3-backed FUSE, etc.) if recordings must be accessible across all instances.
### Horizontal Scaling
Each call is independent — there is no cross-call shared state beyond the database. Adding instances scales call capacity linearly, provided:
- Sticky sessions are configured
- Recording storage is shared or per-instance
- Database connection pool is sized appropriately
## Recovery

### Server Restart
On startup, the app marks any IN_PROGRESS calls as FAILED. This prevents stale records from a previous crash.
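The recovery rule is simple to express (a sketch over an in-memory mapping; the app performs the equivalent UPDATE against PostgreSQL at startup):

```python
def recover_on_startup(statuses: dict) -> dict:
    """Mark calls left IN_PROGRESS by a previous run as FAILED.

    `statuses` maps call id -> status; it is a stand-in for the real
    database rows updated when the server boots.
    """
    return {
        call_id: ("FAILED" if status == "IN_PROGRESS" else status)
        for call_id, status in statuses.items()
    }
```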
### Stale Calls
The background job catches calls that somehow stay IN_PROGRESS for >30 minutes and marks them COMPLETED with an error message.
## Troubleshooting

### Calls fail immediately after backend starts
- Verify telephony provider credentials
- Verify AI service API keys (STT, LLM, TTS)
- Check WebSocket reachability from the provider
- Confirm the agent exists in the database
### No recordings appear
- Check the `RECORDINGS_DIR` path and filesystem permissions
- Verify the call reached the disconnect cleanup phase
- Check backend logs for recording errors
### Dashboard loads but shows empty data
- Verify `DASHBOARD_URL` in the CORS settings
- Check database connectivity
- Run `python scripts/seed.py` if no agents exist
### Migrations fail
If the database is in a partially migrated state:
```bash
# Check current state
alembic current

# If stuck, reset and rerun.
# WARNING: downgrading to base drops all tables and their data.
alembic downgrade base
alembic upgrade head
```

For a fresh database, simply run `alembic upgrade head`.