Agent Creation Guide

Agents are the core of the platform. Each agent handles inbound or outbound calls with its own voice, persona, and conversation logic.


Two Modes

| | Freeform | Flow |
|---|---|---|
| Structure | Single system prompt | Multiple nodes, each with its own prompt |
| Tools | Available throughout the call | Per-node |
| Control | LLM decides everything | Structured, step-by-step |
| Best for | Open-ended conversations | Scripted, multi-stage conversations |

Use freeform when the caller can ask anything at any time: support, FAQ, lookup.

Use flows when the conversation has stages: qualify → book → confirm, or greet → verify → offer → close.


Contents

| Doc | What it covers |
|---|---|
| Freeform Agents | Simple agents, multilingual, inline tools |
| Flow Agents | Multi-node flows, branching, escape hatches |
| Tools | Webhook tools, pre-call tools, pre-actions |
| Outbound Calls | Context variables, outbound dialling |
| Objection Handling | Call me later, not interested, transfer to human, DNC |
| JSON Import & AI Creation | Full JSON schema, AI prompt template, common mistakes |

Quick Start

Minimal freeform agent (API):

```bash
curl -X POST https://your-domain.com/api/agents \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "support-agent",
    "prompt": "You are Maya, a friendly support agent for Acme Corp. Help customers with order queries.",
    "greeting": "Hi, this is Maya from Acme. How can I help you today?"
  }'
```
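If you would rather script agent creation than call curl by hand, the same request can be assembled with Python's standard library. This is a minimal sketch: `build_create_agent_request` is a hypothetical helper name, the URL and token are the same placeholders used in the curl example, and the response format is not described in this guide, so the sketch stops at building the request.

```python
import json
import urllib.request


def build_create_agent_request(base_url: str, token: str, agent: dict) -> urllib.request.Request:
    """Build (but do not send) the POST /api/agents request from the curl example."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/agents",
        data=json.dumps(agent).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_create_agent_request(
    "https://your-domain.com",  # placeholder domain, as in the curl example
    "<token>",                  # placeholder token, as in the curl example
    {
        "name": "support-agent",
        "prompt": "You are Maya, a friendly support agent for Acme Corp. Help customers with order queries.",
        "greeting": "Hi, this is Maya from Acme. How can I help you today?",
    },
)

# To send it, pass the object to urllib.request.urlopen(request).
# The response body is not parsed here because its shape is not documented above.
```

Keeping request construction separate from sending makes it easy to inspect the payload before anything reaches the API.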

Minimal freeform agent (import JSON):

```json
{
  "version": "1",
  "agent": {
    "name": "support-agent",
    "prompt": "You are Maya, a friendly support agent for Acme Corp.",
    "greeting": "Hi, this is Maya from Acme. How can I help you today?"
  },
  "tools": [],
  "flow_nodes": []
}
```

Upload the file via Dashboard → Agents → Import, or send it with POST /api/agents/import.
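When import files are generated by a script, a quick shape check before uploading can catch a missing section early. The sketch below builds the minimal document shown above and checks its top-level keys. The helper names are hypothetical, and treating all four keys as required is an assumption drawn from the minimal example; the full schema lives in the JSON Import & AI Creation doc.

```python
import json

# Top-level keys taken from the minimal example; treating all four as
# required is an assumption, not a documented rule.
EXPECTED_KEYS = ("version", "agent", "tools", "flow_nodes")


def build_import_document(name: str, prompt: str, greeting: str) -> dict:
    """Return a minimal freeform import document with no tools and no flow nodes."""
    return {
        "version": "1",
        "agent": {"name": name, "prompt": prompt, "greeting": greeting},
        "tools": [],
        "flow_nodes": [],
    }


def missing_keys(document: dict) -> list:
    """List the expected top-level keys that are absent from the document."""
    return [key for key in EXPECTED_KEYS if key not in document]


document = build_import_document(
    "support-agent",
    "You are Maya, a friendly support agent for Acme Corp.",
    "Hi, this is Maya from Acme. How can I help you today?",
)

# Serialized form, ready to save as a .json file for upload.
payload = json.dumps(document, indent=2)
```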


Pipeline at a Glance

Every call goes through:

Audio In → VAD → STT → LLM → TTS → Audio Out
  • VAD — detects when the caller starts and stops speaking
  • STT — converts speech to text (Deepgram, Sarvam, OpenAI, ElevenLabs)
  • LLM — generates the response (OpenAI, Google, Grok)
  • TTS — converts text to speech (Cartesia, ElevenLabs, Sarvam, Deepgram, OpenAI)

Default stack: deepgram nova-3-general (STT) + gpt-4.1-nano (LLM) + cartesia sonic-3 (TTS).
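The stage order above can be pictured as a chain of functions, each feeding the next. The sketch below is purely conceptual: `CallPipeline` and the `fake_*` stages are hypothetical stand-ins, not platform APIs, and real providers stream audio rather than handling one complete chunk at a time.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CallPipeline:
    vad: Callable[[bytes], Optional[bytes]]  # complete utterance, or None if no speech
    stt: Callable[[bytes], str]              # speech to text
    llm: Callable[[str], str]                # caller text to reply text
    tts: Callable[[str], bytes]              # reply text to speech

    def handle_audio(self, audio_in: bytes) -> Optional[bytes]:
        """Run one caller turn through VAD -> STT -> LLM -> TTS."""
        utterance = self.vad(audio_in)
        if utterance is None:
            return None  # nothing was said, so there is nothing to answer
        text = self.stt(utterance)
        reply = self.llm(text)
        return self.tts(reply)


# Stand-in stages so the sketch runs without any provider credentials.
def fake_vad(audio: bytes) -> Optional[bytes]:
    return audio or None


def fake_stt(utterance: bytes) -> str:
    return utterance.decode("utf-8")


def fake_llm(text: str) -> str:
    return f"You said: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = CallPipeline(fake_vad, fake_stt, fake_llm, fake_tts)
audio_out = pipeline.handle_audio(b"where is my order")
```

Because each stage is an independent callable, swapping one provider for another changes a single field and leaves the rest of the chain untouched.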


Auto-Injected Rules

The platform appends these to every agent's system prompt automatically — you do not need to add them:

  • Max 2 sentences per response (unless the caller asks for detail)
  • No markdown, bullet points, or special characters
  • Spell out numbers in full
  • No filler openers ("Certainly!", "Of course!", etc.)
  • Call end_call when the conversation is complete
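To see why repeating these rules in your own prompt is redundant, it can help to picture the prompt the LLM ends up with. The sketch below is illustrative only: the rule wording is paraphrased from the list above, and the exact text and separator the platform uses are not documented here, so `effective_system_prompt` is a hypothetical stand-in.

```python
# Paraphrased from the documented list; the platform's exact wording may differ.
AUTO_INJECTED_RULES = (
    "Use at most two sentences per response unless the caller asks for detail.",
    "Do not use markdown, bullet points, or special characters.",
    "Spell out numbers in full.",
    'Do not open with filler such as "Certainly!" or "Of course!".',
    "Call end_call when the conversation is complete.",
)


def effective_system_prompt(agent_prompt: str) -> str:
    """Approximate the final prompt: the agent's own prompt followed by the rules."""
    rules = "\n".join(AUTO_INJECTED_RULES)
    return f"{agent_prompt.strip()}\n\n{rules}"


final_prompt = effective_system_prompt(
    "You are Maya, a friendly support agent for Acme Corp."
)
```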

Auto-Injected Call Metadata

The LLM always receives:

Call info: This is an inbound call. Caller phone: +919876543210.

For outbound calls with context variables:

Call context:
- customer_name: Ravi Kumar
- invoice_amount: 4500
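When testing prompts offline, it is handy to reproduce these metadata blocks so you can see what the LLM will receive alongside your prompt. The sketch below mirrors the two documented formats. The function names are hypothetical, and only the inbound wording of the call info line is shown in this guide, so no outbound variant of that line is attempted.

```python
def render_inbound_call_info(caller_phone: str) -> str:
    """Format the call info line as documented for inbound calls."""
    return f"Call info: This is an inbound call. Caller phone: {caller_phone}."


def render_call_context(variables: dict) -> str:
    """Format context variables as a 'Call context:' block, one variable per line."""
    lines = ["Call context:"]
    lines += [f"- {key}: {value}" for key, value in variables.items()]
    return "\n".join(lines)


call_info = render_inbound_call_info("+919876543210")
call_context = render_call_context(
    {"customer_name": "Ravi Kumar", "invoice_amount": 4500}
)
```

Since context variables reach the LLM as plain `key: value` lines, descriptive variable names such as `customer_name` make it easier for the prompt to refer to them.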