Welcome to the CallmAI Agentic AI Platform

CallmAI Agents

Realtime framework for production-grade multimodal and voice AI agents.

Introduction

The Agents framework lets you add agents and client participants to, or remove them from, a full realtime conversation on the platform. The API includes a complete set of tools and abstractions that make it easy to feed realtime media and data through an AI pipeline that works with multiple languages, and to publish realtime results back to the client.

Use cases

Some applications for agents include:

  • Multimodal assistant: Talk to, or text an AI assistant.
  • Telehealth: Bring AI into realtime telemedicine consultations, with or without humans in the loop.
  • Call center: Deploy AI to the front lines of customer service with inbound and outbound call support.
  • Realtime language learning: Practice conversations in different languages in realtime.
  • NPCs: Add lifelike non-player characters (NPCs) to video games, backed by language models instead of static scripts. NPCs typically interact with the player in some way, often providing quests, dialogue, or other information.

The following examples demonstrate some of these use cases:

  • Medical Office Receptionist: An agent that holds engaging conversations with patients based on their symptoms and medical history.
  • Restaurant Agent: A restaurant front-of-house agent that can take orders, add items to a shared cart, and check out.
  • Concierge Agent: An AI voice agent that acts like a smart, always-on assistant, answering inquiries, booking showings, and guiding potential clients at any time, day or night.
  • Travel Agency Agent: A travel agent that quickly answers questions about bookings, cancellations, documents, and changes. It is equipped to respond instantly and around the clock.

Framework overview

flowchart TD
    classDef adminActions fill:#4a5568,stroke:#2d3748,color:#e2e8f0
    classDef systemActions fill:#553c9a,stroke:#44337a,color:#e9d8fd
    classDef customerActions fill:#2c5282,stroke:#2a4365,color:#bee3f8

    %% Admin Setup Flow
    A[Company Administrator] -->|1. Sign Up| B[Provide Company Info]
    B -->|2. Upload| C[Knowledge Base]
    C -->|3. Select| D[Configure Agent]
    D -->|Set Phone Number| E[Assign Phone Number]
    D -->|Set Language| F[Configure Initial Language]
    E --> G[4. Start Agent]
    F --> G

    %% Distribution Flow
    G -->|5a. Share Link| H[Add to Website UI]
    G -->|5b. Share Phone| I[Publish Phone Number]

    %% Customer Interaction Flow
    J[Customer] -->|Web Access| H
    J -->|Phone Call| I
    H -->|6a. Web Interaction| K[AI Agent]
    I -->|6b. Voice Interaction| K

    %% Styling
    class A,B,C,D,E,F,G,H,I adminActions
    class J,K customerActions

    %% Subgraphs for visual grouping
    subgraph "Admin Setup Phase"
        A
        B
        C
        D
        E
        F
        G
    end

    subgraph "Customer Access Points"
        H
        I
    end

    subgraph "Interaction Phase"
        J
        K
    end

Our Agent operates as a stateful, realtime bridge between powerful AI models and your users. While AI models typically run in data centers with reliable connectivity, users often connect from mobile networks with varying quality.

WebRTC ensures smooth communication between agents and users, even over unstable connections. CallmAI WebRTC is used between the frontend and the agent, while the agent communicates with our servers using HTTP requests and WebSockets. This setup provides the benefits of WebRTC without its typical complexity.

The platform includes components for handling the core challenges of realtime voice AI, such as streaming audio through a voice pipeline, reliable turn detection, handling interruptions, and LLM orchestration.
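To make the idea of turn detection concrete, here is a minimal sketch of one common approach: treating a run of quiet audio frames as the end of a user's turn. This is an illustration only, not CallmAI's implementation. The `TurnDetector` class, the `energy_threshold` and `silence_frames_to_end` parameters, and the event names are all hypothetical, and production systems typically combine signals like this with model-based detection.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class TurnDetector:
    """Toy end-of-turn detector based on trailing silence (illustrative only)."""

    energy_threshold: float = 0.02  # hypothetical: energy at or above this counts as speech
    silence_frames_to_end: int = 3  # hypothetical: consecutive quiet frames that end a turn
    _speaking: bool = False
    _quiet_run: int = 0

    def feed(self, frame_energy: float) -> Optional[str]:
        """Consume one frame's energy and return an event name, or None."""
        if frame_energy >= self.energy_threshold:
            # Speech resets the silence counter, so short pauses do not end the turn.
            self._quiet_run = 0
            if not self._speaking:
                self._speaking = True
                return "turn_started"
            return None
        if self._speaking:
            self._quiet_run += 1
            if self._quiet_run >= self.silence_frames_to_end:
                self._speaking = False
                self._quiet_run = 0
                return "turn_ended"
        return None


def detect_turns(energies: List[float]) -> List[Tuple[int, str]]:
    """Run a fresh detector over per-frame energies and collect (frame index, event)."""
    detector = TurnDetector()
    events = []
    for index, energy in enumerate(energies):
        event = detector.feed(energy)
        if event is not None:
            events.append((index, event))
    return events
```

A short pause inside a sentence (fewer quiet frames than `silence_frames_to_end`) does not end the turn, which is the behavior an agent needs in order to avoid talking over a user who is still thinking.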

Other framework features include:

  • Voice, video, and text: We build agents that can process realtime input and produce output in any modality.
  • Multi-agent handoff: Break down complex workflows into simpler tasks.
  • Extensive integrations: Integrate with nearly every service using APIs.
  • Made for developers: Control your agents through simple API calls, without worrying about configuration or architecture.
  • Production ready: Includes built-in agent orchestration, load balancing, and detailed statistics.
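The multi-agent handoff feature listed above can be pictured with a small, self-contained sketch. It assumes each agent is a plain function that returns a reply plus an optional name of the agent to hand off to. The `greeter` and `booking` agents, the `HandoffSession` class, and the keyword-based routing are hypothetical stand-ins chosen for illustration; they are not part of the CallmAI API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# A handler takes the user's message and returns (reply, next agent name or None).
Handler = Callable[[str], Tuple[str, Optional[str]]]


def greeter(message: str) -> Tuple[str, Optional[str]]:
    """Hypothetical front-door agent: routes booking requests to a specialist."""
    if "book" in message.lower():
        return "Let me connect you to our booking agent.", "booking"
    return "How can I help you today?", None


def booking(message: str) -> Tuple[str, Optional[str]]:
    """Hypothetical specialist agent: handles one narrow task, then hands back."""
    if "done" in message.lower():
        return "Your booking is complete.", "greeter"
    return "Which date would you like?", None


@dataclass
class HandoffSession:
    """Tracks which agent is active and switches when a handler asks for a handoff."""

    agents: Dict[str, Handler]
    active: str
    transcript: List[Tuple[str, str]] = field(default_factory=list)

    def send(self, message: str) -> str:
        reply, next_agent = self.agents[self.active](message)
        self.transcript.append((self.active, reply))
        if next_agent is not None:
            if next_agent not in self.agents:
                raise KeyError(f"unknown agent: {next_agent}")
            self.active = next_agent  # the next message goes to the new agent
        return reply
```

Keeping each agent focused on one task means the routing logic stays in one place, and each specialist can be tested on its own.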

How our agents connect to your clients

sequenceDiagram
    participant Agent as Agent
    participant Server as CallmAI Server
    participant Session as CallmAI Session
    participant User as User
    participant Frontend as Frontend App
    participant Phone as Phone

    Note over Agent,Server: Agent Registration Phase
    Agent->>Server: Register as available agent
    Server-->>Agent: Acknowledge registration

    Note over User,Server: Session Initiation Phase
    User->>Server: Request session
    Server->>Session: Create new session
    Server->>Agent: Dispatch to session
    Agent->>Session: Join session

    Note over User,Session: User Connection Phase
    alt Web/App Connection
        User->>Frontend: Open application
        Frontend->>Session: Connect via WebRTC
    else Phone Connection
        User->>Phone: Dial in
        Phone->>Session: Connect via telephony
    end

    Note over Agent,User: Realtime Communication
    Session->>Agent: User joined notification
    Agent->>User: Initial greeting

    loop Active Session
        User->>Agent: Send audio/video/text
        Agent->>User: Process and respond
    end

When your agent starts, it first registers with a CallmAI server. The agent then waits until a real user starts and joins a session.

After your agent and user are both in a session, the agent and your frontend app can communicate using CallmAI WebRTC. This enables reliable and fast realtime communication, even over unstable networks. CallmAI also includes full support for telephony, so the user can join the call from a phone instead of a frontend app.
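The lifecycle described here (register, wait, dispatch, join, greet) can be simulated with a small in-memory model. The sketch below is only a way to reason about the ordering of events. The `Server` and `Session` classes, their method names, and the `"webrtc"` and `"telephony"` transport labels are assumptions made for illustration and do not reflect the actual CallmAI API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Session:
    session_id: str
    participants: List[str] = field(default_factory=list)
    events: List[str] = field(default_factory=list)

    def join(self, name: str, transport: str) -> None:
        self.participants.append(name)
        self.events.append(f"{name} joined via {transport}")


@dataclass
class Server:
    """In-memory stand-in for the server (hypothetical, not the CallmAI API)."""

    idle_agents: List[str] = field(default_factory=list)
    sessions: Dict[str, Session] = field(default_factory=dict)

    def register_agent(self, agent: str) -> None:
        # Registration phase: the agent announces itself and then waits.
        self.idle_agents.append(agent)

    def request_session(self, user: str, transport: str) -> Session:
        if not self.idle_agents:
            raise RuntimeError("no registered agent is available")
        if transport not in ("webrtc", "telephony"):
            raise ValueError(f"unsupported transport: {transport}")
        # Session initiation: create the session and dispatch a waiting agent.
        session = Session(session_id=f"session-{len(self.sessions) + 1}")
        self.sessions[session.session_id] = session
        agent = self.idle_agents.pop(0)  # the longest-waiting agent goes first
        session.join(agent, "dispatch")
        # User connection: a frontend app (WebRTC) or a phone (telephony).
        session.join(user, transport)
        # Once both sides are present, the agent opens the conversation.
        session.events.append(f"{agent} greets {user}")
        return session
```

For example, registering one agent and then calling `request_session("alice", "webrtc")` produces a session whose events list the agent joining first, the user joining second, and the greeting last, which mirrors the order in the sequence diagram.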