Session 5 — Build a voice-to-voice AI News Anchor using Pipecat. All services run locally. Sub-2-second latency. Zero API fees.
Every session so far used keyboards and screens. Session 5 is different — you speak and the AI speaks back.
An open-source Python framework for building voice and multimodal AI agents. Think of it as Express.js but for voice pipelines.
```python
# Pipecat — Voice AI in ~20 lines
from pipecat.pipeline import Pipeline
from pipecat.services import (
    WhisperSTT,   # Speech → Text
    OllamaLLM,    # Local LLM
    KokoroTTS,    # Text → Speech
)

# Build the pipeline
pipeline = Pipeline([
    WhisperSTT(model="base"),
    OllamaLLM(model="llama3.2"),
    KokoroTTS(voice="af_heart"),
])

# Run — that's it!
pipeline.run()
```
Your voice flows through three services — each running locally on your AI PC. No cloud round-trips.
Each service runs as a separate process. Pipecat orchestrates data flow between them.
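The orchestration idea is simple: each stage consumes the previous stage's output. Here is a toy sketch of that pattern (the class and function names are illustrative stand-ins, not Pipecat's actual API):

```python
from typing import Callable, List

class ToyPipeline:
    """Toy pipeline: each stage's output feeds the next stage."""
    def __init__(self, stages: List[Callable[[str], str]]):
        self.stages = stages

    def run(self, frame: str) -> str:
        for stage in self.stages:
            frame = stage(frame)
        return frame

# Illustrative stand-ins for the three local services
stt = lambda audio: f"transcript({audio})"   # speech → text
llm = lambda text: f"reply({text})"          # text → text
tts = lambda text: f"audio({text})"          # text → speech

pipeline = ToyPipeline([stt, llm, tts])
print(pipeline.run("mic-input"))  # audio(reply(transcript(mic-input)))
```

In the real framework each service also streams partial results downstream instead of waiting for a complete frame, which is what makes real-time conversation possible.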
You speak → AI thinks → AI speaks back. The entire round trip happens locally in under 2 seconds. Cloud round-trips typically multiply that latency by 3-5x.
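A rough latency budget makes the sub-2-second claim concrete. The per-stage numbers below are assumptions for illustration, not measurements; the key point is that streaming lets stages overlap, so what the user perceives is time-to-first-audio, not the sum of full-stage processing times:

```python
# Assumed (not measured) per-stage budget in milliseconds.
budget_ms = {
    "STT (streaming partial transcripts)": 300,
    "LLM time-to-first-token": 500,
    "TTS time-to-first-audio": 200,
    "audio transport and buffering": 100,
}
time_to_first_reply = sum(budget_ms.values())
print(f"≈ {time_to_first_reply} ms to first spoken word")  # ≈ 1100 ms
```

Even with generous headroom on every stage, a local pipeline stays well inside the 2-second budget; adding a 200-500 ms network round trip per stage is what pushes cloud setups past it.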
Every major cloud voice AI service charges per minute of audio or per character of text. Running locally? Unlimited. Forever. Free.
```python
# Cloud costs for 1 hour of voice AI:
#   Google Cloud Speech:  ₹2.00/min
#   OpenAI Whisper API:   ₹0.50/min
#   Cloud TTS:            ₹330-1300 per 1M chars
#   GPT-4 API:            ₹2.50 per 1K tokens
#
# 1 hour of voice conversation ≈
#   Cloud total: ~₹160-400 per hour
#   Local total: ₹0. Always. 🎉
```
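The arithmetic behind an hourly estimate looks like this. The per-unit rates are the ones quoted above; the usage volumes (characters spoken, tokens processed) are assumptions for illustration, so the totals are rough and heavier usage pushes them toward the upper bound:

```python
# Per-unit rates from the slide
stt_rate_low, stt_rate_high = 0.50, 2.00   # ₹/min (Whisper API, Google Cloud Speech)
tts_rate_low, tts_rate_high = 330, 1300    # ₹ per 1M characters
llm_rate = 2.50                            # ₹ per 1K tokens

# Assumed volumes for one hour of conversation (illustrative)
minutes = 60
tts_chars = 60_000    # ~1,000 spoken characters per minute
llm_tokens = 30_000   # tokens processed across the hour

low = stt_rate_low * minutes + tts_rate_low * tts_chars / 1e6 + llm_rate * llm_tokens / 1000
high = stt_rate_high * minutes + tts_rate_high * tts_chars / 1e6 + llm_rate * llm_tokens / 1000
print(f"cloud: ₹{low:.0f}-₹{high:.0f}/hour   local: ₹0")
```

The exact figure depends on how chatty the session is, but every extra minute of cloud conversation costs real money, while the local pipeline's marginal cost stays at zero.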
Speak naturally. The AI fetches real-time news and talks back. Try these conversations:
The same Pipecat pipeline pattern works for any voice agent — just swap the LLM prompt and data source.
Real-time news briefings via voice
Voice-based patient triage and reminders
Interactive voice tutors for students
Private voice assistant — no Alexa
AI phone agents for businesses
Voice UI for visually impaired users
In-car voice assistants
AI-powered interactive characters
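Every use case above keeps the same STT → LLM → TTS shape; only two things change: the system prompt and where the context comes from. A hedged sketch of that configuration pattern (all names here are illustrative, not Pipecat's API):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class VoiceAgentConfig:
    system_prompt: str
    fetch_context: Callable[[], str]  # news feed, patient DB, lesson plan, etc.

def build_llm_input(cfg: VoiceAgentConfig, user_speech: str) -> str:
    """Compose the text the LLM stage would receive for one turn."""
    return f"{cfg.system_prompt}\nContext: {cfg.fetch_context()}\nUser: {user_speech}"

news_anchor = VoiceAgentConfig(
    system_prompt="You are a concise news anchor.",
    fetch_context=lambda: "headlines: (fetched from a news API)",
)
voice_tutor = VoiceAgentConfig(
    system_prompt="You are a patient maths tutor.",
    fetch_context=lambda: "lesson: fractions",
)

print(build_llm_input(news_anchor, "What's in the news?"))
```

Swapping `news_anchor` for `voice_tutor` turns the news anchor into a tutor without touching the STT, LLM, or TTS stages.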
A complete voice AI agent — built with open-source tools, running locally, with sub-2s latency.
Pipeline architecture for voice AI agents
Three-service pipeline, all running locally
Streaming + local = real-time conversation
Your voice never leaves your device