India AI Impact Summit 2026

Speak.
Listen.
No Cloud.

Session 5 — Build a voice-to-voice AI News Anchor using Pipecat. All services run locally. Sub-2-second latency. Zero API fees.

<2s
Latency
3
Local services
₹0
API cost
AI News Anchor
The Shift

Text is powerful.
Voice is natural.

Every session so far used keyboards and screens. Session 5 is different — you speak and the AI speaks back.

Sessions 1-4

Type → Read

  • Keyboard input
  • Screen output
  • Hands required
Session 5

Speak → Listen

  • Voice input
  • Audio output
  • Hands-free
🎤 You say
"What are the top headlines today?"
🔊 AI responds
"Here are today's top stories. First, India's GDP grew 8.2%..."
Framework

What is
Pipecat?

An open-source Python framework for building voice and multimodal AI agents. Think of it as Express.js but for voice pipelines.

  • Pipeline architecture — Chain services like LEGO blocks: STT → LLM → TTS.
  • Modular & swappable — Replace any service without touching the rest.
  • Real-time streaming — Audio frames flow through the pipeline with minimal latency.
# Pipecat — Voice AI in ~20 lines
# (simplified sketch: exact class names and setup vary by Pipecat
#  version; real pipelines run inside an async PipelineRunner)

from pipecat.pipeline import Pipeline
from pipecat.services import (
    WhisperSTT,      # Speech → Text
    OllamaLLM,       # Local LLM
    KokoroTTS,       # Text → Speech
)

# Build the pipeline: audio frames flow left to right
pipeline = Pipeline([
    WhisperSTT(model="base"),
    OllamaLLM(model="llama3.2"),
    KokoroTTS(voice="af_heart"),
])

# Run — that's it!
pipeline.run()
Architecture

The Voice AI Pipeline

Your voice flows through three services — each running locally on your AI PC. No cloud round-trips.

Your Voice
Microphone input
Whisper STT
Speech → Text
Local LLM
Think + Reason
Kokoro TTS
Text → Speech
Speaker
AI responds
🎧 Whisper — Open-source STT by OpenAI
🧠 Ollama — Run any LLM locally
🔊 Kokoro — Fast local TTS engine
Open Source Stack

Three Services. All Local. All Free.

Each service runs as a separate process. Pipecat orchestrates data flow between them.

🎧
Speech-to-Text
Whisper
  • ✓ By OpenAI (open-sourced)
  • ✓ Multi-language support
  • ✓ GPU-accelerated on Intel
  • ✓ ~200ms transcription
🧠
Language Model
Ollama (LLM)
  • ✓ Llama 3.2 / Mistral / Phi
  • ✓ Understands intent & context
  • ✓ GPU/NPU accelerated
  • ✓ ~800ms first token
🔊
Text-to-Speech
Kokoro TTS
  • ✓ Natural-sounding voices
  • ✓ Multiple voice styles
  • ✓ Streaming output
  • ✓ ~400ms to first audio
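To make "modular & swappable" concrete, here is a toy sketch: plain Python functions stand in for the three services. This is not the Pipecat API itself — real Pipecat streams audio frames between service objects — but it shows why any stage can be replaced without touching the rest.

```python
def stt(audio: bytes) -> str:
    # Stand-in for Whisper: audio frames in, transcript out
    return "what are the top headlines today"

def llm(text: str) -> str:
    # Stand-in for Ollama: transcript in, reply text out
    return f"Here are today's top stories. You asked: {text}."

def tts(text: str) -> bytes:
    # Stand-in for Kokoro: reply text in, audio bytes out
    return text.encode("utf-8")

def run_pipeline(audio: bytes, stages=(stt, llm, tts)):
    # Each stage's output feeds the next stage's input;
    # swap any entry in `stages` without touching the others.
    data = audio
    for stage in stages:
        data = stage(data)
    return data

reply_audio = run_pipeline(b"<mic frames>")
```

Swapping Kokoro for another TTS engine, in this picture, is just passing a different third function.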
Performance

Under 2 Seconds.
End to End.

You speak → AI thinks → AI speaks back. The entire round-trip happens in under 2 seconds locally. A comparable cloud pipeline typically takes 3-6 seconds, two to four times slower.

🚀 Key insight: Pipecat uses streaming — TTS starts speaking while the LLM is still generating tokens. This overlaps latencies.
💡 Why so fast? No network round-trip to cloud servers. All three services are talking to each other on localhost (127.0.0.1).
Latency Breakdown (Local)
🎧 STT    ~200ms
🧠 LLM    ~800ms
🔊 TTS    ~400ms
⚡ Total  ~1.4s
vs Cloud (for comparison)
☁️ Cloud  3-6s
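The local total is simply the three stage figures added up. A quick sanity check (stage numbers from the breakdown above; the per-hop cloud round-trip figure is indicative, not measured):

```python
# Stage latencies from the breakdown above (milliseconds)
stt_ms, llm_first_token_ms, tts_first_audio_ms = 200, 800, 400

# Time until the user hears audio, with stages run back-to-back
total_ms = stt_ms + llm_first_token_ms + tts_first_audio_ms
print(f"Local total: ~{total_ms / 1000:.1f}s")  # prints: Local total: ~1.4s

# A cloud pipeline pays a network round-trip per stage; at roughly
# 300-500ms per hop (illustrative), three hops alone add ~0.9-1.5s
# before any compute or queuing time on the server side.
```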
Cost Comparison

Cloud charges
per minute.

Every major cloud voice AI service charges per minute of audio or per character of text. Running locally? Unlimited. Forever. Free.

# Cloud voice AI pricing (indicative):

Google Cloud Speech:  ₹2.00/min
OpenAI Whisper API:   ₹0.50/min
Cloud TTS:            ₹330-1300 / 1M chars
GPT-4 API:            ₹2.50 / 1K tokens

# 1 hour of voice conversation ≈
Cloud total:  ~₹160-400 per hour
Local total:  ₹0. Always. 🎉
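The hourly estimate can be reproduced with rough arithmetic. The per-unit prices come from the list above; the usage figures (words per minute, characters per word, tokens per hour) are illustrative assumptions:

```python
minutes = 60
stt = 2.00 * minutes                  # Google Cloud Speech: Rs 2.00/min
words = 150 * minutes                 # ~150 spoken words/min (assumption)
chars = words * 6                     # ~6 characters per word (assumption)
tts_low = chars / 1_000_000 * 330     # cheapest cloud TTS tier
tts_high = chars / 1_000_000 * 1300   # premium cloud TTS tier
llm = 20_000 / 1_000 * 2.50           # ~20K tokens/hour of chat (assumption)

low, high = stt + tts_low + llm, stt + tts_high + llm
print(f"~Rs {low:.0f}-{high:.0f} per hour")  # ~Rs 188-240 with these assumptions
```

Squarely inside the ₹160-400 band quoted above, and that is before rate limits or premium LLM pricing.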
❌ Cloud Voice AI

Pay-per-use pricing

  • Each API call costs money
  • Voice data sent to servers
  • Network latency added
  • Rate limits apply
✅ Local Pipecat

₹0 unlimited usage

  • No API costs ever
  • Voice stays on device
  • Sub-2s local latency
  • No rate limits
Live Demo

Talk to Your
News Anchor

Speak naturally. The AI fetches real-time news and talks back. Try these conversations:

1
"Hello, are you there?"
2
"What are the top headlines today?"
3
"Tell me about technology news."
4
"How are the markets doing?"
5
"Tell me more about that first story."
Voice AI Demo
🎤 Pro tip: Speak naturally — no need to talk slowly or loudly. The AI handles natural conversation pace.
Beyond News

Voice AI Use Cases

The same Pipecat pipeline pattern works for any voice agent — just swap the LLM prompt and data source.
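A minimal sketch of that idea, with illustrative placeholder prompts (the prompt texts and helper below are not part of Pipecat itself):

```python
# Same pipeline, different agent: only the system prompt changes.
AGENT_PROMPTS = {
    "news_anchor": "You are a concise news anchor. Read headlines in short spoken sentences.",
    "tutor": "You are a patient voice tutor. Explain step by step and ask follow-up questions.",
    "smart_home": "You are a private home assistant. Confirm each action before running it.",
}

def system_prompt(agent: str) -> str:
    # Fall back to the news anchor when the agent name is unknown
    return AGENT_PROMPTS.get(agent, AGENT_PROMPTS["news_anchor"])
```

Pass the chosen prompt to the LLM stage and keep STT and TTS exactly as they are.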

📰

News Anchor

Real-time news briefings via voice

🏥

Healthcare

Voice-based patient triage and reminders

🎓

Education

Interactive voice tutors for students

🏠

Smart Home

Private voice assistant — no Alexa

🛎️

Customer Support

AI phone agents for businesses

♿

Accessibility

Voice UI for visually impaired users

🚗

Automotive

In-car voice assistants

🧸

Toys & Companions

AI-powered interactive characters

Session Complete

What You Learned

A complete voice AI agent — built with open-source tools, running locally, with sub-2s latency.

🔌

Pipecat Framework

Pipeline architecture for voice AI agents

🎧

STT → LLM → TTS

Three-service pipeline, all running locally

⚡

<2s Latency

Streaming + local = real-time conversation

🔒

100% Private

Your voice never leaves your device