Tags: AI · LLM · Multi-Agent Systems · LangGraph · FastAPI · Deployment · Healthcare AI · Text-to-Speech · Speech-to-Text
This publication documents the design, development, and deployment of Obiora, a multi-agent medical AI assistant built using large language models and deployed as a production-ready API.
Unlike traditional single-prompt chatbot systems, Obiora introduces a stateful, multi-agent architecture that separates concerns between a conversational assistant and a specialized medical persona (Dr. Obiora). The system handles real user flows including onboarding, payment gating, session management, and post-consultation summaries.
The application is deployed using FastAPI + Docker on Render, with persistent user state, tool-based reasoning, and voice interaction capabilities via speech-to-text and text-to-speech pipelines.
This work demonstrates architecture design, state management, deployment strategy, and production considerations such as cost, scalability, and security.
Healthcare access—especially for quick consultations—is often limited by:
Obiora addresses this by providing a guided AI-assisted consultation experience that:
This is not just a chatbot — it is a controlled interaction system designed to simulate a structured consultation workflow.
Input:
User: My name is Tunde
User: I want to speak with Dr. Obiora
User: 1234567890
User: yes I am ready
User: I have chest pain
Output:
Dr. Obiora → provides structured medical response
System → stores session summary for future use
Functional
Performance
Product
| Stage | Users/day | Notes |
|---|---|---|
| MVP | 50–200 | Testing + demos |
| Growth | 1,000+ | API integrations |
| Scale | 10,000+ | Health platforms |
Obiora is built using a multi-agent architecture powered by LangGraph, enabling:
Agents:
Assistant Agent
Dr. Obiora Agent
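The hand-off between the two agents can be thought of as a routing function over session state. A minimal sketch (in LangGraph this would be a conditional edge; the exact gating order shown here is an assumption):

```python
def route_turn(state: dict) -> str:
    """Decide which agent handles the next turn.

    Sketch only: the real system routes via a LangGraph conditional
    edge, and the precise gating order is an assumption.
    """
    if not state.get("username"):
        return "assistant"      # still onboarding
    if not state.get("account_number"):
        return "assistant"      # payment gate not yet cleared
    if not state.get("ready_set"):
        return "assistant"      # user has not confirmed readiness
    return "dr_obiora"          # consultation proceeds with Dr. Obiora
```

This keeps the enforcement logic in plain code rather than in the prompt, which is what a single-prompt design cannot guarantee.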
A single prompt cannot reliably enforce:
Multi-agent architecture enables:
The system maintains structured state:
```python
AgentState = {
    "messages": [],        # running conversation history
    "username": "",        # captured during onboarding
    "new_user": True,      # distinguishes first-time from returning users
    "account_number": "",  # used for payment gating
    "ready_set": False,    # user confirmed readiness for the consultation
    "dr_summary": "",      # post-consultation summary
    "date": "",            # session date
}
```
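In LangGraph, each node reads this state and returns only the fields it changed. A dependency-free sketch of what an onboarding node might look like (the parsing rule is a simplifying assumption):

```python
def onboarding_node(state: dict) -> dict:
    """Hypothetical onboarding node: capture the username from the
    latest user message and return only the fields that changed."""
    last = state["messages"][-1]
    prefix = "my name is"
    if last.lower().startswith(prefix):
        # Partial update, LangGraph-style: merged into AgentState by the graph.
        return {"username": last[len(prefix):].strip(), "new_user": False}
    return {}
```

On the transcript above, `onboarding_node({"messages": ["My name is Tunde"]})` yields `{"username": "Tunde", "new_user": False}`.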
This enables:
Model Used
Provider: Groq
Model: free-tier tool-calling models
Why This Model?
This is currently a baseline, model-driven system.
The architecture is model-agnostic — meaning better models can be plugged in without changing the system design.
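Model-agnosticism follows from depending only on a narrow callable interface rather than on a specific provider SDK. A sketch (the `fake_llm` stand-in is hypothetical):

```python
from typing import Callable, List

# The graph depends only on this signature, so providers are interchangeable.
LLM = Callable[[List[dict]], str]

def fake_llm(messages: List[dict]) -> str:
    """Stand-in provider for local testing (hypothetical)."""
    return "ok: " + messages[-1]["content"]

def run_turn(llm: LLM, history: List[dict], user_text: str) -> str:
    """Append the user turn, call whichever provider was injected,
    and record the reply in the shared history."""
    history.append({"role": "user", "content": user_text})
    reply = llm(history)
    history.append({"role": "assistant", "content": reply})
    return reply
```

Swapping Groq for a stronger model then means swapping the injected callable, not rewriting the graph.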
Platform
Why Render?
Architecture Flow
```
Client → FastAPI → LangGraph → Groq LLM → Response
                       ↓
                 SQLite Store
```
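The persistence layer can be a single SQLite table keyed by account number. A sketch of the store (the schema shown is an assumption, not the project's actual schema):

```python
import sqlite3

def init_store(path: str = ":memory:") -> sqlite3.Connection:
    """Create the session-summary table if it does not exist."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS sessions (
               account_number TEXT,
               date TEXT,
               dr_summary TEXT
           )"""
    )
    return conn

def save_summary(conn, account_number: str, date: str, summary: str) -> None:
    """Persist one post-consultation summary."""
    conn.execute("INSERT INTO sessions VALUES (?, ?, ?)",
                 (account_number, date, summary))
    conn.commit()

def load_summaries(conn, account_number: str) -> list:
    """Fetch all prior summaries for a returning user."""
    return conn.execute(
        "SELECT date, dr_summary FROM sessions WHERE account_number = ?",
        (account_number,),
    ).fetchall()
```

Loading prior summaries on `/login` is what lets Dr. Obiora reference earlier consultations.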
Endpoint Design
- `/login` → user session initialization
- `/chat` → text interaction
- `/chat/voice` → voice interaction
- `/health` → system health check
Current Setup (API-based)
| Component | Cost |
|---|---|
| LLM (Groq) | Usage-based |
| Hosting (Render) | Free / low-tier |
| Storage | Minimal |
Estimated Cost Drivers
Optimization Strategies
Cost Insight
This architecture allows:
Linear scaling with usage, not infrastructure complexity
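That linear relationship can be made concrete with a back-of-envelope estimator (every figure below is illustrative, not a measured rate):

```python
def estimate_monthly_cost(users_per_day: int,
                          turns_per_session: int = 10,
                          tokens_per_turn: int = 800,
                          price_per_m_tokens: float = 0.10) -> float:
    """Rough monthly LLM spend in dollars.

    All defaults are illustrative assumptions, not measured values:
    cost scales linearly in users, turns, and tokens.
    """
    tokens_per_day = users_per_day * turns_per_session * tokens_per_turn
    return tokens_per_day * 30 / 1_000_000 * price_per_m_tokens
```

Under these assumed rates, the MVP stage's 200 users/day works out to roughly $4.80/month in LLM spend, which is why hosting, not inference, dominates early costs.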
Key Metrics
| Metric | Why it matters |
|---|---|
| Latency | User experience |
| Error rate | Stability |
| Token usage | Cost control |
| Session completion | Product success |
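Latency and error rate can be captured without extra infrastructure, for example with a small decorator around each handler (names hypothetical):

```python
import time
from functools import wraps

# Simple in-process counters; a real deployment would export these.
METRICS = {"calls": 0, "errors": 0, "total_latency_s": 0.0}

def tracked(fn):
    """Record call count, error count, and cumulative latency."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        METRICS["calls"] += 1
        try:
            return fn(*args, **kwargs)
        except Exception:
            METRICS["errors"] += 1
            raise
        finally:
            METRICS["total_latency_s"] += time.perf_counter() - start
    return wrapper

@tracked
def chat_turn(message: str) -> str:
    """Stand-in for the real chat handler."""
    return f"echo: {message}"
```

Average latency is then `METRICS["total_latency_s"] / METRICS["calls"]`, and error rate follows the same pattern.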
Planned Tools
Future improvements:
This project goes beyond a simple LLM demo by implementing:
Short-term
Medium-term
Long-term
Obiora demonstrates how to move from:
❌ Prompt-based chatbot
➡️
✅ Production-ready AI system
By combining:
🔗 Links
GitHub: https://github.com/Blaqadonis/obiora