Tags: AI · LLM · Multi-Agent Systems · LangGraph · FastAPI · Deployment · Healthcare AI · Text-to-Speech · Speech-to-Text
This publication documents the design, development, and deployment of Obiora, a multi-agent medical AI assistant built using large language models and deployed as a production-ready API.
Unlike traditional single-prompt chatbot systems, Obiora introduces a stateful, multi-agent architecture that separates concerns between a conversational assistant and a specialized medical persona (Dr. Obiora). The system handles real user flows including onboarding, payment gating, session management, and post-consultation summaries.
The application is deployed using FastAPI + Docker on Render, with persistent user state, tool-based reasoning, and voice interaction capabilities via speech-to-text and text-to-speech pipelines.
This work demonstrates architecture design, state management, deployment strategy, and production considerations such as cost, scalability, and security.
Healthcare access—especially for quick consultations—is often limited by:
Obiora addresses this by providing a guided AI-assisted consultation experience that:
This is not just a chatbot — it is a controlled interaction system designed to simulate a structured consultation workflow.
Input:
User: My name is Tunde
User: I want to speak with Dr. Obiora
User: 1234567890
User: yes I am ready
User: I have chest pain
Output:
Dr. Obiora → provides structured medical response
System → stores session summary for future use
Functional
Performance
Product
| Stage | Users/day | Notes |
|---|---|---|
| MVP | 50–200 | Testing + demos |
| Growth | 1,000+ | API integrations |
| Scale | 10,000+ | Health platforms |
Obiora is built using a multi-agent architecture powered by LangGraph, enabling:
Agents:
Assistant Agent
Dr. Obiora Agent
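The hand-off between the two agents can be thought of as a routing function over session state. A minimal sketch (in LangGraph this would be a conditional edge; the exact gating order shown here is an assumption):

```python
def route_turn(state: dict) -> str:
    """Decide which agent handles the next turn.

    Sketch only: the real system routes via a LangGraph conditional
    edge, and the precise gating order is an assumption.
    """
    if not state.get("username"):
        return "assistant"      # still onboarding
    if not state.get("account_number"):
        return "assistant"      # payment gate not yet cleared
    if not state.get("ready_set"):
        return "assistant"      # user has not confirmed readiness
    return "dr_obiora"          # consultation proceeds with Dr. Obiora
```

This keeps the enforcement logic in plain code rather than in the prompt, which is what a single-prompt design cannot guarantee.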
A single prompt cannot reliably enforce:
Multi-agent architecture enables:
The system maintains structured state:
```python
AgentState = {
    "messages": [],        # running conversation history
    "username": "",        # captured during onboarding
    "new_user": True,      # distinguishes first-time from returning users
    "account_number": "",  # used for payment gating
    "ready_set": False,    # user confirmed readiness for the consultation
    "dr_summary": "",      # post-consultation summary
    "date": "",            # session date
}
```
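In LangGraph, each node reads this state and returns only the fields it changed. A dependency-free sketch of what an onboarding node might look like (the parsing rule is a simplifying assumption):

```python
def onboarding_node(state: dict) -> dict:
    """Hypothetical onboarding node: capture the username from the
    latest user message and return only the fields that changed."""
    last = state["messages"][-1]
    prefix = "my name is"
    if last.lower().startswith(prefix):
        # Partial update, LangGraph-style: merged into AgentState by the graph.
        return {"username": last[len(prefix):].strip(), "new_user": False}
    return {}
```

On the transcript above, `onboarding_node({"messages": ["My name is Tunde"]})` yields `{"username": "Tunde", "new_user": False}`.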
This enables:
Model Used
Provider: Groq
Model: free-tier tool-calling models
Why This Model?
This is currently a baseline, model-driven system.
The architecture is model-agnostic — meaning better models can be plugged in without changing the system design.
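Model-agnosticism follows from depending only on a narrow callable interface rather than on a specific provider SDK. A sketch (the `fake_llm` stand-in is hypothetical):

```python
from typing import Callable, List

# The graph depends only on this signature, so providers are interchangeable.
LLM = Callable[[List[dict]], str]

def fake_llm(messages: List[dict]) -> str:
    """Stand-in provider for local testing (hypothetical)."""
    return "ok: " + messages[-1]["content"]

def run_turn(llm: LLM, history: List[dict], user_text: str) -> str:
    """Append the user turn, call whichever provider was injected,
    and record the reply in the shared history."""
    history.append({"role": "user", "content": user_text})
    reply = llm(history)
    history.append({"role": "assistant", "content": reply})
    return reply
```

Swapping Groq for a stronger model then means swapping the injected callable, not rewriting the graph.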
Platform
Why Render?
Architecture Flow
```
Client → FastAPI → LangGraph → Groq LLM → Response
                       ↓
                 SQLite Store
```
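The persistence layer can be a single SQLite table keyed by account number. A sketch of the store (the schema shown is an assumption, not the project's actual schema):

```python
import sqlite3

def init_store(path: str = ":memory:") -> sqlite3.Connection:
    """Create the session-summary table if it does not exist."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS sessions (
               account_number TEXT,
               date TEXT,
               dr_summary TEXT
           )"""
    )
    return conn

def save_summary(conn, account_number: str, date: str, summary: str) -> None:
    """Persist one post-consultation summary."""
    conn.execute("INSERT INTO sessions VALUES (?, ?, ?)",
                 (account_number, date, summary))
    conn.commit()

def load_summaries(conn, account_number: str) -> list:
    """Fetch all prior summaries for a returning user."""
    return conn.execute(
        "SELECT date, dr_summary FROM sessions WHERE account_number = ?",
        (account_number,),
    ).fetchall()
```

Loading prior summaries on `/login` is what lets Dr. Obiora reference earlier consultations.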
Endpoint Design
- `/login` → user session initialization
- `/chat` → text interaction
- `/chat/voice` → voice interaction
- `/health` → system health check
Current Setup (API-based)
| Component | Cost |
|---|---|
| LLM (Groq) | Usage-based |
| Hosting (Render) | Free / low-tier |
| Storage | Minimal |
Estimated Cost Drivers
Optimization Strategies
Cost Insight
This architecture allows:
Linear scaling with usage, not infrastructure complexity
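That linear relationship can be made concrete with a back-of-envelope estimator (every figure below is illustrative, not a measured rate):

```python
def estimate_monthly_cost(users_per_day: int,
                          turns_per_session: int = 10,
                          tokens_per_turn: int = 800,
                          price_per_m_tokens: float = 0.10) -> float:
    """Rough monthly LLM spend in dollars.

    All defaults are illustrative assumptions, not measured values:
    cost scales linearly in users, turns, and tokens.
    """
    tokens_per_day = users_per_day * turns_per_session * tokens_per_turn
    return tokens_per_day * 30 / 1_000_000 * price_per_m_tokens
```

Under these assumed rates, the MVP stage's 200 users/day works out to roughly $4.80/month in LLM spend, which is why hosting, not inference, dominates early costs.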
Key Metrics
| Metric | Why it matters |
|---|---|
| Latency | User experience |
| Error rate | Stability |
| Token usage | Cost control |
| Session completion | Product success |
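Latency and error rate can be captured without extra infrastructure, for example with a small decorator around each handler (names hypothetical):

```python
import time
from functools import wraps

# Simple in-process counters; a real deployment would export these.
METRICS = {"calls": 0, "errors": 0, "total_latency_s": 0.0}

def tracked(fn):
    """Record call count, error count, and cumulative latency."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        METRICS["calls"] += 1
        try:
            return fn(*args, **kwargs)
        except Exception:
            METRICS["errors"] += 1
            raise
        finally:
            METRICS["total_latency_s"] += time.perf_counter() - start
    return wrapper

@tracked
def chat_turn(message: str) -> str:
    """Stand-in for the real chat handler."""
    return f"echo: {message}"
```

Average latency is then `METRICS["total_latency_s"] / METRICS["calls"]`, and error rate follows the same pattern.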
Planned Tools
Future improvements:
This project goes beyond a simple LLM demo by implementing:
Short-term
Medium-term
Long-term
Obiora demonstrates how to move from:
❌ Prompt-based chatbot
➡️
✅ Production-ready AI system
By combining:
🔗 Links
GitHub: https://github.com/Blaqadonis/obiora