Protocol-grounded clinical guidance for frontline health workers. No internet required. Runs on any device.
AfyaPack is a full-stack AI health assistant that works entirely offline, using local language models to provide grounded, citation-backed clinical decision support to health workers in low-resource settings. It supports English and Swahili, runs on both desktop and mobile, and exposes its intelligence through a conversational AI chat interface backed by an agentic routing layer.
Frontline health workers in remote and rural areas manage complex patient presentations without access to specialist consultation, reference libraries, or stable internet. They make life-or-death decisions under pressure, often alone. When a child arrives with fever, convulsions, and signs of dehydration, the worker must recall multi-step clinical protocols from memory while managing the patient.
AfyaPack addresses this by putting an offline AI clinical assistant in their pocket:
| Feature | Description |
|---|---|
| AI Chat | Natural language Q&A — ask in English or Swahili, get protocol-grounded answers |
| Patient Encounter Triage | 4-step structured intake with live danger sign screening |
| Protocol-Grounded Guidance | AI answers are grounded exclusively in local protocol documents |
| Red Flag Screening | Rule-based engine fires before AI — instant alerts for critical vitals |
| Referral Generator | AI writes a structured handoff note, editable before printing |
| Stock Tracker | Medicine and supply inventory with low-stock alerts |
| Protocol Library | Full-text search across clinical guidelines, available offline |
| Swahili Support | Language auto-detection, adapted system prompts, medical term translation |
| MCP Server | 13 tools exposable to Claude and AI assistants via Model Context Protocol |
| Voice Input | Speech-to-text transcription in the chat composer (Web Speech API) |
The dashboard greets the user with live system status — AI model readiness, protocol count, stock alerts — and a one-tap chat entry point with contextual quick prompts.
User: "Mtoto wa miaka 2 ana homa na kuhara, ameshindwa kunywa. Nifanye nini?"
(Child of 2 years has fever and diarrhea, unable to drink. What do I do?)
AfyaPack:
Response includes:
For more complex assessments, the 4-step guided triage form captures:
Live red flag alerts appear as the form fills — before any AI call.
After encounter submission, the app:
One tap generates a structured clinical handoff note, pre-populated with patient data and clinical concern. Editable before copy/save. Works offline — no print server needed.
┌─────────────────────────────────────────────────────────────────┐
│ AFYAPACK SYSTEM │
├──────────────────────┬──────────────────────┬───────────────────┤
│ FRONTEND (Web) │ BACKEND (API) │ LOCAL AI LAYER │
│ Next.js 14 │ Node.js + Express │ │
│ Tailwind CSS │ Port 3001 │ ┌─────────────┐ │
│ Framer Motion │ │ │Foundry Local│ │
│ Port 3000 │ ┌───────────────┐ │ │qwen2.5-0.5b │ │
│ │ │ SQLite │ │ │Port 54346 │ │
│ ┌───────────────┐ │ │ (sql.js WASM) │ │ └──────┬──────┘ │
│ │ Chat UI │ │ │ │ │ │ fallback │
│ │ Encounter Form│◄──┤ │ Protocols DB │ │ ┌──────▼──────┐ │
│ │ Guidance View │ │ │ Encounters │ │ │ Ollama │ │
│ │ Stock Tracker │ │ │ Guidance │ │ │ mistral: │ │
│ │ Protocol Lib │ │ │ Referrals │ │ │ latest │ │
│ └───────┬───────┘ │ │ Stock Items │ │ │Port 11434 │ │
│ │ │ └───────────────┘ │ └──────┬──────┘ │
│ IndexedDB (idb) │ │ │ fallback │
│ - Offline queue │ ┌───────────────┐ │ ┌──────▼──────┐ │
│ - UI state cache │ │ TF-IDF Engine │ │ │ Mock Mode │ │
│ - Encounter drafts │ │ 28 chunks │ │ │ Template AI │ │
│ │ │ 5 protocols │ │ └─────────────┘ │
└──────────────────────┴──┴───────────────┴───┴───────────────────┘
│
┌─────────▼─────────┐
│ MCP SERVER │
│ 13 Tools │
│ Swahili Support │
│ Claude Code CLI │
└───────────────────┘
Key design decisions:
better-sqlite3 fails to compile on Node 24. sql.js is pure WebAssembly: no native compilation, runs anywhere, and persists to disk asynchronously. A compatibility shim exposes the same synchronous API.
AfyaPack uses a Retrieval-Augmented Generation (RAG) pattern, but entirely offline:
User Question
│
▼
Intent Detection (rule-based regex)
│
├── clinical → Protocol Search (TF-IDF)
├── stock → SQLite stock query
└── general → pass-through
│
▼
Context Assembly
(retrieved chunks injected into prompt)
│
▼
Local AI Model (Foundry Local / Ollama)
│
▼
Structured Response
+ citations
+ escalation flag
+ language tag
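The context-assembly step in the flow above could look roughly like the following. This is a minimal sketch, not the project's actual code: `buildGroundedMessages` and the chunk fields (`docTitle`, `section`, `text`) are assumed names, and the sample excerpt is made up for illustration.

```javascript
// Hypothetical helper: turns retrieved chunks into chat messages for the model.
// Field names (docTitle, section, text) are assumptions, not the real schema.
function buildGroundedMessages(systemPrompt, chunks, question) {
  // Number each excerpt so the model can cite it as [1], [2], ...
  const context = chunks
    .map((c, i) => `[${i + 1}] ${c.docTitle} / ${c.section}\n${c.text}`)
    .join('\n\n');

  // Say so explicitly when retrieval found nothing, instead of sending a bare question.
  const userContent = chunks.length > 0
    ? `PROTOCOL EXCERPTS:\n${context}\n\nQUESTION:\n${question}`
    : `No protocol excerpts were retrieved.\n\nQUESTION:\n${question}`;

  return [
    { role: 'system', content: systemPrompt },
    { role: 'user', content: userContent },
  ];
}

// Made-up sample chunk, for illustration only.
const messages = buildGroundedMessages(
  'You are AfyaPack...',
  [{ docTitle: 'Fever and Dehydration in Children', section: 'Danger signs', text: 'Sample excerpt text.' }],
  'Child with fever, unable to drink. What do I do?'
);
console.log(messages[1].content);
```

Numbering the excerpts gives the model a stable handle for citations, and the explicit "nothing retrieved" branch makes it easier for the system prompt's rule about uncovered presentations to apply.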
Fine-tuning a model requires compute, data curation, and retraining when protocols change. RAG lets us:
No external vector database. The TF-IDF + cosine similarity engine runs entirely in the Node.js process:
// tokenize → compute TF → compute IDF → build vector → cosine similarity
const query = "fever dehydration child convulsions";
const chunks = searchProtocols(query, 4);
// Returns ranked chunks with doc title, section, score, and raw text
Why TF-IDF over embeddings?
Embeddings require a model call or a large embedding model on disk. TF-IDF:
Foundry Local is Microsoft’s runtime for running AI models directly on-device. AfyaPack uses it to power its AI guidance without any cloud API.
| Property | Benefit for AfyaPack |
|---|---|
| On-device inference | Works in areas with no internet connectivity |
| OpenAI-compatible API | Drop-in replacement — same POST /v1/chat/completions call |
| Quantized models | qwen2.5-0.5b runs on CPU with 4-bit quantization — ~1.5 GB RAM |
| Lightweight runtime | The Foundry Local daemon is a small background service |
| No data leaves the device | Patient data stays local — critical for healthcare privacy |
| Cross-platform | Windows, macOS, Linux — same API everywhere |
Because Foundry Local exposes a standard OpenAI-compatible HTTP API (http://localhost:54346/v1), it can be swapped with any compatible runtime — including containerized deployments:
Docker Compose approach (future scale):
┌──────────────────┐ HTTP ┌─────────────────────┐
│ AfyaPack API │ ────────► │ Foundry Local │
│ (Node container) │ │ (sidecar container) │
└──────────────────┘ └─────────────────────┘
The same codebase that runs on a health worker’s laptop can be deployed as microservices in a clinic’s local network — serving multiple tablets from a single AI inference server. This enables a hub-and-spoke model:
# Install Foundry Local CLI
winget install Microsoft.FoundryLocal # Windows
brew tap microsoft/foundrylocal && brew install foundrylocal # macOS
# Pull and run the model (downloads ~1.5 GB once)
foundry model run qwen2.5-0.5b
# Verify it's running
curl http://127.0.0.1:54346/v1/models
AfyaPack automatically detects Foundry Local on startup:
// api/src/services/foundry.js
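async function tryFoundry() {
  try {
    const res = await fetchWithTimeout(
      `${FOUNDRY_ENDPOINT}/v1/chat/completions`,
      {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' }, // required for a JSON body
        body: JSON.stringify({ model: FOUNDRY_MODEL,
          messages: [{ role: 'user', content: 'Hi' }], max_tokens: 5 })
      },
      5000 // 5 second probe timeout
    );
    return res.ok;
  } catch {
    // Connection refused or timeout: report "not available" so the caller can fall back
    return false;
  }
}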
Ollama provides an alternative local inference runtime. AfyaPack falls back to Ollama automatically if Foundry Local is not running:
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh # Linux/macOS
# Or download from https://ollama.com on Windows
# Pull a model
ollama pull mistral
# or smaller:
ollama pull qwen2.5:0.5b
# Verify
ollama list
AfyaPack uses Ollama’s OpenAI-compatible endpoint (/v1/chat/completions) — the same code path as Foundry Local:
// Identical call signature — provider is transparent to the rest of the app
const res = await fetch(`${OLLAMA_ENDPOINT}/v1/chat/completions`, {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ model: OLLAMA_MODEL, messages, max_tokens: 700 })
});
Provider priority order:
Startup probe (5s timeout each):
1. Foundry Local at :54346 → use if reachable
2. Ollama at :11434 → use if reachable
3. Mock mode → template responses, retrieval still active
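The priority order above can be sketched as a small waterfall function. This is an illustration under assumptions, not the code in api/src/services/foundry.js: `selectProvider` is a hypothetical name, and the probes are injected stand-ins rather than real HTTP calls.

```javascript
// Hypothetical sketch of the startup waterfall: try each probe in order and
// fall back to mock mode if none responds. Probe functions are injected so the
// selection logic stays independent of any particular runtime.
async function selectProvider(probes) {
  for (const { name, probe } of probes) {
    try {
      if (await probe()) return name; // first reachable provider wins
    } catch {
      // A thrown error (timeout, connection refused) counts as "not reachable".
    }
  }
  return 'mock'; // template responses; retrieval still active
}

// Usage with stand-in probes (real ones would call the endpoints with a timeout).
selectProvider([
  { name: 'foundry', probe: async () => { throw new Error('ECONNREFUSED'); } },
  { name: 'ollama', probe: async () => true },
]).then((provider) => console.log(provider)); // prints "ollama"
```

Because a failing probe is swallowed rather than rethrown, a missing runtime degrades the app to the next option instead of stopping startup.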
The VS Code AI Toolkit extension was used during development to evaluate qwen2.5-0.5b, phi-3.5-mini, and Llama-3.2-1B for our use case (low-latency clinical Q&A on CPU).
Model selection rationale:
| Model | Size | Latency (CPU) | Quality | Chosen |
|---|---|---|---|---|
| qwen2.5-0.5b | ~1.5 GB | 8–20s | Good for structured output | ✅ Primary |
| phi-3.5-mini | ~2.2 GB | 15–40s | Better reasoning | Backup |
| mistral:latest | ~4 GB | 30–60s | Best quality | ✅ Ollama fallback |
The 0.5B parameter model was chosen for Foundry Local because it runs on CPU without requiring a GPU, making it suitable for low-cost clinic hardware.
AfyaPack bypasses the official Foundry Local SDK and calls the HTTP API directly. This was a deliberate choice after discovering SDK initialization issues:
// Direct HTTP — stable, dependency-free, future-proof
const response = await fetch('http://127.0.0.1:54346/v1/chat/completions', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
model: 'qwen2.5-0.5b-instruct-generic-cpu:4',
messages: conversationHistory,
max_tokens: 700,
temperature: 0.25 // Low temperature for clinical consistency
})
});
This approach means AfyaPack works with any OpenAI-compatible endpoint — Foundry Local, Ollama, Azure OpenAI, or a self-hosted vLLM server — by changing a single environment variable.
AfyaPack exposes its capabilities as a Model Context Protocol (MCP) server, allowing Claude Code and other AI assistants to call AfyaPack’s tools directly.
# The MCP server registers with Claude Code
claude mcp add afyapack node C:/copilot/AfyaPack/mcp/src/index.js
# Verify connection
claude mcp list
# afyapack: ✓ Connected
| Tool | Description |
|---|---|
| search_protocols | TF-IDF search across local clinical protocols |
| create_encounter | Create a patient encounter with red flag screening |
| get_guidance | Generate grounded AI guidance for an encounter |
| generate_referral | Create a structured referral handoff note |
| check_stock | Query medicine and supply levels |
| get_system_status | Check AI model, API, and DB health |
| ask_foundry | Direct call to Foundry Local model |
| ask_ollama | Direct call to Ollama model |
| ask_local_model | Auto-routing to best available model |
| explain_in_swahili | Translate/explain clinical content in Swahili |
| translate_clinical_term | Bidirectional English↔Swahili medical term lookup |
| detect_language | Detect if text is Swahili or English |
| list_medical_terms | Return the full medical term dictionary |
Once the MCP server is connected, you can ask Claude:
"Use search_protocols to find guidance on maternal warning signs"
"Create an encounter for a 3-year-old with fever 39.5°C,
pulse 130, and convulsions — then get guidance"
"Check stock and tell me what medicines are critically low"
"Explain ORS preparation in Swahili using explain_in_swahili"
// mcp/src/index.js
const server = new McpServer({ name: 'afyapack-mcp', version: '1.0.0' });
// Tools are registered with Zod schema validation
server.tool('search_protocols', description, zodSchema, async (args) => {
const result = await tool.handler(args);
return { content: [{ type: 'text', text: JSON.stringify(result) }] };
});
// Swahili detection in error handling
const lang = detectSwahili(JSON.stringify(args)) ? 'sw' : 'en';
const errorMsg = lang === 'sw' ? `Hitilafu: ${err.message}` : `Error: ${err.message}`;
cd mcp
node src/test.js
Expected output: 24/24 tests pass across provider detection, Swahili support, API integration, and full clinical workflow.
git clone https://github.com/yourusername/afyapack.git
cd afyapack
npm run install:all
# Installs root, api/, and web/ dependencies in one command
# Root .env (already committed with sensible defaults)
cat .env
FOUNDRY_ENDPOINT=http://127.0.0.1:54346
FOUNDRY_MODEL=qwen2.5-0.5b-instruct-generic-cpu:4
OLLAMA_ENDPOINT=http://localhost:11434
OLLAMA_MODEL=mistral:latest
AFYAPACK_API_PORT=3001
DEFAULT_LANGUAGE=sw
The API reads from api/.env:
PORT=3001
FOUNDRY_ENDPOINT=http://127.0.0.1:54346
FOUNDRY_MODEL=qwen2.5-0.5b-instruct-generic-cpu:4
OLLAMA_ENDPOINT=http://localhost:11434
OLLAMA_MODEL=mistral:latest
Option A: Foundry Local (recommended — lighter weight)
# Windows (winget)
winget install Microsoft.FoundryLocal
# Run the model (downloads ~1.5 GB on first run)
foundry model run qwen2.5-0.5b
# Verify
curl http://127.0.0.1:54346/v1/models
Option B: Ollama
# Download from https://ollama.com or:
curl -fsSL https://ollama.com/install.sh | sh # Linux/macOS
ollama pull mistral
# or lighter alternative:
ollama pull qwen2.5:0.5b
Option C: No local AI (demo mode)
Skip this step. The app runs in demo mode — retrieval and red flag screening still work. AI responses are templated.
# Both servers together (recommended)
npm run dev
# Output:
# [API] AfyaPack API running at http://localhost:3001
# [WEB] - ready on http://localhost:3000
Or start them separately:
# Terminal 1 — API
cd api && node src/index.js
# Terminal 2 — Web
cd web && npx next dev -p 3000
# Check API health
curl http://localhost:3001/api/health
# Expected response:
{
"status": "ok",
"ai": { "ready": true, "provider": "foundry", "model": "qwen2.5-0.5b-..." },
"db": { "protocol_chunks": 28, "stock_items": 12, "encounters": 2 }
}
Open http://localhost:3000 in your browser.
# Add the MCP server
claude mcp add afyapack node /absolute/path/to/afyapack/mcp/src/index.js \
--env FOUNDRY_ENDPOINT=http://127.0.0.1:54346 \
--env AFYAPACK_API_URL=http://localhost:3001 \
--env DEFAULT_LANGUAGE=sw
# Verify
claude mcp list
# Run end-to-end MCP tests
cd mcp && node src/test.js
All AI model and endpoint configuration is managed through environment variables, enabling easy switching between Foundry Local, Ollama, or any OpenAI-compatible endpoint:
| Variable | Default | Description |
|---|---|---|
| FOUNDRY_ENDPOINT | http://127.0.0.1:54346 | Foundry Local API URL |
| FOUNDRY_MODEL | qwen2.5-0.5b-instruct-generic-cpu:4 | Model ID for Foundry Local |
| OLLAMA_ENDPOINT | http://localhost:11434 | Ollama API URL |
| OLLAMA_MODEL | mistral:latest | Model for Ollama |
| AFYAPACK_API_PORT | 3001 | API server port |
| DEFAULT_LANGUAGE | sw | Default response language |
| NEXT_PUBLIC_API_URL | http://localhost:3001 | API URL for Next.js client |
To use a different model or provider, only the .env files need to change — no code modification required.
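One way to read these variables with their documented defaults is sketched below. `loadConfig` and `DEFAULTS` are hypothetical names used for illustration; the project's actual loading code is not shown in this document.

```javascript
// Hypothetical config loader: reads the variables documented above and
// falls back to the documented defaults when a variable is unset.
const DEFAULTS = {
  FOUNDRY_ENDPOINT: 'http://127.0.0.1:54346',
  FOUNDRY_MODEL: 'qwen2.5-0.5b-instruct-generic-cpu:4',
  OLLAMA_ENDPOINT: 'http://localhost:11434',
  OLLAMA_MODEL: 'mistral:latest',
  AFYAPACK_API_PORT: '3001',
  DEFAULT_LANGUAGE: 'sw',
};

function loadConfig(env = process.env) {
  const config = {};
  for (const [key, fallback] of Object.entries(DEFAULTS)) {
    // Treat empty strings as unset so a blank entry in .env does not win.
    config[key] = env[key] ? env[key] : fallback;
  }
  config.AFYAPACK_API_PORT = Number(config.AFYAPACK_API_PORT);
  return config;
}

// Pointing the app at another OpenAI-compatible server is then a single override.
// The IP address here is a placeholder.
const config = loadConfig({ FOUNDRY_ENDPOINT: 'http://192.168.1.10:54346' });
console.log(config.FOUNDRY_ENDPOINT, config.OLLAMA_MODEL);
```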
Base URL: http://localhost:3001
GET /api/health
→ { status, ai: { ready, provider, model }, db: { protocol_chunks, stock_items, encounters } }
POST /api/chat
Body: { message: string, history: Message[] }
→ { reply, citations, escalation_needed, tool_used, language, intent }
GET /api/encounters → list all encounters
GET /api/encounters/:id → get single encounter
POST /api/encounters → create encounter (triggers red flag screening)
PATCH /api/encounters/:id → update encounter
GET /api/guidance/:encounterId → get existing guidance
POST /api/guidance → generate grounded guidance for encounter
Body: { encounter_id: string }
GET /api/protocols → list all protocol documents
GET /api/protocols/:id → get protocol with all sections
POST /api/protocols/search → TF-IDF search
Body: { query: string, topK: number }
GET /api/referrals/:encounterId → get existing referral
POST /api/referrals → generate referral note
PATCH /api/referrals/:id → update (save edited note)
GET /api/stock → all stock items with is_low / is_out flags
POST /api/stock → create item
PATCH /api/stock/:id → update item
POST /api/stock/:id/adjust → adjust quantity by delta
DELETE /api/stock/:id → remove item
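As an illustration of calling the chat endpoint listed above, here is a small client sketch. The request and response fields come from the endpoint list; the function name `askAfyaPack`, its options, and the error handling are assumptions. The fetch implementation is injectable so the function can be exercised without a running server.

```javascript
// Hypothetical client for POST /api/chat.
// fetchImpl defaults to the global fetch (available in browsers and Node 18+).
async function askAfyaPack(message, history = [], options = {}) {
  const baseUrl = options.baseUrl || 'http://localhost:3001';
  const fetchImpl = options.fetchImpl || globalThis.fetch;

  const res = await fetchImpl(`${baseUrl}/api/chat`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ message, history }),
  });
  if (!res.ok) throw new Error(`Chat request failed with status ${res.status}`);

  // Expected shape: { reply, citations, escalation_needed, tool_used, language, intent }
  return res.json();
}

// Example call (requires the API to be running):
// askAfyaPack('Mtoto ana homa na kuhara').then((r) => console.log(r.reply, r.citations));
```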
The /api/chat endpoint implements lightweight agentic behavior — automatically routing each message to the right tool before calling the AI:
// api/src/routes/chat.js
function detectIntent(message) {
if (STOCK_PATTERNS.test(message)) return 'stock';
if (DANGER_PATTERNS.test(message)) return 'screening';
if (CLINICAL_PATTERNS.test(message)) return 'clinical';
return 'general';
}
function detectLanguage(message) {
return SWAHILI_PATTERNS.test(message) ? 'sw' : 'en';
}
Routing decision table:
| Intent | Tool Called | Context Injected |
|---|---|---|
| clinical | searchProtocols(message, 4) | Top 4 protocol chunks |
| stock | SQLite stock query | Full inventory table |
| screening | searchProtocols(message, 4) | Top 4 protocol chunks |
| general | None | System prompt only |
The system prompt switches between English and Swahili based on language detection. The model receives only the context assembled by the router, which limits, though it cannot fully rule out, clinical claims from outside the protocol corpus.
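The pattern constants used by detectIntent are not shown in this document. The sketch below fills them in with illustrative word lists so the routing order can be tried out; the regexes are assumptions, not the project's real patterns, and the Swahili keywords are taken from the marker list used elsewhere in this README.

```javascript
// Illustrative patterns only: the real STOCK_PATTERNS, DANGER_PATTERNS and
// CLINICAL_PATTERNS are not shown in this document.
const STOCK_PATTERNS = /\b(stock|inventory|supplies|running out)\b/i;
const DANGER_PATTERNS = /\b(danger|urgent|emergency|referral|hatari|haraka)\b/i;
const CLINICAL_PATTERNS = /\b(fever|diarrhea|convulsions|homa|kuhara|degedege|dalili)\b/i;

// Same check order as detectIntent above: the first matching category wins.
function detectIntent(message) {
  if (STOCK_PATTERNS.test(message)) return 'stock';
  if (DANGER_PATTERNS.test(message)) return 'screening';
  if (CLINICAL_PATTERNS.test(message)) return 'clinical';
  return 'general';
}

console.log(detectIntent('Mtoto ana homa na kuhara'));      // clinical
console.log(detectIntent('Which medicines are in stock?')); // stock
console.log(detectIntent('Hello'));                         // general
```

Since the first match wins, a message that mentions both stock and a danger sign is routed to the stock query under this ordering, which is worth keeping in mind when extending the word lists.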
AfyaPack is designed for East African frontline health workers. Swahili (Kiswahili) is supported throughout:
// mcp/src/swahili.js
const SWAHILI_MARKERS = [
'homa', 'kuhara', 'mtoto', 'mgonjwa', 'dawa', 'mimba',
'dalili', 'haraka', 'hatari', 'msaada', 'degedege'
];
function detectSwahili(text) {
const lower = text.toLowerCase();
const hits = SWAHILI_MARKERS.filter(word => lower.includes(word));
return hits.length >= 2;
}
const MEDICAL_TERMS_SW_EN = {
'homa': 'fever',
'kuhara': 'diarrhea',
'degedege': 'convulsions',
'kutoweza kunywa': 'unable to drink',
'macho yaliyozama': 'sunken eyes',
'ngozi kurudi polepole': 'skin pinch returns slowly',
'mimba': 'pregnancy',
'damu nyingi': 'heavy bleeding',
};
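A bidirectional lookup over a dictionary like the one above might be built as follows. This is a sketch under assumptions: `translateClinicalTerm` and its return shape are hypothetical, and only a subset of the dictionary is repeated to keep the example self-contained.

```javascript
// Subset of the dictionary above, repeated so the sketch is self-contained.
const SW_EN = new Map([
  ['homa', 'fever'],
  ['kuhara', 'diarrhea'],
  ['degedege', 'convulsions'],
]);

// Build the reverse (English → Swahili) map once instead of scanning per lookup.
const EN_SW = new Map(Array.from(SW_EN, ([sw, en]) => [en, sw]));

// Hypothetical lookup: tries Swahili first, then English; returns null if unknown.
function translateClinicalTerm(term) {
  const key = term.trim().toLowerCase();
  if (SW_EN.has(key)) return { from: 'sw', to: 'en', translation: SW_EN.get(key) };
  if (EN_SW.has(key)) return { from: 'en', to: 'sw', translation: EN_SW.get(key) };
  return null;
}

console.log(translateClinicalTerm('Homa'));        // { from: 'sw', to: 'en', translation: 'fever' }
console.log(translateClinicalTerm('convulsions')); // { from: 'en', to: 'sw', translation: 'degedege' }
```

Using `Map` rather than a plain object avoids accidental hits on inherited property names when the input is arbitrary user text.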
When Swahili is detected, the system prompt switches:
Wewe ni AfyaPack, msaidizi wa afya wa AI anayefanya kazi bila mtandao.
Unasaidia wahudumu wa afya wa mstari wa mbele.
KANUNI MUHIMU:
1. Wewe SI daktari. Usiwahi kutoa utambuzi.
2. Toa ushauri kulingana na miongozo ya matibabu iliyotolewa tu.
3. Jibu kwa Kiswahili wazi na rahisi kuelewa.
4. Daima eleza wakati mgonjwa anahitaji kupelekwa hospitali.
The MCP server’s explain_in_swahili tool uses this prompt to explain any clinical content in plain Swahili, making specialist knowledge accessible to community health workers.
Before any AI call, a deterministic rule-based engine screens every patient encounter for critical clinical indicators. This fires instantly — no model inference needed.
// web/src/lib/redflags.js + api/src/routes/encounters.js
function screenRedFlags(encounter) {
  // Field names are assumed from the checks below
  const { temp, pulse, symptoms = [], pregnant, hasHeadache, hasSwelling, hasVisualDisturbance } = encounter;
  const flags = [];
  // Temperature thresholds (else-if so a reading of 40°C raises only the critical flag)
  if (temp >= 40.0) flags.push({ severity: 'critical', message: 'Critical fever (≥40°C)', action: 'Test for malaria, assess for meningitis' });
  else if (temp >= 39.0) flags.push({ severity: 'high', message: 'High fever (≥39°C)', action: 'Give paracetamol, assess for malaria' });
  else if (temp < 35.5) flags.push({ severity: 'critical', message: 'Hypothermia (<35.5°C)', action: 'Warm patient, urgent referral' });
  // Pulse thresholds (same else-if structure: one flag per vital sign)
  if (pulse > 130) flags.push({ severity: 'critical', message: 'Severe tachycardia (>130 bpm)', action: 'Urgent referral' });
  else if (pulse > 110) flags.push({ severity: 'high', message: 'Tachycardia (>110 bpm)', action: 'Assess for dehydration, infection' });
  else if (pulse < 50) flags.push({ severity: 'critical', message: 'Bradycardia (<50 bpm)', action: 'Urgent referral' });
  // Symptom-based critical flags (English and Swahili terms)
  const CRITICAL_SYMPTOMS = ['convulsions', 'unconscious', 'unable to breathe', 'degedege'];
  CRITICAL_SYMPTOMS.forEach(sym => {
    if (symptoms.some(s => s.toLowerCase().includes(sym))) {
      flags.push({ severity: 'critical', message: `Danger sign: ${sym}` });
    }
  });
  // Pre-eclampsia pattern (pregnancy-specific)
  if (pregnant && hasHeadache && (hasSwelling || hasVisualDisturbance)) {
    flags.push({ severity: 'critical', message: 'Possible pre-eclampsia', action: 'REFER IMMEDIATELY' });
  }
  return flags;
}
Red flags fire on the frontend in real time as the form fills — giving the health worker an alert before they even submit the encounter.
The retrieval engine is built entirely in JavaScript, with no external dependencies:
// 1. Ingest protocol documents
PROTOCOLS.forEach(doc => {
doc.sections.forEach(section => {
const text = `${doc.title} ${section.heading} ${section.text}`;
const tokens = tokenize(text);
// Store tokens in SQLite for IDF computation
db.prepare('INSERT INTO chunks VALUES (?, ?, ?, ?, ?)').run(...);
});
});
// 2. Compute IDF across all chunks
const allTokenArrays = chunks.map(c => tokenize(c.content));
const idf = computeIDF(allTokenArrays);
// IDF stored in memory for fast query-time scoring
function searchProtocols(query, topK = 4) {
const queryTokens = tokenize(query);
const queryVector = buildTFIDFVector(queryTokens, idf);
const scored = chunks.map(chunk => ({
...chunk,
score: cosineSimilarity(queryVector, chunk.tfidfVector)
}));
return scored
.sort((a, b) => b.score - a.score)
.slice(0, topK)
.filter(c => c.score > 0);
}
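The helpers named in the snippets above (tokenize, computeIDF, buildTFIDFVector, cosineSimilarity) are not shown in this document. The following is a minimal dependency-free sketch of how they could be written; the tokenization rule and the smoothed IDF formula are assumptions, not necessarily what retrieval.js uses, and the three sample strings are made up.

```javascript
// Lowercase and split on anything that is not a letter or digit.
function tokenize(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Smoothed IDF: ln((N + 1) / (df + 1)) + 1, so no term gets a zero weight.
function computeIDF(tokenArrays) {
  const df = new Map();
  for (const tokens of tokenArrays) {
    for (const t of new Set(tokens)) df.set(t, (df.get(t) || 0) + 1);
  }
  const idf = new Map();
  for (const [t, n] of df) idf.set(t, Math.log((tokenArrays.length + 1) / (n + 1)) + 1);
  return idf;
}

// Sparse vector: term -> (term frequency × IDF). Terms unknown to the corpus are skipped.
function buildTFIDFVector(tokens, idf) {
  const vec = new Map();
  for (const t of tokens) {
    if (idf.has(t)) vec.set(t, (vec.get(t) || 0) + idf.get(t) / tokens.length);
  }
  return vec;
}

// Cosine of the angle between two sparse vectors; 0 if either is empty.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (const [t, w] of a) { normA += w * w; if (b.has(t)) dot += w * b.get(t); }
  for (const w of b.values()) normB += w * w;
  return normA && normB ? dot / Math.sqrt(normA * normB) : 0;
}

// Tiny made-up corpus to show the ranking behaviour.
const docs = ['fever in children', 'stock of paracetamol', 'fever and convulsions'];
const tokenized = docs.map(tokenize);
const idf = computeIDF(tokenized);
const vectors = tokenized.map((t) => buildTFIDFVector(t, idf));
const queryVec = buildTFIDFVector(tokenize('child convulsions fever'), idf);
const scores = vectors.map((v) => cosineSimilarity(queryVec, v));
console.log(scores.map((s) => s.toFixed(2)));
```

In this toy corpus the third document shares two terms with the query and ranks first, the first shares one, and the second shares none and scores zero. Note that "child" does not match "children": without stemming, exact token overlap is all TF-IDF sees.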
5 clinical protocol documents, 28 indexed chunks:
| Document | Sections | Topics |
|---|---|---|
| Fever and Dehydration in Children | 7 | IMCI, ORT, malaria, paracetamol dosing |
| Maternal Warning Signs | 6 | Pre-eclampsia, PPH, sepsis, antenatal danger signs |
| Referral Triggers | 5 | Decision criteria for referral |
| Community Health Worker Protocols | 5 | First aid, assessment, triage |
| Essential Medicines Guide | 5 | Dosing, storage, dispensing |
The current setup runs on a single laptop or desktop — ideal for individual health worker use and demonstration.
One AI inference server (a clinic PC or mini PC) serves multiple tablets over local WiFi:
[Clinic Server — runs Foundry Local]
│
Local WiFi (no internet needed)
┌────┴────┬────────┬────────┐
[Tab1] [Tab2] [Tab3] [Tab4]
(phone) (tablet) (tablet) (laptop)
Configuration: Set FOUNDRY_ENDPOINT and AFYAPACK_API_URL to the server’s local IP address on all client devices.
# docker-compose.yml (illustrative)
version: '3.8'
services:
api:
build: ./api
ports:
- "3001:3001"
environment:
- FOUNDRY_ENDPOINT=http://foundry:54346
- OLLAMA_ENDPOINT=http://ollama:11434
depends_on:
- foundry
web:
build: ./web
ports:
- "3000:3000"
environment:
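# Read by the browser, so it must be reachable from the client device (not the Docker-internal hostname "api")
- NEXT_PUBLIC_API_URL=http://localhost:3001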
foundry:
image: mcr.microsoft.com/foundry-local:latest
ports:
- "54346:54346"
volumes:
- foundry-models:/models
volumes:
foundry-models:
This configuration:
foundry service
[District Health Office Server]
│ (VPN or satellite when available)
├── Protocol updates pushed to all clinics
└── Anonymized aggregate statistics synced
[Clinic A Server] ── [Clinic A tablets ×4]
[Clinic B Server] ── [Clinic B tablets ×6]
[Clinic C Server] ── [Clinic C tablets ×2]
The offline-first design means each clinic operates independently. Sync happens opportunistically when connectivity is available.
The current sql.js (WASM SQLite) implementation is designed for single-device use. For multi-device or multi-clinic deployments:
| Scale | Recommended DB | Migration effort |
|---|---|---|
| Single device | sql.js (current) | None |
| Clinic LAN | libsql / Turso | Change getDB() adapter |
| Regional network | PostgreSQL + edge sync | Replace getDB() + add sync queue |
The sync queue infrastructure already exists: IndexedDB in the browser stores an offline_queue of actions to replay when a connection is available.
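Replaying such a queue could work along these lines. This is a sketch under assumptions: the queue is a plain array so the example runs anywhere (in the app it would be read from the IndexedDB offline_queue store), and `replayQueue`, the action shape, and `sendAction` are hypothetical.

```javascript
// Hypothetical replay loop for an offline action queue.
// Sends actions in order and returns the ones that still need to be retried.
async function replayQueue(queue, sendAction) {
  const remaining = [];
  let stopped = false;
  for (const action of queue) {
    if (stopped) { remaining.push(action); continue; }
    try {
      await sendAction(action);
    } catch {
      // Stop at the first failure and keep this and all later actions,
      // so they are retried in their original order next time.
      stopped = true;
      remaining.push(action);
    }
  }
  return remaining;
}

// In the browser this might be triggered when connectivity returns, e.g.:
// window.addEventListener('online', async () => { queue = await replayQueue(queue, send); });
```

Stopping at the first failure preserves ordering, which matters when a later action (such as updating an encounter) depends on an earlier one (creating it).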
Traditional local AI deployments require complex native dependencies, CUDA drivers, or platform-specific builds. Foundry Local’s architecture separates concerns cleanly:
┌─────────────────────────────────────────┐
│ Your Application │
│ (Node.js, Python, .NET — any runtime) │
│ │
│ HTTP POST /v1/chat/completions │
└─────────────────┬───────────────────────┘
│ localhost:54346
┌─────────────────▼───────────────────────┐
│ Foundry Local Runtime │
│ - Model management │
│ - Quantization / hardware selection │
│ - OpenAI-compat API server │
│ - CPU/GPU dispatch │
└─────────────────────────────────────────┘
Because the interface is a simple HTTP API, the application container needs zero AI-specific dependencies. It’s a standard Node.js container that makes HTTP requests. The AI runtime is a separate sidecar.
This means:
AfyaPack includes PWA manifest configuration, enabling installation on Android and iOS from the browser:
// public/manifest.json
{
"name": "AfyaPack Health Assistant",
"short_name": "AfyaPack",
"start_url": "/",
"display": "standalone",
"theme_color": "#16A394",
"background_color": "#F2F4F8",
"icons": [...]
}
When installed as a PWA:
AfyaPack is designed mobile-first, not mobile-adapted:
| Decision | Rationale |
|---|---|
| Fixed bottom nav (5 items) | Thumb-reachable on phones |
| Chat CTA as primary bottom nav item | Most-used feature is always one tap away |
| Sticky chat composer at bottom | Mirrors WhatsApp/SMS pattern familiar to users |
| Voice input in composer | Faster than typing on phone keyboard |
| Symptom chips (tap-to-add) | Avoids keyboard for most common inputs |
| Full-screen loading states | Prevents confusion during AI inference |
| 44px minimum touch targets | Accessibility and field usability |
The ChatComposer component uses the Web Speech API for voice transcription:
const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
const recognition = new SpeechRecognition();
recognition.continuous = false;
recognition.interimResults = true;
recognition.onresult = (event) => {
const transcript = Array.from(event.results).map(r => r[0].transcript).join('');
setText(transcript); // Shows live transcription in the input
};
Supported on: Chrome for Android, Samsung Internet, Safari 14.1+, Chrome desktop. Falls back gracefully (microphone button hidden if not supported).
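The graceful fallback described above depends on checking for the API before constructing a recognizer. A minimal sketch of that check follows; `getSpeechRecognition` and `showMicButton` are hypothetical names, not necessarily what ChatComposer uses.

```javascript
// Hypothetical feature check: returns the SpeechRecognition constructor if the
// environment provides one (standard or webkit-prefixed), otherwise null.
function getSpeechRecognition(win) {
  if (!win) return null;
  return win.SpeechRecognition || win.webkitSpeechRecognition || null;
}

// In a browser, pass window. During server rendering in Next.js, window is
// undefined, so the access is guarded with typeof.
const Recognition = getSpeechRecognition(typeof window !== 'undefined' ? window : null);
const showMicButton = Recognition !== null;
console.log(showMicButton);
```

Calling `new SpeechRecognition()` when neither constructor exists throws a TypeError, so the component would only construct a recognizer when `showMicButton` is true.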
AfyaPack was built using AI assistance throughout the development process — from architecture decisions to code generation to debugging. Here is an honest account of how AI was used and what we learned.
| Tool | How it was used |
|---|---|
| Claude Code (Claude Sonnet 4.6) | Primary development assistant — architecture, all code generation, debugging |
| VS Code AI Toolkit | Model evaluation, inference playground, endpoint configuration |
| Foundry Local | Runtime for the AI model powering the product itself |
| GitHub Copilot | Inline completions during manual editing |
The entire project was scaffolded, built, and iterated through a conversation-driven development loop:
Architecture design through prompting: We described the problem, constraints, and technical requirements in a detailed prompt. Claude proposed the architecture — sql.js over better-sqlite3, TF-IDF over vector embeddings, provider waterfall pattern — and justified each choice.
Incremental code generation: Each module was generated with a prompt describing exactly what it needed to do, its inputs, outputs, and constraints. The AI generated working code that was then verified against the running system.
Debugging through conversation: When native SQLite compilation failed on Node 24, we described the error to Claude, which identified the EBUSY file lock issue, proposed the sql.js alternative, and generated the compatibility shim.
Redesign through specification: The full UI redesign was triggered by a single comprehensive design brief. All 10 phases (design system, layout, dashboard, chat, encounter flow, guidance, stock, protocols, settings, polish) were executed sequentially by the AI.
Test-driven iteration: The MCP test suite (24 tests) was generated first, then the implementation was built to pass it.
The most effective prompts for code generation in this project were structured as:
[WHAT IT IS]: One sentence describing the module's identity
[WHAT IT DOES]: Input → processing → output
[CONSTRAINTS]: Hard requirements (no external deps, must handle X, must return Y)
[CONTEXT]: What exists that this integrates with
[EDGE CASES]: Failure modes to handle
This prompt is sent to the local AI model with every guidance generation request. It took several iterations to get right:
You are AfyaPack, a protocol-based clinical decision support tool for frontline health workers.
CRITICAL RULES:
1. You are NOT a diagnostic tool. Never state a diagnosis.
2. Base your guidance ONLY on the protocol excerpts provided. Do not use outside knowledge.
3. Always include a clear safety note that this is decision support only.
4. Always state when referral or escalation is needed.
5. Cite which protocol source(s) your guidance is based on.
6. Use plain, clear language appropriate for frontline workers.
7. If the retrieved protocols do not cover the presentation, explicitly say so.
RESPONSE FORMAT:
**Assessment Context**
[Brief 1-2 sentence summary of the key clinical picture]
**Protocol-Based Guidance**
[Numbered list of recommended actions based ONLY on retrieved protocols]
**When to Refer/Escalate**
[Clear conditions that require referral, based on protocols]
**Sources Used**
[List the protocol titles cited, numbered]
**Safety Note**
⚠️ This is protocol-based decision support only.
What we learned about this prompt:
The language-switching prompt was generated with this specification:
Build a language detection function that:
- Takes a string of text
- Returns 'sw' if it's Swahili, 'en' otherwise
- Uses heuristic matching on a curated word list (not an ML model)
- Must work offline without any API call
- Should handle mixed-language input gracefully
- Clinical Swahili words to target: homa, kuhara, degedege, mimba, dalili
- False positive rate must be very low for English text
The resulting function checks for ≥2 Swahili marker word hits. This is a simple heuristic that suits the domain, with one known limit: a short Swahili message containing only one marker word is classified as English.
The intent detection system was specified as:
Build a lightweight intent router that:
- Takes a user message string
- Returns one of: 'clinical', 'stock', 'screening', 'general'
- Uses regex pattern matching (not an LLM call) — must be <1ms
- 'clinical': mentions symptoms, vitals, protocols, treatments
- 'stock': mentions medicines, supplies, inventory, running out
- 'screening': mentions danger, urgent, emergency, referral
- 'general': everything else
- Must handle Swahili keywords in each category
This zero-latency routing layer means the AI is only called once per message, with the right context pre-assembled.
Too vague: "Build the TF-IDF retrieval engine" — resulted in a correct but non-integrated implementation that required significant rework.
Too prescriptive on implementation: "Use a Map object with string keys for the TF-IDF vectors" — the AI followed the instruction but the resulting code was harder to optimize. Describing the behaviour rather than the implementation produced better results.
Missing constraints: Early guidance prompts didn’t include the “ONLY use retrieved protocols” constraint, leading to responses that mixed retrieved facts with model-internal knowledge.
Safety constraints must be explicit and repeated — the model cannot infer that clinical hallucination is unacceptable. State it in the system prompt, state it in the user prompt, and validate the output.
Format instructions are load-bearing — structured output (numbered sections, bold headings) is both more readable for the user AND easier to parse programmatically.
Temperature matters more than model size — qwen2.5-0.5b at temperature 0.2 with a strict system prompt outperformed larger models at temperature 0.7 for consistent clinical formatting.
Retrieval quality > model quality — a good TF-IDF retrieval with the right 4 protocol chunks produces better guidance than a larger model with no grounding.
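The point about format instructions being load-bearing can be made concrete with a small parser. The sketch below splits a reply on the bold headings required by the system prompt and reports which required sections are missing; the function names and the choice of required sections are assumptions for illustration, and the sample reply is made up.

```javascript
// Hypothetical parser: splits a model reply into sections keyed by the bold
// headings required by the system prompt (e.g. **Sources Used**).
function parseGuidanceSections(reply) {
  const sections = {};
  let current = null;
  for (const rawLine of reply.split('\n')) {
    const line = rawLine.trim();
    const heading = line.match(/^\*\*(.+?)\*\*$/);
    if (heading) {
      current = heading[1];
      sections[current] = [];
    } else if (current && line) {
      sections[current].push(line);
    }
  }
  return sections;
}

const REQUIRED = ['Protocol-Based Guidance', 'When to Refer/Escalate', 'Safety Note'];

// Returns the required headings that are missing, so the caller can retry or
// show a fallback instead of displaying an incomplete answer.
function missingSections(reply) {
  const sections = parseGuidanceSections(reply);
  return REQUIRED.filter((name) => !Object.prototype.hasOwnProperty.call(sections, name));
}

// Made-up reply that omits the referral section.
const sample = [
  '**Assessment Context**',
  'Child with fever and reduced drinking.',
  '**Protocol-Based Guidance**',
  '1. Example action',
  '**Safety Note**',
  'This is protocol-based decision support only.',
].join('\n');
console.log(missingSections(sample)); // [ 'When to Refer/Escalate' ]
```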
AfyaPack was designed with responsible AI principles from the ground up:
Every component reinforces the same message:
The AI is instructed to answer only from its protocol corpus. If no relevant protocol chunks are retrieved, the response explicitly states: “The retrieved protocols do not cover this presentation.”
Every AI response shows which protocol sections were used. Users can see the exact source text. If the AI is wrong, the citation shows the user where to verify.
No patient data leaves the device. All storage is:
afyapack.db)
The rule-based red flag engine fires before the AI. Critical danger signs produce immediate escalation alerts regardless of what the AI says. The AI cannot override a critical flag.
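The rule that the AI cannot override a critical flag can be expressed as a one-way merge. This is a sketch of the idea, not the project's code: `resolveEscalation` and its return shape are hypothetical, and the flag objects follow the `severity` field used by the red flag engine.

```javascript
// Hypothetical merge step: the rule-based flags decide first, and the model's
// own escalation signal can only add to them, never remove them.
function resolveEscalation(redFlags, aiEscalationNeeded) {
  const hasCriticalFlag = redFlags.some((f) => f.severity === 'critical');
  let source = 'none';
  if (hasCriticalFlag) source = 'red_flag_rules';
  else if (aiEscalationNeeded) source = 'ai';
  return { escalation_needed: hasCriticalFlag || Boolean(aiEscalationNeeded), source };
}

// A critical flag forces escalation even if the model says it is not needed.
console.log(resolveEscalation([{ severity: 'critical', message: 'Danger sign: convulsions' }], false));
```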
afyapack/
├── api/ # Express backend
│ ├── src/
│ │ ├── db/
│ │ │ ├── index.js # sql.js wrapper (better-sqlite3 compat API)
│ │ │ └── seed.js # Protocol ingestion + demo data
│ │ ├── routes/
│ │ │ ├── chat.js # POST /api/chat — agentic routing
│ │ │ ├── encounters.js # Patient encounter CRUD + red flag screening
│ │ │ ├── guidance.js # AI guidance generation (RAG)
│ │ │ ├── referrals.js # Referral note generation
│ │ │ ├── protocols.js # Protocol CRUD + TF-IDF search
│ │ │ ├── stock.js # Stock management
│ │ │ └── health.js # Health check
│ │ ├── services/
│ │ │ ├── foundry.js # AI provider (Foundry → Ollama → Mock)
│ │ │ ├── retrieval.js # TF-IDF engine
│ │ │ └── prompts.js # System prompt builders
│ │ └── index.js # App entrypoint
│ └── .env
│
├── web/ # Next.js 14 frontend
│ ├── src/
│ │ ├── app/
│ │ │ ├── page.js # Dashboard
│ │ │ ├── chat/page.js # AI Chat interface
│ │ │ ├── encounter/page.js # Triage wizard
│ │ │ ├── guidance/[id]/ # Clinical guidance
│ │ │ ├── referral/[id]/ # Referral notes
│ │ │ ├── stock/page.js # Stock tracker
│ │ │ ├── protocols/page.js # Protocol library
│ │ │ ├── settings/page.js # System status
│ │ │ ├── globals.css # Design system
│ │ │ └── layout.js # Root layout
│ │ ├── components/
│ │ │ ├── chat/
│ │ │ │ ├── ChatComposer.jsx # Input with voice
│ │ │ │ ├── ChatMessage.jsx # Message renderer
│ │ │ │ └── WelcomeState.jsx # Empty state + prompts
│ │ │ ├── layout/
│ │ │ │ ├── AppShell.jsx
│ │ │ │ ├── Sidebar.jsx
│ │ │ │ ├── BottomNav.jsx
│ │ │ │ └── RightPanel.jsx
│ │ │ ├── EncounterForm.jsx # 4-step triage wizard
│ │ │ ├── RedFlagBanner.jsx # Clinical alert display
│ │ │ ├── GuidanceDisplay.jsx # Guidance markdown renderer
│ │ │ └── CitationCard.jsx # Protocol source cards
│ │ ├── hooks/
│ │ │ ├── useSystemStatus.js # Polls /api/health every 30s
│ │ │ └── useOfflineStatus.js # navigator.onLine events
│ │ └── lib/
│ │ ├── api.js # API client
│ │ ├── redflags.js # Client-side red flag screening
│ │ ├── db.js # IndexedDB (drafts, offline queue)
│ │ └── utils.js # Date, patient, symptom utils
│ └── tailwind.config.js
│
├── mcp/ # Model Context Protocol server
│ ├── src/
│ │ ├── index.js # MCP server (stdio transport)
│ │ ├── swahili.js # Language detection + dictionary
│ │ ├── providers/
│ │ │ ├── foundry.js # Foundry Local provider
│ │ │ ├── ollama.js # Ollama provider
│ │ │ └── router.js # Provider auto-selection
│ │ ├── tools/
│ │ │ ├── afyapack.js # AfyaPack API tools
│ │ │ ├── ask-model.js # Direct model tools
│ │ │ └── swahili-tools.js # Swahili language tools
│ │ └── test.js # 24-test end-to-end suite
│ └── claude_config.json # Claude Desktop MCP config
│
├── .env # Root environment config
├── package.json # Root scripts (concurrently)
└── README.md
| Layer | Technology | Why |
|---|---|---|
| Frontend framework | Next.js 14 (App Router) | React server components, file-based routing, PWA support |
| Styling | Tailwind CSS + custom design system | Utility-first, no runtime overhead |
| Animations | Framer Motion | Smooth transitions without fighting CSS |
| UI primitives | Radix UI (unstyled) | Accessible, headless, fully customizable |
| Icons | Lucide React | Consistent, tree-shakable, 1000+ icons |
| Typography | Plus Jakarta Sans + Inter (Google Fonts) | Premium feel, wide language support |
| Backend | Node.js + Express | JavaScript everywhere, minimal overhead |
| Database | sql.js (WASM SQLite) | No native compilation — runs on Node 24, any OS |
| Browser storage | IndexedDB via idb | Offline queue, drafts, UI state |
| Local AI (primary) | Foundry Local + qwen2.5-0.5b | On-device, CPU-only, OpenAI-compat API |
| Local AI (fallback) | Ollama + mistral:latest | Broader model choice, same API |
| Retrieval | Custom TF-IDF in JavaScript | Zero dependencies, < 5ms, fully offline |
| MCP | @modelcontextprotocol/sdk | Expose tools to Claude and AI assistants |
| Voice input | Web Speech API | No library needed, works on Android/Chrome |
| Concurrency | concurrently | Single npm run dev starts both servers |
AfyaPack was built for the Microsoft JavaScript AI Build-a-thon. All patient data is stored locally. No data is transmitted to external servers. This tool is clinical decision support only. It does not diagnose.