# Medical RAG Assistant
A privacy-first clinical decision support system that grounds an LLM in a curated
medical knowledge base, with **all computation running locally** — no data leaves
the machine.
## Why
Clinical AI is only useful if it is trustworthy and confidential. Off-the-shelf
chat assistants can hallucinate, typically cannot cite their sources, and send
patient data to third-party servers. This system answers clinical questions from
authoritative literature with inline citations, while keeping all data (records,
audio, embeddings) on-device.
## How
- **Consultation pipeline** — upload audio or paste text → Whisper transcription →
structured SOAP extraction → RAG-grounded recommendations, streamed via SSE
- **Hybrid retrieval** — dense (MedEmbed) + BM25 sparse, fused with Reciprocal Rank
Fusion, then cross-encoder reranked to top-8
- **Curated knowledge base** — ingested from authoritative free APIs (FDA drug
labels, PubMed abstracts/guidelines/systematic reviews, MedlinePlus), filtered by
a 104-cluster / 462-keyword GP topic taxonomy spanning 14 specialties
- **Cited Q&A chat** — free-text clinical questions answered token-by-token with
inline source citations
- **Rigorous evaluation** — ablation across embed × chat × reranker × retrieval-mode
on 4 medical QA benchmarks (PubMedQA, MedQA-USMLE, MMLU, MedMCQA), plus end-to-end
WER/CER, ROUGE-L, BERTScore, and RAGAs faithfulness/precision on PriMock57 and
ACI-Bench
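
As a rough illustration of the fusion step in the hybrid retrieval bullet above, the sketch below implements plain Reciprocal Rank Fusion over two ranked lists of document IDs. It is not taken from this repository: the function name, the document IDs, and the constant `k=60` (a commonly used default for RRF) are assumptions, and the cross-encoder reranking that follows fusion is left out.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60, top_n=None):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed default, not a value
    documented by this project.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    fused = sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
    return fused if top_n is None else fused[:top_n]


# Hypothetical result lists from the dense and BM25 retrievers.
dense_hits = ["doc_a", "doc_b", "doc_c"]
sparse_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([dense_hits, sparse_hits], top_n=8)
print(fused)
```

Documents found by both retrievers (`doc_a`, `doc_b`) rise above those found by only one, which is the property that makes rank-based fusion usable without calibrating dense and sparse scores against each other.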
## Stack
- **Backend** — FastAPI (Python 3.12), PostgreSQL 16, SQLAlchemy + Pydantic
- **LLM & RAG** — Ollama (`qwen3:4b`), Qdrant hybrid vector store,
MedEmbed-large embeddings, gte-reranker-modernbert cross-encoder
- **Transcription** — faster-whisper `large-v3-turbo`, in-process CPU/GPU
- **Frontend** — React 18 + Vite + Ant Design 5, SSE streaming
- **Infra** — Docker Compose (PostgreSQL + Qdrant), uv, Makefile task runner
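
For readers unfamiliar with SSE, the sketch below shows how generated tokens might be framed as Server-Sent Events before being written to the HTTP response. It uses only the standard library and is an assumption about the wire format, not this project's actual endpoint code: the `token` payload key and the final `done` event name are hypothetical.

```python
import json
from typing import Iterable, Iterator


def sse_frames(tokens: Iterable[str]) -> Iterator[str]:
    """Yield one SSE frame per token, then a final frame that marks the end.

    An SSE frame is a block of "field: value" lines terminated by a blank line.
    The "token" key and the "done" event name are illustrative choices.
    """
    for token in tokens:
        payload = json.dumps({"token": token})
        yield f"data: {payload}\n\n"
    # Named event so a client can tell the stream finished normally.
    yield "event: done\ndata: {}\n\n"


# Hypothetical tokens standing in for streamed LLM output.
frames = list(sse_frames(["Start", " low", " dose"]))
for frame in frames:
    print(frame, end="")
```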