Agentic RAG Clinical Trial Discovery Chatbot

A patient- and clinician-facing clinical trials search assistant that uses agentic RAG over 113k ClinicalTrials.gov records to surface grounded, safety-filtered trial options for 14 major diseases.

Domain: Healthcare • Clinical trials
Tech: Python • Gemini 2.0 Flash • all-MiniLM-L6-v2 • Qdrant • Streamlit • Docker • Google Cloud Run
Data: 113k ClinicalTrials.gov trials (14 diseases) with NCT IDs, status, phase, enrollment, and summaries
UI: Streamlit chat interface for trial search

Abstract

This project develops a multi-agent, retrieval-augmented chatbot that lets patients and clinicians query ClinicalTrials.gov in natural language and receive ranked, explainable trial matches. The system embeds 113,247 trials into a Qdrant vector database using all-MiniLM-L6-v2 and applies a disease-aware hybrid ranking function that combines semantic similarity, disease alignment, and recruitment status. Top-5 trials are summarized with NCT IDs and PubMed-grounded explanations, while an ActiveSafetyFilter agent blocks or rewrites unsafe medical advice before responses reach the user. The assistant is delivered through a Streamlit web app deployed on Google Cloud Run, backed by a managed Qdrant cluster.

What I did (methods)

Multi-agent pipeline: Designed a zero-shot agentic workflow with SymptomParser, ProfileAgent, QdrantRetrievalAgent, DiagnosisAdvisor, and ActiveSafetyFilter agents, orchestrated around Gemini 2.0 Flash for parsing, summarization, and safety checks.
ClinicalTrials.gov preprocessing: Downloaded ~260k registry records, filtered to 113k trials across 14 target diseases, normalized text, constructed disease-specific synonym lists, and built a Qdrant collection with MiniLM embeddings plus rich trial metadata.
Hybrid retrieval & ranking: Implemented a hybrid scoring function (0.65 × semantic similarity + 0.25 × disease match + 0.10 × recruitment status) to rank trials and compute per-query confidence scores from the top-5 results.
Evaluation pipeline: Created HARD and ROBUST query sets (70 queries, 5 per disease) and computed Top-1, Recall@5, nDCG@5, MAP@5, confusion matrices, and error taxonomies (sparse recall vs semantic drift).
Web app & deployment: Built a Streamlit chat UI that displays ranked trial cards with NCT IDs, relevance scores, summaries, and links to ClinicalTrials.gov; containerized the app with Docker; and deployed it on Google Cloud Run with a managed Qdrant backend.
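
As an illustration of the hybrid ranking step described above, here is a minimal sketch in plain Python. Only the three weights (0.65, 0.25, 0.10) come from this write-up. The Candidate class, the binary disease-match and recruiting signals, the "RECRUITING" status string, and the placeholder trial IDs are all assumptions made for the example; the project's actual signals may be graded rather than binary.

```python
from dataclasses import dataclass

# Weights as stated in the write-up; everything else below is illustrative.
W_SEMANTIC, W_DISEASE, W_STATUS = 0.65, 0.25, 0.10


@dataclass
class Candidate:
    nct_id: str        # placeholder IDs in the example, not real trials
    similarity: float  # semantic similarity in [0, 1], e.g. a cosine score
    disease: str       # disease label attached to the trial
    status: str        # recruitment status string


def hybrid_score(c: Candidate, query_disease: str) -> float:
    """Weighted sum of semantic similarity, disease match, and status."""
    # Assumption: disease match is binary (exact label match).
    disease_match = 1.0 if c.disease == query_disease else 0.0
    # Assumption: only actively recruiting trials earn the status bonus.
    status_bonus = 1.0 if c.status.upper() == "RECRUITING" else 0.0
    return (W_SEMANTIC * c.similarity
            + W_DISEASE * disease_match
            + W_STATUS * status_bonus)


def rank(candidates, query_disease, k=5):
    """Return the k highest-scoring candidates, best first."""
    return sorted(candidates,
                  key=lambda c: hybrid_score(c, query_disease),
                  reverse=True)[:k]


EXAMPLE = [
    Candidate("NCT-A", 0.80, "asthma", "COMPLETED"),
    Candidate("NCT-B", 0.70, "diabetes", "RECRUITING"),
    Candidate("NCT-C", 0.75, "diabetes", "COMPLETED"),
]

if __name__ == "__main__":
    for c in rank(EXAMPLE, "diabetes"):
        print(c.nct_id, round(hybrid_score(c, "diabetes"), 4))
```

In this toy example the candidate with the highest raw similarity (NCT-A) ranks last for a diabetes query, because the disease and status terms outweigh its small similarity advantage. That is the intended effect of a disease-aware ranking.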

Key findings

Strong performance on clearer diseases: For diabetes, obesity, asthma, Parkinson’s disease, and prostate cancer, the system achieved high Top-1 and nDCG@5 on hard queries, confirming the value of disease-aware hybrid ranking.
Robustness gaps on noisy queries: Stroke, lung cancer, cardiovascular disease, and kidney disease showed performance drops on patient-style queries due to disease misclassification and sparse recall issues.
Agentic pipeline vs baseline: Compared with a Qdrant-only baseline, the HealthcareBot pipeline improved Top-1, nDCG@5, and MAP@5 while slightly lowering Recall@5, illustrating a trade-off between retrieval breadth (recall) and the precision gained from disease-aware ranking.
Safety behavior: Safety audits showed that responses were either passed or revised by the ActiveSafetyFilter, with no unsafe medical advice emitted in the evaluated scenarios.
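
For readers unfamiliar with the ranking metrics cited in these findings, the sketch below shows one common way to compute Top-1, Recall@5, nDCG@5, and per-query average precision (the quantity averaged over queries to give MAP@5). It assumes binary relevance labels and normalizes average precision by min(number of relevant trials, k). Both are conventions chosen for the example and may differ from the project's evaluation code, and the trial IDs are placeholders.

```python
import math


def top1(ranked, relevant):
    """1.0 if the first returned item is relevant, else 0.0."""
    return 1.0 if ranked and ranked[0] in relevant else 0.0


def recall_at_k(ranked, relevant, k=5):
    """Fraction of all relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & set(relevant)) / len(relevant)


def ndcg_at_k(ranked, relevant, k=5):
    """Binary-relevance nDCG: DCG of the ranking over DCG of an ideal one."""
    dcg = sum(1.0 / math.log2(i + 2)
              for i, doc in enumerate(ranked[:k]) if doc in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg else 0.0


def average_precision_at_k(ranked, relevant, k=5):
    """Mean of precision values at each relevant hit within the top k."""
    hits, total = 0, 0.0
    for position, doc in enumerate(ranked[:k], start=1):
        if doc in relevant:
            hits += 1
            total += hits / position
    denom = min(len(relevant), k)  # assumption: one common normalization
    return total / denom if denom else 0.0


# Placeholder IDs: two of three relevant trials are retrieved in the top 5.
RANKED = ["t1", "t2", "t3", "t4", "t5"]
RELEVANT = {"t1", "t3", "t9"}

if __name__ == "__main__":
    print("Top-1:", top1(RANKED, RELEVANT))
    print("Recall@5:", recall_at_k(RANKED, RELEVANT))
    print("nDCG@5:", ndcg_at_k(RANKED, RELEVANT))
    print("AP@5:", average_precision_at_k(RANKED, RELEVANT))
```

Recall@5 only counts how many relevant trials appear anywhere in the top five, whereas nDCG@5 and average precision reward placing them near the top. This is why a re-ranking step can raise the latter two while leaving recall flat or slightly lower.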

What this shows about me

Comfortable building end-to-end agentic RAG systems for high-stakes domains that combine vector search, hybrid ranking, and multi-agent orchestration.
Experienced working with large clinical registries, designing disease-specific filters, and engineering embeddings and vector-store schemas for retrieval at scale.
Able to design offline evaluation frameworks with ranking metrics, confusion matrices, and error analyses to understand system behavior in detail.
Focused on transparency and safety, exposing NCT IDs and PubMed evidence while enforcing domain-specific safety filters rather than generic content blocks.