Agentic RAG Clinical Trial Discovery Chatbot
A patient- and clinician-facing clinical trials search assistant that uses agentic RAG over 113k ClinicalTrials.gov records to surface grounded, safety-filtered trial options for 14 major diseases.
Abstract
This project develops a multi-agent, retrieval-augmented chatbot that lets patients and clinicians query ClinicalTrials.gov in natural language and receive ranked, explainable trial matches. The system embeds 113,247 trials into a Qdrant vector database using all-MiniLM-L6-v2 and applies a disease-aware hybrid ranking function that combines semantic similarity, disease alignment, and recruitment status. Top-5 trials are summarized with NCT IDs and PubMed-grounded explanations, while an ActiveSafetyFilter agent blocks or rewrites unsafe medical advice before responses reach the user. The assistant is delivered through a Streamlit web app deployed on Google Cloud Run, backed by a managed Qdrant cluster.
What I did (methods)
- Multi-agent pipeline: Designed a zero-shot agentic workflow with SymptomParser, ProfileAgent, QdrantRetrievalAgent, DiagnosisAdvisor, and ActiveSafetyFilter agents, orchestrated around Gemini 2.0 Flash for parsing, summarization, and safety checks.
- ClinicalTrials.gov preprocessing: Downloaded ~260k registry records, filtered to 113k trials across 14 target diseases, normalized text, constructed disease-specific synonym lists, and built a Qdrant collection with MiniLM embeddings plus rich trial metadata.
- Hybrid retrieval & ranking: Implemented a hybrid scoring function (0.65 semantic similarity, 0.25 disease match, 0.10 recruitment status) to rank trials and compute per-query confidence scores from the top-5 results.
- Evaluation pipeline: Created HARD and ROBUST query sets (70 queries, 5 per disease) and computed Top-1, Recall@5, nDCG@5, MAP@5, confusion matrices, and error taxonomies (sparse recall vs semantic drift).
- Web app & deployment: Built a Streamlit chat UI that displays ranked trial cards with NCT IDs, relevance scores, summaries, and links to ClinicalTrials.gov, containerized the app, and deployed it on Google Cloud Run with a managed Qdrant backend.
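A minimal sketch of the hybrid ranking step from the list above. Only the three weights (0.65, 0.25, 0.10) come from this write-up. The `Candidate` fields, the binary disease and status signals, the mean-score confidence, and the NCT IDs in the usage note are illustrative assumptions, not the project's actual code.

```python
from dataclasses import dataclass

# Weights as stated in the write-up; everything else below is an assumption.
W_SEMANTIC, W_DISEASE, W_STATUS = 0.65, 0.25, 0.10


@dataclass
class Candidate:
    nct_id: str
    semantic: float      # similarity from the vector search, assumed scaled to [0, 1]
    disease_match: bool  # assumed: trial disease label equals the parsed query disease
    recruiting: bool     # assumed: binary recruitment-status signal


def hybrid_score(c: Candidate) -> float:
    """Weighted sum of semantic similarity, disease match, and status."""
    return (
        W_SEMANTIC * c.semantic
        + W_DISEASE * float(c.disease_match)
        + W_STATUS * float(c.recruiting)
    )


def rank(candidates: list[Candidate], k: int = 5) -> list[Candidate]:
    """Return the k highest-scoring candidates, best first."""
    return sorted(candidates, key=hybrid_score, reverse=True)[:k]


def confidence(top: list[Candidate]) -> float:
    """Assumed definition: mean hybrid score over the returned top-k."""
    if not top:
        return 0.0
    return sum(hybrid_score(c) for c in top) / len(top)
```

For example, with placeholder IDs, `rank([Candidate("NCT00000001", 0.9, False, False), Candidate("NCT00000002", 0.6, True, True)])` puts the second trial first: its disease match and recruiting status outweigh the first trial's higher semantic similarity.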
Key findings
- Strong performance on more clearly defined diseases: For diabetes, obesity, asthma, Parkinson’s disease, and prostate cancer, the system achieved high Top-1 and nDCG@5 on the HARD queries, confirming the value of disease-aware hybrid ranking.
- Robustness gaps on noisy queries: Stroke, lung cancer, cardiovascular disease, and kidney disease showed performance drops on patient-style queries due to disease misclassification and sparse recall issues.
- Agentic pipeline vs baseline: Compared with a Qdrant-only baseline, the HealthcareBot pipeline improved Top-1, nDCG@5, and MAP@5 while slightly lowering Recall@5, illustrating a precision-recall trade-off: disease-aware ranking sharpens the top results at some cost to breadth.
- Safety behavior: Safety audits showed that responses were either passed or revised by the ActiveSafetyFilter, with no unsafe medical advice emitted in the evaluated scenarios.
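For reference, the ranking metrics cited in these findings can be sketched as follows. This assumes binary relevance (a trial is either relevant to the query or not) and uses hypothetical function names. The write-up does not state how relevance was defined or how the metrics were implemented, so treat this as one standard formulation rather than the project's evaluation code.

```python
import math


def top1(ranked: list[str], relevant: set[str]) -> float:
    """1.0 if the first returned item is relevant, else 0.0."""
    return 1.0 if ranked and ranked[0] in relevant else 0.0


def recall_at_k(ranked: list[str], relevant: set[str], k: int = 5) -> float:
    """Fraction of all relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def ndcg_at_k(ranked: list[str], relevant: set[str], k: int = 5) -> float:
    """Binary-gain nDCG: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(i + 2)
        for i, doc in enumerate(ranked[:k])
        if doc in relevant
    )
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal if ideal else 0.0


def ap_at_k(ranked: list[str], relevant: set[str], k: int = 5) -> float:
    """Average precision over the top k; MAP@k is its mean across queries."""
    hits, total = 0, 0.0
    for i, doc in enumerate(ranked[:k]):
        if doc in relevant:
            hits += 1
            total += hits / (i + 1)  # precision at this rank
    denom = min(len(relevant), k)
    return total / denom if denom else 0.0
```

Under this formulation, a pipeline that promotes relevant trials toward rank 1 raises Top-1, nDCG@5, and MAP@5 even when a stricter filter leaves fewer relevant trials in the top five, which is the pattern reported above.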
What this shows about me
- Comfortable building end-to-end agentic RAG systems for high-stakes domains that combine vector search, hybrid ranking, and multi-agent orchestration.
- Experienced working with large clinical registries, designing disease-specific filters, and engineering embeddings and vector-store schemas for retrieval at scale.
- Able to design offline evaluation frameworks with ranking metrics, confusion matrices, and error analyses to understand system behavior in detail.
- Focused on transparency and safety, exposing NCT IDs and PubMed evidence while enforcing domain-specific safety filters rather than generic content blocks.