From 60% to 84%: Building an AI Agent for Public Health Data
The fixes that actually worked weren't about prompts.

We built SaludAI, an open-source AI agent that takes clinical questions in natural language and queries a FHIR R4 server. Think: "How many patients with type 2 diabetes over 60 are in Buenos Aires?" The agent resolves the terminology, builds the FHIR queries, follows references across resource types, and returns a traceable answer.

No LangChain. The agent loop is ~300 lines of Python, with every step logged to Langfuse. When something breaks, we can read exactly why.

We benchmarked it: 100 questions, 200 synthetic Argentine patients, 10 FHIR resource types, 4 terminology systems. The setup is inspired by FHIR-AgentBench (Verily/KAIST/MIT), but runs on synthetic data, so the scores aren't directly comparable to their clinical dataset. Here's how accuracy evolved, and what each fix taught us.

60% → 82%: The agent wasn't seeing the data

Our FHIR client returned only the first page. With 687 immunizations and 437 encounters across 200 patients, most coun
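The first-page bug comes from FHIR search semantics: results arrive as a paginated Bundle, and later pages must be fetched by following the link entry whose relation is "next". A minimal sketch of that pattern (the helper name and the injected `get_json` callable are our own, not part of SaludAI's published code):

```python
def fetch_all_entries(get_json, first_url):
    """Follow a FHIR search Bundle's 'next' links so counts cover
    every page, not just the first.

    get_json: any callable that GETs a URL and returns the parsed
              Bundle as a dict (e.g. lambda u: requests.get(u).json()).
    """
    entries = []
    url = first_url
    while url:
        bundle = get_json(url)
        # Collect the resources on this page.
        entries.extend(e["resource"] for e in bundle.get("entry", []))
        # The server advertises the next page, if any, in Bundle.link.
        url = next(
            (link["url"] for link in bundle.get("link", [])
             if link.get("relation") == "next"),
            None,
        )
    return entries
```

With a loop like this in the client, a question such as "how many immunizations?" aggregates over all pages instead of silently stopping at the server's default page size.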
Continue reading on Dev.to