Why I’m writing this in the first person
I read a lot of pieces about AI in healthcare applications. Most of them feel like they were assembled from press releases. They have a confident voice, a tidy three-trend structure, and almost no contact with what a clinic actually feels like at 4 p.m. on a winter Monday when half the rota has called in sick.
So I’ll just write what I’ve seen.
I’m a general practitioner with 12+ years of clinical experience. For the last year I’ve been building two clinical AI projects — SAFE-Triage (a triage and safety layer for clinical AI) and Hathor (vaccine-card and immunisation reconciliation). They’re not slideware. They are messy, partially shipped, real codebases that I argue with at night. That work — and the very specific failures inside it — is the lens I’m going to use here.
What “AI in healthcare applications” actually means right now
When people say AI in healthcare applications, they usually mean one of four very different things, and conflating them is most of why the conversation goes badly:
- Decision support inside the consultation. An AI that nudges a clinician at the point of care — differentials, drug interactions, missed red flags. The vast majority of approved products live here.
- Operational AI. Scheduling, prior-auth, claims, documentation, ambient scribes. This is the most quietly profitable category and the one I’d bet on if I were investing today.
- Imaging and signal AI. Radiology, pathology, ECG, retinal photos. The most regulated, the most validated, and the most over-claimed in marketing copy.
- Patient-facing AI. Symptom checkers, triage chatbots, post-discharge nudges. The most visible and the one that, in my view, carries the most concentrated harm risk if it’s done badly.
I’m going to keep coming back to these four, because almost every disagreement I’ve had about AI in healthcare disappears when both sides agree which bucket they’re talking about.
What I learned shipping SAFE-Triage
SAFE-Triage started as a fairly arrogant idea. I had read enough triage AI papers to think the field’s main problem was just sloppy safety scaffolding — that if you wrapped an LLM in the right guardrails, you could get something usefully better than the average call-handler. Two months in I knew I was wrong about two things.
First, “safety” is not a layer. It’s the architecture. The cases where AI in healthcare applications fail catastrophically are not the cases the developer imagined; they’re the cases nobody imagined because the patient phrases their complaint in a way that doesn’t map onto any of the model’s training data. The fix is not a better safety prompt. The fix is a system that can tell when it doesn’t know — and refuses to triage at all.
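To make that concrete, here's a rough sketch of what I mean by refusal as architecture. Everything in it is illustrative (the names and thresholds are invented, not SAFE-Triage's actual code); the point is that the refusal path exists before and after the model runs, not inside a prompt.

```python
from dataclasses import dataclass

# Illustrative sketch only: none of these names are SAFE-Triage's real
# interfaces. It shows the shape of "refuse to triage when out of scope".

@dataclass
class TriageDecision:
    disposition: str      # e.g. "self-care", "GP appointment", "A&E now"
    confidence: float     # model's own estimate, 0.0 to 1.0
    abstained: bool       # True means: hand this straight to a human
    reason: str

CONFIDENCE_FLOOR = 0.85   # invented threshold; in practice set from eval data

def triage(case_text: str, model, scope_checker) -> TriageDecision:
    # 1. Refuse before the model even runs if the presentation doesn't
    #    resemble anything in the curated case mix.
    if not scope_checker.is_in_scope(case_text):
        return TriageDecision("escalate to human", 0.0, True,
                              "presentation does not match any known case family")

    disposition, confidence = model.predict(case_text)

    # 2. Refuse after the model runs if it isn't sure enough.
    if confidence < CONFIDENCE_FLOOR:
        return TriageDecision("escalate to human", confidence, True,
                              "model confidence below calibrated floor")

    return TriageDecision(disposition, confidence, False,
                          "within scope and calibrated range")
```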
Second, the eval set is the whole product. Whoever owns the eval set owns the model. I now spend more time curating the case mix than fine-tuning. If you take one thing from this whole piece, take that: in any clinical AI you adopt, ask to see the eval set before you ask to see the accuracy number.
If a vendor shows you a benchmark number and won’t show you the eval cases, that benchmark number means nothing.
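Here's a toy illustration, with invented numbers, of why I keep saying that: one aggregate accuracy figure can hide a case category the model handles badly, and you only see it if you can slice the eval cases yourself.

```python
from collections import defaultdict

# Invented data for illustration: the headline number looks fine,
# the chest-pain slice does not.

results = [
    # (case_category, model_was_correct)
    ("chest pain", True), ("chest pain", False), ("chest pain", False),
    ("rash", True), ("rash", True), ("rash", True), ("rash", True),
    ("headache", True), ("headache", True), ("headache", True),
]

by_category = defaultdict(lambda: [0, 0])  # category -> [correct, total]
for category, correct in results:
    by_category[category][1] += 1
    if correct:
        by_category[category][0] += 1

overall = sum(c for c, _ in by_category.values()) / sum(t for _, t in by_category.values())
print(f"headline accuracy: {overall:.0%}")      # 80%, looks respectable
for category, (correct, total) in by_category.items():
    print(f"  {category}: {correct}/{total}")   # chest pain: 1/3, not respectable
```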
What I learned building Hathor
Hathor is, on paper, a boring problem: reconcile messy vaccine cards (paper, photo, EMR fragments, Arabic, English, French) into a single defensible immunisation record. In practice it’s the most technically and clinically interesting AI in healthcare application I’ve built, for one reason: the ground truth is contested.
A child may have three different vaccine records — one from the rural clinic, one from school, one from the parents’ memory. None of them is automatically wrong. The AI’s job isn’t to pick a winner; it’s to surface the conflict honestly so a human can decide. That is a very specific design constraint and it’s the one I’d argue most clinical AI products miss. They optimise for confidence; what clinicians actually need is calibrated humility.
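A rough sketch of that constraint (invented field names, not Hathor's real schema): the reconciliation output keeps every source and marks the conflict, instead of quietly picking one.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of the design constraint, not Hathor's actual code:
# keep every source, flag disagreement, never auto-resolve it.

@dataclass
class DoseRecord:
    vaccine: str
    date: str | None   # None when the source only says "given"
    source: str        # "rural clinic card", "school record", "parental recall"

@dataclass
class ReconciledDose:
    vaccine: str
    candidates: list[DoseRecord] = field(default_factory=list)
    status: str = "unresolved"   # "agreed" | "conflict" | "unresolved"

def reconcile(records: list[DoseRecord]) -> list[ReconciledDose]:
    by_vaccine: dict[str, ReconciledDose] = {}
    for rec in records:
        slot = by_vaccine.setdefault(rec.vaccine, ReconciledDose(rec.vaccine))
        slot.candidates.append(rec)
    for slot in by_vaccine.values():
        dates = {c.date for c in slot.candidates if c.date is not None}
        if len(dates) > 1:
            slot.status = "conflict"     # sources disagree: surface all of them
        elif len(dates) == 1 and len(slot.candidates) > 1:
            slot.status = "agreed"       # independent sources corroborate
        else:
            slot.status = "unresolved"   # a single or undated source: a human decides
    return list(by_vaccine.values())
```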
This is where I think the next wave of useful AI in healthcare applications sits — in the gap between a confident wrong answer and a useful “I’m not sure, here’s why.”
Where I think the hype is wrong (and where it’s earned)
Wrong: Autonomous medical AI. Not in the regulatory sense — that’s a separate fight (I’ll write about the licensing question soon) — but in the practical one. We don’t have a good answer yet for who pays when an autonomous system gets it wrong, and until we do, “autonomous” is a marketing word, not a clinical one.
Wrong: AI replacing radiologists. Five years of this story, still wrong. What’s actually happening: radiologists with AI assistance are clearing worklists faster, missing fewer subtle findings, and now have a documentation problem instead of a reading problem.
Earned: Ambient documentation. This one is real. The category will consolidate fast in the next 18 months, and primary care will be the biggest beneficiary because primary care has the worst documentation burden of any specialty. (I have opinions on which scribes work and which don’t — that’s a whole separate post.)
Earned: Patient-side triage when bounded. A symptom checker that knows it can’t safely triage chest pain in a 50-year-old smoker — one that flags it as “go to A&E now” without trying to be clever — is a net good. The bad versions are the ones that try to handle everything.
What I’d build next if I were starting over
If a younger version of me were asking what to work on in AI in healthcare applications in 2026, I’d say:
- Build the eval set first, the model second. Spend three months collecting 500 real cases with disagreement. That is your moat. (There’s a sketch of what one such case can look like just after this list.)
- Pick the narrowest possible niche. “AI in healthcare” is not a market. “Reconciling paediatric vaccine cards across two languages” is.
- Talk to the people who’ll have to use it daily. Not the CMO, not the procurement lead — the FY2 doctor who’ll click your button 80 times a shift.
- Be honest about what you don’t know. The calibrated-humility thing again. It is the single biggest underrated property in clinical AI right now.
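For the first point, here is roughly what one such case can look like. The fields are invented for illustration, not my actual schema; what makes it valuable is that it records who disagreed and why the case was hard, not just a single "correct" answer.

```python
from dataclasses import dataclass

# Invented fields, not my actual schema: a useful eval case carries
# the disagreement, not only a gold label.

@dataclass
class EvalCase:
    case_id: str
    presentation: str                 # anonymised free-text complaint
    clinician_labels: dict[str, str]  # reviewer -> disposition they chose
    adjudicated: str | None           # consensus after discussion, None if still contested
    why_hard: str                     # one sentence on what makes the case hard

case = EvalCase(
    case_id="demo-001",
    presentation="tight chest 'like last week's indigestion', 54, smoker",
    clinician_labels={"gp_1": "A&E now", "gp_2": "same-day GP", "nurse_1": "A&E now"},
    adjudicated="A&E now",
    why_hard="patient frames a cardiac red flag as a repeat of a benign episode",
)
```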
What’s next on this site
I’m planning a follow-up on the eval-set problem — what mine looks like, how I collect cases with genuine clinical disagreement, and the specific reasons most public benchmarks for clinical AI are misleading. If you want it when it lands, the newsletter is the easiest way; otherwise I’ll cross-post recognition updates from my About page as the Harvard HSIL hackathon, the Cerebral Valley × Anthropic Claude Code hackathon, and the Africa Health ExCon submissions progress.
For the rest of the AI in healthcare applications archive on this site, the AI in Healthcare category is the canonical index.
About the author. Dr. Ahmed Zayed is a GP with 12+ years of clinical experience and the founder of ZayedMD. He builds SAFE-Triage and Hathor, publishes physician-led coverage of AI in healthcare, and writes in both English and Arabic. The full bio is on the About page.
Medical disclaimer. This article is editorial commentary, not medical advice. Nothing here should be used as a substitute for consultation with a qualified clinician. If you are unwell, see a doctor.


