Case Mysteries, LLM Mysticism

Journal-to-journal transmission is now occurring as the paper planting continues

Simple puzzlers like those found in Encyclopedia Brown can be fun. They’re brief, have winsome characters, and add small amounts of peril from which the protagonist always emerges unscathed while the antagonist gets her or his comeuppance. They’re predictable in structure and outcome. There’s always a solution, always an answer.

There is a related genre in medical journals called the “case mystery.” These are usually actual cases recast into constrained editorial conceits in order to couch a mystery for readers to solve. They’re fun editorial features if done right. I’ve edited, written, ghostwritten, and reviewed these kinds of things numerous times over the years, while also interviewing readers and editors about their value, quality, and relevance.

As anyone who has written test questions or padded out a case narrative to fill space knows, writing distractors and inventing filler can be the toughest part. It’s especially tough because harried readers are apt to solve the structure, skip the BS, and jump to the solution, returning to enjoy the conceit and drama only when time allows, if at all.

A recent study in Science purports to demonstrate that LLMs can perform as well as or better than physicians in evaluating various written mystery cases through triage and differential diagnoses.

  • The authors pretentiously frame their work as pursuing a logical approach posed in 1959 and conveniently published in Science.
    • This pre-1960 paper was published before most of modern computing and clinical science existed. The Salk vaccine was brand new. It was a year after the AAFP started requiring CME, 22 years before the founding of the ACCME itself, years before citation systems were implemented, decades before online search tools, and decades before clinical evidence systems.

All this new study might actually show is that written medical mystery cases (a burlesque of actual medicine) work for inference engines (a burlesque of actual thinking) because so many little clues are available to the “attentive” reader — and perhaps that inference engines know to skip to the end and dispense with the conceit. As Eric Topol writes, critiquing this paper and those like it:

Most of the many publications [promoting AI decision support] use case studies, simulations, and actors as patients. Hardly representative of the messy world of the practice of medicine.

A Paper With a Purpose and a Plan

We have to begin with a pesky question: Why is a study of medical decisions in Science? Was it easier to shop it to a general science journal because actual physician editors would be harder to get it past? Was it easier to sneak in because it referred to another Science paper?

    • Encyclopedia senses something amiss . . .

The study first appeared on arXiv in December 2024. Its v3 was posted on arXiv the same day the paper was submitted to Science in June 2025.
