SCT-Bench

Evaluating AI Clinical Reasoning Using Script Concordance Testing


Competition Now Open!

Submit your model to the 2025 SCT-Bench Competition by May 2025. Winners will be featured on our benchmark website.

What is SCT-Bench?

SCT-Bench is a benchmark for evaluating clinical reasoning in large language models (LLMs) using Script Concordance Tests (SCTs). An SCT is scored by comparing the model's response to a clinical scenario with the responses of a panel of expert clinicians: the model earns credit according to how many experts chose the same answer, with the most common expert answer earning full credit. We compare the performance of state-of-the-art LLMs with that of physicians and students on a novel set of SCT questions.

Why This Competition?

Script Concordance Testing is a validated medical assessment tool designed to evaluate clinical reasoning under uncertainty. Unlike traditional multiple-choice questions, SCTs measure how new information alters diagnostic and treatment hypotheses—a critical aspect of real-world clinical decision-making.

2025 Competition Leaderboard

Will your model perform better than the current leaders on the private test set?

Rank Model/Group Overall Score (%)
🚀 Submit your model

Submission deadline: May 2025.

Top LLMs vs. Physicians and Students

This leaderboard will be continually updated to include the four highest-performing models measured on the complete private test set. Note: Some values are missing (-) as they were not recorded at the original SCT testing sites.

About SCT-Bench

Dataset Overview

SCT-Bench comprises 750 SCT questions drawn from diverse international datasets, including the Open Medical SCT and Adelaide SCT datasets. Each question is designed to evaluate clinical reasoning under uncertainty.

Example SCT Question with Expert Scoring

Clinical stem: A 27-year-old male presents to the doctor with weakness affecting his right arm. He has a manually repetitive job and also suffered a shoulder dislocation while playing sport 1 week ago.
If you were thinking of: Carpal tunnel syndrome
And then you find: He also complains of "shooting" pain in his neck
This diagnosis becomes:

Category                     Expert opinions   Score values
Much less likely (-2)        10                1.0
Slightly less likely (-1)    7                 0.7
Neither more nor less (0)    0                 0
Slightly more likely (1)     0                 0
Much more likely (2)         0                 0

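Each SCT question presents a clinical scenario, a diagnostic or treatment hypothesis, and a piece of new information. Expert clinicians rate how this new information affects the likelihood of the hypothesis. Their responses are aggregated into weighted scores: the most common expert response receives a score of 1.0, and every other response receives partial credit in proportion to the number of experts who chose it (in the example above, 7 votes against a modal 10 gives 0.7).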
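The aggregate scoring shown in the example can be sketched in a few lines of Python. This is a minimal illustration, not the benchmark's actual scoring code: the function names score_values and credit and the dictionary layout are assumptions made here. The only rule taken from the example is that each rating's score is its expert count divided by the count of the most common rating.

```python
# Hypothetical sketch of SCT aggregate scoring; names and data layout are
# illustrative assumptions, not part of the SCT-Bench codebase.

LIKERT = (-2, -1, 0, 1, 2)  # the five ratings from the example table


def score_values(expert_counts):
    """Return the credit for each rating: its count / the modal count."""
    modal = max(expert_counts.values(), default=0)
    if modal == 0:
        raise ValueError("at least one expert response is required")
    return {rating: expert_counts.get(rating, 0) / modal for rating in LIKERT}


def credit(model_rating, expert_counts):
    """Credit a model earns for choosing model_rating on one question."""
    return score_values(expert_counts)[model_rating]


# Expert opinions from the carpal tunnel example: 10 chose -2, 7 chose -1.
example_counts = {-2: 10, -1: 7, 0: 0, 1: 0, 2: 0}
weights = score_values(example_counts)
print(weights)
```

With the example counts, a model answering "Much less likely" (-2) would earn full credit, one answering "Slightly less likely" (-1) would earn 0.7, and any other rating would earn nothing.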