Do AI tools make clinicians better, or do they atrophy the very skills that keep patients safe? The history of medicine contains many episodes in which skill was “redistributed.” Chest percussion was largely forgotten after the advent of the X-ray, yet respiratory medicine has not collapsed; ultrasound has displaced much of palpation. Each time, practice reorganized and, overall, outcomes improved. So perhaps radiologists reading fewer raw CTs, or endoscopists leaning on real-time detectors, is simply progress. Yet AI differs from any instrument we have used before: it is a decision partner that reshapes where we direct attention and how we learn, and in doing so it can cause capability drift.
Is deskilling real?
Most trials of AI in medicine have asked whether the tool improves a performance metric while it is switched on. In colonoscopy, for instance, computer-aided detection (CADe) reliably increases the adenoma detection rate (ADR) – the field’s main quality indicator – across dozens of randomized trials, with meta-analyses reporting higher ADRs and lower miss rates versus standard colonoscopy. The most recent syntheses, spanning more than 40 trials, show consistent gains, even as effect sizes vary with baseline quality and context.
But what happens to human performance when the tool is off after a period of reliance? This question was addressed in a multicenter observational study from four Polish endoscopy centers. Investigators compared AI-off colonoscopies performed in two windows: the 3 months before AI was introduced (no prior AI exposure) and the 3 months after AI became routine. In these AI-off procedures by experienced endoscopists, ADR fell from 28.4% pre-implementation to 22.4% post-implementation.
The study had clear weaknesses: it was observational, unblinded, and entangled with service reorganizations. So it is not a final verdict on colonoscopy, let alone on AI. Still, it is, to our knowledge, the first study to suggest that continuous AI exposure may degrade a patient-relevant endpoint when the tool is absent.
European guidance had already warned of this risk; eye-tracking experiments have documented narrower visual sweeps with CADe on, and analogous work in mammography showed degraded clinician detection when AI support was expected. The Polish data connect those mechanistic concerns to a clinical quality indicator.
Across domains, a similar pattern appears: when people and algorithms are paired without deliberate human-factors design, the combined system can be less effective than either component alone. Adam Rodman, MD, MPH (Associate Editor of the New England Journal of Medicine), summarized results from his own trials and the broader literature: giving a human an AI system does not inherently improve the human’s performance – and it often helps less than letting the AI run by itself. The effect has been observed in diagnostic and management reasoning, breast imaging, and chest X-ray interpretation, among other tasks.

Automation bias and “verification complacency” shift attention, and, over time, the muscle memory and cognitive stamina needed for exhaustive search may erode. Newcomers are especially vulnerable to so-called “decoupled learning”: acquiring the task with AI from day one, without ever internalizing the unaided skill. Radiology educators have begun to measure both risks and remedies. A 2025 pilot with residents working alongside an explainable chest X-ray severity model found that accuracy and inter-rater agreement improved when support was used judiciously; importantly, trainees showed resilience when the model erred – evidence that teaching “how to work with AI” can counter pure dependence.
Put together, the evidence base supports two claims:
1) AI can raise task metrics when on;
2) Some configurations of exposure can lower human performance when off, especially if the workflow encourages over-reliance.
Human-in-the-loop, or human-on-the-loop?
“Humans in the loop” has become a reflexive mantra. Rodman challenged that premise: embedding clinicians at every micro-decision can deskill, introduce latency, and limit scalability; by contrast, human-on-the-loop designs delegate narrow tasks to automation while keeping humans in supervisory, escalation, and boundary-setting roles. In this framing, the human role shifts toward exception-handling, goal-setting, and system oversight – the places where human cognition adds the most value.
When researchers vary human-AI workflows, sequence and interface design prove decisive. Clinical imaging studies show that performance and vulnerability to automation bias depend on factors such as who reviews first, how the model justifies its output, and when the suggestion is revealed. Explanations that guide attention, rather than merely offering post-hoc rationales, tend to strengthen collaboration. By contrast, opaque saliency heatmaps often narrow vigilance and hinder joint performance.
Why keep a human at all?
A fair provocation follows: if standalone AI sometimes outperforms the human-AI combination, why insist on human involvement? Several answers arise from evidence and governance.
Distribution shift and brittleness. AI components are trained on the past; clinical practice lives in the moving present. Seemingly trivial context changes – lighting, patient mix, acquisition protocols – can invert model behavior. The human provides real-time model monitoring, triage, and fail-safe intervention. Deskilled humans cannot play that role.
Relational care. Stakeholders across clinical, regulatory, and patient communities point to “care work” (communication, consent, empathy, culture) as integral to quality, even where narrow perception tasks can be automated. The middle-ground view emerging from qualitative studies is to automate some tasks while preserving the distinctly human functions that hinge on context, meaning, and trust.
Liability and accountability. Today, moral and legal responsibility sits largely with individual clinicians. As Rodman put it, because the mechanisms to shift or share liability are absent, both providers and companies face distorted incentives, cooling investment and pushing risk to the bedside. Emerging discussions of LLM accountability likewise find that clinicians are currently treated as the final decision-makers, even when they rely on AI recommendations. Until regulatory and insurance frameworks mature, humans remain the accountable “last mile.”
What actually works to address deskilling?
Evidence points away from slogans and toward workflow and training design. Several practical levers recur across studies and early guidelines:
Structured “AI-off” practice to maintain baseline skill. Periodic sessions, simulations, and credentialing that require unaided performance preserve search patterns and stamina. Endoscopy teams and academic centers are piloting simulation blocks and deliberate “AI-off” rotations. In colonoscopy, prospective crossover trials and behavioral studies to refine such guardrails are explicitly called for.
Cognitive forcing strategies and explainable interfaces. Training clinicians to question model outputs on pre-specified error-prone cases (e.g. flat serrated lesions) and interfaces that guide attention (rather than merely overlaying boxes) reduce over-reliance.
Human-on-the-loop role clarity. Let automation handle repeatable, high-volume micro-tasks (e.g., pre-screening images, flagging candidates). Keep clinicians responsible for escalation decisions, informed consent, and patient communication. Experiment with the hand-off order: AI-first versus human-first review.
Quality metrics that watch the human. In endoscopy, tracking unaided ADR, not solely AI-on performance, will reveal capability drift. More generally, maintain “dual KPIs”: the system-level metric and a human-capability sentinel that is periodically measured without assistance (see the sketch after this list).
Education that treats AI as a co-resident, not an oracle. Early evidence with radiology residents shows that pairing training with transparent feedback can both raise agreement and build resilience to model failure – an antidote to “never-skilling.” Curricula should include failure mode libraries and calibration exercises.
Risk management beyond the department. Contracts and system governance should explicitly allocate liability across vendor, hospital, and clinician; audit trails should capture who saw what and when; and reimbursement models must recognize supervision time and exception handling. As long as liability sits squarely with individuals, adoption will either be timid or brittle.
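To make the “dual KPIs” lever concrete, here is a minimal sketch of what a capability-drift sentinel could look like in code. It assumes a hypothetical minimal record schema; the Procedure class, the capability_drift function, and the 3-percentage-point tolerance are illustrative choices, not part of any published monitoring protocol.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Procedure:
    """One colonoscopy record (hypothetical minimal schema)."""
    ai_assisted: bool       # was CADe active during the exam?
    adenoma_detected: bool  # was at least one adenoma found?


def adr(procedures: Iterable[Procedure]) -> float:
    """Adenoma detection rate: share of exams with at least one adenoma."""
    procs = list(procedures)
    if not procs:
        return float("nan")
    return sum(p.adenoma_detected for p in procs) / len(procs)


def capability_drift(procedures: Iterable[Procedure],
                     unaided_baseline: float,
                     tolerance: float = 0.03) -> dict:
    """Dual-KPI check for a review period: report AI-on and AI-off ADR,
    and flag drift if unaided ADR falls more than `tolerance` below the
    endoscopist's (or unit's) own pre-AI baseline."""
    procs = list(procedures)
    ai_on_adr = adr(p for p in procs if p.ai_assisted)
    ai_off_adr = adr(p for p in procs if not p.ai_assisted)
    return {
        "adr_ai_on": ai_on_adr,
        "adr_ai_off": ai_off_adr,
        "drift_flag": ai_off_adr < unaided_baseline - tolerance,
    }
```

As a rough illustration, taking the Polish pre-implementation figure (28.4%) as the unaided baseline, a post-implementation AI-off ADR of 22.4% would exceed the 3-point tolerance and trip the flag; in practice the threshold, the review window, and the level of aggregation (endoscopist, unit, or system) would need local calibration.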
So are we too apprehensive, or is this time different?
This time, deskilling is different. Unlike earlier technologies that sharpened vision, deepened understanding, and ultimately built new, often more advanced skills, AI invites the offloading of cognition and judgment in exchange for speed and apparent precision. The offer is tempting on many levels, yet it leaves an open question: what comes next, and which roles remain for humans? “Human-in-the-loop” frameworks assign oversight, but oversight works only when the overseer can still perform the task unaided. And given that, in many domains, AI learns faster than people do, keeping practitioners ahead of the very “trainee” they supervise will demand ever more creativity.
We are not yet at the point of unleashing systems without restraint, for reasons ranging from an unwillingness to give up the driver’s seat to the fact that AI is still too unreliable. So designs need guardrails that preserve human capability through continuous training, even at the cost of slower visible progress.

