AI vs. Experts: Who Detects Wound Complications More Effectively?

Can artificial intelligence compete with healthcare professionals in diagnosing wound-healing complications? A study published in July 2025 in the Journal of the American Medical Informatics Association provides nuanced answers—and a few surprises about what truly defines an “expert.”

The Challenge: Detecting Wound Maceration

Maceration – the swelling and softening of tissues caused by prolonged contact with fluids – is a common complication of chronic wounds. Its detection relies mainly on visual assessment, making it an ideal testing ground for evaluating AI’s capabilities in medical imaging.

German researchers presented 30 images of chronic wounds to two types of evaluators: 481 healthcare professionals (physicians and nurses) and an artificial intelligence model based on a convolutional neural network.

The Results: An AI Advantage—But Not an Overwhelming One

Unsurprisingly, the AI achieved higher scores:

Metric	Humans	AI
Accuracy	79.3%	90%
Sensitivity	76.4%	93.3%
Specificity	83%	86.7%

However, the difference is not statistically significant when the AI is compared to all participants as a whole. The gap becomes significant only when measured against the lowest-performing groups.
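For readers who want to see how the three metrics in the table relate to one another, here is a minimal Python sketch. The confusion-matrix counts are an assumption, not figures quoted from the paper: they are the only whole-number counts that reproduce the AI's reported percentages if the 30 images are split evenly into 15 with maceration and 15 without. The function name `diagnostic_metrics` is likewise illustrative.

```python
def diagnostic_metrics(tp: int, fn: int, tn: int, fp: int) -> dict[str, float]:
    """Compute accuracy, sensitivity and specificity from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,     # share of all images classified correctly
        "sensitivity": tp / (tp + fn),     # share of macerated wounds that were detected
        "specificity": tn / (tn + fp),     # share of non-macerated wounds correctly cleared
    }


# ASSUMED counts (inferred from the percentages, not reported in this article):
# 15 macerated images (14 detected, 1 missed), 15 non-macerated (13 cleared, 2 false alarms).
ai = diagnostic_metrics(tp=14, fn=1, tn=13, fp=2)

for name, value in ai.items():
    print(f"{name}: {value:.1%}")
```

With only 30 images, each misclassified image shifts accuracy by more than three percentage points, which helps explain why a gap of roughly ten points may still fall short of statistical significance.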

The Real Surprise: What Defines Expertise

This is where the study becomes truly compelling. The researchers analyzed which factors influenced the accuracy of human diagnoses. Only two variables had a statistically significant impact:

What Matters:

Formal specialized qualifications (dermatologists or nurses certified in wound care)
Participants’ diagnostic self-confidence

What Doesn’t (or Barely) Matter:

Years of professional experience
Working specifically in the field of wound care
Age or gender

In other words, a newly certified nurse who is confident in their skills may outperform a colleague with 20 years of experience but without specialized training.

A Revealing Disagreement Among Professionals

The study also measured inter-rater agreement among the human evaluators. The result is concerning: overall agreement was only “fair” (Kappa = 0.391). This means that the same patient could receive different diagnoses depending on which healthcare professional they consult.

Professionals with specialized training and high self-confidence demonstrated much stronger agreement among themselves—suggesting that they rely on similar assessment criteria, likely acquired through formal training.
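To make the kappa figure more concrete, the sketch below computes Fleiss' kappa, a common agreement statistic when many raters classify the same items. This article does not state which kappa variant the authors used, so treating it as Fleiss' kappa is an assumption, and the ratings in the example are invented purely for illustration.

```python
def fleiss_kappa(counts: list[list[int]]) -> float:
    """Fleiss' kappa for a table where counts[i][j] is the number of raters
    who assigned item i to category j. Every row must sum to the same
    number of raters."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])

    # Proportion of all ratings that fall into each category.
    p_category = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]

    # Observed agreement: for each item, the share of rater pairs that agree,
    # averaged over all items.
    p_observed = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_items

    # Agreement expected if raters chose categories at random
    # according to the overall category proportions.
    p_expected = sum(p * p for p in p_category)

    return (p_observed - p_expected) / (1 - p_expected)


# HYPOTHETICAL data: 4 images, 10 raters each,
# columns = [rated "maceration", rated "no maceration"].
example = [[10, 0], [8, 2], [5, 5], [1, 9]]
kappa = fleiss_kappa(example)
print(f"kappa = {kappa:.3f}")
```

A value of 1 means the raters agree perfectly, while 0 means they agree no more often than chance would predict. On the widely used Landis and Koch scale, values between 0.21 and 0.40 are labeled "fair", which is where the study's 0.391 falls.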

What Are the Implications for Clinical Practice?

The authors draw several key lessons from these findings:

AI as a Safety Net. Professionals without specialized training, with low self-confidence, or with limited experience are the ones who would benefit most from an AI-based diagnostic support system. However, caution is essential: they are also the most likely to accept the machine’s recommendations uncritically, including its errors.

Rethinking the Concept of Expertise. Simply relying on professional title or years of experience to define an “expert” in human–AI comparative studies is insufficient. Specific training and psychological traits such as self-confidence play a decisive role.

AI as a Complement, Not a Replacement. Humans draw on context, such as the patient’s overall condition and medical history, as well as clinical intuition, none of which AI can (yet) capture from a single image.

Limitations to Keep in Mind

The study has several important limitations:

The images were sourced from a single hospital center, which may limit the generalizability of the results.
The task was relatively simple (clear-cut cases of maceration versus no maceration).
Real-world working conditions (time pressure, fatigue) were not replicated.

In Conclusion

This study encourages us to move beyond the simplistic question, “Is AI better than humans?” and instead focus on a more meaningful one: “How can AI support different profiles of healthcare professionals?”

The answer appears clear: algorithmic assistance would deliver the greatest added value to the least specialized professionals—provided that system reliability is ensured and users are trained to maintain a critical perspective on the machine’s recommendations.

Source: https://academic.oup.com/jamia/article/32/9/1425/8203516