Dermatologists significantly improved their diagnostic accuracy for melanocytic lesions when they added a form of artificial intelligence (AI) to clinical decision making, a multicenter study showed.
Sensitivity for distinguishing melanomas from nevi increased from 84% without AI to 100% with it, and specificity improved from 72.1% to 83.7%. Mean accuracy and mean area under the receiver operating characteristic curve (AUC-ROC) also increased significantly when clinical acumen and AI were combined.
For most lesions, dermatologists found support from a convolutional neural network (CNN) reassuring and/or helpful, reported Holger A. Haenssle, MD, of the University of Heidelberg in Germany, and co-authors in JAMA Dermatology.
While patients were open-minded about CNN support, they rejected the idea of replacing clinician judgment with neural networks.
"To our knowledge, there have been no data showing to what extent dermatologists would apply CNN recommendations and revise their original decisions in a prospective clinical situation to date," the authors wrote of their findings. "Interestingly, within this study, all previously mentioned main outcome measures significantly improved after dermatologists gained access to CNN results. The results of this prospective study largely confirm data of retrospective studies using lesion images instead of live examinations."
"These results indicate that a broader application of this human-with-machine approach, particularly in nonspecialized institutions, could be beneficial to clinicians and patients," they added.
The clinician-and-machine approach has potentially broad application in dermatology.
"There are plenty of other instances where 'diagnostic AI systems' could be applied in dermatology," Haenssle said via email. "Every week there is a new publication [with proof-of-principle evidence for different indications] -- differentiating harmless drug-rash from TEN [toxic epidermal necrolysis], identifying onychomycosis from images of toenails and so on."
The successful diagnosis of melanocytic lesions provided the impetus for a new investigation "covering the full spectrum of the most relevant benign/malignant, melanocytic/nonmelanocytic, and pigmented/non-pigmented skin lesions," Haenssle added.
Previous studies of CNNs in dermatology were retrospective and evaluated the technology's diagnostic performance on images of lesions with verified diagnoses. Numerous studies have shown high-level performance but also important limitations of CNN technology, notably more false diagnoses for images that included artifacts, such as scale bars and skin markings, and for rare lesions in mucosal or subungual sites, the authors noted.
Additionally, several retrospective studies of human-machine collaborations compared the diagnostic performance of dermatologists with or without the availability of CNN classification results. Prospective and retrospective studies have notable differences. In a prospective study, patients are interviewed and examined, and clinical decisions have a direct association with patient well-being, whereas retrospective studies avoid the consequences of missing malignant lesions, Haenssle and co-authors said.
To address some of the limitations of prior studies, the investigators undertook a prospective study of 22 dermatologists in a mix of academic and community settings. Dermatologists performed skin cancer screenings using direct examination and dermoscopy. They graded the malignancy risk of melanocytic lesions (range 0-1, malignancy threshold ≥0.5) and suggested management decisions (no action, follow-up, or excision).
Suspect lesions were evaluated by a market-approved CNN (Moleanalyzer Pro) and graded for malignancy risk, using the same range and threshold as the dermatologists. The scores were shared with dermatologists with a request to re-evaluate the lesions and reconsider initial management decisions.
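The shared grading scale described above amounts to a simple threshold rule. As a minimal sketch (the function name and the example scores are hypothetical, and nothing here reflects how Moleanalyzer Pro computes its scores), a risk score on the 0-1 range is called malignant when it meets or exceeds 0.5:

```python
MALIGNANCY_THRESHOLD = 0.5  # threshold stated in the study: score >= 0.5


def classify(score: float) -> str:
    """Map a malignancy-risk score in the range 0-1 to a binary call."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in the range 0-1")
    return "malignant" if score >= MALIGNANCY_THRESHOLD else "benign"


# Hypothetical scores, for illustration only
for s in (0.12, 0.5, 0.87):
    print(s, classify(s))
```

Because dermatologists and the CNN used the same range and threshold, their calls for a given lesion can be compared directly.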
Reference diagnoses were based on histopathologic results for excised lesions or clinical follow-up and expert consensus for non-excised lesions. The primary outcomes were dermatologists' diagnostic sensitivity and specificity with and without the aid of CNN.
The dermatologists detected 228 suspect lesions (190 nevi and 38 melanomas) in 188 patients. The clinicians' diagnostic sensitivity and specificity improved significantly with the aid of CNN (P=0.03 and P<0.001, respectively). Mean accuracy increased from 74.1% to 86.4% (P<0.001) and mean AUC-ROC from 0.895 to 0.968 (P=0.005).
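The reported percentages can be checked against the lesion totals. The sketch below assumes confusion-matrix counts that were back-calculated from the published figures (190 nevi, 38 melanomas); the article does not state these counts, so they are an illustration that happens to be consistent with the reported values, not data from the paper.

```python
def metrics(tp: int, fn: int, tn: int, fp: int) -> tuple[float, float, float]:
    """Return (sensitivity, specificity, accuracy) as fractions."""
    sensitivity = tp / (tp + fn)   # melanomas correctly flagged
    specificity = tn / (tn + fp)   # nevi correctly cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# ASSUMED counts, inferred from the reported percentages (not in the source)
without_cnn = metrics(tp=32, fn=6, tn=137, fp=53)
with_cnn = metrics(tp=38, fn=0, tn=159, fp=31)

for label, (sens, spec, acc) in (("without CNN", without_cnn),
                                 ("with CNN", with_cnn)):
    print(f"{label}: sensitivity {sens:.1%}, "
          f"specificity {spec:.1%}, accuracy {acc:.1%}")
```

Under these assumed counts, the computed values round to the sensitivity, specificity, and accuracy figures reported in the study.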
CNN alone achieved diagnostic sensitivity similar to that of dermatologists, higher specificity, and higher diagnostic accuracy. Additionally, the number of unnecessary excisions of nevi decreased significantly by 19.2% (104/190 to 84/190, P<0.001).
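The 19.2% figure is consistent with a relative reduction, measured against the 104 excisions initially planned rather than against all 190 nevi. A short check, using only the counts given above (variable names are illustrative):

```python
nevi_total = 190
excisions_before = 104   # nevi slated for excision without CNN
excisions_after = 84     # nevi slated for excision with CNN

avoided = excisions_before - excisions_after

# Relative reduction: share of the originally planned excisions avoided
relative_reduction = avoided / excisions_before

# Absolute reduction: change in the excision rate, in percentage points
absolute_reduction_pp = avoided / nevi_total * 100

print(f"relative reduction: {relative_reduction:.1%}")
print(f"absolute reduction: {absolute_reduction_pp:.1f} percentage points")
```

The distinction matters when comparing studies: the same 20 avoided excisions correspond to a smaller change when expressed in percentage points of all nevi.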
Dermatologists with 2 to 5 years of experience examined 96 lesions (42.1%), less-experienced dermatologists examined 78 (34.2%), and those with more than 5 years of experience examined 54 (23.7%). The dermatologists with the least dermoscopy experience had the greatest improvement in diagnostic accuracy when working with the CNN (70.5% to 87.2%, P<0.01).
Diagnostic accuracy also improved significantly for those with 2 to 5 years of experience (77.1% to 91.7%, P=0.01). More experienced dermatologists had a small, nonsignificant improvement in accuracy with the aid of CNN (74.1% to 75.9%).
Disclosures
Haenssle reported personal fees from SciBase, FotoFinder Systems, HEINE Optotechnik, and Magnosco.
Co-authors reported relationships with FotoFinder Systems, Amgen, Bristol Myers Squibb, MSD, Philochem, Roche, and HEINE Optotechnik.
Primary Source
JAMA Dermatology
Winkler JK, et al "Assessment of diagnostic performance of dermatologists cooperating with a convolutional neural network in a prospective clinical study: Human with machine" JAMA Dermatol 2023; DOI: 10.1001/jamadermatol.2023.0905.