Freely Available AI Chatbots for the Diagnosis of Inflammatory Dermatological Diseases: Focusing on Sensitivity and Specificity
HOCO, ALESIA
2024/2025
Abstract
Background: Inflammatory non-infectious dermatological diseases, such as acne, psoriasis, rosacea, and hidradenitis suppurativa, are widespread and frequently associated with psychosocial burden. Early recognition and adequate treatment are crucial to prevent chronicity and complications. In recent years, freely available artificial intelligence (AI) chatbots have been increasingly used for health-related queries, raising the question of their diagnostic reliability.

Methods: Patient-reported descriptions of cutaneous lesions, each with a dermatologist-confirmed diagnosis, were collected from 27 patients and submitted to three freely available AI chatbots (ChatGPT, Copilot, Gemini). Only three conditions (rosacea, acne, and psoriasis) were included in the analysis, as the other diseases in the dataset had too few cases for meaningful evaluation. Diagnostic performance was evaluated against the dermatologist-confirmed diagnosis as the gold standard. Sensitivity, specificity, and accuracy were calculated both for the first suggested diagnosis and for all differential diagnoses. Secondary outcomes included referral recommendations (immediate, conditional, none) and the use of sources.

Results: For first diagnoses, sensitivity was highly variable, ranging from low values (psoriasis, 0.50–0.57) to perfect values in some cases (acne, 1.00), while specificity was generally high (0.85–1.00). When all diagnoses were considered, sensitivity became consistently high (0.64–1.00), but specificity fluctuated widely, from very low (0.23 for psoriasis with Gemini) to higher values (up to 0.92). Consequently, overall accuracy remained in the low-to-medium range, with better performance in acne and rosacea and poorer results in psoriasis. Referral recommendations differed across platforms: Gemini tended toward immediate referral, ChatGPT showed more balanced patterns, and Copilot fell in between. Only Copilot occasionally cited sources; ChatGPT and Gemini did not.

Conclusion: Conversational AI is not yet reliable as a stand-alone diagnostic tool in dermatology. However, its high sensitivity in the all-diagnoses setting suggests potential as a supportive instrument for screening and patient education. Larger and more diverse datasets, clinical validation, and multimodal approaches combining text with images will be essential to strengthen its role as a complementary aid to dermatological care.

| File | Size | Format |
|---|---|---|
| Hoco.Alesia .pdf (restricted access) | 921.82 kB | Adobe PDF |
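
The abstract reports sensitivity, specificity, and accuracy per condition under two scoring rules: first suggested diagnosis only, and all differential diagnoses. The record does not show how these were computed, so the following is only a minimal sketch, assuming a one-vs-rest confusion matrix per condition. The `Case` class, the `metrics` function, and the toy cases are hypothetical and are not taken from the thesis.

```python
from dataclasses import dataclass


@dataclass
class Case:
    truth: str              # dermatologist-confirmed diagnosis (gold standard)
    suggestions: list[str]  # chatbot diagnoses, in the order they were given


def metrics(cases, condition, first_only):
    """One-vs-rest sensitivity, specificity and accuracy for one condition."""
    tp = fp = tn = fn = 0
    for case in cases:
        # Score either the first suggestion alone or the full differential list.
        considered = case.suggestions[:1] if first_only else case.suggestions
        predicted = condition in considered
        actual = case.truth == condition
        if predicted and actual:
            tp += 1
        elif predicted:
            fp += 1
        elif actual:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    accuracy = (tp + tn) / len(cases) if cases else float("nan")
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
    }


# Made-up cases for illustration only; these are NOT the thesis data.
toy_cases = [
    Case("psoriasis", ["eczema", "psoriasis"]),
    Case("psoriasis", ["psoriasis", "eczema"]),
    Case("acne", ["acne", "rosacea"]),
    Case("rosacea", ["rosacea", "psoriasis"]),
]

first = metrics(toy_cases, "psoriasis", first_only=True)
all_dx = metrics(toy_cases, "psoriasis", first_only=False)
print(first)   # sensitivity 0.5, specificity 1.0, accuracy 0.75
print(all_dx)  # sensitivity 1.0, specificity 0.5, accuracy 0.75
```

On these toy cases, counting every differential diagnosis raises sensitivity and lowers specificity, which is the same direction of trade-off the abstract describes for the all-diagnoses setting.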
Documents in UNITESI are protected by copyright, and all rights are reserved unless otherwise indicated.
https://hdl.handle.net/20.500.14251/3743