Advantage of grading classification using volumetric artificial intelligence for periventricular hyperintensity and deep subcortical white matter hyperintensity.
Authors
Affiliations (8)
- Department of Neurosurgery, Graduate School of Biomedical and Health Sciences, Hiroshima University, 1-2-3 Kasumi, Minami-ku, Hiroshima, Hiroshima, 734-8551, Japan.
- Department of Neurosurgery, Graduate School of Biomedical and Health Sciences, Hiroshima University, 1-2-3 Kasumi, Minami-ku, Hiroshima, Hiroshima, 734-8551, Japan. [email protected].
- Department of Neurosurgery, Shimane Prefectural Central Hospital, 4-1-1 Himebara, Izumo, Shimane, 693-8555, Japan. [email protected].
- LPIXEL Inc, 1-6-1 Otemachi, Chiyoda-ku, Tokyo, 100-0004, Japan.
- Department of Radiology, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan.
- Department of Neurosurgery, Nakamura Memorial Hospital, South-1, West-14, Chuo-ku, Sapporo, 060-8570, Hokkaido, Japan.
- Department of Neurosurgery, Shin-yurigaoka General Hospital, 255, Furusawa-Miyako, Kawasaki Asao-ku, Kanagawa, 215-0026, Japan.
- Department of Neurology, Hokuto Hospital, 7-5 Kisen, Inada-cho, Obihiro, 080-0833, Hokkaido, Japan.
Abstract
We developed and validated an artificial intelligence (AI) algorithm for the automated grading of periventricular hyperintensity (PVH) and deep subcortical white matter hyperintensity (DWMH) on magnetic resonance imaging. Overall, 246 patients were evaluated, with 137 allocated to the training group and 109 to the testing group. AI-predicted Fazekas grades were compared with expert assessments using accuracy, F1-score, and mean absolute error. Inter-rater agreement was evaluated using Fleiss' kappa to assess consistency among human raters and Cohen's kappa to measure agreement between the AI and each individual human rater. In PVH classification, the AI achieved higher multi-class accuracy than the human expert (0.798 versus 0.743). In DWMH classification, the AI outperformed the expert specifically in distinguishing Fazekas grades 0, 1, and 2 from grade 3 (accuracy of 0.954 versus 0.927). Inter-rater agreement analysis showed that the AI achieved "good agreement" with human raters for both PVH and DWMH. For PVH, the AI's agreement with human raters exceeded the agreement among the human raters themselves. The developed AI also exhibited lower variability in the volume ratio distribution within the same grade than the human raters did. The developed AI algorithm effectively distinguished between PVH and DWMH, achieving accuracy comparable to human performance.
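The evaluation metrics named in the abstract can be illustrated with a minimal sketch. The grade lists below are invented toy values, not study data, and the helper names (`accuracy`, `mean_absolute_error`, `cohen_kappa`, `binarize`) are hypothetical, not the authors' code. The sketch assumes Fazekas grades are encoded as integers 0 to 3 and that the grade 0/1/2 versus grade 3 comparison is a simple threshold at 3. F1-score and Fleiss' kappa are omitted for brevity.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    # Fraction of cases where the two gradings match exactly.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_absolute_error(y_true, y_pred):
    # Average distance in grade steps between the two gradings.
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def cohen_kappa(a, b):
    # Unweighted Cohen's kappa: observed agreement corrected for the
    # agreement expected by chance from each rater's grade frequencies.
    # Undefined (division by zero) if chance agreement equals 1.
    n = len(a)
    p_observed = accuracy(a, b)
    count_a, count_b = Counter(a), Counter(b)
    p_expected = sum(count_a[k] * count_b[k] for k in set(a) | set(b)) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)

def binarize(grades, threshold=3):
    # Collapse grades 0/1/2 into class 0 and grade 3 into class 1
    # (assumed reading of the "0/1/2 versus 3" comparison).
    return [int(g >= threshold) for g in grades]

# Toy example: invented Fazekas grades for eight cases.
expert = [0, 1, 1, 2, 2, 3, 3, 1]
ai = [0, 1, 2, 2, 2, 3, 2, 1]

multi_class_acc = accuracy(expert, ai)
mae = mean_absolute_error(expert, ai)
binary_acc = accuracy(binarize(expert), binarize(ai))
kappa = cohen_kappa(expert, ai)

print(f"multi-class accuracy: {multi_class_acc:.3f}")
print(f"mean absolute error:  {mae:.3f}")
print(f"0/1/2 vs 3 accuracy:  {binary_acc:.3f}")
print(f"Cohen's kappa:        {kappa:.3f}")
```

Because the Fazekas scale is ordinal, mean absolute error complements accuracy: it separates a one-grade disagreement from a larger one, whereas accuracy treats every mismatch equally.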