TY - JOUR
T1 - Standardized evaluation of algorithms for computer-aided diagnosis of dementia based on structural MRI
T2 - The CADDementia challenge
AU - Bron, Esther E.
AU - Smits, Marion
AU - van der Flier, Wiesje M.
AU - Vrenken, Hugo
AU - Barkhof, Frederik
AU - Scheltens, Philip
AU - Papma, Janne M.
AU - Steketee, Rebecca M.E.
AU - Méndez Orellana, Carolina
AU - Meijboom, Rozanna
AU - Pinto, Madalena
AU - Meireles, Joana R.
AU - Garrett, Carolina
AU - Bastos-Leite, António J.
AU - Abdulkadir, Ahmed
AU - Ronneberger, Olaf
AU - Amoroso, Nicola
AU - Bellotti, Roberto
AU - Cárdenas-Peña, David
AU - Álvarez-Meza, Andrés M.
AU - Dolph, Chester V.
AU - Iftekharuddin, Khan M.
AU - Eskildsen, Simon Fristed
AU - Coupé, Pierrick
AU - Fonov, Vladimir S.
AU - Franke, Katja
AU - Gaser, Christian
AU - Ledig, Christian
AU - Guerrero, Ricardo
AU - Tong, Tong
AU - Gray, Katherine R.
AU - Moradi, Elaheh
AU - Tohka, Jussi
AU - Routier, Alexandre
AU - Durrleman, Stanley
AU - Sarica, Alessia
AU - Di Fatta, Giuseppe
AU - Sensi, Francesco
AU - Chincarini, Andrea
AU - Smith, Garry M.
AU - Stoyanov, Zhivko V.
AU - Sørensen, Lauge Emil Borch Laurs
AU - Nielsen, Mads
AU - Tangaro, Sabina
AU - Inglese, Paolo
AU - Wachinger, Christian
AU - Reuter, Martin
AU - van Swieten, John C.
AU - Niessen, Wiro J.
AU - Klein, Stefan
PY - 2015/5/1
Y1 - 2015/5/1
N2 - Algorithms for computer-aided diagnosis of dementia based on structural MRI have demonstrated high performance in the literature, but are difficult to compare as different data sets and methodology were used for evaluation. In addition, it is unclear how the algorithms would perform on previously unseen data, and thus, how they would perform in clinical practice when there is no real opportunity to adapt the algorithm to the data at hand. To address these comparability, generalizability and clinical applicability issues, we organized a grand challenge that aimed to objectively compare algorithms based on a clinically representative multi-center data set. Using clinical practice as the starting point, the goal was to reproduce the clinical diagnosis. Therefore, we evaluated algorithms for multi-class classification of three diagnostic groups: patients with probable Alzheimer's disease, patients with mild cognitive impairment and healthy controls. The diagnosis based on clinical criteria was used as reference standard, as it was the best available reference despite its known limitations. For evaluation, a previously unseen test set was used consisting of 354 T1-weighted MRI scans with the diagnoses blinded. Fifteen research teams participated with a total of 29 algorithms. The algorithms were trained on a small training set (n = 30) and optionally on data from other sources (e.g., the Alzheimer's Disease Neuroimaging Initiative, the Australian Imaging Biomarkers and Lifestyle flagship study of aging). The best performing algorithm yielded an accuracy of 63.0% and an area under the receiver-operating-characteristic curve (AUC) of 78.8%. In general, the best performances were achieved using feature extraction based on voxel-based morphometry or a combination of features that included volume, cortical thickness, shape and intensity. The challenge is open for new submissions via the web-based framework: http://caddementia.grand-challenge.org.
AB - Algorithms for computer-aided diagnosis of dementia based on structural MRI have demonstrated high performance in the literature, but are difficult to compare as different data sets and methodology were used for evaluation. In addition, it is unclear how the algorithms would perform on previously unseen data, and thus, how they would perform in clinical practice when there is no real opportunity to adapt the algorithm to the data at hand. To address these comparability, generalizability and clinical applicability issues, we organized a grand challenge that aimed to objectively compare algorithms based on a clinically representative multi-center data set. Using clinical practice as the starting point, the goal was to reproduce the clinical diagnosis. Therefore, we evaluated algorithms for multi-class classification of three diagnostic groups: patients with probable Alzheimer's disease, patients with mild cognitive impairment and healthy controls. The diagnosis based on clinical criteria was used as reference standard, as it was the best available reference despite its known limitations. For evaluation, a previously unseen test set was used consisting of 354 T1-weighted MRI scans with the diagnoses blinded. Fifteen research teams participated with a total of 29 algorithms. The algorithms were trained on a small training set (n = 30) and optionally on data from other sources (e.g., the Alzheimer's Disease Neuroimaging Initiative, the Australian Imaging Biomarkers and Lifestyle flagship study of aging). The best performing algorithm yielded an accuracy of 63.0% and an area under the receiver-operating-characteristic curve (AUC) of 78.8%. In general, the best performances were achieved using feature extraction based on voxel-based morphometry or a combination of features that included volume, cortical thickness, shape and intensity. The challenge is open for new submissions via the web-based framework: http://caddementia.grand-challenge.org.
KW - Alzheimer's disease
KW - Challenge
KW - Classification
KW - Computer-aided diagnosis
KW - Mild cognitive impairment
KW - Structural MRI
U2 - 10.1016/j.neuroimage.2015.01.048
DO - 10.1016/j.neuroimage.2015.01.048
M3 - Journal article
C2 - 25652394
SN - 1053-8119
VL - 111
SP - 562
EP - 579
JO - NeuroImage
JF - NeuroImage
ER -