TY - JOUR
T1 - Performance of five research-domain automated WM lesion segmentation methods in a multi-center MS study
AU - de Sitter, Alexandra
AU - Steenwijk, Martijn D
AU - Ruet, Aurélie
AU - Versteeg, Adriaan
AU - Liu, Yaou
AU - van Schijndel, Ronald A
AU - Pouwels, Petra J W
AU - Kilsdonk, Iris D
AU - Cover, Keith S
AU - van Dijk, Bob W
AU - Ropele, Stefan
AU - Rocca, Maria A
AU - Yiannakas, Marios
AU - Wattjes, Mike P
AU - Damangir, Soheil
AU - Frisoni, Giovanni B
AU - Sastre-Garriga, Jaume
AU - Rovira, Alex
AU - Enzinger, Christian
AU - Filippi, Massimo
AU - Frederiksen, Jette
AU - Ciccarelli, Olga
AU - Kappos, Ludwig
AU - Barkhof, Frederik
AU - Vrenken, Hugo
AU - MAGNIMS study group and for neuGRID
N1 - Copyright © 2017 Elsevier Inc. All rights reserved.
PY - 2017/12
Y1 - 2017/12
N2 - BACKGROUND AND PURPOSE: In vivo identification of white matter lesions plays a key role in the evaluation of patients with multiple sclerosis (MS). Automated lesion segmentation methods have been developed to substitute for manual outlining, but evidence of their performance in multi-center investigations is lacking. In this work, five research-domain automated segmentation methods were evaluated using a multi-center MS dataset. METHODS: 70 MS patients (median EDSS of 2.0 [range 0.0-6.5]) were included from a six-center dataset of the MAGNIMS Study Group (www.magnims.eu) which included 2D FLAIR and 3D T1 images with manual lesion segmentation as a reference. Automated lesion segmentations were produced using five algorithms: Cascade; Lesion Segmentation Toolbox (LST) with both the Lesion growth algorithm (LGA) and the Lesion prediction algorithm (LPA); Lesion-Topology preserving Anatomical Segmentation (Lesion-TOADS); and k-Nearest Neighbor with Tissue Type Priors (kNN-TTP). Main software parameters were optimized using a training set (N = 18), and formal testing was performed on the remaining patients (N = 52). To evaluate volumetric agreement with the reference segmentations, the intraclass correlation coefficient (ICC) as well as the mean difference in lesion volumes between the automated and reference segmentations were calculated. The Similarity Index (SI), False Positive (FP) volumes and False Negative (FN) volumes were used to examine spatial agreement. All analyses were repeated using a leave-one-center-out design to exclude the center of interest from the training phase to evaluate the performance of the methods on an 'unseen' center. RESULTS: Compared to the reference mean lesion volume (4.85 ± 7.29 mL), the methods displayed a mean difference of 1.60 ± 4.83 (Cascade), 2.31 ± 7.66 (LGA), 0.44 ± 4.68 (LPA), 1.76 ± 4.17 (Lesion-TOADS) and -1.39 ± 4.10 mL (kNN-TTP). The ICCs were 0.755, 0.713, 0.851, 0.806 and 0.723, respectively. Spatial agreement with reference segmentations was higher for LPA (SI = 0.37 ± 0.23), Lesion-TOADS (SI = 0.35 ± 0.18) and kNN-TTP (SI = 0.44 ± 0.14) than for Cascade (SI = 0.26 ± 0.17) or LGA (SI = 0.31 ± 0.23). All methods showed highly similar results when used on data from a center not used in software parameter optimization. CONCLUSION: The performance of the methods in this multi-center MS dataset was moderate, but appeared to be robust even with new datasets from centers not included in training the automated methods.
AB - BACKGROUND AND PURPOSE: In vivo identification of white matter lesions plays a key role in the evaluation of patients with multiple sclerosis (MS). Automated lesion segmentation methods have been developed to substitute for manual outlining, but evidence of their performance in multi-center investigations is lacking. In this work, five research-domain automated segmentation methods were evaluated using a multi-center MS dataset. METHODS: 70 MS patients (median EDSS of 2.0 [range 0.0-6.5]) were included from a six-center dataset of the MAGNIMS Study Group (www.magnims.eu) which included 2D FLAIR and 3D T1 images with manual lesion segmentation as a reference. Automated lesion segmentations were produced using five algorithms: Cascade; Lesion Segmentation Toolbox (LST) with both the Lesion growth algorithm (LGA) and the Lesion prediction algorithm (LPA); Lesion-Topology preserving Anatomical Segmentation (Lesion-TOADS); and k-Nearest Neighbor with Tissue Type Priors (kNN-TTP). Main software parameters were optimized using a training set (N = 18), and formal testing was performed on the remaining patients (N = 52). To evaluate volumetric agreement with the reference segmentations, the intraclass correlation coefficient (ICC) as well as the mean difference in lesion volumes between the automated and reference segmentations were calculated. The Similarity Index (SI), False Positive (FP) volumes and False Negative (FN) volumes were used to examine spatial agreement. All analyses were repeated using a leave-one-center-out design to exclude the center of interest from the training phase to evaluate the performance of the methods on an 'unseen' center. RESULTS: Compared to the reference mean lesion volume (4.85 ± 7.29 mL), the methods displayed a mean difference of 1.60 ± 4.83 (Cascade), 2.31 ± 7.66 (LGA), 0.44 ± 4.68 (LPA), 1.76 ± 4.17 (Lesion-TOADS) and -1.39 ± 4.10 mL (kNN-TTP). The ICCs were 0.755, 0.713, 0.851, 0.806 and 0.723, respectively. Spatial agreement with reference segmentations was higher for LPA (SI = 0.37 ± 0.23), Lesion-TOADS (SI = 0.35 ± 0.18) and kNN-TTP (SI = 0.44 ± 0.14) than for Cascade (SI = 0.26 ± 0.17) or LGA (SI = 0.31 ± 0.23). All methods showed highly similar results when used on data from a center not used in software parameter optimization. CONCLUSION: The performance of the methods in this multi-center MS dataset was moderate, but appeared to be robust even with new datasets from centers not included in training the automated methods.
U2 - 10.1016/j.neuroimage.2017.09.011
DO - 10.1016/j.neuroimage.2017.09.011
M3 - Journal article
C2 - 28899746
SN - 1053-8119
VL - 163
SP - 106
EP - 114
JO - NeuroImage
JF - NeuroImage
ER -