Scoring inflammatory activity of the spine by magnetic resonance imaging in ankylosing spondylitis: a multireader experiment.

Cédric Lukas; Jürgen Braun; Désirée van der Heijde; Kay-Geert A Hermann; Martin Rudwaleit; Mikkel Østergaard; Ans Oostveen; Phil O'Connor; Walter P Maksymowych; Robert G W Lambert; Anne Grethe Jurik; Xenofon Baraliakos; Robert Landewé; NN NN

105 citations (Scopus)

Abstract

OBJECTIVE: Magnetic resonance imaging (MRI) of the spine is increasingly important in the assessment of inflammatory activity in clinical trials with patients with ankylosing spondylitis (AS). We investigated the feasibility, inter-reader reliability, sensitivity to change, and discriminatory ability of 3 different scoring methods for MRI activity and change in activity of the spine in patients with AS.

METHODS: Thirty sets of spinal MRI at baseline and after 24 weeks of follow-up, derived from a randomized clinical trial comparing a tumor necrosis factor (TNF)-blocking drug (n = 20) with placebo (n = 10) and selected to cover a wide range of activity at baseline and change in activity, were presented electronically in a partial Latin-square design to 9 experienced readers from different countries (Europe, Canada). Readers scored each set of MRI 3 times, using 3 different methods: the Ankylosing Spondylitis spine Magnetic Resonance Imaging-activity [ASspiMRI-a, grading activity (0-6) per vertebral unit in 23 units]; the Berlin modification of the ASspiMRI-a; and the Spondyloarthritis Research Consortium of Canada (SPARCC) scoring system, which scores the 6 vertebral units considered by the reader as the most abnormal, with additional scores for "depth" and "intensity." Both the order of the methods used by each reader and the timepoints (before/after treatment) were randomized. Feasibility of each scoring system was evaluated by measuring the mean time needed to score each set of MRI, and inter-reader reliability was evaluated by smallest detectable change (SDC) and by intraclass correlation coefficients (ICC) for all readers together and for all possible reader pairs separately. Sensitivity to change was investigated by calculating Guyatt's effect size on change scores. Discriminatory ability was assessed using Z-scores (Mann-Whitney test) comparing change in score between patients treated with TNF-blocking drug and placebo.

RESULTS: The mean time to score one set of MRI was shortest for the Berlin method. SDC was lowest for the Berlin method and highest for SPARCC. Overall inter-reader ICC per method were between 0.49 and 0.77 for scoring activity status, and between 0.46 and 0.72 for scoring activity change. ICC for all possible reader pairs showed much more fluctuation per method, with lowest observed values of about 0.05 (very low agreement) and highest observed values over 0.90 (excellent agreement). In general, ICC for SPARCC were consistently higher than for other systems. Sensitivity to change differed per reader, and was more consistent with SPARCC than with the other methods, but was in general excellent for all 3 methods. Discrimination between groups (TNF-blocker vs placebo) assessed by Z-scores was good and comparable among methods.

CONCLUSION: This experiment demonstrates the feasibility of multiple-reader MRI scoring exercises for method comparison and provides evidence for the feasibility, reliability, sensitivity to change, and discriminatory capacity of all 3 tested scoring systems for assessing spinal activity on MRI in patients with AS in clinical trials. On the basis of these results, it is not possible to prioritize one of the 3 methods.
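As a rough illustration of two of the statistics named in the abstract, the sketch below computes Guyatt's effect size and a smallest detectable change (SDC) for a single reader pair. The formulas are common textbook formulations and are assumptions here: the abstract does not state the exact definitions the authors used. The function names and all numbers are hypothetical and are not data from the study.

```python
from math import sqrt
from statistics import mean, stdev


def guyatt_effect_size(treated_changes, placebo_changes):
    """Mean change in the treated group divided by the sample SD of
    change in the placebo group (one common form of Guyatt's index)."""
    return mean(treated_changes) / stdev(placebo_changes)


def smallest_detectable_change(reader_a_changes, reader_b_changes):
    """SDC for one reader pair, assuming the formulation
    1.96 * SD(between-reader differences in change score) / sqrt(2)."""
    diffs = [a - b for a, b in zip(reader_a_changes, reader_b_changes)]
    return 1.96 * stdev(diffs) / sqrt(2)


# Hypothetical change scores (follow-up minus baseline), illustration only.
treated = [-6, -4, -8, -2]
placebo = [-1, 1, 0, 2, -2]
reader_a = [-6, -4, -8, -2]
reader_b = [-5, -5, -6, -3]

es = guyatt_effect_size(treated, placebo)
sdc = smallest_detectable_change(reader_a, reader_b)
print(f"Guyatt effect size: {es:.2f}")
print(f"SDC (reader pair):  {sdc:.2f}")
```

Under these assumptions, a larger absolute effect size indicates higher sensitivity to change, and a smaller SDC means that smaller changes in score can be distinguished from between-reader measurement error.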
Publication date: April 2007

Original language	English
Journal	Journal of Rheumatology
Volume	34
Issue number	4
Pages (from-to)	862-70
Number of pages	8
ISSN	0315-162X
Status	Published - 2007

Access to document

http://www.jrheum.com/subscribers/07/04/862.html

Citation formats

Lukas, C., Braun, J., van der Heijde, D., Hermann, K-G. A., Rudwaleit, M., Østergaard, M., Oostveen, A., O'Connor, P., Maksymowych, W. P., Lambert, R. G. W., Jurik, A. G., Baraliakos, X., Landewé, R., & NN, NN. (2007). Scoring inflammatory activity of the spine by magnetic resonance imaging in ankylosing spondylitis: a multireader experiment. Journal of Rheumatology, 34(4), 862-70. http://www.jrheum.com/subscribers/07/04/862.html

Lukas, C, Braun, J, van der Heijde, D, Hermann, K-GA, Rudwaleit, M, Østergaard, M, Oostveen, A, O'Connor, P, Maksymowych, WP, Lambert, RGW, Jurik, AG, Baraliakos, X, Landewé, R & NN, NN 2007, 'Scoring inflammatory activity of the spine by magnetic resonance imaging in ankylosing spondylitis: a multireader experiment.', Journal of Rheumatology, vol. 34, no. 4, pp. 862-70. <http://www.jrheum.com/subscribers/07/04/862.html>

@article{b9feff79f1bd4ee885e70c5f554d0211,

title = "Scoring inflammatory activity of the spine by magnetic resonance imaging in ankylosing spondylitis: a multireader experiment.",

author = "C{\'e}dric Lukas and J{\"u}rgen Braun and {van der Heijde}, D{\'e}sir{\'e}e and Hermann, {Kay-Geert A} and Martin Rudwaleit and Mikkel {\O}stergaard and Ans Oostveen and Phil O'Connor and Maksymowych, {Walter P} and Lambert, {Robert G W} and Jurik, {Anne Grethe} and Xenofon Baraliakos and Robert Landew{\'e} and NN NN",

year = "2007",

language = "English",

volume = "34",

pages = "862--70",

journal = "Journal of Rheumatology",

issn = "0315-162X",

publisher = "Journal of Rheumatology Publishing Co. Ltd.",

number = "4",

}

TY - JOUR

T1 - Scoring inflammatory activity of the spine by magnetic resonance imaging in ankylosing spondylitis: a multireader experiment.

AU - Lukas, Cédric

AU - Braun, Jürgen

AU - van der Heijde, Désirée

AU - Hermann, Kay-Geert A

AU - Rudwaleit, Martin

AU - Østergaard, Mikkel

AU - Oostveen, Ans

AU - O'Connor, Phil

AU - Maksymowych, Walter P

AU - Lambert, Robert G W

AU - Jurik, Anne Grethe

AU - Baraliakos, Xenofon

AU - Landewé, Robert

AU - NN, NN

PY - 2007

Y1 - 2007

M3 - Journal article

SN - 0315-162X

VL - 34

SP - 862

EP - 870

JO - Journal of Rheumatology

JF - Journal of Rheumatology

IS - 4

ER -
