Interrater Agreement Measures For Nominal And Ordinal Data

Krippendorff [12] proposed an even more flexible agreement measure than Fleiss' K, known as Krippendorff's alpha. It can likewise be used with two or more raters and categories, and it applies not only to nominal data but to any scale of measurement, including metric data. Another important advantage of Krippendorff's alpha is that it can handle missing values, as long as each observation is rated by at least two raters; observations with only a single rating should be excluded (see the sketch below).

Because of the categorical nature of ordinal ratings, some researchers prefer non-parametric rank-invariant approaches [52-55], for which the results are not affected by a re-labelling of the ordinal rating scale [52-56]. Svensson et al. [52, 53] describe a non-parametric, rank-invariant approach for assessing the different components of disagreement in ordinal data. Currently, these approaches are limited to evaluating pairwise association and agreement, so they are better suited to studies with a small number of raters. Liu and Agresti [36] point out that for parametric approaches such as latent variable models, in which, for example, disease status is treated as an unobserved latent variable, the estimated effects are invariant to the number of categories of the rating scale and to their cutpoints; if the model is appropriate, different studies using different rating scales should therefore lead to similar conclusions.
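As a concrete illustration of the missing-value handling mentioned above, the following minimal sketch applies the kripp.alpha function from the irr package [31] to a fabricated raters-by-subjects matrix; the data, the rater names, and the choice of the ordinal measurement level are illustrative assumptions, not taken from the article.

# Illustrative sketch: Krippendorff's alpha with missing ratings,
# using kripp.alpha from the irr package [31].
# Rows are raters, columns are subjects; NA marks a missing rating.
library(irr)

ratings <- rbind(
  rater1 = c( 1, 2, 3, 3, 2, 1, 4, 1, 2, NA),
  rater2 = c( 1, 2, 3, 3, 2, 2, 4, 1, 2,  5),
  rater3 = c(NA, 3, 3, 3, 2, 3, 4, 2, 2,  5),
  rater4 = c( 1, 2, 3, 3, 2, 4, 4, 1, 2,  5)
)

# Every subject is rated by at least two raters, so alpha remains defined;
# method is one of "nominal", "ordinal", "interval", or "ratio".
kripp.alpha(ratings, method = "ordinal")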

The different variants of the ICC must be selected based on the design of the study and the type of agreement the researcher wishes to capture. Four main factors determine the appropriate ICC variant for a given study design (McGraw & Wong, 1996; Shrout & Fleiss, 1979) and are reviewed here. Since there is no standard software in which Fleiss' K and Krippendorff's alpha are implemented together with bootstrap confidence intervals (see the overview in Additional file 2), we provide with this article an R script called "K_alpha". It is modelled on the R function kripp.alpha from the irr package [31] and on Andrew Hayes' SAS macro KALPHA [30]. The K_alpha function calculates Fleiss' K (for nominal data) with asymptotic and bootstrap confidence intervals, and Krippendorff's alpha with a standard bootstrap confidence interval. The description of the program, the program itself, the function call for a fictitious data set, and the corresponding output are given in Additional file 3.

The authors are grateful for support from grant 1R01CA17246301-A1 of the U.S. National Institutes of Health. We thank Dr. Allsbrook for kindly providing his data set. We appreciate the comments and suggestions from the reviewers of this manuscript.
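To make the choice among ICC variants concrete, the sketch below shows how the design decisions map onto the arguments of the icc function in the irr package; the simulated scores and the particular variant chosen, ICC(A,1) in McGraw and Wong's notation, are illustrative assumptions only.

# Illustrative sketch: selecting an ICC variant via the model/type/unit
# arguments of irr::icc. The scores are simulated, not study data.
library(irr)

set.seed(2)
scores <- matrix(rnorm(60, mean = 50, sd = 10),
                 nrow = 20, ncol = 3)  # 20 subjects rated by 3 raters

# Two-way model, absolute agreement, single-rater unit: ICC(A,1).
icc(scores, model = "twoway", type = "agreement", unit = "single")

To illustrate the idea behind the bootstrap intervals mentioned above, the following minimal sketch (not the K_alpha script itself) resamples subjects with replacement and takes percentile limits, reusing the fabricated ratings matrix from the earlier sketch.

# Minimal sketch of a percentile bootstrap CI for Krippendorff's alpha;
# this only illustrates the general idea and is not the K_alpha script.
boot_alpha <- function(ratings, method = "nominal", B = 1000) {
  n <- ncol(ratings)  # subjects are the columns
  stats <- replicate(B, {
    idx <- sample(n, n, replace = TRUE)  # resample subjects
    kripp.alpha(ratings[, idx, drop = FALSE], method = method)$value
  })
  quantile(stats, c(0.025, 0.975), na.rm = TRUE)  # 95% percentile interval
}

boot_alpha(ratings, method = "ordinal", B = 1000)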

An IRR analysis was conducted to assess the degree to which coders consistently assigned categorical depression ratings to the subjects in the study. Marginal distributions of the depression ratings did not indicate prevalence or bias problems, suggesting that Cohen's kappa (1960) was an appropriate index of IRR (Di Eugenio & Glass, 2004). Kappa was computed for each pair of coders and the values were then averaged to provide a single index of IRR (Light, 1971). The resulting kappa indicated substantial agreement, κ = 0.68 (Landis & Koch, 1977), and was consistent with previously published IRR estimates obtained from coding similar constructs in prior studies. The IRR analysis showed that coders had substantial agreement in depression ratings, although the depression variable contained a modest amount of error variance due to differing subjective evaluations by the coders, which would modestly attenuate statistical power for subsequent analyses; the ratings were nonetheless deemed appropriate for use in the hypothesis tests of the present study.
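A minimal sketch of the pairwise-averaging step described above (Light, 1971) follows, using kappa2 from the irr package on a simulated subjects-by-coders data frame; the variable names and data are hypothetical, and random ratings will of course yield a kappa near zero rather than the value reported above.

# Illustrative sketch of Light's (1971) approach: average Cohen's kappa
# over all coder pairs. The data are simulated, not the study data.
library(irr)

set.seed(1)
depression <- data.frame(
  coder1 = sample(1:4, 30, replace = TRUE),
  coder2 = sample(1:4, 30, replace = TRUE),
  coder3 = sample(1:4, 30, replace = TRUE)
)

# Cohen's kappa for each pair of coders, then the mean as a single index.
pairs <- combn(ncol(depression), 2)
mean(apply(pairs, 2, function(p) kappa2(depression[, p])$value))

# irr also provides kappam.light(depression) for the same computation.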