Interobserver Variability in Applying a Radiographic Definition for ARDS: Conclusion
To reduce interobserver variability, future panels charged with revising the definition for ALI-ARDS should consider the issues raised by this study. An annotated set of training radiographs clarifying the interpretation of the difficult radiographic patterns identified by our readers might be a useful adjunct to a written definition. The effect of analog vs digital technique on agreement needs further evaluation. Modifications to the definition may also improve agreement. For example, a “negative” definition that specifies which radiographic patterns are inconsistent with ALI-ARDS may lead to greater agreement than the current version. Specific instructions to interpret radiographs strictly by the definition, even if this yields positive readings for radiographs that the reader might personally consider negative for ALI-ARDS, may facilitate consistent readings. Notably, the ARDS Network investigators, who have read chest radiographs together as part of clinical trial planning, interpreted radiographs no more consistently as a group than other participants. Group reading exercises may be insufficient to ensure agreement, and a modified definition or example radiographs may be necessary. Finally, it is important that definitions proposed by consensus panels be empirically evaluated for interobserver agreement by the readers who will be using the definition.
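As an illustration of what such an empirical evaluation might involve, the sketch below computes Cohen's kappa, a chance-corrected measure of agreement between two readers. The choice of statistic, the function name `cohen_kappa`, and the ratings shown are assumptions made for illustration only; they are not data or methods taken from this study, and a panel of many readers would call for a multi-rater statistic rather than this pairwise one.

```python
from collections import Counter


def cohen_kappa(reader_a, reader_b):
    """Chance-corrected agreement between two readers' categorical ratings."""
    if len(reader_a) != len(reader_b) or not reader_a:
        raise ValueError("both readers must rate the same non-empty set of radiographs")
    n = len(reader_a)

    # Observed agreement: fraction of radiographs given the same rating.
    observed = sum(a == b for a, b in zip(reader_a, reader_b)) / n

    # Expected agreement if each reader rated independently at their own base rates.
    count_a = Counter(reader_a)
    count_b = Counter(reader_b)
    categories = set(count_a) | set(count_b)
    expected = sum(count_a[c] * count_b[c] for c in categories) / (n * n)

    if expected == 1.0:
        # Both readers used one identical category throughout; kappa is undefined,
        # so perfect agreement is reported by convention here.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical ratings for ten radiographs (1 = meets the definition, 0 = does not).
ratings_a = [1, 1, 0, 0, 1, 0, 1, 0, 1, 1]
ratings_b = [1, 0, 0, 0, 1, 1, 1, 0, 1, 0]
print(round(cohen_kappa(ratings_a, ratings_b), 2))
```

A value near 1 indicates agreement well beyond chance, while a value near 0 indicates agreement no better than chance, which is why raw percent agreement alone can overstate how consistently a definition is applied.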
This study has important implications for consensus panels charged with defining critical care syndromes in general and for the interpretation of clinical trials. Sepsis syndrome, multiple organ dysfunction syndrome, and ARDS are diagnosed on the basis of operational definitions proposed by experts. These definitions frequently make clinical sense and therefore seem valid. However, they are rarely subjected to empiric testing to evaluate their reliability. As we have shown, interobserver variability may be high, particularly with regard to radiographic or clinical features that are difficult to standardize among clinicians. Because accepted “gold standard” diagnostic tests do not exist, the accuracy of critical care syndrome definitions cannot be verified, and it is therefore particularly important that future critical care syndrome definitions demonstrate their reliability. It is important to appreciate that the absence of effective therapies for ALI-ARDS limits the clinical consequences of its diagnosis. Therefore, the findings of this study are largely a challenge to the research community. However, when effective treatments are found, their applicability at the bedside will depend on clinicians’ abilities to identify and treat patients similar to those enrolled in the clinical trials. If the efficacy of an ALI-ARDS therapy depends, in part, on radiographic aspects of the syndrome, and if clinicians cannot identify similar patients because of variability in applying the definition, then the treatment will not be as effective in their patients as in the clinical trial subjects. Before we can expect clinicians to consistently identify those chest radiographs that meet criteria for ALI-ARDS, tools should be developed to help experts apply the definition consistently.