25 Sep

Interobserver Variability in Applying a Radiographic Definition for ARDS: κ Statistic

Because any diagnostic test may perform poorly in a particular spectrum of cases, the poor agreement in this study could reflect the specific sample of radiographs. We tried to simulate the broad range of chest radiographs that would be encountered when screening patients with ALI-ARDS for enrollment in a clinical trial by using a sample of patients who were critically ill, intubated, and met the oxygenation criterion for ALI-ARDS. It is also possible that our sample of 21 readers and 28 radiographs was too small to estimate the true κ value precisely. This uncertainty is reflected in the confidence intervals around the κ statistic, which exclude excellent agreement.
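To make the sample-size point concrete, the sketch below computes a multi-reader κ for a yes/no reading and a percentile bootstrap confidence interval obtained by resampling radiographs. It assumes Fleiss' κ, one common multi-rater generalization; this section does not state which estimator or interval method the study used. The function names and the counts are hypothetical, invented for illustration only.

```python
import random


def fleiss_kappa(positive_counts, n_raters):
    """Fleiss' kappa for a binary reading (e.g. ALI-ARDS: yes/no).

    positive_counts[i] is how many of the n_raters read radiograph i as positive.
    """
    n_items = len(positive_counts)
    if n_items == 0 or n_raters < 2:
        raise ValueError("need at least one radiograph and two readers")
    pair_total = n_raters * (n_raters - 1)
    # Observed agreement: for each radiograph, the fraction of ordered reader
    # pairs that gave the same reading, averaged over radiographs.
    p_obs = sum(
        (k * (k - 1) + (n_raters - k) * (n_raters - k - 1)) / pair_total
        for k in positive_counts
    ) / n_items
    # Chance agreement from the pooled prevalence of positive readings.
    prevalence = sum(positive_counts) / (n_items * n_raters)
    p_exp = prevalence ** 2 + (1 - prevalence) ** 2
    if p_exp == 1:
        raise ValueError("kappa is undefined when every reading is in one category")
    return (p_obs - p_exp) / (1 - p_exp)


def bootstrap_ci(positive_counts, n_raters, reps=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for Fleiss' kappa, resampling radiographs only."""
    rng = random.Random(seed)
    n_items = len(positive_counts)
    stats = []
    for _ in range(reps):
        sample = [positive_counts[rng.randrange(n_items)] for _ in range(n_items)]
        try:
            stats.append(fleiss_kappa(sample, n_raters))
        except ValueError:
            continue  # skip degenerate resamples (all readings in one category)
    if not stats:
        raise ValueError("every bootstrap resample was degenerate")
    stats.sort()
    m = len(stats)
    lo = stats[int(alpha / 2 * m)]
    hi = stats[min(m - 1, int((1 - alpha / 2) * m))]
    return lo, hi


if __name__ == "__main__":
    # Invented counts for 28 radiographs read by 21 readers (NOT study data).
    counts = [20, 3, 18, 11, 15, 6, 21, 9, 12, 17, 2, 14, 19, 8,
              10, 16, 5, 13, 21, 7, 11, 18, 4, 15, 12, 9, 20, 1]
    kappa = fleiss_kappa(counts, 21)
    lo, hi = bootstrap_ci(counts, 21)
    print(f"kappa = {kappa:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```

Resampling radiographs captures only case-to-case variability; a fuller analysis might also resample readers. The interval narrows as more radiographs are added, which is the sense in which a sample of 28 radiographs may be too small.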

The κ statistic is affected by the prevalence of positive readings in the sample. If, for example, the average reader had read 90% of the radiographs as positive (or 90% as negative) for ALI-ARDS, κ might have appeared low even though considerable agreement existed among readers. This limitation does not apply to our study, however, because the wide variability among readers produced a broad range of positive readings, and the average prevalence of positive readings was nearly optimal, at 54% (Table 2).
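The prevalence effect is easiest to see in the two-reader case. The sketch below applies Cohen's κ to two hypothetical 2×2 tables (invented numbers, not from this study, which pooled 21 readers). Both tables show 90% raw agreement, but when about 90% of readings are positive, the agreement expected by chance rises and κ falls sharply.

```python
def cohen_kappa_2x2(both_pos, r1_pos_only, r2_pos_only, both_neg):
    """Observed agreement and Cohen's kappa for two readers, binary reading."""
    n = both_pos + r1_pos_only + r2_pos_only + both_neg
    p_obs = (both_pos + both_neg) / n
    # Each reader's marginal rate of positive readings.
    r1_rate = (both_pos + r1_pos_only) / n
    r2_rate = (both_pos + r2_pos_only) / n
    # Agreement expected by chance if the two readers read independently.
    p_exp = r1_rate * r2_rate + (1 - r1_rate) * (1 - r2_rate)
    if p_exp == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return p_obs, (p_obs - p_exp) / (1 - p_exp)


# Hypothetical tables of 100 radiographs each; both show 90% raw agreement.
balanced = cohen_kappa_2x2(45, 5, 5, 45)  # about 50% of readings positive
skewed = cohen_kappa_2x2(85, 5, 5, 5)     # about 90% of readings positive
print(f"balanced: agreement {balanced[0]:.2f}, kappa {balanced[1]:.2f}")  # kappa 0.80
print(f"skewed:   agreement {skewed[0]:.2f}, kappa {skewed[1]:.2f}")      # kappa 0.44
```

Chance agreement is 0.50 in the balanced table but 0.82 in the skewed one, so the same raw agreement leaves much less room above chance.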

Some aspects of the chest radiograph presentation process may have contributed to the level of observed agreement. Serial radiographs were not available for review, as they might be in clinical practice; such review could have improved apparent agreement on radiographs showing pleural effusions or overlying monitoring devices. In addition, we chose to study the readings of pulmonary and critical care physician experts rather than radiologists. We consider it a unique opportunity to have studied their performance; however, a group of radiologists might have interpreted the chest radiographs with less variability. Because the diagnosis of ALI-ARDS and the decision to enroll patients in clinical trials are frequently made by clinicians at the bedside, we believe the study participants were a valid choice for addressing the research question. Readers were not proctored while they interpreted the radiographs. However, any collaborative reading would only have biased the study toward finding greater agreement than actually exists.

