A QUARTER OF A CENTURY AGO, Hanley and Lippman-Hand1 enlightened the medical community, via JAMA, about the hazards of overinterpreting zero-numerator events. They calculated the upper boundary of a 95% confidence interval for the true event rate when no events occur in a sample of a given size. For example, even when no events occur in 300 observations, the upper limit of the 95% confidence interval for the true incidence is still approximately 1%.
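The zero-numerator bound is easy to reproduce. The following is a minimal sketch, assuming a one-sided 95% upper limit and independent observations: when 0 of n observations show the event, the exact limit is the rate p that solves (1 - p)^n = 0.05, and the familiar "rule of three" (3/n) approximates it because -ln(0.05) is close to 3. The function names are illustrative only.

```python
def exact_upper_bound(n: int, alpha: float = 0.05) -> float:
    """One-sided (1 - alpha) upper limit for the event rate when 0 of n observations show the event.

    Solves (1 - p) ** n = alpha for p.
    """
    return 1.0 - alpha ** (1.0 / n)


def rule_of_three(n: int) -> float:
    """Approximation to the 95% upper limit, since -ln(0.05) is close to 3."""
    return 3.0 / n


n = 300
print(f"exact: {exact_upper_bound(n):.4f}, rule of three: {rule_of_three(n):.4f}")
# exact: 0.0099, rule of three: 0.0100
```

Both versions give roughly 1% for 300 event-free observations. The approximation is simply easier to carry out mentally.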
In this issue of the journal, Davidson et al encountered this principle and its complement in their attempt to use the ROTEM thromboelastometric device (Pentapharm, Basel, Switzerland) to predict bleeding after cardiac surgery.2 Studying a cohort of 58 patients at minimal or modest risk for bleeding at cardiac surgery, they found that 47 of the 50 (94%) patients who did not bleed had abnormal ROTEM results. In this case, if (nearly) everything goes wrong, might most things, in fact, be all right? The investigators concluded that the ROTEM test has a poor positive predictive value for postoperative bleeding. Conversely, the negative predictive value of 100% was based on only 3 subjects; its exact 95% confidence interval extends down to approximately 29%. If nothing goes wrong, is everything all right? Not likely.
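To make these figures concrete, the sketch below rebuilds the 2 × 2 table implied by the text. It uses 8 bleeding patients, all with abnormal results, which corresponds to the 14% (8/58) prevalence reported for this cohort, and 47 abnormal and 3 normal results among the 50 nonbleeding patients. It then computes exact (Clopper-Pearson) 95% confidence intervals by bisection on the binomial distribution. The reconstructed counts and the helper names are assumptions for illustration, not the authors' analysis.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson(x: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Exact two-sided (1 - alpha) confidence interval for the proportion x/n."""

    def solve(f, target):
        # Bisection on p in [0, 1]; f must be increasing in p.
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2
            if f(mid) < target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower limit: P(X >= x) = alpha/2; this tail probability increases with p.
    lower = 0.0 if x == 0 else solve(lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2)
    # Upper limit: P(X <= x) = alpha/2; this decreases with p, so negate it.
    upper = 1.0 if x == n else solve(lambda p: -binom_cdf(x, n, p), -alpha / 2)
    return lower, upper


# Counts reconstructed from the figures quoted in the text (assumption):
tp, fn = 8, 0    # bleeding patients: all had abnormal ROTEM results
fp, tn = 47, 3   # nonbleeding patients: 47 abnormal, 3 normal

sensitivity = tp / (tp + fn)   # 8/8
specificity = tn / (tn + fp)   # 3/50
ppv = tp / (tp + fp)           # 8/55
npv = tn / (tn + fn)           # 3/3

for name, x, n in [("PPV", tp, tp + fp), ("NPV", tn, tn + fn)]:
    lo, hi = clopper_pearson(x, n)
    print(f"{name} = {x}/{n} = {x / n:.3f}, exact 95% CI {lo:.3f} to {hi:.3f}")
```

For 3 of 3, the exact lower limit reduces to 0.025^(1/3), about 0.29. A normal result therefore supports the wide interval, not the point estimate of 100%.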
This commentary will return to the clinical messages of this study after highlighting several of its other features. First, this investigation shows how the predictive value of a test depends on the prevalence of the underlying disease or disorder. The population was at only modest risk for bleeding: no recent antiplatelet therapy, no repeat sternotomy operations, no subjects with known coagulopathy or receiving anticoagulants before surgery, and near-universal administration of prophylactic antifibrinolytics. In this population, only 14% (8/58) of patients met the criteria for postoperative bleeding. Suppose a cohort at higher risk of bleeding (60% instead of 14%) displayed ROTEM results with the same sensitivity and specificity for bleeding as the authors' cohort. The positive predictive value would then be 61.5%, much more impressive than the 14.5% obtained. The 100% negative predictive value would still suffer from too few negative test results. Indeed, the reader adept at these calculations will note that very few patients had normal test results (a specificity of only 6%, 3 of 50 nonbleeding patients). Nearly every patient therefore tests abnormal, and the positive predictive value simply tracks the prevalence of bleeding. This leads the clinician to question the value of the test.
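The effect of prevalence follows from Bayes' theorem. This sketch plugs in the sensitivity (8/8 = 100%) and specificity (3/50 = 6%) implied by the counts in the text. It then evaluates the predictive values at this cohort's prevalence (8/58) and at the hypothetical 60% prevalence discussed above. The function name is illustrative.

```python
def predictive_values(sens: float, spec: float, prevalence: float) -> tuple[float, float]:
    """Return (PPV, NPV) from sensitivity, specificity, and prevalence via Bayes' theorem."""
    true_pos = sens * prevalence
    false_pos = (1 - spec) * (1 - prevalence)
    true_neg = spec * (1 - prevalence)
    false_neg = (1 - sens) * prevalence
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)


SENS, SPEC = 1.0, 0.06  # 8/8 bleeding patients abnormal; 3/50 nonbleeding patients normal

for prev in (8 / 58, 0.60):
    ppv, npv = predictive_values(SENS, SPEC, prev)
    print(f"prevalence {prev:.1%}: PPV {ppv:.1%}, NPV {npv:.1%}")
# prevalence 13.8%: PPV 14.5%, NPV 100.0%
# prevalence 60.0%: PPV 61.5%, NPV 100.0%
```

With specificity this close to zero, the false-positive term 0.94 × (1 - prevalence) dominates the denominator, so the PPV stays only slightly above the prevalence at any prevalence.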
Second, this report underscores the value of statistical consultation for sample-size calculation before data collection. Davidson et al studied 60 subjects and calculated the power of their study only after obtaining the results. Calculating sample size during the design phase forces investigators to face two harsh statistical realities: observations can vary considerably, and collecting more data permits better inference in the face of that variation. When determining the sample size, investigators must first define the minimal clinically relevant effect that the trial must detect. Occasionally, this exercise shows investigators that the difference they originally expected is clinically irrelevant; occasionally, it shows the difference to be unobtainable. The ROTEM trial needed 231 subjects to provide adequate confidence in the positive predictive value of 14.5% obtained; it studied 58.
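One common way to size a study whose goal is to estimate a proportion, such as a positive predictive value, is to fix the desired width of its confidence interval in advance. The sketch below uses the normal-approximation (Wald) formula n = z²p(1 - p)/d². It then inflates the result, because only test-positive patients contribute to the PPV. The anticipated PPV, the ±5-point half-width, and the function name are hypothetical design inputs. The criterion behind the 231 subjects cited above is not stated, and this sketch is not intended to reproduce that figure.

```python
import math
from statistics import NormalDist


def n_for_precision(p: float, half_width: float, confidence: float = 0.95) -> int:
    """Observations needed so a Wald CI for proportion p has the given half-width."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z**2 * p * (1 - p) / half_width**2)


# Hypothetical design inputs (not taken from the study):
expected_ppv = 0.145          # anticipated proportion of test-positive patients who bleed
half_width = 0.05             # desired precision: PPV +/- 5 percentage points
fraction_positive = 55 / 58   # PPV is estimated only among test-positive patients

positives_needed = n_for_precision(expected_ppv, half_width)
subjects_needed = math.ceil(positives_needed / fraction_positive)
print(positives_needed, subjects_needed)  # 191 202
```

Narrowing the half-width or anticipating a PPV nearer 50% raises the required number quickly, which is exactly the reality a design-phase calculation forces investigators to confront.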
Third, this trial highlights the flimsy basis of the 5% level usually chosen to denote significance. When no true difference exists, every statistical test performed at this 5% level has a 1 in 20 chance of yielding a spuriously significant result, denoting a difference that arises from chance alone. The reader cannot know which of the several statistically significant P values in the results tables of this trial reflect true differences and which are spurious. For example, if none of the associations tested in the 20 tests of Table 4 were real, 1 spuriously significant result would be expected by chance alone. Perhaps, then, 1 of the 3 significant values is misleading; is it the one associating preoperative platelet function analysis results with bleeding? Such “nominal” significance levels must be viewed as hypothesis generating, not confirmatory.
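The arithmetic behind the 1-in-20 argument can be sketched as follows. The sketch assumes the 20 tests are independent and every null hypothesis is true. Tests in a single results table are usually correlated, so treat the numbers as a rough guide. The Holm step-down adjustment shown is one standard way to control the family-wise error rate. It is offered as an illustration, not as a method the authors used, and the p values in the usage line are invented.

```python
def familywise_error(m: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive among m independent tests when every null is true."""
    return 1 - (1 - alpha) ** m


def holm(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Holm step-down procedure: True where the hypothesis is rejected, in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # Compare the k-th smallest p value with alpha / (m - k + 1).
        if p_values[i] > alpha / (m - rank):
            break  # stop at the first non-rejection; all larger p values are retained
        reject[i] = True
    return reject


m = 20
print(f"P(at least 1 false positive) = {familywise_error(m):.2f}")  # 0.64
print(f"expected false positives = {m * 0.05:.1f}")               # 1.0
print(f"Bonferroni threshold = {0.05 / m:.4f}")                   # 0.0025

hypothetical = [0.001, 0.02, 0.04] + [0.5] * 17  # invented p values, not from Table 4
print(holm(hypothetical)[:3])  # [True, False, False]
```

Under these assumptions, a table of 20 tests has roughly a two-in-three chance of containing at least one spurious "significant" result.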
Fourth, results in this trial and others of laboratory coagulation testing call attention to how little clinicians understand about how ex vivo coagulation tests reflect in vivo hemostatic mechanisms. For example, the ROTEM purports to depict “clot strength” by the deflection of a pen mechanically linked to a rotating piston. Published data linking pen deflection to the tensile strength of formed clot (measured in dynes·cm⁻²) remain elusive.3 Is the relationship between the tensile strength of formed clot and pen deflection linear? Does it exhibit a threshold or a maximum? Davidson et al used the manufacturer-supplied limits of normal to designate ROTEM values as normal or abnormal. Yet does the range of values of a parameter obtained in a cohort of healthy subjects ensure that values outside those limits denote a disorder? Abnormal values may not be pathologic. Can “poor clot strength” be concluded when a thromboelastograph or ROTEM maximum amplitude of pen deflection falls outside the normal range? Constructing a receiver operating characteristic curve for each measurement might identify more appropriate cutoff values to denote a bleeding tendency, potentially improving their predictive values.
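A receiver operating characteristic analysis of a single ROTEM parameter might look like the following minimal sketch. It assumes that lower values (weaker clot) predict bleeding and scans every observed value as a candidate cutoff. It reports the cutoff that maximizes Youden's J (sensitivity + specificity - 1), one of several possible criteria. It also computes the area under the curve as the probability that a bleeding patient's value is lower than a nonbleeding patient's. The data and function names are hypothetical.

```python
def roc_points(bleeders: list[float], nonbleeders: list[float]) -> list[tuple[float, float, float]]:
    """(cutoff, sensitivity, specificity) for the rule 'value <= cutoff predicts bleeding'."""
    points = []
    for cutoff in sorted(set(bleeders) | set(nonbleeders)):
        sens = sum(v <= cutoff for v in bleeders) / len(bleeders)
        spec = sum(v > cutoff for v in nonbleeders) / len(nonbleeders)
        points.append((cutoff, sens, spec))
    return points


def youden_cutoff(points):
    """Point maximizing Youden's J = sensitivity + specificity - 1 (first one on ties)."""
    return max(points, key=lambda pt: pt[1] + pt[2] - 1)


def auc(bleeders, nonbleeders):
    """Area under the ROC curve: P(bleeder value < nonbleeder value), ties counted as 1/2."""
    wins = sum((b < n) + 0.5 * (b == n) for b in bleeders for n in nonbleeders)
    return wins / (len(bleeders) * len(nonbleeders))


# Hypothetical amplitude values in mm (invented; lower = weaker clot):
bleeders = [40, 45, 48]
nonbleeders = [47, 50, 55, 60, 62]
cutoff, sens, spec = youden_cutoff(roc_points(bleeders, nonbleeders))
print(cutoff, round(sens, 2), round(spec, 2), round(auc(bleeders, nonbleeders), 3))
# 48 1.0 0.8 0.933
```

A cutoff chosen this way is optimized on the same patients it is tested on and will look better than it performs in new patients. It would need confirmation in an independent cohort.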
Platelet dysfunction constitutes the fundamental coagulation defect accompanying cardiopulmonary bypass. However, the Thrombelastograph (Haemoscope, Niles, IL) and its cousins, including the ROTEM, seem best suited to detect fibrinolysis.4, 5 Do clinicians demand too much of these tests when asking them to predict excessive bleeding after bypass?
What clinical messages does this trial deliver about the ROTEM? In patients at minimal or modest risk for bleeding, ROTEM results that fall outside the manufacturer's specifications for normal provide little ability to distinguish those who bleed after cardiac surgery from those who do not. What inferences does a normal ROTEM result permit? Too few of these lower-risk patients had that result to permit conclusions, and clinicians should likewise expect few of their own patients to have normal results. No patient in the cohort reported by Davidson et al had surgical bleeding. These data therefore cannot support the inference that a completely normal ROTEM result in a bleeding patient denotes surgical bleeding, no matter how “logical” that deduction.
Davidson et al have substantially advanced the understanding of the limitations of thromboelastometry and generated important questions for future investigation. Clinicians are again reminded that when nothing goes wrong, everything might not be all right.