Memory Clinic, University Department of Geriatric Medicine FELIX PLATTER, Basel, Switzerland; Clinic for Anaesthesia, Intermediate Care, Prehospital Emergency Medicine and Pain Therapy, University Hospital Basel, Basel, Switzerland
Clinic for Anaesthesia, Intermediate Care, Prehospital Emergency Medicine and Pain Therapy, University Hospital Basel, Basel, Switzerland; Department of Clinical Research, University of Basel, Basel, Switzerland
Address correspondence to Nicolai Goettel, MD, DESA, EDIC, University of Florida College of Medicine, Department of Anaesthesiology, 1345 SW Center Dr, PO Box 100254, Gainesville, FL 32610.
Department of Clinical Research, University of Basel, Basel, Switzerland; Department of Anaesthesiology, University of Florida College of Medicine, Gainesville, FL, USA
This investigation provided independent external validation of an existing preoperative risk prediction model.
Design
A prospective observational cohort study of patients undergoing cardiac surgery covering the period between April 16, 2018 and January 18, 2022.
Setting
Two academic hospitals in Switzerland.
Participants
Adult patients (≥60 years of age) who underwent elective cardiac surgery, including coronary artery bypass graft, mitral or aortic valve replacement or repair, and combined procedures.
Interventions
None.
Measurements and Main Results
The primary outcome measure was the incidence of postoperative delirium (POD) in the intensive or intermediate care unit, diagnosed using the Intensive Care Delirium Screening Checklist. The prediction model contained 4 preoperative risk factors to which the following points were assigned: Mini-Mental State Examination (MMSE) score ≤23 received 2 points; MMSE 24-27, Geriatric Depression Scale (GDS) >4, prior stroke and/or transient ischemic attack (TIA), and abnormal serum albumin (≤3.5 or ≥4.5 g/dL) received 1 point each. The missing data were handled using multiple imputation. In total, 348 patients were included in the study. Sixty patients (17.4%) developed POD. For point levels in the prediction model of 0, 1, 2, and ≥3, the cumulative incidence of POD was 12.6%, 22.8%, 25.8%, and 35%, respectively. The validation resulted in a pooled area under the receiver operating characteristics curve of 0.60 (median CI, 0.525-0.679).
Conclusions
The evaluated predictive model for delirium after cardiac surgery in this patient cohort showed only poor discriminative capacity but fair calibration.
With approximately 80 million surgical procedures performed in Europe each year, postoperative delirium (POD) is a major complication of surgery and poses a significant burden for patients, families, medical and nursing staff, and the healthcare system.
Complications and mortality in older surgical patients in Australia and New Zealand (the REASON study): A multicentre, prospective, observational study.
Postoperative delirium is characterized by an acutely developing and fluctuating disturbance of awareness, attention, and cognition, and is classified as a postoperative neurocognitive disorder according to the new nomenclature.
Estimating patients' risk for postoperative delirium from preoperative routine data - Trial design of the PRe-Operative prediction of postoperative DElirium by appropriate SCreening (PROPDESC) study - A monocentre prospective observational trial.
Numerous epidemiologic studies reported widely divergent data on the incidence of POD, depending on the cohort of patients studied (eg, older versus younger patients), the type of surgical procedure, and treatment modalities (eg, elective versus emergency surgery).
Given demographic aging in industrialized countries and a clear interest in improving delirium care, an accurate POD prediction model may be a powerful tool to facilitate the early implementation of prevention measures in clinical practice.
Prediction models for POD, including that of Rudolph et al., have been developed for cardiac surgery. From a clinical standpoint, the Rudolph et al. prediction model appeared to be practical because it was based on just the following 4 risk factors: impaired cognition, depressive symptoms, prior stroke or TIA, and abnormal serum albumin.
of the time, respectively. Furthermore, the rate of prospective external validation of new risk-prediction models within 5 years after publication is small (16%).
A potential reason for the limited validations could be the much stronger academic incentives for the development of new models rather than the validation of previously published models.
However, it is essential to test the generalizability of a model and to retest it on new data in order to understand its robustness to distributional shifts over time and across settings before implementing it in clinical practice.
The aim of this study was to externally validate the prediction model of Rudolph et al. in a prospective cohort study of patients who had undergone cardiac surgery.
Methods
The study authors conducted and reported this prospective observational cohort study according to the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) guidelines.
The study protocol (No. 2020-00848) was approved by the institutional review board (Ethikkommission Nordwest- und Zentralschweiz) on July 27, 2020. A prior requirement for informed consent was later waived by Ethikkommission Nordwest- und Zentralschweiz.
Design and Selection Criteria
This broad prospective validation study was conducted at 2 academic medical centers in Basel and Zurich, Switzerland. The inclusion and exclusion criteria were identical to those of the derivation cohort used for the original model of Rudolph et al.
Briefly, the authors included patients aged ≥60 years who underwent elective cardiac surgery, including coronary artery bypass graft, mitral or aortic valve replacement or repair, and combined procedures. The exclusion criteria were non-German speaking, living >60 miles from the study center, emergency surgery, delirium before surgery, concurrent aortic or carotid surgical procedures, and medical instability limiting preoperative assessment.
Study Participants
The authors consecutively included 279 patients at the University Hospital Basel from April 16, 2018 to January 18, 2022, and 69 patients at the University Hospital Zurich from January 13, 2021 to January 18, 2022. The recruitment and inclusion process is shown in Figure 1.
The predictor variables, including the Mini-Mental State Examination (MMSE; range: 0-30 points, 0 = worst), the Geriatric Depression Scale (GDS; range: 0-15 points, 15 = worst), history of TIA and/or stroke, and serum albumin concentration, were assessed during the routinely held preoperative anesthesia consultation. Demographic factors, age at the time of surgery, sex, and type of surgery were collected from the electronic medical record.
Outcome
The primary outcome was the incidence of delirium after cardiac surgery. POD was diagnosed using the Intensive Care Delirium Screening Checklist (ICDSC) with a score of ≥4 points (maximum score = 8) during the intensive care unit (ICU) or intermediate care unit stay. The ICDSC was administered 3 times per day by trained nursing staff, blinded to the predictor variables, until the patient was discharged from the ICU or intermediate care unit. The ICDSC is an 8-item screening instrument based on the Diagnostic and Statistical Manual of Mental Disorders (DSM)-IV-TR criteria, which was specifically designed for the intensive care setting.
The checklist contains the following items, which are rated as absent or present: (1) consciousness (ie, comatose, stuporous, awake, or hypervigilant); (2) orientation; (3) hallucinations or delusions; (4) psychomotor activity; (5) inappropriate speech or mood; (6) attentiveness; (7) sleep-wake cycle disturbances; and (8) fluctuation of symptoms. The items are rated on the patient's behavior at the time of screening, and interrater reliability among intensive care staff is considered adequate.
All patients underwent cardiac surgery under general anesthesia. The anesthesia protocol, the operative procedure, and the postoperative care (eg, pain control) were performed according to local hospital policies and practice protocols. The use of aortic cross-clamp, cardiopulmonary bypass, high-dose heparin, and hypothermia was at the discretion of the attending surgeon. The intraoperative data were extracted from the surgical notes.
Sample Size
There are no generally accepted approaches or empirical evidence to estimate the sample size requirements for validation studies of risk prediction models.
Therefore, the authors determined their sample size according to the events per variable rule. This common rule of thumb was originally adapted to ensure stability in regression covariates and postulates that at least 10 events (cases with POD) must occur for each candidate predictor in the model.
In the authors’ analysis, they included 15 patients with POD per predictor variable. Therefore, the required sample size was a minimum of 60 patients presenting with POD (4 predictors × 15 events).
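The events-per-variable arithmetic above can be written out as a short sketch. The function names are hypothetical, and the second helper (translating a target number of events into a cohort size for an assumed incidence) is an illustration added here, not a calculation reported by the authors.

```python
import math


def required_events(n_predictors: int, events_per_variable: int) -> int:
    """Minimum number of outcome events under the events-per-variable rule."""
    return n_predictors * events_per_variable


def required_cohort_size(n_events: int, expected_incidence: float) -> int:
    """Cohort size needed to observe n_events at an assumed incidence.

    Illustrative helper only; the assumed incidence is supplied by the caller.
    """
    if not 0 < expected_incidence <= 1:
        raise ValueError("expected_incidence must be in (0, 1]")
    return math.ceil(n_events / expected_incidence)


# 4 predictors x 15 events per predictor, as stated in the text
events_needed = required_events(4, 15)
```

With the common minimum of 10 events per variable, the same model would require 40 events; the authors' choice of 15 events per variable is therefore the more conservative target.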
Missing Data
In the overall cohort, data on POD were missing in 5%, education was missing in 7%, GDS in 8%, MMSE in 6%, and the serum albumin concentration in 2%. There were no missing values of age, sex, and history of TIA and/or stroke. The authors assumed the missing data occurred at random, and they performed multiple imputations using the multivariate imputation by chained equation procedure with the predictive mean-matching method. The missing values were predicted based on the demographic variables (ie, age, sex, and education), all predictor variables, and outcome. The continuous variables were maintained as continuous in the imputation and only subsequently categorized for the final predictive model. In accordance with the original model, the authors created 20 multiple imputed datasets.
They reported all results from the pooled dataset. Rubin's rules were used to pool the regression coefficient estimates from the imputed datasets. The authors also reported the results of the original dataset with missing data.
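Rubin's rules, as used for pooling, can be sketched in a few lines. This is a generic illustration in Python, not the SPSS procedure the authors ran; the function name and the example numbers are hypothetical. The pooled estimate is the mean of the per-dataset estimates, and the total variance combines the average within-imputation variance with the between-imputation variance inflated by (1 + 1/m).

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed datasets using Rubin's rules.

    estimates: per-dataset coefficient estimates
    variances: per-dataset squared standard errors
    Returns (pooled_estimate, pooled_standard_error).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least 2 datasets and matching lengths")
    pooled = mean(estimates)          # pooled point estimate
    within = mean(variances)          # average within-imputation variance
    between = variance(estimates)     # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)


# Hypothetical coefficients from 3 imputed datasets (the study used 20)
pooled_beta, pooled_se = pool_rubin([0.5, 0.7, 0.6], [0.04, 0.04, 0.04])
```

The pooled standard error is larger than any single-dataset standard error whenever the estimates differ between imputations, which is how the uncertainty due to missing data enters the final result.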
Statistical Analysis
For descriptive analysis, all continuous variables are presented as mean ± SD. The categorical variables are reported as frequencies and percentages. The preoperative characteristics of patients from Basel were compared to those recruited from Zurich using a t test for the continuous variables. The categorical variables were compared with a chi-square test. Before applying the clinical prediction model, which was developed in a previous study, to the overall cohort dataset, the continuous risk factors were categorized using identical clinically meaningful cutoff points as used in the original model.
Therefore, GDS was dichotomized at >4 points, which indicates clinical depression. The MMSE was categorized as not impaired (range: 28-30 points), mild impairment (range: 24-27 points), and definitive impairment (≤23 points). The variables TIA and/or history of stroke were combined into one variable. Serum albumin concentration was classified into a normal value (3.6-4.4 g/dL) versus an abnormal value (≤3.5 or ≥4.5 g/dL). The clinical prediction model points were assigned as follows: MMSE ≤23 points received 2 points; MMSE 24 to 27 points, GDS >4 points, prior stroke/TIA, and abnormal serum albumin received 1 point each.
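The point assignment described above can be expressed as a small function. This is a minimal sketch for illustration; the function and parameter names are hypothetical, and it assumes serum albumin is recorded to one decimal place in g/dL, as the stated cutoffs imply.

```python
def rudolph_points(mmse: int, gds: int, prior_stroke_or_tia: bool,
                   albumin_g_dl: float) -> int:
    """Clinical prediction model points (0 to 5) from the 4 preoperative predictors."""
    points = 0
    # Cognition: MMSE <= 23 scores 2 points, MMSE 24-27 scores 1 point, 28-30 scores 0
    if mmse <= 23:
        points += 2
    elif mmse <= 27:
        points += 1
    # Depressive symptoms: GDS > 4
    if gds > 4:
        points += 1
    # History of stroke and/or TIA, combined into one variable
    if prior_stroke_or_tia:
        points += 1
    # Abnormal albumin: <= 3.5 or >= 4.5 g/dL (normal range 3.6-4.4 g/dL)
    if albumin_g_dl <= 3.5 or albumin_g_dl >= 4.5:
        points += 1
    return points


# Hypothetical patient: mild cognitive impairment, GDS 5, prior TIA, low albumin
example_points = rudolph_points(mmse=25, gds=5, prior_stroke_or_tia=True, albumin_g_dl=3.2)
```

Because the two MMSE categories are mutually exclusive, the maximum achievable score is 5 points.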
The incidence of POD is presented with increasing clinical prediction model points and a risk ratio relative to the lowest risk group. The summary statistics of the original model in the derivation cohort are based on the bootstrapping method, which was used for variable selection. Because the authors did not perform variable selection (model selection), they did not require bootstrapping. However, to make the results of the derivation cohort comparable to their validation cohort, the authors calculated the raw risk ratio, including associated CIs of the prediction model for each score in their cohort and the derivation cohort of Rudolph et al. For model validation, the authors assessed the model performance using measures of discrimination and calibration. In the dataset, they assessed model discrimination with the area under the receiver operating characteristic curve (AUROC; identical to the c-statistics) in each imputed dataset, and reported the median AUROC. Calibration was assessed using the Hosmer-Lemeshow test for goodness of fit in the imputed datasets. In a sensitivity analysis, the authors examined the c-statistics, excluding “off-pump” patients. All analyses were computed using IBM SPSS Statistics V.28.0.1.0 (IBM SPSS, Inc, Armonk, NY) for Windows.
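The discrimination measure described above can be illustrated with a short sketch. The AUROC equals the probability that a randomly chosen patient with POD has a higher score than a randomly chosen patient without POD, with ties counted as one half. The code below is a plain-Python illustration of that definition and of taking the median across imputed datasets; it is not the SPSS routine the authors used, and the function names and example data are hypothetical.

```python
from statistics import median


def auroc(scores, outcomes):
    """AUROC (c-statistic) by pairwise comparison; outcomes are 1 (POD) or 0 (no POD)."""
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one patient with and one without the outcome")
    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0   # case scored higher than non-case
            elif p == n:
                concordant += 0.5   # tie counts as half
    return concordant / (len(pos) * len(neg))


def median_auroc(imputed_datasets):
    """Median AUROC over (scores, outcomes) pairs, one pair per imputed dataset."""
    return median(auroc(scores, outcomes) for scores, outcomes in imputed_datasets)


# Hypothetical toy data: prediction model points and observed POD status
toy = [
    ([0, 1, 2, 3], [0, 0, 1, 1]),
    ([0, 1, 1, 2], [0, 1, 0, 1]),
    ([1, 1, 1, 1], [0, 1, 0, 1]),
]
toy_median = median_auroc(toy)
```

A score with only a few distinct point levels produces many ties between patients with and without POD, which pulls the AUROC toward 0.5.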
Results
Participants
Among the 348 patients in this combined external validation cohort, 17.4% (n = 60) developed POD after cardiac surgery. The baseline characteristics of patients from Basel and Zurich were similar, except that patients from Zurich had a slightly higher incidence of POD. Compared to patients from Zurich, patients from Basel were more likely to be female, to have a low serum albumin concentration, and to present with more depressive symptoms (Table 1). The mean patient age at surgery was 70.9 ± 5.7 years. Twenty-two patients underwent “off-pump” surgery.
Table 1. Baseline Characteristics of the External Swiss Validation Cohort and the Derivation Cohort of Rudolph and Colleagues
Compared to the derivation cohort of the original model, patients in this study were slightly younger (70.9 ± 5.7 v 74.7 ± 6.3 years), were mostly male (79.3%), and showed a much lower incidence of POD (17.4% v 52%). In the authors’ cohort, the prevalence of TIA and/or stroke was lower (14.9% v 22%), as was the mean GDS (1.5 ± 1.8 v 3.3 ± 3.0 points). The mean MMSE was higher (28.4 ± 1.6 v 26.9 ± 2.6 points). Moreover, a higher percentage of patients in the authors’ cohort had a normal serum albumin concentration, but the abnormal serum albumin values were lower. Furthermore, most of the patients in their study had a high level of education (Table 1), similar to that reported by Rudolph et al.
The authors calculated the clinical prediction model points and applied them to the overall Swiss cohort. An increasing risk score was associated with an increased risk of POD. The number of patients with a score ≥3 was very small (6 patients) and was not representative. Nevertheless, POD was identified in 12.6% of patients with a low-risk score, 22.8% with a moderate-risk score, 25.8% with a high-risk score, and 35% with a very-high-risk score. When applying the risk stratification system with no points as the reference, the presence of ≥1 point increased the delirium risk by a factor of 1.5, 2 points or more doubled the delirium risk, and ≥3 points nearly tripled the delirium risk (Table 2).
The Hosmer-Lemeshow test for goodness of fit showed good agreement between the observed numbers and the numbers estimated by the logistic regression model in the imputed datasets (P = 1.000; χ2 = 0.000). The median AUROC (identical to the c-statistic) was 0.60 (median CI, 0.525-0.679). A graphical representation of discrimination is shown in Figure 2. In the original dataset with missing data, the Hosmer-Lemeshow test likewise showed good agreement between the observed and estimated numbers (P = 1.000; χ2 = 0.000), and the AUROC was 0.60 (95% CI, 0.524-0.681). Excluding “off-pump” patients, the median AUROC was 0.61 (median CI, 0.530-0.685) in the imputed dataset; in the original dataset with missing data, the AUROC was 0.61 (95% CI, 0.529-0.688). Overall, compared to Rudolph et al., there was a degradation of model performance in the authors’ validation cohort. The β coefficients for the logistic model based on the 4 preoperative predictors are presented in Table 1 in the supplement.
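A raw risk ratio relative to the lowest-risk group, with its confidence interval, can be computed from group counts as sketched below. The sketch uses the standard log-transformed (Katz) interval, which is an assumption about the method and not a detail stated in the text. The counts are hypothetical and do not reproduce Table 2, and the function name is illustrative.

```python
import math


def risk_ratio(events_group, n_group, events_ref, n_ref, z=1.96):
    """Risk ratio versus a reference group with a log-method 95% CI.

    Returns (rr, lower, upper). All event counts must be greater than zero.
    """
    if min(events_group, events_ref) <= 0:
        raise ValueError("event counts must be positive for the log method")
    rr = (events_group / n_group) / (events_ref / n_ref)
    # Standard error of log(RR)
    se = math.sqrt(1 / events_group - 1 / n_group + 1 / events_ref - 1 / n_ref)
    lower = math.exp(math.log(rr) - z * se)
    upper = math.exp(math.log(rr) + z * se)
    return rr, lower, upper


# Hypothetical counts: 20 of 100 patients with POD in a higher-score group,
# 10 of 100 in the 0-point reference group
rr, lower, upper = risk_ratio(20, 100, 10, 100)
```

With small groups, such as the 6 patients with a score ≥3, the interval becomes very wide, which is consistent with the caution expressed above about that stratum.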
Table 2. Performance of the Clinical Prediction Model in the Swiss External Validation Cohort Compared to the Derivation Cohort of Rudolph and Colleagues
Fig 2. Area under the receiver operating characteristic curve (AUROC) showing the ability of the delirium prediction model by Rudolph et al. to correctly classify those with and without postoperative delirium after cardiac surgery in the underlying independent external Swiss validation cohort. AUROC = 0.5 indicates no discrimination, whereas AUROC = 1.0 indicates perfect discrimination. The black dotted reference line refers to no discrimination.
Discussion
The aim of this prospective observational study was to externally validate a previously published clinical prediction model for predicting POD in an independent cohort of cardiac surgery patients in Switzerland, in line with recent framework guidelines.
The performance of the prediction model in the authors’ contemporary patient cohort was mixed: it showed fair calibration but degraded discrimination (AUROC = 0.60) in the prediction of POD after cardiac surgery. Observing substantial decrements in discrimination during validation (compared with performance on the derivation dataset) was not surprising, as it was in line with previous reports.
There were several potential reasons for this. First, the observed magnitude of the AUROC may be explained by case mix and heterogeneity in the characteristics of the cohorts/populations. There was variability between the derivation cohort and the authors’ validation cohort, especially in the outcome measure of POD (52% v 17.4%), as well as in the predictor variables. In comparison to the original model of Rudolph et al., patients undergoing cardiac surgery in the authors’ sample reported fewer depressive symptoms (1.5 ± 1.8 v 3.3 ± 3.0 points), showed a lower prevalence of TIA and/or stroke (14.9% v 22%), and performed better on the MMSE (28.4 ± 1.6 v 26.9 ± 2.6 points). Moreover, the authors’ cohort had a higher percentage of normal-value serum albumin concentrations. However, the abnormal serum albumin values were lower compared to the original model.
In addition, Rudolph et al. validated their prediction model in a US population, whereas the authors evaluated the prediction model in Switzerland. According to a previous large-scale review, the decrease in discriminatory performance can be expected to be more pronounced when models are evaluated in populations that are dissimilar to the derivation population.
The prediction model may not be applicable to current patients undergoing cardiac surgery. Improvements in general healthcare, technical and technologic advances, and the establishment of preventive measures against delirium, such as dexmedetomidine infusion during surgery (at least in Zurich), may have resulted in a drift of the clinical prediction model performance over time.
In this study, the authors used the ICDSC to diagnose POD, which corresponded to the standard procedure at the 2 academic institutions. Rudolph et al. instead used the Confusion Assessment Method (CAM), or the CAM-ICU for intubated patients.
However, in 2 meta-analyses, the pooled sensitivity of the CAM-ICU was 75.5% to 80.0% and its specificity was 95.8% to 95.9% for the detection of delirium, whereas the pooled sensitivity of the ICDSC was 74.0% to 80.1% and its specificity was 74.6% to 81.9%. Therefore, it can be assumed that both instruments are highly valid when compared to the gold standard (DSM-IV criteria) in detecting POD.
The confusion assessment method for the intensive care unit (CAM-ICU) and intensive care delirium screening checklist (ICDSC) for the diagnosis of delirium: A systematic review and meta-analysis of clinical studies.
Although some cases of delirium may have been missed, the observed incidence of POD in the authors’ study was relatively low compared to the derivation cohort of Rudolph et al.
depending on the definition used, timing, characteristics of the studied population, selected assessment tool, type of surgical procedure, and the mode of treatment.
which also capture delirium symptoms and their severity. This may have contributed to the higher rate of POD in their sample. However, information regarding the duration of the CAM and/or CAM-ICU assessments, and whether the assessors were blinded to the predictors, was lacking. This may have led to a possible bias in the POD rate. Second, the prevalence of delirium increases with age. Many studies have found age to be a significant predictive factor of POD, even after regression analysis controlling for confounders. Age ≥60 years may be considered an implicit element of the original model by Rudolph et al., because patients <60 years were excluded. However, patients in the authors’ cohort had a mean age of 70.9 years, which was younger than in the derivation cohort of Rudolph et al. (74.7 ± 6.3 years). Third, besides advanced age, baseline cognitive impairment is the most highly cited factor associated with an increased risk of delirium.
In the authors’ cohort, patients had a better preoperative test performance (MMSE, 28.4 ± 1.6 points) compared to the derivation cohort (MMSE, 26.9 ± 2.6 points). According to established, clinically important ranges, 28.4 points indicate no impairment, whereas 26.9 points indicate mild impairment.
Fourth, the risk prediction model was applied retrospectively. Although this could have caused some errors in the risk stratification of individual patients, the authors herein think that this effect was small because all data used for the application of the Rudolph et al. prediction model were collected prospectively. Fifth, in recent years, guidelines have been developed that recommend the use of multicomponent, nonpharmacologic interventions to reduce delirium.
Clinical practice guidelines for the prevention and management of pain, agitation/sedation, delirium, immobility, and sleep disruption in adult patients in the ICU.
There are several simple, single-component interventions, such as reducing environmental stressors (eg, avoiding excessive noise, maintaining daylight and nighttime rhythm) and frequent orientation of patients to time and place, which can be implemented relatively easily.
However, although these measures seem relatively inexpensive at first sight, there are considerable “hidden costs,” such as higher nurse-to-patient ratios and specific training requirements for caregivers. Given the high burden on scarce human and material resources, these multicomponent interventions are most cost-effective when targeted at high-risk patients.
In addition, there is high variability among different institutions, which may or may not apply preventative measures against delirium, and it is still uncertain as to which interventions are most effective. Therefore, the authors assumed that preventative measures, as administered in both participating institutions, may have played a role in lowering the incidence of POD in their cohort. Moreover, advances in surgical and anesthetic techniques and developments in cardiopulmonary bypass technology may have contributed to a lower delirium incidence as compared to 20 years ago.
Overall, the poor result of discriminative performance (AUROC = 0.60) of the Rudolph et al. prediction model in the authors’ sample was in line with a previously published large head-to-head comparison study.
The aim of this previous study was to identify clinical prediction models for delirium developed and published since 1990, and to compare their performance head-to-head. In this large analysis, the model discrimination of the Rudolph et al. prediction model was considered poor (AUROC = 0.610).
There were several important strengths to this study. First, to the best of the authors’ knowledge, this was the first broad validation of the Rudolph et al. preoperative prediction model for POD after cardiac surgery in a German-speaking, Swiss population using real-world data and, therefore, was wholly independent of the development and validation sample of the original study. Furthermore, patients were recruited from more than 1 hospital in Switzerland. Second, the authors’ sample size was almost 3 times larger than that of Rudolph et al. Third, the primary outcome (POD) was ascertained by investigators blinded to the predictor variables. Finally, the authors handled missing data using multiple imputation. This is a popular statistical methodology that replaces missing values with plausible values. By creating multiple imputed datasets, one can explicitly account for the uncertainty inherent in the imputed values. Moreover, this approach is superior to more historic approaches such as complete case analysis, mean imputation, and single imputation.
However, a number of critical considerations pertaining to the authors’ study can be made. First, the participants of this study were relatively well-educated (13.2 ± 3.4 years of education), which may have impacted the performance on the MMSE and the incidence of POD. Although all patients undergoing elective cardiac surgery at the participating institutions tested negative for SARS-CoV-2 preoperatively, possible effects of the COVID-19 pandemic during the recruitment period and seasonal variations should be kept in mind because this may limit the generalizability of the authors’ findings.
Data on patients’ history of prior SARS-CoV-2 infection were not available. Second, because the authors’ purpose was to validate the prediction model externally and to avoid causing additional unnecessary distress to patients before surgery, they collected only a minimal number of variables from patients and medical reports. Hence, developing a new prediction model or updating the existing one (eg, recalibrating it or extending it by adding newly discovered predictors) was beyond the scope of this study. In addition, a previous systematic review and meta-analysis found no strong evidence of a relationship between AUROCs and the number of predictors used in prediction models.
It seems more important that the predictors can be applied in clinical practice, when time is often short. However, given the relative scarcity of external validations, it seems reasonable to prioritize the study of existing prediction models (as opposed to developing new ones) and realize how this might be optimized for clinical use.
Conclusions
Risk prediction models play an important role in current cardiac surgical practice. The study authors herein have provided an independent external validation of a previously developed preoperative prognostic model for incident POD in patients who underwent cardiac surgery in Switzerland. The evaluated prognostic model showed only poor discriminative capacity but fair calibration. However, poor performance in a single validation cohort does not reliably forecast performance on subsequent validations. Therefore, it is worth implementing further rigorous studies to evaluate the generalizability and the clinical validity of this prognostic model to realize how this might be optimized for clinical use.
Conflict of Interest
None.
Acknowledgments
The authors gratefully acknowledge the help of the numerous residents and nurses who assisted in the study implementation as well as with data collection. The authors also thank Allison Dwileski, BSc, for proofreading the manuscript.
This work was supported by internal sources of the Clinic for Anesthesia, Intermediate Care, Prehospital Emergency Medicine and Pain Therapy, University Hospital Basel, Basel, Switzerland.