Performance of Scores Predicting Adverse Outcomes in Procurement Kidney Biopsies From Deceased Donors With Organs of Lower-Than-Average Quality

Several scores have been devised for providing a prognosis of outcomes after kidney transplantation. This study is a comprehensive test of these scores in a cohort of deceased donors with kidneys of lower-than-average quality and procurement biopsies. In total, 15 scores were tested on a retrospective cohort consisting of 221 donors, 223 procurement biopsies, and 223 recipient records for their performance in predicting delayed graft function (DGF), graft function, or death-censored graft loss. The best-performing score for DGF was the purely clinical Chapal score (AUC 0.709), followed by the Irish score (AUC 0.684); for graft function, the Nyberg score; and for transplant loss, the Snoeijs score (AUC 0.630) and the Leuven scores (AUCs 0.637 and 0.620). The only score with an acceptable performance was the Chapal score. Its disadvantage is that it requires the cold ischemia time, which is not known at allocation. None of the other scores performed acceptably. The scores fared better in discarded kidneys than in transplanted kidneys. Our study shows an unmet need for practical prognostic scores that are useful at the time of the decision to discard or accept deceased donor kidneys of lower-than-average quality in the Eurotransplant consortium.


INTRODUCTION
For most patients with end-stage kidney disease, kidney transplantation is the best available treatment, offering better survival and quality of life and lower use of healthcare resources [1][2][3]. Despite the increasing use of living donation [4,5], most patients on dialysis still have to wait for a deceased donor kidney (DDK) transplant. Today, transplant physicians face the dilemma of how to make the best use of the scarce pool of increasingly older DDKs while avoiding a poor outcome for the recipients, such as delayed graft function (DGF), premature transplant loss, or even a risk to their lives [1,3].
Several purely clinical [6][7][8][9][10][11][12][13], combined clinicohistological [14][15][16], or purely histological scores [17][18][19][20] have been devised for quality assessment of DDKs; the Nyberg score is, for practical purposes, best considered clinical, as it does not require histopathology [9]. The scores with a histology component were developed on preimplantation biopsies from unselected cohorts, which reflect the full spectrum of DDK quality, including those with the lowest risk, and not on the clinically decisive procurement biopsies. Some of these scores have been validated internally [9,14,16] or externally, either in the publications of subsequent scores by other authors or in separate studies. A recent publication tested four scores [6][7][8]12] for their performance in the prognostication of DGF in a large Dutch cohort of unselected preimplantation biopsies [21]. An earlier study from the United Kingdom evaluated the performance of four scores [9,11,22,23] regarding mid-term transplant function [24], two of which have been updated since [7,9]. A recent study from the United States (US) validated three scores [9,25,26] on a single-centre cohort of donors with kidneys of lower quality for their prognostic performance regarding two-year transplant survival [27]. Similarly, in another study [28], four scores, including that proposed by Banff [16,19,25,29], failed to predict graft survival and early graft function. The scores and their validation studies have helped to better understand and address the causes of DGF and premature transplant failure. However, these scores have never been validated, on a set of procurement biopsies taken to assess organ quality before allocation, regarding their usefulness for the decision to accept or discard a DDK. This is particularly important in view of recent data showing that procurement biopsies lead to the discard of organs suitable for transplantation [30].
The primary aim of this study was to conduct the overdue comprehensive test of a variety of scores (listed in Table 1) for their performance on various endpoints, such as delayed graft function, graft function, or death-censored graft loss, on a retrospective cohort of procurement biopsies specifically commissioned for DDK quality assessment by the Deutsche Stiftung Organtransplantation (DSO; German Foundation for Organ Transplantation), operating within the Eurotransplant consortium. As a secondary aim, we examined whether purely clinical scores perform as well as scores including a histopathology component. Lastly, we wanted to test their performance on the considerable proportion of discarded kidneys in our cohort.

Biopsies, Reporting, Donor, and Recipient Data
We extracted data on kidneys allocated between 1 January 2003 and 31 March 2012 from the "DSO Region Nord" and from the German transplant centers. The collection of recipient follow-up data was completed in December 2015. Data were analyzed between 1 January 2018 and 31 May 2020. Only adult recipients of deceased donor kidneys of lower quality were included. Recipients of dual kidney and combined kidney transplantation were excluded. Our cohort consisted exclusively of brain-dead donors, since donation after cardiac death is not allowed in Germany.
The allocation was under the auspices of Eurotransplant, an international non-profit organization responsible for the coordination and distribution of organs for transplantation between residents of eight European countries. The following donor data were collected: age, sex, weight, height, body mass index (BMI), length of hospital stay, cardiopulmonary

Table 1 legend: The score designation and the reference are given in the first column; the type of score, that is, purely clinical (C), combined clinical and pathological (C + P), or solely pathological (P), is given in the second column. Subsequent columns list the parameters used in the respective scores. The parameters are organized as relating to the donor, to the transplant procedure, to the transplant itself, or to the recipient. Note that although renal artery plaque as used in the Nyberg score is a pathological finding, it is not typically assessed by a pathologist (pathological and clinicopathological scores are in italics; the numbers correspond to the references in the manuscript).

The biopsies were evaluated at the Institute of Pathology in parallel to the transport of the DDK and the preparation for transplantation. Procurement biopsies were not performed in all kidneys but only in those deemed to be of lower quality, to increase their chance of acceptance. The results were reported within 4 h after rapid paraffin embedding on multiple hematoxylin-eosin- and periodic acid-Schiff-stained sections. The DSO oversaw DDK management after notification. The decision about use or discard of the DDK was then made by the transplant physician in the receiving centre. The first assessment was done by the pathologist on duty and included information on the representativeness of the biopsy, the number of glomeruli and arteries, the percentage of tubular atrophy, and the grading of acute tubular injury. The recommendation was usually suitable/not suitable or partially suitable. The histopathological scores reported below were provided in a second, blinded reading by an experienced nephropathologist. A flowchart of the study is given in Figure 1.

Definitions
The definition of lower organ quality did not depend on strict criteria but was based on clinical judgment, considering the macroscopic appearance of the organ in combination with the donor's clinical data. The macroscopic appraisal was done on the "back table," after removal of the perinephric fat and the clean dissection of the vessels from the surrounding tissues. It included organ quality as well as perfusion quality, both of which were rated as good, medium, or poor; likewise, atherosclerosis was characterized as none, mild, or severe. The decision was usually made after discussion of each case between the senior surgeon of the harvesting team and the physician of the recipient's center. Senior surgeons were accredited by the DSO and had many years of experience in the transplant field.
Extended criteria donors (ECD) were classified as previously reported [38]. eGFR was calculated by means of the Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI) equation. Admission, highest, lowest, and terminal eGFR were estimated by using, respectively, the first, the lowest, the highest, and the last serum creatinine prior to organ recovery [39]. Primary nonfunction (PNF) was defined as the permanent lack of graft function from the time of transplantation [40] and delayed graft function (DGF) as the need for dialysis in the first week [41].
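The eGFR definitions above can be illustrated with a minimal sketch. It assumes the 2009 CKD-EPI creatinine equation (the text does not state which version was used), a conversion factor of 88.4 µmol/L per mg/dL, and hypothetical function names and creatinine values chosen only for illustration.

```python
# Sketch only: assumes the 2009 CKD-EPI creatinine equation; the paper does not
# specify the version. Function names and example values are hypothetical.

def ckd_epi_2009(scr_umol_l, age_years, female, black=False):
    """Estimated GFR in mL/min/1.73 m^2 from serum creatinine in umol/L."""
    scr_mg_dl = scr_umol_l / 88.4          # unit conversion umol/L -> mg/dL
    kappa = 0.7 if female else 0.9
    alpha = -0.329 if female else -0.411
    ratio = scr_mg_dl / kappa
    egfr = (141.0
            * min(ratio, 1.0) ** alpha
            * max(ratio, 1.0) ** -1.209
            * 0.993 ** age_years)
    if female:
        egfr *= 1.018
    if black:
        egfr *= 1.159
    return egfr


def donor_egfr_summary(creatinines_umol_l, age_years, female):
    """Admission, highest, lowest, and terminal eGFR from a creatinine series.

    The highest eGFR corresponds to the lowest creatinine and vice versa.
    """
    def f(scr):
        return ckd_epi_2009(scr, age_years, female)

    return {
        "admission": f(creatinines_umol_l[0]),
        "highest": f(min(creatinines_umol_l)),
        "lowest": f(max(creatinines_umol_l)),
        "terminal": f(creatinines_umol_l[-1]),
    }


# Hypothetical creatinine course (umol/L) of a 61-year-old male donor.
summary = donor_egfr_summary([95.0, 80.0, 170.0, 149.0], age_years=61, female=False)
```

Because eGFR falls as creatinine rises, the "highest" eGFR is taken from the lowest creatinine value, which is why the order of the creatinine values in the definition is reversed.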

Scores
An overview of the parameters included in the respective scores is given in Table 1. The Kidney Donor Profile Index and Kidney Donor Risk Index (KDPI, KDRI) were calculated according to the Organ Procurement and Transplantation Network (OPTN) 2 and the estimated post-transplant survival (EPTS) score with the web calculator provided by OPTN (EPTS calculator-OPTN) 3.

2 https://optn.transplant.hrsa.gov/media/1512/guide_to_calculating_interpreting_kdpi.pdf
3 hrsa.gov

Outcome Measures
The following outcomes were analyzed: PNF, DGF, graft function at 3 months, 1 year, and 3 years, death-censored graft failure, and patient death at 1, 3, and 5 years. All survival times were censored at the last date a patient was known to be alive. eGFR results were presented in units of 10 mL/min per 1.73 m² for ease of interpretation.

Statistics
Continuous variables were described as mean ± standard deviation (SD), and central tendencies were compared between groups by Mann-Whitney U tests. Fisher's exact and χ² tests were used to compare distributions of categorical variables. To estimate how well a risk score discriminates the different endpoints, the area under the receiver operating characteristic curve (AUC) was calculated. AUCs range from 0% to 100%, with 0% indicating perfect inaccuracy, 100% perfect accuracy, 50% no discrimination, 50%-70% poor discrimination, 70%-80% acceptable, 80%-90% excellent, and >90% outstanding performance [42,43]. A p-value below 0.05 was considered significant in all comparisons in two-sided tests; however, in this retrospective observational study, p-values can only be considered descriptive. Statistical analysis was performed with SPSS software, v24 (IBM Corp, Armonk, NY, United States) and IBM SPSS Statistics Essentials for R.
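As an illustration of the discrimination measure described above, the following minimal sketch computes the AUC as the probability that a randomly chosen patient with the event has a higher score than one without it (ties counted as one half), and maps the result onto the interpretation bands given in the text. It is not the SPSS procedure used in the study; the function names and the handling of band boundaries are assumptions.

```python
# Sketch only: pairwise (Mann-Whitney type) AUC estimate in pure Python.
# Not the SPSS routine used in the study; names are hypothetical.

def auc(scores, outcomes):
    """AUC of a risk score for a binary outcome (1 = event, 0 = no event)."""
    pos = [s for s, y in zip(scores, outcomes) if y]
    neg = [s for s, y in zip(scores, outcomes) if not y]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # event patient ranked higher: concordant pair
            elif p == n:
                wins += 0.5      # tie counts as half
    return wins / (len(pos) * len(neg))


def discrimination_label(auc_value):
    """Interpretation bands from the text; boundary handling is an assumption."""
    if auc_value > 0.9:
        return "outstanding"
    if auc_value >= 0.8:
        return "excellent"
    if auc_value >= 0.7:
        return "acceptable"
    if auc_value > 0.5:
        return "poor"
    if auc_value == 0.5:
        return "no discrimination"
    return "worse than chance"


# AUCs reported for DGF in this study, classified with the bands above.
chapal_label = discrimination_label(0.709)
irish_label = discrimination_label(0.684)
```

Under these bands, an AUC of 0.709 falls just inside the acceptable range, whereas 0.684 remains in the poor range.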

Ethical Permission
All organ transplants were performed according to the Declaration of Istanbul [44]; no transplants from prisoners were used. The study was conducted in accordance with the Declaration of Helsinki and approved by the local ethical review board of Hannover Medical School (No. 1519-2012).

Donors' and Recipients' Characteristics
Of the 442 kidneys recovered from 221 donors, 149 were discarded. The tissue blocks were found for 287 (98%) of the 293 transplanted kidneys. Follow-up data were available from 223 recipients (Figure 1). The KDRI was 1.48 and 107 (63.3%) were ECD. The average age was 61 years and 54% were male. Only 13% of donors had diabetes and 30% cardiovascular disease. The prevalence of hepatitis B and C was low (6.5% and 1.2%). Cerebrovascular accident was the most common cause of brain death (60%). The serum creatinine at recovery was 149 μmol/L. Approximately 50% of donors experienced acute kidney injury (AKI) (Table 2). Macroscopically, the accepted kidneys showed good perfusion and organ quality overall, except for atherosclerosis, which was severe in 46.5% of them. Biopsies were performed in 80%, and the majority were needle biopsies with a representative number of glomeruli and arteries. Mean global glomerulosclerosis was 10.4%, and 50% showed minimal (<5%) global glomerulosclerosis, whereas the majority of acute and chronic tubular, interstitial, and vascular Banff lesion scores were of low grade. In contrast, acute tubular injury was, as expected, more severe (Table 3). The average age of recipients was 61 years.

Donor and Organ Related Differences Between Discards and Transplantations
Of the 442 available kidneys, 149 (33%) were discarded. Of these, 45 were recovered from donors whose contralateral kidney was transplanted and 104 from donors both of whose kidneys were discarded (Figure 2). Except for a higher prevalence of hepatitis C and a longer duration of brain death, there were no differences in the baseline characteristics between donors of transplanted and discarded kidneys (Table 6).
The following categories of reasons for discard were recorded: 1) macroscopic organ damage, such as renal capsule fissure, cortical hemorrhage, large infarcts, large renal cysts, heavy atherosclerosis of the aortic patch and/or renal artery, and mottled appearance after reperfusion; 2) findings of procurement biopsies; 3) concerns about a transmissible donor infection; 4) extrarenal malignancy known or detected during procurement, or tumor of the contralateral kidney; 5) refusal of the transplant center to finally accept the offer; and 6) non-transplantability of the recipient.
Forty-seven kidneys were discarded due to macroscopic findings, 43 due to the biopsy results, and 27 due to one of the reasons in categories 3 to 6. Unfortunately, for about one in five discarded kidneys (32/149, 21.5%) the exact reason remained unknown.
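The discard figures reported above can be cross-checked with a few lines of arithmetic. The sketch below only restates counts given in the text; the variable and function names are hypothetical.

```python
# Sketch only: consistency check of the discard counts reported in the text.
# Variable and function names are hypothetical.

discard_reasons = {
    "macroscopic findings": 47,
    "biopsy findings": 43,
    "categories 3 to 6": 27,
    "unknown": 32,
}
total_kidneys = 442
unilateral_discards = 45    # contralateral kidney was transplanted
bilateral_discards = 104    # both kidneys of the donor were discarded

total_discarded = sum(discard_reasons.values())


def percent(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


unknown_share = percent(discard_reasons["unknown"], total_discarded)
discard_rate = percent(total_discarded, total_kidneys)
```

The four reason categories add up to the 149 discarded kidneys, as do the unilateral and bilateral discards.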

Score Performance in Transplanted Kidneys
The performance of the scores is shown in Table 8. Depending on the extent of missing data, up to 103 (46%) of the 223 DDKs had to be excluded from the analysis of the endpoints. The Chapal and Irish scores had the best predictive ability for DGF, with AUCs of 0.709 and 0.684, respectively, whereas Jeldres had an AUC of 0.503, Balaz of 0.506/0.490, and Schold of 0.451. For the prognostication of graft survival, the best-performing scores were those of Rao and Port at 1 year, with significant AUCs of 0.699 and 0.662, followed by de Vusser at 3 years and Snoeijs and de Vusser at 5 years, with respective AUCs of 0.637, 0.630, and 0.620. Regarding graft function, the trend was similar. Here, the Navarro score was acceptable, whereas the performance of the Anglicheau score was poor (AUC 0.649) and the significance of the Ortiz score marginal (Kendall's tau at 1 year 0.157, p = 0.026). The predictive power of the EPTS score was poor (AUC 0.642).

DISCUSSION
The primary aim of this retrospective study was to test the performance of scores previously devised for quality assessment of a DDK of lower quality for their value in supporting the decision about discard or acceptance. The rather dismal clinical outcome in our cohort, with 48.9% and 15.8% of recipients respectively developing DGF or losing their graft within the first year, shows that the cohort was indeed a formidable real-life challenge for the scores.
For DGF we found an acceptable discrimination with an AUC of 0.709 for the Chapal score. The Irish score might have performed even better had we been able to provide the missing recipient parameter "previous blood transfusion." Moreover, the applicability of the purely clinical and thus economical Irish score is limited because it requires the cold and warm ischemia times, both unknown at the time of allocation. Conversely, the Chapal score requires donor and recipient parameters which, except for the cold ischemia time, are easy to obtain. The Chapal score showed a lower AUC than that reported in the initial publication [6]. This may be explained by the higher incidence of DGF in our cohort (48.9% vs. 25.4% reported by Chapal).

Outcome data of recipients (Table 5, fragment):
Kidney function at 3 months (creatinine), µmol/L: 188.3 ± 77.9 (n missing = 1)
Kidney function at 3 months (eGFR), mL/min/1.73 m²: 34.6 ± 14.7 (n missing = 1)
Kidney function at 1 year (creatinine), µmol/L: 166.9 ± 52.9 (n missing = 1)
Kidney function at 1 year (eGFR), mL/min/1.73 m²: 37.
Similarly poor results were seen for the Anglicheau and Ortiz scores in predicting graft function. Their poor performance may be explained by the higher age of our recipients compared with those in the cohorts of Anglicheau and Ortiz (61.0 vs. 50.6 vs. 48 years), as well as the higher proportion of our donors with hypertension (56.8% vs. 30.8%) and their higher creatinine levels before organ removal (149 vs. 101 μmol/L) compared with those in the cohort of Anglicheau. However, the better-performing Nyberg score requires the cold ischemia time, a parameter not known at the time of allocation.
None of the scores for graft survival reached an acceptable performance. The pathological scores of Navarro and Snoeijs and the clinicopathological score of de Vusser outperformed the solely clinical scoring systems of Rao and Port. This suggests that there are aspects of donor organ quality that cannot be reliably determined from clinical data alone. Inclusion of pathologic data could allow for a better assessment of overall organ quality, particularly in kidneys of lower-than-average quality, and may explain the better performance of the scores with histopathology. Still, this was not sufficient to push the AUC into the acceptable range. The score of Navarro [17] has been adopted by the Spanish Society of Nephrology [46]. Here, kidneys with a score <8 are proposed for single transplantation. The very poor results obtained by Navarro et al. in their study transplanting kidneys with a score of 6-7 were not confirmed later by others [47]. In summary, the majority of the scores are not suitable for procurement biopsies because they include information that is not available during procurement. Beyond that, the scores were developed after examination of paraffin-embedded renal tissue, a procedure that is time-consuming and not practical in the time-limited setting of allocation. The only exception is the Remuzzi score, which was based on frozen sections. However, in our experience frozen sections are often difficult to evaluate due to inappropriate handling during transport [31].
Procurement biopsies may also lead to needless discards if the histopathologic evaluation is conducted by general pathologists and not by nephropathologists. The failure of pretransplant biopsies to predict graft outcomes was highlighted in an older meta-analysis of 47 studies testing 15 scores [48]. According to a recent paper, more than half of the kidneys discarded in the US would have been suitable for transplant in France, where procurement biopsies are rarely performed [49]. Furthermore, their usefulness has been questioned due to low reproducibility and poor predictive power [50], although some centers propose punch instead of wedge or needle biopsies as a means to improve standardization, sample adequacy, and reproducibility [51]. Overall, scores based on preimplantation biopsies can be used to predict graft function, but their applicability to the decision on transplantation or discard has probably been overestimated [52].
Strengths of our study were the comprehensive evaluation exclusively of procurement biopsies by an experienced nephropathologist according to the most recent Banff criteria [29] and the validation of the best-known scores for the endpoints for which they were developed.
Limitations should also be recognized. First, the definition of DGF as the need for dialysis within the first week after transplantation, an endpoint that may be influenced by various clinical factors (such as heart failure, hyperkalemia, etc.), is not uniformly accepted. Furthermore, we excluded PNF because it has a different pathogenesis [40] and was not tested as an outcome parameter in the scores. The extraordinarily high incidence of PNF and DGF was probably due to bias by indication; our cohort was highly selective, since biopsies were performed only in those donors whose organs were presumed to be of lower quality. Another reason was the higher incidence of donors with AKI, an acknowledged risk factor for both outcomes [53]. Second, the scores were constructed on preimplantation biopsies, which in terms of prognostication are completely different from procurement biopsies because of the damage accrued during cold preservation and transport as well as the reperfusion injury after implantation. Third, the amount of missing data implies that each score was tested on a different subset of the cohort, since the data required for the calculation of all scores are not routinely collected in the ET database, at the DSO, or at the transplant centers. A registry with data from all sources (DSO, ET, transplant centers) is not available. Fourth, the test cohort dates back approximately 10 years. However, most of the evidence base of kidney transplantation relies on data collected before 2010, and the follow-up period of our study should not have changed considerably in the decade before and after 2010. Fifth, the indications for procurement biopsies did not rely on objective criteria, since the biopsies were performed on a case-by-case basis and not according to a standardized protocol. For example, the macroscopic assessment of the recovered organs was quite subjective. However, it can be of value if performed in a more structured way by experienced surgeons [54]. Finally, an inherent, unavoidable drawback of all similar studies is the unknown performance of the certainly non-randomly discarded DDKs. Despite all these limitations, this is the only study examining the performance of these scores on the dataset for which they are most useful from a clinical point of view: procurement biopsies for the decision on DDK transplantation or discard. We found that none of the tested scores allows a confident, evidence-based decision about acceptance or discard of a DDK based on the prognosis of the different endpoints within the ET context. Probably, clinical parameters not included in these scores, such as the donor's AKI or the donor's creatinine metrics, are more important for short-term outcomes [53,55].
Here, some conclusions can be drawn. First, organs from donors with AKI should not be accepted for recipients at high risk for DGF, or these recipients may be preferentially treated with an immunosuppression protocol based on belatacept [56]. Second, the recipient should return to dialysis in a timely manner to avoid losing waitlist points if early graft failure is expected. Finally, we must always keep in mind that, especially for elderly patients, refusal of organ offers ultimately increases mortality because of the longer time on the waiting list [57].
Regarding the second aim, we could indeed show that, for the endpoint death-censored graft survival, histological [17,20] or clinicopathological [16] scores performed marginally better than purely clinical ones. But even if the AUCs were slightly better, their overall performance was moderate to poor. While for some DDKs donor and recipient parameters might be entirely sufficient for a prognosis, for some donor/recipient matches histopathology might add valuable information. We are currently investigating such an approach with an optional histopathology component that includes only reproducible parameters independently associated with prognosis.
As to the testing of the scores in the discarded kidneys, we found that scores with a histological component were better than the solely clinical ones. However, an inherent bias cannot be excluded, since the histologic evaluation of an offered organ is often the principal reason for its discard. Here, we can only postulate that histological assessment is warranted in kidneys presumed to be unsuitable for transplantation. Probably the most important finding was that many of the discarded kidneys could have been successfully transplanted.

CONCLUSION
Procurement biopsies are often used during allocation to increase the possibility of acceptance of kidneys of lower quality. However, the available prognostic scores perform at best only moderately. Though none of the scores could reach an acceptable discriminatory power, those based on histopathologic criteria performed slightly better than the more practical solely clinical ones. Our findings are based on data from the Eurotransplant region but can also be applied to other multinational or national transplant organizations or, even more, be valuable for individual decisions in transplant centers.

FIGURE 2 |
Flowchart of the handling of discarded organs.

TABLE 1 |
Parameters used in the previously published scores for the quality assessment of DDKs.

TABLE 2 |
Demographic data and ICU monitoring parameters of the donors.

The average age of recipients was 61 years. They showed a low immunologic risk profile, a cold ischemia time (13.8 h) at the lower end of the range reported for Eurotransplant [45], and a high EPTS score (Table 4). PNF occurred in 26 (11.7%) and DGF in 109 (48.9%) patients. We observed 49 graft losses during a median follow-up of 43.8 months (IQR 19-68 months). Patient and death-censored graft survival after kidney transplantation were, respectively, 90.6% and 91.1% at 1 year, 86.1% and 82.9% at 3 years, and 83% and 81.6% at 5 years (Table 5).

Transplant International | Published by Frontiers October 2023 | Volume 36 | Article 11399

TABLE 2 |
(Continued) Demographic data and ICU monitoring parameters of the donors.
Continuous parameters are given as mean ± standard deviation, numerical and ordinal parameters as count and percentage. Abbreviations: DDK, deceased donor kidney; FSGS, focal and segmental glomerulosclerosis; RPS, Renal Pathology Society. a Of note, macroscopic parameters listed in this table were determined by the harvesting surgeon and not by a pathologist, while the histopathological parameters were determined retrospectively by an experienced nephropathologist.

TABLE 4 |
Clinical parameters of recipients. Continuous parameters are given as mean ± standard deviation, numerical and ordinal parameters as count and percentage.

TABLE 5 |
Outcome data of recipients.

TABLE 6 |
Comparison of baseline characteristics between donors with transplanted and discarded kidneys.

TABLE 7 |
Comparison of macroscopic and histological characteristics between transplanted and discarded kidneys.

since the data required for the calculation of all scores are not routinely collected in the ET database nor at the DSO or the transplant centers. A registry with data from all sources (DSO, ET, transplant centers) is not available. Fourth, the test cohort dates back approximately 10 years. However, most of the evidence base of kidney transplantation relies on data collected before 2010 and the follow-up period of our study should

TABLE 8 |
Previously published scores for the quality assessment of DDKs tested in this study including the endpoints they were designed for and their performance in the original publication.

TABLE 9 |
Performance of the investigated scores for the prediction of discards vs. transplantation. CIV, chronic interstitial and vascular score according to the Banff classification; composite CIV score: CIV score considering also clinical parameters (donor age >51 years, anoxic donor brain injury). A, B, and C refer to the first (bilateral discard), second (unilateral discard), and third (bilateral transplantation) column of the table. Pathological and combined clinical and pathological scores are in italics; the numbers correspond to the references of the manuscript. Bold values represent statistically significant parameters.