Qualifying a Novel Clinical Trial Endpoint (iBOX) Predictive of Long-Term Kidney Transplant Outcomes

New immunosuppressive therapies that improve long-term graft survival are needed in kidney transplant. Critical Path Institute’s Transplant Therapeutics Consortium (TTC) received a qualification opinion for the iBOX Scoring System as a novel secondary efficacy endpoint for kidney transplant clinical trials through the European Medicines Agency’s (EMA) qualification of novel methodologies for drug development. This is the first qualified endpoint for any transplant indication and is now available for use in kidney transplant clinical trials. Although the current efficacy failure endpoint has typically shown the noninferiority of therapeutic regimens, the iBOX Scoring System can be used to demonstrate the superiority of a new immunosuppressive therapy compared to the standard of care from 6 months to 24 months posttransplant in pivotal or exploratory drug therapeutic studies.

A summary detailing the datasets in the TTC's Kidney Transplant Database with associated information describing why datasets were included/excluded in the EMA qualification submission is included below.
To acquire the subject-level data necessary to develop a novel surrogate endpoint, the TTC led an extensive global data collaboration effort across the field of kidney transplantation. To date, the TTC has acquired eleven clinical trial datasets and twenty observational datasets from clinical transplant centers, representing data from over 20,000 kidney transplant recipients in the TTC Kidney Transplant Database (Figure 2 in the main manuscript).
Datasets from relevant clinical trials of immunosuppressive therapies (ISTs), including those in the Loupy et al. 2019 publication, and real-world data from international clinical transplant centers were prioritized for acquisition. Of these 31 datasets, five contained all necessary variables collected at one year post-transplant (i.e., eGFR, proteinuria, kidney allograft biopsy histopathology, and DSA), long-term death and graft loss follow-up of at least five years, immunosuppressive regimen information (i.e., induction and maintenance IST) to test the performance of the surrogate with all three mechanisms of action (MOA), and the documentation required to support the description of the analytical considerations for each dataset.
Datasets missing the necessary variables at one year post-transplant, or missing an input needed to calculate a model variable (such as recipient age, which is needed to calculate eGFR), were excluded. For example, in the data for the three Novartis studies (TRANSFORM, US-92, and ELEVATE), recipient age was missing due to Novartis' anonymization procedures for data sharing. This, in turn, prevented the calculation of eGFR values for the subjects in these studies. Moreover, US-92 and ELEVATE were missing DSA and proteinuria data, and follow-up was limited to one and two years, respectively.
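The inclusion criteria described above can be sketched as a simple eligibility check. This is an illustrative sketch only, not the TTC's actual screening procedure: the field names, the `DatasetSummary` type, and the use of serum creatinine as the second eGFR input are assumptions made for the example, and the two example datasets are hypothetical apart from the gaps the text attributes to US-92.

```python
from dataclasses import dataclass

# One-year post-transplant variables named in the text (eGFR is handled below).
ONE_YEAR_VARIABLES = {"proteinuria", "biopsy_histopathology", "dsa"}
# Inputs assumed necessary to calculate eGFR; recipient age comes from the text,
# serum creatinine is an assumption of this sketch.
EGFR_INPUTS = {"serum_creatinine", "recipient_age"}
MIN_FOLLOW_UP_YEARS = 5.0


@dataclass
class DatasetSummary:
    """Hypothetical per-dataset summary used only for this illustration."""
    name: str
    variables: frozenset
    follow_up_years: float
    has_regimen_info: bool = True
    has_documentation: bool = True


def exclusion_reasons(ds: DatasetSummary) -> list:
    """Return the reasons a dataset fails the criteria (empty list if eligible)."""
    reasons = []
    for var in sorted(ONE_YEAR_VARIABLES - ds.variables):
        reasons.append(f"missing one-year variable: {var}")
    missing_inputs = sorted(EGFR_INPUTS - ds.variables)
    if missing_inputs:
        reasons.append("eGFR not calculable, missing: " + ", ".join(missing_inputs))
    if ds.follow_up_years < MIN_FOLLOW_UP_YEARS:
        reasons.append(
            f"follow-up of {ds.follow_up_years:g} y is below {MIN_FOLLOW_UP_YEARS:g} y"
        )
    if not ds.has_regimen_info:
        reasons.append("missing immunosuppressive regimen information")
    if not ds.has_documentation:
        reasons.append("missing supporting documentation")
    return reasons


# Hypothetical dataset that satisfies every criterion.
complete = DatasetSummary(
    "complete (hypothetical)", frozenset(ONE_YEAR_VARIABLES | EGFR_INPUTS), 7.0
)
# Mirrors the gaps described for US-92: no recipient age, DSA, or proteinuria,
# and one year of follow-up. The remaining fields are invented for the example.
us92_like = DatasetSummary(
    "US-92-like (illustrative)",
    frozenset({"serum_creatinine", "biopsy_histopathology"}),
    1.0,
)

for ds in (complete, us92_like):
    print(ds.name, "->", exclusion_reasons(ds) or "eligible")
```

Returning a list of reasons rather than a single yes/no flag keeps the record of why each dataset was excluded, which is the kind of information the summary of included and excluded datasets reports.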
Five datasets had the requisite subject-level data to conduct the internal and external validation analyses in this Briefing Dossier for a Qualification Opinion submission. These datasets were acquired from clinical transplant centers (i.e., Loupy et al., 2019 derivation, Mayo Clinic Rochester, and Helsinki) …

Because a cumulative hazard can be interpreted as an expected number of events, the intercept $\alpha$ of the Poisson calibration model equals 0 if the iBox Scoring System model exactly predicts the number of events. Therefore, the estimate $\hat{\alpha}$ represents calibration-in-the-large, the degree to which the expected number of events predicted by the iBox Scoring System for the dataset subjects matches the expected number of events predicted by the Poisson model (the latter of which is estimated using the actual number of observed events in the external dataset). Statistical significance is evaluated using the SE on this intercept term.

Table columns: iBOX in Loupy et al., 2019 | iBox in Qualification Opinion | Core components of model

Table 2. Number of subjects with six-month and two-year post-transplant iBOX assessments in the validation datasets.

Table 6. Poisson calibration for the full and abbreviated iBOX at six months and two years post-transplant in the validation datasets. A p-value <0.05 would indicate a significant difference between the expected number of graft loss events as predicted by the iBOX versus the actual number of graft loss events.

Supplemental Table 7. Calibration for iBOX with only eGFR and proteinuria. A p-value <0.05 would indicate a significant difference between the expected number of graft loss events as predicted by the iBOX versus the actual number of graft loss events.

… with a functioning graft) in the validation datasets.

Table columns: c-statistic (SE) for full iBOX at 1 year using death-censored graft survival | c-statistic (SE) for full iBOX at 1 year using all-cause graft loss
A cumulative hazard function $H(t)$, which can be calculated by integration from a hazard function $h(t)$, can be interpreted as the expected number of events experienced by time $t$. The calibration method described by Crowson et al. (2016) takes advantage of this property to assess the accuracy of the iBox Scoring System models for the external dataset using the following Poisson regression model:

$$\log\big(E[N_i]\big) = \alpha + \log\big(\hat{H}(t_i, s_i)\big),$$

where $N_i$ is the number of events experienced by the $i$th subject of the dataset (in our case, 0 if the subject was censored and 1 if the subject experienced an event) during the observation period (from time 0 to $t_i$), $E[N_i]$ is the expected number of events if this Poisson model is true, $\alpha$ is the model intercept, and $\hat{H}(t_i, s_i)$ is the cumulative hazard at time of event or censoring $t_i$ as predicted by the full and abbreviated iBox Scoring System for subject $i$ as a function of its iBox score $s_i$. Here $\log\big(\hat{H}(t_i, s_i)\big)$ is used as an offset (a term where the coefficient is fixed to one) in the Poisson regression model.
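The calibration check can be illustrated with a short sketch. Because the model contains only an intercept and an offset, its maximum-likelihood fit has a closed form: the exponentiated intercept is the ratio of observed events to the summed predicted cumulative hazards, and the model-based standard error of the intercept is one over the square root of the observed event count. The function name, the toy inputs, and the use of a two-sided Wald test for the p-value are assumptions of this sketch; it is not the analysis code used for the qualification submission.

```python
import math


def poisson_calibration_in_the_large(events, cumulative_hazards):
    """Fit log(E[N_i]) = alpha + log(H_i) with log(H_i) as a fixed offset.

    events: 0 (censored) or 1 (event) for each subject.
    cumulative_hazards: predicted cumulative hazard at each subject's event
        or censoring time (toy values here, not iBOX predictions).
    Returns (alpha_hat, standard_error, two_sided_wald_p_value).
    """
    if len(events) != len(cumulative_hazards):
        raise ValueError("events and cumulative_hazards must have equal length")
    observed = sum(events)             # actual number of events
    expected = sum(cumulative_hazards)  # number of events predicted by the score
    if observed <= 0 or expected <= 0:
        raise ValueError("need at least one event and a positive predicted total")

    # Score equation: sum(N_i - exp(alpha) * H_i) = 0  =>  exp(alpha) = O / E.
    alpha_hat = math.log(observed / expected)
    # Fisher information at the MLE is sum(exp(alpha_hat) * H_i) = observed.
    standard_error = 1.0 / math.sqrt(observed)
    # Two-sided Wald test of alpha = 0 (an assumption of this sketch).
    z = alpha_hat / standard_error
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return alpha_hat, standard_error, p_value


# Toy data: predictions sum to 2.0 and two events are observed, so alpha_hat is 0.
well_calibrated = poisson_calibration_in_the_large([1, 0, 0, 1], [0.5, 0.5, 0.5, 0.5])
# Toy data: four events observed against 2.0 predicted, so the score under-predicts.
under_predicted = poisson_calibration_in_the_large([1, 1, 1, 1], [0.5, 0.5, 0.5, 0.5])
print(well_calibrated)
print(under_predicted)
```

A positive intercept estimate means more events were observed than the score predicted, and a negative one means fewer. In practice the same model would usually be fitted with a generalized linear model routine that accepts an offset, which also extends to models with additional terms.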