SYSTEMATIC REVIEW AND META-ANALYSIS

Transpl. Int., 02 October 2025

Volume 38 - 2025 | https://doi.org/10.3389/ti.2025.14497

Quality of Measurement Properties in Patient Reported Outcomes Used in Adult Liver Transplant Candidates and Recipients: a Systematic Review

  • 1. Department of Surgery, Amsterdam University Medical Centers, Amsterdam, Netherlands

  • 2. The Liver Unit, Queen Elizabeth Hospital Birmingham, Birmingham, United Kingdom

  • 3. Centre for Liver and Gastrointestinal Research, Institute of Immunology and Immunotherapy, University of Birmingham, Birmingham, United Kingdom

  • 4. Department of Surgery, Hôpital Universitaire de Bruxelles, Bruxelles, Belgium

  • 5. Department of Surgery, Section of HPB and Liver Transplantation, University Medical Center Groningen, Groningen, Netherlands

Abstract

Objective:

Patient Reported Outcome Measures (PROMs) are increasingly recognized in liver transplant (LT)-patients, yet recent evaluations of their quality are lacking. This systematic review gives a comprehensive overview of available PROMs in adults awaiting or undergoing LT and their measurement properties.

Method:

A systematic search in MEDLINE, EMBASE, PubMed, and COCHRANE (01/2010–08/2023) included studies involving adult LT-candidates and/or recipients utilizing PROMs with original evaluations of measurement properties. The COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) was used to ascertain the quality of measurement properties.

Results:

In total, 23 studies encompassing 35 PROMs were identified, including nine disease-specific and 26 generic PROMs. The (Short-form) Liver Disease Quality of Life ((SF-)LDQoL), Transplant Effects Questionnaire (TxEQ) and Post-Liver Transplant Quality of Life (pLTQ) were the most utilized disease-specific PROMs. Most studies demonstrated low-quality evidence for measurement properties. pLTQ demonstrated high-quality evidence for internal consistency, reliability, and responsiveness; the generic Hospital Anxiety and Depression Scale (HADS) showed strong evidence for internal consistency and construct validity.

Conclusion:

The evidence for the measurement properties of PROMs in LT patients remains of low quality. The pLTQ stands out for its superior methodological quality among disease-specific PROMs. We strongly recommend that future studies focus more on patients’ subjective outcomes and on the measurement properties of the instruments used to capture them.

Introduction

The field of liver transplantation (LT) is rapidly evolving. Over the last 10 years, more than 8,000 liver transplants have been performed in the United Kingdom with excellent long-term outcomes. In the United Kingdom, elective transplant procedures have 1- and 5-year survival rates of 94% and 81%, respectively, while urgent transplant cases have corresponding survival rates of 90% and 81% [1].

With increasing numbers and improving survival rates, there is a growing population of long-term survivors following LT. This results in a shift of focus towards subjective patient outcomes, including quality of life (QoL), anxiety and depressive symptoms. Survival is easily quantifiable; patients’ subjective outcomes, however, are not. The last 20 years have seen the advent of a multitude of generic and disease-specific tools for measuring these outcomes, known as patient-reported outcome measures (PROMs). Despite the increased recognition of the importance of PROMs and the growing number of tools, a standardized methodology for their application among patients undergoing LT has yet to be established.

The use of PROMs in the LT population is an invaluable tool to target improvements in clinical care, develop benchmarking standards and assess hospital performance [2]. Given the breadth of available tools (both generic and disease-specific), it is difficult to select the one most likely to deliver meaningful results and the greatest benefit in this cohort. Ultimately, the integration of a PROM into routine care of LT patients requires careful consideration at an early stage. Two systematic reviews by Jay et al. and Cleemput et al. reported on QoL instruments used in the LT population [3, 4]. However, both articles are over 10 years old and there have been significant methodological improvements since. Considering the above, a full, up-to-date systematic review is required. The aim of this systematic review is to provide a comprehensive overview of PROMs currently available for use in adults undergoing LT and their measurement properties.

Methods

Design

An initial scoping search was undertaken to identify relevant studies on this topic. This systematic review was conducted and written in compliance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines and was registered in PROSPERO (PROSPERO registration number: CRD42021251533) [5].

Search

A systematic search was conducted of MEDLINE, EMBASE, PubMed and COCHRANE to identify all studies including patients undergoing LT from January 2010 until August 2023. To report the screening process, the PRISMA flow diagram was used. Studies were included if they used a PROM to measure subjective insight of LT candidates and/or LT recipients, inclusive of QoL, anxiety, depressive symptoms, pain, mobility and liver failure symptoms. Included studies had to report either the development or evaluation of one or more measurement properties of their chosen PROM. Studies with non-original evaluations of the measurement properties were excluded. In vitro studies, studies only covering patients under 16 years of age or those reporting on living donors were excluded. Systematic literature reviews were excluded but were used to cross check included studies and identify additional references. Additionally, the reference list of included studies was reviewed to identify additional eligible studies. The complete search strategy is described in Supplementary Table S1.

Screening Process

EndNote X7 (Clarivate Analytics, Pennsylvania, US) was used to collate the search results and exports of all citations were sent to the review software Rayyan (Qatar Computing Research Institute, Doha, Qatar) where duplicates were removed. After duplicate removal, four independent reviewers (SvK, SP, KJ, VW) screened by title and abstract and then by full text review. Abstracts that did not report enough information for an inclusion/exclusion decision underwent full text review. Disputes were resolved by the senior author (HH).

Data Extraction

Data extraction elements were defined in advance and included: study population, demographics (age, sex, pre-/post-LT), the PROM tools (title, scoring system, number of items, domains) and measurement properties of the PROM. Some studies described measurement properties with different definitions. The COnsensus-based Standards for the selection of health Measurement Instruments (COSMIN) was used to ascertain which measurement properties were evaluated by the studies [6].

Quality Assessment of Included Studies and Measurement Properties

Two authors (SvK, VW) first independently assessed the methodological quality of different domains of the studies using the COSMIN Risk of Bias checklist [7]. This employs a four-point rating system (“very good,” “adequate,” “doubtful” or “inadequate”) and the overall quality rating of each study is based on “the worst score counts” principle, i.e., the lowest rating of any standard. Table 1 presents information on the domains used to evaluate the risk of bias and quality of the measurement properties for each PROM.
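The “worst score counts” principle described above can be illustrated with a short sketch. The function name and the example ratings are hypothetical and are not taken from any included study; the only assumption is that the four ratings are ordered from “inadequate” (worst) to “very good” (best).

```python
# Four-point COSMIN Risk of Bias scale, ordered from worst to best.
RATING_ORDER = ["inadequate", "doubtful", "adequate", "very good"]


def overall_rating(standard_ratings):
    """Return the lowest rating among all standards ("worst score counts")."""
    if not standard_ratings:
        raise ValueError("at least one rating is required")
    # The rating with the smallest position in RATING_ORDER is the worst one.
    return min(standard_ratings, key=lambda r: RATING_ORDER.index(r.lower()))


# Hypothetical ratings of the individual standards for one measurement
# property in one study (illustrative values only).
example = ["very good", "adequate", "doubtful", "very good"]
print(overall_rating(example))
```

A single “doubtful” standard therefore determines the overall rating, even when every other standard is rated “very good”.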

TABLE 1

Domain | Description
Reliability
 Internal consistency | The degree of interrelatedness among the items of the PROM, as long as the items together form a unidimensional scale. Most often, Cronbach’s alpha is calculated. If Cronbach’s alpha is >0.70, the internal consistency can be deemed “sufficient”
 Reliability | The proportion of the total variance in the measurements which is due to “true” differences between patients. There must be evidence that the patients are stable at the time of the PROM assessment. If the intraclass correlation coefficient is >0.70, the reliability is deemed “sufficient”
 Measurement error | The systematic and random error of a patient’s score that is not attributed to true changes in the construct to be measured. The smallest detectable change should be smaller than the minimal important change for the measurement property to be deemed “sufficient”
Validity
 Content validity | The degree to which the content of a PROM is an adequate reflection of the construct to be measured. Content validity is considered the most important measurement property, because the items of the PROM should be relevant, comprehensive and comprehensible for the patient population in which the PROM is used
 Construct validity | The degree to which the scores of a PROM are consistent with hypotheses based on the assumption that the PROM validly measures the construct to be measured. Construct validity is divided into structural validity, hypotheses testing and cross-cultural validity. Structural validity refers to the degree to which the scores of a PROM are an adequate reflection of the dimensionality of the construct to be measured and is usually assessed by factor analysis. Hypotheses testing for construct validity refers to the degree to which the scores of a PROM are consistent with hypotheses. Cross-cultural validity refers to the degree to which the performance of the items on a translated or culturally adapted instrument is an adequate reflection of the performance of the items of the original version of the instrument. Therefore, this measurement property has to be assessed in at least two different groups
 Criterion validity | The degree to which the scores of a PROM are an adequate reflection of a “gold standard”, deemed “sufficient” if the correlation with this gold standard is ≥0.70 or the Area Under the Curve is ≥0.70
Responsiveness | The ability of a PROM to detect change over time in the construct to be measured. The results should be in accordance with the hypotheses or have an Area Under the Curve of ≥0.70
Interpretability | The degree to which one can assign qualitative meaning (that is, clinical or commonly understood connotations) to a PROM’s quantitative scores or change in scores

Description of the domains used to evaluate the risk of bias and quality of the measurement properties for each PROM.
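As an illustration of how the thresholds in Table 1 could be applied, the following sketch computes Cronbach’s alpha from item-level scores and derives a smallest detectable change (SDC). The formulas used are the standard textbook ones and are assumptions here, since the review does not state them: alpha from item and total-score variances, SEM = SD × √(1 − ICC), and SDC = 1.96 × √2 × SEM. All function names and data are hypothetical.

```python
from math import sqrt
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of the summed scale score.
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def smallest_detectable_change(sd, icc):
    """SDC from the standard error of measurement (assumed standard formulas)."""
    sem = sd * sqrt(1 - icc)
    return 1.96 * sqrt(2) * sem


def internal_consistency_sufficient(alpha):
    return alpha > 0.70  # threshold as given in Table 1


def measurement_error_sufficient(sdc, mic):
    return sdc < mic  # SDC must be smaller than the minimal important change


# Hypothetical responses of five patients to a three-item scale.
responses = [[3, 4, 3], [4, 5, 4], [2, 2, 3], [5, 5, 5], [1, 2, 1]]
alpha = cronbach_alpha(responses)
print(round(alpha, 3), internal_consistency_sufficient(alpha))
```

The same pattern extends to the other thresholds in Table 1, for example an intraclass correlation coefficient above 0.70 for reliability.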

Data Synthesis

Subsequently, the quality of the measurement properties was assessed using the updated criteria for good measurement properties (based on Terwee et al. and Prinsen et al.) as outlined by the COSMIN guideline for systematic reviews [6, 7].

Measurement properties were assessed using the following principles: content validity, structural validity, internal consistency, cross-cultural validity, reliability, measurement error, criterion validity, hypothesis testing for construct validity and responsiveness. The quality of each measurement property was scored using a four-point rating system (“+” = sufficient, “?” = indeterminate, “−” = insufficient, “±” = inconsistent). When the measurement properties of a PROM were not reported in any of the included articles, no score was assigned.
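A minimal sketch of how ratings from several studies of the same PROM could be combined under this four-point system is given below. The pooling rule is an assumption made for illustration (a determinate rating takes precedence over “?”, and conflicting “+” and “-” ratings yield “±”); the review does not spell out its exact rule. The function name is hypothetical and an ASCII hyphen stands in for the minus sign.

```python
def pooled_rating(study_ratings):
    """Combine per-study ratings ("+", "-", "?") for one measurement property."""
    ratings = set(study_ratings)
    if not ratings:
        return None  # property not reported in any study: no score assigned
    unknown = ratings - {"+", "-", "?"}
    if unknown:
        raise ValueError(f"unexpected rating(s): {sorted(unknown)}")
    if {"+", "-"} <= ratings:
        return "±"  # studies disagree: inconsistent
    if "+" in ratings:
        return "+"  # assumed: determinate ratings outrank "?"
    if "-" in ratings:
        return "-"
    return "?"  # only indeterminate ratings available


# Example: one study rates a property sufficient, another insufficient.
print(pooled_rating(["+", "-"]))
```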

The criteria for good measurement properties were then applied to the results per measurement property per PROM, and the quality of the evidence (using the GRADE approach) was analyzed.

Results

The search strategy retrieved a total of 2,362 titles/abstracts. After 260 duplicates were removed, 2,102 abstracts were screened, and 210 full-text articles were retrieved for further review. Following reference list and citation searching, two more articles were retrieved. After further review, a total of 23 studies were included (Figure 1).
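The screening counts reported above can be checked with a few lines of arithmetic. This is only a consistency sketch with hypothetical variable names; the figures are those stated in the text.

```python
def records_after_deduplication(identified, duplicates):
    """Number of records left for title/abstract screening."""
    return identified - duplicates


identified = 2362            # titles/abstracts retrieved by the search
duplicates = 260             # duplicates removed
full_text = 210              # full-text articles retrieved from the search
from_citation_search = 2     # added via reference list and citation searching

screened = records_after_deduplication(identified, duplicates)
full_text_assessed = full_text + from_citation_search

print(screened)            # records screened by title and abstract
print(full_text_assessed)  # full-text articles considered in total
```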

FIGURE 1

In total, 35 PROMs were used, with a minimum of one and a maximum of six PROMs per study. PROMs were classified in two ways: as generic or disease-specific, and by the population in which they were used (pre-LT, post-LT or both). Seven PROMs were disease-specific for liver disease and/or LT. Additionally, two PROMs addressed osteoporotic symptoms [Quality of Life Questionnaire in Osteoporosis (QUALIOST)] and emotional responses of organ transplant recipients [the Transplant Effects Questionnaire (TxEQ)], and were also categorized as disease-specific PROMs. Twenty-five of the PROMs used in the studies were generic. One PROM was categorized under “utility measures,” providing utilities or values regarding health that can be used for cost-utility analyses or interventions [8].

A total of eleven PROMs were applied to the pre-LT population, while thirteen were used for post-LT population. Additionally, eleven PROMs were used for both the pre- and post-LT population. Detailed study characteristics are described in Table 2, and a brief description of the PROMs evaluated is presented in Supplementary Table S2.

TABLE 2

PROM | Author | Country | Publication year | Study population | Gender (male (%)) | Age (mean (SD)) | Mode of administration | Number of items | Response rate (%) | Target population | Patient population (pre-/post-LT)
Disease-specific PROMs
Short Form Liver Disease Quality of Life ((SF-)LDQOL)Kanwal F (SF-LDQOL) [9]USA200815654.853.9 (11)Questionnaire36Pre
Gralnek I.M [10].USA200022164.352.211186.6Pre
Transplant Effects Questionnaire (TxEQ)Pérez-San-Gregorio, MÁ [11]Spain201824021Post
Annema, C [12].Netherlands201811665.550.8 (11.4)Questionnaire75.8Both
Post-Liver Transplant Quality of Life (pLTQ)Molski, C [13].Brazil201616056.9 (10.4)Questionnaire32Post
Saab, S [14].USA201119659.753.1 (12.6)3293.8Post
Self-made questionnaireParsa Yekta, Z [15].Iran201325063.337.5 (12)Questionnaire administered by hospital receptionist40Post
Self-made questionnaireLasker, J. N. (social QoL) [16]USA2011100058.5Questionnaire via mail, online and interviewResponse to items ranged from 93% to 100%women with PBC on waiting list (WL) and post-transplant (PT)Both
Self-made questionnaireFranciosi, M. (ITaLi-Q) [17]Italy201117771.857.2Questionnaire, self-administered and anonymous37100% first questionnaire, 49/177 the retestPatients requiring HBV prophylaxis after LTPost
Self-made questionnaireChen, X. (Post-LiverTransplant Symptom Experience Questionnaire) [18]China2021265 (reliability tested on 30 patients in pilot study)80Questionnaire4096.1Post
Self-management Questionnaire for LT recipientsXing L [19].China201512445Post
Quality of Life Questionnaire in Osteoporosis (QUALIOST)Atamaz, F [20].Turkey201338 LT patients, 42 controls81.642 (11.6)24NDPost
Generic PROMs
Short-form 36 (SF-36)Fernandez, A. C [21].USA201612560.856.13696Pre
Miller-Matero, L. R [22].USA20148466.8SRD 53.96 (7.11) and HRD 55.87 (6.89)Semi-structured interview3666.7Both (prospective study)
Hospital Anxiety and Depression Score (HADS)Pelgur H [23].Turkey20096467Face-to-face interview, Questionnaire administered by researcher14NDpatients who had undergone liver transplantation at least 1 month prior and were attending clinic for follow-upPost
Miller-Matero, L. R [22].USA20148466.8SRD 53.96 (7.11) and HRD 55.87 (6.89)Semi-structured interview1466.7Both (prospective study)
Lin. X [24]China201728575.853.3 (10.2)Questionnaire1495Post
World Health Organisation – Five Wellbeing Index (WHO-5)Fernandez, A. C [21].USA201612560.856.1 (8.64)556Pre
Weber S [25].Germany20217964.658.2Questionnaire5NDPost
WHOQOL-BREFAnnema, C [12].Netherlands201811665.550.8 (11.4)Questionnaire2475.8Both
Molski, C [13].Brazil201616056.9 (10.4)QuestionnairePost
Post-Traumatic Growth Inventory (PTGI)Gangeri, L [26].Italy20182338461Questionnaire sent to patients2176Post
Scrignaro M [30].Italy20161001559.882158Post
The Functional Assessment of Cancer Therapy - General (FACT-G)Gangeri, L [26].Italy20182338461Questionnaire sent to patients2776Post
Connor Davidson resilience scale (CD-RISC)Fernandez, A. C [21].USA201612560.856.1 (8.64)2556Pre
Beck Depression Inventory (BDI)Fernandez, A. C [21].USA201612560.856.1 (8.64)2156Pre
Beck Anxiety Inventory (BAI)Fernandez, A. C [21].USA201612560.856.1 (8.64)2156Pre
Medical Outcomes Study Social Support Survey (SSS)Fernandez, A. C [21].USA201612560.856.1 (8.64)2056Pre
State-Trait Anxiety Inventory (STAI-6)Annema, C [12].Netherlands201811665.550.8 (11.4)Questionnaire675.8Both
Center of Epidemiological Studies Depression Scale (CES-D)Annema, C [12].Netherlands201811665.550.8 (11.4)Questionnaire2075.8Both
Pearlin-Schooler Mastery ScaleAnnema, C [12].Netherlands201811665.550.8 (11.4)Questionnaire775.8Both
Coping Inventory for Stressful Situations (CISS-SF)Annema, C [12].Netherlands201811665.550.8 (11.4)Questionnaire2175.8Both
Perceived Social Support Scale (PSSS)Lin, X [24].China201728575.853.3 (10.2)Questionnaire1495Post
General Comfort QuestionnaireDemir B [29].Turkey202114881.8%NDInterview28NDPost
Fatigue Symptom Inventory (FSI)Lin, X [24].China201728575.853.3 (10.2)Questionnaire1395Post
Patient Health Questionnaire depression scale (PHQ-9)Gronewold N [27].Germany202254463.151.95 (9.84)Questionnaire9NDPre
Generalized anxiety disorder screener (GAD-7)Gronewold N [27].Germany202254463.151.95 (9.84)Questionnaire7NDPre
Perceived social support questionnaireGronewold N [27].Germany202254463.151.95 (9.84)Questionnaire14NDPre
Sense of Coherence Scale by AntonovskyGronewold N [27].Germany202254463.151.95 (9.84)Questionnaire9NDPre
General Self-Efficacy Short ScaleGronewold N [27].Germany202254463.151.95 (9.84)Questionnaire3NDPre
German Body ImageGronewold N [27].Germany202254463.151.95 (9.84)Questionnaire20NDPre
Short Questionnaire to Assess Health-Enhancing Physical Activity (SQUASH)Ushio M [28].Japan202317347.4NDQuestionnaire13NDPost
UCLA Loneliness ScaleWeber S [25].Germany20217964.658.2Questionnaire20NDPost
Utility Measure
EQ-5DRussell R.T [8].USA20092856453.35Both

Study and patient characteristics, categorized per Patient Reported Outcome Measurements (PROMs).

Abbreviation: ND = not described.

The risk of bias and the methodological quality of the PROMs used in the selected studies are presented in Tables 3 and 4, respectively. Overall, the evidence for the measurement properties was limited and the methodological quality was insufficient or inconsistent. None of the studies evaluated all measurement properties of the COSMIN system. Internal consistency was the most frequently evaluated measurement property.

TABLE 3

PROM | Author | Content validity | Structural validity | Internal consistency (Cronbach’s alpha) | Cross-cultural validity | Reliability | Measurement error (test-retest) | Criterion validity | Hypothesis testing for construct validity | Responsiveness
Disease specific N = 9
(SF-)LDQOLKanwal F (SF-LDQOL)Inadequatevery goodadequatevery goodvery goodvery good
Gralnek I.M.Very goodInadequatevery goodinadequatevery good
TxEQPérez-San-Gregorio, MÁvery goodvery good
Annema, Cinadequatevery goodinadequate
pLTQMolski, Cvery goodvery goodvery goodvery good
Saab, Svery goodvery goodvery goodvery good
Self-made questionnaireParsa Yekta, Zvery goodInadequatevery goodadequate
Self-made questionnaireLasker, J. N. (social QoL)very goodinadequateinadequatedoubtful
Self-made questionnaireFranciosi, M. (ITaLi-Q)very gooddoubtfulvery goodvery goodvery good
Self-made questionnaireChen, X. (Post-LiverTransplant Symptom Experience Questionnaire)InadequateVery goodinadequate
Self-management Questionnaire for LT recipientsXing Lvery good
QUALIOSTAtamaz, FNAVery goodDoubtfuldoubtfulVery good
Generic N = 26
Short-form 36 (SF-36)Fernandez, A. CVery goodInadequateVery good
Miller-Matero, L. Rvery goodvery goodvery good
Hospital Anxiety and Depression Score (HADS)Pelgur Hvery good
Miller-Matero, L. Rvery goodvery goodvery good
Lin. XVery good
World Health Organisation – Five Wellbeing Index (WHO-5)Fernandez, A. CInadequate/Doubtful
Weber SDoubtful
WHOQOL-BREFAnnema, Cvery goodinadequate
Molski, C
Post-Traumatic Growth Inventory (PTGI)Gangeri, Lvery gooddoubtfulvery goodDoubtfulvery good
Scrignaro MVery goodInadequateInadequateVery good
The Functional Assessment of Cancer Therapy - General (FACT-G)Gangeri, Lvery gooddoubtfulvery goodDoubtfulvery good
Connor Davidson resilience scale (CD-RISC)Fernandez, A. Cinadequatevery goodadequatevery good
Beck Depression Inventory (BDI)Fernandez, A. CInadequate/Doubtful
Beck Anxiety Inventory (BAI)Fernandez, A. CInadequate/Doubtful
Medical Outcomes Study Social Support Survey (SSS)Fernandez, A. CInadequate/Doubtful
State-Trait Anxiety Inventory (STAI-6)Annema, CInadequate/Doubtful
Center of Epidemiological Studies Depression Scale (CES-D)Annema, CInadequate/Doubtful
Pearlin-Schooler Mastery ScaleAnnema, CInadequate/Doubtful
Coping Inventory for Stressful Situations (CISS-SF)Annema, Cvery good
Perceived Social Support Scale (PSSS)Lin. XVery good
General Comfort QuestionnaireDemir BDoubtfulInadequate
Fatigue Symptom Inventory (FSI)Lin. XVery good
Patient Health Questionnaire depression scale (PHQ-9)Gronewold NDoubtful
Generalized anxiety disorder screener (GAD-7)Gronewold NDoubtful
Perceived social support questionnaireGronewold NDoubtful
Sense of coherence scale by AntonovskyGronewold NDoubtful
general self-efficacy short scaleGronewold NDoubtful
German body imageGronewold NVery good
Short Questionnaire to Assess Health-Enhancing Physical Activity (SQUASH)Ushio MAdequateadequateVery good
UCLA Loneliness ScaleWeber SDoubtful
Utility measures N = 1
EQ-5DRussell R.T.Doubtfulinadequatevery goodvery good

Risk of Bias using the COnsensus-based Standards for the selection of health Measurement Instruments (COSMIN) Risk of Bias checklist.

TABLE 4

PROM | Author | Content validity | Structural validity | Internal consistency | Cross-cultural validity | Reliability | Measurement error | Criterion validity | Hypothesis testing for construct validity | Responsiveness
Disease specific N = 9
(SF-)LDQOLKanwal F (SF-LDQOL)?-?++
Gralnek I.M.?-+
TxEQPérez-San-Gregorio, MÁ++
Annema, C.-?
pLTQMolski, C.+?++
Saab, S.?+-?
Self-made questionnaireParsa Yekta, Z.?+?
Self-made questionnaireLasker, J. N. (social QoL)?-???
Self-made questionnaireFranciosi, M. (ITaLi-Q)?/-+?+?
Self-made questionnaireChen, X. (Post-LiverTransplant Symptom Experience Questionnaire)?+
Self-management Questionnaire for LT recipientsXing L.+
QUALIOSTAtamaz, F.+++-
Generic N = 26
Short-form 36 (SF-36)Fernandez, A. C.+?+
Miller-Matero, L. R.++
Hospital Anxiety and Depression Score (HADS)Pelgur H.+
Miller-Matero, L. R.++
Lin. X+
World Health Organisation – Five Wellbeing Index (WHO-5)Fernandez, A. C+
Weber S.+
WHOQOL-BREFAnnema, C.-
Molski, C.
Post-Traumatic Growth Inventory (PTGI)Gangeri, L.+??+
Scrignaro M.+??+
The Functional Assessment of Cancer Therapy - General (FACT-G)Gangeri, L.+??+
Connor Davidson resilience scale (CD-RISC)Fernandez, A. C.+?
Beck Depression Inventory (BDI)Fernandez, A. C.+
Beck Anxiety Inventory (BAI)Fernandez, A. C.+
Medical Outcomes Study Social Support Survey (SSS)Fernandez, A. C.+
State-Trait Anxiety Inventory (STAI-6)Annema, C.+
Center of Epidemiological Studies Depression Scale (CES-D)Annema, C.+
Pearlin-Schooler Mastery ScaleAnnema, C+
Coping Inventory for Stressful Situations (CISS-SF)Annema, C+
Perceived Social Support Scale (PSSS)Lin. X+
General Comfort QuestionnaireDemir B+?
Fatigue Symptom Inventory (FSI)Lin. X+
Patient Health Questionnaire depression scale (PHQ-9)Gronewold N+
Generalized anxiety disorder screener (GAD-7)Gronewold N+
Perceived social support questionnaireGronewold N+
Sense of coherence scale by AntonovskyGronewold N+
General self-efficacy short scaleGronewold N+
German body imageGronewold N+
Short Questionnaire to Assess Health-Enhancing Physical Activity (SQUASH)Ushio M-??
UCLA Loneliness ScaleWeber S+
Utility measures N = 1
EQ-5DRussell R.T.?-+

Quality Assessment of the Patient Reported Outcome Measures (PROMs) using the COnsensus-based Standards for the selection of health Measurement Instruments (COSMIN) guideline.

Abbreviations: + = positive rating; ? = indeterminate rating; − = negative rating.

Disease-Specific PROMs

A total of twelve articles described the measurement properties of the nine disease-specific PROMs [9–20]. Of these PROMs, one was used in a pre-LT population, six in the post-LT population and two in both the pre- and post-LT population.

Only the (Short-form) Liver Disease Quality of Life [(SF-)LDQOL], the TxEQ and the Post-Liver Transplant Quality of Life (pLTQ) were evaluated in more than one study (two studies each), with each study reporting its own assessment of the PROM’s measurement properties. The pLTQ achieved a high level of evidence for internal consistency, reliability and responsiveness.

The ITaLi-Q, the self-made questionnaires by Parsa Yekta et al. and Chen et al., the self-management questionnaire for LT recipients by Xing et al. and the QUALIOST were all graded with a high level of evidence for adequate internal consistency [15, 18, 19].

The QUALIOST reported a high level of evidence for cross-cultural validity and reliability. The (SF-)LDQOL reported a high level of evidence on hypothesis testing for construct validity and responsiveness.

Generic PROMs

A total of fourteen articles described the measurement properties of 26 generic PROMs [12, 13, 21–30]. Of these PROMs, ten were used in a pre-LT population and eight in the post-LT population. Furthermore, eight PROMs were utilized in both the pre- and post-LT population. The EQ-5D, graded as a “utility measure”, was used in both the pre- and post-LT population.

The most utilized PROMs were the Hospital Anxiety and Depression Scale (HADS) (three studies), the Short-form 36 (SF-36) (two studies), the World Health Organisation – Five Wellbeing Index (WHO-5) (two studies), the WHOQOL-BREF (two studies) and the Post-Traumatic Growth Inventory (PTGI) (two studies). All other PROMs were used by one study only.

There was moderate evidence for internal consistency in most studies; the HADS and SF-36 both achieved a high level of evidence for internal consistency and hypothesis testing for construct validity. The Short Questionnaire to Assess Health-Enhancing Physical Activity (SQUASH) showed a low level of evidence for reliability. The EQ-5D showed a low level of evidence for criterion validity.

Discussion

This systematic review is the first study to evaluate the methodological quality of PROMs utilized in the pre- and post-LT population using the COSMIN guidelines. In total, 23 articles were included, covering nine disease-specific PROMs, 25 generic PROMs and one utility measure. The (SF-)LDQOL, TxEQ and pLTQ were the most commonly used disease-specific PROMs. The pLTQ showed high-quality evidence for internal consistency, reliability and responsiveness. The HADS was the most frequently used generic PROM and showed high-quality evidence for internal consistency and hypothesis testing for construct validity.

The methodological quality of most generic and disease-specific PROMs was found to be limited, as the majority of the studies failed to adequately evaluate the measurement properties of the utilized PROMs, a trend observed in other similar reviews [31–33]. Within this review, most studies described only internal consistency, while other essential measurement properties either lacked a description or exhibited inadequate methodological quality. Furthermore, scores for the same measurement property were inconsistent between studies. For example, the internal consistency of the TxEQ demonstrated sufficient quality in one study but insufficient quality in another, although both studies utilized the same PROM within the post-LT patient population. This discrepancy aligns with findings from the study by Elberts et al., who evaluated the quality of measurement properties in patients with neurological diseases [32]. Variations in measurement properties between studies can be attributed in part to differences in patient demographics and socio-economic characteristics. McHorney et al. found that SF-36 scores were generally lower among the elderly, those with less than a high school education and those in poverty [34]. Therefore, socio-economic backgrounds and diverse patient populations must be considered when implementing a PROM.

The limited use of PROMs in this patient population made it challenging to effectively synthesize and summarize the data. Most PROMs were reported in only one study, with only thirteen studies evaluating the same PROMs [9, 10, 14]. This lack of quality assessment is also reflected in reviews evaluating PROMs in other medical subpopulations [32, 33]. Aiyegbus et al. reviewed the measurement properties of PROMs used in kidney transplantation patients [31]. Despite a greater quantity of studies including a quality assessment of PROMs, the evidence was still of poor quality, with significant gaps in information. Chiarotto et al. evaluated the quality of measurement properties in PROMs for patients with lower back pain – including the SF-36, SF-12, EQ-5D-3L, EQ-5D-5L, Nottingham Health Profile and the PROMIS-GH-10, and found similar scarcities of high-quality evidence in their patient population [35].

The lack of robust quality assessment of PROMs can be attributed to their relatively recent rise in prominence in clinical research. However, PROMs are of the utmost importance for individual patients, as they reflect what matters to patients at a personal level, transcending the broader context of population-level survival. Therefore, identifying high-quality measures supported by a high level of evidence that can be standardized across patient populations is of paramount importance.

Assessing subjective patient measurements remains complex due to variability in individual values. Individuals prioritize different aspects of their lives, posing a challenge in developing a universally applicable tool. While generic tools like the SF-36 and HADS offer broad applicability, they lack assessment of disease-specific burden. Disease-specific PROMs are therefore more suitable for subpopulations, facilitating accurate detection of burden in subjective measurements.

An additional consideration when selecting a PROM is its original intended purpose. For example, the EuroQol-5 Dimension (EQ-5D) was not originally conceived for the evaluation of QoL in medical research but rather to facilitate cost-effectiveness assessments, rendering it particularly valuable in economic studies. Poor definitions within PROMs also pose a problem; for example, the definition of health-related quality of life (HRQoL) is not always clear [36].

This review extends beyond PROMs simply assessing QoL to encompass an overview of all PROMs used in the pre- and post-LT population. There is no single best option, and the choice of a PROM should be made with careful deliberation, considering the particular objectives of the study. Over the last decade, the use of PROMs has increased, including the use of web questionnaires [37]. The integration of PROMs into research and clinical practice enables more accurate assessment of patient symptoms and supports more efficient allocation of healthcare resources. In the context of LT, evaluating changes in symptoms before and after the procedure is particularly relevant, as it could reflect treatment effectiveness. Disease-specific PROMs are therefore generally more appropriate for assessing disease-related symptoms with greater sensitivity. In contrast, generic PROMs are more appropriate for comparisons across different diseases and populations, and are preferred in health technology assessment [38]. Nonetheless, the use of both generic and disease-specific PROMs requires careful consideration. When clinicians or researchers select existing PROMs or develop new ones, several critical aspects must be addressed, including cross-cultural validation, the intended purpose (clinical or research), and patient acceptability and feasibility [31].

There are limitations to this review. Firstly, the populations of the included studies are heterogeneous, as the studies were conducted across many different countries and languages. Cultural nuances play a pivotal role in shaping perception, and the translation of PROMs into different languages may introduce variations in interpretation. Cross-cultural validation represents one approach to addressing this problem. However, most of the studies did not provide a comprehensive report on this measurement property. Furthermore, the pre- and post-LT populations have different considerations, including underlying liver disease, the severity of the disease, time after transplantation and the current symptoms of the patient. All these aspects influence patients’ subjective feelings and therefore the outcome of the PROM utilized. However, since there was a lack of studies providing strong evidence, these sub-analyses could not be performed.

In summary, this review identified the (SF-)LDQoL, TxEQ and pLTQ as the most commonly used disease-specific PROMs, and the HADS as the most frequently used generic PROM. Among disease-specific PROMs for pre- and post-LT patients, the pLTQ emerges as the PROM of choice based on its superior methodological quality. However, few studies assessed the quality of the same PROMs, and the evidence surrounding these instruments is of low quality, which highlights the need for further investigation. Further studies should carefully evaluate both the appropriateness of the selected PROM for its target population and the evidence for the measurement properties of these instruments, through rigorous assessment or validation.

Statements

Data availability statement

The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author.

Author contributions

SK, SP-B, KJ, and VW conducted the search, selected the studies and wrote the manuscript. HH supervised and reviewed the manuscript. All authors contributed to the article and approved the submitted version.

Funding

The author(s) declare that no financial support was received for the research and/or publication of this article.

Acknowledgments

We thank the Centre for Patient-Reported Outcome Research at the University of Birmingham, who were consulted during the review process. The graphical abstract was designed with BioRender.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declare that no Generative AI was used in the creation of this manuscript.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontierspartnerships.org/articles/10.3389/ti.2025.14497/full#supplementary-material

Abbreviations

BAI, Beck Anxiety Inventory; BDI, Beck Depression Inventory; CD-RISC, Connor-Davidson Resilience Scale; CES-D, Center for Epidemiologic Studies Depression Scale; CISS-SF, Coping Inventory for Stressful Situations; COSMIN, COnsensus-based Standards for the selection of health Measurement INstruments; EQ-5D, EuroQol-5 Dimension; FACT-G, The Functional Assessment of Cancer Therapy – General; FSI, Fatigue Symptom Inventory; GAD-7, Generalized Anxiety Disorder screener; HADS, Hospital Anxiety and Depression Scale; LPA-SQUASH, Light-intensity Physical Activity Short Questionnaire to Assess Health-Enhancing Physical Activity; LT, Liver Transplantation; PHQ-9, Patient Health Questionnaire depression scale; pLTQ, Post-Liver Transplant Quality of Life; PRISMA, Preferred Reporting Items for Systematic Reviews and Meta-Analyses; PROM, Patient Reported Outcome Measure; PSSS, Perceived Social Support Scale; PTGI, Post-Traumatic Growth Inventory; QoL, Quality of Life; QUALIOST, Quality of Life Questionnaire in Osteoporosis; (SF-)LDQoL, (Short Form) Liver Disease Quality of Life; SF-36, Short-Form 36; SOC-L9, Sense of Coherence scale by Antonovsky; SSS, Medical Outcomes Study Social Support Survey; STAI-6, State-Trait Anxiety Inventory; TxEQ, Transplant Effects Questionnaire; WHO-5, World Health Organisation – Five Wellbeing Index.

References

1. Annual Report on Liver Transplantation 2018/2019. NHS Blood and Transplant. (2018).

2. Girgenti R, Tropea A, Buttafarro MA, Ragusa R, Ammirata M. Quality of Life in Liver Transplant Recipients: A Retrospective Study. Int J Environ Res Public Health (2020) 17:3809. doi:10.3390/ijerph17113809

3. Jay CL, Butt Z, Ladner DP, Skaro AI, Abecassis MM. A Review of Quality of Life Instruments Used in Liver Transplantation. J Hepatol (2009) 51:949–59. doi:10.1016/j.jhep.2009.07.010

4. Cleemput I, Dobbels F. Measuring Patient-Reported Outcomes in Solid Organ Transplant Recipients: An Overview of Instruments Developed to Date. Pharmacoeconomics (2007) 25:269–86. doi:10.2165/00019053-200725040-00002

5. Preferred Reporting Items for Systematic Reviews and Meta-Analyses. (2020).

6. Prinsen CAC, Mokkink LB, Bouter LM, Alonso J, Patrick DL, de Vet HCW, et al. COSMIN Guideline for Systematic Reviews of Patient-Reported Outcome Measures. Qual Life Res (2018) 27:1147–57. doi:10.1007/s11136-018-1798-3

7. Terwee CB, Bot SD, de Boer MR, van der Windt DA, Knol DL, Dekker J, et al. Quality Criteria Were Proposed for Measurement Properties of Health Status Questionnaires. J Clin Epidemiol (2007) 60:34–42. doi:10.1016/j.jclinepi.2006.03.012

8. Russell RT, Feurer ID, Wisawatapnimit P, Pinson CW. The Validity of EQ-5D US Preference Weights in Liver Transplant Candidates and Recipients. Liver Transpl (2009) 15:88–95. doi:10.1002/lt.21648

9. Kanwal F, Spiegel BM, Hays RD, Durazo F, Han SB, Saab S, et al. Prospective Validation of the Short Form Liver Disease Quality of Life Instrument. Aliment Pharmacol Ther (2008) 28:1088–101. doi:10.1111/j.1365-2036.2008.03817.x

10. Gralnek IM, Hays RD, Kilbourne A, Rosen HR, Keeffe EB, Artinian L, et al. Development and Evaluation of the Liver Disease Quality of Life Instrument in Persons with Advanced, Chronic Liver disease--the LDQOL 1.0. Am J Gastroenterol (2000) 95:3552–65. doi:10.1111/j.1572-0241.2000.03375.x

11. Pérez-San-Gregorio M, Martín-Rodríguez A, Sánchez-Martín M, Borda-Mas M, Avargues-Navarro ML, Gómez-Bravo M, et al. Spanish Adaptation and Validation of the Transplant Effects Questionnaire (TxEQ-Spanish) in Liver Transplant Recipients and its Relationship to Posttraumatic Growth and Quality of Life. Front Psychiatry (2018) 9:148. doi:10.3389/fpsyt.2018.00148

12. Annema C, Drent G, Roodbol PF, Stewart RE, Metselaar HJ, van Hoek B, et al. Trajectories of Anxiety and Depression After Liver Transplantation as Related to Outcomes During 2-Year Follow-Up: A Prospective Cohort Study. Psychosom Med (2018) 80:174–83. doi:10.1097/PSY.0000000000000539

13. Molski C, Mattiello R, Sarria EE, Saab S, Medeiros R, Brandão A. Cultural Validation of the Post-Liver Transplant Quality of Life (pLTQ) Questionnaire for the Brazilian Population. Ann Hepatol (2016) 15:377–85. doi:10.5604/16652681.1198810

14. Saab S, Ng V, Landaverde C, Lee SJ, Comulada WS, Arevalo J, et al. Development of a Disease-Specific Questionnaire to Measure Health-Related Quality of Life in Liver Transplant Recipients. Liver Transpl (2011) 17:567–79. doi:10.1002/lt.22267

15. Parsa Yekta Z, Tayebi Z, Shahsavari H, Ebadi A, Tayebi R, Bolourchifard F, et al. Liver Transplant Recipients Quality of Life Instrument: Development and Psychometric Testing. Hepat Mon (2013) 13:e9701. doi:10.5812/hepatmon.9701

16. Lasker JN, Sogolow ED, Short LM, Sass DA. The Impact of Biopsychosocial Factors on Quality of Life: Women with Primary Biliary Cirrhosis on Waiting List and Post Liver Transplantation. Br J Health Psychol (2011) 16:502–27. doi:10.1348/135910710X527964

17. Franciosi M, Caccamo L, De Simone P, Pinna AD, Di Costanzo GG, Volpes R, et al. Development and Validation of a Questionnaire Evaluating the Impact of Hepatitis B Immune Globulin Prophylaxis on the Quality of Life of Liver Transplant Recipients. Liver Transpl (2012) 18:332–9. doi:10.1002/lt.22473

18. Chen X, Zhang Y, Yu J. Symptom Experience and Related Predictors in Liver Transplantation Recipients. Asian Nurs Res Korean Soc Nurs Sci (2021) 15:8–14. doi:10.1016/j.anr.2020.11.001

19. Xing L, Chen QY, Li JN, Hu ZQ, Zhang Y, Tao R. Self-Management and Self-Efficacy Status in Liver Recipients. Hepatobiliary Pancreat Dis Int (2015) 14:253–62. doi:10.1016/s1499-3872(15)60333-2

20. Atamaz F, Hepguler S, Ozturk C, Pinar Y. Is QUALIOST Appropriate for the Patients With Orthotopic Liver Transplantation in Measuring Quality of Life? Transpl Proc (2013) 45:286–9. doi:10.1016/j.transproceed.2012.10.027

21. Fernandez AC, Fehon DC, Treloar H, Ng R, Sledge WH. Resilience in Organ Transplantation: An Application of the Connor-Davidson Resilience Scale (CD-RISC) with Liver Transplant Candidates. J Pers Assess (2015) 97:487–93. doi:10.1080/00223891.2015.1029620

22. Miller-Matero LR, Eshelman A, Paulson D, Armstrong R, Brown KA, Moonka D, et al. Beyond Survival: How Well Do Transplanted Livers Work? A Preliminary Comparison of Standard-Risk, High-Risk, and Living Donor Recipients. Clin Transpl (2014) 28:691–8. doi:10.1111/ctr.12368

23. Pelgur H, Atak N, Kose K. Anxiety and Depression Levels of Patients Undergoing Liver Transplantation and Their Need for Training. Transpl Proc (2009) 41:1743–8. doi:10.1016/j.transproceed.2008.11.012

24. Lin XH, Teng S, Wang L, Zhang J, Shang YB, Liu HX, et al. Fatigue and Its Associated Factors in Liver Transplant Recipients in Beijing: A Cross-Sectional Study. BMJ Open (2017) 7:e011840. doi:10.1136/bmjopen-2016-011840

25. Weber S, Rek S, Eser-Valeri D, Padberg F, Reiter FP, De Toni E, et al. The Psychosocial Burden on Liver Transplant Recipients During the COVID-19 Pandemic. Visc Med (2021) 382:18. doi:10.1159/000517158

26. Gangeri L, Scrignaro M, Bianchi E, Borreani C, Bhoorie S, Mazzaferro V. A Longitudinal Investigation of Posttraumatic Growth and Quality of Life in Liver Transplant Recipients. Prog Transpl (2018) 28:236–43. doi:10.1177/1526924818781569

27. Gronewold N, Schunn F, Ihrig A, Mayer G, Wohnsland S, Wagenlechner P, et al. Psychosocial Characteristics of Patients Evaluated for Kidney, Liver, or Heart Transplantation. Psychosom Med (2023) 85:98–105. doi:10.1097/PSY.0000000000001142

28. Ushio M, Makimoto K, Fujita K, Tanaka S, Kanaoka M, Kosai Y, et al. Validation of the LPA-SQUASH in Post-Liver-Transplant Patients. Jpn J Nurs Sci (2023) 20:e12540. doi:10.1111/jjns.12540

29. Demir B, Bulbuloglu S. The Effect of Immunosuppression Therapy on Activities of Daily Living and Comfort Level After Liver Transplantation. Transpl Immunol (2021) 69:101468. doi:10.1016/j.trim.2021.101468

30. Scrignaro M, Sani F, Wakefield JR, Bianchi E, Magrin ME, Gangeri L. Post-Traumatic Growth Enhances Social Identification in Liver Transplant Patients: A Longitudinal Study. J Psychosom Res (2016) 88:28–32. doi:10.1016/j.jpsychores.2016.07.004

31. Aiyegbusi OL, Kyte D, Cockwell P, Marshall T, Gheorghe A, Keeley T, et al. Measurement Properties of Patient-Reported Outcome Measures (PROMs) Used in Adult Patients With Chronic Kidney Disease: A Systematic Review. PLoS One (2017) 12:e0179733. doi:10.1371/journal.pone.0179733

32. Elbers RG, Rietberg MB, van Wegen EEH, Verhoef J, Kramer SF, Terwee CB, et al. Self-Report Fatigue Questionnaires in Multiple Sclerosis, Parkinson’s Disease and Stroke: A Systematic Review of Measurement Properties. (2025).

33. Green A, Liles C, Rushton A, Kyte DG. Measurement Properties of Patient-Reported Outcome Measures (PROMS) in Patellofemoral Pain Syndrome: A Systematic Review. Man Ther (2014) 517–26. doi:10.1016/j.math.2014.05.013

34. McHorney CA, Ware JE Jr, Lu JF, Sherbourne CD. The MOS 36-Item Short-Form Health Survey (SF-36): III. Tests of Data Quality, Scaling Assumptions, and Reliability Across Diverse Patient Groups. Med Care (1994) 32:40–66. doi:10.1097/00005650-199401000-00004

35. Chiarotto A, Terwee CB, Kamper SJ, Boers M, Ostelo RW. Evidence on the Measurement Properties of Health-Related Quality of Life Instruments Is Largely Missing in Patients With Low Back Pain: A Systematic Review. J Clin Epidemiol (2018) 102:23–37. doi:10.1016/j.jclinepi.2018.05.006

36. Boers M, Kirwan JR, Wells G, Beaton D, Gossec L, d'Agostino MA, et al. Developing Core Outcome Measurement Sets for Clinical Trials: OMERACT Filter 2.0. J Clin Epidemiol (2014) 67:745–53. doi:10.1016/j.jclinepi.2013.11.013

37. Hjollund NHI. Fifteen Years' Use of Patient-Reported Outcome Measures at the Group and Patient Levels: Trend Analysis. J Med Internet Res (2019) 21:e15856. doi:10.2196/15856

38. Whittal A, Meregaglia M, Nicod E. The Use of Patient-Reported Outcome Measures in Rare Diseases and Implications for Health Technology Assessment. Patient (2021) 14:485–503. doi:10.1007/s40271-020-00493-w

Keywords

patient reported outcome measures, liver transplantation, quality of life, measurement properties, surgery

Citation

van Knippenberg SEM, Powell-Brett SF, Joshi K, Weeda VB and Hartog H (2025) Quality of Measurement Properties in Patient Reported Outcomes Used in Adult Liver Transplant Candidates and Recipients: a Systematic Review. Transpl. Int. 38:14497. doi: 10.3389/ti.2025.14497

Received

15 February 2025

Accepted

12 August 2025

Published

02 October 2025

Volume

38 - 2025

*Correspondence: Hermien Hartog,

†These authors have contributed equally to this work

Disclaimer

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
