
SYSTEMATIC REVIEW article

Front. Psychol., 11 February 2026

Sec. Quantitative Psychology and Measurement

Volume 17 - 2026 | https://doi.org/10.3389/fpsyg.2026.1709822

Patient-reported outcome measures for anticipatory grief: a systematic review

Shiyuan Wang1, Wangyang Gu2, Ling Zhang1, Wei Liu1, Zeqin Tao1, Yanrong Cheng1, Wenlin Liu1, Xiaoting Zhao1, Wen Tu1,3*, Xing Gao1*
  • 1School of Nursing, Hainan Medical University, Haikou, China
  • 2School of Public Health, Hainan Medical University, Haikou, China
  • 3College of Nursing, University of Utah, Salt Lake City, UT, United States

Background: Anticipatory grief (AG) refers to the experience of grief symptoms by patients or their caregivers in response to life-threatening illnesses, even before an actual loss has occurred. The selection of valid and reliable patient-reported outcome measures (PROMs) to assess AG is essential for early identification, effective intervention, and the reduction of prolonged grief disorder. This study summarizes psychometric properties of AG PROMs and recommends the most effective PROMs.

Methods: Nine databases (PubMed, EMBASE, Web of Science, CINAHL, Cochrane Library, PsycINFO, CNKI, Wanfang, China Biology Medicine Database) were searched from inception to December 2024. Search terms included “preparatory” or “preparedness” or “pre-loss” or “pre-death” or “anticipatory,” “grief” or “mourn” or “bereave,” “surveys and questionnaires” or “assess” or “instrument” or “measure” or “inventory” or “scale” or “interview.” Eligible studies were those reporting the development or validation of any AG assessment for patients with chronic illnesses and their informal caregivers. Extracted psychometric properties encompassed content validity, structural validity, internal consistency, cross-cultural validity, reliability, measurement error, criterion validity, hypothesis testing, and responsiveness. We conducted a reliability generalization meta-analysis of internal consistency (Cronbach’s alpha) for identical AG PROMs across different studies. Quality assessment, measurement property ratings, synthesis, and modified grading of evidence were conducted following the COSMIN methodology for systematic reviews.

Results: A total of 20 studies covering 13 AG PROMs were included. Two PROMs were designed for patient assessment, and 11 targeted informal caregivers. PG-12 demonstrated high-quality evidence in assessing informal caregiver AG, with satisfactory content validity and internal consistency, supporting grade A. MM-CGI-CCPS showed insufficient evidence for structural validity and internal consistency, warranting grade C; the remaining tools are suggested as grade B due to limited evidentiary support.

Conclusion: This review confirms PG-12 as a grade A PROM for measuring AG in informal caregivers. However, no grade A PROMs were identified for assessing AG in patients. Future studies are needed to further validate the methodological quality and measurement properties of PROMs used in patient populations. Significant concerns remain regarding other existing PROMs, particularly in content validity, structural validity, cross-cultural adaptation, measurement error, and reliability, which increase uncertainty in available evidence.

Systematic review registration: https://www.crd.york.ac.uk/PROSPERO/view/CRD42024624205, identifier (CRD42024624205).

1 Introduction

Chronic illnesses (CIs) have become the most significant global burden of disease. According to the World Health Statistics Report released by the World Health Organization (WHO) in 2024, about 43 million people died from CIs worldwide in 2021. These account for about three-quarters of all global deaths (GBD 2021 Forecasting Collaborators, 2024). According to the Global Report on CIs 2023 released by the WHO, CIs cost the world $52 trillion annually, accounting for 4.3% of the global average annual gross domestic product (GDP) (World Health Organization, 2023). Beyond the physical and economic strain, CIs also bring profound emotional challenges, as patients and their informal caregivers often begin grieving long before the end of life.

Anticipatory grief (AG) is a prevalent yet often overlooked psychological response characterized by the experience of grief symptoms before the actual occurrence of loss (Cheung et al., 2018). This phenomenon is particularly common among patients with CIs such as cancer, Alzheimer’s disease, and stroke, as well as their informal caregivers (Holm et al., 2019; Pérez-González et al., 2023; Liu et al., 2023). Research reveals that AG manifests not merely as cognitive anticipation of loss, but more significantly as a psychological burden that individuals continually bear (Chen et al., 2024).

AG has become a prevalent clinical phenomenon among patients and informal caregivers. Studies have shown that about 30–50% of patients with CIs experience AG because of fear and uncertainty about the impending loss, and this is particularly pronounced in cancer patients (Treml et al., 2021). If AG is not promptly identified and addressed, it can significantly diminish patients’ quality of life and negatively impact treatment adherence (Li et al., 2020). Moreover, studies have shown that elevated levels of AG in cancer patients are significantly associated with severe depression and anxiety, which can negatively impact health outcomes, reduce survival rates, and increase the risk of adverse events (Tian et al., 2021; Zheng et al., 2025).

Among those providing care, about 14.9–33% of informal caregivers also experience AG due to excessive physical and mental burden and uncertainty about the progression of the patient’s disease (Nielsen et al., 2017). This phenomenon occurs in primary informal caregivers, including spouses and parents. AG intensifies with increasing caregiving stress, subsequently inducing psychological disorders such as anxiety and depression in informal caregivers, and may lead to adverse bereavement outcomes (Areia et al., 2019). Previous studies have identified a significant correlation between AG in informal caregivers and poor patient prognosis, which not only compromises informal caregivers’ quality of life and caregiving capacity but also contributes to premature nursing institutionalization and increased early mortality among patients (Toot et al., 2017; Tekdemir et al., 2024). Early identification of AG can improve health outcomes and quality of life for both patients and their informal caregivers.

Despite growing recognition of AG among patients with CIs and their informal caregivers, there is currently no standardized, high-quality tool for its assessment. Existing evidence is limited, with only one systematic review focused on dementia informal caregivers (Dehpour and Koffman, 2023). As a result, researchers and clinicians lack reliable guidance when choosing appropriate instruments for different CIs populations, making it difficult to identify those at greatest risk for AG. The use of PROMs with good psychometric properties is essential for early identification and intervention for AG. This study aimed to systematically review and compare existing AG assessment tools, assessing their psychometric properties to provide evidence-based recommendations for selecting high-quality PROMs. The findings will inform the selection of the most appropriate assessment tools for clinical practice by enhancing the accuracy of AG identification in patients and informal caregivers, supporting timely and individualized interventions, and informing future guidelines to improve the well-being of patients with CIs and their informal caregivers.

2 Methods

2.1 Design

This study employed the Consensus-based Standards for the Selection of Health Measurement Instruments (COSMIN) method to systematically review the psychometric properties. We registered the current review in the PROSPERO database (registration number: CRD42024624205).

2.2 Search strategy

The search strategy followed three steps. First, search terms were identified and standardized using authoritative thesauri. English terms were developed from PubMed’s MeSH and relevant keywords, while Chinese terms were drawn from the equivalent SinoMed Subject Headings system so that the Chinese search terms were semantically equivalent to the English ones. The research team, comprising an academic librarian, a statistical analysis expert, and six researchers trained in evidence-based methodologies, confirmed the final search strategy. Both the English and Chinese search terms included: “preparatory” or “preparedness” or “pre-loss” or “pre-death” or “anticipatory,” “grief” or “mourn” or “bereave,” “surveys and questionnaires” or “assess” or “instrument” or “measure” or “questionnaire” or “inventory” or “scale” or “interview” or “criteria.” Second, nine databases (PubMed, EMBASE, Web of Science, Cochrane Library, PsycINFO, CINAHL, CNKI, Wanfang, and China Biology Medicine Database) were searched from inception to December 10, 2024. The search filter developed by the Oxford University PROMs group in PubMed was used to further enhance search accuracy in accordance with the COSMIN guidelines. Third, relevant core journals and books from the past five years, including Oxford Textbook of Palliative Medicine, Practical Guidance on Palliative Care in Oncology, and Palliative Medicine, were manually searched, and the references of included studies were traced to identify any eligible literature that had been missed. Supplementary file 1 shows the search strategies.
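The three groups of terms quoted above follow the usual pattern of OR within a concept and AND between concepts. A minimal sketch of assembling such a query is given here for illustration only; the helper name `build_query` and the generic quoting syntax are assumptions, and the result does not reproduce the database-specific strategies in Supplementary file 1.

```python
# Concept blocks copied from the search terms listed in the text.
CONCEPT_BLOCKS = [
    ["preparatory", "preparedness", "pre-loss", "pre-death", "anticipatory"],
    ["grief", "mourn", "bereave"],
    ["surveys and questionnaires", "assess", "instrument", "measure",
     "questionnaire", "inventory", "scale", "interview", "criteria"],
]


def build_query(blocks):
    """Hypothetical helper: OR the synonyms in each block, AND the blocks together."""
    groups = []
    for terms in blocks:
        quoted = [f'"{term}"' for term in terms]
        groups.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(groups)


query = build_query(CONCEPT_BLOCKS)
print(query)
```

Real databases differ in field tags, truncation symbols, and subject-heading syntax, so a string like this would still need adapting per database.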

2.3 Selection criteria

Inclusion criteria: (1) including all patients with CIs and their informal caregivers, with all participants aged ≥ 18 years. (2) studies focused on developing or validating tools to measure AG, including but not limited to questionnaires, checklists, and scales. (3) assessment tools contain at least one psychometric feature for measuring AG. (4) studies published in English or Chinese.

Exclusion criteria: (1) studies for which the full text was unavailable. (2) duplicate or overlapping publications. (3) review articles. (4) studies where the assessment instruments were employed exclusively as validation criteria for other PROMs. (5) studies that used the PROMs only as outcome measures.

2.4 Quality appraisal

Quality appraisal consisted of two steps. First, the COSMIN guidelines provided a standardized framework to evaluate measurement tools, ensuring a structured and comprehensive assessment of methodological quality and psychometric properties to identify high-quality PROMs. Second, the modified Grading of Recommendations, Assessment, Development and Evaluation (GRADE) system was used to assess the certainty of evidence. Two researchers trained in evidence-based methods independently conducted the appraisal. Discrepancies were resolved through consultation with a third researcher to reach consensus, ensuring reliability and accuracy.

2.4.1 Methodological quality assessment

The methodological quality of each study was evaluated using the COSMIN Risk of Bias checklist (Mokkink et al., 2018). The checklist comprises 10 domains and 116 items that assess: PROM development, content validity, structural validity, internal consistency, cross-cultural validity/measurement invariance, reliability, measurement error, criterion validity, hypothesis testing, and responsiveness. Each item was rated on a 4-point scale: “very good,” “adequate,” “doubtful,” and “inadequate.” The “not applicable (NA)” rating was assigned to items deemed irrelevant or absent in a given study and was excluded from the final domain evaluation. The overall rating for each domain was determined using the “worst-score-counts” principle, whereby the lowest item score defined the final domain evaluation.
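The “worst-score-counts” principle can be illustrated with a short sketch. The function name `domain_rating` and the example item ratings are hypothetical and are not taken from the review's data.

```python
# Ratings ordered from best to worst, as listed in the text.
RATING_ORDER = ["very good", "adequate", "doubtful", "inadequate"]


def domain_rating(item_ratings):
    """Return the worst rating among applicable items; "NA" items are ignored."""
    applicable = [rating for rating in item_ratings if rating != "NA"]
    if not applicable:
        return "NA"
    # The rating with the highest index in RATING_ORDER is the worst one.
    return max(applicable, key=RATING_ORDER.index)


# Hypothetical item ratings for one domain of one study.
example = domain_rating(["very good", "adequate", "NA", "doubtful"])
print(example)
```

A single low-rated item therefore determines the domain rating regardless of how many items are rated higher.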

2.4.2 Summarizing the quality of psychometric properties

The measurement properties of PROMs were assessed according to the COSMIN standard (Terwee et al., 2007), covering nine domains (excluding PROM development). Each property was categorized into three levels: “sufficient (+),” “insufficient (−),” or “indeterminate (?).” For content validity, the following criteria were applied: “sufficient (+)” if ≥ 85% of items met the standards, “insufficient (−)” if not met, and “indeterminate (?)” when information was inadequate or when there was a high risk of bias. Items were rated as “not applicable (NA)” when they were irrelevant to the study content or when content was missing, and these items were excluded from the final evaluation. A consensus rating was established when individual study ratings were consistent; unresolved discrepancies in the assessments were labeled as ‘inconsistent’. Since COSMIN standards do not encompass exploratory factor analysis (EFA) for structural validity, we adopted the criteria proposed by Lee et al. (2019), rating this property as “sufficient (+)” when either ≥ 50% of the variance was explained or when Pearson’s correlation coefficient was ≥ 0.80, thereby ensuring a robust assessment of structural validity.

To evaluate the overall internal consistency of AG PROMs, reliability generalization (RG) meta-analyses were conducted separately for the Cronbach’s alpha (α) coefficients obtained from the total scale scores of each measurement instrument, with the pooled results classified as “sufficient (+)” or “insufficient (−).” To account for between-study heterogeneity, all analyses were based on random-effects models. Because Cronbach’s α is bounded between 0 and 1, its sampling distribution is non-normal; therefore, this study applied the Bonett transformation (Bonett, 2002) to correct for this issue. We performed the RG meta-analysis using inverse-variance weighting based on the sampling variances of the Bonett-transformed α values, and estimated between-study variance (τ²) using restricted maximum likelihood (López-López et al., 2013; Borenstein et al., 2009). For each meta-analysis, the pooled Cronbach’s α and its 95% confidence interval (CI) were computed using the Hartung–Knapp method (Hartung and Knapp, 2001). Heterogeneity was quantified using Cochran’s Q statistic and the I² index. A statistically significant Q test (p < 0.05) was interpreted as evidence of heterogeneity (Higgins and Thompson, 2002). The magnitude of heterogeneity was interpreted based on I² values as follows: negligible (< 25%), low (25–49.9%), moderate (50–74.9%), and high (≥ 75%). Forest plots were constructed to visually display study-specific α coefficients and the pooled estimates. This RG meta-analysis was reported in accordance with the REGEMA checklist (Sánchez-Meca et al., 2021).
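The pooling steps described above can be sketched as follows. This is a simplified illustration, not the authors' analysis code: it uses the Bonett transformation L = ln(1 − α) with sampling variance 2J / ((J − 1)(n − 2)), where J is the number of items and n the sample size, but it substitutes the closed-form DerSimonian–Laird estimator of τ² for restricted maximum likelihood, computes I² from Q, and omits the Hartung–Knapp interval. The function names and the input values are hypothetical.

```python
import math


def bonett_transform(alpha, n_items, n_subjects):
    """Bonett (2002): L = ln(1 - alpha), Var(L) = 2J / ((J - 1)(n - 2))."""
    effect = math.log(1.0 - alpha)
    variance = 2.0 * n_items / ((n_items - 1) * (n_subjects - 2))
    return effect, variance


def pool_alpha(studies):
    """studies: list of (alpha, n_items, n_subjects) tuples for one instrument."""
    transformed = [bonett_transform(*study) for study in studies]
    y = [t[0] for t in transformed]
    v = [t[1] for t in transformed]
    df = len(y) - 1

    # Fixed-effect weights give Cochran's Q.
    w = [1.0 / vi for vi in v]
    fixed_mean = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - fixed_mean) ** 2 for wi, yi in zip(w, y))

    # DerSimonian-Laird tau^2 (assumption: used here instead of REML for brevity).
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects pooled estimate on the transformed scale.
    w_re = [1.0 / (vi + tau2) for vi in v]
    pooled = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)

    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    # Back-transform to the alpha scale: alpha = 1 - exp(L).
    return {"alpha": 1.0 - math.exp(pooled), "tau2": tau2, "Q": q, "I2": i2}


# Hypothetical inputs (alpha, number of items, sample size); not data from this review.
result = pool_alpha([(0.70, 10, 200), (0.95, 10, 200)])
print(result)
```

With REML and the Hartung–Knapp adjustment, the point estimate and especially the confidence interval can differ from what this sketch produces, particularly when only two studies are pooled.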

2.4.3 Grading of the quality of the evidence

We employed a modified GRADE approach to evaluate the quality of evidence for the psychometric properties of PROMs (Mokkink et al., 2016), including four criteria: risk of bias, inconsistency, indirectness, and imprecision. The evidence quality was then categorized into four levels: “high,” “moderate,” “low,” and “very low.”

Based on the evidence quality and grading, final recommendations were made for the included PROMs (Prinsen et al., 2018). The recommendation criteria were classified into three categories: grade A (strongly recommended): PROMs demonstrating sufficient (+) content validity (acceptable at any evidence level) and sufficient (+) internal consistency (supported by at least low-quality evidence); grade B (weakly recommended): PROMs showing potential for use but requiring further validation; grade C (not recommended): PROMs with high-quality evidence for a measurement property as insufficient (−).
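The three recommendation categories can be written as a small decision rule. The sketch below is one reading of the criteria stated above; the function name, the argument encoding, and the choice to check the grade C condition before grade A are assumptions made for illustration.

```python
def recommend(content_validity, internal_consistency, ic_evidence,
              high_quality_insufficient):
    """
    content_validity, internal_consistency: "+", "-", or "?".
    ic_evidence: "high", "moderate", "low", or "very low".
    high_quality_insufficient: True if any measurement property is rated
    insufficient (-) on the basis of high-quality evidence.
    """
    # Assumption: a high-quality insufficient rating takes precedence.
    if high_quality_insufficient:
        return "C"
    at_least_low = ic_evidence in {"high", "moderate", "low"}
    if content_validity == "+" and internal_consistency == "+" and at_least_low:
        return "A"
    return "B"


# Hypothetical rating profiles, not the review's actual instrument ratings.
print(recommend("+", "+", "high", False))
print(recommend("?", "+", "moderate", False))
print(recommend("+", "-", "high", True))
```

Grade B acts as the residual category: any instrument that meets neither the grade A nor the grade C condition falls there.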

2.5 Data extraction

Microsoft Excel 16.0 was used to record the basic information from each study and the characteristics of the PROMs. Two researchers independently extracted the data, including basic study information (author and year, PROM, country/region, PROM language, research design, sample size and participants, participants’ mean age, and year of development/validation) and PROM characteristics (PROM name, references, Cronbach’s alpha, mode of administration, number of items and subscales, range of scores, and original language).

2.6 Synthesis

References retrieved from relevant databases were imported into EndNote, and duplicates were removed. Two researchers trained in evidence-based methodologies independently screened the titles and abstracts of the retrieved articles. Articles meeting the inclusion criteria were subsequently reviewed in full text by the same two researchers. During both screening phases, any disagreements were resolved through discussion with a third researcher to reach consensus.

3 Results

3.1 Literature results

A total of 2,773 articles were retrieved from a search of nine databases and a hand search of relevant core journals and books. After 1,413 duplicates were removed, 1,360 articles remained for title and abstract screening (Figure 1). Title and abstract screening excluded 1,264 articles, and full-text assessment of the remaining 96 excluded a further 76 because they did not meet the inclusion criteria or were review articles. A total of 20 articles were ultimately included (Theut et al., 1991; Marwit and Meuser, 2002; Marwit and Meuser, 2005; Mystakidou et al., 2005; Periyakoil et al., 2005; Al-Gamal et al., 2009; Al-Gamal and Long, 2014; Liew, 2016; Meichsner et al., 2016; Chan et al., 2017; Coelho et al., 2017; Xin, 2017; Liew and Yap, 2018; Liew et al., 2018; Ar-Karci and Karanci, 2019; Cheng et al., 2019; Holm et al., 2019; Gilsenan et al., 2022; Önal et al., 2023; Liu et al., 2023). Supplementary file 2 contains the list of included studies.

Figure 1
Flowchart illustrating a systematic review process. Out of 2,773 records identified, 1,413 duplicates were removed, leaving 1,360 screened by title and abstract. After excluding 1,264 records, 96 full-text articles were assessed. Seventy-six were excluded, resulting in 20 studies included in the review.

Figure 1. PRISMA flowchart of the identification and selection of studies.
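The counts in the Figure 1 flowchart can be checked with simple subtraction. The sketch below uses the numbers given in the figure description; the function name `prisma_flow` is a hypothetical helper.

```python
def prisma_flow(identified, duplicates, excluded_title_abstract, excluded_full_text):
    """Recompute the number of records remaining at each PRISMA stage."""
    screened = identified - duplicates
    full_text = screened - excluded_title_abstract
    included = full_text - excluded_full_text
    return {"screened": screened, "full_text": full_text, "included": included}


# Counts as described for Figure 1.
flow = prisma_flow(identified=2773, duplicates=1413,
                   excluded_title_abstract=1264, excluded_full_text=76)
print(flow)
```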

3.2 Basic characteristics of the included studies

A total of 20 studies were included in the review, with 18 published in English and 2 in Chinese, as shown in Table 1. The articles were published between 1991 and 2024 and conducted across 11 regions: the United States, the United Kingdom, Greece, China, Hong Kong, Jordan, Singapore, Portugal, Turkey, Germany, and Sweden. All the studies were cross-sectional studies, with sample sizes ranging from 27 to 508 participants, involving a total of 4,978 participants. The subjects comprised patients (Mystakidou et al., 2005; Periyakoil et al., 2005; Xin, 2017) and informal caregivers (Theut et al., 1991; Marwit and Meuser, 2002; Marwit and Meuser, 2005; Al-Gamal et al., 2009; Al-Gamal and Long, 2014; Liew, 2016; Meichsner et al., 2016; Chan et al., 2017; Coelho et al., 2017; Xin, 2017; Liew and Yap, 2018; Liew et al., 2018; Ar-Karci and Karanci, 2019; Cheng et al., 2019; Holm et al., 2019; Gilsenan et al., 2022; Önal et al., 2023; Liu et al., 2023). The study participants included patients with CIs and their informal caregivers. The spectrum of chronic conditions encompassed dementia, cancer, stroke, and cerebral palsy. Two studies (Liew and Yap, 2018; Liew et al., 2018) used the same data to develop and validate separate scales. One article (Xin, 2017) includes the results of the measurement properties of AG measurement tools for two different groups of study subjects: patients and informal caregivers.


Table 1. Study characteristics.

3.3 Basic characteristics of AG measurement tools

As shown in Table 2, among the 20 included studies, 13 reported the characteristics of AG measurement tools. Two instruments were designed for patients with CIs, and 11 for informal caregivers of patients with CIs. Of the 13 PROMs, 11 PROMs (Marwit and Meuser, 2002, 2005; Mystakidou et al., 2005; Periyakoil et al., 2005; Al-Gamal et al., 2009; Al-Gamal and Long, 2014; Meichsner et al., 2016; Coelho et al., 2017; Cheng et al., 2019; Holm et al., 2019; Liu et al., 2023) comprised multiple subscales (ranging from 2 to 7 subscales), while one PROM (Liew and Yap, 2018) was unidimensional. In addition, one PROM (Xin, 2017) was originally developed as unidimensional but was validated as multidimensional in its cross-cultural adaptation. The total number of items in the 13 PROMs varied from 6 to 50. The MM-CGI has the largest number of items (50) (Marwit and Meuser, 2002). Four PROMs (Theut et al., 1991; Marwit and Meuser, 2002; Marwit and Meuser, 2005; Coelho et al., 2017) were available in multiple languages. Regarding reliability reporting, 12 PROMs provided Cronbach’s α coefficients, and 1 PROM reported a Kappa coefficient. Overall, existing AG PROMs demonstrate substantial heterogeneity in terms of structural dimensions, number of items, available language versions, and reliability metrics.


Table 2. Patient-reported outcome measure characteristics.

3.4 Methodological quality and measurement properties

Among the 20 included studies, the number of psychometric properties examined in each study ranged from 3 to 6, with 12 studies investigating five or more properties (Marwit and Meuser, 2002; Mystakidou et al., 2005; Periyakoil et al., 2005; Meichsner et al., 2016; Coelho et al., 2017; Xin, 2017; Liew and Yap, 2018; Ar-Karci and Karanci, 2019; Cheng et al., 2019; Holm et al., 2019; Önal et al., 2023; Liu et al., 2023). All studies reported results for content validity, internal consistency, and hypothesis testing. However, information on structural validity, cross-cultural validity, reliability (stability), and criterion validity was incomplete, and no study provided data on measurement error or responsiveness.

3.4.1 Content validity

As shown in Tables 3, 4, the content validity of 20 studies was evaluated. Nine studies (Marwit and Meuser, 2005; Mystakidou et al., 2005; Periyakoil et al., 2005; Al-Gamal et al., 2009; Chan et al., 2017; Coelho et al., 2017; Xin, 2017; Cheng et al., 2019; Liu et al., 2023) ensured the reliability of content validity by obtaining the opinions of consulting experts or test subjects regarding the constructs of the measurement tools, and selected appropriate analysis methods. Therefore, the methodological quality was rated as “adequate.” Eleven studies were rated as “doubtful,” among which three studies (Theut et al., 1991; Al-Gamal and Long, 2014; Meichsner et al., 2016) did not conduct pilot testing, and eight studies (Marwit and Meuser, 2002; Liew, 2016; Liew and Yap, 2018; Liew et al., 2018; Ar-Karci and Karanci, 2019; Holm et al., 2019; Gilsenan et al., 2022; Önal et al., 2023) had unclear descriptions of concepts or methodologies. Based on the assessment results of each study in terms of comprehensiveness, relevance, and understandability, two PROMs, the Marwit-Meuser Caregiver Grief Inventory Short Form (MM-CGI-SF) and the Prolonged Grief Disorder Questionnaire–Predeath (PG-12), were rated as “sufficient (+).” Eleven PROMs, including the Preparatory Grief in Advanced Cancer (PGAC), Terminally Ill Grief or Depression Scale (TIGDS), Marwit-Meuser Caregiver Grief Inventory (MM-CGI), MM-CGI Childhood Cancer Scale (MM-CGI-CCS), Anticipatory Grief Scale (AGS), Anticipatory Grief Scale-13 (AGS-13), Anticipatory Grief Scale for Stroke Caregivers (AGS-SC), 6-item Caregiver Grief Scale (MM-CGI-BF), MM-CGI Childhood Cerebral Palsy Scale (MM-CGI-CCPS), Caregiver Grief Scale (CGS), and Caregiver Grief Questionnaire (CGQ), were rated as “indeterminate (?)” due to insufficient evidence.


Table 3. Methodological quality assessment.


Table 4. Rating of the measurement properties of the instruments.

3.4.2 Internal structure

The internal structure (including structural validity, internal consistency, and cross-cultural validity) is presented in Tables 3, 4. Structural validity was assessed in 16 studies. Three studies (Coelho et al., 2017; Cheng et al., 2019; Önal et al., 2023) employed confirmatory factor analysis (CFA), four studies (Marwit and Meuser, 2005; Mystakidou et al., 2005; Xin, 2017; Holm et al., 2019) utilized EFA, three studies (Meichsner et al., 2016; Gilsenan et al., 2022; Liu et al., 2023) applied both CFA and EFA, and two studies (Marwit and Meuser, 2002; Ar-Karci and Karanci, 2019) used principal component analysis (PCA). Four studies did not clearly report their validation methods (Al-Gamal et al., 2009; Al-Gamal and Long, 2014; Liew, 2016; Chan et al., 2017). Methodological quality assessment revealed that three studies (Meichsner et al., 2016; Gilsenan et al., 2022; Liu et al., 2023) were rated as “very good,” five studies (Marwit and Meuser, 2005; Mystakidou et al., 2005; Xin, 2017; Cheng et al., 2019; Holm et al., 2019) as “adequate,” and eight studies (Marwit and Meuser, 2002; Ar-Karci and Karanci, 2019; Coelho et al., 2017; Önal et al., 2023) as “doubtful.” Regarding the measurement-property ratings, the structural validity of six PROMs (AGS, AGS-SC, PGAC, MM-CGI-SF, CGS, and CGQ) was rated as “sufficient (+),” and the structural validity of six PROMs (AGS-13, MM-CGI, MM-CGI-BF, MM-CGI-CCS, MM-CGI-CCPS, and PG-12) was rated as “indeterminate (?)” due to insufficient evidence or unclear description of methods.

Internal consistency was assessed in all studies, with methodological quality rated as “very good” across all evaluations. Nineteen studies (Theut et al., 1991; Marwit and Meuser, 2002; Marwit and Meuser, 2005; Mystakidou et al., 2005; Al-Gamal et al., 2009; Al-Gamal and Long, 2014; Liew, 2016; Meichsner et al., 2016; Chan et al., 2017; Coelho et al., 2017; Xin, 2017; Liew and Yap, 2018; Liew et al., 2018; Ar-Karci and Karanci, 2019; Cheng et al., 2019; Holm et al., 2019; Gilsenan et al., 2022; Önal et al., 2023; Liu et al., 2023) calculated and reported Cronbach’s α coefficients, while one study (Periyakoil et al., 2005) reported Cohen’s kappa coefficient (κ) with 95% CI. Regarding the measurement-property ratings for internal consistency, all 20 studies (Theut et al., 1991; Marwit and Meuser, 2002; Marwit and Meuser, 2005; Al-Gamal et al., 2009; Al-Gamal and Long, 2014; Liew, 2016; Meichsner et al., 2016; Chan et al., 2017; Coelho et al., 2017; Xin, 2017; Liew and Yap, 2018; Liew et al., 2018; Ar-Karci and Karanci, 2019; Cheng et al., 2019; Holm et al., 2019; Gilsenan et al., 2022; Önal et al., 2023; Liu et al., 2023; Mystakidou et al., 2005; Periyakoil et al., 2005) reported coefficients that met the threshold and were therefore rated “sufficient (+).”

Cross-cultural validity was assessed in seven studies that reported on the cross-cultural adaptation and psychometric evaluation of AG PROMs in non-English–speaking populations. One study (Xin, 2017) included both the Chinese Version of Anticipatory Grief Scale (C-AGS) and Chinese Version of Preparatory Grief in Advanced Cancer Patients Scale (C-PGAC), which underwent forward–backward translation, expert review, cognitive interviewing, and comprehensive validation in a Chinese sample. Similarly, five studies (Coelho et al., 2017; Liew and Yap, 2018; Ar-Karci and Karanci, 2019; Holm et al., 2019; Liu et al., 2023) followed standard adaptation procedures and evaluated the psychometric properties of the instruments within their respective cultural contexts. However, all analyses were confined to single-culture samples and did not include comparisons with the original English versions or other language versions using measurement invariance testing. Notably, one study (Liew, 2016) reported a significant difference in MM-CGI scores between Asian and U.S. caregivers (p < 0.001); however, without conducting multi-group confirmatory factor analysis or differential item functioning analysis, potential measurement bias could not be ruled out, and thus the observed difference cannot be interpreted as a true cultural difference. In summary, although these studies provide strong support for the local applicability of the instruments, none performed formal cross-cultural validity testing. Therefore, the methodological quality of these studies regarding cross-cultural validity was rated as “inadequate.” The evidence for this measurement property was rated as “insufficient (−).”

3.4.3 Assessment of quality and results for the remaining measurement properties

Tables 3, 4 present the assessment results of reliability, criterion validity, and hypothesis testing. Reliability was assessed in 9 studies. One study (Periyakoil et al., 2005) calculated the kappa statistic. Five studies (Meichsner et al., 2016; Liew and Yap, 2018; Liew et al., 2018; Cheng et al., 2019; Önal et al., 2023) reported an intraclass correlation coefficient (ICC) ≥ 0.8, indicating good test–retest reliability, and were rated as “very good” in methodological quality. One study (Ar-Karci and Karanci, 2019) that reported an ICC without specifying the retest interval was rated as “doubtful,” and one study (Liu et al., 2023) with an ICC below the threshold was rated as “inadequate.” The test–retest interval was also considered for these nine studies. Five studies (Periyakoil et al., 2005; Liew and Yap, 2018; Liew et al., 2018; Cheng et al., 2019; Önal et al., 2023) that met the 14-day retest criterion were rated as “sufficient (+)”; three studies (Mystakidou et al., 2005; Meichsner et al., 2016; Liu et al., 2023) that failed to meet this standard were rated as “insufficient (−)”; and one study (Ar-Karci and Karanci, 2019) with an unspecified interval duration was rated as “indeterminate (?).” The synthesized results for reliability indicated that five PROMs (TIGDS, MM-CGI, PG-12, MM-CGI-BF, and CGQ) were rated as “sufficient (+),” three PROMs (CGS, PGAC, and AGS-SC) were rated as “insufficient (−),” and one instrument (MM-CGI-SF) was rated as “indeterminate (?).”

All 20 studies reported hypothesis testing results. Nineteen studies (Theut et al., 1991; Marwit and Meuser, 2002, 2005; Mystakidou et al., 2005; Periyakoil et al., 2005; Al-Gamal et al., 2009; Al-Gamal and Long, 2014; Liew, 2016; Meichsner et al., 2016; Chan et al., 2017; Coelho et al., 2017; Xin, 2017; Liew and Yap, 2018; Liew et al., 2018; Ar-Karci and Karanci, 2019; Cheng et al., 2019; Holm et al., 2019; Gilsenan et al., 2022; Önal et al., 2023) employed convergent or discriminant validity analyses to examine the target instruments against generic comparator scales, though the applicability of these scales to the specific population remains unclear. Sixteen studies (Theut et al., 1991; Marwit and Meuser, 2002, 2005; Al-Gamal and Long, 2014; Liew, 2016; Meichsner et al., 2016; Chan et al., 2017; Coelho et al., 2017; Xin, 2017; Liew and Yap, 2018; Liew et al., 2018; Ar-Karci and Karanci, 2019; Cheng et al., 2019; Holm et al., 2019; Gilsenan et al., 2022; Önal et al., 2023) were therefore rated as “adequate.” Three studies (Mystakidou et al., 2005; Periyakoil et al., 2005; Al-Gamal et al., 2009) did not adequately address the measurement properties of the comparator instruments and were rated as “doubtful.” In the synthesized hypothesis-testing ratings, eight PROMs (AGS-13, MM-CGI, MM-CGI-SF, PG-12, MM-CGI-CCPS, CGS, MM-CGI-BF, and CGQ) showed correlations consistent with the hypotheses and were rated as “sufficient (+),” whereas four PROMs (MM-CGI-CCS, PGAC, TIGDS, and AGS) had unclear hypothesis-testing information and were rated as “insufficient (−).”

Only two studies (Periyakoil et al., 2005; Liew and Yap, 2018) reported on criterion validity. Both studies employed receiver operating characteristic (ROC) curve analysis to assess the scale’s criterion validity, which is its ability to discriminate between groups in relation to a gold-standard reference, and reported the corresponding sensitivity and specificity. One study (Periyakoil et al., 2005) (TIGDS), which used clinical consensus as the gold standard, was rated as “very good” and the measurement attribute was “sufficient (+).” The other study (Liew and Yap, 2018) (MM-CGI-BF) was rated as having “adequate” methodological quality, and the measurement attribute was also “sufficient (+).”

3.4.4 Meta-analysis results

This study employed an RG meta-analysis to synthesize the Cronbach’s α coefficients of five AG PROMs to evaluate their internal consistency (Figure 2). For PG-12 (k = 2, N = 214), the pooled α coefficient was 0.85 (95% CI, 0.82, 0.87), with no observed heterogeneity [I² = 0%, H² = 1.00, Q(1) = 0.02, p = 0.898]. This indicates stable internal consistency across studies, and it was therefore rated as “sufficient (+).” For AGS (k = 2, N = 418), the pooled α coefficient was 0.88 (95% CI, 0.83, 0.92), but moderate heterogeneity was present [I² = 52.36%, H² = 2.10, Q(1) = 2.10, p = 0.147]. Due to the limited number of studies, the stability of its internal consistency remains questionable, leading to a rating of “indeterminate (?).” For MM-CGI-SF (k = 2, N = 310), the pooled α coefficient was 0.93, but the confidence interval was very wide (95% CI, 0.57, 0.99), and moderate heterogeneity was observed [I² = 64.71%, H² = 2.83, Q(1) = 2.83, p = 0.092]. The limited evidence and high uncertainty make it difficult to draw a firm conclusion about its internal consistency, so it was rated as “indeterminate (?).” The analyses for MM-CGI [k = 3, N = 632, I² = 98.57%, H² = 70.12, Q(2) = 170.72, p < 0.001] and PGAC [k = 2, N = 839, I² = 97.93%, H² = 48.25, Q(1) = 48.25, p < 0.001] both showed very high heterogeneity, indicating that the internal consistency of MM-CGI and PGAC lacks robustness across studies. Therefore, both were rated as “insufficient (−).”
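To make the heterogeneity statistics easier to read, the following minimal Python sketch shows one common way to obtain Q, I², and H² from a set of α coefficients. It assumes Bonett's log transformation of α and the Q-based definitions of Higgins and Thompson (2002), with H² truncated at 1 by convention; the review's own software settings may differ, and the per-study inputs below are hypothetical. Software that derives I² and H² from the estimated between-study variance can return different values, which may explain why the MM-CGI figures do not equal Q divided by its degrees of freedom.

```python
import math


def bonett_transform(alpha, n_items, n_subjects):
    """Return ln(1 - alpha) and its approximate sampling variance (Bonett's transformation)."""
    y = math.log(1.0 - alpha)
    v = 2.0 * n_items / ((n_items - 1) * (n_subjects - 2))
    return y, v


def cochran_q(effects, variances):
    """Inverse-variance weighted mean of the effects and Cochran's Q around that mean."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    return q, pooled


def heterogeneity(q, df):
    """Q-based I² (in percent) and H²; H² is truncated at 1 (assumed convention)."""
    h2 = max(1.0, q / df)
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return i2, h2


# Hypothetical studies: (alpha, number of items, sample size).
studies = [(0.84, 12, 100), (0.86, 12, 120)]
transformed = [bonett_transform(a, j, n) for a, j, n in studies]
q, pooled = cochran_q([y for y, _ in transformed], [v for _, v in transformed])
pooled_alpha = 1.0 - math.exp(pooled)  # back-transform to the alpha metric
i2, h2 = heterogeneity(q, df=len(studies) - 1)
print(f"pooled alpha={pooled_alpha:.3f}, Q={q:.3f}, I2={i2:.1f}%, H2={h2:.2f}")

# Cross-check against the Q value reported above for AGS: Q(1) = 2.10.
ags_i2, ags_h2 = heterogeneity(q=2.10, df=1)
```

With the rounded Q of 2.10, the sketch gives H² = 2.10 and I² of about 52.4%, in line with the reported AGS values up to rounding.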

Figure 2. Forest plot of internal consistency coefficients for the AG PROMs total score. MM-CGI, Marwit-Meuser caregiver grief inventory; MM-CGI-SF, Marwit-Meuser caregiver grief inventory short form; AGS, Anticipatory Grief Scale; PG-12, Prolonged Grief Scale; PGAC, preparatory grief in advanced cancer; α, Cronbach’s alpha; 95% CI, 95% confidence interval.

3.5 Synthesis of measurement attributes and recommendations

Table 5 summarizes the evidence and recommendation grades for 13 PROMs. The evidence quality was downgraded based on four key factors of the modified GRADE approach (risk of bias, inconsistency, indirectness, and imprecision), with the overall certainty of evidence ranging from “high” to “very low.” The evidence grading results showed that two PROMs (PG-12 and MM-CGI-SF) were rated “high” for content validity, while eight PROMs (PG-12, CGS, CGQ, MM-CGI-CCS, MM-CGI-BF, MM-CGI-CCPS, AGS-SC, and AGS-13) were rated “high” for internal consistency. Regarding recommendation grades, the PG-12 was classified as grade A, eleven PROMs (MM-CGI-SF, TIGDS, PGAC, CGS, CGQ, MM-CGI, MM-CGI-BF, MM-CGI-CCS, AGS-SC, AGS and AGS-13) as grade B, and MM-CGI-CCPS as grade C.

Table 5. Quality of evidence for each measurement property.
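The recommendation grades can be read as a simple decision rule. The sketch below is a minimal Python illustration, assuming the commonly described COSMIN categories: grade A for sufficient content validity together with at least low-quality evidence of sufficient internal consistency, grade C for high-quality evidence of an insufficient measurement property, and grade B otherwise. The function name, argument encoding, and the choice to check grade C first are assumptions made for illustration, not the review's actual procedure.

```python
# Evidence quality levels from weakest to strongest (modified GRADE).
QUALITY_ORDER = ["very low", "low", "moderate", "high"]


def recommend(content_validity, internal_consistency, ic_quality,
              high_quality_insufficient):
    """Return a recommendation grade ("A", "B", or "C") for one PROM.

    content_validity / internal_consistency: "+", "-", or "?" ratings.
    ic_quality: quality of evidence for internal consistency.
    high_quality_insufficient: True if any measurement property is rated
    insufficient with high-quality evidence.
    """
    # Assumption: grade C takes precedence when its condition is met.
    if high_quality_insufficient:
        return "C"
    at_least_low = QUALITY_ORDER.index(ic_quality) >= QUALITY_ORDER.index("low")
    if content_validity == "+" and internal_consistency == "+" and at_least_low:
        return "A"
    return "B"


# Illustrative inputs only; these are not rows from Table 5.
print(recommend("+", "+", "high", False))      # meets the grade A rule
print(recommend("+", "?", "moderate", False))  # neither A nor C
print(recommend("+", "+", "high", True))       # grade C condition met
```

Grade B therefore covers a wide range of instruments, from those missing a single piece of evidence to those with several indeterminate ratings.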

4 Discussion

This review aims to critically examine the measurement tools for AG in patients with CIs and their informal caregivers according to the COSMIN guidelines and to recommend the most effective AG PROMs. The focus is on the reliability, validity, responsiveness, and interpretability of AG measurement tools.

Evaluation of 13 AG PROMs revealed distinct strengths and limitations in their psychometric properties. While content validity was robust for two PROMs, significant gaps were observed in other critical domains. Cross-cultural validity emerged as a particular weakness, with none of the PROMs meeting adequacy standards. Furthermore, criterion validity was substantially understudied, with evidence available for only two PROMs, and test–retest reliability was adequately established for just five PROMs. Notably, none of the studies reported measurement error or responsiveness data.

Cross-cultural validity was not established for any of the eight PROMs evaluated, as none employed recommended methods, such as multi-group confirmatory factor analysis or cross-cultural differential item functioning analysis, to test measurement invariance. These methodological shortcomings suggest that when these instruments are applied across different countries or cultural contexts, measurement bias may arise, limiting the comparability and generalizability of findings. They also underline the potential impact of cultural factors on measurement accuracy. One study (Liew, 2016) reported higher levels of AG among Asian populations compared to American groups, a difference that may be attributed to the stronger emphasis on familial obligations and collectivist values in many Asian societies (Fan et al., 2022; Liang et al., 2024). Within these cultural contexts, expressing emotions openly is often discouraged to maintain family and social harmony, potentially leading to emotional suppression. Such suppression may not only exacerbate psychological distress but also intensify the experience of AG, thereby increasing susceptibility to depression and anxiety (Wang and Shi, 2013; Yu et al., 2021). Further supporting the role of cultural background, Dehpour and Koffman (2023) observed that individuals from certain ethnic groups, such as Malay populations, also exhibited elevated levels of AG. This suggests that the experience of AG may be shaped by both individual psychological state and cultural background.

In terms of criterion validity, only two PROMs (TIGDS and MM-CGI-BF) were evaluated for agreement between the instrument and a “gold standard.” This may be due to the absence of a recognized gold standard for assessing AG in patients with CIs and their informal caregivers. Notably, without evidence of criterion validity, it is not possible to determine whether changes in instrument scores truly reflect clinically meaningful changes, leading to unquantifiable systematic bias. Lee et al. (2017) pointed out that in the absence of a gold standard, tools with similar measurement dimensions, multi-dimensional structure, and comprehensive content coverage can be selected as the reference basis for criterion validity evaluation. The TIGDS used the clinical consensus of an interdisciplinary hospice and palliative care team as a reference standard to distinguish AG from depressive symptoms (Periyakoil et al., 2005). Although this standard is not a conventional quantitative tool, it provides a professional basis for discrimination grounded in the clinical experience of multidisciplinary experts, and its effect size is high. On the other hand, the COSMIN guidelines state that the full version of a scale can be considered the “gold standard” for criterion validity assessment only when a shortened version is compared with it (Shi et al., 2021). In the study by Liew and Yap (2018), the MM-CGI-BF was used to assess AG in informal caregivers of patients with CIs, with the original MM-CGI as the reference standard. The results showed a strong correlation between the two, thus providing preliminary support for criterion-related validity.

Content validity is an important indicator for evaluating the consistency between the content of an AG measurement tool and the target concept, and it is also one of the most crucial measurement attributes in the process of scale development (Peng et al., 2020). For eight PROMs (AGS, MM-CGI, MM-CGI-SF, PG-12, MM-CGI-CCPS, MM-CGI-BF, CGS, and AGS-13), the cognitive interviews relied mainly on expert input and paid little attention to feedback from the target population regarding their understanding of, feelings about, or suggestions for the items. Although item authority and scientific grounding are important, failing to systematically assess how well the target population understands and accepts the items may widen the cognitive gap between researchers and respondents, thereby increasing measurement error during data collection (Mokkink et al., 2024). Moreover, methodological shortcomings commonly occur during cognitive interviews or pilot testing. Common problems include the absence of cognitive interviews during development, reliance on quantitative surveys alone without cognitive interviews, the lack of pilot studies, and unclear descriptions of the analysis methods. These issues reduce transparency in content validity development and limit study reproducibility (Huang et al., 2025). More importantly, they weaken the instrument’s content validity across its three key dimensions, namely comprehensiveness, relevance, and comprehensibility, thereby reducing its clinical usefulness and measurement accuracy.

Stability reflects the degree of consistency in the results obtained from multiple tests on the same group of subjects (Peng et al., 2020). The retest interval, as a key evaluation indicator, is related to the measured construct, the target population, and the environment (Han et al., 2023). Both overly short intervals (because of memory effects) and overly long intervals (because of changes in clinical condition) should be avoided, so that the clinical state remains relatively stable between administrations (Peng et al., 2020). However, the retest intervals in the included studies ranged from 3 days to 6 months, which may result in insufficient consistency when the same instrument is applied at different time points, thereby reducing the reliability of longitudinal follow-up and repeated measurements. Although the COSMIN guidelines provide reporting norms for retest intervals, AG is a dynamic process. Therefore, when conducting the assessment, it is necessary to clearly define the conditions for retesting and the basis for time selection, control the measurement environment and consistency, reduce the influence of memory effects and structural changes, and improve the applicability and reliability of the scales (Peng et al., 2020; Zhao et al., 2025).

Implications and future research recommendations

AG constitutes a significant psychological burden for patients with CIs and their informal caregivers, yet its clinical importance remains underrecognized. Although the Diagnostic and Statistical Manual of Mental Disorders (DSM-5) has incorporated related concepts such as persistent complex bereavement disorder (PCBD), AG itself has not been formally included in major diagnostic systems or clinical guidelines. This omission has contributed to a widespread lack of standardized assessment in routine practice. For researchers and clinicians seeking to evaluate AG using brief self-report instruments, we recommend the PGAC as a preferred tool for assessing AG in patients with CIs, and the PG-12 and MM-CGI-SF for evaluating AG in informal caregivers of patients with CIs. Structured assessment tailored to specific populations shows promise for early identification of unmet psychosocial needs and timely intervention, thereby supporting more comprehensive and person-centered CI management.

We propose the following structured approach to advance the development of AG PROMs: For currently low-evidence, B-grade AG PROMs, systematic efforts are needed to upgrade them to COSMIN A-grade standards. First, instruments with insufficient or unclear structural validity evidence should undergo CFA in target populations to provide statistical support for their unidimensional or multidimensional structure. Second, missing content validity evidence should be addressed by revisiting the instrument development process through cognitive interviews or Delphi expert consensus methods. Third, criterion validity studies should replace vague or unvalidated reference standards with well-established, psychometrically sound clinical assessments as anchors. Building on these improvements, future development and validation of AG PROMs should follow a higher-quality methodological framework. Researchers should conduct rigorous cross-cultural validation across diverse ethnic and cultural groups to ensure applicability and generalizability in different populations and clinical settings. Reliable “gold standard” criteria selected based on validity, reliability, and expert consensus should be prioritized in criterion-related validity testing. Finally, strict adherence to COSMIN guidelines is essential, integrating qualitative and quantitative approaches and ensuring clear, transparent, and comprehensive reporting of content validity and other measurement properties.

4.1 Limitations

Although this study systematically evaluated the psychometric properties of existing AG PROMs, it also has limitations. First, all included studies were cross-sectional, which precluded the assessment of instrument responsiveness, thereby limiting the applicability of these measures in dynamically tracking the evolution of AG. Future studies should use longitudinal designs with repeated assessments to validate the sensitivity of AG instruments to change and to inform the identification of key intervention windows. Second, the evidence base is restricted to patients with CIs and their informal caregivers; thus, generalizability to critically ill populations and their informal caregivers remains uncertain. Third, in the RG meta-analysis, the number of studies included for each PROM was limited (k ≤ 3), which may compromise the stability of the pooled estimates and the statistical power of the heterogeneity tests. Finally, the search was limited to English and Chinese language publications, which may introduce language bias. Strengths include strict adherence to the COSMIN methodology and the inclusion of studies that reported at least one psychometric property of an AG measurement tool.

5 Conclusion

This review of 20 studies identified 13 PROMs and their psychometric properties. The PG-12 demonstrated the best psychometric properties for assessing AG in informal caregivers and is highly recommended for relevant research and clinical practice. However, no instrument currently meets the grade A criteria for assessing AG in patients. The PGAC has moderate-quality evidence in terms of content validity, structural validity, internal consistency, and cross-cultural validity, and is therefore provisionally recommended. However, it exhibits substantial heterogeneity in internal consistency, and the sample sizes used for its development and validation were relatively small, which may introduce potential bias in the results. Therefore, it is recommended that future studies conduct further validation with larger, multicenter samples to comprehensively evaluate its methodological quality and measurement properties. Future research should focus on validating assessment tools for AG in patients with CIs. It should also validate the applicability of AG measurement tools for patients with CIs and their informal caregivers across diverse cultural contexts and settings, to advance AG research and optimize clinical practice.

Data availability statement

The datasets used or analysed during the current study are available from the corresponding author on reasonable request.

Author contributions

SW: Conceptualization, Methodology, Software, Validation, Visualization, Writing – original draft. WG: Conceptualization, Data curation, Methodology, Software, Writing – original draft. LZ: Methodology, Validation, Writing – review & editing. Wei L: Methodology, Validation, Writing – review & editing. ZT: Validation, Visualization, Writing – review & editing. YC: Validation, Visualization, Writing – review & editing. Wen L: Formal analysis, Validation, Writing – review & editing. XZ: Formal analysis, Validation, Writing – review & editing. WT: Conceptualization, Supervision, Writing – review & editing. XG: Funding acquisition, Project administration, Resources, Supervision, Writing – review & editing.

Funding

The author(s) declared that financial support was received for this work and/or its publication. This work was supported by the Natural Science Foundation of Hainan Province, China [grant number 825MS094].

Acknowledgments

We are deeply grateful to everyone who offered suggestions and assistance on this manuscript.

Conflict of interest

The authors declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declared that Generative AI was not used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpsyg.2026.1709822/full#supplementary-material

References

Al-Gamal, E., and Long, T. (2014). The MM-CGI cerebral palsy: modification and pretesting of an instrument to measure anticipatory grief in parents whose child has cerebral palsy. J. Clin. Nurs. 23, 1810–1819. doi: 10.1111/jocn.12218,

PubMed Abstract | Crossref Full Text | Google Scholar

Al-Gamal, E., Long, T., and Livesley, J. (2009). Development of a modified instrument to measure anticipatory grieving in Jordanian parents of children diagnosed with Cancer: the Marwit and Meuser caregiver inventory childhood Cancer. Cancer Nurs. 32, 211–219. doi: 10.1097/NCC.0b013e31819a2ae4,

PubMed Abstract | Crossref Full Text | Google Scholar

Areia, N. P., Fonseca, G., Major, S., and Relvas, A. P. (2019). Psychological morbidity in family caregivers of people living with terminal cancer: prevalence and predictors. Palliat. Support. Care 17, 286–293. doi: 10.1017/S1478951518000044,

PubMed Abstract | Crossref Full Text | Google Scholar

Ar-Karci, Y., and Karanci, A. N. (2019). Examination of the psychometric properties of Marwit-Meuser caregiver grief inventory-Short Form. Turk. J. Psychiatry 31, 192–200. doi: 10.5080/u23501,

PubMed Abstract | Crossref Full Text | Google Scholar

Bonett, D. G. (2002). Sample size requirements for estimating intraclass correlations with desired precision. Stat. Med. 21, 1331–1335. doi: 10.1002/sim.1108,

PubMed Abstract | Crossref Full Text | Google Scholar

Borenstein, M., Hedges, L. V., Higgins, J. P. T., and Rothstein, H. R. (2009). Introduction to Meta-Analysis. Chichester, UK: John Wiley and Sons, Ltd.

Google Scholar

Chan, W. C. H., Wong, B., Kwok, T., and Ho, F. (2017). Assessing grief of family caregivers of people with dementia: validation of the Chinese version of the Marwit–Meuser caregiver grief inventory. Health Soc. Work 42, 151–158. doi: 10.1093/hsw/hlx022

Crossref Full Text | Google Scholar

Chen, T., Su, L., Yu, J., Zhao, H., Xiao, H., and Wang, Y. (2024). Latent profile analysis of anticipatory grief in family caregivers of patients with chronic heart failure and its influencing factors. BMC Palliat. Care 23:291. doi: 10.1186/s12904-024-01621-1,

PubMed Abstract | Crossref Full Text | Google Scholar

Cheng, S. T., Ma, D. Y., and Lam, L. C. W. (2019). A brief measure of predeath grief in dementia caregivers: the caregiver grief questionnaire. Int. Psychogeriatr. 31, 1099–1107. doi: 10.1017/S1041610219000309,

PubMed Abstract | Crossref Full Text | Google Scholar

Cheung, D. S. K., Ho, K. H. M., Cheung, T. F., Lam, S. C., and Tse, M. M. Y. (2018). Anticipatory grief of spousal and adult children caregivers of people with dementia. BMC Palliat. Care 17:124. doi: 10.1186/s12904-018-0376-3,

PubMed Abstract | Crossref Full Text | Google Scholar

Coelho, A., Silva, C., and Barbosa, A. (2017). Portuguese validation of the prolonged grief disorder questionnaire–Predeath (PG–12): psychometric properties and correlates. Palliat. Support. Care 15, 544–553. doi: 10.1017/S1478951516001000

Crossref Full Text | Google Scholar

Dehpour, T., and Koffman, J. (2023). Assessment of anticipatory grief in informal caregivers of dependants with dementia: a systematic review. Aging Ment. Health 27, 110–123. doi: 10.1080/13607863.2022.2032599,

PubMed Abstract | Crossref Full Text | Google Scholar

Fan, H., Zhang, X., Wang, Y., Peng, Z., Chu, L., and Coyte, P. C. (2022). Does the provision of informal care matter for caregivers’ mental health? Evidence from China. Geriatr. Nurs. 48, 14–23. doi: 10.1016/j.gerinurse.2022.08.006,

PubMed Abstract | Crossref Full Text | Google Scholar

GBD 2021 Forecasting Collaborators (2024). Burden of disease scenarios for 204 countries and territories, 2022-2050: a forecasting analysis for the global burden of disease study 2021. Lancet 403, 2204–2256. doi: 10.1016/S0140-6736(24)00685-8,

PubMed Abstract | Crossref Full Text | Google Scholar

Gilsenan, J., Gorman, C., and Shevlin, M. (2022). Exploratory factor analysis of the caregiver grief inventory in a large UK sample of dementia carers. Aging Ment. Health 26, 320–327. doi: 10.1080/13607863.2020.1839856,

PubMed Abstract | Crossref Full Text | Google Scholar

Han, S., Zhou, J., Ji, M., Zhang, Y., Li, K., Chai, X., et al. (2023). Psychometric properties of measurement tools of active aging: a systematic review. Int. J. Nurs. Stud. 137:104388. doi: 10.1016/j.ijnurstu.2022.104388,

PubMed Abstract | Crossref Full Text | Google Scholar

Hartung, J., and Knapp, G. (2001). On tests of the overall treatment effect in meta-analysis with normally distributed responses. Stat. Med. 20, 1771–1782. doi: 10.1002/sim.791,

PubMed Abstract | Crossref Full Text | Google Scholar

Higgins, J. P. T., and Thompson, S. G. (2002). Quantifying heterogeneity in a meta-analysis. Stat. Med. 21, 1539–1558. doi: 10.1002/sim.1186,

PubMed Abstract | Crossref Full Text | Google Scholar

*Holm, M., Alvariza, A., Fürst, C. J., Öhlen, J., and Årestedt, K. (2019). Psychometric evaluation of the anticipatory grief scale in a sample of family caregivers in the context of palliative care. Health Qual. Life Outcomes 17:42. doi: 10.1186/s12955-019-1110-4,

PubMed Abstract | Crossref Full Text | Google Scholar

Huang, J., Yang, J., Han, M., Xue, Z., Xu, M., Qi, H., et al. (2025). Psychometric evaluation of patient-reported experience measures for peri-anesthesia care: a systematic review based on COSMIN guidelines. Int. J. Nurs. Stud. 161:104930. doi: 10.1016/j.ijnurstu.2024.104930,

PubMed Abstract | Crossref Full Text | Google Scholar

Lee, E.-H., Kim, C. J., Lee, J., and Moon, S. H. (2017). Self-administered health literacy instruments for people with diabetes: systematic review of measurement properties. J. Adv. Nurs. 73, 2035–2048. doi: 10.1111/jan.13256,

PubMed Abstract | Crossref Full Text | Google Scholar

Lee, J., Lee, E. H., and Moon, S. H. (2019). Systematic review of the measurement properties of the depression anxiety stress Scales-21 by applying updated COSMIN methodology. Qual. Life Res. 28, 2325–2339. doi: 10.1007/s11136-019-02177-x,

PubMed Abstract | Crossref Full Text | Google Scholar

Li, J. Y., Jiao, J., and Ren, H. L. (2020). Preparatory grief and its influencing factors among breast cancer patients undergoing chemotherapy. J. Nurs. Sci. 35, 11–14. doi: 10.3870/j.issn.1001-4152.2020.18.011.

Crossref Full Text | Google Scholar

Liang, H. J., Xiong, Q., Remawi, B. N., and Preston, N. (2024). Taiwanese family members’ bereavement experience following an expected death: a systematic review and narrative synthesis. BMC Palliat. Care 23:14. doi: 10.1186/s12904-024-01344-3,

PubMed Abstract | Crossref Full Text | Google Scholar

Liew, T. M. (2016). Applicability of the pre-death grief concept to dementia family caregivers in Asia. Int. J. Geriatr. Psychiatry 31, 749–754. doi: 10.1002/gps.4387,

PubMed Abstract | Crossref Full Text | Google Scholar

*Liew, T. M., and Yap, P. (2018). A brief, 6-item scale for caregiver grief in dementia caregiving. The Gerontologist 60, e1–e10. doi: 10.1093/geront/gny161,

PubMed Abstract | Crossref Full Text | Google Scholar

Liew, T. M., Yap, P., Luo, N., Hia, S. B., Koh, G. C.-H., and Tai, B. C. (2018). Detecting pre-death grief in family caregivers of persons with dementia: measurement equivalence of the mandarin-Chinese version of Marwit-Meuser caregiver grief inventory. BMC Geriatr. 18:114. doi: 10.1186/s12877-018-0804-5,

PubMed Abstract | Crossref Full Text | Google Scholar

Liu, X., Zhou, Y. M., Xu, H. L., Peng, J. Y., Xie, Z. S., and Xing, L. M. (2023). Reliability and validity of the Chinese version of the anticipatory grief scale in caregivers of young and middle-aged patients with severe stroke. J. Hubei Univ. Med. 42, 676–680. doi: 10.13819/j.issn.2096-708X.2023.06.020

Crossref Full Text | Google Scholar

López-López, J. A., Botella, J., Sánchez-Meca, J., and Marín-Martínez, F. (2013). Alternatives for mixed-effects meta-regression models in the reliability generalization approach: a simulation study. J. Educ. Behav. Stat. 38, 443–469. doi: 10.3102/1076998612466142

Crossref Full Text | Google Scholar

Marwit, S. J., and Meuser, T. M. (2002). Development and initial validation of an inventory to assess grief in caregivers of persons with Alzheimer’s disease. The Gerontologist 42, 751–765. doi: 10.1093/geront/42.6.751,

PubMed Abstract | Crossref Full Text | Google Scholar

Marwit, S. J., and Meuser, T. M. (2005). Development of a SHORT FORM inventory to assess grief in caregivers of dementia patients. Death Stud. 29, 191–205. doi: 10.1080/07481180590916335,

PubMed Abstract | Crossref Full Text | Google Scholar

Meichsner, F., Schinköthe, D., and Wilz, G. (2016). The caregiver grief scale: development, exploratory and confirmatory factor analysis, and validation. Clin. Gerontol. 39, 342–361. doi: 10.1080/07317115.2015.1121947

Crossref Full Text | Google Scholar

Mokkink, L. B., de Vet, H. C. W., Prinsen, C. a. C., Patrick, D. L., Alonso, J., Bouter, L. M., et al. (2018). COSMIN risk of Bias checklist for systematic reviews of patient-reported outcome measures. Qual. Life Res. 27, 1171–1179. doi: 10.1007/s11136-017-1765-4,

PubMed Abstract | Crossref Full Text | Google Scholar

Mokkink, L. B., Elsman, E. B. M., and Terwee, C. B. (2024). COSMIN guideline for systematic reviews of patient-reported outcome measures version 2.0. Qual. Life Res. 33, 2929–2939. doi: 10.1007/s11136-024-03761-6,

PubMed Abstract | Crossref Full Text | Google Scholar

Mokkink, L. B., Prinsen, C. A. C., Bouter, L. M., Vet, H. C. W.De, and Terwee, C. B. (2016). The COnsensus-based standards for the selection of health measurement INstruments (COSMIN) and how to select an outcome measurement instrument. Braz. J. Phys. Ther. 20, 105–113. doi:doi: 10.1590/bjpt-rbf.2014.0143,

PubMed Abstract | Crossref Full Text | Google Scholar

Mystakidou, K., Tsilika, E., Parpa, E., Katsouda, E., Sakkas, P., and Soldatos, C. (2005). Life before death: identifying preparatory grief through the development of a new measurement in advanced cancer patients (PGAC). Support Care Cancer 13, 834–841. doi: 10.1007/s00520-005-0797-4,

PubMed Abstract | Crossref Full Text | Google Scholar

Nielsen, M. K., Neergaard, M. A., Jensen, A. B., Vedsted, P., Bro, F., and Guldin, M.-B. (2017). Preloss grief in family caregivers during end-of-life cancer care: a nationwide population-based cohort study. Psychooncology 26, 2048–2056. doi: 10.1002/pon.4416,

PubMed Abstract | Crossref Full Text | Google Scholar

Önal, G., Keser, E., and Tüzün, Z. (2023). Validity and Reliability Study of the Prolonged Grief Disorder- Caregiver Turkish Form. Turk. J. Psychiatry, 35:46–55. doi: 10.5080/u27035

Crossref Full Text | Google Scholar

Peng, J., Shen, L. J., Chen, Y. T., Zhou, T., Cui, Y. B., Zou, L. L., et al. (2020). A systematic review of childhood cancer-related fatigue assessment tools based on the COSMIN guidelines. Chin. J. Evid. Based Med. 20, 1340–1344. doi: 10.7507/1672-2531.202003164

Crossref Full Text | Google Scholar

Pérez-González, A., Vilajoana-Celaya, J., and Guàrdia-Olmos, J. (2023). Burden and anticipatory grief in caregivers of family members with Alzheimer’s disease and other dementias. Palliat. Support. Care, 22, 1158–1168. doi: 10.1017/S1478951523001360

Crossref Full Text | Google Scholar

Periyakoil, V. S., Kraemer, H. C., Noda, A., Moos, R., Hallenbeck, J., Webster, M., et al. (2005). The development and initial validation of the terminally ill grief or depression scale (TIGDS). Int. J. Methods Psychiatr. Res. 14, 203–212. doi: 10.1002/mpr.8,

PubMed Abstract | Crossref Full Text | Google Scholar

Prinsen, C. A. C., Mokkink, L. B., Alonso, J., Patrick, D. L., De Vet, H. C. W., and Terwee, C. B. (2018). COSMIN guideline for systematic reviews of patient-reported outcome measures. Qual. Life Res. 27, 1147–1157. doi: 10.1007/s11136-018-1798-3,

PubMed Abstract | Crossref Full Text | Google Scholar

Sánchez-Meca, J., Marín-Martínez, F., López-López, J. A., Núñez-Núñez, R. M., Rubio-Aparicio, M., López-García, J. J., et al. (2021). Improving the reporting quality of reliability generalization meta-analyses: the REGEMA checklist. Res. Synth. Methods 12, 516–536. doi: 10.1002/jrsm.1487,

PubMed Abstract | Crossref Full Text | Google Scholar

Shi, Y. X., Zhang, H. M., Huang, Y. Q., Wu, L., Liu, F., and Shang, S. M. (2021). Interpretation of COSMIN risk of bias checklist. Chinese Nursing Management 21, 1053–1057. doi: 10.3969/j.issn.1672-1756.2021.07.018

Crossref Full Text | Google Scholar

Tekdemir, R., Kaya, S., Aksoy, İ., and Karakaya, S. (2024). Anticipatory grief and its associated factors in lung Cancer patients. NI 62, 88–93. doi: 10.5152/NeuropsychiatricInvest.2024.24019

Crossref Full Text | Google Scholar

Terwee, C. B., Bot, S. D. M., de Boer, M. R., van der Windt, D. A. W. M., Knol, D. L., Dekker, J., et al. (2007). Quality criteria were proposed for measurement properties of health status questionnaires. J. Clin. Epidemiol. 60, 34–42. doi: 10.1016/j.jclinepi.2006.03.012,

PubMed Abstract | Crossref Full Text | Google Scholar

Theut, S. K., Jordan, L., Ross, L. A., and Deutsch, S. I. (1991). Caregiver’s anticipatory grief in dementia: a pilot study. Int. J. Aging Hum. Dev. 33, 113–118. doi: 10.2190/4KYG-J2E1-5KEM-LEBA,

PubMed Abstract | Crossref Full Text | Google Scholar

Tian, L., Li, M. Y., Xiao, S. Q., and Yan, L. (2021). Prevalence and influencefactors of anticipatory grief in advanced cancer patients. J. Nursing 28, 11–14. doi: 10.16460/j.issn1008-9969.2021.07.0111

Crossref Full Text | Google Scholar

Toot, S., Swinson, T., Devine, M., Challis, D., and Orrell, M. (2017). Causes of nursing home placement for older people with dementia: a systematic review and meta-analysis. Int. Psychogeriatr. 29, 195–208. doi: 10.1017/S1041610216001654

Crossref Full Text | Google Scholar

Treml, J., Schmidt, V., Nagl, M., and Kersting, A. (2021). Pre-loss grief and preparedness for death among caregivers of terminally ill cancer patients: a systematic review. Soc. Sci. Med. 284:114240. doi: 10.1016/j.socscimed.2021.114240,

PubMed Abstract | Crossref Full Text | Google Scholar

Wang, X. J., and Shi, M. W. (2013). Research on coping with stress among Chinese people: an emic and etic perspective. Adv. Psychol. Sci. 21, 1239–1247. doi: 10.3724/SP.J.1042.2013.01239

Crossref Full Text | Google Scholar

World Health Organization. (2023). World health statistics 2023: Monitoring health for the SDGs, sustainable development goals. Available online at: https://www.who.int/publications/i/item/9789240074323 (Accessed July 11, 2025).

Xin, D. J. (2017). Study on grief of advanced cancer patients and their families (Luzhou, Sichuan, China: Master’s thesis, Southwest Medical University).

Yu, W., Lu, Q., Lu, Y., Yang, H., Zhang, L., Guo, R., et al. (2021). Anticipatory grief among Chinese family caregivers of patients with advanced cancer: a cross-sectional study. Asia-Pac. J. Oncol. Nurs. 8, 369–376. doi: 10.4103/apjon.apjon-214

Zhao, Q., Wang, Y., and Bian, L. Z. (2025). A systematic review of childhood cancer related fatigue assessment tools based on the COSMIN guidelines. Chinese J. Contemporary Pediatrics 27, 184–191. doi: 10.7499/j.issn.1008-8830.2310071

Zheng, D., Lin, X., Gao, X., Wang, L., and Zhu, M. (2025). The impact of emotional freedom techniques on anxiety, depression, and anticipatory grief in people with cancer: a meta-analysis and systematic review. J. Psychosom. Res. 192:112088. doi: 10.1016/j.jpsychores.2025.112088

Keywords: anticipatory grief, chronic illness, patient-reported outcome measure, psychometric properties, systematic review

Citation: Wang S, Gu W, Zhang L, Liu W, Tao Z, Cheng Y, Liu W, Zhao X, Tu W and Gao X (2026) Patient-reported outcome measures for anticipatory grief: a systematic review. Front. Psychol. 17:1709822. doi: 10.3389/fpsyg.2026.1709822

Received: 21 September 2025; Revised: 24 January 2026; Accepted: 27 January 2026; Published: 11 February 2026.

Edited by:

Laura Badenes-Ribera, University of Valencia, Spain

Reviewed by:

Asa Choi, University College London, United Kingdom
Meiyan Jiang, Baylor Health Care System, United States

Copyright © 2026 Wang, Gu, Zhang, Liu, Tao, Cheng, Liu, Zhao, Tu and Gao. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Wen Tu, tuwenhn@gmail.com; Xing Gao, xingxing9804@163.com

These authors have contributed equally to this work and share first authorship

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.