Does the implementation of pay-for-performance indicators improve the quality of healthcare? First results in France

Background Pay-for-performance (P4P) models are intended to promote quality of care in both hospitals and primary care settings. They are considered a means of changing medical practices, particularly in primary care. Objectives The first objective of this study was to assess how performance indicators changed over time, measured through “Remuneration on Public Health Objectives” (ROSP) scores, between 2017 and 2020 in a large French region (Grand Est region), and to compare this evolution in the rural vs. urban areas of the region. The second objective was to focus on the area with the least improvement in ROSP scores and to investigate whether the scores and the available sociodemographic characteristics of the area were associated. Methods First, we measured the evolution over time of P4P indicators (i.e., ROSP scores) obtained from the regional health insurance system, for GP practices in the Grand Est region between 2017 and 2020. We then compared the scores between the Aube Department (rural area) and the rest of the region (urban areas). To address the second objective, we focused on the area found to have the least improvement in indicators to investigate whether there was a relationship between ROSP score and sociodemographic characteristics. Results More than 40,000 scores were collected. We observed an overall improvement in scores over the study period. The urban area (Grand Est region minus the Aube) scored better than the rural area (Aube) for chronic disease management [median 0.91 (0.84–0.95) vs. 0.90 (0.79–0.94), p < 0.001] and prevention [median 0.36 (0.22–0.45) vs. 0.33 (0.17–0.43), p < 0.001], but not for efficiency, where the rural area (Aube) performed better [median 0.67 (0.56–0.74) vs. 0.69 (0.57–0.75) in the rest of the Grand Est region, p = 0.004]. In the rural area, we found no significant association between ROSP scores and sociodemographic characteristics, except for extreme rurality in some sub-areas.
Conclusions At the regional level, the overall improvement in scores observed between 2017 and 2020 suggests that the implementation of ROSP indicators has improved the quality of care, particularly in urban areas. These results also suggest that efforts should be focused on rural areas, which already had the lowest scores at the start of the P4P program.


1. Introduction
Pay-for-performance (P4P) models are used to improve the quality of care through economic incentives that are based on the achievement of quality indicators. These models are now widely used in the form of mixed payments (fee-for-service and contracting) and represent the first step in shifting from fee-for-service to capitation-based models. When primary care is predominantly funded on a fee-for-service basis, introducing a P4P model may help to change practices and promote prevention. Indeed, in the P4P model, payment is based on the number of patients being treated (for example, for chronic conditions) rather than on the number of individual procedures. When payment is on a fee-for-service basis, it can lead to artificial "inflation" of the number of procedures (1). With fee-for-service models, there is a propensity to give precedence to quantity at the cost of quality of care, contrary to capitation-based models, which favor quality (2).
P4P programs have been part of numerous experiments in both the hospital and ambulatory care sectors. In 2004, the United Kingdom (UK) was one of the first countries to introduce this type of model with the Quality and Outcomes Framework, which was designed to change medical practices through the use of performance indicators. Nevertheless, some evaluation studies indicated that performance indicators may not be directly beneficial in the hospital sector (3)(4)(5) or in primary care (6,7). Many parameters, such as the type of health insurance system and whether patients are seen in ambulatory versus hospital-based settings, may interfere with the results of these P4P programs. While interpreting and evaluating the effects of financial incentives is not a straightforward task (8), this innovative financing approach is a lever for improving practices in various care settings.
Achieving the objectives set by health authorities can be challenging for healthcare professionals, and physicians in disadvantaged areas often have greater difficulty achieving P4P program goals (9), as has recently been observed in a study from the United States (10). However, in areas with lower baseline performance indicators, P4P models may be particularly useful since there is room for significant improvement (11).
In France, an experimental measure based on voluntary participation, the Contract for Improvement of Individual Practice (Contrat d'Amélioration des Pratiques Individuelles, CAPI), was launched in 2008 to introduce payment by capitation into the remuneration of general practitioners (GPs). In 2011, this measure was extended and became Remuneration based on Public Health Objectives (Rémunération sur Objectifs de Santé Publique, ROSP). ROSP applies to GPs as well as to certain specialists and is regularly updated. Currently, it includes 29 clinical indicators for GPs caring for adult patients. This P4P approach rewards all GPs by providing additional payments based on the level of achievement of ROSP indicators, as assessed by quality indicators. The list of indicators is known, so GPs can consult the expected performance criteria for this additional source of income. However, the implementation of ROSP has been relatively slow: the first payments to GPs were made in 2013, and the number of indicators was expanded in 2016. The first evaluation of the effects of ROSP on physician remuneration took place in 2018. This system is based on a contract between GPs and the national health insurance system, which sets rates of payment according to the level of achievement of each indicator, measured by the scores obtained (National Health Insurance, 2022. La Rosp du médecin traitant de l'adulte. https://www.ameli.fr/medecin/exercice-liberal/remuneration/remuneration-objectifs/medecintraitant-adulte=). A previous study reported wide variability in obtained scores, which was attributed to the type of physician and their geographical location (12). In this regard, remoteness is a known limiting factor for the use of primary care (in general or specialized medicine) (13)(14)(15)(16)(17). We hypothesized that this limitation could negatively impact the quality of care and may be reflected by lower ROSP scores.
The first objective of this study was therefore to measure the evolution in performance indicators between 2017 and 2020, as measured by ROSP scores, in a large French region (Grand Est region) and to compare the changes in scores between the different areas of the region (rural and urban areas). The second objective of the study was to focus on the area with the least improvement in ROSP scores to investigate whether there was an association between the scores and available sociodemographic characteristics.

2. Methods
We performed a retrospective cohort study using data obtained from the Regional Health Insurance System. These routine reimbursement data include payments to physicians based on ROSP scores. ROSP scores are calculated for each individual GP, and they measure the level of achievement for each indicator. A detailed description of the calculation method is given in Supplementary Figure 1 and Supplementary Table 1. We constructed our analyses in line with the two objectives. First, we sought to investigate whether there was an improvement over time following the implementation of P4P in the region for which we had data (Grand Est region). Then, if an improvement was observed, we compared the course of ROSP scores between the different areas of this region (rural: Aube department, and urban: the rest of the Grand Est region). To address the second objective, we then focused on the area with the least improvement in ROSP scores in order to assess whether there was an association between the scores and available sociodemographic characteristics. The characteristics we focused on were: population density, potential local accessibility, and sociodemographic category of the area (i.e., urban with poor access to care, city center, rural and unattractive urban area, or rural area).

2.1. Primary outcome
We retrieved ROSP scores from 2017 to 2020 for all GPs who were eligible for performance-based payment in the Grand Est, an administrative region in the east of France. Accounting for almost 8% of the French population (5 million inhabitants), the Grand Est region includes five urban areas with more than 250,000 inhabitants each (i.e., Metz, Mulhouse, Nancy, Reims and Strasbourg). The Aube Department, in contrast, is the most rural of the 10 departments that comprise the Grand Est region. We thus compared the Aube department with the rest of the Grand Est region (excluding the Aube). Apart from the   . We therefore hypothesized that the comparison of these two areas (i.e., the Aube department vs. the rest of the Grand Est region) would highlight differences in GPs' practices in rural and urban areas. The measurement of ROSP indicators was an existing metric that concerns all GPs and that could be used in the framework of this study. The ROSP indicators are defined by the national health insurance system, and are applicable to three areas of GPs' clinical practice, namely: monitoring of chronic diseases, prevention measures, and efficiency of care. For the national health insurance system, these ROSP indicators are used to measure the quality of care and medical practices. For the majority of the indicators, the aim is to exceed the threshold value defined for each indicator. There are 29 indicators, for a total of 940 points. Each point has a monetary value of 7 euros. In addition to reaching the target rates set by the health insurance system, GPs must treat a minimum number of patients in order to be eligible for financial rewards via the ROSP system. The ROSP scores in the Aube department and the rest of the Grand Est region were compared overall (Supplementary Table 1).
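The point-to-euro conversion described above can be illustrated with a short Python sketch. Only the 940-point total and the 7-euro point value come from the text; the function name is hypothetical, and the sketch deliberately ignores the minimum-patient eligibility rule and any adjustment the health insurance system may apply, since the actual calculation is described in the supplementary material.

```python
# Illustrative sketch only, not the health insurance system's calculation.
MAX_POINTS = 940        # total points across the 29 indicators (from the text)
POINT_VALUE_EUR = 7     # monetary value of one point (from the text)


def payment_for_points(points_earned: float) -> float:
    """Convert earned ROSP points into euros (hypothetical helper).

    Assumes a simple linear conversion with no eligibility or
    patient-volume adjustment.
    """
    if not 0 <= points_earned <= MAX_POINTS:
        raise ValueError("points_earned must lie between 0 and MAX_POINTS")
    return points_earned * POINT_VALUE_EUR


# Theoretical ceiling if every point were earned: 940 * 7 euros.
print(payment_for_points(MAX_POINTS))  # 6580
```

Under these assumptions, the theoretical maximum for a GP earning all points is 6,580 euros.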
Concerning the Iatrogenesis and Antibiotic-use indicators, the objectives for GPs involve limitation or reduction, i.e., lower scores are better. For example, the indicator "Percentage of patients aged >75 years old who do not have documented long-term psychiatric disorders and who have ≥2 prescribed psychotropic drugs (excluding anxiolytics)" is in the Iatrogenesis category. The intermediate objective was to limit this prescription rate to 10% of patients meeting the definition, with an ultimate target of 3% or fewer. Only four indicators require that each GP connects individually to the health insurance website (Ameli.fr) to declare their activity in view of ROSP indicator calculation. For all other indicators, the GP is not required to provide any information. The health insurance system computes the indicators automatically and calculates the total financial reward to be allocated to each physician.
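As a minimal sketch of how a "lower is better" indicator can be read against its two thresholds, the Python snippet below classifies a prescription rate using the 10% intermediate objective and the 3% target quoted above. The function name, the status labels, and the treatment of a rate exactly equal to 10% as meeting the intermediate objective are assumptions; the conversion of such a rate into ROSP points is not reproduced here.

```python
# Illustrative sketch only; thresholds are those quoted in the text.
INTERMEDIATE_OBJECTIVE = 0.10   # limit the rate to 10% of eligible patients
TARGET = 0.03                   # ultimate target: 3% or fewer


def classify_declining_rate(n_flagged: int, n_eligible: int) -> tuple[float, str]:
    """Return the observed rate and its status for a 'lower is better' indicator.

    n_flagged: eligible patients with >=2 prescribed psychotropic drugs.
    n_eligible: patients meeting the indicator definition.
    """
    if n_eligible <= 0:
        raise ValueError("n_eligible must be positive")
    rate = n_flagged / n_eligible
    if rate <= TARGET:
        status = "target met"
    elif rate <= INTERMEDIATE_OBJECTIVE:   # boundary handling is an assumption
        status = "intermediate objective met"
    else:
        status = "above intermediate objective"
    return rate, status


# Hypothetical practice: 8 of 100 eligible patients are flagged.
print(classify_declining_rate(8, 100))
```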

2.2. Definitions for classification
For the second part of the study, to take into account potential geographical, social and healthcare differences, we classified the Aube department using three methods: (i) the French Office of National Statistics population density grid classification for municipalities was used to classify municipalities as either "high population density zones" (densely populated and intermediate density), or "low population density zones" (sparsely populated, or very sparsely populated); (ii) the local potential accessibility (LPA) score, which is a measure of the supply of and demand for GPs that takes into account volume of activity, and service use rates differentiated by population age structure. LPA was categorized as "high-accessibility" (if the values were above the median value of the LPA score) or "low-accessibility" (if values were below the median LPA score); and (iii) the Institute for Research and Documentation in the Economics of Health (IRDES) social and health classification (in 6 classes), including supply and demand for healthcare and the attractiveness of the area (details given in Supplementary Table 2).
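The first two classification rules can be sketched in Python as follows. The four density labels and the median split are taken from the description above; the function and variable names are hypothetical, and assigning an area whose LPA equals the median to the "low-accessibility" group is an assumption, because the text does not say how such ties were handled.

```python
# Illustrative sketch only, under the assumptions stated in the lead-in.
from statistics import median

# (i) Collapse the four-level density grid into two zones.
DENSITY_ZONE = {
    "densely populated": "high population density",
    "intermediate density": "high population density",
    "sparsely populated": "low population density",
    "very sparsely populated": "low population density",
}


def classify_lpa(lpa_by_area: dict[str, float]) -> dict[str, str]:
    """(ii) Split areas into high/low accessibility around the median LPA."""
    cutoff = median(lpa_by_area.values())
    return {
        area: "high-accessibility" if lpa > cutoff else "low-accessibility"
        for area, lpa in lpa_by_area.items()
    }


# Hypothetical LPA values for four municipalities (median = 2.5).
print(classify_lpa({"a": 1.0, "b": 2.0, "c": 3.0, "d": 4.0}))
```

The six-class IRDES typology in (iii) is a published lookup per area rather than a computed rule, so it is not sketched here.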

2.3. Statistical analysis
Due to the asymmetric nature of the data collected and the presence of outliers, we used median values for our statistical analyses. Wilcoxon tests were used to compare the three ROSP categories, and the sub-categories for the three classifications described above, in the Grand Est region and Aube department between 2017 and 2020. We also assessed the trends in ROSP scores over the four study years using the Kruskal-Wallis test. A p-value <0.05 was considered statistically significant. All analyses were performed using SAS software version 9.4 (SAS Institute Inc., Cary, NC).
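For readers who wish to reproduce this kind of comparison outside SAS, the sketch below computes a median with its interquartile range and a two-sided Wilcoxon rank-sum test in plain Python. It is not the authors' SAS code: it assumes that the Wilcoxon test referred to is the two-sample rank-sum test, uses a normal approximation with tie correction and no continuity correction, and all function names are hypothetical. The Kruskal-Wallis test is not shown.

```python
# Illustrative sketch only; see the assumptions stated in the lead-in.
import math
from statistics import median, quantiles


def median_iqr(scores):
    """Return (median, Q1, Q3) for a list of scores."""
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    return median(scores), q1, q3


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test: returns (U statistic of x, z, p)."""
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and combined[j][0] == combined[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2          # mean of 1-based ranks i+1..j
        for k in range(i, j):
            ranks[k] = midrank
        t = j - i
        tie_term += t**3 - t               # tie correction term
        i = j
    r1 = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        raise ValueError("all observations are identical")
    z = (u1 - n1 * n2 / 2) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal p-value
    return u1, z, p


# Hypothetical scores for two small groups of GPs.
rural = [0.79, 0.85, 0.90, 0.92, 0.94]
urban = [0.84, 0.88, 0.91, 0.93, 0.95]
print(median_iqr(rural), median_iqr(urban), rank_sum_test(rural, urban))
```

With samples as large as those analyzed here, the normal approximation is generally considered adequate; for small groups an exact test would be preferable.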

2.4. Ethical considerations
This study was conducted in accordance with national laws regarding epidemiological research and data protection. Since this study was entirely retrospective and observational, and relied solely on anonymous data (no personal data), neither ethical approval nor written consent was required.

3. Results
We compared 1,919 ROSP scores from the Aube department to 39,017 ROSP scores from the remainder of the Grand Est region. All of the scores were generated between 2017 and 2020.
Between 2017 and 2020, the results tended to improve throughout the Grand Est region, including in the Aube department (Table 1). There was an improvement in Chronic disease follow-up, except for cardiovascular risk (rate variation: −1.96% for Aube, −5.36% for Grand Est). Concerning Prevention, the Cancer indicator decreased between 2017 and 2020 for the Aube Department, but was stable for the Grand Est region. The results for Iatrogenesis and Antibiotic use also improved (indicated by a decreased ROSP score) for the Aube and the Grand Est. For Efficiency, ROSP scores were higher for the Aube compared to Grand Est, and there was a greater increase between 2017 and 2020 for the Aube (rate increase: 27.27% for Aube, 25.45% for Grand Est). ROSP scores for Influenza were null for the Aube and the Grand Est in 2020.
Overall ROSP scores between 2017 and 2020 were compared between the Aube department and the rest of the Grand Est region (excluding the Aube) (Table 2). For indicators relating to chronic diseases, prevention and efficiency of care, while the results were significantly different, the differences were numerically small. Within each category, there were more marked differences between the Aube and the Grand Est for certain sub-criteria, such as the ROSP indicators for cardiovascular risk (median value Aube = 0.51 vs. Grand Est = 0.56, p < 0.001), antibiotic prescription (median Aube = 0.19 vs. Grand Est = 0.23, p < 0.001) and prescription of biosimilars (median Aube = 0.10 vs. median Grand Est = 0.05, p < 0.001).
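The rate variations quoted for Table 1 can be read as relative changes between the first and last study years. The sketch below assumes that definition, i.e., (score in 2020 − score in 2017) / score in 2017 × 100; the function name and the input values are hypothetical and are not taken from the study's tables.

```python
# Illustrative sketch only; assumes "rate variation" is a relative change.
def rate_variation(score_start: float, score_end: float) -> float:
    """Relative change in percent between two scores."""
    if score_start == 0:
        raise ValueError("relative change is undefined for a zero baseline")
    return (score_end - score_start) / score_start * 100.0


# Hypothetical median scores, chosen only to illustrate the arithmetic.
print(round(rate_variation(0.40, 0.50), 2))  # 25.0
```

For indicators where lower scores are better (Iatrogenesis, Antibiotic use), a negative variation under this definition would indicate improvement.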
In terms of prevention, cancer prevention was significantly worse in the Aube department, with a difference of 0.05 points (InterQuartile Range (IQR) Aube = 0.43 vs. IQR Grand Est = 0.48, p < 0.001). On the contrary, this department had a better overall .
Table 3 displays the results according to the population density of the area where the GP's practice was located for GPs in the Aube Department. In terms of chronic disease follow-up, there was no significant difference between high- and low-density areas, except for cardiovascular risk, where low-density zones had better results (median 0.55 vs. 0.50, p < 0.001). The opposite was observed for prevention: high-density zones achieved better results for iatrogenesis and prescription of antibiotics (median 0.11 vs. 0.14, p < 0.0001, and 0.16 vs. 0.25, p < 0.0001, respectively). There was no significant difference in cancer prevention between high- and low-population-density areas, but the high-density zones obtained better results for prescription efficiency (median 0.93 vs. 0.94, p < 0.0001).
ROSP scores according to high vs. low potential accessibility in the Aube Department are presented in Table 4. The overall score for chronic disease follow-up was lower in low-accessibility zones than in high-accessibility zones (median 0.90 vs. 0.91, p < 0.02). Conversely, for cardiovascular risk, low-accessibility zones had higher scores (median 0.53 vs. 0.50, p < 0.0001).
Regarding prevention, the high-accessibility zones seemed to perform better for the risk of iatrogenesis and prescription of antibiotics (median 0.13 vs. 0.10, p < 0.0001 and median 0.24 vs. 0.12, p < 0.0001, respectively). The high-accessibility zones also had a better score for prescription efficiency (median 0.93 vs. 0.95, p < 0.001). Table 5

4. Discussion
In our analysis of the temporal trends in ROSP scores from 2017 to 2020, we observed a gradual improvement each year for both the Aube department and the Grand Est region. This result suggests that the implementation of ROSP has a positive impact on quality of care. The increase was particularly marked for the prescription of biosimilars and generic drugs, which is a successful result in view of current health policies that aim to restrict health expenditures. This finding has also been described in the literature (18). Our results show that the urban area (Grand Est region) had better scores for chronic disease management and prevention, whereas the rural area (Aube) performed better for efficiency. However, the literature does not always show positive effects for these quality of care incentives. A recent study showed that P4P scores were inconsistently associated with quality improvement, which raises questions about the usefulness of the incentives (19).
In the Aube Department, it is worth underlining that overall ROSP scores were similar regardless of the population density (high-density vs. low-density). This shows that GPs can achieve similar quality of care outcomes within a rural area that is supposedly heterogeneous in terms of population density.
However, scores in the Prevention category were worse in low-population-density areas for cancer screening, iatrogenesis and antibiotic use. Again for the Aube department, the Chronic Disease indicator scored worse in areas with a lower potential accessibility score, although the difference in scores was very small. This result should be weighed against the fact that scores were higher for the Cardiovascular risk subcategory in areas with a low LPA score. Our results therefore only partly corroborate those of the literature, where it has been reported that GP activity differs in the city and in the countryside, with those practicing in rural areas tending to manage more patients with chronic diseases and to perform fewer preventive acts (20). The IRDES classification provides additional results, showing that urban areas with poor access to care had the lowest cardiovascular scores. However, the most rural areas within the Aube department had lower scores on the Chronic Disease, Cancer, and Iatrogenesis indicators, again highlighting significant differences within our rural study area. The prevention and efficiency scores did not differ according to the IRDES classification.
Our results can be at least partially explained by established biases of P4P programs in private practice. It is known that patients for whom P4P goals are more achievable receive more care (21). In addition, difficulty accessing specialists, such as cardiologists, may lead primary care physicians to over-medicate patients with certain conditions, and this would indirectly affect the ROSP scores compared to other regions. In this case, the indicators reflect more the difference in patients treated between urban and rural areas than the difference in practices related to the professionals themselves. The poorer results obtained in the areas in the Aube department with low potential accessibility could reflect shorter consultation times due to an increased burden of work for health professionals, especially GPs (22). The lack of time to explain the reasons for antibiotic abstention and to offer additional follow-up consultations could explain the over-prescription of antibiotics.
Overall, our study provides original results by seeking to compare practices between urban and rural areas and within a rural area based on P4P indicators. This investigation was made possible by access to this novel database. Ultimately, our work could be used to develop specific indicators to monitor the quality of care provided, and to provide insights into how we can best adapt the resources available to health professionals in rural areas.
This study has some limitations. Firstly, there is potential for selection bias because our statistics only include GPs who are registered for the P4P system. Although this represents the majority of physicians, it is important to note that their practices may differ from those of GPs who were not registered. We also know that GPs have specific motivations for settling in urban vs. rural areas. While the majority of GPs choose to set up their practice in the region where they did their residency training, the criteria for choosing a more or less urban area are predominantly related to the dynamics of supply of care, demand for care and living conditions in the area (23)(24)(25). Furthermore, it is not possible to fully assess the magnitude of the effect of ROSP scores on population health without first considering the case mix. The difficulty of assessing the overall impact is compounded by the frequent changes to the indicators, meaning that any assessment of the data and their relationship to patient health is necessarily limited to a short period of time. However, based on the trends we observed for the criteria studied, we can suggest that this limitation seems well under control. We obtained results for only one large French region (Grand Est). However, this region has many points in common with the other French regions in terms of healthcare delivery. The design of the study did not enable direct assessment of the impact of the intervention through a comparison of the "here-vs-elsewhere" type, since we were not comparing two areas (i.e., one receiving the intervention and one not). The comparison of intervention vs. non-intervention areas was not possible because, subsequent to the Ministry of Health decision, P4P was implemented on a national level in a uniform manner. All regions in France implemented P4P at the same time. It was also not possible to conduct a before-and-after evaluation, because the available data did not include information at T0, before the intervention began. It is common practice to evaluate public health policies with a time lag of several years in order to be more objective about the real impact of reforms, as it always takes time for practices to adapt, especially for GPs. Our study had 6 years of hindsight, which seemed reasonable to us. Our data therefore provide information on the evolution of ROSP scores (P4P indicators) during the implementation of P4P in a large French region, allowing us to judge whether a benefit could be expected from this implementation. We also sought to investigate whether the changes over time were different in rural vs. urban areas, bearing in mind that the entire region started P4P at the same time. It would have been very interesting to be able to compare the 2 areas according to population density, LPA, and IRDES classification. Unfortunately, we were unable to obtain sufficiently exhaustive data for the rest of the Grand Est region to ensure the validity of the comparison. Further studies are therefore needed to expand on this comparison and to be able to conclude on the impact of P4P.
Practitioners do not always see the introduction of performance-based payment as a positive change, which could contribute to weaker-than-expected improvements in efficiency (25). The implementation of P4P could also lead to over-medicalization as practitioners strive to meet the indicator targets (26). Better compliance would likely require greater participation of the healthcare professionals themselves in the co-definition of these indicators. Finally, we cannot rule out a possible classification bias in the definition of groups for our comparisons, which are ultimately based on geographic criteria. However, our results were consistent across the three classification schemes, suggesting a limited effect of classification. It would have been interesting to obtain data on the medical demographics of the other departments of the region to extend our investigation. Future studies could qualitatively investigate the regional profiles of GPs to better understand whether their characteristics explain some of the differences we observed.

5. Conclusions
The overall improvement in scores observed between 2017 and 2020 in the Grand Est region suggests that the implementation of ROSP indicators may be useful for improving quality of care in the medium and long term. However, the comparison of ROSP scores in rural and urban areas revealed certain differences, with urban areas doing better overall. When we focused on the rural area (Aube department), our data showed that the scores varied little according to the density of the sub-areas. However, significant differences were observed for some of the social criteria scores, showing lower ROSP scores for the extreme rurality ("Unattractive rural periphery") of an area. These results suggest that efforts should be concentrated on rural areas, which already had the lowest scores when P4P was first implemented, and which have seen fewer P4P-related benefits than their more urban neighbors.

Data availability statement
The data analyzed in this study is subject to the following licenses/restrictions: These data are provided by the primary health insurance fund, especially for the purposes of this study. They are therefore not available to the public. Requests to access these datasets should be directed to SS: stephane.sanchez@hcs-sante.fr.

Author contributions
LP and AO-H were involved in the conception and design of the study. SS and CQ were the coordinators of the study. LP and AO-H were responsible for the data collection. M-AS wrote the first draft. LB was in charge of the analysis. M-AS and CQ were involved in the interpretation and critically reviewed the first draft. All authors approved the final version and accept responsibility for the paper as published.