Sustained Effectiveness of Evidence-Based Parenting Programs After the Research Trial Ends

Despite ample evidence of the efficacy and effectiveness of evidence-based parenting programs (EBPPs) within research-led environments, there is very little evidence of maintenance of effectiveness when programs are delivered as part of regular service provision. The present study examined the effectiveness of EBPPs provided during a period of sustained service-led implementation in comparison to research-led effectiveness evaluation. Data from 3706 parents who received EBPPs during sustained implementation by services were compared to data from 1390 parents who had participated in an earlier researcher-led effectiveness trial of a national roll-out of EBPPs in England. In both phases, parents completed measures of child behavior problems, parenting style and parental mental well-being prior to starting parenting programs (pre-test), at the end of the programs (post-test) and at 12-month follow up. Results from Generalized Estimating Equations controlling for potential covariates indicated significant improvements in child behavior problems during sustained implementation, similar to the effectiveness phase; significant improvements in parenting style which were larger than the effectiveness phase at 12-month follow up; and significant improvements in parental mental well-being. Our findings demonstrate effective maintenance of gains when EBPPs are provided as part of regular provision across a large sample of English parents. Successful long-term implementation should consider effectiveness of EBPPs across the population, given the large contextual changes that take place between researcher-led evaluations and service take-up. Our findings support the integration of EBPPs in public health approaches to addressing child behavior problems and parent well-being.


INTRODUCTION
Evidence-informed policy making in public health or specialist service provision relies, in part, on research evidence about the efficacy and effectiveness of available interventions (Davies et al., 2000). According to recent standards of evidence (Gottfredson et al., 2015), interventions are gradually developed by building up the evidence in relation to the core mechanism of change, testing the efficacy of the full intervention package in controlled conditions, and then examining the effectiveness of the intervention under conditions that resemble real-life conditions more closely (Flay et al., 2005; Gottfredson et al., 2015).
Within implementation science it has become recognized that this evidence pathway should not end after demonstration of efficacy and then effectiveness when the intervention is scaled up and implemented as part of regular service provision. Rather, models of evidence pathways have extended to a further stage of sustainment or sustainability (Aarons et al., 2011). However, the theoretical conceptualization of this stage has been contentious, as indicated by the use of both sustainment (a state) and sustainability (a characteristic of the implementation). For example, in their review of 125 studies Wiltsey Stirman et al. (2012) found that 62% of studies used the term "sustainability"; that only 36 (29%) defined sustainability; and that there were a number of different definitions, the most common of which was that by Scheirer (2005) which just eight studies used. A recent review demonstrates consistency has not improved (Moore et al., 2017).
The main focus of such studies has been on the influences that enhance or reduce sustainability. A systematic review by Durlak and DuPre (2008) identified 23 contextual factors that can influence the success of an implementation. Wiltsey Stirman et al. (2012) identified four main categories with 24 subsidiary categories: innovation characteristics, e.g., fit, ability to maintain fidelity/integrity; context, e.g., climate, leadership, setting characteristics and system/policy change; capacity, e.g., funding, workforce (staffing attributes and community/stakeholder support); and processes and interactions, e.g., training and education, ongoing support, and engagement/relationship building.
In both these reviews, intervention effectiveness was not among these factors. Demonstration of effectiveness during sustained implementation was considered relevant only to service providers as a means of monitoring performance and assuring quality (Franks and Schroeder, 2013). In addition, only 22% of the studies reviewed by Wiltsey Stirman et al. (2012) reported sustainability of individual outcomes. Shediac-Rizkallah and Bone (1998) noted inconsistencies with the definition of sustainability, and distinguished between six definitions of sustainability. Here we focus on definition three: "A development program is sustainable when it is able to deliver an appropriate level of benefits for an extended period of time after major financial, managerial and technical assistance from an external donor is terminated (United States Agency for International Development, 1988)" (Shediac-Rizkallah and Bone, 1998, p. 91).
Indeed, findings from small-scale studies that examined effectiveness during transition to service-led provision (Zubrick et al., 2005; Price et al., 2012; Skar et al., 2015) highlighted the importance of maintaining effectiveness during sustained implementation.
The need to monitor the effectiveness of parenting programs during sustained implementation extends beyond the needs of a particular service provider: it should concern evidence-informed policy-making. Moreover, program delivery changes as the level of experimental control changes, as do many factors associated with successful implementation (cf. Durlak and DuPre, 2008). A change in the level of effectiveness should therefore be considered a likely characteristic of sustained implementation, along with recognition of the limitations of our systemic capacity to control all factors we know are related to successful implementation. Therefore, to support a public health approach to the promotion of EBPPs, the question is no longer whether they work, but whether they still work when they are provided as part of regular service provision across the population.
Therefore the focus of the present study was on the sustainability of effectiveness, defined as maintenance of positive effects of the program(s) at a comparable level to that shown in the earlier formal effectiveness trial; this definition is consistent with Shediac-Rizkallah and Bone (1998).
In the present study we examine the sustainability of the effectiveness of parenting programs once these are implemented as part of regular service delivery in communities, outside trials which had previously demonstrated their effectiveness. Sustainability of effectiveness, therefore, is defined as maintenance of positive effects of the program(s) at a comparable level to that shown in the earlier formal effectiveness trial. The implementation domain is that of parenting programs which aim to reduce behavior problems among children because such problems can persist into adulthood and have both negative consequences for the individuals and high societal costs (Scott et al., 2001). Parenting programs are mainly based on behavioral science and social learning models (Sanders et al., 2012) and aim to develop more adaptive parenting techniques to help parents to manage their child's behavior. Numerous randomized controlled trials (RCTs) have established the evidence-base of programs such as Triple P (Sanders et al., 2000) and Incredible Years (Webster-Stratton et al., 2001). Systematic reviews and meta-analyses of these RCTs have concluded that parenting programs are effective interventions to reduce child behavior problems and improve the overall emotional and behavioral adjustment of children, increase positive parenting styles, decrease ineffective use of discipline, and improve maternal mental health (Kaminski et al., 2008; Nowak and Heinrichs, 2008; Dretzke et al., 2009; Barlow et al., 2012, 2016; Furlong and McGilloway, 2015).
Whereas this evidence-base is key for policy makers wishing to embed parenting programs into regular service provision, it is not sufficient. The next step is to demonstrate that parenting programs work equally well when implemented under real-world conditions, gradually building from smaller effectiveness demonstrations to larger-population effectiveness research (scaling up). In the United Kingdom, an example of the latter is the United Kingdom government-instigated and funded Parenting Early Intervention Pathfinder evaluation (2006-2008) of three evidence-based parenting programs (EBPPs) across 18 Local Authorities (LAs; geographical regions with administrative powers) in England. On the basis of evidence of their effectiveness (Lindsay et al., 2011b) the funder rolled out eight EBPPs across all 152 LAs in England, the Parenting Early Intervention Programme, and evaluated their effectiveness in a sample of 43 LAs (Lindsay et al., 2011a; Lindsay and Strand, 2013). The Lindsay et al. (2011a) study demonstrated that rolling out EBPPs was associated with small to moderate reductions in child behavior problems, and large changes in parenting style and parental mental well-being. In the United States, the rolling out of Triple P across 18 counties in South Carolina further demonstrated the public health benefits associated with the prevention of child maltreatment (Prinz et al., 2009).
Although there is limited evidence on the sustainability of EBPPs with respect to individual outcomes, findings from small-scale studies that examined effectiveness during transition to service-led provision have provided indicative evidence of positive effects. For instance, Price et al. (2012) focused on externalizing behaviors of children in foster care and found that provision of the parenting intervention by community agencies was as effective at reducing the target behaviors as an earlier effectiveness trial. Skar et al. (2015) assessed the long term sustainability of effects in a community-wide parenting program, finding that the beneficial effects on parenting measures were maintained 6 and 12 months later.
Hence there is a general need to monitor the effectiveness during sustained implementation which extends beyond the requirements of a particular service provider: it should concern evidence-informed policy-making that is looking to maintain service provision and well-being levels of the population. A change in effectiveness during implementation may be considered likely, given the large number of factors that may influence sustainability of effectiveness (Durlak and DuPre, 2008). Therefore, to support a public health approach to the promotion of EBPPs, the question concerning effectiveness applies not only to whether they are effective in trials, but also to whether they remain effective when provided as part of regular service provision across the targeted population.
The aim of the present study was to examine the effectiveness of EBPPs during a period of sustained implementation by services in England. The study responds to recent calls by the Society for Prevention Research (Gottfredson et al., 2015) to develop robust evidence of effectiveness during scaling-up of interventions. Our main research question examined whether effectiveness of EBPPs could be maintained during the phase of service-led sustained implementation as compared to an earlier researcher-led effectiveness evaluation phase. Specifically, we examined whether changes in child behavior problems, parenting style and parental mental well-being were significantly different between service-led sustained implementation and the previous researcher-led effectiveness evaluation.

Design
This study compares the effectiveness of parenting programs delivered across two different phases: the effectiveness trial comprised a researcher-led evaluation of effectiveness during national roll-out of parenting programs across England (Lindsay and Strand, 2013); following this, the service-led sustained implementation phase (2011-2016) included service evaluation data collected for service monitoring purposes (see Figure 1). In the present study, data were drawn from four LAs that participated across both phases. These LAs requested that the research team continue to collect data from parents enrolling during the sustained implementation phase. Apart from providing annual reports of the analysis of their results, we took no part in the LAs' implementation of the EBPPs that they had selected for delivery. The sustained implementation phase started in the school year following the last year of the trial implementation phase.

Participants
Parents were recruited to parenting programs through multiple routes, which were comparable in each LA. The recruitment was based entirely on the parent's or professional's concern about the child(ren)'s behavior. Local authorities were quite liberal with their inclusion criteria, including those parents who self-referred, and referrals from schools, social services, and health services, with the agreement of the parent. No LA had formal inclusion or exclusion criteria. There were no inclusion or exclusion criteria for the research: all parents for whom pre-course data were available were included in the trial. These criteria applied to both phases of the study. Table 1 presents the demographic information for the 1390 parents who took part in the effectiveness trial phase, and 3706 from the sustained implementation phase.

Demographic Information
Demographic information was collected prior to the start of parenting programs and included child age, gender and Special Educational Need (SEN) status, parent gender, parents' contact with health and social care professionals, parent education, ethnicity, family structure, housing status, and whether their child was eligible for free school meals. SEN status indicates that the child has been assessed by the educational authority as not being able to learn in the same way or at the same pace as their peers, and that additional or differentiated provision is needed to address the child's learning needs. Eligibility for free school meals is determined by the family's income and is a proxy measure of income poverty.

Socio-economic deprivation
A composite measure of socio-economic deprivation was developed by combining data on parental education level (no educational qualifications vs. Level 3+), single parent status (single vs. dual parent), home ownership (rent home vs. own home) and free school meal eligibility (eligible vs. ineligible). No educational qualifications indicates exiting school without any General Certificate of Secondary Education (GCSE) level qualifications, i.e., Level 3/upper secondary level, International Standard Classification of Education (ISCED; UNESCO, 2011). Socio-economic deprivation scores ranged from 0 to 4, with higher values indicating more deprivation.
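The composite described above can be illustrated with a minimal sketch. It assumes that each of the four indicators contributes one point with equal weight, which is consistent with the stated 0 to 4 range but is not spelled out in the text; the function and argument names are hypothetical.

```python
def deprivation_score(no_qualifications: bool, single_parent: bool,
                      rents_home: bool, free_school_meals: bool) -> int:
    """Count how many of the four deprivation indicators are present.

    Assumption: each indicator is scored 1 if present and 0 otherwise,
    so the composite ranges from 0 (least) to 4 (most deprived).
    """
    indicators = (no_qualifications, single_parent, rents_home, free_school_meals)
    return sum(1 for present in indicators if present)


# Example: a single parent who rents, has qualifications, and whose child
# is not eligible for free school meals.
example_score = deprivation_score(False, True, True, False)
```

Under this assumed coding, a higher count simply means that more of the four markers of deprivation apply to the family.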

Parent support needs
Parents were asked to indicate whether they had contacted professional support for themselves over the past 6 months: medical practitioner, psychiatrist, counselor, social workers or other healthcare and support professionals. We summed the number of contacts indicated to capture the level of support need of the parent. A score of 0 indicated the participant had no contact with any of these professionals and therefore a low 'support need,' whereas a score of 5 indicated parents had had contact with all the support professionals, therefore demonstrating a high level of support need.
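As a rough illustration of this count, the sketch below tallies how many of the five professional categories a parent reports having contacted. The category labels are hypothetical identifiers chosen for the example, and the sketch assumes each category is counted at most once.

```python
from typing import Iterable

# Hypothetical labels for the five professional categories named in the text.
SUPPORT_TYPES = frozenset({
    "medical_practitioner",
    "psychiatrist",
    "counselor",
    "social_worker",
    "other_professional",
})


def support_need_score(contacted: Iterable[str]) -> int:
    """Return the number of distinct professional categories contacted (0-5)."""
    contacted_set = set(contacted)
    unknown = contacted_set - SUPPORT_TYPES
    if unknown:
        raise ValueError(f"Unknown professional type(s): {sorted(unknown)}")
    return len(contacted_set)
```

For example, `support_need_score(["psychiatrist", "counselor"])` gives a score of 2 under these assumptions.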

Child Behavior Problems
The Strengths and Difficulties Questionnaire (SDQ; Goodman, 1997) is a well-established measure of children's behavioral and emotional problems. The SDQ contains 25 items, grouped into five scales (5 items per scale): conduct problems, emotional symptoms, hyperactivity, peer problems, and prosocial behavior. For each item, parents rate their child's behavior on a 3-point scale (0 = not true, 1 = somewhat true, 2 = certainly true). The first four scales are summed to give a total difficulties score.
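The scoring just described can be sketched as follows. This is an illustration only: it assumes item responses have already been coded 0 to 2 in the problem direction (the published SDQ scoring instructions reverse-score some items, a step omitted here), and the scale keys are hypothetical names.

```python
from typing import Dict, Sequence

# The four problem scales that make up the total difficulties score.
DIFFICULTY_SCALES = ("conduct", "emotional", "hyperactivity", "peer")


def scale_score(items: Sequence[int]) -> int:
    """Sum one five-item scale; each item must be rated 0, 1, or 2."""
    if len(items) != 5 or any(item not in (0, 1, 2) for item in items):
        raise ValueError("Each scale needs exactly five items rated 0, 1, or 2")
    return sum(items)


def total_difficulties(responses: Dict[str, Sequence[int]]) -> int:
    """Sum the four problem scales (range 0-40); prosocial items are ignored."""
    return sum(scale_score(responses[scale]) for scale in DIFFICULTY_SCALES)
```

Because each scale ranges from 0 to 10, the total difficulties score in this sketch ranges from 0 to 40.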
The SDQ is used extensively in research and clinical practice and has well-established psychometric properties (Goodman, 2001). We used the 4-17 year-old version of the SDQ, though the SDQ has been validated for use with children aged 2-17 years old (see footnote 1), with altered wording of three of the items for the 2-3 years group. Here we present internal consistency data from the 4-17 years group, though findings were similar for the 2-17 year group. Good levels of internal consistency were demonstrated across both phases for conduct problems (Cronbach's alphas were 0.71 and 0.69 for the effectiveness and sustained implementation phase, respectively); emotional symptoms (α = 0.69 and 0.73); hyperactivity (α = 0.73 and 0.76) and total difficulties score (α = 0.82 and 0.84). Peer problems and prosocial behavior subscales were not included in the present study as these are behaviors not directly targeted by parenting programs.
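For readers unfamiliar with the internal consistency statistic reported here, the following is a minimal sketch of the standard Cronbach's alpha formula, α = k/(k − 1) × (1 − Σ item variances / variance of total scores). It uses sample variances and toy data; it is not the software or data used in the study.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Compute Cronbach's alpha from respondent-by-item scores.

    rows: one sequence of item scores per respondent, all the same length.
    """
    k = len(rows[0])  # number of items in the scale
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Toy data: two items that rise and fall together perfectly.
toy_alpha = cronbach_alpha([[0, 0], [1, 1], [2, 2]])
```

Alpha approaches 1 when items covary strongly, as in the toy data, and falls as items become less related to one another.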

Parenting
The Parenting Scale-Adolescent (Irvine et al., 1999) is a 13-item scale, shortened from an original 30-item scale (Arnold et al., 1993) to assess parenting style. The original 30-item version of this questionnaire has been validated for use with parents of children aged 18 months to 16 years (Arnold et al., 1993; Karazsia et al., 2008) and reliability and validity of the shorter version have been established for children aged 2-16 (Karazsia et al., 2008). Two subscales are available: laxness (six items) and over-reactivity (six items) as well as a single 'parental monitoring' item. Parents rate each item on a 7-point scale and the scores in each subscale are summed. Internal consistency was good (laxness, α = 0.75 and 0.83 and over-reactivity, α = 0.72 and 0.81, for each phase, respectively).

Parent Mental Well-Being
The Warwick-Edinburgh Mental Well-Being Scale (WEMWBS; Tennant et al., 2007) is a 14-item measure of subjective mental well-being in adults. Parents rated 14 statements on a 1 (none of the time) to 5 (all of the time) scale. Total mental well-being score ranges from 14 to 70, with higher scores indicating higher mental well-being. Internal consistency levels were high: α = 0.93 for the effectiveness trial phase, and α = 0.94 for the sustained implementation phase.

Parenting Programs
During the effectiveness trial phase, the United Kingdom Department for Education (DfE) selected eight EBPPs, which had been accredited by the National Academy for Parenting Practitioners (NAPP; Asmussen et al., 2012) to a standard of effectiveness determined by the DfE. LAs could provide one or more of these EBPPs. The DfE funded LAs through the Parenting Early Intervention Programme to develop their infrastructure, including strategic and operational lead officers and staff who had been trained to be facilitators of the relevant programs the LAs chose to implement. Training was provided by the program providers (see Lindsay et al., 2011b). During the sustained implementation phase, the LAs chose to continue with the programs. Three parenting programs were common across the two phases: Incredible Years, Triple P and STOP.

Incredible Years
The Incredible Years program (IY; Webster-Stratton et al., 2004) was developed for parents of children between 8 and 13 years old and focuses on teaching parents how to manage the child's behavior problems through improved parenting. In the current study, providers of IY offered 18-22 weekly group sessions of 2-2.5 h. In the effectiveness trial phase, facilitators attended a 4- or 5-day manualised workshop and received supervision in the form of peer support meetings and monthly telephone consultations from accredited mentors. The majority of facilitators had a higher education level qualification in an education or health and social care discipline. Forty-seven parents (3.4%) enrolled in IY during the effectiveness trial phase, compared to 10 (0.27%) parents in the sustained implementation phase.

1 www.sdqinfo.com/norms/UK3yearNorm.html

Triple P
The Triple P program (Sanders et al., 2000) was developed for parents of children aged 0-16 years old. It aims to increase parents' skills and confidence in handling their child's behavior through positive parenting. In the current study, providers of Triple P offered the Level 4 version which involves eight 2-h weekly sessions: four face-to-face small group sessions, three telephone sessions and one final group session. In the effectiveness trial phase, facilitators received a 3-day manualised training program with an accredited Triple P trainer. The majority of facilitators had a higher education level qualification in an education or health and social care discipline. The majority of parents attended Triple P: 1241 (89.28%) parents in the effectiveness trial phase, and 3442 (92.88%) parents in the sustained implementation phase.

STOP
STOP (Ministry of Parenting, 2015) was developed as an 11-week program for parents of children aged 11-16. It aims to help parents to communicate better with their children and to support their development, through discussions, role play and feedback to develop more effective parenting techniques. In the effectiveness trial phase, facilitators attended a 3-day workshop and received manualised training materials. In the effectiveness trial phase, 102 (7.34%) parents enrolled in STOP compared to 254 (6.85%) in the sustained implementation phase.

Program Fidelity
During the sustained implementation phase, in addition to existing trained staff, new facilitators received the same training as those engaged during the effectiveness trial phase. Both existing and newly trained facilitators received the same pattern of support and supervision as during the effectiveness trial stage. The monitoring of the implementation of programs by the facilitators was undertaken in the same manner during both phases according to each program's specifications by senior LA staff, who were both trained and experienced in the program's delivery. The criteria for completion of each program were the same in each phase, namely a minimum attendance of 75% of the group sessions, including the final session when the post-testing also occurred. This criterion was employed by the LA services during both the trial phase (Parenting Early Intervention Programme) and during the sustained implementation phase. Facilitators completed a monitoring sheet after the final group session indicating those parents who had completed their course and also the reasons, if known, why parents did not complete their course; these monitoring sheets were returned to the research team with the completed post-course questionnaires.
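The completion criterion can be expressed as a simple rule. The sketch below is an illustration under assumptions: attendance is represented as one True/False flag per group session in order, and the function name is hypothetical.

```python
from typing import Sequence


def completed_program(attended: Sequence[bool]) -> bool:
    """Apply the completion rule: at least 75% of group sessions attended,
    and the final session (when post-testing occurred) among them.

    attended: one flag per group session, in session order (assumed format).
    """
    if not attended:
        return False
    attendance_rate = sum(attended) / len(attended)
    return bool(attended[-1]) and attendance_rate >= 0.75


# Example: 6 of 8 sessions attended, including the last one.
example_completed = completed_program(
    [True, True, False, True, True, False, True, True]
)
```

A parent who attended most sessions but missed the final one would not count as a completer under this rule.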

Procedure
The procedure was the same during both phases. Parents completed a pre-test questionnaire booklet either at the start of the first session or at an introductory session before the parenting course commenced. Post-test data were collected in the last session of the program. All pre and post measures were distributed by the program facilitator and posted to the research team for analysis. Twelve months following the pre-test measures, follow up questionnaires were posted to parents directly from the research center.

Analytic Strategy
Data involved repeated measurements, which were also nested within LAs. Therefore, we accounted for data non-independence using Generalized Estimating Equations (GEE; Liang and Zeger, 1986). With their focus on population average effects (Hubbard et al., 2010), GEEs provided a good match to the research question.
Missing data ranged from 26.56 to 44.62% at post-test and 82.21-85.97% at follow up. This high level of data loss is a common occurrence in community evaluations (McWey et al., 2015;Abrahamse et al., 2016). We examined the association between missing data, initial levels of child behavior problems and participant characteristics, and we found no association (analysis available on request). There was also no significant difference in the proportion of missing data between the phases, across all outcome measures (all p > 0.05, full analysis available on request). This therefore suggests that the mechanism of missing data was not systematically related either to participant characteristics or the intervention on offer (Schafer and Graham, 2002). We had no reason to reject that data were Missing Completely at Random (MCAR) and, as such, were appropriate for fitting into GEEs. To address missing data, a quasi-likelihood estimation was used (Liang and Zeger, 1986) that makes full use of information available. GEEs were fitted specifying an identity link and exchangeable working correlation matrix, with a robust estimator for the covariance matrix which yields accurate standard errors (Garson, 2013). Outcomes were standardized and continuous covariates were grand mean centered. GEEs controlled for data clustering at LA level and program level.
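The two preprocessing steps mentioned at the end of this paragraph can be sketched in a few lines. This is a minimal illustration that assumes the sample standard deviation and a single pooled mean; the text does not state whether standardization was pooled across time points and phases, so that choice is an assumption here.

```python
from statistics import mean, stdev
from typing import List, Sequence


def grand_mean_center(values: Sequence[float]) -> List[float]:
    """Subtract the overall (grand) mean from every value of a covariate."""
    grand_mean = mean(values)
    return [value - grand_mean for value in values]


def standardize(values: Sequence[float]) -> List[float]:
    """Convert an outcome to z-scores using the sample standard deviation."""
    grand_mean = mean(values)
    spread = stdev(values)
    return [(value - grand_mean) / spread for value in values]
```

With standardized outcomes, a coefficient such as β = −0.12 can be read as a difference of 0.12 standard deviations, and centering means the intercept refers to a participant at the average value of each continuous covariate.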

RESULTS
Generalized Estimating Equations were fitted for each outcome to examine whether the effect of study phase (effectiveness trial vs. sustained implementation), time (pre-test, post-test, follow up) or their interaction was significant, accounting for the effect of LA, parent program type, child age, child gender, child's SEN status, parent gender, parent support need, ethnicity, and socioeconomic deprivation. GEEs use a Wald chi-square test to test whether each predictor makes a significant overall contribution to the model, and then provide an estimated coefficient for each level of the predictor. Unadjusted descriptive statistics are shown in Table 2, while Tables 3 and 4 present the GEE coefficients of each level of predictor variables. Below, we report the overall predictor effects.
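The model structure described above can be written out as a marginal mean model with an identity link and an exchangeable working correlation. The notation is a sketch under assumptions: i indexes the unit with repeated measurements (assumed here to be the parent), j indexes the time point, pre-test and the effectiveness trial are taken as reference categories (the coding is not stated in the text), and x_i stands for the vector of covariates.

```latex
\begin{aligned}
\mathrm{E}[Y_{ij}] ={}& \beta_0 + \beta_1\,\mathrm{Phase}_i
  + \beta_2\,\mathrm{Post}_j + \beta_3\,\mathrm{FollowUp}_j \\
&+ \beta_4\,(\mathrm{Phase}_i \times \mathrm{Post}_j)
  + \beta_5\,(\mathrm{Phase}_i \times \mathrm{FollowUp}_j)
  + \boldsymbol{\gamma}^{\top}\mathbf{x}_i, \\
\mathrm{Corr}(Y_{ij}, Y_{ik}) ={}& \rho \quad \text{for } j \neq k .
\end{aligned}
```

In this notation, the time by phase interaction corresponds to the coefficients β_4 and β_5, which capture whether change from pre-test differs between the two phases.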

Child Behavior Problems
Generalized Estimating Equations were fitted twice with SDQ data: once for children aged 4-17 years and once for children aged 2-17 years. As the results were very similar, we report findings from the 2-17 year-old group analysis. Results from the 4-17 years old group are available on request to the first author.

Parenting
The interactions between time and phase were significant for laxness (Wald = 23.22, p < 0.001) and over-reactivity (Wald = 42.29, p < 0.001). To interpret the significant interactions, we focus on each time × phase coefficient. At pre-test, scores for both parenting styles were significantly lower in the sustained implementation phase compared to the effectiveness trial phase (laxness: β = −0.12, p = 0.001; over-reactivity: β = −0.27, p < 0.001). At post-test, differences between the two phases were no longer significant (laxness: β = 0.04, p = 0.254; over-reactivity: β = −0.03, p = 0.462). Follow up scores indicated better maintenance of gains in the sustained implementation phase for over-reactivity (β = −0.23, p < 0.001), but not laxness (β = −0.10, p = 0.086). Figure 2 presents adjusted laxness and over-reactivity scores over time for each phase that demonstrate the significant interaction described above.

Parent Mental Well-Being
The main effect of time on WEMWBS scores was significant (Wald = 1649.99, p < 0.001), indicating an increase in mental well-being from pre-test to post-test and from pre-test to follow up. Mental well-being scores were significantly higher in the sustained implementation phase compared to the effectiveness trial phase (Wald = 12.96, p < 0.001) but the interaction between time and phase was not significant (Wald = 4.93, p = 0.085), suggesting that, despite group differences, there was no differential gain between the two phases: they both experienced a significant improvement over time.
Generalized Estimating Equations coefficients are presented in Table 4. Having a female child who was causing concern was associated with higher parent mental well-being scores (β = 0.05, p = 0.046). Child's SEN status was associated with lower mental well-being scores (β = −0.08, p = 0.047). Parent mental well-being was not related to child age (β = −0.004, p = 0.235). Mental well-being scores were higher for male parents (β = −0.19, p < 0.001) and for parents with lower levels of support need (β = −0.13, p < 0.001). Parent mental well-being was lower for those with a higher level of socio-economic deprivation (β = −0.05, p < 0.001) and lower for those of White British ethnicity (β = −0.24, p < 0.001).

DISCUSSION
The current study examined whether evidence-based parenting programs (EBPPs) remain effective when delivered entirely by service providers as part of regular service delivery during the sustained implementation phase. Findings indicated that service-led sustained implementation was associated with significant improvements in child behavior problems, similar to the researcher-led effectiveness trial; there were significant improvements in parenting style, which were larger than the effectiveness trial at 12 months follow up; and significant improvements in parental mental well-being similar to the effectiveness trial.
Present findings come from a large English sample of parents and support findings from an implementation trial in the United States (Price et al., 2012), where researchers compared service-led implementation of a parenting group for foster carers with the comparison group of a previous effectiveness trial, and found the implementation phase resulted in significant benefits in child behavior problems. Present findings provide a rigorous demonstration of successful maintenance of the effectiveness of EBPPs delivered during the sustained implementation phase, relative to a group of people who also received EBPPs during the effectiveness trial.
Significant gains immediately after the program were maintained at 12 months follow up, similar to the initial improvement and maintenance demonstrated in the effectiveness trial. Indeed, longer-term gains in parenting style (laxness and over-reactivity) were larger for parents in service-led implementation. This could be because parents who took up parenting programs during service implementation presented lower levels of initial difficulties. The present findings add to the limited evidence about the longer-term effectiveness of parenting programs, whether in community provision or research evaluations (cf. Lundahl et al., 2006).
In the current study, we did not measure the factors that change between a researcher-led trial and service-led evaluation, but we know that a large number of conditions may change as we move from researcher-led evaluations to service-led delivery (Durlak and DuPre, 2008). Our aim was to examine whether effectiveness would be maintained, despite the changes in the implementation environment and circumstances. Our findings indicated that effectiveness can be maintained, though without identifying factors that are important for the maintenance of effectiveness. As our understanding of the implementation continuum improves (Spoth et al., 2013;Gottfredson et al., 2015), it is obvious that we have a patchy understanding of all conditions related to successful sustained implementation (Proctor, 2009;Wiltsey Stirman et al., 2012). The issue of effectiveness as an indicator of achieving population level benefits should be a core outcome of the sustained implementation agenda, beyond the knowledge that each service provider requires to monitor their quality. The current study demonstrated the maintenance of implementation effectiveness in a large English sample. Future research should examine further the service-level factors that may be related to successful effectiveness during sustained implementation.
The present data also suggested some interesting differences in terms of the parents who received programs across the two phases. While both phases operated a targeted provision model, parents who signed up during the sustained implementation phase experienced lower levels of parenting difficulties and better mental well-being before the programs started, compared to parents in the effectiveness trial. Though there were more White British parents and less income poverty (fewer children eligible for free school meals) in the sustained implementation phase, there were overall similar levels of socio-economic deprivation, child characteristics and child behavior problems (see Table 1). Therefore initial differences might not be entirely accounted for by socio-demographic differences. These differences may indicate differences in the way services recruit or refer parents over time, i.e., a broadening of referral criteria to a larger population during sustained implementation.
One of the core limitations of the present study is the self-selected sample of areas from which sustained implementation data were pooled. These were LAs where service providers decided to continue delivering and monitoring parenting programs following their involvement in the researcher-led effectiveness trial. They may differ from other LAs that did not choose to continue independent evaluation of the effectiveness of their EBPPs after the effectiveness trial, once funding of the Parent Early Intervention Programme by the DfE had ended. In particular, these LAs were committed to maintaining staff training, support and program fidelity, as well as to monitoring of effectiveness by an independent research team, so providing accountability. Future research needs to investigate further the in-depth factors that facilitate or hinder sustained service implementation following a successful take-up.
A further limitation of the present study concerns the generalizability of the findings, due to the high level of data attrition. While common in community-based research (McWey et al., 2015; Abrahamse et al., 2016), this limits our conclusions to those parents for whom evaluation data were available, rather than all parents who enrolled for the programs. There was limited information with regard to the specific reasons for this level of data attrition for the LAs featured in the current study, although our analysis did not point to a single reason (mechanism) for missing data: reasons included parents' practical difficulties related to family factors (e.g., illness, husband's change of work pattern). However, given the high level of missing data, it is important that future research focuses on understanding patterns of attrition in users of community services to enhance the likelihood of continued engagement with programs and evaluation.

CONCLUSION
Research demonstrating the sustainability of EBPPs' effectiveness is of key importance in determining their funding-worth (August et al., 2006; Wiltsey Stirman et al., 2012). However, the focus of much sustainability research has been on optimal implementation environments, rather than the maintenance of outcomes and the demonstration of benefit across the population as a whole. The current study demonstrated that effectiveness can be maintained when services lead on provision of EBPPs. Present findings indicated that significant improvements in child behavior problems and parental mental well-being were maintained during sustained implementation, whereas improvements in parenting laxness and over-reactivity were significant in the short term and better maintained in the longer term under sustained implementation. Given the high costs of late intervention (Chowdry and Fitzsimons, 2016), the present findings make a strong case for the integration of EBPPs in public health approaches to reducing child behavior problems and improving parent well-being.

ETHICS STATEMENT
This study was carried out in accordance with the recommendations of the Humanities and Social Sciences Research Ethics Committee of the University of Warwick. The protocol was approved by the Humanities and Social Sciences Research Ethics Committee of the University of Warwick [Ref: Eth. App. 12/06-07, (Lindsay et al., 2011b); Ref: Eth. App. 45/07-08, (Lindsay and Strand, 2013); Ref: Eth. App. 93/15-16, for the current study]. All subjects gave written informed consent in accordance with the Declaration of Helsinki.

AUTHOR CONTRIBUTIONS
VT, GG, and GL made substantial contributions to the conception or design of the work, or the acquisition, analysis, or interpretation of data for the work; drafted the work or revised it critically for important intellectual content; gave final approval of the version to be published; and agreed to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

FUNDING
This study was supported by a grant from the British Academy (SG153174).