Newborn Screening for Long-Chain 3-Hydroxyacyl-CoA Dehydrogenase and Mitochondrial Trifunctional Protein Deficiencies Using Acylcarnitines Measurement in Dried Blood Spots—A Systematic Review of Test Accuracy

Background: Long-chain 3-hydroxyacyl-CoA dehydrogenase (LCHAD) and mitochondrial trifunctional protein (MTP) deficiencies are rare autosomal recessive fatty acid β-oxidation disorders. Their clinical presentations are variable, and premature death is common. They are included in newborn blood spot screening programs in many countries around the world. The current process of screening, through the measurement of acylcarnitines (a metabolic by-product) in dried blood spots with tandem mass spectrometry, is subject to uncertainty regarding test accuracy. Methods: We conducted a systematic review of literature published up to 19th June 2018. We included studies that investigated newborn screening for LCHAD or MTP deficiencies by tandem mass spectrometry of acylcarnitines in dried blood spots. The reference standards were urine organic acids, blood acylcarnitine profiles, enzyme analysis in cultured fibroblasts or lymphocytes, mutation analysis, or at least 10-year follow-up. The outcomes of interest were sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). Assessment of titles, abstracts, and full-text papers and quality appraisal were carried out independently by two reviewers. One reviewer extracted study data. This was checked by a second reviewer. Results: Ten studies provided data on test accuracy. LCHAD or MTP deficiencies were identified in 23 babies. No cases of LCHAD/MTP deficiencies were identified in four studies. PPV ranged from 0% (zero true positives and 28 false positives from 276,565 babies screened) to 100% (13 true positives and zero false positives from 2,037,824 babies screened). Sensitivity, specificity, and NPV could not be calculated as there was no systematic follow-up of babies who screened negative. Conclusions: Test accuracy estimates of screening for LCHAD and MTP deficiencies with tandem mass spectrometry measurement of acylcarnitines in dried blood spots were variable in terms of PPVs. Screening methods (including markers and thresholds) varied between studies, and sensitivity, specificity, and NPVs are unknown.

The conditions are characterized by lethargy, hypoglycemia, hypotonia, cardiomyopathy, and acute metabolic crisis (2,3). Long-term complications include liver disease, peripheral neuropathy, and retinopathy (3,4). Signs and symptoms may present immediately after birth or later in life (5). Three main forms of LCHAD/MTP deficiencies have been reported: an early-onset form, which is associated with cardiomyopathy, hypoglycemia, and sudden infant death; an infant-onset form, which is characterized by recurrent hypoketotic hypoglycemia and lethargy during illness or fasting; and a milder, late-onset form that is triggered by exercise, fasting, or infections and is associated with progressive peripheral neuropathy and recurrent rhabdomyolysis (6,7). There is no cure for LCHAD or MTP deficiencies, and premature death is common. Approximately 38% of infants die before, or within 3 months of, diagnosis (5). A number of management strategies are available, namely a high-carbohydrate and fat-modified/decreased diet that is low in long-chain fatty acids, supplements (L-carnitine, docosahexaenoic acid, and medium-chain triglyceride oil, such as triheptanoin), and avoidance of fasting (1). There is some evidence that these treatments are associated with improved clinical outcomes (e.g., reduced mortality, delayed visual complications), but the effects are variable, study sample sizes are small, and few data are available from long-term follow-up studies (1, 8-10). The incidence of LCHAD and MTP deficiencies varies widely around the world. A recent estimate from the USA gives an incidence of 1:363,738 for LCHAD deficiency and 1:1,240,467 for MTP deficiency (11).
It has been proposed that earlier recognition and treatment of LCHAD/MTP deficiencies may be critical for improving health outcomes (5), and the two conditions are included in
To date, there has been one systematic review examining test accuracy of screening for LCHAD/MTP deficiencies (16). Searching up to 2012, Einoder-Moreno et al. (16) identified six studies and concluded that sensitivity, specificity, and negative predictive value (NPV) of acylcarnitine measurement in dried blood spots are close to 100%, and that the positive predictive value (PPV) ranges from 9 to 100%. However, three relevant papers were missed by their search (12,17,18), and the calculations of sensitivity, specificity, and NPV were based on an assumption about the disease status of babies who screened negative, as no follow-up of these babies was conducted in the included studies. This approach can lead to overestimation of sensitivity and underestimation of specificity (19). The aim of the current paper, therefore, is to conduct a systematic review of test accuracy metrics (sensitivity, specificity, PPV, and NPV) of acylcarnitine measurement by tandem mass spectrometry in newborn screening dried blood spots (DBS) for LCHAD/MTP deficiencies, using a broader search than the previous review and taking into consideration whether or not babies who screened negative received follow-up assessment.

MATERIALS AND METHODS
The review protocol is registered at PROSPERO (registration number CRD42018094356).

Search Strategy
We conducted a search of the following electronic databases: MEDLINE, MEDLINE In-Process, MEDLINE Daily, MEDLINE ePub Ahead of Print, the Cochrane Library, Web of Science, and Embase. Search terms (free text and subject headings) related to the disease area (e.g., "mitochondrial trifunctional protein," "long-chain-3-hydroxyacyl-CoA dehydrogenase," "fatty acid oxidation disorder") and screening (e.g., "newborn screening," "dried blood spot," "tandem mass spectrometry"). Full details of the search are provided in Supplement 1. The reference lists of included articles and relevant systematic reviews were also examined. The search was conducted on 19th June 2018, with no restrictions on the publication date or language of articles.

Eligibility Criteria
We included journal articles and reports that investigated newborn screening for LCHAD or MTP deficiencies by tandem mass spectrometry (TMS) analysis of acylcarnitines in dried blood spots. The reference standards were urine organic acids, blood acylcarnitine profiles, enzyme analysis in cultured fibroblasts or lymphocytes, mutation analysis, or at least 10-year follow-up. These could be used on their own or in any combination. Appropriate study designs were cross-sectional test accuracy studies, case-control studies, and cohort studies. The outcomes of interest were sensitivity, specificity, PPV, and NPV (or sufficient data to allow us to calculate these). We excluded non-human studies, letters, editorials, communications, conference abstracts, gray literature, studies of fatty acid β-oxidation disorders where data for LCHAD/MTP deficiencies could not be separated from data for other conditions, studies with no extractable data, and studies where more than 10% of the study sample did not meet our inclusion criteria.

Screening and Data Extraction
Titles, abstracts, and full-text papers were independently screened by two reviewers. Data extraction was conducted by a single reviewer and checked by a second reviewer. At each stage of the review, disagreements were resolved through discussion between the reviewers, with the involvement of a third reviewer if consensus could not be achieved.

Quality Appraisal
Two reviewers independently assessed risk of study bias and applicability concerns using the Quality Assessment Tool for Diagnostic Accuracy Studies (QUADAS-2) (20), which was tailored to the research question. Tailoring comprised defining cut-offs for exclusions, identifying appropriate reference standards, selecting a suitable interval between index tests and reference standards, and producing guidance notes. Disagreements were resolved through discussion by the two reviewers, leading to a consensus on assessment of risk of bias and applicability concerns for all studies. The QUADAS-2 tool is presented in Supplement 2 and the guidance notes in Supplement 3.

Data Summary and Synthesis
Due to incomplete 2 × 2 tables and heterogeneity between study designs, a narrative summary of the evidence is provided. We calculated confidence intervals for test accuracy metrics using the Wilson score method with continuity correction (21).
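As an illustration of the interval calculation, the following is a minimal sketch of a Wilson score interval with continuity correction for a single proportion such as PPV. The function name `wilson_cc_interval` and the default 95% critical value are assumptions made for this example; the formula follows the commonly published continuity-corrected form and is not taken from the review's own analysis code.

```python
import math


def wilson_cc_interval(successes, n, z=1.959964):
    """Wilson score interval with continuity correction for a proportion.

    successes: number of events (e.g., true positives)
    n: number of trials (e.g., all screen positives)
    z: standard normal critical value (default is for a 95% interval)
    Returns (lower, upper) as proportions between 0 and 1.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    if not 0 <= successes <= n:
        raise ValueError("successes must lie between 0 and n")

    p = successes / n
    q = 1.0 - p
    denom = 2.0 * (n + z ** 2)

    # By convention the lower limit is 0 when no events are observed.
    if successes == 0:
        lower = 0.0
    else:
        root = math.sqrt(z ** 2 - 2.0 - 1.0 / n + 4.0 * p * (n * q + 1.0))
        lower = (2.0 * n * p + z ** 2 - 1.0 - z * root) / denom

    # By convention the upper limit is 1 when every trial is an event.
    if successes == n:
        upper = 1.0
    else:
        root = math.sqrt(z ** 2 + 2.0 - 1.0 / n + 4.0 * p * (n * q - 1.0))
        upper = (2.0 * n * p + z ** 2 + 1.0 + z * root) / denom

    return max(0.0, lower), min(1.0, upper)


if __name__ == "__main__":
    # Example: 9 confirmed cases among 19 screen positives.
    low, high = wilson_cc_interval(9, 19)
    print(f"PPV = {9 / 19:.1%}, 95% CI {low:.1%} to {high:.1%}")
```

With only a handful of screen positives per study, intervals computed this way are wide, which is why point estimates of PPV from small studies should be read with caution.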

RESULTS
Searching, Sifting, and Sorting
Full details of the flow of studies through the review are outlined in Figure 1. One thousand one hundred and ninety-four unique records were identified through searching electronic databases. After examination of titles and abstracts, 39 papers were retained for full-text assessment. Eleven of these papers met the review's inclusion criteria (12-15, 17, 18, 22-26). Two papers included overlapping cohorts (17,22). Only the data from the larger, more recent paper by Lindner et al. (17) [which included all of the data from Schulze et al. (22)] are reported here. A list of excluded studies [with reasons for exclusion] is provided in Supplement 4.

Quality Appraisal
A summary of the risks of bias and applicability concerns of the included papers is provided in Figure 2. Ratings of risks of biases and applicability concerns for each individual study are provided in Supplement 5.
In the index test domain, five studies (50%) were rated as having high risk of bias as the cut-off for "screen positive" was altered during the study period (13) or was not pre-specified (12,23,24,26). Of the remaining studies, two were at unclear risk of bias (18,25) and three were at low risk of bias (14,15,17). Two (20%) studies had high applicability concerns as one included additional markers (C14:1, C14-OH) (15) and one included both blood and urine samples (18). Applicability concerns were unclear in two studies (17,25) and low in six studies (12-14, 23, 24, 26).
Finally, all studies were judged to be at high risk of bias in the flow and timing domain (12-15, 17, 18, 23-26). The reasons for this were that the reference standards used to confirm disease status for screen positives and screen negatives were not the same, follow-up of those children who screened negative was not defined or not conducted, and losses to follow-up were not reported.
The reference standards used varied between and within studies. For screen-positive babies, reference standards were blood acylcarnitines, urinary organic acids, and DNA analysis (13, 24, 25); enzyme and/or molecular studies (18); enzyme activity in fibroblasts/lymphocytes and mutation analysis (15); acylcarnitine profile in plasma/DBS and/or genotype and/or enzyme activity (17); urine organic acids, plasma acylcarnitines, and molecular-genetic analyses (14); organic acid in urine, next-generation sequencing, and an additional acylcarnitine profile in DBS (23); and urinary organic acids or DNA analysis (26). Lastly, one study used "standard metabolic criteria" (12). No systematic follow-up of babies who screened negative was conducted in any of the studies.

Accuracy of Screening Tests
The cut-offs used to classify a positive case of LCHAD/MTP deficiencies and the diagnostic tests used to confirm this varied between studies. Therefore, we report positive screening results as those that met/exceeded the cut-off and were diagnostically confirmed as presented in the individual study. Test accuracy data are shown in Table 1.

Positive and Negative Predictive Values
PPV varied considerably between studies. It was 0% in four studies, with zero true positives and 28 false positives from 276,565 babies screened (12,23,25,26); 33% in one study, with one true positive and two false positives from 436,969 babies screened (13); 47% in one study, with 9 true positives and 10 false positives from 1,200,000 babies screened (15); and 100% in four studies, with 13 true positives from 2,037,824 babies screened (14,17,18,24). In the UK study, the single case reported as a true positive had already been detected clinically and was being treated for LCHAD deficiency (LCHADD) at the point of screening (13). Confidence intervals were wide due to the small number of cases of LCHAD/MTP deficiencies detected (23 in total, zero to nine per study). It was not possible to calculate NPV as newborns who screened negative were not systematically followed up.
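The arithmetic behind these PPV figures can be checked with a short sketch. The counts below are those reported in the paragraph above, grouped as in the text rather than by individual study; the class and variable names (`StudyGroup`, `GROUPS`) are hypothetical and introduced only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StudyGroup:
    """Counts for a group of studies, as reported in the text."""
    label: str
    true_positives: int
    false_positives: int
    babies_screened: int

    @property
    def screen_positives(self):
        return self.true_positives + self.false_positives

    @property
    def ppv(self):
        # PPV = TP / (TP + FP); undefined if nobody screened positive.
        if self.screen_positives == 0:
            return float("nan")
        return self.true_positives / self.screen_positives


GROUPS = [
    StudyGroup("four studies (12,23,25,26)", 0, 28, 276_565),
    StudyGroup("one study (13)", 1, 2, 436_969),
    StudyGroup("one study (15)", 9, 10, 1_200_000),
    StudyGroup("four studies (14,17,18,24)", 13, 0, 2_037_824),
]

total_tp = sum(g.true_positives for g in GROUPS)
total_fp = sum(g.false_positives for g in GROUPS)
total_screened = sum(g.babies_screened for g in GROUPS)

if __name__ == "__main__":
    for g in GROUPS:
        print(f"{g.label}: PPV = {g.ppv:.0%} "
              f"({g.true_positives}/{g.screen_positives} screen positives)")
    print(f"Confirmed cases: {total_tp}; false positives: {total_fp}; "
          f"babies screened: {total_screened:,}")
```

The totals serve only as a consistency check on the reported counts. Because markers, cut-offs, and reference standards differed between studies, a single pooled PPV is not meaningful and is deliberately not computed here.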

Sensitivity and Specificity
We were not able to determine sensitivity or specificity due to a lack of information on babies who screened negative.

DISCUSSION
We assessed the test accuracy of acylcarnitine measurement in newborn DBS using TMS for LCHAD/MTP deficiencies. Ten relevant studies were identified. All studies had a high risk of bias in at least one domain, and 9/10 (90%) studies had a high risk of bias in at least two domains. Across the 10 studies, ∼4,000,000 babies were screened and 23 cases of LCHAD/MTP deficiencies were identified; 11 babies had LCHAD deficiency, two had MTP deficiency, and 10 had undifferentiated LCHAD/MTP deficiencies. One of the cases reported as a true positive had already been detected clinically at the point at which screening took place (13). Arguably, the PPV for this study should be 0% rather than 33%, as reported in the study. Forty additional babies screened positive but were subsequently found not to have LCHAD or MTP deficiency. In four studies, no cases of LCHAD/MTP deficiencies were identified (12,23,25,26). However, in three of these studies, the sample sizes were too small to be likely to detect such rare diseases [screening population sizes were 2,440 (25), 10,048 (23), and 100,077 (26)]. The fourth study included a larger sample (n = 164,000) but only included one marker (C16OH), which might have made the screening process less accurate (12).
The only measure of test accuracy that was consistently reported (or where sufficient data were present to allow us to calculate it) was PPV. PPV in the 10 studies ranged from 0% (zero true positives and 28 false positives from 276,565 babies screened) to 100% (13 true positives from 2,037,824 babies screened). It was not possible to calculate sensitivity, specificity, or NPV as there was no systematic follow-up of babies who had screened negative. In a pilot or national screening program for a rare disease using a "promising" test, negative tests will inevitably represent the vast majority of test results. While some studies provided very high PPV, PPV is not intrinsic to the test itself, and at any particular values of sensitivity and specificity, the estimates of PPV (and also NPV) are strongly dependent on disease prevalence. This relationship is illustrated in Supplement 7 over a range of prevalence values similar to those in the included studies and over a range of specificity values. In order to provide a complete assessment of test accuracy, all four metrics (sensitivity, specificity, NPV, PPV) are required.
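The dependence of PPV on prevalence follows directly from Bayes' theorem and can be sketched in a few lines. In the example below, the sensitivity and specificity values are purely hypothetical assumptions (the included studies do not allow either to be estimated), and the two US incidence estimates cited in the introduction are used as stand-ins for prevalence. The function name `predictive_values` is likewise introduced only for illustration.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a binary test via Bayes' theorem.

    All three arguments are proportions between 0 and 1.
    """
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("prevalence", prevalence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")

    # Expected fraction of the screened population in each 2 x 2 cell.
    tp = sensitivity * prevalence
    fn = (1.0 - sensitivity) * prevalence
    fp = (1.0 - specificity) * (1.0 - prevalence)
    tn = specificity * (1.0 - prevalence)

    ppv = tp / (tp + fp) if (tp + fp) > 0 else float("nan")
    npv = tn / (tn + fn) if (tn + fn) > 0 else float("nan")
    return ppv, npv


# Hypothetical test characteristics, chosen only for illustration.
ASSUMED_SENSITIVITY = 0.95
ASSUMED_SPECIFICITIES = (0.9999, 0.99999)

# Incidence estimates from the introduction, used here as prevalence.
PREVALENCES = {
    "1 in 363,738": 1 / 363_738,
    "1 in 1,240,467": 1 / 1_240_467,
}

if __name__ == "__main__":
    for spec in ASSUMED_SPECIFICITIES:
        for label, prev in PREVALENCES.items():
            ppv, npv = predictive_values(ASSUMED_SENSITIVITY, spec, prev)
            print(f"specificity={spec}, prevalence {label}: "
                  f"PPV={ppv:.2%}, NPV={npv:.6%}")
```

At prevalences this low, even a very small false-positive rate produces many more false positives than true positives, so PPV can be low for a test with excellent specificity, while NPV stays close to 100% almost regardless of sensitivity.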
Whether newborn screening for LCHAD/MTP deficiencies with acylcarnitine measurement in dried blood spots using TMS is appropriate is currently unclear due to a lack of data on babies who screen negative and a lack of consistency between screening test methods. Partial verification bias is a key issue in the included studies; from nearly 4,000,000 babies screened, only 63 (those who screened positive) received a reference standard. Therefore, we cannot know the true disease state of the babies who screened negative. Partial verification bias is common in studies of test accuracy because it is often impractical, unethical, and not cost-effective to follow up every participant. Alternative approaches to whole population follow-up include statistical methods to attempt to correct for the bias, follow-up of samples of participants who screen negative, and searching disease registers to find false negatives. Statistical methods may introduce other forms of bias (27,28).
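The consequence of leaving screen negatives unverified can be made concrete with a small sensitivity analysis. The sketch below pools the counts from the included studies purely for illustration (23 true positives, 40 false positives, and 3,951,358 babies screened, the last being the sum of the screening populations reported in the results) and asks what sensitivity and specificity would be under different hypothetical numbers of missed cases. The numbers of false negatives are assumptions; no such data exist in the included studies.

```python
TRUE_POSITIVES = 23
FALSE_POSITIVES = 40
# Sum of the screening populations reported in the results.
BABIES_SCREENED = 3_951_358


def accuracy_given_missed_cases(false_negatives):
    """Return (sensitivity, specificity) for a hypothetical FN count."""
    if false_negatives < 0:
        raise ValueError("false_negatives must be non-negative")
    true_negatives = (BABIES_SCREENED - TRUE_POSITIVES
                      - FALSE_POSITIVES - false_negatives)
    sensitivity = TRUE_POSITIVES / (TRUE_POSITIVES + false_negatives)
    specificity = true_negatives / (true_negatives + FALSE_POSITIVES)
    return sensitivity, specificity


if __name__ == "__main__":
    # Hypothetical numbers of affected babies missed by screening.
    for fn in (0, 1, 5, 10, 23):
        sens, spec = accuracy_given_missed_cases(fn)
        print(f"assumed false negatives={fn:>2}: "
              f"sensitivity={sens:.1%}, specificity={spec:.4%}")
```

Assuming that every unverified screen negative is a true negative corresponds to the first row and necessarily yields a sensitivity of 100%. A modest number of missed cases would change the sensitivity estimate substantially, whereas specificity is barely affected because true negatives dominate the denominator.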
While screening for LCHADD/MTP is conducted in the newborn bloodspot programs of a number of countries, there are few published data on the benefits and harms of this screening. Taylor-Phillips et al. (29) reviewed the evidence on national policy recommendations on screening newborn babies for rare diseases. They highlight three elements that might determine the balance of benefits and harms from screening programs: test accuracy, the benefit of early detection and treatment, and overdiagnosis (the detection and subsequent treatment of disease that would never have caused symptoms within a person's lifetime). Many of the national policy recommendations (including for LCHADD) did not assess all three of these elements. In relation to screening for LCHADD/MTP, the current review suggests that the evidence on test accuracy is uncertain. A recent systematic review has examined the potential benefit of early detection and treatment, comparing the health outcomes of people with LCHADD/MTP who were treated with pre-symptomatic dietary management following screen detection of the conditions with those of people detected following symptomatic presentation (30). There was some evidence of an association between timing of intervention and outcomes, such as mortality, heart problems, liver problems, visual problems, motor/muscular problems, and hypoglycemia. However, the majority of included studies found no statistically significant differences in outcomes between the two groups. Furthermore, the review identified few studies from which to draw conclusions and high risks of bias in included studies. There is no published evidence on overdiagnosis. Overall, the paucity of data and variability between studies lead to considerable uncertainty regarding the benefits and harms of screening for LCHADD/MTP.
Our review has a number of limitations. First, we were not able to synthesize (meta-analyze) our results numerically due to a lack of data on false negatives and true negatives and because of variability between screening test methods. Second, we tailored the QUADAS-2 to reflect newborn screening in the UK; this resulted in high concerns regarding applicability in the patient selection domain, as screening is often conducted sooner after birth in other countries. The definition of a high applicability concern is likely to differ in other countries.
There is currently insufficient evidence to clearly judge test accuracy. This is driven, in part, by a wide range of markers and thresholds being used in the included studies: PPV estimates differed greatly by study, with some suggesting good PPV, albeit on small numbers of cases. It was not possible for us to combine data from different studies or determine which combination of markers and thresholds may yield good accuracy as results were not presented by marker. Future research could involve collaboration between researchers to report scores on a range of relevant markers for cases of LCHAD, cases of MTP, and in the general population using consistent units. There is a precedent for this approach in the form of the Region 4 Stork (R4S) project and accompanying multivariate pattern recognition software (subsequently developed into the interactive web tool Collaborative Laboratory Integrated Reports (CLIR), https://clir.mayo.edu/). The R4S project aimed "(a) to achieve uniformity of testing panels by MS/MS to maximize detection of affected newborns within the region; (b) to improve overall analytical performance; and (c) to set and sustain the lowest achievable rates of false positive and false negative results" (31). Reference and disease ranges for LCHAD/MTP markers were reported for the R4S project (31). To date, the CLIR tool has been used in a small number of research projects (32-35). An additional piece of work should aim to clarify the disease states of babies who screen negative. This could be achieved in a number of ways, such as searching hospital/primary care records or disease registers, or following up samples of babies who have screened negative.

CONCLUSIONS
Measurement of acylcarnitines in newborn dried blood spots using TMS may prove to be a useful way to screen for LCHAD/MTP deficiencies, but currently, there are significant concerns regarding the high number of false positives in some of the studies, risks of bias in the studies, heterogeneity in the methods used, and a lack of data on sensitivity, specificity, or negative predictive values. Clinicians interested in the identification of LCHAD/MTP may consider partnership development across clinical and research networks to address the knowledge gaps identified from this study, including data available for long-term follow-up studies and alignment of diagnostic methodologies.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author.

AUTHOR CONTRIBUTIONS
CS undertook project planning and research design, coordinated the review process, conducted all aspects of the review, and co-wrote the paper with HF. HF undertook project planning, undertook all sifting, sorting, data extraction, and quality appraisal, and co-wrote the paper with CS. JG contributed to sifting and sorting and commented on draft and final versions of the paper. RJ contributed to sifting and sorting and quality appraisal and commented on first and final drafts of the paper. MC undertook project planning and commented on first and final drafts of the paper. SJ developed and conducted the literature searches, managed references, and helped obtain full-text references. AC undertook project planning, oversight of search strategies and methods, and commented on first and final drafts of the paper. ST-P undertook project planning and research design and commented on first and final drafts of the paper. All members of the team contributed to the development of the protocol. All authors contributed to the article and approved the submitted version.

FUNDING
This research was commissioned by the UK National Screening Committee.