Differentiating between non-transfusion-dependent β-thalassemia and iron deficiency anemia in children using ROC and logistic regression analysis: two novel discrimination indices designed for pediatric patients

Introduction This cross-sectional study enrolled 271 children with microcytic anemia to test the performance of 41 single and 2 composite formulas and indices in distinguishing between β-thalassemia (β-thal) and iron deficiency anemia (IDA) in the pediatric population. Methods Optimal pediatric cut-off values for the previously published formulas and indices were generated using ROC analysis. Logistic regression in R using generalized linear models (GLM) generated two new indices. Results Formulas and indices with optimal cut-off values that reached accuracy ≥90% in children were (in descending order): Matos & Carvalho index, MDHL (Telmissani) formula, England and Fraser formula, Pornprasert index, Sirachainan index, Telmissani (MCHD) formula, CRUISE index, Hameed index, Sargolzaie formula and Zaghloul II index. The first new index, CroThalDD-LM1, has an accuracy of 93.36% (AUC 0.986, 95% CI 0.975–0.997), while the second, CroThalDD-LM2, utilizes absolute reticulocyte count alongside CBC variables, with an accuracy of 96.77% (AUC 0.985, 95% CI 0.988–0.999). Discussion and conclusion We recommend using the aforementioned formulas and indices with corrected cut-off values and accuracy >90% alongside the two newly proposed indices. A comparison of both native and these new indices is encouraged. These are the first discrimination indices generated and designed specifically for the pediatric population, which includes preschool children.


Introduction
The discrimination between β-thalassemia (β-thal) and iron deficiency anemia (IDA) is of great socioeconomic importance. A recent multicenter study reports a 32.4% prevalence of IDA among infants in northwestern Croatia and 12% in children one year and older in central Croatia (1). Accurate and early diagnosis reduces costly and unnecessary laboratory testing. Furthermore, inappropriate empirical iron supplementation could be detrimental to the child's health. Many developing countries have limited resources for the genetic testing of β-thal but have standard laboratory techniques for its detection. Since hemoglobin electrophoresis is not always readily available in developing countries, discrimination formulas and indices can reduce their economic burden. Using an accurate formula or index also allows early screening that identifies β-thal or IDA in children with high probability. In this article, our goal was to determine the accuracy of the existing single and composite formulas and indices in distinguishing between these two diseases and, if possible, to generate one or more new ones. Previous adult indices and formulas with their original cut-off values have unsatisfactory diagnostic accuracy in distinguishing β-thal from IDA in children. Since children are the most vulnerable age group, especially in developing countries, creating a pediatric index that includes school and preschool children is necessary. The importance of this topic is underscored by recent publications (1,2).

Methods
To this end, we extensively searched three databases (PubMed, Scopus, Web of Science) to find all published formulas and indices distinguishing β-thal from IDA. We tested the accuracy of forty-one single formulas and indices and two composite indices. The accuracy of these formulas and indices has not been tested in children thus far. The following are shown in alphabetical order: Alparslan [log10 (MCH × MCHC × RDW/RBC)], Bessman (RDW), Bordbar
CBC and reticulocyte analysis were performed on a Sysmex XN-3000 automated blood analyzer for the values of red blood cell count (RBC), hemoglobin (Hb), hematocrit (Hct), mean corpuscular volume (MCV), mean corpuscular hemoglobin (MCH), mean corpuscular hemoglobin concentration (MCHC), red cell distribution width (RDW), and reticulocytes (RTIC, per mille and absolute, ×10¹²/L). Peripheral blood samples were obtained by standard venipuncture. Red cell osmotic fragility was assessed with the osmotic fragility test. Hb A, Hb A2, Hb F, and Hb variants were detected using an automated capillary electrophoresis system alongside high-performance liquid chromatography (HPLC) and isoelectric focusing (IEF) for confirmation. Hb A2 values >4% or HbF >5% indicated β-thal carriers, while HbA values >4% with HbF up to 50% indicated NTDT β-thal. These complementary methods were used simultaneously in the routine laboratory work-up of every sample to ensure that no methodologic or clerical errors were made. IEF and capillary electrophoresis compensated for the inability of HPLC to distinguish Hb Lepore from Hb A2, which have overlapping retention times. Using manual IEF, we ensured the sharpness of the bands on the gel. The main disadvantage of capillary electrophoresis is the poor separation of Hb S vs. Hb D.
To avoid the disadvantages of each method, we required all three methods to give the same positive result for β-thal. Iron deficiency anemia was diagnosed by measuring serum iron (Fe), ferritin (FER) [Ferritin ELISA Kit, Demeditec Diagnostics, GmbH], unsaturated iron binding capacity (UIBC), and total iron binding capacity (TIBC). Serum iron levels <4 μmol/L and serum ferritin levels <15 ng/dl indicate IDA (4). Since IDA can reduce HbA2 levels, hemoglobin variant analysis was repeated after IDA was corrected (5). DNA analysis was performed if protein-based methods revealed an unknown variant or an ambiguous result. Gene-specific PCR analysis of pediatric Croatian patients was performed with the Q5 High-Fidelity PCR Kit (New England Biolabs) according to the manufacturer's instructions. Direct DNA sequencing of the amplified PCR products was done at Macrogene Europe. DNA analysis revealed the following hemoglobin subunit
Informed parental consent was obtained in accordance with the Declaration of Helsinki.

Statistical analysis
The data analyzed were summarized as numbers and percentages. Quantitative data were summarized as the arithmetic mean and standard deviation. Depending on the data distribution, the data were analyzed using either parametric or nonparametric tests. Sensitivity, specificity, positive predictive value, negative predictive value, accuracy, Youden index, and area under the curve (AUC, with 95% confidence limits) were used to assess the performance of individual formulas and indices. All applied tests were two-sided, and p values ≤ 0.05 were considered statistically significant. Akaike's Information Criterion (AIC) was used to select variables in the logistic regression model (7). Statistical analysis was performed in MedCalc ver. 19.07, StatSoft Statistica 12.5, and GraphPad Prism 8.4.3.686 (8,9). Logistic regression analysis was performed in R version 4.1.3 (10). The optimal cut-off value for each formula and index was defined on the ROC curve as the point at which sensitivity and specificity were closest to the value of the area under the ROC curve and the difference between sensitivity and specificity was minimal (11).
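The performance measures and the cut-off rule described above can be sketched in Python. This is a minimal illustration, not the MedCalc/R workflow used in the study: the function names and the toy data are hypothetical, higher index values are assumed to indicate the positive class, and the cut-off rule is implemented as minimizing |Se − AUC| + |Sp − AUC| with ties broken by the smallest |Se − Sp|, which is one possible reading of the criterion in (11).

```python
from typing import Dict, Sequence, Tuple


def confusion(values: Sequence[float], labels: Sequence[int],
              cutoff: float) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN); values above the cutoff are called positive."""
    tp = fp = tn = fn = 0
    for value, label in zip(values, labels):
        positive = value > cutoff
        if positive and label == 1:
            tp += 1
        elif positive:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Performance measures; assumes both classes are present in the data."""
    nan = float("nan")
    se = tp / (tp + fn)
    sp = tn / (tn + fp)
    return {
        "sensitivity": se,
        "specificity": sp,
        "ppv": tp / (tp + fp) if tp + fp else nan,
        "npv": tn / (tn + fn) if tn + fn else nan,
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "youden": se + sp - 1,
    }


def auc(values: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a positive outranks a negative (ties = 0.5)."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def optimal_cutoff(values: Sequence[float], labels: Sequence[int]) -> float:
    """Cut-off whose Se and Sp are closest to the AUC, then closest to each other."""
    area = auc(values, labels)
    best_key, best_cutoff = None, None
    for cutoff in sorted(set(values)):
        m = metrics(*confusion(values, labels, cutoff))
        se, sp = m["sensitivity"], m["specificity"]
        # Rounding keeps floating-point noise from deciding ties.
        key = (round(abs(se - area) + abs(sp - area), 9), round(abs(se - sp), 9))
        if best_key is None or key < best_key:
            best_key, best_cutoff = key, cutoff
    return best_cutoff


# Hypothetical toy data: index values and true classes (1 = positive class).
TOY_VALUES = [1, 2, 3, 4, 5, 6, 7, 8, 9]
TOY_LABELS = [0, 0, 0, 1, 0, 0, 1, 1, 1]

if __name__ == "__main__":
    cutoff = optimal_cutoff(TOY_VALUES, TOY_LABELS)
    print(cutoff, metrics(*confusion(TOY_VALUES, TOY_LABELS, cutoff)))
```

For an index in which lower values indicate the positive class, the comparison in `confusion` would need to be reversed.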

Results
The performance of formulas and indices was tested with already published and new optimal cut-off values generated by ROC analysis.
Two formulas/indices had accuracy >80% using published cut-offs (in descending order): the Matos-Carvalho index and the England and Fraser formula (Supplementary Table S1). The most accurate previously published index was the Matos-Carvalho index, both with its original cut-off value for adults and with an optimal cut-off value for children. With the previously published cut-off value, this index better distinguishes children with IDA than children with β-thal. With an optimal cut-off value, the accuracy of the Matos-Carvalho index in distinguishing β-thal increased (sensitivity 89.94%, specificity 93.75%). Therefore, the overall proportion of misclassified patients is reduced, and the index has better sensitivity and specificity.
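The link between these two figures and the proportion of misclassified patients can be made explicit. As a sketch in our own notation, where π denotes the proportion of the positive class in the sample (not restated here), overall accuracy is a weighted average of sensitivity (Se) and specificity (Sp):

```latex
\text{Accuracy} = \pi \cdot \text{Se} + (1 - \pi) \cdot \text{Sp},
\qquad
\text{misclassified fraction} = 1 - \text{Accuracy}
```

With Se = 89.94% and Sp = 93.75%, accuracy therefore lies between 89.94% and 93.75% whatever the group proportions, and the misclassified fraction lies between 6.25% and 10.06%.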
The accuracy of many formulas and indices improved with the new optimal cut-off values. The diagnostic accuracy of 10 indices, previously below 80%, rose above 80% with the new optimal cut-off values. The accuracy of Pornprasert and RBC indices   S2).
As we were not fully satisfied with the overall performance of the aforementioned formulas and indices (primarily with sensitivity), we tried to improve diagnostic accuracy by creating one or more new models suited for children. We applied binary classification using logistic regression (R for Windows 4.1.3). From a series of models generated in R 4.1.3, the best ones were selected using AIC by combined addition and subtraction of variables. The variables used in the CroThalDD-LM1 model are MCH, MCV, and Hb. The model has the following calculation formula:

$$\text{CroThalDD-LM1} = \frac{\exp(y)}{1 + \exp(y)}$$

where $\exp(y) = e^{y}$, $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$, $x_1$ = MCH, $x_2$ = MCV, and $x_3$ = Hb. $\beta_0$, $\beta_1$, $\beta_2$, and $\beta_3$ are the coefficients of the logistic regression model (CroThalDD-LM1 index) shown in Table 2. The number calculated by CroThalDD-LM1 is between 0 and 1. The cut-off value is 0.5: a number closer to 0 indicates the diagnosis of IDA, while a number closer to 1 indicates β-thal. These three variables built a model more accurate than any previous formula/index (composite indices included), with a sensitivity of 94.87%, specificity of 93.51%, accuracy of 94.10%, and AUC of 0.986 (95% CI 0.975-0.997) (Table 2).
To simplify the use of the model in everyday clinical practice, we present Microsoft Office Excel and LibreOffice Calc spreadsheets that use this model (presented in the Supplementary Excel spreadsheet). In this way, the user only needs to enter the values of MCH, MCV, Hb, MCHC, RBC, and RTIC (×10¹²/L), and the spreadsheets automatically calculate the probability of belonging to each class.
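The same calculation can be sketched outside a spreadsheet. The Python snippet below is a minimal illustration of a logistic index of this form and is not the supplementary spreadsheet itself: the function names are ours, and the intercept, coefficients, and patient values are HYPOTHETICAL placeholders chosen only to make the example run. The fitted coefficients are those reported in Table 2 and are not reproduced here.

```python
import math
from typing import Dict


def linear_predictor(values: Dict[str, float],
                     coefficients: Dict[str, float],
                     intercept: float) -> float:
    """y = b0 + b1*x1 + b2*x2 + ... for the named CBC variables."""
    return intercept + sum(coefficients[name] * values[name] for name in coefficients)


def logistic_index(values: Dict[str, float],
                   coefficients: Dict[str, float],
                   intercept: float) -> float:
    """exp(y) / (1 + exp(y)), written in a form that avoids overflow."""
    y = linear_predictor(values, coefficients, intercept)
    if y >= 0:
        return 1.0 / (1.0 + math.exp(-y))
    e = math.exp(y)
    return e / (1.0 + e)


def classify(probability: float, cutoff: float = 0.5) -> str:
    """Values near 1 suggest beta-thal, values near 0 suggest IDA.

    Treating a value exactly at the cutoff as IDA is an arbitrary choice here.
    """
    return "beta-thal" if probability > cutoff else "IDA"


# HYPOTHETICAL placeholders, NOT the fitted CroThalDD-LM1 coefficients.
PLACEHOLDER_INTERCEPT = 0.0
PLACEHOLDER_COEFFICIENTS = {"MCH": -1.0, "MCV": 0.2, "Hb": 0.05}

# HYPOTHETICAL input values, for illustration only.
SAMPLE = {"MCH": 20.0, "MCV": 60.0, "Hb": 100.0}

if __name__ == "__main__":
    p = logistic_index(SAMPLE, PLACEHOLDER_COEFFICIENTS, PLACEHOLDER_INTERCEPT)
    print(round(p, 4), classify(p))
```

Replacing the placeholders with the Table 2 coefficients, and using the same measurement units as the original data, would be required before any such sketch could reproduce the index.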
The practitioner can also use the sum given by the simplified formula $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$ directly, without the additional calculation with the exponential equation.
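Why the linear sum alone suffices can be shown in one line. Writing p for the index value (our notation) and y for the sum above, the logistic function is strictly increasing in y, so a probability cut-off of 0.5 corresponds to a cut-off of 0 on y:

```latex
p = \frac{e^{y}}{1 + e^{y}} > \frac{1}{2}
\iff 2e^{y} > 1 + e^{y}
\iff e^{y} > 1
\iff y > 0
```

A positive sum therefore gives the same classification as an index value above 0.5, and a negative sum the same classification as an index value below 0.5.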
The simplified CroThalDD-LM1 index calculation formula would be: The simplified CroThalDD-LM2 index calculation formula would be:
The CroThalDD-LM1 index was additionally tested on a different population of children from the Republic of North Macedonia, unrelated to the Croatian population. These populations are genetically different (12). A slight decrease in sensitivity and specificity was noticed, but accuracy remained high. The results are shown in Table 3. Due to the lack of RTIC in the North Macedonian blood samples, the CroThalDD-LM2 index was not tested.

Discussion
Since the appearance of the Mentzer index several decades ago, attempts have been made to find a universal and reliable formula or index to distinguish β-thal from IDA. Sensitivity, specificity, PPV, NPV, accuracy, AUC, and Youden index were analyzed for most discrimination formulas and indices, but their performance differed worldwide. This is especially true for children, in whom previously published discrimination formulas and indices showed lower diagnostic accuracy than in adults. Differential diagnosis is especially challenging when IDA and β-thal are present simultaneously. Elevated erythropoietin levels due to sustained anemia in β-thal stimulate the release of erythroferrone by erythroblasts (13,14). This increase in erythroferrone, in turn, suppresses hepcidin expression. The resulting increase in intestinal iron absorption, together with the release of recycled iron from the reticuloendothelial system, leads to portal and hepatocyte iron loading, and free circulating iron causes organ damage. Therefore, unnecessary long-term iron supplementation leads to further iron accumulation, with potentially deleterious side effects. IDA in children becomes clinically apparent at the age of 12 months, and the American Academy of Pediatrics recommends screening infants around 1 year of age due to reduced iron storage. NTDT β-thal typically presents between the ages of 2 and 4 years. Our results correspond to these age groups (15,16). A higher RDW value in IDA indicates greater red cell heterogeneity (anisocytosis) than in β-thal, in which RDW values are elevated or average (17). In progressing or long-standing IDA, the RBC count and MCV may be lower than in β-thal because depleted iron stores lead to fewer and smaller erythrocytes being produced.
Most formulas and indices have been generated and tested in adult populations and show variable discriminating performance in children. Indices with ≥90% accuracy were rated as the most accurate and are recommended in a clinical setting for distinguishing β-thal from IDA in Croatian children. Indices with an accuracy between ≥80% and <90% can help to differentiate between these two diseases. Indices with accuracy <80% were not recommended because many formulas and indices performed better.
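The three accuracy bands above amount to a simple decision rule, sketched here in Python. The function name and the short tier labels are our own shorthand for the wording of the preceding paragraph, not terms used elsewhere in the study.

```python
def recommendation_tier(accuracy_percent: float) -> str:
    """Map an index's accuracy (in percent) to the three bands described above."""
    if accuracy_percent >= 90:
        return "recommended"        # rated most accurate, for clinical use
    if accuracy_percent >= 80:
        return "helpful"            # can help to differentiate the two diseases
    return "not recommended"        # better-performing indices are available


if __name__ == "__main__":
    for accuracy in (94.10, 85.0, 79.9):
        print(accuracy, recommendation_tier(accuracy))
```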
The Matos-Carvalho index has proven to be the most accurate of all formulas and indices in Croatian children, even with the previously published cut-off value described in the literature (18). The CBC values on which the initial index was created differ from those in Croatian children, as these depend on the genetic background of patients with β-thal. Due to Croatia's association with the Mediterranean population, high accuracy of the Matos-Carvalho index is expected. A similar performance of the Matos-Carvalho index was observed in the adult Egyptian population (19). This index has not been previously tested in children.
The second-best index with an optimal cut-off value was the Telmissani (MDHL) index. Its performance rating in our study corresponds to that in a study of adult patients conducted in Saudi Arabia (21). Both Telmissani indices (MDHL and MCHD) with optimal cut-off values can be recommended for routine testing in children. They, too, have not been tested in children so far. With the published cut-off, the England and Fraser index had the second-best performance rating (22,23). The original cut-off value of the index showed excellent sensitivity, NPV, and accuracy but insufficient specificity and PPV. After the optimal cut-off value was defined, the accuracy of the index increased due to improved specificity and PPV, and it ranked fifth best in terms of performance. The England and Fraser formula was previously tested in children in Turkey and had the third-best-ranked performance (below the RDWI and RBC indices) (24,25). This index also proved third best in Chinese children (below the RDWI and Green and King indices) (26).
The Pornprasert index uses only MCHC as a discrimination value (27). After the optimal cut-off value was defined, its accuracy increased by 65% due to an increase in sensitivity and specificity. The index was created for Thai school children; therefore, a favorable discriminatory performance of this index for the two diseases among Croatian children is expected. Since our study population encompasses toddlers and preschool children, an MCHC cut-off value >309 corresponds with the diagnosis of β-thal in the entire population of children. Thai and Croatian populations have different median ages for β-thal, so the difference in cut-off value is understandable. The Sirachainan index was also generated on a population of healthy school children, and excellent initial performance of this index was also expected (28). After the optimal cut-off value was defined, its accuracy improved substantially, by 13.92%, and the index proved valuable in distinguishing children with β-thal from children with IDA.
The last four indices whose accuracy rose above 80% after optimal cut-off values were defined were CRUISE, Hameed, Sargolzaie, and Zaghloul II. With its optimal cut-off value, the CRUISE index outperformed the formulas/indices initially developed for the adult Iranian population (29). The Hameed index with the new cut-off value showed balanced sensitivity and specificity with acceptable accuracy (19). With its optimal cut-off value, the Sargolzaie index also significantly increased its accuracy (30). With cut-off values defined for children (regardless of gender), the Zaghloul II index showed a significant improvement in accuracy, mainly due to an increase in specificity and PPV (31). None of these indices has been tested in children so far.
Using linear regression components embedded in the logit scale, logistic regression iteratively identifies the combination of variables with the highest probability of discriminating between β-thal and IDA. Logistic regression analysis generated two new discrimination indices that distinguish β-thal from IDA in children with the highest sensitivity, specificity, and accuracy of all the formulas/indices mentioned earlier.
The RTIC count (per mille and absolute reticulocyte number) is high in active β-thal but also in severe IDA (20) due to erythropoietin stimulation. A significantly higher reticulocyte count has already been observed in NTDT thalassemia patients in comparison with IDA (32). Since the Matos-Carvalho index shows the best results in our children, we tested the performance of the variables of this index (RBC, MCHC) with the addition of RTIC. The result is the new CroThalDD-LM2 index. Adding RTIC usually improves the performance of some of the already published formulas/indices.
The results of our study are logistic model-generated indices, the first discrimination indices designed specifically for the pediatric population, including preschool children. To make the calculation with the exponential equation more accessible, we created Excel and LibreOffice spreadsheets. Even though CroThalDD-LM2 has better accuracy, sensitivity, and CI, we suggest using both novel formulas/indices to see which has the advantage in practical application. Although these indices are derived from a database of Croatian children, we encourage their use in other pediatric populations, and even in adults. We tested all available β-thal formulas/indices of native adult origin and adapted them for use in our pediatric population. We hope that our results will help practitioners already accustomed to native formulas/indices to improve the reliability of β-thal diagnostics in children. All modified cut-offs of such native formulas/indices are listed in Supplementary File S1. A comparison between such native formulas/indices and our newly generated indices is very welcome (33-35).

Limitations of the study
The indices using the average reticulocyte cell volume, the average reticulocyte hemoglobin content, % Micro-R, or mean platelet volume to separate β-thal from IDA were not analyzed, as these parameters are not part of routine analysis in Croatia or North Macedonia (36,37). We encountered only four children with α-thalassemia trait; therefore, we cannot support the usability of these indices in α-thalassemia trait. Further research is required.

Conclusions
All available formulas/indices generated worldwide were tested, and their cut-offs modified, for the pediatric population. The Matos-Carvalho index shows the best diagnostic performance for distinguishing β-thal from IDA in children. We recommend the use of formulas/indices with modified cut-offs and an accuracy of >90%. We generated two novel indices from pediatric data to reflect the peculiarities of β-thal diagnostics in this population, which previous authors observed when using formulas/indices generated from adults. A comparison of both native and our indices is encouraged.
With the simplified formula (the sum y without the exponential equation), the cut-off value changes to 0: values below 0 indicate IDA and values above 0 indicate β-thal. A summary of AUC values (with CI) of the 41 formulas/indices, 2 composite indices, and the novel CroThalDD-LM1 and CroThalDD-LM2 indices is shown in Figure 1. The ten best indices have accuracy above 88% (marked by the dotted line on the x-axis).

FIGURE 1 A summary of AUC values (with CI) of 41 formulas/indices, 2 composite indices and novel CroThalDD-LM1 and CroThalDD-LM2 indices. The ten best indices have accuracy above 88% (marked by the dotted line on the x-axis).
Values p ≤ 0.05 are highlighted as statistically significant.

TABLE 2 Variables and coefficients of the microcytic anemia via logistic regression model.

TABLE 3 The CroThalDD-1 formula tested on an independent sample of β-thal from The Republic of North Macedonia. CroThalDD-2 formula could not be tested due to a lack of reticulocytes.

Turudic et al. 10.3389/fped.2023.1258054. Frontiers in Pediatrics. frontiersin.org