The Computerized Adaptable Test Battery (BMT-i) for Rapid Assessment of Children's Academic Skills and Cognitive Functions: A Validation Study

Background: Learning disabilities in children are a major public health concern worldwide, having a prevalence of 8%. They are associated with lost social, educational, and ultimately, professional opportunities for individuals. These disabilities are also very costly to governments and raise the issue of the appropriate means of screening. Unfortunately, validated tools for preliminary appraisal of learning and cognitive function in struggling children are presently restricted to specific age ranges and cognitive domains. This study sought to validate a first-line battery for assessment of academic skills and cognitive functions. Materials and Methods: The computerized Adaptable Test Battery, or BMT-i, includes a panel of tests for the first-line assessment of children's academic skills and cognitive functions. The tests reflect expected abilities for the age group in question, exploring academic skills (written language and mathematical cognition) and cognitive domains (verbal, non-verbal, and attentional/executive functions). The authors relied on the results of these tests for a sample of 1,074 Francophone children representative of the mainland French school-age population (522 boys and 552 girls, ages 4–13, from 39 classes at 7 public and 5 private schools). Thirteen speech-language pathologists and neuropsychologists individually administered the tests. Results: The psychometric characteristics of the empirical data obtained showed acceptable to good test homogeneity, internal consistency (Cronbach's alpha: > 0.70), test-retest reliability (intraclass correlation coefficients: ~0.80), and consistency with reference test batteries (r: 0.44–0.96). Conclusion: The BMT-i was validated in a large sample of children in mainstream French schools, paving the way for its use in first-line screening of learning disabilities among children with complaints, whether their learning difficulties have been flagged by their parents or by their teachers.


INTRODUCTION
Because of their high prevalence (8% among children 3-17 years old) (1), learning disabilities are a public health priority worldwide. They frequently involve several cognitive dimensions (written and oral language skills, mathematics, drawing and handwriting, motor function, visuospatial skills, and attentional as well as executive functions), justifying the need for a comprehensive view (2)(3)(4). The variety of terms associated with these conditions (e.g., disorder, disability, difficulty, and slow learner) illustrates the diversity of perspectives and makes it harder to share knowledge about them (5).
The emergence of cognitive sciences has enriched the theoretical models applied for the identification and evaluation of learning disabilities (LD). In the last 50 years, authors have developed integrative models considering (i) academic skills, (ii) underlying cognitive skills, and (iii) neurobiological correlates, including familial forms and environmental factors (6).
There is a growing consensus in support of early identification of LD by standardized tests and appropriate pedagogical interventions (7)(8)(9)(10). Phased implementation of screening (6) is essential for the identification of learning disabilities and their effective remediation, for example through evidence-based pedagogical interventions, the long-term benefits of which have been extensively demonstrated (11)(12)(13)(14). To meet the demands of clinical practice, screening tools must be language-specific and exhibit acceptable psychometric properties and sensitivity. Following their use, more focused assessments (conducted by speech therapists, psychomotor therapists, occupational therapists, or neuropsychologists, depending on the learning area affected) may be prescribed (9,10,14-16).
The computerized Adaptable Test Battery (BMT-i) is a panel of tests for the first-line assessment of children's academic skills and cognitive functions, from kindergarten (age 4) to seventh grade (age 13). Designed as an adaptable set of tests suitable for a comprehensive evaluation, the BMT-i succeeds the Battery for Rapid Evaluation of Cognitive Functions (Batterie Rapide d'Evaluation des Fonctions Cognitives, or BREV) originally designed to provide health professionals with a quick clinical tool for screening acquired and developmental cognitive deficits in children ages 4-8 (17, 18). Including tests in five domains that evaluate the various cognitive components concerned by LDs (4), the computerized BMT-i permits broader exploration of written language abilities (reading fluency, reading comprehension, and spelling), mathematical cognition (numbers, arithmetic, and problem-solving), and three cognitive domains (verbal, nonverbal, and attentional/executive functions). BMT-i tests assess the skills expected to be acquired by children in their respective age groups, between the ages of 4 and 13. They are meant to be simple to administer, short (10-30 min per domain, depending on age), and easy to score, and they can be taken at school or during an appointment with a health professional. Their purpose is rapid identification of children in the general population who require specialized assessments for precise diagnosis of LD, as recommended by France's Haute Autorité de Santé (HAS) (15). Standards defined by the American Educational Research Association (AERA), the American Psychological Association (APA), and the National Council on Measurement in Education (NCME) have guided test design and contributed to their validity (19).
Here we report psychometric data on the validity of the BMT-i using a sample of over a thousand French-speaking children, without prior complaints or previously identified LDs, representative of the mainland French school-age population.

BMT-i Description
Design of the BMT-i has proceeded in several steps since 2010. Over the last 5 years, it has been gradually implemented, stratified by age groups and cognitive functions, and finally computerized. BMT-i tests apply neuropsychological models for a separate first-line examination of each of the five major domains: for academic skills, (i) written language (reading fluency, reading comprehension, and spelling) (20) and (ii) mathematical cognition (numbers, arithmetic, and problem-solving) (21); and for cognitive function, (iii) oral language (vocabulary, grammar, and phonological skills) (22), (iv) non-verbal functions (reasoning, drawing, handwriting, and visuospatial construction), and (v) attentional/executive functions (see Table 1 and Supplementary Data). For this last domain, the computerization of BMT-i tests allows objective standardized measures of the scores in the main attentional/executive processes (sustained and selective attention, flexibility and inhibition, working memory). While the academic aptitude tests are adapted to each grade level, most of the cognitive function tests are identical across a given group, i.e., "youngest" (kindergarten through first grade), "intermediate" (second through fourth grade), or "oldest" (fifth through seventh grade). Scores are instantly and automatically converted into normed results that are summarized in a report. The BMT-i is intended for use by trained health professionals and their teams, including pediatricians, child psychiatrists, school doctors, general practitioners, psychologists, and specialized professionals such as speech therapists, psychomotor therapists, and occupational therapists. The published versions of the BMT-i tests (23) are described in the Supplementary Data.

Population Recruitment
The rationale for the sample size followed the classical approach used in descriptive studies: estimating a prevalence p with a specified precision (0.05) and a chosen degree of confidence (0.95). The children took tests adapted to their grade, categorized into three levels depending on their age (kindergarten, elementary school, middle school). Figure 1 describes the target population.
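As a rough illustration of this sample-size reasoning, the usual normal-approximation formula for estimating a proportion is n = z²p(1 − p)/d², where d is the precision. The text does not state which formula or which anticipated prevalence was used, so the function name, the worst-case value p = 0.5, and the 8% example below are assumptions made only for this sketch.

```python
import math
from statistics import NormalDist


def sample_size_for_prevalence(p: float, precision: float, confidence: float) -> int:
    """Minimum n to estimate a proportion p within +/- precision (normal approximation)."""
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / precision ** 2)


# Hypothetical inputs: p = 0.5 is the most conservative choice; 0.08 echoes the
# prevalence cited in the introduction. Neither is stated as the study's input.
n_worst_case = sample_size_for_prevalence(0.5, 0.05, 0.95)
n_eight_percent = sample_size_for_prevalence(0.08, 0.05, 0.95)
```

Under these assumptions, both values are well below the 1,074 children actually included.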
This prospective study included 1,074 children aged 4-13 (522 boys and 552 girls) from 12 mainstream public or private schools across France (Greater Paris, Toulouse, Orleans, and rural areas). The 12 schools voluntarily participated and represented the diversity of their geographic (urban, suburban, or rural) and socioeconomic environments. After approval was granted by their respective regional education authorities and 99% of parents gave informed consent, teachers agreed that children in their classrooms would be tested in alphabetical order. All children were tested except those (i) severely handicapped, (ii) having no parent who spoke French, or (iii) whose parents did not consent to the tests (see Figure 1: 5.8% of the initial sample).

BMT-i Testing
Tests were administered during the 2015-2016 academic year. During each of the three trimesters of the French academic year, a third of the participating children were tested, with the exception of the younger kindergartners (middle kindergarten section, ages 4 and 5), who began testing in February.
The tests were administered in a single session (average duration: 45 min) for kindergartners; two sessions for elementary-school students (average total duration: 90 min); and because of the greater number of mathematical cognition tests for their age group, three sessions for middle-school students (average total duration: 120 min).
The job category of each parent was recorded, using the nomenclature of the French National Institute of Statistics and Economic Studies (INSEE) (24). The most socioeconomically privileged job category for each household was used for grouping into three categories: "underprivileged" (manual workers, non-managerial employees, unemployed), "average" (higher-level non-managerial professionals, farmers, artisans, storekeepers, and small business owners), and "privileged" (managers, executives, engineers, and other knowledge workers). Households were considered bilingual if they met the INSEE criterion, i.e., one of the two parents spoke a language other than French.
Tests were individually administered by an examiner from a group of eight speech-language pathologists and five neuropsychologists, who had received two sessions of collective training. The testing took place in a designated room of each school on a Microsoft Surface Pro 3 convertible laptop running Windows 8. Instructions for each test were displayed on the screen, and the examiner also provided explanations to children, especially the youngest. For the sake of consistency, items that had to be read to the children were recorded in advance, and the recordings were played back by the application. The only exceptions were dictation and reading questions, for which the child's pace had to be considered. Because the tests were computerized, response times could be recorded by the computer. This is particularly important in the assessment of attention and executive functions, where response times are measured to the nearest millisecond. Children's responses were recorded automatically, when touchscreen input was possible, or manually by the examiner, for oral responses or when more complicated, explicit scoring was required. Scores were instantly and automatically converted into normed results.
Examiners participated in semimonthly review meetings led by the authors, and frequently asked questions were regularly published to address potential scoring ambiguities. A clinical research assistant verified inclusion conditions (stratification), observance of the protocol, and thoroughness of tests. After anonymized data were exported, three of the neuropsychologist examiners performed double scoring of study logs under the authors' supervision.

Inter-Rater and Test-Retest Reliability
The scoring of most tests was objective and unbiased as responses were either automatically recorded or had clear answers (written language, mathematical cognition, reasoning and attention tasks). For scoring of participants' reproductions of simple or complex figures (463 children) and of handwriting (342 children), grade-specific inter-rater reliability coefficients were calculated using a random sample (Figure 1).
The 10th child on each class list of students was scheduled to be retested for the entire battery and by the same examiner 3 weeks later under strictly identical conditions. At the request of the teachers, the planned retest could only be conducted among kindergarten and elementary school children assessed in the third quarter of the school year in three schools. Therefore, the retested subsample consisted of 22 children (10 boys and 12 girls) aged 4.8-11.3 years and belonging to one of the three groups of classes: (i) kindergarten through first grade, (ii) second through fourth grades, and (iii) fifth grade (Figure 1).

Comparison With Other Tests
An additional study was conducted within the same schools to compare the consistency of the BMT-i with standardized reference test batteries commonly used in clinical practice (Figure 1). Children were arbitrarily selected to take reference tests that assessed the same functions, according to age-specific standards, within 2 weeks of taking the BMT-i. To compare written language tests, the authors administered the standardized tests used by French speech therapists (for reading, Quelle Rencontre (25) and Le Vol du PC (26); for dictation, Chronosdictées (27) and Le Corbeau from the L2MA test battery (28)) to 44 third graders (26 boys and 18 girls, 8.1-9.2 years old) and 96 middle schoolers (50 boys and 46 girls, 10.8-13.1 years old). For pattern completion, the BMT-i was compared to WISC-V Fluid Reasoning subtests, including Matrix Reasoning, administered to 73 children (48 boys and 25 girls, 6.5-13.5 years old, grades 1-7) (29).

Statistical Analyses
The inter-rater reliability coefficients for drawing and handwriting assessment were calculated and evaluated using correlation and linear regression coefficients.
Test-retest reliability was measured using the intraclass correlation coefficient, which considers school level to be a fixed covariate measure (30). An intraclass correlation coefficient between 0.50 and 0.75 indicates an average level of reliability; > 0.75 and ≤ 0.90, a good level; and >0.90, an excellent level (30).
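To make the reliability bands above concrete, here is a minimal sketch that computes a one-way random-effects ICC from test/retest pairs and maps it to those bands. This is a simplification: the study's coefficient came from a model with school level as a fixed covariate, which this sketch does not reproduce. The function names and the scores are hypothetical.

```python
from statistics import mean


def icc_one_way(scores: list[list[float]]) -> float:
    """One-way random-effects ICC(1,1); each row holds one child's repeated scores."""
    n, k = len(scores), len(scores[0])
    grand = mean(x for row in scores for x in row)
    row_means = [mean(row) for row in scores]
    # Between-child and within-child mean squares from a one-way ANOVA.
    ms_between = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(scores, row_means) for x in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


def icc_band(icc: float) -> str:
    """Map an ICC to the bands quoted in the text."""
    if icc > 0.90:
        return "excellent"
    if icc > 0.75:
        return "good"
    if icc >= 0.50:
        return "average"
    return "below the bands quoted in the text"


# Hypothetical test/retest scores for five children (not study data).
pairs = [[12, 13], [18, 17], [25, 26], [9, 9], [21, 20]]
icc = icc_one_way(pairs)
band = icc_band(icc)
```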
Test item homogeneity was analyzed using DIMTEST (31) for dichotomous variables and LISREL (32) uni-dimensionality tests for the others. Score reliability was measured by Cronbach's alpha (33), where ≥ 0.70 indicates a good level of reliability (34).
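For readers unfamiliar with Cronbach's alpha, the sketch below applies the standard formula, alpha = k/(k − 1) × (1 − Σ item variances / variance of total score), to a small table of item responses. The function name and the response matrix are hypothetical and are not taken from the study data.

```python
from statistics import variance


def cronbach_alpha(item_scores: list[list[float]]) -> float:
    """Cronbach's alpha; each row is one child, each column one test item."""
    k = len(item_scores[0])
    # Variance of each item across children, and variance of the total score.
    item_vars = [variance(col) for col in zip(*item_scores)]
    total_var = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical dichotomous responses (1 = correct) for six children on four items.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]
alpha = cronbach_alpha(responses)
is_good = alpha >= 0.70  # threshold quoted in the text
```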
In addition, the quality of fit between the theoretical model and the empirical data was estimated through confirmatory factor analysis using the Root Mean Square Error of Approximation (RMSEA). RMSEA values of < 0.08 are deemed acceptable (35). Analyses were conducted by grade level because of the use of age-specific items for the different domains.
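As an illustration of how an RMSEA value relates to model fit, the sketch below uses a common point-estimate formula, sqrt(max(χ² − df, 0) / (df × (N − 1))). The text does not say which variant the software applied (some programs divide by N rather than N − 1), so the formula choice, the function name, and the fit statistics are assumptions.

```python
import math


def rmsea(chi_square: float, df: int, n: int) -> float:
    """Point estimate of RMSEA from a model chi-square, its df, and sample size n."""
    # max(..., 0) keeps the estimate at zero when chi-square is below df.
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


# Hypothetical fit statistics for one grade level (not taken from the study).
value = rmsea(chi_square=80.0, df=50, n=120)
acceptable = value < 0.08  # cutoff quoted in the text
```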
Statistical analysis of the test battery comparison included correlation of raw scores (correlation and linear regression coefficients). Degree of agreement was determined by calculating Cohen's kappa: values in the range of 0.21-0.40 indicate fair agreement; 0.41-0.60, moderate; 0.61-0.80, substantial; and 0.81-1.00, almost perfect (36). For the purpose of comparison, scores on the BMT-i and on the reference tests were categorized as very low (7th percentile or lower), low (7th through 20th percentile), or normal (>20th percentile).
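The agreement analysis described above can be sketched as follows: percentiles are first mapped to the three categories, then unweighted Cohen's kappa is computed between the two classifications. The text does not say whether a weighted kappa was used, so the unweighted form is an assumption, as are the function names and the percentile values.

```python
from collections import Counter


def percentile_category(percentile: float) -> str:
    """Three-level classification used for the comparison."""
    if percentile <= 7:
        return "very low"
    if percentile <= 20:
        return "low"
    return "normal"


def cohen_kappa(a: list[str], b: list[str]) -> float:
    """Unweighted Cohen's kappa for two classifications of the same children."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Agreement expected by chance from the marginal frequencies.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / n ** 2
    return (observed - expected) / (1 - expected)


def kappa_band(kappa: float) -> str:
    """Bands quoted in the text; values below 0.21 are not labeled there."""
    if kappa >= 0.81:
        return "almost perfect"
    if kappa >= 0.61:
        return "substantial"
    if kappa >= 0.41:
        return "moderate"
    if kappa >= 0.21:
        return "fair"
    return "below the bands quoted in the text"


# Hypothetical percentile ranks for eight children (not study data).
bmt_percentiles = [3, 15, 50, 60, 5, 30, 18, 80]
ref_percentiles = [6, 25, 45, 70, 10, 40, 12, 90]
bmt_classes = [percentile_category(p) for p in bmt_percentiles]
ref_classes = [percentile_category(p) for p in ref_percentiles]
kappa = cohen_kappa(bmt_classes, ref_classes)
band = kappa_band(kappa)
```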
Analyses were carried out using JMP software (37) and the lme4 statistical package for R (38).

In 6% of the cases, children had undergone reeducation or therapy before the test, and in 2% of the cases, children were still receiving such support at the time of testing. Very few students had repeated (0.6%) or skipped (1.4%) a grade.

Inter-Rater Reliability
Inter-rater reliability coefficients for a random sample revealed stable scores on the figure copying (r: 0.77-0.97) and handwriting (r: 0.76-0.84) assessments. Correlations and regression coefficients were significant for all grades (Table 3).
Table 4 shows the intraclass correlation coefficients for each test. Most coefficients ranged from 0.8 to 0.9, corresponding to a good level of reliability. None were below 0.67. Differences between values for the 2.5th and 97.5th percentiles were relatively small.

Uni-Dimensionality and Internal Consistency
The authors first sought to evaluate the hypothesis of test unidimensionality for the 1,074 participating children, that is, to confirm that each of the relevant tests did indeed evaluate the same aspect of the skill in question. For most if not all grades, tests of mathematical cognition, auditory attention, oral language, and non-verbal function (except for the figure copying test taken by the oldest kindergartners, which included the three most complicated figures) were uni-dimensional. For children in kindergarten and elementary school, due to the limited number of mathematical test items, composite scores were assigned. Table 5 shows values of Cronbach's alpha, reflecting the degree of internal consistency for BMT-i scores, and Table 6 gives means and standard deviations for tests whose format did not permit calculation of Cronbach's alpha. In the area of written language, reliability of scores for decoding among older kindergartners and first graders, and of total scores for dictations, was good to excellent. In the area of mathematical cognition, for all classes, composite scores based on the results of the main subtests demonstrated a good level of reliability. The same is true of accuracy scores obtained for mental math operations and comparison of number representations, and in middle school, for the various subtests. Scores on most of the verbal tests, the two reasoning tests, and the auditory attention test also indicated a good level of reliability. On block construction tests, levels of reliability were excellent in all classes for time to completion, and good (older kindergartners and first graders) or satisfactory (second to fifth graders) for accuracy. With regard to drawing tests, the level of reliability was good for time to completion, but insufficient for accuracy scores.
Table 7 presents RMSEA values (0.036-0.075) indicating that scores for all tests (in the five areas of verbal, non-verbal, and attentional/executive functions; written language; and mathematics) and all grades were compatible with the underlying theoretical model.
Table 8 shows that BMT-i scores for reading time, reading accuracy, and dictations were significantly correlated with reference test battery scores at both the middle-school and third-grade levels (r ≥ 0.78). For reading comprehension, the correlation between BMT-i and reference test scores was high at the third-grade level (r = 0.78) and average for the two BMT-i texts at the middle-school level (text 1: r = 0.47; text 2: r = 0.57). There is an average correlation between BMT-i pattern completion test scores and both the WISC-V Matrix Reasoning subtest (r = 0.57) and the Fluid Reasoning Index (r = 0.44). Table 8 also indicates agreement (Cohen's kappa) between the classifications of BMT-i and reference test

DISCUSSION
Here we report on the validity of psychometric data collected from a large sample of French children, without prior complaints or previously identified LDs, using a novel computerized battery of tests, the BMT-i. This single screening tool includes diverse tasks aimed at identifying the different aspects of LDs, as internationally recommended (4,10,14-16). Each test can be used separately with specific norms, so that the relevant tests can be chosen to screen for one or more areas of complaint. Its computerized format has the merit of limiting measurement bias in the reporting and rating of children's responses for most subtests. In particular, the two attentional tests of the BMT-i are computerized and the global results are directly provided by an algorithm. Inter-rater reliability coefficients, calculated to estimate the effect of subjectivity on the assessment of drawing and handwriting, confirm the stability of the total score (41). Despite the limited number of retests, intra-class correlation coefficients were appropriate for all tests, including those for which internal consistency was insufficient (30).
The uni-dimensionality of most of the tests (i.e., proof that each indeed evaluated the same aspect of the given aptitude) allows for dependable interpretation of scores as indicators of children's aptitudes for reading, spelling, math, and various cognitive functions (verbal, non-verbal, and attentional). The coefficients of internal consistency, describing test score reliability, are generally satisfactory, but scores on some tests, including for quality of drawing, were very unstable. Time to completion offers additional information about a child's skills, as long as it is carefully considered in the light of the quality score.
To verify the consistency of score data with the theoretical model and determine whether the five cognitive domains were accurately represented, confirmatory factor analyses were performed. These indicated that test scores were significantly related to the cognitive skills they theoretically represented. Hence, the results reported are aligned with the generally recognized theoretical structures associated with the five domains of academic skills and cognitive function (2,4,6,10). It is worth noting certain relationships between test types. Reading comprehension scores form a group with oral language test scores but not with reading times or reading errors. At the middle-school level (sixth and seventh grades), all scores on written language tests are grouped with those for oral language tests. This grouping of reading comprehension with oral language skills is consistent with the different profiles of written language disorders described in the literature (dyslexia vs. poor reading comprehension) and with the links between oral language and reading comprehension skills (20,42), and it justifies the need to assess both reading fluency and comprehension as well as oral language (43). Comparison of BMT-i and reference tests revealed high levels of correlation in all areas of written language, except reading comprehension among middle schoolers, for which r values indicated average correlation. The correlation between the BMT-i pattern completion test scores and the WISC-V fluid reasoning subtests suggests that the BMT-i can reliably indicate when referral for a full psychometric assessment is warranted, although it is not a substitute for such an assessment. No comparisons were made in areas other than written language and reasoning.
Interpretation of these results must be tempered by recognition of the various limitations of the study. To begin with, the results of the reading comprehension assessment vary according to the nature of the tasks proposed, which points to a need for more precise tests. In addition, the reference tests selected were those available at the time of our study. Recent tests would have allowed a single, more elaborate battery to be used for all measures from second grade up (20,44). The inter-rater reliability could not be determined for all subtests across the entire population owing to the diversity of the schools where testers examined children. Furthermore, test-retest reliability could only be assessed for a group of 22 children. The present validation of the BMT-i with a large sample of children representative of the diverse mainstream school population in France sets the stage for its use in first-line screening to identify LDs in children with difficulties flagged by parents or teachers. However, use in the diagnosis of LDs will require verification of its sensitivity, specificity, and predictive value, relative to other tests, in children with complaints. The BMT-i might be administered for preliminary cognitive assessment of children who are struggling in school, to properly refer them for specialized assessments.
The methods and tools employed for identification of LDs differ between countries and professions, and an international consensus has yet to be reached (5). LD screening tests are expected to be short and easy for non-specialized professionals to administer and interpret. Many tools that employ a language specific to the country in question and that target a particular domain are available to help identify children requiring a pedagogical intervention or specialized evaluation. The BMT-i is the only tool in French that meets this objective for all domains concerned, over a wide age range. For oral language, the reliability of current instruments is deemed insufficient to permit screening in young children without complaints (22); the quality of these instruments must be improved (45). Present methods for identifying reading difficulties are also imperfect (46,47), ranging from a simple, carefully validated teacher questionnaire to the classic Wechsler Individual Achievement Test. Recent mathematics research insists on the importance of analyzing the different number manipulation and arithmetic skills (21,48). Future development of computerized tests is expected (49). Moreover, the frequent comorbidities of LDs, namely handwriting, visuospatial (50), or attentional and executive disorders (51,52), deserve particular attention. In conclusion, the BMT-i can offer an initial appraisal of cognitive functions and help guide children to specialized assessments and appropriate interventions (10). Hence, this study paves the way for ongoing studies in populations with complaints.
Getting help for LDs, which are inconsistently recognized, is an often expensive and complicated process, and the support that is received varies, but the BMT-i could make it more accessible and affordable.

DATA AVAILABILITY STATEMENT
The anonymized results and data of our research are available upon request to the first and corresponding author.

ETHICS STATEMENT
Ethical review and approval was not required for the study on human participants in accordance with the local legislation and institutional requirements. Written informed consent to participate in this study was provided by the participants' legal guardian/next of kin.

AUTHOR CONTRIBUTIONS
CB and SG led the study and collected data. CB and J-CT (test-retest reliability) performed analyses. ET reviewed the analyses. AMi, J-CT, and AMu discussed the results. CB wrote the manuscript. MT and AMu revised the manuscript. All authors contributed to the article and approved the submitted version.

ACKNOWLEDGMENTS
The battery of tests was updated through extensive collaboration with clinical and research teams. Nedjma Messaouden, Violaine Baille, Pauline Dujardin, and Clémence Eber prepared the two texts and reading questions for children in grades five (CM2) to seven. Alain Ménissier helped design the arithmetic problems, and Michel Fayol, PhD, advised for all aspects of mathematical cognition. Manuela Piazza, PhD, assisted with the number-representation comparison test. Neuropsychologist Stéphanie Iannuzzi designed the attention tests, and occupational therapist Cécilia Galbiati designed the complex figure test. Neuropsychologists Sahawanatou Gassama, Hélène Cellier, Marine Chambart, Chloé Chambart, and Mèlanie Rodriguez, together with speech-language pathologists in training Gaëtane Avril, Mélanie Fruchart, Maïa Guerric, Caroline Lacombe, Louise Piednoir, Louis Raphaël, Cecilia Robson, Diane Rubini, Clémence Sagot, and Anne Vouters, contributed to the calibration of the experimental protocol. Jean Michel Albaret, Sarah Manoha, and Thiébaut Noël Willig assisted with the block construction test. Jean Denis Texier and Romain Balloy from the company Clic-Droit computerized the battery of tests. We extend our gratitude, first and foremost, to the children, for their cooperation within the constraints of the study protocol; to their parents, who trusted us; to the principals and teachers, who welcomed us into their schools and assisted with organization; and to the school inspectors.