Pediatric Sleep Tools: An Updated Literature Review

Since Spruyt's thorough 2011 review of the integral pitfalls of pediatric sleep questionnaires, sleep researchers worldwide have further evaluated many existing tools. This systematic review aims to comprehensively evaluate and summarize the tools currently in circulation and to provide recommendations for evolving avenues of pediatric sleep interest. A total of 144 "tool"-studies (70 tools) have been published, primarily investigating sleep in 6–18 year olds per parental report. Although 27 new tools were discovered, most studies translated or evaluated the psychometric properties of existing tools. Some form of normative values was established in 18 studies. More than half of the tools queried general sleep problems. Extra effort in tool development is still needed for tools assessing children outside the 6-to-12-year-old age range, as well as for tools examining sleep-related aspects beyond sleep problems/disorders. Validity, in particular, has been pursued vis-à-vis fulfillment of psychometric criteria. While the Spruyt et al. review provided a rigorous step-by-step guide to the development and validation of such tools, a pattern of steps continues to be overlooked. As these instruments are potentially valuable in assisting the clinical diagnosis of pediatric sleep pathologies, it is required that, while they are primarily subjective measures, they behave as objective measures. More tools for specific populations (e.g., in terms of ages, developmental disabilities, and sleep pathologies) are still needed.


INTRODUCTION
There is significant power in the efficiency and cost-effectiveness of questionnaires and surveys as contributors to etiological discoveries across a wide range of medical disorders. These instruments, however, do not always possess the objective nature of medically established tools, e.g., polysomnography, and can become a hindrance to adequate diagnosis, particularly when the recommendations for their development are neglected (1). Despite these problems, there has been considerable effort to transform the structure of health questionnaires, specifically in the field of pediatric sleep, to reflect a systematic approach with the highest concordance to medical diagnostic standards.
The systematic review by Spruyt et al. (2, 3) in 2011 summarized the shortcomings of questionnaires and their developmental standards while advising a thorough procedure to follow when evaluating or developing a tool.
Since this time, a variety of tools have been established, both adhering to and overlooking the recommended steps. More detailed information on the 11 steps can be found in Spruyt et al. (3). Briefly, Step 1 is to reflect on the variable(s) of interest and targeted sample(s).
Step 2 is to consider the research question that the instrument will be used to address. The goal of this step is thus to reflect on whether the tool will be suitable to collect the type of data required to address your hypothesis. Step 3 (response format) and Step 4 (items) build on the two preceding steps. They allow us to reflect not only on "which" questions and "which" answers assess the variable(s) of interest, but also on "how" a question is formulated and "how" it can be answered. The common goal of Steps 1-4 is that the underlying "concepts" and/or "assumptions" contained in the questions, such as language (e.g., jargon), meaning, and interpretation of the wording, be identically understood by all respondents. Getting as close to this ideal as possible will minimize errors of comprehension and completion.
Step 5 involves piloting your drafted tool. Piloting also prevents disasters during the actual data collection. In fact, Steps 2-5 should be an iterative process, repeated until a consensus has been reached among experts and/or respondents, with descriptive statistics underpinning those decisions. Assessing the performance of individual test items, separately and as a whole, is Step 6 (item analysis). There are two main approaches to item analysis, classical test theory and item response theory, either of which should be combined with missing-data analysis. The next step is identifying the underlying concepts of the tool (Step 7, structure), because only rarely is a questionnaire unidimensional. Steps 8 and 9 are about assessing reliability and validity, respectively. Reliability does not imply validity, although a tool cannot be considered valid if it is not reliable! Several statistical, or psychometric, tests allow us to assess a tool's reliability and validity (cf. the textbooks written on this topic). For instance, validation statistics of the tool may involve content validity, face validity, criterion validity, concurrent validity, or predictive validity.
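To make the reliability portion of Step 8 concrete, the internal-consistency statistic most often reported in such studies is Cronbach's alpha. A minimal sketch of the computation follows; the Likert-style item scores are invented purely for illustration and do not come from any reviewed tool.

```python
def cronbach_alpha(items):
    """Cronbach's alpha for a questionnaire scale.

    items: list of equal-length lists, one list of scores per item.
    alpha = k/(k-1) * (1 - sum(item variances) / variance of total scores)
    """
    k = len(items)
    n = len(items[0])

    def var(xs):  # sample variance (ddof = 1)
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    item_vars = sum(var(col) for col in items)
    totals = [sum(col[i] for col in items) for i in range(n)]
    return k / (k - 1) * (1 - item_vars / var(totals))

# Hypothetical 1-5 Likert responses from six respondents on three items
items = [
    [4, 5, 3, 4, 2, 5],
    [4, 4, 3, 5, 2, 4],
    [5, 4, 2, 4, 3, 5],
]
print(round(cronbach_alpha(items), 2))  # → 0.87
```

A value of this size would conventionally be read as good internal consistency, though alpha alone does not establish validity, as noted above.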
Step 10 is about verifying the stability, or robustness, of the aforementioned steps. It is the step in which you assess the significance, inference, and confidence (i.e., minimal measurement error) of your tool, using the sample(s) for which it was designed.
Step 11 involves standardization and norm development, allowing large-scale usage of your tool.
This review aims to summarize the trends associated with these questionnaires, reinforce the importance of certain stages of tool development, and highlight the direction of research that would be ideal to follow.

MATERIALS AND METHODS
To achieve consistency and retrieve studies relevant to the Spruyt (2, 3) review, the search terms and databases were mirrored: "Sleep" AND ("infant" OR "child" OR "adolescent") AND ("questionnaire", "instrument", "scale", "checklist", "assessment", "log", "diary", "record", "interview", "test", "measure"). The databases included PubMed, Web of Science (WOS), and EBSCOHOST (per PRISMA guidelines). Additional limitations to the search criteria were applied for the date and age range of the respective study populations. Database-wide searches covered the period between the 18th of April 2010 (publication date of the original search) and the 1st of January 2020. Age categories listed in the PubMed filters between 0 and 18 years were also applied to restrict the search to pediatric populations alone. In contrast, language criteria were not specified but were post hoc constrained to English, as papers in other languages could not be evaluated by one of the authors in case a consensus on the psychometric evaluation was needed. The search for relevant studies was extended to authors in the listserver groups PedSleep2.0 and the International Pediatric Sleep Association (IPSA) in order to achieve maximal inclusion. The refinement of these study characteristics ensured that the systematic review would evaluate relevant studies in pediatric tool development, adaptation, and validation. The final search count was sizeable (refer to Figure 1).
Full-text access was achieved through the literary database "Library Genesis" or by author contact where necessary (see Acknowledgments). All flagged citations were then manually screened for relevant keywords in their titles, abstracts, and methods to further refine the studies relevant to this systematic review: the 11 psychometric steps (2, 3) and 7 sleep categories (sleep quantity, sleep quality, sleepiness, sleep regularity, sleep hygiene, sleep ecology, and sleep treatment) (4). Consequently, independent studies were highlighted and screened, and each study's descriptive variables were extracted and collated. Any absence of indispensable information regarding a tool's use was addressed by contacting the authors.

Statistical Analysis
A total of 11 steps (2) and 7 sleep categories (4) were extracted and statistically analyzed for frequency and descriptive assessment (refer to Tables 1 and 2). Any variable unmentioned or neglected was described as "empty" and tabulated as such in the forthcoming interpretations. Continuous variables are described as mean values (± standard deviation) and categorical variables are shown as absolute and relative values. Statistical analyses were performed with Statistica version 13 (StatSoft, Inc. (2009), STATISTICA, Tulsa, OK).
RESULTS
Flagged citations were screened accordingly, resulting in the omission of 193 articles and the final inclusion of 144 articles. Exported abstracts were then matched to their respective full texts. Complete text access was initially unavailable for 14 articles; these were retrieved from the literature database "Library Genesis" or via author permission (n=4, see Acknowledgments), leaving 144 articles, covering 70 tools, eligible for review based on the search conducted.
A more thorough examination of methodological processes was then executed to reveal the categories to which each article was assigned for ease of future assessment (refer to Table 1): "New Development (N)," "Psychometric Analysis (P)," and "Translation (T)/Adaptation (A)," or a combination thereof. Each paper was assigned according to the following criteria: "Development" if the report's main purpose was to produce an unprecedented tool, "Psychometric Analysis" if the explicit objective was to assess the reliability and validity of said tool, and "Translation and/or Adaptation" for all studies that in any way translated or altered a tool to suit a specific population, culture, and/or nation. Overall (Table 2), 36.8% of the studies aimed merely to psychometrically evaluate a pediatric sleep tool, while 9% additionally translated it. 24.3% of the studies aimed to independently translate a tool, while 4.2% additionally adapted it. Lone adaptations were performed in 4.2% of the studies, while 18.8% created an entirely new tool. 1.4% of the studies conducted both a new tool development and a translation and, likewise, 0.7% of studies adapted their new tool to a particular population, culture, or other.

Study Characteristics
The structural organization and publication features of each study are detailed in Table 1; the Appendix lists the acronyms for each tool reviewed. Since the 2011 Spruyt review of pediatric diagnostic and epidemiological tools, approximately 144 "tool"-studies have been published. The focus on pediatric tool evaluation peaked in 2014, when 16.7% of all studies were conducted, closely followed by 2017 (13.9%), 2016 and 2019 (each 13.2%), and 2015 (12.5%). For the remaining years of the decade (2010-2013 and 2018), the percentage of total studies published ranged from 0.7% to 9.7% (n=1-10) per year. Over a third of the total studies were published in Europe (38.9%), followed by North America (25%), Asia (18.1%), the Middle East (2.8%), South America (7.6%), Australia and Oceania (6.3%), and the United Kingdom (1.4%).
Across all 144 studies evaluated, sleep tools were predominantly developed and evaluated for a combination of children and adolescents between the ages of 6 and 18 years (27.1%), followed closely by tools for adolescents (13-18 years) at 22.2% and for children (6-12 years) alone at 16.7%. Only 10 studies covered the 0-18 years age range, and one did not define its range (82). Meanwhile, only 5.6% of all the studies assessed tools for preschool-aged children (2-5 years) alone and 1.4% for infants (0-23 months) alone. The remaining studies investigated a combination of age ranges, the most predominant being preschool children together with children (ages 2-12 years) at 8.3% of the total. The less frequent combinations of age ranges for which tools were assessed ranged from 0.7% to 7.6% per combination. Sample sizes ranged between 20 and 11,626 children, inclusive of adult participants (6-13), across all publications, and 15.6% of all studies used a sample size of >1,000 participants (Table 2). Of these study samples, approximately 46.5% of respondents were parents, 41% were self-report, and 11.1% were a combination of experts, children, mothers, and parents. In two studies, the respondent was primarily a professional (17, 95).

Sleep Categories
As exemplified in Table 2, the overall focus of these studies was overwhelmingly directed at tools measuring the quality of sleep or the identification of sleep pathologies in all pediatric age classifications (68.1%), followed by levels of sleepiness (55.6%) and duration of sleep (48.6%). Secondary co-objectives of these studies were to investigate tools measuring sleep regularity (46.5%) and sleep hygiene practices (29.2%). Assessed less often were sleep ecology and treatment around sleep pathologies, at frequencies of 21.5% and 0.7%, respectively. About 19 studies (13.2%) queried nearly all categories simultaneously (except treatment).

Tools Newly Developed
According to our search criteria, a total of 27 novel pediatric sleep tools were developed between 2010 and 2020 (refer to Table 2, shaded rows). Of these, approximately eight were published in Europe (29.6%), eight in North America (29.6%), four in Asia (14.8%), three in South America (11.1%), two in Australia and Oceania (7.4%), and two in the United Kingdom (7.4%). The majority were developed for child-adolescent age ranges (66.7%), while one was developed for preschool children (2-5 years) and one for all three aforementioned age groups (2-18 years). All newly developed tools possessed a multipurpose objective; most assessed sleep quality (77.8%), followed by sleepiness (51.9%), sleep regularity (41.7%), and sleep quantity (41.7%), while hygiene (25%), ecology (12.5%), and treatment (4.2%) were assessed more rarely.
In addition, three newly created tools are English translations of the NARQoL-21 (70) and the YSIS (140), and an adaptation, the nighttime sleep diary (NSD) (71); the latter is a diary adapted to monitor nighttime fluctuations in young children with asthma.
Only two tools were developed according to the 11 aforementioned steps required for psychometric validation of a tool: the NARQoL-21 (70) and the SNAKE (129) (refer to Table 2). One other tool, the OSPQ (83), also developed normative scores for widespread usage while fulfilling all steps except Steps 3 and 9. Whereas the CSAQ (27)

Tools Adapted
Moreover, six studies (see Table 2) specifically aimed to adapt a tool from a preexisting one: most notably the Children's Sleep Habits Questionnaire (CSHQ) (66.7%), among these a shortened version and an infant adaptation; the BEDS (14) (16.7%), adapted for children with Down syndrome; and the OSA-18 questionnaire (16.7%), which was also shortened [to the OSA-5 (80)] to suit the sample of interest. Although the number of items may have changed, no substantial changes to the answer categories were noted. Only 33.3% reported Steps 3, 4, 5, 7, and 10, yet Steps 6, 8, and 9 were analyzed in 83.3%. None developed norms. In two studies (38, 44), ROC analyses were pursued for the CSHQ.

Tools Adapted and Translated
Six studies adapted and also translated existing tools (see Table 2): the CSHQ (29), PedsQL (91, 92), SQS-SVQ (131), TuCASA (139), and NSS (72). The CSHQ and TuCASA were adapted and translated to Portuguese, the PedsQL to Arabic and Chinese, the SQS-SVQ to Turkish, and the NSS to Chinese. The adaptations involved an infant version of the CSHQ and a child sample for the NSS; the PedsQL was adapted for children with cancer and acute leukemia, and the TuCASA toward children of low socioeconomic status. The SQS-SVQ was modified based on personal communication with the authors of the original version; that is, four items were added.
For these tools, Steps 3 and 11 were not performed, while Steps 8 and 9 were performed in all. About half (50%) did Steps 5 and 6, more than half did Step 7 (66.7%), and less than half did Step 10. Some aspects of Step 4 were inconsistently applied across 83.3% of the studies (e.g., expert perspective).

Tools Psychometrically Evaluated and Adaptations
Three tools underwent evaluation but were simultaneously modified: FoSI was adapted for adolescents (54), and a reduced itemset was suggested for aMEQ-R (65) and PROMIS (102).

Tools Psychometrically Evaluated and Translated
In addition to the 53 instruments validated, there were 13 studies flagged that additionally translated their respective tools (refer to Table 2); the ASHS to Persian, the BEARS to Spanish, CCTQ to Chinese, the CSHQ to German and Spanish, the ESS to Tamil, the MEQ to European Portuguese, the MESSi to Turkish, the PSQ to Chinese, Portuguese and French, and the PedsQL to Brazilian Portuguese.

Tools Psychometrically Evaluated, Translated With Adaptations
The Russian version of the PDSS (89) did not report Step 3 but executed, to a certain extent, all the steps needed to psychometrically evaluate a translated tool in its population. Based on the advice of an area specialist and a focus group of children, questions #3 (trouble getting out of bed in the morning), #4 (falling asleep/drowsy during class), #7 (falling back to sleep after being awakened), and #8 (usually alert during the day; reverse coded) were modified for better understanding.

Translations of Tools
Although the studies reported here are English papers, popular translations are Chinese, Portuguese, Spanish, and Turkish. The CSHQ, PSQ, and OSA-18 were the most frequently translated tools.
The CAS-15, PSQ, CSRQ, and ESS studies provided "normative" ROC cutoff scores, with Krishnamoorthy et al. (51) providing cutoffs for moderate and high excessive sleepiness.
Population-based norms were developed for preschoolers and school-aged children for the JSQ. Average T-scores, overall as well as for boys/girls in age bands of 2-3, 4, and 5-6 years separately, are available for each subscale: restless legs syndrome, sensory; obstructive sleep apnea syndrome; morning symptoms; parasomnias; insomnia or circadian rhythm disorders; excessive daytime sleepiness; daytime behaviors; sleep habits; and insufficient sleep. Regarding the SDSC, French (France and French-speaking Switzerland) as well as Chinese T-scores are available. The Chinese study reports average T-scores for the subscales sleep-wake transition disorders; disorders of initiating and maintaining sleep; disorders of excessive somnolence; disorders of arousal; sleep hyperhidrosis; and sleep breathing disorders. The French study instead copied the approach of the original report, i.e., it tabulated the full T-score range from 31 to 100, including marks for clinical ranges.
The CSHQ study aimed to validate the Dutch version of the tool for toddlers while developing norms, given the current inaccessibility of the CSHQ for this age group. Norm values were defined as the mean total score in the sample population, and while the factor structure was unsupported, the normative score developed was still representative of the presence and severity of sleep problems in 25% of toddlers. The authors report the mean total score for lower/higher socioeconomic status, 2- and 3-year-olds, girls and boys, and yes/no problem sleepers. The authors similarly provided means and standard deviations for the 23 items of the CRSP.
The MEQ studies are comparable, providing means and standard deviations as well as percentiles. Percentiles are also reported in the YSIS study.
For the NARQoL-21, a comparison was made with a validated health-related quality of life tool, and a cutoff of <42 was deemed sensitive and specific; additionally, cutoff scores are available for differentiating between optimal and suboptimal quality of life.
T-scores for subscales by gender and age (5-7 and 8-10 years old) are provided for the OSPQ: sleep routine, bedtime anxiety, morning tiredness, night arousals, sleep-disordered breathing, and restless sleep.
For the SNAKE, a T-score distribution was generated for disturbances going to sleep, disturbances remaining asleep, arousal disorders, daytime sleepiness, and conduct disorders for children and young adults aged 1 to 25 years. For the Children's Sleep Comic (ages 5 to 11), stanines were generated for the raw sleep-problem intensity score.
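The norm scores discussed in this section follow standard formulas: a T-score rescales a raw score to mean 50 and SD 10 relative to a norm sample, and a stanine bins the standardized score into nine half-SD-wide bands. A minimal sketch follows; the norm sample of raw scores is invented for illustration and is not the published norm data of any tool above.

```python
from statistics import mean, stdev

def t_score(raw, ref_mean, ref_sd):
    """T-score: mean 50, SD 10 relative to a reference (norm) sample."""
    return 50 + 10 * (raw - ref_mean) / ref_sd

def stanine(raw, ref_mean, ref_sd):
    """Stanine: z-score cut into nine bands of width 0.5 SD, middle band 5."""
    z = (raw - ref_mean) / ref_sd
    return max(1, min(9, int(round(z * 2 + 5))))

# Hypothetical norm sample of raw sleep-problem scores
norms = [10, 12, 9, 14, 11, 13, 10, 12, 11, 8]
m, s = mean(norms), stdev(norms)

# A child scoring well above the norm mean
print(round(t_score(16, m, s), 1))  # → 77.4
print(stanine(16, m, s))            # → 9
```

Published norm tables (e.g., per age band and gender) are simply these transformations computed from the corresponding reference subsamples.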

Tools With ROC Analyses
Twenty-eight (19.4%) studies reported ROC findings, primarily for the CSHQ (n=4) and the PSQ (n=5) (refer to Table 2). In 20% of these, the ROC was calculated for clinical versus control/community samples, while in 48% of the papers a PSG parameter was used as the criterion (e.g., apnea-hypopnea index, obstructive index). Another criterion was used in 32% of cases (e.g., a validated questionnaire, parental report, or the optimal cutoff from the original paper).
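As a sketch of how such ROC-derived cutoffs are typically obtained, the snippet below picks the questionnaire cutoff that maximizes Youden's J against a binary criterion. The scores and criterion labels are invented; in the reviewed studies the criterion would be, e.g., a PSG parameter or clinical versus community status, as noted above.

```python
def roc_youden_cutoff(scores, labels):
    """Cutoff maximizing Youden's J = sensitivity + specificity - 1.

    scores: questionnaire totals (higher = more symptomatic);
    labels: 1 if criterion-positive (e.g., PSG-confirmed), else 0.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    best_cut, best_j = None, -1.0
    for cut in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cut and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < cut and y == 0)
        j = tp / pos + tn / neg - 1
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j

# Hypothetical questionnaire totals and criterion labels
scores = [3, 5, 8, 9, 12, 13, 15, 18]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
cut, j = roc_youden_cutoff(scores, labels)
print(cut, round(j, 2))  # → 9 0.75
```

Reporting the sensitivity and specificity at the chosen cutoff (rather than only the AUC) is what makes such a cutoff usable as a "normative" screening threshold.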

Papers With Questionnaires Available
In Table 1, the studies (32.6%) that printed or made available their questionnaire in supplementary files or appendix are shown.

Use of Classification Systems
Primarily, the ICSD classification system was used to generate/mimic items for the following new tools: the Pediatric Sleep CGIs (90), RLS (117), SDSC* (125), SNAKE (129), the Children's Sleep Comic (137), and YSIS (140). When tools were psychometrically evaluated and/or translated/modified, such as the CSHQ or the SDSC, the classification system upon which their original items were generated remains.

Tools Used in Specific Populations
The SNAKE has been specifically developed for children with psychomotor disabilities, and hence serves as a good example of tool development. In contrast, the vast majority of studies involved tools that are modifications or compilations, or a psychometric evaluation of a tool's utility in an "atypical" population.

DISCUSSION
Since the 2011 Spruyt (2, 3) review, further psychometric validation has been encouraged for all questionnaires, to develop a broader and more reliable range of tools. While "tools do not need to be perfect or even psychometrically exceptional, they need to counterpart clinical decision-making and reduce errors of judgment when screening for poor sleep," suggested Spruyt (personal communication). This is done through the descriptive, iterative process of a tool protocol and often requires all steps of psychometric evaluation. Without this, we have observed that tools rely on minor aspects of their psychometric validity for (clinical) application, when this is often fallacious and nonspecific to the study population. Following the systematic review, however, a dramatic increase in tool translations and adaptations has been observed, which is to be applauded. Nonetheless, it is important to develop standardized tests that are culture-free and fair, in order to identify sleep issues across the board based on an unbiased testing process.
Twenty-seven new tools have been developed, while most of the papers published reported translations/adaptations or a psychometric evaluation of an existing tool. More than half of the tools queried general sleep problems. Irrespective of the infrequency of tools developed in categories like sleep ecology and treatment, there is an emerging need for further research into these areas, given the environmental impact of technology on pediatric sleep in the 21st century (141, 142).
The two new tools that underwent all 11 steps aimed at investigating sleep problems either in terms of a quality of life tool for narcoleptics (NARQoL-21) (70) or as a sleep disorder tool for children with severe psychomotor impairment (SNAKE) (129). Several other tools accomplished nearly all steps (see Tables: OSPQ, CSAQ, BRIAN-K, PADSS, SDSC*, NSD, and YSIS).
Since the 2011 review, tools for specific populations (e.g., in terms of ages, developmental disabilities, sleep pathologies) are still needed. Epidemiological tools assessing sleep in adolescents specifically have received some focus, ranking second in publication frequency. This dramatic influx of relevant research may be a result of the rising sleep-reduction epidemic in teenage populations, influenced by biological, psychological, and sociocultural factors. In addition, investigations into the effects of sleep hygiene and ecology (143), which are heavily influenced by sociocultural phenomena, have slowly presented themselves across children and adolescents (6-18 years). With technology at the forefront of childhood influence (144, 145), pediatric sleep habits, and consequently sleep quality, are slowly gaining traction; the studies flagged here acknowledge the underlying weight of sleep hygiene on sleep quality and sleep quantity. At present, however, these tools still demand further psychometric validation. An urgent call for tools with adequate psychometric properties is echoed in several recent reviews (146-148).
Especially assessing the factor structure of tools toward construct validation has been pursued, while other steps continue to be overlooked. Similarly, general tools to screen for sleep pathologies remain preponderant since the 2011 review. Alternatively, a file-drawer problem can be expected. Combined with the difficulty of finding a suitable journal to publish a tool validation study, this may lead to a skewed scientific literature toward commonly published and used tools. This is potentially echoed in atypical populations as seen by the influx of psychometric evaluations of existing tools. Undoubtedly, more studies are needed in an era where sleep is rapidly gaining public interest, and the need for a scientifically sound answer on the consequences of a "poor sleep" endemic is pressing.
Several tools stand out for diverse reasons. The first tool of note is the JSQ (58, 59), validated for Japanese children, which investigated sleep in a large population-based sample flagged by our search and developed normative values at a 99% confidence interval. Given its statistical validity and reliability in a large population sample, its approach could plausibly be mirrored in other cultures. Important to note, however, is that sleeping habits in Japanese children may differ greatly from those in western countries; changes in sociocultural sleep habits should therefore be considered when adapting it for other populations. Secondly, the SNAKE, the sleep questionnaire for children with severe psychomotor impairment, underwent all 11 steps and was uniquely developed (hence not modified) for a specific population. More alike are needed (149). Thirdly, the PADSS and BRIAN-K, both newly developed tools, drew our attention because they examine arousal level and biological rhythm. Although the PADSS may need further validation studies toward diagnosing, monitoring, and assessing the effects of treatment in childhood arousal disorders particularly, it addresses the need for more specialized tools, whereas the BRIAN-K, being a modification of an adult version, may benefit from additional psychometric evaluations beyond the current age range. Also, the FoSI, measuring fear and based on the adult version assessing fear in a rural trauma-exposed sample (150), warrants further psychometric scrutiny. In contrast to others, the RLS (117) proposes a difference in scores between two time points 14 days apart to identify RLS-related symptoms. Lastly, addressing the need for tools allowing the child to express themselves regarding sleep is the Children's Sleep Comic, an adapted version of the unpublished German questionnaire "Freiburger Kinderschlafcomic" that provides pictures for items and responses.
This points to the "un"published tools in the field and to a welcome child's perspective, inquiring about sleep in an alternative way.
Adhering to the words of Spruyt, that instruments should enhance clinical decision-making and significantly reduce errors of judgment, the study by Soh et al. identified, developed, and abbreviated the OSA-5 questionnaire after recognizing preexisting faults in the original 18-item version. It was identified that the OSA-18 was initially designed as a disease-specific quality of life tool that does not predict obstructive sleep apnea (OSA) symptoms consistent with the gold-standard PSG. Recently, Patel et al. (151) scrutinized the accuracy of such clinical scoring tools. Additionally, the study by Soh et al. (80) acknowledged a lack of parental understanding of some items and their wording in the original instrument. As a result, the OSA-18 was abbreviated to 11 items and then to 5, so that ultimately it would "perform better as a screening tool for use in triage and referral planning." Our review also revealed other tools addressing this sleep problem: I'm Sleepy (55). While OSA is increasingly relevant in pediatric epidemiology due to the rise in obesity, parental knowledge of the condition and the consequent treatment options is imperative. A recent 2017 study regarding the development of a questionnaire informing parents of this treatment was designed by Links et al. (82). The tool aims to alleviate parental conflict around the choice for or against this treatment in children and is a first in its approach as a questionnaire focusing on medical treatment decision making. Like the objectives of the OSA-5, this tool is notable in that it aims to "improve the quality and impact of patient and family decisions about OSA diagnosis and treatment" (82). As part of the personalized/precision medicine era, the CAS-15 (17) and the PROMIS papers stand out. The CAS-15 is one of the few tools where the respondent is the professional. The PROMIS, although presented as a potential screening/diagnostic tool, recently underwent several psychometric evaluations.
It involves an item bank of patient-reported outcome measurements; better put, it is intended to measure the subject's "view" of their health status (e.g., sleep). Although these patient-reported outcome measures (PROMs) adhere to the same psychometric characteristics as diagnostic/screening tools, the scope of a PROM is very different: PROMs allow the efficacy of a clinical "intervention" to be measured from the patient's perspective. Unfortunately, these specific instruments have not undergone all steps; accordingly, they would benefit from further validation and possibly cultural/linguistic adaptation to achieve more widespread use in the future.
As for the majority of tools lacking detailed mention above, the gradually increasing recognition of disease-specific instruments, or instruments for specific populations, deserves comment. Likewise, measuring the severity of sleep conditions rather than their frequency is still much needed. Spruyt observed that nearly all questionnaires up until the 2010 search focused on the frequency of sleep problems; since then, however, several tools have aimed to increase the specificity and sensitivity of sleep tools to the severity of common pediatric illnesses and the specific age groups associated with them, e.g., Down syndrome, narcolepsy (148), infancy, etc. This specificity of condition severity and age may help to refine treatment measures and streamline clinical interventions.
Additionally, in contrast to our review in 2011, the studies reported here are English papers, although popular translations are Chinese, Portuguese, Spanish, and Turkish. That is, between 2010 and 2020 especially the CSHQ, PSQ, and OSA-18 were translated. This is likely an approximation due to the exclusion of non-English papers and of dissertations etc. In 2011, we observed that the development or modification of tools may not always evolve into a scientific paper.
Vis-à-vis fulfillment of psychometric criteria, exploratory and confirmatory factor analysis methods have been included in the scope of, and completed either partially or completely in, most of the studies, which was lacking previously. Primarily, construct and content validity via factor structure or item correlation, and Cronbach alpha statistics, are noticed. Standardized scoring and item generation, however, are still ill-managed as requirements, and both are important steps in developing a diagnostic tool or adapting/translating an existing one. Nonetheless, it can generally be said that many of the studies into tool psychometrics deserve recognition for endeavoring to adhere to Steps 1 through 11. The overarching suggestion thus far is to more thoroughly fulfill the facets of validation, i.e., content, convergent, discriminative, and criterion-related validity (Steps 8 and 9); pilot questionnaires whenever an adaptive change is made (Step 5); examine the underlying factors to ensure the (uni)dimensional structure of a given tool (Steps 7 and 10); and develop norms alongside cutoff scores (Step 11). Furthermore, although several tools mimic classification systems, a more thorough psychometric scrutiny thereof is still needed. As a consequence, to date, the vast majority of tools reflect an appraisal of the frequency of a sleep complaint.
Several limitations should be noted. We post hoc limited our flagged studies to English-language publications only, given that these reach the broader scientific community. Furthermore, several of the included tools are not exclusively sleep tools (e.g., they are health-related more broadly). In addition, our way of presenting studies as "New Development (N)," "Psychometric Analysis (P)," and "Translation (T)/Adaptation (A)," or a combination thereof, involved overlaps in the descriptive analyses. Contrary to the original paper by Spruyt, this review did not search Dissertations and Theses, Google Scholar (web crawling), e-books, or conference sleep abstract books, and as a consequence may not present an exhaustive list of tools. Studies involving apps did "hit" our search terms, yet were not retained during further screening toward our aims. Lastly, given that this is a systematic review, we did not pursue a quality assessment of the study designs investigating sleep tools. Nevertheless, each of the necessary steps is stipulated in Spruyt et al. (2).

Recommendations
It is recommended that future tools further investigate the sleep hygiene, sleep ecology [see (143)], and sleep schedules of pediatric populations, as this is becoming a highly relevant field of research with the introduction of technology into sleeping habits and routines. The increasing prevalence of sleep deprivation in children (152)(153)(154)(155) requires in-depth investigation of what damage, or lack thereof, results from 21st-century society.
In addition, it is suggested that pediatric tools be further introduced and adapted or validated for self-report by children older than 8 years of age. Since there is evidence that children as young as eight can report information critical to their own health, it is recommended that a large proportion of questionnaires be designed for children in this age category as well as for parents (1). Conjoint use of both reports, however, is advised before developing any diagnosis.
Although several of the listed tools mimic classification systems, or were psychometrically evaluated in samples clinically diagnosed according to a classification system, there is still room for improvement. Combined with the predominance of convenience samples, such as clinical referrals, and the lack of detail on (potentially poor) sampling techniques, the internal and external validity of these studies may be seriously jeopardized.
Sensitivity and specificity are key in differentiating screening from diagnostic tools. The sample on which this distinction is determined also plays a key role, as diagnostic tools chiefly target subjects believed to have the problem. Thus, screening tests are chosen for high sensitivity (true positives), whereas diagnostic tests are chosen for high specificity (true negatives).
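The distinction drawn above can be illustrated with the standard definitions of the two rates. The counts below are entirely hypothetical (a notional validation sample of 200 children, 40 with a clinician-confirmed sleep disorder) and serve only to show how a cutoff can yield a tool well suited to screening but not to diagnosis.

```python
def sensitivity(tp: int, fn: int) -> float:
    """True-positive rate: the fraction of true cases the tool flags."""
    return tp / (tp + fn)

def specificity(tn: int, fp: int) -> float:
    """True-negative rate: the fraction of non-cases the tool clears."""
    return tn / (tn + fp)

# Hypothetical questionnaire cutoff applied to 40 cases and 160 non-cases:
tp, fn = 38, 2     # flags 38 of the 40 children with the disorder
tn, fp = 120, 40   # but also flags 40 of the 160 children without it
se = sensitivity(tp, fn)  # 0.95 -> misses few cases: good for screening
sp = specificity(tn, fp)  # 0.75 -> many false positives: weak for diagnosis
```

With these assumed counts the tool would be a reasonable screener (few missed cases) but a poor diagnostic instrument (one in four unaffected children is flagged), which is why the sample and intended use must be stated when such figures are reported.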
Lastly, caution is warranted when a tool receives a generally positive score on reliability and validity assessment, and readers are advised to remain critical of the statistical techniques applied in the individual studies. Several recommendations for future tool development or evaluation are listed in Box 1.

BOX 1 | Research agenda: a need for
• Tools assessing sleep ecology, sleep routines/hygiene, regularity, treatment
• Psychometric evaluation of apps
• Tools for daytime sleep
• Tools per sleep pathology
• Tools for specific populations
• Tools sensitive and specific with regard to classification systems
• Tools adapted to developmental changes
• Tools differentiating between school days and non-school days
• Tools as a PROM (Patient-Reported Outcome Measure)
• A venue to publish psychometric evaluations of tools
• Methodologic scrutiny regarding sampling (patient/population), statistical techniques, the aim(s), and type of study
• Availability of the published tools, especially translations
• Equal attention to all 11 steps; e.g., step 3, including answer format but also time format
• Replication studies
• Self-reporting tools for school-aged children
• Question and/or response formats beyond frequency
• Sleep duration not being a categorical answer
• Caution regarding "child" modifications of adult tools or applications beyond the intended age range
• Culture-free or culture-fair tools
• Reviews and meta-analyses on criterion validity of subjective tools

Tool development and evaluation, as mentioned in the past, is time- and labor-intensive (2). In short, scientific copycats (i.e., replication studies) are needed!

AUTHOR CONTRIBUTIONS
TS performed the first search, extracted the data, and wrote the first draft during her internship. Her work was updated, verified, and finalized by KS.