Validation of PleaseApp: a digital tool for the assessment of receptive pragmatic abilities in children with neurodevelopmental disorders

Background Pragmatic skills allow children to use language for social purposes, that is, to communicate and interact with people. Most children with neurodevelopmental disorders (NDDs) face pragmatic difficulties during development. Nevertheless, pragmatic skills are often only partially assessed because the existing instruments usually focus on specific aspects of pragmatics and are not always adapted to children with communication difficulties. In this sense, digital tools (e.g., apps) are an optimal method to compensate for some difficulties. Moreover, there is a lack of pragmatic tools measuring the receptive domain. Therefore, the present study aims to validate PleaseApp as a digital instrument that measures eight pragmatic skills by presenting the design of the assessment tool and its psychometric properties. Methods PleaseApp was designed based on previous empirical studies of developmental pragmatics in children with and without NDD. PleaseApp assesses eight receptive pragmatic skills: figurative language, narrative, reference, indirect speech acts, visual and verbal humor, gesture-speech integration, politeness, and complex intentionality. The study involved 150 typically developing children between 5 and 12 years of age. Results A confirmatory factor analysis proposes an eight-factor model with no underlying factor structure. The eight tests that compose PleaseApp have obtained a model with a good fit and with adequate reliability and validity indices. Discussion PleaseApp is an objective, valid, and reliable tool for assessing pragmatic skills in children with NDD. In this sense, it helps to assess whether a child has acquired pragmatic skills correctly according to his/her age and clarify the specific problems a child has in eight different components to plan personal and personalized interventions.


Introduction 1.Pragmatics
Language is made up of phonetic-phonological, lexical-semantic, morphosyntactic, and pragmatic skills.Pragmatic skills allow us to appropriately use language in context or a communicative situation (1).In this sense, information from the context includes physical aspects of the situation where the conversation occurs and the social, cognitive, and linguistic context of the discourse (2).Consequently, the acquisition and development of pragmatic skills depend on linguistic abilities (e.g., structural language and exposure to conversations) (3,4) and theory of mind skills to correctly infer the communicative intention of the speaker during a dialog (5).
Importantly, pragmatic competence is a multidimensional phenomenon that encompasses a wide range of interdependent expressive and receptive skills that are linked to other developmental skills to a greater or lesser degree.Therefore, its acquisition and development also cover a wide range of ages throughout childhood and adolescence (6,7).It should be noted that the subcomponents of pragmatics may be variable depending on the literature or the field of knowledge consulted since pragmatics is a multidisciplinary area in which studies that come from psychology, speech-language therapy, and linguistics come together.
In this sense, there is a need for multidimensional approaches to pragmatic development to create a receptive method of assessment and to apply it in the health and education fields.Receptive pragmatic assessments would allow us to assess children's pragmatic skills as listeners, which is important to ensure success in conversation.Some of the most relevant pragmatic receptive skills are the following: -Figurative language understanding.This ability involves deducing when the speaker's productions have a different meaning (figurative) than what is actually expressed (literal) (4,8).For its understanding, it is necessary to inhibit literal meaning and include the comprehension of metaphors (novel and conventional), idioms, or similes, among others (9,10).In this sense, similes are considered easier than idioms or metaphors to understand because similes contain an explicit syntactic cue (e.g., "like") that a comparison is necessary (8).-Narrative.Narrative skills include mainly expressive abilities to generate a story and to retell it after having heard it (11).However, before generating a story (telling or retelling), children must use their receptive pragmatic skills to order the given episodes (e.g., pictures) using details of contextual information (mainly cognitive, social, and linguistic details) to construct a coherent and chronologically ordered story (2).-Reference.Reference skills allow a speaker to describe or represent reality through language, providing enough information so that another person can understand it (12).In this sense, the listener also has expectations about the optimality of the reference expressions used by the speaker, and he must detect when these are not met (e.g., when the speaker does not provide enough information or it is ambiguous).Therefore, he must request more information or clarification (13).-Indirect speech acts.Speech acts can be classified as direct and indirect acts (14).Indirect acts are used to communicate more information than what is actually said (e.g., using insinuations).
Thus, to understand indirect speech acts (that is, to understand the actual intentionality of a speaker), a listener must grasp aspects linked to the theory of mind, especially in relation to the recognition of facial expressions or intonation (15).-Humor.Understanding humor requires making inferences to the context and the communicative intentions of the speaker, to understand the ludic or funny meaning (16).Humor occurs when there is a discrepancy between what is expected and what really happens or is perceived (17).The incongruity and resolution that leads to finding humor can occur both through a visual element or in a verbal element of the context (e.g., pictures or sentences).-Gesture-speech integration.Multimodal skills include the ability to integrate iconic gestures with speech to improve the understanding of words and messages, especially if it is complex.
In this sense, a listener must be able to integrate gestures (to complement information, supplement information, or finish a sentence) and the sentences of a speaker in a conversation (18, 19).-Politeness.Courtesy consists of being polite to others or showing solidarity and kindness.It requires adapting linguistic behavior by choosing the appropriate words and understanding the mental states of people and the social norms of the situation (20).In addition, it requires a social understanding of interpersonal relationships, such as relationships with the speaker or vertical relationships (21).-Complex intentionality.It includes the ability to understand the communicative speaker's intention hidden behind a non-literal, indirect, or false message (22).A speaker breaks these pragmatic rules both deliberately (e.g., when lying or being ironic to make a joke) as well as non-deliberate (e.g., when committing a mistake or confusion).In this sense, it is considered a metapragmatic skill (20), and theory of mind skills are essential to infer the actual intentions and other mental states of the people involved (9,23).
Pragmatic skills are a key aspect of socialization with peers in inclusive settings, and therefore, accurately detecting what specific problems a child has in these areas would allow these skills to be improved and treated in a way adapted to real needs (24).

Typical and atypical pragmatic development
For most children, the ability to use language to communicate with others is a taken-for-granted skill.However, most children with neurodevelopmental disorders (NDDs) present pragmatic difficulties of a greater or lesser extent as a consequence of the implication of linguistic and cognitive factors in the correct development of pragmatic skills (2), which prevents them from taking part in daily social activities (e.g., at school or family environments).It is important to note that children must have developed sufficient structural language skills before higher-order pragmatic deficits can be detected (5-6 years), but difficulties may be latent during the preschool period (24,25).Among some of the child NDD populations at greatest risk of suffering difficulties in the acquisition and development of pragmatic skills is the Autism Spectrum Disorder (ASD).Regarding figurative language, various empirical and theoretical studies describe general difficulties in this area (e.g., 8, 9, 26), as well as specific difficulties in novel metaphors (27) or idioms (4).In this sense, impairments on the metaphor tasks seemed to be linked to language impairment within the disorder regardless of autistic features (8).Regarding narrative skills related to ordering a story from pictures, autistic children do not have special difficulty in ordering causal or mechanical scenarios or referring to everyday routines, but difficulties appear when ordering episodes, which include the mental and psychological states of the people (28).Similarly, they have difficulties in narrative production and when it comes to realizing inferences in narratives, including issues with coherence, connection between events, and/or giving irrelevant information or saying unusual or bizarre comments, among others (2,29,30).In relation to reference skills, autistic children manifest both expressive problems and receptive problems, such as detecting violations of conversational maxims related to quantity (e.g., make your contribution as informative as required) (31, 32).
Regarding indirect speech acts comprehension, various studies describe both difficulties and strengths in autistic people, and results are often mixed in most cases depending on their level of structural language and their age.For example, difficulties in understanding indirect requests in children and autistic adults have been demonstrated (30,33), but some strengths are found in autistic adults (34), preadolescents (35), and children (36) with a better level of language.In relation to humor understanding, studies demonstrate the existence of problems understanding some forms of humor from childhood to adolescence (17,30).Specifically, autistic people can understand certain types of humor (from puns, antics, or simple jokes to very clever and precisely formulated comments), but they have more difficulty solving mentalistic-type jokes (37).Moreover, the veracity of the context can influence their sense of humor since their creativity is based more on reality than on imagination or fiction.Regarding gesture-speech integration, a low competence has been demonstrated as well (38).
Regarding politeness, there are some studies that describe the difficulties these children have in using some forms of courtesy (39).Finally, in relation to complex intentionality, studies show both difficulty in understanding mistakes, that is, discerning intentionality from unintentionality (23,40), and also correctly understanding masked communicative intentions (9,31,41).Similarly, the difficulty in understanding irony has also been detailed in a more concrete way, closely related to their theory of mind skills (34).Moreover, regarding children with Communication Disorders, various studies have observed difficulties in pragmatic components in both children with Social Communication Disorder (SCD) and children with Developmental Language Impairment (DLD), although they are usually less pronounced in children with DLD (29).Regarding Figurative language, these difficulties have been described both for SCD and DLD in understanding idioms (42, 43) and for DLD in novel metaphors (8).Moreover, most studies have focused on children with DLD (e.g., as a control group for autistic children or to better study the role of structural language in pragmatic difficulties).In this sense, some studies describe that children with DLD have difficulties in narrative production or inferring information from narratives (2,44) and in understanding indirect requests (45).Difficulties in reference skills have also been observed in children with DLD (13), as well as identifying uninformative quantifiers (46) or detecting violations of conversational maxims related to informativeness (31).Regarding humor, some studies describe difficulties in understanding graphic humor for children (47) and general humor in the adolescent population (48).Similarly, some studies also describe difficulties in gesture-speech integration (18, 49).In relation to politeness, certain difficulties in the use of politeness formulas have been stated (32).Finally, regarding complex intentionality, different studies show difficulties in understanding mistakes and faux pas (50) or irony (31) for children with DLD, as well as difficulties in understanding masked communicative intentions for both DLD and SCD (47,51).
To a lesser extent, children with Attention Deficit Hyperactivity Disorder (ADHD) also present more pragmatic difficulties than their peers with typical development, although not with the same severity as autistic children (52).Regarding Figurative language, they have difficulties understanding figurative language with and without context, as well as idioms (53).In relation to narrative skills, some studies describe difficulties in narrative production (e.g., topic maintenance, event sequencing, and referencing), which are evident over and above general language functioning (54).Regarding reference, again, the studies found are fewer, but some difficulty is also demonstrated in their reference skills (55).Similarly, they have difficulty understanding some indirect requests (53).Regarding humor, some studies have found that it is an area of strength in these people since low inhibitory control is advantageous for divergent thinking (56).In this sense, difficulties in the appreciation of humor have only been demonstrated in children with ADHD with comorbidity with non-verbal learning disorder (57).Regarding gesture-speech integration, no explicit evidence has been found, although some difficulty in the perception of non-verbal cues has been demonstrated.Moreover, some difficulty is described in some forms of politeness (58).Finally, regarding complex intentionality, there is evidence that their primary difficulties in executive function can lead to difficulties in understanding mistakes, irony, and intentionality (59).It must be noted that these difficulties are related to other pragmatic expressive components such as the management of social discourse (impulsive speech, interruption of conversations and inappropriate initiations, loss of information in the dialog, and little attention paid to context) (54).In fact, there are studies that have shown that these partially explain the high rates of social incompetence (54).
Finally, it is important to note that children with Intellectual Disabilities, as a consequence of general cognitive difficulties, also show pragmatic difficulties such as understanding long and complex conversations, understanding deceptions, double meanings, and metaphors, or organization of discourse.Specifically, the existence of specific pragmatic difficulties has been studied in Williams Syndrome, Fragile X syndrome (60), and Down syndrome (61).
This bulk of evidence shows that different pragmatic abilities are relevant to children's development and differentiate profiles in children with pragmatic needs, which motivates the assessment of these abilities.Moreover, the population with NDD has shown a great variability of pragmatic abilities with potential difficulties in one or more of its subcomponents, which requires a comprehensive assessment to identify the strengths and weaknesses of the child's pragmatic skills (25,62).

Pragmatic assessment
Pragmatic assessment is one of the linguistic components that has received less attention in clinical research.However, there are some In this sense, some of the existing tests are designed in other countries (generally English-speaking), and some items are not valid for other cultures (65).Regarding the methods used by experimental pragmatics research, these investigations have developed a wide set of empirically validated and research-based methods that extract direct measures of comprehensive pragmatic abilities.The design of these tasks has isolated different pragmatic capabilities in the comprehensive domain by quantifying the number of correct responses.However, to the best of our knowledge, there is a lack of pragmatic assessment tools with direct measures of pragmatic comprehensive skills that implement evidence-based methodologies.
On the other hand, it is difficult to find assessment tools that cover the full set of existing pragmatic skills in the expressive and comprehensive domains and that give a comprehensive view of pragmatic ability across the entire developmental age range (24,25).Moreover, existing pragmatic tools tend to focus on specific aspects of pragmatics (e.g., understanding of figurative language or use of conversational skills) and forget certain essential aspects (e.g., theory of mind skills or dimensions empirically studied by leading authors like reference skills) (3).To the best of our knowledge, there is a lack of formal assessment tools that integrate different measurements of pragmatic receptive abilities, which would better inform the actual competences of children with NDD in the comprehensive domain.

Digital assessment tools
In recent years, technology-based assessments have been increasing, as they provide a motivating and attractive environment for children (with and without NDD) and they prefer them to more traditional methods (66).Moreover, additional processing time and reduced anxiety were associated with face-to-face interactions for people with NDD, such as autistic children (67).Regarding the assessment of pragmatic abilities, digital formats also provide innovative ways to provide information about contextual factors that are crucial for the assessment of pragmatic disorders (25).Thus, in the pragmatic area, technology-based assessments offer a unique opportunity to create communicative contexts as similar as possible to real communicative situations, as they allow multimodality, such as the use of audio, image, movement, and text (e.g., audio recordings to provide structural language information, background images of the context to be integrated with the information, or the interaction with response buttons that include contextual information).Nevertheless, they cannot substitute real communicative contexts and real interactions with people, so the information that they provide must be used together or matched with other pragmatic ecological assessments if possible (e.g., observational measures or questionnaires) to have a better assessment of the real pragmatic behaviors of children.
Digital tools also have great potentialities in health, as well as in other disciplines, as they can be used both in face-to-face and teleintervention formats.Even though assessment tools to diagnose NDD have typically been created in an analogic format, research has already explored its online use with teleassessment practices (e.g., 68).In fact, the COVID-19 pandemic served to develop remote diagnosis and intervention models for autistic people (69).In line with this, professionals used assessment strategies during the COVID-19 pandemic lockdowns as an alternative method of service delivery (70).Moreover, teleintervention has already been shown to be a valid and effective modality in the screening and diagnosis of children with socio-communicative needs (71).In general, telehealth facilitates the diagnosis of children with developmental concerns (72).However, these studies demonstrate the feasibility, effectiveness, and diagnostic accuracy of teleassessment tools and protocols that have been adapted to a digital format.In this sense, there is a need for innovative digital assessment tools that integrate evidence-based procedures newly created in an online format that can be administered in face-to-face and online formats.

Aim of the study
For this reason, the design of PleaseApp was considered to provide the clinical and scientific community with a formal measure that allows the evaluation of receptive pragmatic skills.PleaseApp aims to expand the number of pragmatics components (and items) to carry out a complete assessment of pragmatics that contributes to making a differential diagnosis between those disorders that present comorbidity in this area.In this sense, item variations of PleaseApp (e.g., including the theory of mind content in some items) have the potential to be evaluated through the tool that would help in some cases to determine a specific diagnosis toward pragmatic difficulties.
Moreover, PleaseApp will allow establishing differences between and within disorders, pointing out weaknesses and strengths in the different components and subcomponents of pragmatics, and also allow the type of error children make when they do not answer correctly.Finally, the variation of items and the inclusion of the theory of mind contents will allow the assessment of the theory of mind's comorbid difficulties.
Therefore, the aim of the present study is to evaluate the psychometric characteristics of the digital tool PleaseApp as a formal assessment of receptive pragmatics in a sample of primary school children and to analyze the data obtained, taking into account the age and sex of the participants.

Design of the tool
The development and design of PleaseApp include different steps.In the first step, previous scientific evidence about pragmatics and existing instruments were analyzed.On the one hand, several theoretical and empirical studies on the developmental milestones of pragmatic skills in typically developing children were reviewed, as well as the scientific procedures for the assessment and intervention of these skills.On the other hand, the specific strengths and difficulties common in children with different NDDs were also reviewed, such as ASD, DLD, SCD, or ADHD.This helped us to establish the structure and different subtests of the app and the variations between items within each level.In the second step, 10 different levels were created, with 12 items per level.However, the data from the validation of the present study allowed us to define the items and levels that had adequate psychometric properties to be part of the test.So, the current version has only eight subtests because the subtests related to lexical inference from contextual information and metapragmatic skills in conversation were excluded due to the reduction of valid items.
Moreover, PleaseApp screening happens in eight scenarios familiar to the children (e.g., school) that contextualize each level and its plot.This is an attempt to make sense of the instruction given to the child at the beginning of each level, and to engage the child in the story requires employing his or her receptive pragmatic skills correctly.
Table 1 shows a definition of each subtest (level), and the variation of its items is presented.Moreover, references to the most relevant empirical or theoretical studies for the design and construction of each of the levels are also included.
An example of an item from each of the levels is given in Figure 1.
Each item has three types of response coding: one is considered a correct response and the other two are considered incorrect.These three gradations of the response have been established depending on their use of pragmatic skills to go beyond the explicit or literal meaning, taking criteria of appropriateness (relevance of the answer given to the question that was asked), accuracy (informativeness), and veracity (truthfulness of the information provided), and following scorings similar to Andrés-Roqueta and Katsos (31).Therefore, each item is scored with 2, 1, or 0 points.For example, at the Figurative Language level, the three response options refer to figurative meaning (2 points), literal but probable meaning (1 point), and incoherent/ inconsistent literal meaning (0 points).

Participants
A total of 153 boys and girls who attended different mainstream schools in the Valencian Community (Spain) were recruited in the present validation study, of which 54.9% were boys and 45.09% were girls.The distribution by age and sex is shown in Table 2.

Procedure
As discussed in the Method section on PleaseApp design, the authors designed a battery of 12 items for each of the 10 initial assessment levels following the theoretical review.Out of these 12 items, the first two items were test items, and they were used to ensure the child's understanding of the instruction, and the next 10 items were the assessment test.Thus, the initial version of the instrument was composed of 120 items (20 test items and 100 assessment items).
After obtaining the authorization of the regional government, the ethics committee of the university, where the project was carried out, and the school authorities, informed consent was requested from the parents of the participating children.
First, this initial version of the instrument was administered to a pilot group of 16 children (two children of each of the target ages of the digital tool App, that is, from 5 to 12 years) to assess the comprehension and appropriateness of the items and instructions.
Then, the research group administered PleaseApp individually to each participant of the sample using computers, tablets, and laptops from the school and from the research groups, together with other standardized tests presented in the section on instruments.
Once the data were collected, the psychometric properties of PleaseApp were analyzed.For this purpose, construct validity analysis was performed by means of confirmatory factor analysis (CFA), reliability analysis, and convergent validity.
These analyses led to the elimination of a number of items due to their low correlation within their level and the elimination of two levels (metapragmatic in conversation and new lexical contextual inference) that did not obtain a model with adequate properties, thus configuring the final structure of the tool, which is presented in the Results section.

Pragmatic formal measure: BLOC-S-R
The pragmatic subtest of the Objective Language Criteria Test-Screening Revised (Batería de lenguaje objetiva y criterial, BLOC-S-R) (80) was used to have a measure of the expressive pragmatic competence of the participants with an existing and validated measure.It is aimed at children between 5 and 14 years of age.It consisted of 19 items (raw scores 0-19) that evaluate a child's use of language in concrete communicative situations and social interaction with respect to different speech acts such as greetings and farewells, requesting, giving or refusing permission, asking for specific information making questions (e.g., who/what, where/when, and why/how), or making direct demands for action, among others.
The child is presented with a black-and-white graphic scene (a vet's clinic) with different characters.The child is the main character of the scene, and he is asked to produce a sentence that fits different interactions within the scene (e.g., the main character has to greet a woman he already knows in the clinic: "what would he say to greet the woman?").In this sense, the child must understand the communicative intention of the character and say what the character should say in that particular situation (in the first person).

Skill Level task mechanism Item variation
Example of the gradations of response (related to items presented in Figure 1) Cinema.Put in order the different scenes of a movie.
2 points: correct order of the four pictures (In the example: 1 point: order with correct ending (In the example, combinations different to the correct one but with number 2 as the last picture).
0 points: other possible order combinations.

Politeness (POL)
Train.Choose the correct, polite expression and avoid those that are too direct (rude) or incoherent.
Greetings and goodbye, thanking, forgiveness, apologizing, asking for permission, and presenting or rejecting a gift through a white lie, among others (73-75).
2 points: polite and coherent (Excuse me, can I seat here?) 1 point: rude and coherent (Ei, I want to seat here).
0 points: polite but incoherent (Excuse me, why are the seats green?).

Reference (REF)
Kitchen.Ask (or not) for more information to find the correct object on the shelf because the cook sometimes is underinformative.
Chef: "Pass me the pineapple." 2 points: Informative (First the child presses "?, " the chef specifies if the big or the small one, and then the child presses the correct referent).
2 points: funny and coherent (In the example, picture 3).
1 point: not funny but coherent (In the example, picture 2).0 points: funny but not coherent (In the example, picture 1).Zookeeper: The parrot house does not get the sun; it is like a fridge 2 points: figurative meaning (In the example, picture 1).
1 point: literal but likely meaning (In the example, picture 2).0 points: literal and unlikely meaning (In the example, picture 3).
Moreover, an example is given of the three gradations of response related to examples of items provided in Figure 1.

Receptive vocabulary: PPVT-III
The Spanish version of the PPVT-III was used to assess the level of comprehensive vocabulary of the sample (83).It is aimed at children between 2 and 6 and 90 years old.The test items are organized in blocks of 12, each ordered by age.The total score can range between 0 and 192 points.This test has adequate psychometric properties: (a) Internal consistency of items: High reliabilities (minimum of 0.90) were reported for the 25 age groups of the norm sample with median reliability of 0.95; (b) Split half reliability: reliabilities ranged from 0.86 to 0.97 for the standardization age groups for both forms; and (c) Test-retest: corrected coefficients were reported between 0.91 to 0.94 with no difference in magnitude between the two forms (84).

Theory of mind formal measure: TEC
The Test of Emotion Comprehension is a formal measure of emotional understanding for children between 3 and 11 years of age (85), and it was used to obtain a social cognition measure of the children.In this sense, the ability to understand emotions is linked with social cognition because understanding others' emotions requires understanding the significance of the relations of other people with their goals and context (86).The Spanish version of the test is currently in the validation phase.Thus, to conduct the present study, one of the authors of the TEC provided the authors with the Spanish version of the instructions adapted by professors Carlos Hernández Blasi and Francisco Pons.
This instrument allows the formal assessment of nine components of emotion understanding: recognition of emotions (component 1), external causes (component 2), emotions based on desires (component 3), emotions based on beliefs (component 4), emotions based on memories (component 5), regulation of emotions (component 6), hiding emotions (component 7), mixed emotions (component 8), and moral emotions (component 9).The TEC raw score ranged from 0 to 9, and it was obtained by adding the sub-scores for the nine components.The TEC consists of 23 cartoon scenario stories (black-and-white pictures), and it is available in both girl and boy versions.A brief story is read by the examiner first, and then the child is asked to choose the correct facial expression (emotion) for the main character from among four given options of a combination of happy, sad, angry, scared, Examples of PleaseApp items.

Non-verbal reasoning formal measure: Raven's progressive matrices
The Raven's Progressive Matrices test was administered to have a non-verbal reasoning score for each participant (RPM) (89).It is a multiple-choice visual task of abstract reasoning.The test requires the participant to infer a rule to generate the next items in a series or to determine whether a presented design is consistent with the rule.Items become progressively more difficult, building upon knowledge accumulated from the test.Nevertheless, as the age of the sample ranged from 5 to 12 years old, both General and Colored versions of the test were used, depending on the age of the participant.
In this sense, the Colored Progressive Matrices test (CPM) was aimed at children from 4 to 9 years of age; therefore, it was used to assess non-verbal reasoning and learning potential from participants in our sample aged between 5 and 9 years.It contains 36 items, so raw scores range between 0 and 36 points.The Standard Progressive Matrices (SPM) is aimed at people from 9 to 70 years of age, and it has 48 items, so raw scores range between 0 and 48 points.Hence, it was used to assess non-verbal reasoning and learning potential from participants in our sample from 10 to 12 years of age.
The standardization study generated a value of 0.80 in test-retest reliability (90).

Data analysis
First, the psychometric properties of PleaseApp are presented.For this purpose, confirmatory factor analysis (CFA), reliability analysis, and external (convergent) validity tests were performed.The data were found to follow a normal distribution (Z = 0.072; p = 0.053).
To empirically examine the factor structure of the test, a confirmatory factor analysis (CFA) based on structural equation programming (91,92) was carried out using the AMOS 29 program with a sample of 153 participants.The variances of the latent variables were set at 1.0.The variances of the error terms were specified as free parameters.The maximum likelihood (ML) estimation method was used.
The following fit statistics were performed: the Satorra-Bentler chi-square (ꭓ 2 ), the standardized chi-square (ꭓ2/df), the statistical likelihood (p), the root mean square error of approximation (RMSEA), the comparative fit index (CFI), the Tucker-Lewis index (TLI), and the standardized root mean square residual (SRMR).The reliability of the questionnaire is examined using Cronbach's alpha coefficient (93), with factor loadings obtained in the CFA and corrected correlations.Descriptive analyses of each item, such as mean, standard deviation, skewness, and kurtosis, are also calculated.To analyze the correlation among the different levels and the relationship between PleaseApp levels and other related measures (convergent validity), Pearson's correlation is used.
Second, an analysis of the data obtained is presented.For this purpose, to determine the level of the children in each dimension measured, as well as the differences between the variables sex and age, descriptive analyses (mean and standard deviation) and statistical analyses were performed using Student's t-test and analysis of variance (ANOVA) with the corresponding size effects (Cohen's d and Eta squared).Taking into account the ANOVA results, Tukey's post-hoc is calculated.
Finally, the structure that showed better properties was an eightfactor one without an underlying structure for these factors.Table 3 displays the model of each factor.
As can be seen in Table 3, all the indices are in the desirable ranges, indicating a good fit of the models.
The estimated values of the parameters are presented graphically in Figure 2, all being statistically significant.
To find out the relationship between the different levels measured, Table 4 shows the correlations between each of the levels with the total PleaseApp.
As can be seen, narrative ability (NARR) is significantly related to all dimensions except reference skills and humor.

The dimension of gesture-speech integration (GES) correlates with the understanding of indirect speech acts (IND), complex intentionality (INT), and figurative language (FIG). Referencing ability (REF) is significantly related to the understanding of verbal and visual humor (HUM). Comprehension of indirect speech acts (IND) is related to complex intentionality (INT), politeness (POL), and understanding of figurative language (FIG). Complex intentionality (INT) is correlated with politeness (POL) and understanding of figurative language (FIG)
; and politeness is also related to the understanding of figurative language (FIG).Furthermore, as shown in the Table, all dimensions, except humor, are related to the total dimension of PleaseApp.
As can be seen, all levels obtain an adequate alpha value, between 0.704 and 0.81, so that all levels have a satisfactory internal consistency.
Next, for each of the levels, the mean score, standard deviation, skewness, and kurtosis are shown below.The corrected correlations and the reliability of each test are presented using Cronbach's alpha coefficient.Similarly, changes in Cronbach's alpha are estimated if any item is eliminated (see Table 5).
As can be seen, all the correlation values between the item and the corrected total (eliminating the item, corrected homogeneity index) exceed the value of 0.30, and there is no improvement in the Cronbach's alpha value if any item is eliminated; therefore, we cannot distinguish any weak item in the test.

Convergent validity
To analyze the content validity of the different levels of PleaseApp, the participant sample was administered other instruments that assess factors associated with the development of pragmatics, such as non-verbal reasoning, structural language (grammar and vocabulary), and theory of mind (different components of emotion comprehension).Moreover, an existing pragmatic subtest of a Spanish Language Battery was also administered as a direct similar measure test of pragmatics.The correlations between these tests and the PleaseApp levels are presented in Table 6.
First, in relation to the non-verbal reasoning measure, a positive and strong correlation of the scores was observed between the participant's performance in general (PleaseApp) and the scores in both CPM and SPM.Table 6 also shows the specific correlations of each level with this non-verbal measure.In this sense, the data in the first two columns indicate that performance on all levels, except gesture-speech integration (GES), was correlated with participants' non-verbal reasoning skills.
Second, in relation to structural language measures, a positive and strong correlation of the scores obtained by the participants in the tool in general (PleaseApp) was observed with both the grammatical measure (CEG) and the vocabulary measure (Peabody).Table 6 also shows the specific correlations of each level with both receptive measures of structural language.In this regard, the data in the third and fourth columns indicate that all levels, with the exception of reference (REF) and humor (HUM), were correlated with participants' structural language skills.
Similarly, a positive and strong correlation of the scores obtained by the participants in the tool in general (PleaseApp) was also observed with the theory of mind measure (TEC).Table 6 also shows the specific correlations of each level with the theory of mind measure.In this regard, the data in the fifth column indicate that all levels, with the exception of reference (REF) and humor (HUM), were correlated with participants' theory of mind skills.
Finally, a strong positive correlation between the scores obtained by the participants on the tool in general (PleaseApp) and the pragmatic measure of BLOC was observed.Table 6 also shows the specific correlations of each level with the BLOC expressive pragmatics measure.In this regard, the data in the last column indicate that all levels, with the exception of gesture-speech integration (GES), reference (REF), and humor (HUM), were correlated with participants' expressive pragmatic skills.

Analysis of demographic differences (age and sex)
To analyze the differences in the different levels by sex, a comparison of means for independent samples was carried out.The data are presented in Table 7.
As can be seen, in general, no significant differences were found between boys and girls.Specifically, significant differences were found in the reference level, although the small effect size should be taken into account (d = −0.44).
Regarding the age of the children, Table 8 shows the means and standard deviations by the eight age ranges, both in the general measure PleaseApp and in the different levels.
Thus, to know if there were differences in the scores obtained according to the age of the participants, an ANOVA was carried out.The data indicate that there were differences according to the age of the participants in six out of eight levels (NARR: F = 13.06,p < 0.001, η 2 = 0.39; GES: F = 3.01, p = 0.006, η 2 = 0.12; IND: F = 3.19, p = 0.004, η 2 = 0.13; INT: F = 3.36, p = 0.002, η 2 = 0.14; POL: F = 7.155, p < 0.001, η 2 = 0.25; FIG: F = 12.72, p < 0.001, η 2 = 0.33), except in the reference  Similarly, significant differences by age were found in the total measure PleaseApp (F = 10.413,p < 0.001, η 2 = 0.33).Table 9 presents the means and standard deviations by age groups.In this sense, to find out whether there were significant differences in means between the different age groups in the different levels measured in PleaseApp, a post-hoc analysis (Tukey) was conducted.Results showed that, for each level, there is a significant jump at different ages.

Discussion
The aim of the present study was to evaluate the psychometric characteristics of the digital tool PleaseApp as a formal assessment of receptive pragmatics in a sample of primary school children and to analyze the data obtained, taking into account the age and sex of the participants.
The field of study of pragmatics has been supported by different professional disciplines and theoretical approaches, as well as the clinical implications for the assessment of pragmatic skills in different populations with NDD.This fact motivated the authors to develop PleaseApp as a digital tool for the formal assessment of receptive pragmatic skills.The development of PleaseApp included different phases.In the first phase, the different pragmatic components were decided based on previous scientific evidence.For this purpose, on the one hand, several theoretical and empirical studies on the developmental milestones of pragmatic skills in typically developing children were reviewed, as well as the scientific procedures for the assessment of these skills.On the other hand, we also reviewed the specific needs detected in populations with different NDD that have primary language and/or communication difficulties, above all those coming from ASD and DLD samples and also those coming from other disorders who also suffer secondary pragmatic impairments such as children with ADHD.
PleaseApp is composed of eight levels, with item variation within each level: (1) narrative (cinema level), which assesses the child's ability to order sequences of a story (mechanical, behavioral, and mentalistic); ( 2) politeness (train level), which assesses the ability to identify correct politeness formulas in a specific situation; (3) reference (kitchen level), which assesses the understanding of the optimality of reference expressions and when an expression is under informative); (4) indirect speech acts (school level), which aims to assess the ability to detect indirect communicative intentions and hints; (5) complex intentionality (beach level), which assesses the understanding of masked communicative intentionality, specifically the detection of intentionality/non-intentionality, the use of irony (sarcasm, questions, assertions, and exaggerations), and the valence of the intention (positive/negative); (6) gesture-speech integration (circus level), which assesses the ability to understand meanings when a speaker integrates iconic gestures with speech; (7) humor (TV level), which assesses comprehension of humor, specifically the detection of incongruities (visual/verbal), and type of humor (puns, semantic analogy, and jokes that include theory of mind); and (8) figurative language (zoo level), which assesses comprehension of non-literal language, specifically metaphors (novel and conventional) and similes.
After testing the structure of two models, one with eight firstorder factors and one second-order factor and a second model with eight interrelated factors, these did not yield acceptable fits.Therefore, an eight-factor model without an underlying structure to these factors was the one that showed better properties.The eight tests that makeup PleaseApp have obtained a model with adequate adjustments and with adequate reliability and validity indexes, so it is considered an adequate evaluation instrument to measure these eight aspects.
The significant correlations between all these dimensions with the variable PleaseApp (total score), except for humor, indicated the existence of an association between the components assessed with pragmatics.Nevertheless, in relation to the humor component, there may be different explanations for why it was not correlated with the total score of PleaseApp.First, this component includes both visual and verbal items to grasp the funny meaning, and perhaps only verbal ones are more related to pragmatic competence but not the total visual ones.Second, it is observed that from all the related measures, it only correlates with non-verbal reasoning (and in children from 9 to 12 years old, since this correlation is only observed in the group in which the Standard Progressive Matrices have been used).Third, as happens in the reference component, age does not contribute to differentiating the abilities of the sample since the means are very similar in the different age groups assessed.In this sense, it seems that, on the one hand, humor is a skill that is more related to the ability to find creative solutions to a problem (especially from a visual level), which relies more on the pragmatic skills of the subject when the joke includes verbal information.Future studies on each specific component should study this issue in depth, considering both item variation of the component and its association with different cognitive, linguistic, and social aspects related to pragmatics.
Similarly, significant correlations were observed between the PleaseApp components with non-verbal reasoning, structural language (i.e., grammar and vocabulary), and theory of mind as related factors associated with the development of pragmatics.Moreover, a significant correlation was observed between PleaseApp and an independent pragmatic standardized formal measure.Therefore, the relationship between the aspects evaluated in PleaseApp and pragmatics abilities measured externally is confirmed.However, it must be noted that the TEC measure was used as a formal measure of theory of mind and that this measure tests nine different components of emotional understanding (including complex aspects such as morality), and its raw score was obtained by adding the sub-scores (ranging from 0 to 9).In this sense, although it is a good measure that covers different periods of development of the affective theory of mind skills, the range of scores that it offers is not as wide as the other instruments used in the present study to offer a complete profile of social cognition of the participants.So, future studies should study the relationship of the pragmatic subcomponents of PleaseApp  with more measures of social cognition to map the whole competence (94) to better conceptualize existing impairments in the different neurodevelopmental conditions (95,96).
Except for the reference level, no significant differences were found taking into account the sex of the participants, although it should be noted that girls obtained higher mean levels in the different dimensions.In contrast, there are significant differences according to the age of the participants in all levels except for reference level (kitchen) and humor level (TV).In this regard, means of the different age groups showed a progressive increase in all the levels.
One of the main contributions of PleaseApp to the research and clinical community is the provision of an evidence-based, comprehensive approach to multiple skills linked to pragmatic development in children from 5 to 12 years of age.It allows the assessment of pragmatic ability in eight areas with direct measurements, determining a general idea of age-appropriate    interlocutor's intention [e.g., (31,48)].In this sense, it is a tool that will help future studies assessing neurodivergent children to state strengths and difficulties in different aspects of pragmatics and to study in depth its association with different aspects of the disorder (primary or secondary).Finally, another contribution is that it is a receptive formal measure that will allow us to triangulate information with other ecological and formal expressive instruments available in our context [e.g., CCC-2) (63); CORP (97); or CELF-5 (64)].
PleaseApp has also been designed to motivate children when they are being assessed.In this sense, the different levels have been contextualized in familiar environments and activities for children to provide a functional and meaningful environment for assessment.Moreover, PleaseApp is a digital tool created digitally from its origin, that is, it is not an adaptation of an existing tool, although it is based on the study of different tasks and manuals, books, and existing tests of pragmatics for the theoretical preparation of its structure and its items.In this sense, it facilitates the assessment and the correction by professionals, and its digital features make it appropriate for the assessment of both typically developing children and children with NDDs, as has been demonstrated in other studies (62, 63), above all to compensate for cognitive and linguistic aspects when being assessed such as oral expression, language comprehension, attention, speed processing, imagination, or working memory, among others.
Therefore, PleaseApp is an easy tool to use based on scientific background, and it has adequate psychometric properties to be implemented in different contexts, such as clinical health, social health, and education.Future research could address the integration of other pragmatic skills, such as metapragmatics in conversation, new lexical contextual inference ability, or quantifier comprehension.
Furthermore, the success of the communicative situation or dialog not only depends on the pragmatic receptive skills as a listener but also on the expressive skills of the speaker (25,58).In this sense, expressive formal measures would complete a picture of the actual strengths and difficulties a child has in the area of pragmatics.In this sense, although PleaseApp allows to offer communicative contexts with multimodal details as similar as possible to real situations, it is not as ecological as real communicative contexts, and a professional must match the information obtained with other pragmatic ecological assessments (e.g., observational measures, interviews to caregivers, or questionnaires) to have a better picture of the actual strengths and difficulties of pragmatic behaviors of children in real interactions with people.As a general conclusion, the importance of the present results for research is also highlighted because this is the first empirical study that measures different pragmatic components in the same typically developing sample of 5-to 12-year-olds.In addition, evidence is provided that the PleaseApp has adequate psychometric properties and, therefore, can be used reliably and validly in the assessment of these pragmatic aspects in primary school children with NDD.
0 points: Incorrect (Another combination, e.g., pressing the orange).Indirect speech acts (IND) School.Decide what to respond when non-explicit information is given to you.
2 points: the correct indirect speech act is inferred (Do you want me to bring you a book?). 1 point: a different indirect speech act is inferred (You can buy another one with wheels).0 points: a literal meaning is inferred (I think it will weigh about 10 kilos).Complex intentionality (INT) Beach.Decide why characters are saying these sentences.Intentionality/non-intentionality (e.g., faux pas, mistakes, confusions, errors versus lies or white lies) (9, 23), irony (sarcasm, questions, statements, exaggerations) (76), and the moral valence of the main intention of the speaker (positive or negative) (38).Girl: "Do you have any ice cream left?. " Boy: "No, I do not."Question: Why does he say "No, I do not." 2 points: the correct intention is inferred (Because he does not want to give it to her.). 1 point: another intention is inferred (Because he has forgotten he has it.).0 points: response is based on literal or incoherent aspects (Because he has already eaten it).sentence.Grammatical category: actions and objects (18, 19).Ballerina: "While I was cutting things for the show." 2 points: correct object (scissors).1 point: semantic competitor (knife).0 points: gesture competitor (clothespin).Humor (HUM) TV.Detect the inconsistency by selecting the visual or verbal element that makes the image funny.

FIGURE 2
FIGURE 2Diagram of the 8 models that compose PleaseApp with standardized parameter values.

TABLE 1
Definition of each subtest (skill and level) of PleaseApp and the variation of its items is presented.

TABLE 2
Age and sex descriptives of the sample.

TABLE 3
Values of the indices used to evaluate the fit of the models of the eight factors that make up PleaseApp., chi-square; df, degrees of freedom; p, overall model significance; ꭓ2/df, normalized chi-square; CFI, comparative fit index; TLI, Tucker-Lewis index; RMSEA, mean square error of approximation; SRMR, standardized root mean square residual. χ2

TABLE 4
Correlations (Pearson's r) between the eight levels with each other and with PleaseApp.

TABLE 5
Descriptive statistics of the items and item-total correlation of each of the test.

TABLE 5 (
Continued) pragmatic skills of the child in comparison to the sample of reference, and also regarding the strengths and weaknesses in each component.Moreover, it is possible to find out relevant aspects related to each component that will allow us to disentangle different difficulties in different disorders (e.g., processing theory of mind content in narrative skills and the rest of the areas assessed), as their variation of items according to the theoretical basis, and also is possible to find out what type of mistakes children are having when using their pragmatic skills (e.g., literal interpretation vs. incoherent interpretation).Coding participants' responses for pragmatic appropriateness is important to investigate what type of inference the person is making, and to what extent they benefit from contextual cues or understanding of the

TABLE 6
Correlations (Pearson's r) between PleaseApp levels and PleaseApp total score, with direct measures of non-verbal reasoning, vocabulary, grammar, theory of mind, and pragmatics.

TABLE 7
Differences by sex of the participants in the different levels.

TABLE 8
Means and standard deviation in PleaseApp and each level by age group.

TABLE 9
Differences between each of the ages at all levels.