Comprehensive Analysis of Rare Variants of 101 Autism-Linked Genes in a Hungarian Cohort of Autism Spectrum Disorder Patients

Background Autism spectrum disorder (ASD) is genetically and phenotypically heterogeneous. Former genetic studies suggested that both common and rare genetic variants play a role in the etiology. In this study, we aimed to analyze rare variants detected by next generation sequencing (NGS) in an autism cohort from Hungary. Methods We investigated the yield of NGS panel sequencing of an unselected ASD cohort (N = 174 ) for the detection of ASD associated syndromes. Besides, we analyzed rare variants in a common disease-rare variant framework and performed rare variant burden analysis and gene enrichment analysis in phenotype based clusters. Results We have diagnosed 13 molecularly proven syndromic autism cases. Strongest indicators of syndromic autism were intellectual disability, epilepsy or other neurological plus symptoms. Rare variant analysis on a cohort level confirmed the association of five genes with autism (AUTS2, NHS, NSD1, SLC9A9, and VPS13). We found no correlation between rare variant burden and number of minor malformation or autism severity. We identified four phenotypic clusters, but no specific gene was enriched in a given cluster. Conclusion Our study indicates that NGS panel gene sequencing can be useful, where the clinical picture suggests a clinically defined syndromic autism. In this group, targeted panel sequencing may provide reasonable diagnostic yield. Unselected NGS panel screening in the clinic remains controversial, because of uncertain utility, and difficulties of the variant interpretation. However, the detected rare variants may still significantly influence autism risk and subphenotypes in a polygenic model, but to detect the effects of these variants larger cohorts are needed.


INTRODUCTION
Autism spectrum disorder (ASD) is a neurodevelopmental disorder, characterized by the core symptoms of impaired social communication, restricted interests and stereotyped, repetitive behavior (Bourgeron, 2015). ASD has an estimated heritability of 64-91% (Woodbury-Smith and Scherer, 2018), suggesting a strong genetic effect, but the genetic background is highly heterogeneous (Vorstman et al., 2017). Common risk variants and rare variants both play a role (Bourgeron, 2015), and mutation types range from single nucleotide variants to large chromosomal aberrations, as well as variations in regulatory DNA elements (Turner et al., 2016). The number of genes implicated in ASD pathogenesis is >1,000 according to the SFARI database 1 , which converge on different cellular pathways (Krishnan et al., 2016). A shared genetic environment is possible with other psychiatric disorders as well, especially with schizophrenia (Brainstorm Consortium et al., 2018).
The clinical expression of ASD is nevertheless varied, despite the common umbrella term (Chaste et al., 2015). This high phenotypic variability, however, is mirrored poorly in the everyday clinical diagnostics of ASD. Indeed, DSM-5 now contains a single autism spectrum (Lord et al., 2018). This might be beneficial in the clinic but doesn't serve the analysis of the genetic-phenotypic background. Earlier attempts to identify ASD subtypes or endophenotypes based on clinical features have met with limited success, and there were attempts to define subtypes based on genetics, which led for example to the identification of ASD with CHD8 mutation subtype (Bernier et al., 2014). According to the classical definition, the syndromic ASD is a "disorder with a clinically defined pattern of somatic abnormalities and a neurobehavioral phenotype that may include ASD, " however, this is only present in 4-5% of cases (Fernandez and Scherer, 2017). Instead, if we move to a molecularly defined approach, in 25% of the cases there is a detectable strong genetic change, which may be even higher if multiple MPA are present (Fernandez and Scherer, 2017).

MATERIALS AND METHODS
In this study, we aimed to comprehensively analyze rare single nucleotides, and small INDEL variants in candidate ASD genes in a Hungarian ASD cohort, with NGS. Specifically, we tried to answer four questions in two hypothesis frameworks. On a case-by-case level: (1) How many patients can be detected with a probably strong acting variant (syndromic cases or Monogenic disorder) by targeted NGS gene sequencing? On a cohort level: (2) Is the detected number of variants in the tested genes significant? Does this confirm the pathogenicity status of the given genes? (3) Does rare variant burden associate with autism severity? (4) Do rare variants associate with any autism subphenotype? 1 https://gene.sfari.org/

Patients
Autism spectrum disorder patients were recruited from the Vadaskert Child and Adolescent Psychiatry Hospital and Outpatient Clinic. Detailed clinical examinations consisting of a general medical examination and neurological assessment (NÁV, PB) were performed. A diagnosis of ASD was made by a qualified psychologist (CSP) using the ADI-R (Autism Diagnostic Interview-Revised) and ADOS (Autism Diagnostic Observation Schedule). Patients were screened for MPA, which were selected based on the Méhes Scale (Méhes, 1986). Family history and detailed environmental/societal data were collected from the parent (or parents) of each patient. Any disorders present in the parents, as well as environmental factors, were registered. The diagnosis of ASD was based on the standardized ADI-R in Hungarian, which was published by the Autism Foundation (Kapocs Publisher), according to the following scores: A ≥ 10 (social interaction), B ≥ 7 (communication), C ≥ 3 (repetitive stereotype manner), D ≥ 1 (abnormal development under 36 months).

Genetic Analysis
DNA was isolated from peripheral blood samples from all participants using the QIAamp DNA blood kit (Qiagen, Hilden, Germany) according to the manufacturer's instructions. We performed Fragile X-screening at every patient using the Amplidex FMR1 PCR kit (Asuragen, Inc., Austin, TX, United States). Since Fragile-X-syndrome is a well-known cause of syndromic autism, and a unique molecular pathomechanism is probable, patients having FMR1 gene alteration were excluded from all subsequent genetic analysis and phenotypic cluster analysis. The 101 ASD-associated genes (Betancur, 2011) were investigated with NGS, which was performed on a MiSeq (Illumina, San Diego, CA, United States) using the TruSight Autism Rapid Capture Kit (Illumina, San Diego, CA, United States) and the SureSelect QXT Kit (Agilent Technologies, Santa Clara, CA, United States) according to the manufacturer's instructions. In the autism panel, 24 samples were multiplexed in a single run using the MiSeq reagent kit v2 and 300 cycles (Illumina, San Diego, CA, United States). The mean read depth was 135× in the gene panel, 20× coverage was achieved in a minimum of 90% of target regions. Pathogenic and likely pathogenic mutations from NGS data were validated by Sanger sequencing. Parents were Sanger sequenced for specific variants in selected cases, where syndromic ASD was suspected based on the sequencing results of the index case, and the parent was available for genetic analysis.

Bioinformatic and Statistical Analysis
Raw sequences were filtered with Picard tools (version-2.1.0) 2 and quality filtered reads were aligned to the hg19 reference genome with BWA-mem (Li and Durbin, 2009) using default parameters. Variant calling was performed using GATK HaplotypeCaller (version 3.3-0) (McKenna et al., 2010) and VCF files were annotated with SnpEff (version 4.1) (Cingolani et al., 2012). We analyzed only those variants that were found in the canonical transcripts of the gene. Variant quality was assessed by GATK, and only variants, which were flagged as PASS (RD >10, Mapping quality >40, quality by depth >2) were analyzed.
To filter potentially causal single-gene Mendelian variations on a case-by-case level, we used the VariantAnalyzer software developed at the Budapest University of Technology and Economics. This software application annotates SNPs and short INDELs with several types of annotations, such as their predicted function on genes using SnpEff, observed allele frequencies in several genomic projects including the 1000 Genomes Project and the ExAC, conservation scores based on PhyloP, PhastCons, GERP, predicted function of non-synonymous SNPs using dbNSFP, and disease associations with HGMD and ClinVar. By creating filter cascades based on these annotations and other information (e.g., genotypes and variant quality annotations), the software can easily be used to filter the variants through a user-friendly graphical interface. First, we filtered for variants known to be disease-causing, using the HGMD (Cooper et al., 1998), and ClinVar database (Landrum et al., 2014). Second, we filtered for rare variants based on the MAF and frequency of the mutation in our NGS data repository of 200 WES. Since large-scale genomic data of the Hungarian population is not available, a mutation with a low MAF may also be populationspecific. We labeled a variant as rare if it was present maximum in one homozygous or two heterozygous samples within our ASD cohort of 174 patients (equal to a MAF-cutoff of ∼5 ), and the MAF in Europeans from the 1000 Genomes and ExAC databases, as well as in our in-house exome database was less than 5%. It is important to note that a limitation of this method is that it may exclude identification of founder mutations and disease-associated polymorphisms. Finally, mutations were prioritized based on their predicted effects. Exonic frameshifts, stop mutations and canonical splice site variants were considered damaging, whereas the effects of missense mutations were predicted using multiple prediction tools: SIFT (Sim et al., 2012), Polyphen2 (Adzhubei et al., 2010), CADD (Kircher et al., 2014), Radial SVM (Dong et al., 2015). The variants were assessed as recommended by the ACMG guideline 2015 (Richards et al., 2015). When siblings were analyzed for Mendelian diseasecausing variants, both of them had to be carrier of the variant in question to be considered.
For the analysis of rare variants in a multifactorial hypothesis framework on a cohort level, the following methods were used.
(1) We tested, whether the total number of detected rare missense or loss of function (stop, canonical splice site, and frameshift) variants in a given gene is greater than expected, with the method described by Rao and Nelson (2018). We filtered rare variants, with a MAF cut-off of 5% in the 1000 Genome European dataset, and in our internal exome database of 200 patients. P-value was calculated with the associated software: SORVA 3 . (2) For the calculation of rare variant burden, genes were normalized according to genic intolerance to mutation (Petrovski et al., 2013). Specifically, we used the inverse RVIS percentile [1-(RVIS percentile÷100)] accessed from http://genic-3 https://sorva.genome.ucla.edu/ Parent's education level College or higher at both parents 34% College or higher at one parents 27% High school or lower at both parents 39% The table describes the characteristics of the cohort. Ethnicity is self-reported. M/F is male to female ratio.
intolerance.org/, to give a weight to every gene than summed the number of variants in a given patient [ (variants × weightedgene-score)]. For example, if a given gene was in the 2% percentile with RVIS, that means, that the gene got a weight of 0,98. When performing variant burden analysis we allowed to analyze siblings separately, because different rare variants might contribute to their phenotype, according to our hypothesis. Linear regression was used then to test for correlation between rare variant burden and autism severity, and rare variant burden vs. minor malformation burden. Autism severity was assessed by the total ADOS score, in patients, at whom ADOS was available (N = 47), and also by calibrated severity score using the method described by Gotham et al. (2009). For comparison of rare variant burden in males versus females, and the number of minor malformations in syndromic versus non-syndromic cases two-tailed T-test was used.
(3) For the analysis of rare variant association with potential autism subphenotypes first, we assessed, whether such subphenotypes can be created based solely on the clinical data. We have used our clinical questionnaire containing 149 questions about family history, concomitant diseases, drugs, physical examination (neurologic and screening of minor malformations), and psychological status for cluster analysis. For the phenotypic cluster analysis, given our sample size and the low expected number of clusters, we utilized two kernel-based methods, namely kernel PCA and spectral clustering. Kernel methods have the additional benefit of being non-linear, i.e., able to identify non-linear combinations of clinical variables as relevant features. This requires the definition of a kernel matrix, which can be thought of as a pairwise similarity matrix over the samples, for which we used the mutual information as similarity measure in conjunction with the binarized clinical variables. In accordance with earlier studies, we set the number of clusters in spectral clustering to four. We also investigated the variables characterizing each cluster via computing the relative frequency of the presence of each feature. We performed the kernel PCA with 3 dimensions to visualize the transformed samples and cluster assignments. A natural question to ask is whether we can find a correlation between the subphenotypes and the genetic background, hinting at causal roles of genes in a given phenotype. In our case, the former is given by the clusters resulting from the questionnaire and the latter is given by the detected rare variants aggregated on candidate genes. To assess the correlation between the subphenotypes and genetics, we investigated whether detected rare variants of a candidate gene occur more frequently in either of the resulting clusters using ANOVA and pairwise T-tests, in conjunction with Bonferroni correction for multiple hypothesis testing.

Phenotypic Description of the Cohort
One hundred and seventy-four individual with ASD [45 females and 129 males, median age = 6 years, interquartile range (IQR) = 7] were included in our study, including 5 sibpairs. Clinical-demographic features of the cohort are presented in Table 1.
Minor somatic abnormalities were frequent, at least one minor abnormality was present in 84% of the cohort from the Méhes scale. Histogram of the number of minor malformation per individual and the frequency of different minor malformations are shown in Figure 1. In general, molecularly diagnosed syndromic patients (13 cases, see next section) had on average no more minor malformations compared to non-syndromic cases (average number of minor malformations were 4.9/person in syndromic, and 5.03/person in non-syndromic cases, T-test p = 0.91). Strongest indicators of molecularly provable syndromic autism were ID, epilepsy or other neurological sign (such as ataxia), or a specific constellation of minor abnormalities as in the case of Fragile-X (FXS) and CHARGE syndrome.

Syndromic Forms of ASD
We have diagnosed 13 syndromic autism cases based on genetic findings ( Table 2). Four patients were diagnosed with Fragile X syndrome, detected by screening, but these FIGURE 2 | Family history of the probands. The figure is showing the probands with positive family history. Every row represents a single patient. Light red indicates, when a second-degree relative was affected with a given condition, and dark red indicates, when a first-degree relative was affected. Hierarchical clustering was applied on the heat map for better visualization. ADHD, attention deficit hyperactivity disorder; ASD, autism spectrum disorder; BPD, bipolar disorder; ID, intellectual disability; OCD, obsessive-compulsive disorder.
patients also had phenotypes consistent with FXS besides the definite autism. However, one of them is a girl (P1). In this case, the family history (2 healthy brothers), did not suggest an X-linked inheritance. Overall eight mutations were considered as pathogenic according to ACMG classification, with close phenotype match (four patients with FXS, one patient with Dravet syndrome, one patient with CHARGE syndrome, one patient with Duchenne muscular dystrophy, one patient with atypical Rett syndrome), and clear additional features (ID, epilepsy, muscular dystrophy, and multiple minor malformations) suggesting syndromic autism. The muscular dystrophy at the Duchenne patient (P7) was already known before the NGS study.
One likely pathogenic mutation occurred in the PTPN11 gene. The phenotype in this case (P9) was not typical for Noonan syndrome, however, the detected variant was reported earlier as pathogenic in ClinVar (RCV000037618.3) by a single submitter, the variant is rare (absent from 1000G and ExAc database, single occurrence in the cohort, absent from 200 Hungarian WES), predicted consequently as damaging by multiple prediction algorithms (SIFT, Polyphen, CADD, and MetaSVM), the gene is associated with dominant inheritance, and the variant is located in an evolutionarily conserved region (GERP score 5.13). Segregation analysis was not possible in this family. We detected two likely pathogenic variants in the RELN gene with different phenotypes. A heterozygous RELN variant (NM_005045: p.P672L) was present in a patient (P10) with epilepsy and multiple minor anomalies. This variant is also reported as pathogenic in the ClinVar database (RCV000193679.1). Segregation analysis proved that the variant is inherited from the father, who had no epilepsy, however, incomplete penetrance is possible according to the literature (OMIM:616436). At the case of P11, a de novo variant occurred in the RELN gene. The lissencephaly phenotype is associated with autosomal recessive inheritance according to the databases, but we classified this variant as likely pathogenic instead of a VUS, because of its proven de novo status. To rule out other possible genetic causes, an additional commercial lissencephaly panel testing was also performed (at Centogene, Rostock, Germany) which gave the same result.
Two variants were classified as VUS. At the case of P12, the detected SPAST variant is reported as likely pathogenic in the ClinVar (RCV000199081.1), however, neither the patient nor the mother, who is also a carrier of the variant, have spasticity. In the case of P13, an unusual molecular event occurred. Two single nucleotide variants affected a single codon in the AUTS2 (KIAA0442) gene (as proved by visualizing BAM files, and Sanger sequencing), resulting in p.E814M amino acid change. The phenotype is consistent with the literature (OMIM:615834), however, segregation analysis was not possible in the family. This patient died later as a consequence of a severe epileptic seizure. In total four variants was proved as de novo, and five variant was inherited. At four cases segregation analysis could not be performed in order to determine, whether the variant was de novo or inherited. The graphical representation of detected mutations in syndromic cases are presented 4 in Figure 3.

Cohort Level Number of Rare Variants in the Candidate Gene Panels
The total number of different rare variants (as defined in the section "Materials and Methods"), was 370. Among the 101 candidate gene, 80 genes contained rare variants, and 44 genes contained a rare predicted pathogenic variant (CADD score ≥ 20 OR SVM = damaging). These 44 genes are represented in Figure 4. Loss of function mutations occurred in 8 genes: AUTS2, CHD7, DHCR7, DMD, GNA14, MECP2, SHANK2, and SHANK3 gene. Among these genes, AUTS2, CHD7, SHANK2, and SHANK3 can be considered highly intolerant to functional variants (RVIS < 5 percentile), which suggest a pathogenic role for these genes. A table describing the number of different types of variants (missense, nonsense, frameshift INDEL, and nonframeshift INDEL) in all the investigated genes is provided as Supplementary Table S1.
We have used the software SORVA, with the method described by Rao and Nelson (2018) to assess whether the total number of rare missense and loss of function variants is greater than expected in the tested genes. Significantly greater than expected number of rare variants were detected in 5 of the 101 tested autism-linked genes: AUTS2, NHS, NSD1, SLC9A9, and VPS13B (Table 3). Rare variant burden did not differ between females and males (mean = 3.49 for males and 3.31 for females; p = 0,57). There was 67 individual (38.5%) in the cohort who did not carry any rare, predicted damaging variant in the candidate genes.

Correlation of Rare Variant Burden With Autism Severity
Autism severity was calculated by two methods: total raw ADOS score (available from 47 patients), and calculating calibrated severity score from ADOS raw total as described in the Section "Materials and Methods." For the purpose of correlation analysis, we found that the calibrated severity score was not suitable, because most of the patients, who had ADOS fall into the most severe categories (8-10 score). ADOS raw total scores were better distributed among individuals (Figure 5), however, there was no correlation between ADOS raw total scores and rare variant burden by linear regression analysis (R 2 = 0,0047; p = 0,648) ( Figure 5). There was no correlation between rare variant burden and the number of minor malformations neither (R 2 = 0,003; p = 0,51).

Cluster Analysis and Gene Enrichment in the Phenotypic Clusters
The identified clusters and the most frequent 10 characterizing variables are shown in Figure 6. It can be seen that total 22 feature is involved in the 10 most common features of the clusters, thus significant overlap exists between the clusters. However, there are features, that particularly characterize certain clusters. Cluster_3 might be characterized by severe social disturbances, with prominent speech disturbance and social isolation. Cluster_4 is probably fitting into the ASD with comorbid ADHD category. FIGURE 4 | Genes, which contained rare, predicted pathogenic variants. Gene network representation of the genes, which harbored rare, predicted variants was created with STRING (https://string-db.org/). Edges represent interactions between the genes. It can be seen, that the interaction network contains a set of genes strongly connected (namely involved in nervous system development pathway GO:0007399; p = 3.22e-15), however, this is only suitable for demonstration purposes since the panel selection is a bias in the pathway analysis.
Cluster_2 is characterized by ID plus frequent obsessivecompulsive (OCD) features with uncompensated behavior. Cluster_1 is the hardest to interpret, but one outstanding feature is the presence of BPD in the family and more than one minor physical anomaly. However, gene enrichment analysis did not show a significant overrepresentation of single genes in certain clusters. Syndromic cases were not enriched into a single cluster.

DISCUSSION
According to the literature, the overall heritability of ASD is estimated to be around 64-91% (Tick et al., 2016), suggesting a strong genetic factor in the background, but the underlying genetics seems to be very complex. The current models of autism genetics implicate a miscellaneous genetic environment, where common and rare variants act additively to create a risk for ASD (Weiner et al., 2017). According to Bourgeron's hypothesis, common low-risk alleles set the background risk for ASD, and buffer the effects of rare deleterious variants (Bourgeron, 2015).
While it is estimated that common variants account for 40-60% of the overall liability of ASD (Klei et al., 2012), most genome-wide association (GWAS) studies were until now underpowered, and very few loci have been identified with genome-wide significance (Ramaswami and Geschwind, 2018). In contrast, hundreds of genes were described with rare variants, which significantly increase the risk of ASD, and it is estimated that they account for approximately 10-30% of autism cases The table shows the genes, in which a significantly greater number than expected rare variant occurred. P-value was calculated with the SORVA software (MAF cutoff 5%). We summed the variants (heterozygous or homozygous) in the given gene in unrelated individuals (variant/indiv). In sib pairs, those variants were counted, which were shared (shared_variant/sib pairs). (Fernandez and Scherer, 2017). Examples of evidence are an increased ASD risk with single rare variants of NLGN1 (Nakanishi et al., 2017), VPS13B (Ionita-Laza et al., 2014), GABRB3 (Delahanty et al., 2011), and PTCHD1 (Torrico et al., 2015). However, the contribution of these variants on a cohort level might be small. For example, a large WES study identified the strong risk factor gene CHD8, but pathogenic variants were only present in 15 out of 3,730 ASD cases (0.4%) (Bernier et al., 2014). For this reason, WES studies need very large sample sizes to reach a genome-wide significance (Liu et al., 2013). A possible solution to increase detection rate in WES studies is focusing on genes within shared haplotypes among individuals with ASD (Matsunami et al., 2014) or within regions that have been implicated by common variant GWAS analysis in large case-control cohorts, and collapsing individually rare variants on genes (burden-analysis) (Griswold et al., 2015). In this study we utilized the method and software, developed by Rao and Nelson (2018), to test the gene level burden on a cohort level. This approach allowed us to use a general control group from earlier exome sequencing projects. In order to avoid population-specific variants, we excluded variants over 5% MAF in our in-house exome database. Significantly greater than expected number of rare variants were detected in five genes (AUTS2, NHS, NSD1, SLC9A9, and VPS13B), which confirm their role as a strong risk modifier gene. Due to our low cohort size, however, we could not confirm the roles of other candidates, or well-known ASD risk genes. We also tested, whether rare variant burden is correlated with autism severity, or the number of minor somatic abnormalities, but found no correlation. However, this might also be due to our low sample size.
The change of the view on the AUTS2 gene is highlighting an important issue with rare variant interpretations in a complex disorder, like ASD. As knowledge was gathered, the view of AUTS2 gene changed from an "autism susceptibility candidate gene" to a cause of a syndromic form of ASD ("AUTS2 syndrome"), with frequently associated ID, microcephaly and craniofacial abnormalities (Hori and Hoshino, 2017). We speculate, that this might also happen with other genes too. In our study, we identified loss of function mutations in the GNA14, SHANK2, and SHANK3 genes, which are yet not linked to a characterized syndrome, but they are evolutionary intolerant to functional mutations. This makes genetic counseling very difficult in these cases. Further problems arise by the fact that ACMG interpretation is not suitable for the assessment of these genes (Richards et al., 2015), so there is no standardized framework for pathogenicity interpretation.
Many of the rare variants, detected in ASD studies, are de novo, according to the literature (O'Roak et al., 2011), and might act together with common polygenic risk variants, in a way, where rare variants modulate the ASD phenotype. Weiner et al. (2017) study showed that ASD cases with a strong acting de novo variant also carry a greater than expected polygenic risk, independently from the presence of the de novo variant. However, the presence of a de novo variant associated with an increased risk of co-occurring adverse neurodevelopmental outcomes (delayed walking, seizure, and ID). In our study, four of the 13 variants, linked to a syndromic autism case, were proved to be de novo. However, we could not perform segregation analyses in the whole cohort, for non-syndromic cases, to check whether the detected rare variants are de novo or not, therefore cannot draw further conclusions.
Due to the known challenges of phenotypic heterogeneity and variability of clinical presentation, identifying phenotypic subgroups has been a long-standing goal in ASD research (Prior et al., 1998). Previous studies indicate that objective approaches, such as dimensionality reduction methods (Tadevosyan-Leyfer et al., 2003) and cluster analysis (Verté et al., 2006;Hu and Steinberg, 2009;Hu and Lai, 2013) can be utilized in establishing homogeneous subgroups based on clinical features. In particular, linear methods, such as PCA, k-means and hierarchical clustering have been applied. Studies showed that the resulting clusters indeed characterized by certain symptoms (such as severe language deficits) and correlate with gene expression profiles (Hu and Steinberg, 2009). In this study, we identified four reasonable phenotypic clusters by spectral method, but could not find genes, which are enriched particularly in a given cluster. Syndromic cases were also not enriched into a single cluster.
The main purpose of phenotypic cluster analysis would be to potentially identify homogenous subgroups (endophenotypes), which may be linked to specific genetic background, pathogenesis, perhaps therapeutic approach. The phenotypic subgroups identified in this study show partial overlap with former studies. One of the strongest characterizing feature, which seems to be recurring in more studies is the presence of language impairment. Indeed it is also specified in the DSM-V (ASD with accompanying language disorder). Further distinctive features, which are clearly identifiable are association of ID, attention deficit and hyperactivity, and obsessive-compulsive features. It worth to mention, that many earlier attempts to identify ASD subgroups were mainly based on standard diagnostic questionnaires (Tadevosyan-Leyfer et al., 2003;Hu and Steinberg, 2009;Kienle et al., 2015;Cholemkery et al., 2016). The strength of this approach, that it utilizes existing resources to demonstrate, that subgroups may also be delineate on a spectrum disorder. However, there is a question whether this can provide new information. There FIGURE 5 | Autism severity scores calculated from ADOS, and the correlation between rare variant burden vs. ADOS scores. Panel (A) shows the scatter plot of ADOS raw total scores (blue points), and the calibrated severity scores (red points). Calculated scores were distributed only between 7 and 10, thus raw total scores were used for linear regression analysis (B).
is an opportunity also to use other resources too, such as electronic health records (Doshi-Velez et al., 2014), or brain imaging data (Hrdlicka et al., 2005). In our approach we used parental interview, and data on physical examination. On one hand we also have seen that language impairment, attention deficit, hyperactivity, and OCD features are important distinctive features. On the other hand we identified an additional cluster, where somatic malformation might be an outstanding feature. However, this need to be confirmed in larger samples, and possibly beneficial is a combined method (i.e., using standardized questionnaires together with additional data).
Besides the patients with rare genetic variants acting as a strong risk factor, a small group of ASD is called syndromic, where a single genetic variation (such as FMR1 repeat expansion, PTEN, TSC1, or MECP2 mutation) is considered as causal (Fernandez and Scherer, 2017). It is estimated that approximately 10-25% of individuals with ASD carry a highly penetrant genetic alteration, which might explain the phenotype (Bourgeron, 2015). In our cohort, we identified 13 syndromic patients (7,47% in total and 5,17% if we don't consider Fragile X patients detected by screening). This is somewhat lower than the above-mentioned average, however, we only analyzed single nucleotide variants and small INDELs, which is a limitation of this study. In a recent study, microarray, Fragile X testing and targeted gene panel testing was performed in 100 children with ASD (Kalsner et al., 2018). Copy number variants believed to contribute to ASD risk were identified in 12%, but the gene panel did not increase the diagnostic yield. It has to be mentioned although, that rare variants, not considered as causing a syndrome, might also significantly modify the phenotype (such as IQ, head size) (Pizzo et al., 2018). This again draws the attention to the issue that the clinical interpretation of these variants may significantly differ among laboratories. It has to be mentioned also, that none of the identified syndromic cases was unexpected, in a sense that all of them had additional signs suggesting syndromic autism ab ovo.
Potential clinical translation of genetic diagnosis can be manifold. In the syndromic group it helps to estimate recurrence risk, the prognosis, potential specific comorbidities, and therapeutic approaches. Taking these into account, wellproved syndromic ASD-associated genes need to be investigated at least in the clinically defined ASD-group. The phenotypic clustering can potentially help to correctly identify which group has to be tested. In the group of non-syndromic ASD the indication for genetic investigation is currently less clear. In this group the simultaneous effect of the different common and rare variants should be estimated, which is currently dubious. Polygenic risk scores are emerging as a tool to predict disease risk in multifactorial diseases, but have many potential bias (De La Vega and Bustamante, 2018;Torkamani et al., 2018). Even so polygenic risk score show promise, for example earlier studies showed that ASD polygenic risk score is positively correlated with general cognitive ability (Clarke et al., 2016), or with specific fMRI connectivity pattern (Wang et al., 2017). However, we are far away from everyday clinical utilization.
Certain limitations of the study should be mentioned. The size of the cohort is perhaps the most important one. As we looked specifically for rare variants in a relatively small cohort, the probability of finding variants with very low MAF is low. This could explain our result, that only five gene had significantly more rare variant burden than expected by spontaneous mutation rate, gene size and genic intolerance. This can result in false negative findings. False positive findings may also arise due to small sample size, however, this is mitigated by the fact, that well established ASD-genes were included in the panel. However, this approach has also disadvantages. The low number of total variant detected made impossible to analyze ethnicity and cryptic relatedness from the genetic data. On the other hand selection of a gene panel always contains a bias, since the number of genes linked to ASD is around 1000 according to the SFARI database 5 . However, many of these do not have an associated Mendelian disorder. Finally we did not carry out functional experiments in this study, so cannot distinct clearly between rare non-functional and rare functional variants. We used protein prediction scores to assess the probability of functional impact of a variant.

CONCLUSION
In this study, we performed an analysis of rare single nucleotide and small INDEL variants in a Hungarian ASD cohort, detected by NGS panel testing, in order to identify syndromic autism cases and to assess the contribution of rare variants in formerly established ASD genes on a cohort level. Our study indicates that NGS panel gene sequencing can be useful in dedicated cases, where the clinical picture suggests a clinically defined syndromic autism (i.e., associated ID, epilepsy, neurological signs, a certain pattern of somatic malformation, or positive family history). In this group, targeted panel sequencing may provide reasonable diagnostic yield. However, the necessity of unselected NGS panel screening in the clinic remains controversial, because of uncertain clinical utility, and difficulties of the variant interpretation. The detected rare variants may still significantly influence autism risk and subphenotypes in a polygenic model. However, to detect the effects of these variants large cohorts are needed. As knowledge will increase about the contribution of these rare variants on the phenotype, an individual assessment might also be beneficial in the future for personalized management of patients with ASD. 5 https://gene.sfari.org/

ETHICS STATEMENT
Written informed consent was obtained from the parents of the patient, or over 18 years of age, directly from the patients. This study was performed in accordance with the Helsinki Declaration of 1975 and was approved by the Hungarian Research Ethics Committee (44599-2/2013/EKU).

AUTHOR CONTRIBUTIONS
MM, PB, and NV designed the study. NV and PB performed data collection and physical examination of the patients. KP performed the next generations sequencing. PB and NV analyzed the NGS data. AGé provided the bioinformatical platform to analyze the NGS data and provided bioinformatical support for the study. RB and AGá made the validation by Sanger sequencing and the segregation analysis of the family members. CP performed the neuropsychological testing of the patients. BB, PB, and VM performed the statistical analyses. PB, NV, and MM wrote the manuscript. MM coordinated the research team, led the manuscript preparation, and revised and corrected the manuscript. All authors read and approved the final manuscript.

ACKNOWLEDGMENTS
We are grateful for the children for their participation in this study, and also for their parents. We thank the contribution of the nurses (Margit Kovács and Izabella László), and the Vadaskert Foundation for making possible to conduct this study.

SUPPLEMENTARY MATERIAL
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene. 2019.00434/full#supplementary-material  Table only includes rare variants, as defined in the Section "Materials and Methods." n/a, not applicable. Next generation sequencing is not suitable for detection of repeat expansion events. Repeat expansions were only investigated in the FMR1 gene, in order to diagnose Fragile-X patients.