Targeted next-generation sequencing for genetic variants of left ventricular mass status among community-based adults in Taiwan

Background: Left ventricular mass is a highly heritable trait. Previous studies have suggested that common genetic variants are associated with left ventricular mass; however, the roles of rare variants remain unknown. We performed targeted next-generation sequencing using the TruSight Cardio panel, which provides comprehensive coverage of 175 genes with known associations to 17 inherited cardiac conditions.
Methods: We conducted next-generation sequencing on the Illumina TruSight Cardiomyopathy Target Genes platform in community-based participants with left ventricular mass at the 5% and 95% extremes. After removing samples with poor sequencing quality, including those with a call rate <98% or Mendelian errors, 144 participants were included in the analysis. Downstream analysis included quality control, alignment, coverage assessment, and annotation; after filtering for depths of more than 60, a total of 144 samples and 165 target genes remained for further analysis.
Results: Of the 12,287 autosomal variants, most had minor allele frequencies of <1% (rare variants), and some had minor allele frequencies ranging from 1% to 5%. In the multi-allele variant analyses, 16 loci in 15 genes were significant at a false discovery rate of less than 0.1. In addition, gene-based analyses using continuous and binary outcomes showed that three genes (CASQ2, COL5A1, and FXN) remained associated with left ventricular mass status. One single-nucleotide polymorphism (rs7538337) was enriched for the CASQ2 gene expressed in aortic artery (p = 4.6 × 10^-18), as was another single-nucleotide polymorphism (rs11103536) for the COL5A1 gene expressed in aortic artery (p = 2.0 × 10^-9). Among the novel genes discovered, CASQ2, COL5A1, and FXN are within a protein–protein interaction network with known cardiovascular genes.
Conclusion: We identified candidate genes associated with left ventricular mass. Further studies to characterize the functional mechanisms of the target genes and variants are warranted.


Downstream analyses and related quality control procedures
We re-called variants for each sample using the target-gene information and merged the individual calling results into a single matrix (n samples × m variants; two sub-steps in GATK4). We then filtered the variants according to the following criteria: variants with DP >= 60 were kept, and missing data were replaced with the reference. The VCF was then split into two files (SNPs and indels). Finally, we annotated the variants and converted them to a snpMatrix (ATCG type) for further biostatistical analyses.
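The filtering, missing-data replacement, SNP/indel split, and ATCG conversion described above can be illustrated with a minimal Python sketch. This is not the pipeline used in the study: the `Call` record, the function names, and the toy data are hypothetical, the depth filter is assumed to apply to a site-level DP value, and only biallelic diploid sites are handled.

```python
from dataclasses import dataclass, replace

MIN_DEPTH = 60  # DP threshold stated in the text


@dataclass
class Call:
    """Hypothetical, simplified record for one site in a merged call set."""
    chrom: str
    pos: int
    ref: str
    alt: str
    depth: int        # assumed to be a site-level DP value
    genotypes: list   # one string per sample, e.g. "0/1"; "./." means missing


def passes_depth(call, min_depth=MIN_DEPTH):
    """Keep a site only if its depth reaches the threshold."""
    return call.depth >= min_depth


def fill_missing_with_reference(call):
    """Return a copy in which missing genotypes are set to homozygous reference."""
    filled = ["0/0" if g in ("./.", ".|.", ".") else g for g in call.genotypes]
    return replace(call, genotypes=filled)


def is_snp(call):
    """A biallelic site is treated as a SNP when both alleles are single bases."""
    return len(call.ref) == 1 and len(call.alt) == 1


def to_base_pairs(call):
    """Convert genotype codes to base pairs, e.g. '0/1' -> 'AG' for ref A, alt G."""
    alleles = [call.ref, call.alt]
    row = []
    for g in call.genotypes:
        a, b = g.replace("|", "/").split("/")
        row.append(alleles[int(a)] + alleles[int(b)])
    return row


def process(calls):
    """Filter by depth, fill missing genotypes, and split into SNPs and indels."""
    kept = [fill_missing_with_reference(c) for c in calls if passes_depth(c)]
    snps = [c for c in kept if is_snp(c)]
    indels = [c for c in kept if not is_snp(c)]
    return snps, indels


if __name__ == "__main__":
    # Toy data, not taken from the study
    toy = [
        Call("1", 100, "A", "G", 75, ["0/1", "./.", "1/1"]),
        Call("1", 200, "C", "T", 40, ["0/0", "0/1", "0/0"]),
        Call("2", 300, "AT", "A", 90, ["0/1", "0/0", "./."]),
    ]
    snps, indels = process(toy)
    for call in snps:
        print(call.chrom, call.pos, to_base_pairs(call))
```

Replacing missing genotypes with the reference keeps the matrix complete, but it also treats uncalled sites as reference homozygotes, so the choice affects allele frequency estimates.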
The following bioinformatics procedures were used to assess the quality of the study samples. Quality control: The data quality was good, as shown by sequencing depths of more than 30 (Figure S2). The FASTQC adapter content plot showed a good pattern (Figure S3), and the mean FASTQC quality score was good (Figure S4). We also checked the per-sequence GC content (%) of our samples, and the results were acceptable (Figure S5). We checked the depths of the study samples and found that the mean sequence quality by Phred score was good (Figure S6).
The quality-control summary for the study samples, given as the average values of various indicators, is as follows (Table S2): the average Q30 was 86.44, the average proportion of duplicated reads was 62.31%, and the average GC content was 46.86%, indicating good quality. The proportions of Q30 values in the study samples are shown in Figure S7. In addition, Figure S8 lists the total sequences of the study samples, which ranged from 5.0 × 10^6 to 2.0 × 10^7, indicating that the data provided sufficient sequences.
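For readers unfamiliar with these indicators, the sketch below shows how a Q30 percentage and a GC percentage can be computed from read data. It assumes the common Phred+33 quality encoding, under which a score of 30 corresponds to an estimated base-call error probability of 0.001. The function names and toy reads are hypothetical, and this is not the tool used to produce Table S2.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_string]


def q30_percent(quality_strings, offset=33):
    """Percentage of bases whose Phred score is at least 30."""
    total = 0
    high = 0
    for quality in quality_strings:
        scores = phred_scores(quality, offset)
        total += len(scores)
        high += sum(1 for s in scores if s >= 30)
    return 100.0 * high / total if total else 0.0


def gc_percent(sequences):
    """Percentage of G and C bases across all reads."""
    total = sum(len(s) for s in sequences)
    gc = sum(s.upper().count("G") + s.upper().count("C") for s in sequences)
    return 100.0 * gc / total if total else 0.0


if __name__ == "__main__":
    # Toy reads, not taken from the study
    reads = ["GGCCAATT", "ACGTACGT"]
    qualities = ["IIII5555", "IIIIIIII"]
    print(f"Q30: {q30_percent(qualities):.2f}%")
    print(f"GC:  {gc_percent(reads):.2f}%")
```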
With regard to alignment, our study samples showed excellent alignment rates (>90%) (Figure S9). In addition, we checked the coverage using the average reads and the depths of the target genes and variants (Figure S10) and found that the average read depths were sufficient for further analysis. In selecting the variants, we set the filtering criterion as depth (DP) >= 60. Among the 145 samples, the total number of raw detected variants was 2,514,185, the total number of remaining variants was 55,294, and the total number of remaining SNPs was 12,842. No SNPs remained within four genes (HRAS, KCNA5, ACTA1, and ACTC1) under this filtering criterion (DP >= 60). We used the following annotation databases: refGene, avsnp150, ljb26_all (includes PolyPhen2 and SIFT), cosmic70, and exac03.
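The effect of the depth filter reported above can be summarized with a short calculation, and the check for target genes left without any SNP can be sketched in the same way. The function names and the three-gene toy panel are hypothetical; only the three counts come from the text.

```python
def retention_percent(retained, raw):
    """Percentage of raw variant calls that survive filtering."""
    if raw <= 0:
        raise ValueError("raw count must be positive")
    return 100.0 * retained / raw


def genes_without_variants(target_genes, retained_variant_genes):
    """Target genes (in panel order) that have no retained variant."""
    covered = set(retained_variant_genes)
    return [gene for gene in target_genes if gene not in covered]


if __name__ == "__main__":
    # Counts reported in the text
    raw_variants = 2_514_185
    retained_variants = 55_294
    retained_snps = 12_842
    print(f"{retention_percent(retained_variants, raw_variants):.2f}% of raw variants retained")
    print(f"{retention_percent(retained_snps, retained_variants):.2f}% of retained variants are SNPs")

    # Toy panel and gene labels, for illustration only
    panel = ["HRAS", "TTN", "CBS"]
    genes_of_retained_snps = ["TTN", "CBS", "TTN"]
    print(genes_without_variants(panel, genes_of_retained_snps))
```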

Variant identification and SNP matrix file
We checked the VCF file for the distributions of SNPs, MNPs, insertions, deletions, indels, and missing genotypes, as well as the SNP transition/transversion ratio and the total heterozygous/homozygous ratio in the study samples, and the results were acceptable.
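As an illustration of two of the summary statistics named above, the following sketch computes a transition/transversion ratio and a heterozygous/homozygous ratio. Several choices here are assumptions rather than details from the study: the homozygous count includes only non-reference homozygotes, missing genotypes are skipped, and the function names and toy inputs are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """True for purine<->purine or pyrimidine<->pyrimidine substitutions."""
    pair = {ref.upper(), alt.upper()}
    return len(pair) == 2 and (pair <= PURINES or pair <= PYRIMIDINES)


def ts_tv_ratio(snps):
    """Transition/transversion ratio for an iterable of (ref, alt) base pairs."""
    ts = 0
    tv = 0
    for ref, alt in snps:
        if ref.upper() == alt.upper():
            continue  # not a substitution
        if is_transition(ref, alt):
            ts += 1
        else:
            tv += 1
    return ts / tv if tv else float("inf")


def het_hom_ratio(genotypes):
    """Heterozygous / homozygous non-reference ratio; missing calls are skipped."""
    het = 0
    hom_alt = 0
    for genotype in genotypes:
        parts = genotype.replace("|", "/").split("/")
        if len(parts) != 2 or "." in parts:
            continue  # missing or non-diploid genotype
        a, b = parts
        if a != b:
            het += 1
        elif a != "0":
            hom_alt += 1
    return het / hom_alt if hom_alt else float("inf")


if __name__ == "__main__":
    # Toy inputs, not taken from the study
    print(ts_tv_ratio([("A", "G"), ("C", "T"), ("A", "C")]))
    print(het_hom_ratio(["0/1", "1/1", "0/0", "./.", "0|1"]))
```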
Therefore, a total of 145 samples with 12,842 variants were annotated successfully, and we provided one SNP matrix file containing the variant information. Figure S14 shows summary histograms of the variant classification, variant types, SNV class, the distribution of variants per sample, and the top 10 mutated genes in the study samples; the results were acceptable.
Next, we performed the annotation procedures, and the annotation of the variants in the study samples was successful for each target gene. Figure S15 shows the oncoplot, and the onco-strip figures of the various functional mutations of the target genes are shown in Figure S16; the results were good. We provided a genecloud plot to show the proportions of the target gene variants in the study samples (Figure S17).
After the bioinformatics processing, we genotyped the 175 genes and a total of 12,287 variants; the target genes and the related frequencies and percentages among the study participants are listed in Table S1. The frequency of variants ranged from 0.01% in the CBS gene to 5.8% in the TTN gene. Figure S19