A Dataset of 26 Candidate Gene and Pro-Inflammatory Cytokine Variants for Association Studies in Idiopathic Pulmonary Fibrosis: Frequency Distribution in Normal Czech Population

Table 1. List of candidate SNPs investigated in the study.

Introduction
Idiopathic pulmonary fibrosis (IPF) is a specific form of chronic, progressive fibrosing interstitial pneumonia with a poor prognosis and a median survival of 2-3 years from initial diagnosis (1,2). Cellular inflammation drives the fibrotic response in the lung and plays a major role in IPF pathogenesis (3). Inflammatory cells (mainly type 2 alveolar epithelial cells) release TGF-β, the key mediator of pulmonary fibrosis, which regulates several profibrotic cytokines/chemokines, their receptors, receptor subunits, and growth factors that induce epithelial-mesenchymal transition (EMT) (3,4). Among the pro-inflammatory and profibrotic cytokines involved in IPF pathogenesis, interleukin (IL)-1 (4), IL-1β (5), IL-4, IL-5 (6), IL-6Rα (4), angiogenic IL-8/CXCL-8 (7), IL-13, its receptor IL-13Rα2 (8), and IL-33 (9) have been implicated in accelerated inflammation and irreversible damage to lung architecture, with loss of the alveolar-capillary barrier basement membrane leading to persistent fibrosis. The genes encoding these factors exhibit nucleotide variation that could affect the severity of immune/inflammatory reactions and the extent of any subsequent dysregulated fibroproliferative activity during disease development. Furthermore, variants in mucin-encoding genes (10-13) and in genes for toll-like receptors (TLRs), the innate immune receptors for pathogen-associated molecular patterns (PAMPs) (14,15), have also been implicated in IPF immunopathogenesis and related to rapid progression of the disease. Investigating these "candidate" gene variants, e.g., in case-control association studies, may therefore provide novel insight into the mechanisms underlying IPF susceptibility and disease outcome and may further help to develop novel diagnostic approaches and, eventually, therapeutic interventions based on genetic information (16).
A candidate gene study typically involves genotyping 5-50 single nucleotide polymorphisms (SNPs) within one or more genes, covering their coding and non-coding/regulatory regions (17). Irrespective of the number of gene variants tested, the conduct, data collection, and transparent reporting of a genetic association study should follow the recommendations of STrengthening the REporting of Genetic Association studies (STREGA) and STrengthening the Reporting of OBservational studies in Epidemiology (STROBE) (16). In case-control studies, knowledge of the frequency distribution of candidate gene loci/variants in the normal (healthy control) population is necessary to determine which gene variants are associated with the disease and/or its clinical course, and such data could also be useful for genetically related populations.
The role of inflammatory and profibrotic mechanisms involving gene variation has been investigated in IPF, and a spectrum of susceptibility gene variants, including those in genes of immune reactions and signaling processes, has recently been reported from both genome-wide association studies (GWAS) and population-based case-control studies (14,18,19), performed mostly in US Caucasians and also in some other ethnicities. The nominated gene variants, summarized in Table 1, have different functions and implicate some previously unanticipated pathways in IPF pathogenesis, including endoplasmic reticulum stress and the unfolded protein response, cellular senescence, and the DNA-damage response, as well as the already known Wnt-β-catenin signaling (20). The distribution of SNPs may differ greatly between populations (ethnicities). For example, a high frequency of the MUC5B rs35705950*T allele in IPF cases is observed among European-Americans (14-34%) (21,22), while a low frequency is characteristic of Asian cohorts, such as Chinese (3.3%) (23), Japanese (3.4%) (24), and Korean (1.0%) (11). Similarly, the MUC2 rs7934606*A allele exhibits a frequency of 41% in Europeans and 1% in Asians (1000 Genomes Project Phase 3 allele frequencies). Furthermore, because more than one gene participates in IPF pathogenesis, it will be important to analyze multiple susceptibility gene variants. Analyzing both common and rare genetic factors in IPF susceptibility may provide novel insights into IPF. It could also help to identify population-specific rare variants and the predominant panel of candidate gene variants for IPF risk, and to understand the basis of variable disease severity or progression among different populations.
No comprehensive data have yet been reported on IPF-related variants in Slavonic populations, including Czechs. To begin our investigations of multiple plausible IPF susceptibility polymorphisms, primarily in Czech and also in related populations, we adopted an allele-specific MALDI-TOF mass spectrometry-based SNP genotyping assay to determine gene variation in the relevant targets. Several IPF susceptibility SNPs in genes of various functional categories were multiplexed and, in the first phase, genotyped in probands from the normal (healthy) Czech population using the Sequenom MassARRAY platform. In the current dataset manuscript, we report, besides the genotyping methodology, the genotype, allele, and phenotype (carriage rate) frequencies of plausible IPF susceptibility variants in the normal population of the Czech Republic of Western Slavonic (Caucasian) ancestry.

Dataset for IPF susceptible loci. Frontiers in Immunology | www.frontiersin.org

Characteristics of the Study Group
Ninety-six unrelated healthy subjects (45 males, 51 females), free of any disease as assessed by a physician's enquiry about their personal and family history, were enrolled. The mean age ± standard deviation was 34.5 ± 8.9 years, with a range of 18-57 years. All were Caucasians of Czech (Western Slavonic) ancestry, as assessed by surname and personal history, living in the Moravian region of the Czech Republic. All probands were informed about the purpose of the study and provided informed consent; the study was conducted with the approval of the institutional ethics committee. Genomic DNA was isolated from peripheral blood leukocytes by a standard salting-out method (30).
A list of reference SNP database IDs (rsSNP) was prepared for IPF-associated gene variants identified from the available literature (GWAS and case-control studies) (Table 1).

Assay Design, PCR Amplification, and Genotyping
For the online design of the multiplexed SNP genotyping assay using Assay Design Suite (ADS) v2.0, an input list of SNP IDs (the rsSNP of each target) was provided and the following steps were carried out: (1) the sequence for each rsSNP was retrieved from the database and formatted accordingly; (2) the proximal SNPs for each rsSNP were identified from the database; (3) optimal primer areas were identified that result in a unique amplicon containing a target for the extension primer (to avoid rejection of an extension primer due to insufficient known bases, the proximal polymorphic base was replaced with inosine); and (4) PCR and extension primers were designed and checked for false priming and hairpin/dimer formation. Primer multiplexes with mass separation of the analytes (alleles) were then created. The PCR primer plex was prepared for PCR amplification, and the extension primer mix was prepared for the single base extension (SBE)-based iPLEX reaction. The SBE pool plex consists of multiplexed primers, each annealing adjacent to the polymorphic site of its own target, present together in the multiplexed assay pool. Thus, several individual DNA polymorphisms with their corresponding SNP sites could be analyzed in a single reaction. Because peak intensity is inversely related to extend primer mass, the concentrations of the extension primers in the iPLEX assays were adjusted to make their peak intensities as equal as possible.
For multiplexed PCR amplification and the genotyping assay, a protocol described in the literature was followed (31). In brief, multiplexed PCR amplification was performed using 10 ng of DNA template. A cleanup reaction for the amplicon plex was performed with shrimp alkaline phosphatase (SAP) mix. The SAP-cleaned amplicon plex was then subjected to the SBE reaction with iPLEX mix containing the extend primer plex. The iPLEX reaction products were treated with a cationic resin (SpectroCLEAN, Sequenom) to remove salts, such as Na+, K+, and Mg2+ ions. The desalted iPLEX extension products were dispensed onto SpectroCHIPs using the MassARRAY® Nanodispenser RS 1000 station and analyzed for the target allele(s) on the matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS)-based MassARRAY analyzer. As assay controls for SNP genotyping, duplicate, positive, and negative control samples were included in each assay plate. In each assay, the specific ddNTP incorporated at the target site could be identified from the peak representing the corresponding increase in mass of the extend primer. The assay peak spectrum and call cluster plot resulting from MALDI-TOF MS analysis were analyzed with MassARRAY Typer 4.0.20 software, which traces primer masses to the assayed alleles.

Statistical analysis
Each SNP was tested for Hardy-Weinberg equilibrium (HWE) by Pearson's goodness-of-fit chi-square (χ²) test. SNPs that were within HWE (p > 0.05) and sufficiently common in the general population (minor allele frequency, MAF > 5%) were included. Phenotype frequency (carriage rate) was calculated as the proportion of individuals carrying one or two copies of a particular allele, i.e., on one or both (maternal or paternal) chromosomes.
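The calculations described above can be sketched as follows. This is a minimal illustration, not the authors' analysis script: the function names are hypothetical, the test is assumed to use one degree of freedom without continuity correction (the text does not state either), and the genotype counts in the usage lines are invented for demonstration and are not taken from the dataset.

```python
from math import erfc, sqrt


def allele_frequencies(n_aa, n_ab, n_bb):
    """Return (freq_A, freq_B) from counts of genotypes AA, AB, and BB."""
    n = n_aa + n_ab + n_bb
    freq_a = (2 * n_aa + n_ab) / (2 * n)
    return freq_a, 1.0 - freq_a


def carriage_rate_b(n_aa, n_ab, n_bb):
    """Proportion of individuals carrying one or two copies of allele B."""
    n = n_aa + n_ab + n_bb
    return (n_ab + n_bb) / n


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Pearson goodness-of-fit test for Hardy-Weinberg equilibrium.

    Assumes 1 degree of freedom and no continuity correction.
    Returns (chi2, p_value).
    """
    n = n_aa + n_ab + n_bb
    p, q = allele_frequencies(n_aa, n_ab, n_bb)
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    # Skip classes with zero expectation (monomorphic locus).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 df.
    p_value = erfc(sqrt(chi2 / 2.0))
    return chi2, p_value


# Illustrative counts only (not from the dataset).
chi2, p_value = hwe_chi_square(25, 50, 25)
in_hwe = p_value > 0.05  # inclusion criterion stated in the text
```

Under these assumptions, a SNP would pass the stated filters when `in_hwe` is true and the smaller of the two values returned by `allele_frequencies` exceeds 0.05.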

Results
Using the online Assay Design Suite v2.0 for primer design, two plexes were generated. Plexes I and II consisted of 22 and 7 SNPs, respectively. For a successful assay, inosine was used in place of the proximal SNP in the extend primer sequences for rs2736100, rs2243250, and rs4277405 of plex I.
In the assay control of multiplexed SNP genotyping, SNP rs2395655 in CDKN1A showed a low assay success rate (<95%), and two SNPs, DSP rs2076295 and TOLLIP rs5743890, gave a positive signal in the no-template control. These three SNPs failed the quality control assessments and were removed from further analysis, leaving 26 SNPs (Table 2). The genotyping assay success rates for all other analyzed SNPs were 98-100%. In our Czech healthy control population, all analyzed SNPs were in HWE except IL-4Rα rs1801275, which exhibited a minor deviation (p = 0.04); as the deviation was small, the locus was not excluded from analysis (Table 2).
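The two quality control criteria applied above (assay success rate and a clean no-template control) can be expressed as a simple filter. The sketch below is only an illustration under stated assumptions: the `AssayQC` class, its field names, the placeholder SNP identifiers, and the example values are all hypothetical, and only the 95% success-rate threshold is taken from the text.

```python
from dataclasses import dataclass


@dataclass
class AssayQC:
    snp_id: str         # placeholder identifier, not a real rsID
    call_rate: float    # fraction of samples with a genotype call (0 to 1)
    ntc_positive: bool  # True if a signal appeared in the no-template control


def passes_qc(assay, min_call_rate=0.95):
    """An assay passes if its success rate is at least 95% and the
    no-template control is negative."""
    return assay.call_rate >= min_call_rate and not assay.ntc_positive


def split_by_qc(assays, min_call_rate=0.95):
    """Return (kept_ids, removed_ids), preserving input order."""
    kept = [a.snp_id for a in assays if passes_qc(a, min_call_rate)]
    removed = [a.snp_id for a in assays if not passes_qc(a, min_call_rate)]
    return kept, removed


# Invented example values, for illustration only.
example = [
    AssayQC("snp_A", 0.99, False),  # passes both criteria
    AssayQC("snp_B", 0.90, False),  # fails: success rate below 95%
    AssayQC("snp_C", 1.00, True),   # fails: positive no-template control
]
kept, removed = split_by_qc(example)
```

Keeping the two criteria in one predicate makes it easy to report which SNPs were dropped alongside those retained.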

Discussion
The present dataset reports the genotype distribution and the genotype, allele, and phenotype frequencies of 26 gene variants involved in immune-related pathomechanisms of IPF in the normal Czech population, obtained using the Sequenom MassARRAY-based genotyping platform. Besides its relevance to delineating the immunogenetic component of IPF, knowledge of the frequency distribution of gene variants in normal populations is of considerable importance for their evaluation as genetic markers of susceptibility, manifestation, prognosis, and potentially treatment of diseases in different populations (32). The SNP rs35705950 in the putative promoter of MUC5B has been shown to exhibit a strong association with both familial interstitial pneumonia and IPF (33). The observed rs35705950*T risk-allele frequency of 9% in the normal Czech population was in concordance with other reports in normal Caucasian populations: 9-11% in European-Americans (33) and, among Europeans, 10% in UK Caucasians (34), 11% in French (22), and 4.3% in Germans (24). Interestingly, the MUC5B promoter polymorphism is observed less frequently in normal Asian populations, such as 0.8% in Japanese (24), 0.7% in Chinese (23), and <1% in Koreans (11). Overall, MUC5B, which encodes a mucin glycoprotein, has a role in normal lung function by regulating immune function, the microbial population, airway infection, and mucociliary clearance in the lungs (35,36).
Among the analyzed cytokines, IL-4 has a significant role in IPF pathogenesis by regulating fibroblast functions, such as chemotaxis, proliferation, collagen synthesis, and myofibroblast differentiation, as well as the Th1/Th2 equilibrium (19). The angiogenic IL-8 was shown to be predictive of the early stage of IPF (37) and of poor survival in IPF (38). Additionally, IL-13 and IL-13 pathway markers (39) and the innate immune signaling receptor TLR3 have been suggested as potential markers of the rapidly progressive form of IPF. Several recent studies have suggested that defective TLRs are linked to dysregulated fibrogenesis and have a key role in myofibroblast activation, increased profibrotic cytokines, collagen deposition, fibrosis, and tissue destruction, thus promoting disease progression during the later phase of IPF (14,15,40,41).
Four variants exhibited an absence of one homozygous genotype in this data report: (1) for TP53 rs12951053, the CC genotype frequency has been reported as 6% in Caucasian healthy controls (28), 1.2% in European and African populations, and relatively higher in Asian populations (11.9% in Han Chinese and 11.6% in Japanese) (http://snp-nexus.org/temp/snpnexus_10220/results.html); (2) for TP53 rs12602273, the CC genotype frequency has been reported as 3% in Caucasian healthy controls (28); (3) for TF rs1799899, the AA genotype frequency has been reported as 0.6% in European and 0.0% in African, Han Chinese, and Japanese populations (http://snp-nexus.org/temp/snpnexus_10168/results.html); and (4) for IL-4 rs2243248 (IL-4 -1098 G/T), a low GG genotype frequency has been reported in another independent study characterizing IL-4 gene polymorphisms in a relatively small cohort of IPF patients of the same ethnicity (19).
The present findings are also applicable to IPF genetics research in other related populations. Within the current immunogenetics research initiatives of the HLA-NET network, a working group on population definitions and sampling strategies in population genetics analyses strongly recommends using geographical and/or cultural criteria (with anthropological considerations) to describe human populations instead of a priori classifications into racial and ethnic groups, which can be misleading (42). In this context, Central European populations have been demonstrated to be similar and genetically homogeneous (32,43,44). Therefore, the present findings are relevant for IPF case-control gene studies not only in the Czech population but also in neighboring populations, namely Slovak and Polish, and also German and Austrian, as we recently exemplified in preliminary investigations of immune-related IPF susceptibility variants in Czech and German population cohorts (10,13).

Conclusion
The present data on a spectrum of 26 gene variants, including 10 variants of immune and inflammatory response genes (cytokines/chemokines and TLR), and their frequency distribution in the normal Czech (Western Slavonic, Caucasian) population have wider application as a standard control set alongside cases in association studies of IPF. They are also relevant to other fibrotic lung diseases in Czech and genetically related/neighboring populations and, in the wider context, to further delineation of the role of immune and inflammatory reactions in this debilitating disease.

Acknowledgments
Grant support: CZ.1.07/2.3.00/30.0041, LO1304, and IGA PU LF_2015_020.