Genetic Association of Pulmonary Surfactant Protein Genes, SFTPA1, SFTPA2, SFTPB, SFTPC, and SFTPD With Cystic Fibrosis

Surfactant proteins (SP) are involved in surfactant function and innate immunity in the human lung. Both lung function and innate immunity are altered in cystic fibrosis (CF), and altered SP levels and genetic associations have been observed in CF. We hypothesized that single nucleotide polymorphisms (SNPs) within the SP genes associate with CF or severity subgroups, either through single SNPs or via SNP-SNP interactions between two SNPs of a given gene (intragenic) and/or between two genes (intergenic). We genotyped a total of 17 SP SNPs from 72 case-trio pedigrees (SFTPA1 (5), SFTPA2 (4), SFTPB (4), SFTPC (2), and SFTPD (2)), and identified SP SNP associations by applying quantitative genetic principles. The results showed that (a) two SNPs, SFTPB rs7316 (p = 0.0083) and SFTPC rs1124 (p = 0.0154), were each associated with CF. (b) Three intragenic SNP-SNP interactions, SFTPB (rs2077079, rs3024798), and SFTPA1 (rs1136451, rs1059057 and rs4253527), were associated with CF. (c) A total of 34 intergenic SNP-SNP interactions among the 4 SP genes were associated with CF. (d) No SNP-SNP interaction was observed between SFTPA1 or SFTPA2 and SFTPD. (e) An equal number of SNP-SNP interactions was observed between SFTPB and SFTPA1/SFTPA2 (n = 7) and between SFTPB and SFTPD (n = 7). (f) SFTPC exhibited significant SNP-SNP interactions with SFTPA1/SFTPA2 (n = 11), SFTPB (n = 4) and SFTPD (n = 3). (g) A single SFTPB SNP was associated with mild CF after Bonferroni correction, and several intergenic interactions associated (p < 0.01) with either mild or moderate/severe CF were observed. These results collectively indicate that complex SNP-SNP interactions of the SP genes may contribute to the pulmonary disease in CF patients. We speculate that SPs may serve as modifiers for the varied progression of pulmonary disease in CF and/or its severity.

Keywords: surfactant protein, SP-A, SP-B, SP-C, SP-D, cystic fibrosis

INTRODUCTION
Cystic fibrosis (CF) is an autosomal recessive, multi-organ inherited disease. Mutations in the cystic fibrosis transmembrane conductance regulator (CFTR) protein are key in CF pathogenesis (1). CFTR is activated via cAMP through β2-adrenoceptor stimulation; coding sequence polymorphisms in the CFTR result in CF (2). CFTR functions as a chloride channel on the surface of airway epithelial cells. In patients with CF, loss of CFTR channel function at the cell surface results in impermeability and increased sodium absorption (3,4). Depletion of the airway surface liquid causes reduced mucus clearance, resulting in bacterial colonization, recurrent infections, chronic inflammation, and irreversible damage to the airway epithelium.
Pulmonary function deterioration is one of the primary complications of CF, and pulmonary surfactant is essential for normal lung function. Pulmonary surfactant, a surface-active lipoprotein complex, is composed of 90% lipids and 10% proteins. The latter include plasma proteins and surfactant proteins SP-A1, SP-A2, SP-B, and SP-C. The surfactant proteins comprise a hydrophobic group of proteins (SP-B and SP-C) and a hydrophilic group of proteins (SP-A1 and SP-A2). SP-D, although it co-isolates with surfactant, is not an integral part of the surfactant complex, but it is grouped with SP-A1/A2 based on its structural similarity and function (5,6). Pulmonary surfactant is synthesized and secreted by the alveolar epithelial Type II cells of the lung and maintains the stability of the pulmonary tissue by reducing the surface tension of fluids that coat the lung. Broadly speaking, the hydrophobic surfactant proteins (SP-B and SP-C) are primarily involved in surface properties of surfactant and are important for normal lung function (7); the hydrophilic proteins (SP-A1, SP-A2, and SP-D) are primarily involved in host defense (6,8,9), although SP-A1 and SP-A2 exhibit differential effects on the surfactant structural reorganization (10), on the organization of phospholipid monolayers containing SP-B (11), and on lung mechanics (12). Moreover, lipid-mediated interactions of SP-A/SP-B may contribute to normal lung surfactant function (13).
Unlike in rodents that have a single gene encoding surfactant protein A (SP-A), in humans SP-A consists of SP-A1 and SP-A2 encoded by SFTPA1 and SFTPA2, respectively; each gene has been identified with several genetic variants (14), and these have been shown to associate with several pulmonary diseases (15,16). The human SP-D locus is linked to the SP-A locus and is located proximal to the centromere at approximately 80-100 kb from the SFTPA2 gene (14). Genetic associations between SFTPD SNPs and lung disease have also been identified (17,18). Although SP-A1, SP-A2, and SP-D are molecules of the innate immunity of the lung, there may be differences in the mechanisms via which host defense is achieved (19,20). Surfactant proteins SP-B and SP-C play a key role in lung function in normal lung. SP-B is essential for life (21), and both SP-B and SP-C via their role in surfactant function may contribute to CF.
Abbreviations: CF, cystic fibrosis; CFTR, cystic fibrosis transmembrane conductance regulator; FEV1, forced expired volume in 1 sec; SP-A, surfactant protein A; SFTPA1, gene encoding SP-A1; SFTPA2, gene encoding SP-A2; SFTPB, gene encoding SP-B; SFTPC, gene encoding SP-C; SFTPD, gene encoding SP-D; SNP, single-nucleotide polymorphism.
Lung surfactant proteins may contribute to the outcome in CF (22). Bronchoalveolar lavage (BAL) levels of SP-A have been shown to be increased early in the course of CF (23), but to decrease as disease progresses. Lower levels are correlated with more inflammation and diminished lung function (22,24-27). Although the level of SP-B (encoded by SFTPB) was unchanged in BAL from CF patients (24,28,29), in CF patients with mild lung disease (30) SP-B was found to be increased, but SP-A did not change. No changes were observed in the levels of SP-C and SP-D, encoded by the SFTPC and SFTPD genes, respectively. However, increased levels of SP-D were observed in serum of CF patients (31).
Moreover, in CF patients with well-conserved lung function (26), SP-C was increased, SP-A was decreased but SP-B and SP-D were not changed. Recently, it was reported that in CF patients, complex forms of SP-A were associated with better lung function. This indicates that the structural organization of SP-A affects its functional activity and this is linked to disease severity (32). SP-A1, shown previously to form higher size oligomers compared to SP-A2 (33), was shown recently to affect more efficiently (than SP-A2) the structural organization of surfactant, which in turn may affect lung function (10). Furthermore, genetic associations of SFTPA1 and SFTPA2 with CF have been observed (34). Moreover, different size SP-D oligomers have been associated with functional differences in patients with chronic lung disease such as CF (35).
Because individuals with identical CFTR mutations may differ in their pulmonary disease, it has been postulated that other genetic factors (i.e., gene modifiers) as well as the microenvironment may contribute to the variable outcome of pulmonary disease (36-41). Surfactant proteins play an important role in surfactant function (7,34,42), pulmonary mechanics (12), and innate immunity (6,8,9). Furthermore, disruption of these functions can compromise normal lung health. Therefore, we postulated that the surfactant proteins contribute to the progression of the pulmonary disease in CF. We hypothesized that multiple genetic variants of the surfactant protein genes, SFTPA1, SFTPA2, SFTPB, SFTPC, and/or SFTPD, are associated with CF or disease severity subgroups (mild and moderate/severe) through single genetic variations within a gene, and through intragenic or intergenic interactions between variants of a single gene or variants of two different genes. Allele frequencies and linkage disequilibrium of these loci in races and ethnic groups have been previously studied (43). We further hypothesized that some of the associations are unique to patients with mild CF and moderate/severe CF. The observations made indicate that complex SNP-SNP interactions of the surfactant protein genes may contribute to the pulmonary disease in CF patients, and the SPs could serve as modifier genes in lung CF.

Study Samples
The patient samples were collected with informed and written consent from patients and/or parents under a protocol approved by the institutional review board of the Human Subject Protection Office of the Pennsylvania State University College of Medicine. The clinical data of the study samples are given in Supplementary Table 1 and summarized below. Seventy-two pedigrees (family trees) were studied, of which 43 pedigrees had one case with two parents, 21 pedigrees had one case with a single parent, 5 pedigrees had 2 cases with 1 or 2 parents, and 3 pedigrees had 1 or 2 cases in 3 generations. There were a total of 205 study samples in the 72 pedigrees, and of these 79 were CF cases. Their ethnicity was as follows: 198-White, 7-Hispanic; 196-American, 6-Mexican, and 3-Unknown (Supplementary Table 1). A correction for ethnicity and sex was performed in the analysis, but no correction was made for age. The lung function was assessed by standard spirometry, and for children <5 years of age, assessment of disease severity was done by the clinical scoring of chest X-ray (CXR) by the Wisconsin Scoring System (44,45). CF disease severity was classified as mild, moderate, and severe by lung function impairment based on percent predicted forced expired volume in 1 sec (FEV1). FEV1/FVC: mild = 70-89%; moderate = 40-69%; severe ≤ 40% (Cystic Fibrosis Foundation). The number of patients in the present study under this classification is as follows: severe in 4 cases, moderate in 11 cases, and mild in 64 cases.
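The severity bands above can be written as a small lookup. The following is a minimal sketch, not the authors' procedure: the function name is hypothetical, the bands are treated as contiguous, a value of exactly 40% is assumed to fall in the moderate band (the text lists 40 in both the moderate and severe ranges), and values of 90% or above are left unclassified because the text gives no band for them.

```python
from typing import Optional

# Cohort counts as reported in the text.
COHORT = {"mild": 64, "moderate": 11, "severe": 4}


def classify_severity(percent: float) -> Optional[str]:
    """Map a lung-function percentage to a severity band.

    Assumptions (not from the source): bands are contiguous, exactly 40
    counts as moderate, and values >= 90 fall outside the listed bands.
    """
    if percent < 40:
        return "severe"
    if percent < 70:
        return "moderate"
    if percent < 90:
        return "mild"
    return None  # no band is given in the text for values of 90 or above


# The three reported groups add up to the 79 CF cases.
total_cases = sum(COHORT.values())
```

For example, `classify_severity(85)` returns `"mild"` and `classify_severity(30)` returns `"severe"` under these assumptions.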

DNA Isolation
Genomic DNAs were prepared from blood samples using QIAamp Blood kit following the manufacturer's instructions (Qiagen, Valencia, CA, United States).

Selected Genetic Variations for This Study
The target surfactant protein genes, SFTPA1, SFTPA2, SFTPB, SFTPC, and SFTPD, were selected based on gene function and association with lung diseases (especially with CF) from our findings and other published data as described above. A total of 17 SNPs were selected. These SNPs were previously shown to associate with various lung diseases, to be important in function or structure, or other: 5 SNPs from SFTPA1, rs1059047, rs1136450, rs1136451, rs1059057, and rs4253527; 4 SNPs from SFTPA2, rs1059046, rs17886395, rs1965707, and rs1965708; 4 SNPs from SFTPB, rs7316, rs2077079, rs3024798, and rs1130866; 2 SNPs from SFTPC, rs4715 and rs1124; and 2 SNPs from SFTPD, rs721917 and rs2243639. The SNP ID, other used name, nucleotide change, and association with human disease as well as related references are given in Table 1. The genotype frequencies of these SNPs in mild and moderate CF compared to controls are given in Supplementary Table 2.
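To make the size of the pairwise search space concrete, the panel can be laid out as a simple mapping and the SNP pairs enumerated. This is an illustrative sketch only; the variable names are hypothetical, the last SFTPA2 SNP is written with the rs prefix, and the pair counts are plain combinatorics rather than figures reported by the study.

```python
from itertools import combinations

# SNP panel as listed in the text, grouped by gene.
SNP_PANEL = {
    "SFTPA1": ["rs1059047", "rs1136450", "rs1136451", "rs1059057", "rs4253527"],
    "SFTPA2": ["rs1059046", "rs17886395", "rs1965707", "rs1965708"],
    "SFTPB": ["rs7316", "rs2077079", "rs3024798", "rs1130866"],
    "SFTPC": ["rs4715", "rs1124"],
    "SFTPD": ["rs721917", "rs2243639"],
}

# Flatten to (gene, snp) tuples so each pair can be labelled by type.
all_snps = [(gene, snp) for gene, snps in SNP_PANEL.items() for snp in snps]

# Every unordered pair of distinct SNPs.
pairs = list(combinations(all_snps, 2))

# A pair is intragenic when both SNPs sit in the same gene.
intragenic = [p for p in pairs if p[0][0] == p[1][0]]
intergenic = [p for p in pairs if p[0][0] != p[1][0]]

print(len(all_snps), len(pairs), len(intragenic), len(intergenic))
```

With 17 SNPs there are 136 possible pairs, of which 24 are intragenic and 112 intergenic.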

Genotype Analysis
The PCR-based RFLP or cRFLP (135) method was used for genotyping as described in previous publications for SFTPA1, SFTPA2, and SFTPD (136,137), SFTPB (133,137), and SFTPC (61). The PCR primer sequences and restriction enzymes used are given in Table 2. Briefly, PCRs were performed at 95 °C for 2 min, 5 cycles of 95 °C for 30 s, 50 °C for 1 min, and 70 °C for 1 min, then 25 cycles of 95 °C for 30 s, 55 °C for 1 min, and 70 °C for 1 min, followed by an extension at 70 °C for 4 min. Five microliters of each PCR product were used for the appropriate restriction enzyme digestion (Table 2). The digested PCR product was separated on an 8 or 10% PAGE gel (based on the length of the digested PCR fragments). The genotyping was done blindly. As samples (CF and controls) were received, each was given a sequential laboratory number with no other identifiers, and all were genotyped together without knowledge as to which sample was CF or control. Therefore, we believe that there was no bias in the genotyping. The genotype was scored based on the pattern of the digested PCR products for each sample.
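The cycling conditions above can be written down as a small program description, which makes the cycle count and total hold time easy to check. This is a sketch for illustration: the class, the stage names, and the variable names are hypothetical labels, and the total ignores ramp times between temperatures.

```python
from dataclasses import dataclass
from typing import List, Tuple

Step = Tuple[int, int]  # (temperature in degrees C, hold time in seconds)


@dataclass
class Stage:
    name: str           # descriptive label only (not from the source)
    steps: List[Step]
    cycles: int = 1

    def hold_seconds(self) -> int:
        """Total programmed hold time of this stage, excluding ramping."""
        return self.cycles * sum(seconds for _, seconds in self.steps)


PROGRAM = [
    Stage("initial denaturation", [(95, 120)]),
    Stage("first cycling block", [(95, 30), (50, 60), (70, 60)], cycles=5),
    Stage("second cycling block", [(95, 30), (55, 60), (70, 60)], cycles=25),
    Stage("final extension", [(70, 240)]),
]

total_seconds = sum(stage.hold_seconds() for stage in PROGRAM)
total_cycles = sum(stage.cycles for stage in PROGRAM if len(stage.steps) > 1)

print(total_cycles, total_seconds / 60)  # 30 81.0
```

The two cycling blocks give 30 amplification cycles, and the programmed hold times add up to 81 minutes.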

Statistical Analysis
We used the approach of Wang et al. (138), for which computer code written in R is publicly available and which is more efficient than more traditional methods (139), to test and estimate the genetic effects of each pair of the 17 SNPs. This approach integrates the principles of quantitative genetics, enabling the decomposition of the overall genetic effect into different components: the additive (a) and dominant (d) genetic effects of each SNP, and the additive x additive (aa), additive x dominant (ad), dominant x additive (da), and dominant x dominant (dd) epistatic effects between the two SNPs. By estimating the role of each of these components, this approach can provide a better understanding of the inheritance mode by which SNPs impact the disease. By analyzing each SNP pair, we calculated p-values for each genetic component. A Bonferroni correction was used to adjust for multiple comparisons.
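The decomposition into a, d, aa, ad, da, and dd terms can be illustrated with a simple genotype coding. The sketch below uses one common textbook coding (additive code -1/0/+1 by allele count, dominance code 1 for heterozygotes and 0 otherwise); this is an assumption for illustration and may differ from the exact parameterization in the R code of Wang et al. (138). The function names are hypothetical, and the Bonferroni helper simply multiplies each p-value by the number of tests, capped at 1.

```python
from typing import Dict, List, Optional


def additive_code(allele_count: int) -> int:
    """Additive code from the count (0, 1, 2) of a chosen reference allele."""
    if allele_count not in (0, 1, 2):
        raise ValueError("allele count must be 0, 1, or 2")
    return allele_count - 1  # 0 -> -1, 1 -> 0, 2 -> +1


def dominance_code(allele_count: int) -> int:
    """Dominance code: 1 for the heterozygote, 0 for either homozygote."""
    if allele_count not in (0, 1, 2):
        raise ValueError("allele count must be 0, 1, or 2")
    return 1 if allele_count == 1 else 0


def effect_terms(g1: int, g2: int) -> Dict[str, int]:
    """Main-effect and epistatic regressors for one individual at two SNPs."""
    a1, d1 = additive_code(g1), dominance_code(g1)
    a2, d2 = additive_code(g2), dominance_code(g2)
    return {
        "a1": a1, "d1": d1, "a2": a2, "d2": d2,
        "a1a2": a1 * a2,  # additive x additive
        "a1d2": a1 * d2,  # additive x dominant
        "d1a2": d1 * a2,  # dominant x additive
        "d1d2": d1 * d2,  # dominant x dominant
    }


def bonferroni(p_values: List[float], n_tests: Optional[int] = None) -> List[float]:
    """Bonferroni-adjusted p-values; n_tests defaults to len(p_values)."""
    m = len(p_values) if n_tests is None else n_tests
    return [min(1.0, p * m) for p in p_values]
```

Under this coding, an individual homozygous at SNP1 and heterozygous at SNP2 contributes only to the a1, d2, and a1d2 terms, which matches the verbal description of an additive x dominant interaction.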

Association of the Surfactant Protein Genes With CF
Associations of single SNPs or SNP-SNP interactions with CF discussed below are shown in Tables 3, 4. The SNPs studied here are not rare alleles (136). The column "interaction type" in Table 3, as well as in subsequent relevant tables, is the type of the SNP-SNP interaction (intragenic: interaction between two SNPs within a given gene; intergenic: interaction between SNPs of two genes). The letter a or d is for additive effect or dominant effect. The number 1 or 2 is for SNP1 or SNP2. In Table 3, d1 stands for a dominant effect for SNP1, and d2 stands for a dominant effect for SNP2. If it is a1d2 (Table 4), the interaction type is additive effect for SNP1 and dominant effect for SNP2. For example, in Table 4, (1) SFTPC rs1124 has a significant dominant effect (d2) (P = 0.0053). This means that the heterozygote is beyond the mean of the two homozygotes in the degree of severity at this SNP. (2) SFTPA1 rs1059047 × SFTPC rs1124 has a significant additive x dominant epistatic effect (a1d2) (P = 0.0014). This means that the combination of the homozygote at the first SNP and the heterozygote at the second SNP is significantly different from other combinations. (3) SFTPA1 rs1059047 × SFTPC rs1124 has a significant dominant x dominant epistatic effect (d1d2) (P = 0.0021). This means that the combination of the heterozygote at the first SNP and the heterozygote at the second SNP is significantly different from other combinations. In general, it looks like SFTPB and SFTPC have a dominant effect in most of the interactions, whereas the genes encoding the hydrophilic proteins exhibit primarily an additive effect (Table 4). We speculate that the surfactant-related functions imparted by SP-B and SP-C variants play a critical differential role in pulmonary CF and that functions imparted by SP-A1/SP-A2 and SP-D variants further enhance the CF disease progression.
Table footnote: All the SNP changes are located within the exons, except the three SFTPB SNPs marked with **. The SFTPB a) rs2077079 is located 10 nt downstream of the TATAA box, 5′ regulatory region (133); b) rs3024798 is located in the intron at the splice sequence of intron 2-exon 3 (65); and c) rs7316 is located in the 3′ UTR 4 nucleotides upstream of the TAATAA polyadenylation signal (133). The SFTPB rs1130866 marked with † is located within a potential N-linked glycosylation site, which has been shown to be glycosylated (134).
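The verbal definition of a dominant effect (the heterozygote lying away from the midpoint of the two homozygotes) corresponds to the standard quantitative-genetic quantities a and d. The following sketch computes both from three genotype-group means; the function name and the example values are hypothetical and are not data from this study.

```python
from typing import Tuple


def additive_and_dominance(mean_aa: float, mean_Aa: float, mean_AA: float) -> Tuple[float, float]:
    """Return (a, d) from the mean trait value of each genotype group.

    a: half the difference between the two homozygote means.
    d: deviation of the heterozygote mean from the homozygote midpoint.
    """
    midpoint = (mean_AA + mean_aa) / 2
    a = (mean_AA - mean_aa) / 2
    d = mean_Aa - midpoint
    return a, d


# Hypothetical severity scores: the heterozygote matches the AA homozygote,
# so d is nonzero (complete dominance of A in this toy example).
a, d = additive_and_dominance(mean_aa=1.0, mean_Aa=2.0, mean_AA=2.0)
print(a, d)  # 0.5 0.5
```

When the heterozygote mean sits exactly at the midpoint (for example means of 1.0, 1.5, and 2.0), d is zero and the SNP acts purely additively.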

Intergenic interactions that contain SFTPA1 SNPs
All 5 of the studied SFTPA1 SNPs are associated with CF through 13 intergenic SNP-SNP interactions with SNPs in SFTPA2 (n = 2), SFTPB (n = 3), and SFTPC (n = 8) (χ² = 2.2285-7.8947, p = 0.0487-0.0007) (Table 4, Figure 1A). Of interest, no SNP-SNP interaction was observed with SFTPD. Each SFTPA1 SNP is shown to have 1-3 intergenic SNP interactions with the other surfactant protein genes. 1) SNP rs1136451 exhibits a significant association with CF through interactions with two other SFTPA1 SNPs (rs1059057 and rs4253527), as well as with both SFTPC SNPs (rs1124 and rs4715). 2) SNPs rs1136450 and rs4253527 are associated with CF through interaction with SNP rs1059046 of SFTPA2.
Of the 13 intergenic SFTPA1 SNP-SNP interactions, 11 are with SFTPB and SFTPC encoding the hydrophobic surfactant proteins, and only two interactions are with SFTPA2 that encodes the hydrophilic surfactant protein A2 (Figure 1A), whereas no interactions are observed with SFTPD.

Intergenic interactions that contain SFTPA2 SNPs
Each SFTPA2 SNP is shown to have 1-3 intergenic SNP interactions with other surfactant protein genes. Table 4 shows that the SFTPA2 gene is associated with CF through 9 intergenic SNP-SNP interactions with SFTPA1 (n = 2), SFTPB (n = 4), and SFTPC (n = 3) (χ² = 2.4172-6.4974, p = 0.0485-0.0038, Figure 1A). Similarly to SFTPA1, no interaction of SFTPA2 with SFTPD was found to be associated with CF. SFTPA2 SNP rs1059046 appears to stand out from the other SFTPA2 SNPs studied, as this SNP (1) is associated with CF via five of the total nine interactions observed with the other SP genes, (2) is the only SNP that shows interactions with two SFTPA1 SNPs, and (3) shows interactions with both hydrophobic surfactant protein genes, SFTPB and SFTPC (Figure 1A). Of the 9 intergenic interactions, 7 are with the SFTPB and SFTPC hydrophobic surfactant protein genes, only two are with the hydrophilic surfactant protein gene SFTPA1, and no interactions were observed with SFTPD.

Intergenic interactions that contain SFTPD SNPs
Table 4 shows that SFTPD is associated with CF through 10 intergenic SNP-SNP interactions with SNPs in SFTPB and SFTPC, but as noted above no interactions were observed with SFTPA1 or SFTPA2. Of the 10 intergenic SFTPD interactions associated with CF, seven are with SFTPB and three with SFTPC (χ² = 2.2285-8.4508, p = 0.0487-0.0007) (Figure 1B). SFTPD SNP rs721917 is associated with CF through 5 intergenic interactions with SFTPB (n = 3) and SFTPC (n = 2); rs2243639 is also associated with CF through 5 intergenic interactions, but four of these are with SFTPB and only one with SFTPC (Figure 1B).
In summary, when we studied the entire CF cohort, we observed a) two SNPs (one from SFTPB and the other from SFTPC) to be individually associated with CF; b) three intragenic interactions (2 of SFTPA1 and one of SFTPB) to associate with CF; and c) a total of 34 intergenic interactions of different combinations between the various genes studied, except between SFTPD and SFTPA1 or SFTPA2, to associate with CF. A summary of all the significant intragenic and intergenic interactions is shown in Supplementary Table 3 and Figure 2. Moreover, our results have a potential clinical impact. For example, since SFTPA1 rs1059047 x SFTPC rs1124 has a significant additive x dominant epistatic effect (a1d2) (P = 0.0014), the combination of the homozygote at the first SNP and the heterozygote at the second SNP is significantly different from other combinations. Thus, we can make a prediction of patients' severity based on their genotypes at these two SNPs.
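The per-gene-pair counts reported in the text and abstract can be tallied to confirm that they add up to the 34 intergenic interactions and to the per-gene totals given above. This is a bookkeeping sketch only; the variable names are hypothetical, and the counts are copied from the text rather than recomputed from data.

```python
from collections import Counter

# Intergenic SNP-SNP interaction counts per gene pair, as reported in the text.
INTERGENIC_COUNTS = {
    ("SFTPA1", "SFTPA2"): 2,
    ("SFTPA1", "SFTPB"): 3,
    ("SFTPA1", "SFTPC"): 8,
    ("SFTPA1", "SFTPD"): 0,
    ("SFTPA2", "SFTPB"): 4,
    ("SFTPA2", "SFTPC"): 3,
    ("SFTPA2", "SFTPD"): 0,
    ("SFTPB", "SFTPC"): 4,
    ("SFTPB", "SFTPD"): 7,
    ("SFTPC", "SFTPD"): 3,
}

# Each interaction involves two genes, so it counts once toward each of them.
per_gene = Counter()
for (gene_1, gene_2), n in INTERGENIC_COUNTS.items():
    per_gene[gene_1] += n
    per_gene[gene_2] += n

total_intergenic = sum(INTERGENIC_COUNTS.values())
print(total_intergenic, dict(per_gene))
```

The tally gives 34 interactions in total, with 13 involving SFTPA1, 9 involving SFTPA2, and 10 involving SFTPD, in agreement with the section text; SFTPB and SFTPC each take part in 18.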

Association of the Surfactant Protein Genes With CF Disease Severity Subgroups
To gain insight into the contribution of the SP genes to CF disease severity, we separated the CF cohort into two subgroups, mild (n = 64) and moderate/severe (n = 15). The data showed that, after Bonferroni correction, a single SFTPB SNP (rs7316) was associated with mild CF, and no other SNPs were found to associate with either CF subgroup. However, there are a number of intergenic SNP interactions (p < 0.01 prior to Bonferroni correction) that associated with each CF subgroup (Supplementary Table 4).
Given the smaller number of subjects in each CF subgroup, and as we wished to gain further insight into the interactions observed, we focused our attention on significant associations (p < 0.01) observed in each subgroup that were also significant after Bonferroni correction in the entire CF group. These are shown in Table 5 and Figure 3. Eight intergenic interactions were observed for the mild subgroup and only one for the moderate/severe subgroup. In the mild group, SNPs of the SFTPB exhibited the same number of interactions with SFTPD (n = 4) as they did with SFTPA1+SFTPA2 (n = 4). More interactions were observed between SFTPB and SFTPA1 (n = 3) than SFTPA2 (n = 1) in the mild subgroup. No significant associations with SFTPC were observed with either disease severity group. In addition similar to the entire CF group, no associations were found between SFTPD and SFTPA1 or SFTPA2 in either subgroup.

DISCUSSION
In this study, we investigated the genetic contribution of the surfactant protein genes, SFTPA1, SFTPA2, SFTPB, SFTPC, and SFTPD, to CF and disease severity subgroups by genetic association analysis of single SNPs and intragenic and intergenic SNP-SNP interactions. (a) For the entire CF group, we observed that all 5 surfactant protein genes are associated with CF by single SNP association, intragenic (Table 3), and/or intergenic SNP-SNP interactions (Table 4). (b) For the CF severity subgroups, we observed after Bonferroni correction a single SFTPB SNP to be associated with mild CF and several interactions (p < 0.01) to associate with mild or moderate/severe CF subgroups (Supplementary Table 4 and Table 5).
FIGURE 3 | Interactions in mild and moderate/severe CF subgroups with p-value < 0.01; these interactions were significant in the entire CF group after Bonferroni correction. Mild CF: Significant interactions between SFTPA1, SFTPA2, SFTPB, and SFTPD are shown with black arrows. Green box indicates the only SNP that remained significant after Bonferroni correction. Star (*) indicates that this SNP is significant at p < 0.03 by itself. Moderate/severe CF: Red arrow shows the only significant interaction at p < 0.01 that is also significant in the entire CF group after Bonferroni correction. No significant interactions that include SNPs of SFTPC are observed in either severity group.
Human diseases are complex and determined by environmental factors and genes. Single genetic mutations or multiple mutations in a single gene constitute a part or a small part of disease mechanisms. To study the full spectrum of gene(s) contributing to a disease, either by being the primary cause or by modifying disease expression, an integrated genetic approach is needed to understand genetic control and clinical therapy. By integrating quantitative genetic principles (138), a statistical method was developed to test associations of pairwise SNPs and disease in a case-control setting. This method can decompose the overall genetic effect into its underlying components and test the significance of each component, gaining insight into the mechanisms of how SNPs affect disease. It has been used in our previous studies in human inflammatory bowel diseases (IBD) that included both case-control and case-trio studies, and targeted SNPs in one gene and in multiple genes of a metabolic pathway (140,141). Because this method is powerful and helps understand the genetic contribution to a disease via interactions of genes in a metabolic pathway or gene network, we used it in the present study.

Association of the Surfactant Protein Genes With the Entire CF Group
Association of the Hydrophobic Surfactant Protein Genes, SFTPB and SFTPC, With CF
Surfactant proteins SP-B and SP-C are hydrophobic membrane proteins that increase the rate at which surfactant spreads over the alveolar surface, and are required for proper biophysical function of surfactant and lung function. SP-B is also important for SP-C processing, as indicated in SP-B deficient states where an aberrant SP-C was observed (7,142,143).
All four of the SFTPB SNPs studied showed significant interactions with SNPs of one or more SP genes. The rs7316 is located in the 3′-UTR and could affect regulation of polyadenylation (133) and has previously been associated with acute lung injury (77). This SFTPB SNP is not only significant by itself but interacts with SNPs of all 4 SP genes. The rs2077079 interacts with SNPs of all SP genes, except SFTPA1. It is located at 11 nt downstream of the TATA box and could affect gene transcription. The rs1130866, which also interacts with SNPs of SP genes, except SFTPA1, is a missense mutation that changes the encoded amino acid (Ile/Thr) and an N-linked glycosylation site of the protein, shown previously to be indeed glycosylated in the Thr-containing variant (1580_C) (134). This is a significant change and may be an important contributing factor in various diseases and/or in response to environmental oxidative stress. Moreover, the SFTPB 1580_C (rs1130866) genetic variant has been observed to be a risk factor in several lung diseases, such as idiopathic pulmonary fibrosis (IPF) (61), chronic obstructive pulmonary disease (COPD) (17), acute respiratory distress syndrome (ARDS) (137), septic shock and risk of respiratory failure after community-acquired pneumonia (144), and mice carrying the human SP-B 1580_C variant show increased mortality, apoptosis, and lung injury compared to those with the 1580_T variant (145). On the other hand, the SP-B 1580_T/T (rs1130866) is associated with protection against interstitial lung disease (ILD) with systemic sclerosis (64). The rs3024798 is located within the splicing sequence of intron 2-exon 3, and although its effect on splicing is unknown (65), it has previously been associated with invasive pneumococcal disease (IPD) (71). This SNP, in addition to an intragenic interaction, is the only SNP that interacts with a single SP gene, SFTPD.
Together these SNP variations in SFTPB could affect SP-B function by altering N-linked glycosylation and/or affect regulation at several levels including transcription and splicing.
SP-C is a hydrophobic surfactant protein and plays an important role in surfactant function. Mutations in SP-C have also been shown to associate with a number of pulmonary diseases (15), such as ILD and pulmonary alveolar proteinosis (PAP). Both SFTPC variants are missense, where amino acids 186 and 138 are changed, Ser/Asn in rs1124, and Thr/Asn in rs4715, respectively. While both SNPs associate with CF through numerous intergenic interactions with the other SPs, the rs1124 is also associated with CF by itself. The potential mechanisms via which these may affect function are not known. The two SP-C variants (rs1124, Ser/Asn and rs4715, Thr/Asn) have previously been associated with RDS (86,88,89), and children infected with respiratory syncytial virus (RSV) (90).
In summary, the observations made indicate that the hydrophobic proteins, shown previously to be key in surfactant function and homeostasis and consequently in lung function, may play a central role in CF. The only two individual SNPs to associate with CF by themselves were SFTPB rs7316 (p = 0.0083) and SFTPC rs1124 (p = 0.0154) (indicated by star in Figure 1A). Moreover, of the 37 interactions shown to associate with CF, SFTPB or SFTPC was part of 35. Given the importance of SP-B and SP-C in surfactant function and lung function, and the fact that lung function deterioration is a major issue in CF, it is likely that the SFTPB and SFTPC genes are modifier genes for CF lung function, by modulating surfactant structural organization and stability.

Association of Hydrophilic Surfactant Protein Genes SFTPA1/A2 and SFTPD With CF
SP-A1, SP-A2, and SP-D mediate innate immunity in the lung and, via interactions with the alveolar macrophage, the sentinel cell of innate lung host defense, promote, among others, bacterial and viral phagocytosis and cytokine production, as well as affect lung inflammatory processes. The mechanisms implicated in these processes may differ among these molecules (19). Of interest in the present study, there were no significant interactions between SNPs of SFTPD and SFTPA1 or SFTPA2, and only two significant intergenic interactions were observed between SFTPA1 and SFTPA2 (Figure 1A). A single SNP in SFTPA2 (rs1059046) interacted with two different SNPs in SFTPA1 (rs1136450, aa 50 Leu/Val and rs4253527, aa 219 Arg/Trp). The SFTPA2 SNP (rs1059046) changes the amino acid (Thr/Asn) at codon 9 of the precursor molecule, which is part of the signal peptide, having the potential to affect processing of SP-A2. This SNP was previously shown to associate with increased risk in RSV (47). The two SFTPA1 SNPs (rs1136450, rs4253527) that interact with this SFTPA2 SNP are located in the collagen-like domain and in the carbohydrate recognition domain (CRD), respectively, of SP-A1 and change the encoded amino acid (rs1136450, aa 50 Leu/Val, rs4253527, aa 219 Arg/Trp). Moreover, higher- or lower-order oligomerization of SP-A and SP-D is known to affect their functional capabilities (146-149), and it has been observed that naturally occurring SP-A and SP-D oligomers have functional relevance in patients with chronic lung diseases such as CF (32,35). Thus, each of the surfactant protein variants may differentially affect innate immune functions and in CF may each differentially modify lung host defense.
Multiple interactions between the hydrophobic and the hydrophilic protein genes were observed indicating that these two groups of proteins may co-operatively or synergistically contribute to the expression of pulmonary CF. As depicted in Figure 2, SFTPD SNPs are primarily found in interaction with SNPs of SFTPB (n = 7), whereas SNPs of SFTPA1 are primarily found in interactions with SNPs of SFTPC (n = 8). The latter is of interest because SP-A1 has been shown to affect more efficiently (compared to SP-A2) the structural organization of surfactant (10), indicating that SP-A1 and SP-C may cooperatively affect surfactant structure, which in turn may affect surfactant function and lung function. Moreover, the large number of SNP interactions between SFTPD and SFTPB is puzzling and not intuitively understood as far as surfactant structure or function is concerned. This is because, although SP-D is generally found in the bronchoalveolar lavage fluid and significantly less in alveolar epithelia, and grouped with SP-A1/A2 based on its structural similarity and function with these proteins (5,6), it is not found in the surfactant lipoprotein complex.

II. Association of the Surfactant Protein Genes With CF Subgroups
When we studied disease severity subgroups (mild and moderate/severe), we found, after Bonferroni correction, a single SFTPB SNP (rs7316) to associate with mild CF. This SNP has also been shown to associate with acute lung injury (77). Lack of SP-B is not compatible with life, and SP-B deficiency affects SP-C processing (7,142,143). This SFTPB SNP is located within the 3′ UTR and may affect regulation of polyadenylation (133). Thus, this SFTPB SNP may contribute to CF by affecting SFTPB regulation (133) and/or by affecting SP-C processing (7,142,143). Of the eight intergenic interactions that were significant (p < 0.01) in the CF subgroups and were also significant in the entire CF group after Bonferroni correction, seven were in the mild subgroup and one was in the moderate/severe subgroup. All of those in the mild subgroup were between SFTPB SNPs and SNPs of the genes encoding the hydrophilic surfactant proteins.
The SFTPB SNP (rs7316) that is significant by itself in mild CF after Bonferroni correction was the only SFTPB SNP that interacted with SNPs of SFTPA1 or SFTPA2, whereas the SNP-SNP intergenic interactions between SFTPB and SFTPD include three of the four SFTPB SNPs studied. These findings indicate the importance of SFTPB in mild CF; SFTPB may provide protection, in the sense of enabling a milder form of pulmonary CF, through its surfactant-associated function or perhaps another, currently unknown, function. Based on the differences in SNP-SNP interactions, it is likely that the mechanisms via which the hydrophilic proteins contribute to mild CF differ. The SFTPA1 SNP (rs4253527) was also significant by itself (p < 0.03) (Supplementary Table 4) in the mild CF group. SP-A and SP-D bind via their carbohydrate recognition domains (CRD) to the carbohydrate motifs on bacteria, viruses, fungi, etc. (150-152). Thus, SNPs in the CRD may differentially affect binding to various ligands to trigger the host's innate immune response. The SFTPA1 SNP (rs4253527) is located in the CRD and changes the encoded amino acid (Arg/Trp), and thus has the potential to differentially affect innate immunity under various conditions, including oxidative stress, since Trp is more sensitive to oxidation than Arg (33). In fact, SP-A1 variants that differ in the CRD at rs4253527 have been shown to differ in their ability to enhance phagocytosis (153) and cytokine production (154). Moreover, SP-A1 has been shown to affect surfactant reorganization in the alveolar space more efficiently than SP-A2 (10). Whether this SNP provides protection in CF via its role in surfactant function, innate immunity, or both remains to be determined. Recently, SP-A1 and SP-A2 have been shown to differentially affect lung mechanics (12), indicating another role of the SP-As beyond innate immunity.
For the moderate/severe group, the only significant interaction is between SNP rs1136450 (aa 50 Leu/Val) of SFTPA1 and SNP rs1059046 (aa 9 Thr/Asn) of SFTPA2. This interaction may be unique to the moderate/severe disease group because it was not identified in the mild group, even among interactions with p < 0.05 (Supplementary Table 4). Similarly, interactions 1, 3, 4, 7, and 8 in the mild group (Table 5) were not identified in the moderate/severe disease group, even among interactions with p < 0.05 (Supplementary Table 4), indicating that these may be unique to the mild CF group.
In summary, a single SNP of SFTPB is a marker for mild pulmonary disease in CF. A number of intergenic interactions that all include SNPs of SFTPB are likely to be markers for mild pulmonary disease in CF, and a single intergenic interaction between SFTPA1 and SFTPA2 is likely to be a marker for moderate/severe pulmonary disease in CF.
Limitations of the study include: (a) The small number of subjects, especially after the CF group was divided into the mild and moderate/severe subgroups. This also precluded analysis of the individual components of FEV1 and FVC. (b) The limited clinical information; the samples were collected at an earlier time using only FEV1 as a discriminator for the severity subgroups. However, this is still the main biomarker to assess disease severity. (c) The lack of associations with bacterial strain and of correction for age. (d) The CFTR mutation is not known, although most of the subjects are expected to carry the F508del mutation, which is found in approximately 70% of CF patients.
Despite these limitations, the present findings indicate that both groups of surfactant proteins, those involved in innate immunity and those affecting surfactant functions, associate with CF via complex interactions. These may contribute to pulmonary disease progression in CF by affecting surfactant structure and function and/or by affecting innate immunity functions. Altered surfactant leads to compromised lung function, and lung function deterioration is a major setback in CF pulmonary disease. Similarly, host defense and inflammatory processes are partially affected in CF, and SPs may play a role in these. Thus, based on the collective information with regard to their function and the observations made here, the SP genes are likely to be significant gene modifiers of CF and must be studied further. As gene modifiers, SPs may explain the varied progression of the pulmonary disease in CF in terms of lung function and host defense.

DATA AVAILABILITY
All the data are presented as supplementary files.

ETHICS STATEMENT
All protocols used in this study were evaluated and approved by the institutional review board from the Human Subject Protection Office of the Pennsylvania State University College of Medicine.

AUTHOR CONTRIBUTIONS
ZL analyzed and synthesized the data and contributed to the manuscript writing. NT analyzed and synthesized data for CF subgroups and contributed to manuscript writing. RW performed statistical analysis and contributed to manuscript writing. SD performed all the genotyping. MY assisted with statistical analysis. NJT attended to human subjects issues and sample collection and contributed to manuscript writing. XL checked genotype data sheets after multiple transfers. TL reviewed literature and made Table 1. SW helped with clinical assessment. JF designed the study, provided oversight to the entire project, and was involved in data analysis, integration, and writing of the manuscript.

FUNDING
This work was supported by NIH HL68947 to JF.