Sister haplotypes and recombination disequilibrium: a new approach to identify associations of haplotypes with complex diseases

Haplotype-based association analysis has several advantages over single-SNP association analysis. However, to date no haplotype-disease association study has excluded recombination interference among multiple loci, and hence some results might be confounded by recombination interference. Here we present an approach for testing the association of sister haplotypes with a complex disease, based on recombination disequilibrium (RD). Sister haplotypes can be determined by translating the notation of DNA base haplotypes into the notation of genetic genotypes. Sister haplotypes provide the haplotype pairs needed for haplotype-disease association analysis. After RD tests are performed in the control and case cohorts, a two-by-two contingency table can be constructed from a sister haplotype pair and the case-control pair. With this standard two-by-two table, one can perform the classical Chi-square test to detect a statistical haplotype-disease association. Applying this method to a haplotype dataset of Alzheimer disease (AD), we identified an association of sister haplotypes containing ApoE3/4 with risk for AD under no RD. Haplotypes within gene IL-13 were not associated with risk for breast cancer in the case of no RD, and no association of haplotypes in gene IL-17A with risk for coronary artery disease was detected without RD. The previously reported associations of haplotypes within these genes with risk for these diseases might be due to strong RD and/or inappropriate haplotype pairs.


Introduction
High-throughput sequencing technologies enable us to easily genotype dozens of single nucleotide polymorphisms (SNPs) within any gene of interest. Such genome-wide SNP data are growing rapidly in disease association studies (Neale and Sham, 2004; Cheng et al., 2005). Association analysis includes single-SNP (Cordell and Clayton, 2002) and haplotype-based disease associations (Zhao et al., 2003a; Zhao et al., 2003b; Clark, 2004; Niu, 2004). Haplotype-based association analysis has several advantages over single-SNP association analysis (Clark, 2004; Yang et al., 2008). The theoretical argument is that haplotype-based tests would be more powerful because single-marker linkage disequilibrium (LD)-based methods may not capture all of the available LD information, which is contained in multi-locus haplotypes (Akey et al., 2001; Schaid, 2006; Wen and Tsai, 2014). Therefore, there have been many reports of haplotype-disease association studies in recent years. However, to date no study of association between haplotypes and complex diseases has excluded recombination interference among the multiple loci within haplotypes, and hence some results might be confounded by recombination interference. In addition, although many methods (Akey et al., 2001; Sham et al., 2004; Allen and Satten, 2005, 2007, 2009; Fardo et al., 2011; Wen and Tsai, 2014) can be used to test for haplotype-based association, inappropriate haplotype pairs have been widely used and might lead to spurious haplotype-disease associations. To exclude confounding by recombination interference in haplotype-disease association studies, we here introduce recombination disequilibrium (RD) (Tan, 2020). Following the definitions of Hardy-Weinberg disequilibrium (HWD) at one locus and linkage disequilibrium (LD) between two loci (Robbins, 1918; Geiringer, 1944; Lewontin and Kojima, 1960; Lewontin, 1964; Hill and Robertson, 1968), RD is defined among three or more loci (Tan, 2020).
Although LD has been widely used in haplotype-disease association studies, LD among multiple loci becomes very complicated and is poorly understood because of recombination interference. Hastings (1984) indicated that commonly used measures of linkage disequilibrium are not appropriate for a multilocus system. Thomson and Baur (1984) also showed by an example that combinations of allele frequencies and pairwise linkage disequilibrium terms that are permissible at the two-locus level may not be permissible at the three-locus level. LD between two loci is not important for haplotype association, whereas recombination interference is a key factor in haplotype analysis because it determines the frequencies of haplotypes (gametes) in populations. For example, double-crossover types are less frequent under positive interference than under independence. The intensity of interference depends on the distance between two adjacent intervals. In classical genetics, the coefficient of coincidence is used to measure crossover interference because only positive interference had been discovered. With the great advance of technologies in molecular genetics, in particular the broad application of genotyping at molecular markers such as SNPs, negative interference has been observed in all species. Likewise, negative interference becomes stronger as the distance between adjacent intervals becomes shorter. The coefficient of coincidence is not suitable for describing negative interference because it is strongly asymmetric between the positive and negative directions. This asymmetry makes it difficult to test statistically for positive or negative interference. RD, however, can easily measure both positive and negative interference and can easily be tested by a Chi-square test (Tan, 2020).
In single-locus disease association, a Hardy-Weinberg equilibrium (HWE) test is required because the locus-disease associations found can be considered true only when allele and genotype frequencies follow HWE. In genome-wide studies (GWS), linkage disequilibrium (LD) would result in false locus-disease associations because linkage of non-risk loci to a disease gene alters genotype frequencies. Frequencies of haplotypes in recombination disequilibrium contain linkage or recombination interference effects and hence would generate false haplotype-disease associations. Therefore, an RD test is required in haplotype-based association studies of diseases.
In addition, haplotype pairs are also a very important factor affecting the association of haplotypes with diseases, because a correct factor pair is a necessary condition for testing the association between two factors. In this paper, we offer a new approach to studying haplotype-disease association. The new approach is based on RD and sister haplotypes. We used four public haplotype-based case-control datasets to show the power and robustness of this method.

Haplotype data quality
Recently, many large-scale GWAS analyses have been carried out in samples of several thousand patients and normal individuals. Large SNP datasets make it possible to conduct large-scale haplotype association analysis of diseases. One can use existing haplotype estimation methods and software packages to create haplotype data from the SNP data. Before our method is applied to haplotype-disease association analysis, however, the haplotype data must be checked in the following respects: 1) since our method is based on biallelic haplotypes, SNPs with more than two alleles must be removed from haplotypes; 2) data with fewer than 7 types of haplotypes are not suitable for the RD test; 3) haplotypes consisting of more than 3 SNPs should be dissected into three-SNP haplotypes.
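The data checks above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the SHAD package: the function names (`three_snp_combinations`, `project_haplotype`, `is_testable`) and the SNP labels are hypothetical, and the threshold of 7 haplotype types is taken from check 2 in the text.

```python
from itertools import combinations

def three_snp_combinations(snp_names):
    """List every three-SNP combination, keeping the SNPs in map order."""
    return list(combinations(snp_names, 3))

def project_haplotype(haplotype, snp_names, triple):
    """Keep only the alleles of `haplotype` at the SNPs named in `triple`."""
    return "".join(haplotype[snp_names.index(s)] for s in triple)

def is_testable(three_snp_haplotypes, min_types=7):
    """Checks 1 and 2: every site is biallelic and at least 7 types are present."""
    types = set(three_snp_haplotypes)
    biallelic = all(len({h[i] for h in types}) <= 2 for i in range(3))
    return biallelic and len(types) >= min_types

# Hypothetical SNP labels for a six-SNP haplotype.
snps = ["SNP1", "SNP2", "SNP3", "SNP4", "SNP5", "SNP6"]
triples = three_snp_combinations(snps)
print(len(triples))  # 20 three-SNP combinations for 6 SNPs
print(project_haplotype("GAGAGC", snps, ("SNP1", "SNP3", "SNP5")))  # GGG
```

Dissecting before testing keeps every downstream step on the three-locus system that the RD test assumes.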

Construction of sister haplotypes
An important step in finding an association of haplotypes with a complex disease is to construct sister haplotypes. Since haplotypes consist of the four base types in DNA sequence, unlike gametes in classical genetics, it is difficult to determine which haplotypes should be paired as sister haplotypes. To construct sister haplotypes, one first translates the notation of DNA base haplotypes into the notation of classical genetic genotypes. To do so, we set three pairs of capital and lower-case letters, for example, Aa at site 1, Bb at site 2 and Cc at site 3. A capital letter is assigned to one allele at a site and the lower-case letter to the other allele. For example, in Table 1, M1M4*M6 has alleles C and T at sites 1 and 2, and alleles A and G at site 3. For ease of understanding, the best assignment is to give the capital letters to the alleles of the parental haplotype and the lower-case letters to the mutant alleles. The parental type has the largest frequency. In our current example, the parental haplotype is TTA, so we set A = T and a = C at site 1, B = T and b = C at site 2, and C = A and c = G at site 3. Thus, we can translate the 8 DNA haplotypes into 8 gamete genotypes and determine the sister gametes, which carry different alleles at every site: ABC and abc, ABc and abC, AbC and aBc, and Abc and aBC.
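The translation and pairing described above can be illustrated with a short Python sketch. It assumes biallelic sites and a known parental haplotype; the function names are hypothetical and are not part of the SHAD package.

```python
from itertools import product

def to_genotype(haplotype, parental):
    """Capital letter where the allele matches the parental haplotype, else lower case."""
    letters = "ABC"
    return "".join(
        letters[i] if allele == parental[i] else letters[i].lower()
        for i, allele in enumerate(haplotype)
    )

def sister_of(haplotype, alleles_per_site):
    """Return the haplotype carrying the other allele at every site."""
    return "".join(
        second if allele == first else first
        for allele, (first, second) in zip(haplotype, alleles_per_site)
    )

def sister_pairs(alleles_per_site):
    """Enumerate all haplotypes and group them into sister pairs."""
    pairs, seen = [], set()
    for combo in product(*alleles_per_site):
        hap = "".join(combo)
        if hap in seen:
            continue
        sis = sister_of(hap, alleles_per_site)
        seen.update((hap, sis))
        pairs.append((hap, sis))
    return pairs

# M1M4*M6 example from the text: parental haplotype TTA.
sites = [("T", "C"), ("T", "C"), ("A", "G")]
for hap, sis in sister_pairs(sites):
    print(hap, to_genotype(hap, "TTA"), "<->", sis, to_genotype(sis, "TTA"))
# TTA ABC <-> CCG abc
# TTG ABc <-> CCA abC
# TCA AbC <-> CTG aBc
# TCG Abc <-> CTA aBC
```

Listing the parental allele first at each site makes the parental haplotype come out as ABC, which matches the assignment recommended in the text.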

Construction of two-by-two contingency tables
After sister haplotypes have been constructed by the above method, two-by-two tables are constructed. As an example, two-by-two contingency tables (Table 2), with sister haplotypes in rows and case-control status for AD in columns, were made from the data in Table 1.
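As a minimal sketch of this step, the following Python snippet builds one two-by-two table per sister pair, with the sister haplotypes in rows and case and control counts in columns. The counts are purely illustrative and are not taken from Table 1; the function name `two_by_two` is hypothetical.

```python
def two_by_two(pair, case_counts, control_counts):
    """Rows: the two sister haplotypes; columns: case count, control count."""
    return [[case_counts.get(h, 0), control_counts.get(h, 0)] for h in pair]

# Purely illustrative counts (NOT the data of Table 1).
case_counts = {"TTA": 120, "CCG": 40, "TTG": 30, "CCA": 20}
control_counts = {"TTA": 100, "CCG": 25, "TTG": 20, "CCA": 14}

pairs = [("TTA", "CCG"), ("TTG", "CCA")]
tables = {pair: two_by_two(pair, case_counts, control_counts) for pair in pairs}
print(tables[("TTA", "CCG")])  # [[120, 100], [40, 25]]
```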

Chi-square test for association between haplotypes and diseases
A pair of sister haplotypes is similar to a pair of alleles at a locus; therefore, a two-by-two contingency table constructed from sister haplotypes and the case-control status of a disease satisfies the conditions of the Chi-square test for independence between two variables. Using contingency tables, the null hypothesis that a pair of sister haplotypes is not associated with the disease of study can be tested by a Chi-square test with 1 degree of freedom. For haplotypes constructed from three SNPs, we have four pairs of sister haplotypes and hence four null hypotheses, each tested by a Chi-square test. To exclude false associations due to recombination interference, testing for RD in haplotypes in the control and case cohorts is required; the method for testing for RD can be found in Tan (2020). RD is recombination disequilibrium among multiple loci. Similarly to linkage disequilibrium (LD), strong RD also results in spurious findings in haplotype-disease associations because strong RD would significantly change the frequencies of haplotypes: $D_r = 4(p_1 p_4 - p_2 p_3)$, where $p_1 = p(ABC)$, $p_2 = p(Abc)$, $p_3 = p(ABc)$, and $p_4 = p(AbC)$.
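The two quantities introduced above can be computed with the Python standard library alone. The sketch below assumes the uncorrected Pearson Chi-square statistic for a two-by-two table (the text does not say whether a continuity correction is applied) and evaluates the RD coefficient directly from the formula. The significance test for RD itself is described in Tan (2020) and is not reproduced here; all function names are hypothetical.

```python
import math

def chi_square_2x2(table):
    """Pearson Chi-square (no continuity correction) and p-value, 1 degree of freedom."""
    (a, b), (c, d) = table
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        raise ValueError("a row or column of the table sums to zero")
    chi2 = n * (a * d - b * c) ** 2 / margins
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))

def odds_ratio(table):
    """Odds ratio of the first sister haplotype relative to the second."""
    (a, b), (c, d) = table
    return (a * d) / (b * c)

def rd_coefficient(p1, p2, p3, p4):
    """D_r = 4 (p1 p4 - p2 p3), with p1..p4 as defined in the text."""
    return 4 * (p1 * p4 - p2 * p3)

# Illustrative table: rows are sister haplotypes, columns are case and control.
chi2, p = chi_square_2x2([[10, 20], [20, 10]])
print(round(chi2, 3), p < 0.05)          # 6.667 True
print(odds_ratio([[10, 20], [20, 10]]))  # 0.25
print(rd_coefficient(0.25, 0.25, 0.25, 0.25))  # 0.0, no RD
```

A continuity-corrected or exact test could be substituted when cell counts are small; that choice is outside what the text specifies.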
Here $p_1$ is the frequency of parental types, $p_4$ is the frequency of double crossovers, and $p_2$ and $p_3$ are the frequencies of the two single crossovers. $D_r$ reflects the difference between the frequencies of double crossovers and single crossovers. The frequency of

R package SHAD
R package SHAD (sister haplotype-based association of disease) was designed to implement RD tests and association analysis of haplotypes with disease in case and control populations. The SHAD package works in the R environment and has two functions for haplotype association analysis: one is applied to three-SNP haplotypes and the other to m-SNP haplotypes where m > 3. Function hapAnalysis is used to analyze the association of three-SNP haplotypes with disease. Three-SNP haplotypes have four pairs of sister haplotypes. It outputs RD, Chi-square results and p-value for

Results
In natural populations, sister gametes may have different frequencies because of mutation, deletion, gene conversion and selection. The disequilibrium between sister gametes, however, allows us to develop a statistical approach to test for association of sister gametes with a complex disease of study. Under null RD, if the difference between sister gametes in the patient (case) population is significantly different from that in the healthy (control) population, then the sister-gamete disequilibrium would be associated with the disease. Current SNP data provide ample opportunity to study haplotype-disease association. Fallin et al. (2001) reported a SNP haplotype dataset of 210 Alzheimer disease (AD) cases and 159 non-demented elderly controls. They used an EM algorithm to estimate the frequencies of haplotypes consisting of 8 SNPs (C19M1~C19M8) in a 205-kbp region that contains the ApoE gene on chromosome 19. They reported haplotype data only for configurations 1 and 2 (configuration 1: M1M3M4*M6; configuration 2: M1M2M5M6), where M4* is C19M4, which is part of ApoE-ε4, a variant that has been found to increase risk for AD (Corder et al., 1993; Saunders et al., 1993; Strittmatter et al., 1993; Farrer et al., 1997); we therefore did not consider the other configurations. We used the haplotype data of these two configurations to test for RD among SNPs and for associations between haplotypes and risk for AD. We constructed four combinations of three-locus haplotypes from configuration 1 by collapsing identical haplotypes and generated three-locus haplotype data (Table 1). According to Fallin et al. (2001), SNPs C19M1, C19M2, C19M5, and C19M6 followed HWE.
No LD occurred between C19M1 and C19M4, between C19M1 and C19M5, between C19M1 and C19M6, between C19M2 and C19M3, between C19M2 and C19M5, or between C19M2 and C19M6, but LD existed between C19M4 and C19M6, between C19M3 and C19M4, between C19M3 and C19M5 and between C19M3 and C19M6. The loci C19M1 and C19M8 flank a physical interval of 205 kbp on chromosome 19. Our RD analysis shows that there is no RD among loci C19M1, C19M4, and C19M6 or among loci C19M1, C19M3, and C19M6 in the case, control, and overall populations, whereas loci C19M3, C19M4, and C19M6 had very significant RD in all three populations (p = 0.0014 in the overall population, p = 8.8E-06 in the case population and p = 0.044 in the control population, Table 3), which is consistent with the significant LD between them reported by Fallin et al. (2001). In the M1M3M4* haplotype combination, we detected RD only in the case population (p = 0.0076, Table 3). This may be attributed to strong linkage between C19M3 and C19M4. From the two-by-two data, we calculated odds ratios and their Chi-square statistics (Table 4). In the three-SNP haplotype combination M1M3M4*, sister haplotypes CCT and TTC (ABC and abc) and sister haplotypes CTC and TCT (Abc and aBC) were associated with risk for AD (p < 0.05). In the three-SNP haplotype combination M1M4*M6, sister haplotypes TTA and CCG (ABC and abc) and sister haplotypes TTG and CCA (ABc and abC) were found to be associated with risk for AD (p < 0.05). Both of these three-SNP combinations contain the AD risk factor ApoE-ε4 and had no recombination interference among the three loci. Although the three-SNP haplotype combination M1M3M6 does not contain the AD risk factor ApoE-ε4 (M4), its sister haplotypes TCA and CTG (ABC and abc) were also associated with risk for AD (p < 0.05) without RD confounding. Sister haplotypes TCG and CTA (ABc and abC) and sister haplotypes CCA and TTG (aBC and Abc) were very significantly associated with risk for AD (p < 0.01). This result demonstrates that M3 is also a risk factor for AD (called ApoE-ε3), because in configuration 2 (M1M2M5M6), which lacks M3 and M4, none of the sister haplotype pairs was found to be significantly associated with risk for AD and no RD was found among triplet SNPs in any of the four haplotype combinations (Supplementary Tables S1-S3). As M3, M4 and M6 are tightly linked, the associations of sister haplotypes CTA and TCG (ABC and abc) and sister haplotypes CTG and TCA (ABc and abC) with risk for AD in the three-SNP haplotype combination M3M4*M6 (p < 0.01) were confounded by RD.
Another haplotype dataset, published by Faghih et al. (2009), provides an opposite example. Using a differential analysis method, Faghih et al. (2009) found that two haplotypes (ACA and CCA) of three variants in gene IL-13 were significantly associated with risk for breast cancer. Using our method, we obtained four pairs of sister haplotypes and their frequencies in the case and control populations (Table 5). As we predicted, our RD analysis showed that RD (>0.02) was extremely significant (p = 2.81e-06, 5.12e-05, and 1.53e-07 in the overall, control, and case populations, respectively, Table 6). These three variants lie in a very short interval of about 3.6 kbp (457 bp + 3099 bp), such that extremely strong negative recombination interference occurred. Interestingly, however, none of the sister haplotype pairs was found to be associated with risk for breast cancer (Table 7). The significant differences in the frequencies of haplotypes ACA and CCA between the case and control groups in Faghih et al. (2009) were simply due to RD and/or the inappropriate haplotype pairs used. We did not find any other reports that variants in gene IL-13 are associated with risk for breast cancer.
Another similar example can be found in the report by Vargas-Alarcon et al. (2015) of an association of haplotypes in the interleukin-17A gene with risk for premature coronary artery disease (CAD). Four SNPs (rs8193036, rs3819024, rs2275913 and rs8193037) in gene IL-17A were genotyped in 900 premature CAD patients and 935 healthy persons. Vargas-Alarcon et al. (2015) performed haplotype-based association analysis of premature CAD using pairs of an individual haplotype and the common haplotype (called individual-common haplotype pairs). The common haplotype is TAGG. They found that TAGA was associated with risk for CAD at a significance level of p < 0.05. But TAGA differs from the common haplotype TAGG at only one locus. This association, which is equivalent to a SNP-disease association, conflicts with the fact that none of the SNPs within gene IL-17A was associated with CAD. Our haplotype analysis indicates that these four SNPs should construct 16 haplotypes, of which only 10 were observed with hapview; hence only rs8193036, rs3819024, and rs2275913 are valid for constructing 8 haplotypes (see Supplementary Material). The RD test shows that very strong negative recombination interference occurred among these three SNP loci within gene IL-17A in both the premature CAD and control populations (Supplementary Material). The RD results ($D_r = 0.0199$ in the case population with p = 7.90E-06 and $D_r = 0.0274$ in the control population with p = 1.59E-09) agree well with the fact that these SNPs are in a very short region within gene IL-17A, as indicated by the high LD values ($r^2 > 0.9$ and $D' > 0.8$). As seen in gene IL-13, none of the sister haplotype pairs was found to be associated with risk for CAD (Supplementary Material). This result is consistent with the finding that none of the SNPs was associated with risk for CAD (Vargas-Alarcon et al., 2015).
To further demonstrate that our method is broadly useful, we constructed an R package SHAD (Supplementary Package and Material) and applied it to a COMT haplotype dataset published by Peterson et al. (2010). This dataset has 15 haplotypes consisting of 6 SNPs in the catechol-O-methyltransferase (COMT) gene. Gene COMT has 6 exons and 5 introns (McGregor, 2014). SNP1 (rs1544325), SNP2 (rs174674) and SNP3 (rs7290221) are located in intron 1, and the intervals between SNPs 1 and 2 and between SNPs 2 and 3 are 2357 bp and 12447 bp, respectively. SNP4 (rs2239393) is located in intron 3, SNP5 (rs4680) in exon 4 and SNP6 (rs46462316) in intron 5. The intervals between SNP2 and SNP4, between SNP4 and SNP5, and between SNP5 and SNP6 are 16414 bp, 833 bp, and 861 bp, respectively. Peterson et al. (2010) did not construct sister haplotypes; they used individual-common haplotype pairs in the case and control groups to calculate OR and found that haplotypes GAGAGC and AGCGAC were significantly associated with risk for breast cancer. Our sister haplotype analysis was still based on the three-SNP system. Haplotypes consisting of 6 SNPs have 20 three-SNP combinations, whereas only 15 haplotypes were observed, so many haplotypes were missing. In theory, each three-SNP combination should have 8 haplotypes. In the list of haplotype combinations (Supplementary Table S4), 11 combinations had 6 haplotypes, 8 combinations had 7 haplotypes and only one had 8 haplotypes. Since 6 haplotypes cannot form valid sister gamete pairs, we removed these combinations from our analysis. For combinations with 7 haplotypes, we assigned the frequencies of rare haplotypes in the case and control groups to the missing haplotype in each combination. Thus these 8 combinations each had 8 haplotypes. Using our R package SHAD (Sister-haplotype Association of Disease), we obtained the results of the RD and disease association tests. The results summarized in Supplementary Table S5 show that, except for combination 19, which had no significant RD, the other 7 combinations had very significant RD. Combination 6 (SNP1, SNP3 and SNP5), combination 13 (SNP2, SNP3 and SNP6), and combination 16 (SNP2, SNP5 and SNP6) had very strong negative recombination interference, whereas in combination 9 (SNP1, SNP4 and SNP6), combination 10 (SNP1, SNP5 and SNP6), combination 11 (SNP2, SNP3 and SNP4), and combination 12 (SNP2, SNP3 and SNP5) there was very strong positive recombination interference among the three SNPs. Unsurprisingly, in all combinations none of the sister haplotype pairs was found to be associated with risk for breast cancer (Supplementary Table S5). These results are as predicted from recombination interference occurring in such short intervals within the gene and within introns. To our knowledge, COMT is chiefly produced by nerve cells in the brain, and its variants have been found to be associated with risk for schizophrenia and other disorders that affect thought (cognition) and emotion, including bipolar disorder, panic disorder, anxiety, obsessive-compulsive disorder (OCD), eating disorders, and attention deficit hyperactivity disorder (ADHD) (http://ghr.nlm.nih.gov/gene/COMT). So far we have not found any other evidence that variants of COMT are associated with risk for breast cancer.

Discussion
Theoretically, RD reveals recombination interference among multiple loci in an ideal population because in such a population RD is derived entirely from recombination interference. In a natural population, however, in addition to recombination interference, RD may also arise from selection, mutation, gene conversion, migration and/or genetic drift in a small population, because these factors can also alter the frequencies of gametes or haplotypes (Tan, 2020). In local human populations, these factors may also produce haplotype-based associations with complex diseases. Therefore, an RD test is required in haplotype-based association studies of disease.
Frequencies of haplotypes in natural or human populations can be estimated by using existing methods such as PHASE (Stephens et al., 2001), fastPHASE (Scheet and Stephens, 2006), BEAGLE (Browning and Browning, 2007), IMPUTE2 (Howie et al., 2009), RCEH (Gao et al., 2009) and MaCH (Li et al., 2010). However, current statistical methods for haplotype-disease association analysis, as seen in the above examples, do not consider recombination interference, even though LD has been excluded in haplotype-based association analysis of diseases. LD can easily be tested between two loci (Robbins, 1918; Geiringer, 1944; Lewontin and Kojima, 1960; Lewontin, 1964; Hill and Robertson, 1968) but becomes very complicated among multiple loci because LD cannot measure recombination interference. Recombination interference becomes strong in a short interval. Recombination interference changes the frequencies of haplotypes, which would lead to spurious associations between haplotypes and a complex disease. An example is the association of a haplotype in gene IL-17A with CAD reported by Vargas-Alarcon et al. (2015), which was due to recombination interference within gene IL-17A. In addition, small population size also changes haplotype frequencies through genetic drift, which leads to false associations of haplotypes with the disease. Therefore, in a small population, testing for RD in haplotypes can exclude false haplotype-disease associations. If no RD is found in haplotypes in the control and case populations, an identified association of sister haplotypes with the disease of study is statistically acceptable. For example, the M1M3M4* haplotype containing risk factor apoE-ε4 and the M1M3M6 haplotype containing risk factor apoE-ε3 were found to be associated with risk for AD in a small human population (210 AD cases and 159 non-demented elderly controls) using our sister haplotypes and RD test. ApoE-ε3 (Huang et al., 1995; DeMattos et al., 2001; Hopkins et al., 2002; Sen et al., 2012; Pedachenko et al., 2015; Mahan et al., 2022; Sepulveda-Falla et al., 2022; Mulgrave et al., 2023) and apoE-ε4 (Ayyubova, 2023; Chen et al., 2023; Hamza et al., 2023; Koutsodendris et al., 2023; Pires and Rego, 2023; Sun and Xie, 2023; Zhou et al., 2023) have been verified to be risk factors for AD.
Fallin et al. (2001), however, found that 3 haplotypes in configuration 2, which flanks M3 and M4, were significantly associated with risk for AD by using individual-others pairs. Haplotypes in configuration 2 (M1M2M5M6) should not be associated with risk for AD because they do not contain M3 and M4. For example, three SNPs can construct 8 genotypes: ABC, abc, ABc, abC, aBC, Abc, AbC and aBc. If we consider only SNP1 and SNP3 and ignore SNP2, we then have four two-SNP genotypes: AC (ABC and AbC), ac (aBc and abc), Ac (ABc and Abc), and aC (aBC and abC), each containing both the B and b alleles at the SNP2 locus. If SNP2 is assumed to be the risk factor, then there should be no associations between SNP1-SNP3 haplotypes and risk for the disease. So the findings of Fallin et al. (2001) of haplotypes associated with AD in configuration 2 are incorrect.
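The collapsing argument above can be made concrete with a small Python sketch that sums the frequencies of three-SNP genotypes over the ignored middle site. The equal frequencies are purely illustrative, and the function name `collapse` is hypothetical.

```python
from collections import defaultdict

def collapse(frequencies, keep):
    """Sum haplotype frequencies over the sites that are not listed in `keep`."""
    collapsed = defaultdict(float)
    for haplotype, freq in frequencies.items():
        collapsed["".join(haplotype[i] for i in keep)] += freq
    return dict(collapsed)

# Eight three-SNP genotypes with illustrative, equal frequencies.
three_snp = {g: 0.125 for g in ["ABC", "abc", "ABc", "abC", "aBC", "Abc", "AbC", "aBc"]}

# Keep SNP1 and SNP3 (positions 0 and 2), ignore SNP2.
two_snp = collapse(three_snp, keep=(0, 2))
print(sorted(two_snp.items()))
# [('AC', 0.25), ('Ac', 0.25), ('aC', 0.25), ('ac', 0.25)]
```

Each two-SNP class pools one B-carrying and one b-carrying genotype, which is the point the worked example in the text makes about SNP2.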
The null hypothesis for haplotype-disease association is that, under recombination equilibrium, if disequilibrium between two sister haplotypes does not result in disease, then the difference in frequency between sister haplotypes in the case population should be independent of that in the control population. Since two sister haplotypes, like a pair of alleles at a locus, are derived from the father and the mother respectively and hence are genetically a pair of sister gametes, it is reasonable to construct two-by-two contingency tables from sister haplotypes and case-control status for the association test. Conversely, inappropriate haplotype pairs would result in false findings of haplotype-disease associations. For example, in individual-common haplotype pairs (Gaudet et al., 2006; Peterson et al., 2010), only one haplotype (e.g., CTA in Table 5) has different alleles at all three loci from the common haplotype (e.g., ACG in Table 5), while the others share alleles at one or two loci with the common haplotype. This means that only one haplotype can be paired biologically with the common haplotype. Individual-others pairs (Fallin et al., 2001), as seen in configuration 2, would create an incorrect association between haplotypes and risk for the disease because most of the other haplotypes are irrelevant to the haplotype in question and cannot be paired with it biologically.
To validate this conclusion, we applied the individual-common pair and individual-others pair methods to the haplotype data (Table 5) of Faghih et al. (2009) and to a new haplotype dataset (Supplementary Table S6) created by assigning 500 patients to the 8 three-SNP haplotypes according to their frequencies in the case population and 400 healthy individuals to the same 8 haplotypes according to their frequencies in the normal population. In the original haplotype data (Table 5 or Supplementary Table S6), the individual-common pair and sister haplotype pair methods did not find any association between haplotypes and risk for breast cancer, but the individual-others pair method identified ACA as associated with risk for breast cancer (p = 0.03254) (Supplementary Table S7). In the new haplotype data (Supplementary Table S6), both the individual-common pair and individual-others pair methods found that haplotypes ACA and ATA were very significantly associated with risk for breast cancer (p ≤ 0.005191). The inconsistent results between two datasets with the same haplotype frequencies in the case and control populations indicate that both individual-common pairs and individual-others pairs are incorrect haplotype pairs for association analysis. By contrast, none of the four pairs of sister haplotypes was found to be associated with risk for breast cancer (Supplementary Table S7) in either the original or the new haplotype data, suggesting that sister haplotype pairs are the correct pairs for testing the association between haplotypes and risk for disease. The four examples above show that our sister haplotype method based on RD has high sensitivity and lower specificity. Theoretical analysis shows that our method satisfies the conditions for independence of two random variables, that is, two sister haplotypes are paired and the case and control states of disease are also paired. In a future study, we will use simulation data to show that our method would have higher power, a higher ROC curve, and a lower FDR in multiple haplotype-disease tests than the other haplotype-based methods.
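The three pairing schemes compared above differ only in what is placed in the second row of the two-by-two table. The following Python sketch makes that difference explicit. The counts are hypothetical and cover only a subset of haplotypes for brevity; they are not the data of Table 5, and the function names are illustrative.

```python
def pair_table(first, second, case, control):
    """Rows: two named haplotypes; columns: case count, control count."""
    return [[case[first], control[first]], [case[second], control[second]]]

def individual_others_table(hap, case, control):
    """Rows: one haplotype versus all remaining haplotypes pooled together."""
    others_case = sum(n for h, n in case.items() if h != hap)
    others_control = sum(n for h, n in control.items() if h != hap)
    return [[case[hap], control[hap]], [others_case, others_control]]

# Hypothetical counts (NOT the data of Table 5); ACG plays the common haplotype.
case = {"ACG": 200, "CTA": 20, "ACA": 60, "CTG": 10}
control = {"ACG": 180, "CTA": 25, "ACA": 40, "CTG": 12}

print(pair_table("ACA", "ACG", case, control))        # individual-common: [[60, 40], [200, 180]]
print(individual_others_table("ACA", case, control))  # individual-others: [[60, 40], [230, 217]]
print(pair_table("ACA", "CTG", case, control))        # sister pair:       [[60, 40], [10, 12]]
```

Only the sister pair contrasts ACA with the haplotype that differs from it at all three sites, which is the pairing the text argues for.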

TABLE 1
Data of haplotypes consisting of three SNPs derived from four-SNP haplotypes in configuration 1 (M1M3M4*M6), where M4* is C19M4, which is part of ApoE-ε4.

TABLE 2
Four two-by-two tables made by using sister haplotypes and case-control.
a Haplotype data from Fallin et al. (2009).

TABLE 3
RD and Chi-square tests for RD among three SNPs in the four three-SNP haplotype combinations of M1M3M4*M6, where M4* is C19M4, which is part of ApoE-ε4.

TABLE 4
Chi-square test of associations between sister haplotypes and Alzheimer disease.
for construction of sister haplotype pairs, converting haplotypes to genotype of three loci, RD test, and Chi-square test for association between sister haplotype pairs and a disease of study including a practical example is given in Supplementary Material.

TABLE 5
Eight kinds of haplotypes consisting of three SNPs in IL-13 and their distribution in patient and normal populations (a).

TABLE 6
RD test for recombination interference among the three loci in gene IL-13.

TABLE 7
Results for association between sister gametes and risk for breast cancer.
Chi-square test, and p-values for OR in case-control. Function hapADA is used to dissect m-SNP haplotypes into n combinations of three-SNP haplotypes and to perform association analysis of sister haplotype pairs with disease in all combinations. The SHAD package is available on request.