<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Genet.</journal-id>
<journal-title>Frontiers in Genetics</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Genet.</abbrev-journal-title>
<issn pub-type="epub">1664-8021</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fgene.2016.00034</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Genetics</subject>
<subj-group>
<subject>Methods</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Using Incomplete Trios to Boost Confidence in Family Based Association Studies</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Dhankani</surname> <given-names>Varsha</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="author-notes" rid="fn003"><sup>&#x02020;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/282328/overview"/></contrib>
<contrib contrib-type="author">
<name><surname>Gibbs</surname> <given-names>David L.</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="author-notes" rid="fn003"><sup>&#x02020;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/113183/overview"/></contrib>
<contrib contrib-type="author">
<name><surname>Knijnenburg</surname> <given-names>Theo</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/299223/overview"/></contrib>
<contrib contrib-type="author">
<name><surname>Kramer</surname> <given-names>Roger</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref></contrib>
<contrib contrib-type="author">
<name><surname>Vockley</surname> <given-names>Joseph</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<xref ref-type="aff" rid="aff3"><sup>3</sup></xref></contrib>
<contrib contrib-type="author">
<name><surname>Niederhuber</surname> <given-names>John</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<xref ref-type="aff" rid="aff4"><sup>4</sup></xref></contrib>
<contrib contrib-type="author">
<name><surname>Shmulevich</surname> <given-names>Ilya</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref></contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Bernard</surname> <given-names>Brady</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="author-notes" rid="fn001"><sup>&#x0002A;</sup></xref></contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>Institute for Systems Biology</institution> <country>Seattle, WA, USA</country></aff>
<aff id="aff2"><sup>2</sup><institution>Inova Translational Medicine Institute</institution> <country>Falls Church, VA, USA</country></aff>
<aff id="aff3"><sup>3</sup><institution>School of Medicine, Virginia Commonwealth University</institution> <country>Richmond, VA, USA</country></aff>
<aff id="aff4"><sup>4</sup><institution>School of Medicine, Johns Hopkins University</institution> <country>Baltimore, MD, USA</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Madhuchhanda Bhattacharjee, University of Hyderabad, India</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Robert Brian O&#x00027;Hara, Biodiversit&#x000E4;t und Klima - Forschungszentrum, Germany; Matteo Benelli, University of Trento, Italy</p></fn>
<fn fn-type="corresp" id="fn001"><p>&#x0002A;Correspondence: Brady Bernard <email>brady.bernard&#x00040;systemsbiology.org</email></p></fn>
<fn fn-type="other" id="fn002"><p>This article was submitted to Bioinformatics and Computational Biology, a section of the journal Frontiers in Genetics</p></fn>
<fn fn-type="other" id="fn003"><p>&#x02020;These authors have contributed equally to this work.</p></fn>
</author-notes>
<pub-date pub-type="epub">
<day>18</day>
<month>03</month>
<year>2016</year>
</pub-date>
<pub-date pub-type="collection">
<year>2016</year>
</pub-date>
<volume>7</volume>
<elocation-id>34</elocation-id>
<history>
<date date-type="received">
<day>31</day>
<month>10</month>
<year>2015</year>
</date>
<date date-type="accepted">
<day>25</day>
<month>02</month>
<year>2016</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2016 Dhankani, Gibbs, Knijnenburg, Kramer, Vockley, Niederhuber, Shmulevich and Bernard.</copyright-statement>
<copyright-year>2016</copyright-year>
<copyright-holder>Dhankani, Gibbs, Knijnenburg, Kramer, Vockley, Niederhuber, Shmulevich and Bernard</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license>
</permissions>
<abstract>
<p>Most currently available family-based association tests are designed to handle only nuclear families in which both parents and offspring have complete genotypes. As whole genome sequencing becomes increasingly affordable, genetic studies can collect data for more families and from large family cohorts, with the goal of improving statistical power. However, because of missing genotypes, many families are excluded from family-based association tests, negating the benefits of large-scale sequencing data. Here, we present CIFBAT, a method that uses incomplete families in the Family Based Association Test (FBAT) to evaluate the robustness of the test statistic against missing data. CIFBAT computes quantile intervals of the FBAT statistic by repeatedly choosing random valid completions of incomplete family genotypes according to Mendelian inheritance rules. By treating all valid completions as equally likely and computing quantile intervals over many randomized iterations, CIFBAT avoids assuming a homogeneous population structure or any particular pattern of missingness in the data. Using simulated data, we show that the quantile intervals computed by CIFBAT are useful for validating the robustness of the FBAT statistic against missing data and for identifying genomic markers with higher precision. We also propose a novel set of candidate genomic markers for uterine-related abnormalities from an analysis of familial whole genome sequences, and provide validation for a previously established set of candidate markers for type 1 diabetes. We have provided a software package that incorporates TDT, robustTDT, FBAT, and CIFBAT. The data format proposed for the software uses half the memory of the standard FBAT format (PED) files, making it efficient for large-scale genome-wide association studies.</p>
</abstract>
<kwd-group>
<kwd>family based association tests</kwd>
<kwd>missing genotypes</kwd>
<kwd>randomized imputation</kwd>
<kwd>quantile intervals</kwd>
<kwd>population stratification</kwd>
<kwd>whole genome analysis</kwd>
<kwd>memory efficient data format</kwd>
</kwd-group>
<contract-sponsor id="cn001">National Center for Research Resources<named-content content-type="fundref-id">10.13039/100000097</named-content></contract-sponsor>
<contract-sponsor id="cn002">Inova Health System Foundation<named-content content-type="fundref-id">10.13039/100001181</named-content></contract-sponsor>
<counts>
<fig-count count="6"/>
<table-count count="5"/>
<equation-count count="13"/>
<ref-count count="11"/>
<page-count count="12"/>
<word-count count="8032"/>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="s1">
<title>Introduction</title>
<p>A wide variety of genetic association studies have been performed with the aim of discovering genomic markers for a given phenotype of interest. While many of these studies are population based, there is renewed interest in family-based studies because population-based studies have been unable to account for much of the heritability of most common phenotypes (Ott et al., <xref ref-type="bibr" rid="B6">2011</xref>). Family-based designs are commonly employed in genetic association studies because they are robust to population stratification (Laird et al., <xref ref-type="bibr" rid="B4">2000</xref>). The two most widely used family-based tests are the Transmission Disequilibrium Test (TDT; Spielman et al., <xref ref-type="bibr" rid="B9">1993</xref>) and the Family Based Association Test (FBAT; Laird et al., <xref ref-type="bibr" rid="B4">2000</xref>). TDT compares how often each allele is transmitted, versus not transmitted, from heterozygous parents to affected offspring. FBAT is a generalization of TDT that allows families with unaffected offspring to be used as controls, improving statistical power when studying common diseases (Lange and Laird, <xref ref-type="bibr" rid="B5">2002</xref>). Both TDT and FBAT require complete genotypes: any family in which one or more members have missing genotypes is excluded from these tests, resulting in a loss of statistical power. Moreover, genotypes are often not missing at random; missingness can be related to technical errors or to observed covariates. In such cases, ignoring or imputing missing genotypes can introduce systematic bias into the test statistic.</p>
<p>Several extensions to TDT have been proposed to handle missing data in affected families. Likelihood methods that deal with missing parental genotypes assume a homogeneous population in Hardy-Weinberg equilibrium (Van Steen et al., <xref ref-type="bibr" rid="B10">2006</xref>). Croiseau et al. presented a multiple imputation approach for case-parent trio studies and showed that it offers greater model flexibility than likelihood approaches (Croiseau et al., <xref ref-type="bibr" rid="B3">2007</xref>). However, multiple imputation methods use posterior probabilities derived from the available data and, as such, also assume a homogeneous population within the study cohort. Alternatively, the robustTDT method (Sebastiani et al., <xref ref-type="bibr" rid="B8">2004</xref>) handles incomplete genotypes without assuming any underlying pattern of missing data: it explores all possible genotype completions and returns upper and lower bounds of the TDT statistic.</p>
<p>TDT-type tests are known to inflate the type I error rate when parental genotypes are missing, when genotyping errors go undetected, or both (Van Steen et al., <xref ref-type="bibr" rid="B10">2006</xref>). Moreover, most current genome-wide association studies involve latent population substructure due to technical artifacts or diverse ancestries, and imputation methods that assume a homogeneous study population are not applicable in such cases. Cobat et al. proposed FBATdosage, which computes the FBAT statistic by imputing missing genotypes using allele dosage (the posterior mean genotype; Cobat et al., <xref ref-type="bibr" rid="B2">2014</xref>). Here, we present CIFBAT, a method that computes quantile intervals of the FBAT statistic without imputing missing genotypes. In this work, we refer to the (&#x003B1;/2, 1 &#x02212; &#x003B1;/2) quantile intervals as &#x0201C;QIs,&#x0201D; where by default &#x003B1; &#x0003D; 0.05 (that is, the 2.5th and 97.5th percentiles). These intervals are used to represent the spread of Z scores and <italic>p</italic>-values. CIFBAT computes QIs of the FBAT statistic by considering all valid completions of incomplete trios equally likely and, as such, does not assume homogeneous population allele frequencies. It includes families with unaffected offspring as controls and, most importantly, includes incomplete trios regardless of whether the parental or the offspring genotypes are missing. Table <xref ref-type="table" rid="T1">1</xref> compares the features of PLINK&#x00027;s implementation of TDT (Purcell et al., <xref ref-type="bibr" rid="B7">2007</xref>; <ext-link ext-link-type="uri" xlink:href="http://pngu.mgh.harvard.edu/~purcell/plink/">http://pngu.mgh.harvard.edu/&#x0007E;purcell/plink/</ext-link>), robustTDT (Sebastiani et al., <xref ref-type="bibr" rid="B8">2004</xref>), and FBAT (Laird et al., <xref ref-type="bibr" rid="B4">2000</xref>) with those of our implementations of these methods and of CIFBAT.
CIFBAT has been designed to analyze large numbers of whole genome sequences efficiently and can handle additive, dominant, and recessive genetic models for autosomal chromosomes as well as the X chromosome. We have provided a software package called FamSuite with implementations of TDT, robustTDT, FBAT, and CIFBAT (<ext-link ext-link-type="uri" xlink:href="https://github.com/IlyaLab/FamSuite">https://github.com/IlyaLab/FamSuite</ext-link>). We analyze simulated genotype data to demonstrate the applicability of CIFBAT to detecting bias in the FBAT statistic caused by missing data and to identifying genomic markers with higher precision. We also present results from the analysis of a familial whole genome sequencing data set for maternal uterine anomalies and of a candidate marker data set for type 1 diabetes.</p>
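<p>The sketch below illustrates how the default 95&#x00025; QI (&#x003B1; &#x0003D; 0.05, i.e., the 2.5th to 97.5th percentile) summarizes a collection of Z scores, one per randomized completion of the missing genotypes, and how the corresponding <italic>p</italic>-value spread can be obtained. It is a minimal Python sketch, not code from FamSuite. The function names are hypothetical. Quantiles are computed by linear interpolation between order statistics, which is an assumption because other quantile definitions differ slightly in small samples. <italic>P</italic>-values are two-sided standard normal tail probabilities.</p>
<preformat preformat-type="code">
```python
import math


def empirical_quantile(values, q):
    """Quantile q in [0, 1] by linear interpolation between order statistics."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def quantile_interval(values, alpha=0.05):
    """(alpha/2, 1 - alpha/2) quantile interval; alpha = 0.05 gives the 95% QI."""
    return (empirical_quantile(values, alpha / 2),
            empirical_quantile(values, 1 - alpha / 2))


def two_sided_p(z):
    """Two-sided standard normal p-value, P(|N(0, 1)| >= |z|)."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical Z scores, one per randomized completion of the missing genotypes.
z_scores = [i / 10 for i in range(41)]  # 0.0, 0.1, ..., 4.0
z_qi = quantile_interval(z_scores)
p_qi = quantile_interval([two_sided_p(z) for z in z_scores])
print("Z QI:", z_qi)
print("p-value QI:", p_qi)
```
</preformat>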
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption><p><bold>Comparison of features of TDT, robustTDT, FBAT, and CIFBAT</bold>.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th/>
<th valign="top" align="center"><bold>CIFBAT</bold></th>
<th valign="top" align="center"><bold>FamSuite FBAT</bold></th>
<th valign="top" align="center"><bold>FBAT (Laird et al., <xref ref-type="bibr" rid="B4">2000</xref>)</bold></th>
<th valign="top" align="center"><bold>FamSuite TDT</bold></th>
<th valign="top" align="center"><bold>PLINK TDT</bold></th>
<th valign="top" align="center"><bold>FamSuite robustTDT</bold></th>
<th valign="top" align="center"><bold>robustTDT (Sebastiani et al., <xref ref-type="bibr" rid="B8">2004</xref>)</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Unaffected offspring</td>
<td valign="top" align="center">&#x02713;</td>
<td valign="top" align="center">&#x02713;</td>
<td valign="top" align="center">&#x02713;</td>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td valign="top" align="left">Incomplete trios</td>
<td valign="top" align="center">&#x02713;</td>
<td/>
<td/>
<td/>
<td/>
<td valign="top" align="center">&#x02713;</td>
<td valign="top" align="center">&#x02713;</td>
</tr>
<tr>
<td valign="top" align="left">Support for ChrX</td>
<td valign="top" align="center">&#x02713;</td>
<td valign="top" align="center">&#x02713;</td>
<td valign="top" align="center">&#x02713;</td>
<td valign="top" align="center">&#x02713;</td>
<td valign="top" align="center">&#x02713;</td>
<td valign="top" align="center">&#x02713;</td>
<td/>
</tr>
<tr>
<td valign="top" align="left">Genetic models</td>
<td valign="top" align="center">A,D,R</td>
<td valign="top" align="center">A,D,R</td>
<td valign="top" align="center">A,D,R</td>
<td valign="top" align="center">A,D,R</td>
<td valign="top" align="center">A</td>
<td valign="top" align="center">A,D,R</td>
<td valign="top" align="center">A</td>
</tr>
<tr>
<td valign="top" align="left">Memory efficient data format</td>
<td valign="top" align="center">&#x02713;</td>
<td valign="top" align="center">&#x02713;</td>
<td/>
<td valign="top" align="center">&#x02713;</td>
<td/>
<td valign="top" align="center">&#x02713;</td>
<td/>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>Original implementations of TDT, robustTDT, and FBAT are compared with their implementations in FamSuite. Genetic models: A, additive; D, dominant; R, recessive</italic>.</p>
</table-wrap-foot>
</table-wrap>
</sec>
<sec sec-type="materials and methods" id="s2">
<title>Materials and methods</title>
<p>In the following sections, we refer to trios with disease-affected offspring as &#x0201C;affected trios,&#x0201D; trios with unaffected offspring as &#x0201C;unaffected trios,&#x0201D; trios with one or more family members with missing genotypes as &#x0201C;incomplete trios,&#x0201D; and trios with at least one heterozygous parent as &#x0201C;informative trios.&#x0201D;</p>
<p>Since CIFBAT computes the FBAT statistic for every iteration of random completion of missing genotype data, we briefly recapitulate the FBAT parameters and statistic here. Further details about FBAT can be found in Laird et al. (<xref ref-type="bibr" rid="B4">2000</xref>).</p>
<sec>
<title>FBAT</title>
<p>The FBAT (Laird et al., <xref ref-type="bibr" rid="B4">2000</xref>) statistic <italic>Z</italic><sub><italic>c</italic></sub> is based on the covariance <italic>U</italic><sub><italic>c</italic></sub> between the offspring&#x00027;s traits and genotypes:
<disp-formula id="E1"><label>(1)</label><mml:math id="M1"><mml:mtable columnalign="left" class="align-star"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>U</mml:mi></mml:mrow><mml:mrow><mml:mi>c</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mi>T</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:mi>E</mml:mi><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mtext>&#x000A0;&#x000A0;</mml:mtext></mml:mtd><mml:mtd><mml:mtext>&#x000A0;&#x000A0;</mml:mtext></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="E2"><label>(2)</label><mml:math id="M2"><mml:mtable columnalign="left" class="align-star"><mml:mtr><mml:mtd><mml:mi>V</mml:mi><mml:mi>a</mml:mi><mml:mi>r</mml:mi><mml:mi>i</mml:mi><mml:mi>a</mml:mi><mml:mi>n</mml:mi><mml:mi>c</mml:mi><mml:mi>e</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>U</mml:mi></mml:mrow><mml:mrow><mml:mi>c</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mi>V</mml:mi><mml:mi>a</mml:mi><mml:mi>r</mml:mi><mml:mi>i</mml:mi><mml:mi>a</mml:mi><mml:mi>n</mml:mi><mml:mi>c</mml:mi><mml:mi>e</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>T</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mtext>&#x000A0;&#x000A0;</mml:mtext></mml:mtd><mml:mtd><mml:mtext>&#x000A0;&#x000A0;</mml:mtext></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="E3"><label>(3)</label><mml:math id="M3"><mml:mtable columnalign="left" class="align-star"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>T</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>Y</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mi>&#x003BC;</mml:mi><mml:mtext>&#x000A0;&#x000A0;</mml:mtext></mml:mtd><mml:mtd><mml:mtext>&#x000A0;&#x000A0;</mml:mtext></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="E4"><label>(4)</label><mml:math id="M4"><mml:mtable columnalign="left" class="align-star"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>Z</mml:mi></mml:mrow><mml:mrow><mml:mi>c</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>U</mml:mi></mml:mrow><mml:mrow><mml:mi>c</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02215;</mml:mo><mml:msqrt><mml:mrow><mml:mi>V</mml:mi><mml:mi>a</mml:mi><mml:mi>r</mml:mi><mml:mi>i</mml:mi><mml:mi>a</mml:mi><mml:mi>n</mml:mi><mml:mi>c</mml:mi><mml:mi>e</mml:mi><mml:mtext>&#x000A0;</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>U</mml:mi></mml:mrow><mml:mrow><mml:mi>c</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msqrt><mml:mtext>&#x000A0;&#x000A0;</mml:mtext></mml:mtd><mml:mtd><mml:mtext>&#x000A0;&#x000A0;</mml:mtext></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
Here, <italic>X</italic><sub><italic>i</italic></sub> denotes the offspring genotype in trio <italic>i</italic> at the genomic marker being tested. A nuclear family with multiple offspring contributes one father-mother-offspring trio per offspring, and each of these trios contributes to the test independently. The subscript &#x0201C;c&#x0201D; in the formulas above denotes that FBAT uses only &#x0201C;complete&#x0201D; trios in the data. The coding of <italic>X</italic><sub><italic>i</italic></sub> is defined by the genetic model (additive, dominant, or recessive) under consideration. For example, under the additive model, <italic>X</italic><sub><italic>i</italic></sub> counts the number of non-reference alleles observed in the offspring and can take a value of 0, 1, or 2 for a bi-allelic genomic marker (Laird et al., <xref ref-type="bibr" rid="B4">2000</xref>).</p>
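<p>For a bi-allelic autosomal marker, the genotype coding can be sketched as a small Python function. This is an illustrative sketch, not FamSuite&#x00027;s implementation. The name <monospace>code_genotype</monospace> is hypothetical. The dominant and recessive codings assume the usual indicator definitions: a carrier of at least one non-reference allele, and a non-reference homozygote, respectively.</p>
<preformat preformat-type="code">
```python
def code_genotype(alt_count, model="additive"):
    """Map an offspring genotype, given as the number of non-reference alleles
    (0, 1 or 2), to the coded value X_i under the chosen genetic model."""
    if alt_count not in (0, 1, 2):
        raise ValueError("expected 0, 1 or 2 non-reference alleles")
    if model == "additive":
        return alt_count                    # allele count: 0, 1 or 2
    if model == "dominant":
        return 1 if alt_count >= 1 else 0   # any non-reference allele
    if model == "recessive":
        return 1 if alt_count == 2 else 0   # non-reference homozygote only
    raise ValueError(f"unknown genetic model: {model!r}")


# Heterozygous offspring under each model.
print([code_genotype(1, m) for m in ("additive", "dominant", "recessive")])
```
</preformat>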
<p><italic>T</italic><sub><italic>i</italic></sub> is the coded trait, defined as <italic>Y</italic><sub><italic>i</italic></sub> &#x02212; &#x003BC;, where <italic>Y</italic><sub><italic>i</italic></sub> denotes the observed trait of the offspring in trio <italic>i</italic>. Although <italic>Y</italic> can take several types of values, in this paper we focus on dichotomous traits, for which the observed trait <italic>Y</italic><sub><italic>i</italic></sub> is &#x0201C;1&#x0201D; for affected offspring and &#x0201C;0&#x0201D; for unaffected offspring.</p>
<p>&#x003BC; &#x02208; [0, 1] is an offset value that can be chosen to maximize the power of the test (Laird et al., <xref ref-type="bibr" rid="B4">2000</xref>). When &#x003BC; &#x0003D; 0, <italic>T</italic><sub><italic>i</italic></sub> &#x0003D; <italic>Y</italic><sub><italic>i</italic></sub>, so only affected trios contribute to the test (since <italic>Y</italic><sub><italic>i</italic></sub> is 0 for unaffected offspring). When 0 &#x0003C; &#x003BC; &#x0003C; 1, <italic>T</italic><sub><italic>i</italic></sub> &#x0003E; 0 for affected offspring and <italic>T</italic><sub><italic>i</italic></sub> &#x0003C; 0 for unaffected offspring, so both affected and unaffected trios are used in the test. For the analyses presented in this paper, we used &#x003BC; &#x0003D; 0.5 in order to assign equal but opposite weights (&#x0002B;0.5 and &#x02212;0.5) to affected and unaffected trios.</p>
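<p>The effect of the offset &#x003BC; on the coded trait can be checked directly. The Python sketch below uses a hypothetical helper name. It assumes a dichotomous trait coded 1 (affected) or 0 (unaffected), as in this paper.</p>
<preformat preformat-type="code">
```python
def coded_trait(y, mu=0.5):
    """Coded trait T_i = Y_i - mu for a dichotomous trait (1 = affected, 0 = unaffected)."""
    if y not in (0, 1):
        raise ValueError("dichotomous trait must be 0 or 1")
    if min(max(mu, 0.0), 1.0) != mu:   # clamping changes mu only if it lies outside [0, 1]
        raise ValueError("offset mu must lie in [0, 1]")
    return y - mu


for mu in (0.0, 0.5):
    # With mu = 0, unaffected offspring get T_i = 0 and drop out of U_c;
    # with mu = 0.5, affected and unaffected offspring get weights +0.5 and -0.5.
    print(mu, coded_trait(1, mu), coded_trait(0, mu))
```
</preformat>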
<p>Figure <xref ref-type="fig" rid="F1">1A</xref> shows an example of an informative complete trio for autosomal chromosomes (Sebastiani et al., <xref ref-type="bibr" rid="B8">2004</xref>). Figures <xref ref-type="fig" rid="F1">1B,C</xref> show examples of informative trio types for the X chromosome with female and male offspring, respectively. A comprehensive list of informative complete trios for autosomal chromosomes, as well as the X chromosome, is shown in Figure <xref ref-type="supplementary-material" rid="SM2">S1</xref>. The corresponding statistics <italic>X</italic> &#x02212; <italic>E[X]</italic> and <italic>Variance(X)</italic> shown in Figure <xref ref-type="supplementary-material" rid="SM2">S1</xref> are for the additive genetic model. Statistics for the dominant and recessive models are given in Table <xref ref-type="supplementary-material" rid="SM14">S1</xref> (autosomal chromosomes) and Table <xref ref-type="supplementary-material" rid="SM15">S2</xref> (X chromosome).</p>
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p><bold>Examples of informative complete trios. (A)</bold> Autosomal chromosomes. <bold>(B)</bold> X chromosome, trio with female offspring. <bold>(C)</bold> X chromosome, trio with male offspring.</p></caption>
<graphic xlink:href="fgene-07-00034-g0001.tif"/>
</fig>
<p>The following example illustrates the computation of the statistics <italic>X</italic> &#x02212; <italic>E[X]</italic> and <italic>Variance(X)</italic>. For the trio type shown in Figure <xref ref-type="fig" rid="F1">1A</xref>, where one parent is reference homozygous, the other parent is heterozygous, and the offspring is heterozygous, under the additive genetic model,
<disp-formula id="E5"><mml:math id="M5"><mml:mtable columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mi>E</mml:mi><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>]</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mn>0</mml:mn><mml:mo>&#x0002A;</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mn>1</mml:mn><mml:mo>&#x02215;</mml:mo><mml:mn>2</mml:mn><mml:mo>&#x0002B;</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x0002A;</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mn>1</mml:mn><mml:mo>&#x02215;</mml:mo><mml:mn>2</mml:mn><mml:mo>=</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mn>1</mml:mn><mml:mo>&#x02215;</mml:mo><mml:mn>2</mml:mn></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:mi>E</mml:mi><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>]</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x02215;</mml:mo><mml:mn>2</mml:mn></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mi>V</mml:mi><mml:mi>a</mml:mi><mml:mi>r</mml:mi><mml:mi>i</mml:mi><mml:mi>a</mml:mi><mml:mi>n</mml:mi><mml:mi>c</mml:mi><mml:mi>e</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mi>E</mml:mi><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup></mml:mrow><mml:mo>]</mml:mo></mml:mrow><mml:mo>-</mml:mo><mml:msup><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>E</mml:mi><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mi>V</mml:mi><mml:mi>a</mml:mi><mml:mi>r</mml:mi><mml:mi>i</mml:mi><mml:mi>a</mml:mi><mml:mi>n</mml:mi><mml:mi>c</mml:mi><mml:mi>e</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mn>0</mml:mn><mml:mo>&#x0002A;</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x02215;</mml:mo><mml:mn>2</mml:mn><mml:mo>&#x0002B;</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x0002A;</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x02215;</mml:mo><mml:mn>2</mml:mn></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>-</mml:mo><mml:msup><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x02215;</mml:mo><mml:mn>2</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mi>V</mml:mi><mml:mi>a</mml:mi><mml:mi>r</mml:mi><mml:mi>i</mml:mi><mml:mi>a</mml:mi><mml:mi>n</mml:mi><mml:mi>c</mml:mi><mml:mi>e</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x02215;</mml:mo><mml:mn>4</mml:mn></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
</p>
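<p>The same calculation can be carried out for any pair of parental genotypes by enumerating the four equally likely Mendelian transmissions. The Python sketch below is illustrative only. Its function names are hypothetical, and it covers autosomal markers only, not the X chromosome. It uses exact fractions, so it reproduces the worked example above exactly. It also shows how a trio can become non-informative under the dominant model.</p>
<preformat preformat-type="code">
```python
from fractions import Fraction
from itertools import product

# Indicator codings for the three genetic models (genotype = non-reference allele count).
CODINGS = {
    "additive": lambda g: g,
    "dominant": lambda g: int(g >= 1),
    "recessive": lambda g: int(g == 2),
}
# Allele pair carried by a parent with 0, 1 or 2 non-reference alleles.
ALLELES = {0: (0, 0), 1: (0, 1), 2: (1, 1)}


def offspring_moments(father, mother, model="additive"):
    """Exact E[X] and Variance(X) of the coded offspring genotype, conditional on
    the parental genotypes, for an autosomal bi-allelic marker."""
    code = CODINGS[model]
    probs = {}
    # Each parent transmits either of its two alleles with probability 1/2,
    # so each of the four allele combinations has probability 1/4.
    for a, b in product(ALLELES[father], ALLELES[mother]):
        x = code(a + b)
        probs[x] = probs.get(x, 0) + Fraction(1, 4)
    mean = sum(x * p for x, p in probs.items())
    var = sum(x * x * p for x, p in probs.items()) - mean ** 2
    return mean, var


# Worked example: reference homozygous parent, heterozygous parent, heterozygous child.
mean, var = offspring_moments(0, 1)
print(1 - mean, var)                        # X - E[X] and Variance(X)
# Heterozygous x alternate homozygous parents: under the dominant model every
# possible offspring is coded 1, so Variance(X) = 0 and the trio carries no information.
print(offspring_moments(1, 2, "dominant"))
```
</preformat>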
<p>Trio types 6 and 7 for autosomal chromosomes (Figure <xref ref-type="supplementary-material" rid="SM2">S1</xref>) are not informative under the dominant model (Sebastiani et al., <xref ref-type="bibr" rid="B8">2004</xref>) because the two possible offspring genotypes (heterozygous and alternate-allele homozygous) have equal penetrance under the dominant genetic model. Similarly, trio types 1 and 2 are not informative under the recessive model.</p>
<p>Once <italic>X</italic><sub><italic>i</italic></sub> &#x02212; <italic>E[X</italic><sub><italic>i</italic></sub><italic>]</italic>, <italic>Variance(X</italic><sub><italic>i</italic></sub><italic>)</italic>, and <italic>T</italic><sub><italic>i</italic></sub> have been computed for each trio, <italic>U</italic><sub><italic>c</italic></sub> and <italic>Variance(U</italic><sub><italic>c</italic></sub><italic>)</italic> are obtained by summing over all trios. Finally, the FBAT statistic <italic>Z</italic><sub><italic>c</italic></sub> is computed as the ratio of <italic>U</italic><sub><italic>c</italic></sub> to the standard deviation of <italic>U</italic><sub><italic>c</italic></sub>.</p>
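<p>Given the per-trio quantities, the summations in Equations (1) and (2) and the ratio in Equation (4) take only a few lines to write out. The Python sketch below is illustrative: the name <monospace>fbat_statistic</monospace> and the input layout are assumptions. It uses <italic>Variance(T</italic><sub><italic>i</italic></sub><italic>X</italic><sub><italic>i</italic></sub><italic>)</italic> &#x0003D; <italic>T</italic><sub><italic>i</italic></sub><sup>2</sup><italic>Variance(X</italic><sub><italic>i</italic></sub><italic>)</italic>, which holds because the coded trait <italic>T</italic><sub><italic>i</italic></sub> is fixed for each trio. It returns NaN when no trio is informative.</p>
<preformat preformat-type="code">
```python
import math


def fbat_statistic(trios):
    """FBAT statistic from per-trio quantities.

    trios: iterable of (T_i, X_i - E[X_i], Variance(X_i)) tuples for complete trios.
    Returns (U_c, Variance(U_c), Z_c)."""
    u = 0.0
    var_u = 0.0
    for t, deviation, var_x in trios:
        u += t * deviation          # Equation (1)
        var_u += t * t * var_x      # Equation (2): Variance(T_i X_i) = T_i^2 Variance(X_i)
    if var_u == 0:
        return u, var_u, float("nan")      # no informative trios: Z_c undefined
    return u, var_u, u / math.sqrt(var_u)  # Equation (4)


# Three affected trios of the Figure 1A type, with mu = 0.5 (so T_i = 0.5),
# X_i - E[X_i] = 1/2 and Variance(X_i) = 1/4 for each trio.
u_c, var_u_c, z_c = fbat_statistic([(0.5, 0.5, 0.25)] * 3)
print(u_c, var_u_c, z_c)
```
</preformat>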
<p><italic>Z</italic><sub><italic>c</italic></sub> is essentially a z-score measuring deviation from the null hypothesis of no linkage and no association. When evaluating bi-allelic markers, a positive <italic>Z</italic><sub><italic>c</italic></sub> indicates that the tested allele was over-transmitted to affected offspring, whereas a negative <italic>Z</italic><sub><italic>c</italic></sub> indicates under-transmission to affected offspring. <italic>P</italic>-values are computed from a two-sided test.</p>
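<p>The two-sided <italic>p</italic>-value and the direction of the effect follow from <italic>Z</italic><sub><italic>c</italic></sub> alone. The sketch assumes that <italic>Z</italic><sub><italic>c</italic></sub> is referred to a standard normal distribution under the null hypothesis, as for any z-score. Below is a minimal Python sketch with hypothetical function names.</p>
<preformat preformat-type="code">
```python
import math


def fbat_p_value(z):
    """Two-sided p-value P(|N(0, 1)| >= |z|) = erfc(|z| / sqrt(2))."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def transmission_direction(z):
    """Interpretation of the sign of Z_c for the tested allele."""
    if z > 0:
        return "over-transmitted to affected offspring"
    if z == 0:
        return "neither over- nor under-transmitted"
    return "under-transmitted to affected offspring"


for z in (1.96, -2.5):
    print(z, round(fbat_p_value(z), 4), transmission_direction(z))
```
</preformat>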
</sec>
<sec>
<title>CIFBAT&#x02014;boosting confidence in FBAT with quantile intervals</title>
<p>The FBAT test described above does not account for incomplete trios in the study cohort, and excluding them may introduce undetected bias into the test statistic. We have implemented CIFBAT to detect potential bias in the FBAT statistic in the presence of incomplete trios and to identify significant genomic markers with higher precision. CIFBAT treats all valid completions of each incomplete trio as equally likely and computes QIs of the FBAT statistic over many iterations of randomized completions.</p>
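<p>The randomized procedure can be sketched as follows. This Python sketch is our illustration under stated assumptions, not the FamSuite implementation. It handles autosomal markers only. It treats a &#x0201C;valid completion&#x0201D; as any assignment of genotypes (0, 1, or 2 non-reference alleles) to the missing members that agrees with the observed genotypes and obeys Mendelian inheritance. In each iteration it samples one completion uniformly and independently for each incomplete trio. It skips trios whose observed genotypes violate Mendelian rules, and it assumes that the offspring trait is always observed. The admissible incomplete trio types actually used by CIFBAT are those listed in Figure <xref ref-type="supplementary-material" rid="SM3">S2</xref>, which this sketch does not reproduce. Function names and the input layout (father, mother, child, trait, with None marking a missing genotype) are hypothetical.</p>
<preformat preformat-type="code">
```python
import math
import random
from itertools import product

CODINGS = {
    "additive": lambda g: g,
    "dominant": lambda g: int(g >= 1),
    "recessive": lambda g: int(g == 2),
}
ALLELES = {0: (0, 0), 1: (0, 1), 2: (1, 1)}  # parent genotype -> allele pair


def child_distribution(father, mother):
    """Mendelian distribution of the child's non-reference allele count."""
    dist = {}
    for a, b in product(ALLELES[father], ALLELES[mother]):
        dist[a + b] = dist.get(a + b, 0.0) + 0.25
    return dist


def trio_terms(father, mother, child, trait, mu, model):
    """(T(X - E[X]), T^2 Variance(X)) contributed by one complete trio."""
    code = CODINGS[model]
    dist = child_distribution(father, mother)
    mean = sum(code(g) * p for g, p in dist.items())
    var = sum(code(g) ** 2 * p for g, p in dist.items()) - mean ** 2
    t = trait - mu
    return t * (code(child) - mean), t * t * var


def valid_completions(father, mother, child):
    """Genotype triples consistent with the observed entries and Mendelian rules."""
    observed = (father, mother, child)
    return [
        (f, m, c)
        for f, m, c in product((0, 1, 2), repeat=3)
        if all(o is None or o == v for o, v in zip(observed, (f, m, c)))
        and c in child_distribution(f, m)
    ]


def cifbat_z_scores(complete, incomplete, mu=0.5, model="additive",
                    iterations=1000, seed=0):
    """One FBAT Z score per randomized completion of the incomplete trios."""
    rng = random.Random(seed)
    base_u = base_v = 0.0
    for f, m, c, y in complete:
        du, dv = trio_terms(f, m, c, y, mu, model)
        base_u, base_v = base_u + du, base_v + dv
    options = []
    for f, m, c, y in incomplete:
        completions = valid_completions(f, m, c)
        if completions:  # skip trios that violate Mendelian rules
            options.append((completions, y))
    z_scores = []
    for _ in range(iterations):
        u, v = base_u, base_v
        for completions, y in options:
            # every valid completion is equally likely
            du, dv = trio_terms(*rng.choice(completions), y, mu, model)
            u, v = u + du, v + dv
        if v > 0:  # an iteration with no informative trio yields no Z score
            z_scores.append(u / math.sqrt(v))
    return z_scores


def quantile_interval(values, alpha=0.05):
    """(alpha/2, 1 - alpha/2) empirical quantiles with linear interpolation."""
    xs = sorted(values)

    def q(p):
        pos = p * (len(xs) - 1)
        lo, hi = math.floor(pos), math.ceil(pos)
        return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)

    return q(alpha / 2), q(1 - alpha / 2)


# Toy data: (father, mother, child, trait); None marks a missing genotype.
complete = [(0, 1, 1, 1)] * 6 + [(0, 1, 0, 0)] * 4 + [(1, 1, 1, 0)] * 3
incomplete = [(None, 1, 1, 1)] * 5 + [(0, None, 1, 0)] * 3 + [(1, 1, None, 1)] * 2
zs = cifbat_z_scores(complete, incomplete, iterations=2000, seed=1)
print("95% QI of Z:", quantile_interval(zs))
```
</preformat>
<p>Reusing the contribution of the complete trios across iterations keeps each iteration proportional to the number of incomplete trios only. With no incomplete trios, every iteration returns the same value, and the QI collapses to the FBAT statistic computed from complete trios.</p>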
<p>Figure <xref ref-type="fig" rid="F2">2A</xref> shows an example of an admissible incomplete trio type for autosomal chromosomes (Sebastiani et al., <xref ref-type="bibr" rid="B8">2004</xref>). Figures <xref ref-type="fig" rid="F2">2B,C</xref> show examples of admissible incomplete trio types for the X chromosome with female and male offspring, respectively. A complete list of all admissible incomplete trio types for autosomal chromosomes (Sebastiani et al., <xref ref-type="bibr" rid="B8">2004</xref>) as well as the X chromosome is shown in Figure <xref ref-type="supplementary-material" rid="SM3">S2</xref>. We computed <italic>X</italic> &#x02212; <italic>E[X]</italic> and <italic>Variance(X)</italic> for all valid completions of these incomplete trios under the additive, dominant, and recessive models. Table <xref ref-type="supplementary-material" rid="SM16">S3</xref> lists these statistics for autosomal chromosomes, and Tables <xref ref-type="supplementary-material" rid="SM17">S4</xref> and <xref ref-type="supplementary-material" rid="SM18">S5</xref> list them for the X chromosome for trios with male and female offspring, respectively. For non-informative completions (both parents homozygous), both <italic>X</italic> &#x02212; <italic>E[X]</italic> and <italic>Variance(X)</italic> are equal to 0.</p>
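<p>To see how the per-completion statistics arise, consider an incomplete autosomal trio in which the father is reference homozygous, the offspring is heterozygous, and the mother&#x00027;s genotype is missing. The Python sketch below enumerates the Mendelian-consistent completions and computes <italic>X</italic> &#x02212; <italic>E[X]</italic> and <italic>Variance(X)</italic> for each. Its function names are hypothetical, and it covers autosomal markers and the additive model only. The completion in which the mother is alternate-allele homozygous has two homozygous parents, so it contributes zero to both statistics. A reference homozygous mother is not a valid completion, because she could not have transmitted the child&#x00027;s non-reference allele.</p>
<preformat preformat-type="code">
```python
from itertools import product

ALLELES = {0: (0, 0), 1: (0, 1), 2: (1, 1)}  # parent genotype -> allele pair


def child_distribution(father, mother):
    """Mendelian distribution of the child's non-reference allele count."""
    dist = {}
    for a, b in product(ALLELES[father], ALLELES[mother]):
        dist[a + b] = dist.get(a + b, 0.0) + 0.25
    return dist


def completion_statistics(father, mother, child):
    """Map each valid completion (f, m, c) to (X - E[X], Variance(X)), additive model."""
    stats = {}
    for f, m, c in product((0, 1, 2), repeat=3):
        observed_ok = all(o is None or o == v
                          for o, v in zip((father, mother, child), (f, m, c)))
        dist = child_distribution(f, m)
        if observed_ok and c in dist:  # agrees with the data and with Mendelian rules
            mean = sum(g * p for g, p in dist.items())
            var = sum(g * g * p for g, p in dist.items()) - mean ** 2
            stats[(f, m, c)] = (c - mean, var)
    return stats


# Father reference homozygous, mother missing, child heterozygous.
for completion, (deviation, variance) in completion_statistics(0, None, 1).items():
    print(completion, deviation, variance)
```
</preformat>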
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p><bold>Examples of admissible incomplete trios. (A)</bold> Autosomal chromosomes <bold>(B)</bold> X chromosome (female offspring) <bold>(C)</bold> X chromosome (male offspring). CIFBAT considers all valid completions of incomplete trios in the data as equally likely. Using randomly selected completions over many repetitions, CIFBAT computes a quantile interval of the FBAT statistic.</p></caption>
<graphic xlink:href="fgene-07-00034-g0002.tif"/>
</fig>
<p>We now describe how CIFBAT computes QIs of the FBAT statistic. In the following explanation, the subscript &#x0201C;<italic>c</italic>&#x0201D; denotes a complete trio or a statistic related to complete trios, the subscript &#x0201C;<italic>m</italic>&#x0201D; denotes a missing (incomplete) trio or a statistic related to incomplete trios, and the subscript &#x0201C;<italic>r</italic>&#x0201D; denotes a random variable.</p>
<p>Suppose that, for a genomic marker under evaluation, our data set consists of <italic>k</italic> complete trios and <italic>d</italic> incomplete trios. The total <italic>U</italic><sub><italic>c</italic></sub> and Variance(<italic>U</italic><sub><italic>c</italic></sub>) over all complete trios are computed as:
<disp-formula id="E6"><label>(5)</label><mml:math id="M6"><mml:mtable columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>U</mml:mi></mml:mrow><mml:mrow><mml:mi>c</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mi>T</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mtext>&#x000A0;</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:mi>E</mml:mi><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="E7"><label>(6)</label><mml:math id="M7"><mml:mtable columnalign="left"><mml:mtr><mml:mtd><mml:mi>V</mml:mi><mml:mi>a</mml:mi><mml:mi>r</mml:mi><mml:mi>i</mml:mi><mml:mi>a</mml:mi><mml:mi>n</mml:mi><mml:mi>c</mml:mi><mml:mi>e</mml:mi><mml:mtext>&#x000A0;</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>U</mml:mi></mml:mrow><mml:mrow><mml:mi>c</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:msubsup><mml:mrow><mml:mi>T</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup><mml:mo>&#x0002A;</mml:mo><mml:mi>V</mml:mi><mml:mi>a</mml:mi><mml:mi>r</mml:mi><mml:mi>i</mml:mi><mml:mi>a</mml:mi><mml:mi>n</mml:mi><mml:mi>c</mml:mi><mml:mi>e</mml:mi><mml:mtext>&#x000A0;</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
Next, for each incomplete trio, CIFBAT chooses one completion uniformly at random from its valid completions, as listed in Tables <xref ref-type="supplementary-material" rid="SM16">S3</xref>&#x02013;<xref ref-type="supplementary-material" rid="SM18">S5</xref>. Let <italic>X&#x00027;</italic> denote the offspring&#x00027;s genotype under the randomly chosen completion. The total contribution of the incomplete trios is then a random variable <italic>U</italic><sub><italic>mr</italic></sub>, computed as the sum of their contributions under these random completions:
<disp-formula id="E8"><label>(7)</label><mml:math id="M8"><mml:mtable columnalign="left" class="align-star"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>U</mml:mi></mml:mrow><mml:mrow><mml:mi>m</mml:mi><mml:mi>r</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>d</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mi>T</mml:mi></mml:mrow><mml:mrow><mml:mi>d</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mi>d</mml:mi></mml:mrow><mml:mrow><mml:msup><mml:mrow></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:msubsup><mml:mo>-</mml:mo><mml:mi>E</mml:mi><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mi>d</mml:mi></mml:mrow><mml:mrow><mml:msup><mml:mrow></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:msubsup></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mtext>&#x000A0;&#x000A0;</mml:mtext></mml:mtd><mml:mtd><mml:mtext>&#x000A0;&#x000A0;</mml:mtext></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="E9"><label>(8)</label><mml:math id="M9"><mml:mtable columnalign="left" class="align-star"><mml:mtr><mml:mtd><mml:mi>V</mml:mi><mml:mi>a</mml:mi><mml:mi>r</mml:mi><mml:mi>i</mml:mi><mml:mi>a</mml:mi><mml:mi>n</mml:mi><mml:mi>c</mml:mi><mml:mi>e</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>U</mml:mi></mml:mrow><mml:mrow><mml:mi>m</mml:mi><mml:mi>r</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>d</mml:mi></mml:mrow></mml:msub><mml:msubsup><mml:mrow><mml:mi>T</mml:mi></mml:mrow><mml:mrow><mml:mi>d</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup><mml:mo>&#x0002A;</mml:mo><mml:mi>V</mml:mi><mml:mi>a</mml:mi><mml:mi>r</mml:mi><mml:mi>i</mml:mi><mml:mi>a</mml:mi><mml:mi>n</mml:mi><mml:mi>c</mml:mi><mml:mi>e</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mi>d</mml:mi></mml:mrow><mml:mrow><mml:msup><mml:mrow></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:msubsup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mtext>&#x000A0;&#x000A0;</mml:mtext></mml:mtd><mml:mtd><mml:mtext>&#x000A0;&#x000A0;</mml:mtext></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
For a single iteration of random completion of all incomplete trios, the total <italic>U</italic> statistic and its variance are computed as the sums of the corresponding statistics from complete and incomplete trios (the variances add because the trios are independent):
<disp-formula id="E10"><label>(9)</label><mml:math id="M10"><mml:mtable columnalign="left" class="align-star"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>U</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mi>o</mml:mi><mml:mi>t</mml:mi><mml:mi>a</mml:mi><mml:mi>l</mml:mi><mml:mo>,</mml:mo><mml:mi>r</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>U</mml:mi></mml:mrow><mml:mrow><mml:mi>c</mml:mi></mml:mrow></mml:msub><mml:mo>&#x0002B;</mml:mo><mml:msub><mml:mrow><mml:mi>U</mml:mi></mml:mrow><mml:mrow><mml:mi>m</mml:mi><mml:mi>r</mml:mi></mml:mrow></mml:msub><mml:mtext>&#x000A0;&#x000A0;</mml:mtext></mml:mtd><mml:mtd><mml:mtext>&#x000A0;&#x000A0;</mml:mtext></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="E11"><label>(10)</label><mml:math id="M11"><mml:mtable columnalign="left" class="align-star"><mml:mtr><mml:mtd><mml:mi>V</mml:mi><mml:mi>a</mml:mi><mml:mi>r</mml:mi><mml:mi>i</mml:mi><mml:mi>a</mml:mi><mml:mi>n</mml:mi><mml:mi>c</mml:mi><mml:mi>e</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>U</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mi>o</mml:mi><mml:mi>t</mml:mi><mml:mi>a</mml:mi><mml:mi>l</mml:mi><mml:mo>,</mml:mo><mml:mi>r</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mi>V</mml:mi><mml:mi>a</mml:mi><mml:mi>r</mml:mi><mml:mi>i</mml:mi><mml:mi>a</mml:mi><mml:mi>n</mml:mi><mml:mi>c</mml:mi><mml:mi>e</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>U</mml:mi></mml:mrow><mml:mrow><mml:mi>c</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x0002B;</mml:mo><mml:mi>V</mml:mi><mml:mi>a</mml:mi><mml:mi>r</mml:mi><mml:mi>i</mml:mi><mml:mi>a</mml:mi><mml:mi>n</mml:mi><mml:mi>c</mml:mi><mml:mi>e</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>U</mml:mi></mml:mrow><mml:mrow><mml:mi>m</mml:mi><mml:mi>r</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mtext>&#x000A0;&#x000A0;</mml:mtext></mml:mtd><mml:mtd><mml:mtext>&#x000A0;&#x000A0;</mml:mtext></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
The corresponding <italic>Z</italic> statistic is computed as:
<disp-formula id="E12"><label>(11)</label><mml:math id="M12"><mml:mtable columnalign="left" class="align-star"><mml:mtr><mml:mtd columnalign="right" class="align-odd"><mml:msub><mml:mrow><mml:mi>Z</mml:mi></mml:mrow><mml:mrow><mml:mi>r</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>U</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mi>o</mml:mi><mml:mi>t</mml:mi><mml:mi>a</mml:mi><mml:mi>l</mml:mi><mml:mo>,</mml:mo><mml:mi>r</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02215;</mml:mo><mml:msqrt><mml:mrow><mml:mi>V</mml:mi><mml:mi>a</mml:mi><mml:mi>r</mml:mi><mml:mi>i</mml:mi><mml:mi>a</mml:mi><mml:mi>n</mml:mi><mml:mi>c</mml:mi><mml:mi>e</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>U</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mi>o</mml:mi><mml:mi>t</mml:mi><mml:mi>a</mml:mi><mml:mi>l</mml:mi><mml:mo>,</mml:mo><mml:mi>r</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msqrt></mml:mtd><mml:mtd><mml:mtext>&#x000A0;&#x000A0;</mml:mtext></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
For each genomic marker, CIFBAT repeats the computation of <italic>U</italic><sub><italic>mr</italic></sub> and the corresponding <italic>Z</italic><sub><italic>r</italic></sub> for a pre-defined number of iterations (1000 by default), each time with a newly drawn set of random completions for all incomplete trios. The QI of <italic>Z</italic><sub><italic>r</italic></sub> is bounded by its 100(&#x003B1;/2)th and 100(1 &#x02212; &#x003B1;/2)th percentiles, and the corresponding <italic>p</italic>-values are then computed. By default &#x003B1; &#x0003D; 0.05, which gives the 2.5th and 97.5th percentiles (a 95% QI).</p>
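<p>Equations (5) to (11) and the percentile step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the CIFBAT implementation: the per-trio values of <italic>X</italic> &#x02212; <italic>E[X]</italic> and <italic>Variance(X)</italic> are assumed to be precomputed (for instance from Tables S3 to S5), <italic>T</italic> is whatever trait coding the analysis uses, percentiles use linear interpolation, and the mapping of the <italic>Z</italic><sub><italic>r</italic></sub> interval to a <italic>p</italic>-value range in p_interval is our own choice because the text does not specify it. All function names are hypothetical.</p>

```python
import math
import random

def percentile(sorted_vals, q):
    """Linearly interpolated percentile of an ascending list, with q in [0, 1]."""
    pos = q * (len(sorted_vals) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)

def cifbat_interval(complete, incomplete, n_iter=1000, alpha=0.05, seed=None):
    """Quantile interval of Z_r over random completions of the incomplete trios.

    complete:   list of (T, x_minus_ex, var_x) tuples, one per complete trio.
    incomplete: list of (T, completions) pairs; completions lists the
                (x_minus_ex, var_x) tuple of every valid completion of that trio.
    """
    rng = random.Random(seed)
    # Equations (5) and (6): the complete trios contribute a fixed amount.
    u_c = sum(t * d for t, d, _ in complete)
    var_c = sum(t * t * v for t, _, v in complete)
    z_values = []
    for _ in range(n_iter):
        u_mr, var_mr = 0.0, 0.0
        for t, completions in incomplete:
            d, v = rng.choice(completions)  # every valid completion equally likely
            u_mr += t * d                   # Equation (7)
            var_mr += t * t * v             # Equation (8)
        var_total = var_c + var_mr          # Equation (10)
        if var_total == 0:
            raise ValueError("no informative trios for this marker")
        z_values.append((u_c + u_mr) / math.sqrt(var_total))  # Equations (9), (11)
    z_values.sort()
    return percentile(z_values, alpha / 2), percentile(z_values, 1 - alpha / 2)

def p_interval(z_lo, z_hi):
    """Smallest and largest two-sided p-values over the Z interval [z_lo, z_hi]."""
    def p(z):
        return math.erfc(abs(z) / math.sqrt(2.0))
    max_abs = max(abs(z_lo), abs(z_hi))
    # If the interval straddles zero, the smallest |Z| (largest p-value) is 0.
    min_abs = min(abs(z_lo), abs(z_hi)) if z_lo * z_hi > 0 else 0.0
    return p(max_abs), p(min_abs)
```

Because the complete trios enter only through the fixed sums u_c and var_c, each iteration costs time proportional to the number of incomplete trios, and a marker with no incomplete trios yields a degenerate interval equal to the ordinary FBAT <italic>Z</italic>.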
</sec>
<sec>
<title>Simulation of family genotype data</title>
<p>To explore the statistical behavior of CIFBAT under various scenarios of missing data, we simulated family genotype data both under no association and under a genetic model of disease, as explained in detail in Data Sheet <xref ref-type="supplementary-material" rid="SM1">1</xref>. We simulated the following three standard missing-data scenarios, as defined in the statistics literature.</p>
<p>When data is &#x0201C;missing completely at random&#x0201D; (MCAR), the probability of an observation being missing does not depend on observed or unobserved measurements. This scenario was simulated by introducing missing genotypes for randomly chosen samples in the data. This type of missing data was used in the study of markers with no association.</p>
<p>Data can also be &#x0201C;missing at random&#x0201D; (MAR), which means that the missingness mechanism depends only on an observed measurement, and not on any unobserved measurements. This scenario was simulated with respect to the following observed variables separately: gender, subpopulation, and disease status.</p>
<p>Lastly, data can be &#x0201C;missing not at random&#x0201D; (MNAR) when the missingness mechanism depends on unobserved measurements. This can be difficult to detect and can lead to invalid inference if the missing data is ignored or incorrectly modeled. We simulated this scenario by concentrating missing data on either heterozygous or homozygous samples; that is, samples that were heterozygous, or homozygous for the non-reference allele, were more likely to be missing.</p>
<p>The simulated binary phenotype is either uniformly random or based on a bi-allelic additive genetic model closely following the work of Yang et al. (<xref ref-type="bibr" rid="B11">2003</xref>), in which each individual starts with a low probability of disease and each causative SNP additively increases the log probability of disease. The log probability of disease, log(&#x003C0;<sub><italic>ij</italic></sub>), is modeled as:
<disp-formula id="E13"><label>(12)</label><mml:math id="M13"><mml:mtable columnalign="left" class="align-star"><mml:mtr><mml:mtd columnalign="right" class="align-odd"><mml:mtext>log</mml:mtext><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x003C0;</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>&#x003B1;</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>&#x0002B;</mml:mo><mml:msub><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mi>&#x003B2;</mml:mi></mml:mrow><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:msub><mml:mi>X</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>G</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi><mml:mi>m</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd><mml:mtd><mml:mtext>&#x000A0;&#x000A0;</mml:mtext></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
Here, &#x003B1;<sub><italic>ij</italic></sub> is the initial log probability of disease for offspring <italic>j</italic> in family <italic>i</italic> and represents environmental or other hidden factors that contribute to disease. For each causative SNP <italic>m</italic>, &#x003B2;<sub><italic>m</italic></sub> is a log ratio of penetrance values comparing one or two copies of the allele with none, and <italic>X</italic>(<italic>G</italic><sub><italic>ijm</italic></sub>) is a vector of dummy variables encoding the count of alternative alleles. Disease status was drawn at random with probability &#x003C0;<sub><italic>ij</italic></sub>; disease probabilities are correlated among offspring within a family but independent across families.</p>
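<p>A minimal Python sketch of Equation (12) and the disease draw for one offspring is given below. It assumes a single allele-count coding per SNP and a Bernoulli draw with probability &#x003C0;<sub><italic>ij</italic></sub>; capping &#x003C0;<sub><italic>ij</italic></sub> at 1 is our own safeguard, and how &#x003B1;<sub><italic>ij</italic></sub> is generated and how the within-family correlation arises are described in Data Sheet 1 and are not reproduced here. The function names are hypothetical.</p>

```python
import math
import random

def disease_probability(alpha_ij, betas, allele_counts, coding=float):
    """pi_ij from Equation (12): log(pi_ij) = alpha_ij + sum_m beta_m * X(G_ijm)."""
    log_pi = alpha_ij + sum(b * coding(g) for b, g in zip(betas, allele_counts))
    # A log-linear model can exceed probability 1 for large effects; cap it (our assumption).
    return min(1.0, math.exp(log_pi))

def simulate_status(alpha_ij, betas, allele_counts, rng):
    """Draw a binary disease status (1 = affected) with probability pi_ij."""
    pi_ij = disease_probability(alpha_ij, betas, allele_counts)
    return 1 if pi_ij > rng.random() else 0

if __name__ == "__main__":
    rng = random.Random(42)
    # Baseline probability 0.01; each copy of a causative allele doubles the risk.
    betas = [math.log(2.0)] * 3
    print(disease_probability(math.log(0.01), betas, [1, 0, 2]))  # about 0.08
    print(simulate_status(math.log(0.01), betas, [1, 0, 2], rng))
```

With a baseline of 0.01 and three copies of causative alleles, each doubling the risk, the probability is 0.01 &#x000D7; 2<sup>3</sup> &#x0003D; 0.08, illustrating that effects add on the log scale and multiply on the probability scale.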
<p>In each simulated scenario, 4000 pedigrees were generated, with a mean population prevalence of 12.5%. Equal numbers of cases and controls were used, and missing data was generated at the sample level at rates of 0, 1, 5, and 10% (Table <xref ref-type="table" rid="T2">2</xref>).</p>
<table-wrap position="float" id="T2">
<label>Table 2</label>
<caption><p><bold>Simulation scenarios for comparison of FBAT and CIFBAT</bold>.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th valign="top" align="left"><bold>Missingness type</bold></th>
<th valign="top" align="left"><bold>Missing data concentrated on:</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">MCAR</td>
<td valign="top" align="left">Random</td>
</tr>
<tr>
<td valign="top" align="left">MAR</td>
<td valign="top" align="left">Small Pop.</td>
</tr>
<tr>
<td valign="top" align="left">MAR</td>
<td valign="top" align="left">Large Pop.</td>
</tr>
<tr>
<td valign="top" align="left">MAR</td>
<td valign="top" align="left">Males</td>
</tr>
<tr>
<td valign="top" align="left">MAR</td>
<td valign="top" align="left">Females</td>
</tr>
<tr>
<td valign="top" align="left">MAR</td>
<td valign="top" align="left">Cases</td>
</tr>
<tr>
<td valign="top" align="left">MAR</td>
<td valign="top" align="left">Controls</td>
</tr>
<tr>
<td valign="top" align="left">MNAR</td>
<td valign="top" align="left">Heterozygotes</td>
</tr>
<tr>
<td valign="top" align="left">MNAR</td>
<td valign="top" align="left">Homozygotes</td>
</tr>
<tr>
<td valign="top" align="left" colspan="2">Except under MCAR, missing data was split 80/20 by the variables above</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>Different missingness patterns (MCAR, MAR, MNAR) were simulated to compare FBAT and CIFBAT. Population, gender, affectation status, and zygosity were used to specify the distribution of missing data for MAR and MNAR. In all, 9 scenarios were simulated, each at missing rates of 0, 1, 5, and 10%</italic>.</p>
</table-wrap-foot>
</table-wrap>
<p>In the MAR scenario, for each level of missingness, we arbitrarily split the missing data 80/20 according to a binary variable, such as gender or subpopulation, so that one group contained 80% of the missing data. The analysis was performed 1000 times for each level of missingness within each scenario, giving 9 scenarios &#x000D7; 4 missingness levels &#x000D7; 1000 runs (36,000 runs in total). The 9 scenarios are listed in Table <xref ref-type="table" rid="T2">2</xref>, and the simulation parameters and their range of values are listed in Table <xref ref-type="table" rid="T3">3</xref>.</p>
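<p>The choice of which samples to mask can be sketched in Python as below, assuming that the overall missing rate is applied to the full sample list and that, when a binary grouping is given, 80% of the masked samples are drawn from one group. The grouping function stands in for gender, subpopulation, disease status, or zygosity; passing no grouping gives MCAR. This is a hypothetical illustration, not the simulation code described in Data Sheet 1.</p>

```python
import random

def assign_missing(samples, rate, group_of=None, share=0.8, rng=None):
    """Choose sample IDs to mask at the given overall missing rate.

    samples:  list of sample IDs.
    group_of: optional function mapping a sample ID to True/False (e.g. "is male",
              "is a case", "is heterozygous"); None gives MCAR.
    share:    fraction of the masked samples drawn from the True group.
    """
    rng = rng or random.Random()
    n_missing = round(rate * len(samples))
    if group_of is None:
        # MCAR: every sample is equally likely to be masked.
        return set(rng.sample(samples, n_missing))
    in_group = [s for s in samples if group_of(s)]
    out_group = [s for s in samples if not group_of(s)]
    # Concentrate `share` of the missing data in the True group (80/20 by default).
    n_in = min(len(in_group), round(share * n_missing))
    n_out = min(len(out_group), n_missing - n_in)
    return set(rng.sample(in_group, n_in)) | set(rng.sample(out_group, n_out))
```

For 1000 samples at a 10% missing rate, this masks 100 samples, 80 of them from the chosen group.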
<table-wrap position="float" id="T3">
<label>Table 3</label>
<caption><p><bold>Parameters for simulation of family genotype data</bold>.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th valign="top" align="left"><bold>Parameter</bold></th>
<th valign="top" align="left"><bold>Value</bold></th>
<th valign="top" align="left"><bold>Notes</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Populations</td>
<td valign="top" align="left">2</td>
<td valign="top" align="left">Sized 1/3 and 2/3 of individuals</td>
</tr>
<tr>
<td valign="top" align="left">Families</td>
<td valign="top" align="left">4000</td>
<td valign="top" align="left">Each assigned to a population</td>
</tr>
<tr>
<td valign="top" align="left">Number of affected offspring</td>
<td valign="top" align="left">636 (sd 294)</td>
<td/>
</tr>
<tr>
<td valign="top" align="left">Equal number of controls</td>
<td valign="top" align="left">Drawn from remainder of all pedigrees</td>
<td/>
</tr>
<tr>
<td valign="top" align="left">Number of offspring</td>
<td valign="top" align="left">Uniform random (1,2)</td>
<td/>
</tr>
<tr>
<td valign="top" align="left">Number of markers</td>
<td valign="top" align="left">300</td>
<td/>
</tr>
<tr>
<td valign="top" align="left">Number of causative markers</td>
<td valign="top" align="left">3</td>
<td/>
</tr>
<tr>
<td valign="top" align="left">CIFBAT trials</td>
<td valign="top" align="left">100</td>
<td/>
</tr>
<tr>
<td valign="top" align="left">Penetrance with 2 causative SNPs</td>
<td valign="top" align="left">f<sub>2</sub>&#x0007E; N(0.1, 0.01)</td>
<td/>
</tr>
<tr>
<td valign="top" align="left">Penetrance with 0 causative SNPs</td>
<td valign="top" align="left">f<sub>0</sub>&#x0007E; N(0.001, 0.001)</td>
<td/>
</tr>
<tr>
<td valign="top" align="left">Environmental effect</td>
<td valign="top" align="left">&#x003BB;<sub><italic>s</italic></sub> &#x0003D; 3</td>
<td/>
</tr>
<tr>
<td valign="top" align="left">Genomic effect</td>
<td valign="top" align="left">&#x003BB;<sub><italic>g</italic></sub> &#x0003D; 2</td>
<td/>
</tr>
<tr>
<td valign="top" align="left">Marker frequency</td>
<td valign="top" align="left">Gamma(shape &#x0003D; 2, scale &#x0003D; 2)/35.0</td>
<td/>
</tr>
</tbody>
</table>
</table-wrap>
<p>A false discovery rate (FDR) cut-off of 10% was used to identify significant FBAT results. The equivalent <italic>p</italic>-value threshold derived from the FDR was also used to identify significant CIFBAT QIs: an interval was considered significant if its entire <italic>p</italic>-value spread lay below the threshold.</p>
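<p>The text does not name the FDR procedure. Assuming the common Benjamini-Hochberg step-up procedure, the equivalent <italic>p</italic>-value threshold could be derived as in this Python sketch; the function name is hypothetical.</p>

```python
def bh_threshold(p_values, fdr=0.10):
    """Largest p-value accepted by the Benjamini-Hochberg step-up rule, or None."""
    m = len(p_values)
    threshold = None
    # Step-up rule: keep the largest rank k whose k-th smallest p-value is at most fdr * k / m.
    for rank, p in enumerate(sorted(p_values), start=1):
        if fdr * rank / m >= p:
            threshold = p
    return threshold

if __name__ == "__main__":
    fbat_p = [0.001, 0.01, 0.03, 0.2, 0.5]
    cut = bh_threshold(fbat_p)
    print(cut)  # 0.03
    # Markers with an FBAT p-value at or below the cut-off are significant.
    print([p for p in fbat_p if cut is not None and cut >= p])
```

The returned value is then reused unchanged as the cut-off against which each CIFBAT <italic>p</italic>-value interval is compared.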
<p>In each simulation, the causative SNPs are known, which allows computation of performance metrics for comparing CIFBAT and FBAT. The metrics computed include sensitivity (recall), the proportion of causative SNPs called significant; specificity, the proportion of non-causative SNPs not called significant; precision, the proportion of significant calls that are truly causative; and the F-measure, the harmonic mean of recall and precision.</p>
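<p>These definitions translate directly into set operations over marker identifiers, as in this Python sketch. The function names are hypothetical, and returning 0.0 when a ratio is undefined (for example, precision with no significant calls) is our own convention.</p>

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are 0)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0

def performance(called, causative, markers):
    """Recall, specificity, precision and F-measure for one simulation run.

    called:    markers declared significant.
    causative: the known causal SNPs.
    markers:   every marker tested.
    """
    called, causative, markers = set(called), set(causative), set(markers)
    tp = len(called.intersection(causative))
    fp = len(called - causative)
    fn = len(causative - called)
    tn = len(markers - called - causative)
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    # With no significant calls precision is undefined; 0.0 is used here by convention.
    precision = tp / (tp + fp) if tp + fp else 0.0
    return {"recall": recall, "specificity": specificity,
            "precision": precision, "f_measure": f_measure(precision, recall)}
```

As a check against Table 4, f_measure(0.922, 0.547) evaluates to about 0.687 with the rounded recall and precision of the first row, close to the tabulated 0.686.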
</sec>
<sec>
<title>Real data: familial whole genome sequence data for maternal uterine anomalies</title>
<p>As more association studies employ whole genome sequences, it becomes important to evaluate novel association methods for applicability and performance in genome-wide association analyses. To this end, we used CIFBAT to analyze whole genome sequences of 784 nuclear families, with uterine anomalies in mothers represented as a binary phenotype. Samples of peripheral blood were collected from fathers, mothers, and newborns at the Inova Fairfax Medical Center in Falls Church, Virginia, and sequencing was done at &#x0003E;40X depth using Complete Genomics&#x00027; whole genome sequencing platform. Fifty-two (52) of the 784 mothers were diagnosed with various uterine anomalies, including endometriosis, bicornuate uterus, and didelphic uterus (complete list in Text S1); these mothers were used as cases in our study. Various filters were applied to the genomic markers to reduce noise in the data and to ensure the applicability of family based association tests (Text S1). Altogether, we tested 3,808,482 autosomal markers with both FBAT and CIFBAT. A false discovery rate (FDR) cut-off of 10%, which translated to a <italic>p</italic>-value cut-off of 5.71e-05, was used to identify significant FBAT hits. The same <italic>p</italic>-value cut-off was used to identify significant 95% QIs computed with CIFBAT; a QI was considered significant if its entire spread was below the cut-off. A significant FBAT <italic>p</italic>-value together with a significant CIFBAT quantile interval indicated robustness against missing data, whereas a significant FBAT statistic with a non-significant CIFBAT quantile interval suggested potential bias in the FBAT statistic due to missing data.</p>
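<p>The decision rule combining the two tests can be sketched in Python as follows. The function name and the label strings are hypothetical; the CIFBAT interval is assumed to be supplied as its smallest and largest <italic>p</italic>-values, and an FBAT <italic>p</italic>-value equal to the cut-off is treated as significant, as in an FDR-derived threshold.</p>

```python
def classify_marker(fbat_p, cifbat_p_range, cutoff):
    """Combine an FBAT p-value and a CIFBAT p-value interval under one cut-off.

    cifbat_p_range: (smallest, largest) p-value over the CIFBAT quantile interval.
    """
    fbat_sig = cutoff >= fbat_p
    _, p_largest = cifbat_p_range
    # The whole interval must lie below the cut-off, so only its largest p-value matters.
    cifbat_sig = cutoff > p_largest
    if fbat_sig and cifbat_sig:
        return "robust to missing data"
    if fbat_sig:
        return "potential bias from missing data"
    return "not significant under FBAT"

if __name__ == "__main__":
    cutoff = 5.71e-05  # FDR-derived cut-off reported for the uterine anomaly data
    print(classify_marker(1e-6, (1e-8, 3e-5), cutoff))  # robust to missing data
    print(classify_marker(1e-6, (1e-8, 2e-4), cutoff))  # potential bias from missing data
```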
</sec>
<sec>
<title>Real data: candidate marker set for type 1 diabetes</title>
<p>To demonstrate the use of CIFBAT in refining candidate sets of markers, we analyzed type 1 diabetes data consisting of a dichotomous phenotype and 351 markers within 22 candidate genes previously identified as contributing to the risk of type 1 diabetes. The data were collected by the Type 1 Diabetes Genetics Consortium (T1DGC). There were 2313 pedigrees consisting of 2345 nuclear families with one or more offspring. To ensure independence of nuclear families, we excluded one randomly selected family from every pair of families that shared at least one parent. We also excluded offspring whose type 1 diabetes affectation status was unknown, as well as founders with no sequenced offspring in the data. This left 2314 nuclear families with one or more offspring, which we analyzed using CIFBAT. An FDR cut-off of 10% was used to identify significant FBAT results, and the equivalent <italic>p</italic>-value cut-off of 1.10e-02 was used to identify significant CIFBAT QIs. Again, a significant FBAT statistic with a significant CIFBAT quantile interval implied robustness against missing data, whereas a significant FBAT statistic with a non-significant CIFBAT quantile interval implied potential bias due to missing data.</p>
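<p>One way to implement the independence filter is a greedy pass over the families in random order that drops any family sharing a parent with a family already kept. This Python sketch is our own approximation of the rule described above: it matches the pairwise rule when families overlap in pairs, but may behave differently for longer chains of shared parents. The data layout and function name are hypothetical.</p>

```python
import random

def independent_families(families, seed=None):
    """Keep a subset of nuclear families in which no parent appears twice.

    families: dict mapping family ID to a (father_id, mother_id) tuple; a
              missing parent can be given as None.
    Families are visited in random order; a family is dropped if it shares a
    parent with a family that was already kept.
    """
    rng = random.Random(seed)
    order = list(families)
    rng.shuffle(order)
    seen_parents, kept = set(), []
    for fam in order:
        parents = {p for p in families[fam] if p is not None}
        if parents.isdisjoint(seen_parents):
            kept.append(fam)
            seen_parents.update(parents)
    return sorted(kept)

if __name__ == "__main__":
    fams = {"A": ("f1", "m1"), "B": ("f1", "m2"), "C": ("f3", "m3")}
    print(independent_families(fams, seed=0))  # "C" plus one of "A" or "B"
```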
</sec>
<sec>
<title>Extensions to TDT and robustTDT</title>
<p>In addition to developing CIFBAT, we extended the capabilities of the original TDT and robustTDT implementations as follows. PLINK (Purcell et al., <xref ref-type="bibr" rid="B7">2007</xref>) is widely used to apply the transmission disequilibrium test (TDT) to family based studies. PLINK&#x00027;s TDT implementation handles both autosomal chromosomes and the X chromosome, but only under an additive model. We extended our TDT implementation to handle dominant and recessive models in addition to the additive model. Table <xref ref-type="supplementary-material" rid="SM19">S6</xref> lists the informative trios for autosomal chromosomes and the transmission counts for the two alleles <italic>b</italic> and <italic>c</italic> under all three models. Tables <xref ref-type="supplementary-material" rid="SM20">S7</xref>, <xref ref-type="supplementary-material" rid="SM21">S8</xref> list the informative trios for the X chromosome with male and female offspring, respectively. Similar extensions were added to robustTDT (Sebastiani et al., <xref ref-type="bibr" rid="B8">2004</xref>) so that it handles missing genotypes on the X chromosome as well as on autosomal chromosomes under all three genetic models (Tables <xref ref-type="supplementary-material" rid="SM22">S9</xref>&#x02013;<xref ref-type="supplementary-material" rid="SM11">S11</xref>).</p>
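<p>Once the transmission counts <italic>b</italic> and <italic>c</italic> have been tallied from the informative trios for the chosen genetic model, the standard McNemar-type TDT statistic and its 1-degree-of-freedom <italic>p</italic>-value follow as in this Python sketch. The model-specific counting rules of Tables S6 to S8 are not implemented here, and the function name is hypothetical.</p>

```python
import math

def tdt(b: int, c: int):
    """McNemar-type TDT from the transmission counts b and c.

    Returns (chi-square statistic with 1 df, p-value).
    """
    if b + c == 0:
        raise ValueError("no informative transmissions")
    chi2 = (b - c) ** 2 / (b + c)
    # For 1 degree of freedom, P(chi2_1 >= x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p

if __name__ == "__main__":
    print(tdt(60, 40))  # chi-square 4.0, p about 0.0455
```

Using erfc keeps the sketch free of external dependencies; a chi-square survival function from a statistics library would give the same 1-degree-of-freedom <italic>p</italic>-value.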
</sec>
</sec>
<sec sec-type="results" id="s3">
<title>Results</title>
<sec>
<title>Simulated family genotype data</title>
<p>Analysis of simulated family genotype data allowed us to compare the performance of FBAT and CIFBAT under no phenotype association and under a genetic disease model, across the various patterns of missing data (MCAR, MAR, MNAR). The simulated missing-data scenarios are not meant to be comprehensive, but rather to illustrate the effects of missing data on algorithms that must make inferences from incomplete data. Comparisons were based primarily on recall and precision.</p>
<p>When phenotypes were uniformly random and data was missing completely at random, the FBAT <italic>p</italic>-values remained uniformly distributed, yielding about 10% false positives at an unadjusted <italic>p</italic>-value threshold of 0.1. The medians of the CIFBAT <italic>Z</italic><sub><italic>r</italic></sub> QIs were normally distributed around zero (see Figure <xref ref-type="supplementary-material" rid="SM12">S12</xref>). As the amount of missing data increased, the intervals widened and fewer of them lay completely below a given threshold: at 1% missing data, 5.1% of <italic>p</italic>-value intervals were completely below 0.1, whereas at 10% missing data only 0.3% were. However, when CIFBAT was applied as described (deriving the <italic>p</italic>-value threshold from the FBAT FDR and comparing intervals with that threshold), no simulated markers were significant at a 10% FDR level, as desired.</p>
<p>Across all scenarios, for both FBAT and CIFBAT, recall was least affected when missing data was concentrated on the controls and most affected when it was concentrated on the cases. Cases generally carry more copies of a causative allele, so concentrating missing data on the cases weakens the power of the tests to detect the overabundance of causative alleles in cases, leading to false negatives. Missing data concentrated on controls, by contrast, does not weaken the signal to the same extent. Figure <xref ref-type="fig" rid="F3">3A</xref> shows the difference in recall over a sliding FDR threshold when missing data is concentrated on the cases or the controls.</p>
<fig id="F3" position="float">
<label>Figure 3</label>
<caption><p><bold>Performance of FBAT and CIFBAT under the Missing At Random (MAR) simulation scenario</bold>. Shown are <bold>(A)</bold> Precision, <bold>(B)</bold> Recall, and <bold>(C)</bold> F-measure for calling the causative variants. Missing data is concentrated within cases or controls, and performance is measured for different missing data rates and FDR thresholds.</p></caption>
<graphic xlink:href="fgene-07-00034-g0003.tif"/>
</fig>
<p>The simulations showed that, overall, CIFBAT tended to trade lower recall for higher precision compared with FBAT, meaning that while fewer variants were called significant, they were more likely to be true positives (Figure <xref ref-type="fig" rid="F3">3B</xref>). For example, when missing data was concentrated in cases, CIFBAT recall fell from 0.48 at 1% missing data to 0.26 at 10%, compared with a drop from 0.52 to 0.40 for FBAT (Table <xref ref-type="table" rid="T4">4</xref>). However, CIFBAT precision rose from 0.96 to 1.0 over the same range of missing data, whereas FBAT precision stayed at about 0.93 at all missing-data rates (Table <xref ref-type="table" rid="T4">4</xref>). CIFBAT specificity and negative predictive value remained above 99% over all missingness scenarios and levels (Table <xref ref-type="supplementary-material" rid="SM12">S12</xref>). This suggests that CIFBAT can be useful in detecting potential bias in the FBAT statistic due to missing data and in refining the set of significant markers to be validated in downstream analyses.</p>
<table-wrap position="float" id="T4">
<label>Table 4</label>
<caption><p><bold>Results from analysis of simulated familial genotype data</bold>.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th valign="top" align="left"><bold>Algorithm</bold></th>
<th valign="top" align="left"><bold>Missingness type</bold></th>
<th valign="top" align="center"><bold>Missing rate (%)</bold></th>
<th valign="top" align="left"><bold>Missing data concentrated in:</bold></th>
<th valign="top" align="center"><bold>Recall</bold></th>
<th valign="top" align="center"><bold>Precision</bold></th>
<th valign="top" align="center"><bold>F-measure</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">CIFBAT</td>
<td valign="top" align="left">MAR</td>
<td valign="top" align="center">0</td>
<td valign="top" align="left">Controls</td>
<td valign="top" align="center">0.547</td>
<td valign="top" align="center">0.922</td>
<td valign="top" align="center">0.686</td>
</tr>
<tr>
<td/>
<td valign="top" align="left">MAR</td>
<td valign="top" align="center">1</td>
<td valign="top" align="left">Controls</td>
<td valign="top" align="center">0.523</td>
<td valign="top" align="center">0.959</td>
<td valign="top" align="center">0.677</td>
</tr>
<tr>
<td/>
<td valign="top" align="left">MAR</td>
<td valign="top" align="center">5</td>
<td valign="top" align="left">Controls</td>
<td valign="top" align="center">0.490</td>
<td valign="top" align="center">0.976</td>
<td valign="top" align="center">0.653</td>
</tr>
<tr>
<td/>
<td valign="top" align="left">MAR</td>
<td valign="top" align="center">10</td>
<td valign="top" align="left">Controls</td>
<td valign="top" align="center">0.484</td>
<td valign="top" align="center">0.970</td>
<td valign="top" align="center">0.646</td>
</tr>
<tr>
<td/>
<td valign="top" align="left">MAR</td>
<td valign="top" align="center">0</td>
<td valign="top" align="left">Cases</td>
<td valign="top" align="center">0.538</td>
<td valign="top" align="center">0.933</td>
<td valign="top" align="center">0.682</td>
</tr>
<tr>
<td/>
<td valign="top" align="left">MAR</td>
<td valign="top" align="center">1</td>
<td valign="top" align="left">Cases</td>
<td valign="top" align="center">0.484</td>
<td valign="top" align="center">0.962</td>
<td valign="top" align="center">0.644</td>
</tr>
<tr>
<td/>
<td valign="top" align="left">MAR</td>
<td valign="top" align="center">5</td>
<td valign="top" align="left">Cases</td>
<td valign="top" align="center">0.356</td>
<td valign="top" align="center">0.989</td>
<td valign="top" align="center">0.524</td>
</tr>
<tr>
<td/>
<td valign="top" align="left">MAR</td>
<td valign="top" align="center">10</td>
<td valign="top" align="left">Cases</td>
<td valign="top" align="center">0.263</td>
<td valign="top" align="center">1.000</td>
<td valign="top" align="center">0.417</td>
</tr>
<tr>
<td valign="top" align="left">FBAT</td>
<td valign="top" align="left">MAR</td>
<td valign="top" align="center">0</td>
<td valign="top" align="left">Controls</td>
<td valign="top" align="center">0.547</td>
<td valign="top" align="center">0.922</td>
<td valign="top" align="center">0.686</td>
</tr>
<tr>
<td/>
<td valign="top" align="left">MAR</td>
<td valign="top" align="center">1</td>
<td valign="top" align="left">Controls</td>
<td valign="top" align="center">0.540</td>
<td valign="top" align="center">0.936</td>
<td valign="top" align="center">0.685</td>
</tr>
<tr>
<td/>
<td valign="top" align="left">MAR</td>
<td valign="top" align="center">5</td>
<td valign="top" align="left">Controls</td>
<td valign="top" align="center">0.537</td>
<td valign="top" align="center">0.931</td>
<td valign="top" align="center">0.681</td>
</tr>
<tr>
<td/>
<td valign="top" align="left">MAR</td>
<td valign="top" align="center">10</td>
<td valign="top" align="left">Controls</td>
<td valign="top" align="center">0.530</td>
<td valign="top" align="center">0.941</td>
<td valign="top" align="center">0.678</td>
</tr>
<tr>
<td/>
<td valign="top" align="left">MAR</td>
<td valign="top" align="center">0</td>
<td valign="top" align="left">Cases</td>
<td valign="top" align="center">0.538</td>
<td valign="top" align="center">0.933</td>
<td valign="top" align="center">0.682</td>
</tr>
<tr>
<td/>
<td valign="top" align="left">MAR</td>
<td valign="top" align="center">1</td>
<td valign="top" align="left">Cases</td>
<td valign="top" align="center">0.522</td>
<td valign="top" align="center">0.934</td>
<td valign="top" align="center">0.670</td>
</tr>
<tr>
<td/>
<td valign="top" align="left">MAR</td>
<td valign="top" align="center">5</td>
<td valign="top" align="left">Cases</td>
<td valign="top" align="center">0.473</td>
<td valign="top" align="center">0.932</td>
<td valign="top" align="center">0.627</td>
</tr>
<tr>
<td/>
<td valign="top" align="left">MAR</td>
<td valign="top" align="center">10</td>
<td valign="top" align="left">Cases</td>
<td valign="top" align="center">0.404</td>
<td valign="top" align="center">0.934</td>
<td valign="top" align="center">0.564</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>Performance of FBAT and CIFBAT was compared in terms of recall, precision, and F-measure. CIFBAT tended to trade lower recall for higher precision: fewer variants were called significant, but those that were called were more likely to be true positives. F-measure was comparable between FBAT and CIFBAT across all simulation scenarios; however, the variance of F-measure was higher for CIFBAT</italic>.</p>
</table-wrap-foot>
</table-wrap>
<p>Ultimately, the levels of recall and precision are influenced by the disease model parameters. For example, Figures <xref ref-type="supplementary-material" rid="SM4">S3</xref> and <xref ref-type="supplementary-material" rid="SM5">S4</xref> show how recall and precision, respectively, change when the mean penetrance of the model is varied. Table <xref ref-type="supplementary-material" rid="SM12">S12</xref> gives a complete report of how the performance metrics of FBAT and CIFBAT change with the model parameters.</p>
<p>Overall, the F-measure (the harmonic mean of precision and recall) was comparable between FBAT and CIFBAT. Across the simulation scenarios, the median F-measure was 0.67 for FBAT and 0.64 for CIFBAT (Table <xref ref-type="table" rid="T4">4</xref>). However, the variance of the F-measure was greater for CIFBAT (Figure <xref ref-type="fig" rid="F3">3C</xref> and Figure <xref ref-type="supplementary-material" rid="SM6">S5</xref>).</p>
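<p>The following minimal Python sketch shows how the three metrics relate to each other. The function names are hypothetical and are not part of FamSuite. The sketch computes precision, recall, and the F-measure (their harmonic mean) from counts of true positives (TP), false positives (FP), and false negatives (FN).</p>
<preformat>
```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1); defined as 0 when both are 0."""
    total = precision + recall
    return 2.0 * precision * recall / total if total else 0.0


def scores(tp: int, fp: int, fn: int) -> tuple:
    """Precision TP/(TP+FP), recall TP/(TP+FN) and F-measure from confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f_measure(precision, recall)
```
</preformat>
<p>For example, for the Table 4 row with recall 0.547 and precision 0.922, <monospace>f_measure(0.922, 0.547)</monospace> returns approximately 0.687. This is close to the tabulated 0.686; the small difference presumably reflects rounding of the tabulated precision and recall.</p>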
<p>The simulation also produced a few results (5/300) in which the CIFBAT QI was significant but the FBAT statistic was not; no clear pattern emerged to explain them. Figures <xref ref-type="supplementary-material" rid="SM7">S6</xref> and <xref ref-type="supplementary-material" rid="SM8">S7</xref> show how the precision and recall of FBAT and CIFBAT are affected by the percentage of missing data. In the presence of missing data, CIFBAT has higher precision than FBAT, but its recall decreases as missing data increases. Figures <xref ref-type="supplementary-material" rid="SM9">S8</xref> and <xref ref-type="supplementary-material" rid="SM10">S9</xref> show the effect of the FDR threshold used to call results significant.</p>
</sec>
<sec>
<title>Familial whole genome sequence data for maternal uterine anomalies</title>
<p>We analyzed familial whole genome sequencing data to identify potential genomic markers for maternal uterine anomalies. We also use this section to explain how CIFBAT QIs can be interpreted and used to validate the robustness of FBAT results in the presence of missing genotype data.</p>
<p>We compared the counts of markers found significant exclusively and jointly by FBAT and CIFBAT. Of the 551 markers that were significant under FBAT, 242 (&#x0007E;44%) were also significant under CIFBAT when incomplete trios were included in the test (Figure <xref ref-type="fig" rid="F4">4A</xref>). Validation under CIFBAT indicates that these results are robust to missing data and provides a way to select features with higher precision for downstream analyses. All but one of the 242 markers validated by CIFBAT were either intergenic or located in non-coding regions of their respective genes. Figure <xref ref-type="fig" rid="F4">4B</xref> compares the FBAT statistic and the CIFBAT quantile interval for chr7:142008644 (GRCh37), which lies in the second exon of the gene <italic>TCRBV9S1A1T</italic>. This marker was significant under FBAT, with a <italic>p</italic>-value below machine epsilon (2.22e-16). It was also significant under CIFBAT, with a 95% QI of [&#x0003C;2.22e-16, 3.23e-10]. Figure <xref ref-type="fig" rid="F5">5</xref> shows the distribution of complete and incomplete trio types among case and control trios for this marker. A complete list of the markers significant under both FBAT and CIFBAT is given in Table <xref ref-type="supplementary-material" rid="SM13">S13</xref>.</p>
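<p>The comparison above reduces to a simple decision rule. A marker is validated when its FBAT <italic>p</italic>-value and both bounds of its CIFBAT QI fall at or below the significance cut-off. The Python sketch below illustrates that rule; it is not the authors' code. The function names and category labels are hypothetical. The cut-off is a parameter because the cut-off for the whole genome analysis is not stated in this section. Values reported as "below 2.22e-16" are stored at that floor.</p>
<preformat>
```python
from collections import Counter
from typing import Iterable, Tuple

FLOOR = 2.22e-16  # values reported only as "below 2.22e-16" are stored at this floor


def classify(fbat_p: float, qi: Tuple[float, float], cutoff: float) -> str:
    """Label one marker by comparing FBAT (complete trios) with the CIFBAT QI."""
    fbat_sig = cutoff >= fbat_p
    # Both QI bounds must pass; max() makes the check independent of bound order.
    cifbat_sig = cutoff >= max(qi)
    if fbat_sig and cifbat_sig:
        return "validated"
    if fbat_sig:
        return "not validated"
    if cifbat_sig:
        return "CIFBAT only"
    return "not significant"


def tally(markers: Iterable[Tuple[float, Tuple[float, float]]], cutoff: float) -> Counter:
    """Count markers per category, as in a Venn-style summary."""
    return Counter(classify(p, qi, cutoff) for p, qi in markers)
```
</preformat>
<p>For example, with the 1.10e-02 cut-off of Table 5, the sketch labels rs1805012 (FBAT 5.62E-07, both QI bounds at the floor) as validated. It labels rs2476601 (FBAT 2.40E-14, QI (0.65, 1.53E-02)) as not validated.</p>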
<fig id="F4" position="float">
<label>Figure 4</label>
<caption><p><bold>Analysis of familial whole genome sequencing data for uterine anomalies. (A)</bold> Comparison of the number of markers significant exclusively and jointly under FBAT and CIFBAT. Of the 551 markers significant under FBAT, 242 (&#x0007E;44%) were validated and 309 (&#x0007E;56%) were not validated by CIFBAT after incomplete trios were included in the test. Thirty-eight additional markers were significant exclusively under CIFBAT. <bold>(B)</bold> An example of a marker that was significant under FBAT and further validated by CIFBAT. <bold>(C)</bold> An example of a marker that was significant under FBAT but was not validated by CIFBAT upon inclusion of incomplete trios in the test. <bold>(D)</bold> An example of a marker that was significant exclusively under CIFBAT.</p></caption>
<graphic xlink:href="fgene-07-00034-g0004.tif"/>
</fig>
<fig id="F5" position="float">
<label>Figure 5</label>
<caption><p><bold>Distribution of trio types within cases and controls for chr7:142008644. (A)</bold> Complete trio types: trio type numbers in the legend correspond to those in Figure <xref ref-type="supplementary-material" rid="SM2">S1</xref>. <bold>(B)</bold> Incomplete trio types: trio type numbers in the legend correspond to those in Figure <xref ref-type="supplementary-material" rid="SM3">S2</xref>. Only trio types with non-zero counts are shown.</p></caption>
<graphic xlink:href="fgene-07-00034-g0005.tif"/>
</fig>
<p>The remaining 309 (&#x0007E;56%) markers were significant under FBAT but not under CIFBAT, which indicates that these FBAT results might have been affected by missing data. In such cases, CIFBAT is useful for identifying potential false positives and refining the feature set for downstream analyses. One such marker is chr15:29443416 (GRCh37; rs7171494), an intronic variant in the gene <italic>FAM189A1</italic>. Figure <xref ref-type="fig" rid="F4">4C</xref> compares the FBAT statistic and the CIFBAT QI for this marker. Figure <xref ref-type="supplementary-material" rid="SM11">S10</xref> shows the underlying distribution of complete and incomplete trio types among case and control trios. A complete list of the markers that were significant under FBAT but could not be validated by CIFBAT is given in Table <xref ref-type="supplementary-material" rid="SM14">S14</xref>.</p>
<p>Additionally, 38 markers were significant exclusively under CIFBAT (Table <xref ref-type="supplementary-material" rid="SM15">S15</xref>). Conclusive statements about this set cannot be made without independent validation. These may be markers for which CIFBAT gained statistical power from the larger sample size, or they may be false positives. One example from this set is chr11:55861880 (rs117149792), an upstream gene variant of <italic>OR8I2</italic>. Figure <xref ref-type="fig" rid="F4">4D</xref> compares the FBAT statistic and the CIFBAT QI for this marker. Figure <xref ref-type="supplementary-material" rid="SM12">S11</xref> shows the underlying distribution of complete and incomplete trio types among case and control trios.</p>
<p>Overall, CIFBAT is a useful test for evaluating the robustness of the FBAT statistic to missing data. Markers that were significant under FBAT and further validated by CIFBAT after the inclusion of incomplete families form a refined set of candidate markers for downstream analysis. CIFBAT reduced the FBAT candidate set by &#x0007E;56%. By contrast, it identified only a small fraction of markers exclusively: 38 of the 589 markers significant under either test, or &#x0007E;6.5%. These markers may be of interest and warrant further investigation. This refinement is particularly useful in genome wide association studies, where the initial set of candidate markers is invariably very large. Moreover, the underlying genetic model and the pattern of missing data are often unknown. In such cases, CIFBAT has the advantage of evaluating the effects of missing data without assumptions about the underlying model.</p>
</sec>
<sec>
<title>Type 1 diabetes data</title>
<p>Table <xref ref-type="table" rid="T5">5</xref> lists the results of the analysis of Type 1 Diabetes data for 351 markers within 22 candidate genes. The FBAT <italic>p</italic>-values were corrected for multiple testing using the Benjamini-Hochberg method (Benjamini and Hochberg, <xref ref-type="bibr" rid="B1">1995</xref>), and a false discovery rate (FDR) of 10% was used to define significance. The corresponding FBAT <italic>p</italic>-value cut-off was 1.10e-02, and the same cut-off was used to call CIFBAT QIs significant.</p>
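<p>For readers reproducing this step, a minimal Python sketch of the Benjamini-Hochberg step-up rule follows. The function names are hypothetical. The 351 FBAT <italic>p</italic>-values are not listed in this section, so the example uses made-up <italic>p</italic>-values. Conventions also differ on whether the reported cut-off is the largest passing <italic>p</italic>-value (as here) or the threshold kq/m. The sketch is therefore not guaranteed to reproduce 1.10e-02 exactly.</p>
<preformat>
```python
from typing import Optional, Sequence


def bh_cutoff(pvalues: Sequence[float], fdr: float = 0.10) -> Optional[float]:
    """Largest p-value declared significant by the Benjamini-Hochberg step-up
    procedure at the given FDR level q, or None if no test passes."""
    m = len(pvalues)
    cutoff = None
    for k, p in enumerate(sorted(pvalues), start=1):
        # BH criterion: p_(k) is at most (k / m) * q; keep the largest k that meets it
        if k * fdr / m >= p:
            cutoff = p
    return cutoff


def significant(pvalues: Sequence[float], fdr: float = 0.10) -> list:
    """Indices of the tests called significant at the given FDR."""
    c = bh_cutoff(pvalues, fdr)
    return [] if c is None else [i for i, p in enumerate(pvalues) if c >= p]
```
</preformat>
<p>The step-up rule keeps the largest rank k that satisfies the criterion, even if some smaller ranks fail it. In the test values 0.001, 0.008, 0.039, 0.041, 0.042, 0.3, 0.5, 0.7, 0.8, 0.9 at q = 0.10, for instance, all five values up to 0.042 are called significant.</p>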
<table-wrap position="float" id="T5">
<label>Table 5</label>
<caption><p><bold>Detailed results from analysis of candidate markers for Type 1 Diabetes</bold>.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th valign="top" align="left"><bold>Marker</bold></th>
<th valign="top" align="center"><bold>SNP ID</bold></th>
<th valign="top" align="center"><bold>FBAT <italic>p</italic>-value</bold></th>
<th valign="top" align="left"><bold>CIFBAT 95% Quantile interval (<italic>p</italic>-value)</bold></th>
<th valign="top" align="center"><bold>Missing data (%)</bold></th>
<th valign="top" align="center"><bold>MAF (%)</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left" colspan="6" style="background-color:#bbbdc0"><bold>INS</bold></td>
</tr>
<tr>
<td valign="top" align="left">11:2137971</td>
<td valign="top" align="center">rs3842748</td>
<td valign="top" align="center">4.44E-16</td>
<td valign="top" align="left">(&#x0003C;2.22E-16, &#x0003C; 2.22E-16)<sup>&#x0002A;</sup></td>
<td valign="top" align="center">22.95</td>
<td valign="top" align="center">10.95</td>
</tr>
<tr>
<td valign="top" align="left">11:2130023</td>
<td valign="top" align="center">rs7924316</td>
<td valign="top" align="center">2.32E-07</td>
<td valign="top" align="left">(2.18E-09, 9.33E-15)<sup>&#x0002A;</sup></td>
<td valign="top" align="center">13.25</td>
<td valign="top" align="center">49.12</td>
</tr>
<tr>
<td valign="top" align="left">11:2157914</td>
<td valign="top" align="center">rs11564709</td>
<td valign="top" align="center">2.21E-07</td>
<td valign="top" align="left">(&#x0003C;2.22E-16, &#x0003C; 2.22E-16)<sup>&#x0002A;</sup></td>
<td valign="top" align="center">13.04</td>
<td valign="top" align="center">5.37</td>
</tr>
<tr>
<td valign="top" align="left">11:2147527</td>
<td valign="top" align="center">rs6356</td>
<td valign="top" align="center">9.00E-07</td>
<td valign="top" align="left">(3.14E-06, 7.01E-03)</td>
<td valign="top" align="center">13.27</td>
<td valign="top" align="center">46.16</td>
</tr>
<tr>
<td valign="top" align="left">11:2151386</td>
<td valign="top" align="center">rs7119275</td>
<td valign="top" align="center">1.32E-06</td>
<td valign="top" align="left">(&#x0003C;2.22E-16, 7.99E-15)<sup>&#x0002A;</sup></td>
<td valign="top" align="center">13.42</td>
<td valign="top" align="center">25.31</td>
</tr>
<tr>
<td valign="top" align="left">11:2126719</td>
<td valign="top" align="center">rs1004446</td>
<td valign="top" align="center">4.11E-06</td>
<td valign="top" align="left">(&#x0003C;2.22E-16, 4.44E-15)<sup>&#x0002A;</sup></td>
<td valign="top" align="center">13.24</td>
<td valign="top" align="center">43.86</td>
</tr>
<tr>
<td valign="top" align="left">11:2124119</td>
<td valign="top" align="center">rs1003483</td>
<td valign="top" align="center">4.65E-06</td>
<td valign="top" align="left">(1.51E-04, 2.29E-08)<sup>&#x0002A;</sup></td>
<td valign="top" align="center">13.44</td>
<td valign="top" align="center">43.17</td>
</tr>
<tr>
<td valign="top" align="left">11:2152413</td>
<td valign="top" align="center">rs10840495</td>
<td valign="top" align="center">6.01E-06</td>
<td valign="top" align="left">(&#x0003C;2.22E-16, 4.93E-14)<sup>&#x0002A;</sup></td>
<td valign="top" align="center">13.08</td>
<td valign="top" align="center">25.46</td>
</tr>
<tr>
<td valign="top" align="left">11:2119686</td>
<td valign="top" align="center">rs4244808</td>
<td valign="top" align="center">7.62E-06</td>
<td valign="top" align="left">(1.36E-07, 8.90E-04)<sup>&#x0002A;</sup></td>
<td valign="top" align="center">15.69</td>
<td valign="top" align="center">41.53</td>
</tr>
<tr>
<td valign="top" align="left">11:2156905</td>
<td valign="top" align="center">rs11564710</td>
<td valign="top" align="center">2.81E-04</td>
<td valign="top" align="left">(1.22E-12, 8.48E-08)<sup>&#x0002A;</sup></td>
<td valign="top" align="center">13.09</td>
<td valign="top" align="center">30.37</td>
</tr>
<tr>
<td valign="top" align="left">11:2154012</td>
<td valign="top" align="center">rs4929966</td>
<td valign="top" align="center">6.55E-04</td>
<td valign="top" align="left">(&#x0003C;2.22E-16, &#x0003C; 2.22E-16)<sup>&#x0002A;</sup></td>
<td valign="top" align="center">13.44</td>
<td valign="top" align="center">17.59</td>
</tr>
<tr>
<td valign="top" align="left">11:2150966</td>
<td valign="top" align="center">rs10840491</td>
<td valign="top" align="center">2.85E-03</td>
<td valign="top" align="left">(1.11E-14, 1.26E-08)<sup>&#x0002A;</sup></td>
<td valign="top" align="center">13.13</td>
<td valign="top" align="center">12.26</td>
</tr>
<tr>
<td valign="top" align="left" colspan="6" style="background-color:#bbbdc0"><bold>PTPN22</bold></td>
</tr>
<tr>
<td valign="top" align="left">1:114089610</td>
<td valign="top" align="center">rs2476601</td>
<td valign="top" align="center">2.40E-14</td>
<td valign="top" align="left">(0.65, 1.53E-02)</td>
<td valign="top" align="center">13.13</td>
<td valign="top" align="center">12.17</td>
</tr>
<tr>
<td valign="top" align="left">1:114141503</td>
<td valign="top" align="center">rs2358994</td>
<td valign="top" align="center">1.76E-08</td>
<td valign="top" align="left">(0.85, 4.48E-02)</td>
<td valign="top" align="center">13.09</td>
<td valign="top" align="center">17.70</td>
</tr>
<tr>
<td valign="top" align="left">1:114127410</td>
<td valign="top" align="center">rs2488457</td>
<td valign="top" align="center">1.16E-07</td>
<td valign="top" align="left">(0.21, 0.48)</td>
<td valign="top" align="center">13.04</td>
<td valign="top" align="center">21.43</td>
</tr>
<tr>
<td valign="top" align="left">1:114132370</td>
<td valign="top" align="center">rs12566340</td>
<td valign="top" align="center">1.02E-07</td>
<td valign="top" align="left">(0.16, 0.66)</td>
<td valign="top" align="center">13.06</td>
<td valign="top" align="center">23.20</td>
</tr>
<tr>
<td valign="top" align="left">1:114132504</td>
<td valign="top" align="center">rs7529353</td>
<td valign="top" align="center">2.24E-07</td>
<td valign="top" align="left">(0.21, 0.56)</td>
<td valign="top" align="center">13.07</td>
<td valign="top" align="center">23.40</td>
</tr>
<tr>
<td valign="top" align="left">1:114086477</td>
<td valign="top" align="center">rs1217395</td>
<td valign="top" align="center">4.98E-07</td>
<td valign="top" align="left">(0.20, 0.33)</td>
<td valign="top" align="center">16.89</td>
<td valign="top" align="center">24.80</td>
</tr>
<tr>
<td valign="top" align="left">1:114138866</td>
<td valign="top" align="center">rs7524200</td>
<td valign="top" align="center">5.81E-07</td>
<td valign="top" align="left">(1.72E-04, 4.16E-02)</td>
<td valign="top" align="center">13.10</td>
<td valign="top" align="center">32.80</td>
</tr>
<tr>
<td valign="top" align="left">1:114078476</td>
<td valign="top" align="center">rs3789607</td>
<td valign="top" align="center">1.60E-05</td>
<td valign="top" align="left">(&#x0003C;2.22E-16, 4.44E-15)<sup>&#x0002A;</sup></td>
<td valign="top" align="center">13.98</td>
<td valign="top" align="center">39.95</td>
</tr>
<tr>
<td valign="top" align="left">1:114142398</td>
<td valign="top" align="center">rs1539438</td>
<td valign="top" align="center">1.80E-05</td>
<td valign="top" align="left">(&#x0003C;2.22E-16, 5.60E-13)<sup>&#x0002A;</sup></td>
<td valign="top" align="center">13.63</td>
<td valign="top" align="center">23.91</td>
</tr>
<tr>
<td valign="top" align="left">1:114129479</td>
<td valign="top" align="center">rs1235005</td>
<td valign="top" align="center">2.47E-05</td>
<td valign="top" align="left">(2.04E-02, 3.28E-05)</td>
<td valign="top" align="center">13.22</td>
<td valign="top" align="center">38.32</td>
</tr>
<tr>
<td valign="top" align="left">1:114131802</td>
<td valign="top" align="center">rs1217384</td>
<td valign="top" align="center">4.29E-05</td>
<td valign="top" align="left">(1.35E-14, &#x0003C; 2.22E-16)<sup>&#x0002A;</sup></td>
<td valign="top" align="center">13.07</td>
<td valign="top" align="center">22.29</td>
</tr>
<tr>
<td valign="top" align="left">1:114145701</td>
<td valign="top" align="center">rs1217394</td>
<td valign="top" align="center">4.55E-05</td>
<td valign="top" align="left">(&#x0003C;2.22E-16, 3.47E-13)<sup>&#x0002A;</sup></td>
<td valign="top" align="center">13.07</td>
<td valign="top" align="center">24.08</td>
</tr>
<tr>
<td valign="top" align="left">1:114129885</td>
<td valign="top" align="center">rs6665194</td>
<td valign="top" align="center">6.89E-05</td>
<td valign="top" align="left">(2.69E-02, 3.57E-05)</td>
<td valign="top" align="center">13.49</td>
<td valign="top" align="center">38.31</td>
</tr>
<tr>
<td valign="top" align="left">1:114063748</td>
<td valign="top" align="center">rs6537798</td>
<td valign="top" align="center">7.51E-05</td>
<td valign="top" align="left">(1.35E-02, 2.39E-05)</td>
<td valign="top" align="center">13.28</td>
<td valign="top" align="center">39.43</td>
</tr>
<tr>
<td valign="top" align="left">1:114113273</td>
<td valign="top" align="center">rs1217418</td>
<td valign="top" align="center">9.21E-05</td>
<td valign="top" align="left">(1.58E-05, 1.68E-02)</td>
<td valign="top" align="center">13.09</td>
<td valign="top" align="center">39.50</td>
</tr>
<tr>
<td valign="top" align="left">1:114081776</td>
<td valign="top" align="center">rs2476600</td>
<td valign="top" align="center">1.09E-04</td>
<td valign="top" align="left">(1.61E-02, 1.94E-05)</td>
<td valign="top" align="center">13.09</td>
<td valign="top" align="center">39.40</td>
</tr>
<tr>
<td valign="top" align="left">1:114056125</td>
<td valign="top" align="center">rs1217379</td>
<td valign="top" align="center">1.73E-04</td>
<td valign="top" align="left">(1.83E-02, 3.20E-05)</td>
<td valign="top" align="center">13.69</td>
<td valign="top" align="center">39.37</td>
</tr>
<tr>
<td valign="top" align="left" colspan="6" style="background-color:#bbbdc0"><bold>CTLA4</bold></td>
</tr>
<tr>
<td valign="top" align="left">2:204567056</td>
<td valign="top" align="center">rs231727</td>
<td valign="top" align="center">1.62E-03</td>
<td valign="top" align="left">(0.24, 0.48)</td>
<td valign="top" align="center">13.32</td>
<td valign="top" align="center">48.15</td>
</tr>
<tr>
<td valign="top" align="left">2:204566672</td>
<td valign="top" align="center">rs1427676</td>
<td valign="top" align="center">3.02E-03</td>
<td valign="top" align="left">(0.31, 0.37)</td>
<td valign="top" align="center">13.32</td>
<td valign="top" align="center">29.98</td>
</tr>
<tr>
<td valign="top" align="left" colspan="6" style="background-color:#bbbdc0"><bold>IL4R, IL2RA, IL12B</bold></td>
</tr>
<tr>
<td valign="top" align="left">16:27281465</td>
<td valign="top" align="center">rs1805012</td>
<td valign="top" align="center">5.62E-07</td>
<td valign="top" align="left">(&#x0003C;2.22E-16, &#x0003C; 2.22E-16)<sup>&#x0002A;</sup></td>
<td valign="top" align="center">14.57</td>
<td valign="top" align="center">0.37</td>
</tr>
<tr>
<td valign="top" align="left">10:6163501</td>
<td valign="top" align="center">rs12251307</td>
<td valign="top" align="center">8.15E-04</td>
<td valign="top" align="left">(&#x0003C;2.22E-16, &#x0003C; 2.22E-16)<sup>&#x0002A;</sup></td>
<td valign="top" align="center">13.20</td>
<td valign="top" align="center">7.81</td>
</tr>
<tr>
<td valign="top" align="left">10:6139051</td>
<td valign="top" align="center">rs2104286</td>
<td valign="top" align="center">3.97E-03</td>
<td valign="top" align="left">(&#x0003C;2.22E-16, 2.22E-16)<sup>&#x0002A;</sup></td>
<td valign="top" align="center">13.20</td>
<td valign="top" align="center">18.71</td>
</tr>
<tr>
<td valign="top" align="left">5:158700244</td>
<td valign="top" align="center">rs17056704</td>
<td valign="top" align="center">4.62E-03</td>
<td valign="top" align="left">(2.10E-07, 8.96E-13)<sup>&#x0002A;</sup></td>
<td valign="top" align="center">13.32</td>
<td valign="top" align="center">23.70</td>
</tr>
<tr>
<td valign="top" align="left" colspan="6" style="background-color:#bbbdc0"><bold>IFIH1</bold></td>
</tr>
<tr>
<td valign="top" align="left">2:162949558</td>
<td valign="top" align="center">rs1990760</td>
<td valign="top" align="center">4.32E-03</td>
<td valign="top" align="left">(1.31E-06, 1.75E-11)<sup>&#x0002A;</sup></td>
<td valign="top" align="center">13.32</td>
<td valign="top" align="center">29.97</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>FBAT results were corrected for multiple testing using the Benjamini-Hochberg method with a 10% false discovery rate cut-off. The corresponding p-value cut-off of 1.10e-02 was also used to call CIFBAT QIs significant. Of the 36 markers significant under FBAT, 20 (marked with <sup>&#x0002A;</sup> after the quantile interval) were validated under CIFBAT, i.e., both the lower and upper bounds of their quantile intervals were significant</italic>.</p>
</table-wrap-foot>
</table-wrap>
<p>Twelve (12) markers in the insulin (<italic>INS</italic>) gene were identified as significant by FBAT using complete trios. Of these, 11 also had significant QIs under CIFBAT when incomplete trios were included in the test (example shown in Figure <xref ref-type="fig" rid="F6">6A</xref>). For <italic>rs6356</italic>, including incomplete trios caused the distribution of z-scores to overlap with the null distribution. As a result, the lower bound of the 95% QI was non-significant, as shown in Figure <xref ref-type="fig" rid="F6">6B</xref>.</p>
<fig id="F6" position="float">
<label>Figure 6</label>
<caption><p><bold>Analysis of candidate set of markers for Type 1 Diabetes. (A)</bold> Example of a marker in the <italic>INS</italic> gene that was significant under FBAT and further validated by CIFBAT upon inclusion of incomplete trios in the test. <bold>(B)</bold> Example of a marker in the <italic>INS</italic> gene that could not be validated by CIFBAT. <bold>(C)</bold> Example of a marker in the <italic>PTPN22</italic> gene that was significant under FBAT but could not be validated by CIFBAT.</p></caption>
<graphic xlink:href="fgene-07-00034-g0006.tif"/>
</fig>
<p>Seventeen (17) markers in the <italic>PTPN22</italic> gene were identified as significant by FBAT. Of these, 4 were further validated by CIFBAT after inclusion of incomplete trios, whereas the remaining 13 were no longer significant (example shown in Figure <xref ref-type="fig" rid="F6">6C</xref>).</p>
<p>Two (2) markers in the <italic>CTLA4</italic> gene were significant under FBAT, but inclusion of incomplete trios under CIFBAT produced non-significant QIs for both markers.</p>
<p>Four (4) markers in interleukin-related genes (<italic>IL4R</italic>, <italic>IL2RA</italic>, <italic>IL12B</italic>) and 1 marker in the <italic>IFIH1</italic> gene were significant under FBAT and were further validated by CIFBAT after inclusion of incomplete trios in the test.</p>
<p>Overall, CIFBAT validated 20 (&#x0007E;56%) of the 36 markers that were significant under FBAT, suggesting that these results are robust to missing data. The remaining 16 markers were no longer significant when incomplete trios were included under CIFBAT, suggesting that their FBAT results might have been biased by missing data.</p>
</sec>
</sec>
<sec sec-type="discussion" id="s4">
<title>Discussion</title>
<p>Missing data in genetic association studies pose several challenges. Excluding samples with missing data reduces statistical power. On the other hand, imputing missing data without considering the underlying data model can introduce biases that often go undetected. Here, we have implemented CIFBAT to detect potential bias in FBAT due to missing data and to identify significant genomic markers with higher precision. Likelihood-based methods use sufficient statistics to impute missing genotypes. CIFBAT instead treats all valid completions of an incomplete trio as equally likely and computes QIs of the FBAT statistic over many randomized iterations. In doing so, CIFBAT does not assume a homogeneous population and retains robustness to population stratification, a crucial feature of family-based association tests.</p>
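<p>The randomized completion step described above can be illustrated with a simplified Python sketch, built on several assumptions:</p>
<list list-type="bullet">
<list-item><p>The marker is biallelic and autosomal, with genotypes coded as copies (0, 1, 2) of the test allele.</p></list-item>
<list-item><p>Each trio has a user-chosen trait score T.</p></list-item>
<list-item><p>The statistic is an additive FBAT-style Z = (S - E[S]) / sqrt(Var[S]) computed conditional on parental genotypes.</p></list-item>
<list-item><p>Every Mendelian-consistent completion counts as valid. This may be broader than the admissible trio set in Figure S2.</p></list-item>
<list-item><p>For simplicity, the interval is taken directly over two-sided <italic>p</italic>-values.</p></list-item>
</list>
<p>All names are hypothetical, and this is not the FamSuite implementation.</p>
<preformat>
```python
import math
import random
from itertools import product
from typing import List, Optional, Sequence, Tuple

# (father, mother, child) genotypes as copies of the test allele: 0, 1 or 2
Trio = Tuple[int, int, int]


def transmissible(g: int) -> Tuple[int, ...]:
    """Copies of the test allele (0 or 1) a parent with genotype g can pass on."""
    return {0: (0,), 1: (0, 1), 2: (1,)}[g]


def is_mendelian(f: int, m: int, c: int) -> bool:
    return any(a + b == c for a in transmissible(f) for b in transmissible(m))


def completions(trio: Tuple[Optional[int], ...]) -> List[Trio]:
    """Every Mendelian-consistent way to fill the missing (None) genotypes."""
    options = [(g,) if g is not None else (0, 1, 2) for g in trio]
    return [t for t in product(*options) if is_mendelian(*t)]


def fbat_z(trios: Sequence[Trio], traits: Sequence[float]) -> float:
    """Simplified additive FBAT: Z = (S - E[S]) / sqrt(Var[S]) given parental genotypes."""
    s = e = v = 0.0
    for (f, m, c), t in zip(trios, traits):
        s += t * c
        e += t * (f + m) / 2.0
        # each parent transmits the test allele with probability g / 2
        v += t * t * sum((g / 2.0) * (1.0 - g / 2.0) for g in (f, m))
    return (s - e) / math.sqrt(v) if v > 0 else 0.0


def two_sided_p(z: float) -> float:
    return math.erfc(abs(z) / math.sqrt(2.0))


def cifbat_qi(complete: Sequence[Trio], complete_traits: Sequence[float],
              incomplete: Sequence[Tuple[Optional[int], ...]],
              incomplete_traits: Sequence[float],
              n_iter: int = 1000, level: float = 0.95,
              seed: int = 0) -> Tuple[float, float]:
    """Quantile interval of FBAT p-values over random, equally likely completions."""
    rng = random.Random(seed)
    options = [completions(t) for t in incomplete]
    # trios with no consistent completion (Mendelian errors) are dropped
    usable = [i for i, opts in enumerate(options) if opts]
    traits = list(complete_traits) + [incomplete_traits[i] for i in usable]
    pvals = []
    for _ in range(n_iter):
        filled = [rng.choice(options[i]) for i in usable]
        pvals.append(two_sided_p(fbat_z(list(complete) + filled, traits)))
    pvals.sort()
    lo = pvals[int((1.0 - level) / 2.0 * (n_iter - 1))]
    hi = pvals[int((1.0 + level) / 2.0 * (n_iter - 1))]
    return lo, hi
```
</preformat>
<p>In this sketch, an incomplete trio such as (1, None, 2) has two completions, (1, 1, 2) and (1, 2, 2). Each iteration picks one of them uniformly at random, so the spread of the resulting <italic>p</italic>-values shows how sensitive the test is to the unobserved genotypes.</p>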
<p>Using simulated data, we have shown that CIFBAT is useful for validating the robustness of the FBAT statistic to missing data and for identifying candidate markers with higher precision. We have also demonstrated the applicability of CIFBAT to genome wide association studies and its usefulness in refining sets of candidate markers for more targeted downstream analyses.</p>
<p>CIFBAT uses a memory-efficient data format, which makes it well suited to analyzing whole genome sequencing data. We also extended TDT to handle dominant and recessive genetic models in addition to the default additive model. In addition, we extended robustTDT to handle the X chromosome as well as the autosomes. We have provided a comprehensive software package, FamSuite (<ext-link ext-link-type="uri" xlink:href="https://github.com/IlyaLab/FamSuite">https://github.com/IlyaLab/FamSuite</ext-link>), that researchers can use to analyze their data and compare results from TDT, robustTDT, FBAT, and CIFBAT.</p>
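<p>The dominant and recessive extensions mentioned above change how the offspring genotype is scored. The following Python sketch is not FamSuite code, and its names are hypothetical. It assumes a biallelic autosomal marker and Mendelian transmission. It computes the conditional mean and variance of the coded offspring genotype given the parental genotypes, which are the quantities that family-based statistics center and scale by. The three codings are the standard textbook definitions. X chromosome handling, as in the robustTDT extension, is not covered.</p>
<preformat>
```python
from itertools import product
from typing import Callable, Dict, Tuple

# Standard genotype codings; g counts copies (0, 1, 2) of the test allele.
CODINGS: Dict[str, Callable[[int], float]] = {
    "additive": lambda g: float(g),
    "dominant": lambda g: 1.0 if g >= 1 else 0.0,
    "recessive": lambda g: 1.0 if g == 2 else 0.0,
}


def child_distribution(father: int, mother: int) -> Dict[int, float]:
    """P(child genotype | parental genotypes) under Mendelian transmission."""
    dist: Dict[int, float] = {0: 0.0, 1: 0.0, 2: 0.0}
    for a, b in product((0, 1), repeat=2):
        # a, b: whether the father / mother transmits the test allele (prob. g / 2)
        pa = father / 2 if a else 1 - father / 2
        pb = mother / 2 if b else 1 - mother / 2
        dist[a + b] += pa * pb
    return dist


def conditional_moments(father: int, mother: int, model: str = "additive") -> Tuple[float, float]:
    """Mean and variance of the coded child genotype given the parents."""
    code = CODINGS[model]
    dist = child_distribution(father, mother)
    mean = sum(p * code(g) for g, p in dist.items())
    var = sum(p * (code(g) - mean) ** 2 for g, p in dist.items())
    return mean, var
```
</preformat>
<p>For two heterozygous parents, the additive coding gives mean 1 and variance 0.5. The dominant and recessive codings give means of 0.75 and 0.25 respectively, each with variance 0.1875. Parents homozygous for opposite alleles are uninformative under every coding, because the child's genotype is fixed.</p>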
</sec>
<sec id="s5">
<title>Author contributions</title>
<p>VD conceived the study, wrote the software, and analyzed the familial whole genome sequences; DG conducted the simulation study; TK contributed significantly to method development and analysis guidance; RK contributed to method development; BB provided critical reviews that strengthened the method and analysis; JV, JN, and IS provided supervision and interpretation of the results and contributed to the writing of the manuscript.</p>
<sec>
<title>Conflict of interest statement</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
</sec>
</body>
<back>
<ack><p>Type 1 Diabetes data was provided by the Type 1 Diabetes Genetics Consortium, a collaborative clinical study sponsored by the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK), National Institute of Allergy and Infectious Diseases (NIAID), National Human Genome Research Institute (NHGRI), National Institute of Child Health and Human Development (NICHD), and Juvenile Diabetes Research Foundation International (JDRF). Genotyping was performed by The Broad Institute Center for Genotyping and Analysis (Cambridge, MA, USA), which is supported by grant U54 RR020278 from the National Center for Research Resources. Familial whole genome sequence data for maternal uterine anomalies was obtained from the Inova Translational Medicine Institute&#x00027;s IRB-approved study of preterm birth and its childhood longitudinal cohort study. These studies have been generously supported by the Inova Health System, Falls Church, VA 22042.</p>
</ack>
<sec sec-type="supplementary-material" id="s6">
<title>Supplementary material</title>
<p>The Supplementary Material for this article can be found online at: <ext-link ext-link-type="uri" xlink:href="http://journal.frontiersin.org/article/10.3389/fgene.2016.00034">http://journal.frontiersin.org/article/10.3389/fgene.2016.00034</ext-link></p>
<supplementary-material xlink:href="DataSheet1.DOCX" id="SM1" mimetype="application/vnd.openxmlformats-officedocument.wordprocessingml.document" xmlns:xlink="http://www.w3.org/1999/xlink">
<label>Data Sheet 1</label>
<caption><p><bold>Simulation parameters, list of uterine anomalies, and whole genome sequence filters</bold>.</p></caption></supplementary-material>
<supplementary-material xlink:href="Image1.TIF" id="SM2" mimetype="image/tif" xmlns:xlink="http://www.w3.org/1999/xlink">
<label>Figure S1</label>
<caption><p><bold>Comprehensive list of informative complete trios. (A)</bold> Autosomal chromosomes, <bold>(B)</bold> X chromosome: trios with female offspring, <bold>(C)</bold> X chromosome: trios with male offspring.</p></caption></supplementary-material>
<supplementary-material xlink:href="Image2.TIF" id="SM3" mimetype="image/tif" xmlns:xlink="http://www.w3.org/1999/xlink">
<label>Figure S2</label>
<caption><p><bold>Comprehensive list of admissible incomplete trios. (A)</bold> Autosomal chromosomes, <bold>(B)</bold> X chromosome: trios with female offspring, <bold>(C)</bold> X chromosome: trios with male offspring.</p></caption></supplementary-material>
<supplementary-material xlink:href="Image3.PDF" id="SM4" mimetype="application/pdf" xmlns:xlink="http://www.w3.org/1999/xlink">
<label>Figure S3</label>
<caption><p><bold>Changes in the sensitivity of detecting simulated causative markers, depending on the missing data rate and the level of penetrance in the phenotype model</bold>.</p></caption></supplementary-material>
<supplementary-material xlink:href="Image4.PDF" id="SM5" mimetype="application/pdf" xmlns:xlink="http://www.w3.org/1999/xlink">
<label>Figure S4</label>
<caption><p><bold>Changes in the positive predictive value (PPV) when detecting simulated causative markers, depending on the missing data rate and the level of penetrance in the phenotype model</bold>.</p></caption></supplementary-material>
<supplementary-material xlink:href="Image5.TIFF" id="SM6" mimetype="image/tiff" xmlns:xlink="http://www.w3.org/1999/xlink">
<label>Figure S5</label>
<caption><p><bold>F1 scores for detecting simulated causative markers by missingness scenario, at different missingness rates and FDR thresholds</bold>. The scenarios indicate where missing data was concentrated.</p></caption></supplementary-material>
<supplementary-material xlink:href="Image6.TIFF" id="SM7" mimetype="image/tiff" xmlns:xlink="http://www.w3.org/1999/xlink">
<label>Figure S6</label>
<caption><p><bold>Precision [TP/(TP&#x0002B;FP)] for detecting simulated causative markers versus missing data rates</bold>.</p></caption></supplementary-material>
<supplementary-material xlink:href="Image7.TIFF" id="SM8" mimetype="image/tiff" xmlns:xlink="http://www.w3.org/1999/xlink">
<label>Figure S7</label>
<caption><p><bold>Recall [TP/(TP&#x0002B;FN)] for detecting simulated causative markers versus missing data rates</bold>.</p></caption></supplementary-material>
<supplementary-material xlink:href="Image8.TIFF" id="SM9" mimetype="image/tiff" xmlns:xlink="http://www.w3.org/1999/xlink">
<label>Figure S8</label>
<caption><p><bold>Precision [TP/(TP&#x0002B;FP)] for detecting simulated causative markers by missingness scenario, at different missingness rates and FDR thresholds</bold>. The scenarios indicate where missing data was concentrated.</p></caption></supplementary-material>
<supplementary-material xlink:href="Image9.TIFF" id="SM10" mimetype="image/tiff" xmlns:xlink="http://www.w3.org/1999/xlink">
<label>Figure S9</label>
<caption><p><bold>Recall [TP/(TP&#x0002B;FN)] for detecting simulated causative markers by missingness scenario, at different missingness rates and FDR thresholds</bold>. The scenarios indicate where missing data was concentrated.</p></caption></supplementary-material>
<supplementary-material xlink:href="Image10.tiff" id="SM11" mimetype="image/tiff" xmlns:xlink="http://www.w3.org/1999/xlink">
<label>Figure S10</label>
<caption><p><bold>Distribution of trio types within cases and controls for chr15:29443416. (A)</bold> Complete trio types. Trio type numbers in the legend correspond to those in Figure <xref ref-type="supplementary-material" rid="SM2">S1</xref>. <bold>(B)</bold> Incomplete trio types. Trio type numbers in the legend correspond to those in Figure <xref ref-type="supplementary-material" rid="SM3">S2</xref>. Only trio types with non-zero counts are shown.</p></caption></supplementary-material>
<supplementary-material xlink:href="Image11.TIF" id="SM12" mimetype="image/tif" xmlns:xlink="http://www.w3.org/1999/xlink">
<label>Figure S11</label>
<caption><p><bold>Distribution of trio types within cases and controls for chr11:55861880. (A)</bold> Complete trio types. Trio type numbers in the legend correspond to those in Figure <xref ref-type="supplementary-material" rid="SM2">S1</xref>. <bold>(B)</bold> Incomplete trio types. Trio type numbers in the legend correspond to those in Figure <xref ref-type="supplementary-material" rid="SM3">S2</xref>. Only trio types with non-zero counts are shown.</p></caption></supplementary-material>
<supplementary-material xlink:href="Image12.TIFF" id="SM13" mimetype="image/tiff" xmlns:xlink="http://www.w3.org/1999/xlink">
<label>Figure S12</label>
<caption><p><bold>Z score distributions after randomizing phenotypes</bold>. The density plot in <bold>(A)</bold> shows the distribution of Z scores when the null hypothesis (no association) is true. The density plots in <bold>(B)</bold> show how the missing data rate affects quantile interval widths, which widen as more data are missing.</p></caption></supplementary-material>
<supplementary-material xlink:href="Table1.XLSX" id="SM14" mimetype="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" xmlns:xlink="http://www.w3.org/1999/xlink">
<label>Table S1</label>
<caption><p><bold>Informative trios for autosomal chromosomes and corresponding FBAT statistics for dominant and recessive models</bold>.</p></caption></supplementary-material>
<supplementary-material xlink:href="Table2.XLSX" id="SM15" mimetype="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" xmlns:xlink="http://www.w3.org/1999/xlink">
<label>Table S2</label>
<caption><p><bold>Informative trios with female offspring and corresponding FBAT statistics for the X chromosome under dominant and recessive models</bold>.</p></caption></supplementary-material>
<supplementary-material xlink:href="Table3.xlsx" id="SM16" mimetype="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" xmlns:xlink="http://www.w3.org/1999/xlink">
<label>Table S3</label>
<caption><p><bold>Admissible incomplete trio types for autosomal chromosomes and the corresponding FBAT statistics X &#x02212; E[X] and Variance(X) for all possible completions of the incomplete trios</bold>.</p></caption></supplementary-material>
<supplementary-material xlink:href="Table4.xlsx" id="SM17" mimetype="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" xmlns:xlink="http://www.w3.org/1999/xlink">
<label>Table S4</label>
<caption><p><bold>Admissible incomplete trio types with male offspring and the corresponding FBAT statistics X &#x02212; E[X] and Variance(X) for all possible completions of the incomplete trios for the X chromosome</bold>. Because the father and offspring are haploid, and the mother must be heterozygous for these trio types to be informative, all three genetic models have equivalent risk factors.</p></caption></supplementary-material>
<supplementary-material xlink:href="Table5.XLSX" id="SM18" mimetype="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" xmlns:xlink="http://www.w3.org/1999/xlink">
<label>Table S5</label>
<caption><p><bold>Admissible incomplete trio types with female offspring and the corresponding FBAT statistics X &#x02212; E[X] and Variance(X) for all possible completions of the incomplete trios for the X chromosome</bold>.</p></caption></supplementary-material>
<supplementary-material xlink:href="Table6.XLSX" id="SM19" mimetype="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" xmlns:xlink="http://www.w3.org/1999/xlink">
<label>Table S6</label>
<caption><p><bold>Autosomal chromosomes: Transmission counts for alleles <italic>b</italic> and <italic>c</italic> for informative trios for additive, dominant, and recessive models</bold>. Trio types 1 and 2 are not informative under the recessive model, and trio types 6 and 7 are not informative under the dominant model.</p></caption></supplementary-material>
<supplementary-material xlink:href="Table7.XLSX" id="SM20" mimetype="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" xmlns:xlink="http://www.w3.org/1999/xlink">
<label>Table S7</label>
<caption><p><bold>X chromosome: Transmission counts for alleles <italic>b</italic> and <italic>c</italic> for informative trios with male offspring</bold>. Haploid fathers and offspring, as well as heterozygous mothers, have equivalent risk factors under the additive, dominant, and recessive models; hence, the allele transmission counts are also equivalent across the three models.</p></caption></supplementary-material>
<supplementary-material xlink:href="Table8.XLSX" id="SM21" mimetype="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" xmlns:xlink="http://www.w3.org/1999/xlink">
<label>Table S8</label>
<caption><p><bold>X chromosome: Transmission counts for alleles <italic>b</italic> and <italic>c</italic> for informative trios with female offspring</bold>.</p></caption></supplementary-material>
<supplementary-material xlink:href="Table9.XLSX" id="SM22" mimetype="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" xmlns:xlink="http://www.w3.org/1999/xlink">
<label>Table S9</label>
<caption><p><bold>Admissible incomplete trios for autosomal chromosomes and transmission counts of the two alleles <italic>b</italic> and <italic>c</italic> for all possible completions of the incomplete trios under additive, dominant, and recessive models</bold>.</p></caption></supplementary-material>
<supplementary-material xlink:href="Table10.XLSX" id="SM23" mimetype="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" xmlns:xlink="http://www.w3.org/1999/xlink">
<label>Table S10</label>
<caption><p><bold>Admissible incomplete trios with male offspring for the X chromosome and transmission counts of the two alleles <italic>b</italic> and <italic>c</italic> for all possible completions of the incomplete trios</bold>. All three genetic models have equivalent risk factors because the father and offspring are haploid and the mother is heterozygous for all informative trios.</p></caption></supplementary-material>
<supplementary-material xlink:href="Table11.XLSX" id="SM24" mimetype="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" xmlns:xlink="http://www.w3.org/1999/xlink">
<label>Table S11</label>
<caption><p><bold>Admissible incomplete trios with female offspring for the X chromosome and transmission counts of the two alleles <italic>b</italic> and <italic>c</italic> for all possible completions of the incomplete trios</bold>.</p></caption></supplementary-material>
<supplementary-material xlink:href="Table12.XLSX" id="SM25" mimetype="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Table13.XLSX" id="SM26" mimetype="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Table14.XLSX" id="SM27" mimetype="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Table15.XLSX" id="SM28" mimetype="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" xmlns:xlink="http://www.w3.org/1999/xlink"/>
</sec>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Benjamini</surname> <given-names>Y.</given-names></name> <name><surname>Hochberg</surname> <given-names>Y.</given-names></name></person-group> (<year>1995</year>). <article-title>Controlling the false discovery rate: a practical and powerful approach to multiple testing</article-title>. <source>J. R. Stat. Soc. Ser. B</source> <volume>57</volume>, <fpage>289</fpage>&#x02013;<lpage>300</lpage>.</citation>
</ref>
<ref id="B2">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cobat</surname> <given-names>A.</given-names></name> <name><surname>Abel</surname> <given-names>L.</given-names></name> <name><surname>Alcais</surname> <given-names>A.</given-names></name> <name><surname>Schurr</surname> <given-names>E.</given-names></name></person-group> (<year>2014</year>). <article-title>A general efficient and flexible approach for genome-wide association analyses of imputed genotypes in family-based designs</article-title>. <source>Genet. Epidemiol.</source> <volume>38</volume>, <fpage>560</fpage>&#x02013;<lpage>571</lpage>. <pub-id pub-id-type="doi">10.1002/gepi.21842</pub-id><pub-id pub-id-type="pmid">25044438</pub-id></citation>
</ref>
<ref id="B3">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Croiseau</surname> <given-names>P.</given-names></name> <name><surname>G&#x000E9;nin</surname> <given-names>E.</given-names></name> <name><surname>Cordell</surname> <given-names>H. J.</given-names></name></person-group> (<year>2007</year>). <article-title>Dealing with missing data in family-based association studies: a multiple imputation approach</article-title>. <source>Hum. Hered.</source> <volume>63</volume>, <fpage>229</fpage>&#x02013;<lpage>238</lpage>. <pub-id pub-id-type="doi">10.1159/000100481</pub-id><pub-id pub-id-type="pmid">17347570</pub-id></citation>
</ref>
<ref id="B4">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Laird</surname> <given-names>N.</given-names></name> <name><surname>Horvath</surname> <given-names>S.</given-names></name> <name><surname>Xu</surname> <given-names>X.</given-names></name></person-group> (<year>2000</year>). <article-title>Implementing a unified approach to family based tests of association</article-title>. <source>Genet. Epidemiol.</source> <volume>19</volume>(<supplement>Suppl. 1</supplement>), <fpage>S36</fpage>&#x02013;<lpage>S42</lpage>. <pub-id pub-id-type="doi">10.1002/1098-2272(2000)19:1&#x0002B;&#x0003C;::AID-GEPI6&#x0003E;3.0.CO;2-M</pub-id><pub-id pub-id-type="pmid">11055368</pub-id></citation>
</ref>
<ref id="B5">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lange</surname> <given-names>C.</given-names></name> <name><surname>Laird</surname> <given-names>N. M.</given-names></name></person-group> (<year>2002</year>). <article-title>Power calculations for a general class of family-based association tests: dichotomous traits</article-title>. <source>Am. J. Hum. Genet.</source> <volume>71</volume>, <fpage>575</fpage>&#x02013;<lpage>584</lpage>. <pub-id pub-id-type="doi">10.1086/342406</pub-id><pub-id pub-id-type="pmid">12181775</pub-id></citation>
</ref>
<ref id="B6">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ott</surname> <given-names>J.</given-names></name> <name><surname>Kamatani</surname> <given-names>Y.</given-names></name> <name><surname>Lathrop</surname> <given-names>M.</given-names></name></person-group> (<year>2011</year>). <article-title>Family-based designs for genome-wide association studies</article-title>. <source>Nat. Rev. Genet.</source> <volume>12</volume>, <fpage>465</fpage>&#x02013;<lpage>474</lpage>. <pub-id pub-id-type="doi">10.1038/nrg2989</pub-id><pub-id pub-id-type="pmid">21629274</pub-id></citation>
</ref>
<ref id="B7">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Purcell</surname> <given-names>S.</given-names></name> <name><surname>Neale</surname> <given-names>B.</given-names></name> <name><surname>Todd-Brown</surname> <given-names>K.</given-names></name> <name><surname>Thomas</surname> <given-names>L.</given-names></name> <name><surname>Ferreira</surname> <given-names>M. A. R.</given-names></name> <name><surname>Bender</surname> <given-names>D.</given-names></name> <etal/></person-group> (<year>2007</year>). <article-title>PLINK: a toolset for whole-genome association and population-based linkage analysis</article-title>. <source>Am. J. Hum. Genet.</source> <volume>81</volume>, <fpage>559</fpage>&#x02013;<lpage>575</lpage>. <pub-id pub-id-type="doi">10.1086/519795</pub-id><pub-id pub-id-type="pmid">17701901</pub-id></citation>
</ref>
<ref id="B8">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sebastiani</surname> <given-names>P.</given-names></name> <name><surname>Abad-Grau</surname> <given-names>M. M.</given-names></name> <name><surname>Alpargu</surname> <given-names>G.</given-names></name> <name><surname>Ramoni</surname> <given-names>M.</given-names></name></person-group> (<year>2004</year>). <article-title>Robust transmission/disequilibrium test for incomplete family genotypes</article-title>. <source>Genetics</source> <volume>168</volume>, <fpage>2329</fpage>&#x02013;<lpage>2337</lpage>. <pub-id pub-id-type="doi">10.1534/genetics.103.025841</pub-id><pub-id pub-id-type="pmid">15611196</pub-id></citation>
</ref>
<ref id="B9">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Spielman</surname> <given-names>R. S.</given-names></name> <name><surname>McGinnis</surname> <given-names>R. E.</given-names></name> <name><surname>Ewens</surname> <given-names>W. J.</given-names></name></person-group> (<year>1993</year>). <article-title>Transmission test for linkage disequilibrium: the insulin gene region and insulin-dependent diabetes mellitus (IDDM)</article-title>. <source>Am. J. Hum. Genet.</source> <volume>52</volume>, <fpage>506</fpage>&#x02013;<lpage>516</lpage>. <pub-id pub-id-type="pmid">8447318</pub-id></citation>
</ref>
<ref id="B10">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Van Steen</surname> <given-names>K.</given-names></name> <name><surname>Laird</surname> <given-names>N. M.</given-names></name> <name><surname>Markel</surname> <given-names>P.</given-names></name> <name><surname>Molenberghs</surname> <given-names>G.</given-names></name></person-group> (<year>2006</year>). <article-title>Approaches to handling incomplete data in family-based association testing</article-title>. <source>Ann. Hum. Genet.</source> <volume>71</volume>, <fpage>141</fpage>&#x02013;<lpage>151</lpage>. <pub-id pub-id-type="doi">10.1111/j.1469-1809.2006.00325.x</pub-id><pub-id pub-id-type="pmid">17096676</pub-id></citation>
</ref>
<ref id="B11">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yang</surname> <given-names>Q.</given-names></name> <name><surname>Xu</surname> <given-names>X.</given-names></name> <name><surname>Laird</surname> <given-names>N.</given-names></name></person-group> (<year>2003</year>). <article-title>Power evaluations for family-based tests of association with incomplete parental genotypes</article-title>. <source>Genetics</source> <volume>164</volume>, <fpage>399</fpage>&#x02013;<lpage>406</lpage>. <pub-id pub-id-type="pmid">12750350</pub-id></citation>
</ref>
</ref-list>
</back>
</article>