Linkage disequilibrium measures for fine-scale mapping of disease loci are revisited

Zapata, Carlos

doi:10.3389/fgene.2013.00228

GENERAL COMMENTARY article

Front. Genet., 05 November 2013

Sec. Statistical Genetics and Methodology

Volume 4 - 2013 | https://doi.org/10.3389/fgene.2013.00228

This article is part of the Research Topic “Assessing the Effects of Multiple Markers in Genetic Association Studies”.


Carlos Zapata*

Department of Genetics, University of Santiago de Compostela, Santiago de Compostela, Spain

A commentary on
A comparison of linkage disequilibrium measures for fine-scale mapping

by Devlin, B., and Risch, N. (1995). Genomics 29, 311–322. doi: 10.1006/geno.1995.9003

Genome-wide association (GWA) studies for fine-scale mapping of disease loci rest conceptually on the occurrence and history of non-random association of alleles at different loci in populations, that is, linkage disequilibrium. A new disease mutation arises on a single ancestral haplotype in a given population and is therefore initially in complete linkage disequilibrium with syntenic polymorphic markers. Owing to historical recombination events, markers close to the disease locus will tend to be in stronger disequilibrium than distant markers, in the absence of evolutionary mechanisms that generate non-uniform disequilibrium patterns. Therefore, the use of optimal measures to assess gradients in the strength of disequilibrium along chromosomes is crucial for fine mapping of disease loci. In practice, an ideal disequilibrium measure for fine-scale mapping should be a monotone function of the recombination fraction between the marker and the disease locus.
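The gradient argument can be illustrated with the deterministic decay model D_n = (1 − θ)ⁿD₀ (Hedrick, 2005), which is also the starting point of the derivation discussed below. The Python sketch that follows is illustrative only. The helper `decayed_ld`, the marker names, the recombination fractions, D₀ and the number of generations are hypothetical choices, not values taken from D&R. Under this model, the disequilibrium remaining after a fixed number of generations decreases monotonically as θ increases.

```python
def decayed_ld(d0: float, theta: float, generations: int) -> float:
    """Return D_n = (1 - theta)**n * D_0, the deterministic decay of an
    initial disequilibrium d0 after `generations` rounds of recombination."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    return (1.0 - theta) ** generations * d0


# Hypothetical inputs: an arbitrary initial disequilibrium, a fixed number of
# generations since the mutation arose, and four markers at increasing
# recombination fractions from the disease locus.
D0 = 0.007
N_GENERATIONS = 100
markers = {"m1": 0.001, "m2": 0.005, "m3": 0.01, "m4": 0.05}

remaining = {name: decayed_ld(D0, theta, N_GENERATIONS) for name, theta in markers.items()}
for name, theta in markers.items():
    print(f"{name}: theta = {theta:<5}  D_n = {remaining[name]:.6f}")
```

For a fixed number of generations, the markers closest to the disease locus (smallest θ) retain the most disequilibrium.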

In an influential paper, Devlin and Risch (1995, henceforth referred to as “D&R”) compared the efficiency of five measures of linkage disequilibrium for refining the location of disease loci. These measures were the robust formulation of the population attributable risk δ (Lewin and Bertell, 1978), Lewontin's D' (Lewontin, 1964), the correlation coefficient Δ (Hill and Robertson, 1968; also denoted r in the literature), Yule's Q (Nei and Li, 1980), and Kaplan and Weir's d (Kaplan and Weir, 1992). They concluded that δ and D' outperform the other measures for fine-scale mapping. In particular, D&R established analytically that δ is the ideal measure for fine mapping because changes of δ over generations are directly related to the recombination fraction (θ) between the disease and marker loci. In contrast, they found that D' depends not only on θ but also on the haplotype frequencies. This supposed advantage of δ over D' for fine-scale mapping of disease loci has become a classical paradigm in the literature on disequilibrium. Nevertheless, D&R's analytical conclusion about the relative merits of δ and D' for fine-scale mapping is erroneous. As shown below, the error stems from a misinterpretation of the formulas D&R used to relate the decay of disequilibrium by recombination to the measures δ and D'. Despite the time elapsed, this misconception has remained uncorrected.
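For reference, the Python sketch below computes four of the five measures listed above from a single 2 × 2 table of haplotype frequencies. It is a minimal illustration under stated assumptions. The hypothetical function `ld_measures` takes the rows to be the marker alleles A₁ and A₂ and the columns to be the disease and normal alleles. This layout is consistent with the generation-0 conditions π₂₁ = 0 and π₁₁ = π₊₁ stated in the next paragraph. δ and D' follow the definitions given there, and r and Q use their standard formulas. Kaplan and Weir's d is omitted because its definition is not given in this commentary.

```python
import math


def ld_measures(p11: float, p12: float, p21: float, p22: float) -> dict:
    """Disequilibrium measures from haplotype frequencies pi_ij of a 2 x 2 table.

    Assumed layout: rows are marker alleles A1, A2; columns are the disease
    allele (column 1) and the normal allele (column 2). Both loci are assumed
    polymorphic, so that no marginal frequency is zero.
    """
    if abs(p11 + p12 + p21 + p22 - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    row1, row2 = p11 + p12, p21 + p22   # pi_1+, pi_2+ (marker allele frequencies)
    col1, col2 = p11 + p21, p12 + p22   # pi_+1, pi_+2 (disease and normal allele frequencies)
    d = p11 * p22 - p12 * p21           # D
    # D_max as defined in the text, depending on the sign of D
    d_max = min(row1 * col2, col1 * row2) if d > 0 else min(row1 * col1, row2 * col2)
    return {
        "D": d,
        "delta": d / (col1 * p22),                      # population attributable risk
        "D_prime": d / d_max,                           # Lewontin's D' (signed)
        "r": d / math.sqrt(row1 * row2 * col1 * col2),  # correlation coefficient
        "Q": d / (p11 * p22 + p12 * p21),               # Yule's Q
    }


generation0 = ld_measures(0.01, 0.29, 0.0, 0.70)
for name, value in generation0.items():
    print(f"{name:>7}: {value:.4f}")
```

Consider the hypothetical generation-0 table (π₁₁, π₁₂, π₂₁, π₂₂) = (0.01, 0.29, 0, 0.70). For this table, δ, D' and Q all equal 1, whereas r is only about 0.15 because the allele frequencies at the two loci differ.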

Following D&R, consider two alleles at each of two loci: a disease allele and a normal allele segregate at the first locus, and two marker alleles segregate at the other. The layout and notation for population haplotype, marker allele and disease allele frequencies are given in Table 1. The measures δ and D' are defined as δ = D/(π₊₁π₂₂) and D' = D/D_max, respectively, where D = π₁₁π₂₂ − π₁₂π₂₁. D_max is the lesser of π₁₊π₊₂ and π₊₁π₂₊ when D > 0, or the lesser of π₁₊π₊₁ and π₂₊π₊₂ when D < 0. Under a deterministic population model, D&R made four assumptions: (i) disease and marker loci are initially in complete linkage disequilibrium; (ii) disease allele and marker allele frequencies do not change over time; (iii) the disease allele arose on a haplotype carrying allele A₁ at the marker locus, so that π₂₁ = 0, π₁₁ = π₊₁, π₂₂ = π₂₊ and D > 0 at generation 0; and (iv) θ between the disease and the marker locus is constant across generations. After n generations, the decay of the initial disequilibrium (D₀) can be expressed as a function of θ by D_n = (1 − θ)ⁿD₀, where D_n is the disequilibrium in the nth generation (Hedrick, 2005). By expressing δ and D' in terms of (1 − θ)ⁿ, D&R then obtained the following relationship:

\begin{equation} (1 - \theta)^{n} = \frac{D_{n}}{D_{0}} = \frac{\pi_{11}\pi_{22} - \pi_{12}\pi_{21}}{\pi_{+1}\pi_{22}} = \delta \tag{1} \end{equation}

where π₊₁π₂₂ was taken to be the best estimate of D₀, the initial amount of linkage disequilibrium, given that π₂₁ = 0 at generation 0 and hence π₁₁ = π₊₁. For D', in contrast, D&R obtained:

\begin{equation} (1 - \theta)^{n} = D' \left[ 1 + \frac{\pi_{21}}{\pi_{22}} \right] \tag{2} \end{equation}

D&R therefore concluded that δ is a function of θ only, whereas the relationship between D' and θ depends on haplotype frequencies. However, formulas (1) and (2) are conceptually erroneous. Any measure of the strength of disequilibrium is computed from haplotype frequencies at a single generation. Yet the numerator and the denominator of formula (1) refer to haplotype frequencies at generations n and 0, respectively. The initial complete disequilibrium between the disease and marker loci decays from D > 0 towards D = 0. As it does so, the coupling haplotype frequencies (π₁₁ and π₂₂) decrease and the repulsion haplotype frequencies (π₁₂ and π₂₁) increase. Each changes by θD per generation, where D is the disequilibrium in the current generation. Haplotype frequencies at generations 0 and n are therefore distinct and should be denoted differently. Denoting them by π_ij(0) and π_ij(n), respectively, where i, j ∈ {1, 2}, formula (1) can be rewritten as follows:

\begin{equation*} (1 - \theta)^{n} = \frac{D_{n}}{D_{0}} = \frac{\pi_{11}(n)\pi_{22}(n) - \pi_{12}(n)\pi_{21}(n)}{\pi_{+1}\pi_{22}(0)} = \delta \left[ \frac{\pi_{22}(n)}{\pi_{22}(0)} \right] \end{equation*}

Therefore, the relationship between δ and θ depends on the haplotype frequency π₂₂ at generations n and 0. In terms of D', formula (1) can be rewritten as:

\begin{equation*} (1 - \theta)^{n} = \frac{D_{n}}{D_{0}} = \frac{D_{n}}{\pi_{+1}\pi_{22}(0)} = \frac{D_{n}}{\pi_{+1}\pi_{2+}} = \frac{D_{n}}{D_{\max}} = D'_{n} \end{equation*}

given that π₂₂(0) = π₂₊ and that D_max = π₊₁π₂₊ when D > 0 and π₊₁π₂₊ ≤ π₁₊π₊₂. Moreover, because allele frequencies do not change over time, D_max is the same at every generation. This shows that D' is directly related to the recombination fraction. Likewise, formula (2) can be rewritten as:

\begin{equation*} (1 - \theta)^{n} = D'_{n} \left\{ 1 + \left[ \frac{\pi_{21}(0)}{\pi_{22}(0)} \right] \right\} = D'_{n} \end{equation*}

given that π₂₁(0) = 0. D&R studied the relationship between D' and θ for the case π₁₊ < π₊₁. Note, however, that under their own assumptions π₁₊ = π₁₁(0) + π₁₂(0) cannot be lower than π₊₁ = π₁₁(0) + π₂₁(0) = π₁₁(0).
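The argument can be checked numerically. The Python sketch below iterates haplotype frequencies under D&R's deterministic assumptions: constant allele frequencies, constant θ, and initial complete disequilibrium with π₂₁(0) = 0. It then evaluates δ and D' from the generation-n frequencies alone. The helper `simulate_decay`, the starting table, θ and n are hypothetical. The row/column layout (rows: marker alleles A₁, A₂; columns: disease and normal alleles) is an assumption consistent with the conditions stated above.

```python
def simulate_decay(p11, p12, p21, p22, theta, generations):
    """Haplotype frequencies (pi_11, pi_12, pi_21, pi_22) for generations 0..n.

    Deterministic random-mating model with constant allele frequencies. Each
    generation, the coupling haplotypes (11, 22) lose theta * D and the
    repulsion haplotypes (12, 21) gain theta * D, so D_{k+1} = (1 - theta) D_k.
    """
    history = [(p11, p12, p21, p22)]
    for _ in range(generations):
        d = p11 * p22 - p12 * p21
        p11, p22 = p11 - theta * d, p22 - theta * d
        p12, p21 = p12 + theta * d, p21 + theta * d
        history.append((p11, p12, p21, p22))
    return history


# Hypothetical generation-0 table: the disease allele (column 1) arose on
# marker allele A1, so pi_21(0) = 0, pi_11(0) = pi_+1 and pi_22(0) = pi_2+.
table0 = (0.01, 0.29, 0.0, 0.70)
theta, n = 0.01, 50
p11_0, p12_0, p21_0, p22_0 = table0
row1, row2 = p11_0 + p12_0, p21_0 + p22_0   # pi_1+, pi_2+ (unchanged over time)
col1, col2 = p11_0 + p21_0, p12_0 + p22_0   # pi_+1, pi_+2 (unchanged over time)
d_max = min(row1 * col2, col1 * row2)       # D > 0 branch; same at every generation

p11, p12, p21, p22 = simulate_decay(*table0, theta, n)[n]
d_n = p11 * p22 - p12 * p21
delta_n = d_n / (col1 * p22)     # delta from generation-n frequencies only
d_prime_n = d_n / d_max          # D' from generation-n frequencies only
target = (1 - theta) ** n

print(f"(1 - theta)^n               = {target:.6f}")
print(f"D'_n                        = {d_prime_n:.6f}")
print(f"delta_n                     = {delta_n:.6f}")
print(f"delta_n * pi22(n) / pi22(0) = {delta_n * p22 / p22_0:.6f}")
```

With these values, D'_n agrees with (1 − θ)ⁿ to floating-point precision. δ_n, by contrast, exceeds it by roughly 0.002 until it is multiplied by π₂₂(n)/π₂₂(0).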

TABLE 1

Table 1. Layout and notation for population haplotypes, marker alleles, and disease allele frequencies in a 2 × 2 table.
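The generation-0 configuration assumed by D&R can be written down explicitly from two allele frequencies, as in the Python sketch below. The hypothetical function `generation0_table` assumes a particular layout for Table 1: rows index the marker alleles A₁ and A₂, and columns index the disease and normal alleles. This is the arrangement implied by π₂₁ = 0, π₁₁ = π₊₁ and π₂₂ = π₂₊. The construction is only possible when π₊₁ ≤ π₁₊, and it yields D₀ = π₊₁π₂₊ = D_max, so that D'₀ = 1.

```python
def generation0_table(p_disease: float, p_a1: float):
    """Generation-0 haplotype frequencies (pi_11, pi_12, pi_21, pi_22) when a
    disease allele of frequency p_disease has arisen on marker allele A1
    (frequency p_a1), so that pi_21 = 0.

    Assumed layout: rows are marker alleles A1, A2; columns are the disease
    and normal alleles.
    """
    # pi_+1 cannot exceed pi_1+ when every disease allele sits on A1
    if not 0.0 < p_disease <= p_a1 < 1.0:
        raise ValueError("need 0 < p_disease <= p_a1 < 1")
    p11 = p_disease          # disease allele on A1
    p12 = p_a1 - p_disease   # normal allele on A1
    p21 = 0.0                # no disease allele on A2
    p22 = 1.0 - p_a1         # A2 always carries the normal allele
    return p11, p12, p21, p22


p11, p12, p21, p22 = generation0_table(0.02, 0.4)
row1, row2 = p11 + p12, p21 + p22    # pi_1+, pi_2+
col1, col2 = p11 + p21, p12 + p22    # pi_+1, pi_+2
d0 = p11 * p22 - p12 * p21           # initial disequilibrium D_0
d_max = min(row1 * col2, col1 * row2)  # D > 0 branch of the definition
print(f"D_0 = {d0:.4f}, pi_+1 * pi_2+ = {col1 * row2:.4f}, D_max = {d_max:.4f}")
```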

Our reanalysis of the relationship between the recombination fraction and the disequilibrium measures δ and D' thus contradicts D&R's contention that δ outperforms D' for fine-scale mapping because changes of δ over generations are directly related to the recombination fraction. In fact, it leads to the opposite conclusion: it is D', not δ, whose changes over generations are directly related to θ. This finding reinforces the view that D' has better statistical properties as a general measure of linkage disequilibrium than other commonly used measures (Zapata, 2011). In particular, D' appears to be an optimal measure for marker association mapping and the localization of disease loci sensu D&R.
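The direct relationship (1 − θ)ⁿ = D'_n derived above has a practical implication for fine mapping: it can be inverted to convert D' values into recombination-fraction estimates. The Python sketch below does this. It assumes D&R's deterministic model, a known number of generations n since the mutation arose, and error-free D' values. The function `theta_from_d_prime`, the marker names and the D' values are hypothetical. Real data would also involve sampling variance and the evolutionary forces ignored by the model.

```python
def theta_from_d_prime(d_prime: float, generations: int) -> float:
    """Invert (1 - theta)**n = D'_n to recover theta, assuming the
    deterministic decay model and a known number of generations n."""
    if not 0.0 < d_prime <= 1.0:
        raise ValueError("D' must lie in (0, 1] under this model")
    return 1.0 - d_prime ** (1.0 / generations)


# Hypothetical D' values for three markers, n = 200 generations after the
# disease mutation arose.
observed = {"m1": 0.82, "m2": 0.45, "m3": 0.10}
n = 200
estimates = {name: theta_from_d_prime(dp, n) for name, dp in observed.items()}
for name, theta_hat in estimates.items():
    print(f"{name}: D' = {observed[name]:.2f} -> theta_hat = {theta_hat:.5f}")

# The marker with the smallest estimated theta is the one inferred to lie
# closest to the disease locus.
closest = min(estimates, key=estimates.get)
print("closest marker:", closest)
```

With these hypothetical values, the marker with the largest D' (m1) yields the smallest θ estimate and is inferred to be closest to the disease locus.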

References

Devlin, B., and Risch, N. (1995). A comparison of linkage disequilibrium measures for fine-scale mapping. Genomics 29, 311–322. doi: 10.1006/geno.1995.9003


Hedrick, P. W. (2005). Genetics of Populations. Boston: Jones and Bartlett Publishers.

Hill, W. G., and Robertson, A. (1968). Linkage disequilibrium in finite populations. Theor. Appl. Genet. 38, 226–231. doi: 10.1007/BF01245622


Kaplan, N., and Weir, B. S. (1992). Expected behavior of conditional linkage disequilibrium. Am. J. Hum. Genet. 51, 333–343.


Lewin, M. L., and Bertell, R. (1978). Re: “Simple estimation of population attributable risk from case-control studies.” Am. J. Epidemiol. 108, 78–79.


Lewontin, R. C. (1964). The interaction of selection and linkage. I. General considerations: heterotic models. Genetics 49, 49–67.


Nei, M., and Li, W.-H. (1980). Non-random associations between electromorphs and inversion chromosomes in finite populations. Genet. Res. 35, 65–83. doi: 10.1017/S001667230001394X


Zapata, C. (2011). On the uses and applications of the most commonly used measures of linkage disequilibrium from the comparative analysis of their statistical properties. Hum. Hered. 71, 186–195. doi: 10.1159/000327732


Keywords: genome-wide association studies, linkage disequilibrium and genetic recombination, linkage disequilibrium between diallelic loci, new disease mutations, 2 × 2 contingency tables

Citation: Zapata C (2013) Linkage disequilibrium measures for fine-scale mapping of disease loci are revisited. Front. Genet. 4:228. doi: 10.3389/fgene.2013.00228

Received: 30 July 2013; Accepted: 17 October 2013;
Published online: 05 November 2013.

Edited by:

Xuefeng Wang, Harvard University, USA

Reviewed by:

Alexandre Bureau, Université Laval, Canada

Copyright © 2013 Zapata. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: c.zapata@usc.es
