* Department of Dairy and Animal Science, The Pennsylvania State University, University Park 16802
Department of Animal Science, University of Tennessee, Knoxville 37996
Bovine Functional Genomics Laboratory, Agricultural Research Service, US Department of Agriculture, Beltsville, MD 20705
Corresponding author: R. L. Vallejo; e-mail: rvallejo{at}psu.edu.
| ABSTRACT |
|---|
Key Words: background linkage disequilibrium, genetic diversity, Holstein, linkage disequilibrium
Abbreviation key: BLD = background linkage disequilibrium, DLD = disease trait-associated linkage disequilibrium, D' = Lewontin's normalized pair-wise disequilibrium, GE = genetic equilibrium, GHR = growth hormone receptor, HFD = haplotype frequency distribution, HWE = Hardy-Weinberg equilibrium, LD = linkage disequilibrium
| INTRODUCTION |
|---|
During the past two decades, linkage analysis has been successful in localizing genes for Mendelian diseases and traits in human and livestock populations. Linkage disequilibrium (LD) analysis has often complemented the final phases of gene localization. These successes have fueled hopes that similar approaches will be effective in mapping genes for complex traits. Encouraged by the success of LD mapping of Mendelian disorders in isolated populations (de la Chapelle and Wright, 1998), many investigators are currently using these genetic isolates in the search for loci underlying complex diseases (Sheffield et al., 1998; Wright et al., 1999; Peltonen, 2000). Similarly, in dairy cattle, it is encouraging to see successful efforts towards the positional cloning of QTL affecting milk yield and composition using linkage and LD approaches (Grisart et al., 2002; Blott et al., 2003).
The identification of a large number of densely spaced microsatellite markers has led to empirical investigations into the distribution of LD in the human (Laan and Paabo, 1997; Service et al., 2001; Devlin et al., 2001) and bovine genomes (Farnir et al., 2000). Quantifying the degree of such "background" LD (BLD; i.e., marker-marker loci LD) is a crucial undertaking in paving the way for whole genome association studies. To demonstrate that LD between a disease trait and marker loci is meaningful, the likelihood of simply detecting BLD should be evaluated (Freimer et al., 1997).
For genome-wide association screens to be successful, the LD signal due to the association with a shared disease allele must stand out from the BLD signal. Theoretical studies have suggested that such BLD is highly dependent on the history of a population (Slatkin, 1994), with rapidly growing populations showing less BLD than populations of constant size. In samples of affected individuals sharing a phenotype, and possibly sharing a susceptibility allele at the same disease locus, the amount of LD around the shared disease locus should be greater in a younger population. It has been suggested that if such a young population has also undergone rapid growth, it would be ideal for LD mapping of disease loci (Freimer et al., 1997). In this latter case, BLD should be less extensive in the population.
The North American Holstein population can be considered a relatively young population of constant effective size, which should be suitable for mapping chromosomal regions that underlie complex diseases and traits (i.e., QTL mapping). It was reported that LD extends over 10-cM of genetic map distance in the Dutch Friesian Holstein cattle and that most of this LD is due to random genetic drift (Farnir et al., 2000). This suggests that high levels of BLD must be the rule in dairy cattle populations and that the fine-localization of genetic factors for complex diseases and traits will not be trivial. Given this likely scenario, it is important to determine whether any proportion of observed LD resembles a pattern of haplotype frequency distribution (HFD) likely produced by disease trait associated LD (DLD; i.e., marker-disease loci LD).
To date, no study on the genetic diversity and BLD distribution in contemporary North American Holstein cattle using a population-based sample has been reported. Genetic diversity in French (Maudet et al., 2002) and North-East Asian (Kim et al., 2002) cattle breeds has been reported using a limited number of microsatellite markers. The report on genome LD by Farnir et al. (2000) may not reflect true population estimates because they used pedigree-based samples (i.e., granddaughter designs) of Dutch Holstein Friesian cattle. Furthermore, although the Dutch Holstein population has some North American Holstein influence, it does not represent the North American Holstein population. Migration generates BLD so the US and the Dutch Holstein populations may have different levels of BLD because of the rapid use and migration of the US Holstein families into the Dutch population in the 1980s and 1990s.
The specific objectives of this study were to 1) identify highly heterozygous elite Holstein bulls that are as unrelated as possible to use as parental sires to develop informative families for mapping genes for complex diseases; 2) quantify the level of genetic diversity in the US Holstein cattle; and 3) determine the extent of BLD and DLD in the contemporary North American Holstein cattle population.
| MATERIALS AND METHODS |
|---|
Measuring Linkage Disequilibrium
The extent of LD between syntenic marker pairs and gametic phase disequilibrium between nonsyntenic marker pairs were determined using the computer program Arlequin version 2.0 (Schneider et al., 2000). Exact LD P-values for the observed allelic association under the null hypothesis of random allelic assortment were estimated with the same program by Monte Carlo approximation (10,000 simulations).
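As a rough illustration of how a Monte Carlo P-value for allelic association can be obtained, the following Python sketch shuffles the alleles at one locus across chromosomes and counts how often the shuffled data are at least as associated as the observed data. It assumes phased two-locus haplotypes and a chi-square-type statistic. Both are simplifying assumptions, and the function names are hypothetical. It is not the algorithm implemented in Arlequin.

```python
import random
from collections import Counter


def association_statistic(haplotypes):
    """Chi-square-type statistic summed over all allele combinations.

    `haplotypes` is a list of (allele_at_locus_A, allele_at_locus_B) tuples,
    one per chromosome (phase is assumed to be known).
    """
    n = len(haplotypes)
    counts_a = Counter(a for a, _ in haplotypes)
    counts_b = Counter(b for _, b in haplotypes)
    counts_ab = Counter(haplotypes)
    stat = 0.0
    for a, n_a in counts_a.items():
        for b, n_b in counts_b.items():
            expected = n_a * n_b / n  # count expected under random assortment
            observed = counts_ab.get((a, b), 0)
            stat += (observed - expected) ** 2 / expected
    return stat


def permutation_ld_pvalue(haplotypes, n_permutations=10_000, seed=0):
    """Monte Carlo P-value under the null of random allelic assortment."""
    rng = random.Random(seed)
    observed = association_statistic(haplotypes)
    alleles_a = [a for a, _ in haplotypes]
    alleles_b = [b for _, b in haplotypes]
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(alleles_b)  # break any association between the loci
        permuted = list(zip(alleles_a, alleles_b))
        if association_statistic(permuted) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)
```

With 10,000 permutations the smallest attainable P-value is about 0.0001, so the number of simulations bounds the resolution of the test.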
Background Linkage Disequilibrium and Disease Trait-Associated LD
The factors that affect LD (e.g., genetic drift, mutation, linkage) lead to different expectations of haplotype distributions; thus, BLD and DLD are expected to produce different patterns of HFD (Freimer et al., 1997). A pattern typical of BLD is presented in Figure 1a. In this example, the marker UWCA20 has four alleles, and the marker HUJII77 has six alleles. The two haplotypes deviating most from linkage equilibrium involve different alleles, namely haplotypes 3-3 and 2-1. In contrast, Figure 1b represents a pattern typical of DLD. In this example, the marker INRA048 has six alleles, and the marker BM719 has four alleles. In this case, the common ancestral haplotype (or founder chromosome) was 4-3 because of an overrepresentation of haplotypes with allele 4 (frequency 0.591) from INRA048 or allele 3 (frequency 0.614) from BM719. The exceptionally high frequency of haplotypes (and alleles) in family- and population-based samples allows identification of a hypothetical common ancestral haplotype (Bitti et al., 2001; Gaspar et al., 2001; Shinar et al., 2002).
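The haplotype deviations described above can be sketched in Python as follows. The code computes, for every allele combination, the deviation of the observed haplotype frequency from its linkage equilibrium expectation, and summarizes the deviations with a multiallelic D' (the frequency-weighted average of |D'| over allele pairs). Phased haplotypes are assumed, the function names are hypothetical, and the `shares_allele` check is only a crude reading of the BLD versus DLD description, not the classification rule used in this study.

```python
from collections import Counter


def haplotype_deviations(haplotypes):
    """Return allele frequencies (p, q) and D_ij = h_ij - p_i * q_j."""
    n = len(haplotypes)
    p = {a: c / n for a, c in Counter(a for a, _ in haplotypes).items()}
    q = {b: c / n for b, c in Counter(b for _, b in haplotypes).items()}
    h = Counter(haplotypes)
    dev = {(a, b): h.get((a, b), 0) / n - p[a] * q[b] for a in p for b in q}
    return p, q, dev


def multiallelic_d_prime(haplotypes):
    """Frequency-weighted average of |D'_ij| over all allele pairs."""
    p, q, dev = haplotype_deviations(haplotypes)
    total = 0.0
    for (a, b), d in dev.items():
        if d < 0:
            d_max = min(p[a] * q[b], (1 - p[a]) * (1 - q[b]))
        else:
            d_max = min(p[a] * (1 - q[b]), (1 - p[a]) * q[b])
        if d_max > 0:  # skip monomorphic loci, where D' is undefined
            total += p[a] * q[b] * abs(d) / d_max
    return total


def top_excess_haplotypes(haplotypes, k=2):
    """The k haplotypes most overrepresented relative to equilibrium."""
    _, _, dev = haplotype_deviations(haplotypes)
    return sorted(dev, key=dev.get, reverse=True)[:k]


def shares_allele(hap1, hap2):
    """True if two haplotypes carry the same allele at either locus."""
    return hap1[0] == hap2[0] or hap1[1] == hap2[1]
```

Under this reading, top excess haplotypes that share no allele would resemble the pattern described for Figure 1a, whereas excess concentrated on haplotypes carrying one allele of a single haplotype would resemble the pattern described for Figure 1b.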
| RESULTS AND DISCUSSION |
|---|
For the microsatellite loci typed in this study, the average observed heterozygosity in the elite Holstein sample was 0.61, which was close to the average heterozygosity reported for the MARC reference families (Table 2). For the evaluated microsatellite loci, there was a positive relationship (r = 0.43) between levels of heterozygosity reported in the MARC reference families and the heterozygosity observed in the sample of elite Holstein bulls. The average heterozygosity observed in the elite Holstein sample was also close to the average heterozygosity of 0.56 reported for microsatellite loci typed in bovine genome-wide scans (Georges et al., 1995). The average number of alleles observed in the MARC reference families was larger than that observed in the sample of elite Holstein bulls (Table 2). The smaller number of alleles per microsatellite locus observed in the elite Holstein sample might be due to the small sample size used in this study and also because the MARC reference families were developed using a four-way cross of different cattle breeds (Bishop et al., 1994). In the sample of Holstein bulls, in agreement with population genetic theory, a positive relationship between degree of heterozygosity and number of alleles per microsatellite locus was observed (Figure 2). The heterozygosity increased 0.05 units for each unit increase in number of alleles per microsatellite locus (regression coefficient b_y.x = 0.05).
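The quantities discussed in this paragraph can be illustrated with a short Python sketch. It computes observed heterozygosity, expected heterozygosity (1 minus the sum of squared allele frequencies, with no small-sample correction), and the least-squares slope of heterozygosity on number of alleles. The function names and the genotype representation (one tuple of two alleles per animal) are assumptions made for the example.

```python
from collections import Counter


def observed_heterozygosity(genotypes):
    """Fraction of individuals carrying two different alleles."""
    return sum(a != b for a, b in genotypes) / len(genotypes)


def expected_heterozygosity(genotypes):
    """1 - sum(p_i^2) from sample allele frequencies (uncorrected)."""
    alleles = [allele for genotype in genotypes for allele in genotype]
    n = len(alleles)
    return 1.0 - sum((c / n) ** 2 for c in Counter(alleles).values())


def regression_slope(x, y):
    """Least-squares slope b_y.x of y regressed on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    return s_xy / s_xx
```

For a locus with k equally frequent alleles, expected heterozygosity is 1 - 1/k, which is one way to see why heterozygosity tends to rise with the number of alleles.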
Hardy-Weinberg and Genetic Equilibrium
The notions of HWE and linkage equilibrium are central in population genetics theory. In contrast to HWE, linkage equilibrium may be reached very slowly even under ideal conditions. Hardy-Weinberg equilibrium and linkage equilibrium generally simplify the statistical analysis and are assumed when performing genetic linkage analysis. However, it is important to know how valid these simplifying assumptions are by testing the marker loci for HWE and GE.
Fisher's exact test for HWE for each microsatellite locus is presented in Table 2. Eleven microsatellite loci (20%) showed significant deviation from HWE (P-value < 0.05). Fisher's exact tests for GE between adjacent microsatellite loci were also performed (Table 3). Genetic equilibrium tests permit a combined testing for HWE and linkage equilibrium. Most of the marker pairs were in GE, and only five marker pairs (9%) showed significant deviation (P-value < 0.05) from GE. Of the marker pairs that were not in GE, two had significant LD (BL1036-BMS2055 and INRA048-BM719; Table 4). Few of the loci departing from HWE proportions had a high number of alleles (Table 2) with low allelic frequency (data not shown). Some departure from HWE and GE is expected because several key assumptions, such as random mating, nonoverlapping generations, and infinite population size, cannot be met in dairy cattle populations.
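To make the logic of an HWE test concrete, the Python sketch below uses a permutation approach: alleles are pooled, randomly re-paired into genotypes, and a chi-square-type statistic on genotype counts is compared with its observed value. This is a stand-in chosen for brevity, not the Fisher-type exact test reported in Table 2, and the function names are hypothetical.

```python
import random
from collections import Counter


def hwe_statistic(genotypes):
    """Chi-square-type distance between observed and HWE genotype counts."""
    n = len(genotypes)
    allele_counts = Counter(a for g in genotypes for a in g)
    geno_counts = Counter(tuple(sorted(g)) for g in genotypes)
    alleles = sorted(allele_counts)
    stat = 0.0
    for i, a in enumerate(alleles):
        for b in alleles[i:]:
            p_a = allele_counts[a] / (2 * n)
            p_b = allele_counts[b] / (2 * n)
            # HWE expectation: p^2 for homozygotes, 2pq for heterozygotes.
            expected = n * (p_a * p_a if a == b else 2 * p_a * p_b)
            stat += (geno_counts.get((a, b), 0) - expected) ** 2 / expected
    return stat


def hwe_permutation_pvalue(genotypes, n_permutations=10_000, seed=0):
    """Monte Carlo P-value under random union of alleles."""
    rng = random.Random(seed)
    observed = hwe_statistic(genotypes)
    pool = [a for g in genotypes for a in g]
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pool)
        permuted = list(zip(pool[0::2], pool[1::2]))  # re-pair alleles
        if hwe_statistic(permuted) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_permutations + 1)
```

Loci with many rare alleles produce genotype classes with very small expected counts, which is one reason exact or permutation tests are preferred over asymptotic chi-square tests for microsatellites.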
Linkage Disequilibrium
Exact LD P-values were estimated for all syntenic and nonsyntenic marker pairs, and a summary of these results is presented in Table 5. The proportion of marker pairs with significant LD (LD P-value < 0.05) for syntenic and nonsyntenic marker pairs was 0.15 and 0.10, respectively. The hypothesis that these proportions are statistically similar could not be rejected. For syntenic marker pairs, significant LD P-values were observed for genetic distances greater than 10 cM (Figure 3). As expected, LD tended to dissipate with genetic map distance, as illustrated by a positive relationship between LD P-values and map distances (r_xy = 0.21).
The extent and distribution of LD in the bovine genome will affect the goals of testing for association and gene localization in different ways. It is simpler to test for association if LD extends over long distances around the disease mutation, because not as many markers are needed to scan for associations. However, at a later stage, when the goal is to infer gene location, long-ranging LD is potentially problematic. This means that strong associations may be observed far from the causative site(s), and these associations could lead to effort spent in the wrong genomic regions.
In the human genome, the extent of LD is much smaller than in the bovine genome, and its distribution is quite variable. In the bovine genome, significant LD extends over large distances (Farnir et al., 2000), and little empirical information on the distribution of LD in the bovine genome is known. In human populations, reports on LD are quite variable and extend from 5 kb to 4 Mb (Huttley et al., 1999; Pritchard and Przeworski, 2001; Service et al., 2001). As a consequence, the number of markers that will be needed to scan the human genome for association is very large. In contrast, fewer markers may be needed to perform genome association studies in the bovine genome. However, the fine-localization of these genes may be a difficult (if not impossible) task in dairy cattle populations. Recently, a successful positional cloning of a QTL was reported in dairy cattle (Grisart et al., 2002). This was possible because several ideal conditions were met: large gene effects, one single mutation in a gene (absence of allelic genetic heterogeneity), and an easily interpretable missense mutation (rather than a regulatory promoter mutation). Forthcoming QTL cloning experiments are likely to be more complicated, because these ideal conditions may not apply.
Background and Disease Trait Associated LD
To estimate the proportion of marker pairs that resembled a typical pattern of BLD and DLD, pair-wise HFD and D' were estimated for 27 marker pairs: seven syntenic marker pairs with significant LD (LD P-value < 0.05, Table 5), and a sample of 20 nonsyntenic marker pairs that had the lowest LD P-value (sampled from 132 marker pairs with LD P-value < 0.05; Table 5). To illustrate this analysis, the HFD of two marker pairs are shown in Figure 1: a syntenic marker pair (INRA048-BM719) displaying a typical pattern of HFD produced by DLD (Figure 1b); and a nonsyntenic marker pair (HUJII77-UWCA20) presenting a typical pattern of HFD due to BLD (Figure 1a). The pattern of observed HFD for the 27 marker pairs is presented in Table 4. Based on D', most of these marker pairs exhibited strong LD; the marker pair INRA048-BM719 was an exception. Approximately half of the syntenic marker pairs (57%) presented a typical pattern of DLD (Table 6). As expected, few of the nonsyntenic marker pairs (5%) had an HFD that resembles those produced by DLD (Table 6).
Previous reports and findings reported here indicate that LD extends over large distances in dairy cattle populations and that most is due to random genetic drift. Based on the current breeding structure of the dairy industry, it is reasonable to predict that dairy cattle populations will not expand quickly and that they will display high levels of BLD mostly due to genetic drift and migration in the foreseeable future. Given this likely scenario, the extent of LD observed in dairy cattle populations will be useful in mapping chromosomal regions containing genes affecting complex disease traits. The success in pinpointing the causal genes for a QTL effect will greatly depend on the study design, accuracy of phenotype measurement, size of gene effects, level of genetic heterogeneity, extent and distribution of BLD and DLD, and the use of refined statistical methods that account for BLD to minimize the rate of false positive findings.
In a few QTL cloning experiments, some ideal conditions will be met, and the LD mapping methods used will be successful in pinpointing the gene(s) and polymorphism(s) responsible for the effect (Grisart et al., 2002). However, in most QTL cloning experiments, these ideal conditions will not be met and cloning will be a complicated task. For example, Blott et al. (2003) report a nonsynonymous mutation (F279Y) in the growth hormone receptor (GHR) gene that contributes to the QTL effect on milk yield and composition. This mutation accounts for 3 to 5% of total trait variation, which indicates that additional genes might contribute to the QTL effects observed on Bos taurus autosome 20. The fact that the maximum log of the odds score is distal to the GHR gene (42 cM away and outside of the 95% QTL CI) indicates that either the mutation F279Y is not the causative mutation or other closely linked genes may be responsible for the observed effects on Bos taurus autosome 20. Thus, when LD extends over long regions and is mostly due to genetic drift and migration (i.e., BLD), refined statistical methods that account for BLD must be used or incorrect candidate genes (or genomic regions) may be identified and studied.
If one is applying haplotype analysis methods or searching for shared chromosomal segments, the high levels of BLD will increase the rate of false positives. Therefore, shared segment approaches are liberal due to the BLD, whether or not a disease trait allele exists in their vicinity. Furthermore, LD generated by genetic drift is not expected to present itself in the form of predominantly shared segments or haplotypes. As a result, such approaches to gene mapping are not very powerful when BLD is present since it will not take this form. In contrast, single marker analysis should benefit from the marker-marker correlations, and multiple two-point analysis is expected to be close to optimal for detecting this type of LD (Terwilliger et al., 1998).
The difference between a rapidly growing population and one that remains of constant size is that substantial LD between closely linked loci can be created by genetic drift alone in a population of constant size but not in one that has grown sufficiently rapidly (Slatkin, 1994). In relatively young populations of constant size, such as dairy cattle populations, genome-wide LD mapping will be feasible even without dense marker maps (for mapping chromosomal regions).
However, BLD will confuse the interpretation of LD analysis for mapping complex disease trait loci, as most methods of LD analysis assume linkage equilibrium between markers in control chromosomes (i.e., individuals not affected with the disease or trait of interest and sampled independently from one another). For LD mapping to succeed, it will be necessary to develop statistical methods that distinguish DLD from BLD, by either accounting for observed BLD or modeling the population history through coalescent methods.
As geneticists move from the mapping of relatively tractable Mendelian disorders to the identification of loci underlying complex disease traits, applying LD mapping approaches, whether in large farm animals or in biomedical research, remains a challenging task. An effective experimental design and sampling scheme, based on sound population genetic principles and on empirical information about the distribution of LD in the bovine genome, will be crucial in the mapping of complex disease trait loci through genome-wide association studies.
| CONCLUSIONS |
|---|
As expected, there is extensive LD in the US Holstein cattle population, which confirms previous reports on the distribution of LD in Dutch Holstein cattle (Farnir et al., 2000). Approximately half of the syntenic marker pairs presented a typical pattern of disease trait-associated LD and, as expected, few of the nonsyntenic marker pairs had an HFD resembling those produced by DLD. These results suggest that the observed LD in the US Holstein population is not purely due to genetic drift and that a portion may be due to DLD. This raises our hopes of successful fine-localization of genes affecting complex disease traits using LD mapping in the US Holstein cattle population.
Background LD should be studied in cattle populations using a population-based sample and a reference set of closely linked and evenly spaced highly polymorphic microsatellite markers and single nucleotide polymorphisms. These studies will clarify factors that influence the distribution and magnitudes of BLD in the bovine genome, aid in the dissection of BLD from LD associated with disease trait loci, and facilitate the design of optimal genome-wide association studies.
| ACKNOWLEDGEMENTS |
|---|
Received for publication December 19, 2002. Accepted for publication May 22, 2003.
| REFERENCES |
|---|