




Animal Improvement Programs Laboratory and Bovine Functional Genomics Laboratory, Agricultural Research Service, USDA, Beltsville, MD 20705-2350
Division of Animal Sciences, University of Missouri, Columbia 65211
Centre for Genetic Improvement of Livestock, Department of Animal and Poultry Science, University of Guelph, Guelph, Ontario N1G 2W1, Canada
1 Corresponding author: paul.vanraden{at}ars.usda.gov
ABSTRACT
Key Words: genomic selection, genomic prediction, reliability, evaluation accuracy
INTRODUCTION
Genetic effects must exist somewhere on the chromosomes for any trait with nonzero heritability. Previously, marker-assisted selection was used to trace the inheritance of only a few major genes. Success was limited, and few studies reported substantial gains with real populations (Dekkers, 2004). More recently, marker-assisted selection in France produced genomic predictions whose R2 with eventual daughter evaluations was 5 to 19% greater than the R2 of parent averages (PA; Boichard et al., 2006). Expected gains in reliability from simulation were slightly lower (Guillaume et al., 2008).
A recently developed high-density SNP assay can now be used to trace even small genetic effects (Van Tassell et al., 2008). "Fairly soon, as information continues to accumulate, a point will be reached where there is sufficient information on marker/QTL coupling in the ancestors of the candidate bulls, to eliminate the progeny testing step altogether, and shift completely to MAS as the primary means of selection among young sires" (Soller, 1994). Although that prediction did not come true in the decade after it was made, large gains are theoretically possible and could be realized given sufficient numbers of genotyped animals and markers.
Objectives of this research were 1) to apply genomic prediction methods to genotypes for a large population of Holstein bulls, 2) to estimate gains in reliability from using genomic evaluations instead of traditional evaluations and PA, and 3) to document how the density of markers and numbers of genotyped bulls affect predictive ability.
MATERIALS AND METHODS
10 daughters in their evaluations by April 2008. An initial proposal was to select predictor bulls with the most extreme evaluations to help identify major genes, but selective genotyping was not used so that the selected bulls would be more representative of the general population and give more realistic estimates of achievable prediction accuracy.
Genomic Data
The main source of DNA was semen held in the Cooperative Dairy DNA Repository maintained by the Bovine Functional Genomics Laboratory, ARS, USDA (Beltsville, MD). All major AI organizations routinely contributed semen samples to the repository when young bulls were enrolled in progeny testing in the United States (Ashwell and Van Tassell, 1999). Semex Alliance (Guelph, Ontario, Canada) also routinely contributed semen and DNA from young bulls tested exclusively in Canada. Semen from significant ancestor bulls was purchased independently or provided by the National Center for Genetic Resources Preservation, ARS, USDA (Fort Collins, CO), and was genotyped to help trace genetic inheritance.
Marker genotypes were obtained using the BovineSNP50 BeadChip (Illumina, San Diego, CA). Markers on the chip were selected to be evenly distributed across chromosomes and polymorphic across a variety of breeds included in the International Bovine HapMap Project (International Bovine HapMap Consortium, 2006). Extraction of DNA and genotyping were conducted by the Bovine Functional Genomics Laboratory, ARS, USDA (Beltsville, MD); Division of Animal Sciences, University of Missouri (Columbia, MO); Department of Agricultural, Food and Nutritional Science, University of Alberta (Edmonton, Canada); GeneSeek (Lincoln, NE); Genetics & IVF Institute (Fairfax, VA); and Illumina Inc. (San Diego, CA). Marker genotypes were scored using Illumina's BeadStudio software (v3.2.23).
For most SNP, genotypes were read for >99% of bulls with <0.1% error rates among those read. Success rate was quantified by agreement of son with sire genotypes and by reading the same DNA more than once for 9 individual bulls, for 2 pairs of identical twins, and for a trio of clones. Some SNP had minor allele frequencies of <0.05 in Holsteins and were excluded. That edit reduced the number of loci to 40,426 from the original 51,386 SNP that could be reliably read. Single nucleotide polymorphisms with lower minor allele frequencies may be included in future analyses because their effects are estimated more accurately as sample size increases.
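The minor allele frequency (MAF) edit can be sketched in a few lines of Python. This is a minimal illustration, assuming genotypes are coded as the number of copies (0, 1, or 2) of one arbitrarily chosen allele and missing calls are stored as None; the function names are hypothetical and are not taken from the software used in the study.

```python
from typing import List, Optional

Genotype = Optional[int]  # 0, 1, or 2 copies of the counted allele; None if missing


def minor_allele_frequency(calls: List[Genotype]) -> float:
    """MAF of one SNP, ignoring missing calls."""
    called = [g for g in calls if g is not None]
    if not called:
        return 0.0
    p = sum(called) / (2 * len(called))  # frequency of the counted allele
    return min(p, 1.0 - p)


def snps_passing_maf(snp_columns: List[List[Genotype]], threshold: float = 0.05) -> List[int]:
    """Indices of SNP kept by the edit (SNP with MAF < threshold are excluded)."""
    return [i for i, col in enumerate(snp_columns)
            if minor_allele_frequency(col) >= threshold]


if __name__ == "__main__":
    snps = [
        [0, 1, 2, 1, None],  # MAF 0.50
        [0, 0, 0, 0, 1],     # MAF 0.10
        [2, 2, 2, 2, 2],     # monomorphic, MAF 0.00
    ]
    print(snps_passing_maf(snps))  # [0, 1]
```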
Each SNP was compared with all others to eliminate those that were redundant (correlation of 1) because of complete linkage disequilibrium. Of the selected SNP, 2,010 had an inheritance pattern identical to another SNP for all 5,335 bulls or had <10 differences. Those "duplicate" SNP, whose distances from their matching loci were only half the mean distance between adjacent loci, were removed, leaving 38,416 loci for genomic predictions. Allele frequencies in the base (founder) population were estimated using the algorithm of Gengler et al. (2007), which solves for gene content of nongenotyped ancestors and descendants using pedigrees. The pedigree file with all known ancestors of the 5,335 bulls included 41,414 cows and bulls. The genotype file included 205 million known and 2.0 million (1%) unknown genotypes. For genotyped animals, a missing genotype was set to 0, 1, or 2 (number of copies of the counted allele) if the allele count estimated from relatives for that SNP differed by <0.20 from 0, 1, or 2, respectively. Using this process, 974,961 (49%) of the 2 million missing genotypes were imputed. The equations of VanRaden (2008) allowed distinction between known and missing genotypes, but the alternative of regressing on probabilities for all genotypes could increase accuracy and should be examined.
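Two of the edits in this paragraph, removing near-duplicate SNP and filling a missing call when the relative-based estimate is close to an integer, are sketched below. This is an illustration under stated assumptions: genotypes are coded 0/1/2 with None for missing; the allele counts estimated from relatives (for example, by the Gengler et al. (2007) procedure) are taken as given inputs rather than computed here; a strict "< tolerance" comparison is assumed; and all names are hypothetical. The greedy keep-first rule and the check against reversed allele coding are simplifications chosen for this sketch.

```python
from typing import List, Optional

Genotype = Optional[int]  # allele count 0/1/2, or None when missing


def differences(a: List[Genotype], b: List[Genotype]) -> int:
    """Animals called at both SNP whose allele counts disagree."""
    return sum(1 for x, y in zip(a, b) if x is not None and y is not None and x != y)


def near_duplicates(snps: List[List[Genotype]], max_diff: int = 10) -> List[int]:
    """Indices of SNP dropped because they match an earlier kept SNP with
    fewer than `max_diff` differences. The comparison also tries the reversed
    allele coding (2 - g), an assumption of this sketch."""
    kept: List[int] = []
    dropped: List[int] = []
    for j, col in enumerate(snps):
        flipped = [None if g is None else 2 - g for g in col]
        if any(min(differences(snps[k], col), differences(snps[k], flipped)) < max_diff
               for k in kept):
            dropped.append(j)
        else:
            kept.append(j)
    return dropped


def impute_from_relatives(estimated_count: float, tolerance: float = 0.20) -> Genotype:
    """Set a missing call to 0, 1, or 2 when the relative-based estimate lies
    within `tolerance` of that integer; otherwise leave it missing (None)."""
    nearest = min((0, 1, 2), key=lambda g: abs(estimated_count - g))
    return nearest if abs(estimated_count - nearest) < tolerance else None


if __name__ == "__main__":
    snps = [[0, 1, 2, 1], [0, 1, 2, 1], [2, 1, 0, 1], [0, 0, 0, 2]]
    print(near_duplicates(snps, max_diff=1))                # [1, 2]
    print(impute_from_relatives(1.9), impute_from_relatives(1.5))  # 2 None
```

With only four animals in the toy data, `max_diff=1` is used so that only exact matches count as duplicates; the study's threshold corresponds to the default of 10.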
Rather than analyzing phenotypic records directly, official genetic evaluations were combined with genomic data in a second step. All results were expressed on the US scale and included multitrait across-country evaluations from the Interbull Centre (Uppsala, Sweden) for bulls that had been progeny tested in Canada. Official evaluations for predictor bulls were obtained from August 2003, when the predicted bulls were 1 to 4 yr old. The dependent variable for analysis was daughter deviation weighted by reliability from daughters, which was computed from total daughter equivalents minus daughter equivalents from PA.
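The weighting by reliability from daughters rests on converting between reliability and daughter equivalents (DE). A minimal sketch, assuming the standard relationship REL = DE/(DE + k) with k = (4 − h²)/h² for a sire's transmitting ability; the function names and example values are hypothetical.

```python
def k_ratio(h2: float) -> float:
    """Variance ratio k = (4 - h2) / h2 used with daughter equivalents."""
    return (4.0 - h2) / h2


def daughter_equivalents(rel: float, h2: float) -> float:
    """DE implied by a reliability expressed as a fraction (0 <= rel < 1)."""
    return k_ratio(h2) * rel / (1.0 - rel)


def reliability(de: float, h2: float) -> float:
    """Reliability implied by a number of daughter equivalents."""
    return de / (de + k_ratio(h2))


def reliability_from_daughters(rel_total: float, rel_pa: float, h2: float) -> float:
    """Reliability contributed by daughters alone: total DE minus DE from PA."""
    de_daughters = daughter_equivalents(rel_total, h2) - daughter_equivalents(rel_pa, h2)
    return reliability(max(de_daughters, 0.0), h2)


if __name__ == "__main__":
    # Hypothetical bull: published REL 0.90, PA REL 0.35, heritability 0.30
    print(round(reliability_from_daughters(0.90, 0.35, 0.30), 4))
```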
Genomic Predictions
Predictions were computed using linear and nonlinear genomic models (VanRaden, 2007, 2008). For linear predictions, the traditional additive genetic relationship matrix is replaced by a genomic relationship matrix, which is equivalent to assigning equal genetic variance to all markers. For nonlinear predictions, markers with smaller effects are regressed further toward zero and markers with larger effects are regressed less, which accounts for a nonnormal prior distribution of marker effects. Differing assumptions about numbers and sizes of QTL effects could result in better predictions than those of this initial test.
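One common form of the genomic relationship matrix (VanRaden, 2008) centers each animal's allele counts by twice the allele frequency and scales the cross-products so that G is on a scale comparable to the pedigree relationship matrix. The sketch below assumes complete genotypes coded 0/1/2 and, unless frequencies are supplied, uses sample allele frequencies; the study instead estimated base (founder) frequencies with the method of Gengler et al. (2007). It is a plain-Python illustration for small examples, not production code.

```python
from typing import List, Optional


def genomic_relationship_matrix(M: List[List[int]],
                                p: Optional[List[float]] = None) -> List[List[float]]:
    """G = Z Z' / (2 * sum_j p_j (1 - p_j)), where Z = M - 2p (column-centered).

    M: one row per animal, one column per SNP, values 0/1/2 with no missing calls.
    p: frequencies of the counted allele; sample frequencies are used if omitted.
    """
    n, m = len(M), len(M[0])
    if p is None:
        p = [sum(row[j] for row in M) / (2.0 * n) for j in range(m)]
    Z = [[row[j] - 2.0 * p[j] for j in range(m)] for row in M]
    scale = 2.0 * sum(pj * (1.0 - pj) for pj in p)
    return [[sum(Z[a][j] * Z[b][j] for j in range(m)) / scale for b in range(n)]
            for a in range(n)]


if __name__ == "__main__":
    genotypes = [[0, 1, 2], [2, 1, 0], [1, 1, 1]]
    for row in genomic_relationship_matrix(genotypes):
        print([round(v, 3) for v in row])
```

In the linear model, G then takes the place of the pedigree relationship matrix in the mixed model equations.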
Genomic predictions and PA calculated from August 2003 data of older animals were compared for ability to predict April 2008 evaluations for younger bulls for 27 traits: milk, fat, and protein yields; fat and protein percentages; productive life; SCS; daughter pregnancy rate; sire and daughter calving ease; final score; stature; strength; body depth; dairy form; foot angle; rear legs (side and rear views); rump angle and width; fore udder; rear udder height; udder depth and cleft; front teat placement; teat length; and net merit. The experimental design provided an independent, realistic test by separating early daughter information of ancestors used to compute predictions from later daughter information of descendants used to assess prediction accuracy.
Because 2003 PA had not been stored for type traits or for calving ease, 2003 pedigree indexes (PI) constructed as 0.5(sire PTA) + 0.25(maternal grandsire PTA) + 0.25(birth year mean PTA) were substituted for PA for those traits. Reliability of PI is lower than that of PA, especially for highly heritable traits, because records for the dam are excluded. The 2008 PA was not substituted for the 2003 PA because the son's information would then have added to his dam's reliability.
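The pedigree index is a fixed linear combination, so it can be written directly. In this sketch the function name and the example PTA values are hypothetical; only the weights come from the text.

```python
def pedigree_index(sire_pta: float, mgs_pta: float, birth_year_mean_pta: float) -> float:
    """PI = 0.5 (sire PTA) + 0.25 (maternal grandsire PTA) + 0.25 (birth-year mean PTA).

    Compared with PA = 0.5 (sire PTA) + 0.5 (dam PTA), the dam's PTA is in effect
    replaced by 0.5 (her sire's PTA) + 0.5 (birth-year mean), so her own records
    do not contribute.
    """
    return 0.5 * sire_pta + 0.25 * mgs_pta + 0.25 * birth_year_mean_pta


if __name__ == "__main__":
    print(pedigree_index(800.0, 400.0, 200.0))  # 550.0
```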
Direct genomic predictions included less phenotypic information than the official PA because genotypes were available and evaluations were included for only a subset of the total population. Some sires and grandsires of the predicted bulls were not genotyped, and none of their dams were genotyped. For comparison with genomic predictions, a second set of PA for predicted bulls was computed using traditional relationships with only the subset of genotyped ancestors (evaluations of nongenotyped ancestors were excluded from PA). Information from the other relatives was included after all other processing.
Final genomic predictions for predicted bulls combined 3 terms by selection index: 1) direct genomic prediction; 2) PA computed from the subset of genotyped ancestors using traditional relationships; and 3) published PA or PI. The selection index for the predictor bulls included: 1) direct genomic prediction; 2) subset PTA; and 3) published PTA. Some of the predicted bulls already had PTA for service sire calving ease by August 2003, and in that case, their published PTA from 2003 was used to compute the combined PTA. To avoid a part-whole correlation between 2003 and 2008 data, only the 552 bulls with no progeny by 2003 were used to test predictions for sire calving ease.
For each bull, a 3 x 3 symmetric matrix V was set up with reliabilities for the 3 terms on the diagonals and the following functions of those 3 reliabilities on the off-diagonals:
[Equations: off-diagonal elements V12, V13, and V23, each a function of V11, V22, and V33]
For all bulls, V11 and V33 were constrained to be greater than V22 to ensure positive definite matrices. Selection index coefficients were then c'V⁻¹, where c' is a vector with elements [V11 V22 V33]. The direct genomic reliabilities used in V11 were obtained by inverting a matrix with dimension equal to the number of genotyped animals. As numbers of genotyped animals increase, approximation strategies will be required to avoid the need to invert that large matrix.
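The combination step amounts to solving a 3 × 3 linear system for the index weights b = V⁻¹c (equivalently b' = c'V⁻¹, because V is symmetric) and then weighting the three predictions. The sketch below uses plain Gaussian elimination. The off-diagonal values in the example are placeholders chosen only to give a positive definite matrix; they are not the reliability functions used in the study, and all names are hypothetical.

```python
from typing import List


def solve(A: List[List[float]], rhs: List[float]) -> List[float]:
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in A[i]] + [float(rhs[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def index_weights(V: List[List[float]]) -> List[float]:
    """b = V^-1 c with c = [V11, V22, V33]."""
    c = [V[i][i] for i in range(len(V))]
    return solve(V, c)


def combined_prediction(V: List[List[float]], direct_genomic: float,
                        subset_pa: float, published_pa: float) -> float:
    b = index_weights(V)
    return b[0] * direct_genomic + b[1] * subset_pa + b[2] * published_pa


if __name__ == "__main__":
    V = [[0.60, 0.30, 0.35],   # placeholder off-diagonals (hypothetical)
         [0.30, 0.30, 0.30],
         [0.35, 0.30, 0.40]]
    print([round(w, 3) for w in index_weights(V)])
```

With these placeholder values the weights are roughly 0.91, −0.45, and 0.55, a positive, negative, positive pattern for the three terms; that pattern depends entirely on the assumed off-diagonals.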
Regressions and correlations were used to test predictions. A bull's published PTA is a weighted mean of his daughter deviation and his PA, and using deregressed evaluations or daughter deviations as dependent variables helps to avoid part-whole correlations with PA. Because daughter deviations as defined by VanRaden and Wiggans (1991) were not available for all traits, daughter deviations were computed as deregressed evaluations:

The regression coefficient was calculated from daughter equivalents from progeny, which were obtained by subtracting daughter equivalents from parents from the bull's total daughter equivalents. The 5,335 genotyped bulls averaged 1,949 daughters each, for a total of >10 million daughters with phenotypic data.
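One way to make the deregression explicit is to assume that the published PTA is a daughter-equivalent-weighted mean of PA and daughter deviation (DD), with the PA term also carrying the prior weight k = (4 − h²)/h². Then PTA = PA + b(DD − PA) with b = DE_prog/(DE_total + k), so DD = PA + (PTA − PA)/b. This is a hedged reconstruction for illustration, not necessarily the exact formula used in the study; the names and example values are hypothetical.

```python
from typing import Optional


def deregress(pta: float, pa: float, de_total: float, de_pa: float,
              h2: float) -> Optional[float]:
    """Recover a daughter deviation from a published PTA (sketch).

    Assumes PTA = PA + b * (DD - PA) with b = DE_prog / (DE_total + k),
    where DE_prog = DE_total - DE_pa and k = (4 - h2) / h2.
    """
    k = (4.0 - h2) / h2
    de_prog = de_total - de_pa          # daughter equivalents from progeny
    if de_prog <= 0.0:
        return None                     # no progeny information to deregress
    b = de_prog / (de_total + k)        # regression of PTA on DD
    return pa + (pta - pa) / b


if __name__ == "__main__":
    # Hypothetical bull: h2 = 0.25 (k = 15), 5 DE from PA, 80 DE from daughters
    dd = deregress(pta=300.0, pa=100.0, de_total=85.0, de_pa=5.0, h2=0.25)
    print(dd)  # approximately 350
```

As a check, weighting PA by DE_pa + k = 20 and DD = 350 by DE_prog = 80 gives (20 × 100 + 80 × 350)/100 = 300, the published PTA.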
Genomic reliabilities were calculated in 2 ways. Expected genomic reliabilities were obtained by inverting mixed model equations that included genomic instead of traditional relationships. Realized genomic reliabilities were calculated from R2 of 2003 predictions with 2008 daughter deviations after adjusting for error variance in the daughter deviations and for prior selection on pedigree. The R2 from PA and from the nonlinear model were divided by mean reliability of daughter deviations (Rdau), and then the difference between the published and observed PA reliability was added to the adjusted genomic R2 to obtain the realized genomic reliability. Mathematically
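$$\mathrm{REL}_{\mathrm{PA,obs}} = \frac{R^2_{\mathrm{PA}}}{R_{\mathrm{dau}}}$$

$$\mathrm{REL}_{\mathrm{G}} = \frac{R^2_{\mathrm{G}}}{R_{\mathrm{dau}}} + \left(\mathrm{REL}_{\mathrm{PA,pub}} - \mathrm{REL}_{\mathrm{PA,obs}}\right)$$

where R²_PA and R²_G are the R² of PA and of the nonlinear genomic predictions with daughter deviations, REL_PA,obs and REL_PA,pub are the observed and published reliabilities of PA, and REL_G is the realized genomic reliability.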
The gain from genotyping is the difference between the realized genomic reliability and the reliability of traditional PA.
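The adjustment for realized reliability can be sketched as follows. Here R² is the unweighted squared Pearson correlation between 2003 predictions and 2008 daughter deviations, whereas the study weighted daughter deviations by reliability from daughters; the input values in the example are hypothetical.

```python
from typing import Sequence


def r_squared(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Squared Pearson correlation (unweighted)."""
    n = len(pred)
    mp, mo = sum(pred) / n, sum(obs) / n
    sxy = sum((p - mp) * (o - mo) for p, o in zip(pred, obs))
    sxx = sum((p - mp) ** 2 for p in pred)
    syy = sum((o - mo) ** 2 for o in obs)
    return sxy * sxy / (sxx * syy)


def realized_genomic_reliability(r2_genomic: float, r2_pa: float,
                                 rel_dau: float, rel_pa_published: float) -> float:
    observed_rel_pa = r2_pa / rel_dau          # PA R2 adjusted for error in DD
    adjusted_genomic = r2_genomic / rel_dau    # genomic R2 adjusted the same way
    return adjusted_genomic + (rel_pa_published - observed_rel_pa)


def gain_from_genotyping(r2_genomic: float, r2_pa: float,
                         rel_dau: float, rel_pa_published: float) -> float:
    realized = realized_genomic_reliability(r2_genomic, r2_pa, rel_dau, rel_pa_published)
    return realized - rel_pa_published         # equals (r2_genomic - r2_pa) / rel_dau


if __name__ == "__main__":
    # Hypothetical: genomic R2 0.40, PA R2 0.25, mean REL of DD 0.80, published PA REL 0.35
    print(realized_genomic_reliability(0.40, 0.25, 0.80, 0.35))
    print(gain_from_genotyping(0.40, 0.25, 0.80, 0.35))
```

The closing comment follows from the algebra: the published PA reliability cancels, so the gain is the difference in R² divided by Rdau.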
Sex Chromosomes
The X chromosome of a bull is inherited by all of his daughters but by none of his sons. Thus, 2 estimates of his genetic merit can be provided: PTA for his daughters is the sum of all marker effects, whereas PTA for his sons excludes effects of 605 markers on the X chromosome. Another 44 markers were located on the pseudo-autosomal region of X and included in the autosomal sum rather than the X chromosome sum. Fewer SNP have been identified on the X chromosome, and the spacing between markers is about 3 times greater than on the autosomes.
Cows also can have different PTA for daughters than for sons. For cows, effects on the X chromosome are doubled for producing sons because the X chromosome transmitted to sons will be transmitted to 50% of granddaughters instead of the 25% expected for autosomes.
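The two transmitting abilities can be sketched from per-marker contributions for one animal (genotype times estimated marker effect). This assumes X-specific markers are flagged and pseudo-autosomal markers are treated as autosomal, as described above; the names and values are hypothetical.

```python
from typing import List, Tuple


def pta_for_offspring(contributions: List[float], on_x: List[bool],
                      parent_sex: str) -> Tuple[float, float]:
    """Return (PTA for daughters, PTA for sons) from per-marker contributions.

    contributions: genotype x estimated effect for each marker of one animal.
    on_x: True for markers on the X-specific region; pseudo-autosomal markers
          should be flagged False so that they count as autosomal.
    """
    autosomal = sum(c for c, x in zip(contributions, on_x) if not x)
    x_sum = sum(c for c, x in zip(contributions, on_x) if x)
    for_daughters = autosomal + x_sum
    if parent_sex == "male":
        for_sons = autosomal                # a bull passes no X to his sons
    elif parent_sex == "female":
        for_sons = autosomal + 2 * x_sum    # X effects doubled for a cow's sons
    else:
        raise ValueError("parent_sex must be 'male' or 'female'")
    return for_daughters, for_sons


if __name__ == "__main__":
    contrib = [10.0, -4.0, 6.0, 3.0]
    on_x = [False, False, True, True]
    print(pta_for_offspring(contrib, on_x, "male"))    # (15.0, 6.0)
    print(pta_for_offspring(contrib, on_x, "female"))  # (15.0, 24.0)
```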
Son merit for each bull was constructed as twice the mean of his sons' daughter deviations, adjusted for PTA of the sons' dams. For 796 genotyped bulls that had ≥10 evaluated sons, differences between PTA from daughters and the mean of sons' PTA were used to test whether estimated effects for net merit on the X chromosome were statistically significant. Another test included only the autosomal and pseudo-autosomal markers in the genotype file and compared predictions computed with and without the 605 markers on X.
Numbers of Bulls and SNP
More predictor bulls can increase reliability by providing more data to estimate each SNP effect. Large numbers of records are required to estimate the small effects of individual genes accurately. Numbers of bulls were compared using subsets of the bull genotypes as they became available. Net merit R2 values for younger bulls were compared using 3 progressively larger subsets that included 1,402, 2,391, and 3,319 bulls. Methods used were the same as for the full set of 5,335 bulls.
More markers can increase the accuracy of genomic selection by providing SNP located closer to the causative genes. Three SNP densities were compared using the same methods and genotypes for the full set of predictor and predicted bulls. The edited set of 38,416 SNP with >5% minor allele frequency in Holsteins (designated as 40K) was compared with subsets of exactly 50 or 25% of those SNP: 19,208 (20K) or 9,604 (10K). The 20K and 10K subsets were obtained by keeping every other or every fourth SNP sequentially across each chromosome, respectively. Results for 5 yield traits (milk, fat, and protein yields and fat and protein percentages), 3 fitness traits (productive life, SCS, and daughter pregnancy rate), and net merit were obtained using the nonlinear genomic model.
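The density subsets can be sketched by ordering the SNP map and taking every second or fourth entry. In this sketch the record layout and names are hypothetical, and the count continues across chromosome boundaries; restarting the count on each chromosome would keep slightly more than 50% whenever a chromosome has an odd number of SNP.

```python
from typing import List, Tuple

SnpRecord = Tuple[str, int, int]  # (snp_id, chromosome, position); hypothetical layout


def thin_snps(snp_map: List[SnpRecord], step: int) -> List[str]:
    """Keep every `step`-th SNP after ordering by chromosome, then position.

    step=2 keeps every other SNP (50%); step=4 keeps every fourth (25%).
    """
    ordered = sorted(snp_map, key=lambda s: (s[1], s[2]))
    return [snp_id for snp_id, _, _ in ordered[::step]]


if __name__ == "__main__":
    snp_map = [("a", 1, 100), ("b", 1, 50), ("c", 2, 10), ("d", 1, 200), ("e", 2, 5)]
    print(thin_snps(snp_map, 2))  # ['b', 'd', 'c']
    full = [(f"snp{i}", 1 + i // 1500, i) for i in range(38416)]
    print(len(thin_snps(full, 2)), len(thin_snps(full, 4)))  # 19208 9604
```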
RESULTS AND DISCUSSION
Marker effects for most other traits were evenly distributed across all chromosomes, with only a few regions having larger effects, which may explain why the infinitesimal model and standard quantitative genetic theories have worked well. The distribution of marker effects indicates primarily polygenic rather than simple inheritance and suggests that favorable alleles will not become homozygous quickly and that genetic variation will remain even after intense selection. Thus, dairy cattle breeders may expect genetic progress to continue for many generations.
Nonlinear and linear predictions had correlations >0.99 for most traits. The nonlinear genomic model had little advantage in R2 over the linear model except for fat and protein percentages, with increases of 8 and 7%, respectively (Table 2). Gains in R2 averaged 3% with simulated data (VanRaden, 2008) but generally were smaller with real data, which suggests that most traits are influenced by more loci than the 100 QTL used in simulation. The R2 improved when the prior assumption was that all markers have some effect rather than that most have no effect. Results comparing differing priors and a detailed summary of the locations of markers with largest effects for each trait were reported by Cole et al. (2008). Further nonlinear optimization procedures should be investigated and could result in larger advantages than those tested here.
Actual R2 may differ from expected reliability for 5 main reasons: 1) daughter deviations contain error, especially for lowly heritable traits, resulting in lower R2 than reliability; 2) selection of elite parents decreases R2 for directly selected traits, such as net merit, whereas published reliabilities assume no selection; 3) genetic effects may reside between the markers but are assumed to be located only at the markers; 4) gains in R2 may have large standard errors because of limited numbers of predicted bulls; and 5) a few genotypes are missing or read incorrectly. Observed gains in R2 were adjusted for effects of 1) and 2) to compute observed reliability, but no theoretical adjustments were available to correct expected gains in reliability for effects of 3), 4), and 5).
Gains in reliability from genotyping the predicted bulls are shown in Table 3 and averaged 23% across traits, with a range from 8 to 43%. Gains were also converted to daughter equivalents, the number of phenotyped daughters that would provide the same increase in reliability. Daughter equivalents were calculated from the published heritability of each trait and averaged 11 for predicted bulls (Table 4). Gains in reliability were uniform across traits. Gains in daughter equivalents were smaller for traits with greater heritability than for traits with lesser heritability because each daughter equivalent is worth more for more heritable traits. For net merit, observed reliability of PA was less than theoretical reliability because of intense selection and because the net merit index was changed in 2006 to include stillbirths. Fat yield and some conformation traits also had lower observed than published reliability of PA.
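Converting a gain in reliability to daughter equivalents can be sketched by asking how many additional daughters would raise the PA reliability by the same amount, using REL = DE/(DE + k) with k = (4 − h²)/h². This interpretation and the example values are assumptions for illustration.

```python
def daughter_equivalents(rel: float, h2: float) -> float:
    """DE implied by a reliability (fraction), with k = (4 - h2) / h2."""
    k = (4.0 - h2) / h2
    return k * rel / (1.0 - rel)


def gain_in_daughter_equivalents(rel_pa: float, gain: float, h2: float) -> float:
    """Extra daughters needed to raise reliability from rel_pa to rel_pa + gain."""
    return daughter_equivalents(rel_pa + gain, h2) - daughter_equivalents(rel_pa, h2)


if __name__ == "__main__":
    # Same hypothetical reliability gain (0.23 over a PA reliability of 0.35)
    for h2 in (0.30, 0.04):
        print(h2, round(gain_in_daughter_equivalents(0.35, 0.23, h2), 1))
```

The loop shows the pattern described in the text: the same gain in reliability corresponds to far fewer daughters for the more heritable trait.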
10% from 2003 to 2008. Only the gain for service-sire calving ease was nonsignificant. Gains in reliability for cows with records should be intermediate between those for young bulls and proven bulls because traditional reliabilities for most cows are only somewhat greater than their PA reliabilities. Selection index regressions were fairly uniform for all predicted bulls even though separate 3 x 3 matrices were used for each. Mean regression coefficients for the direct prediction, subset PA, and published PA were 0.99, –0.52, and 0.53, respectively. The selection index regressions are a function of the mean reliabilities for the predicted bulls. Inclusion of the subset PA allows the difference between genomic and traditional predictions (for the same subset of data) to be added to the published PA, which included all national and international data. As genotypes and phenotypes are included for more parents, regressions should approach 1 for the direct genomic prediction and approach 0 for the 2 PA terms.
Genomic predictions were expected to have the same mean as traditional evaluations, but their standard deviation (SD) was expected to increase in proportion to the increased accuracy. Thus, the SD of change from PA to genomic prediction should equal the SD of true transmitting ability multiplied by the square root of the gain in reliability for each trait, where reliability is expressed as a fraction (divided by 100) rather than a percentage. That formula can be applied to gains in reliability from any source of information (daughters, an animal's own records, and so on). Genomic predictions follow most of the same normal distribution formulas that animal breeders are already using.
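The square-root rule follows if the variance of a prediction equals REL × σ² (σ² being the variance of true transmitting ability) and the information in PA is contained in the genomic prediction, so that Cov(GP, PA) = Var(PA); then Var(GP − PA) = (REL_G − REL_PA)σ². A minimal sketch, with hypothetical names and values:

```python
import math


def sd_of_change(sd_true_ta: float, gain_in_reliability: float) -> float:
    """Expected SD of (genomic prediction - PA); gain expressed as a fraction."""
    if not 0.0 <= gain_in_reliability <= 1.0:
        raise ValueError("gain_in_reliability must be a fraction between 0 and 1")
    return sd_true_ta * math.sqrt(gain_in_reliability)


if __name__ == "__main__":
    # Hypothetical trait: SD of true transmitting ability 100 units, gain of 0.25
    print(sd_of_change(100.0, 0.25))  # 50.0
```

Requiring a fraction rather than a percentage guards against the most likely misuse of the formula.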
Most animal breeders will conclude that these gains in reliability are sufficient to make genotyping profitable before breeders invest in progeny testing or embryo transfer. Rates of genetic progress should increase substantially as breeders take advantage of these new tools for improving animals (Schaeffer, 2008). Further increases in number of genotyped bulls, revisions to the statistical methods, and additional edits should increase the precision of future genomic predictions.
Sex Chromosomes
Effects on the X chromosome were smaller than expected; their SD was about 0.1 genetic SD, and they accounted for only about 1% of genetic variance for most traits. However, those effects were associated (P < 0.0001) with differences between the genetic merit of a bull's sons and that of his daughters. Official PTA measures daughter genetic merit almost entirely because most sires have many more daughters than sons with data. For net merit, the regression on X effect was –1.3 with an SD of 0.3, close to the theoretical value of –1.0. Predictions computed without the markers on X had slightly lower R2 than the full set for 8 of 9 traits (Table 5).
Numbers of Bulls and SNP
For bull subsets (Table 6), gains in R2 for net merit were nearly linear with increasing numbers of predictor bulls. Gains for most other individual traits (not shown) followed that same pattern. Although linear increases cannot continue indefinitely, the results suggest that genotyping additional predictor bulls will be profitable and that genomic selection within small populations will not achieve the large gains obtained for the North American Holstein population.
In a preliminary study with fewer bulls, differences in R2 between 20K and 40K SNP densities were not consistent or significant. Gains in reliability were expected from estimates of linkage disequilibrium for North American Holsteins (Sargolzaei et al., 2008) and from simulation studies (Calus et al., 2008). Although SNP density is already high, actual QTL are between the SNP, which may explain why most realized reliabilities were less than expected reliability. In the future, affordable SNP chips with greater density will likely become available and lead to further small increases in reliability.
The genetic history of the Holstein population may help to explain the results. Many animals share common DNA segments from Round Oak Rag Apple Elevation, Pawnee Farm Arlinda Chief, To-Mar Blackstar, and other popular ancestors occurring 4 to 10 generations back in current pedigrees. Few common ancestors occur >10 generations back because individual bulls had limited influence before AI with frozen semen began (Young and Seykora, 1996). Lengths of the shared chromosome segments are thus 0.10 to 0.25 of the mean chromosome length, and a few hundred markers per chromosome are adequate to trace those segments shared within families.
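The segment-length argument can be checked with simple arithmetic. Assuming no crossover interference, crossovers accumulate along a lineage of g meioses at about g per Morgan, so shared segments average roughly 1/g Morgans; if a chromosome is taken to be about 1 Morgan long, segments from ancestors 4 to 10 generations back span about 0.25 to 0.10 of it. The chromosome length and the even spread of 38,416 markers over 30 chromosomes are assumptions for illustration, not values from the study.

```python
def segment_fraction(generations: int, chromosome_morgans: float = 1.0) -> float:
    """Approximate shared-segment length as a fraction of the chromosome.

    Crossovers along a lineage of `generations` meioses are modeled as a Poisson
    process with rate `generations` per Morgan (no interference), giving a mean
    distance of 1/generations Morgans between breakpoints.
    """
    return 1.0 / (generations * chromosome_morgans)


def markers_per_segment(markers_per_chromosome: float, generations: int,
                        chromosome_morgans: float = 1.0) -> float:
    """Expected number of markers inside one shared segment."""
    return markers_per_chromosome * segment_fraction(generations, chromosome_morgans)


if __name__ == "__main__":
    markers_per_chrom = 38416 / 30  # assumed even spread over 30 chromosomes
    for g in (4, 10):
        print(g, segment_fraction(g), round(markers_per_segment(markers_per_chrom, g)))
```

With only a few hundred markers per chromosome, a segment spanning 0.10 of the chromosome would still contain tens of markers, which is in line with the statement that such densities suffice to trace segments shared within families.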
In the next generation, the common ancestors will be 1 generation further back, and more crossovers will occur between their adjacent alleles. If the allele effects estimated from families in this study were applied to less-related animals from other populations, predictions could be much less reliable. Divergent populations may require greater SNP densities. As more bulls are genotyped, more phenotypes will be available to estimate each effect. This will increase the value of having more SNP, but will also require the expense of genotyping the predictor bulls again using a denser chip.
CONCLUSIONS
ACKNOWLEDGEMENTS
Received for publication July 2, 2008. Accepted for publication August 19, 2008.
REFERENCES