
* Biosciences Research Division, Department of Primary Industries Victoria, 1 Park Drive, Bundoora 3083, Australia
Faculty of Land and Food Resources, University of Melbourne, Parkville 3010, Australia
1 Corresponding author: ben.hayes{at}dpi.vic.gov.au
| ABSTRACT |
|---|
Key Words: genomic selection reliability
| INTRODUCTION |
|---|
The genomic selection revolution began with 2 developments. The first was the recent sequencing of the bovine genome, which led to the discovery of many thousands of DNA markers, in the form of SNP. Concurrent with the discovery of numerous SNP markers throughout the livestock genomes has been a dramatic reduction in the cost of genotyping.
The second development was the demonstration that it was possible to make very accurate selection decisions when breeding values were predicted from dense marker data alone, using a method termed genomic selection (Meuwissen et al., 2001). Genomic selection refers to selection decisions based on genomic breeding values (GEBV). To calculate GEBV, a prediction equation based on the SNP is first derived. The entire genome is divided into small segments, the effects of which are estimated in a reference population in which animals are both phenotyped and genotyped. In this way, the effects of all loci that contribute to genetic variation are captured, even if the effects of the individual loci are very small. In subsequent generations, animals can be genotyped for the markers to determine which chromosome segments they carry, and the estimated effects of the segments the animal carries can then be summed across the whole genome to predict its breeding value. This predicted breeding value is the GEBV. Meuwissen et al. (2001) demonstrated in simulations that it was possible to achieve accuracies of predicted breeding values from markers alone of 0.85 (where accuracy is the correlation between true breeding value and EBV, and reliability is the square of the accuracy).
The implications of achieving such accuracies for animals at birth are profound. The simulation results suggest that the accuracy of the GEBV for a bull calf can be as high as the accuracy of an EBV after a progeny test. Potentially, genomic selection could lead to a doubling of the rate of genetic gain through selection and breeding from bulls at 2 yr of age rather than 5 yr of age or later (Schaeffer, 2006). By avoiding progeny testing, bull breeding companies could save up to 92% of their costs (Schaeffer, 2006). However, some of these savings may be offset by the need to invest more money in genotyping to increase selection intensities and thereby increase the rates of genetic gain.
In this paper, we first review the progress of genomic selection, including results from dairy cattle breeding programs around the world. We then discuss how the accuracy of GEBV could be improved above what is currently being achieved. Finally, we investigate the effect of genomic selection on long-term genetic gain and other challenges.
| ACCURACY OF GEBV FROM DAIRY CATTLE BREEDING PROGRAMS AROUND THE WORLD |
|---|
Results from Australia
The calculation of GEBV is described in some detail here, because similar methodologies were used in New Zealand and the United States. A total of 798 Australian Holstein-Friesian bulls born between 1998 and 2003 and progeny tested by Genetics Australia were genotyped for 56,947 SNP by using the Illumina Bovine SNP50™ chip. Samples were screened for the proportion of missing genotypes, and animals with greater than 10% missing genotypes were removed. The SNP were included only if they met the following criteria: percentage of missing genotypes across samples <10%, minor allele frequency >2.5%, and deviation of observed genotype frequencies from the frequencies expected from the allele frequencies (Hardy-Weinberg χ² values) <600. These criteria were chosen in an attempt to exclude SNP with a high rate of genotyping error, and to exclude very low frequency SNP from the data set, because the effects of such SNP will be very poorly estimated. Parentage checking was performed, and any genotypes incompatible with pedigree were removed. A total of 730 of the 798 sires had greater than 90% of SNP genotyped. A total of 38,259 SNP satisfied all the SNP selection criteria.
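The three SNP inclusion criteria can be expressed as a small filter. The following is a minimal sketch, not the authors' pipeline: it assumes genotypes are coded as 0, 1, or 2 copies of one allele with `None` for a missing call, and the function names are hypothetical. The thresholds are those stated in the text.

```python
from typing import Optional, Sequence

Genotype = Optional[int]  # 0, 1, or 2 copies of one allele; None = missing call


def hwe_chi_square(genotypes: Sequence[Genotype]) -> float:
    """Chi-square of observed vs. Hardy-Weinberg expected genotype counts."""
    called = [g for g in genotypes if g is not None]
    n = len(called)
    if n == 0:
        return 0.0
    observed = [called.count(k) for k in (0, 1, 2)]
    p = (observed[1] + 2 * observed[2]) / (2 * n)  # frequency of the counted allele
    expected = [n * (1 - p) ** 2, 2 * n * p * (1 - p), n * p ** 2]
    # Classes with zero expectation (monomorphic SNP) are skipped to avoid division by zero.
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


def snp_passes_qc(
    genotypes: Sequence[Genotype],
    max_missing: float = 0.10,    # missing genotypes across samples < 10%
    min_maf: float = 0.025,       # minor allele frequency > 2.5%
    max_hwe_chi2: float = 600.0,  # Hardy-Weinberg chi-square < 600
) -> bool:
    """Apply the three inclusion criteria described in the text to one SNP."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return False
    missing_rate = 1 - len(called) / len(genotypes)
    p = sum(called) / (2 * len(called))
    maf = min(p, 1 - p)
    return (
        missing_rate < max_missing
        and maf > min_maf
        and hwe_chi_square(genotypes) < max_hwe_chi2
    )
```

Applying `snp_passes_qc` to every column of a genotype table yields the retained SNP set; the same pattern, applied to rows, would implement the per-animal missing-genotype screen.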
The implementation of genomic selection methodologies is more difficult if some animals are missing genotypes for some markers. We used the following approach to impute missing genotypes. Single nucleotide polymorphisms were ordered by chromosome position by using Bovine Genome Build 4.0 (http://www.ncbi.nlm.nih.gov/projects/genome/guide/cow/). Genotypes were then submitted to fastPHASE analysis (Scheet and Stephens, 2006) chromosome by chromosome. The missing genotypes were taken as those filled in by fastPHASE. We assessed the accuracy with which the missing genotypes were filled in by removing known genotypes at every 50th position for 10% of animals on chromosome 26. The imputed genotypes were then compared with the known genotypes to evaluate the accuracy of the approach. A total of 3,571 missing genotypes were filled in by the fastPHASE program, 3,525 of which were correct, giving an accuracy of 98.7%. For comparison, an approach that filled in missing genotypes by sampling from a binomial distribution with mean as the allele frequency gave an accuracy of only 51.1%.
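The accuracy figure follows directly from the counts of masked genotypes. A minimal sketch of the comparison, assuming the masked and imputed genotypes are available as two equal-length lists (the function name is hypothetical):

```python
def imputation_accuracy(true_genotypes, imputed_genotypes):
    """Fraction of masked genotypes that the imputation recovered exactly."""
    if len(true_genotypes) != len(imputed_genotypes) or not true_genotypes:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(t == i for t, i in zip(true_genotypes, imputed_genotypes))
    return matches / len(true_genotypes)


# Counts reported in the text: 3,525 correct out of 3,571 filled-in genotypes.
reported_accuracy = 3525 / 3571
print(f"{100 * reported_accuracy:.1f}%")  # 98.7%
```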
The phenotypes used were deregressed Australian breeding values (ABV) for protein yield, protein percentage, Australian Profit Ranking (APR), and Australian Selection Index (ASI) extracted from the Australian DHI Scheme (ADHIS database). The ASI is given by (3.8 x ABV_protein) + (0.9 x ABV_fat) – (0.048 x ABV_milk), whereas APR is given by (3.8 x ABV_protein) + (0.9 x ABV_fat) – (0.048 x ABV_milk) + (1.2 x ABV_milkingSpeed) + (2.0 x ABV_temperament) + (3.9 x ABV_survival) + (0.34 x ABV_cellCount) – (0.26 x ABV_liveweight) + (3.0 x ABV_daughterFertility). The breeding values were deregressed to remove the contribution from relatives other than daughters.
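Both indices are weighted sums of trait ABV, and APR is ASI plus six further terms. A minimal sketch using the weights given above; the dictionary keys and function names are hypothetical labels chosen for illustration.

```python
# Index weights as stated in the text; the key names are illustrative only.
ASI_WEIGHTS = {"protein": 3.8, "fat": 0.9, "milk": -0.048}
APR_EXTRA_WEIGHTS = {
    "milking_speed": 1.2,
    "temperament": 2.0,
    "survival": 3.9,
    "cell_count": 0.34,
    "liveweight": -0.26,
    "daughter_fertility": 3.0,
}


def asi(abv):
    """Australian Selection Index from a dict of trait ABV."""
    return sum(weight * abv[trait] for trait, weight in ASI_WEIGHTS.items())


def apr(abv):
    """Australian Profit Ranking: ASI plus the non-production traits."""
    return asi(abv) + sum(
        weight * abv[trait] for trait, weight in APR_EXTRA_WEIGHTS.items()
    )
```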
To reduce the number of SNP to be considered in the prediction equations for computational tractability, we first tested the effect of each SNP in turn on each trait. To do this, we fitted the model
$$\mathbf{y} = \mathbf{1}_n\mu + \mathbf{W}g + \mathbf{u} + \mathbf{e}$$

where $\mathbf{y}$ is a vector of deregressed breeding values for each trait in the reference population; $\mathbf{1}_n$ is a vector of 1s (e.g., [1 1 1]); $\mu$ is the overall mean; $\mathbf{W}$ is a vector allocating records to the SNP effect, with element $W_i$ = 0, 1, or 2 if the genotype of animal $i$ is 11, 12, or 22, respectively; $g$ is the (fixed) effect of the SNP; $\mathbf{e}$ is a vector of random deviates, $e_i \sim N(0, \sigma_e^2)$, where $\sigma_e^2$ is the error variance; and $u_i$ is the polygenic breeding value of the $i$th animal, assumed to be normally distributed, $\mathbf{u} \sim N(\mathbf{0}, \mathbf{A}\sigma_a^2)$, where $\mathbf{A}$ is the average relationship matrix. The variance components were estimated in ASREML (Gilmour et al., 2002). For each trait, the SNP that were significant at P < 0.05 were taken to the next stage.
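The one-SNP-at-a-time scan can be illustrated with a deliberately simplified version of this model. The sketch below uses ordinary least squares and omits the polygenic term, which the authors fitted with variance components estimated in ASREML; it is therefore not their analysis, only the regression of phenotype on allele count that lies at its core. The function name is hypothetical.

```python
from math import sqrt


def single_snp_regression(genotypes, phenotypes):
    """Ordinary least squares fit of y = mu + W*g + e for one SNP.

    Simplification: the polygenic term u of the model in the text is omitted.
    Returns (g_hat, standard_error).
    """
    n = len(genotypes)
    if n != len(phenotypes) or n < 3:
        raise ValueError("need at least 3 paired records")
    mean_w = sum(genotypes) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((w - mean_w) ** 2 for w in genotypes)
    if sxx == 0:
        raise ValueError("SNP is monomorphic in this sample")
    sxy = sum((w - mean_w) * (y - mean_y) for w, y in zip(genotypes, phenotypes))
    g_hat = sxy / sxx                 # estimated allele substitution effect
    mu_hat = mean_y - g_hat * mean_w  # estimated intercept
    residual_ss = sum(
        (y - mu_hat - g_hat * w) ** 2 for w, y in zip(genotypes, phenotypes)
    )
    standard_error = sqrt(residual_ss / (n - 2) / sxx)
    return g_hat, standard_error
```

The ratio `g_hat / standard_error` is the test statistic that would be compared with the P < 0.05 threshold; ignoring relationships among bulls, as this sketch does, would be expected to inflate the number of SNP passing that threshold.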
On the basis of the above one-SNP-at-a-time model, the m significant SNP were chosen to be fitted simultaneously in the following model:
$$\mathbf{y} = \mathbf{1}_n\mu + \mathbf{X}\mathbf{g} + \mathbf{u} + \mathbf{e}$$

where $\mathbf{y}$ is a vector of $n$ daughter yield deviations corrected for herd-year-season effects for each trait; $\mathbf{1}_n$ is a vector of 1s; $\mu$ is the overall mean; $\mathbf{X}$ is an ($n \times m$) design matrix allocating records to the marker effects, with element $X_{ij}$ = 0, 1, or 2 if the genotype of animal $i$ at SNP $j$ is 11, 12, or 22, respectively; $\mathbf{g}$ is an ($m \times 1$) vector of SNP effects assumed to be normally distributed, $g_i \sim N(0, \sigma_{g_i}^2)$; $\mathbf{e}$ is a vector of random deviates, where $\sigma_e^2$ is the error variance; and $u_i$ is the polygenic breeding value of the $i$th animal, with variance $\mathbf{A}\sigma_a^2$, where $\mathbf{A}$ is the average relationship matrix. For some traits, all the SNP were fitted in the models for comparison.
Two methods were used to derive the prediction equations. The first method used was a simple BLUP approach, as described by Meuwissen et al. (2001). This method treats all SNP as having an effect that is sampled from the same normal distribution; in other words, the effects of all SNP are assumed to be very small and the $\sigma_{g_i}^2$ are the same across all $i$. In this case, the $\sigma_{g_i}^2$ was calculated as

![]()

The second method was a Bayesian approach in which the variance is allowed to differ between SNP (i.e., the $\sigma_{g_i}^2$ can be different across the $i$). In this case, samples of the $\sigma_{g_i}^2$ from their posterior distributions are taken by using Gibbs sampling [see Meuwissen et al. (2001) for details]. This method was similar to BayesA in Meuwissen et al. (2001), but was modified to include the polygenic effect and used only the subsets of SNP significant at P < 0.05. For some traits, all SNP were included in BayesA for comparison.
To assess the accuracy of genomic selection, the effects of the SNP were first estimated in bulls born from 1998 to 2002. Using these predicted SNP effects, we predicted the GEBV of the bulls born in 2003, as $\mathrm{GEBV} = \hat{\mathbf{u}} + \mathbf{X}\hat{\mathbf{g}}$. These GEBV were then correlated with the current breeding values of these bulls, which were largely derived from a progeny test. This gave r(GEBV,EBV), whereas we wished to know r(GEBV,TBV), where TBV is the true breeding value. This can be obtained as r(GEBV,EBV)/r(EBV,TBV), that is, the correlation between the GEBV and the current breeding value divided by the square root of the reliability of the current breeding value. The square of this result is the reliability of the GEBV, presented in Table 1 along with the reliability of the sire pathway EBV at the time of birth of the bull calves. The reliabilities of GEBV were considerably greater than those of the sire pathway EBV that were currently used to select bull calves for progeny testing (Table 1).
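The validation arithmetic can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' code: the function names are hypothetical, the polygenic estimate and SNP effect estimates are taken as given, and the adjustment divides the observed correlation by the accuracy of the progeny-test EBV as described above.

```python
from math import sqrt


def gebv(u_hat, genotype_row, snp_effects):
    """GEBV of one animal: polygenic estimate plus summed SNP effects."""
    return u_hat + sum(x * g for x, g in zip(genotype_row, snp_effects))


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def gebv_reliability(r_gebv_ebv, ebv_reliability):
    """Reliability of GEBV from r(GEBV,EBV) and the reliability of the EBV.

    r(GEBV,TBV) = r(GEBV,EBV) / r(EBV,TBV), with r(EBV,TBV) = sqrt(reliability).
    """
    accuracy = r_gebv_ebv / sqrt(ebv_reliability)
    return accuracy ** 2
```

For instance, an observed correlation of 0.6 against progeny-test EBV with a reliability of 0.81 corresponds to an accuracy of 0.6/0.9 and hence a GEBV reliability of about 0.44. These input values are invented for illustration and are not results from the study.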
The Bayesian method gave small increases in reliability for all traits except fertility, on the order of 2 to 7%. Interestingly, fitting all SNP in the Bayesian analysis, rather than preselected subsets, did not result in increased accuracy for the traits for which this was tried, and in some cases led to slightly decreased accuracy.
Results from New Zealand
Harris et al. (2008) reported reliabilities of GEBV in New Zealand dairy cattle from an experiment performed by the Livestock Improvement Corporation. Their reference population consisted of approximately 4,500 bulls progeny tested by the Livestock Improvement Corporation, a much larger reference than the Australian data to date. The bulls were genotyped for the same SNP set as described above. After quality checking, 44,146 SNP were retained for analysis. To derive the prediction equations, a wide range of methods were tried, including BLUP, BayesA, BayesB (where some SNP can have a zero effect; Meuwissen et al., 2001), least angle regression (Efron et al., 2004), and Bayesian regression (Xu, 2003). A polygenic component (additive breeding value) based on pedigree was included in the GEBV.
The reliabilities of GEBV were estimated by direct inversion of a set of mixed model equations, with the average relationship matrix replaced by the genetic relationship matrix based on the SNP data (for details, see Harris et al., 2008). Reliabilities of GEBV for young bulls with no daughter information calculated in this way were in the range of 50 to 67% for milk production traits, live BW, fertility, SCC, and longevity, compared with an average 34% for parental average breeding values. These reliabilities are generally greater than those achieved in the Australian data, which probably reflects the much larger number of bulls in the New Zealand reference population, as well as the fact that the New Zealand reliabilities were predicted rather than realized. Again, the Bayesian methods gave slightly greater (2 to 3%) reliabilities than the BLUP approach, whereas the regression methods performed poorly.
Results from the United States
VanRaden et al. (2009) reported reliabilities of GEBV for US and Canadian young bulls. The reference population from which the prediction equations were derived consisted of 3,576 Holstein bulls genotyped for 38,416 SNP with the Illumina Bovine SNP50™ chip, as for the Australian and New Zealand experiments. Prediction methods included a method similar to BLUP [as described by Meuwissen et al. (2001)], which assumed a normal distribution for the marker effects, and a Bayesian method with a heavier-tailed prior, allowing for genes of major effect (similar to BayesA described above). As in the Australian and New Zealand calculations of GEBV, the parent average or polygenic effect from pedigree was combined with the genomic predictions by selection index to obtain the final GEBV.
Averaged across traits, the GEBV had a reliability of 50%, compared with 27% from the parent average alone. Using BLUP rather than the Bayesian approach gave only a slightly (1%) reduced reliability, as was observed in the Australian and New Zealand results.
Results from the Netherlands
De Roos (CRV, Arnhem, the Netherlands; personal communication) reported results from a genomic selection experiment conducted by CRV, a dairy breeding company based in the Netherlands. Their reference population consisted of 1,583 bulls genotyped with a custom-made SNP chip containing 57,660 SNP, of which 46,529 SNP were used in subsequent analysis. They calculated the accuracy of GEBV by randomly dropping out 5% of the 429 bulls born between 1999 and 2003 from the reference population, calculating GEBV for these bulls, and then correlating them with the actual EBV of the bulls, which included progeny test information. This was repeated 20 times so that each bull was dropped out once and used as a reference bull in the other 19 runs. Their methodology for calculating SNP effects followed the Gibbs sampling scheme proposed by Meuwissen and Goddard (2004), implemented for single SNP rather than haplotypes (Calus et al., 2008). The increase in reliability of GEBV over parent average EBV at the time of birth was 33% (fat percentage), 19% (kilograms of protein), 15% (feet and legs), 13% (udder depth, SCS), and 9% (fertility). They concluded that having a larger number of bulls in their reference population would increase the reliability of GEBV in their selection candidates substantially.
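The drop-out scheme described for the Dutch data is a 20-fold cross-validation over the 429 validation bulls. A minimal sketch of the fold assignment, assuming bulls are identified by integer indices; the function name and the fixed seed are choices made here for illustration.

```python
import random


def make_folds(n_animals, n_folds, seed=0):
    """Randomly partition animal indices into n_folds disjoint groups."""
    indices = list(range(n_animals))
    random.Random(seed).shuffle(indices)
    # Striding through the shuffled list gives folds whose sizes differ by at most 1.
    return [indices[k::n_folds] for k in range(n_folds)]


folds = make_folds(429, 20)  # each fold holds roughly 5% of the 429 bulls
for held_out in folds:
    # In each run, the held-out bulls would be removed from the reference
    # population, their GEBV predicted, and the predictions stored.
    pass
```

Each bull appears in exactly one fold, so it is predicted once and serves as a reference animal in the other 19 runs.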
Comparison of Results
In all 4 countries, the reliabilities of GEBV were substantially greater than breeding values from parental averages. In all countries, the dairy cattle breeding companies are likely to take advantage of the GEBV both to improve rates of genetic gain and to reduce the cost of their breeding programs.
The increase in reliability of breeding value as a result of including the genomic information was greater in the data from the United States and New Zealand than in the Australian data, most likely reflecting the large number of bulls those countries used in their reference populations. However, the method of calculating reliability of the GEBV differed between countries, making a direct comparison difficult.
A common finding was that the BLUP method, which assumes a normal distribution of marker effects, performed only slightly worse than the Bayesian methods, which use a prior allowing for genes of moderate to large effect. A conclusion from this common result would be that for most dairy traits, the assumption of the BLUP method, that there are many genes of small effect and few or none of moderate to large effect, may be close to reality. An alternative explanation might be that the SNP track large chromosome segments and that the effect of the chromosome segment is divided over many SNP. There were some individual SNP with large effects, however; for example, there is a polymorphism in the DGAT1 gene that has a large effect on fat percentage (Grisart et al., 2004), and this was detected by the surrounding SNP [Australian data, VanRaden et al. (2008)].
In all countries, the final GEBV was calculated by combining the parental average breeding value from pedigree information with the breeding value from genomic information by using selection index theory. For example, the components could be weighted by their reliability. The advantage of using both sources of information is that any QTL not captured by the SNP effects may be captured by the parental average or polygenic breeding value. This may be particularly important to capture QTL at low frequency in the population, as discussed below.
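The reliability weighting mentioned above can be sketched as a weighted average. This is a deliberately simple illustration: a full selection index would also account for the covariance between the two sources of information, which is ignored here, and the function name is hypothetical.

```python
def combine_by_reliability(parent_average, pa_reliability,
                           genomic_value, genomic_reliability):
    """Weight two breeding value estimates in proportion to their reliabilities.

    Simplifying assumption: the covariance between the two estimates is ignored.
    """
    total = pa_reliability + genomic_reliability
    if total <= 0:
        raise ValueError("at least one reliability must be positive")
    return (pa_reliability * parent_average
            + genomic_reliability * genomic_value) / total
```

With this weighting, a trait for which the genomic prediction is unreliable falls back towards the parent average, which is the behaviour the text describes as protecting against QTL not captured by the SNP.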
| INCREASING THE ACCURACY OF GENOMIC SELECTION |
|---|
For genomic selection to work, the single markers must be in sufficient LD with the QTL such that the markers will predict the effects of the QTL across the population and across generations. The level of LD between markers and QTL, or between markers, can be quantified with the parameter r² (Hill, 1981). If we consider the r² between a marker and an (unobserved) QTL, r² is the proportion of variation caused by the alleles at a QTL that is explained by the markers. For genomic selection to be as successful as in the simulations of Meuwissen et al. (2001), where accuracies of GEBV of 0.85 were achieved, the level of LD between adjacent markers should be r² ≥ 0.2, because this was the level of LD their simulations generated. It must be noted that Meuwissen et al. (2001) used haplotypes rather than single markers in their simulations, so the level of LD they generated may be greater than can be achieved with single markers at the same marker spacing (e.g., Goddard, 1991). Calus et al. (2008) used simulation to assess the effect of the average r² between adjacent marker pairs on the accuracy of genomic selection (where the accuracy was the correlation of true breeding values and GEBV for a group of unphenotyped animals) by using single SNP rather than haplotypes. They found that the accuracy of GEBV increased dramatically as the average r² between adjacent markers increased, from 0.68 when the average r² between adjacent markers was 0.1, to 0.82 when the average r² between adjacent markers was 0.2. In the Australian Holstein data, with 38,259 SNP across the genome, the average LD between adjacent markers, measured by r², was 0.271. However, a considerable number of pairs had zero r² values (Figure 1).
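For two biallelic loci, r² can be computed from haplotype frequencies as D² divided by the product of the four allele frequencies. The following is a minimal sketch, assuming phased haplotypes are available as pairs of 0/1 allele codes (for example, after phasing); the function name is hypothetical.

```python
def ld_r2(haplotypes):
    """r^2 between two biallelic loci from phased haplotypes.

    `haplotypes` is a sequence of (allele_at_locus_A, allele_at_locus_B)
    pairs, each allele coded 0 or 1.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("both loci must be polymorphic")
    return d * d / denominator
```

Averaging `ld_r2` over all adjacent marker pairs gives the summary statistic quoted in the text for the Australian Holstein data.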
The accuracy of genomic selection will also be determined by the number of phenotypic records that are used to estimate the SNP effects. The more phenotypic records available, the more observations there will be per SNP allele and the greater the accuracy of genomic selection. The heritability of the trait is also crucial here; with greater heritabilities, fewer records are required. The distribution of QTL effects is also important. If there are very many QTL of very small effects contributing to variation in the trait, as the results above suggest, a large number of phenotypic records will be required to estimate these effects accurately. Goddard (2008) presented a deterministic method for calculating the accuracy of GEBV when the prediction equation is estimated in a reference population of given size and for a given level of heritability (Figure 2). A normal distribution of QTL effects was assumed [for results with nonnormal distributions of QTL effects, see Goddard (2008)].
The importance of having a large reference population is also demonstrated by the accuracies of GEBV achieved in different countries. The increase in reliability of breeding value as a result of including the genomic information was greater in data from the United States and New Zealand than in the Australian data, reflecting the larger number of bulls used in the US and New Zealand reference populations.
| OPTIMIZING BREEDING PROGRAM DESIGN WITH GENOMIC SELECTION |
|---|
In dairy cattle breeding, progeny testing is currently used to identify bulls of high genetic merit. A good description of the progeny test scheme was given by Schaeffer (2006): "In the progeny test scheme, a number of elite cows are identified each year as the dams of young bulls, and these cows are mated to specific sires. At one year of age, the young bulls are test mated to a large number of cows in the population, in order that they will have about 100 daughters with their first EBV for production and other traits. Approximately 43 months later, the daughters from these matings complete their first lactations and the young bull EBV for production are produced with an accuracy of approximately 75% (reliability of 56%). At this point, the young bull is proven or returned to service." The results from the New Zealand, US, and Australian experiments demonstrate that GEBV with this accuracy can already be calculated for bull calves, at least for some traits. If such accuracies were achieved for the selection indices in each country, bull calves could be selected at this stage and used as soon as they are reproductively able, rather than after progeny testing. This reduces the generation interval by at least half. Further genetic gains can be made both by genotyping the elite bull dams and selecting a smaller number for mating to specific sires, and by screening very large numbers of bull calves with the markers to increase the selection intensity greatly. Physiologically, bulls are able to breed from 1 yr of age, so the potential exists to reduce the generation interval further still. However, mating 1-yr-old bulls to a small number of cows to check for congenital defects before widespread use when the bulls reach 2 yr of age is one way in which the technology is being implemented in practice (Harris et al., 2008).
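The effect of the generation interval on the rate of gain can be seen from the standard single-pathway prediction of response, in which annual gain is selection intensity times accuracy times genetic standard deviation, divided by the generation interval. The sketch below is only that textbook formula; dairy breeding programs involve four selection pathways, which this illustration does not model, and the numerical inputs in the usage note are invented.

```python
def annual_genetic_gain(selection_intensity, accuracy, genetic_sd,
                        generation_interval_years):
    """Predicted genetic gain per year for a single selection pathway."""
    if generation_interval_years <= 0:
        raise ValueError("generation interval must be positive")
    return (selection_intensity * accuracy * genetic_sd
            / generation_interval_years)
```

With intensity, accuracy, and genetic standard deviation held fixed, halving the generation interval (for example, from 5 yr to 2.5 yr) doubles the value returned, which is the mechanism behind the gains described above.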
The above description considers the selection of young bulls only and ignores the benefits that can be made from implementing genomic selection on the maternal side. Schaeffer (2006) demonstrated that large genetic gains could be made by genotyping potential dams of young bulls and selecting these dams on their GEBV, as a result of the large increase in accuracy of selecting the dams. In fact, Schaeffer (2006) concluded that the genetic gain made from selecting dams of bulls on GEBV could be greater than the gains made by selecting the sires of bulls on GEBV.
Schaeffer (2006) further suggested that the effect of genomic selection may be to shift the structure of the dairy cattle breeding industry to a model similar to that used by the poultry and swine industries, in which companies maintain a nucleus of elite animals "within house." A dispersed nucleus or preferential matings of cows identified as being elite are other alternatives.
Another effect of genomic selection may be a more appropriate balance in the direction of genetic gain. Currently in the dairy industry, large gains are made for production traits, whereas the gains in fertility are relatively small, in part because of the lower accuracy of fertility EBV (and also because production and fertility are unfavorably correlated). Genomic selection could increase the accuracy of fertility EBV if sufficient records were taken in the initial experiment to estimate SNP effects, allowing a greater contribution of this trait to the total breeding objective. However, if small reference populations are used, the accuracy of selection on fertility will remain low.
The impact of genomic selection on inbreeding should be carefully considered. If the generation interval in the breeding program stays the same, genomic selection actually results in a lower rate of inbreeding than non-marker BLUP selection using pedigree and phenotypic information, particularly for traits of low heritability (Daetwyler et al., 2007). Consider the selection of young bull calves to become part of a progeny test team. In the absence of genomic information, and because the young calves do not have any daughters, their breeding value is predicted as the average of the breeding value of their sire and dam. Two full sibs therefore receive the same breeding value, and if this is high enough, they will both be selected to form part of the progeny test team. If genomic information is available, the Mendelian sampling term (the result of the sampling of the sire and dam alleles during gamete formation) is captured and 2 full sibs receive different breeding values, and may not both be selected to form part of the team, which leads to a decrease in the rate of inbreeding.
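The contrast between parent average and marker-based prediction for full sibs can be made concrete with a toy example. All numbers below are invented for illustration, and the function names are hypothetical.

```python
def parent_average(sire_ebv, dam_ebv):
    """Pedigree prediction for a calf with no records or progeny."""
    return 0.5 * (sire_ebv + dam_ebv)


def marker_breeding_value(genotypes, snp_effects):
    """Sum of estimated SNP effects weighted by allele counts (0, 1, or 2)."""
    return sum(x * g for x, g in zip(genotypes, snp_effects))


# Hypothetical values: two full sibs share the same sire and dam.
sire_ebv, dam_ebv = 120.0, 80.0
snp_effects = [5.0, -3.0, 2.0]
sib_1_genotypes = [2, 0, 1]
sib_2_genotypes = [1, 1, 1]

pa_sib_1 = parent_average(sire_ebv, dam_ebv)
pa_sib_2 = parent_average(sire_ebv, dam_ebv)  # identical by construction
mbv_sib_1 = marker_breeding_value(sib_1_genotypes, snp_effects)
mbv_sib_2 = marker_breeding_value(sib_2_genotypes, snp_effects)
```

The two parent averages are equal, whereas the marker-based values differ because the sibs inherited different alleles; that difference is the Mendelian sampling term.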
However, if the generation interval of the breeding program is halved to take advantage of the accurate GEBV available at birth, the resulting increase in inbreeding per year may be greater than the decrease from capturing the Mendelian sampling term. Given the low cost of genotyping, inbreeding could potentially be managed by screening a much larger number of selection candidates for bull teams than has been done in the past. An effort could then be made to restrict the contribution of any one sire family to the selected bulls, such that inbreeding could be maintained at an acceptable level (e.g., Wray and Goddard, 1994). Potential bull dams could also be screened, and their relatedness with the potential sires could be assessed either through pedigree or through genomic relationships from the markers.
Although it is outside the scope of this review, it is interesting to consider the impact of genomic selection in other species. In the pig, sheep (meat and wool), and poultry industries, a major impact of genomic selection is likely to be increased genetic gain for hard-to-select-for traits. This would include traits such as disease resistance in poultry and meat quality in pigs. Genomic selection can also be used to increase the efficiency of development of composite lines, which are often used in the pig and sheep industries (Piyasatian et al., 2006). Crosses between breeds will exhibit much greater levels of LD than within-breed populations. Piyasatian et al. (2006) demonstrated that the genetic merit of composite lines can be improved by using genomic selection to capture the chromosome segments with the largest effects from the contributing breeds, even with a sparse marker map.
| CHALLENGES |
|---|
Combining pedigree, phenotype, and genomic information to calculate GEBV on an industry-wide scale is a considerable challenge. One major difficulty is that the number of animals genotyped is likely to be small compared with the total number of animals in the database. As described in Goddard and Hayes (2007), the most practical method for overcoming this problem may be to first calculate traditional EBV from phenotypes and pedigrees and GEBV from markers separately and then use a selection index to combine the 2 EBV on each animal into one final GEBV for use. This method is approximate but can be readily implemented.
A second possibility is to infer all marker genotypes for all animals and use these to calculate GEBV (Goddard and Hayes, 2007). Although this strategy poses some computational challenges, the option is attractive because it circumvents problems arising from having different animals genotyped for different marker panels, or from not having some animals genotyped at all. Provided the number of QTL is large, even if animals had no actual genotypes and all had inferred genotypes, based on pedigree, the inferred genotypes would simply replace the pedigree-derived relationship matrix in the calculation of EBV (Goddard and Hayes, 2007). A method that efficiently infers genotypes on large numbers of animals is required to implement this strategy.
As Harris et al. (2008) pointed out, incorporating genomic information into the international comparisons among proven sires, as currently calculated by Interbull, will be a very challenging task owing to different sets of SNP being used between and within countries, different prediction equations, and the presence of marker x environment interactions (e.g., Lillehammer et al., 2008).
Long-Term Genetic Gain with Genomic Selection
Both Muir (2007) and Goddard (2008) concluded, from simulations and deterministic predictions, respectively, that the long-term gain from genomic selection could be less than that from phenotypic selection or from selection based on pedigree and phenotypes. Two explanations were given:
Both Muir (2007) and Goddard (2008) proposed solutions to this problem. Muir (2007) suggested that a polygenic component be included in the GEBV to utilize some of the variance at QTL not captured by the SNP. This strategy has already been adopted by the Australian, US, and New Zealand efforts to apply genomic selection.
Goddard (2008) suggested a method to find the optimal index to maximize the long-term selection response, an approach related to that suggested by Gibson (1994) for single QTL and a polygenic component. The resulting index would vary the weight given to a marker according to its frequency so that markers for which the favorable allele has a low frequency receive more weight in the index. Another approach to capturing low-frequency QTL is to use marker haplotypes rather than single markers. Because of both SNP discovery methods and bias in SNP selection for SNP arrays, SNP with low minor allele frequencies on SNP arrays are uncommon. This creates a mismatch between the distribution of SNP allele frequencies and QTL allele frequencies. The mismatch results in low power to detect rare QTL alleles. The distribution of marker haplotype frequencies is more likely to match that of the QTL, so marker haplotypes may have greater power to detect QTL with low allele frequencies (Goddard, 2008).
Both authors advocated continual reestimation of the prediction equations, implying continual collection of phenotypes and genotypes. This strategy would maximize the long-term response from genomic selection. The strategy has other advantages, such as capturing low-frequency QTL alleles and allowing prediction of pleiotropic effects of SNP between the traits currently recorded and new traits as they are measured.
Disentangling SNP in LD with QTL and SNP Tracking Relationship
Although dense SNP markers are a valuable tool for detecting and accurately positioning QTL, they also do very well in capturing and describing genetic relationships (Habier et al., 2007). These relationships can be at the level of breed, sire, or complex pedigree (Pritchard et al., 2000; Hayes and Goddard, 2008). Unless these relationships are specifically accounted for in the model used to predict SNP effects in the reference population, some of the SNP will be attributed effects not because they are in LD with QTL, but because they explain part of the genetic relationships in the reference population. This is undesirable, because it is only the effect of SNP in LD with QTL that will persist across the population and across generations.
Habier et al. (2007) demonstrated that there are substantial differences in the persistence of accuracy across generations of GEBV according to which methodology is used to derive the prediction equations. For example, they found that the accuracy of GEBV when the SNP effects were predicted by BLUP decayed rapidly across generations if the SNP effects were not periodically reestimated, whereas the accuracy of GEBV when Bayesian methods were used to derive the prediction equations decayed more slowly. The BLUP method turns out to be particularly susceptible to allocating effects to SNP attributable to the relationship rather than because they are in LD with QTL. This is easy to understand when it is realized that genomic selection with BLUP, where a normal distribution of QTL effects is assumed, is exactly equivalent to estimating breeding values with the normal BLUP equations, where the pedigree relationship matrix is replaced with the genomic relationship matrix derived from the SNP data (Goddard, 2008).
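The equivalence can be seen in a few lines. The following is a sketch, assuming a matrix $\mathbf{X}$ of (centred) SNP genotypes for $m$ SNP and a common variance $\sigma_g^2$ for all SNP effects; the scaling constant $c$ is an assumption of this sketch, because different authors scale the genomic relationship matrix $\mathbf{G}$ differently.

```latex
\begin{aligned}
\mathbf{a} &= \mathbf{X}\mathbf{g}, \qquad \mathbf{g} \sim N(\mathbf{0}, \mathbf{I}\sigma_g^2) \\
\operatorname{Var}(\mathbf{a}) &= \mathbf{X}\operatorname{Var}(\mathbf{g})\mathbf{X}' = \mathbf{X}\mathbf{X}'\sigma_g^2 \\
&= \mathbf{G}\sigma_a^2, \qquad \mathbf{G} = \frac{\mathbf{X}\mathbf{X}'}{c}, \quad \sigma_a^2 = c\,\sigma_g^2
\end{aligned}
```

The breeding values implied by BLUP of SNP effects therefore have the same covariance structure as breeding values in an animal model in which $\mathbf{G}$ takes the place of the pedigree relationship matrix, so any SNP pattern that describes relationships contributes to the prediction whether or not it is in LD with a QTL.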
The obvious solution to this problem is to remove the effect of relationships by fitting a polygenic effect in the model where the SNP effects are estimated, where the polygenic effect has a variance-covariance structure given by the average relationship matrix. Another possibility is to use multiple breeds in the reference population, because in this case, the SNP must be very close to QTL for the SNP to have an effect across multiple breeds, and so are less likely to be picking up relationships (De Roos et al., 2008; provided the breed effect is fitted in the model). Finally, the accuracy of GEBV persists for more generations if the reference population consists of multiple generations of animals rather than a single generation, as demonstrated by Muir (2007).
Genomic Selection Across Breeds
Harris et al. (2008) reported that SNP estimates calculated from a Holstein-Friesian reference population did not produce accurate GEBV in Jersey bulls, and vice versa. The correlations ranged from –0.1 to 0.3 when the SNP effects from one breed were used to calculate GEBV in another breed. Genomic selection relies on the phase of LD between markers and QTL being the same in the selection candidates as in the reference population. However, as the 2 populations diverge, this is less and less likely to be the case, especially if the distance between markers and QTL is relatively large. An explanation for the across-breed results of Harris et al. (2008) is that the SNP are in LD with QTL within a breed, but the relationship does not hold across breeds. De Roos et al. (2008) analyzed the extent of LD within and between several beef and dairy breeds, and concluded that for breeds as divergent as the Holstein and Jersey, at least 300,000 SNP would be required so that markers could be discovered that would work across breeds. The optimal composition of reference populations when the prediction equations are going to be used across multiple breeds is an area requiring further research, but early indications are that if the reference population includes at least some individuals from all target breeds, the accuracy of GEBV in these breeds is greatly improved (De Roos et al., 2008; Harris et al., 2008).
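The idea of LD phase being the same or different in 2 populations can be made concrete with a small sketch. It assumes phased haplotypes coded 0/1 (rows are haplotypes, columns are loci) and uses the signed correlation r between alleles at 2 loci as the LD measure; correlating r across marker pairs between 2 populations is one possible summary of phase persistence. The function names and toy haplotypes are hypothetical and are not taken from the cited studies.

```python
import numpy as np

def ld_r(haps, i, j):
    """Signed LD correlation r between loci i and j.

    haps: 2-D array of 0/1 alleles, one row per haplotype.
    Returns nan if either locus is monomorphic.
    """
    return np.corrcoef(haps[:, i], haps[:, j])[0, 1]

def phase_persistence(haps_1, haps_2, pairs):
    """Correlation of r between two populations over the given locus pairs."""
    r1 = np.array([ld_r(haps_1, i, j) for i, j in pairs])
    r2 = np.array([ld_r(haps_2, i, j) for i, j in pairs])
    return np.corrcoef(r1, r2)[0, 1]

# Toy example: the same 2 loci, with the phase reversed in the second population.
breed_1 = np.array([[0, 0], [0, 0], [1, 1], [1, 1]])
breed_2 = np.array([[0, 1], [0, 1], [1, 0], [1, 0]])

r_breed_1 = ld_r(breed_1, 0, 1)   # alleles always match: r is positive
r_breed_2 = ld_r(breed_2, 0, 1)   # alleles always differ: r is negative
print(r_breed_1, r_breed_2)
```

If locus 0 were a marker and locus 1 a QTL, a marker effect estimated in the first population would carry the wrong sign in the second, which is the situation described above for divergent breeds.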
The above discussion assumes that the effect of QTL alleles is similar in different breeds and populations. For some QTL that have been traced to known mutations, the alleles do act reasonably similarly in different breeds and populations. For example, the A allele of the DGAT1 gene results in increased fat yield and reduced protein yield and milk volume in New Zealand Holstein-Friesians, Jerseys, and Ayrshires (Spelman et al., 2002). Although the size of the effects is consistent for protein and milk volume in the Holstein-Friesian and Jersey breeds, the size of the fat response in Holstein-Friesians is nearly double that for Jerseys (Spelman et al., 2002). Another problem is that we have assumed that the same mutations affecting production traits are polymorphic in different breeds. This is true for some well-characterized mutations, such as the K232A mutation in DGAT1, which is polymorphic in Holsteins, Jerseys, and Ayrshires (Spelman et al., 2002). Other mutations, such as some of the functional mutations in the myostatin gene, appear to be breed specific (Dunner et al., 2003). One solution would be to use a multibreed reference population, so that all the genetic variants are captured. Genotype × environment interaction may also reduce the accuracy of predicted GEBV when the chromosome segment effects are estimated from animals in another population.
Nonadditive Effects
Although GEBV as selection criteria by definition should include only additive effects (genetic merit that is passed from one generation to the next), in some cases it may be desirable to predict phenotypes, for example, for allocation of cows to different management regimens. In this case, the accuracy of predicting the phenotype could potentially be improved by including dominance and epistatic effects, depending on the proportion of total genetic variance these effects explain. Xu and Jia (2007) extended a single-marker Bayesian approach similar to the one described above to account for dominance and epistatic effects, and demonstrated that these effects could be estimated with reasonable precision in simulated data. Gianola et al. (2006) presented semiparametric procedures for genomic selection, which allowed them to estimate interactions between potentially hundreds of thousands of markers.
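To illustrate how dominance and epistatic terms can enter a marker model, the sketch below builds covariates from genotypes coded 0, 1, or 2 copies of an allele. The coding is a common textbook convention and is an assumption here, not the specific parameterization of Xu and Jia (2007) or Gianola et al. (2006); the function names are hypothetical.

```python
import numpy as np

def additive_dominance_codes(genotypes):
    """Return additive and dominance covariates for 0/1/2 genotype codes.

    Additive code: -1, 0, 1 for 0, 1, 2 copies of the allele.
    Dominance code: 1 for heterozygotes, 0 for either homozygote.
    """
    g = np.asarray(genotypes)
    additive = g - 1.0
    dominance = (g == 1).astype(float)
    return additive, dominance

def additive_by_additive(additive, i, j):
    """Pairwise epistatic covariate: product of additive codes at loci i and j."""
    return additive[:, i] * additive[:, j]

# Two animals genotyped at three SNP (toy values).
genotypes = np.array([[0, 1, 2],
                      [2, 1, 0]])
add_codes, dom_codes = additive_dominance_codes(genotypes)
epi_0_2 = additive_by_additive(add_codes, 0, 2)
```

With many markers the number of pairwise products grows with the square of the number of SNP, which is one reason methods able to handle very large numbers of interaction terms are of interest.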
| CONCLUSIONS AND IMPLICATIONS |
|---|
|
|
|---|
Considerable challenges and opportunities remain in implementing genomic selection, including adaptation of national genetic evaluations to include genomic information, genomic selection across breeds, managing long-term gain and inbreeding with genomic selection, and computational challenges (e.g., Legarra and Misztal, 2008; Tsuruta and Misztal, 2008). These are exciting topics for further research.
| ACKNOWLEDGEMENTS |
|---|
|
|
|---|
Received for publication August 21, 2008. Accepted for publication October 2, 2008.
| REFERENCES |
|---|
|
|
|---|