1 VIT, Heideweg 1, 27-283 Verden, Germany
2 Department of Animal Genetics, Wroclaw Agricultural University, 51-631 Wroclaw, Poland
Corresponding author: Joanna Szyda; e-mail: Joanna.Szyda{at}vit.de.
| ABSTRACT |
|---|
Key Words: quantitative trait locus, variance component, milk production trait, chromosome 6
Abbreviation key: AIA = average information algorithm, BTA6 = Bos taurus autosome 6, DYD = daughter yield deviations, EDC = effective daughter contribution, IBD = identical by descent, MAS = marker-assisted selection, MCMC = Monte Carlo Markov Chain, RRTDM = random regression test day model, VIT = genetic evaluation center
| INTRODUCTION |
|---|
The mixed inheritance models with random QTL effects can be used in preselection of young bulls for progeny testing programs, selection of young animals without or with little progeny information, or selection of heifers as candidates for dams of bulls, following the marker-assisted selection (MAS) scheme. For routine application of MAS in dairy cattle populations, knowledge of the parameters of the polygenic and QTL components of the genetic variance is a prerequisite. Such models have been applied to data from the US Holstein population by Zhang et al. (1998) and to a selected subset of the German Holstein population by Freyer et al., in univariate (2002) and multivariate (2003) frameworks. Therefore, the main objective of the current study was to estimate genetic parameters of the mixed inheritance model for the German Holstein dairy population using all available marker and phenotypic information. Moreover, technical aspects related to the estimation of model parameters for a large data set from routine genotype recording were discussed.
| MATERIALS AND METHODS |
|---|
The analyzed material was a subset of bulls from the active population of the German Holstein and reflects the contents of the marker database in February 2003. The data consisted of 4500 genotyped animals born between 1985 and 2000. With nongenotyped parents of the genotyped animals included, the full pedigree contained 7841 animals.
Genotypic information.
Based on the analysis of the genome scan data, a region representing QTL for milk, fat, and protein yields was identified on chromosome 6 (BTA6). The 6 markers mapped to this region covered approximately 70 cM with an average intermarker distance of 14 cM (Thomsen et al., 2000). Table 1 gives distances between the markers and their basic characteristics based on all 10,152 animals.
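$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}_s\mathbf{q}_s + \mathbf{Z}\mathbf{a} + \mathbf{e} \qquad [1]$$

where $\mathbf{y}$ is a vector of DYD for bulls, expressed on a 305-d lactation basis; $\boldsymbol{\beta}$ is a vector of fixed effects for year of birth; $\mathbf{q}_s$ is a vector of fixed QTL effects for all grandsires; $\mathbf{a}$ is a vector of random polygenic effects, assuming $\mathbf{a} \sim N(\mathbf{0}, \mathbf{G}_a\sigma_a^2)$, with $\mathbf{G}_a$ representing polygenic relationships among individuals and $\sigma_a^2$ being the component of the total additive genetic variance attributed to polygenes; $\mathbf{e}$ is a vector of random errors, assuming $\mathbf{e} \sim N(\mathbf{0}, \mathbf{D}\sigma_e^2)$, with $\sigma_e^2$ denoting the error variance and matrix $\mathbf{D}$ containing a function of EDC (which is specified later in the text) on the diagonal; and $\mathbf{X}$, $\mathbf{Z}_s$, and $\mathbf{Z}$ are the corresponding design matrices. The (co)variance structure corresponding to model [1] is given by

$$\operatorname{Var}\begin{bmatrix}\mathbf{a}\\ \mathbf{e}\end{bmatrix} = \begin{bmatrix}\mathbf{G}_a\sigma_a^2 & \mathbf{0}\\ \mathbf{0} & \mathbf{D}\sigma_e^2\end{bmatrix}$$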
A random QTL model (model 2).
The model included a random QTL effect specific to each animal:
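$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}_q\mathbf{q} + \mathbf{Z}\mathbf{a} + \mathbf{e} \qquad [2]$$

where $\mathbf{q}$ is a vector of random QTL effects, assuming $\mathbf{q} \sim N(\mathbf{0}, \mathbf{G}_q\sigma_q^2)$, with $\mathbf{G}_q$ representing the relationships among individuals at a QTL position, expressed by proportions of alleles being IBD, and $\sigma_q^2$ being the component of the total additive genetic variance due to the QTL; $\mathbf{X}$, $\mathbf{Z}_q$, and $\mathbf{Z}$ are the corresponding design matrices; and the other effects are as specified above. The covariance structure between the random effects of model [2] is defined as:

$$\operatorname{Var}\begin{bmatrix}\mathbf{q}\\ \mathbf{a}\\ \mathbf{e}\end{bmatrix} = \begin{bmatrix}\mathbf{G}_q\sigma_q^2 & \mathbf{0} & \mathbf{0}\\ \mathbf{0} & \mathbf{G}_a\sigma_a^2 & \mathbf{0}\\ \mathbf{0} & \mathbf{0} & \mathbf{D}\sigma_e^2\end{bmatrix}$$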
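As a rough illustration of how the variance structure of model [2] translates into the (co)variance matrix of the observations, the sketch below assembles V = G_q σ_q² + G_a σ_a² + D σ_e² in plain Python. It assumes, for simplicity, that every animal has exactly one DYD record, so that Z_q and Z are identity matrices. The function name `phenotypic_covariance` and the toy 2-animal inputs are hypothetical and are not taken from the study.

```python
def phenotypic_covariance(G_q, G_a, D, var_q, var_a, var_e):
    """Return V = G_q*var_q + G_a*var_a + D*var_e as a nested list.

    Assumption (not from the paper): one record per animal, so the design
    matrices Z_q and Z are identity matrices and drop out of the products.
    """
    n = len(G_q)
    return [
        [G_q[i][j] * var_q + G_a[i][j] * var_a + D[i][j] * var_e
         for j in range(n)]
        for i in range(n)
    ]


# Hypothetical toy inputs for a sire and his son (illustration only).
G_q = [[1.0, 0.5], [0.5, 1.0]]   # IBD proportions at the QTL position
G_a = [[1.0, 0.5], [0.5, 1.0]]   # polygenic relationships
D = [[1.0, 0.0], [0.0, 1.0]]     # equal weights on the diagonal
V = phenotypic_covariance(G_q, G_a, D, var_q=1.0, var_a=10.0, var_e=2.0)
```

With real data, G_q would come from the IBD estimation step and would generally differ from G_a, which is what allows the two genetic components to be separated.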
Derivation of DYD.
By definition, DYD of bulls are averages of daughters' performance adjusted for all fixed and nongenetic random effects of the daughters and genetic effects of their dams. The DYD are deregressed and are a more independent measure of phenotypic performance of bulls' daughters than are EBV (VanRaden and Wiggans, 1991). The DYD were derived following Liu et al. (2004), by absorbing the genetic effects of daughters by using their records adjusted for all other effects and for the EBV of the dams. For the RRTDM, the DYD were modeled with the same mathematical function as additive genetic effects and expressed in regression coefficients. The vector y contained DYD on a 305-d lactation basis, which were computed by summing up individual DYD of all DIM.
Derivation of weights on DYD.
Reliability of a bull due to daughter performance, denoted as $R_b^2$, was obtained from the approximation procedure of Liu et al. (2004). The calculation of $R_b^2$ under an RRTDM accounted for numbers of daughters, numbers of lactations per daughter, numbers of tests per lactation, and the reliability of the EBV of the mates of the bull. The least squares part of the left-hand side in the mixed model equation system corresponding to the bull (denoted with the subscript b) was derived as:

[Equation 3]

where $n_a$ is the weight on DYD of the bull and represents the diagonal element in $\mathbf{D}$ for the bull, and $k = \sigma_e^2/\sigma_a^2$ is the ratio of residual to genetic variance.
Estimation of IBD Proportion
Estimation of the IBD matrix of the QTL was based on the reversible jump Monte Carlo Markov Chain (MCMC; Green, 1995) algorithm along the whole marked chromosome region at a step-size of 1 cM. Because no formal monitoring of the MCMC algorithm convergence was performed, a long burn-in phase of 1000 rounds was used to allow sampling from the proper marginal distributions. Also, a long spacing of 100 rounds was chosen between scored IBD matrix realizations to account for the fact that mixing of the parameter values might be poor for a large multigenerational pedigree with many closely linked markers. The mode of 91 realizations was considered as the final estimate of the IBD matrix.
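The sampling schedule described above can be made concrete with a short sketch. It lists the MCMC rounds at which an IBD matrix realization would be scored, given a burn-in of 1000 rounds, a spacing of 100 rounds, and 91 scored realizations. The exact offset of the first scored round is not stated in the text, so the sketch assumes it falls one spacing after the end of the burn-in; the function name `scored_rounds` is hypothetical.

```python
def scored_rounds(burn_in=1000, spacing=100, n_scored=91):
    """Return the MCMC round numbers at which an IBD matrix is scored.

    Assumption: the first realization is scored `spacing` rounds after the
    burn-in ends; the source does not state the exact offset.
    """
    return [burn_in + spacing * k for k in range(1, n_scored + 1)]


rounds = scored_rounds()
total_rounds = rounds[-1]  # chain length needed under this assumption
```

Under this assumption the chain would need 10,100 rounds per analyzed position, which gives a sense of the computational cost of scanning the region at 1-cM steps.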
Estimation of Model Parameters
The restricted maximum likelihood (REML; Patterson and Thompson, 1971) approach was applied for estimating parameters of models [1] and [2]. Following Gilmour et al. (1995), the average information algorithm (AIA) was used to maximize the REML likelihood. The likelihood functions of models [1] and [2] were respectively defined as:

[Equation: likelihood function of model [1]]

and

[Equation 4: likelihood function of model [2]]

where $n$ is the number of phenotypic records, $r$ is the rank of the design matrix for fixed effects, $\mathbf{R} = \mathbf{D}\sigma_e^2$ is the residual (co)variance matrix, and $\mathbf{C}$ is the coefficient matrix of the mixed model equations of model [2]. The estimated effects comprised $[\boldsymbol{\beta}\ \mathbf{q}\ \mathbf{a}]$, and the estimated variances were $\sigma_a^2$ and $\sigma_e^2$ for model [1] and, additionally, $\sigma_q^2$ for model [2]. For all the traits considered, the most probable position of the QTL was estimated using model [1] based on a likelihood profile constructed every 1 cM along the marked region of BTA6. Parameters of model [2] were estimated for the most probable QTL location estimated by model [1], except for second-lactation fat yield, for which the parameters were estimated along the whole marked chromosome region every 1 cM. Confidence intervals for QTL position, $\sigma_a^2$, and $\sigma_q^2$ were obtained based on a normal approximation of the asymptotic distribution of maximum likelihood estimates:

$$\hat{\theta} \pm z_{\alpha/2}\,\sigma_{\hat{\theta}}$$

where $\hat{\theta}$ is the estimate of QTL position, $\sigma_a^2$, or $\sigma_q^2$; $\alpha$ is the probability of type I error; $z_{\alpha/2}$ is the critical value corresponding to the $\alpha$ type I error rate based on the standard normal distribution; and $\sigma_{\hat{\theta}}$ is the standard deviation of $\hat{\theta}$, approximated following Meyer and Hill (1992) with the constant of the approximation set to 1 cM for QTL position, 0.3 for $\sigma_a^2$, and 0.04 for $\sigma_q^2$.
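The interval construction can be sketched in a few lines of Python. The function `normal_ci` implements the normal-approximation interval (estimate plus or minus the critical value times the standard deviation). The function `curvature_sd` is an assumption on our part: it approximates the standard deviation from the curvature of the log-likelihood using a central second difference with a fixed step, which is one common way to obtain such an approximation and may differ from the exact expression used in the study. Both function names and the toy log-likelihood are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def curvature_sd(loglik, theta_hat, step):
    """Approximate the SD of an estimate from the curvature of the
    log-likelihood, using a central second difference with the given step.

    Assumption: this finite-difference form is illustrative and is not
    confirmed to be the expression used in the paper.
    """
    second_diff = (loglik(theta_hat + step) - 2.0 * loglik(theta_hat)
                   + loglik(theta_hat - step))
    return sqrt(-step ** 2 / second_diff)


def normal_ci(theta_hat, sd, alpha=0.05):
    """Return (lower, upper) = theta_hat -/+ z_{alpha/2} * sd."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return theta_hat - z * sd, theta_hat + z * sd


def toy_loglik(theta):
    """Hypothetical quadratic log-likelihood with maximum at 10 and SD 2."""
    return -((theta - 10.0) ** 2) / 8.0


sd = curvature_sd(toy_loglik, theta_hat=10.0, step=0.3)
lower, upper = normal_ci(10.0, sd)
```

For a log-likelihood that is exactly quadratic, as in the toy example, the second difference recovers the standard deviation regardless of the step; on a flat or irregular likelihood surface the result depends on the step chosen.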
Hypotheses Testing
The likelihood ratio test statistic was used as a testing criterion:
$$\lambda = -2\ln\frac{L(M_0)}{L(M_1)} \qquad [5]$$

where $L(M_1)$ and $L(M_0)$ are the maximum values of the likelihood functions underlying the unrestricted model given above and a more parsimonious model without QTL effects, respectively. The corresponding null and alternative hypotheses were:

H1: $\mathbf{q}_s \neq \mathbf{0}$ and H0: $\mathbf{q}_s = \mathbf{0}$ for model [1], with the asymptotic null distribution of $\lambda$ being $\chi^2_{1\,df}$;

H1: $\sigma_q^2 > 0$ and H0: $\sigma_q^2 = 0$ for model [2]. Because $\sigma_q^2$ was constrained to positive values, the asymptotic null distribution of $\lambda$ followed a 50:50 mixture of 0 and $\chi^2_{1\,df}$ [for theoretical derivation, see Self and Liang (1987); for empirical results based on livestock data structure, see George et al. (2000)].
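To make the two null distributions concrete, the following sketch computes the likelihood ratio statistic from maximized log-likelihoods and the corresponding P-values, using only the Python standard library. The upper tail of a chi-square distribution with 1 df is obtained from the complementary error function. The function names are hypothetical, and the inputs are assumed to be log-likelihoods rather than likelihoods.

```python
from math import erfc, sqrt


def lrt_statistic(loglik_full, loglik_reduced):
    """Likelihood ratio statistic from maximized log-likelihoods."""
    return 2.0 * (loglik_full - loglik_reduced)


def p_value_chi2_1df(stat):
    """Upper-tail probability of a chi-square with 1 df (fixed QTL test)."""
    return erfc(sqrt(max(stat, 0.0) / 2.0))


def p_value_mixture(stat):
    """P-value under a 50:50 mixture of a point mass at 0 and chi-square(1 df)."""
    if stat <= 0.0:
        return 1.0
    return 0.5 * p_value_chi2_1df(stat)
```

Under the mixture, the 5% critical value drops from about 3.84 to about 2.71, so testing the variance component against a plain chi-square distribution with 1 df would be conservative.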
Implementation
The routine implementation of the estimation procedure involved 5 steps:
| RESULTS |
|---|
[...] $\sigma_a^2$ for milk yield in second parity and 0.08 $\sigma_a^2$ for fat yield in second parity. Despite the relatively low estimates of QTL variance, the QTL term was significant for all parities and all yield traits.
To examine how inaccurate QTL position estimates affected estimates of QTL variance components, parameters of model [2] were estimated for second-parity fat yield along the entire 70-cM chromosome region. Results presented in Figure 4 indicated that with increasing distance from the most probable QTL position, the QTL variance diminished and a significant portion of QTL variance was absorbed into the polygenic component, whereas the residual component remained relatively stable along the chromosome. This effect was summarized by a linear regression of $\sigma_e^2$ and $\sigma_a^2$ on distance from the most probable QTL position (in cM), which yielded regression coefficients of 0.04 ± 0.007 and 0.07 ± 0.007, respectively.
[...] $\sigma_a^2$ and $\sigma_q^2$ around the $\sigma_a^2$ and $\sigma_q^2$ values for second-lactation fat yield, while the other parameters of model [2] were estimated using the AIA as implemented in the AS-REML package. As shown in Figure 6 [...] $\sigma_a^2$ and $\sigma_q^2$ values, the likelihood surface remained rather flat, and equal likelihood values were obtained for 34 different combinations of $\sigma_a^2$ and $\sigma_q^2$. The 95% confidence intervals approximated for the 2 variance estimates were (5.8, 14.2) for $\sigma_a^2$ and (0.27, 1.37) for $\sigma_q^2$.
| DISCUSSION |
|---|
Variance component estimates.
The main goal of our study was to estimate genetic parameters underlying the random QTL model used for MAS, i.e., additive polygenic and QTL variances for the German Holstein population. Quantitative trait loci variance had already been estimated by Zhang et al. (1998) for US Holsteins and by Freyer et al. (2002, 2003) for German Holsteins. However, an important advantage of the current study was the large size of the data set: (1) more than 2 generations of genotyped individuals were available, and (2) the young genotyped bulls represented almost all of the young bulls tested in the German Holstein population (although not all of those bulls have DYD available yet). Those features mean that the data represent the current German Holstein population well. The variance components for milk and fat yields estimated by Zhang et al. (1998), based on a sample of 1794 sons of 14 sires, were in close agreement with our variance estimates. In general, QTL variances explained by the markers on BTA6 represented only a small proportion (<10%) of the additive genetic variance of milk production traits in both populations. In contrast, the studies by Freyer et al. (2002, 2003), based on data from 562 sons of 5 sires, gave much higher estimates of QTL variance, especially for fat and protein yields, varying between 23 and 50% of the total genetic variance depending on the model and trait studied. We noticed a marked change in QTL variance estimates between the 2 investigations by Freyer et al., which used the same data set. The QTL variance estimate for milk yield increased from 8 to 16% of the variance of DYD, but decreased from 50 to 20% for fat yield and from 50 to 28% for protein yield between the 2 studies, even though the same univariate QTL model was used. The differences in QTL variance estimates between our study and both studies of Freyer et al. were due to the following reasons: 1) The trait used by Freyer et al. was a yield trait on a combined lactation basis, whereas we analyzed the yield traits in 3 lactations separately. Because the 3 lactations were highly positively correlated, a higher QTL component can be expected for the combined lactation than for single lactations; 2) The dependent variable in our study was DYD, whereas the dependent variable in Freyer et al. (2002) was EBV. They noted that DYD had a higher proportion of non-QTL variance and a lower QTL component than did EBV. In our opinion, DYD is an unregressed measure of daughter performance and should be preferred to EBV, despite the fact that the use of DYD led to a lower QTL variance component; 3) The DYD and EBV used by Freyer et al. (2003) corresponded to a different genetic evaluation model (Reents et al., 1995) and a different DYD calculation method; 4) The numbers of genotyped animals and of animals in the pedigree were much higher in our study than in the studies by Freyer et al. (2003), and the data in their studies might not have been representative of the overall population. Sampling variances of QTL parameter estimates were substantially smaller in our study than in Freyer et al. (2003); and 5) We had multiple generations of genotyped animals, in comparison with only 2 generations in both studies of Freyer et al. (ancestors of the 5 grandsires were ignored in their studies). Adding more genotyped ancestors can significantly improve the tracing of the origin of marker alleles and the IBD calculation for the QTL.
Considering the estimates of variance components along the marked region of BTA6 in Figure 4, we concluded that model [2] was able to differentiate between the QTL and polygenic variances, because with increasing distance from the most probable QTL position, the proportion of the QTL component in the total genetic variance diminished. In contrast, the residual component was not confounded with the genetic components, as its estimates remained stable along the chromosome.
Compared with yield deviations of cows, DYD of bulls had a much smaller proportion of residual variance, due to the higher reliabilities associated with the DYD. The error variance of the original polygenic RRTDM could not be accurately estimated based on the DYD in the parameter estimation for QTL variance. The difference between the estimated polygenic variance and the original RRTDM genetic variance amounted to 5 to 10%, which can be explained by the different samples of selected animals in the 2 parameter estimations. The RRTDM was applied to cows with test-day records (Liu et al., 2001), whereas this study used genotyped bulls with DYD. In addition, different pedigree structures were considered: sires and dams of animals for the RRTDM (Liu et al., 2001) vs. male ancestors of bulls only in this study. Another aspect of using DYD as the trait representation was the weighting of residual variances in the parameter estimation. In our case, initial attempts to use weights based on EDC resulted in convergence problems with the AIA. Because the DYD of all bulls had comparable and high reliabilities in the current study, assigning equal weights to the DYD should not much affect the ratios of QTL to polygenic variances.
Likelihood surface.
Comparison of likelihood profile shapes based on model [1] and model [2] (Figure 5) showed good agreement, because both models indicated the same 2 intervals as the most probable QTL location. In terms of QTL position, Zhang et al. (1998) also obtained similar estimates using both least squares and variance component models. In addition, the test statistic profiles for the 2 methods shown by Freyer et al. (2002) were in good agreement. Figure 5 revealed an unfavorable feature of the likelihood profile resulting from model [2]: the likelihood values were not smooth along the analyzed region, indicating convergence problems of the AIA in the neighborhood of the maximum of the likelihood function. As shown in Figure 6, for the axes defined by variance components, the likelihood surface was rather flat, not only at the very maximum but also in its proximity, which most likely caused the observed optimization problems.
Estimation of IBD proportions.
As already pointed out by Grignola et al. (1996), the estimation of IBD relationships for large, multigenerational pedigrees was computationally very demanding. The first method for calculating IBD proportions for such pedigrees was proposed by Fernando and Grossman (1989) for a single marker scenario. However, using one marker at a time is not well suited for analyzing data from actual dairy cattle populations with a complex pedigree structure, because there are often uninformative or missing marker genotypes. Since the above-mentioned seminal study, a number of methods for calculating IBD coefficients have been developed, which can be mainly classified into deterministic and MCMC-based approaches, based on the estimation method, or into marker interval and multiple-marker based approaches, according to the use of marker information. The main advantage of the deterministic approach lies in its speed of computation; however, no method exists at present that is able to use all available marker haplotype and pedigree information. Pong-Wong et al. (2001) and Liu et al. (2002) proposed methods that can account for a marker bracket. The multiple marker approach was developed by Almasy and Blangero (1998), but it was only applicable to special types of relationships between individuals. Meuwissen and Goddard (2001) presented a multimarker approach capable of tracing historical relationships not contained in the recorded part of a pedigree, but available relationship information was not explicitly used in the method. Recently, Lund et al. (2003) combined the information on historical and observed relationships following Meuwissen and Goddard (2001) and Wang et al. (1995), respectively for the 2 components. An overview of deterministic IBD estimation methods was given by George et al. (2000). 
The MCMC-based approaches were more flexible in terms of using marker and complex pedigree information, but they were time consuming, which limited their application to the analysis of large data sets. Additionally, issues related to monitoring convergence and irreducibility of the algorithms have not been well defined yet. The approach of Heath (1997) was among the first applications of MCMC to the analysis of large complex pedigrees, followed, for example, by Xu and Gessler (1998) and Perez-Enciso et al. (2000). Grignola et al. (1996) proposed an intermediate approach, in which MCMC was used for the estimation of IBD among the genotyped part of the parental pedigree and a deterministic approach was then used to obtain IBD coefficients for nonparents.
The approach of Heath (1997), as implemented in the LOKI package, was used for the estimation of IBD proportions in the current study, which were then used in the parameter estimation under model [2] via the ASREML package. The simulation results of George et al. (2000) demonstrated that, by using the aforementioned implementation, both variance components and QTL positions can be accurately estimated even in the case of missing marker data. An application of this approach to livestock data was presented by de Koning et al. (2003). The results of simulation studies carried out by Sørensen et al. (2002) showed that the algorithm provides accurate estimates of IBD proportions.
Using genomic information.
Although the original plan in Germany was to acquire not only male but also female genotypes in the development of routine genotyping and MAS, data available for the current analysis consisted of genotypes of bulls only. Because of the large number of genotyped bulls, genotypes of some dams and maternal grandsires could have been reconstructed. Recently, Bolard and Boichard (2002) showed how the information on maternal grandsire genotypes and, consequently, on QTL transmissions can be incorporated in QTL mapping, albeit in the fixed QTL framework. Marker information on female animals should, whenever possible, be incorporated in both parameter estimation and routine genetic evaluation.
| CONCLUSIONS |
|---|
Received for publication April 6, 2004. Accepted for publication September 15, 2004.
| REFERENCES |
|---|