|
|
||||||||
1 Agricultural University of Poznañ, Department of Genetics and Animal Breeding, Wolyñska 33, 61-627 Poznañ, Poland
2 Agricultural University of Wroclaw, Department of Genetics and Animal Breeding, ul. Ko
uchowska 7, 51-631 Wroclaw, Poland
3 Agricultural University of Kraków, Department of Genetics and Animal Breeding, Al. Mickiewicza 24/28, 30-059 Kraków, Poland
4 Centre for Genetic Improvement of Livestock, Department of Animal and Poultry Science, University of Guelph, ON, N1G 2W1, Canada
Corresponding author: Tomasz Strabel; e-mail: strabel{at}man.poznan.pl.
| ABSTRACT |
|---|
|
|
|---|
Key Words: dairy cattle test-day yield random regression model
Abbreviation key: ACC = accuracy of breeding value, AG = additive genetic, AS = age and season of calving, BIC = Bayesian information criterion, HTD = herd-test-date, HY = herd-year, PE = permanent environmental, PSB = percentage of squared bias, RRM = random regression model, RV = residual variance, TDM = test-day model.
| INTRODUCTION |
|---|
|
|
|---|
The main nongenetic effects in TDM are the herd-test-date (HTD) effect and fixed regressions to account for the average shape of lactation curves. It has been shown that using HTD effect increases the accuracy of evaluation (Beard, 1983; Strabel and Szwaczkowski, 1995). Average lactation curves may be modeled in many alternative ways; for example, by lactation curve functions (Guo and Swalve, 1997; Jamrozik et al., 1997), fixed classes (Stanton et al., 1992; Strabel, 2004), or with the use of splines (Druet et al., 2003). Due to the differences between populations, various functions are preferred for genetic evaluation models in different countries (Liu et al., 2001; Reinhardt et al., 2002; Mrode et al., 2003).
The variance of daily production also changes during the course of lactation, and the correlation between daily yields decreases with increasing time between tests. For these reasons, random regression models (RRM) were proposed (Schaeffer and Dekkers, 1994). They assume that additive genetic (AG) and permanent environmental (PE) effects change the average shape of the lactation curve. Although various functions have been used to model random curves (Jamrozik and Schaeffer, 1997; Ptak and
arnecki, 1998; López-Romero and Carabaño, 2003), Legendre orthogonal polynomials are becoming the function of choice for this part of the model, as they sufficiently describe variation and help to avoid overestimation of genetic variances and heritabilities at the extremes of lactation (Rekaya et al., 1999; Brotherstone et al., 2000; López-Romero and Carabaño, 2003).
It has been found in several studies that residual variance (RV) is not constant throughout lactation (Jamrozik et al., 1997; Olori et al., 1999a,b; Brotherstone et al., 2000; Rekaya et al., 2000). The heteroskedastic model can be postulated by defining arbitrary classes for DIM, within which RV is assumed constant (López-Romero et al., 2003), or by using the change point technique (López-Romero et al., 2004) or the function of time to model RV (Jaffrezic et al., 2000).
The number of possible combinations of functions and effects included in the RRM makes comparison of all of them infeasible. Thus, the model selection process is usually restricted to analysis of preselected candidates. The choice of a function to describe fixed lactation curves is often made before the full model is applied to the data (Druet et al., 2003). Models can be compared based on several measures, including the size of the residual variance and average accuracy (ACC) of prediction for certain groups of animals. The likelihood ratio may indicate better models; however, this method tends to favor more complex models (Ducrocq, 2000; López-Romero and Carabaño, 2003). To avoid this propensity, a correction for the number of parameters used in the model can be applied, such as the Bayesian information criterion (BIC) (Schwarz, 1978). Additional methods, such as analysis of residuals, are often used, as the results of comparison of models based on a formal criterion are not always consistent for a given set of models.
Although the number of cows under milk recording has increased in recent years, the average herd size in Poland is still small. In 2002, there were over 464,000 cows in almost 20,000 herds, which gave on average 23.2 cows per herd (KCHZ, 2003). This resulted in a large proportion of cows without or with only a few contemporaries. Observations from such classes have limited value for genetic evaluation, and they reduce the reliability of estimates. Small herds usually lead to small HTD classes, and the problem of small contemporary groups remains. In such a case, increasing the size of the contemporary group or treating the herd-time effect as random are among the methods that can improve the quality of genetic evaluation by reducing the variances of residuals and prediction error (Ugarte et al., 1992; Visscher and Goddard, 1993; Oikawa and Sato, 1997). Strabel and Szwaczkowski (1999) compared models with HTD formed by different strategies of clustering with a model containing fixed herd and random HTD effects and found the latter to be superior. The model with a random HTD gave the smallest ratio of error to total variance, and was the best in predicting future records. Similarly, Lidauer et al. (2003) modeled the herd effect for the Finnish dairy population by fixed herd-year and random HTD components. Mayers et al. (2002) found that the combination of random HTD and fixed herd-years gave protection against the negative consequences of HTD groups that were too small.
The overall lactation curve in Poland is characterized by a relatively early peak (Strabel, 2004). Rapid changes in the beginning of lactation may require more complex functions or higher-order polynomials to describe variation in this period. A similar problem was noted for dairy breeds in Switzerland (Kistemaker and Schnyder, 2004). The low peak yield of Polish Black and White cows is associated with relatively low production level when compared with other populations (Strabel and Misztal, 1999).
The objective of this study was to develop the random regression test-day model that would be applicable for routine genetic evaluation of Polish Black and White cattle.
| MATERIALS AND METHODS |
|---|
|
|
|---|
|
Data set A consisted of test-day milk yields of 6319 cows in 55 randomly selected herds that were required to have a minimum of 30 cows. The detailed structure of data set A is presented in Table 2
. A slightly stricter criterion on herd size (minimum of 50 cows) was applied for selection of herds for the second part of the analysis (data sets B and C). Two distinct data sets were created and used to reduce the strength of the association between the data and the model. The records used in data set A could have been selected again in data set B or C. To avoid HY levels with fewer than 5 records, some classes of this effect were clustered: records from a single HY were moved to HY from neighboring years. The structure of data sets B and C is presented in Table 2
. Both data sets were almost equal in size, but in data set B there was a smaller number of herds, higher average production, bigger HTD classes, and more cows per sire.
|
![]() | ([1]) |
where Yijklm is the yield of milk on test-day l of cow m within herd-test day effect i, belonging to herd-year class k and the jth class of AS, htdi is the herd-test-day effect, bjn are fixed regression coefficients specific to age-season subclass j, ckn are fixed regression coefficients specific to herd-year k, amn are random regression coefficients specific to the AG effect of cow m, pmn is the random PE effect, and eijklm is the residual effect for each observation. The zmnl are Legendre polynomials on DIM (Kirkpatrick et al., 1990), and nb, nc, na, and np represent the order of fit for AS, HY, AG, and PE curves, respectively.
The covariance structure for models with random HTD effect was defined as:
![]() | ([2]) |
where I is the identity matrix,
is the variance of the random HTD effect, A is the matrix of additive genetic relationships among animals,
is the Kronecker product, G and P are covariance matrices of the random regression coefficients for AG and PE effects, R is the diagonal matrix of the form I
, and
2e is residual variance. Models with fixed HTD effects all had the same (except for the HTD component) covariance structure.
All specific models were fitted by the REML method using the BLUPF90 software package (Misztal et al., 2002).
Preselection of Models
Several alternative submodels were used to describe the shape of the lactation curves. They were created by selecting second- (L2), third- (L3), and fourth- (L4) order Legendre polynomials for different parts of the model. The HTD effect was alternatively used as a fixed or random effect. The models are presented in Table 3
. Model comparison criteria were the percentage of squared bias (PSB; Ali and Schaeffer, 1987), and the analysis of ACC of evaluations. The PSB was computed as
|
![]() | ([3]) |
where n is the number of records, yijklm is the observed record, and
ijklm is the record predicted by the particular model in question, considering all the estimates of effects associated with this observation.
The accuracy of evaluation for the total yield in lactation was calculated as the average correlation between the true and predicted breeding values for all animals. The inverse elements of the mixed model equations for the diagonal block corresponding to random regression coefficients for animal genetic effects were used to make this calculation. They were combined to obtain prediction error variance for the total yield per lactation (for details see Jamrozik et al., 2000).
Further Model Comparison
Additional comparisons of the selected models were made using the same general TDM [1] as in the preselection step. Only models with random HTD were considered. Third-order Legendre polynomials (L3) were used to describe the HY, AG, and PE curves. For the AS effect, fourth- (L4) and fifth- (L5) order Legendre polynomials were also used, and the models with these effects were denoted AS4-PE3 and AS5-E3, respectively. Additionally, a model with L5 for the PE effect and L5 for the AS effect was tested. This model was denoted AS5-PE5. All models were analyzed alternatively with homogeneous or heterogeneous RV. All remaining parts for all compared models were the same as in the preselection step.
Matrix R for the models with homogeneous RV was a diagonal matrix containing the constant variance of residuals. For the models with heterogeneous RV, R was the diagonal matrix of the residual variances that depended on DIM. A second-order Legendre polynomial, following Druet et al. (2003), was applied:
![]() | ([4]) |
where
to ensure positive values of variances; rn were parameters that described the residual variance over DIM, and zn were the Legendre polynomial covariates. No other sources of heterogeneity of variance were assumed.
The models were compared by BIC values and a detailed analysis of residuals was performed. The BIC was calculated as:
![]() | ([5]) |
where lnL is the maximum of the restricted log likelihood function, k is the number of variance components estimated, and
is the difference between the number of observations and the rank of the fixed effects incidence matrix. The BIC criteria could only be used to compare models with equal fixed parts, as L was the restricted likelihood (REML method).
Residuals were calculated for each DIM between 5 and 305 as a difference between yijklm and
ijklm. Mean values and variances of these residuals (over all TD records) were estimated and plotted for selected DIM. Additionally, averages of mean daily residuals and their variances (over 305 DIM) were estimated for each model and they were later applied as a comparison criterion. Average residuals can be used to determine the accuracy of the model, and variances of residuals can evaluate precision of the compared models (Jamrozik et al., 1997).
| RESULTS |
|---|
|
|
|---|
The evaluation of models based on the correlation between true and predicted breeding values favored models with unequal degrees of polynomials for the AG and PE effects, especially the models with L3 for the AG and L2 for the PE effect. Less accurate prediction of genetic merit was obtained from models with L4 used for the AG and PE effects. The differences between models with various submodels for fixed curves and the same models for random curves were small. The differences between models with different ways of defining the HTD effect were minor, and the order of fit for fixed submodels was less important than the other differences between models.
Lactation curves for a selected AS class with different orders of Legendre polynomials are presented in Figure 1
. The typical shape of the milk yield lactation curve was achieved only with the L4 polynomial. The other submodels were not able to describe peak yield accurately.
|
Further Model Comparison
Table 4
presents BIC and residuals statistics for the 3 compared models with homogeneous and heterogeneous RV. All values were similar for both data sets (B and C). In general, BIC slightly favored more complex models with a higher number of parameters for PE and heterogeneous RV over simpler models with homogeneous RV. However, model AS5-PE5 with homogeneous RV for data set B was indicated as the best. It should be emphasized that the differences in BIC values were small.
|
|
|
|
|
|
| DISCUSSION |
|---|
|
|
|---|
A comparison based only on the PSB and ACC criteria performed in the preselection step did not clearly indicate the best model. The selection of a model for further analysis was done in a stepwise manner. For the AG effect, Legendre polynomials of order 3 were chosen because such models were superior based on the PSB, and because models with other orders for this effect and equal remaining parts of the model ranked the worst. Presumably, the PE effect should be modeled with the same function as AG, because it is difficult to find a biological reason that the functions should differ. Such a model may be easier to implement. Finally, the PSB and ACC criteria did not unanimously indicate which submodel, AG or PE, should be described by the simpler function. Models with L2 for these effects could hardly be indicated as an alternative. Both PSB and ACC evaluated those models differently. Low complexity for the PE effect curve might not be enough to account for systematic changes at the periphery of the trajectories. Although models with fixed and random HTD effects ranked similarly with respect to the accuracy of evaluation and PSB, fixed HTD gave extreme solutions for some levels of this effect. Hence, random HTD was chosen for further analysis. Considering the order of fit for the fixed part of the model, it should be stressed that the low orders of Legendre polynomials were not satisfactory in describing the overall AS lactation curve. Hence, L4 was selected for this effect for further consideration. A similar submodel could be used for HY curves, but due to the problem of small herd size, functions with a lower number of parameters (requiring fewer observations to be estimated) were preferred. Consequently, a third-order Legendre polynomial was chosen.
Further in the comparison, fixed AS curve analysis showed that fifth-order Legendre polynomials were the most suitable for the Polish Black and White population, as they clearly improved the distribution of residuals compared with fourth-order polynomials. The high order of polynomials required to describe lactations of Polish Black and White cows may be the result of their relatively low production. The average milk daily yield of all the data sets used in this study (16.4 kg) was equal to the average yield at the end of lactation in a study based on Canadian cows (Jamrozik and Schaeffer, 1997), and much lower than the average daily yield (22.9 kg) for the cow population in the Netherlands (de Roos et al., 2004) and the 20.3 kg for Finland (Kettunen et al., 2000). The shape of the lactation curve, its lower peak in particular, is also related to the level of production. Strabel (2004) found that the peak appears relatively early, in the second part of the first month of lactation. This part of the lactation is the most complex, and thus particularly sensitive to differences in modeling. In addition, modeling is difficult due to the limited number of records available for this stage of lactation. An overestimated yield at the very beginning of lactation and an underestimated yield at the peak were also found by Jamrozik et al. (1997), who used the Wilmink and the Ali and Schaeffer (1987) functions for the fixed part of the model. The same stage of lactation caused similar problems for the German population (Liu et al., 2001).
Random regression models for the national genetic evaluation also differ in the functions used to describe random curves. Although Legendre polynomials have become a standard for this part of the model, there are differences in their order between different implementations: the third order is used in Germany (Liu et al., 2001), the fourth in Canada (Kistemaker, 2003), and the fifth in the United Kingdom (Mrode et al., 2003). Although higher-order polynomials usually improve model plausibility, there are several problems associated with them. The AG variance follows more oscillatory patterns, which leads to extreme values at the peripheries of lactation and a negative correlation for the extremes of lactation (Pool et al., 2000; López-Romero and Carabaño, 2003; Strabel et al., 2003). Moreover, analysis of the eigenvalues of AG covariance matrices shows the diminishing importance of adding further parameters. Finally, the more parameters are used, the less accurately they are estimated, because fewer records are available for each estimate. Unreasonably high estimates of genetic variance for some parts of the lactation may lead to overestimation of the average genetic variance across the whole lactation. Consequently, the accuracy of genetic evaluation may be overestimated. Thus, the rankings of random regression models based on this criterion should be studied with care.
Szyda and Liu (1999) found heterogeneous variance of yield traits in test-day records of Polish Black and White cows throughout lactation. The results of this study confirmed the heterogeneous residual variance in test-day milk yields. However, the need to incorporate heterogeneous error variance into the model was not apparent, based on the statistical tests performed. In general, using heterogeneous RV increased the variability of residuals for almost all of the models compared. Although as the error variance increased at the peripheries of the lactation, reduction of PE variance in the same regions was noted. A similar effect was reported by López-Romero et al. (2004). A small effect of fitting heterogeneous error variance on the estimation of AG and PE variance was found by Olori et al. (1999b). Similarly, López-Romero et al. (2003) found no differences in AG variance with different numbers of classes for residual variance. López-Romero et al. (2003) observed a slight impact of ignoring heterogeneous RV when the order of fit for PE curves was higher than 3. Similarly, Ødegård et al. (2003) found that the PE effect absorbed most of the heterogeneity of residual variance, particularly for more complex models. A higher order of polynomials for PE (fourth) than for AG (third) was postulated by Pool et al. (2000). López-Romero and Carabaño (2003) also concluded that lower orders of polynomials for AG than for PE might be sufficient. Although the range of values of PSB obtained in this study was within the range of results presented by López-Romero and Carabaño (2003), no association between PSB and the order of fit for the PE effect in the preselection step was observed. The positive effect of increasing the number of parameters for the PE effect was confirmed only in terms of a reduction in the variance of residuals. Setting fifth-order polynomials for PE instead of third-order had no other positive effect on the compared models. On the other hand, the model with fifth-order Legendre polynomials for PE and third-order for AG could not very accurately describe PE and AG variance at the end of lactation, which resulted in unreasonably high estimates of heritabilities. Hence, it seems appropriate to model milk test-day yields with homogeneous residual variance, because such a model is more parsimonious.
Random regression models lead to various results for the level and pattern of daily milk yield heritability; this was discussed by Misztal et al. (2000), and confirmed in more recent studies. The shape of the heritability curve with lower values at the beginning and end of lactation, as obtained by Druet et al. (2003) for example, seems most desirable, as these periods are strongly influenced by nongenetic effects cumulated before calving and associated with the farmers decision about drying off. Similar shapes of heritability curves were obtained by several authors using multitrait analysis (Liu et al., 2000; Druet et al., 2003) and random regression models (Pool et al., 2000; Jakobsen et al., 2002; Auvray and Gengler, 2002). Opposite shapes, with large values estimated at the peripheries and low values in the middle part of lactation were also found (Jamrozik and Schaeffer, 1997; Strabel and Misztal, 1999; Kettunen et al., 2000; Samoré et al., 2002). Using the covariance function approach and Legendre polynomials of order 3, Szyda (2001) also obtained higher heritabilities in the middle part of lactation than at the peripheries for yield traits and the first lactation. The range for heritabilities obtained in that study was similar to that obtained by Strabel and Misztal (1999) for the same population, and comparable with the results obtained in this study.
The different values of the genetic parameters obtained for the 2 data sets (B and C) imply that variance component estimation employing RRM is particularly sensitive to data selection. This problem was also mentioned by Druet et al. (2003), who found large variation among genetic parameters obtained from 10 distinct data samples. These authors suggested pooling to obtain more reliable estimates with lower standard deviations. López-Romero and Carabaño (2003) found that the values of daily variances and the associated heritabilities were not the same across data sets from different regions of Spain. The importance of the data used for analysis was also raised by Pool and Meuwissen (2000). They found that the goodness of fit is improved when only records of complete lactations were used for variance component estimation. In this study, lactations with test-day records covering at least the first 150 DIM were used. The distributions of records for both analyzed data sets showed that the frequency of the records in the third trimester of lactation decreased, reducing the number of records for the last days of the 305-d period to 50% of records in the first 2 trimesters. The differences in heritability estimates for data sets B and C, although small, were not easy to investigate. A possible explanation for the higher estimates for data set C might be differences in the structure of the data sets.
A random HTD effect required the inclusion of other herd-specific fixed effects. The herd-year effect was added to the model in this study. This made it possible to analyze the changes in environmental conditions that influenced the shape of the lactation curves. A solution for this effect may help in making management decisions, indicating in which part of the lactation an increase in production can be achieved. Although this additional benefit appeared because of adjusting the model to a population with small herd size, this effect needs to be analyzed with caution. Seasonal calvings, if they exist in a herd, may interfere with changing environmental conditions. In small herds, due to the small number of records, herd (instead of herd-year) lactation curves might have to be estimated.
| CONCLUSIONS |
|---|
|
|
|---|
| ACKNOWLEDGEMENTS |
|---|
|
|
|---|
Received for publication September 17, 2004. Accepted for publication June 15, 2005.
| REFERENCES |
|---|
|
|
|---|
arnecki. 1998. Estimation of breeding values of Polish Black and White cattle using test day yields. Proc. 6th World Congr. Genet. Appl. Livest. Prod., Armidale, Australia XXIII:335338.
This article has been cited by other articles:
![]() |
H. Hammami, B. Rekik, H. Soyeurt, A. Ben Gara, and N. Gengler Genetic Parameters for Tunisian Holsteins Using a Test-Day Random Regression Model J Dairy Sci, May 1, 2008; 91(5): 2118 - 2126. [Abstract] [Full Text] [PDF] |
||||
![]() |
J. Vasconcelos, F. Santos, A. Bagnato, and J. Carvalheira Effects of Clustering Herds with Small-Sized Contemporary Groups in Dairy Cattle Genetic Evaluations J Dairy Sci, January 1, 2008; 91(1): 377 - 384. [Abstract] [Full Text] [PDF] |
||||
![]() |
T. Strabel and J. Jamrozik Genetic analysis of milk production traits of polish black and white cattle using large-scale random regression test-day models. J Dairy Sci, August 1, 2006; 89(8): 3152 - 3163. [Abstract] [Full Text] [PDF] |
||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| HOME | HELP | FEEDBACK | SUBSCRIPTIONS | ARCHIVE | SEARCH | TABLE OF CONTENTS |