|
|
||||||||
Department of Dairy Science, University of Wisconsin, Madison 53706
Corresponding author: K. A. Weigel; e-mail: weigel{at}calshp.cals.wisc.edu.
| ABSTRACT |
|---|
|
|
|---|
Key Words: longevity survival analysis proportional hazards dairy cattle
Abbreviation key: KL = Kullback-Leibler, PL = productive life, RR = relative risk
| INTRODUCTION |
|---|
|
|
|---|
In the US, a linear model is used for genetic evaluation of PL, as described by VanRaden and Klaaskate (1993). For completed records, PL is calculated as the total number of months in milk between first calving and 84 mo of age, with a maximum of 10 mo per lactation. For censored records from cows that were sold for dairy purposes or cows that are still alive at the time of analysis, linear regression is used to "project" total months in milk from data regarding current months in milk, current months dry, age at first calving, and lactation status (milking or dry). However, the correlation between completed PL records and their early projections is low (VanRaden and Klaaskate, 1993), so the reliability of information for sires that have the majority of their daughters in first and second lactation is limited.
Survival analysis is based on the concept of the hazard rate, which is the instantaneous probability of culling for a particular animal at a given point in time (Smith and Quaas, 1984; Ducrocq, 1987). The application of survival analysis methodology does not require extra data collection; it is simply an improvement in the statistical treatment of culling data.
Censored observations can be accommodated properly in survival models because one can differentiate between a cow that died at exactly time t (i.e., a completed record) and a cow that was last seen alive at time t but may have survived several additional months or years (i.e., a censored record). Survival models can also handle time-dependent covariates, and this feature is particularly important when factors such as herd size, facilities, management practices, feed quality, or productivity of individual cows or their herdmates change over time. For example, modeling herd-year-season contemporary groups in a time-dependent manner can account for the precise management conditions that will influence the risk of culling for a particular cow during each month of her life, and it is not necessary to assume that all cows calving in a given herd-year-season period will be subject to the same conditions throughout their entire lives.
Longevity data usually have a skewed distribution, and, in the absence of a suitable transformation, analysis of such data using methods that assume normality can lead to awkward results (Egger-Danner, 1993). Furthermore, the relationship between longevity and management or genetic factors is probably multiplicative rather than additive (Vukasinovic, 1999). Curiously, comparisons between survival analysis and linear model methodology based on actual data are lacking in the animal breeding literature.
The objectives of this study were to apply survival analysis methodology to the prediction of longevity PTA for US Jersey sires and to compare the accuracy of these predictions with those from the linear model approach that is currently used for routine national genetic evaluation of PL.
| MATERIALS AND METHODS |
|---|
|
|
|---|
Longevity was defined as the number of days from first calving until culling or censoring. Records from cows that were sold for dairy purposes, cows residing in herds that discontinued milk recording, cows that were still alive after 5 completed lactations, and cows that were still alive at the time of analysis were considered censored. Cows that did not calve again within 6 mo after completing a "normal" lactation were considered dead and were treated as uncensored.
Survival Analysis
The following Weibull proportional hazards (sire) model was used for analysis of length of PL:
![]() | ([1]) |
where
| hijkl(t) | = | hazard function (instantaneous probability of culling) for a given cow at time t;
| h0(t) | = | Weibull baseline hazard function with scale parameter and shape parameter ;
| Pi(t) | = | time-dependent fixed parity-stage of lactation effect, assumed to be piecewise constant with change points at 0, 45, and 270 d of lactation 1, 2, 3, 4, or 5;
| Aj | = | time-independent fixed effect of age at first calving, treated as a continuous covariate with regression coefficient ß;
| hysk(t) | = | time-dependent random effect of herd-year-season, assumed to be independently distributed, following a log-gamma distribution with parameter , and assumed to be piecewise constant with change points at January 1, May 1, and September 1 of each year;
| sl | = | time-independent random effect of sire of cow assumed to be distributed as multivariate normal with mean vector and covariance matrix , where is the additive relationship matrix between sires.
|
The Survival Kit version 3.12, a set of FORTRAN programs written by Ducrocq and Solkner (1998b), was used for the Weibull analysis. Details are given in Ducrocq (1994), and theoretical aspects are presented by Ducrocq and Casella (1996). An empirical Bayes approach was used to estimate the fixed parameters and to predict the random sire effects (
).
The sire variance(
) was estimated as the mode of its marginal posterior density, which was approximated by Laplacian integration. A sire model was chosen for computational reasons, although the Survival Kit version 3.12 can accommodate an animal model as well. To avoid problems caused by herd-year-season classes or sire progeny groups with few animals (and no uncensored failures), the parameters
,
and
were estimated from a subset of data from 11 large herds and with a minimum of 20 uncensored daughters per sire. It is unknown whether such a restriction induces a sampling bias. To the extent that censoring is random over sires (an assumption of the procedure), we conjecture that such bias is negligible.
Linear Model Analysis
Because one of our main goals was to compare survival analysis methodology with the linear model approach currently used for routine national genetic evaluation of PL in the US, we also estimated sire PTA for longevity using the following linear (animal) model:
![]() | ([2]) |
where
| yijkl | = | completed or projected months in milk at 84 mo of age for a given cow;
| HYSi | = | fixed effect of herd-year-season of first calving;
| ACj | = | fixed effect of age at first calving; and
| ak | = | random additive genetic effect of animal k assumed to be distributed as multivariate normal with mean vector and covariance matrix , where is the relationship matrix between animals and is the additive genetic variance.
|
Model [2
] is similar to the linear model used by the USDA Animal Improvement Programs Laboratory, except that projected observations for censored animals were based only on age at first calving and current months in milk. (Information about current months dry or current milking status was lacking.) Genetic and residual variance components were estimated from the data via REML, and sire PTA were predicted by BLUP, conditionally, on the estimated variance components. Pearson correlations coefficients were obtained between PTA obtained in this study and PTA from the November 2003 USDA sire summary. For sires with at least 30 uncensored daughters, the correlation was 0.63, and for sires with at least 100 uncensored daughters and born before 1995, the correlation was 0.75.
Model Comparisons
The data were split into 2 samples via random selection of herds; Subsets A and B contained records from 122,388 and 128,450 animals, respectively. Longevity PTA for the sires were subsequently predicted, conditionally, on estimates of the Weibull and dispersion parameters obtained from all data, by applying Model [1] to each subset. The longevity PTA of each Jersey sire was expressed as the risk of culling of his daughters relative to the risk of culling for daughters of an average sire. Sire PTA from Model [1] for Subsets A and B will be denoted as RRA and RRB, respectively. Similarly, sire PTA were estimated in Subsets A and B using linear Model [2], and the resulting predictions will subsequently be referred to as PLA and PLB, respectively. Correlations between estimated sire PTA from each method (survival analysis or linear model) and each subset of the data (A or B) were calculated to assess the agreement (or lack thereof) between genetic predictions from each type of methodology.
Three different approaches were used for model validation. First, stayability observations (i.e., binary responses) indicating survival to second, third, fourth, or fifth lactation for cows in Subset A were regressed (separately) on RRA or PLA of their sires using logistic regression, such that each sires PTA (from each model) could be converted into the probability that his daughters would survive to second, third, fourth, or fifth lactation. Next, the expected number of daughters of each sire that would survive to second, third, fourth, or fifth lactation in Subset B was calculated by multiplying the probability of survival to each lactation (from Subset A) by the total number of daughters in Subset B. The actual number of daughters in Subset B that survived to second, third, fourth, or fifth lactation in Subset B was subsequently determined for each sire, and the observed and expected number of "survivors" and "failures" were compared using the following
2 statistic:
![]() |
These
2 statistics were summed across sires, and the model that produced the smallest sum was considered as the most accurate predictor of stayability. The reciprocal analysis was subsequently carried out by using survival probabilities computed from Subset B to predict daughters actual survival rates in Subset A. Because some daughters of bulls born in recent years may have not yet had an opportunity to survive to third, fourth, or fifth lactation, we repeated the analysis using only those sires that were born in 1992 or earlier. Finally, a weighted analysis was conducted, in which each part of the
2 statistics for individual sires was weighted by the number of daughters of that sire.
In the second approach to cross-validation, the Kullback-Leibler (KL) criterion (Kullback, 1968) for measuring the distance between 2 distributions (one of them assumed to be the true distribution) was applied to the stayability data described previously. A rough interpretation of the KL distance (Sorensen and Gianola, 2002) is the expected value (under the true distribution) of the log-odds ratio of the posterior probabilities of each of 2 models that are, a priori, equally likely.
Let f(y|
T) be the density of the "true" distribution, where y is the data vector (the binary stayability variables in our case) and
T is the parameter of the distribution. Let p(y|
A) be the density of an alternative distribution. The KL discrepancy between the 2 distributions is defined as
![]() |
If the data are discrete, f(y|
T) and p(y|
A) are probability distributions, and the KL discrepancy is
![]() |
In our study, we regressed (logistically) binary stayability observations of daughters in each subset on their sires relative risk (RR) ratios or PL PTA (from the survival and linear models, respectively). From these regressions, predicted probabilities of survival to second, third, fourth, and fifth lactations were obtained. Denote the predicted probability of survival (to the beginning of each lactation) for daughters of sire i as
and
for the survival and linear models, respectively. Next, consider the empirical distribution of the number of daughters that survived to a given lactation in the other subset of data as the "true" distribution. (This empirical distribution approaches the true distribution as the number of daughters per sire goes to infinity.) In other words, these empirical probabilities are calculated as
where ni and ni,survive are the total number of daughters of sire i and the number of daughters of sire i that survive to a particular lactation, respectively. We can regard the ni,survive from different sires as independent random variables that follow a binomial
distribution. In practice, however, the empirical probabilities
can be null if there are no surviving daughters. In this situation, the KL distance is not defined because the logarithm of 0 is not defined. Although it would be possible to exclude such cases for computational purposes, it is more intuitively appealing to "shrink" estimates based on limited information toward the population average because survival probabilities of null or unity are not biologically plausible. If we assume that the binomial probabilities
are distributed a priori across sires as Beta (a, b), then the posterior distribution of
is a Beta(ni,survive + a, ni,fail + b) distribution, where ni,survive and ni,fail represent the number of survivors and failures for sire i, respectively, and where the parameters a and b are estimated from the variation in relative frequencies between sires using a method of moments fit (Gelman et al., 2003); we refer to this as an empirical Bayes approach. We calculated the overall survival rate as
![]() |
where M is the number of sires, and the between-sire variance in survival probabilities as
![]() |
Following Gelman et al. (2003), we calculated
![]() |
![]() |
and
![]() |
The posterior mean of the probability of survival of daughters of sire i is
![]() |
This was used as the true probability in the empirical distribution of the data, and the KL criteria was calculated as
![]() |
where
is set equal to
or
when calculating the KL distance (alternative distribution) for the survival analysis or linear model, respectively.
The third approach was quite similar to the logistic regression and
2 statistic described previously. However, instead of measuring stayability to the beginning of second, third, fourth, or fifth lactation for daughters of a particular sire, we considered stayability to 36, 48, 60, 72, and 84 mo of life. Only those daughters that had an opportunity period of 36, 48, 60, 72, or 84 mo, respectively, were included in calculation of the
2 statistic.
| RESULTS AND DISCUSSION |
|---|
|
|
|---|
), 0.94 (
), and 0.021
. Heritability on the logarithmic scale was approximated using the equation of Ducrocq and Casella (1996):
![]() |
where
(1)(
) = tri-gamma function evaluated at
, and
is the variance of an extreme value distribution. This yielded an estimated heritability on the log scale of 4.7%. Heritability on the original scale was approximated using a Taylor series expansion of log h2 around its mean, as in Ducrocq and Casella (1996):
![]() |
where v =
(
) log(
) Eulers constant, and
(
) = di-gamma function evaluated at
. This yielded a heritability estimate on the original scale of 18.0%. By comparison, the estimated heritability of PL using the linear Model [2] was 7.0%. The approximation to heritability on the original scale typically gives estimates that are larger than those obtained from linear models. This suggests that either environmental influences are modeled more precisely using this survival analysis methodology or, perhaps, that the approximation given previously is inadequate (Korsgaard et al., 2002).
Survival analysis methodology allows expressing a sires genetic merit in several ways, including the RR of culling, the expected percentage of daughters still alive after a given number of lactations, or the expected length of PL. In the latter case, estimates correspond to points on a given sires survival curve, S(t). For example, the median survival time occurs when S(t) = 0.50. In this study, we used RR of culling to express genetic merit for survival. For example, daughters of a sire with a RR of culling of 1.2 are expected, on average, to have a 20% greater chance of being culled at any given time than daughters of an average sire (with RR of culling = 1.0). For the Jersey sires considered herein, risk ratios for individual sires ranged from approximately 0.7 (low risk of culling) to 1.3 (high risk of culling).
The number of cows in each data set, by birth year of their sires, is shown in Table 1
, and correlations in predicted genetic merit between methods (survival or linear model) and data sets (A or B) are shown in Table 2
. These correlations were 0.73 for RRA and RRB, 0.76 for PLA and PLB, 0.55 for PLA and RRA, 0.60 for PLB and RRB, 0.46 for PLA and RRB, and 0.35 for PLB and RRA. These results indicate a stronger correlation within methodologies than within data sets (with different methodologies).
|
|
|
2 statistics from the observed vs. expected number of survivors to second, third, fourth, and fifth lactation. When using estimated survival probabilities from Subset A to predict the survival rate of daughters to the beginning of each lactation in Subset B, the RR ratios from the survival analysis (sire model) were superior to the productive life PTA from the linear (animal) model analysis in all cases. When using estimated probabilities from Subset B to predict daughters survival in Subset A, the linear model gave better predictions of survival to second and third lactation, but the survival analysis results were superior predictors of survival to fourth and fifth lactation. However, some daughters of sires born in recent years may have not yet had an opportunity to survive to third, fourth, or fifth lactation. Therefore, the analysis was repeated using only sires born prior to 1992. When this restriction was applied, survival analysis was a more accurate predictor of stayability to each lactation in both Subsets A and B.
|
2 statistic was weighted by the number of progeny for each sire. Results (Table 5
2 from a few sires (with many daughters). However, this does not compromise the usefulness of the results because these sires are important to the Jersey breed.
|
|
2 statistics were used to compare models predicting stayability of daughters to 36, 48, 60, 72, and 84 mo of life among daughters that had opportunity to survive that long. Survival analysis was a better predictor of stayability than linear model anaylsis in all cases in Subset A. In Subset B, survival analysis was a superior predictor of stayability to 48, 60, and 72 mo of life, but the linear model was a better predictor of stayability to 36 or 84 mo of life.
|
| CONCLUSIONS |
|---|
|
|
|---|
Results from the present study suggest that survival analysis can yield higher heritability estimates than linear models, which may lead to more rapid genetic progress, provided that such heritability approximations are meaningful (Korsgaard et al., 2002). Further, differences occurred in sire rankings between the 2 methods considered in this study, so a change in methodology would alter the usage level of certain sires in commercial dairy herds.
None of the 3 methods (
2 analysis of predicted stayability to a certain lactation, KL distance comparison, and
2 analysis of predicted stayability to a certain age) provided a clear-cut answer regarding the superiority or inferiority of survival analysis methodology. However, predictions derived from estimates of sires genetic merit for longevity from the survival analysis tended to be more closely related to actual stayability observations and culling times in independent samples of the data. It should be noted that these comparisons favored the linear model, a priori, because more pedigree information was used in the linear (animal) model than in the survival (sire) model. When combined with the aforementioned theoretical advantages, these results tend to support the adoption of survival analysis methodology for routine genetic evaluation of longevity in dairy cattle. Although survival analysis methodology is computationally demanding, particularly when using an animal model, advances in computing speed and power should allow its implementation on a national scale. Furthermore, any advantages in reliability or stability of longevity evaluations for dairy sires attributable to implementation of this methodology would be obtained with no additional data collection costs.
| ACKNOWLEDGEMENTS |
|---|
|
|
|---|
Received for publication October 9, 2003. Accepted for publication January 6, 2004.
| REFERENCES |
|---|
|
|
|---|
This article has been cited by other articles:
![]() |
T. Serenius and K. J. Stalder Selection for sow longevity J Anim Sci, April 1, 2006; 84(13_suppl): E166 - E. [Abstract] [Full Text] [PDF] |
||||
![]() |
D. Z. Caraviello, K. A. Weigel, and D. Gianola Prediction of Longevity Breeding Values for US Holstein Sires Using Survival Analysis Methodology J Dairy Sci, October 1, 2004; 87(10): 3518 - 3525. [Abstract] [Full Text] [PDF] |
||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| HOME | HELP | FEEDBACK | SUBSCRIPTIONS | ARCHIVE | SEARCH | TABLE OF CONTENTS |