|
|
||||||||
1 Centre for Reproductive Biology in Uppsala and Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences, PO Box 7023, SE-75007 Uppsala, Sweden
2 Station de Génétique Quantitative et Appliquée, Institut National de la Recherche Agronomique, 78352 Jouyen-Josas, France
3 Swedish Dairy Association, Box 1146, SE-63180 Eskilstuna, Sweden
Corresponding author: Erling Strandberg; e-mail: Erling.Strandberg{at}hgen.slu.se.
| ABSTRACT |
|---|
|
|
|---|
Key Words: female fertility genetic evaluation survival analysis
Abbreviation key: CLI = interval between calving and last insemination, CR = conception rate, PBV = predicted breeding value, TBVCR = true breeding value for conception rate, VWP = voluntary waiting period.
| INTRODUCTION |
|---|
|
|
|---|
With field data, the pregnancy status of cows is not always available and thus one cannot be sure if cows have conceived (Weller and Ron, 1992; Roxström, 2001). However, even if pregnancy information is available, linear model methodology, the method most frequently used for the genetic evaluation of fertility, has the disadvantage that it cannot properly distinguish between pregnant and nonpregnant cows. Hence, records of pregnant and nonpregnant cows have to be treated alike (as is commonly done for interval from calving to last insemination), or the records of nonpregnant cows have to be excluded (as is commonly done for calving interval) or extended by projection. Culling for reproduction creates another problem. The worse a bulls daughter fertility is, the larger the proportion of daughters culled for reproductive failure. Thus, sires are evaluated without correct information on their daughters with poor fertility (these daughters either have missing information or observed intervals that are shorter than true intervals). Therefore, such bulls appear to be better than they really are and this is expected to lead to less efficient selection.
Survival analysis is an alternative method for analyzing reproductive traits recorded as time intervals (Lee et al., 1989; Eicker et al., 1996; Harman et al., 1996; Allore et al., 2001). Survival analysis is a statistical method for studying the occurrence and timing of events, where the outcome variable corresponds to a measure of time elapsed from a starting point until the occurrence of a certain event (Lee, 1992). The length of this interval is not always known, because competing events may occur before the occurrence of the event under study. For example, in our case, cows may have been culled, sold, or the study may have stopped before the cows conceived. One of the main advantages of survival analysis is that it can retain the information from cows that are culled before conception or not pregnant by the time the data recording was completed. Thus, records from pregnant (uncensored) and nonpregnant (censored) cows can be treated jointly and included in the analysis, making proper use of all the available information. Within the field of fertility in dairy cattle, survival analysis has been applied to study: 1) the effects of diseases on days to conception (Lee et al., 1989; Harman et al., 1996b), 2) the relationship between BCS and postpartum reproductive efficiency (Suriyasathaporn et al., 1998), and 3) the effect of early lactation milk yield on days open (Harman et al., 1996a). So far, no research using genetic models with survival analysis has been published for fertility traits.
The objective of this study was to investigate by simulation whether the analysis of CLI using survival analysis results in a more accurate genetic evaluation for conception rate than do the commonly used approaches based on linear models.
| MATERIALS AND METHODS |
|---|
|
|
|---|
Simulated Data
Each replicate of the simulated data consisted of 60,000 first-parity cows, daughters of 400 unrelated sires distributed over 1200 herds. The herd size was fixed to 50 cows. The average number of daughters per sire was 150 (SD = 12.3), ranging from 104 to 201 daughters. Fifty replicates were done. Three traits were simulated: 305-d milk production (kg), interval between calving and first ovulation (d), and conception rate (CR, %). The mean phenotypic values were 8000 kg (SD 1000) and 28 d (SD 15) for milk production and interval between calving and first ovulation, respectively.
Conception rate was simulated as a binary trait with an underlying normally distributed liability for conception with mean zero and standard deviation of unity [~N(0,1)]. Zero was chosen as the threshold; hence, all phenotypic values above 0 corresponded to pregnant cows (50% CR). Heritabilities and genetic and environmental correlations among the traits are shown in Table 1
. The heritability of the interval between calving and first ovulation was chosen according to the estimates for the interval from calving to commencement of luteal activity reported by Darwash et al. (1997), Veerkamp et al. (1997), and Royal et al. (2002). For CR, we assumed a heritability value somewhat higher than the values found in the literature, which were estimated with linear model methods, because we simulated CR on the underlying scale. Herd variances as proportion of the phenotypic variance were 9% for the 3 traits. The phenotypic value for each trait was created as: phenotypic value = mean + herd effect + breeding value (
sire breeding value +
dam breeding value + Mendelian sampling) + environmental value.
|
|
For a nonpregnant cow, the observation was considered censored. Cows were culled for reproductive reasons only. To be culled, the cow either had had its maximum number of inseminations without becoming pregnant or the MAXWAIT was exhausted without it becoming pregnant. The CLI was calculated from the last known insemination. For those cows that were never detected in heat, and thus never inseminated, CLI was calculated from the maximum waiting period (MAXWAIT) (approximately 0.8% of the records).
For the linear model analysis, CLI was defined in the same way, except that we could not distinguish between pregnant or nonpregnant cows.
Description of Simulated Data Set
About 15.3 ± 0.06% of the records were censored, i.e., nonpregnant cows. The average for CLI was 104.2 d (SD 39.0). The average number of inseminations was 1.81 (SD 1.10). The mean failure time was 98 d after calving, and the mean censoring time was 138 d after calving. We corroborated the values of the variables created in the simulation with field data. The average number of days for CLI and the number of inseminations was similar to values reported by Lee et al. (1989) and Roxström (2001).
Figure 2
shows the distribution of CLI. As shown, the distribution was skewed and nonnormal. The time scale started at d 56 because we set a VWP of 8 wk for all cows. About 50% of the observations had an interval less than 95 d.
|
![]() | ([1]) |
where
(t) = hazard of a cow of getting pregnant;
o(t) = baseline hazard function; herdi = random effect of herd i, assumed to follow a log-gamma (
) distribution; sirej = random effect of sire j. Sire effects were assumed to follow a normal distribution with variance
2s. No relationship among sires was assumed.
Three models were analyzed using expression [1]: one Cox and 2 Weibull models. The Cox model (model S1) does not assume a distributional form of the baseline hazard function in expression [1] and defines a semi-parametric regression model.
For the Weibull models (model S2 and S3), a parametric form is assumed for the baseline hazard function in expression [1]; where
o(t) = 
(
t)
1, with positive scale parameter
and positive shape parameter
. In model S3, the origin of the time interval analyzed was shifted to avoid a long early period without events (to account for the VWP). The new time scale (t*) was defined as t*= t 55. This value of 55 d was chosen because the VWP was 56 d. When we tested different values (e.g., 56, 55, ..., 50 d) by regressing the log of the Kaplan-Meier estimates against the log of time, d 55 had the best fit. To evaluate the ability of survival models to account for censoring, an extra analysis was done, similar to model S3 with the only difference that all the records were considered as uncensored (model S4).
To validate that the Weibull distribution properly fitted the data, Kaplan-Meier curves were created. The suitability of the Weibull model was assessed visually from the plot of the log of the Kaplan-Meier estimates [nonparametric estimate of the survivor curve S(t)] against the logs of time (Figure 3
). If the assumption holds, a straight line should be obtained. Two plots are shown: a) the time period analyzed starts at day of calving, and b) the time period analyzed starts at d 55 after calving. In a), the relationship results in approximately a straight line, except at the beginning of the period analyzed where no cows get pregnant; therefore it can be assumed that a Weibull distribution reasonably fits the data, at least after the VWP. In b), when the time scale was shifted such that the origin reflects when failures actually start, a straight line was obtained over the whole time scale.
|
![]() | ([2]) |
where yijk = CLI or Log CLI of cow k, in the herd i, daughter of sire j; herdi = random effect of herd i, assumed to follow a normal distribution with variance
2herd; sirej = random effect of sire j; and eijk = error term. Sire effects were assumed to follow a normal distribution with variance
2s. No relationship among sires was assumed.
Three models were analyzed using expression [2]: model L1 included all the records (pregnant and non-pregnant cows) where records are treated alike, model L2 included all the records as in model L1, but we used the log transformation of CLI (Log CLI), and model L3 used CLI and excluded the nonpregnant cows from the analysis (about 15% of the records).
The Survival Kit V3.12 (Ducrocq and Sölkner, 1998) and DMU (Jensen and Madsen, 1994) were used for the survival analysis and linear model analysis, respectively. Variance components and heritabilities were estimated in both analyses.
The heritability for the linear model was calculated as:
. For the survival analysis, equivalent heritability was defined as (Yazdi et al., 2002):
where c is the proportion of censored records (c = 0.153),
2herd is the herd variance calculated as
2herd = trigamma (
). The use of the equivalent heritability allows a direct comparison of the heritabilities for the 2 approaches.
Model Comparison
Three approaches were used to compare the methods: 1) Pearson correlations (SAS Institute, 1999) between predicted breeding values (PBV) based on CLI and true breeding values for conception rate (TBVCR), 2) comparison of average true genetic merit of bulls selected based on their PBV (best 10% and worst 10%), and 3) the proportions of the truly best or worst 10% of bulls that were identified to be in the best or worst 10% based on PBV from each method.
| RESULTS |
|---|
|
|
|---|
was 2.70 for model S2 and 1.13 for model S3. The heritabilities, sire, herd, and residual variances estimated with survival analysis and linear models are presented in Table 2
|
|
Another way to compare the results of the 2 methods is to calculate how large was the proportion of those bulls that are ranked among the top 10% (or bottom 10%) TBVCR, that are among the top 10% (or bottom 10%) when ranked on the PBV from the 2 methods (Table 4
). Survival analysis (model S3) ranked 51% of the bulls correctly in the top 10%, whereas the linear model (model L1) ranked only 42% correctly. The corresponding differences for the worst bulls were similar.
|
| DISCUSSION |
|---|
|
|
|---|
was 2.7 and 1.13 for model S2 and S3, respectively. The difference is a reflection of the early long period without risk of pregnancy in model S2. Even when the origin was shifted (model S3),
was still greater than 1, which indicates that the hazard of conception increased slightly with time (even though we simulated a constant CR). In our simulation, we had variation in the time from calving to first ovulation. Some cows would therefore have their first ovulation after d 56. This would lower the risk of getting pregnant very early, thus increasing the relative risk with time. The Cox model (model S1) had a higher correlation with TBVCR than Weibull model S2. A feature of the Cox model is that no assumption is made about the form of the baseline hazard function. This higher correlation could be explained by the fact that Cox fits the data better at the beginning of the period analyzed. But when the origin was shifted (model S3), the correlation with TBVCR was as high as for the Cox model.
In the simulation, we assumed a VWP of 56 d for all cows and herds. In field data, the VWP would be more flexible and vary between herds and possibly cows, and the appropriate scale shift should be tested (as in Figure 3
).
If we compare the Weibull models S2 and S3, we can see from the lower correlation between TBVCR and PBV that model S2 is not the better model. Mainly, this result occurred because it does not make sense to assume a nonzero hazard before d 56. This inconsistency also has an effect on the estimates of
and h2, which became inflated in model S2. Nevertheless, model S2 was still superior to the linear models for predicting breeding values for CR. In practice, therefore, a Weibull model with any reasonable scale shift can be expected to perform almost as well as the more computationally demanding Cox model.
In the linear model analysis, when part of the information was excluded (model L3), the correlation of PBV with TBVCR was the lowest. Therefore, if the linear model is used, it is better not to discard the nonpregnant cows. When the data was log transformed (model L2), the correlation with TBVCR was only slightly higher compared with model L1. The log transformation of the data did not improve the results much, because the distribution of the data started so abruptly after the VWP (Figure 2
).
The results show that survival analysis is a better method than the linear model for prediction of the genetic merit of bulls for CR when using observations on CLI. Correlations between TBV and PBV were higher for survival analysis than for LM. If selection were carried out on these PBV, this advantage of the survival analysis would also translate into greater genetic progress. This statement was also confirmed by the higher proportion of bulls that rank among the top bulls (or bottom) for CR. The main advantage of survival analysis is the ability to account for censoring. When all the records were treated as uncensored (model S4), we obtained almost the same correlation as was obtained with the linear model. Survival analysis makes proper use of information that would be otherwise discarded or treated as uncensored.
More simplified scenarios regarding variation within and between herds and without the effect of milk production on the maximum number of insemination were studied (results not shown), and survival analysis was always better than the linear model. In fact, the more complex the simulation, the greater was the advantage of survival analysis.
One potential drawback with using survival analysis is that it is currently not possible to use it together with other traits (e.g., production or other fertility traits) in a multiple-trait analysis. Such an analysis would use all the available information simultaneously, while at the same time accounting for potential bias due to culling or selection over time. However, recent work has shown that it is indeed possible to analyze a survival trait together with a normally distributed continuous trait or a threshold trait using a Bayesian approach and applying Gibbs sampling (Damgaard, 2005).
To take full advantage of survival analysis, it is necessary to have information on whether the cow is pregnant or not (e.g., confirmed by a veterinarian). Information from slaughterhouses on culled cows (pregnant or not at slaughter) could be useful as well. It is hoped that information on fertility (actual VWP, service period, pregnancy status) will be more accurately recorded in the future. Having this information and using survival analysis could be expected to give more than 10% greater genetic response than using the linear models typically used today.
| CONCLUSIONS |
|---|
|
|
|---|
| ACKNOWLEDGEMENTS |
|---|
|
|
|---|
Received for publication June 8, 2004. Accepted for publication March 20, 2005.
| REFERENCES |
|---|
|
|
|---|
This article has been cited by other articles:
![]() |
C. Sun, P. Madsen, U. S. Nielsen, Y. Zhang, M. S. Lund, and G. Su Comparison between a sire model and an animal model for genetic evaluation of fertility traits in Danish Holstein population J Dairy Sci, August 1, 2009; 92(8): 4063 - 4071. [Abstract] [Full Text] [PDF] |
||||
![]() |
Y. Hou, P. Madsen, R. Labouriau, Y. Zhang, M. S. Lund, and G. Su Genetic analysis of days from calving to first insemination and days open in Danish Holsteins using different models and censoring scenarios J Dairy Sci, March 1, 2009; 92(3): 1229 - 1239. [Abstract] [Full Text] [PDF] |
||||
![]() |
A. Sewalem, F. Miglior, G. J. Kistemaker, P. Sullivan, and B. J. Van Doormaal Relationship Between Reproduction Traits and Functional Longevity in Canadian Dairy Cattle J Dairy Sci, April 1, 2008; 91(4): 1660 - 1668. [Abstract] [Full Text] [PDF] |
||||
![]() |
E. L. de Maturana, A. Legarra, L. Varona, and E. Ugarte Analysis of Fertility and Dystocia in Holsteins Using Recursive Models to Handle Censored and Categorical Data J Dairy Sci, April 1, 2007; 90(4): 2012 - 2024. [Abstract] [Full Text] [PDF] |
||||
![]() |
M. d. P. Schneider, E. Strandberg, V. Ducrocq, and A. Roth Short Communication: Genetic Evaluation of the Interval from First to Last Insemination with Survival Analysis and Linear Models J Dairy Sci, December 1, 2006; 89(12): 4903 - 4906. [Abstract] [Full Text] [PDF] |
||||
![]() |
E. Carlen, U. Emanuelson, and E. Strandberg Genetic evaluation of mastitis in dairy cattle using linear models, threshold models, and survival analysis: a simulation study. J Dairy Sci, October 1, 2006; 89(10): 4049 - 4057. [Abstract] [Full Text] [PDF] |
||||
![]() |
O. Gonzalez-Recio, Y. M. Chang, D. Gianola, and K. A. Weigel Number of Inseminations to Conception in Holstein Cows Using Censored Records and Time-Dependent Covariates J Dairy Sci, October 1, 2005; 88(10): 3655 - 3662. [Abstract] [Full Text] [PDF] |
||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| HOME | HELP | FEEDBACK | SUBSCRIPTIONS | ARCHIVE | SEARCH | TABLE OF CONTENTS |