|
|
||||||||
1 Departamento de Producción Animal E.T.S.I. Agrónomos Universidad Politécnica Ciudad Universitaria s/n 28040 Madrid, Spain
2 Department of Dairy Science, University of Wisconsin, Madison 53706
Corresponding author: Oscar González-Recio; e-mail: ogrecio{at}pan.etsia.upm.es.
| ABSTRACT |
|---|
|
|
|---|
The STM had greater predictive ability of daughter fertility at first insemination than the other methodologies. However, the CTM predicted daughter fertility more accurately in subsequent inseminations. The DPH and STM had a similar predictive ability of daughter fertility in second and subsequent inseminations.
Key Words: fertility ordinal-censored threshold model sequential threshold model survival analysis
Abbreviation key: CTM = ordinal-censored threshold model, DPH = discrete proportional hazard model, INS = inseminations per conception, STM = sequential threshold model.
| INTRODUCTION |
|---|
|
|
|---|
Three methods were developed and applied to field data on INS in Holstein cattle. First, an ordinal threshold model (Gianola, 1982; Gianola and Foulley, 1983) that accommodates censored records was implemented (CTM). Second, a sequential threshold model (STM), as described by Albert and Chib (2001), which can analyze categorical traits that occur in a sequential order, was applied. Third, a grouped survival model for discrete proportional hazard analysis (DPH), as developed by Prentice and Gloeckler (1978), was fitted. The STM and DPH models allow for time-dependent covariates, whereas CTM does not.
The objective of this research was to infer parameters of INS data with the aforementioned CTM, STM, and DPH models, and to assess their relative predictive abilities.
| MATERIALS AND METHODS |
|---|
|
|
|---|
Ordinal-Censored Threshold Model
The ordinal threshold model (Gianola, 1982; Gianola and Foulley, 1983) was extended to accommodate censored records. The ordinal-censored threshold model (CTM) postulates an underlying latent liability (
) for number of inseminations to conception.
The statistical model for liability was:
![]() |
The systematic effects (x'ß) in the model were as follows: days to first service treated as a covariate; effect of number of lactation (j = 1 to 4 levels); effect of calendar month of calving (k = 1 to 12 levels), and effect of year-season of calving (l = 1 to 30 levels). The random effects were: hm = herd (m = 1 to 767 levels) distributed independently as N(0,I
2h), where
2h is the variance among herds; ssn = service sire for first insemination (n = 1 to 577 levels) distributed as N(0,I
2ss), where
2ss is the variances among service sires; uo = additive genetic effect of sire of cow (o = 1 to 3267 levels) distributed as N(0,A
2u) where A is the additive relationship matrix between sires and
2u is the variances among sires of cows, and ejklmn = random residual assumed independently distributed as N(0,I
2e), where
2e is the residual variance, which was set equal to one. Service sires and sires of cows were assumed to be independently distributed.
When an observed response yi falls into one of the possible known categories of INS, for example j, the liability of that observation is sampled from a truncated normal distribution between 2 given thresholds (Tj1 and Tj). Then, the conditional probability of the event can be written as:
![]() |
where j = 1, 2, 3, 4 indexes the category to which the uncensored observation yi belongs;
(·) is the standard normal distribution function; xi, zh,i, zss,i, zu,i are the respective incidence vectors of systematic (ß), herd (h), first service sire (ss), and sire of cow (u) effects, and T = [T1, T2, T3, T4]' is the vector of unknown threshold parameters. The thresholds must satisfy the restrictions T1
T2
T3
T4, such that the cumulative distribution function is strictly nondecreasing. Further, the first threshold T1 is set to zero, because this parameter cannot be identified in a probit analysis; hence, only T2, T3, T4 are unknown.
If an observation is censored at the jth category, (e.g., a cow was not inseminated again beyond 2 services), then the liability is sampled from a left-truncated distribution. The truncation point is the threshold Tj corresponding to the last known insemination, and the probability can be written as:
![]() |
Then, assuming conditional independence, the joint distribution of the noncensored and censored observations can be written as
![]() |
where
i =1 if an observation is censored observation, and 0 otherwise. Here, N is the total number of observations. Independent uniform priors were used for all elements of ß, and the residual variance was set to one, as stated earlier. Normal priors were assumed for the herd, service sire, and sire of cow effects, and their distributions were as mentioned before. Independent scaled inverse
2 prior distributions with degrees of freedom equal to 3 and scale parameters equal to 0.1 were assigned to each of
2h,
2ss, and
2u
The CTM was implemented via Markov chain Monte Carlo, as in Sorensen et al. (1995). Because thresholds have been shown to mix slowly in some cases (e.g., Kizilkaya et al., 2003), an efficient method proposed by Albert and Chib (1997) was adopted for sampling thresholds. With the first threshold set equal to zero, a logarithmic transformation was applied to the other 3 thresholds as:
![]() |
Subsequently, a Metropolis algorithm was used to sample the thresholds with a normal proposal, instead of the multivariate-t process suggested by Albert and Chib (1997). The value from the previous iteration was used as the mean of the normal proposal distribution, and its standard deviation was set to 0.012, to attain a reasonable acceptance rate. In short, the proposed distribution had the form
![]() |
where ã* is the value of the logarithmic transformation of the thresholds from the previous iteration. The
1 value was set to zero.
Sequential Threshold Model
An STM described by Albert and Chib (2001) can be used to analyze an ordinal categorical trait that occurs in a sequential order, such as INS. This means that for an observation to be present at a given stage of the sequence, it must have passed through all previous stages. For instance, a cow that is inseminated for a third time must have been inseminated and failed at both first and second AI. Therefore, a single latent variable can be used to represent the cows propensity to pass from one category to the next. The response yi can take the value j only after levels 1, . . ., j1 are reached, and then either a "success" (conception) or "failure" (no conception) in level j is observed. Hence, the probability of pregnancy at insemination j, conditionally on the event that the (j1)th insemination has been reached, is given by
![]() |
where ß now represents the systematic effects of DFS as covariate, number of lactation (4 levels), calendar month of calving (12 levels), and year-season of calving (30 levels), and xi is the corresponding incidence vector. As before, h, ss, and u represent random effects of herds (767 levels), service sire (641 levels), and sire of cow (3267 levels), respectively, with their corresponding incidence vectors (zh,i, zss,i,j, zu,i). Further, the vector
= (
1,
2,
3,
4) represents unordered cutpoints; these cutpoints do not need to be ordered as in the case of an ordinal threshold model (Albert and Chib, 2001).
This model can also be formulated in terms of latent variables expressing the propensity of a cow to receive an additional insemination. Corresponding to the jth insemination, define latent variables {wij}, where wij = xi'ß + zh,i'h + zss,i,j'ss + zs,i'u + eij, where eij~NIID (0,1). We observe yi = 1 if wi1
1, and we observe yi = 2 if the first latent variable wi1 >
1 and the second latent variable wi2
2. In general:
![]() |
These latent variables can be incorporated into a Markov chain Monte Carlo sampling scheme. The latent variable representation can be simplified by incorporating the cutpoints {
i} into the mean function and fixing one of the cutpoints, usually
1 = 0. In general, each latent variable can have different explanatory variables. Thus, a binary-threshold model can be fitted for each insemination event, including the vector of cutpoints in the model, and with the response being: "fail" (pass to next insemination) = 1, or "pregnancy" = 0.
The likelihood function takes the form
![]() |
where
i = 1 if the observation is censored, and 0 otherwise.
As before, uniform priors were used for location parameters other than h, ss, and u. The residual variance was fixed to one and, for simplicity, it was assumed constant at each step of the sequence. Priors for the random effects and for the variance components were as in the CTM.
Posterior distributions of the parameters were estimated using a Gibbs sampling algorithm for STM, and a Gibbs/Metropolis combination for CTM (Gelfand and Smith, 1990; Sorensen et al., 1995; Sorensen and Gianola, 2002). The analyses were based on a single chain of 200,000 iterations, with the first 50,000 samples discarded. Monte Carlo standard errors (Geyer, 1992) were calculated for the posterior means of herd, service sire, and sire variances. Heritability in the liability scale was calculated for CTM and STM as:
![]() |
and samples from its posterior distribution were obtained from the draws for the variance components.
Grouped Survival Model
Survival analysis is widely used for analysis of time-to-event traits, and this methodology can deal with censored records more flexibly than most other approaches. Here, the "grouped" survival model developed by Prentice and Gloeckler (1978) was applied to INS, using a DPH model. Services that occur during lactation can be treated as grouped intervals of time to conception; in the absence of known conception, the event is treated as a censored record in the last known insemination. This model was extended by Ducrocq (1999) to accommodate random effects. In the DPH model, the hazard function at insemination qi is defined as the conditional probability of pregnancy at qi given that the cow was inseminated at such time. The hazard function can be expressed as:
![]() |
where
| hjklmnopq(t) | = | hazard function (instantaneous probability of conceiving) for a given cow at insemination t;
| ![]() | = | regression coefficient;
| DFSklmnopq | = | continuous covariate for days to first service;
| Lk | = | time-independent fixed effect of lactation number (k = 1 to 4);
| Ml | = | time-independent fixed effect of calendar month of calving (l = 1 to 12);
| YSm(t) | = | time-dependent fixed effect of year-season of insemination (m = 1 to 30);
| hn | = | time-independent random effect of herd (n = 1 to 767), assumed to be independently distributed as log-gamma with shape parameter h.
| sso(t) | = | time-dependent random effect of service sire (o = 1 to 641), assumed to be independently distributed as log-gamma with shape parameter ss;
| up | = | time-independent random effect of sire of cow (p = 1 to 3267) assumed to be distributed as multivariate normal with mean 0 and covariance matrix A 2u, where A is the additive relationship matrix between sires, and 2u is the sire of cow variance;
| q | = | log(log q);
| q | = | , where h0 is the base-line hazard function and q is insemination q.
|
The
parameter of the Weibull distribution is fixed to one in the grouped survival model.
The analysis was implemented using the Survival Kit, version 3.12 (Ducrocq and Solkner, 1998). The herd and service sire variances were evaluated by the tri-gamma function at the corresponding estimated gamma shape parameter.
Comparison Among Models
Spearman rank correlation coefficients among the methods for either sire of cow and service sire solutions (posterior means used in CTM and STM were calculated for each model). In addition, a cross-validation analysis was carried out as follows: predictability of the 3 methodologies was compared by taking 2 random samples of herds from the data set. Sample A contained 396 herds and 38,385 records, whereas sample B contained 371 herds and 41,672 records. The PTA for sires in both samples were calculated with the DPH (PTADPH-A and PTADPH-B, respectively) using parameters and variances estimated from the entire data set. Also, PTA for INS were obtained in both samples with the CTM (PTACTM-A and PTACTM-B, respectively) and the STM (PTASTM-A and PTASTM-B, respectively) using the corresponding model and variance components estimated from the whole data. Using logistic regression, sire PTA were converted into daughters probability of conception at each insemination by regressing binary indicators of pregnancy at each of first, second, third, and fourth inseminations for cows in data subset A (among daughters that had an opportunity to be inseminated at each time) on PTADPH-A, PTACTM-A, and PTASTM-A. Next, the expected number of daughters conceiving at each insemination in subset B was calculated as the probability of conception from data subset A, times the total number of daughters in subset B. The observed number of daughters that conceived or failed at each insemination in subset B was calculated for each sire (among daughters that had the opportunity to be inseminated at each time). The observed and expected number of conceptions and failures were compared using the sum across sires of
2 statistics as follows:
![]() |
where n is the number of sires with daughters in each insemination and subset. The model producing the smallest sum of
2 was viewed as the most accurate predictor of female fertility. Reciprocal analyses were computed using conception probabilities calculated from subset B to predict daughters fertility in subset A.
| RESULTS AND DISCUSSION |
|---|
|
|
|---|
|
|
Sequential Threshold Model
The estimated service sire variance was as in the CTM (0.021), but estimates of herd and sire variances (0.084 and 0.010, respectively) were lower. Although estimates of the service sire were about the same in both models, the STM accounts for a different service sire in each insemination event, which CTM does not do. The STM model gave a lower heritability estimate (0.038) than the 0.05 estimated with the CTM. Smaller posterior standard deviations were obtained for all genetic parameters (Table 2
), although the posterior coefficients of variation were about the same; e.g., 13.5 and 14% in CMT and STM, respectively, for
2u. The heritability estimate (0.038) from STM was similar to that of Veerkamp et al. (2001), studying number of services per lactation using a sire model, but higher than that of Kadarmideen et al. (2003) who found a heritability of 0.016 using a linear animal model for number of services per conception.
The ability to accommodate time-dependent explanatory variables is an advantage of the STM over the CTM when evaluating male and female fertility simultaneously. Also, heterogeneous sire and service sire variances at different inseminations can be fitted with this model, but this option was not pursued, for simplicity.
Grouped Survival Model
The estimated baseline survivor function is shown in Table 1
. At first insemination it was 0.5, which is the fraction of cows that had an opportunity to be inseminated at least twice. As expected, this fraction decreased as INS increased and, after 4 inseminations, only 6% of cows were not censored or had not yet achieved pregnancy.
Table 2
(bottom) shows estimated parameters of the log-gamma distributions of herd and service sire as well as their approximate variances obtained with the DPH. Gamma parameters for herd and service sire were 8.14 and 49.46, respectively, and their approximated variance functions were 0.13 and 0.02, respectively. The estimated sire variance was 0.011 on the log scale. Schneider et al. (2005) found sire variance estimates ranging from 0.012 to 0.019, using different baseline hazard functions, for interval between calving and last insemination. There are no similar estimates reported elsewhere using discrete proportional hazard models.
Comparison of Models
Strong Spearman correlations (0.97 to 0.98 in absolute value), among sire PTA with the 3 models were obtained (upper diagonal, Table 3
). The correlations of DPH with the other 2 models were negative because of different orientation of scale. A high PTA estimate in DPH indicates a higher probability of conception, whereas larger PTA values in CTM and STM mean larger number of inseminations to conception. The 3 methodologies would result in very similar sire rankings in a routine genetic evaluation.
|
The STM seems to be a useful approach for dealing with time-dependent effects on discrete responses, and it should be studied more intensively. The performance of STM may be improved by allowing the random effect variances to change from one service to the next. Also, the STM can consider "clustering" of insemination events between and within lactations for individual cows by including a residual covariance between liabilities of different inseminations. Implementation of a genetic effect for service sire could be studied as well, as this would be of interest to AI companies selecting for male fertility. A conceptual appeal of the CTM and STM is that parameters are easier to interpret; for example, heritability pertains to a linear, Gaussian scale of liability. This means that all the machinery of the infinitesimal model holds on the liability scale. This is not the case with the DPH model.
The sums of
2 statistics for each method and data subset are shown in Table 4
. The DPH and STM provided more accurate predictions for success in first insemination than did the CTM. However, CTM predicted probability of conception more accurately in subsequent inseminations. These results are surprising, because both STM and DPH were expected to deal more properly with time-to-event traits, time-dependent covariates, and with censoring. However, due to the categorical nature of INS, the DPH might not be a proper specification, as it assumes continuously recorded values that are grouped into categories. Further, heterogeneity of variance (due to hormonal treatments, voluntary waiting period between inseminations, and other management decisions) could decrease the accuracy of DPH, which originally postulates fixed time intervals. Time to third and fourth insemination can differ widely between cows. Also, assuming a constant baseline hazard in pregnancy for all inseminations (
= 1) may not be suitable for INS: intervals (from INS n to n+1) usually differ in length and cows with silent heats or early embryonic mortality are usually less fertile at the next insemination. The assumptions made on CTM are less strong in this sense. The STM had a similar but somewhat lower predictive ability than DPH in the data subset B, but STM was slightly better in subset A. In summary, the CTM tended to produce better predictions of conception to insemination than either DPH or STM, perhaps due to higher robustness.
|
| CONCLUSIONS |
|---|
|
|
|---|
2 statistic used are unknown, there are no tests available to assess their significance. Bootstrapping is computationally unfeasible for this data set. The results in this study suggest that STM may be better, globally, than DPH, and its interpretation is easier. Further, STM can be extended to allow for different variance components by insemination. The CTM provided satisfactory results, and is straightforward to program; however, it cannot deal with time-dependent covariates. Future studies could fit a genetic effect for service sires (via the relationship matrix) in DPH and STM; this would lead to genetic evaluations of male and female fertility, simultaneously. Bivariate analyses of INS and other fertility traits could also be of interest to estimate genetic correlations more accurately in the presence of censoring. This could slow down the rate of genetic degradation for fertility in field populations.
| ACKNOWLEDGEMENTS |
|---|
|
|
|---|
Received for publication May 16, 2005. Accepted for publication June 23, 2005.
| REFERENCES |
|---|
|
|
|---|
This article has been cited by other articles:
![]() |
M. T. Kuhn and J. L. Hutchison Prediction of Dairy Bull Fertility from Field Data: Use of Multiple Services and Identification and Utilization of Factors Affecting Bull Fertility J Dairy Sci, June 1, 2008; 91(6): 2481 - 2492. [Abstract] [Full Text] [PDF] |
||||
![]() |
O. Gonzalez-Recio, E. Lopez de Maturana, and J. P. Gutierrez Inbreeding Depression on Female Fertility and Calving Ease in Spanish Dairy Cattle J Dairy Sci, December 1, 2007; 90(12): 5744 - 5752. [Abstract] [Full Text] [PDF] |
||||
![]() |
O. Gonzalez-Recio, R. Alenda, Y. M. Chang, K. A. Weigel, and D. Gianola Selection for female fertility using censored fertility traits and investigation of the relationship with milk production. J Dairy Sci, November 1, 2006; 89(11): 4438 - 4444. [Abstract] [Full Text] [PDF] |
||||
![]() |
B. Heringstad, Y. M. Chang, I. M. Andersen-Ranberg, and D. Gianola Genetic analysis of number of mastitis cases and number of services to conception using a censored threshold model. J Dairy Sci, October 1, 2006; 89(10): 4042 - 4048. [Abstract] [Full Text] [PDF] |
||||
![]() |
Y. M. Chang, I. M. Andersen-Ranberg, B. Heringstad, D. Gianola, and G. Klemetsdal Bivariate Analysis of Number of Services to Conception and Days Open in Norwegian Red Using a Censored Threshold-Linear Model J Dairy Sci, February 1, 2006; 89(2): 772 - 778. [Abstract] [Full Text] [PDF] |
||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| HOME | HELP | FEEDBACK | SUBSCRIPTIONS | ARCHIVE | SEARCH | TABLE OF CONTENTS |