|
|
||||||||
,
,
,¶
* Department of Animal Sciences, University of WisconsinMadison, Madison 53706
Department of Dairy Science, University of WisconsinMadison, Madison 53706
Department of Biostatistics and Medical Informatics, University of WisconsinMadison, Madison 53706
Department of Animal and Aquacultural Sciences, Norwegian University of Life Sciences, N-1432 Ås, Norway
¶ Geno Breeding and AI Association, N-1432 Ås, Norway
1 Corresponding author: gdeloscampos{at}wisc.edu
| ABSTRACT |
|---|
|
|
|---|
Key Words: mastitis structural equation model recursive effect variance component
| INTRODUCTION |
|---|
|
|
|---|
In standard multivariate statistical treatment, a response variable cannot also appear as a predictor in a submodel for other responses. This imposes a severe restriction on the class of models that can be used. Specifically, simple recursive effects (i.e., an effect of one response on another), complex loops, or reciprocal effects (simultaneity of effects) cannot be inferred appropriately. Adequate modeling of relationships between traits may require specification of these types of effects. Structural equation models (SEqM; also known as SEM), however, allow such features to be incorporated into the statistical structure (Gianola and Sorensen, 2004). The term SEqM refers to stochastic models in which each equation represents a causal link, rather than a mere empirical association (Goldberger, 1972). The linear SEqM embeds the classical regression model, as well as models in which structural parameters do not correspond to regression coefficients between observable variables in the least-squares sense (Goldberger, 1972). A review of linear SEqM, including many types of recursive and nonrecursive models, can be found in Bielby and Hauser (1977).
The main objective of this study was to incorporate recursive and simultaneous effects into a bivariate model for MY and SCS. Models with different specifications for the recursive and simultaneous effects were compared using data from Norwegian Red (NRF) cows.
| MATERIALS AND METHODS |
|---|
|
|
|---|
![]() | [1] |
![]() | [2] |
![]() | [3] |
where
i is a random vector (mx1) of latent dependent (endogenous) variables;
is a vector (mx1) of intercepts for the endogenous latent variables; B (mxm) is a matrix defining recursive or simultaneous relationships between endogenous latent variables (zeros in the diagonal and, potentially, effects from response to response in the off diagonals);
(mxo) relates latent responses to latent predictors;
i is a random vector (ox1) of latent predictor (exogenous) variables;
i is a vector (mx1) of disturbances of the structural model; yi is a random vector (px1) of observable responses;
y is a vector (px1) of intercepts for yi;
y (pxm) is a matrix of coefficients relating observable endogenous variables (y) to their latent counterparts (
);
i is a px1 vector of random residuals; xi is a random vector (qx1) of observable variables, indicators of the latent predictors (
i);
x is a vector (qx1) of intercepts for xi;
x (qxo) relates observable indicators (x) to latent predictors, and
i (qx1) is a random vector of disturbances. If there is no measurement error on predictors, o = q,
x = I, and
= 0, so that xi =
x +
i. Similarly, if there is no measurement error on the responses, m = p,
y = I, and
= 0, so that yi =
y +
i.
The assumption made in LISREL is
![]() | [4] |
where NIID stands for normal, independent, and identically distributed vectors. With assumptions 1 through 4, the marginal likelihood is multivariate normal with a structured variancecovariance matrix and mean vector. The set of parameters entering in the structured (co)variance matrix and in the mean vector are the distinct elements of the set {
:
,
y,
x, B,
,
y,
x,
,
, 
, 
}. The LISREL software produces maximum likelihood estimates of these parameters by maximizing the marginal likelihood with respect to the elements of
. Missing values are handled via the expectation maximization algorithm (Jöreskog and Sörbom, 2005).
Data
The data consisted of records on 33,453 first-lactation daughters of 245 NRF sires having their first progeny test in 1991 and 1992. Only records of cows with a first calving in 1990 through 1992 and from herds with at least 5 daughters of any of these bulls were included. A total of 4,961 herds were represented in the data. Test-day records for MY and SCS, and information about reported veterinary treatments of clinical mastitis (CM) were available. Only cows with at least 3 test days were included in the analysis.
First lactation was divided into five 60-d periods starting at calving. For each cow, a test-day record (the one closest to the midpoint of the period) was assigned to each period. For each test day, a dummy variable indicating the presence or absence of an event of CM in the 15-d period prior to the test day was created. With this definition of CM, it was assumed that CM events occurring prior to or subsequent to the 15-d period did not affect the SCC or MY.
Models
The LISREL software does not allow fitting cross-classified random effects such as herd-year and sire. To circumvent this problem, records were precorrected with predictions of herd effects obtained from univariate mixed linear models where sire, cow, and herd were treated as random. Details about the models used to obtain these predictions are presented in the Appendix.
Structural equation models were fitted to precorrected data. The sire effects on SCS and MY were fitted using the multiple-groups approach suggested by Muthén (1989), and the cow effects on each trait were fitted using a "common factor" specification. Figure 1
gives a graphical representation of a within-subjects model with simultaneity of effects between SCS and MY. For clarity of the figure, subject subscripts, residuals, and intercepts were omitted. Milk yield and SCS are denoted as MYi and SCSi, respectively, where i indexes lactation period (i = 1, 2, . . ., 5). An intercept was fitted for each lactation period for each trait; the intercept is interpretable as an effect specific to period i. Milk yield was assumed to be affected by effects of age at first calving (AGE in months), by sire and cow (S1 and C1, respectively), and by recursive effects of SCS. On the other hand, SCSi was assumed to be affected by CMi, by sire and cow effects (S2 and C2, respectively), and by a recursive influence of MYi (i = 1, 2, . . ., 5). All but test-day effects were restricted to be the same across test days. A covariance between random effects of sire on MY and on SCS was included, as well as a covariance between random effects of cows on each of the traits (double-headed arrows in Figure 1
).
|
![]() | [5] |
where
![]() |
is the vector of records on MY and SCS responses in the 5 periods for the jth daughter of sire i;
![]() |
is the vector of intercepts specific to each trait by test-period combination; and
![]() |
is a matrix of recursive effects, with
1 being the recursive effect of MY on SCS, and
2 being the opposite effect. Further,
![]() |
is the vector of records on exogenous variables (age at calving, CM status) for the jth daughter of sire i, treated as an observable random variable in LISREL.
![]() |
is a matrix of direct effects of predictors on responses, where
1 is the effect of age at calving on MY and
2 is the effect of an event of CM on SCS on the same test day; si = (s1i s2i)' and cij = (c1ij c2ij)' are vectors of the random effects of the ith sire, containing the effects of sire on MY (s1i) and on SCS (s2i), and of the ijth cow on MY (c1ij) and on SCS (c2ij), respectively, and
![]() |
Finally,
ij in specification [5] is a vector of model residuals for records of the jth daughter of sire i.
The following multivariate normal distribution was assumed for the vector of all exogenous effects and residuals:
![]() | [6] |
where µx is a vector of means of the x variables;
x, 6 x 6, is the (co)variance matrix of xij;
![]() |
![]() |
are the between-sire and between-cow (co)variance matrices, respectively; and
![]() |
is the 10 x 10 covariance matrix of model residuals (i = 1, 2, . . ., 5). This specification assumes that residuals for MY and SCS records made on the same test day were correlated with covariance
MY,SCS, and that residuals of records on different test days, either of MY or SCS, were uncorrelated and heteroskedastic. Note that in specification [6], data are modeled as independent clusters of half-sib families; this was because LISREL does not allow inclusion of pedigree information.
The specification of recursive effects in the matrix,
, was modified so as to generate 4 alternative models. Model 1a (M1a), did not contain recursive effects (
1 =
2 = 0). Model 2 (M2) included an effect from SCS on MY, with
1 = 0; model 3 (M3) included an effect from MY on SCS but not the opposite, with
2 = 0. Finally, M4 was a model with simultaneity, as in Figure 1
.
The LISREL software provides estimates of the (co)-variance matrices of the sire, cow, and residual effects. In an SEqM, genetic and environmental components of variances and covariances are functions of the sire, cow, and residual covariance matrices and of
as well. A detailed discussion on how recursive and simultaneous effects affect variance components can be found in Gianola and Sorensen (2004). The formulas required for computing these estimates from the simultaneous effects in specification [5] are derived in the Appendix. Based on these formulas and on LISREL estimates of dispersion parameters, estimates of the genetic and environmental components of variances and covariances were computed.
D in specification [5] is a matrix of direct effects of predictors on responses. Indirect (i.e., effects mediated by paths connecting response variables) and total effects of predictors on responses can be calculated using the reduced form of Equation [5] (see Equation [7] in the Appendix) and from estimates of
D and of
. In this case, the matrix of total effects is
T =
1
D, where
= I
. Further, the matrix of indirect effects is
I =
T
D = (
1 I)
D.
As discussed above, pedigree information could not be incorporated in the SEqM fitted with LISREL. To evaluate the potential impact of this restriction, a standard bivariate mixed model was fitted to the same data (M1b, no recursive effects, pedigree information using relationships due to sires and maternal grandsires). The specification of this model is described in the Appendix. Its structure is similar to that of M1a. Estimates of variance components from M1b are presented later as a reference.
Models fitted with LISREL were compared using the Bayesian information criterion (BIC; Schwartz, 1978). The BIC was computed using the saturated model as the alternative hypothesis. In the saturated model, each of the elements of the (co)variance matrix and mean vector of observable variables is treated as a parameter; that is, the number of parameters is p + ¹/3p(p + 1), where p is the number of observables, responses, and predictors (Raftery, 1995). The restricted model is the model being fitted, with the corresponding structured mean vector and (co)variance matrix. According to Raftery (1995) and for a sample size of 10,000, a difference of 6 or more points in BIC provides strong evidence in favor of the model with the smaller BIC value. If 2 models are compared in a Bayesian context and equal prior probability is given to each of the models, then a BIC difference of 6 to 10 points for a sample size of 10,000 can be associated with a posterior probability for the model with smaller BIC of 0.95 to 0.99.
| RESULTS |
|---|
|
|
|---|
|
|
|
|
|
|
Table 6
shows estimates of the phenotypic and additive genetic (co)variance components, as well as estimates of heritability, genetic correlations, and phenotypic correlations from the 5 models. Heritability of MY ranged from 0.13 to 0.15, and heritability of SCS ranged from 0.09 to 0.11. The genetic correlation between SCS and MY was positive, ranging from 0.34 (M2 and M4) to 0.45 (M1b). In general, small changes in estimates of variance components were observed when recursive effects were included. The most striking change was in the phenotypic correlations, which were 0.08 and 0.07 in models without recursive effects (M1a and M1b), but became more strongly negative (0.23) when an effect from SCS on MY was included (M2 and M4).
|
| DISCUSSION |
|---|
|
|
|---|
The estimated effect of age at calving on MY of approximately 0.2 kg/d per additional month of age (Table 3
) agrees with Carlén et al. (2004). Using classes of age at first calving, they estimated an increase of 62 kg (305-d lactation) per additional month of age at calving; roughly, this gives an average effect of around 0.2 kg/ d of lactation per month of age.
Dilution Effect, Disease Effect, or Simultaneity?
The comparison criterion used in this study provided evidence in favor of models having a recursive effect of SCS on MY (disease effect), but without a reciprocal effect of MY on SCS (dilution). The model with simultaneous effects (M4) had a larger BIC than M2. When included, the effect of MY on SCS was small and lacked statistical significance. Further, the genetic correlation was positive. Under the assumptions of the model, this indicates that the negative phenotypic correlation between traits found within SEqM M2 and M4 was partly due to a negative effect of disease on MY, and that a dilution effect would not be an important cause of this phenotypic correlation.
Genetic Parameters
No major changes in heritabilities, in additive variances of MY and of SCS, or in the genetic correlation were observed when recursive effects were included. Trait definition, statistical models, and assumptions vary among studies, which makes comparison of estimates difficult. Here, both traits were daily MY and SCS. Heritability estimates from this study were expected to be lower than those obtained when responses are defined as total (or average) lactation MY or SCS. Further, heritability estimates are within herd, because records were precorrected for herd effects.
The estimated heritability of SCS of models without recursive effects (0.11 and 0.09 in M1a and M1b respectively; Table 6
) was smaller than the average heritability of 0.15 reported in a review by Mrode and Swanson (1996). Rupp and Boichard (1999) and Carlén et al. (2004) reported higher estimates, 0.17 and 0.14, respectively, both for first-lactation cows. However, Ødegård et al. (2004), analyzing 1.4 million records of first-lactation NRF cows, reported an estimate of 0.11, which is in agreement with the estimate from this study, and is not surprising because both estimates were obtained from samples of the NRF population.
Estimates of heritability of MY (0.13 to 0.15) were lower than most literature values; for example, Carlén et al. (2004) and Rupp and Boichard (1999) reported heritability estimates of 0.34 and 0.20, respectively, for 305-d MY in first-lactation cows. van Tassell et al. (1999) estimated heritability of 305-d MY in first-lactation cows by breed and group within breed, with groups defined according to within-herd standard deviations. Their estimates ranged from 0.18 to 0.51 depending on the breed and group within breed, and illustrate how strongly variance components of MY vary with the population.
Most studies have reported a positive genetic correlation between SCS and MY. Using a lactation average of SCS and 305-d adjusted MY in first-lactation cows, Carlén et al. (2004) and Rupp and Boichard (1999) reported genetic correlations of 0.22 and 0.15, respectively. Schutz (1994) suggested that a possible cause of this positive genetic correlation is that highly productive animals are more liable to mastitis. Apart from measurement error (e.g., false negative cases of CM), this should not be a cause of genetic correlation in this study, because CM was included as an explanatory variable. Also, in this study, both SCS and MY were recorded on the same day, whereas SCS is usually an average of lactation scores and MY is total milk production. Nevertheless, estimates of genetic correlation (0.34 to 0.45) were larger than those of Carlén et al. (2004) and Rupp and Boichard (1999). Finally, within SEqM, the lower estimate of genetic correlation found when recursive or simultaneous effects were fitted (decreasing from 0.45 in M1a to 0.34 in M2 and M4) was mainly due to the negative effect of SCS on MY.
Future Improvements
Several assumptions needed to be made because of restrictions imposed by LISREL, the information available (data and predictor variables), and parameter identification issues. A critical restriction was that pedigree information was not incorporated in the SEqM fitted with LISREL. Fitting SEqM to quantitative genetic data without that restriction will require software development. As shown by Gianola and Sorensen (2004), a Bayesian implementation can be based on a modified version of the standard Gibbs sampler.
Identification in models with simultaneity of effects between 2 traits requires assuming one exclusion restriction for each of the responses involved in the simultaneous system (here, the direct effect of age was excluded from the model of SCS and the direct effect of CM was excluded from the model for MY), and results critically depend on these assumptions. Future research considering other instruments will help in achieving a better understanding of the relationships between these 2 traits.
The data set from this study included records on first-lactation cows only. Extension to multiple lactations is straightforward. A hypothesis of interest is whether recursive or simultaneous effects vary across lactations. In this context one might also postulate lagged recursive effects between records of different lactations (e.g., exploring the carryover effects of SCS or MY between lactations). Given the longitudinal nature of the data, a random regression approach, as opposed to the repeatability model used here, is appealing. However, it is important to note that simultaneous effects between traits make sense only for records of the same test day, because simultaneity of effects implies the notion of instantaneous feedback (e.g., Wright, 1960).
Finally, the standard quantitative genetic model can be considered as a special SEqM without recursive or simultaneous effects. More flexible models can be obtained by expanding the standard model to allow fitting of such effects (Gianola and Sorensen, 2004). In particular, SEqM is appealing for joint modeling of production and disease (as done here), production and fertility, or diseasedisease relationships.
| CONCLUSIONS |
|---|
|
|
|---|
Clinical mastitis was associated with an increase in SCS. If an event of mastitis was observed in any of the 15 d prior to test day, the SCS was expected to increase by 0.8 and MY was expected to decrease by about 1 kg/ d on the following test day.
Estimates of cow, sire, and residual variances and covariances changed substantially when recursive or simultaneous effects were fitted. However, additive and environmental variances and covariances did not change much when those effects were included, the exception being the phenotypic correlation between SCS and MY. This is not surprising, because inclusion of recursive or simultaneous effects is a form of partially modeling causes of (co)variability. This suggests that causal components of variances and covariances are mainly data (and not model) driven. However, even though models with or without recursive effects may provide similar estimates of variance components, the interpretation and use of the model for selection purposes would differ if recursive effects exist.
| APPENDIX |
|---|
|
|
|---|
2s), where A is the additive relationship matrix due to sires and maternal grandsires, and
2s is the sire variance. Herd, cow, and residual effects were assumed to be normally and independently distributed, with null means and variances
2h,
2c and
2e, respectively. The sire pedigree file had 437 males, including the 245 sires with daughters, in the data set. These univariate models were fitted using the DMU software (Madsen and Jensen, 2000).
Genetic and Environmental (Co)variances in SEqM
The genetic, permanent environmental and residual covariance matrices associated with Model 5 are obtained from the reduced form of the model. Consider the hth (h = 1, 2, . . . ., 5) test-day record of the jth daughter of sire i,
![]() |
and solve the equation for the response variables, to obtain
![]() | [7] |
Then the phenotypic (co)variance matrix of the hth test-day record is
![]() | [8] |
where
![]() |
Likewise, the additive genetic variancecovariance matrix is
![]() | [9] |
Replacing parameters in the right hand sides of Equations [8] and [9] by their estimates leads to estimates of the phenotypic and genetic (co)variance matrices, respectively.
Bivariate Mixed Linear Model with Standard Assumptions
A bivariate mixed linear model (M1b), under standard assumptions, was fitted to MY and SCS (both net of herd effects) using the DMU software (Madsen and Jensen, 2000):
![]() | [10] |
Here, my and scs are vectors of precorrected records (with predictions of herd effects, as described in this Appendix) of MY and SCS, respectively; ß1 contains effects of lactation period (five 60-d periods) and age at calving on MY, and X1 is the corresponding incidence matrix. Likewise, ß2 contains effects of lactation period on SCS and of the 5 dummy variable indicators of CM events on each lactation period; X2 is the appropriate incidence matrix. Further, u1 (u2) and v1 (v2) are vectors of sire and cow effects on trait 1 (2), and the Z are incidence matrices. Finally,
1 and
2 are the vectors of model residuals. The following multivariate normal distribution was assumed for the vector of random effects:
![]() |
where A is the additive relationship matrix due to sires and maternal grandsires, and
![]() |
are the sire, cow, and residual variance covariance matrices. These matrices are the mixed-model counterparts of the 3 matrices that intervene in Equation [8]. For instance,
2u1 is the counterpart of
2S1 and
2
1
2 is the counterpart of
2
1,2.
| ACKNOWLEDGEMENTS |
|---|
|
|
|---|
Received for publication November 28, 2005. Accepted for publication May 30, 2006.
| REFERENCES |
|---|
|
|
|---|
This article has been cited by other articles:
![]() |
A. B. Samore, A. F. Groen, P. J. Boettcher, J. Jamrozik, F. Canavesi, and A. Bagnato Genetic Correlation Patterns Between Somatic Cell Score and Protein Yield in the Italian Holstein-Friesian Population J Dairy Sci, October 1, 2008; 91(10): 4013 - 4021. [Abstract] [Full Text] [PDF] |
||||
![]() |
S. Konig, X. L. Wu, D. Gianola, B. Heringstad, and H. Simianer Exploration of Relationships Between Claw Disorders and Milk Yield in Holstein Cows via Recursive Linear and Threshold Models J Dairy Sci, January 1, 2008; 91(1): 395 - 406. [Abstract] [Full Text] [PDF] |
||||
![]() |
X.-L. Wu, B. Heringstad, Y.-M. Chang, G. de los Campos, and D. Gianola Inferring Relationships Between Somatic Cell Score and Milk Yield Using Simultaneous and Recursive Models J Dairy Sci, July 1, 2007; 90(7): 3508 - 3521. [Abstract] [Full Text] [PDF] |
||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| HOME | HELP | FEEDBACK | SUBSCRIPTIONS | ARCHIVE | SEARCH | TABLE OF CONTENTS |