|
|
||||||||


* Department of Animal and Dairy Science, University of Georgia, Athens, GA 30602
Department of Dairy Science, University of Wisconsin, Madison, WI 53706
| ABSTRACT |
|---|
|
|
|---|
Key Words: structural model covariance component genotype x environment interaction Bayesian method
Abbreviation key: DIC = deviance information criterion, GS = genetic similarity, MS = management similarity, MCMC = Markov chain Monte Carlo
| INTRODUCTION |
|---|
|
|
|---|
In international genetic evaluation, for example, "traits" are defined according to country borders, e.g., milk yield in Italy and in the Netherlands is treated as a different trait. However, similarity in production systems between countries depends not only on geographical proximity but, importantly, on genetic ties and likeness of climatic conditions and management practices including quantity and quality of nutrient resources. Similarity between herds in different regions or countries can be used to describe genetic covariances. This would allow making better use of the available information, provided that the resulting model has fewer parameters and is not inferior in goodness of fit.
Information about climate, herd management practices, and genetic ties between regions (countries) can be exploited to explain patterns in a genetic covariance matrix using a structural model. If such model results in a lower degree of parameterization, it may lead to more precise inferences, because the same amount of information is used to estimate fewer parameters than in a standard multiple-trait analysis. In some animal breeding applications, patterns in covariance structure are sometimes easy to identify (e.g., correlations between milk yields during lactation tend to be larger for adjacent test days).
The objective of this study was to present methodological developments for a structural model for genetic covariances, and to illustrate it via an application to first-lactation records of Holstein cows from five geographical regions of the United States.
| MATERIAL AND METHODS |
|---|
|
|
|---|
![]() | ([1]) |
where: y = y1', y2', ...,yt')' is the vector of records for the t traits; ß and u are vectors of location effects and of additive genetic values, respectively; X and Z are the corresponding incidence matrices, and e is a residual vector. The data are assumed to be generated according to the stochastic process:
![]() | ([2]) |
where R = I
R0, and R0 = {rij} is a matrix of residual covariances between the t traits being analyzed.
In a Bayesian setting, prior information about unknown parameters or features of the model must be specified. For the location effects, the following vague normal prior will be used:
![]() | ([3]) |
Here,
is a positive, scalar hyper-parameter that is large enough so as to convey weak prior precision. For additive genetic values, the usual multiple-trait normal distribution will be adopted as prior:
![]() | ([4]) |
where A is a matrix of additive relationships among animals, and
g = {gij} is a t x t positive-definite genetic covariance matrix, with typical element gij (i,j=1,2,...,t). The priors for the elements of covariance matrices R0 and
g were taken to be uniform (on t x t hyper-cubes), but bounded between minimum and maximum values, as follows:
![]() | ([5]) |
![]() | ([6]) |
where k is a constant.
Prior independence between ß, u, R0, and
g was assumed, but allowing for the dependence stated in [4]
. Thus, the joint prior density has the form:
![]() | ([7]) |
with: -
(ß,u)
; rij,min
rij
rij,max and gij,min
gij
gij,max.
In a standard multiple-trait analysis, all full conditional distributions needed for Bayesian implementation of a Markov Chain Monte Carlo (MCMC) algorithm, such as the Gibbs sampler, can be derived without difficulty (Varona et al., 1994; Wakefield et al., 1994; Wang et al., 1994).
Structural Model
The structural model for the genetic covariances involves an alternative parameterization of the multiple-trait model, aimed at explaining all possible t(t-1)/2 genetic covariances in terms of a fewer number of parameters. Here, the genetic covariance between each pair of traits is written as a linear function of a set of explanatory variables:
![]() | ([8]) |
where b is a vector of effects explaining the genetic covariance between traits i and j, and k'ij is a corresponding row incidence vector of explanatory variables. In matrix notation, g = Kb, where g is a t(t-1)/2 x 1 vector of genetic covariances, and K is a t(t-1)/2 x r incidence matrix, where r is the order of vector b.
Under the new parameterization, the parameters of the structural model are: 1) ß and u, as defined before; 2) the diagonal elements of the genetic covariance matrix, gii(i = 1,2,...t); 3) the r parameters (b) of the structural model for the covariances, and 4) the residual variance-covariance matrix. The sampling model remains as defined previously in [2]
. The joint posterior density, after assigning a prior distribution to each of the new parameters of the model, is:
![]() | ([9]) |
![]() |
Priors for ß and u are as in [3]
and [4]
, respectively. Specific assumptions about the new parameters in [8]
are:
![]() | ([10]) |
where
is a diagonal matrix of order r, with sufficiently large elements so as to make the prior distribution of b diffuse (not very informative). Further,
![]() | ([11]) |
and, p(R0)
k as in [5]
. In [11]
,
i and
are hyper-parameters of a scaled inverted chi-square distribution.
The conditional posterior distributions of ß, u, and R0 are as in the standard multiple-trait analysis, that is, normal for location parameters and breeding values, and scaled inverted Wishart for the residual variance covariance matrix (Wang et al., 1994, Wakefield et al., 1994; Rekaya, 1997). After augmenting the posterior distribution with "missing" records, whenever this is appropriate, the conditional posterior distributions of parameters of the structural model (b) and of the diagonal elements of the genetic covariance (gii) follow from [9]
by treating all other parameters as constant. This leads to:
![]() | ([12]) |
and:
![]() | ([13]) |
for i=1,2,...,t
These conditional distributions do not involve the sampling model [2]
. All information about parameters of the structural model comes through the additive genetic values, and from the prior distribution of the parameters. Unfortunately, at least with this parameterization, conditional distributions [12]
and [13]
are not in closed form. Therefore, draws from these distributions (e.g., with Gibbs sampling) cannot be done directly. However, a rejection scheme (Gilks, 1992) or a Metropolis algorithm (Metropolis et al., 1953) can be used instead.
Model Comparison
The standard multiple-trait model and the structural model can be compared using a deviance information criterion (DIC), defined by Spiegelhalter et al. (1998) as:
![]() | ([14]) |
![]() | ([15]) |
is the posterior expectation of the Bayesian deviance
![]() | ([16]) |
and pD, the "effective number of parameters" (see Appendix), is:
![]() | ([17]) |
where
is the posterior mean of parameters intervening in the sampling model [2]
.
A smaller value of DIC indicates a better fit of the model.
Model comparison is often based only on a measure of fit, or "deviance" between observed and predicted data. However, increasing model complexity, through adding more parameters, typically leads to a better fit. A fair comparison between models should consider both measures of fit and model complexity. From [14]
, it is seen that DIC engulfs these two criteria, as
and PD are measures of fit and complexity, respectively. In our context, the Bayesian deviance in [16]
is:
![]() |
where N is the order of yi (i=1,2,...,t) assuming that all traits are recorded in all individuals. Note that the Bayesian deviance decreases as the sum of squares of errors decreases, the latter constituting the usual measure of fit.
Asymptotically, the effective number of parameters is approximately equal to the difference between the expected deviance and the deviance evaluated at the mean value of the parameters, as indicated in [17]
. In the Appendix, a straightforward derivation of DIC is given. Using [17]
, equation [14]
can be rewritten as:
![]() |
illustrating that DIC consists of a measure of goodness of fit D(
), with a penalty (2pD) for increasing model complexity. Hence, if two models have the same goodness of fit, the DIC criterion would favor the one with fewer parameters. The DIC is easy to estimate in a MCMC implementation. At each iteration of MCMC, the Bayesian deviance, D(
), is computed. At the end of iteration, the mean of the Bayesian deviance,
, as well as the mean value of the model parameters,
, is calculated, leading directly to [14].
Application to Holstein Data from Five Regions of the United States
Data.
The objective was to illustrate the structural model using Holstein production records. Observations were first-lactation milk yields of daughters of AI sires in five regions of the United States (Midwest = 1, Northeast = 2, Southeast = 3, Southwest = 4, and Northwest = 5). After edits (
10 records per sire and DIM
275) the data consisted of 3,465,334 lactations from 43,755 sires in 336,538 herd-year-seasons. Years represented were 1985 to 1997, and seasons were formed using 3-mo classes, starting by December-February, etc. Number of herd-year-season classes, sires and records, by region, are in Table 1
. Most cows were from regions 1, 2, and 3. The pedigree file included 46,095 bulls.
|
. In both analyses, the following sire model was used:
![]() |
where: yijkl(r) is the milk yield of daughter l of sire k in region r (r = 1,2,...,5); HYSi(i = 1,2,...,336,538) is the effect of herd-year-season i; ACj (j=1,2,...,4) is the effect of age at calving j;
r is a region-specific regression coefficient on days in milk (DIM); sk(r) is the transmitting ability of sire k in region r and eijkl(r) is the residual term. Residual covariances between regions are zero, this leading to a simpler expression for D(
).
Structural model for the genetic covariance.
A three-parameter model was used to describe 10 genetic correlations (5 traits). Following equation [8]
, the additive genetic covariance (4 times the "sire" covariance) between regions i and j was described as:
![]() | ([18]) |
where µ is an intercept, common to all off diagonal elements of the covariance matrix; GS(i,j) and MS(i,j) are measures of genetic and management similarity between regions i and j, respectively, and b1 and b2 are regression coefficients. Following the notation in [8]
, b = (µ,b1,b2)'.
Definition of Genetic and Management Similarity
Genetic similarity between regions i and j was defined arbitrarily as the ratio between the number of daughters of bulls used in common in the two regions and the total number of daughters of all bulls used in the pair of regions:
![]() |
where C(i, j) is the number of bulls in common used in regions i and j, T(i, j) is the total number of bulls used in the two regions, and NDkr is the number of daughters of bull k in region r (r = 1,2). As noted, the definition of genetic similarity is arbitrary; an alternative could have been C(i, j) / T(i, j), without making reference to differential usage of sires.
Management similarity was arbitrarily defined as the ratio between the quantity of concentrate used to produce 1000 kg of milk (F1000)in regions i and j.
![]() |
To ensure symmetry of the genetic variance-covariance matrix, the restriction MS(j,i) = MS(i,j) was imposed as noted. An alternative measure of similarity can be
. Values of GS(i,j) and MS(i,j) for each pair of regions are in Table 2
. The values ranged between 0.35 and 0.46 for genetic similarity, and 0.85 to 1.34 for management similarity.
|
= 108, so the prior distribution of ß in [3]
![]() |
In this application, the residual variance-covariance matrix (R0) was diagonal and we used the following proper, independent, priors for each of its diagonal elements:
![]() |
In the standard multiple-trait analysis, the prior for each of the elements of the genetic variance covariance matrix was:
![]() |
where gij,max = 106 for all i and j, and
![]() |
In the structural model, the priors for gii (i=1,2,...,5), the diagonal elements of the genetic covariance matrix, were as for the standard multiple-trait approach. Each of the elements of b was assigned a normal prior with mean 0 and variance 108. This takes an a priori stance of "no effect" of either genetic or management similarity on gij, but with a sufficiently large uncertainty, such that their effects range with high probability between -5x104 and 5x104 in the prior distribution.
Proper priors were used throughout, thus ensuring propriety of the posterior distribution. It must be noted that the structural model confers more flexibility in the assignment of priors to components of the genetic covariance matrix than the standard multiple-trait approach.
Implementation
For the standard multiple-trait model, all conditional posterior distributions needed for Gibbs sampling were in closed form, as pointed out earlier. In the structural model, the conditional posterior distributions of b = (µ,b1,b2)' and those of the diagonal elements of the genetic (co)variance matrix, gii, (i=1,2,...,5), were not in a recognizable form. The b parameters were updated via the Metropolis algorithm (Metropolis et al., 1953), while the Metropolis-Hastings algorithm (Hastings, 1970) was used for updating the diagonal elements, gii. For the parameters in b, a normal proposal density was used, and the updating was as follows. A random sample was drawn from the proposal distribution, centered at the current value of the Markov chain, and with an arbitrary variance such that an acceptance rate of 30 to 50% could be obtained, as recommended by Carlin and Louis (1996). The variances of the proposal distributions for µ,b1,b2 were 900, 200, and 70, respectively. We calculated the ratio between the posterior density evaluated at the candidate value and at the current value. If the ratio was
1, we accepted the candidate value. If the ratio was <1, the chain moved to the next step with probability equal to this ratio. For the diagonal elements of the genetic covariance matrix, a scaled inverted chi-square distribution was used as proposal. Its mean was equal to the current value of the Markov chain, and its variance was equal to the square root of the current value multiplied by 10. When draws for the genetic variance-covariance matrix were obtained (for both models), it was ensured that the matrices were positive-definite and consistent with the boundaries introduced by the prior distributions.
| RESULTS |
|---|
|
|
|---|
|
|
|
|
|
|
|
| CONCLUSIONS |
|---|
|
|
|---|
The structural model is an extension to the multiple-trait domain of models for heterogeneous variances described by Foulley et al. (1990, 1992). An appealing further extension would consist of imposing a structure both on the genetic covariances and on the residual dispersion components. The latter are known to be heterogeneous in dairy cattle breeding and such heterogeneity has been related to explanatory factors (Weigel et al., 1993).
| APPENDIX |
|---|
|
|
|---|
with p being the number of elements it possesses. From [17]
![]() |
Expand now D(
) up to the second order around
:
![]() | ([19]) |
Recall that D(
) = -2 log p(y|
) = -2L. Then:
![]() | ([20]) |
where L is the log-likelihood, and L' and L'' are the first and second derivatives of the log-likelihood with respect to
, respectively. Using [20]
in [19]
:
![]() | ([21]) |
Asymptotically, it is well known that:
![]() | ([22]) |
where
is the maximum likelihood estimator. From [21]
, and replacing
by
,
![]() | ([23]) |
Under [22]
it can be shown that
has an asymptotic chi-square distribution with p degrees of freedom (Searle, 1971). From [22]
one can write:
![]() | ([24]) |
Then,
is chi-square on p degrees of freedom, if and only if,
is idempotent with rank p. This is the case, as
is the identity matrix.
Finally, taking expectations in [23]
with respect to the posterior distribution of
leads to the asymptotic results:
![]() |
and
![]() |
as in [17]
. Recall that asymptotically,
.
Corresponding author: R. Rekaya; e-mail:
rekaya{at}uga.edu.
Received for publication May 21, 2001. Accepted for publication July 12, 2002.
| REFERENCES |
|---|
|
|
|---|
This article has been cited by other articles:
![]() |
P. Waldmann, J. Hallander, F. Hoti, and M. J. Sillanpaa Efficient Markov Chain Monte Carlo Implementation of Bayesian Analysis of Additive and Dominance Genetic Variances in Noninbred Pedigrees Genetics, June 1, 2008; 179(2): 1101 - 1112. [Abstract] [Full Text] [PDF] |
||||
![]() |
J. Bohmanova, I. Misztal, S. Tsuruta, H. D. Norman, and T. J. Lawlor Short Communication: Genotype by Environment Interaction Due to Heat Stress J Dairy Sci, February 1, 2008; 91(2): 840 - 846. [Abstract] [Full Text] [PDF] |
||||
![]() |
H. Hammami, C. Croquet, J. Stoll, B. Rekik, and N. Gengler Genetic Diversity and Joint-Pedigree Analysis of Two Importing Holstein Populations J Dairy Sci, July 1, 2007; 90(7): 3530 - 3541. [Abstract] [Full Text] [PDF] |
||||
![]() |
J. R. Bryant, N. Lopez-Villalobos, J. E. Pryce, C. W. Holmes, D. L. Johnson, and D. J. Garrick Environmental Sensitivity in New Zealand Dairy Cattle J Dairy Sci, March 1, 2007; 90(3): 1538 - 1547. [Abstract] [Full Text] [PDF] |
||||
![]() |
P. J. Boettcher, D. Caraviello, and D. Gianola Genetic Analysis of Somatic Cell Scores in US Holsteins with a Bayesian Mixture Model J Dairy Sci, January 1, 2007; 90(1): 435 - 434. [Abstract] [Full Text] [PDF] |
||||
![]() |
S. Konig, G. Dietl, I. Raeder, and H. H. Swalve Genetic Relationships for Dairy Performance Between Large-Scale and Small-Scale Farm Conditions J Dairy Sci, November 1, 2005; 88(11): 4087 - 4096. [Abstract] [Full Text] [PDF] |
||||
![]() |
H. D. Norman, P. M. VanRaden, R. L. Powell, J. R. Wright, and W. R. VerBoort Effectiveness of National and Regional Sire Evaluations in Predicting Future-Daughter Milk Yield J Dairy Sci, February 1, 2005; 88(2): 812 - 826. [Abstract] [Full Text] [PDF] |
||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| HOME | HELP | FEEDBACK | SUBSCRIPTIONS | ARCHIVE | SEARCH | TABLE OF CONTENTS |