|
|
||||||||


,*
* Department of Animal Science, Agricultural University of Norway,N-1432 Ås, Norway
Department of Animal Breeding and Genetics,Danish Institute of Agricultural Sciences,Research Centre Foulum, DK-8830 Tjele, Denmark
Department of Animal Sciences, University of Wisconsin-Madison, Madison 53706
Corresponding author: J. Ødegård; e-mail: jorgen.odegard{at}ihf.nlh.no.
| ABSTRACT |
|---|
|
|
|---|
Key Words: Bayesian method mixture model somatic cell score udder health
Abbreviation key: CM = clinical mastitis, IMI- = intramammary infection absent, IMI+ = intramammary infection present, PE = permanent environment, PPM = posterior probability of putative mastitis, PMC = probability of misclassification
| INTRODUCTION |
|---|
|
|
|---|
Some studies have found that a low initial SCC is associated with increased susceptibility and severity of subsequent mastitis (Schukken et al., 1994; Shuster et al., 1996; Schukken et al., 1999; Suriyasathaporn et al., 2000). Hence, selection for a very low SCC might impair the cows ability to respond to infection. Therefore, statistical models for indirect genetic evaluation of susceptibility to mastitis should allow for heterogeneity in the distribution of SCS between healthy and diseased cows.
Finite mixture models are potentially useful when there is some underlying group structure in the data (McLachlan and Peel, 2000), and are typically used when the group membership of the different observations is unknown. For example, different types of mixture models have been used for identification of risks in disease mapping (Militino et al., 2001), for QTL detection (Jansen, 1992; Kao and Zeng, 1997), and for genetic analyses aiming to account for preferential treatment (Kuhn et al., 1999). Using mixture models, cows can be assigned to different subpopulations (e.g., healthy or diseased) via posterior probabilities estimated from the SCS data (Detilleux and Leroy, 2000). For SCS, a mixture model in its simplest form may be used to assign observations to two components, e.g., SCS1 and SCS0, hopefully representing SCS from cows with (IMI+) and without (IMI-) IMI (mastitis), respectively. Then, identification and culling of animals at risk could be based on the posterior probability of putative mastitis, given SCS, rather than on crude SCS. Mixture models might make better use of the information contained in SCS data, so there is a need to develop such models in the context of selection for reduced incidence of mastitis.
Detilleux and Leroy (2000) described a homoscedastic two-component mixture model with genetic effects and used the EM-algorithm for inference by maximum likelihood. The objective of our study was to develop a heteroscedastic two-component mixture model including both additive genetic and permanent environmental (PE) effects, using a Bayesian approach via the Gibbs sampler for inference. More detailed descriptions of the theory of the Gibbs sampler are given by Casella and George (1992) and Sorensen and Gianola (2002). A Gibbs sampler for a mixture model without random effects is described in McLachlan and Peel (2000).
| MATERIALS AND METHODS |
|---|
|
|
|---|
Conditionally on z, it was assumed that the observations could be described by the linear model:
![]() | ([1]) |
where:
| y | = | (n x 1) data vector;
| ß0 | = | (f0 x 1) vector of fixed effects common to all cows;
| ß1 | = | (f1 x 1) vector of fixed effects peculiar to cows with mastitis;
| a | = | (qa x 1) vector of random additive genetic effects;
| p | = | (qp x 1) vector of random permanent environmental effects;
| Mz | = | (n x n) diagonal matrix with typical element zi;
| e | = | (n x 1) vector of random residuals; and
|
X0, X1, Za, and Zp are incidence matrices of appropriate order.
The conditional distribution of y, given the status vector z, the location parameters, and some scale parameters, was assumed to be:
![]() | ([2]) |
where
![]() |
The parameters
and
are the residual variances within IMI- and IMI+ animals, respectively.
For the additive and PE effects it was assumed that:
![]() | ([3]) |
where
![]() |
Above,
is the additive genetic variance,
is the variance of the PE effects, A is the additive relationship matrix, and I is an identity matrix with dimension qp.
Bayesian Structure of the Model
Sampling distribution of the observations given group status.
By making use of the n-dimensional multivariate normal distribution in [2], it can be shown that the density of the observation vector y given the status vector z, the location parameters, and the residual variances can be written as:
![]() | ([4]) |
where {zi = 0} denotes the subset of the n observations with zi = 0 and {zi = 1} denotes the subset of the n observations with zi = 1. Further:
![]() |
Prior distributions of location parameters and of the unknown status vector.
A proper uniform prior distribution was assumed for the vector of fixed effects ß = [ ß'0ß'1] ', with density:
![]() |
To achieve a reasonably vague prior, the absolute values of the boundsd1 and d2 must be sufficiently large. Further, to avoid label-switching problems (McLachlan and Peel, 2000), constraints must be imposed on the parameters of the distributions of IMI- and IMI+ animals; for example, the mean SCS in the IMI- group being lower than that of the IMI+ group.
From [3
] the prior densities of additive and PE effects, conditionally on the genetic and PE variance components, are:
![]() | ([5]) |
![]() | ([6]) |
Individual elements (zi) of the classification vector z were assumed to be independent a priori, and following the same Bernoulli distribution:
![]() |
where Pm is the mixing proportion (proportion of IMI+). Hence, the corresponding multivariate distribution is:
![]() | ([7]) |
Note that if the expected value of [1] is taken with respect to the prior distributions of z, a, and p (but conditionally on ß0 and ß1) one obtains:
![]() |
This is the usual representation for the mean vector of a two-component mixture from a distribution with mean vector X0ß0 (and proportion 1-Pm) and another distribution with mean vector X0ß0 + X1ß1(and proportion Pm).
Priors for variance components and mixing proportion.
Independent conjugate priors were used for the four variance components and for the mixing proportion. The dispersion parameters were assigned scale inverted
-square prior distributions, with densities:
![]() |
![]() |
![]() |
![]() |
where the
ands2are known hyper-parameters. In this study, all
were set equal to 2, while thes2equaled the true variance component values used in the simulations.
The mixing proportion (Pm), taking values between zero and one, was assigned a beta prior distribution with hyper-parameters
1, and
2. In this study, both parameters were set to 2. The density is:
![]() |
Joint posterior distribution.
The joint density of y, ß, p, a, and z, conditional on the dispersion parameters and mixing proportion, is:
![]() | ([8]) |
The joint posterior density of all unknown parameters is given by:
![]() |
![]() |
Explicitly:
![]() | ([9]) |
Fully conditional distributions.
The conditional posterior distributions of each parameter (or block of parameters) are required for implementing a Gibbs sampler. Let:
![]() |
and let the location parameters be represented by:
![]() |
![]() |
and let
![]() |
with
defined as in [3]. The fully conditional posterior distribution of the parameter vector
(e.g., Sorensen and Gianola, 2002) is:
![]() |
where
and
as defined earlier.
Alternatively, one can select elements of interest in
, denote them as
1, and then reorder
as:
![]() |
Letr and C be partitioned accordingly. Define
![]() |
The fully conditional posterior distribution of any partition of the parameter vector
, given all other parameters in the vector (Sorensen and Gianola, 2002), is:
![]() | ([10]) |
the fully conditional posterior density of the variance components can be deduced from [9
]. the fully conditional posterior density of the genetic variance is:
![]() | ([11]) |
which is in the form of a scaled inverted
-square density, with [qa+
a] degrees of freedom and scale parameter [a'A-1a +
as2a]. Likewise, the fully conditional posterior density of the PE variance is:
![]() | ([12]) |
which is in the form of a scaled inverted
-square density, with [qp +
p] degrees of freedom and scale parameter [p'p +
![]() |
From [9
] the fully conditional density of the residual variance for IMI- observations is:
![]() | ([13]) |
which is in the form of a scaled inverted
-square density, with [n -
zi +
e0] degrees of freedom, and with scale parameter:
![]() |
The corresponding fully conditional posterior density of the residual variance for IMI+ observations is:
![]() | ([14]) |
which is in the form of a scaled inverted
-square density, with [
zi +
e1] degrees of freedom, and with scale parameter:
![]() |
The fully conditional posterior distribution of an individual zi is Bernoulli, and from [9] it can be shown that the zi are mutually independent, given all other parameters. The parameter (
()) of the fully conditional Bernoulli distribution of zi (McLachlan and Peel, 2000) is:
![]() | ([15]) |
where it can be shown from [9
] that:
![]() |
withx'oi,x'1i,z'ai, and z'pi being the ith rows of Xo, X1, Za, and Zp, respectively.
Finally, the fully conditional posterior density of Pm is deduced from [9
] to be:
![]() | ([16]) |
which is in the form of a beta-distribution with parameters (
zi +
1 and (n -
zi +
2).
Implementation of a Gibbs Sampler
The following steps describe how Gibbs sampling can be implemented for our two-component mixture model:
1) of the vector
, and compute;
![]() |
2 includes all elements in
, except those contained in
1. Sample a new
1 from
). Continue sampling new blocks from
until all elements are updated.
with (
, where
is a random draw from a central
-square distribution with [
a + qa] degrees of freedom.
by (
, where
is a random draw from a central
-square distribution with [
p + qp] degrees of freedom.
)' (In - Mz)(y - W
) and replace
with
![]() |
is a random draw from a central
-square distribution with [ n -
zi +
e0] degrees of freedom.
)' Mz(y - W
) and replace
by
![]() |
is a random draw from a central
-square distribution with [
zi +
e1] degrees of freedom.
() as in [15] and sample zi from a Bernoulli distribution with parameter
() for i = 1, 2, ..., n.
zi +
1) and (n -
zi +
2).
,
,
,
,
, and Pm m times. The total number of iterations is thus k + m. The values of k and m can be chosen based on visual inspection of trace plots and on convergence diagnostics (Raftery and Lewis, 1992
Simulation Studies
The mixture model was evaluated using simulated SCS drawn from a two-component normal mixture distribution. When included, genetic and PE effects were considered as having the same distributions across mixture components.
Simulation I.
A total of 8000 SCS observations was simulated for animals with IMI- (SCS0) and IMI+ (SCS1) statuses, as follows:
![]() |
where µ0 and µ1 are the means of the two distributions, and
![]() |
Cows were assigned to the two groups at random using appropriate membership probabilities (Pm). If observation i was assigned to the IMI+ group, SCSi equaled SCS1i; otherwise SCSi was equal to SCS0i. The trait SCS was, therefore, a mixture of observations drawn from SCS0 and SCS1, with a distance
= (µ1 - µ0) between the means of the two distributions. Parameters for nine different simulated settings are presented in Table 1
. Each setting was replicated five times, with this number dictated by computing considerations. Settings 1 to 4 allowed for different degrees of heterogeneous residual variance but had the same µ0 (4.0), µ1 (7.0), and Pm (0.25) values. Settings 5 to 9 were homoscedastic
, but had varying levels of µ0, µ1, and Pm.
|
![]() |
where
| SCS | = | (n x 1) vector of SCS observations;
| µ0 | = | mean of IMI- animals;
| z | = | (n x 1) vector of indicator (group membership) variables;
| ![]() | = | µ1 - µ0; and
| e | = | (n x 1) vector of random residuals.
|
In the Gibbs sampler, values of k = 1000 and m = 30,000 were used. To avoid label-switching problems, only samples where
> 0 were accepted.
Simulation II.
A total of 32,000 SCS observations were simulated, for four discrete generations, with 800 cows and 10 bulls per generation, and each cow having 10 observations (e.g., test-day records). No selection was applied, sires were recruited from 10 different bull dams, and all sires in each generation had the same probability of siring offspring of both sexes. Each cow was replaced by a daughter, and mating was at random. Breeding values for base animals were sampled from a normal distribution with mean 0, and variance
, while breeding values for nonbase animals were sampled from a normal distribution with the midparent value as mean, and variance
. Inbreeding was ignored.
Somatic cell scores from healthy (SCS0) and infected (SCS1) animals were simulated as follows:
![]() |
where µ0 and µ1 are the means of the two distributions,
![]() |
As before, cows were assigned to the two status groups at random, using the appropriate Pm values. Parameters for the two settings are given in Table 2
. Both settings had five replicates.
|
![]() |
where model 1 is equal to the model defined in simulation I, and a, p, Za, and Zp are as defined in [1
]. The factor c is a (qp x 1) vector of random cow effects, and was assumed to have the same distribution as the PE effects, but it included both additive genetic and PE components, ignoring relationship information. For model 1, values of k and m were similar to those chosen in simulation I, while for model 2, values of k and m were chosen to 10,000, and 50,000, respectively. For model 3, k and m were 10,000 and 60,000, respectively. Only samples where
> 0 were accepted.
Model Evaluation
For all models, each element of z (zi) was drawn from a Bernoulli distribution with probability
(), as indicated in [15]. After burn-in, the zi (considered to be draws from their marginal distributions) were averaged over Gibbs samples to obtain an estimate of the posterior probability of putative mastitis (PPM) for each observation.
Possible criteria for evaluating the mixture models were: 1) probability of misclassification (PMC), which is the probability of incorrect classification of health status averaged over rounds of the Gibbs sampler and 2) sensitivity and specificity of models. Sensitivity and specificity were defined as the probabilities of correct classification of IMI+ and IMI- cows, respectively. Sensitivity and specificity for cow i is Pr(zi = 1|ti = 1) and Pr(zi = 0|ti = 0) respectively, where
. These parameters were estimated by averaging over rounds of the Gibbs sampler, and then computing:
![]() | ([17]) |
![]() | ([18]) |
![]() | ([19]) |
PPMi is the posterior probability of putative mastitis for observation i, and n is the total number of observations.
| RESULTS AND DISCUSSION |
|---|
|
|
|---|
Sensitivity, specificity, and PMC for the different settings of simulation I are presented in Table 2
. Sensitivity ranged between 0.53 (setting 3) and 0.94 (setting 9), whereas specificity varied between 0.87 (setting 4) and 0.98 (setting 1). Except for setting 7, sensitivity was lower than specificity, which could be explained by a Pm lower than 0.5 for those settings, indicating a lower probability of IMI+ than IMI-. The results suggest that for simulations with Pm < 0.5, the model identifies healthy animals with little error (high specificity), but that up to almost 50% of the diseased animals can be misclassified (setting 3). Overall, PMC ranged between 0.03 (settings 1 and 9) and 0.18 (setting 8). There was no clear relationship between the quality of the parameter estimates and the sensitivity, specificity, or PMC. As expected, the settings with the most overlapping mixture components (3 and 8) had the largest PMC.
Simulation II
Recall that in model 1 genetic and PE effects were ignored; model 2 fitted genetic and PE deviations as a cow effect (ignoring relationships), and model 3 accounted for both genetic (including relationships) and PE effects. Hence, in model 1 the residual variance is expected to be equal to the sum of all variance components, while in model 2 the variance of cow effects is expected to correspond to the sum of additive genetic and PE variance components. Posterior means of the parameters in the three models, averaged over five replicates, are presented in Table 3
. Location and dispersion parameters and Pm were returned with a high degree of accuracy, irrespective of whether or not the random effects were combined or modeled properly.
|
|
General Discussion
In our simulations, random effects were assigned identical distributions across mixture components, while in real data the covariance structure may be more complex. To account for this, the mixture model could be extended to allow for a covariance between random effects in the different mixture components. Using such models, breeding values for SCS response to udder infection could be predicted. Additionally, multivariate information could be incorporated in the analysis. Traits that could be potentially regarded as mixtures, depending on the cows udder health status, are electrical conductivity in milk and even test-day milk production. Adding traits to the analysis may yield additional information about the cows udder health status, thus increasing the probability of correct classification. Real SCS data may represent more than two groups (e.g., healthy, clinical mastitis, and subclinical mastitis) so it would be desirable to develop mixture models with more than two components.
Individual PPM from a mixture model might be a competing alternative to EBV for crude SCS in selection against susceptibility to mastitis. In this study, individual elements of the classification vector (z) were assumed independent a priori, and following the same Bernoulli distribution, implying identical prior probability of mastitis for all cows. However, genetic and PE effects on liability to mastitis have been reported (e.g., Heringstad et al., 2003), causing differing probabilities of mastitis between and within families. Hence, genetic selection based on PPM from a mixture model assuming a priori identical probabilities of mastitis for all observations might lead to distorted inferences. A challenge is to develop hierarchical mixture models that make use of family-specific and cow-specific putative liability to mastitis in estimation of the posterior probability of mastitis for each observation, instead of a single Pm, as in equation [15
].
| CONCLUSIONS |
|---|
|
|
|---|
| ACKNOWLEDGEMENTS |
|---|
|
|
|---|
Received for publication May 14, 2003. Accepted for publication June 30, 2003.
| REFERENCES |
|---|
|
|
|---|
This article has been cited by other articles:
![]() |
J. ten Napel, Y. de Haas, G. de Jong, T. J. G. M. Lam, W. Ouweltjes, and J. J. Windig Characterization of distributions of somatic cell counts J Dairy Sci, March 1, 2009; 92(3): 1253 - 1264. [Abstract] [Full Text] [PDF] |
||||
![]() |
P. Madsen, M. M. Shariati, and J. Odegard Genetic Analysis of Somatic Cell Score in Danish Holsteins Using a Liability-Normal Mixture Model J Dairy Sci, November 1, 2008; 91(11): 4355 - 4364. [Abstract] [Full Text] [PDF] |
||||
![]() |
A. B. Samore, A. F. Groen, P. J. Boettcher, J. Jamrozik, F. Canavesi, and A. Bagnato Genetic Correlation Patterns Between Somatic Cell Score and Protein Yield in the Italian Holstein-Friesian Population J Dairy Sci, October 1, 2008; 91(10): 4013 - 4021. [Abstract] [Full Text] [PDF] |
||||
![]() |
P. J. Boettcher, D. Caraviello, and D. Gianola Genetic Analysis of Somatic Cell Scores in US Holsteins with a Bayesian Mixture Model J Dairy Sci, January 1, 2007; 90(1): 435 - 434. [Abstract] [Full Text] [PDF] |
||||
![]() |
D. Gianola, B. Heringstad, and J. Odegaard On the Quantitative Genetics of Mixture Characters Genetics, August 1, 2006; 173(4): 2247 - 2255. [Abstract] [Full Text] [PDF] |
||||
![]() |
B. Heringstad, D. Gianola, Y. M. Chang, J. Odegard, and G. Klemetsdal Genetic associations between clinical mastitis and somatic cell score in early first-lactation cows. J Dairy Sci, June 1, 2006; 89(6): 2236 - 2244. [Abstract] [Full Text] [PDF] |
||||
![]() |
J. Odegard, P. Madsen, D. Gianola, G. Klemetsdal, J. Jensen, B. Heringstad, and I. R. Korsgaard A Bayesian Threshold-Normal Mixture Model for Analysis of a Continuous Mastitis-Related Trait J Dairy Sci, July 1, 2005; 88(7): 2652 - 2659. [Abstract] [Full Text] [PDF] |
||||
![]() |
P. J. Boettcher, P. Moroni, G. Pisoni, and D. Gianola Application of a Finite Mixture Model to Somatic Cell Scores of Italian Goats J Dairy Sci, June 1, 2005; 88(6): 2209 - 2216. [Abstract] [Full Text] [PDF] |
||||
![]() |
J. Odegard, B. Heringstad, and G. Klemetsdal Short Communication: Bivariate Genetic Analysis of Clinical Mastitis and Somatic Cell Count in Norwegian Dairy Cattle J Dairy Sci, October 1, 2004; 87(10): 3515 - 3517. [Abstract] [Full Text] [PDF] |
||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| HOME | HELP | FEEDBACK | SUBSCRIPTIONS | ARCHIVE | SEARCH | TABLE OF CONTENTS |