1 Department of Animal and Aquacultural Sciences, Norwegian University of Life Sciences, N-1432 Ås, Norway
2 Department of Genetics and Biotechnology, Danish Institute of Agricultural Sciences, Research Centre Foulum, DK-8830 Tjele, Denmark
3 Department of Animal Sciences, University of Wisconsin-Madison, Madison 53706
Corresponding author: J. Ødegård; e-mail: jorgen.odegard{at}umb.no.
ABSTRACT
Key Words: Bayesian methods, mastitis, mixture model, somatic cell score
Abbreviation key: IM = standard univariate model for SCS ignoring the mixture, LNM = liability-normal mixture, NM = mixture model ignoring the structure of the underlying liability to mastitis.
INTRODUCTION
Conceptually, one may think of a "baseline" SCS level that is affected by systematic (e.g., herd-year-season, age at calving, stage of lactation) and random (e.g., genetic, permanent environment) sources of variation. Mastitis would produce a shift away from the "baseline" level specific to a given combination of explanatory factors. Thus, an observed test-day SCS can be regarded as the result of several, possibly confounded, factors: the "baseline" SCS (a continuous trait) and shifts caused by a binary process (IMI− vs. IMI+). Ødegård et al. (2003) and Gianola et al. (2004) described a mixture model for SCS containing fixed and random effects affecting "baseline" SCS, with mastitis producing a change (typically upward) in the mean of the distribution. Based on simulated data, Ødegård et al. (2003) found that when residual variances were homogeneous, the model yielded seemingly unbiased estimates of most parameters; when these variances were heterogeneous between healthy and diseased animals, some bias was detected. This bias was no longer detectable after a programming error, found subsequent to publication, was corrected. In Ødegård et al. (2003), it was assumed that the random variable assigning observations to putative IMI− and IMI+ classes was independently distributed as Bernoulli, with the same a priori probability for all animals and for all observations (i.e., test-day scores) within animal. Subsequent analyses of clinical mastitis field data have indicated that systematic (e.g., herd-year-season, age at calving, stage of lactation) and random (e.g., genetic, permanent environment) effects also affect the probability of mastitis infection (Chang et al., 2004). Hence, assigning the same Bernoulli distribution, i.e., the same prior probability of group membership, to all animals at all test-days is not realistic, and factors affecting the probability of group membership should be included in the model to arrive at an improved statistical description.
Our objective was to extend the model of Ødegård et al. (2003), in which SCS is treated as a 2-component mixture, by introducing a hierarchical structure for the group membership variables. In particular, the probability of membership in the IMI+ class is allowed to vary with "fixed" and random effects that are distinct from those affecting SCS. The behavior of the model and of the procedures used for inference was examined with simulated data. The proposed model can be used to predict breeding values for liability to mastitis from observed SCS.
MATERIALS AND METHODS
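The density of observation SCSi, given the parameters (θ) and probabilities (P), is the mixture

$$p(\mathrm{SCS}_i \mid \boldsymbol{\theta}, P_i) = P_i\, N\!\left(\mathrm{SCS}_i \mid f_i(\boldsymbol{\theta}), g_i(\boldsymbol{\theta})\right) + (1 - P_i)\, N\!\left(\mathrm{SCS}_i \mid f_i^{*}(\boldsymbol{\theta}), g_i^{*}(\boldsymbol{\theta})\right) \qquad [1]$$

where P = [P1, P2, ..., Pn]', Pi is the a priori probability that SCSi is drawn from distribution N(fi(θ), gi(θ)) (diseased), and (1 − Pi) is the a priori probability that SCSi is drawn from N(fi*(θ), gi*(θ)) (healthy). Ødegård et al. (2003) assumed Pi = P for all i; in this study, Pi may differ between observations. Further, fi(θ), gi(θ), fi*(θ), and gi*(θ) are functions of the parameter vector θ. Typically, fi(θ) and fi*(θ) are linear combinations of fixed and random effects; gi(θ) = σ²e1SCS and gi*(θ) = σ²e0SCS for all i = 1, 2, ..., n, where σ²e0SCS and σ²e1SCS are variance parameters. Given θ and P, observations were assumed to be conditionally independent, so that the joint density of the data vector SCS is

$$p(\mathbf{SCS} \mid \boldsymbol{\theta}, \mathbf{P}) = \prod_{i=1}^{n} p(\mathrm{SCS}_i \mid \boldsymbol{\theta}, P_i) \qquad [2]$$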
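To make [1] and [2] concrete, the sketch below evaluates the log of the joint mixture density for given component means, residual variances, and prior probabilities. It is a minimal illustration that assumes the values of fi(θ) and fi*(θ) have already been computed. The function name `mixture_loglik` and the example numbers are hypothetical and are not taken from the paper or from DMU.

```python
import math
from statistics import NormalDist


def mixture_loglik(scs, P, f, f_star, var1, var0):
    """Log of the joint density [2]: sum over i of the log of the 2-component mixture [1].

    scs    -- observed SCS values
    P      -- prior probabilities of IMI+ (one per observation)
    f      -- IMI+ (diseased) component means, f_i(theta)
    f_star -- IMI- (healthy) component means, f_i*(theta)
    var1, var0 -- residual variances sigma^2_e1SCS and sigma^2_e0SCS
    """
    sd1, sd0 = math.sqrt(var1), math.sqrt(var0)
    total = 0.0
    for y, p_i, m1, m0 in zip(scs, P, f, f_star):
        dens = p_i * NormalDist(m1, sd1).pdf(y) + (1.0 - p_i) * NormalDist(m0, sd0).pdf(y)
        total += math.log(dens)
    return total


# Hypothetical usage: 2 test-day records sharing the same prior probability of IMI+
# (P_i = P, as in Odegard et al., 2003), IMI+ mean 5, IMI- mean 3, equal residual variances.
ll = mixture_loglik([3.1, 5.4], [0.25, 0.25], [5.0, 5.0], [3.0, 3.0], 1.0, 1.0)
```

Setting Pi = 1 for every record reduces the function to an ordinary Gaussian log-likelihood, which is a quick sanity check.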
Estimation, either by maximum likelihood (Gianola et al., 2004) or by Bayesian (Ødegård et al., 2003) approaches, is facilitated by augmenting the density described above with auxiliary binary indicator variables Zi (IMI− = 0, IMI+ = 1), i = 1, 2, ..., n. Assuming that the indicator variables are Bernoulli and conditionally independent a priori, one can write

$$\Pr(\mathbf{Z} = \mathbf{z} \mid \mathbf{P}) = \prod_{i=1}^{n} P_i^{z_i} (1 - P_i)^{1 - z_i} \qquad [3]$$

where Pr(Zi = 1 | Pi) = Pi is the prior probability that the health status of observation i is IMI+. Allowing individual prior probabilities of mastitis is an extension of Gianola et al. (2004) and Ødegård et al. (2003).
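The augmentation can be illustrated by writing the complete-data log-density, log p(SCS | θ, Z = z) + log Pr(Z = z | P), with Bernoulli indicators as in [3]. The sketch below uses hypothetical function names and values. Summing the complete-data density over zi ∈ {0, 1} for a single record recovers the mixture density [1], which is why the augmentation leaves the model unchanged.

```python
import math
from statistics import NormalDist


def log_prior_z(z, P):
    """log Pr(Z = z | P) for independent Bernoulli indicators, as in [3]."""
    return sum(math.log(p) if zi == 1 else math.log(1.0 - p) for zi, p in zip(z, P))


def complete_data_loglik(scs, z, P, f, f_star, var1, var0):
    """Augmented log-density: log p(SCS | theta, Z = z) + log Pr(Z = z | P).

    Given z_i, SCS_i comes from a single normal component, so no mixture sum is needed:
    z_i = 1 (IMI+) uses N(f_i, var1); z_i = 0 (IMI-) uses N(f_i*, var0).
    """
    sd1, sd0 = math.sqrt(var1), math.sqrt(var0)
    ll = log_prior_z(z, P)
    for y, zi, m1, m0 in zip(scs, z, f, f_star):
        component = NormalDist(m1, sd1) if zi == 1 else NormalDist(m0, sd0)
        ll += math.log(component.pdf(y))
    return ll
```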
In our "extended" mixture model, we postulate an underlying continuous random variable, called liability (λ), which determines the mastitis status associated with each observation (Zi) depending on the value of liability relative to a fixed threshold. This is called a threshold-liability model (Wright, 1934; Dempster and Lerner, 1950; Falconer, 1965; Gianola, 1982; Gianola and Foulley, 1983), which has been used for genetic analysis of clinical mastitis as a binary response (e.g., Heringstad et al., 2003). Here, liability is incorporated into what we term a liability-normal mixture (LNM) model. In both the LNM and the standard threshold models, it is assumed that mastitis status switches from 0 to 1 if liability exceeds a given threshold T. In the standard threshold model, data consist of observed binary responses (e.g., presence or absence of clinical mastitis), whereas in the LNM model, data consist of observed SCS, which is a continuous-valued variable. However, the true distribution of SCS switches from N(fi*(θ), gi*(θ)) to N(fi(θ), gi(θ)) according to mastitis status, and putative mastitis status may be inferred from the observed SCS. Thus, in the LNM model, the a priori probability of IMI+ for a specific SCS observation i can be written as

$$P_i = \Pr(Z_i = 1 \mid \boldsymbol{\theta}) = \Pr(\lambda_i > T \mid \boldsymbol{\theta}) \qquad [4]$$

so that

$$1 - P_i = \Pr(Z_i = 0 \mid \boldsymbol{\theta}) = \Pr(\lambda_i \le T \mid \boldsymbol{\theta})$$

where λi is the liability variate and T is the threshold (we take T = 0). Thus, in this model, Pi depends on some of the elements in θ.
In the following, we argue that for this model, it is reasonable to assume that

$$p(\boldsymbol{\lambda} \mid \mathbf{SCS}, \boldsymbol{\theta}, \mathbf{Z} = \mathbf{z}) = p(\boldsymbol{\lambda} \mid \boldsymbol{\theta}, \mathbf{Z} = \mathbf{z}) \qquad [5]$$

That is, given Z = z, liability is conditionally independent of SCS. In other words, given Z = z (which in turn is inferred from SCS) and θ, SCS does not convey any additional information about liability. This implies that the residual correlation between SCS and liability should be assumed to be zero.
Modeling SCS and Liabilities
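Let

$$\boldsymbol{\theta} = \left\{\boldsymbol{\beta}, \mathbf{a}, \mathbf{p}, \mathbf{G}_0, \mathbf{P}_0, \sigma^2_{e0\mathrm{SCS}}, \sigma^2_{e1\mathrm{SCS}}\right\}$$

where β = [β'SCS β'λ]', a = [a'SCS a'λ]', and p = [p'SCS p'λ]' are vectors of fixed (β), random additive genetic (a), and random permanent environmental (p) effects on SCS and liability to mastitis. Further, G0 is the 2 × 2 additive genetic (co)variance matrix between SCS and λ, and P0 is the 2 × 2 (co)variance matrix of permanent environmental effects; σ²e0SCS and σ²e1SCS are the residual variances of SCS in the IMI− and IMI+ classes, respectively. Given disease status (Z) and θ, SCS can be modeled as a standard Gaussian trait, allowing for heterogeneous residual variances in the 2 disease categories. The density of the conditional distribution of all SCS, given Z = z and θ, is

$$p(\mathbf{SCS} \mid \boldsymbol{\theta}, \mathbf{Z} = \mathbf{z}) = \prod_{i=1}^{n} \left[N\!\left(\mathrm{SCS}_i \mid f_i(\boldsymbol{\theta}), \sigma^2_{e1\mathrm{SCS}}\right)\right]^{I(\lambda_i > T)} \left[N\!\left(\mathrm{SCS}_i \mid f_i^{*}(\boldsymbol{\theta}), \sigma^2_{e0\mathrm{SCS}}\right)\right]^{I(\lambda_i \le T)} \qquad [6]$$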
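where the indicator I(λi > T) = 1 if λi > T and 0 otherwise, and I(λi ≤ T) = 1 if λi ≤ T and 0 otherwise. It is assumed that

$$f_i^{*}(\boldsymbol{\theta}) = \mathbf{x}'_{0i}\boldsymbol{\beta}_{\mathrm{SCS}} + \mathbf{w}'_{ai}\mathbf{a}_{\mathrm{SCS}} + \mathbf{w}'_{pi}\mathbf{p}_{\mathrm{SCS}}, \qquad f_i(\boldsymbol{\theta}) = \mathbf{x}'_{1i}\boldsymbol{\beta}_{\mathrm{SCS}} + \mathbf{w}'_{ai}\mathbf{a}_{\mathrm{SCS}} + \mathbf{w}'_{pi}\mathbf{p}_{\mathrm{SCS}}$$

where x' and w' with appropriate subscripts are incidence row vectors. Further, it is assumed that the λi, given θ, are mutually independent, so that the density of the λ vector, given θ, is

$$p(\boldsymbol{\lambda} \mid \boldsymbol{\theta}) = \prod_{i=1}^{n} p(\lambda_i \mid \boldsymbol{\theta}) \qquad [7]$$

where the distribution of λi, given θ, is

$$\lambda_i \mid \boldsymbol{\theta} \sim N\!\left(\eta_i, \sigma^2_{e\lambda}\right)$$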
For reasons of identifiability, it will be assumed that σ²eλ = 1. With this structure, [4] is equivalent to

$$P_i = \Pr(\lambda_i > 0 \mid \boldsymbol{\theta}) = 1 - \Phi(-\eta_i) = \Phi(\eta_i) \qquad [8]$$

where Φ(·) is the standard normal cumulative distribution function and ηi is the expectation of the liability of observation i, conditionally on βλ, aλ, and pλ. Thus, in our extended model, Pi is a function of location parameters affecting liability (included in θ), which may differ between observations.
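A quick numerical check of [8]: with T = 0 and unit residual variance on the liability scale, the prior probability of IMI+ is Φ(ηi). This can be compared with the proportion of simulated liabilities that exceed zero. The sketch below is a minimal illustration. The function names are hypothetical, and the value of ηi (Φ⁻¹(0.25), which corresponds to a 25% prior frequency) is an illustrative assumption.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def prior_prob_imi_plus(eta):
    """P_i = Pr(lambda_i > 0 | theta) = Phi(eta_i), equation [8], with T = 0 and unit variance."""
    return STD_NORMAL.cdf(eta)


def mc_prob_imi_plus(eta, n_draws=200_000, seed=1):
    """Monte Carlo check: draw lambda_i ~ N(eta_i, 1) and count how often it exceeds T = 0."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_draws) if rng.gauss(eta, 1.0) > 0.0)
    return hits / n_draws
```

In the full model, ηi would be the linear predictor for liability built from βλ, aλ, and pλ. Here it is simply a number.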
Bayesian Structure
Conditional density of SCS.
The conditional distribution of the data given the parameters (P, θ) is p(SCS | P, θ), as in [2]. Further, the conditional distribution of SCS, given θ and Z, is given in [6]. Also, from [5], p(SCS | θ, λ, Z = z) = p(SCS | θ, Z = z).
Prior density of all unknown parameters.
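The joint prior density of all unknown parameters, including the liabilities (λ) as unknowns, is

$$p(\boldsymbol{\theta}, \boldsymbol{\lambda}, \mathbf{Z} = \mathbf{z}) = \Pr(\mathbf{Z} = \mathbf{z} \mid \boldsymbol{\lambda})\, p(\boldsymbol{\lambda} \mid \boldsymbol{\theta})\, p(\boldsymbol{\theta}) \qquad [9]$$

Note that Pr(Zi = 1 | λi) = I(λi > 0) and Pr(Zi = 0 | λi) = I(λi ≤ 0). Hence, given λ, Z is completely specified, and Pr(Z = z | λ) is, therefore, a degenerate distribution. The density p(λ | θ) is defined in [7], and the density p(θ) is

$$p(\boldsymbol{\theta}) = p(\boldsymbol{\beta})\, p(\mathbf{a} \mid \mathbf{G}_0)\, p(\mathbf{p} \mid \mathbf{P}_0)\, p(\mathbf{G}_0)\, p(\mathbf{P}_0)\, p(\sigma^2_{e0\mathrm{SCS}})\, p(\sigma^2_{e1\mathrm{SCS}}) \qquad [10]$$

where β, G0, P0, σ²e0SCS, and σ²e1SCS were assigned bounded uniform priors. To achieve reasonably vague priors, the absolute values of the bounds were large. Further, to avoid "label-switching" problems (McLachlan and Peel, 2000), constraints must be imposed on parameters of the SCS distributions of putative IMI− and IMI+ animals; for example, the mean SCS in the IMI− group was set to be lower than that in the IMI+ group.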
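Additive breeding values (a) and permanent environmental effects (p) were assumed to be normally distributed, with the densities

$$p(\mathbf{a} \mid \mathbf{G}_0) \propto \left|\mathbf{G}_0 \otimes \mathbf{A}\right|^{-1/2} \exp\!\left[-\tfrac{1}{2}\,\mathbf{a}'\left(\mathbf{G}_0^{-1} \otimes \mathbf{A}^{-1}\right)\mathbf{a}\right] \qquad [11]$$

and

$$p(\mathbf{p} \mid \mathbf{P}_0) \propto \left|\mathbf{P}_0 \otimes \mathbf{I}\right|^{-1/2} \exp\!\left[-\tfrac{1}{2}\,\mathbf{p}'\left(\mathbf{P}_0^{-1} \otimes \mathbf{I}\right)\mathbf{p}\right] \qquad [12]$$

where A is the additive relationship matrix with dimension qa, I is an identity matrix with dimension qp (number of individuals with at least one SCS record), and G0 and P0 are as previously defined.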
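The Kronecker structure in [11] can be illustrated on a tiny pedigree. The sketch below assumes that a is stacked by trait (all aSCS first, then all aλ), so that Var(a) = G0 ⊗ A. It builds that covariance matrix and draws one vector of breeding values through a dense Cholesky factor. The helper names and the numbers in the example are hypothetical. Dense factorization is only practical at toy sizes; genetic evaluations typically work with the sparse inverse of A instead.

```python
import math
import random


def kron(A, B):
    """Kronecker product A (x) B of dense matrices stored as lists of lists."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


def cholesky(M):
    """Lower-triangular L with L L' = M, for a symmetric positive-definite M."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(M[i][i] - s)
            else:
                L[i][j] = (M[i][j] - s) / L[j][j]
    return L


def sample_breeding_values(G0, A, rng):
    """Draw a = [a_SCS', a_lambda']' ~ N(0, G0 (x) A), the prior in [11]."""
    V = kron(G0, A)
    L = cholesky(V)
    u = [rng.gauss(0.0, 1.0) for _ in range(len(V))]
    return [sum(L[i][k] * u[k] for k in range(i + 1)) for i in range(len(V))]
```

For example, with a sire and his daughter (A = [[1, 0.5], [0.5, 1]]) and a hypothetical G0 = [[0.3, 0.1], [0.1, 0.2]], `sample_breeding_values(G0, A, random.Random(3))` returns 4 values: the 2 SCS breeding values followed by the 2 liability breeding values.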
Joint posterior density.
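The augmented joint posterior density of all unknowns is

$$p(\boldsymbol{\theta}, \boldsymbol{\lambda}, \mathbf{Z} = \mathbf{z} \mid \mathbf{SCS}) \propto p(\mathbf{SCS} \mid \boldsymbol{\theta}, \mathbf{Z} = \mathbf{z})\, \Pr(\mathbf{Z} = \mathbf{z} \mid \boldsymbol{\lambda})\, p(\boldsymbol{\lambda} \mid \boldsymbol{\theta})\, p(\boldsymbol{\theta}) \qquad [13]$$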
Fully conditional posterior distributions.
Knowledge of the fully conditional posterior distribution of each parameter (or block of parameters) is required for implementing a Gibbs sampler. Given Z and λ, all fully conditional posterior distributions have a standard form and are, therefore, easy to sample from (Sorensen and Gianola, 2002). Specifically, given Z, the required posterior distributions are as in a linear Gaussian model: 1) the conditional posterior distribution of each element of β is normal, truncated to an interval [d1, d2] given by the bounds of its uniform prior; 2) the conditional distributions of both a and p are multivariate normal; 3) the conditional distributions of G0 and P0 are inverse Wishart; and 4) the conditional distributions of σ²e0SCS and σ²e1SCS are inverse gamma.
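Further, one has

$$p(\boldsymbol{\lambda}, \mathbf{Z} = \mathbf{z} \mid \boldsymbol{\theta}, \mathbf{SCS}) = p(\boldsymbol{\lambda} \mid \boldsymbol{\theta}, \mathbf{Z} = \mathbf{z}, \mathbf{SCS})\, \Pr(\mathbf{Z} = \mathbf{z} \mid \boldsymbol{\theta}, \mathbf{SCS}) \qquad [14]$$

with p(λ | θ, Z = z, SCS) = p(λ | θ, Z = z), because the liabilities are independent of SCS, given θ and Z = z. Now, the probability

$$\Pr(\mathbf{Z} = \mathbf{z} \mid \boldsymbol{\theta}, \mathbf{SCS})$$

is a product of n independent Bernoulli distributions with success probability πi. As Zi = 1 is equivalent to λi > 0,

$$\Pr(Z_i = 1 \mid \boldsymbol{\theta}, \mathrm{SCS}_i) \propto p(\mathrm{SCS}_i \mid \boldsymbol{\theta}, Z_i = 1)\, \Pr(\lambda_i > 0 \mid \boldsymbol{\theta}) = N\!\left(\mathrm{SCS}_i \mid f_i(\boldsymbol{\theta}), \sigma^2_{e1\mathrm{SCS}}\right) \Phi(\eta_i) \qquad [15]$$

and likewise

$$\Pr(Z_i = 0 \mid \boldsymbol{\theta}, \mathrm{SCS}_i) \propto p(\mathrm{SCS}_i \mid \boldsymbol{\theta}, Z_i = 0)\, \Pr(\lambda_i \le 0 \mid \boldsymbol{\theta}) = N\!\left(\mathrm{SCS}_i \mid f_i^{*}(\boldsymbol{\theta}), \sigma^2_{e0\mathrm{SCS}}\right) \left[1 - \Phi(\eta_i)\right] \qquad [16]$$

The parameter (πi) of the fully conditional posterior distribution of Zi (Bernoulli) is therefore

$$\pi_i = \frac{N\!\left(\mathrm{SCS}_i \mid f_i(\boldsymbol{\theta}), \sigma^2_{e1\mathrm{SCS}}\right) \Phi(\eta_i)}{N\!\left(\mathrm{SCS}_i \mid f_i(\boldsymbol{\theta}), \sigma^2_{e1\mathrm{SCS}}\right) \Phi(\eta_i) + N\!\left(\mathrm{SCS}_i \mid f_i^{*}(\boldsymbol{\theta}), \sigma^2_{e0\mathrm{SCS}}\right) \left[1 - \Phi(\eta_i)\right]} \qquad [17]$$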
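The posterior membership probability [17] is the normalized product of each component's density and its prior weight. A minimal sketch in the notation above follows; the function name is hypothetical. For very extreme SCS values both weighted densities can underflow to zero, and a log-scale version would then be safer.

```python
import math
from statistics import NormalDist


def posterior_prob_imi_plus(scs_i, f_i, f_star_i, var1, var0, eta_i):
    """pi_i in [17]: Pr(Z_i = 1 | theta, SCS_i).

    IMI+ weight [15]: N(SCS_i | f_i, var1) * Phi(eta_i)
    IMI- weight [16]: N(SCS_i | f_i*, var0) * (1 - Phi(eta_i))
    """
    prior = NormalDist().cdf(eta_i)
    w1 = NormalDist(f_i, math.sqrt(var1)).pdf(scs_i) * prior
    w0 = NormalDist(f_star_i, math.sqrt(var0)).pdf(scs_i) * (1.0 - prior)
    return w1 / (w1 + w0)
```

When the 2 components coincide, the data carry no information about status and πi falls back to the prior Φ(ηi).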
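Hence, Zi can be sampled from a Bernoulli distribution with probability [17]. Subsequently, given Zi, λi can be sampled from a distribution with density

$$p(\lambda_i \mid \boldsymbol{\theta}, Z_i = z_i, \mathrm{SCS}_i) \propto p(\mathrm{SCS}_i \mid \boldsymbol{\theta}, Z_i = z_i)\, \Pr(Z_i = z_i \mid \lambda_i)\, p(\lambda_i \mid \boldsymbol{\theta})$$

Given Zi, the term p(SCSi | θ, Zi = zi) does not involve λi and is thus a constant. Hence,

$$p(\lambda_i \mid \boldsymbol{\theta}, Z_i = z_i) \propto N(\lambda_i \mid \eta_i, 1)\, \left[I(\lambda_i > 0)\right]^{z_i} \left[I(\lambda_i \le 0)\right]^{1 - z_i} \qquad [18]$$

In other words, the fully conditional distribution of λi, given Zi, is a normal distribution with mean ηi and unit variance truncated at zero, TN(λi | θ), with right truncation for Zi = 0 and left truncation for Zi = 1.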
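Sampling from [18] only requires a unit-variance normal truncated at zero. The sketch below uses the inverse-CDF method, which is one common choice and not necessarily the one used in the authors' DMU-based implementation. The name `sample_liability` is hypothetical.

```python
import random
from statistics import NormalDist

STD = NormalDist()


def sample_liability(eta_i, z_i, rng):
    """Draw lambda_i from N(eta_i, 1) truncated at T = 0, as in [18].

    z_i = 1 (IMI+): left truncation, support (0, inf).
    z_i = 0 (IMI-): right truncation, support (-inf, 0].
    """
    p0 = STD.cdf(0.0 - eta_i)  # Pr(lambda_i <= 0 | theta)
    lo, hi = (p0, 1.0) if z_i == 1 else (0.0, p0)
    u = lo + (hi - lo) * rng.random()
    u = min(max(u, 1e-300), 1.0 - 1e-16)  # inv_cdf needs 0 < u < 1
    return eta_i + STD.inv_cdf(u)
```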
Implementation of a Gibbs Sampler
The following steps describe how Gibbs sampling can be conducted for the LNM model.
1. Sample λi from the truncated normal distribution in [18], for i = 1, 2, ..., n.
2. Sample θ as for a standard Gaussian bivariate model (for SCS and λ), assuming a zero residual covariance between the traits.
3. Compute πi as described in [17], and sample Zi from a Bernoulli distribution with parameter πi, for i = 1, 2, ..., n.
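To show how the 3 steps fit together, the following is a toy Gibbs sampler for a stripped-down LNM model. It assumes intercepts only (µSCS, Δ, µλ), no additive genetic or permanent environmental effects, a homogeneous SCS residual variance as in the simulation study, and flat priors. Step 2 is split into sub-steps (µλ, then the 2 component means, then the residual variance). This is a valid Gibbs decomposition because the residual covariance between SCS and liability is zero. All names, starting values, and the data generator are hypothetical. It is a sketch for intuition, not the implementation used in this study.

```python
import math
import random
from statistics import NormalDist, fmean

STD = NormalDist()


def trunc_normal(mean, sd, lower, upper, rng):
    """Inverse-CDF draw from N(mean, sd^2) truncated to (lower, upper); +-math.inf = no bound."""
    d = NormalDist(mean, sd)
    lo, hi = d.cdf(lower), d.cdf(upper)
    u = lo + (hi - lo) * rng.random()
    u = min(max(u, 1e-300), 1.0 - 1e-16)  # inv_cdf needs 0 < u < 1
    return d.inv_cdf(u)


def simulate(n, mu0=3.0, delta=2.0, sd=1.0, freq=0.25, seed=11):
    """Hypothetical toy data: lambda_i ~ N(mu_lam, 1), Z_i = 1 if lambda_i > 0,
    SCS_i ~ N(mu0 + delta * Z_i, sd^2); no genetic or permanent environmental effects."""
    rng = random.Random(seed)
    mu_lam = STD.inv_cdf(freq)  # Phi(mu_lam) equals the intended IMI+ frequency
    z = [1 if rng.gauss(mu_lam, 1.0) > 0.0 else 0 for _ in range(n)]
    scs = [rng.gauss(mu0 + delta * zi, sd) for zi in z]
    return scs, z


def gibbs_lnm_toy(scs, n_iter=400, burn_in=100, seed=5):
    """Toy LNM Gibbs sampler: intercepts only, homogeneous SCS residual variance, flat priors."""
    rng = random.Random(seed)
    n = len(scs)
    cut = (min(scs) + max(scs)) / 2.0
    z = [1 if y > cut else 0 for y in scs]  # crude start: high SCS -> IMI+
    mu_lam, sigma2 = 0.0, 1.0
    keep = {"delta": [], "mu_lam": [], "sigma2": []}
    for it in range(n_iter):
        # Step 1: liabilities from unit-variance normals truncated at 0, as in [18]
        lam = [trunc_normal(mu_lam, 1.0, 0.0, math.inf, rng) if zi == 1
               else trunc_normal(mu_lam, 1.0, -math.inf, 0.0, rng) for zi in z]
        # Step 2: parameters given lambda and z (zero residual covariance, so updates decouple)
        mu_lam = rng.gauss(fmean(lam), math.sqrt(1.0 / n))
        y1 = [y for y, zi in zip(scs, z) if zi == 1]
        y0 = [y for y, zi in zip(scs, z) if zi == 0]
        if len(y0) < 2 or len(y1) < 2:
            raise RuntimeError("empty mixture component; flat priors are improper here")
        mu0 = rng.gauss(fmean(y0), math.sqrt(sigma2 / len(y0)))
        # label-switching constraint: the IMI+ mean must exceed the IMI- mean
        mu1 = trunc_normal(fmean(y1), math.sqrt(sigma2 / len(y1)), mu0, math.inf, rng)
        sse = sum((y - mu0) ** 2 for y in y0) + sum((y - mu1) ** 2 for y in y1)
        sigma2 = (sse / 2.0) / rng.gammavariate(n / 2.0 - 1.0, 1.0)  # inverse gamma draw
        # Step 3: pi_i as in [17] (eta_i = mu_lam for every record), then Z_i ~ Bernoulli(pi_i)
        prior = STD.cdf(mu_lam)
        d1, d0 = NormalDist(mu1, math.sqrt(sigma2)), NormalDist(mu0, math.sqrt(sigma2))
        z = []
        for y in scs:
            w1 = d1.pdf(y) * prior
            w0 = d0.pdf(y) * (1.0 - prior)
            z.append(1 if rng.random() < w1 / (w1 + w0) else 0)
        if it >= burn_in:
            keep["delta"].append(mu1 - mu0)
            keep["mu_lam"].append(mu_lam)
            keep["sigma2"].append(sigma2)
    return {k: fmean(v) for k, v in keep.items()}
```

For example, `gibbs_lnm_toy(simulate(800)[0])` returns posterior means of Δ, µλ, and σ². With these toy settings, Φ(µλ) should land near the simulated IMI+ frequency of 0.25.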
Simulation Study
Four different scenarios were simulated. Input parameters are given in Table 1. In all settings, SCS was assumed to follow a 2-component mixture that depended on a putative mastitis status, determined fully by an underlying liability. Both SCS and liability to mastitis were simulated with additive genetic and permanent environmental effects in addition to a random residual, following standard procedures. The difference between the means of the 2 normal distributions in the SCS mixture equaled 2 units in all scenarios, and the residual variance for SCS was assumed homogeneous. Four generations, each consisting of 800 cows that were progeny of 10 sires, with 10 records per cow, were simulated. No selection was applied; sires were recruited from 10 different bull dams, and all sires in each generation had the same probability of siring offspring of both sexes. Each cow was replaced by a daughter, and mating was at random. Inbreeding was ignored. The intended mastitis frequency was set to 25%.
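One simple way to target an intended frequency such as 25% is to set the liability intercept so that Pr(λ > 0) equals that frequency, given the total liability variance. The sketch below assumes independent normal additive genetic, permanent environmental, and residual effects, with hypothetical variance components in place of the Table 1 inputs. It is not necessarily how the simulation in this study was set up. Sharing a and p across the records of a cow changes the correlation among records but not this marginal frequency.

```python
import math
import random
from statistics import NormalDist


def liability_mean_for_frequency(freq, var_a, var_p, var_e=1.0):
    """Intercept mu giving Pr(mu + a + p + e > 0) = freq for independent normal a, p, e."""
    sd_total = math.sqrt(var_a + var_p + var_e)
    return NormalDist().inv_cdf(freq) * sd_total


def simulated_frequency(mu, var_a, var_p, var_e=1.0, n=100_000, seed=7):
    """Monte Carlo proportion of liabilities above the threshold T = 0."""
    rng = random.Random(seed)
    count = 0
    for _ in range(n):
        lam = (mu + rng.gauss(0.0, math.sqrt(var_a)) + rng.gauss(0.0, math.sqrt(var_p))
               + rng.gauss(0.0, math.sqrt(var_e)))
        count += lam > 0.0
    return count / n
```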
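The following LNM model was used in the genetic analyses, given the vector of indicator variables z:

$$\begin{aligned} \mathrm{SCS}_{ij} &= \mu_{\mathrm{SCS}} + z_{ij}\Delta + a_{\mathrm{SCS},i} + p_{\mathrm{SCS},i} + e_{\mathrm{SCS},ij} \\ \lambda_{ij} &= \mu_{\lambda} + a_{\lambda,i} + p_{\lambda,i} + e_{\lambda,ij} \end{aligned}$$

where SCSij and λij are the SCS and liability for record j of cow i; a, p, and e denote additive genetic, permanent environmental, and residual effects; µSCS and µλ are mean values of SCS and liability, respectively; and Δ is the difference between the means of the 2 components in the mixture (effect of mastitis on SCS). For comparison purposes, a mixture model ignoring the structure of the underlying liability to mastitis (NM) (Ødegård et al., 2003) was also fitted:

$$\mathrm{SCS}_{ij} = \mu_{\mathrm{SCS}} + z_{ij}\Delta + a_i + p_i + e_{ij}$$

with the same prior probability of IMI+ for all records. The following standard univariate model for SCS, ignoring the mixture (IM), was fitted as well:

$$\mathrm{SCS}_{ij} = \mu + a_i + p_i + e_{ij}$$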
In the LNM and NM models, features of the posterior distributions were estimated with Gibbs sampling. A chain consisting of 10,000 burn-in rounds and 100,000 sampling rounds was run. In the IM model, parameters were estimated with the AI-REML algorithm. All computations were carried out using a modified version of the DMU package (Madsen and Jensen, 2005).
The 2 mixture models (LNM vs. NM) were compared on their ability to classify SCS observations into the correct disease categories, measured as sensitivity (average posterior probability of IMI+ for truly diseased animals) and specificity (average posterior probability of IMI− for truly healthy animals), and on their ability to return simulated parameter values. The LNM and IM models were compared on accuracy of selection, defined as the correlation between the selection criterion and the true breeding value for liability to mastitis.
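The comparison criteria can be written directly from their definitions. The helpers below, with hypothetical names, compute sensitivity and specificity from per-record posterior probabilities of IMI+ and the true simulated status. They compute accuracy of selection as a Pearson correlation, written out in full to avoid depending on `statistics.correlation`, which requires Python 3.10 or later.

```python
import math
from statistics import fmean


def sensitivity(post_prob, z_true):
    """Average posterior probability of IMI+ over records that are truly IMI+."""
    return fmean([p for p, z in zip(post_prob, z_true) if z == 1])


def specificity(post_prob, z_true):
    """Average posterior probability of IMI- (1 - Pr(IMI+)) over records that are truly IMI-."""
    return fmean([1.0 - p for p, z in zip(post_prob, z_true) if z == 0])


def accuracy_of_selection(criterion, true_bv):
    """Pearson correlation between a selection criterion and true breeding values."""
    mx, my = fmean(criterion), fmean(true_bv)
    sxy = sum((x - mx) * (y - my) for x, y in zip(criterion, true_bv))
    sxx = sum((x - mx) ** 2 for x in criterion)
    syy = sum((y - my) ** 2 for y in true_bv)
    return sxy / math.sqrt(sxx * syy)
```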
RESULTS AND DISCUSSION
With the LNM model, sensitivity (probability of classifying a diseased animal as diseased) and specificity (probability of classifying a healthy animal as healthy) were slightly higher than with the NM model (Table 2). In the LNM model, sensitivity ranged from 0.645 to 0.664, and specificity from 0.881 to 0.893. Because we assumed a mastitis frequency of 0.25, the probability of a status classification error is 0.25 × (1 − sensitivity) + 0.75 × (1 − specificity). For the best-case scenario (the model fitting perfectly, as the LNM model does here), the expected frequency of classification errors would therefore be around 0.25 × 0.34 + 0.75 × 0.11 ≈ 0.17. When more complicated explanatory structures (e.g., herd-specific effects, age at calving, and stage of lactation) are needed to describe the data, correctly modeling the underlying liability to mastitis might be more crucial, because such factors may have rather different effects on baseline SCS and on liability to mastitis.
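The error-rate arithmetic above can be reproduced, and extended to the reported LNM ranges, with a one-line function. The helper name is hypothetical. Pairing the lowest sensitivity with the lowest specificity gives only a bracket, because those extremes may come from different scenarios.

```python
def classification_error(sens, spec, freq=0.25):
    """Expected misclassification rate: freq * (1 - sens) + (1 - freq) * (1 - spec)."""
    return freq * (1.0 - sens) + (1.0 - freq) * (1.0 - spec)


# Rounded LNM values used in the text (sensitivity about 0.66, specificity about 0.89)
print(round(classification_error(0.66, 0.89), 4))  # 0.1675, i.e., about 0.17

# Bracket implied by the reported LNM ranges
best = classification_error(0.664, 0.893)
worst = classification_error(0.645, 0.881)
```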
The proposed LNM model could be extended to multivariate mixtures, i.e., several mixture traits depending on the same status variable (e.g., SCS and electrical conductivity in milk) or on different indicator variables (e.g., SCS and a trait affected by a QTL). Similarly, it is of interest to develop models for joint analysis of mixture and non-mixture traits (e.g., SCS and longevity). More advanced mixtures with more than 2 components (e.g., mastitis caused by different pathogens) may be developed as well.
In addition to selecting cows that are able to avoid infection, it is also desirable to genetically improve the cows' ability to recover from infection. This may be achieved by developing a mixture model in which the magnitude of the SCS response to infection has a genetic component, which, in turn, may be related to the probability of recovery from disease.
CONCLUSION
ACKNOWLEDGEMENTS
Received for publication November 23, 2004. Accepted for publication March 18, 2005.
REFERENCES