1 Institute of Biology and Biotechnology of Agriculture, National Research Council, 20090 Segrate, Italy
2 Department of Animal Pathology, Hygiene and Veterinary Public Health, University of Milano, 20133 Milano, Italy
3 Department of Animal Sciences, University of Wisconsin, Madison 53706
Corresponding author: Paul Boettcher; e-mail: boettch{at}ibba.cnr.it.
ABSTRACT
Key Words: goat, mastitis, somatic cell count, finite mixture model
Abbreviation key: DIC = deviance information criterion, FMM = finite mixture model, PMC = proportion of misclassification, SLM = standard linear mixed effects model.
INTRODUCTION
Recent work has extended the ideas of Detilleux and Leroy (2000) toward an FMM for SCS. Gianola et al. (2004) refined the models of Detilleux and Leroy (2000) and proposed algorithms for parameter estimation by maximum likelihood. Ødegard et al. (2003) outlined a Bayesian approach for fitting FMM and applied it to simulated data for SCS. In their simulations, Ødegard et al. (2003) evaluated the ability of their procedure to estimate various parameters of the model (such as the variances of different effects) and the quality of classification of individual records into one of the 2 components of the FMM. The objectives of our study were to apply an approach similar to that of Ødegard et al. (2003) to real data for SCS, and to compare the fit of the FMM to the data with that of a single-component standard linear mixed effects model (SLM). This study used data from goats, which present particular challenges to the FMM, inasmuch as SCS are considered less reliable indicators of mastitis in goats than in cattle (e.g., Poutrel and Lerondelle, 1983; Droke et al., 1993; Aleandri et al., 1996). As an additional test of the FMM, data on bacterial infection were used to evaluate the ability of the FMM to correctly classify observations as being from healthy or infected udder halves.
MATERIALS AND METHODS
The resulting data set included 4518 test-day records, with an average of 7.3 test-days per goat. The SCC were converted to SCS using the standard log2 transformation of Ali and Shook (1980). On the basis of bacteriological culture, each udder half on each test-day was assigned to 1 of 5 categories: 1) healthy (no bacteria present), 2) infected with Staphylococcus aureus, 3) infected by a CNS species, 4) infected by a Streptococcus species, and 5) infected by another bacterial species. In general, bacteria falling into the final category were environmental pathogens.
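As an illustration of the SCC-to-SCS step, the sketch below uses the form in which the Ali and Shook (1980) score is commonly written, SCS = log2(SCC/100,000) + 3, assuming SCC is recorded in cells/mL. The function name `scc_to_scs` is ours, not the authors'.

```python
import math


def scc_to_scs(scc_cells_per_ml):
    """Convert a test-day somatic cell count (cells/mL) to a somatic cell score.

    Assumes the usual Ali and Shook (1980) form: SCS = log2(SCC / 100,000) + 3,
    so 100,000 cells/mL maps to an SCS of 3.
    """
    if scc_cells_per_ml <= 0:
        raise ValueError("SCC must be positive")
    return math.log2(scc_cells_per_ml / 100_000) + 3


# Each doubling of SCC raises SCS by exactly 1 unit.
for scc in (50_000, 100_000, 200_000, 1_600_000):
    print(scc, scc_to_scs(scc))
```

The log2 scale is what makes a doubling of SCC correspond to one SCS unit, which brings the strongly right-skewed SCC distribution much closer to the normality assumed by both models.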
Statistical Analyses
Data for SCS were analyzed using 2 models. The first model was an SLM, in which all observations were assumed to follow a normal distribution, with appropriate covariates included to represent heterogeneity in the population. The second model was an FMM with 2 components. Here, each observation was assumed to be drawn from 1 of 2 distinct distributions differing in their means, without knowing which of the 2 processes generated a given observation. The distribution of observations from the first component was restricted to have the lower mean and was interpreted as representing uninfected udder halves. The distribution of observations in the second component of the FMM had the higher mean and was assumed to represent records from infected udder halves. Using notation similar to that of Ødegard et al. (2003), both models can be described in matrix form by the following equation:
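$$\mathbf{y} = \mathbf{X}_0\boldsymbol{\beta}_0 + \mathbf{M}_z\mathbf{X}_1\boldsymbol{\beta}_1 + \mathbf{Z}_g\mathbf{g} + \mathbf{Z}_h\mathbf{h} + \mathbf{e} \qquad [1]$$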
where y is a vector of observations of test-day SCS; β0 is a vector of fixed effects common to all udder halves, regardless of infection status; β1 is a vector of fixed effects applying only to infected udder halves; Mz is a diagonal matrix of indicator variables with elements M(i,i) = 0 for records from the first component (healthy, with prior probability 1 − Pm, where Pm is the mixing proportion, i.e., the unknown proportion of putatively infected records) or M(i,i) = 1 for records from the second component (infected, with prior probability Pm); g is a vector of random goat effects; h is a vector of random udder-half effects; X0, X1, Zg, and Zh are incidence matrices of appropriate order; and e is a vector of random residuals. Goat effects were assumed to be uncorrelated: genetic relationships were not considered in either model because pedigree information was unavailable for approximately 50% of the animals in the study, and only a small fraction of recorded sires were used in more than one herd.
Elements in the β0 vector included 3 regression coefficients for the effect of DIM on SCS, and class effects of herd-test-day (50 levels), parity group, and udder side (left or right). The regression coefficients for DIM were those of Wilmink's (1987) curve, i.e., an intercept and regressions on DIM and on exp(−0.05 DIM). Parity groups consisted of separate classes for first through fifth lactations and a pooled class for all animals with 6 or more lactations. The β1 vector included a single element, which can be interpreted as the difference in mean values between the 2 components (i.e., between infected and healthy udder halves). The SLM is obtained by taking Mz in [1] to be a null matrix.
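The DIM part of each row of X0 can be sketched as below, assuming Wilmink's curve in its usual form a + b·DIM + c·exp(−0.05·DIM). The function names and the coefficient values in the usage lines are hypothetical and serve only to show how the 3 covariates enter the model.

```python
import math


def wilmink_covariates(dim, k=0.05):
    """Return the DIM-related columns of one row of X0 under Wilmink's curve.

    Columns: intercept, DIM, and exp(-k * DIM); k = 0.05 follows the text.
    """
    return [1.0, float(dim), math.exp(-k * dim)]


def wilmink_curve(dim, coefs, k=0.05):
    """Expected SCS at a given DIM: a + b*DIM + c*exp(-k*DIM), with coefs = (a, b, c)."""
    return sum(c * x for c, x in zip(coefs, wilmink_covariates(dim, k)))


# Hypothetical coefficients, for illustration only (not estimates from this study).
coefs = (3.0, 0.01, -1.0)
for dim in (5, 60, 150, 250):
    print(dim, [round(v, 4) for v in wilmink_covariates(dim)], round(wilmink_curve(dim, coefs), 3))
```

The exponential term dominates early in lactation and decays toward 0, so the curve can capture a changing SCS trajectory in early lactation while the linear term handles the later trend.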
All variances were assumed to be homogeneous for both the SLM and the FMM. The conditional distribution of y, given the fixed and random effects, was assumed to be:
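$$\mathbf{y} \mid \boldsymbol{\beta}_0, \boldsymbol{\beta}_1, \mathbf{g}, \mathbf{h}, \mathbf{z}, \sigma^2_e \sim N\left(\mathbf{X}_0\boldsymbol{\beta}_0 + \mathbf{M}_z\mathbf{X}_1\boldsymbol{\beta}_1 + \mathbf{Z}_g\mathbf{g} + \mathbf{Z}_h\mathbf{h},\ \mathbf{I}\sigma^2_e\right) \qquad [2]$$
where z is a vector containing the diagonal elements of Mz, indicating group membership, and $\sigma^2_g$, $\sigma^2_h$, and $\sigma^2_e$ are the variances for goat, udder-half, and residual effects, respectively. The g and h vectors had conditional (upon $\sigma^2_g$ and $\sigma^2_h$) normal distributions with null expectations and variance-covariance matrices $\mathbf{I}\sigma^2_g$ and $\mathbf{I}\sigma^2_h$, respectively.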
Posterior distributions of the various parameters were obtained via Gibbs sampling, using minor modifications of the Bayesian approach described by Ødegard et al. (2003). Briefly, prior distributions were Bernoulli for all elements of the classification vector z, uniform for fixed effects (bounded by ±100,000), normal for random effects, scale-inverted χ² for variances, and Beta for the mixing proportion Pm. All random effects had prior means of 0. Prior values for the variances were obtained from maximum likelihood estimates computed with the MIXED procedure of SAS (SAS Institute, Inc., Cary, NC) (our unpublished results), and all χ² distributions had 5 degrees of freedom. The Beta distribution is defined by 2 parameters, α1 and α2, and both were assigned a value of 2, following the approach of Ødegard et al. (2003).
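With a scale-inverted χ² prior, each variance has a full conditional of the same family, which makes the variance steps of the sampler simple. The sketch below assumes the standard conjugate update: with prior scale-inverted χ²(ν, S²) and q effects u ~ N(0, Iσ²), the draw is (νS² + u'u)/χ²(ν + q). The name `s2_prior` is hypothetical; in this study the prior values came from the SAS estimates.

```python
import numpy as np


def sample_variance(effects, nu=5.0, s2_prior=1.0, rng=None):
    """One Gibbs draw of a variance from its scale-inverted chi-square full conditional.

    Prior: sigma^2 ~ scaled-inv-chi2(nu, s2_prior). Given q effects u ~ N(0, I sigma^2),
    the full conditional is scaled-inv-chi2(nu + q, (nu*s2_prior + u'u) / (nu + q)),
    drawn here as (nu*s2_prior + u'u) / chi2(nu + q).
    """
    rng = np.random.default_rng() if rng is None else rng
    u = np.asarray(effects, dtype=float)
    scale_sum = nu * s2_prior + u @ u
    return scale_sum / rng.chisquare(nu + u.size)


# Example: goat variance given current goat effects (values are made up).
rng = np.random.default_rng(0)
print(sample_variance(rng.normal(0.0, 0.7, 100), nu=5.0, s2_prior=0.5, rng=rng))
```

The same function covers the udder-half variance (pass h) and the residual variance (pass the current residuals, y minus the fitted values under the sampled z). With only 5 prior degrees of freedom, the data dominate once q is large.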
The Gibbs sampler was then implemented as follows: 1) fixed effects were sampled one element at a time from univariate normal distributions; 2) random effects were sampled from univariate normal distributions; 3) the between-goat variance was sampled from a scale-inverted χ² distribution; 4) the variance between udder halves was sampled from a scale-inverted χ² distribution; 5) the residual variance was sampled from a scale-inverted χ² distribution; 6) group membership variables (i.e., elements of z) were sampled from Bernoulli distributions; and 7) the mixing proportion Pm was sampled from a Beta distribution. In step 1, β1 was constrained to be > 0 to ensure identifiability and avoid "label switching"; in other words, the mean for observations in the second (infected) component was forced to be greater than that for the first component. For group membership (step 6), the Bernoulli distribution differed for each observation and was defined by a parameter $p_i$, the conditional probability that record i was assigned to the infected group, given $y_i$ and the values of the other parameters in the model. Specifically,
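$$p_i = \frac{P_m\, f_1(y_i \mid \boldsymbol{\theta}, \sigma^2_e)}{P_m\, f_1(y_i \mid \boldsymbol{\theta}, \sigma^2_e) + (1 - P_m)\, f_0(y_i \mid \boldsymbol{\theta}, \sigma^2_e)} \qquad [3]$$
where $\boldsymbol{\theta}' = [\boldsymbol{\beta}'_0\ \boldsymbol{\beta}'_1\ \mathbf{g}'\ \mathbf{h}']$,
$$f_0(y_i \mid \boldsymbol{\theta}, \sigma^2_e) = \frac{1}{\sqrt{2\pi\sigma^2_e}} \exp\left[-\frac{(y_i - \mathbf{x}'_{0i}\boldsymbol{\beta}_0 - \mathbf{z}'_{gi}\mathbf{g} - \mathbf{z}'_{hi}\mathbf{h})^2}{2\sigma^2_e}\right] \qquad [4]$$
and
$$f_1(y_i \mid \boldsymbol{\theta}, \sigma^2_e) = \frac{1}{\sqrt{2\pi\sigma^2_e}} \exp\left[-\frac{(y_i - \mathbf{x}'_{0i}\boldsymbol{\beta}_0 - \mathbf{x}'_{1i}\boldsymbol{\beta}_1 - \mathbf{z}'_{gi}\mathbf{g} - \mathbf{z}'_{hi}\mathbf{h})^2}{2\sigma^2_e}\right] \qquad [5]$$
where the incidence vectors $\mathbf{x}'_{0i}$, $\mathbf{x}'_{1i}$, $\mathbf{z}'_{gi}$, and $\mathbf{z}'_{hi}$ are the ith rows of $\mathbf{X}_0$, $\mathbf{X}_1$, $\mathbf{Z}_g$, and $\mathbf{Z}_h$, respectively.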
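Steps 6 and 7 can be sketched for all records at once. This assumes normal component densities with a common residual variance (so their normalizing constants cancel in the ratio for $p_i$) and the conjugate Beta update for Pm. The arguments `eta0` (healthy-component mean of each record) and `delta` (the infected-component shift), and the names `a1`, `a2` for the Beta parameters, are our own labels.

```python
import numpy as np


def sample_membership(y, eta0, delta, p_m, var_e, rng, a1=2.0, a2=2.0):
    """Gibbs steps 6 and 7 for all records at once (homogeneous residual variance).

    eta0 : healthy-component mean of each record, x0i'b0 + zgi'g + zhi'h
    delta: infected-component shift x1i'b1 (> 0)
    """
    r = np.asarray(y, dtype=float) - np.asarray(eta0, dtype=float)
    # With equal variances the normalizing constants of f0 and f1 cancel, so the
    # log-odds of p_i is the prior log-odds plus a log-density difference.
    log_odds = np.log(p_m / (1.0 - p_m)) + (r**2 - (r - delta) ** 2) / (2.0 * var_e)
    p = 0.5 * (1.0 + np.tanh(0.5 * log_odds))   # logistic function, overflow-free
    z = rng.binomial(1, p)                       # step 6: z_i ~ Bernoulli(p_i)
    n1 = int(z.sum())
    p_m_new = rng.beta(a1 + n1, a2 + z.size - n1)   # step 7: Beta(a1 + n1, a2 + n0)
    return p, z, p_m_new


rng = np.random.default_rng(0)
p, z, p_m = sample_membership([5.0, 3.0, 9.0], [4.0, 4.0, 4.0], delta=2.0,
                              p_m=0.5, var_e=1.0, rng=rng)
print(p, z, p_m)
```

Working on the log-odds scale avoids computing two tiny densities and dividing them, which can underflow for records far from both component means.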
Steps 1 to 7 made up one cycle of the Gibbs sampler (only steps 1 to 5 were needed for the single-component model). For this study, 2 chains of 205,000 cycles were generated for each model. The first 5000 cycles of each chain were discarded as burn-in, to ensure sampling from the desired marginal posterior distributions. Posterior means of the various parameters were calculated by retaining every 20th of the remaining cycles, i.e., 10,000 draws per chain were used to estimate features of the posterior distributions of interest.
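The burn-in and thinning arithmetic can be checked with a short sketch. Each 205,000-cycle chain keeps cycles 5000, 5020, ..., 204,980, that is (205,000 − 5000)/20 = 10,000 draws. Pooling the thinned chains before averaging is our assumption; the text does not say how the 2 chains were combined.

```python
import numpy as np


def retained_draws(chain, burn_in=5000, thin=20):
    """Drop the burn-in cycles and keep every `thin`-th remaining cycle of one chain."""
    return np.asarray(chain)[burn_in::thin]


def posterior_mean(chains, burn_in=5000, thin=20):
    """Posterior mean from pooled, thinned chains (pooling is an assumption)."""
    kept = np.concatenate([retained_draws(c, burn_in, thin) for c in chains])
    return kept.mean(axis=0), kept.shape[0]


chain = np.arange(205_000)
print(len(retained_draws(chain)))  # draws kept from one chain
```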
Model Comparison
The 2 models were compared using the deviance information criterion (DIC) presented by Spiegelhalter et al. (2002). The DIC is defined as
$$\text{DIC} = \bar{D} + p_D \qquad [6]$$
where $\bar{D}$ is the posterior expectation of the Bayesian deviance and $p_D$ is a measure of the effective number of parameters in the model. Thus, the DIC considers both the fit of the model (as measured by $\bar{D}$) and its complexity (as measured by $p_D$, which acts as a penalty). The penalty is necessary because models with more parameters tend to have smaller deviance. Smaller values of DIC indicate a better fit of the model to the data, after accounting for complexity.
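Specifically,
$$\bar{D} = E_{\boldsymbol{\theta} \mid \mathbf{y}}\left[D(\boldsymbol{\theta})\right] \qquad [7]$$
where
$$D(\boldsymbol{\theta}) = -2 \log p(\mathbf{y} \mid \boldsymbol{\theta}) \qquad [8]$$
We obtained $\bar{D}$ by calculating $D(\boldsymbol{\theta})$ every 2000 cycles (after the first 5000 cycles) and then averaging these values. In addition,
$$p_D = \bar{D} - D(\bar{\boldsymbol{\theta}}) \qquad [9]$$
where $D(\bar{\boldsymbol{\theta}})$ is the Bayesian deviance (equation [8]) evaluated at the posterior mean of the parameters in $\boldsymbol{\theta}$. The posterior mean of $\boldsymbol{\theta}$ was calculated by averaging realizations of elements in $\boldsymbol{\theta}$ obtained every 500 cycles.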
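The DIC bookkeeping reduces to 3 lines once deviance values are available. The sketch below assumes the deviance is −2 times the log-likelihood of independent normal records with a common residual variance. It does not settle whether the FMM deviance was evaluated conditional on the sampled z or from the mixture density, and the function names are ours.

```python
import numpy as np


def normal_deviance(y, fitted, var_e):
    """D(theta) = -2 log p(y | theta) for independent normal records (assumed likelihood)."""
    r = np.asarray(y, dtype=float) - np.asarray(fitted, dtype=float)
    return r.size * np.log(2.0 * np.pi * var_e) + (r @ r) / var_e


def dic(deviance_draws, deviance_at_posterior_mean):
    """Return (DIC, D-bar, p_D) from sampled deviances and D evaluated at theta-bar."""
    d_bar = float(np.mean(deviance_draws))     # posterior mean deviance
    p_d = d_bar - deviance_at_posterior_mean   # effective number of parameters
    return d_bar + p_d, d_bar, p_d


print(dic([10.0, 12.0, 14.0], 9.0))
```

Because $p_D$ is estimated rather than counted, a model with many random effects or latent indicators (such as z in the FMM) is not penalized one unit per element but by how much those quantities are actually constrained by the data.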
Classification Ability
As previously mentioned, bacterial culture data were available on a test-day basis for all udder halves contributing data to the study. These records allowed for assessment of the ability of the FMM to correctly classify udder halves as healthy or infected on the date on which SCC was measured. Following Ødegard et al. (2003), 3 measures of classification ability were defined: 1) proportion of misclassification (PMC), 2) sensitivity, and 3) specificity.
The PMC was calculated for all observations. The vector z of classification variables was resampled in each cycle of the sampler. To obtain the PMC, the values of $z_i$ for each observation were summed across all cycles and divided by the number of cycles, yielding m, a vector of the proportion of cycles in which each observation was assigned to the second component of the FMM. For each observation i from an uninfected udder half, $\text{PMC}_i = m_i$; for each observation i from an infected udder half, $\text{PMC}_i = 1 - m_i$. The mean PMC was obtained by averaging the $\text{PMC}_i$. Sensitivity was the probability of correct classification of infected udder halves and was therefore calculated by averaging $m_i$ over observations from infected udder halves. Specificity was the probability of correct classification of healthy udder halves and was calculated as 1 minus the average of $m_i$ over these observations. These classification statistics are available only for the FMM, so they were compared with the values that would have been obtained by random classification, based on observed rates of infection. Finally, sensitivity of the FMM was calculated according to the type of bacteria causing the infection, inasmuch as different species of bacteria may induce different responses in terms of SCS.
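The 3 classification measures follow directly from the stored membership draws and the bacteriological status of each record. The sketch below assumes the draws are held as a cycles × records array of 0/1 values (1 = second, infected component). The function name and the toy inputs are hypothetical.

```python
import numpy as np


def classification_summary(z_draws, infected):
    """PMC, sensitivity, and specificity from sampled memberships.

    z_draws : (cycles, records) array of 0/1 memberships (1 = infected component)
    infected: per-record bacteriological status (True = infected udder half)
    """
    m = np.asarray(z_draws, dtype=float).mean(axis=0)  # proportion of cycles in component 2
    inf = np.asarray(infected, dtype=bool)
    pmc = np.where(inf, 1.0 - m, m)                     # PMC_i
    return {"PMC": pmc.mean(),
            "sensitivity": m[inf].mean(),               # average m_i, infected records
            "specificity": 1.0 - m[~inf].mean()}        # 1 - average m_i, healthy records


# Toy example: 2 cycles, 4 records (first 2 from infected udder halves).
print(classification_summary([[1, 0, 1, 0], [1, 0, 0, 0]], [True, True, False, False]))
```

Averaging $m_i$ rather than thresholding it at 0.5 means that a record classified correctly in only some cycles contributes partial credit, which is how the PMC is defined in the text.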
RESULTS AND DISCUSSION
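… [$\sigma^2_g/(\sigma^2_g + \sigma^2_h + \sigma^2_e)$] was 0.31 for the SLM, and a similar calculation would yield 0.55 for the FMM. However, the correct expression for repeatability in our 2-component mixture model is
$$r = \frac{\sigma^2_g}{\sigma^2_g + \sigma^2_h + \sigma^2_e + P_m(1 - P_m)(\mu_1 - \mu_0)^2}$$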
because the variance of a mixture depends on both the component means and the mixing proportion; µ0 and µ1 are the true means of SCS under the 2 distributions. On this basis, repeatability for the FMM was roughly 0.34, slightly larger than that for the SLM (0.31).
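The gap between the naive FMM ratio (0.55) and the corrected one (0.34) comes from the between-component term in the denominator. The sketch below assumes the usual variance of a 2-component mixture with a common within-component variance, σ²g + σ²h + σ²e + Pm(1 − Pm)(µ1 − µ0)². The inputs in the usage line are made up, because the study's variance estimates are not reproduced in this section.

```python
def repeatability_naive(var_g, var_h, var_e):
    """Ratio of goat variance to the sum of the 3 variance components."""
    return var_g / (var_g + var_h + var_e)


def repeatability_mixture(var_g, var_h, var_e, p_m, mu0, mu1):
    """Same ratio, with the between-component variance of a 2-component mixture added."""
    between = p_m * (1.0 - p_m) * (mu1 - mu0) ** 2
    return var_g / (var_g + var_h + var_e + between)


# Hypothetical inputs, for illustration only.
print(repeatability_naive(1.0, 0.5, 0.5), repeatability_mixture(1.0, 0.5, 0.5, 0.5, 3.0, 5.0))
```

Ignoring the between-component term treats the large shift between healthy and infected records as if it were absent, which is why the naive ratio overstates repeatability under the FMM.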
In addition to having a smaller residual variance and a slightly larger repeatability, the FMM had a lower DIC than the SLM: 21,679 vs. 55,834. As explained previously, a lower DIC is preferred in model comparison. The lower DIC of the FMM indicates that the improvement in fit to the data afforded by the FMM outweighs the penalty for the additional complexity introduced by the extra parameters.
Classification Ability
As mentioned previously, no classification is possible with the SLM, so the classification ability of the FMM was compared with random classification of observations based on observed rates of infection. Approximately 45% of observations were misclassified with the FMM. This compares to about 48% PMC if observations had been assigned to components randomly, based on the observed proportion of healthy and infected udder halves. The proportion of 48% was obtained by assuming that 40% of the truly healthy udder halves would have been randomly assigned to the infected group and 60% of the observations from infected udder halves would have been randomly assigned to the healthy group.
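The random-classification benchmarks follow from a short calculation. Let p be the proportion of observations from infected udder halves, which is about 0.40 here, as implied by the 40%/60% split described above. Random assignment at rate p misclassifies healthy records with probability p and infected records with probability 1 − p:

$$\text{PMC}_{\text{random}} = (1-p)\,p + p\,(1-p) = 2p(1-p) = 2(0.40)(0.60) = 0.48$$

$$\text{sensitivity}_{\text{random}} = p = 0.40, \qquad \text{specificity}_{\text{random}} = 1 - p = 0.60$$

These are the reference values against which the FMM's sensitivity and specificity are compared.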
The advantage of the FMM was in the classification of infected udder halves (sensitivity), which was 48% for the FMM, relative to 40% with random classification. In contrast, specificity of the FMM was about 60%, equal to that for random classification of healthy udder halves. The rates of sensitivity and specificity observed in this study were much lower than the values obtained by Ødegard et al. (2003), and our level of PMC was much greater. However, a direct comparison between the 2 studies is not feasible. Ødegard et al. (2003) used simulated data, patterned after mastitis in dairy cattle where the proportion of healthy animals is around 70 to 85% and in which a greater proportion of mastitis infections show clinical symptoms.
The rates of sensitivity and specificity observed in this study are not high enough to justify the use of an FMM to diagnose individual cases of mastitis and identify animals that are candidates for treatment, at least in our goat population. Although sensitivity was greater than what could be achieved by random assignment, it was still below 50%. In addition, the FMM analysis (as described here) can only be applied accurately after the fact, based on a large body of data, and thus too late to be of practical value for identification of infected animals. However, diagnosis of infections is not the sole purpose of such an analysis. The FMM represents a statistical improvement over the SLM, and thus may contribute to increased precision of genetic evaluations.
Rates of sensitivity varied according to the type of bacteria (Table 3). Sensitivity was greatest for udder halves infected with Staph. aureus (59.1%). Previous researchers have reported strong associations between infections by Staph. aureus and elevated SCS, at both the cow (Schepers et al., 1997) and herd (Barkema et al., 1998) levels; Staph. aureus was also associated with the largest increase in SCS among the pathogens present in these herds (Moroni et al., accepted). Sensitivity for CNS infections was near the mean for all infections (48%), as these bacteria are generally associated with a smaller increase in SCS than that produced by Staph. aureus. Sensitivity for environmental bacteria was nearly equal to that expected with random assignment to groups. Environmental bacterial infections are characterized by a spike in SCS followed by a relatively quick return to normal levels (de Haas et al., 2002).
This empirical study leaves questions that must be answered before any practical application. The next logical step would be to increase the complexity of the FMM used in this analysis and to apply it to other data. A first extension would be to include genetic effects. Ødegard et al. (2003) applied such a model to simulated data, but no applications to real data have been reported yet. Another useful generalization would be to allow for heterogeneous variances. Ødegard et al. (2003) also applied this extension, but to simulated data only. Experimentation with heterogeneous variances for genetic effects and an imperfect genetic correlation between the alternative states could be particularly valuable for practical applications (Gianola et al., 2004). Somatic cell scores may be different traits genetically for healthy and infected animals (Detilleux and Leroy, 2000), and an FMM with heterogeneous variances may be the only way to account for these different traits if mastitis incidence is not recorded.
Following development of a more complex FMM for SCS, the next critical step would be evaluation of the biological meaning of results. Our study indicates that the FMM is superior to the SLM from a statistical point of view, and that it may result in increased heritability and more precise genetic evaluations. From a practical standpoint, the advantages of applying the FMM may be greater for dairy cattle than for dairy goats. Although SCS tends to be elevated among goats with udder infections (e.g., Leitner et al., 2004; Moroni et al., accepted), the effect of infection on SCS tends to be greater in dairy cattle. For a number of reasons, including higher natural levels of SCC in goats' milk than in cows' milk (Droke et al., 1993), poor specificity of cell counting machinery (Poutrel and Lerondelle, 1983), and strong effects of other factors such as estrus (Aleandri et al., 1996) on SCC in goats, SCC is considered a less reliable indicator of mastitis in goats than in cattle. In fact, routine genetic evaluation of goats for SCC is essentially nonexistent.
Regardless of the species to which an FMM analysis of SCS is applied, certain practical issues must be considered. The EBV for SCS derived from an FMM may not have the same genetic meaning as EBV from the SLM. Males with high (unfavorable) EBV for SCS generally have high EBV not simply because their daughters have relatively greater SCS than herdmates in both the infected and uninfected states, but also because they tend to have more daughters in the infected state. As suggested by Ødegard et al. (2003), the FMM could be extended to allow for genetic influences on the mixing proportion, Pm, which would thus capture differences among sires in the probability that their daughters' records are assigned to the healthy or infected group. Such an extension was recently presented by Ødegard et al. (accepted). Once an appropriate FMM for SCS is arrived at, indices would need to be developed so that genetic response for mastitis resistance can be optimized.
Received for publication November 9, 2004. Accepted for publication December 19, 2004.