* Interbull Centre, Department of Animal Breeding & Genetics, SLU, Box 7023, Uppsala 75007, Sweden
Union Nationale des Coopératives agricoles d'Elevage et d'Insémination Animale, Paris 75595, France
Institut de l'Élevage, Département de Génétique, INRA-SGQA, Jouy-en-Josas 78352, France
Station de Génétique Quantitative et Appliquée, INRA, Jouy-en-Josas 78352, France
1 Corresponding author: helene.leclerc{at}jouy.inra.fr
| ABSTRACT |
|---|
Key Words: genetic correlation matrix, structural model, international evaluation
| INTRODUCTION |
|---|
In international dairy sire evaluation, traits are currently defined according to country borders, even though the underlying trait (e.g., milk yield) is quite similar for all countries. Thus, the expression of this trait in different countries tends to be highly correlated. To avoid computational difficulties, structural models are often proposed as an alternative to the classical approach (Rekaya et al., 2001; Delaunay et al., 2002). Here, the basic idea behind structural models is to describe the full genetic covariance matrix as a function of fewer parameters. Rekaya et al. (2001) suggested the use of external information to characterize production systems in different regions or countries to describe genetic covariances. However, the use of external information on climate conditions, management practices, and genetic composition of the cow population to measure similarities across regions or countries is ambiguous due to lack of uniformity in recording such information across countries. In the structural model proposed by Delaunay et al. (2002) as a part of the Production Traits European Joint Evaluation project (Canavesi et al., 2002), genetic correlations between countries are described as a simple function of unspecified country characteristics that can be mapped in a space of limited dimensions. The link function used by Delaunay et al. (2002) to define the correlation between two countries was the exponential of minus the Euclidian distance between the coordinates of two countries. However, there was some concern about the fact that the use of distances imposed important constraints, because not all correlation matrices can be described with such an approach (Delaunay et al., 2002; Minéry et al., 2003; M. E. Goddard, Univ. Melbourne, Australia, personal communication).
The objective of this study was to present and test the structural model of Delaunay et al. (2002) and a variant of it in the context of international dairy sire evaluations on simulated data and field data for different levels of correlations between countries.
| MATERIALS AND METHODS |
|---|
| $y_{ijk} = b_k + a_{ij} + e_{ijk}$ | [1] |
where $y_{ijk}$ is the simulated performance of cow i in herd k of country j; $b_k$ is the herd effect (25 herds per country and per generation) with $b_k \sim N(0, 1)$; $a_{ij}$ is the true additive genetic value of cow i in country j, made up of the parent average and the cow's Mendelian sampling, with $\mathbf{a} \sim N(\mathbf{0}, \mathbf{G} \otimes \mathbf{A})$, where $\mathbf{G}$ is the across-country genetic (co)variance matrix and $\mathbf{A}$ is the additive genetic relationship matrix between animals; and $e_{ijk}$ is the residual with $\mathbf{e} \sim (\mathbf{0}, \mathbf{I}\sigma_{e_j}^2)$, where $\sigma_{e_j}^2$ is the residual variance for country j. For all countries, the genetic and residual variances were fixed at $\sigma_a^2 = 0.25$ and $\sigma_e^2 = 0.75$, giving a heritability of 0.25.
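A minimal sketch of how records following model [1] could be drawn is given here. It is not the simulation program used in the study: it assumes unrelated cows (so that A is an identity matrix, with no pedigree or generations), and the function name `simulate_country_records` and its arguments are hypothetical.

```python
import numpy as np

def simulate_country_records(G, n_cows, n_herds=25, var_e=0.75, rng=None):
    """Draw y_ijk = b_k + a_ij + e_ijk for unrelated cows (assumption: A = I)."""
    rng = np.random.default_rng() if rng is None else rng
    G = np.asarray(G, dtype=float)
    t = G.shape[0]  # number of countries
    records = []
    for j in range(t):
        # Genetic values of each cow in all t countries, drawn from N(0, G).
        a = rng.multivariate_normal(np.zeros(t), G, size=n_cows)
        b = rng.normal(0.0, 1.0, size=n_herds)        # herd effects, N(0, 1)
        herd = rng.integers(0, n_herds, size=n_cows)  # herd index of each cow
        e = rng.normal(0.0, np.sqrt(var_e), size=n_cows)
        y = b[herd] + a[:, j] + e                     # record expressed in country j
        records.append({"country": j, "herd": herd, "a": a, "y": y})
    return records

# Two countries, genetic variance 0.25 in each, genetic correlation 0.90.
G = 0.25 * np.array([[1.0, 0.9], [0.9, 1.0]])
records = simulate_country_records(G, n_cows=1000, rng=np.random.default_rng(1))
heritability = 0.25 / (0.25 + 0.75)
```

With related animals, the draw from N(0, G) would have to be replaced by a draw that respects the relationship matrix A, for example by generating offspring from parent averages plus Mendelian sampling terms.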
Field Data.
Field data were used to assess the generality of the results obtained on simulated data, for which the population structure was not realistic. Data available were the deregressed national breeding values of the bulls used for the Holstein international genetic evaluation. Milk yield analyses were based on data of August 2003, and type trait analyses (foot angle) were based on data of November 2003. The deregression procedure was done by Interbull from the national breeding values sent by the participating countries. It removed the double counting of effects that subsequently were included in the prediction of international breeding values (Jairath et al., 1998). Each deregressed breeding value was weighted in the analyses by its effective daughter contribution (EDC), which considers contemporary group size, correlations between repeated records, and the reliability of the daughters' dam evaluations (Fikse and Banos, 2001). These EDC were sent by each country to Interbull.
Data were edited to include only national evaluations for bulls born after 1984. All the observations were included in the estimation of genetic correlations, in contrast with the current Interbull practice based on subsets of well-connected bulls.
Data from 22 Interbull member countries (Australia, Belgium, Canada, Czech Republic, Denmark, Estonia, Finland, France, Germany, Hungary, Ireland, Israel, Italy, New Zealand, Poland, Spain, South Africa, Switzerland, Switzerland Red, The Netherlands, United Kingdom, and United States) were used for the milk production study. These data sets are characterized in Table 2. The selected countries represented a wide range of production systems and environments. Links between countries were variable, with the number of bulls with daughters in 2 countries (hereafter referred to as common bulls) ranging from 0 (e.g., Estonia-Finland) to 772 (Canada-United States). For foot angle, data from 8 countries were selected (Australia, Canada, France, Germany, Italy, The Netherlands, United Kingdom, and United States). These data sets are characterized in Table 3.
Model for Field Observations.
For the variance components estimation, the sire model used in the international genetic evaluations was applied (Schaeffer, 1994). In this multiple-trait across-country evaluation (MACE), traits in different countries are considered different traits that are genetically correlated. The linear model used was
| $\mathbf{y}_i = \mu_i\mathbf{1} + \mathbf{Z}_i\mathbf{Q}\mathbf{g}_i + \mathbf{Z}_i\mathbf{s}_i + \mathbf{e}_i$ | [2] |
where $\mathbf{y}_i$ is the vector of deregressed breeding values of bulls for country i, $\mu_i$ is the mean for country i, $\mathbf{g}_i$ is the vector of genetic group effects treated as fixed for the estimation of genetic correlations between countries, $\mathbf{s}_i$ is the vector of random sire transmitting abilities for country i, $\mathbf{e}_i$ is the vector of random residuals, $\mathbf{Z}_i$ is the sire incidence matrix, and $\mathbf{Q}$ is the matrix assigning sires to genetic groups.
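For t countries, the variance-covariance matrices of the random effects are
| $\mathrm{Var}\begin{pmatrix}\mathbf{s}_1\\ \vdots\\ \mathbf{s}_t\end{pmatrix} = \begin{pmatrix}\mathbf{A}\sigma_{s_1}^2 & \cdots & \mathbf{A}\sigma_{s_{1t}}\\ \vdots & \ddots & \vdots\\ \mathbf{A}\sigma_{s_{1t}} & \cdots & \mathbf{A}\sigma_{s_t}^2\end{pmatrix}$ |
| $\mathrm{Var}(\mathbf{e}_i) = \mathbf{D}_i^{-1}\sigma_{e_i}^2$ |
where $\mathbf{D}_i$ is a diagonal matrix with EDC weighting factors (Fikse and Banos, 2001) as elements, $\sigma_{e_i}^2$ is the residual variance for country i, $\mathbf{A}$ is the additive relationship matrix based on sire, maternal grandsire, and maternal granddam of the bull, with the maternal granddam treated as missing and assigned to phantom parent groups, $\sigma_{s_i}^2$ is the sire variance for country i, and $\sigma_{s_{ij}}$ is the sire covariance between countries i and j.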
Genetic groups for missing ancestors were formed, based on selection path, year of birth, and origin (defined following country borders). Small groups were merged together first by year of birth and then by origin to achieve a minimum group size of 500 bulls with unknown parents for the estimation of genetic correlations across countries.
Models for the Genetic Covariances.
Three different models were used for the genetic variance-covariance matrix: a classical model (CM), the structural model of Delaunay et al. (2002) referred to as SM(dXY), and a structural model derived from that of Delaunay et al., referred to as SM(dXY2).
In the classical model, the variance-covariance matrix was assumed unstructured. In the structural model of Delaunay et al. (2002), the covariances between countries were defined as a function of a set of unobserved variables (characteristics) for each country that condition the genetic correlations between countries. The country characteristics are represented in a space of k dimensions (k < number of countries), in which the coordinates of the countries are the unobserved characteristics. In this space, the genetic correlation between 2 countries, X and Y, (rGXY) was defined as
| $r_{G_{XY}} = \exp(-d_{XY})$ | [3] |
where $d_{XY}$ is the Euclidian distance between countries X and Y, computed as $d_{XY} = \sqrt{\sum_{i=1}^{k}(P_{Xi} - P_{Yi})^2}$, with $P_{Xi}$ and $P_{Yi}$ being the coordinates of countries X and Y, respectively, for axis i. According to this definition, the covariance between 2 countries X and Y is $\sigma_{XY} = \sigma_X \cdot \sigma_Y \cdot \exp(-d_{XY})$, with $\sigma_X$ and $\sigma_Y$ being the genetic standard deviations in countries X and Y, respectively.
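The mapping from country coordinates to a genetic covariance matrix can be sketched as follows. This is an illustration only, assuming NumPy; the coordinates, the standard deviations, and the function names are hypothetical and are not estimates from this study.

```python
import numpy as np

def correlation_from_coordinates(coords):
    """Correlation matrix with r_XY = exp(-d_XY), d_XY the Euclidian distance."""
    coords = np.asarray(coords, dtype=float)        # shape (m countries, k axes)
    diff = coords[:, None, :] - coords[None, :, :]  # pairwise coordinate differences
    dist = np.sqrt((diff ** 2).sum(axis=-1))        # pairwise Euclidian distances
    return np.exp(-dist)

def covariance_from_coordinates(coords, sd):
    """Covariance matrix with sigma_XY = sigma_X * sigma_Y * exp(-d_XY)."""
    sd = np.asarray(sd, dtype=float)
    return np.outer(sd, sd) * correlation_from_coordinates(coords)

# Hypothetical coordinates of 3 countries in a 2-dimensional space.
coords = np.array([[0.0, 0.0], [0.1, 0.0], [0.05, 0.2]])
R = correlation_from_coordinates(coords)
S = covariance_from_coordinates(coords, sd=[0.5, 0.5, 0.6])
```

Because the distance of a country to itself is 0, the diagonal of `R` is 1, and all off-diagonal correlations lie between 0 and 1.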
To illustrate, consider 4 countries (A, B, C, and D). Their characteristics can be used to conceptually define the axes of a 3-dimensional space (Figure 1). They are referred to as axis countries. Country A defines the center of the space. Adding country B determines the first axis. The inclusion of countries C and D positions the second and third axes, respectively. For example, the correlation between countries B and C ($r_{G_{BC}}$) computed from their coordinates is $r_{G_{BC}} = \exp\left(-\sqrt{\sum_{i=1}^{3}(P_{Bi} - P_{Ci})^2}\right)$.
In this space, a fifth country, E, could be added without contributing to the definition of the space. Country E is referred to as an added country. For country E, only 3 coordinates need to be estimated to determine the 4 genetic correlations with the axis countries. Similarly, 3 additional coordinates need to be estimated for an added country F to determine the 5 additional genetic correlations (4 with the axis countries and one with country E). With this structural model, even if countries E and F do not have any direct link between them, the genetic correlation between them can be computed from the Euclidian distance $d_{EF}$ obtained from their coordinates in the 3-dimensional space by $r_{G_{EF}} = \exp(-d_{EF})$.
In the variant SM(dXY2), the genetic correlation between 2 countries was defined from the squared Euclidian distance:
| $r_{G_{XY}} = \exp(-d_{XY}^2)$ | [4] |
The use of the square of the Euclidian distance ($d_{XY}^2$) allowed more flexibility in the model. It solved many of the cases where "triangular inconsistencies" occurred (Delaunay et al., 2002; Minéry et al., 2003; M. E. Goddard, Univ. Melbourne, Australia, personal communication). As an example, take 3 countries, A, B, and C. If $r_{G_{AB}} = 0.90$, $r_{G_{AC}} = 0.70$, and $r_{G_{BC}} = 0.80$, it is not possible to find country coordinates such that $r_{G_{XY}} = \exp(-d_{XY})$. Indeed, on the "distance scale", it is impossible to have, at the same time, $d_{AB} = 0.105$, $d_{AC} = 0.357$, and $d_{BC} = 0.223$, because $d_{AB} + d_{BC} < d_{AC}$. With $r_{G_{XY}} = \exp(-d_{XY}^2)$, $d_{AB} = 0.325$, $d_{AC} = 0.597$, and $d_{BC} = 0.472$, and the reparameterization is possible ($d_{AB} + d_{BC} > d_{AC}$). Another obvious restriction is that correlations must be between 0 and 1 in both cases.
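The 3-country example can be checked numerically. The sketch below inverts each link function to obtain the implied distances and tests the triangle inequality; the helper names are hypothetical, and the check covers only the triangle condition for 3 countries, not every geometrical constraint of a larger space.

```python
import math

def implied_distances(corr, squared=False):
    """Invert r = exp(-d) (or r = exp(-d**2) if squared) for each country pair."""
    distances = {}
    for pair, r in corr.items():
        d = -math.log(r)
        distances[pair] = math.sqrt(d) if squared else d
    return distances

def satisfies_triangle_inequality(d_ab, d_ac, d_bc, tol=1e-12):
    """True if the 3 distances can be the sides of a (possibly flat) triangle."""
    sides = sorted([d_ab, d_ac, d_bc])
    return sides[0] + sides[1] >= sides[2] - tol

corr = {"AB": 0.90, "AC": 0.70, "BC": 0.80}

d_lin = implied_distances(corr)               # link r = exp(-d)
d_sq = implied_distances(corr, squared=True)  # link r = exp(-d**2)

ok_lin = satisfies_triangle_inequality(d_lin["AB"], d_lin["AC"], d_lin["BC"])
ok_sq = satisfies_triangle_inequality(d_sq["AB"], d_sq["AC"], d_sq["BC"])
```

Here `ok_lin` is false and `ok_sq` is true, in agreement with the distances quoted in the text.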
The use of these structural models makes it possible to reduce the number of parameters to estimate for the genetic covariance matrix from $m(m+1)/2$ to $m + k(2m - k - 1)/2$, where m is the number of countries and k the number of axes.
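The parameter counts can be tabulated with a short sketch. The counting rule is an assumption derived from the 4-country illustration above: m genetic variances, k(k + 1)/2 coordinates for the k + 1 axis countries (0 for the center, then 1, 2, ..., k), and k coordinates for each of the m - k - 1 added countries. The function names are hypothetical.

```python
def n_params_classical(m):
    """Unstructured covariance matrix: m variances plus m(m - 1)/2 covariances."""
    return m * (m + 1) // 2

def n_params_structural(m, k):
    """Structural model: m variances plus the free country coordinates."""
    if not 1 <= k <= m - 1:
        raise ValueError("k must be between 1 and m - 1")
    axis_coordinates = k * (k + 1) // 2   # axis countries: 0 + 1 + ... + k
    added_coordinates = k * (m - k - 1)   # k coordinates per added country
    return m + axis_coordinates + added_coordinates

# 22 countries: unstructured model versus a 2-axis structural model.
counts = (n_params_classical(22), n_params_structural(22, 2))
```

Under this rule, a structural model with k = m - 1 axes has as many parameters as the classical model, which is consistent with the statement in the Results that SM(dXY2) with 7 axes and 8 countries has the same number of parameters as CM8.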
In the rest of this paper, CMm will represent a classical model used to estimate the genetic correlations among m countries. The terms SM(dXY)km and SM(dXY2)km will represent structural models for which the correlations among m countries are estimated based on the country coordinates in a space of dimension k, with the genetic correlations defined as in [3] and [4], respectively. The indices k and m are written one after the other; for example, SM(dXY)29 denotes the model SM(dXY) with k = 2 axes and m = 9 countries.
Algorithm.
An average information-REML (AI-REML) algorithm was used for parameter estimation (Johnson and Thompson, 1995). The main advantages of AI-REML are that it converges faster than the expectation maximization-REML algorithm used by Interbull and it provides asymptotic standard errors of the estimates, obtained as the inverse of the AI matrix. For the first population of simulated data, the ASREML software (Gilmour et al., 2002) was used for the estimation of parameters of both the classical model and the structural model SM(dXY). For the second population of simulated data and field data analyses, the AI-REML algorithm implemented by Druet et al. (2003a, b), which allows the user to define parametric structures for the random effects, was used for the estimation of the parameters. For the structural models, the genetic variance-covariance matrix was a nonlinear function of parameters (i.e., the coordinates), so the AI-REML algorithm used a simplified AI matrix, ignoring nonzero terms of the second derivative of the genetic (co)variance matrix (Gilmour et al., 1995).
In the ASREML software and in the AI-REML software used to analyze the field data, the update of the parameters was based on a line search procedure in which the step size was repeatedly divided by 2 until the likelihood increased (Dennis and Schnabel, 1983). For the classical model with field data, the update of the genetic covariance matrix was a combined AI-EM update if the AI-REML update alone led to a non-positive-definite genetic covariance matrix (Jensen et al., 1996).
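The step-halving idea can be sketched generically as follows. This is not the code of ASREML or of the AI-REML program used here: `loglik` stands for any function returning the log-likelihood at a parameter vector, `step` for the proposed (for example AI-REML) update, and all names are hypothetical.

```python
def step_halving_update(theta, step, loglik, max_halvings=20):
    """Apply theta + scale * step, halving scale until the log-likelihood increases."""
    current = loglik(theta)
    scale = 1.0
    for _ in range(max_halvings):
        candidate = [t + scale * s for t, s in zip(theta, step)]
        if loglik(candidate) > current:
            return candidate
        scale /= 2.0  # proposed step too long: halve it and try again
    return list(theta)  # no improving step found: keep the current values

# Toy concave log-likelihood with its maximum at theta = 1.
toy_loglik = lambda th: -(th[0] - 1.0) ** 2

# The full step (0 -> 3) overshoots, so it is halved once (0 -> 1.5).
updated = step_halving_update([0.0], [3.0], toy_loglik)
```

Such a rule guarantees only that the likelihood does not decrease from one iteration to the next; it does not protect against convergence to a local maximum.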
Some parameters could be forced to remain constant during the iteration process by setting to zero the first derivatives of the likelihood with respect to these parameters. By fixing coordinates for the axis countries, the coordinates for other countries estimated in different runs were relative to the exact same space.
Model Comparison.
For the simulated data, the structural and the classical models were compared with the true genetic correlations and on the basis of minus twice the logarithm of the likelihood ($-2\log L$). For the field data, the structural and the classical models were compared on the basis of the estimated genetic correlations, of minus twice the logarithm of the likelihood ($-2\log L$), and of 2 information criteria that take the number of parameters to estimate into account: Akaike's information criterion (AIC; Akaike, 1974) and Schwarz's Bayesian information criterion (BIC; Schwarz, 1978):
| $\mathrm{AIC} = -2\log L + 2q$ |
| $\mathrm{BIC} = -2\log L + q\log(n - p)$ |
where q is the number of parameters, n is the number of observations, and p is the rank of the fixed effects matrix, computed as p = (number of genetic groups + 1) × number of countries. Although not necessarily the most accurate ones, the results obtained with the classical model were used as reference.
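As a small worked illustration of the 2 criteria, the sketch below uses AIC = -2logL + 2q and the REML form BIC = -2logL + q log(n - p). The numerical inputs are made up for the example and are not results of this study; the function names are hypothetical.

```python
import math

def aic(minus2logl, q):
    """Akaike's information criterion for q estimated parameters."""
    return minus2logl + 2 * q

def bic(minus2logl, q, n, p):
    """Schwarz's criterion with n observations and fixed-effects rank p."""
    return minus2logl + q * math.log(n - p)

# Hypothetical comparison: the structural model fits slightly worse
# (larger -2logL) but uses far fewer parameters.
classical = {"minus2logl": 1000.0, "q": 45}
structural = {"minus2logl": 1030.0, "q": 24}
n, p = 5000, 90

aic_values = (aic(**classical), aic(**structural))
bic_values = (bic(n=n, p=p, **classical), bic(n=n, p=p, **structural))
```

Smaller values are preferred for both criteria. Because log(n - p) exceeds 2 for any realistic data size, BIC penalizes additional parameters more heavily than AIC.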
Analyses
Simulated Data.
For the first population, genetic correlations between countries estimated for the structural model SM(dXY) taking between 1 and 3 axes into account and for a classical model were compared with the true values of the genetic correlations. A first data set was simulated assuming a genetic correlation of 0.90 between all countries, whereas in a second set a genetic correlation of 0.99 between countries was simulated. This extreme correlation value makes it possible to check the ability of the structural models to deal with the estimation of genetic parameters very close to the border of the parameter space. For the second population, the data were simulated in such a way that genetic correlations between countries ranged from 0.66 to 0.97 (Table 4). Genetic correlations between countries estimated for the structural models SM(dXY) and SM(dXY2) taking between 1 and 7 axes into account and for a classical model were compared with the true values of the genetic correlations. No replications of either scenario were made.
, if O is the center of the space, K, L, and M define the first, second, and third axis, respectively, as suggested by Minéry (2003). The increase of the number of axes was expected to give more flexibility to the model.
Field Data: Complete Data.
For milk yield, 2 specific structural models with a fixed number of axes were selected based on the results from the 9 well-connected countries: one for SM(dXY) and one for SM(dXY2). Genetic correlations estimated with the structural models were compared with estimates obtained for a classical model when all 22 countries described in Table 2 were included. Due to memory constraints, it was not possible to estimate correlations between more than 10 countries per run. Therefore, subsets including between 4 and 10 countries per run were created to estimate all correlations with CM. At least 1 or 2 countries providing many links, such as France, Germany, The Netherlands, and United States, were used in each subset. Only 211 of 231 genetic correlations could be computed with CM from the different subsets. For the 211 estimated country pairs, the number of estimates per country pair ranged from 1 (e.g., Czech Republic-Finland, Israel-Spain) to 28 (New Zealand-United States) with, on average, 3.4 estimates per country pair. For country pairs with several estimates, we used the average genetic correlation. Most of the missing genetic correlations (20 country pairs out of 231), which were not considered in this analysis, involved countries with weak links, namely Estonia, Finland, Israel, South Africa, and Switzerland Red.
To estimate the coordinates for the structural models, different subsets of countries were considered in addition to the axis countries. Here, the coordinates for axis countries were fixed. Therefore, the coordinates of other countries were estimated in the exact same space and were not influenced by small variations of coordinates for axis countries that were observed when the space was not fixed (Minéry, 2003). With the coordinates of all countries in the space defined by the axis countries, it was possible to compute the distance between all pairs of countries, and thus their genetic correlations.
| RESULTS |
|---|
For SM(dXY), only the models including at most 4 axes gave consistent results; that is, where the increase of the space dimension led to an increase of the likelihood (Table 7). Model SM(dXY)29 appeared to be the most interesting compromise between the accuracy of estimated genetic correlations and the reduction of the number of parameters compared with the CM9 results (Table 8). The lower likelihood observed for SM(dXY)29 was compensated by the reduction of parameters and led to the lowest BIC. Interestingly, the correlations estimated with this model did not deviate substantially more from the CM9 estimates than those of SM(dXY) with 3 or 4 dimensions (Table 8). All deviations of correlations larger than 0.030 were for pairs of countries from the southern and northern hemispheres.
Complete Data.
Genetic correlations among all countries estimated with SM(dXY2)7m were on average closer to the CM correlations than those estimated with SM(dXY)2m (Figure 2A). With SM(dXY2)7m, 76.7% of the correlations deviated by less than 0.030 from the CM estimates, whereas this proportion was only 47.8% for SM(dXY)2m. Both models presented similar extreme deviations: 0.290 for SM(dXY)2m and 0.241 for SM(dXY2)7m, which were for pairs of countries with weak links to each other: Poland-Switzerland (30 common bulls) and Hungary-Israel (17 common bulls), respectively.
Foot Angle Field Data.
The 8 well-connected countries selected were Australia, Canada, France, Germany, Italy, The Netherlands, United Kingdom, and United States. As for the milk yield results, both structural models were compared using a model including a limited number of axes (3) and a model including a large number of axes (7). Model SM(dXY2)78 had the same number of parameters to estimate as CM8, but it did not give values of $-2\log L$, AIC, and BIC close to those of CM8 (+52.4). It was also noticed that the likelihood of SM(dXY2)78 was even lower than that of SM(dXY)38, indicating computational inconsistency. For both structural models, it was found that different sets of starting values could lead to very different likelihood values, corresponding to different local maxima. For instance, with 4 sets of starting values, we obtained 4 different values of $-2\log L$ at convergence with SM(dXY)45 (ranging from +29.6 to +74.9 relative to $-2\log L$ for the classical model CM5).
| DISCUSSION |
|---|
The agreement between SM and CM correlation estimates was rather disappointing. The use of SM(dXY2)7m or SM(dXY)2m led to deviations with respect to CM of more than 0.030 for 23% and nearly 50% of the correlations, respectively. A more detailed examination showed that between 70 and 80% of the large deviations of these estimates (larger than 0.030) were for correlations computed among nonaxis countries. The accuracy of the calculated genetic correlations among nonaxis countries was much lower than that of the correlations estimated among axis countries or between axis and nonaxis countries, and was lower than expected based on a previous study (Minéry et al., 2003).
According to the analysis of simulated data, the structural model allows the estimation of genetic correlations that are very close to the border of the parameter space, which was not possible with an unstructured model. The structural model intrinsically involves a restriction on correlations that is not included in the unstructured model. This study only considers the comparison of the results obtained from structural models against methods used by Interbull (AI-REML or expectation maximization-REML algorithm). Very high genetic correlations (e.g., larger than 0.95) are common in international evaluations, especially with traits very similar across countries such as milk, fat, and protein yield. Unfortunately, the structural models used in this study did not perform well for traits moderately correlated such as foot angle, whose definition varies considerably across countries (e.g., in Switzerland, another trait, heel depth, is used as a measure for foot angle). Low correlations lead to country coordinates that are further apart, exacerbating problems of triangular inconsistency.
The accuracy of genetic correlation estimates largely depended on the dimension of the parameter space defined by the axis countries. For milk yield field data, genetic correlations between countries were estimated quite accurately with a structural model SM(dXY) in examples involving a low number of axis countries. However, when a larger space was considered, including more axis countries to add more flexibility to the model, serious convergence problems appeared and correlations estimated with SM(dXY) deviated substantially from correlations estimated with the classical model. At least some of these problems are related to the evidence of local maxima, due to strong geometrical constraints or flat likelihood profiles. Such problems exist with SM(dXY2) even though geometrical constraints are lower. These problems indicate that the triangular inconsistency should not be overlooked. The constraints have consequences on the maximization procedure, with different maxima reached depending on starting values. In contrast, for SM(dXY2), the estimates were mostly accurate only with models involving a large number of axis countries. In this case, the reduction of the number of parameters to estimate is not as large as hoped (Minéry et al., 2003).
The choice of axis countries is another important issue for our structural models. The axis countries should represent most of the production systems that exist in the participating countries. The choice of axis countries was based on maximization of the volume of space defined by the coordinates of the axis countries. However, in view of the disappointing results obtained for some countries (e.g., Czech Republic, Estonia, Poland), it seems that some of the 22 countries were not correctly represented in the space defined by the 3 [for SM(dXY)2m] or 8 [for SM(dXY2)7m] well-connected countries that were chosen as axis countries. The motivation for working with a preliminary subset of 9 well-connected countries was to provide, through these countries, indirect links to the other countries, but it seems that this is not sufficient. Therefore, a compromise should be found between the amount of links between countries and their representativeness.
One of the expected advantages of our structural models was that only the coordinates of the country in the space defined by axis countries were needed to estimate correlations with all the other countries. This is an attractive property when a new country wants to join international evaluation or one of the other countries changes something in its genetic evaluation, because structural models avoid time-consuming estimation. From this perspective, structural models with a large number of axes (e.g., 8 countries), although preferred for their higher accuracy, are not appealing from a practical point of view because the probability that one of these countries modifies its evaluation is high, which means that the considered space is modified and all coordinates need to be reestimated.
The use of Euclidian distance to define our structural models imposes obvious restrictions because only positive correlations can be estimated. Alternative link functions exist to encompass the correlation range from -1 to 1 [e.g., $r_{G_{XY}} = 2\exp(-d_{XY}) - 1$].
The geometrical restrictions imposed by the use of the Euclidian distance are strong. It can be shown that a similar structural model could be implemented to completely remove these constraints, enlarging the space to complex numbers. Unfortunately, in that case, the genetic correlation matrix is no longer ensured to be positive definite. However, as the number of participating countries to international evaluation is likely to keep growing, it seems obvious that unstructured correlation matrices are far from optimal. There are other structural models, such as models based on principal components or factor analysis, for which promising results have been obtained (Leclerc et al., 2005).
| CONCLUSIONS |
|---|
It is concluded that the structural models envisioned here are mainly of interest for cases where correlation estimates are near the border of the parameter space, or for obtaining reasonable genetic correlations for countries with limited links to most of the others.
Received for publication June 28, 2005. Accepted for publication November 23, 2005.
| REFERENCES |
|---|