Department of Animal Sciences, The Ohio State University, Columbus 43210
Corresponding author: N. R. St-Pierre; e-mail: st-pierre.8@osu.edu.
| ABSTRACT |
|---|
)? The answer requires knowing the expected relationship under the assumption of an unbiased model. The objectives of this paper are: 1) to derive the expected relationship between residuals, $Y$, and $\hat{Y}$; 2) to determine whether $Y$ or $\hat{Y}$ should be used for the assessment of bias; and 3) to reassess the extent of mean and linear bias in the prediction of N flows to the duodenum by the NRC (2001). In the simplest case, we can assume a true model of the form $Y = X\beta + \varepsilon$. This model is estimated by $Y = Xb + e$, and $\hat{Y} = Xb$. The correlation between the residual vector $e$ and the vector of observations $Y$ can easily be derived. The numerator of the correlation coefficient is shown to be equal to $e'e$, the residual sum of squares. The denominator of this correlation is equal to the square root of the product of $e'e$ and the total sum of squares. Algebraic simplifications show that the correlation between $e$ and $Y$ is equal to the square root of $(1 - R^2)$. That is, under the assumption of an unbiased model, the residuals are correlated with the observed values and the slope of $e$ regressed on $Y$ is equal to $(1 - R^2)$. Thus, a graph of $e$ versus $Y$ will show a positive slope unless the model is a perfect predictor (i.e., $R^2$ is equal to 1.0). Significant slopes linking $e$ to $Y$ have been erroneously interpreted as evidence of biased models in the NRC (2001). Conversely, the slope of $e$ regressed on $\hat{Y}$ is expected to be zero under the assumption of an unbiased model. Therefore, residuals should be regressed against $\hat{Y}$ and not $Y$. When $\hat{Y}$, as opposed to $Y$, was used to assess biases in the prediction of flows to the duodenum of microbial N, nonammonia-nonmicrobial N, and nonammonia N in NRC (2001), mean biases became nonsignificant, and linear biases over the range of predicted values were of the same magnitude as or smaller than the standard errors of measurements reported in the literature. Thus, although N flow predictions from NRC (2001) may not be precise, they appear to have insignificant and inconsequential biases.
Key Words: prediction bias, model evaluation, National Research Council, accuracy
Abbreviation key: MN = microbial nitrogen, MSS = model sum of squares, NANMN = nonammonia-nonmicrobial nitrogen, RSS = residual sum of squares, TSS = total sum of squares
| INTRODUCTION |
|---|
). It has long been known that residuals are correlated with observed values of the dependent variable in linear models. This is clearly stated in numerous statistical textbooks (e.g., Draper and Smith, 1988), although the proof of this relationship is not provided. The NRC (2001) presents important plots of residuals against Y (Figures 5-6, 5-7, and 5-8) for the prediction of flows of three N pools to the duodenum. In most instances, there are clear negative slopes in these graphics, leading many to conclude that the NRC predictions have linear biases. The NRC (2001) stated that "the degree of the negative slope-bias that is evident in the residual plots are of concern" (NRC, 2001 p. 65). The report concluded that "errors in the structure of the model are probably major contributors to the negative slope biases" (NRC, 2001 p. 65). The objectives of this paper are 1) to show that a negative slope is expected when the negatives of the residuals (predicted minus observed) are plotted against observed values, even if the predictions are truly unbiased, and 2) to show that the NRC predictions of N flows to the duodenum have small biases relative to measurement errors.
| MATERIALS AND METHODS |
|---|
Let $Y_i$ be the ith observed value and $\hat{Y}_i$ be the ith predicted value. Residuals $e_i$ are defined as $e_i = Y_i - \hat{Y}_i$, that is, the difference between the observed and the predicted value. The NRC (2001) and others defined the residuals with the opposite sign, as $\hat{Y}_i - Y_i = -e_i$, and the rest of this derivation could equally be expressed in terms of those residuals. The traditional expression of residuals in use in the field of statistics was chosen here.

The expected relationship between $e_i$ and $Y_i$ under the conditions of an unbiased model must be established before judging that a certain pattern of residuals is symptomatic of biased predictions. That is, we must establish the expectation for the slope of the regression of $e_i$ on $Y_i$, and determine whether this expectation is equal to zero under unbiasedness. Likewise, we need to establish the expected value of $r_{eY}$, the simple correlation coefficient between residuals and observed values, when the model is unbiased. As the following mathematical derivation will show, even in the simplest case, that of a simple linear model, the slope of the regression of $e_i$ on $Y_i$ and the correlation coefficient $r_{eY}$ are not equal to zero. Similar conclusions can be reached for more complex model structures, although the algebra gets very complex.

The correlation between $e_i$ and $Y_i$ is derived as follows. In the general case, the Pearson product-moment correlation $r$ between two random variables $U$ and $W$ is given by:

$$r_{UW} = \frac{\sum (U_i - \bar{U})(W_i - \bar{W})}{\sqrt{\sum (U_i - \bar{U})^2 \sum (W_i - \bar{W})^2}} \qquad [1]$$

where $\bar{U}$ and $\bar{W}$ are the means of the two random variables. Thus, the correlation between $e_i$ and $Y_i$ is given by:

$$r_{eY} = \frac{\sum (e_i - \bar{e})(Y_i - \bar{Y})}{\sqrt{\sum (e_i - \bar{e})^2 \sum (Y_i - \bar{Y})^2}} \qquad [2]$$

where $\bar{e}$ and $\bar{Y}$ are the means of the residuals and of the observed values, respectively. A linear model is expressed as:
$$Y = Xb + e \qquad [3]$$

where $Y$ is a column vector of observed values, $X$ is the matrix of independent variables, $b$ is the vector of parameter estimates, and $e$ is the column vector of residuals. The predicted values, $\hat{Y}$, are calculated as:

$$\hat{Y} = Xb \qquad [4]$$

The numerator in [2] can be simplified as follows:

$$\sum (e_i - \bar{e})(Y_i - \bar{Y}) = \sum e_i (Y_i - \bar{Y}) \qquad [5]$$

because $\bar{e} = 0$ under the assumption of an unbiased model. That is, if the model is unbiased, then the mean residual (error) must be equal to zero. As we proceed with the multiplication of the factors in [5], the $\sum e_i \bar{Y}$ part reduces to:

$$\sum e_i \bar{Y} = \bar{Y} \sum e_i = \frac{\sum Y_i}{n} \sum e_i \qquad [6]$$

where $n$ is the number of observations. Because $\bar{e} = \sum e_i / n$ and $\bar{e} = 0$ under the assumption of an unbiased model, then $\sum e_i$ must also be equal to zero. Thus,

$$\sum e_i \bar{Y} = 0 \qquad [7]$$

$$\sum e_i (Y_i - \bar{Y}) = \sum e_i Y_i \qquad [8]$$
In matrix notation,

$$\sum e_i Y_i = e'Y \qquad [9]$$

where $e'$ is the transpose (row vector) of the column vector $e$ defined in [3]. Rearranging [3],

$$e = Y - Xb \qquad [10]$$

The least squares estimates $b$ are unbiased and are found as:

$$b = (X'X)^{-1}X'Y \qquad [11]$$

where $X'$ is the transpose of $X$, the design matrix defined in [3], and $(X'X)^{-1}$ is the inverse of the $X'X$ matrix. We define

$$H = X(X'X)^{-1}X' \qquad [12]$$

noting that $H$ is symmetric and idempotent (i.e., $H'H = H$). Using [11] and [12], equation [10] becomes:

$$e = Y - X(X'X)^{-1}X'Y \qquad [13]$$

$$e = Y - HY \qquad [14]$$

$$e = IY - HY \qquad [15]$$

Factoring $Y$:

$$e = (I - H)Y \qquad [16]$$

where $I$ is the identity matrix, a symmetric and idempotent matrix.
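The properties of $H$ and $I - H$ used in this derivation can be checked numerically. The sketch below is only an illustration under stated assumptions: the five data points are made up, the model is a simple linear regression with an intercept, and the helper functions (`transpose`, `matmul`, `inverse_2x2`, `max_abs_diff`) are hypothetical names written for this example in plain Python.

```python
# Numerical check of H = X(X'X)^-1 X' and e = (I - H)Y on made-up data.

def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]

def inverse_2x2(m):
    """Closed-form inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def max_abs_diff(a, b):
    """Largest absolute element-wise difference between two matrices."""
    return max(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))

# Made-up observations (assumption: for illustration only).
x = [1.0, 2.0, 4.0, 7.0, 8.0]
y = [2.1, 3.9, 8.2, 13.8, 16.1]
n = len(x)

X = [[1.0, xi] for xi in x]      # design matrix with an intercept column
Y = [[yi] for yi in y]           # column vector of observations
Xt = transpose(X)
XtX_inv = inverse_2x2(matmul(Xt, X))

b = matmul(matmul(XtX_inv, Xt), Y)      # equation [11]
H = matmul(matmul(X, XtX_inv), Xt)      # equation [12]
I = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
M = [[I[i][j] - H[i][j] for j in range(n)] for i in range(n)]  # I - H

Xb = matmul(X, b)
e_direct = [Y[i][0] - Xb[i][0] for i in range(n)]   # equation [10]
e_from_M = [row[0] for row in matmul(M, Y)]         # equation [16]

sym_err = max_abs_diff(H, transpose(H))         # H' = H
idem_err_H = max_abs_diff(matmul(H, H), H)      # HH = H
idem_err_M = max_abs_diff(matmul(M, M), M)      # (I - H)(I - H) = I - H
resid_err = max(abs(p - q) for p, q in zip(e_direct, e_from_M))

print(sym_err, idem_err_H, idem_err_M, resid_err)
```

All four printed discrepancies should be at the level of floating-point rounding error, which is what symmetry and idempotency of $H$ and $I - H$ imply.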
$$e' = [(I - H)Y]' \qquad [17]$$

$$e' = Y'(I - H)' = Y'(I - H) \qquad [18]$$

Because both $I$ and $H$ are idempotent and $IH = HI = H$, $I - H$ is also idempotent and, by definition of an idempotent matrix,

$$(I - H)(I - H) = I - H \qquad [19]$$

$$e'e = Y'(I - H)(I - H)Y \qquad [20]$$

$$e'e = Y'(I - H)Y = Y'e \qquad [21]$$

and

$$Y'e = e'Y \qquad [22]$$

due to the commutative property of the inner product of two vectors. Combining [5], [8], [9], and [22]:

$$\sum (e_i - \bar{e})(Y_i - \bar{Y}) = \sum e_i (Y_i - \bar{Y}) \qquad [23]$$

$$= \sum e_i Y_i \qquad [24]$$

$$= e'Y \qquad [25]$$

$$= e'e \qquad [26]$$

and $e'e$ is known as the residual sum of squares (RSS). The denominator in [2] can be greatly simplified. The first part of the denominator reduces to:
$$\sum (e_i - \bar{e})^2 = \sum e_i^2 \qquad [27]$$

because $\bar{e} = 0$ under the assumption that the model is unbiased. Using matrix notation,

$$\sum e_i^2 = e'e \qquad [28]$$

where $e'e$ is again the RSS. The second half of the denominator in [2], $\sum (Y_i - \bar{Y})^2$, is, by definition, the total sum of squares (TSS). Finally, by definition, the model sum of squares (MSS) = TSS - RSS. Using [26], [27], and the definitions of MSS, RSS, and TSS, [2] can be expressed as:

$$r_{eY} = \frac{e'e}{\sqrt{e'e \times TSS}} \qquad [29]$$

$$= \frac{RSS}{\sqrt{RSS \times TSS}} \qquad [30]$$

$$= \sqrt{\frac{RSS}{TSS}} \qquad [31]$$

$$= \sqrt{\frac{TSS - MSS}{TSS}} \qquad [32]$$

$$= \sqrt{1 - \frac{MSS}{TSS}} \qquad [33]$$

Because $R^2 = MSS/TSS$, [33] becomes:

$$r_{eY} = \sqrt{1 - R^2} \qquad [34]$$

$$r_{eY}^2 = 1 - R^2 \qquad [35]$$

Thus, under the assumption that the model is unbiased, a correlation is expected between the residuals and the observed values. This correlation is positive whenever $R^2 < 1$ and approaches 1 as $R^2$ approaches 0. Even an $R^2$ considered to be high can lead to a significant $r_{eY}$. For example, $r_{eY} = 0.32$ when $R^2 = 0.9$.
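The result in [34] can be illustrated with a small simulation. The following is a minimal sketch, not part of the original analysis: the data are simulated from an assumed true model (intercept 2.0, slope 1.5, normal errors with SD 2.0), and the function names `fit_simple_ols` and `pearson_r` are hypothetical helpers written for this example.

```python
import math
import random

def fit_simple_ols(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def pearson_r(u, w):
    """Pearson product-moment correlation, as in equation [1]."""
    n = len(u)
    u_bar = sum(u) / n
    w_bar = sum(w) / n
    num = sum((ui - u_bar) * (wi - w_bar) for ui, wi in zip(u, w))
    den = math.sqrt(sum((ui - u_bar) ** 2 for ui in u)
                    * sum((wi - w_bar) ** 2 for wi in w))
    return num / den

# Simulated data from an assumed true linear model (illustration only).
rng = random.Random(1)
x = [rng.uniform(0.0, 10.0) for _ in range(200)]
y = [2.0 + 1.5 * xi + rng.gauss(0.0, 2.0) for xi in x]

b0, b1 = fit_simple_ols(x, y)
y_hat = [b0 + b1 * xi for xi in x]
e = [yi - yh for yi, yh in zip(y, y_hat)]      # residuals, observed minus predicted

y_bar = sum(y) / len(y)
rss = sum(ei ** 2 for ei in e)                 # e'e
tss = sum((yi - y_bar) ** 2 for yi in y)
r2 = 1.0 - rss / tss                           # R^2 = MSS / TSS
e_dot_Y = sum(ei * yi for ei, yi in zip(e, y)) # e'Y, equal to e'e by [26]
r_eY = pearson_r(e, y)

print("R2 =", r2)
print("r_eY =", r_eY, " sqrt(1 - R2) =", math.sqrt(1.0 - r2))
print("e'Y =", e_dot_Y, " e'e =", rss)
```

Although the fitted model is unbiased by construction, the residuals remain positively correlated with the observed values, and the two printed pairs agree up to rounding error.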
The slope ($b_1$) of the regression of $e_i$ on $Y_i$ is easily found:

$$b_1 = \frac{\sum (e_i - \bar{e})(Y_i - \bar{Y})}{\sum (Y_i - \bar{Y})^2} \qquad [36]$$

Using [26] and the definitions of MSS, RSS, and TSS, equation [36] becomes:

$$b_1 = \frac{e'e}{TSS} = \frac{RSS}{TSS} \qquad [37]$$

$$b_1 = \frac{TSS - MSS}{TSS} \qquad [38]$$

$$b_1 = 1 - R^2 \qquad [39]$$

This implies that for $R^2 < 1$, there is a positive slope between $e_i$ and $Y_i$; and the lower the $R^2$, the greater the slope.

In [39], the slope is for the regression of $e_i$ on $Y_i$. Remember that NRC (2001) defined the residuals as $-e_i$. Thus, under the assumption of unbiased prediction, a negative slope would be expected between the NRC residuals and the observed values. This slope would be equal to $-(1 - R^2) = R^2 - 1$. Clearly, biases of models predicting flows of nitrogenous compounds to the duodenum were not assessed properly. In nonlinear models with nonoptimized parameter estimates, the regression of $e_i$ on $Y_i$ can take a multitude of forms depending on the nature of the bias, if bias is present. The point is that the method of plotting (or regressing) residuals on observed values fails to properly identify biases even with the simplest, most basic model (simple linear regression).
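The dependence of the slope on $R^2$, under both sign conventions for the residuals, can be sketched numerically. This is an illustration under stated assumptions only: the data are simulated from an assumed linear model with three arbitrary error standard deviations, and `ols_slope` and `residual_slopes` are hypothetical helper names.

```python
import random

def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

def residual_slopes(noise_sd, seed=7, n=300):
    """Return (R^2, slope of e on Y, slope of -e on Y) for simulated data."""
    rng = random.Random(seed)
    x = [rng.uniform(0.0, 10.0) for _ in range(n)]
    y = [5.0 + 2.0 * xi + rng.gauss(0.0, noise_sd) for xi in x]

    b1 = ols_slope(x, y)
    b0 = sum(y) / n - b1 * sum(x) / n
    y_hat = [b0 + b1 * xi for xi in x]

    e = [yi - yh for yi, yh in zip(y, y_hat)]   # observed minus predicted
    e_neg = [-ei for ei in e]                   # predicted minus observed

    y_bar = sum(y) / n
    rss = sum(ei ** 2 for ei in e)
    tss = sum((yi - y_bar) ** 2 for yi in y)
    r2 = 1.0 - rss / tss
    return r2, ols_slope(y, e), ols_slope(y, e_neg)

# Larger error SD gives lower R^2 and a steeper slope of residuals on Y.
for sd in (1.0, 3.0, 6.0):
    r2, slope_e, slope_neg = residual_slopes(sd)
    print(f"sd={sd}: R2={r2:.4f}  slope(e on Y)={slope_e:.4f}  "
          f"slope(-e on Y)={slope_neg:.4f}")
```

In each case the slope of $e_i$ on $Y_i$ equals $1 - R^2$ and the slope of $-e_i$ on $Y_i$ equals $R^2 - 1$, even though the fitted model has no bias.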
The Expected Relationship Between Residuals and Predicted Values
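What can be said of the correlation of $e_i$ with the predicted values $\hat{Y}_i$? The numerator of the correlation $r_{e\hat{Y}}$ is:

$$\sum (e_i - \bar{e})(\hat{Y}_i - \bar{\hat{Y}}) \qquad [40]$$

$$= \sum e_i (\hat{Y}_i - \bar{\hat{Y}}) \qquad [41]$$

because $\bar{e} = 0$ under the assumption of an unbiased model. Factoring,

$$\sum e_i (\hat{Y}_i - \bar{\hat{Y}}) = \sum e_i \hat{Y}_i - \bar{\hat{Y}} \sum e_i \qquad [42]$$

$$= \sum e_i \hat{Y}_i \qquad [43]$$

because $\sum e_i = 0$ under the assumption of an unbiased model. This is because $\bar{e} = 0$ and $\bar{e} = \sum e_i \div n$. In matrix notation,

$$\sum e_i \hat{Y}_i = e'\hat{Y} \qquad [44]$$

$$\hat{Y} = Xb = X(X'X)^{-1}X'Y \qquad [45]$$

$$\hat{Y} = HY \qquad [46]$$

Thus:

$$e'\hat{Y} = [(I - H)Y]'HY \qquad [47]$$

$$= Y'(I - H)'HY \qquad [48]$$

$$= Y'(I' - H')HY \qquad [49]$$

$$= Y'(I - H)HY \qquad [50]$$

$$= (Y'I - Y'H)HY \qquad [51]$$

$$= Y'IHY - Y'HHY \qquad [52]$$

Factoring,

$$e'\hat{Y} = Y'(IH - HH)Y \qquad [53]$$

Because $H$ is idempotent and $I$ is the identity matrix,

$$IH = H \quad \text{and} \quad HH = H$$

Which results in:

$$e'\hat{Y} = Y'(H - H)Y = 0 \qquad [54]$$

Thus, the numerator of $r_{e\hat{Y}}$ is zero and the correlation between $e_i$ and $\hat{Y}_i$ is zero. That is, the residuals are not correlated with the predictions, and the slope of $e_i$ regressed on $\hat{Y}_i$ is zero if the model is unbiased. A significantly positive or negative slope of $e_i$ on $\hat{Y}_i$ is therefore evidence of biased prediction. This is why we should plot the residuals versus the predicted values and not the observed values.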
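The contrast between regressing residuals on predicted values and on observed values can be sketched as follows. This is a minimal illustration, not the authors' analysis: the data are simulated from an assumed linear model, `ols_slope` is a hypothetical helper, and the "biased" predictor, which shrinks every fitted value by 20%, is an invented example of a model with mean and linear bias.

```python
import random

def mean(v):
    return sum(v) / len(v)

def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

# Simulated data from an assumed true linear model (illustration only).
rng = random.Random(11)
x = [rng.uniform(0.0, 10.0) for _ in range(250)]
y = [3.0 + 1.2 * xi + rng.gauss(0.0, 2.5) for xi in x]

# Unbiased predictions: least-squares fit of y on x.
b1 = ols_slope(x, y)
b0 = mean(y) - b1 * mean(x)
y_hat = [b0 + b1 * xi for xi in x]
e = [yi - yh for yi, yh in zip(y, y_hat)]

slope_on_pred = ols_slope(y_hat, e)   # expected to be zero
slope_on_obs = ols_slope(y, e)        # expected to be 1 - R^2, not zero

# Hypothetical biased predictor: every fitted value shrunk by 20%.
y_biased = [0.8 * yh for yh in y_hat]
e_biased = [yi - yb for yi, yb in zip(y, y_biased)]
slope_biased = ols_slope(y_biased, e_biased)
mean_bias = mean(e_biased)

print("unbiased model, e on predicted:", slope_on_pred)
print("unbiased model, e on observed: ", slope_on_obs)
print("biased model,   e on predicted:", slope_biased, " mean bias:", mean_bias)
```

For the unbiased fit, only the regression on observed values shows a slope. For the shrunken predictor, the slope of residuals on predictions is $(1 - 0.8)/0.8 = 0.25$, because the covariance between $Y$ and the least-squares $\hat{Y}$ equals the variance of $\hat{Y}$; the regression on predicted values therefore flags the bias that was deliberately introduced.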
Reevaluation of Biases in NRC Predictions of N Flows
Data in Figures 5-6, 5-7, and 5-8 of NRC (2001) were digitized using a GTCO CalComp AccuTabII digitizer (Columbia, MD) set at a resolution of 0.05 mm to recover observed, predicted, and residual values for microbial nitrogen (MN), nonammonia-nonmicrobial nitrogen (NANMN), and nonammonia nitrogen (NAN) flows to the duodenum. For all three N flow variables, residuals were regressed against their predicted values according to the following model (expressed here for MN):
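$$e_i = b_0 + b_1\left(\widehat{MN}_i - \overline{\widehat{MN}}\right) + \varepsilon_i \qquad [55]$$

where $e_i = MN_i - \widehat{MN}_i$ is the residual for the ith observation, $\widehat{MN}_i$ is the ith predicted MN flow, $\overline{\widehat{MN}}$ is the mean of the predicted MN flows, $b_0$ is the intercept, $b_1$ is the slope, and $\varepsilon_i$ is the error term.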
In [55], the regressor is centered on its mean value. This method of shifting the independent variable was extensively used by Harvey (1977; 1982) in the software LSML76 to test the interaction of a discrete (class) variable by a continuous variable. By centering the independent variable, the intercept term of the linear model is estimated at the mean value of the independent variable as opposed to a value of zero. The intercept term at the mean value of the regressor measures the overall prediction bias, also known as mean prediction bias. A t-test on the estimate of the intercept determines the statistical significance of this bias. The slope of the regression is an estimate of the linear prediction bias, and a t-test is used to assess its significance. In instances where the slope of the regression of residuals on predicted values is significant, the magnitude of the bias within the range of the predicted values must be quantified. This is done by calculating the bias at the minimum and maximum levels of the predicted values: equation [55] is used with the minimum and maximum predicted values as inputs. Then, the bias at minimum and maximum predictions is judged relative to the size of the standard error ( êi), or compared to the 95% confidence intervals of measurements reported in the literature.
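The procedure described above can be sketched in a few lines of plain Python. This is a hedged illustration rather than the software used in the paper: `assess_bias` and its dictionary keys are hypothetical names, the observed and predicted flows are simulated, and the t statistics are reported without P-values because the Python standard library has no t distribution (they would be compared with a t table on n - 2 degrees of freedom).

```python
import math
import random

def assess_bias(observed, predicted):
    """Regress residuals (observed - predicted) on mean-centered predictions."""
    n = len(observed)
    residuals = [o - p for o, p in zip(observed, predicted)]
    p_bar = sum(predicted) / n
    centered = [p - p_bar for p in predicted]
    sxx = sum(c * c for c in centered)

    # With a centered regressor, the intercept is the mean residual (mean bias).
    intercept = sum(residuals) / n
    slope = sum(c * r for c, r in zip(centered, residuals)) / sxx  # linear bias

    df = n - 2
    rss = sum((r - (intercept + slope * c)) ** 2
              for r, c in zip(residuals, centered))
    s = math.sqrt(rss / df)                 # residual standard deviation
    se_intercept = s / math.sqrt(n)
    se_slope = s / math.sqrt(sxx)

    return {
        "mean_bias": intercept,
        "linear_bias": slope,
        "t_mean_bias": intercept / se_intercept,
        "t_linear_bias": slope / se_slope,
        "df": df,
        # Bias evaluated at the extremes of the predicted range.
        "bias_at_min": intercept + slope * (min(predicted) - p_bar),
        "bias_at_max": intercept + slope * (max(predicted) - p_bar),
    }

# Simulated predicted and observed flows in arbitrary units (illustration only).
rng = random.Random(3)
predicted = [rng.uniform(150.0, 450.0) for _ in range(60)]
observed = [p + rng.gauss(0.0, 30.0) for p in predicted]

result = assess_bias(observed, predicted)
for key, value in result.items():
    print(key, "=", value)
```

Centering keeps the two tests separate: the intercept answers whether predictions are off on average, and the slope answers whether the error changes across the range of predictions.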
| RESULTS AND DISCUSSION |
|---|
| CONCLUSIONS |
|---|
| FOOTNOTES |
|---|
Received for publication June 17, 2002. Accepted for publication August 14, 2002.
| REFERENCES |
|---|