|
|
||||||||
Department of Animal Sciences, The Ohio State University, 2029 Fyffe Rd., Columbus 43210
2 Corresponding author: st-pierre.8{at}osu.edu
| ABSTRACT |
|---|
|
|
|---|
Key Words: genetic algorithm optimal sampling schedule forage composition sensitivity analysis
| INTRODUCTION |
|---|
|
|
|---|
Changes in forage composition on a given farm are essentially of 3 types. The first type is primarily associated with changes in measurements due to laboratory and method variations (analytical variance) as well as sampling and subsampling variances (St-Pierre and Weiss, 2006). These variations are of no consequence to animals because they reflect changes in apparent nutritional composition but not true composition. The second type reflects small changes in composition, essentially white noise, caused by a multitude of factors, each having a small effect. This is the variation that one would expect, for example, in the true composition of forage harvested from a single, uniform field, from a single forage variety or hybrid, on the same harvesting day (St-Pierre and Weiss, 2006). Little can be economically done to reduce these sources of variation. The last type of change is characterized by relatively large and sudden threshold changes in composition, such as could be expected when sourcing forages from different fields, varieties, species, harvesting dates, and so on (St-Pierre and Weiss, 2006). These changes can be managed by separation of inventories or by sampling for laboratory determination of chemical composition. In most instances, large storage structures such as those used for the storage of grass, legume, and grain silages do not allow for meaningful inventory separation, leaving periodic sampling and laboratory determinations as the predominant means of controlling the effect of forage variation in the diets.
The determination of an optimal feed sampling schedule in animal production is an unresolved problem and is receiving more attention because of its potential impact on animal performance and farm profitability. Control charts have long been used to monitor commercial production processes to detect special causes that produce changes in the output (Shewhart, 1931). They are slowly gaining acceptance in animal agriculture as ways to incorporate statistical process control to various animal production systems (DeVries et al., 1997; Lukas et al., 2005). Many charts have been proposed, such as the Shewhart (X-bar), exponentially weighed moving average, and cumulative sum charts. The basic X-bar control chart consists of sampling from a process over time and charting mean measurements at each sampling time (Drain, 1997). As long as the mean of measurements at each sampling time falls inside predetermined control limits (L), the process is considered in-control. Shewhart X-bar charts with "3-sigma" control limits [i.e., L = 3 standard deviations (SD) of the mean] are often used (Reynolds and Stoumbos, 2005). In most agricultural systems, it is customary to set the control limits at 2 SD of the mean of a set of measurements taken at the same time, traditionally called a subgroup. This is in contrast to industrial production, in which control limits are generally set at 3 SD (DeVries et al., 1997; Drain, 1997). This practice of using 2 SD, however, does not consider the respective size of type I (i.e., concluding that the process is out-of-control when it is in-control; the false alarms) and type II errors (i.e., concluding that the process is in-control when in fact it has shifted to an out-of-control state), as well as all costs incurred. The objectives of this research were to derive a general methodology to determine optimum sample size (n), sampling interval (h), and location of L for an X-bar chart used to monitor forage or feed composition and to perform a sensitivity analysis to parameter values to identify those parameters requiring precise estimates.
| MATERIALS AND METHODS |
|---|
|
|
|---|
|
Time Until Occurrence of an Abrupt Change in Forage Composition.
Given that we assumed a memoryless process subject to abrupt changes, the incontrol period is distributed as a negative exponential random variable with mean 1/
(Ryan, 2000; Yashin, 1993). If production continues during searches for an assignable cause, the average time for occurrence of the assignable cause is simply 1/
. Production always continues during searches in the context of monitoring forage quality for milk production. But production could cease in other future applications of quality costs minimization of a renewal reward process. In such instances, the function becomes more complicated and is derived here for future applications. Let T0 be the expected search time (d) for a false alarm. Then, the expected time spent searching during false alarms is T0 times the expected number of false alarms:
![]() | [1] |
where Ta = time spent searching during false alarms (d), T0 = expected search time for a false alarm (d), s = expected number of samples taken while in-control, and ARL0 = average run length while in-control (d).
A few steps of algebraic derivations based on a renewal reward Poisson process (Parzen, 1999), and hence a negative exponential distribution of ARL0, show that:
![]() | [2] |
where e = the exponential function,
= reciprocal of the mean length of the in-control period (1/d), and h = sampling interval; that is, the time interval between 2 successive sampling occurrences (d).
In [2],
is a characteristic of the harvesting and storage system, independent of any control chart.
Also:
![]() | [3] |
where
= P(out-of-control signal | process is in-control); that is, the probability of a type I error with P symbolizing "the probability of," and | symbolizing "given." Lorenzen and Vance (1986) suggested a way of combining both conditions. Let
1 = 1 if production continues during searches, and
1 = 0 if production ceases during searches. Then,
![]() | [4] |
where Tb = expected time until the change in composition occurs (d), and
1 = dummy binary variable depending on production during searches. Tb is equal to 1/
in the case of forage composition monitoring because production continues even when diets are being reformulated.
Time Until the Next Sample is Taken.
Let
be the expected time (d) of occurrence of the abrupt change in forage composition, given that it occurs between the ith and (i + 1)th samples; that is, the time between the last sample while in-control and the abrupt change in forage composition. By derivation, we get:
![]() | [5] |
with all symbols as previously defined. The expected time (d) between the occurrences of the abrupt change in forage composition and the next sampling time (Tc) equals
![]() | [6] |
Time to Analyze the Sample and Chart the Result.
Let E be the expected time (d) to sample, analyze, and chart the results of one sampling time. Note that E can be considered equal to 0 if the measurements are made on the farm as in the determination of silage DM concentration (the measurements are quite fast, and usually performed on site) but can be significant if samples are sent to an outside laboratory for extensive chemical analyses. For a sample of n items, the time (d) to sample, analyze the sample and chart the result (Td) is given by:
![]() | [7] |
if Td is proportional to the number of samples taken, and
![]() | [8] |
if Td is a constant, independent of the number of samples. In general, [8] is the norm in forage analyses and will be retained for the remainder of this paper, remembering that [7] could be substituted in the derivation when applicable.
Time Until the Chart Gives an Out-of-Control Signal.
The expected time (d) until an out-of-control signal occurs (Te) is given by
![]() | [9] |
where ARL1 = the average run length when the process has shifted to an out-of-control state (d).
If the sample statistics are independent (i.e., samples taken at a given sampling time are true replicates and not subsamples), then
![]() | [10] |
where β = P(in-control signal | process is out-of-control); that is, the probability of a type II error.
Note that ARL1 depends on the underlying distribution, the control limit L, the sample size n, and the extent of the change in composition (
) when there is an abrupt change in forage composition. It is important to understand that when forage quality has changed, it is not necessarily detected (signaled) the next time samples are taken, leading to type II errors.
Time to Find the Cause and Repair the Process.
Let T1 be the expected time (d) to investigate the cause of the change, and T2 be the expected time (d) to reformulate and implement new diets (equivalent of repairing the machinery or process in industrial production). Then,
![]() | [11] |
where Tf = the expected time to investigate the cause of a composition change and reformulate and implement new diets (d).
Total Cycle Length.
The total cycle length TCL is simply calculated as the sum of its elements:
![]() | [12] |
When fully expanded, [12] yields
![]() | [13] |
Costs per Cycle
The costs per cycle are incurred from loss of production while in-control due to white noise in forage composition, as well as loss of production when forage quality declines and increased feed costs when forage quality increases while out-of-control (i.e., when forage composition has changed abruptly), for false alarms, for location and repair of the assignable cause (i.e., identifying remedies, and reformulating and implementing new diets), and for sampling and analyzing forages.
Cost per Cycle Due To Reduced Production.
Let C0 and C1 be the costs per day due to lost production from white noise in forage composition while the process is in-control, and to lost production or increased feed costs while the process is out-of-control, respectively. Then, the expected cost per cycle due to reduced production (Ca) equals
![]() | [14] |
where Ca = cost per cycle due to reduced production ($), C0 = cost per day from reduced production while incontrol ($/d), C1 = cost per day from reduced production while out-of-control ($/d), and
2 = dummy binary variable that depends on state of production during repair (
2 = 1 if production continues during repair, and
2 = 0 if production ceases).
In dairy production,
![]() | [15] |
where
M0 = milk production loss per cow and per day due to white noise in forage composition when the process is in-control (kg/d per cow), Pm = unit price of milk ($/kg), and Nc = number of lactating cows in the herd.
Also,
![]() | [16] |
where
M1 = milk production loss per cow and per day due to an abrupt change in forage composition when the process is out-of-control (kg/d per cow).
Cost per Cycle Due To False Alarms, Identifying the Cause, and Repairing.
Let Y be the cost per false alarm. This includes the cost of searching and testing for the cause of the potential forage change when applicable, plus the cost of downtime if production ceases during the search. Also, let W be the cost to investigate the cause of the change and to reformulate and implement new diets when an abrupt change in forage composition occurs. Then, the expected cost for false alarms and for identifying the cause of the change and repairing the system when out-of-control (new diet formulation and implementation) is:
![]() | [17] |
Cost per Cycle for Sampling and Analyses.
Let a be the fixed cost per sampling time (depending on the conditions, this could be the cost of mailing samples or the labor to set up the sampling and sample preparation) and b the cost per unit sampled (e.g., cost of laboratory analyses). The sampling and analyses costs at each sampling time (Cc) are then
![]() | [18] |
The expected cost for sampling and analyses of samples is given by (a + bn) x (days of production in a cycle) / h. Thus, using [13], the expected cost per cycle for sampling (Cd) equals
![]() | [19] |
Total Quality Cost per Cycle.
Combining [14], [17], and [19], we get the total quality cost per cycle (CCL) as
![]() | [20] |
When expanded, [20] yields:
![]() | [21] |
Total Quality Cost per Day.
Because cycle lengths are variable and are functions of h, n, and L, we must express the cost function per unit of time (days), not per cycle. This is accomplished by dividing equation [21] by equation [13] to yield the total quality cost per day, C:
![]() | [22] |
All we are left with is the derivation of ARL0 and ARL1 in equation [22]. For an X-bar chart, it is assumed that the observations are independent and identically distributed from a normal distribution with mean equal to the center line (CL) of the chart and standard deviation
. The independence assumption requires that the samples taken at a given sampling time are true replicates and not subsamples. Thus, from the mean (X) of a set of measurements taken at the same time we have:
![]() | [23] |
where
= is the cumulative distribution function for the standard normal distribution.
When the process is out-of-control, the observations are independent and identically distributed from a normal distribution with mean CL + 
, so that
![]() | [24] |
which equates to:
![]() | [25] |
Substitution of 1/
for ARL0 and 1/(1 – β) for ARL1 in [22] in combination with [2], [5], [15], and [16] gives the quality cost per day for any process to be monitored by an X-bar chart.
Minimizing the Total Quality Costs per Day Function
The total quality cost per day function is given by [2], [5], [15], [16], [22], [23], and [24] and is a function of 3 quality control variables: the sample size (n), the sampling interval (h), and the control limit (L). Function [22] with 1/
substituting for ARL0 and 1/(1 – β) for ARL1 as per [23] and [24] is highly nonlinear in each of the 3 parameters. Therefore, nonlinear optimization techniques must be used to identify the optimum n, h, and L leading to the minimum cost. In our application the number of samples n must clearly be considered an integer variable. Additionally, the sampling interval h should likely be considered an integer variable in field applications (i.e., samples can only be shipped once a day to an outside laboratory). The resulting mixed-integer nonlinear optimization problem was solved using the genetic algorithm, a nonparametric optimization algorithm (Goldberg, 1989). The genetic algorithm (Haupt and Haupt, 1998) was programmed using the Borland Delphi 2005 development environment with the following structure: initial population size = 4,000, working population size = 1,000, mutation rate = 12%, elite member rate = 0.5%, and number of iterations = 60.
Sensitivity Analysis
Equation [22] contains 16 input variables and 3 optimization variables (n, h, and L). Although we could not find specific recommended sampling designs for forages in the scientific literature, it is common on many US commercial dairy farms to take one sample of each forage once a month and to use control limits of 2 SD or 0 SD (i.e., always adjust the diet based on the most recent forage analysis). Thus, an industry standard was used for comparison with n
= 1, h
= 30 d, and L
= 2 SD (the
symbol signifies that these are the values of n, h, and L for the industry standard). Of the 16 input variables to the total quality cost function in [22] (an equation that is applicable to any renewal reward process), 13 of them could vary in our specific application, as two of the input variables (
1 and
2) have a fixed value of 1, and a third (T1) has a fixed value of 0. A set of default central values for all the resulting 13 input variables were selected as a starting basis for the sensitivity analyses (Table 1
). This set of default values will be referred to as the set of central values. These were essentially based on expert opinion. Sensitivity analyses should indicate which input variables need to be estimated precisely in model application. Three types of sensitivity analyses were performed: a univariate sensitivity analysis, a global linear sensitivity analysis, and a global nonlinear sensitivity analysis (Saltelli et al., 2004). For all 3 types, sensitivity analyses were done for total quality cost per day (C), and for each of the 3 variables forming the sampling design (n, h, and L).
|
Global Linear Sensitivity Analysis.
A global linear sensitivity analysis based on standardized regression coefficients (SRC) was performed according to Saltelli et al. (2004) using SimLab V1.1 (Giglioli and Saltelli, 2006). The SRC were fitted by least squares using the GLM procedure of SAS (SAS Institute, 2004).
If the true model is linear, the SRC coefficients determine the fraction of the total variation caused by each of the parameters. In such an instance, the total variance is completely decomposed through SRC coefficients and the sum of the SRC squared equals 1. If the true model is not linear, the SRC coefficients decompose the variance for the linear effect of each parameter, but ignore the contribution from the nonlinear and interaction effects. The variance decomposition attributed to the nonlinear part of the model remains unknown. The validity of the SRC as a measure of sensitivity is conditional on the degree to which the regression model fits the data; that is, to the coefficient of model determination Ry2 (Saltelli et al., 2004). Large values of Ry2 indicate near linearity of the model and the calculated SRC can be used to assess the contribution of the individual input variables to the uncertainty in model prediction. Conversely, small values of Ry2 indicate strong nonlinear sensitivity behavior, and consequently higher order methods must be used for sensitivity analyses. Whereas the univariate sensitivity analysis assesses sensitivity in only one dimension, SRC offers a measure of sensitivity that is multidimensionally averaged, meaning that the SRC coefficients provide a measure of variance contribution for an input variable that is averaged over all possible values of the remaining input variables.
Global Nonlinear Sensitivity Analysis.
The extended Fourier amplitude sensitivity analysis (e-FAST) is a variance-based technique that estimates the total fractional contribution of each input factor Xi to the total variance of the model output Y (Saltelli et al., 1999; Giglioli and Saltelli, 2006). Furthermore, e-FAST allows the simultaneous evaluation of the first (Si) and total effect (STi) indices. The estimation of the pair (Si, STi) is important to appreciate the difference between the impact of factor Xi alone on Y (that is, Si) and the overall impact of factor Xi through interactions with the others on Y (provided by STi). All e-FAST calculations were made using SimLab V1.1 (Giglioli and Saltelli, 2006).
| RESULTS AND DISCUSSION |
|---|
|
|
|---|
= 1, h
= 30, and L
= 2. The cost implication of these differences are discussed in the next section in conjunction with the effect of herd size.
Univariate Sensitivity Analysis
Figure 2
shows the univariate sensitivity of the sampling design parameters and C to herd size (Nc). The optimal number of samples to be taken at each sampling time is equal to 2 except for the smallest herd size investigated (Nc = 50). The optimal sampling interval follows an exponential decline as herd size increases, reaching an asymptote of 4 d when Nc exceeds 670 cows. For all but the smallest herd size, the optimal control limit L exceeds 1 and fluctuates between 1.1 and 1.4. In Figure 2
, total quality cost C is shown both for the optimal sampling design as well as for the industry standard design. Both costs are nearly identical at a herd size of 50 cows, but the optimal design reduces total quality cost by nearly $250/d at a herd size of 1,000 cows. At an average daily production of 30 kg/d, this difference equates to net savings of $0.83/100 kg of milk. This figure corresponds to 19 to 37% of the average long-term net farm income per cow per day in the United States (St-Pierre et al., 2000). It is of interest to note that the industry standard design (n
= 1, h
= 30, and L
= 2) is relatively close to the optimum design for a herd of 50 cows, except for the control limits (n* = 1, h* = 27, and L* = 0.6). Thus, the practice of taking a forage sample once a month probably served the industry well when herd sizes were smaller, but its use negatively impacts large herds which are becoming the norm in many parts of the United States.
|
m1, and 1/
than to all other input variables to the model. For these 3 input variables, a 1% change results in a 0.5 to 1.39% change in P. For all other input variables, C changes by less than 0.33% for each 1% change in the variable. The control limit L is most sensitive to 1/
,
, b,
m1, and Y. In all instances, a 1% change in these input variables results in a greater than 1% change in L. The number of samples per sampling time n is most sensitive to Y, b, 1/
,
, Nc,
m1, and W. A 1% change in these input variables results in a 0.5 to 1.0% change in n. Finally, h is most sensitive to
m1, Nc, b,
, and W, with a 1% change in their input values resulting in a 0.53 to 1.26% change in h. These results can only be accurate at the central values of the input variables. This is a severe limitation to the traditional univariate sensitivity analysis. In addition, they do not account for nonlinear effects as well as interaction effects between input variables. In the presence of such effects, the SL can be grossly inaccurate in assessing sensitivity to inputs (Saltelli et al., 2004).
|
,
, Nc, and
m1. The Ry2 of 0.75 indicates that the contribution of most variables to the total variance is probably reasonably well approximated solely by its linear effect. In contrast, the Ry2 for the SRC decomposition of the sensitivity of L, h, and n to input variables were all relatively small (Ry2 < 0.5, results not shown) indicating the necessity of factoring nonlinear and interaction effects in the analysis.
|
m1, and 1/
(results not shown). Recall that the difference between STi and Si represents the overall impact of input variable i through interactions with other input variables. A useful way of expressing the importance of the interaction effects is to measure the proportion of STi that is accounted by Si (Table 4
m1, 1/
, Pm, and E. These input variables account for more than 98% of the total sensitivity of C to inputs. For L, 57.4% of the total sensitivity is the result of Nc,
m1,
, 1/
, and W. For n, 76.3% of the total sensitivity is the result of b,
, Y, 1/
, and
m1. Finally for h, 66.9% of the total sensitivity is the result of Nc,
m1, 1/
, W, and
m0. These results indicate that the variables with a large influence on total quality costs are not the same as those with a large influence on the parameters of the sampling design. The 3 design parameters and C are all sensitive to 1/
and
m1, and none of them are particularly sensitive to T0, T2, and a. Total quality costs C is very sensitive, whereas L and h are some-what sensitive to Nc. Both L and n are sensitive to
, but not C and h. Finally, E,
m0, PM, Y, W, and b result in important total sensitivity for only 1 of the 4 variables. Therefore, precise and accurate estimates of Nc,
m1, 1/
, Pm, and E are required for accurate estimation of total quality costs, whereas precise and accurate estimates of 1/
,
, Nc,
m1, and b are required for a reasonably accurate identification of the optimum sampling design. Under field conditions, reasonably precise estimates of Nc, Pm, and E should be relatively easy to obtain. Precise estimate of
would require prior monitoring records of forage quality on individual farms. This would prove to be more difficult in many instances. A precise estimate of
m1 is problematic. One could possibly use a nutritional model such as NRC (2001) to estimate expected production changes from shifts in forage composition for each of the nutrients being monitored by their individual control chart. This would naively transform a nutritional requirement system into a nutrient response system, which has been shown to be inaccurate (St-Pierre and Thraen, 1999). In addition, the best strategy to monitor multiple nutrients is unclear. Each nutrient could have its own control chart and its own sampling design but this may require a global sampling schedule, which is not acceptable in practice. Clearly, additional work is needed in this area.
|
|
| CONCLUSIONS |
|---|
|
|
|---|
Sensitivity of total quality costs is not affected much by the interaction of input variables. Total sensitivity of C is primarily for Nc,
m1, 1/
, Pm, and E, whereas total sensitivities of the design parameters n, h, and L are primarily for 1/
,
, Nc,
m1, and b. Sensitivities of design parameters have large interaction and high-order components, indicating that the traditional sensitivity analysis of varying one parameter at a time yields incorrect sensitivity assessments.
| ACKNOWLEDGEMENTS |
|---|
|
|
|---|
| FOOTNOTES |
|---|
Received for publication November 5, 2006. Accepted for publication August 31, 2007.
| REFERENCES |
|---|
|
|
|---|
This article has been cited by other articles:
![]() |
L. O. Tedeschi, P. J. Kononoff, K. Karges, and M. L. Gibson Effects of chemical composition variation on the dynamics of ruminal fermentation and biological value of corn milling (co)products J Dairy Sci, January 1, 2009; 92(1): 401 - 413. [Abstract] [Full Text] [PDF] |
||||
![]() |
N. R. St-Pierre and B. Cobanov Optimal Sampling Schedule of Diet Components: Model Robustness to Departure from Assumptions J Dairy Sci, December 1, 2007; 90(12): 5395 - 5404. [Abstract] [Full Text] [PDF] |
||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| HOME | HELP | FEEDBACK | SUBSCRIPTIONS | ARCHIVE | SEARCH | TABLE OF CONTENTS |