Department of Genetics, Alexander Silberman Institute of Life Science, Hebrew University of Jerusalem, Jerusalem 91904, Israel
1 Corresponding author: lipkin{at}vms.huji.ac.il
| ABSTRACT |
|---|
Key Words: quantitative trait loci mapping, selective DNA pooling, proportion of false positive
| INTRODUCTION |
|---|
| MATERIALS AND METHODS |
|---|
Population and Samples
The study included 10 half-sib daughter families of Israeli Holstein AI bulls, each the sire of 1,800 or more milk-recorded daughters, and 134 markers. Individual genotyping would have required a total of almost 2 million genotyping data points. The study was therefore made feasible only through selective DNA pooling, which allowed it to be carried out with a total of about 25,000 genotyping data points. Herd book data and milk samples of these daughters were collected at different times, generating a series of data sets (DS), which are summarized in Table 3. In particular, daughters of 4 of the sires, comprising DS1, were previously sampled and mapped for milk protein percentage (PP) in 1996. In 1998, for purposes of the present study, daughters of these 4 sires were independently resampled for protein yield (PY) and milk yield (MY) (DS2), and 6 additional sire families were sampled for all 3 traits (DS3). Based on estimated breeding values (EBV), lists of the highest and lowest 220 (DS1) or 230 (DS2, DS3) daughters were prepared for each sire-trait-tail combination. Milk sampling was as previously described (Lipkin et al., 1998; Mosig et al., 2001). For each sire-trait-tail combination, 2 subpools (DS1) or a single pool (DS2, DS3) were prepared in completely independent duplicate. Genotyping of individual and pooled milk samples was as described (Lipkin et al., 1998). Densitometric values for the marker and sire-marker tests of DS1 were obtained from the original data of Lipkin et al. (1998) and Mosig et al. (2001) as described in those studies. These were reanalyzed using the empirical standard error obtained in the present study. All other densitometric values were obtained on the basis of data collected in the present study.
Microsatellite Markers
A total of 134 dinucleotide microsatellites distributed over 25 bovine autosomes were used to scan chromosomal intervals indicated previously to harbor QTL affecting PP (Lipkin et al., 1998; Mosig et al., 2001). All markers were obtained from the public Web sites (USDA, http://www.marc.usda.gov/genome/genome.html; IBRP, http://www.cgd.csiro.au/cgd1.html).
Marker-QTL Linkage Tests
The test for linkage between a marker and each of the 3 study traits was carried out at 2 levels: the sire-marker-trait level, which tests for association of marker and trait at the level of the individual sire, and the marker-trait level, which tests for association of marker and trait across all sires.
Sire-Marker-Trait Combinations.
For each sire-marker-trait combination for which the sire was heterozygous at the marker, the comparison-wise error rate (CWER) P-value, Pijk, for the ith sire-jth marker-kth trait combination was obtained as twice the area of the normal curve from Zijk to +∞, where

$$Z_{ijk} = \frac{|D_{ijk}|}{\mathrm{SE}(D')}$$

Dijk = (DLijk – DSijk)/2 is the difference in sire-allele frequencies between the high and low daughter pools of the ith sire with respect to the jth marker and the kth trait, averaged over the long and short sire alleles (Lipkin et al., 1998; Mosig et al., 2001). DLijk is the difference in allele frequency of the long sire allele between the high and low daughter pools. DSijk is the difference in allele frequency of the short sire allele between the high and low daughter pools. DLijk and DSijk will necessarily have opposite algebraic sign; hence, the minus sign in the expression for Dijk.
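The sire-marker-trait test described above can be sketched in a few lines. This is a minimal illustration, not the authors' software: it assumes the test statistic is the absolute value of D divided by its standard error, and the function and argument names are hypothetical.

```python
import math

def sire_marker_p_value(d_long, d_short, se_d):
    """Two-sided CWER P-value for one sire-marker-trait test (sketch).

    d_long, d_short: high-minus-low pool allele frequency differences for
    the long and short sire alleles (DL and DS in the text).
    se_d: standard error of D, supplied by the caller.
    """
    d = (d_long - d_short) / 2.0      # D = (DL - DS) / 2
    z = abs(d) / se_d                 # assumed form of the test statistic
    # Twice the upper-tail area of the standard normal curve from z to +infinity.
    p = math.erfc(z / math.sqrt(2.0))
    return d, z, p
```

Because DL and DS have opposite signs, subtracting them averages the two allele-specific estimates rather than cancelling them.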
Empirical SE(D').
Lipkin et al. (1998) and Mosig et al. (2001) calculated SE(D) on a priori grounds, on the assumption that it was determined by binomial sampling of sire and dam alleles and technical error of densitometric estimation of allele frequency in a pool. For the present study a novel empirical standard error was used, denoted SE(D'), based on a D-value (Dtail), obtained between 2 independent subpools at the same tail. Details on calculation of SE(D') are presented in Bagnato et al. (2008). To obtain an estimate of SE(D') for use in the present study, we reanalyzed 3 data sets (DS4, DS5, DS6; Table 3) for which 2 subpools were prepared for each tail. Data set 4 consisted of high and low daughters according to EBV for PP, of each of 7 Israeli Holstein sires. For each sire, the daughters comprising each tail were divided into 2 subpools. The 2 subpools in each tail were not completely independent, because the most extreme daughters within tails were assigned to the external subpools and the remainder to internal subpools (Lipkin et al., 1998; Mosig et al., 2001). Nevertheless, average Dtail was not significantly different from zero (data not shown), allowing the use of this DS to estimate SD(Dtail). Data set 5 consisted of high and low daughters according to EBV for PP of a new set of 7 sires. For each sire, the daughters in each tail were divided at random among 2 independent subpools. Data set 6 consisted of high and low daughters according to EBV for female fertility and a second set of high and low daughters according to EBV for milk SCC for 6 sires, 5 of which were included in DS5 and 1 of which was collected independently. Here too, for each sire, the daughters in each tail were divided at random among 2 independent subpools.
Marker-Trait Combinations.
The CWER P-value, Pjk, for the jth marker-kth trait combination was obtained as the area of the χ² distribution from χ²jk to +∞, where

$$\chi^2_{jk} = \sum_{i} Z^2_{i(jk)}, \qquad \mathrm{df} = s_{i(jk)}$$

The parentheses in the subscript of Z²i(jk) indicate that the summation is over sires heterozygous at the jth marker, within the jth marker-kth trait combination, and si(jk) is the number of heterozygous sires for the jkth marker-trait combination (Weller et al., 1990; Lipkin et al., 1998; Mosig et al., 2001).
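As an illustration of the marker-trait test, the sketch below sums squared per-sire Z statistics and evaluates the upper tail of the χ² distribution with degrees of freedom equal to the number of heterozygous sires. The tail area is computed from the standard closed forms for 1 and 2 df plus a recurrence, so only the Python standard library is needed; the function names are hypothetical and this is not the authors' code.

```python
import math

def chi2_upper_tail(x, df):
    """Upper-tail area of the chi-square distribution for integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    half = x / 2.0
    if df % 2 == 0:
        q, nu = math.exp(-half), 2                 # closed form for 2 df
    else:
        q, nu = math.erfc(math.sqrt(half)), 1      # closed form for 1 df
    # Recurrence: Q(nu + 2) = Q(nu) + (x/2)^(nu/2) * exp(-x/2) / Gamma(nu/2 + 1)
    while nu < df:
        q += half ** (nu / 2.0) * math.exp(-half) / math.gamma(nu / 2.0 + 1.0)
        nu += 2
    return q

def marker_trait_p_value(z_values):
    """CWER P-value for one marker-trait test.

    z_values: Z statistics of the sires heterozygous at the marker.
    """
    chi2 = sum(z * z for z in z_values)
    df = len(z_values)                             # number of heterozygous sires
    return chi2, df, chi2_upper_tail(chi2, df)
```

Only sires heterozygous at the marker contribute, so both the sum and the degrees of freedom vary from marker to marker.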
Proportion of True and Falsified Null Hypotheses Out of All Tests.
The total number of tests (N) represents a mixture comprised of n1 tests for which the null hypothesis is false (i.e., marker-QTL linkage is present) and n2 tests for which the null hypothesis is true (i.e., marker-QTL linkage is not present). Thus, the observed distribution of CWER P-values is a mixture of P-values representing these 2 classes of tests. On this basis, given the observed distribution of CWER P-values for the individual tests, the number of true null hypotheses (denoted n2) out of all N tests can be estimated by a number of procedures (Nettleton et al., 2006). For the present study, the histogram-based procedure first described by Mosig et al. (2001) and simplified by Nettleton et al. (2006) was used with the following algorithm:
Count the P-values in each of the H histogram intervals of width 1/H
Let yi be the count in the ith interval for i = 1,..., H,
Let xj be the mean of the yi from i = j to H,
then,
n2 = Hxj', where j'= the lowest value of j for which yj is lower than xj.
In the present study, H was set to 10 so that bin width was 0.10.
The number of false null hypotheses (denoted n1) is then obtained by subtraction, n1 = N – n2. Both n2 and n1 are useful statistics for a number of purposes, as will be shown in the following sections.
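The histogram-based algorithm above can be sketched as follows. The code follows the steps as stated in the text (strict "lower than" comparison); the fallback used when no bin lies below its tail mean is an assumption added here, and the function names are hypothetical.

```python
def bin_counts(p_values, n_bins=10):
    """Count P-values in n_bins equal-width intervals on [0, 1]."""
    counts = [0] * n_bins
    for p in p_values:
        idx = min(int(p * n_bins), n_bins - 1)   # P = 1.0 goes into the last bin
        counts[idx] += 1
    return counts

def estimate_n2(counts):
    """Estimate the number of true null hypotheses from bin counts.

    counts[j] is y_j; the tail mean of counts[j:] is x_j.
    Returns H * x_j' for the first j' with y_j' < x_j'.
    """
    h = len(counts)
    for j in range(h):
        tail_mean = sum(counts[j:]) / (h - j)
        if counts[j] < tail_mean:
            return h * tail_mean
    # Assumption: if no bin is below its tail mean, use the last bin,
    # which is where a "less than or equal" rule would stop.
    return float(h * counts[-1])

def estimate_n1(counts):
    """Number of false null hypotheses, by subtraction: n1 = N - n2."""
    return sum(counts) - estimate_n2(counts)
```

The idea is that P-values from true null hypotheses are spread evenly over the bins, so the flat right-hand part of the histogram estimates their count per bin.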
Proportion of False Positives: Setting Significance Levels for Experiment-Wise Linkage Tests.
To control for the multitest situation while retaining power, the proportion of false positive (PFP) criterion (Fernando et al., 2004), also termed aFDR (Mosig et al., 2001), was used to set CWER threshold P-values for declaration of significance at the marker-trait (PFPM) and sire-marker-trait (PFPS) levels. Following Mosig et al. (2001) and Fernando et al. (2004), PFPMj(k) for the jth marker test within the kth trait was calculated as:
$$\mathrm{PFPM}_{j(k)} = \frac{\mathrm{n2M}_{k} \times \mathrm{PM}_{j(k)}}{\mathrm{RM}_{j(k)}}$$

where PMj(k) = the CWER P-value of the jth marker test within the kth trait, when the marker tests are ranked by their P-values from lowest to highest within a given trait; RMj(k) = the rank number of the jth marker test within the kth trait, and n2Mk = the number of true null hypotheses for marker tests within the kth trait, estimated as shown above.
Values for the ith sire-jth marker combination within the kth trait [PFPSij(k)] were obtained in an exactly analogous manner, except that at the sire-marker level, the number of tests for which the null hypothesis is true (n2Sk) represents tests for which either the marker is not in linkage to a QTL or the marker is in linkage to a QTL, but the QTL is homozygous in the sire.
Significance Thresholds.
Significance thresholds were calculated separately by trait at the marker level and at the sire-marker level. At the marker level, marker-trait tests having CWER P-values corresponding to PFPM ≤ 0.10 were taken as significant. At the sire-marker level, PFP were calculated across all markers, but only sire-marker-trait tests having CWER P-values corresponding to PFPS ≤ 0.20 within significant markers were taken as significant.
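A minimal sketch of the PFP calculation and the resulting critical P-value, assuming the form PFP = n2 × P / R used by Mosig et al. (2001) and Fernando et al. (2004), with tests ranked by P-value within a trait. Function names are hypothetical, ties are not treated specially, and the additional restriction of sire-marker tests to significant markers is not implemented here.

```python
def pfp_values(p_values, n2):
    """Return (P, PFP) pairs for tests ranked from lowest to highest P.

    n2: estimated number of true null hypotheses among the tests.
    """
    ranked = sorted(p_values)
    return [(p, n2 * p / rank) for rank, p in enumerate(ranked, start=1)]

def critical_p_value(p_values, n2, pfp_level):
    """Largest CWER P-value whose PFP does not exceed pfp_level, or None."""
    passing = [p for p, pfp in pfp_values(p_values, n2) if pfp <= pfp_level]
    return max(passing) if passing else None
```

For example, with an estimated n2 of 3 among 5 tests having P-values 0.001, 0.004, 0.03, 0.2, and 0.5, the PFP values are 0.003, 0.006, 0.03, 0.15, and 0.3, so a PFP level of 0.10 gives a critical P-value of 0.03.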
Estimating Power of the Test
Power of the statistical tests used to determine marker-QTL linkage for the kth trait at the given significance threshold for the marker-trait and sire-marker-trait levels (denoted VMk and VSk, respectively) was estimated as the observed number of significant tests at each level (nOMk and nOSk) after correction for the proportion of false positives, divided by the estimated number of false null hypotheses in the DS at that level (n1Mk, n1Sk), obtained as described above:

$$\mathrm{VM}_{k} = \frac{\mathrm{nOM}_{k}\,(1 - \mathrm{PFPM})}{\mathrm{n1M}_{k}}$$

$$\mathrm{VS}_{k} = \frac{\mathrm{nOS}_{k}\,(1 - \mathrm{PFPS})}{\mathrm{n1S}_{k}}$$

where the factor (1 – PFP) represents the correction for PFP.
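The power estimate is a single ratio, sketched below with a hypothetical function name. The example inputs are the marker-level values reported later in this article: 68, 21, and 76 significant markers and 82, 46, and 94 estimated false null hypotheses for PP, PY, and MY, at a PFP of 0.10.

```python
def power_estimate(n_observed, pfp, n1):
    """Estimated power: significant tests corrected for false positives,
    divided by the estimated number of false null hypotheses."""
    return n_observed * (1.0 - pfp) / n1

power_pp = power_estimate(68, 0.10, 82)
power_py = power_estimate(21, 0.10, 46)
power_my = power_estimate(76, 0.10, 94)
```

Rounded to 2 decimals these give 0.75, 0.41, and 0.73, which agree with the power values quoted in the Discussion.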
Estimating the Proportion of Heterozygosity at the QTL
At the marker-trait level, a false null hypothesis represents a marker in linkage to a QTL. The proportion of false null hypotheses out of all marker-trait tests for a given trait (n1Mk/NMk) thus represents the proportion of markers in linkage to QTL for the kth trait (denoted QMk). At the sire-marker-trait level, the proportion of false sire-marker-trait null hypotheses out of all sire-marker-trait tests for a given trait (n1Sk/NSk, denoted QSk) represents the joint proportion of markers in linkage to QTL (QMk) and of QTL that are heterozygous in the sires (denoted Hk). It follows that QSk = QMkHk, and Hk can readily be estimated as Hk = QSk/QMk. This expression is algebraically equivalent to the corresponding expression given in Mosig et al. (2001). As pointed out by Mosig et al. (2001), QTL that are not represented in the sample by at least 1 sire heterozygous at both the marker and the QTL will not be included among the significant markers, so that QMk is an underestimate. This, in turn, leads to Hk being overestimated. Mosig et al. (2001) also presented a procedure for correcting for the bias. However, when they applied this procedure to their data, the bias was close to 10%, and it should be even lower for the present DS, because it includes a larger number of sires. Given the negligible bias, the correction procedure was not implemented in the present study.
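The relation Hk = QSk/QMk can be sketched as below. The function name and the example counts are hypothetical and chosen only to make the arithmetic easy to follow; they are not values from this study.

```python
def qtl_heterozygosity(n1_sire, n_sire_tests, n1_marker, n_marker_tests):
    """Estimate H = QS / QM for one trait.

    QS: proportion of false null hypotheses among sire-marker-trait tests.
    QM: proportion of false null hypotheses among marker-trait tests.
    """
    qs = n1_sire / n_sire_tests
    qm = n1_marker / n_marker_tests
    return qs / qm

# Hypothetical counts: QS = 200/800 = 0.25 and QM = 80/128 = 0.625.
h_example = qtl_heterozygosity(200, 800, 80, 128)
```

With these illustrative counts the estimate is 0.25/0.625 = 0.4, meaning that a sire would be heterozygous at 40% of the QTL linked to the tested markers.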
Allele Substitution Effects
Allele substitution effects were calculated as described (Lipkin et al., 1998), based on shadow-corrected allele frequency estimates and the mean EBV of the pools (Table 4). The effects were calculated for all sire-marker-trait combinations defined as significant on the above PFPM and PFPS criteria.
Let OMij be the observed number of markers that affect both of 2 traits, i and j. Because the observed marker effects include false positives, the number of observed marker effects on the 2 traits that represent true effects on both traits is equal to OMij(1 – PFPM)². On the other hand, because of incomplete power of the experiment, the actual number of markers affecting the 2 traits will be n1Mij = OMij(1 – PFPM)²/ViVj, where Vi and Vj = the power for detecting marker effects on traits i and j, respectively. The true proportion of markers affecting trait i that also affect trait j (QMij) will then be given by the proportion of n1Mij among the total number of estimated false null hypotheses involving i (n1Mi): QMij = n1Mij/n1Mi. Similarly, the true proportion of markers affecting j that also affect i (QMji) will be given by QMji = n1Mji/n1Mj, where n1Mji and n1Mj are obtained as described above. Because the estimated numbers of false null hypotheses n1Mi and n1Mj can be different for different traits, it follows that QMij and QMji need not be equal. That is, the proportion of QTL affecting trait i that also affect trait j need not be the same as the proportion of QTL affecting j that also affect i. For example, it might be that a small fraction of QTL affecting MY also affect PY, but almost all QTL affecting PY also affect MY. On the same argument, the true number of markers affecting 3 traits (i, j, k) will be n1Mijk = OMijk(1 – PFPM)³/ViVjVk.
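The two-trait calculation can be illustrated with values reported elsewhere in this article. In the sketch below, the count of 42 markers significant for both PP and MY is obtained by adding the 36 markers significant for PP and MY only to the 6 significant for all 3 traits; that sum, the use of the rounded power values 0.75 and 0.73, and the function name are assumptions of this example.

```python
def shared_marker_estimates(o_ij, pfp, v_i, v_j, n1_i, n1_j):
    """Estimate markers truly affecting both traits and the two proportions.

    Returns (n1_ij, q_ij, q_ji) where
      n1_ij = o_ij * (1 - pfp)^2 / (v_i * v_j)
      q_ij  = n1_ij / n1_i   (share of trait-i markers that also affect j)
      q_ji  = n1_ij / n1_j   (share of trait-j markers that also affect i)
    """
    n1_ij = o_ij * (1.0 - pfp) ** 2 / (v_i * v_j)
    return n1_ij, n1_ij / n1_i, n1_ij / n1_j

# PP taken as trait i and MY as trait j.
n1_pp_my, q_pp_my, q_my_pp = shared_marker_estimates(42, 0.10, 0.75, 0.73, 82, 94)
mean_q = (q_pp_my + q_my_pp) / 2.0
```

With these inputs, roughly three-fourths of markers affecting PP also affect MY, about two-thirds of markers affecting MY also affect PP, and the average of the two is close to the 0.71 quoted in the Discussion. Small differences from the published figures are expected because rounded inputs are used.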
As calculated above, n1Mi represents the total number of markers that affect trait i, including those that also affect j and k, and n1Mij represents the total number of markers that affect traits i and j, including those that affect i, j, and k. From this, it follows that the numbers of markers that exclusively affect trait i, traits i and j, or traits i and k (written here with a tilde as ñ1Mi, ñ1Mij, and ñ1Mik) will equal

$$\tilde{n}1M_{ij} = n1M_{ij} - n1M_{ijk}, \qquad \tilde{n}1M_{ik} = n1M_{ik} - n1M_{ijk}, \qquad \tilde{n}1M_{i} = n1M_{i} - \tilde{n}1M_{ij} - \tilde{n}1M_{ik} - n1M_{ijk}$$

and similarly for the other traits.
The proportions of markers affecting trait i that exclusively affect trait i, and trait combinations ij, ik, and ijk, are obtained by dividing ñ1Mi, ñ1Mij, ñ1Mik, and n1Mijk, respectively, by n1Mi.
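The partition of markers affecting trait i into exclusive classes can be sketched as below. This assumes the exclusive counts are obtained by simple subtraction of the overlapping classes, which is how the definitions above are read here; the function names and the example numbers are hypothetical.

```python
def exclusive_counts(n1_i, n1_ij, n1_ik, n1_ijk):
    """Split the markers affecting trait i into mutually exclusive classes.

    n1_i:   markers affecting i (including those also affecting j or k)
    n1_ij:  markers affecting i and j (including those affecting all 3)
    n1_ik:  markers affecting i and k (including those affecting all 3)
    n1_ijk: markers affecting all 3 traits
    """
    only_ij = n1_ij - n1_ijk
    only_ik = n1_ik - n1_ijk
    only_i = n1_i - only_ij - only_ik - n1_ijk
    return {"i": only_i, "ij": only_ij, "ik": only_ik, "ijk": n1_ijk}

def exclusive_proportions(n1_i, n1_ij, n1_ik, n1_ijk):
    """Express each exclusive class as a proportion of markers affecting i."""
    counts = exclusive_counts(n1_i, n1_ij, n1_ik, n1_ijk)
    return {name: count / n1_i for name, count in counts.items()}

# Hypothetical inputs chosen for easy arithmetic.
example = exclusive_proportions(80, 60, 30, 20)
```

By construction the four proportions sum to 1, which is a quick check that no marker has been counted twice.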
In principle, it would have been preferable to perform the mapping for each trait in an independent set of sires or daughters. This was achieved in part by the separation of daughters for the same sires in DS1 and DS2. But the logistical and analytical problems posed by extending such an approach to the entire DS seem insuperable. Had such an experiment been performed, we can expect that the proportion of QTL showing pleiotropic effects would have been somewhat less than found in the present study.
| RESULTS |
|---|
Empirical Standard Error
For DS4, SE(Dtail) was calculated across 1,375 sire-marker-trait-tail combinations and was found equal to 0.072 so that SE(Dnull) = 0.072/√2 = 0.051. For DS5 and DS6, SE(Dtail) was calculated across 1,338 sire-marker-trait-tail combinations and was found equal to 0.073 so that again SE(Dnull) = 0.051. Independent subpools were not prepared for the samples of DS2 and DS3. Given that the same methodology was used, we took the average SE(Dnull) = 0.051 obtained in DS4, DS5, and DS6 as the SE(D) to analyze the DS2 and DS3 data as well. This value is somewhat less than the value of 0.056 reported by Bagnato et al. (2008).
The Proportion of False Null Hypotheses Among All Null Hypotheses
A total of 2,563 sire-marker-trait tests and 402 marker-trait tests were implemented in the present study. Table 6 shows the frequency distribution in bins of width 0.10 of CWER P-values by trait, for sire-marker and for marker tests. At both levels, there was an excess of low (significant) P-values, compared with the null hypothesis expectation of 10% of total tests in each bin. This indicates a high proportion of false null hypotheses in the DS, which are generating the excess low P-values. The excess of low P-values was about twice as great at the marker level as at the sire-marker level. This is as expected, because even when a sire is tested at a marker in linkage to a QTL, a significant sire-marker effect can be obtained only if the QTL is present in a heterozygous state as well. At both marker and sire-marker levels, the excess of low P-values was similar for PP and MY but considerably less for PY. This is due to the generally smaller D-values found for PY. The mean absolute magnitude of D-values was 0.056, 0.048, and 0.057 for MY, PY, and PP, respectively.
Critical CWER P-Values and Power According to PFP
Table 6 also shows critical CWER P-values for declaration of linkage, according to PFP criteria for significance set at a level of 0.10 for marker-trait tests and 0.20 for sire-marker-trait tests within significant markers. Because PFP values are in direct proportion to n2, PP and MY had similar critical P-values, which were distinctly higher than the corresponding values for PY. Critical P-values for PP were comparable to the values reported for this trait by Mosig et al. (2001).
The Number of Significant Marker-Trait Tests
A total of 134 markers covering 25 chromosomes were tested against each of the 3 traits. Thus, there were 402 tests in all, of which 165 (41.0%) were significant at PFPM ≤ 0.1 (appendix Table I). Of the markers tested, 30 were not significant for any of the traits. Of the 104 significant markers, 20 were exclusively significant for PP, 2 for PY, and 27 for MY; 6 were exclusively significant for PP and PY, 36 for PP and MY, 7 for PY and MY, and 6 markers were significant for all 3 traits. Considering the traits individually, 68 markers (50.7%) on 23 chromosomes had a significant effect on PP (i.e., this includes markers significant exclusively for PP and those significant for PP and PY, PP and MY, and PP, PY, and MY), 21 markers (15.7%) on 15 chromosomes had a significant effect on PY, and 76 markers (56.7%) on 23 chromosomes had a significant effect on MY.
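The counts in this paragraph can be cross-checked with a short tally. The sketch below takes the exclusive class counts stated in the text and rebuilds the per-trait totals, the number of significant markers, and the number of significant tests; the variable and function names are hypothetical.

```python
# Markers significant for exactly the listed set of traits (from the text).
EXCLUSIVE_COUNTS = {
    ("PP",): 20,
    ("PY",): 2,
    ("MY",): 27,
    ("PP", "PY"): 6,
    ("PP", "MY"): 36,
    ("PY", "MY"): 7,
    ("PP", "PY", "MY"): 6,
}

def per_trait_totals(exclusive_counts):
    """Sum the exclusive classes into a total per trait."""
    totals = {}
    for traits, count in exclusive_counts.items():
        for trait in traits:
            totals[trait] = totals.get(trait, 0) + count
    return totals

totals = per_trait_totals(EXCLUSIVE_COUNTS)
n_significant_markers = sum(EXCLUSIVE_COUNTS.values())
n_significant_tests = sum(totals.values())
percent_significant = 100.0 * n_significant_tests / (134 * 3)
```

The tally gives 68, 21, and 76 markers for PP, PY, and MY, 104 significant markers, and 165 significant tests out of 402, matching the figures reported above.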
| DISCUSSION |
|---|
The Number of True and False Null Hypotheses Among All Null Hypotheses
The present study made extensive use of estimates of the number of true null hypotheses (n2) among all null hypotheses. These were obtained by a histogram-based approach first proposed by Mosig et al. (2001) using an iterative procedure and later simplified to a short algorithm by Nettleton et al. (2006). Evaluation of this procedure in simulations involving QTL mapping (Fernando et al., 2004) and DNA expression analysis (Nettleton et al., 2006) showed that the procedure was effective in estimating n2 and much simpler to implement relative to a variety of alternative procedures (Nettleton et al., 2006). The estimate of n2 is primarily of interest for application of the PFP criterion for setting statistical significance levels (Mosig et al., 2001; Fernando et al., 2004). However, once an estimate of n2 is available, the corresponding n1, an estimate of the number of false null-hypotheses is obtained by simple subtraction and can be used to estimate power of the experiment with respect to uncovering marker-QTL linkage (Mosig et al., 2001). As shown in Mosig et al. (2001), and somewhat differently in the present study, estimates of statistical power can be used within the framework of a multiple-sire daughter design to estimate proportion of heterozygosity at the QTL. In the present study, power estimates were further utilized to estimate the proportion of QTL affecting 2 or more traits.
QTL Effects on PP, MY, and PY
As far as PP is concerned, the high estimated proportion of false null hypotheses at the marker level (61.4%, Table 6) derives from the fact that the markers tested were for the most part chosen from QTL regions (QTLR) previously shown to affect PP. The estimated 38.6% of true null hypotheses can be attributed to false positive results in the previous studies, chance sampling such that the new sires were homozygous at some of the QTLR identified in previous studies, and new markers added to explore the flanking regions of the previously identified QTLR, or QTLR reported from other populations. Given that the scan was concentrated on QTLR affecting PP, and that the calculations in this study indicate that three-fourths of markers affecting PP also affect MY, the proportion of false null hypotheses for MY would have been expected to be somewhat less than what was found for PP. Nevertheless, MY showed an even higher proportion of false null hypotheses (70.1%) than PP (Table 6). A simple explanation for this observation was not found. The proportion of false null hypotheses for PY (34.5%) was considerably less than for PP or MY. This is as expected if most of the markers affecting PP and MY are more or less balanced in their effects on the 2 traits. In this case, the net effect on PY is reduced by the negative correlated effects of MY and PP so that detectable signal strength of PY extends to fewer markers than the signal for MY or PP. The generally lower D-values observed for PY compared with MY and PP noted in the results section are attributed to this factor.
The estimated numbers of true marker-QTL linkages (n1M, Table 6) were 82, 46, and 94 for PP, PY, and MY, respectively, with corresponding power after correction for 10% of false positives of 0.75, 0.41, and 0.73. The power for PP is somewhat higher than the value 0.71 (uncorrected for false positives) reported by Mosig et al. (2001). The increase in power in the present study may reflect the use of 10 sires, whereas only 7 sires were surveyed in Mosig et al. (2001), and the fact that the present study was focused on QTLR found in that study.
Averaging over PP taken as trait i and MY taken as trait j, over two-thirds (0.71) of QTL affecting PP also affect MY and vice versa. The proportion of markers affecting PP that also affect PY was 0.39; the corresponding value for MY and PY was 0.37. In the reverse direction, the proportion of markers affecting PY that also affect PP was 0.69; the corresponding value for PY and MY was 0.77. Thus, about one-third of markers that affect PP or MY also affect PY, whereas about two-thirds of markers that affect PY also affect PP or MY. This is plausible. Many QTL affecting PP or MY will not affect PY because of the inverse relationship between MY and PP. In the converse direction, however, we would expect a QTL affecting PY to have effects on MY or PP or both. Thus, the results of the present study are broadly consistent with Viitala et al. (2003) but also support the presence of an appreciable body of QTL that affect MY or PP without major effects on the alternative trait (see below). These may exert their effects through other mechanisms.
Although most markers, and by implication QTL, were estimated to have effects on both MY and PP, it was also estimated that an appreciable proportion (about one-third) of markers affecting MY did not have major effects on PP and vice versa. Thus, these results imply that effects of individual QTL on MY, PP, and PY can differ. This is supported by studies of the effects of alleles at individual known genes on the 3 traits, MY, PY, and PP.
| CONCLUSIONS |
|---|
The estimate of 30% of QTL that affect either MY or PP without effects on the alternative trait is close to the estimate of about one-third (0.32) of QTL affecting PP or MY that also affect PY. Thus, those QTL that affect primarily PP or MY with only minor effect on the other main trait may be the QTL that have significant effects on PY. As expected from the arithmetical relationships among PP, MY, and PY, QTL affecting PY alone were absent from Viitala et al. (2003) and were estimated as absent in the corrected analysis of the present study. The high proportion of QTL estimated as affecting MY and PP with neutral effects on PY should enable PP to be maintained even when selection for PY results in an increase in MY. Assuming a diallelic QTL, average QTL heterozygosity equal to 0.40 as found in the present study implies allele frequencies in the range 0.7 or more for the more frequent QTL allele. This would be consistent with the longstanding selection for MY that has been underway in the Holstein breed for many generations and suggests that, for the most part, alleles with positive effects on MY (and associated negative effects on PP) may already be at moderately high frequency in the breed.
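The step from an average heterozygosity of 0.40 to a frequency of about 0.7 for the more frequent allele can be checked directly. The sketch below assumes a diallelic QTL with heterozygosity H = 2p(1 – p), as under Hardy-Weinberg proportions; that assumption and the function name are introduced here for illustration.

```python
import math

def major_allele_frequency(heterozygosity):
    """Frequency p of the more frequent allele, solving 2p(1 - p) = H."""
    if not 0.0 <= heterozygosity <= 0.5:
        raise ValueError("heterozygosity of a diallelic locus must lie in [0, 0.5]")
    return (1.0 + math.sqrt(1.0 - 2.0 * heterozygosity)) / 2.0

p_major = major_allele_frequency(0.40)
```

For H = 0.40 this gives p of about 0.72. QTL with lower heterozygosity than the average correspond to higher frequencies of the common allele, in line with the "0.7 or more" stated above.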
| Appendix 1 |
|---|
| ACKNOWLEDGEMENTS |
|---|
Received for publication August 31, 2007. Accepted for publication December 18, 2007.
| REFERENCES |
|---|