J. Dairy Sci. 87:4303-4310
© American Dairy Science Association, 2004.

A Monte Carlo Approach for Estimation of Haplotype Probabilities in Half-Sib Families

P. J. Boettcher¹, G. Pagnacco² and A. Stella³

1 Istituto di Biologia e Biotecnologia Agraria, Consiglio Nazionale delle Ricerche, Segrate 20090, Italy
2 Dipartimento di Scienze e Tecnologie Veterinarie per la Sicurezza Alimentare, Università degli Studi, Milano 20133, Italy
3 CERSA–Fondazione Parco Tecnologico Padano, Lodi 26900, Italy

Corresponding author: Paul Boettcher; e-mail: boettch{at}ibba.cnr.it.


    ABSTRACT
The objective of this work was to propose an algorithm (HAPROB) to estimate haplotype probabilities for genotyped members of half-sib families for which parents lacked genotypic information. The algorithm had 2 basic steps. First, a Monte Carlo-based approach was used to estimate haplotype probabilities for sires conditional upon offspring genotypes and population allelic frequencies; then offspring-haplotype probabilities were estimated conditional upon sire probabilities and population frequencies. The 2 steps were alternated iteratively until estimates of population frequencies were essentially unchanged. Simulation was used to evaluate effects of the number of Monte Carlo cycles on the accuracy of the reconstructed haplotypes. Fifty thousand cycles were found to be sufficient for the haplotype configurations considered. Accuracy of the algorithm was compared with that obtained by the public domain SIMWALK2 software, which predicts the most likely haplotype configurations but gives no estimates of probability. The accuracy of the current approach was comparable to that obtained from SIMWALK2: the proportions of haplotypes reconstructed correctly were 87.0 and 92.4% (sires and offspring) for HAPROB vs. 87.5 and 91.5% for SIMWALK2. Effects of family size on accuracy of reconstruction were also examined. Accuracy of reconstruction was only about 4% for sires with 2 offspring, whereas accuracy among the offspring themselves was 65%. Accuracy increased quickly as family size increased and reached 100% for sires with 30 offspring. Maximum accuracy for offspring was about 96%. The estimated haplotype probabilities can be used in regression analyses to estimate effects of haplotypes on quantitative phenotypes.

Key Words: haplotype reconstruction • probabilities • half-sib families

Abbreviation key: HAPROB = algorithm for estimating haplotype probabilities, HC = haplotype combination, RP = relative probability


    INTRODUCTION
Early studies of genetic association between DNA polymorphism and dairy traits were typically based on effects of polymorphism at a single locus (e.g., Cowan et al., 1990). However, as new genotyping technologies have been developed and refined and the quantity of genomic information has increased, simultaneous analysis of multiple loci or sites of polymorphism has become more common. One approach for this type of analysis is multiple regression on all loci and their interactions. An alternative strategy for genes on the same chromosome is an analysis of haplotype effects, which has several advantages over considering the various loci as separate entities (Morris and Kaplan, 2002). Advantages of this strategy are that 1) the number of haplotypes in a population is often lower than the number of allelic combinations among loci, thus resulting in more precision and statistical power; and 2) haplotype effects can account for effects of unknown loci flanked by and in linkage disequilibrium with the genotyped loci. Analyses of haplotype effects have been performed in humans (e.g., Martin et al., 2000) and in farm animals (e.g., Sironen et al., 2002). Braunschweig et al. (2000) and Ikonen et al. (2001) have recently applied such an approach to dairy cattle.

One obstacle to haplotype analysis is that genotyping procedures generally do not provide information about the phase between loci, which is necessary for obtaining haplotypes. For example, suppose 2 hypothetical loci A and B are linked together on the same chromosome. Most genotyping approaches provide only information regarding the alleles carried by an individual at each locus. For individuals that are heterozygous at both loci (e.g., A1A2B1B2), the haplotypes are not known because the information available does not indicate whether the chromosome with A1 also has the B1 allele (haplotype = A1B1) or the B2 allele (haplotype = A1B2). In theory, the phase could be determined by sequencing all the DNA between the loci of interest, but this method would be costly. Therefore, in silico methods have been developed for reconstruction of haplotypes (e.g., Clark, 1990; Excoffier and Slatkin, 1995; Stephens et al., 2001). These methods typically use frequencies of haplotypes in the population under study or genotypes of relatives to determine the most likely haplotype combination (HC) for each individual. In some cases, software that applies these methods is available in the public domain. For example, the SIMWALK2 software (Weeks et al., 1995; Sobel and Lange, 1996), a set of computer programs that perform various genetic analyses, including haplotype reconstruction, can be freely downloaded from the Internet. One limitation of most of these methods is that they cannot unambiguously resolve the haplotypes of all individuals. Many approaches determine only the most probable HC without providing an estimate of certainty or the probabilities of other HC. Ikonen et al. (2001) used one such approach to reconstruct haplotypes in cattle and then estimated haplotype effects by regressing the phenotype of each animal on the number of copies (0, 1, or 2) of a given haplotype that each individual carried. Their analysis assumed that assigned haplotypes were reconstructed without error.
An alternative strategy would be to determine the probability of each HC for each animal and then regress phenotypes on these probabilities. Such an analysis requires extending existing reconstruction methods, which determine only the most likely HC.
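To make the phase problem concrete, the following minimal sketch lists every unordered haplotype pair consistent with an unphased multilocus genotype. The function `phase_configurations` is a hypothetical helper written for illustration, not part of HAPROB or SIMWALK2, and representing a genotype as one allele pair per locus is an assumption of the sketch.

```python
from itertools import product


def phase_configurations(genotype):
    """Return the unordered haplotype pairs consistent with an unphased genotype.

    genotype: list of (allele, allele) pairs, one per locus,
    e.g. [("A1", "A2"), ("B1", "B2")] for an A1A2B1B2 individual.
    """
    pairs = set()
    # Choose which allele of each locus sits on the first chromosome;
    # the remaining alleles form the second chromosome.
    for choice in product((0, 1), repeat=len(genotype)):
        h1 = tuple(pair[c] for pair, c in zip(genotype, choice))
        h2 = tuple(pair[1 - c] for pair, c in zip(genotype, choice))
        # A frozenset makes (h1, h2) and (h2, h1) the same combination.
        pairs.add(frozenset((h1, h2)))
    return pairs
```

For the double heterozygote A1A2B1B2 the function returns the 2 phases described above (A1B1/A2B2 and A1B2/A2B1), whereas A1A1B1B2 has a single phase. In general, an individual heterozygous at h ≥ 1 loci has 2^(h-1) possible phases.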

Some haplotype reconstruction methods that use pedigree information may require that both parents and offspring are genotyped, even if the offspring are the only individuals with phenotypes to be analyzed. However, availability of parental genotypes may be limited in some cases. For example, for study of a trait like longevity in livestock or late-onset diseases in humans, parents may no longer be alive when a phenotype is recorded in the offspring. In other cases, when experimental designs are limited by fiscal restrictions, genotyping of both offspring and parents may be cost prohibitive. The genotyping method used may also preclude acquisition of information from parents. For example, genotypes of dairy cattle can be obtained from milk, either from the DNA of somatic cells (Lipkin et al., 1993) or, for the special case of milk protein genes, from the milk proteins themselves (Erhardt et al., 1998). These approaches obviously cannot be applied to male parents. In these cases, genotypes of sibs have to be used to deduce the likely haplotypes of parents before reconstructing haplotypes of the sibs themselves. The widespread use of AI in dairy cattle has created large half-sib families. For many families, enough offspring may be available to determine the exact haplotypes of the common sire, or to narrow down the possibilities to a small number of HC.

The objective of this study was to develop an algorithm (HAPROB) to estimate probabilities of HC for members of half-sib families, given that genotypes were known for all siblings, but unknown for all parents. A software application was written and applied to simulated data sets to compare performance of the algorithm with a popular publicly available computer program for haplotype reconstruction.


    MATERIALS AND METHODS
Methods
A Monte Carlo-based algorithm was proposed for estimation of haplotype probabilities within half-sib (paternal) families, based on multilocus genotypes of the half-sibs. The algorithm was developed for the situation in which genotypes of parents were unknown, although simple extensions could be made to consider this information, if available. The approach assumed that every genotyped half-sib had been genotyped at all loci and that each member of a given family had a different mother. Recombination was allowed between loci, and rates of recombination were assumed to be known.

The general algorithm had 2 primary operations:

  1. Estimation of conditional haplotype probabilities for sires, given genotypes of offspring and allelic frequencies in the general population.
  2. Estimation of conditional haplotype probabilities for offspring, given haplotype probabilities of sires and allelic frequencies in the general population.

Each of these primary operations involved a number of different steps. A Monte Carlo-based approach was used in the first stage, whereas the second operation was deterministic.

A simple flow-chart that briefly outlines the entire procedure is shown in Figure 1.



Figure 1. A flow-chart defining the algorithm used to reconstruct haplotypes and estimate haplotype probabilities.

 
The first step of the algorithm consisted of the determination of all possible haplotypes, the corresponding HC, and the haplotype frequencies given the frequencies of each allele in the population of interest (prior information). The number of possible haplotypes (Nh) is defined by the equation


$$N_h = \prod_{i=1}^{N_l} N_{A_i} \qquad [1]$$

where NAi is the number of alleles at the ith locus, i = 1, ..., Nl, and Nl is the number of loci. Dairy cattle, like all mammals, are diploid and therefore carry 2 haplotypes, so the number of unique HC (Nc) is determined by the equation


$$N_c = \frac{N_h (N_h + 1)}{2} \qquad [2]$$

The algorithm processed one family at a time and one HC at a time within a family.
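The enumeration step and equations [1] and [2] can be sketched in Python as follows. This sketch assumes linkage equilibrium, so each prior haplotype frequency is taken as the product of the allele frequencies at its loci; the function names and the data layout (one dictionary of allele frequencies per locus) are hypothetical.

```python
from itertools import combinations_with_replacement, product
from math import prod


def enumerate_haplotypes(allele_freqs):
    """Return {haplotype: prior frequency} for every possible haplotype.

    allele_freqs: list with one dict per locus, mapping allele -> frequency.
    Prior frequency = product of allele frequencies (linkage equilibrium assumed).
    """
    haps = {}
    for combo in product(*[list(f.items()) for f in allele_freqs]):
        hap = tuple(allele for allele, _ in combo)
        haps[hap] = prod(p for _, p in combo)
    return haps


def enumerate_hcs(haplotypes):
    """All unordered pairs of haplotypes, including pairs of identical haplotypes."""
    return list(combinations_with_replacement(sorted(haplotypes), 2))
```

For 3 loci with 3 alleles each, this yields Nh = 27 haplotypes and Nc = 27 × 28 / 2 = 378 HC, matching equations [1] and [2].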

Sire haplotype probabilities.
Within each half-sib family, a Monte Carlo approach was used to obtain a relative probability (RP) of each HC for the sire, based on offspring genotypes and population haplotype frequencies. This procedure was repeated for each HC and when all HC were completed, the nonzero RP were standardized to sum to 1.00 to obtain final haplotype probabilities for each sire.

The calculation of the RP of a given HC involved a number of steps. First, offspring genotypes were surveyed to determine whether the HC was possible. A given HC was considered possible only if every offspring genotype included at least 1 of the 2 alleles of the HC at each locus of the haplotype. If the HC was possible, all potential gametes were determined, together with their probabilities under the a priori rates of recombination. The number of potential gametes was $2^{N_l'}$, where Nl' is the number of heterozygous loci in the HC. Once the possible gametes and their probabilities had been determined, the following process was undertaken for each offspring in the family. 1) A random sire gamete from the HC was assigned, with all gametes compatible with the offspring genotype having the same probability. For example, consider a haplotype defined by loci A and B. For a sire with the HC A1B1 and A2B2, 4 gametes were possible. An offspring with the genotype A1A2B1B2 could have received any of the 4 gametes, whereas an offspring with the genotype A1A1B1B2 could have received only gametes A1B1 or A1B2. 2) Given the selected sire gamete, the haplotype received from the mother was deduced as the complement of the sire gamete within the offspring genotype.
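The compatibility test and steps 1 and 2 for a single offspring might look like the sketch below. It assumes haplotypes are tuples of allele labels and genotypes are lists of allele pairs (hypothetical representations), enumerates all $2^{N_l'}$ sire gametes as described, and draws uniformly among those compatible with the offspring genotype.

```python
import random
from itertools import product


def hc_possible(hc, offspring_genotypes):
    """True if every offspring carries at least 1 allele of the sire HC at every locus.

    hc: pair of haplotypes, each a tuple of allele labels (one per locus).
    offspring_genotypes: list of genotypes, each a list of (allele, allele) pairs.
    """
    h1, h2 = hc
    return all(
        h1[locus] in geno[locus] or h2[locus] in geno[locus]
        for geno in offspring_genotypes
        for locus in range(len(h1))
    )


def sire_gametes(hc):
    """All distinct gametes the sire HC can produce (2 ** heterozygous loci)."""
    h1, h2 = hc
    return sorted(set(product(*zip(h1, h2))))


def sample_transmission(hc, genotype, rng):
    """Step 1: draw a sire gamete compatible with the offspring genotype (all equally likely).
    Step 2: deduce the maternal haplotype as the complementary allele at each locus."""
    compatible = [g for g in sire_gametes(hc)
                  if all(a in pair for a, pair in zip(g, genotype))]
    gamete = rng.choice(compatible)
    dam = tuple(y if a == x else x for a, (x, y) in zip(gamete, genotype))
    return gamete, dam
```

With the HC A1B1/A2B2 and an A1A1B1B2 offspring, only A1B1 or A1B2 can be drawn, and the maternal haplotype is then A1B2 or A1B1, respectively, as in the example above.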

After steps 1 and 2 had been applied to all offspring within the half-sib family, the numbers of different dam haplotypes received by the group of offspring were used to calculate the probability (pd) of the offspring receiving this particular combination of dam haplotypes, according to the standard multinomial probability function.


$$p_d = \frac{O!}{\prod_i O_i!} \prod_i p_i^{O_i} \qquad [3]$$

where O is the number of offspring in the family, Oi is the number of offspring that received haplotype i from their respective mothers, and pi is the frequency of haplotype i in the general population. The same type of calculation was then made for the combination of sire gametes received (ps), substituting in equation [3] the dam gametic types, numbers, and probabilities with corresponding values for sires.
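Equation [3] is the standard multinomial probability; a direct Python translation (the function name is hypothetical) is sketched below. The same function serves for sire gametes if the sire gamete probabilities are passed in place of population frequencies.

```python
from collections import Counter
from math import factorial, prod


def multinomial_probability(received, freqs):
    """Equation [3]: probability of the observed counts of received haplotypes.

    received: list with the haplotype each offspring received (one entry per offspring).
    freqs: mapping haplotype -> population frequency p_i.
    """
    counts = Counter(received)  # O_i for each haplotype i
    # O! / prod(O_i!) is an integer, so exact integer division is safe.
    coef = factorial(len(received)) // prod(factorial(c) for c in counts.values())
    return coef * prod(freqs[h] ** c for h, c in counts.items())
```

For example, 3 offspring receiving haplotypes a, a, and b, each with frequency 0.5, give 3 × 0.5³ = 0.375.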

This process was repeated many (n) times (i.e., the Monte Carlo process) for each possible HC. The RP for the given HC (RPHC) was then obtained according to the following equation:


$$RP_{HC} = p(HC) \, \frac{1}{n} \sum_{j=1}^{n} P_{d_j} P_{s_j} \qquad [4]$$

where p(HC) is the probability of the given HC, equal to the product of the population frequencies of the 2 haplotypes that comprise it, and Pdj and Psj are the probabilities of the dam and sire gamete distributions from cycle j of the Monte Carlo process.
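Combining steps 1 and 2 with equations [3] and [4], the Monte Carlo estimate of RP for one sire HC could be sketched as below. The sketch follows the description above (uniform sampling of compatible sire gametes, multinomial probabilities for the dam and sire gamete distributions, averaging over n cycles); the argument names and data structures are assumptions, and the division by n does not affect the final probabilities once the RP are standardized with equation [5].

```python
import random
from collections import Counter
from math import factorial, prod


def multinomial(items, probs):
    """Multinomial probability of the observed counts (equation [3])."""
    counts = Counter(items)
    coef = factorial(len(items)) // prod(factorial(c) for c in counts.values())
    return coef * prod(probs.get(k, 0.0) ** c for k, c in counts.items())


def relative_probability(hc, gamete_probs, genotypes, hap_freqs, n, rng):
    """Monte Carlo estimate of RP for one sire HC (equation [4]).

    gamete_probs: sire gamete -> probability under the assumed recombination rates.
    genotypes: for each offspring, a list of (allele, allele) pairs per locus.
    hap_freqs: population haplotype frequencies (dam haplotypes and p(HC)).
    """
    # Sire gametes compatible with each offspring genotype.
    options = [[g for g in gamete_probs if all(a in pair for a, pair in zip(g, geno))]
               for geno in genotypes]
    if any(not opts for opts in options):
        return 0.0  # HC cannot have produced at least one offspring
    total = 0.0
    for _ in range(n):
        sire, dam = [], []
        for opts, geno in zip(options, genotypes):
            g = rng.choice(opts)  # all compatible gametes equally likely
            sire.append(g)
            dam.append(tuple(y if a == x else x for a, (x, y) in zip(g, geno)))
        total += multinomial(dam, hap_freqs) * multinomial(sire, gamete_probs)
    p_hc = hap_freqs[hc[0]] * hap_freqs[hc[1]]
    return p_hc * total / n
```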

As mentioned previously, the final HC probabilities for each sire were then calculated by standardizing the RP to sum to 1.0:


$$SP_i = \frac{RP_i}{\sum_{k=1}^{N_C} RP_k} \qquad [5]$$

where SPi is the final probability that the sire had the ith HC and RPi is the relative probability (from equation [4]) that the sire had the ith HC and NC is the number of different HC. Only HC with SPi > 0.00001 were considered plausible and thus included in the calculation.
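Equation [5] and the plausibility threshold could be written as in the sketch below. Rescaling the retained HC so that they again sum to 1 is an assumption of this sketch; the text states only that HC with SP ≤ 0.00001 were excluded.

```python
def standardize(rp, threshold=0.00001):
    """Equation [5]: convert relative probabilities {HC: RP} into sire probabilities.

    Assumes at least one RP is positive.
    """
    total = sum(rp.values())
    sp = {hc: v / total for hc, v in rp.items() if v > 0}
    # Drop implausible HC (SP <= threshold), then rescale (an assumption of this sketch).
    kept = {hc: p for hc, p in sp.items() if p > threshold}
    s = sum(kept.values())
    return {hc: p / s for hc, p in kept.items()}
```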

Offspring haplotype probabilities.
Offspring haplotype probabilities were calculated directly from the final sire probabilities in a series of steps. First, for each offspring, the possible HC (HCO) were identified from its genotype. The RP of each HCO was then calculated for each sire HC (HCS). Suppose that the ith HCO is defined by 2 haplotypes (gametes) h1 and h2, each of which could have been transmitted either by the sire (hs) or by the mother (hm). Then, the RP for the ith HCO given the jth HCS was calculated as


$$RP_{ij} = p(h_1 = h_s)\, p(h_2 = h_m) + p(h_2 = h_s)\, p(h_1 = h_m) \qquad [6]$$

where p(h1 = hs) is the probability that the first haplotype in the HCO was obtained from the sire, p(h2 = hm) is the probability that the second haplotype in the HCO was obtained from the mother, and the remaining terms are defined analogously. The p(hk = hs) (k = 1 or 2) were based on the gametic probabilities for each HCS, which depended on recombination rates, as described previously. The p(hk = hm) depended upon population frequencies. After RPij had been calculated for all HCO for a given HCS, standardized probabilities (Pij) were obtained by dividing by the sum of RPij across all possible HCO:


$$P_{ij} = \frac{RP_{ij}}{\sum_{m=1}^{M} RP_{mj}} \qquad [7]$$

where M is the number of different HCO possible. To obtain probabilities of a given HCO (OP) across all HCS, the Pij for each HCS were multiplied by the probability of that corresponding HCS and summed over all HCS:


$$OP_i = \sum_{j} P_{ij}\, SP_j \qquad [8]$$
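Equations [6] to [8] for a single offspring could be sketched as below. The inputs are hypothetical dictionaries: the plausible sire HC with their SP, the gamete probabilities of each sire HC, and the population haplotype frequencies used for the maternal contribution. For a homozygous HCO (h1 = h2) the sketch counts the single transmission route once, so that the 2 terms of equation [6] are not double-counted; this handling is an assumption of the sketch.

```python
def offspring_hc_probabilities(hco_list, sire_probs, gamete_probs, hap_freqs):
    """Return {HCO: OP} for one offspring.

    hco_list: candidate (h1, h2) haplotype pairs consistent with the offspring genotype.
    sire_probs: sire HC -> SP (equation [5]).
    gamete_probs: sire HC -> {gamete: probability}.
    hap_freqs: population haplotype frequencies (maternal contribution).
    """
    op = {hco: 0.0 for hco in hco_list}
    for hcs, sp in sire_probs.items():
        gp = gamete_probs[hcs]
        rp = {}
        for h1, h2 in hco_list:
            # Equation [6]: either haplotype may have come from the sire.
            r = gp.get(h1, 0.0) * hap_freqs.get(h2, 0.0)
            if h1 != h2:
                r += gp.get(h2, 0.0) * hap_freqs.get(h1, 0.0)
            rp[(h1, h2)] = r
        total = sum(rp.values())
        if total == 0:
            continue  # this sire HC cannot have produced the offspring
        for hco, r in rp.items():
            op[hco] += sp * r / total  # equation [7], weighted as in equation [8]
    return op
```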

Finally, probabilities of specific haplotypes for each offspring were obtained with the following formula:


$$HP_{ij} = \sum_{k} OP_{ik}\, I_{jk} \qquad [9]$$

where HPij is the probability that the ith offspring carried the jth haplotype, OPik is the probability that the ith offspring had the kth HC, and Ijk is the number of copies of the jth haplotype that were included in the kth HC (i.e., Ijk = 0, 1, or 2).
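Reading HPij as the expected number of copies (0 to 2) of haplotype j, which is an interpretation assumed here, equation [9] reduces to adding each HC probability once per copy of the haplotype that the HC contains. The function name below is hypothetical.

```python
def haplotype_dosages(op):
    """Equation [9] for one offspring: {haplotype: sum over k of OP_k * I_jk}.

    op: {(h1, h2): probability of that HC for the offspring}.
    """
    hp = {}
    for (h1, h2), p in op.items():
        for h in (h1, h2):  # a homozygous HC contributes 2 copies of h
            hp[h] = hp.get(h, 0.0) + p
    return hp
```

If the OP of an offspring sum to 1, its dosages sum to 2, one per chromosome.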

After haplotype probabilities were calculated for all offspring in all families, the newly constructed offspring haplotypes were used to reestimate frequencies in the population. To calculate these frequencies, only the haplotypes transmitted from mothers were used. Sire haplotype information was not used because sires are typically expected to be a small, selected group that may not be very representative of the general population.

Calculation of the new population frequencies was the final step in the algorithm and the entire process was repeated until frequencies converged to stable values. To determine if the values had stabilized, the following convergence criterion (CC) was calculated:


$$CC = \sum_{i=1}^{N_A} \left(\hat{p}_i^{\,r} - \hat{p}_i^{\,r-1}\right)^2 \qquad [10]$$

where $\hat{p}_i^{\,r}$ is the estimated frequency of allele i in iteration r, $\hat{p}_i^{\,r-1}$ is the corresponding estimate from the previous iteration, and NA is the number of alleles (across all loci). The iteration process was stopped when CC < $10^{-7}$.
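The convergence test could be sketched as below. Treating equation [10] as a sum of squared changes in frequency is an assumption, and the `update` argument is a hypothetical stand-in for one full pass of the algorithm (sire step, offspring step, and re-estimation of frequencies from maternal haplotypes).

```python
def convergence_criterion(new, old):
    """Sum of squared changes in estimated frequencies between 2 iterations."""
    return sum((new[k] - old[k]) ** 2 for k in new)


def iterate_until_converged(update, freqs, tol=1e-7, max_iter=1000):
    """Apply update(freqs) repeatedly until CC < tol or max_iter passes are done."""
    for _ in range(max_iter):
        new = update(freqs)
        if convergence_criterion(new, freqs) < tol:
            return new
        freqs = new
    return freqs
```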

Verification and Testing
Effect of number of replicates.
Simulation was used to evaluate the accuracy of the algorithm as a function of n (the number of cycles in the Monte Carlo process). For this test, the haplotype to be reconstructed consisted of 3 completely linked loci (i.e., no recombination), each with 3 alleles of equal expected frequency (P = 1/3). The simulated population consisted of 20 sire families, each with 10 half-sibs. Four values of n were compared: 10,000; 50,000; 100,000; and 500,000. The simulation was replicated 25 times. Accuracy of haplotype reconstruction was evaluated as the proportion of times that the HC with the greatest probability was in fact the true HC and was calculated separately for sires and half-sibs.

Verification with other software.
The accuracy of the proposed algorithm was also compared with the accuracy obtained when using a popular publicly available software application for haplotype reconstruction. This application was the haplotype reconstruction feature of the SIMWALK2 software (Weeks et al., 1995; Sobel and Lange, 1996). Like HAPROB, SIMWALK2 considers pedigree structure when reconstructing haplotypes. Certain other publicly available software considers only population frequencies when determining haplotypes (e.g., Niu et al., 2002). Unlike HAPROB, SIMWALK2 reports only the most probable HC, rather than providing probabilities for all plausible HC.

For this comparison, haplotypes of 3 loci were simulated. Each locus had 3 alleles, all with expected frequencies of 0.33; no recombination was simulated. Each replicate involved 20 families of 10 half-sibs and 25 replicates were generated. As mentioned previously, the SIMWALK2 software returns only the most likely haplotype configuration. The proportion of times that this haplotype configuration was correct was calculated for both sires and offspring and compared with the corresponding accuracy yielded by HAPROB.

Effect of family size on accuracy.
Following verification of the software, a simple experiment was performed to examine the effect of family size on accuracy of haplotype reconstruction for sires and offspring and to compare the relative value of genotypic information for an animal itself to that of a group of its offspring. In this experiment, the simulated haplotype again had 3 loci, each with 3 equally frequent alleles. Fourteen family sizes were compared: all values from 1 to 10 half-sibs, and then 12, 15, 20, and 30. The total population size (number of total offspring) was set to be as close to 300 as possible. Therefore, the number of sires ranged from 300 (1 offspring) to 10 (30 offspring). In cases for which 300 was not exactly divisible by the number of offspring per family (8 for example), the number of sires was based on the next multiple of offspring number yielding a total population of at least 300 (e.g., 38 sires with 8 offspring per sire). A constant value of n = 100,000 was used and 25 replicates were generated.
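The sire counts implied by this design can be reproduced with a short calculation: the number of sires is the smallest integer whose product with the family size is at least 300. The variable names below are illustrative only.

```python
import math

# Family sizes compared in the experiment.
family_sizes = list(range(1, 11)) + [12, 15, 20, 30]
# Smallest number of sires giving at least 300 offspring in total.
n_sires = {k: math.ceil(300 / k) for k in family_sizes}
# Resulting total number of offspring for each family size.
totals = {k: k * n_sires[k] for k in family_sizes}
```

This gives, for example, 43 sires (301 offspring) for families of 7 and 34 sires (306 offspring) for families of 9.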


    RESULTS
Testing
No improvement in accuracy of haplotype reconstruction was obtained by increasing the number of Monte Carlo cycles above 50,000, at least for the simulated 3-locus, 3-allele haplotype with 10 offspring per family. With n = 10,000, the correct haplotype combination was obtained for 86.4% of sires and 92.1% of their offspring. These percentages increased to 87.2 and 92.2%, respectively, when n = 50,000. However, no additional increase in accuracy was observed when n was increased to 100,000 or 500,000. This result could change with larger numbers of loci, and thus more possible combinations of sire gametes.

Verification with Other Software
Table 1 compares reconstruction accuracy for HAPROB and SIMWALK2. The HAPROB algorithm yielded levels of accuracy essentially equal to those of SIMWALK2, suggesting that the algorithm was theoretically sound. SIMWALK2 yielded slightly greater accuracy for sires, at 87.5% vs. 87.0% for HAPROB. The opposite was the case for the genotyped half-sibs: HAPROB yielded an accuracy of 92.4 vs. 91.5% for SIMWALK2. That one method was more accurate for sires and the other for offspring suggests that the accuracies of the 2 approaches were not significantly different, although no formal test of significance was applied.


Table 1. Comparison of accuracy of haplotype reconstruction using the current algorithm (HAPROB) and SIMWALK2, a publicly available software application.1
 
Effect of Family Size on Accuracy
Figure 2 shows the effect of increasing the number of offspring per sire on the accuracy of haplotype reconstruction. Not surprisingly, accuracy of reconstruction increased as the number of offspring per half-sib family increased. One or 2 siblings provide little information about the particular haplotypes carried by the father. With 2 siblings, half of the time the sibs will each receive a different haplotype from the sire. In this case, offspring genotype information is of little value, because the algorithm relies on the presence of multiple copies of a given haplotype among siblings (more than would be expected by chance) to suggest that the sire carried that haplotype. The other half of the time, both sibs receive the same gamete from the father and thus provide no information about the haplotype associated with the nontransmitted gamete. Because of this lack of information, haplotypes were reconstructed correctly for only 4.2% of sires when only 2 offspring were present. Despite the low accuracy of reconstruction for sires, 64.8% of offspring had correctly defined haplotypes. In contrast to the sires, these individuals had known genotypes, which narrowed the number of possible haplotypes to consider. In some cases, namely when an offspring was heterozygous at no more than 1 locus, its haplotypes could be deduced immediately.



Figure 2. Accuracy of haplotype reconstruction as a function of number of offspring per half-sib family.

 
Increasing family size quickly increased accuracy of reconstruction for sires. Accuracy increased nearly 10-fold, to 38.9%, when family size was increased from 2 to 5 offspring, and then more than doubled, to above 80%, with families of 10 half-sibs. Increased accuracy of sire haplotype reconstruction led to increases for offspring as well, and more than 90% of offspring haplotypes were defined correctly for families of 10 half-sibs. Increasing family size beyond 10 half-sibs was more beneficial for accuracy of sires than of offspring. Accuracy of 100% was achieved for sires with 30 offspring, whereas accuracy for offspring was approximately 96% at this family size. The accuracy of reconstruction for offspring appeared to reach an asymptote below 100%. This upper limit arises because some offspring had genotypes compatible with either of the sire haplotypes. In such cases, the most likely haplotype combination depended upon the relative population frequencies of the complementary (i.e., maternally received) haplotypes.

It should be noted that the least favorable situation with regard to allelic frequencies (all alleles at each locus equally frequent) was simulated, and accuracies for both sires and offspring would be greater with higher levels of homozygosity. For this haplotype definition, the predictive value of the genotype of an individual with no half-sibs (about 50%) was equal to that of the genotypes of approximately 5 to 6 half-sib offspring for a sire with no genotypic information of his own. For families with about 13 half-sibs, the haplotypes of genotyped offspring and of the nongenotyped sire were predicted with similar accuracy (Figure 2).

The mean probability of the most likely haplotype combination also increased as family size increased (data not shown). This trend indicated that increasing family size increased not only the accuracy of reconstruction but also its precision. With few offspring, many combinations were plausible for each sire and each had a relatively low probability. As the number of offspring increased, the incorrect haplotype combinations became less and less plausible, and the probability that a single particular HC was correct increased.


    DISCUSSION
Extensions to the Algorithm
Parental genotypes.
The HAPROB algorithm was designed for estimation of haplotype probabilities in a specific and rather difficult situation: offspring haplotypes were predicted without the benefit of parental genotypic information. However, in many real-life situations, parental genotypes may be available, particularly for the sires. When sire information is available, the algorithm becomes simpler. When no sire information is present, all Nh possible haplotypes must initially be considered. According to equation [1], Nh is the product of the number of alleles at each locus. When a sire is genotyped, this number decreases to the product of the number of different alleles carried by the sire at each locus. A sire can carry at most 2 alleles at each locus, so this value is at most $2^{N_l}$, where Nl is again the number of loci, and could be as low as 1 if the sire is homozygous at all loci. Access to sire genotypes will increase accuracy of reconstruction for offspring, but, as seen in Figure 2, once a certain number of offspring are genotyped, the haplotypes of the sire are known with near certainty, and yet some ambiguity about offspring haplotypes can remain.
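The reduction in candidate haplotypes when a sire is genotyped can be illustrated with a short sketch. The helper below is hypothetical; it counts, for a genotype given as one allele pair per locus, the product over loci of the number of distinct alleles the sire carries.

```python
from math import prod


def n_candidate_haplotypes(genotype):
    """Number of haplotypes a genotyped sire could carry: product of distinct alleles per locus."""
    return prod(len(set(pair)) for pair in genotype)
```

For 3 loci with 3 alleles each, an ungenotyped sire has 27 candidate haplotypes, whereas a sire heterozygous at all 3 loci has at most 2³ = 8, and a fully homozygous sire has 1.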

When dam information is present, the situation becomes slightly more complicated. The probability of maternal gametes would then be based on the dam genotype rather than on the population frequencies. The right side of equation [3] would then have 2 parts: the part shown for offspring whose dams have unknown genotypes, and a new part that could be written as


$$\prod_{i=1}^{N_{dk}} p(G_i) \qquad [11]$$

where Ndk is the number of offspring whose dams have known genotypes and p(Gi) is the probability that offspring i received its particular dam gamete given its dam’s genotype. The availability of dam information would be expected to increase accuracy of haplotype reconstruction on average, but would not be necessary for some animals, namely those whose haplotypes can be deduced with certainty from their own genotypes and those of their sires. Thus, for maximum economic efficiency, a useful 3-step strategy might be to 1) genotype offspring first and reconstruct haplotypes based on this information, 2) genotype only the sires of families containing offspring whose haplotypes cannot be resolved with certainty from their genotypes alone and reconstruct haplotypes again, and then 3) genotype only the dams of offspring for whom some uncertainty remains after genotyping offspring and sires.

Missing genotypes.
The algorithm as presented (and the software written) was based on the assumption that all individuals were genotyped for all loci. However, this level of genotyping is not a necessary condition of the algorithm. Animals with less than complete genotypes will simply contribute less information about the haplotypes of their sire (and have less precisely reconstructed haplotypes of their own). For these offspring, more parental haplotypes would be plausible than for animals with complete genotypes. In fact, haplotype probabilities could be estimated for animals with no genotype information. The most probable haplotype for such animals would consist of a random gamete from the most probable HC of the sire, plus the most common haplotype in the population from the dam.

No recombination.
The described algorithm was designed to be general enough to allow for recombination between loci at predefined rates. However, studies of haplotype effects often deal with sets of loci positioned closely enough together in the genome that little, if any, recombination is expected. In such a case, the population frequencies used to obtain the dam gametic probabilities (pi) in equation [3] and p(HC) for sires in equation [4] should be based on probabilities of specific haplotypes, rather than on the product of the frequencies of the alleles defining a given haplotype. The reason is that tightly linked loci will likely show linkage disequilibrium. Thus, certain pairs of alleles will not be transmitted independently, and probabilities of certain allelic combinations will not equal the product of the individual allelic frequencies. This consideration may be especially important for a species such as dairy cattle, for which selection has led to high levels of linkage disequilibrium (Farnir et al., 2000).
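To illustrate why the product of allelic frequencies can misrepresent haplotype frequencies under linkage disequilibrium, the sketch below computes the standard 2-locus disequilibrium coefficient D = p(AB) - p(A)p(B) from a table of haplotype frequencies. The function name and the input values in the example are hypothetical.

```python
def linkage_disequilibrium(hap_freqs):
    """Return D = p(ab) - p(a) * p(b) for each 2-locus haplotype (a, b) in hap_freqs."""
    pa, pb = {}, {}
    for (a, b), p in hap_freqs.items():
        pa[a] = pa.get(a, 0.0) + p  # marginal frequency of allele a at locus 1
        pb[b] = pb.get(b, 0.0) + p  # marginal frequency of allele b at locus 2
    return {(a, b): p - pa[a] * pb[b] for (a, b), p in hap_freqs.items()}
```

If only A1B1 and A2B2 occur, each at 0.5, every allele has frequency 0.5, so the product rule would assign A1B1 a frequency of 0.25 instead of 0.5 (D = 0.25). Under linkage equilibrium D is 0 for every haplotype.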

Use of the Algorithm
Haplotype reconstruction is generally done for 1 of 2 purposes: to estimate haplotype frequencies within a population (e.g., Excoffier and Slatkin, 1995), or to estimate haplotype effects on a given phenotype. Estimation of haplotype probabilities for each individual allows one to use the probabilities as covariables upon which quantitative phenotypes can be regressed to estimate haplotype effects. This process should result in increased precision for estimating haplotype effects, relative to simply assuming that the most probable HC is correct.
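As an illustration of regressing phenotypes on haplotype probabilities, the following sketch fits an ordinary least-squares line of phenotype on the expected number of copies of one haplotype (for instance, values computed with equation [9]). It ignores other effects that a real analysis would include, such as herd or sire, and the function name and example data are hypothetical.

```python
def regress(phenotypes, dosages):
    """Ordinary least squares of phenotype on 1 covariate; returns (intercept, slope)."""
    n = len(phenotypes)
    mx = sum(dosages) / n
    my = sum(phenotypes) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, phenotypes))
    sxx = sum((x - mx) ** 2 for x in dosages)
    slope = sxy / sxx  # estimated effect of 1 copy of the haplotype
    return my - slope * mx, slope
```

Because the covariate is a probability-weighted dosage rather than a 0/1/2 count, animals with uncertain haplotypes contribute fractional values instead of being forced into their single most probable HC.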

In theory, future research could extend the process by an additional step. If a statistical association exists between a haplotype and a given phenotype, the phenotype would provide some information about the haplotype probabilities. The ideal algorithm would simultaneously consider genotypes, pedigree, and phenotypes to both reconstruct haplotypes and estimate their effects. A Bayesian analysis with Gibbs sampling would be one possible approach to such a problem.


    CONCLUSIONS
Accuracies of haplotype reconstruction obtained by the proposed algorithm were very similar to those of SIMWALK2, popular public-domain software for haplotype reconstruction, which probably indicates that the logic underlying the algorithm is correct. However, HAPROB has an advantage in that it estimates and reports probabilities for each plausible HC, rather than only the single most likely combination. Phenotypes can be regressed on such probabilities to estimate haplotype effects, which may yield more precision than assuming that the most probable haplotype is correct. Sire haplotypes can be reconstructed with near certainty when enough offspring genotypes are available, but some uncertainty about offspring haplotypes may remain even when the individual is genotyped and the sire haplotypes are known with certainty. The software developed from this approach for estimation of haplotype probabilities can be obtained directly from the author of this article via electronic mail.


    ACKNOWLEDGEMENTS
 
This research was supported in part by a MURST contract (No. MM07247883_005). The authors thank the anonymous reviewers for their detailed critical assessment of the manuscript.

Received for publication June 4, 2004. Accepted for publication September 2, 2004.


    REFERENCES


Braunschweig, M., C. Hagger, G. Stranzinger, and Z. Puhan. 2000. Associations between casein haplotypes and milk production traits of Swiss Brown cattle. J. Dairy Sci. 83:1387–1395.

Clark, A. 1990. Inference of haplotypes from PCR-amplified samples of diploid populations. Mol. Biol. Evol. 7:111–122.

Cowan, C. N., M. R. Dentine, R. L. Ax, and L. A. Schuler. 1990. Structural variation around prolactin gene linked to quantitative traits in an elite Holstein sire family. Theor. Appl. Genet. 79:577–582.

Erhardt, G., J. Juszczak, L. Panicke, and H. Krick-Saleck. 1998. Genetic polymorphism of milk proteins in Polish Red Cattle: A new genetic variant of β-lactoglobulin. J. Anim. Breed. Genet. 115:63–71.

Excoffier, L., and M. Slatkin. 1995. Maximum-likelihood estimation of molecular haplotype frequencies in a diploid population. Mol. Biol. Evol. 12:921–927.

Farnir, F., W. Coppieters, J. J. Arranz, P. Berzi, N. Cambisano, B. Grisart, L. Karim, F. Marcq, L. Moreau, M. Mni, C. Nezer, P. Simon, P. Vanmanshoven, D. Wagenaar, and M. Georges. 2000. Extensive genome-wide linkage disequilibrium in cattle. Genome Res. 10:220–227.

Ikonen, T., H. Bovenhuis, M. Ojala, O. Ruottinen, and M. Georges. 2001. Associations between casein haplotypes and first lactation milk production traits in Finnish Ayrshire cows. J. Dairy Sci. 84:507–514.

Lipkin, E., A. Shalom, H. Khatib, M. Soller, and A. Friedmann. 1993. Milk as a source of deoxyribonucleic acid and as a substrate for the polymerase chain reaction. J. Dairy Sci. 76:2025–2032.

Martin, E. R., E. H. Lai, J. R. Gilbert, A. R. Rogala, A. J. Afshari, J. Riley, K. L. Finch, J. F. Stevens, K. J. Livak, B. D. Slotterbeck, S. H. Slifer, L. L. Warren, P. M. Conneally, D. E. Schmechel, I. Purvis, M. A. Pericak-Vance, A. D. Roses, and J. M. Vance. 2000. SNPing away at complex diseases: Analysis of single-nucleotide polymorphisms around APOE in Alzheimer disease. Am. J. Hum. Genet. 67:382–394.

Morris, R. W., and N. L. Kaplan. 2002. On the advantage of haplotype analysis in the presence of multiple disease susceptibility alleles. Genet. Epidemiol. 23:221–233.

Niu, T., Z. S. Qin, X. Xu, and J. Liu. 2002. Bayesian haplotype inference for multiple linked single nucleotide polymorphisms. Am. J. Hum. Genet. 70:157–169.

Sironen, A. I., M. Andersson, P. Uimari, and J. Vilkki. 2002. Mapping of an immotile short tail sperm defect in the Finnish Yorkshire on porcine chromosome 16. Mamm. Genome 13:45–49.

Sobel, E., and K. Lange. 1996. Descent graphs in pedigree analysis: Applications to haplotyping, location scores, and marker sharing statistics. Am. J. Hum. Genet. 58:1323–1337.

Stephens, M., N. J. Smith, and P. Donnelly. 2001. A new statistical method for haplotype reconstruction from population data. Am. J. Hum. Genet. 68:978–989.

Weeks, D. E., E. Sobel, J. R. O’Connell, and K. Lange. 1995. Computer programs for multilocus haplotyping of general pedigrees. Am. J. Hum. Genet. 56:1506–1507.

