1 US Department of Agriculture, Eastern Regional Research Center, Wyndmoor, PA 19038
2 Department of Food Science, California Polytechnic State University, San Luis Obispo 93407
3 Gala Design, Middleton, WI 53562
4 Department of Microbiology, School of Medicine, University of Iowa, Iowa City 52240
5 Fonterra Research Centre, Palmerston North, New Zealand
6 Department of Animal Science, University of Kentucky, Lexington 40546
7 Masterfoods USA, Burr Ridge, IL 60527
8 Department of Animal Science, McGill University, Sainte Anne de Bellevue, PQ, Canada H9X 3V9
9 Department of Food Science, North Carolina State University, Raleigh 27695
Corresponding author: H. M. Farrell, Jr.; e-mail: hfarrell{at}errc.ars.usda.gov.
| ABSTRACT |
|---|
|
|
|---|
Key Words: milk protein, structure, nomenclature, review
Abbreviation key: C = constant, EIMS = electrospray ionization MS, H = heavy chain, HSA = human SA, L = light chain, LF = lactoferrin, MS = mass spectroscopy, NMR = nuclear magnetic resonance, SA = serum albumin, V = variable
| INTRODUCTION |
|---|
|
|
|---|
|
κ-CN genes CSN3, that will code for the same protein (κ-CN). Note that the designation at the CSN3 locus relates to the DNA sequence or genotype, and κ-CN denotes the phenotypic expression of the gene, the protein molecule. For example, Prinzenberg et al. (1999) demonstrated the presence of a silent mutation in the CSN3*A locus; here, the change of CCA to CCG still codes for proline at residue 152. This locus is termed CSN3*A1, but the protein remains κ-CN A. As an interesting aside, for single codon changes in the 3 base code, there could be up to 575 silent changes in the DNA, yet the expressed protein would remain κ-CN A. Because this Committee has focused on protein methodologies and sequences, it is the intent to assign letter symbols only to those proteins that are chemically different from the reference protein or a known member of the family. In this respect, no matter how many codon changes there are in the DNA, if the protein is chemically identical to a member of the family, then it would still be, e.g., κ-CN A.
National Center for Biotechnology Information: www.ncbi.nih.gov/database
The Swiss Protein and European Molecular Biology Site (ExPASy): www.expasy.org
Georgetown University Protein Information Resource: http://pir.georgetown.edu
Protein Science Site: http://www.proteinsociety.org/
A new database for protein researchers is being constructed under an NIH contract. It will combine the current Swiss-Prot, TrEMBL, and PIR resources into a single searchable database. The new resource will be called the Universal Protein Resource, or UniProt, and, when completed (~2005), its web site will be www.uniprot.org. The current sites will continue to function as conduits for the new database.
The European Bioinformatics Institute: www.ebi.ac.uk
Comparative Ig Web site: www.medicine.uiowa.edu/CIgW
IMGTIg web site: http://imgt.cines.fr/
| THE CASEINS |
|---|
|
|
|---|
αs1-, αs2-, β-, and κ-CN. This recommendation is affirmed, and researchers are requested to refrain from assigning specific genetic variant letters to new variants until their sequence homology can be established. Individual members of these families still can be identified by gel electrophoretic techniques, some of the more effective of which are suggested in the monograph by this Committee (Swaisgood et al., 1975).
αs1-CN
The αs1-CN family, which constitutes up to 40% of the CN fraction in bovine milk, consists of one major and one minor component. Both proteins are single-chain polypeptides with the same amino acid sequence established by Mercier et al. (1971) and Grosclaude et al. (1973) and differ only in their degree of phosphorylation. The minor component contains one additional phosphorylated serine residue at position 41 (Eigel et al., 1984). The reference protein for this family is αs1-CN B-8P, a single-chain protein with no cysteinyl residues. It consists of 199 amino acid residues: Asp7, Asn8, Thr5, Ser8, Ser P8, Glu25, Gln14, Pro17, Gly9, Ala9, Val11, Met5, Ile11, Leu17, Tyr10, Phe8, Lys14, His5, Trp2, and Arg6 with a calculated molecular weight of 23,615 (Mercier et al., 1971). Its primary sequence is given in Figure 1; its ExPASy entry name and file number are CAS1_Bovin and P02662, respectively. Since the last nomenclature report (Eigel et al., 1984), 3 new genetic variants of αs1-CN have been identified. They are αs1-CN F (Erhardt, 1993), which was found in German Black and White cattle; αs1-CN G (Mariani et al., 1995), discovered in Italian Brown cows; and the H variant (Mahé et al., 1999). Hence, this family of proteins is currently known to consist of variant A found in Holstein Friesians, Red Holsteins, and German Red cattle (Ng-Kwai-Hang et al., 1984; Grosclaude, 1988; Erhardt, 1993); variant B, which is the predominant variant in Bos taurus (Eigel et al., 1984); variant C in Bos indicus and Bos grunniens (Eigel et al., 1984); variant D in various breeds in France (Grosclaude, 1988) and Italy (Mariani and Russo, 1975) as well as in Jerseys in The Netherlands (Corradini, 1969); and variant E in Bos grunniens (Grosclaude et al., 1976) in addition to the new variants F, G, and H.
|
αs1-CN B is given in Figure 1
αs1-CN B was determined by amino acid sequencing (Mercier et al., 1971; Grosclaude et al., 1973) and confirmed by cDNA sequencing (Nagao et al., 1984; Stewart et al., 1984) and by sequencing of the genomic DNA (Koczan et al., 1991). The αs1-CN signal peptide is composed of 15 amino acid residues, making the pre-form of αs1-CN B 214 amino acids in length. Variant A uniquely arises as a result of exon skipping caused by a single-base mutation that affects splicing of the pre-mRNA (Mohr et al., 1994). Deletion of residues 14 through 26 was first identified by amino acid sequencing (Grosclaude et al., 1970) and later confirmed by cDNA sequencing (McKnight et al., 1989). Neither the primary structures nor nucleic acid sequences of variants F and G have been completely reported.
|
αs1-CN has been examined by various methods including CD spectroscopy, Raman spectroscopy, and predictive algorithms using sequence information, and the results have been reviewed previously (Swaisgood, 1992). However, its 3-D structure cannot be determined because the protein does not form crystals. Nuclear magnetic resonance (NMR) studies have also proven to be problematic because of the intrinsic aggregation of the protein. Nevertheless, its tertiary structure has been predicted using a combination of predicted secondary structures, adjusted to conform to the amount of global secondary structures determined experimentally, with molecular-modeling computations based on energy minimization (Kumosinski et al., 1994). The latter structure should be viewed as a working model, which is consistent with bulk properties of the protein; it represents one possible interpretation of its structure.
Since the discovery of genetic variants, attempts have been made to correlate milk characteristics or milk production with the genotype. However, the correlations obtained have not been straightforward, in part because of differences in the parameters used. For example, the αs1-CN BB phenotype has been correlated with higher milk yields and, thus, higher protein yield over the lactation (Ng-Kwai-Hang et al., 1984; Aleandri et al., 1990; Sang et al., 1994), but the same phenotype has also been correlated with lower protein concentration in milk (Ng-Kwai-Hang et al., 1986, 1992). It appears that cows carrying the G allele produce less αs1-CN and more of the other caseins (Mariani et al., 1995). Homozygous (GG) cows, for instance, produce 55% less αs1-CN.
Because of the 13-amino acid residue deletion, the characteristics of the A variant differ most from those of the other variants (Farrell et al., 1988). Thus, most of the hydrophobic residues in the N-terminal region are eliminated, including the Phe-Phe-Val sequence that is cleaved by chymosin during cheese ripening (Mulvihill and Fox, 1979). Hence, αs1-CN A is similar to the peptide αs1-I, corresponding to αs1-CN (f25-199), which has a reduced hydrophobicity (Creamer et al., 1982) and does not aggregate as extensively in the presence of calcium (Kaminogawa et al., 1980). The changes in curd rheology that occur with this proteolysis of the B variant are consistent with the observation that soft curds are formed with milks containing αs1-CN A (Sadler et al., 1968).
Comparison of the properties of variants B and C has indicated that αs1-CN C self-associates more strongly (Schmidt, 1970; Swaisgood, 1973), and cheeses made from milks containing the latter form a tougher curd (Sadler et al., 1968).
The distinct regions of anionic clusters and hydrophobicity evident in the primary structure are suggestive of the formation of hydrophobic and polar domains (Swaisgood, 1982, 1992) and are consistent with observed physical-chemical properties, such as the strong dependence of association on concentration, pH, ionic strength, and ion binding. The characteristics and significance of calcium ion binding to the anionic clusters are well known, but it has also been found that Zn2+ (Singh et al., 1989) and Fe (III) (Reddy and Mahoney, 1991) bind at these sites. The effect of these interactions on micelle structure and stability is not known.
αs2-CN
The αs2-CN family, which constitutes up to 10% of the CN fraction in bovine milk, consists of 2 major and several minor components exhibiting varying levels of post-translational phosphorylation (Swaisgood, 1992) and minor degrees of intermolecular disulfide bonding (Rasmussen et al., 1992). The predominant forms in bovine milk contain an intramolecular disulfide bond and differ only in their degree of phosphorylation. The reference protein for this family is αs2-CN A-11P, a single-chain polypeptide with an internal disulfide bond. It consists of 207 amino acid residues: Asp4, Asn14, Thr15, Ser6, Ser P11, Glu24, Gln16, Pro10, Gly2, Ala8, Cys2, Val14, Met4, Ile11, Leu13, Tyr12, Phe6, Lys24, His3, Trp2, and Arg6 with a calculated formula molecular weight of 25,226. The primary structure of this protein is given in Figure 2; its ExPASy entry name and file number are CAS2_Bovin and P02663, respectively. The secondary structure of αs2-CN has recently been studied by CD and FTIR spectroscopies (Hoagland et al., 2001).
|
αs2-CN A, B, C, and D. Upon alkaline urea-gel electrophoresis, these proteins migrate between the αs1- and β-CN, and the most prevalent species, αs2-CN A-11P, has served as the reference band for all proteins in the casein pattern (Whitney et al., 1976). The A variant is most frequently observed in Western breeds, with αs2-CN D observed with frequencies of 0.01 to 0.09 in Vosgienne and Montbeliarde breeds (Grosclaude et al., 1978) and in 3 Spanish breeds (Osta et al., 1995b). The B variant was observed with low frequencies in zebu cattle in South Africa, and variant C was observed in yaks in the Nepalese valley and the Republic of Mongolia (Grosclaude et al., 1976, 1982).
The primary structure of αs2-CN A-11P (Figure 2), reported by Brignon et al. (1977), has been changed to Gln at position 87 rather than Glu, as indicated by cDNA sequencing (Stewart et al., 1987) and genomic DNA sequencing (Groenen et al., 1993). The αs2-CN signal peptide is composed of 15 amino acid residues, making the pre-form 222 amino acid residues in length. The D variant differs from αs2-CN A by the deletion of 9 amino acid residues from positions 51 to 59. However, the genomic DNA sequence does not reveal a deletion, but rather a substitution, suggesting that the amino acid sequence deletion is caused by the skipping of exon VIII, a 27-nucleotide sequence that encodes amino acid residues 51 to 59 (Bouniol et al., 1993). As shown in Table 2, the C variant differs from the A variant at positions 33, 47, and 130 (Mahé and Grosclaude, 1982). The specific sites of mutation resulting in αs2-CN B have not been identified, as shown in Table 2.
Because of the progress made on this protein, following the elucidation of its sequence, αs2-CN will be reviewed in more detail in this report. Post-translational phosphorylation, primarily at seryl residues, results in the incorporation of 10 to 13 phosphate moieties. According to the specificity of CN kinase, phosphorylation occurs at Ser/Thr residues in the sequence Ser/Thr-X-Glu/SerP/Asp; however, the sequence Ser-X-Glu/SerP is heavily favored (Mercier, 1981). Only seryl residues are phosphorylated in αs2-CN A-11P, but Thr-66 was partially phosphorylated in αs2-CN C (Mahé and Grosclaude, 1982). Those residues known to be phosphorylated in αs2-CN A-11P are indicated by boldface italics in the figure. The underlined residues indicate potential sites of phosphorylation suggested by the enzyme specificity. It should be noted that Thr-47 in αs2-CN C is a potential phosphorylation site.
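The kinase recognition rule quoted above (Ser/Thr-X-Glu/SerP/Asp) lends itself to a simple scan of a sequence. The following is a minimal sketch only, assuming one-letter amino acid codes; the function name `potential_kinase_sites` and the toy peptides are hypothetical and are not taken from Figure 2. Because a phosphoserine can itself serve as the determinant at position +2, the scan is repeated until no further sites are added. It flags potential sites suggested by the enzyme specificity, not sites shown experimentally to be phosphorylated.

```python
def potential_kinase_sites(sequence):
    """Return 1-based positions of Ser/Thr residues matching Ser/Thr-X-Glu/SerP/Asp.

    Assumption for this sketch: a Ser already flagged as a site counts as SerP
    when it sits two residues downstream of another Ser/Thr.
    """
    seq = sequence.upper()
    sites = set()          # 0-based indices of flagged Ser/Thr residues
    changed = True
    while changed:         # repeat so that newly flagged SerP can act as determinants
        changed = False
        for i in range(len(seq) - 2):
            if seq[i] not in "ST" or i in sites:
                continue
            determinant = seq[i + 2]
            acidic = determinant in "ED"
            phosphoserine = determinant == "S" and (i + 2) in sites
            if acidic or phosphoserine:
                sites.add(i)
                changed = True
    return sorted(i + 1 for i in sites)


# Toy peptide resembling an anionic cluster (hypothetical, for illustration only).
print(potential_kinase_sites("SSSEE"))
```

In the toy peptide, the second and third Ser are flagged first because Glu lies two residues downstream; the first Ser is flagged on the next pass because the third Ser is then treated as SerP.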
Another post-translational change that occurs with this protein is the formation of disulfide bonds. The 2 cysteinyl residues of this protein participate in both intramolecular and intermolecular disulfide bonds (Rasmussen et al., 1992, 1994). The protein exists predominantly as a monomer (>85%) with a disulfide bond between Cys residues 36 and 40 (Rasmussen et al., 1994) or as a dimer with either parallel or antiparallel disulfide bonds (Rasmussen et al., 1992). Two types of dimers are therefore found: in the parallel fraction, residues 36 and 40 in one chain are linked to residues 36 and 40, respectively, in the other chain; in the antiparallel fraction, residues 36 and 40 are linked to residues 40 and 36, respectively, in the other chain. These results suggest that the formation of these bonds is not important to any structure required by this protein for its interaction with other CN.
αs2-Casein is the most hydrophilic of all caseins as a result of the 3 clusters of anionic groups composed of phosphoseryl and glutamyl residues. Although relatively hydrophobic, the C-terminal 47 residues carry a net positive charge (about +9.5) at the pH of milk (Swaisgood, 1992). On the other hand, the more hydrophilic N-terminal 68 residues contain 2 anionic clusters and exhibit a net charge of about -21 at the prevalent pH of milk. Hence, the primary structure of αs2-CN can be represented by 4 domains: an N-terminal hydrophilic domain with anionic clusters, a central hydrophobic domain, followed by another hydrophilic domain with anionic clusters, and finally a C-terminal positively charged hydrophobic domain (Swaisgood, 1992). This structure is consistent with an association behavior that is very dependent on ionic strength (Snoeren et al., 1980). The association appears to be strongest around an ionic strength of 0.2 M, with dissociation occurring in lower salt because of electrostatic repulsion and also in higher salt because of suppression of electrostatic attraction, thus reflecting the contributions of both hydrophobic interactions and electrostatic attraction.
The number of anionic clusters and the hydrophilic nature are also reflected in the calcium-binding properties of αs2-CN. For example, the latter protein is more sensitive to Ca2+ than αs1-CN (Toma and Nakai, 1973), with almost complete precipitation occurring in 2 mM Ca2+ for αs2-CN at pH 7, whereas precipitation of αs1-CN requires 6 mM Ca2+ (Aoki et al., 1985). These properties also led to a method for fractionation of αs2-CN from other caseins by precipitation from propan-1-ol solutions (Vreeman and van Riel, 1990). Solubility in this solvent is governed by electrostatic interactions that are most prevalent in αs2-CN.
αs2-Casein appears to be readily susceptible to proteolysis as assessed by the activities of chymosin and plasmin toward the protein. Chymosin activity was observed in the regions of residues 88 to 98 and 164 to 180, but its primary cleavage occurred at Phe 88-Tyr 89 (McSweeney et al., 1994). These 2 regions are, respectively, at the edge of the central hydrophobic domain and in the first part of the cationic hydrophobic C-terminal domain. Plasmin activity released a number of peptides, including the N-terminal 21 to 24 residues of the initial hydrophilic domain containing one of the anionic clusters (Le Bars and Grippon, 1989; Visser et al., 1989). In agreement with plasmin specificity, mostly Lys-X bonds were cleaved at varying rates (Lys residues 21, 24, 149, 150, 181, 188, and 197). In addition to the shorter N-terminal peptides, a major peptide released was αs2-CN (f151-207) (Le Bars and Grippon, 1989). In this regard, it is interesting to note that recently αs2-CN (f165-203) was isolated from milk and shown to have antibacterial activity (Zucht et al., 1995).
β-CN
The β-CN family, which constitutes up to 45% of the casein of bovine milk, is quite complex because of the action of the native milk protease plasmin (Eigel et al., 1984). Plasmin cleavage leads to formation of γ1-, γ2-, and γ3-CN, which are actually fragments of β-CN consisting of residues 29-209, 106-209, and 108-209. In addition, polypeptides previously called proteose peptone components 5, 8-fast, and 8-slow are fragments of β-CN, which represent residues 1-105 or 1-107, 1-28, and 29-105, respectively. The reference protein for this family, β-CN A2-5P, is a single-polypeptide chain with no Cys residues containing 209 residues. It consists of Asp4, Asn5, Thr9, Ser11, Ser P5, Glu19, Gln20, Pro35, Gly5, Ala5, Val19, Met6, Ile10, Leu22, Tyr4, Phe9, Lys11, His5, Trp1, and Arg4 with a calculated molecular weight of 23,983. The most common variant used as reference is variant A2; its ExPASy entry name and file number are CASB_Bovin and P02666, respectively. The A2 variant has been chemically sequenced (Ribadeau-Dumas et al., 1972) and sequenced from its cDNA (Jimenez-Flores et al., 1987; Stewart et al., 1987) and its gene (Bonsing et al., 1988). The β-CN signal peptide is composed of 15 amino acid residues, making the pre-form 224 amino acids in length.
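The residue ranges quoted for the plasmin-derived fragments are mutually consistent with cleavage at only three bonds. The following minimal sketch checks this; the cleavage positions (after residues 28, 105, and 107) are inferred here from the fragment boundaries given in the text, and the function name `fragments_from_cuts` is a hypothetical helper, not part of any established tool.

```python
from itertools import combinations

BETA_CN_LENGTH = 209
# Assumption: bonds cleaved after these residues, inferred from the quoted fragment ranges.
PLASMIN_CUTS_AFTER = (28, 105, 107)

# Fragment names and residue ranges as given in the text.
NAMED_FRAGMENTS = {
    "gamma1-CN": (29, 209),
    "gamma2-CN": (106, 209),
    "gamma3-CN": (108, 209),
    "proteose peptone 5 (1-105)": (1, 105),
    "proteose peptone 5 (1-107)": (1, 107),
    "proteose peptone 8-fast": (1, 28),
    "proteose peptone 8-slow": (29, 105),
}


def fragments_from_cuts(length, cuts_after):
    """All contiguous fragments (first, last) obtainable from cuts after the given residues."""
    bounds = [0, *sorted(cuts_after), length]
    return {(start + 1, end) for start, end in combinations(bounds, 2)}


possible = fragments_from_cuts(BETA_CN_LENGTH, PLASMIN_CUTS_AFTER)
for name, (first, last) in NAMED_FRAGMENTS.items():
    # Report the fragment length and whether the three cuts can generate it.
    print(f"{name}: f{first}-{last}, {last - first + 1} residues, consistent={(first, last) in possible}")
```

Every named fragment appears in the set generated by the three assumed cuts, which also contains the intact chain (1-209).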
The sequence shown for β-CN A2 in Figure 3 is that as corrected by 2 groups (Yan and Wold, 1984; Carles et al., 1988). It differs from the original sequence (Eigel et al., 1984) in 4 places: Gln117Glu, Pro 137 and Leu 138 are inverted, Glu175Gln, and Gln195Glu. The changes at residues 117 and 175 are confirmed by both groups and by gene sequencing. The inversion of residues 137 and 138 is not in agreement with cDNA sequencing data (Jimenez-Flores et al., 1987), which are in accordance with the original data. However, the Leu-Pro substitution is a one-base change, and mutations could occur and not be observed by HPLC-mass spectroscopy (MS) of peptides or by electrophoresis of the proteins. The weight here is, however, given to the 2 independent protein-sequencing reports. In a similar fashion, the change at 195 is not in agreement with the cDNA results, but, in this case, 3 lines of evidence support the occurrence of only Glu at residue 195. They include the following:
|
The previous report of Eigel et al. (1984) described 7 genetic variants. Since that revision, 3 new variants have been identified by sequence: β-CN F, previously called β-CN X (Visser et al., 1995); β-CN G (Dong and Ng-Kwai-Hang, 1998); and β-CN H (Han et al., 2000). The amino acid substitutions giving rise to all variants of β-CN are given in Table 2. In addition, Chung et al. (1995) identified variant A4 in native Korean cattle using electrophoresis only; its substitutions in the A2 reference protein are unknown.
Visser et al. (1995) identified β-CN F, which contains the A1 substitution and Leu for Pro at residue 152. β-Casein F was separated by preparative reverse-phase HPLC. The main differential peaks representing the 114 to 169 fragments of β-A1 and β-X, respectively, were both purified following trypsin digestion, cyanogen bromide cleavage, and separation of the corresponding peptides representing the 145-156 sequence. The presence of a Leu residue at position 152 instead of the Pro-152 in β-CN A1 was established by fast-atom bombardment MS-MS. In accordance with internationally accepted guidelines for the nomenclature of milk proteins, the new genetic variant has been named β-CN F-5P. In a similar fashion, Dong and Ng-Kwai-Hang (1998) identified β-CN G-5P, which is similar to β-CN A1 and F, but contains a Leu in place of Pro at either position 137 or 138, depending on the sequence assigned, as the Pro-Leu inversion is controversial. Han et al. (2000) identified β-CN H, which carries 2 substitutions relative to the corrected reference β-CN A2: Arg25 to Cys and Leu88 to Ile. A genetic variant discovered by Senocq et al. (2002) was also named H; so, it is proposed that the Han variant be termed H1, and the Senocq variant be termed H2. The H2 variant differs from the A2 variant at 2 known positions (Met93Leu and Gln72Glu) and by a substitution of Gln to Glu between residues 114 and 169. Finally, the I variant was described by Jann et al. (2002); it contains only the Met93Leu substitution of the H2 variant.
β-Casein is the most hydrophobic of the CN. The N-terminal sequence contains charged amino acids as well as a phosphoserine cluster. This initial sequence is different from the second half of the molecule, where neutral and hydrophobic amino acid residues abound. Calculation of the net charge at pH 6.6 indicates that the first 21 amino acids would have a net charge of about -11.5, and the C-terminal 21 amino acids (190-209) have no net charge. This molecule presents a high contrast in its sequence: one-tenth of the amino acids at the N-terminus of the protein contain one-third of the total charge, while 75% of the residues at the C-terminal one-tenth consist of hydrophobic amino acids. It is this unusual distribution of amino acids that leads to the release of β-CN from CN micelles in the cold (Aoki et al., 1990). It is important to mention that no 3-D structure from X-ray crystallography has been reported; however, a computer-generated working 3-D model has been presented (Kumosinski et al., 1993b). Perhaps the difficulty in inducing suitable crystals from this protein is due to its dependence on the environment surrounding it and its propensity for self-association.
Addition of a glycosylation signal in the gene of β-CN has been reported (Choi and Jimenez-Flores, 1996). This modification of a bovine milk protein is an important aspect of this review because the original gene was generated from β-CN A1 with the glycosylation signal, changing from Pro 67 to Ser 67. However, this new variant represents a man-made intervention and does not occur naturally. We suggest that nomenclature of genetic variants induced through molecular biology techniques follow the same conventions as those established by this Committee for naturally occurring variants, e.g., a point mutation of β-CN A1, Pro67Ser.
κ-CN
The κ-CN family consists of a major carbohydrate-free component and a minimum of 6 minor components. The 6 minor components, as detected by PAGE in urea with 2-mercaptoethanol (Mackinlay and Wake, 1965; Pujolle et al., 1966; Woychik et al., 1966; Vreeman et al., 1977; Doi et al., 1979), represent varying degrees of phosphorylation and glycosylation. κ-Casein, as isolated from milk, also occurs in the form of a mixture of disulfide-bonded polymers ranging from dimers to octamers and above (Groves et al., 1992). Beeby (1964) reported the presence of free thiol groups after calcium removal by treatment with ethylenediaminetetraacetate, but other chemical analyses did not confirm this result (Swaisgood et al., 1964). Sodium dodecyl sulfate-gel electrophoresis and physical measurements suggest that the native form of κ-CN is highly associated both chemically and physically (Swaisgood and Brunner, 1963; Groves et al., 1992; Farrell et al., 1996) and that heat treatment of native κ-CN results in aggregation caused by free sulfhydryl-disulfide interchange (Groves et al., 1998). Reduction and S-carboxymethylation of κ-CN followed by heating can result in amyloid (fibrillar) structures (Farrell et al., 2002).
The primary structure of the reference protein of the κ-CN family is the major carbohydrate-free component of κ-CN A-1P (Figure 4); its ExPASy entry name and file number are CASK_Bovin and P02668, respectively. It consists of 169 amino acid residues as follows: Asp4, Asn8, Thr15, Ser12, Ser P1, Pyroglu1, Glu12, Gln14, Pro20, Gly2, Ala14, Cys2, Val11, Met2, Ile12, Leu8, Tyr9, Phe4, Lys9, His3, Trp1, and Arg5, with a formula molecular weight of 19,037. There is still some question about the presence of the N-terminal pyroglutamyl residue in the native protein, as cyclization may occur during isolation (Swaisgood, 1975). In addition to protein-chemical sequencing, the cDNA of κ-CN has been sequenced (Stewart et al., 1984), and the κ-CN gene sequence is complete (Alexander et al., 1988). The κ-CN signal peptide is composed of 21 amino acid residues, making the pre-form 190 amino acids in length.
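The amino acid compositions and pre-form lengths quoted for the four reference proteins can be cross-checked with simple arithmetic. The sketch below is illustrative only: the composition strings are transcribed from the text of this report, and the names `REFERENCE_PROTEINS`, `parse_composition`, and `residue_total` are hypothetical helpers introduced for the example.

```python
import re

# name: (composition as quoted in the text, mature length, signal peptide length, pre-form length)
REFERENCE_PROTEINS = {
    "alpha_s1-CN B-8P": (
        "Asp7, Asn8, Thr5, Ser8, Ser P8, Glu25, Gln14, Pro17, Gly9, Ala9, Val11, "
        "Met5, Ile11, Leu17, Tyr10, Phe8, Lys14, His5, Trp2, and Arg6", 199, 15, 214),
    "alpha_s2-CN A-11P": (
        "Asp4, Asn14, Thr15, Ser6, Ser P11, Glu24, Gln16, Pro10, Gly2, Ala8, Cys2, "
        "Val14, Met4, Ile11, Leu13, Tyr12, Phe6, Lys24, His3, Trp2, and Arg6", 207, 15, 222),
    "beta-CN A2-5P": (
        "Asp4, Asn5, Thr9, Ser11, Ser P5, Glu19, Gln20, Pro35, Gly5, Ala5, Val19, "
        "Met6, Ile10, Leu22, Tyr4, Phe9, Lys11, His5, Trp1, and Arg4", 209, 15, 224),
    "kappa-CN A-1P": (
        "Asp4, Asn8, Thr15, Ser12, Ser P1, Pyroglu1, Glu12, Gln14, Pro20, Gly2, Ala14, "
        "Cys2, Val11, Met2, Ile12, Leu8, Tyr9, Phe4, Lys9, His3, Trp1, and Arg5", 169, 21, 190),
}


def parse_composition(text):
    """Turn 'Asp7, Ser P8, ...' into {'Asp': 7, 'SerP': 8, ...}."""
    pairs = re.findall(r"([A-Za-z]+(?: P)?)\s*(\d+)", text)
    return {name.replace(" ", ""): int(count) for name, count in pairs}


def residue_total(text):
    """Sum the residue counts in a composition string."""
    return sum(parse_composition(text).values())


for name, (composition, mature, signal, preform) in REFERENCE_PROTEINS.items():
    total = residue_total(composition)
    print(f"{name}: composition sums to {total} (stated {mature}); "
          f"mature + signal = {mature + signal} (stated pre-form {preform})")
```

A check of this kind catches transcription slips in a composition list (for example, a digit misread as a letter), because the sum then no longer matches the stated chain length.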
|
κ-Casein B-1P differs from the A variant (Figure 4
κ-CN C (Erhardt, 1989). The incorrect identification of κ-CN D indicates the need for sequence analysis (PCR, protein, etc.) to confirm new genetic variants for all CN.
κ-Casein F was discovered by PCR analysis of both Zebu and Black and White hybrid cattle (Sulimova et al., 1992). This analysis revealed 2 nucleotide changes between κ-CN A and κ-CN F: a G for T in the second position coding for Thr 145 (which yields no change in the protein) and a T for G in the second position of Asp 148 (which yields Val 148 in the F variant). This latter protein should be termed F1, as Prinzenberg et al. (1996) by PCR analysis described a second F variant that contains the A-B changes (residues 136, 148) as well as an A for G substitution, which yields a change from Arg 10 to His 10. This variant (Arg10His) could be considered to be F2, as the same researchers (Prinzenberg et al., 1996) named their next discovered variant G. This new variant (κ-CN G) was reported in Alpine breeds solely on the basis of isoelectric focusing gels (Erhardt, 1996) but was confirmed as a point mutation by PCR; this G variant causes Arg 97 of κ-CN B to be changed to Cys 97. Again, this variant could be termed G1, as Sulimova et al. (1996) found another variant of κ-CN in yak (Bos grunniens) that they also termed κ-CN G; it differs from κ-CN A by an Asp148Ala mutation, and the codons for residues 167 and 168 are different but yield no changed protein phenotype. This latter variant (Asp148Ala) could be termed G2. The reasoning for the superscript nomenclature is that Prinzenberg et al. (1999) identified yet another variant in Pinzgauer cattle and termed it κ-CN H. The H protein deviates from the A variant by a Thr135Ile mutation. Coincidentally, it is identical with κ-CN A-Zebu found by Grosclaude et al. (1974). In another study, Prinzenberg et al. (1999) described κ-CN I, which differs from the A variant by a Ser104Ala change. Finally, κ-CN J was discovered in Bos taurus cattle on the Ivory Coast by Mahé et al. (1999); this variant appears to have arisen from the κ-CN B variant by a Ser155Arg mutation.
The bond sensitive to chymosin (EC 3.4.23.4) (rennin) hydrolysis occurs between Phe 105 and Met 106 (Figure 4) (Delfour et al., 1965; Jollés et al., 1968). The hydrolytic products are para-κ-CN (residues 1-105) and the macropeptide (residues 106-169). Doi et al. (1979) and Vreeman et al. (1977, 1986) have observed para-κ-CN in purified preparations of κ-CN. This is undoubtedly due to a chymosin-like proteolysis subsequent to translation, but more work must be done before concluding that para-κ-CN is a natural constituent of milk or a product of storage or of the preparatory processes. It is interesting to note that of the 11 known variants, 8 occur in the distal portion of the macropeptide, relatively far removed from the point of chymosin attack. These mutations range from positions 136 to 155 and occur in the extended portion of the molecule, which serves as a physical deterrent to coagulation prior to the action of chymosin. Small, relatively neutral changes in this portion of the molecule may not adversely affect cheese making. Perhaps more interesting are the changes in the C, F2, and G1 variants, which occur in the para-κ-CN portion of the molecule. The F2 variant is functionally identical in that the net charge remains constant (Arg10His). The C variant may be of interest as Arg 97 has been implicated in the possible attraction of chymosin to the CN micelle, but the positive charge is conserved by the His 97 substitution. In a similar way, the G1 variant, in which Arg 97 is converted to Cys 97, could further influence micelle structure as the new sulfhydryl residue could promote unusual disulfide linkages close to the chymosin cleavage site.
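Because the chymosin-sensitive bond lies between residues 105 and 106, any substitution written in the form used in this report (e.g., Arg10His) can be assigned to para-κ-CN or to the macropeptide from its position alone. The following is a minimal sketch of that bookkeeping; the names `Substitution`, `parse_substitution`, and `kappa_cn_region` are hypothetical, and only a few of the substitutions named in the text are used as examples.

```python
import re
from typing import NamedTuple

KAPPA_CN_LENGTH = 169
LAST_PARA_KAPPA_RESIDUE = 105   # chymosin cleaves the Phe 105-Met 106 bond


class Substitution(NamedTuple):
    original: str      # three-letter code of the residue in the reference protein
    position: int      # residue number in the mature protein
    replacement: str   # three-letter code of the residue in the variant


def parse_substitution(label):
    """Parse labels such as 'Arg10His' (spaces around the number are tolerated)."""
    match = re.fullmatch(r"([A-Z][a-z]{2})\s*(\d+)\s*([A-Z][a-z]{2})", label.strip())
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    return Substitution(match.group(1), int(match.group(2)), match.group(3))


def kappa_cn_region(position):
    """Return the chymosin product that contains the given residue."""
    if not 1 <= position <= KAPPA_CN_LENGTH:
        raise ValueError(f"position {position} is outside 1-{KAPPA_CN_LENGTH}")
    return "para-kappa-CN" if position <= LAST_PARA_KAPPA_RESIDUE else "macropeptide"


# Substitutions quoted in the text for the F2, G1, F1, G2, and J variants.
for label in ("Arg10His", "Arg97Cys", "Asp148Val", "Asp148Ala", "Ser155Arg"):
    sub = parse_substitution(label)
    print(f"{label}: residue {sub.position} lies in the {kappa_cn_region(sub.position)}")
```

The same parser accepts labels with stray spaces, such as "Met 93Leu", which occur when substitution names are transcribed inconsistently.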
The major component (~50%) of all κ-CN variants is generally believed to be the carbohydrate-free component. However, the post-translational modifications of κ-CN, which result in the formation of minor components, have been studied in considerable detail, and their degree of complexity is correlated with the degree of sophistication of the instrumentation used to study them. Generally, the minor κ-CN components are multiglycosylated and/or multiphosphorylated forms of the major κ-CN. Vreeman et al. (1977, 1986) concluded from their investigations that, in order of elution from DEAE-cellulose, the adsorbed κ-CN were the major components free of carbohydrate with one phosphate group followed by 6 minor components differing in degrees of glycosylation and phosphorylation. Doi et al. (1979) concluded from their fractionation of κ-CN on DEAE-cellulose that there were 4 major and 2 minor components, all containing one phosphate group and various degrees of glycosylation. The major fraction was the carbohydrate-free component.
Several researchers have investigated the structure of carbohydrate moieties (Tran and Baker, 1970; Fiat et al., 1972; Jollés et al., 1972, 1973; Fournet et al., 1975; Jollés et al., 1978). Fournet et al. (1975) isolated 3 oligosaccharides from κ-CN and determined the structures for 2. Saito and Itoh (1992) confirmed their structures and added 3 more; all of these structures are given in Figure 5. A composite summary of the reported glycosyl moieties, their molecular weights, and relative percentage occurrence is given in Table 3. From these data, it appears that the complex structures C, D, and E are most prevalent. Mollé and Léonil (1995) used reverse-phase HPLC in conjunction with on-line electrospray ionization MS (EIMS) to characterize the distribution of glycosyl residues further within the κ-CN macropeptide for the A variant. They found at least 14 glycosylated forms, a glycosylated and non-phosphorylated form, and multiphosphorylated forms (1P at 78%, 2P at 20%, and 3P at 2%) totaling 18 reported species attributable to post-translational modification. The EIMS data on HPLC followed the absorbance data and confirmed the structures of Saito and Itoh (1992) except for A, the monosaccharide.
|
|
κ-CN A-1P at Thr 136, the B variant could not contain an oligosaccharide at this position, as Ile replaces Thr (Figure 5
κ-CN exist (Wheelock et al., 1969, 1973), para-κ-CN has generally been reported to be devoid of carbohydrate, as monoglycosylated forms of proteins can be formed artifactually (Pisano et al., 1994; Mollé and Léonil, 1995).
|
κ-CN is more complex and variable than that of normal milk and was reviewed in the last report of this Committee (Eigel et al., 1984). However, a salient feature is that only Thr residues 131, 133, and 135 have been identified as points of attachment for the complex oligosaccharides bound to colostral κ-CN (Doi et al., 1980). The earliest reported sites, Thr 131 and 133 (Pujolle et al., 1966; Mercier et al., 1973), have been confirmed, and others have been added to the list. The most complete study was conducted by Pisano et al. (1994), who demonstrated up to 6 sites for O-glycosylation. A summary of the reported sites is given in Table 4.
From all of these data, it would appear that for κ-CN glycosylation, structures C, D, and E are statistically most prevalent, and Thr residues 131 and 133 are the sites most populated by these glycosyl moieties. Two reasons postulated for the prevalence of glycosyl residues at the 131 and 133 sites are that the proline turns, which bound this region, maintain its surface orientation (Kumosinski et al., 1993a) and that glycosylation at other sites is restricted by the neighboring Ile (or Val) residues (Pisano et al., 1994), which are highly prevalent in the κ-CN macropeptide. In studies with whole CN, it is important to remember that the major κ-CN bands for the A and B variants (40 to 60%) are not glycosylated; therefore, differences in the minor bands could be related to a variety of factors specific to the milk sample, not the least of which is genetic variation (Pisano et al., 1994; Mollé and Léonil, 1995).
Because of the high degree of heterogeneity and the limited amounts of the minor components of κ-CN, we feel that the precise nomenclature of these components still cannot be achieved at this time. We suggest that they be identified according to the genetic variant of the major nonglycosylated component and that isolated fractions, which contain post-translational modifications, be numbered consecutively according to either their increasing relative electrophoretic mobility in alkaline urea gels or their elution from anion exchange media in the presence of mercaptoethanol. For example, starting with κ-CN A, the nonglycosylated 1P form would be designated κ-1, and then subsequent bands would be termed κ-2, κ-3, κ-4, etc. This is in accord with most current working definitions of isolated fractions.
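The consecutive numbering suggested above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout, the function name `number_fractions`, and the mobility values are hypothetical and do not come from any data set, and the Greek prefix in the labels is assumed to be κ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fraction:
    """One isolated fraction of a single genetic variant (hypothetical layout)."""
    description: str
    relative_mobility: float  # relative mobility in an alkaline urea gel


def number_fractions(fractions: list[Fraction]) -> list[tuple[str, str]]:
    """Number fractions consecutively by increasing relative mobility."""
    ordered = sorted(fractions, key=lambda f: f.relative_mobility)
    return [(f"κ-{i}", f.description) for i, f in enumerate(ordered, start=1)]


# Made-up mobilities for fractions isolated from one variant.
example = [
    Fraction("glycosylated form b", 0.62),
    Fraction("nonglycosylated 1P", 0.40),
    Fraction("glycosylated form a", 0.51),
]
labels = number_fractions(example)
for label, description in labels:
    print(label, description)
```

Sorting on elution order from an anion exchange column instead of gel mobility would only require a different sort key.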
| THE WHEY PROTEINS |
|---|
α-LA, serum albumin (SA), Ig, and proteose-peptone fractions have been considered the major characterized components of this fraction. Because the current and most frequently used method of assessing the integrity of this fraction is SDS-PAGE, LF, a major component that is readily visualized by this technique, should be added to this list (Figure 6).
α-LA, SA, and LF, should be classified according to homology with the primary sequence of their amino acid chains. Polyacrylamide or starch gel electrophoresis still can be used to characterize and identify individual members of each family. Immunoglobulins, proteins not unique to milk, are the products of B-lymphocytes and are the result of somatic gene segment rearrangement and somatic mutation. With 1 million variants, Ig lend themselves poorly to traditional biochemical characterization. Immunochemical criteria continue to be used for laboratory diagnosis and quantitation of Ig, but molecular genetics is heavily used for structural analyses.
The reference protein for this family, ß-LG B, consists of 162 amino acids and has the following composition: Asp10, Asn5, Thr8, Ser7, Glu16, Gln9, Pro8, Gly4, Ala15, Cys5, Val9, Met4, Ile10, Leu22, Tyr4, Phe4, Lys15, His2, Trp2, and Arg3. The calculated formula molecular weight is 18,277, and the measured molecular weight is 18,278.35 ± 2.2 Da (Léonil et al., 1995) or 18,277.0 ± 0.9 Da (Burr et al., 1997); its ExPASy entry name and file number are LACB_Bovin and P02754, respectively. The primary sequence shown in Figure 7 is unchanged since the 1984 review (Eigel et al., 1984). However, the disulfide bonds in the native protein are now unambiguously determined as Cys 66 to Cys 160 and Cys 106 to Cys 119, with Cys 121 as the source of the free thiol (Papiz et al., 1986; Bewley et al., 1997; Brittan et al., 1997; Brownlow et al., 1997; Qin et al., 1998a, b; 1999). The calculated formula weight of 18,277 takes these disulfide linkages into account.
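The formula weight quoted for ß-LG B can be checked from the composition alone. The following Python sketch is not the Committee's calculation; it assumes standard average residue masses (textbook values, not given in this review), adds one water for the chain termini, and subtracts two hydrogens for each of the 2 disulfide bonds. The names `RESIDUE_MASS`, `BLG_B`, and `formula_weight` are illustrative.

```python
# Average residue masses in Da (amino acid minus water); standard values
# assumed here, not taken from the review.
RESIDUE_MASS = {
    "Gly": 57.0519, "Ala": 71.0788, "Ser": 87.0782, "Pro": 97.1167,
    "Val": 99.1326, "Thr": 101.1051, "Cys": 103.1388, "Leu": 113.1594,
    "Ile": 113.1594, "Asn": 114.1038, "Asp": 115.0886, "Gln": 128.1307,
    "Lys": 128.1741, "Glu": 129.1155, "Met": 131.1926, "His": 137.1411,
    "Phe": 147.1766, "Arg": 156.1875, "Tyr": 163.1760, "Trp": 186.2132,
}
WATER = 18.01528        # one H2O for the free N- and C-termini
TWO_HYDROGENS = 2.01588  # lost when two Cys thiols form one disulfide bond

# Composition of beta-LG B as listed in the text.
BLG_B = {
    "Asp": 10, "Asn": 5, "Thr": 8, "Ser": 7, "Glu": 16, "Gln": 9, "Pro": 8,
    "Gly": 4, "Ala": 15, "Cys": 5, "Val": 9, "Met": 4, "Ile": 10, "Leu": 22,
    "Tyr": 4, "Phe": 4, "Lys": 15, "His": 2, "Trp": 2, "Arg": 3,
}


def formula_weight(composition: dict[str, int], disulfides: int) -> float:
    """Average mass of an unmodified chain with the given disulfide count."""
    chain = sum(RESIDUE_MASS[aa] * n for aa, n in composition.items())
    return chain + WATER - disulfides * TWO_HYDROGENS


n_residues = sum(BLG_B.values())
mw = formula_weight(BLG_B, disulfides=2)
print(n_residues, round(mw, 1))
```

Under these assumptions the composition sums to 162 residues and the mass comes out within 1 Da of the 18,277 given above; omitting the disulfide correction would raise it by about 4 Da.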
The publication of the first high-resolution X-ray crystal structures of the triclinic form (lattice X) of ß-LG A/B (Brownlow et al., 1997), the orthorhombic form (lattice Y) of ß-LG A, B, and C (Bewley et al., 1997), and the trigonal form (lattice Z) of ß-LG A and B (Qin et al., 1998a) has verified the earlier medium resolution structure (Papiz et al., 1986) in general terms and has corrected some earlier errors. The amino acid sequences corresponding to the α-helix, the ß-sheet strands (A to I), and several 3₁₀ turns for the lattice X form are indicated in Figure 7. There are slight differences among the crystal lattice forms as to the details of the secondary structural elements. In all cases, the ß-I strand is an important feature of the interface between the 2 monomers that constitute the dimer in all of the high resolution crystal structures. The NMR structures (Kuwata et al., 1999; Uhrínová et al., 2000) at about pH 2.5 contain the same strands and helix, confirming the polypeptide fold.
Another feature of bovine ß-LG is the ability to bind hydrophobic and amphiphilic molecules ranging from hexane to palmitic acid to vitamin D (Hambling et al., 1992; Pérez and Calvo, 1995; Narayan and Berliner, 1997; Sawyer, 2003). Considerable attention has been paid to the binding of retinol (vitamin A), which is essential for mammalian growth and well being, to ß-LG, and ß-LG is considered a member of the lipocalin family of proteins (Sawyer, 2003). Although some retinoids and fatty acids can bind in the deep hydrophobic pocket of ß-LG (Cho et al., 1994; Qin et al., 1998b; Wang et al., 1999; Kontopidis et al., 2002), there is some doubt about the biological role of this protein. The original biological role could have been related to maternal physiology, but this may have shifted to a more nutritional role for some species (Kontopidis et al., 2002).
The observation that ß-LG may be glycosylated (Léonil et al., 1997) prior to milking, but probably external to the mammary epithelium, is interesting and suggests a chemical rather than a biochemical reaction. More extensive modification (lactosylation) occurs in heated milk or whey (Burr et al., 1996; Léonil et al., 1997; Morgan et al., 1998), where Lys47 and Lys91 are the most reactive, and Lys8 and Lys141 are the least reactive (Morgan et al., 1998).
The practice of naming a new variant "X" until such time as the sequence has been demonstrated should save embarrassment and/or duplication of protein names. However, there is the possibility that such a ß-LG X could be confused with the X, Y, and Z lattice forms, a nomenclature used by the crystallographers. The various engineered proteins often have slight differences at the N-terminus of the protein because of the method of synthesis. (For example, Kim et al. [1997] expressed a ß-LG in which the N-terminal sequence was Glu-Ala-Glu-Ala-Tyr-Val-Thr-, whereas it is Leu-Ile-Val-Thr- in the natural proteins [Figure 7].) It is recommended that such proteins be clearly labelled so that they are not confused with the naturally synthesized proteins, because the difference in structure would give the proteins different properties, such as electrophoretic mobility.
α-LA
The whey protein α-LA has a specific and defined physiological function in the mammary gland. Within the Golgi apparatus of the mammary epithelial cell, α-LA interacts with the ubiquitously expressed enzyme ß-1,4-galactosyltransferase to form the lactose synthase complex. α-Lactalbumin modifies the substrate specificity of ß-1,4-galactosyltransferase, allowing the formation of lactose from glucose and UDP-galactose. The constitutive function of ß-1,4-galactosyltransferase, to glycosylate glycoproteins and glycolipids, is reversibly altered by combining with α-LA in a 1:1 molar ratio. The production of lactose, its function as the major osmolyte of milk, and the function of the lactose synthase complex have been the topics of a number of reviews (Brew and Hill, 1975; Hill and Brew, 1975; Jones, 1977; Kuhn et al., 1980).
Bovine milk contains α-LA at a concentration of approximately 1.2 to 1.5 g/L (Jenness, 1974). The protein has been sequenced, and the nucleotide sequence has been confirmed (Vilotte et al., 1987; Brew et al., 1970; Bleck and Bremel, 1993a). The mature α-LA (Figure 8) is a 123-amino acid globular protein (Brew et al., 1970). The α-LA signal peptide is composed of 19 amino acids, making the pre-form of α-LA 142 amino acids in length (Gaye et al., 1987; Hurley and Schuler, 1987). The mature α-LA protein has 2 predominant genetic variants (A and B) that have been confirmed by sequence analysis (Table 2) (Bhattacharya et al., 1963). The B variant is present in the milk of most Bos taurus cattle, and both the A and B variants are found in Bos indicus cattle (Jenness, 1974). The α-lactalbumin A variant is present at a low frequency in some Italian and Eastern European Bos taurus breeds (Mariani and Russo, 1977). The A variant contains a Glu at position 10 of the mature protein, and the B variant has an Arg substitution at that position (Gordon, 1971). A third genetic variant, α-LA C, has also been reported but not confirmed by DNA or protein sequencing (Bell et al., 1981). This variant was identified in Bali cattle (Bos javanicus). The C variant was reported to differ from the B variant by having either an Asn for Asp or a Gln for Glu substitution. The B variant is the reference protein for the family and is composed of the following amino acid residues: Ala3, Arg1, Asn8, Asp13, Cys8, Gln6, Glu7, Gly6, His3, Ile8, Leu13, Lys12, Met1, Phe4, Pro2, Ser7, Thr7, Trp4, Tyr4, Val6 (Brew et al., 1970). Both the A and B variants contain 4 disulfide bonds. The B variant has a formula molecular weight of 14,178; its sequence is given in Figure 8 and has been corrected since the last report to take into account the nucleic acid sequences noted previously. The ExPASy entry name for α-LA is LACA_Bovin, and its file number is P00741.
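The figures quoted for the B variant are mutually consistent, which can be confirmed with a short Python sketch. It is an illustration only: the average residue masses are standard textbook values assumed here rather than taken from this review, and the variable names are hypothetical. The sketch checks that the listed composition sums to 123 residues, that adding the 19-residue signal peptide gives the 142-residue pre-form, that 8 Cys are exactly accounted for by 4 disulfide bonds, and that the resulting formula weight is close to 14,178.

```python
# Average residue masses in Da (amino acid minus water); standard values
# assumed here, not taken from the review.
RESIDUE_MASS = {
    "Gly": 57.0519, "Ala": 71.0788, "Ser": 87.0782, "Pro": 97.1167,
    "Val": 99.1326, "Thr": 101.1051, "Cys": 103.1388, "Leu": 113.1594,
    "Ile": 113.1594, "Asn": 114.1038, "Asp": 115.0886, "Gln": 128.1307,
    "Lys": 128.1741, "Glu": 129.1155, "Met": 131.1926, "His": 137.1411,
    "Phe": 147.1766, "Arg": 156.1875, "Tyr": 163.1760, "Trp": 186.2132,
}
WATER = 18.01528         # one H2O for the free chain termini
TWO_HYDROGENS = 2.01588  # lost per disulfide bond formed

# Composition of the B variant as listed in the text.
ALA_B = {
    "Ala": 3, "Arg": 1, "Asn": 8, "Asp": 13, "Cys": 8, "Gln": 6, "Glu": 7,
    "Gly": 6, "His": 3, "Ile": 8, "Leu": 13, "Lys": 12, "Met": 1, "Phe": 4,
    "Pro": 2, "Ser": 7, "Thr": 7, "Trp": 4, "Tyr": 4, "Val": 6,
}
SIGNAL_PEPTIDE_LENGTH = 19
DISULFIDES = 4

mature_length = sum(ALA_B.values())
preform_length = mature_length + SIGNAL_PEPTIDE_LENGTH
free_cys = ALA_B["Cys"] - 2 * DISULFIDES  # Cys not engaged in a disulfide

chain_mass = sum(RESIDUE_MASS[aa] * n for aa, n in ALA_B.items())
formula_weight = chain_mass + WATER - DISULFIDES * TWO_HYDROGENS

print(mature_length, preform_length, free_cys, round(formula_weight, 1))
```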
α-Lactalbumin has a very high content of the essential amino acids (Trp, Phe, Tyr, Leu, Ile, Thr, Met, Cys, Lys, and Val). Essential amino acids account for 63.2% of the total amino acid content compared with just 51.4% for total CN (Heine et al., 1991). The amino acid composition of bovine α-LA and its 72% sequence identity to human α-LA make it an ideal protein for the nutrition of human infants (Heine et al., 1991).
α-LA found in the milk of cattle is glycosylated on an Asn residue (Barman, 1970). The N-linked glycosylation signal occurs at amino acids 45 to 47 (Asn-Gln-Ser) of mature α-LA (Figure 8).
α-LA is not phosphorylated in its native form (Bingham et al., 1988). However, α-LA becomes a good substrate for CN kinase in vitro after it has been reduced and carboxymethylated (Bingham et al., 1988).
In dairy cattle, the concentration of α-LA in milk decreases near the end of a lactation (Caffin et al., 1985; Regester and Smithers, 1991). This is the opposite of what occurs for the other major bovine milk proteins; their concentrations tend to increase as a lactation progresses (Davies and Law, 1980). The decline in α-LA concentration is correlated with the decline observed in the concentration of milk lactose at the end of a lactation. Lower concentrations of α-LA have also been observed in cows that have mammary infections (Caffin et al., 1985).
The protein structure, amino acid sequence, and DNA sequence of α-LA are very similar to those of the c-type lysozymes (McKenzie and White, 1991), and t