Laboratory of Dairy Science, Institute of Food Science, Swiss Federal Institute of Technology, CH-8092 Zurich, Switzerland
Corresponding author: S. R. Kappeler; e-mail: stefan.kappeler{at}alumni.ethz.ch.
ABSTRACT
-casein is considerably higher in camel milk. β-Lactoglobulin is absent, but whey acidic protein and peptidoglycan recognition protein have been detected. Genomic sequences upstream of milk-protein genes, which are known to regulate the expression of milk proteins to a great extent, were determined for 10 camel milk-protein genes and compared with the respective sequences in other mammals. Multiple sequence alignment showed the closest relationships to homologous sequences from other mammals. Comparison of milk protein regulative regions revealed two distantly related groups with pronouncedly different transcription factor site probabilities. The GC content of sequences in the first group was considerably higher than in the second group, and combined occurrence of CAAT and TATAA boxes was rare, suggesting that the first group represented mostly the housekeeping gene type, probably regulated by cellular signal transduction pathways, whereas the second group helped to regulate genes specifically expressed in terminally differentiated cells of the lactating alveolar epithelium. A core region of the composite response element, which primarily controls milk protein gene activity, was found by a search for elements conserved within all 5'-flanking sequences analyzed. It is assumed that the presence of this element determines gene expression in the lactating mammary gland, and that binding sites for general activator and repressor factors surrounding the milk protein gene-specific element are important for regulation of gene activity.
Key Words: camel, gene expression, milk protein, transcription factor binding site
Abbreviation key: 5'-flanking region = gene sequence 5'-flanking to the transcriptional start site of the milk-protein genes examined, TF = transcription factor. Abbreviations for transcription factors follow the TRANSFAC entries (Wingender et al., 2000).
INTRODUCTION
Our interest in this study was to understand the factors that regulate the expression of the genes that correspond to milk proteins and lead to the observed variation in the distribution of these proteins between species. In particular, we intended to find out whether the mechanisms described to regulate the milk protein gene expression in other species also apply to the homologous camel genes.
During the past decade, a new level of understanding of the processes that regulate the tissue-specific expression of milk-protein genes has arisen from molecular and cellular investigations into the regulation of the mammary gland during pregnancy, lactation, and involution. Expression of milk-specific genes in the lobuloalveolar epithelium of the lactating mammary gland was reported to be regulated by hormonal and environmental stimuli, which are transmitted through a set of transcription factors that bind to enhancer elements located in the proximal or distal 5'-flanking region of the transcriptional start site (Rosen et al., 1998).
The minimal requirements to elicit a sufficient lactogenic response include prolactin, glucocorticoids, and insulin (Nagaiah et al., 1981), the last of which may be substituted by insulin-like growth factor. Progesterone, which is involved in mammogenesis, was shown to repress casein gene expression during pregnancy through a DNA-binding factor of 65 kDa (Lee and Oka, 1992). Epidermal growth factor (Teng, 1999), which is itself secreted into milk and stimulates the gastrointestinal development of the newborn (Brown et al., 1989), also contributes to mammogenesis (Gallego et al., 2000), counteracting transforming growth factor β.
On the morphological level, it was found that signals from the basement membrane cooperate with the hormonal factors mentioned above to maintain the lobuloalveolar structure and the basal-apical polarization of the epithelial cells, which is required for expression and secretion of milk proteins (Aggeler et al., 1991; Close et al., 1997). In particular, the basement membrane protein laminin, acting via membrane-anchored β1-integrins, was shown to be required for activation of the prolactin receptor by prolactin (Edwards et al., 1998), as were factors from the extracellular membrane that suppress histone deacetylases (Rosen et al., 1999).
In studies where stage- or tissue-specific levels of milk proteins and the corresponding mRNA were compared, good correlations were found, indicating that quantitative control of expression occurs on the transcriptional level (McClenaghan et al., 1995).
Our finding that the protein composition of camel milk is markedly different from that of bovine milk over the course of the lactational cycle prompted us to investigate whether this differential expression pattern can be traced back to differences between the 5'-flanking regions of camel and bovine milk-protein genes. The data gained allowed us to start a comparative statistical analysis of putative transacting factor binding sites in 5'-flanking regions of homologous milk genes from different species, and to examine which of the proposed regulative elements are likely involved in the regulation of the examined camel milk-protein genes.
MATERIALS AND METHODS
DNA Sequence Analysis
Coding sequences for camel milk proteins were determined by using a cDNA library constructed from a Somali breed camel, as described in Kappeler et al. (1998).
Genomic DNA was extracted from white blood cells of an Arabian camel by conventional phenol-ethanol purification. An average length of about 40 kb was found after 0.5%-agarose gel separation. Five Genome Walker libraries were created as recommended by the manufacturer (K1807-1; Clontech, Palo Alto, CA). Sequence-specific primers were designed based on information from exon I and intron I regions sequenced previously. PCR amplification products were sequenced on an ABI Prism 310 Genetic Analyzer using BigDye chemistry. The 5'-flanking sequences were entered in the EMBL/GenBank database under the accession numbers AJ409277 (αS1-CN), AJ409278 (αS2-CN), AJ409279 (β-CN), AJ409280 (κ-CN), AJ409281 (α-LA), AJ409282 (lactoferrin), AJ409283 (lactophorin), AJ409284 (lactoperoxidase), AJ409286 (peptidoglycan recognition protein), and AJ409285 (whey acidic protein).
Computational Sequence Analysis
The GCG version 10 software package (Genetics Computer Group, Madison, WI) was used to create a tree illustrating the relatedness of the 5'-upstream sequences examined. First, 50 proximal 5'-flanking regions (1120 bp) of milk-protein genes were aligned with the pileup function, using gap weight 1 and gap length weight 0. The penalties for gap creation and extension were set to low values because sequences 5'-upstream of vertebrate genes usually exhibit high variability with regard to insertion and deletion of DNA fragments. Second, a Tamura distance-corrected matrix of the aligned sequences was created with the distances function. Third, a phylogenetic tree was created from the matrix according to the unweighted pair group method using arithmetic averages. Additional sequences used for the analyses (see Figure 1) included DNA regions from the following EMBL/GenBank accession numbers. The information in brackets indicates the corresponding protein and the number of base pairs in front of the transcriptional start site included in the analysis. AC005962 (human lactoperoxidase, 2000), AC007785 (human peptidoglycan recognition protein, 2000), AC063956 (human alpha s2-casein, 2000), AF027807 (human beta-casein, 1194), AF044256 (porcine lactoferrin, 1366), AF072711 (human carboxyester lipase, 2000), AF107201 (equine beta-lactoglobulin, 2000), AL121936 (human butyrophilin 1A, 2000), D16108 (murine lactophorin (GlyCAM-1, PP3), 2000), D85424 (human alpha s1-casein, 1180), E12614 (porcine beta-casein, 2000), L11749 (murine mammary tumor virus, LTR, 1361), M10936 (rat beta-casein, 783), M55158 (bovine beta-casein, 1722), M61170 (human mucin 1, gene product of the human polymorphic epithelial mucin gene (PEM), 2000), M74778 (murine lactoferrin, 2000), M75887 (bovine kappa-casein, 2000), M90645 (bovine alpha-lactalbumin, 1951), M94327 (bovine alpha s2 casein type A (CASAS2), 1509), U02884 (murine fatty acid binding protein, 1515), U16175 (murine mucin 1, 2000), U25810 (bovine lysozyme C, 2000), U28757 (porcine lysozyme C, 2000), U38816 (murine whey acidic protein, 2000), U57623 (human fatty acid binding protein, 1247), U67065 (murine butyrophilin 1A, 2000), U69534 (murine epidermal growth factor, 2000), X01153 (rat whey acidic protein, 1190), X03584 (rat alpha s1-casein, 682), X03589 (rat gamma-casein, 679), X12817 (ovine beta-lactoglobulin, 801), X13484 (murine beta-casein, 2000), X15735 (rabbit beta-casein, 2000), X59856 (bovine alpha s1-casein, 2000), X83391 (bovine lactophorin (GlyCAM-1, PP3), 1014), X98558 (porcine fatty acid binding protein, 1607), Y00726 (guinea pig alpha-lactalbumin, 1195), Y12088 (murine peptidoglycan recognition protein, 1532), Z33882 (caprine kappa-casein, 2000), Z48305 (bovine beta-lactoglobulin, 2000).
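The tree-building step named above (unweighted pair group method using arithmetic averages, UPGMA) can be illustrated with a minimal Python sketch. This is not the GCG implementation: the function name `upgma`, the input layout, and the nested-tuple output are assumptions made for illustration, and the Tamura distance correction is not reproduced, so the distance matrix is taken as already computed.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster leaves with UPGMA (hypothetical helper, not the GCG routine).

    names: list of leaf labels.
    dist:  symmetric distance matrix as a list of lists, same order as names.
    Returns the tree as nested tuples (left, right, node_height).
    """
    # Active clusters: id -> (subtree, number of leaves in the subtree)
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    # Pairwise distances between active clusters, keyed by (smaller id, larger id)
    d = {(i, j): dist[i][j] for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)

    while len(clusters) > 1:
        # Merge the closest pair of clusters
        (i, j), dij = min(d.items(), key=lambda kv: kv[1])
        (ti, ni), (tj, nj) = clusters.pop(i), clusters.pop(j)
        # Distance from the merged cluster to each remaining cluster is the
        # leaf-count-weighted average of the two old distances
        for k in clusters:
            dik = d.pop((min(i, k), max(i, k)))
            djk = d.pop((min(j, k), max(j, k)))
            d[(k, next_id)] = (ni * dik + nj * djk) / (ni + nj)
        del d[(i, j)]
        clusters[next_id] = ((ti, tj, dij / 2), ni + nj)
        next_id += 1

    return next(iter(clusters.values()))[0]


# Toy distances between three hypothetical sequences
tree = upgma(["A", "B", "C"], [[0, 2, 6], [2, 0, 6], [6, 6, 0]])
print(tree)
```

Because every merge averages over all leaves of both clusters, the resulting tree is ultrametric, which is why UPGMA is suited to an overview of relatedness rather than to rate-sensitive phylogenetic inference.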
Potential binding sites for transcription factors, so-called TF-sites, were detected in the 50 5'-flanking regions examined, according to the method of Schug and Overton (1997). The TRANSFAC (transcription factor) database (Wingender et al., 2000) version 3.3 was used and the following parameters applied: no allowable mismatch, a minimum element length of six bases, and a minimum lg-likelihood of 6. Where possible, the 2000 bp upstream of the transcriptional start site were examined, so that more distantly located binding sites could also be detected. The reliability of the TF-sites found was 31%, calculated as $(n_P - n_R)/n_P$, with $n_P$ as the number of sites found per base pair of 5'-flanking sequence (the total number of base pairs analyzed was 78,598), and $n_R$ as the number of sites found per base pair in 200 kbp of randomly generated DNA sequence. Because factors with a large number of TRANSFAC-compiled binding sites were overrepresented by these string-based searches, redundancies were filtered out. The frequencies of occurrence within 1000 bp of the 5'-flanking regions were calculated for all TF-sites. The same was done with randomly generated DNA sequences. The values obtained from the randomly generated sequences, considered as a background resulting from weakly defined binding sites, were subtracted from the former. The resulting binding site profiles contained information about the binding potential of known transcription factors to the analyzed promoter regions, at the expense of positional information. A correlation factor for every pair of promoter regions was calculated as the sum of products of the number of background-subtracted, nonredundant binding sites of known transcription factors, according to the formula $\sum_{x=1}^{z} (n_{xA} - n_{xR})(n_{xB} - n_{xR})$, where A represents a 5'-sequence, B another 5'-sequence, R a random sequence, $n$ the nonredundant number of binding sites for a certain transcription factor, and $z$ the total number of transcription factors compiled in the TRANSFAC library (Wingender et al., 2000).
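The two quantities defined in this paragraph, the reliability ratio and the pairwise correlation factor, can be sketched in a few lines of Python. The function names, the dictionary layout (factor name mapped to a nonredundant site count), and all numbers in the example are assumptions for illustration only; they are not data from this study.

```python
def reliability(sites_promoter, bp_promoter, sites_random, bp_random):
    """Reliability (n_P - n_R) / n_P, with n_P and n_R as sites per base pair."""
    n_p = sites_promoter / bp_promoter
    n_r = sites_random / bp_random
    return (n_p - n_r) / n_p


def correlation_factor(counts_a, counts_b, counts_random):
    """Sum over factors x of (n_xA - n_xR) * (n_xB - n_xR).

    Each argument maps a transcription factor name to its nonredundant
    binding site count; factors missing from a mapping count as zero.
    Factors absent from all three mappings contribute zero, so summing
    over the union of keys equals summing over the whole library.
    """
    factors = set(counts_a) | set(counts_b) | set(counts_random)
    return sum(
        (counts_a.get(tf, 0) - counts_random.get(tf, 0))
        * (counts_b.get(tf, 0) - counts_random.get(tf, 0))
        for tf in factors
    )


# Hypothetical site counts for two promoters and a random background
promoter_a = {"GR": 3, "Oct-1": 2}
promoter_b = {"GR": 2, "Sp1": 4}
background = {"GR": 1, "Sp1": 1}
print(correlation_factor(promoter_a, promoter_b, background))
```

A positive score means the two promoters are enriched (or depleted) for the same factors relative to background; a negative score means one is enriched where the other is depleted.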
A search for conserved motifs within the 5'-upstream sequences was done using MEME (multiple expectation maximization for motif elicitation) Version 3.0 with analysis of both strands and a limiting range for width variability between 6 and 50 bp, expecting at least one occurrence of every motif in each sequence (Bailey and Gribskov, 1998). The following set of 26 5'-upstream sequences, of which the gene products are abundant in milk, was chosen: Bos taurus: αS1-CN, αS2-CN, β-CN, κ-CN, α-LA, β-LG, lactoferrin, PP3 component (lactophorin). Camelus dromedarius: αS1-CN, αS2-CN, β-CN, κ-CN, α-LA, lactoferrin, PP3 component (lactophorin), peptidoglycan recognition protein, whey acidic protein. Capra hircus: κ-CN. Equus caballus: β-LG. Homo sapiens: αS1-CN, αS2-CN, β-CN. Ovis aries: β-LG. Sus scrofa: β-CN, fatty acid binding protein, lactoferrin. Where possible, the 2000 bp upstream of the transcriptional start site were examined, so that more distantly located binding sites could also be detected. Rodent sequences were not included because of the more distant evolutionary relationship and presumably modified TF binding site preferences. Subsequently, each motif resulting from MEME analysis was searched for TF binding sites, mainly by TESS analysis (Schug and Overton, 1997).
RESULTS
The protein composition of camel and cow milk differed with regard to both casein and whey proteins. Camel milk contained significantly higher concentrations of β-CN and lower amounts of κ-CN (Table 1). β-LG, lysozyme C, and lactoperoxidase were not detected in mid- to late-lactational camel milk, and the corresponding cDNA of the former two proteins was not found by screening of a lactating mammary gland cDNA library. On the other hand, whey acidic protein, generally described as a major constituent of rodent milk, and peptidoglycan recognition protein, an intracellular protein that binds to gram-positive bacteria and was not previously known to be a milk constituent, were detected in major amounts in camel whey, both on the cDNA and the protein level. Lactophorin, the proteose peptone 3 component of whey, was detected in higher concentrations than in bovine milk (Kappeler et al., 1999b).
Multiple alignment between the 5'-flanking regions of milk-protein genes from several species revealed two groups of distantly related sequences (Figure 1). The nucleic acid composition of the two groups, hereafter designated as group I and group II, differed markedly, with group I having an average GC content of 54% and group II an average GC content of 38%. Interspersed elements and long terminal repeats covered about 16.5% of group I sequences and 20.5% of group II sequences. Correlation analysis of the transcription factor binding site profiles of the different 5'-upstream sequences produced a relational matrix in which the relationships between binding site preferences were similar to the relationships found with multiple sequence alignment (Figure 2).
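The GC content used to contrast group I and group II is a simple base-composition ratio. The following is a minimal sketch of that calculation, assuming sequences are given as plain strings; the function name `gc_content` and the choice to ignore ambiguous bases such as N are assumptions, not details reported in the study.

```python
def gc_content(seq):
    """Return the percentage of G and C among the unambiguous bases of seq."""
    bases = [b for b in seq.upper() if b in "ACGT"]  # skip N and other ambiguity codes
    if not bases:
        return 0.0
    return 100.0 * sum(b in "GC" for b in bases) / len(bases)


# Toy sequences, not taken from the analyzed promoters
print(gc_content("GGCCAT"))
print(gc_content("atgcNN"))
```

Averaging this value over the sequences assigned to each group would give group-level figures comparable in kind to the 54% and 38% reported above.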
were predominantly found in group II promoter sequences, whereas binding sites for the ubiquitous activators Sp1, AP-2 and PU.1, as well as MAF and PPAR (peroxisome proliferator-activated receptor) were mostly found in the GC-rich promoter sequences of group I.
-CN in camel milk from different breeds that reported for cow milk (Table 1
In the course of this study, we sequenced the proximal (2 kbp) regions 5'-upstream of the transcriptional start of 10 camel milk-protein genes to find a rationale for the different expression patterns of milk-protein genes in the lactating mammary gland of camels and cows. We supposed these regions to be sufficient to direct the stage- and tissue-specific expression of the respective genes, as was shown for corresponding sequences in other mammals (Faerman et al., 1995; Lee et al., 1998).
Alignment of the 50 5'-upstream sequences analyzed showed, in a first step, that all 5'-flanking regions of camel milk-protein genes were most closely related to their homologous counterparts from other species, independently of the gene expression level in the mammary gland of the other species (Figure 1). Interestingly, the camel 5'-upstream sequences also showed the observed relatedness in those cases where the respective genes were reported to be expressed at very different levels in the lactating mammary glands of the different species, e.g., in the case of camel and bovine κ-CN (Kappeler et al., 1998). This sequence relatedness indicated that the regulation of gene expression has to be restricted to small conserved areas within the 5'-upstream sequences, and that mutations in these areas will initiate, abolish, or modulate the tissue- and stage-specific expression of milk-protein genes.
Additionally, two very distantly related groups of 5'-flanking regions were discerned, designated as group I and group II (Figure 1). Group I sequences comprised some of the whey and the milk-fat globule membrane protein gene sequences; group II sequences encompassed casein and some whey and butyrophilin gene sequences. The higher GC content of group I sequences and the scarcity of combined CAAT and TATAA boxes towards the transcriptional start sites led us to assume that group I 5'-flanking regions predominantly regulated housekeeping genes that became highly expressed in the lactating mammary gland of some species, whereas group II genes were specifically expressed in the end-differentiated cells of the lobuloalveolar epithelium.
Our interest at this point was to see whether binding site probabilities for the different transcription factors were similar for the 50 sequences examined. A position-independent pattern search strategy was developed, as described in Materials and Methods, because similar regulative elements, especially hormone responsive elements, have been localized at different positions relative to the transcriptional start site (Rosen et al., 1998). A correlation matrix was generated from the background-corrected binding site probabilities. A simplified view of this matrix, combining the results of homologous sequences, is shown in Figure 2. The resulting matrix showed the same division into two weakly related groups as the multiple sequence alignment, with the exception of binding site probabilities for wap 5'-flanking sequences, which also exhibited good TF-site correlation results with some group II sequences.
Figure 2 shows, nonetheless, that the majority of the different 5'-flanking regions weakly correlate with each other, indicating that most sequences contain some common regulative elements. Some TF-sites were detected on nearly all sequences with a background-corrected frequency of more than one per 1000 bp. Binding sites for the glucocorticoid receptor and for transcription factors of the Ets family were the most abundant on sequences of both groups (Table 2). Ets factors, especially MAF and PEA3, were localized on regulative sequences of milk-protein genes and probably mediate signals of the epidermal growth factor and of insulin to the prolactin gene (Jacob et al., 1999). Glucocorticoid receptor sites in the first group were often mere half sites, as described to exist particularly on milk-protein gene promoter regions (Lechner et al., 1997), whereas sites in the second group preferably consisted of near-palindromic sequences. Binding sites for the peroxisome proliferator-activated receptor (PPAR) were detected on most group I sequences. This fatty acid-activated nuclear receptor controls genes involved in lipid metabolism and is enhanced by prolactin during adipogenic conversion of cells (Nanbu et al., 2000). The γ2-isoform was reported to be downregulated (Gimble et al., 1998) or upregulated (Jain et al., 1998) during pregnancy and lactation, and a role in the response to physiologic and pathologic stimuli that alter lipid metabolism was suggested. This indicates that the factor may help to regulate group I gene expression in differentiating and involuting mammary tissue. The ubiquitous transcription factors of the octamer binding family were reported to bind to elements in the proximal 5' sequence of casein genes (Groenen et al., 1992) and in the long terminal repeat of the mouse mammary tumor virus (Brueggemeier et al., 1991), enhancing the activity of the mediators of hormonal and local signals. Octamer binding sites were abundantly detected in all group II sequences, but only weakly in group I sequences.
A number of observations on the histological level indicate that genes belonging to either of the two groups show strikingly different stage- and cell-specific expression patterns. A mutually exclusive histological localization of lactoferrin mRNA on the one hand, and α-LA and αS1-CN mRNA on the other hand, was reported in the alveolar epithelium (Molenaar et al., 1992), depending on the lactational status of the cells. Whereas α-LA and αS1-CN mRNA have been detected in terminally differentiated, lactating alveoli, lactoferrin mRNA was almost exclusively found in emerging and regressing alveoli. Lactoferrin expression also did not require basement membrane signals; its expression was even repressed in epithelial cells with basal-apical orientation, but stimulated in nonpolarized cells (Close et al., 1997). This finding was in contrast to the expression of genes that belong to typical milk proteins, such as α-LA, caseins, and also the whey acidic protein, although we found the 5'-flanking sequences of wap genes to belong to group I (Figure 1). Fatty acid binding protein, also known as mammary-derived growth inhibitor, was localized in the vacuolar, nonlactating cell type (Erdmann and Breter, 1993), whereas butyrophilin mRNA was detected in lactating cells (Molenaar et al., 1995). Most genes of the second group are solely expressed in alveolar epithelial cells of the late-pregnant and lactating mammary gland, whereas the first group, with the exception of the whey acidic protein, was made up of genes that are known to be expressed in a broader range of tissues. Fatty acid binding protein, for example, participates in the intracellular transport of fatty acids in different tissues, and peptidoglycan recognition protein is a protein of the innate immune system of vertebrates and invertebrates. Additionally, many proteins of the first group are likely involved in feedback regulation of protein and fat secretion into the alveoli and thus need to be expressed in a manner different from caseins and related proteins to execute these functions. Altogether, a majority of the proteins that help to preserve the structural properties of milk seem to be expressed in terminally differentiated cells under control of a TATA-like promoter. Proteins that protect against environmental stress, on the other hand, seem rather to be constitutively expressed and merely upregulated in the lactating mammary gland.
Established models describing the regulation of milk-protein gene transcriptional control are mainly based on studies with genes for rodent whey acidic protein and rodent and bovine β-CN. Additionally, the long terminal repeat of the murine mammary tumor virus, which directs expression to lactating mammary epithelial cells, is well investigated. Information about the regulation of other genes expressing milk proteins is rare.
DNase I hypersensitivity and footprinting experiments, as well as electrophoresis mobility shift assays and transgenic studies, helped to identify binding sites for transcription factors and hormone responsive elements (Rosen et al., 1999). A crucial role in the regulation of milk genes is played by a pathway in which the prolactin signal is effected via phosphorylation of the membrane-bound prolactin receptor, inducing tyrosine phosphorylation of STAT5 A and B isoforms by Janus kinase, followed by dimerization, nuclear translocation, and DNA binding (Rosen et al., 1999). Short forms of the prolactin receptor and of STAT5 have been reported, which were found in nonlactating mammary tissue and act as dominant negative regulators, repressing expression of target genes (Edwards et al., 1998). STAT5 binding sites were shown to reside within a composite response element, which also included sites for NF-1 and GR in the rodent whey acidic protein 5'-flanking region, and for GR, C/EBP β and δ isoforms, and YY1 in the bovine β-CN 5'-flanking region (Doppler et al., 1995). The core region of this element was retrieved as the second motif from a MEME analysis of 26 regions 5'-upstream of genes highly expressed in the lactating mammary gland of different species (Figure 3). The motif was lost when sequences of genes that are not highly expressed in the lactating mammary gland were included in the MEME training set, supporting the idea of a crucial role for this response element in the control of milk-protein gene expression.
Comparing our results from TF-site analysis and from MEME detection of conserved elements, we conclude that genes lacking the composite response element, which combines the STAT5 binding site with TF-sites for other activating and repressing factors, will not be expressed in the lactating mammary gland of a particular species. However, we did not succeed in associating the variances in milk protein levels between species with a structural disparity in the corresponding 5'-flanking regions. We presume that the fine-tuning of milk-protein gene expression resides in the arrangement of binding sites for ubiquitous factors, such as NF-1 or the octamer family, near the conserved elements. Additionally, superior regulative regions and mRNA stability are likely involved in the control of milk-protein gene expression.
ACKNOWLEDGEMENTS
Received for publication January 31, 2002. Accepted for publication March 26, 2002.
REFERENCES