©1995 by The American Society for Biochemistry and Molecular Biology, Inc.
Silk Gland Factor-1 Involved in the Regulation of Bombyx Sericin-1 Gene Contains Fork Head Motif (*)

Václav Mach (§) , Shigeharu Takiya (¶) , Kaoru Ohno , Hiroshi Handa (1), Takeshi Imai (1), Yoshiaki Suzuki (**)

From the (1) National Institute for Basic Biology, 38 Nishigonaka, Myodaiji-cho, Okazaki 444, Japan Faculty of Bioscience and Biotechnology, Tokyo Institute of Technology, 4259 Nagatsuta-cho, Midori-ku, Yokohama 227, Japan

ABSTRACT

Silk gland factor-1 (SGF-1) regulates transcription of the Bombyx sericin-1 gene via interaction with the SA site. In this study, two related SGF-1 polypeptides of apparent molecular masses of 40 and 41 kDa were purified. Specific interaction of these proteins with the SA site was demonstrated by electrophoretic mobility shift and dimethyl sulfate methylation interference assays. The SGF-1 40-kDa protein was partially sequenced and characterized as a new member of the fork head/HNF-3 family. Several full-length cDNAs encoding the SGF-1 40-kDa and possibly also the 41-kDa proteins were cloned and sequenced. SGF-1 mRNA is expressed consistently with the presumed role of the SGF-1 protein product in regulating the sericin-1 gene. The SGF-1 protein contains putative transactivation domains. We conclude that the 40- and 41-kDa SGF-1 proteins affect transcription of the sericin-1 gene via binding to the SA site.


INTRODUCTION

The silk glands of the silkworm Bombyx mori produce vast amounts of several silk proteins. Genes for the H- and L-chain fibroin are expressed in the posterior part of silk glands, whereas several serine-rich proteins arise from transcripts of two sericin genes expressed in the middle part of silk glands. We are interested in the regulation of this massive tissue-specific gene expression (see Ref. 1 for a review). Several protein factors were shown to interact with DNA elements located upstream of the 5′-ends of the H-chain fibroin and sericin-1 genes (2, 3). A silk gland-enriched factor, SGF-3, is known to bind to the SC site upstream of the sericin-1 gene promoter (3). Removal of this SC element reduces in vitro transcription from the sericin-1 gene promoter in crude middle silk gland (MSG) nuclear extract (3, 4). POU-M1, a homologue of the Drosophila Cf1-a transcription factor, was recently found to be expressed in the middle (and to a lesser extent in the posterior) silk glands and shown to be identical with SGF-3 (5).

Another silk gland-enriched factor, named SGF-1, interacts with the SA site located in the proximal upstream region of the sericin-1 gene promoter. Removal of the SA element decreases in vitro transcription from the sericin-1 promoter in nuclear extract prepared from MSG (3). A major tissue- and sequence-specific complex was detected by means of electrophoretic mobility shift assay (EMSA) with a crude MSG extract using an oligonucleotide containing the SA element as a probe (4). It was proposed that the protein component of this retarded complex is identical to SGF-1 (4). The same protein probably also binds to two regions upstream of the H-chain fibroin gene (2).

In the present report, we describe purification of the SGF-1 protein and molecular cloning of its corresponding cDNAs. Bombyx SGF-1 is a homologue of the Drosophila fork head protein (6) and therefore a new member of the fork head/HNF-3 family of transcriptional regulators.


EXPERIMENTAL PROCEDURES

Preparation of Nuclear Extract

Nuclear extract was obtained from MSG of 2-day-old final instar larvae as described earlier (3).

SGF-1 Binding to DNA-coupled Latex Particles

SGF-1 binds to the SA site in the sericin-1 gene promoter (3). DNA containing multiple repeats of the SA element was prepared by ligation of double-stranded oligonucleotides (SA20 oligonucleotide in Fig. 1). The ligated DNA was coupled with latex particles as described (8). The DNA-coupled particles were used to enrich SGF-1 binding activity from crude nuclear extract. The published guidance (8) was followed throughout the procedure, and all manipulations were done in a cold room. A typical binding reaction contained 35 mg of nuclear extract and 2 mg of Escherichia coli tRNA in 13 mM Hepes (pH 7.9), 8.3 mM MgCl2, 33 mM KCl, 170 mM NaCl, 17 mM EDTA, 0.1% Nonidet P-40, 15% (v/v) glycerol, and 1.3 mM dithiothreitol in a total volume of 1.5 ml. This mixture was pre-incubated for 20 min, 20 mg of latex particles coupled to 20 µg of DNA was added, and the incubation was continued for an additional 60 min. Washes and elutions were performed with 20 mM Hepes (pH 7.9), 15% (v/v) glycerol, 10 mM EDTA, and increasing concentrations of NaCl. Typically, two 5-min washes (0.3 ml each) with the above solution containing 0.1 M NaCl were followed by four washes containing 0.3 M NaCl and by four 100-µl elutions containing 0.5 M NaCl. The eluates were pooled, and this SGF-1-enriched fraction is referred to as the latex fraction. The entire procedure was repeated until approximately 10 ml of the latex fraction was obtained.

EMSA

Binding reactions (20 µl) included 100 µg/ml poly(dI-dC), 100 µg/ml tRNA, and 1 mg/ml nuclear extract and were carried out in 10 mM Hepes (pH 7.9), 100 mM NaCl, 1 mM dithiothreitol, 10 mM EDTA, and either 8% (v/v) glycerol or 2.5% Ficoll 400. When EMSA was performed with the latex fraction (0.2 µl/reaction) instead of the nuclear extract, the amount of heterologous nucleic acid was reduced to 0.5 µg/ml tRNA, and Nonidet P-40 and bovine serum albumin were added to final concentrations of 0.1% and 1 mg/ml, respectively. Similar conditions were used to examine renatured proteins. A double-stranded oligonucleotide containing the wild type SGF-1 binding site (SA site, see Ref. 3) was used both as a probe (5 fmol/reaction) and as a specific competitor (SA oligonucleotide in Fig. 1). The mutated version of this oligonucleotide, which contained three T to A substitutions, was found to compete inefficiently with the wild type probe and served as a control competitor (SAM1 oligonucleotide in Fig. 1).

UV Cross-linking

A pair of SAUV oligonucleotides (Fig. 1) was used to incorporate 5-bromo-2′-deoxyuridine triphosphate and labeled dATP into the SA sequence. This probe was UV cross-linked (see Ref. 9, Suppl. 3) with SGF-1 present in the latex fraction. Reactions contained an excess of either the unlabeled SA oligonucleotide or the unlabeled SAM1 oligonucleotide, or no competing oligonucleotide. The optimal interval for UV irradiation was empirically determined to be 20-25 min. The reactions were not treated with DNase prior to SDS-PAGE analysis.

Dimethyl Sulfate Methylation Interference

The protocol described in Ref. 9 (Suppl. 25) was followed except that passive elution was used to recover DNA from acrylamide gels.

Renaturation of Proteins

SDS-PAGE was performed as described (see Ref. 9, Suppl. 13). Proteins were stained with zinc (10), and the protein bands were excised, destained, and renatured as described (8).

HPLC

The Pharmacia Smart system micro HPLC was employed.


Figure 1: Sequence of oligonucleotides used in this study. The double-stranded oligonucleotides were derived from the SA site sequence (3). Numbers indicate the positions of nucleotides relative to the nucleotide +1 of the sericin-1 gene. The nucleotides mutated from the wild type sequence are indicated in lower case letters. These substitutions were introduced to facilitate labeling by Klenow enzyme (not underlined) or to interfere with SGF-1 binding (underlined). The active SA site in the SA20 and SAUV DNAs was generated after self-ligation and a filling-in procedure, respectively.



Proteins

The latex fraction was supplemented with solid guanidinium hydrochloride, 1% trifluoroacetic acid, and 50% acetonitrile (ACN) to achieve final concentrations of 6 M, 0.1%, and 4%, respectively. Up to 2 ml of this mixture was loaded on a µRPC C2/C18 PC 3.2/3 column. The gradient was run in 0.1% trifluoroacetic acid, from 4 to 40% ACN in 1 ml, and then from 40 to 80% ACN in 5 ml. The flow rate was 0.1 ml/min. The entire latex fraction was processed by reverse phase chromatography, aliquots containing the 40- and 41-kDa proteins were pooled, and this material (total 0.8 ml) is referred to as the reverse phase fraction.

Peptides

The trypsin digest of the 40-kDa protein was concentrated according to the protocol used in Ref. 11. All material was loaded on a µRPC C2/C18 SC 2.1/10 column and fractionated as recommended by the manufacturer.

Trypsin Digestion

The reverse phase fraction was mixed with 20 µl of 1.5% deoxycholate, concentrated 4-fold in a Speedvac, and precipitated with trichloroacetic acid using the Sigma P5656 kit. Precipitated proteins were resuspended in 50 µl of SDS-PAGE sample buffer and stored at -70 °C. Following the SDS-PAGE analysis, the gel was zinc stained, and the 40-kDa band was excised and destained with citric acid (10). From this point, the gel slice was washed in ACN and treated with trypsin as described (11).

Peptide Sequencing

Custom sequencing (Takara) as well as the National Institute for Basic Biology facilities were used to obtain peptide sequences.

cDNA Library Construction

An RNA fraction enriched for poly(A)+ RNA was isolated from MSG of 2-day-old last instar larvae. The MSG cell layer was separated from the lumen containing silk proteins, and 150 mg of the cell layer was homogenized in 3.5 ml of solution containing 3 M LiCl, 6 M urea, 10 mM sodium acetate (pH 5.2), and 0.4 mg/ml heparin. RNA was precipitated on ice and centrifuged. The precipitate was resuspended in 0.4 ml of the extraction solution supplied with the Quick-prep micro mRNA isolation kit (Pharmacia), and mRNA was isolated as recommended by the manufacturer. cDNA was synthesized using the TimeSaver cDNA synthesis kit (Pharmacia) and cloned into EcoRI/CIAP-treated lambdaZAP II vector (Stratagene). The library was plated in E. coli SURE cells (Stratagene). Phagemids from positive clones were rescued by the Ex-assist/SOLR system (Stratagene).

Northern Blot

Poly(A)+ RNA was obtained as described above; MSG lumen contents were eliminated if applicable. Samples of approximately 2 µg of poly(A)+ RNA per lane were electrophoresed in an agarose/formaldehyde gel as described (12). The RNA was electroblotted onto Hybond-N membrane (Amersham Corp.) in 25 mM sodium phosphate (pH 7.0).

DNA Sequencing

The sequence of SGF-1 cDNA 6 was determined from a set of overlapping deletions using the Sequenase version 2.0 system (U. S. Biochemical Corp.). Standard reactions containing dGTP were run in parallel with dITP-containing reactions. Oligonucleotide primers were synthesized to verify this sequence and to resolve sequences of other cDNA clones.


RESULTS

Purification of the SGF-1 Protein

Approximately 5000 pairs of MSGs were dissected from 2-day-old final instar larvae and used in batches of 1000 MSG pairs to prepare a total of 40 ml of nuclear extract containing 1.6 g of protein. This material was enriched for the SGF-1 binding activity using latex particles coupled with DNA containing the SGF-1 binding site SA (see "Experimental Procedures"). The estimated SGF-1 enrichment was greater than 500-fold, with 20-30% recovery. This fraction is hereafter referred to as the latex fraction.

A bromodeoxyuridine-substituted oligonucleotide carrying the SGF-1 cognate sequence SA was prepared by filling in the SAUV oligonucleotide shown in Fig. 1. This probe was UV cross-linked with the latex fraction. Several specific bands were observed of molecular masses ranging from 52 to 60 kDa (data not shown). Some bands were apparently caused by probe or protein degradation during the incubation period. Our attempt to shorten the cross-linked probe by DNase digestion failed, probably because SGF-1 was degraded by proteases during the 37 °C incubation step. Molecular mass of the SGF-1 protein was estimated within the range of 38-46 kDa, subtracting the contribution of the probe from the electrophoretic mobilities of the observed bands.
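The subtraction described in the last sentence can be written out explicitly. The sketch below is illustrative only and is not part of the original analysis: the text does not state the size of the probe contribution, so the 14-kDa value is inferred here from the two reported ranges (52 - 38 = 60 - 46 = 14) and should be read as an assumption, as should the function name.

```python
# Illustrative arithmetic only; not part of the original analysis.
# ASSUMPTION: the cross-linked probe contributes about 14 kDa, a value
# inferred from the reported ranges (52-60 kDa observed, 38-46 kDa estimated).

def protein_mass_range(observed_kda, probe_kda):
    """Subtract the probe contribution from the lowest and highest band mass."""
    low, high = observed_kda
    return (low - probe_kda, high - probe_kda)

observed = (52.0, 60.0)  # kDa, range of cross-linked bands given in the text
probe = 14.0             # kDa, inferred (see assumption above)

estimate = protein_mass_range(observed, probe)
print(estimate)  # (38.0, 46.0)
```
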

SDS-PAGE analysis of the latex fraction revealed a prominent band of apparent molecular mass of 40 kDa (Fig. 2 A, lane 1). The relative SGF-1 binding activity estimated by EMSA correlated well with the intensity of the 40-kDa protein band in various washes and successive elutions throughout the DNA affinity enrichment procedure (data not shown). The 40-kDa protein was therefore presumed to be identical with the SGF-1 factor. To test this hypothesis, the 40-kDa band was excised from the SDS-PAGE gel, renatured, and analyzed by EMSA. The mobility of the resulting complex was identical with the mobility of the SGF-1·DNA complex in both the crude extract and the latex fraction (data not shown).

The latex fraction was chromatographed on a reverse phase column. An additional, less abundant protein of approximately 41 kDa co-eluted with the 40-kDa protein in a single fraction composed of two overlapping peaks (Fig. 2 A, lane 4, and data not shown). This fraction was further referred to as reverse phase fraction.


Figure 2: SDS-PAGE analysis of purified fractions. M, marker lane. A, silver-stained gel. The crude MSG extract was enriched for the SGF-1 binding activity using DNA-coupled latex particles. This latex fraction (lane 1) was chromatographed on a reverse phase HPLC column. Lanes 2-4 contain fractions 12-14, respectively, obtained by the reverse phase chromatography step. Material in lane 4 was saved, and this fraction is referred to as the reverse phase fraction. The SGF-1 40-kDa protein is present in lanes 1, 3, and 4 (indicated by an arrow); the SGF-1 41-kDa protein migrates just above the 40-kDa band. The relative intensity of the SGF-1 41-kDa polypeptide band was exaggerated by the photography process. B, Coomassie Brilliant Blue-stained precipitated reverse phase fraction. Proteins present in the reverse phase fraction were precipitated by trichloroacetic acid/deoxycholate and dissolved in the SDS-PAGE loading buffer; one-tenth of this mixture was loaded on an SDS-PAGE gel. Approximately 400 ng of the SGF-1 40-kDa polypeptide was present, as estimated by Coomassie Brilliant Blue staining and comparison with standard proteins (standard proteins are not shown). The position of the SGF-1 40-kDa protein is indicated by an arrow, and the SGF-1 41-kDa protein is visible just above.



An aliquot of the reverse phase fraction was loaded on an SDS-PAGE gel, and both the 40- and 41-kDa protein bands were excised separately, renatured, and assessed in EMSA. Both proteins displayed the same binding specificity and produced retarded complexes that migrated indistinguishably from the SGF-1 complex formed in crude extract (Fig. 3). We used the methylation interference assay to examine the interaction of the renatured proteins with the proximal upstream region of the sericin-1 gene in greater detail. Both proteins yielded clear, apparently identical dimethyl sulfate footprints (compare lanes 8 and 11 in Fig. 4 A). Indistinguishable footprints were obtained when the DNA probe was incubated either with crude extract or with the latex fraction (Fig. 4 A). This confirms that the SGF-1 binding activity purified in the 40- and 41-kDa proteins is identical with the DNA binding activity ascribed to the SGF-1 factor in crude extract. The observed footprint is depicted in Fig. 4 B and maps within the DNase I protected region defined as the SA site (3). We will hereafter refer to the 40-kDa protein as SGF-1 (40 kDa) and to the 41-kDa protein as SGF-1 (41 kDa).


Figure 4: Evidence that the purified SGF-1 proteins correspond to the SGF-1 factor in crude extract (methylation interference assay). A, a DNA fragment spanning positions -132 to -50 of the sericin-1 gene promoter was uniquely labeled at the 5′-end of the coding (lanes 1-12) or the non-coding (lanes 13-15) strand. Guanine and (to a lesser extent) adenine residues were partially methylated with dimethyl sulfate, and this probe was used in a gel shift assay with either crude extract, latex fraction, or renatured SGF-1 proteins (obtained as described in Fig. 3). Protein complexes and unbound DNAs were localized by autoradiography, eluted, and cleaved at methylated residues with piperidine. The protein-bound and free DNAs were compared on a sequencing gel. DNA extracted from the free probe band is present in the outer lanes 1 and 3 (crude extract), 4 and 6 (latex fraction), 7 and 9 (renatured SGF-1 40-kDa protein), 10 and 12 (renatured SGF-1 41-kDa protein), and 13 and 15 (latex fraction), whereas DNA extracted from the retarded band is present in the middle lanes 2 (crude extract), 5 (latex fraction), 8 (renatured SGF-1 40-kDa protein), 11 (renatured SGF-1 41-kDa protein), and 14 (latex fraction). Methylated G and A residues that interfere with protein binding are diminished in the protein-bound lanes compared with unbound DNA. B, an overview of the observed interferences. Numbers indicate the positions of nucleotides relative to the nucleotide +1 of the sericin-1 gene. Filled arrowheads indicate strong interferences, and blank arrowheads indicate weak or ambiguous interferences.



Internal Sequence of SGF-1 (40 kDa) Protein

Proteins present in the reverse phase fraction were precipitated with trichloroacetic acid/deoxycholate, and the sediment was dissolved in SDS-PAGE loading buffer. One-tenth of this mixture was electrophoresed to estimate the total yield of SGF-1 by Coomassie Brilliant Blue staining (Fig. 2 B). About 4 µg of the SGF-1 (40 kDa) protein (approximately 100 pmol) and 1 µg of the SGF-1 (41 kDa) protein (25 pmol) were purified. These two proteins together comprised over 85% of the precipitated reverse phase fraction. The entire precipitated reverse phase fraction was loaded onto a SDS-PAGE gel, and the SGF-1 (40 kDa) protein was excised from the gel and digested by trypsin. The SGF-1 digestion products were separated by HPLC, and their sequences were partially determined. Several SGF-1 fragments (21A, 25A, 44A, 8B, and 25B) revealed useful sequences. These peptides are indicated by underlines in Fig. 5 A, and their sequences are listed in the figure legend.
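The conversion between the mass yields and the molar amounts quoted above can be checked with a short calculation. This is a minimal sketch rather than the authors' procedure: it assumes that the apparent SDS-PAGE masses (40 and 41 kDa) are used as the molecular masses, and the function name is hypothetical.

```python
# Illustrative unit conversion; not part of the original analysis.
# ASSUMPTION: the apparent molecular masses from SDS-PAGE (40 and 41 kDa)
# are used as the molecular masses of the two proteins.

def micrograms_to_pmol(mass_ug, molecular_mass_kda):
    """Convert a protein mass in micrograms to picomoles.

    1 kDa corresponds to 1000 g/mol, so
    pmol = (ug * 1e-6 g) / (kDa * 1e3 g/mol) * 1e12 = ug * 1000 / kDa.
    """
    return mass_ug * 1000.0 / molecular_mass_kda

pmol_40 = micrograms_to_pmol(4.0, 40.0)
pmol_41 = micrograms_to_pmol(1.0, 41.0)

print(f"SGF-1 (40 kDa): {pmol_40:.1f} pmol")  # 100.0 pmol
print(f"SGF-1 (41 kDa): {pmol_41:.1f} pmol")  # 24.4 pmol
```

Under this assumption the 40-kDa yield comes to 100 pmol, and the 41-kDa yield to about 24 pmol, consistent with the approximate figure of 25 pmol given in the text.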

Cloning of SGF-1 cDNA

The partial SGF-1 peptide sequences were compared with a non-redundant protein sequence data base using the Fasta program (13). The overlapping peptides 21A and 25A were found to be contained within the Drosophila fork head protein (6). Peptide 8B was very similar to another part of Drosophila fork head (data not shown). We therefore concluded that SGF-1 belongs to the fork head/HNF-3 family (see Refs. 7 and 14).

Several members of this family have been recently cloned in our laboratory from a Bombyx embryonic cDNA library. One of these clones appears to encode several SGF-1 peptides. This embryonic cDNA clone was used to screen an MSG cDNA library. Over 20 putative positive clones were identified among approximately 3.5 10plaque-forming units of the unamplified library. Twelve of them were isolated, mapped by restriction enzyme digestion and Southern hybridization, and partially sequenced.

The entire 3.1-kb insert of the SGF-1 cDNA 6 clone was sequenced (Fig. 5 A). The sequence revealed a 1047-base pair-long ORF (nucleotides 36-1082), encoding a 38.8-kDa protein that contains all the peptides obtained from the SGF-1 (40 kDa) trypsin digest. The starting AUG codon is preceded by the sequence AGCC, which corresponds relatively well with the Drosophila translation start consensus (15).
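The ORF coordinates quoted above can be cross-checked against the stated ORF length. The sketch below is illustrative only; it assumes that the nucleotide positions are 1-based and inclusive at both ends, and the function name is hypothetical.

```python
# Illustrative check of the ORF coordinates quoted in the text.
# ASSUMPTION: nucleotide positions are 1-based and inclusive at both ends.

def orf_length(first_nt, last_nt):
    """Length in nucleotides of an ORF spanning first_nt..last_nt inclusive."""
    return last_nt - first_nt + 1

length_nt = orf_length(36, 1082)
codons, leftover = divmod(length_nt, 3)

print(length_nt, codons, leftover)  # 1047 349 0
```

Under this assumption the coordinates give 1047 nucleotides, a whole number of codons (349), in agreement with the 1047-base pair ORF stated in the text and the 349-residue polypeptide given in the legend to Fig. 5.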


Figure 5: Nucleotide sequences of the SGF-1 cDNA clones. A, nucleotide sequence and predicted translation of the SGF-1 cDNA clone 6. The completely sequenced full-length cDNA 6 is 3078 base pairs long and encodes a 349-amino acid-long polypeptide. Amino acids are shown and numbered in italics. The putative polyadenylation signal and the peptide sequences determined from the trypsin digest of the SGF-1 40-kDa protein are underlined. Not all amino acid residues in the underlined peptides were resolved by peptide sequencing. The unambiguously determined amino acid residues are as follows: peptide 44A, EQ XXXSPTSALQ; peptide 25A, FKDEK; peptide 21A, XKDEKK; peptide 41A, LLPGADTK; peptide 8B, QEPSGYAPAQHPF; peptide 25B, XYDVNYGYG XXPA XNYY. B, partial sequences of several SGF-1 cDNAs. Left, the extreme 5′-ends of SGF-1 cDNAs 6, 9, 13, and 5. The clones differ by only a few nucleotides and therefore may correspond to (nearly) full-length cDNAs. Right, the sequences of clones 6, 9, 13, and 3 represent the extreme 3′-ends of the respective clones. The sequence of clone 5 maps approximately 0.8 kb from the actual 3′-end of this clone. The underlined putative polyadenylation signal was probably used by clones 6, 3, 9 (and 13) but not by clone 5.



The SGF-1 cDNAs 6, 9, and 13 revealed essentially identical restriction maps when digested with PstI, XhoI, and KpnI enzymes (data not shown). The 5′-ends of these clones were partially sequenced and found to differ by only a few bases in length (Fig. 5 B and data not shown). The cDNA library was not amplified, and therefore these three cDNAs likely correspond to (nearly) full-length copies.

The 3′-untranslated end of SGF-1 cDNA 9 (and other cDNAs) differs from that of clone 6; four base pair substitutions were found within the last 100 base pairs (Fig. 5 B and data not shown). Sequences of the remaining portions of the 3′-untranslated region of clone 9 were not determined, but no differences were noticed after restriction mapping. We sequenced the entire ORF and the 5′-untranslated end of the SGF-1 cDNA 9 and found that this sequence was identical with the corresponding region of clone 6 (data not shown).

Restriction mapping revealed that the 3′-end of clone 5 contains an extra 0.8-kb stretch; this sequence was not present in the other clones shown in Fig. 5 B. Partial sequencing of this region indicated that the 3′-end of clone 5 is merely an extension of the 3′-end found in the other cDNAs (Fig. 5 B and data not shown). The 5′-end and the ORF of cDNA 5 were sequenced except for the conserved fork head domain. No difference was found between the coding regions of clones 5 and 6 (data not shown).

The SGF-1 cDNA clones 6 and 9 differ by base substitutions in their 3′-untranslated regions (see above). This is not due to the existence of two closely related SGF-1 genes, since a Southern blot of genomic DNA showed a single hybridizing band in most restriction enzyme digests (Fig. 6 A). Therefore, we speculate that the observed differences between the two cDNA clones reflect the occurrence of two slightly different SGF-1 alleles.

SGF-1 Expression

We were interested to see whether SGF-1 was expressed specifically or predominantly by MSG and whether the time course of its expression was consistent with its presumed role as a regulator of the sericin-1 gene. We detected two SGF-1 transcripts in MSG poly(A)+ RNA preparations. The major transcript (approximately 5 kb) was present in MSG of all stages examined, while the less abundant one (approximately 6 kb) was found in MSG of molting larvae and, at extremely low levels, also in MSG of 2-day-old final instar larvae (Fig. 6 B, lanes 2 and 3). Thus, SGF-1 mRNA appearance preceded the massive expression of the sericin-1 gene in the last larval instar (16), and the major 5-kb SGF-1 transcript was maintained throughout the last instar. The 5-kb SGF-1 mRNA could also be detected in the control tissue (Fig. 6 B, lane 5), although its relative amount was very low (see legend to Fig. 6, B and C).


Figure 6: Evidence that SGF-1 is a single copy gene and that its transcripts are predominantly expressed by MSG. A, Southern blot analysis. Bombyx Sho-Wa strain genomic DNA was digested with EcoRI (lane 1), PstI (lane 2), PstI and XhoI (lane 3), or XhoI alone (lane 4). The restriction enzyme digests were resolved on an agarose gel, blotted, and probed with a fragment of the SGF-1 cDNA 6 (nucleotides 1-325, Fig. 5 A). The presence of a single hybridizing band in most lanes suggests that SGF-1 is a single copy gene. B, Northern blot analysis. Poly(A)+ RNAs were prepared from MSG of animals from the penultimate larval instar staged 72 h after the third ecdysis (lane 1), from MSG of the E2 stage in the fourth molt (lane 2; staged approximately 24 h later than the previous stage, see Ref. 24 for details), from MSG of 2-day-old last instar larvae (lane 3), from MSG of 5-day-old last instar larvae (lane 4), and from ovary of 4.5-day-old prepupae (lane 5). The RNAs were hybridized to the same SGF-1 probe as in A. The major 5-kb transcript is prominent in all MSG samples and is present as an extremely weak band in the ovary (lane 5). The minor 6-kb transcript is clearly observed in the E2 stage (lane 2) and is greatly diminished in the next stage (lane 3). C, the same Northern blot as in B rehybridized with a Bombyx cytoplasmic actin oligonucleotide probe. We used a Fuji Bas2000 image analyzer to calculate the amount of the major SGF-1 transcript after normalization to the actin level. The amount of SGF-1 5-kb mRNA present in the last instar is roughly constant: approximately 2 times higher than in the fourth instar stage and 40 times higher than in ovary. We excluded the molting stage from our calculations, since we feel that the actin level is not an appropriate marker for this stage.
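The actin correction mentioned in the legend to Fig. 6 amounts to dividing each SGF-1 band intensity by the actin intensity of the same lane before comparing samples. The sketch below illustrates that calculation under stated assumptions and is not the authors' analysis: every signal value is a made-up placeholder in arbitrary units, chosen only so that the resulting ratios land on the approximate fold differences quoted in the legend, and the function and sample names are hypothetical.

```python
# Illustrative sketch of loading-control normalization; not the authors' code.
# ASSUMPTION: all signal values below are made-up placeholders in arbitrary
# units. They are NOT measurements from the Northern blot in Fig. 6.

def normalize(target_signal, control_signal):
    """Express a target band intensity relative to its loading control."""
    if control_signal <= 0:
        raise ValueError("control signal must be positive")
    return target_signal / control_signal

# Hypothetical (SGF-1, actin) band intensities for three samples.
samples = {
    "last_instar_msg": (1000.0, 100.0),
    "fourth_instar_msg": (500.0, 100.0),
    "ovary": (25.0, 100.0),
}

normalized = {name: normalize(t, c) for name, (t, c) in samples.items()}

# Fold difference of the last instar level over each sample.
reference = normalized["last_instar_msg"]
fold_vs_sample = {name: reference / value for name, value in normalized.items()}

for name, fold in fold_vs_sample.items():
    print(f"{name}: last instar level is {fold:.1f}x this sample")
```
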




DISCUSSION

SGF-1 Proteins and cDNA Clones

We have identified two closely related SGF-1 proteins of observed molecular masses of 40 and 41 kDa, respectively. They both yielded retarded EMSA complexes that comigrated with the complex generated by SGF-1 in crude extract (Fig. 3). Furthermore, the interaction of the SGF-1 (40 kDa) and SGF-1 (41 kDa) proteins with the methylated SA site appears identical when compared both with each other and with the SGF-1 protein present in crude extract (Fig. 4). In fact, the results of the methylation interference assay practically exclude the possibility that other protein(s), different from the 40- and 41-kDa ones, participate in the formation of the tissue- and sequence-specific SGF-1 complex in crude extract. The SGF-1 40- and 41-kDa proteins co-elute in a single fraction from reverse phase chromatography (Fig. 2 A and data not shown), further indicating their high similarity.


Figure 3: Evidence that the purified SGF-1 proteins correspond to the SGF-1 factor in crude extract (electrophoretic mobility shift assay). EMSA was performed with 5 fmol of labeled SA oligonucleotide (Fig. 1) and either 10 µg of crude MSG extract (lane 1) or unknown but equal amounts of SGF-1 40-kDa protein (lanes 2-4) or unknown but equal amounts of SGF-1 41-kDa protein (lanes 5-7). To obtain SGF-1 proteins, the reverse phase fraction was analyzed by SDS-PAGE (Fig. 2 A, lane 4), and the 40- and 41-kDa SGF-1 polypeptides were excised and renatured as described under "Experimental Procedures." Lanes 3 and 6 contain 500 pmol of unlabeled SA oligonucleotide (efficient competitor containing functional SA site, Fig. 1), and lanes 4 and 7 contain 500 pmol of unlabeled SAM1 oligonucleotide (inefficient competitor bearing mutated SA site, Fig. 1).



We have analyzed over a dozen cDNA clones including several likely full-length cDNAs (Fig. 5 and data not shown). These clones fall into three categories represented by SGF-1 cDNAs 6, 9, and 5. Based on a Southern blot hybridization, all of them originate from a single gene (Fig. 6 A) and possess an identical ORF encoding a 38.8-kDa protein. This protein is a new member of the fork head family and contains all peptide sequences found in the SGF-1 (40 kDa) protein fragments (Fig. 5 A and legend). As discussed above, the SGF-1 (40 kDa) and SGF-1 (41 kDa) proteins are likely to be close relatives. If they arose from different mRNAs, one would expect the presence of cDNA clones encoding different protein products. However, no such different cDNAs were obtained. We therefore assume that both the 40- and 41-kDa (apparent molecular mass) proteins correspond to the 38.8-kDa product predicted from the cDNA sequences. One possibility is that the 41-kDa protein is a post-translationally modified version of the 40-kDa polypeptide. The 1.2-kDa difference between 40 and 38.8 kDa is well within the limits of experimental error of molecular mass determination from SDS-PAGE.

We did not study the differences between cDNAs 6, 9, and 5 in detail. Our presumption is that the 3.1-kb-long clones 6 and 9, which differ by base substitutions at the 3′-end, are full-length cDNAs (Fig. 5 B), representing two slightly different alleles of the SGF-1 gene. Both of these clones could arise by the usage of the most proximal polyadenylation signal (Fig. 5 B) and likely correspond to the major transcript observed on the Northern blot (Fig. 6 B). The 3.9-kb-long clone 5 contains an additional 0.8 kb at its 3′-end, and its restriction map and partial sequence suggest that it is an extension of clone 9 (Fig. 5 B and data not shown). It is likely that a more distant polyadenylation site was used in the case of cDNA 5. This clone may correspond to the 6-kb transcript detected on the Northern blot (Fig. 6 B, lanes 2 and 3).

Regulation of Genes Coding for Silk Proteins by SGF-1

SGF-1 is a new member of the fork head/HNF-3 family. This family contains HNF-3 factors α, β, and γ, which regulate transcription of tissue-specific genes in rodent liver and several other organs derived from the embryonic gut (14). The other two members of this family are Xenopus XFKH-1 (17) and Drosophila fork head (6), which are developmental regulators. Members of this family are characterized by a 110-amino acid-long conserved DNA binding motif (see Ref. 18). This motif is highly conserved also in SGF-1 (Lys-Glu in Fig. 5 A, domain I in Fig. 7), indicating that SGF-1 is a true homologue of Drosophila fork head and mammalian HNF-3 rather than a homologue of some other Drosophila (19) or mammalian (7, 20) fork head protein. In addition, the SGF-1 binding site (Fig. 4 B) closely resembles some sequences recognized by HNF-3 (21, 22). Two additional domains (Leu-Leu and Asp-Leu, Fig. 5 A; domains II and III, respectively, in Fig. 7) are conserved between Drosophila fork head and the three mammalian HNF-3 factors. These two domains are transactivation domains required for transcriptional stimulation by HNF-3 (23). Both of these domains can be recognized in SGF-1 (Fig. 7). Except for two gaps, domain II is well conserved between HNF-3 and SGF-1. It is likely that this conservation of structure is paralleled by conservation of function; these regions may be required for target gene transactivation. Further biochemical evidence is necessary to address this question.


Figure 7: Comparison of SGF-1 with other members of the fork head/HNF-3 family. The three domains conserved between Drosophila fork head protein (FKH) and mammalian HNF-3 factors are present also in SGF-1. The domain boundaries in SGF-1 are Lys and Glu (I), Leu and Leu (II), and Asp and Leu (III). Domain I is the DNA binding domain, and domains II and III participate in transactivation by HNF-3 (23).



Drosophila fork head is a homeotic gene required for differentiation of embryonic termini. Aside from early terminal domains in ectoderm, fork head is expressed in four additional tissues: the developing midgut, salivary glands, nervous system, and the yolk nuclei. With the exception of midgut, expression persists until very late stages of embryonic development (6). Salivary glands are missing in fork head mutant embryos (cited in Ref. 6). Lepidopteran silk glands are homologous to the salivary glands of Diptera, and SGF-1 is expressed in Bombyx embryos. We speculate that the SGF-1 protein may be initially required for the development of silk glands and subsequently utilized in the control of genes coding for silk proteins. Indeed, besides the sericin-1 gene, SGF-1 probably interacts with an upstream region of the fibroin-H gene (2), and there are putative SGF-1 binding sites in the regulatory regions of several other genes coding for silk components. To our knowledge, no target genes for the Drosophila fork head protein have been identified to date.

There is conclusive evidence that SGF-1 stimulates sericin-1 gene transcription via interaction with the SA site (3, 4). The SGF-1 transcription-stimulating activity was ascribed to a DNA binding protein that forms the major, and the only sequence-specific, retarded complex detected by EMSA with the SA site-containing probe in crude extract. The SGF-1 complex is tissue specific, being most pronounced when extract from middle or posterior silk glands is used (3, 4). We propose that the SGF-1 protein, which we have cloned in the present study, is involved in sericin-1 gene regulation because 1) this is the protein that forms the sequence-specific complex with the SA site probe in crude extract (Figs. 3 and 4), 2) high SGF-1 mRNA levels are found in silk glands of animals of the appropriate developmental stages but not in the control tissue (Fig. 6 B), and 3) SGF-1 is a new member of an established family of transcriptional regulators, possesses putative transactivation domains, and its Drosophila homologue is required for the development of salivary glands (see above).

Nevertheless, a direct test, preferably an in vitro transcription assay, is needed to study the function of the recombinant SGF-1 protein and its domains in stimulating sericin-1 promoter activity. Other questions concern the putative roles of SGF-1 in the control of other genes coding for silk components and possible interactions of SGF-1 with other protein factors, namely SGF-3 (5).


FOOTNOTES

*
This research was supported in part by grants-in-aid for Research of Priority Areas and for Japanese Society for the Promotion of Science Fellows from the Ministry of Education, Science, and Culture of Japan. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

The nucleotide sequence(s) reported in this paper has been submitted to the GenBank/EMBL Data Bank with accession number(s) D38514.

§
Present address: Institute of Entomology, Czech Academy of Sciences, Branisovska 31, Ceske Budejovice, Czech Republic.

¶
Present address: Research Center for Molecular Genetics, Hokkaido University, North 10, West 8, Kita-ku, Sapporo 060, Japan.

**
To whom all correspondence should be addressed. Tel.: 81-564-55-7560; Fax: 81-564-55-7566.

The abbreviations used are: SGF-3, silk gland factor-3; SGF-1, silk gland factor-1; MSG, middle silk glands; EMSA, electrophoretic mobility shift assay; ACN, acetonitrile; ORF, open reading frame; PAGE, polyacrylamide gel electrophoresis; kb, kilobase(s); HPLC, high pressure liquid chromatography.

S. Takiya, unpublished results.

The actual identity of this clone was not clear at that time. Later in the course of the present study, it became evident that this embryonic clone actually codes for SGF-1 (S. Takiya, unpublished data).

Almost identical 5'-ends were found in four of the clones listed in Fig. 5 B (6, 9, 13, and 5) and in four embryonic SGF-1 cDNAs (S. Takiya, unpublished data). Comparison of the 3'-ends of clones 9 and 3 with clone 5 (Fig. 5 B) strongly suggests that the oligo(A) stretches found at the 3'-ends of clones 9 and 3 were added by polyadenylation. Taken together, these data indicate that clones 6, 9, 13, and 5 are full-length cDNAs. The full-length cDNA clones are 3.1 or 3.9 kb long. Northern blot analysis revealed transcripts of apparent lengths of 5 and 6 kb. The first 1300 bases of SGF-1 cDNAs are rich in GC stretches and possess a potential for forming secondary structures. We presume that the discrepancy observed between the lengths of SGF-1 cDNA clones and their transcript sizes is most likely explained by our failure to denature SGF-1 mRNAs completely.

SGF-1 mRNA is also expressed in posterior silk gland in the last and penultimate larval instars in a manner similar to its expression in MSG. The major transcript is also present in embryonic silk gland and a few other embryonic tissues (H. Kokubo and S. Takiya, unpublished data).


ACKNOWLEDGEMENTS

We thank Dr. C.-c. Hui for Bombyx genomic DNA; Dr. H. Kokubo for the actin oligonucleotide probe and useful discussion; H. Kajiura and Y. Makino for peptide sequencing; Dr. M. Yoshikuni for help with operating the Smart system HPLC; C. Inoue, M. Ohkubo, E. Suzuki, and M. Sasaki for assistance; Prof. G. Eguchi, Dr. M. Mochii, and Prof. Y. Nagahama for the access to Smart system HPLC, and Drs. X. Xu, K. Nakai, and P.-X. Xu for useful discussion. We are especially grateful to Dr. M. Jindra for his careful revision of the manuscript (grammar and style).


REFERENCES
  1. Suzuki, Y. (1994) Int. J. Dev. Biol. 38, 231-235
  2. Hui, C., Matsuno, K., and Suzuki, Y. (1990) J. Mol. Biol. 213, 651-670
  3. Matsuno, K., Hui, C., Takiya, S., Suzuki, T., Ueno, K., and Suzuki, Y. (1989) J. Biol. Chem. 264, 18707-18713
  4. Matsuno, K., Takiya, S., Hui, C.-c., Suzuki, T., Fukuta, M., Ueno, K., and Suzuki, Y. (1990) Nucleic Acids Res. 18, 1853-1858
  5. Fukuta, M., Matsuno, K., Hui, C., Nagata, T., Takiya, S., Xu, P.-X., Ueno, K., and Suzuki, Y. (1993) J. Biol. Chem. 268, 19471-19475
  6. Weigel, D., Jurgens, G., Kuttner, F., Seifert, E., and Jäckle, H. (1989) Cell 57, 645-658
  7. Clevidence, E. D., Overdier, D. G., Tao, W., Qian, X., Pani, L., Lai, E., and Costa, R. H. (1993) Proc. Natl. Acad. Sci. U. S. A. 90, 3948-3952
  8. Inomata, Y., Kawaguchi, H., Hiramoto, M., Wada, T., and Handa, H. (1992) Anal. Biochem. 206, 109-114
  9. Ausubel, F. M., Brent, R., Kingston, R. E., Moore, D. D., Seidman, J. G., Smith, J. A., and Struhl, K. (1987) Current Protocols in Molecular Biology, Greene Publishing Associates and Wiley-Interscience, New York
  10. Ortiz, M. L., Calero, M., Patron, C. F., Castellanos, L., and Mendez, E. (1992) FEBS Lett. 296, 300-304
  11. Rosenfeld, J., Capdevielle, J., Guillemot, J. C., and Ferrara, P. (1992) Anal. Biochem. 203, 173-179
  12. Sambrook, J., Fritsch, E. F., and Maniatis, T. (1989) Molecular Cloning: A Laboratory Manual, 2nd Ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY
  13. Pearson, W. R., and Lipman, D. J. (1988) Proc. Natl. Acad. Sci. U. S. A. 85, 2444-2448
  14. Lai, E., Prezioso, V. R., Tao, W. F., Chen, W. S., and Darnell, J. E., Jr. (1991) Genes & Dev. 5, 416-427
  15. Cavener, D. R. (1987) Nucleic Acids Res. 15, 1353-1361
  16. Obara, T., and Suzuki, Y. (1988) Dev. Biol. 127, 384-391
  17. Dirksen, M. L., and Jamrich, M. (1992) Genes & Dev. 6, 599-608
  18. Clark, K. L., Halay, E. D., Lai, E., and Burley, S. K. (1993) Nature 364, 412-420
  19. Hacker, U., Grossniklaus, U., Gehring, W., and Jäckle, H. (1992) Proc. Natl. Acad. Sci. U. S. A. 89, 8754-8758
  20. Kaestner, K. H., Lee, K. H., Schlondorff, J., Hiemisch, H., Monaghan, A. P., and Schutz, G. (1993) Proc. Natl. Acad. Sci. U. S. A. 90, 7628-7631
  21. Paulweber, B., Sandhofer, F., and Levy-Wilson, B. (1993) Mol. Cell. Biol. 13, 1534-1546
  22. Lemaigre, F. P., Durviaux, S. M., and Rousseau, G. G. (1993) J. Biol. Chem. 268, 19896-19905
  23. Pani, L., Overdier, D. G., Porcella, A., Qian, X., Lai, E., and Costa, R. H. (1992) Mol. Cell. Biol. 12, 3723-3732
  24. Kiguchi, K., and Agui, N. (1981) J. Insect. Physiol. 27, 805-812
