©1996 by The American Society for Biochemistry and Molecular Biology, Inc.
Membrane Topology of the Human Na+/Glucose Cotransporter SGLT1 (*)

(Received for publication, August 15, 1995; and in revised form, November 11, 1995)

Eric Turk (§) Cynthia J. Kerner M. Pilar Lostao (¶) Ernest M. Wright

From the Department of Physiology, UCLA School of Medicine, Los Angeles, California 90095-1751

ABSTRACT

The membrane topology of the human Na+/glucose cotransporter SGLT1 has been probed using N-glycosylation scanning mutants and nested truncations. Functional analysis proved essential for establishment of signal-anchor topology. The resultant model diverges significantly from previously held suppositions of structure based primarily on hydropathy analysis. SGLT1 incorporates 14 membrane spans. The N terminus resides extracellularly, and two hydrophobic regions form newly recognized membrane spans 4 and 12; the large charged domain near the C terminus is cytoplasmic. This model was evaluated further using two advanced empirically based algorithms predictive of transmembrane helices. Helix ends were predicted using thermodynamically based algorithms known to predict x-ray crystallographically determined transmembrane helix ends. Several considerations suggest the hydrophobic C terminus forms a 14th transmembrane helix, differentiating the eukaryotic members of the SGLT1 family from bacterial homologues. Our data inferentially indicate that these bacterial homologues incorporate 13 spans, with an extracellular N terminus. The model of SGLT1 secondary structure and the predicted helix ends provide information prerequisite for the rational design of further experiments on structure/function relationships.


INTRODUCTION

SGLT1 is one of a family of homologous Na+-dependent membrane transport proteins currently comprising six eukaryotic and two bacterial homologues and several bacterial genomic open reading frames (1, 2). This family has been assigned membership in a larger superfamily of (nonhomologous) families sharing a common general function, the sodium-coupled transport of solutes, and a common predicted structure of 12 membrane spans (2). The secondary and tertiary structures that underlie the efficient and specific transmembrane shuttling of substrates are wholly unknown.

One step in transporter structural analysis is the determination of membrane topology of hydrophilic and hydrophobic regions. Knowledge of structural domain topology is a prerequisite for the rational design of experiments intended to localize sodium and substrate binding sites. Plots of hydrophobicity (3, 4) provide good indications of transmembrane regions but are subject to interpretation; e.g., analyses of the Escherichia coli lac permease predict between 8 and 14 membrane spans (5).

SGLT1 was originally proposed to form 11 transmembrane domains (6). Subsequent models have deleted or added spans (7, 8, 9). We report the physical determination of SGLT1 membrane topology by N-glycosylation scanning mutagenesis, a method used to map several membrane proteins, including hydroxymethylglutaryl-CoA reductase (10), cystic fibrosis transmembrane conductance regulator (11), and the Glut 1 facilitative glucose transporter (12).


EXPERIMENTAL PROCEDURES

SGLT1 Mutants

The endogenous N-glycosylation sequence of human SGLT1 (13) was eliminated by the mutation N248Q to simplify interpretation and avoid potential interference with the introduced N-glycosylation consensus (14).

The nested N-glycosylation sequence NNSS was inserted into wild-type SGLT1 at strategic locations, co-introducing an XhoI restriction site (NNSS mutants, see Fig. 1 and Fig. 2). Insertion sites were chosen such that flanking amino acids were among those found at N-glycosylated sites (15). Site-directed mutagenesis used a variation (16) of the Taq DNA polymerase-mediated megaprimer method (17). Mutant clones were screened for Taq-introduced errors by dideoxy sequencing (Sequenase 2.0 kit, U.S. Biochemical Corp.), and NNSS-bearing restriction fragments were swapped into the N248Q mutant, yielding the NNSS mutant series.


Figure 1: N-Glycosylation mutant scanning of the secondary structure model (9) of human SGLT1. N-glycosylation consensus sequences were inserted at the points numbered 1 to 12 in a mutant (N248Q) of SGLT1 that lacked the native N-glycosylation site. Insertion sites were chosen to reside between two putative membrane spans or between a span and one of two hydrophobic regions labeled H1 and H2.




Figure 2: Insertion sites of two sets of N-glycosylation scanning mutants of SGLT1. A, the residues flanking the 12 NNSS insertions generated by site-directed mutagenesis of the mutant N248Q (human SGLT1, which lacks the native N-glycosylation site) are listed. The 12 NG mutants were made from the NNSS mutants by splicing the 42-amino acid hydrophilic loop of SGLT1 bearing the native N-glycosylation consensus into the introduced XhoI site. B, the DNA sequence of the two nested N-glycosylation consensus sequences of the NNSS mutant series, and the 42-amino acid sequence of the NG mutant series as it appears after splicing into the XhoI site of the NNSS mutants. The introduced XhoI site is underlined in the NNSS insert; two XhoI sites flanking the NG insert are shown. The NG mutants also retained amino acids (underlined) introduced in the parent NNSS mutant. The resultant NG mutants had N-glycosylation consensus sequences in the two regions highlighted in boldface.



Native glycosylation (NG) (^1) mutants bearing a transposed copy of the natively N-glycosylated loop of SGLT1 (see Fig. 2) were constructed by splicing into the XhoI site of each NNSS mutant. This 42-amino acid-encoding loop of human SGLT1 was synthesized by polymerase chain reaction, incorporating flanking XhoI sites. The XhoI fragment was ligated into each NNSS mutant, and transformants were screened for orientation and sequence fidelity.

In Vitro Synthesis of mRNA

mRNA was synthesized from templates of XbaI-linearized mutant plasmid (0.5 µg) in 10-µl reactions using the Ambion T3 Megascript kit (Austin, TX). Truncated mRNAs were transcribed from template DNA cut with the restriction enzymes indicated under "Results." Reactions included 7.5 mM each ATP, CTP, and UTP, 3.75 mM GTP, 7.35 mM cap analog, and 20 units of RNase-Block (Stratagene). RNA quality was assessed by formaldehyde-agarose gel electrophoresis (18).

In Vitro Translation of mRNA

mRNA was translated (0.4 µg in a 10-µl reaction) with rabbit reticulocyte lysate and dog pancreatic microsomes in the presence of [35S]methionine using the kit provided by Promega (Madison, WI). SDS-PAGE on 10% gels was done as described previously (19). Microsomes bearing translated truncation peptides (see Fig. 4C) were pelleted, after 17-fold dilution with phosphate-buffered saline, pH 7.6, by ultracentrifugation for 15 min at maximum speed in a Beckman Airfuge. Pellets were resuspended in 32 µl of phosphate-buffered saline. To open microsomes and remove nonintegral proteins, 10-µl aliquots were diluted with 100 µl of 0.2 M Na2CO3 (20). After 5 min at 22 °C, microsomal sheets were pelleted in the Airfuge. Supernatant proteins were precipitated with 200 µg of SDS and 1 volume of 3 M KAc, 2 M HAc. Samples were electrophoresed on 10-20% gradient gels in Tricine/SDS (ISS Inc., Natick, MA) and autoradiographed with Kodak Biomax film.


Figure 4: Experiments elucidating the topology of the N-terminal region. A, immunostained Western blots of detergent extracts, either treated (+) or not treated (-) with the glycopeptidase PNGase F, of Xenopus oocytes expressing native SGLT1, the nonglycosylated mutant N248Q, or the N-glycosylation mutant NNSS13 (NNSS sequence after Asp2). B, schematic representation of anticipated peptides resulting from in vitro translation of mRNA synthesized from native SGLT1 cDNA templates cleaved with one of three indicated restriction enzymes. C, autoradiogram of SDS-PAGE of peptides synthesized in the presence of [35S]methionine in vitro from progressively truncated mRNAs of native SGLT1 cDNA. mRNAs were synthesized from native SGLT1 cDNA templates cleaved with BamHI, BsmI, or MscI (lanes 1-3, respectively). Microsomes were pelleted after protein translation and directly applied to the gel (lanes a), or were washed at high pH to remove secreted or adsorbed translation products. Lanes b and c represent pelleted membrane sheets and soluble protein in the alkaline supernatant, respectively.



Oocyte Expression of mRNA

Xenopus laevis oocytes were injected with mRNA (21), and after 3-4 days they were homogenized in 100 mM NaCl, 20 mM Tris·Cl, pH 7.6, 1% Elugent (Calbiochem, San Diego, CA), using 20 µl/oocyte. After 5 min on ice, the homogenates were centrifuged for 3 min at 16,000 × g, 22 °C, and 85% of the supernatant was collected, avoiding floating lipids, and mixed with 0.33 volume of 4× SDS sample buffer (19). Samples were stored at -80 °C or used immediately. SDS-PAGE and Western blotting were done as described previously (22).

Peptidyl N-Glycanase F (PNGase F) Digestion

Aliquots (10 µl) of oocyte extract in SDS (above) were mixed with 40 µl of 100 mM NaCl, 10 mM Tris·Cl, pH 8, 1 mM EDTA and denatured at 90 °C for 1 min. After cooling, they were mixed with 1 volume of 5 mM EDTA, 1.5% Lubrol PX, 40 mM HEPES, 40 mM Tris base, 1 µg/µl aprotinin, 50 ng/µl leupeptin, 50 ng/µl pepstatin, 1 mM phenylmethylsulfonyl fluoride, and incubated with 0.4 units of PNGase F (Boehringer-Mannheim) at 37 °C overnight.

Immunostaining of Western Blots

Nitrocellulose Western blots were dried, fixed briefly in 15% 2-propanol, 5% acetic acid, and dried again at 37 °C. They were blocked with PB (0.05% Tween 20, 1% polyvinylpyrrolidone, Mr 40,000 (23), in phosphate-buffered saline, pH 7.1). Nonspecific staining of oocyte protein bands was reduced by immersing blots for 5 min in 0.005% Alizarin red, 0.005% calmagite in PB (^2) and then washing them extensively in PB. Blots were probed with a 1/3000 dilution of antibody 8821 (22). After five washes in PB minus polyvinylpyrrolidone, blots were incubated with a 1/10,000 dilution of peroxidase-conjugated secondary antibody and washed. Chemiluminescence was detected with the Renaissance kit (DuPont NEN) by exposure of Hyperfilm-MP (Amersham).

Transport Measurements

Transport function of oocyte-expressed mutants was assessed by radiotracer uptakes from 50 µM D-[α-methyl-14C]glucopyranoside as described (24).

Computer-assisted Analyses

Transmembrane helices of SGLT1 were predicted using the neural network PredictProtein (25) and the MEMSAT algorithm (26). Prediction of helix ends by interfacial hydrophobicity/reverse turn propensity (IFH/RT) was done (27) using IFH values of Jacobs and White (28) and RT propensities of Table V of Levitt (29; also see 27). Scanning-window averages of IFH and RT were calculated using Excel 5.0 (Microsoft) and graphed in Deltagraph Pro3 (DeltaPoint), and the graphs were arranged in Canvas 3.5 (Deneba) on a Macintosh Quadra 950.


RESULTS AND DISCUSSION

Previous experiments demonstrated that only the first (Asn-248) of two N-glycosylation consensus sequences found in native SGLT1 is glycosylated (13), indicating their extra- and intracellular dispositions, respectively. To extend these findings, insertion sites in human SGLT1 for introduction of N-glycosylation consensuses (NX(T/S)) were selected between putative membrane spans (9) and two other hydrophobic domains (Fig. 1, H1 and H2).

N-Glycosylation Mutants, First Series

The sequence Asn-Asn-Ser-Ser was inserted into one of 12 sites in Asn-248 → Gln SGLT1 (mutants NNSS1-NNSS12, Fig. 1 and Fig. 2). The sequence NNSS and the insertion sites were chosen to maximize recognition by N-oligosaccharyltransferase in the endoplasmic reticulum lumen. NNSS encodes two nested consensuses for increased probability of steric accessibility. Consensuses were inserted between residues commonly flanking naturally N-glycosylated sites (15).
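
The nesting of two consensuses within NNSS can be illustrated with a short sketch. It assumes the usual reading of the N-X-(S/T) consensus in which X is any residue except proline; that exclusion, the function name find_sequons, and the test strings are illustrative assumptions and not part of the methods reported here.

```python
import re

# N-X-(S/T) consensus. The zero-width lookahead lets overlapping
# ("nested") sites be reported separately.
# Assumption: X may be any residue except proline.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(seq):
    """Return 0-based positions of the Asn of every N-X-(S/T) match."""
    return [m.start() for m in SEQUON.finditer(seq.upper())]


print(find_sequons("NNSS"))  # [0, 1]: Asn-Asn-Ser and Asn-Ser-Ser overlap
print(find_sequons("NPSS"))  # []: proline at X, and no second Asn
```

The two hits in NNSS correspond to the two nested consensuses, each offering an independent chance of recognition.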

NNSS mutants were expressed in Xenopus oocytes. Glycosylation status was assessed by mobility comparisons on immunostained Western blots, of peptidyl N-glycanase F-treated and nontreated oocyte detergent extracts. Despite measures to maximize consensus recognition, only NNSS3 and NNSS7 were glycosylated (Fig. 3A; NNSS7 not shown). Glycosylation of NNSS7 verified this domain's putative extracellular location, whereas NNSS3 glycosylation contradicted this consensus' putative intracellular disposition. Glycosylation profiles of NNSS mutants expressed in vitro in reticulocyte lysate with pancreatic microsomes (not shown) paralleled those obtained from oocytes. Presumably, steric factors such as membrane proximity prevented consensus recognition and glycosylation of most NNSS consensuses presented to the endoplasmic reticulum lumen.


Figure 3: Immunostained Western blots of detergent extracts of Xenopus oocytes expressing N-glycosylation scanning mutants. A, oocytes expressing native SGLT1, the nonglycosylated mutant N248Q, or one of four NNSS N-glycosylation scanning mutants. NNSS mutants lacked the wild-type N-glycosylation site. The extracts were either treated (+) or not treated (-) with the glycopeptidase PNGase F. Arrows denote N-glycosylated species. B, oocytes expressing 12 NG N-glycosylation scanning mutants. NG mutants lacked the wild-type N-glycosylation site and were derived from NNSS mutants by ligation of the natively glycosylated 42-amino acid external loop of SGLT1 into the XhoI site at the NNSS insertion, increasing the mutant's Mr by 5.5 kDa above that of N248Q. N-I indicates noninjected oocytes.



N-Glycosylation Mutants, Second Series

Mutants with an insertion intended to be more recognizable by N-oligosaccharyltransferase were constructed. DNA encoding the natively glycosylated 42-amino acid hydrophilic domain of SGLT1 was synthesized by polymerase chain reaction (see "Experimental Procedures") and ligated into the XhoI site of each NNSS mutant (Fig. 1 and Fig. 2). Mutants NG1-NG12 contained the 48-amino acid cumulative insertion NNSSDAFMEKYMKAIPTIVSDGNTTFQEKCYTPRADSFHIFRDPLTSS (N-glycosylation consensuses are in boldface; 42-amino acid loop is italicized).

Immunoblots of oocyte-expressed NG mutants, ± PNGase F treatment, are shown in Fig. 3B. Results were as follows: (i) the domains glycosylated in the NNSS3 and NNSS7 mutants were also glycosylated in their NG mutant counterparts (NG3 and NG7); (ii) NG2 was also glycosylated, however, suggesting no membrane span between NG2 and NG3 (but see below); (iii) NG5 was glycosylated, and NG4 was not, indicating that the intervening hydrophobic region H1 forms a transmembrane span, consistent with concurrent extracellularity of the NG3 and wild-type glycosylation sites; (iv) NG11 was glycosylated and NG10 was not, indicating the intervening hydrophobic region H2 forms a newly identified membrane span, as predicted in one model (8); (v) NG12 was not glycosylated, indicating the large ionized C-terminal region is cytoplasmic; and (vi) the remaining three putatively external hydrophilic regions, topologically mandated by the sum of all previous evidence to reside extracellularly, were each glycosylated (NG5, NG9, and NG11). The 48-amino acid insertion of the NG mutants was thus, as intended, sterically recognizable for N-glycosylation when presented to the endoplasmic reticulum lumen.

Mutant NG1 was predominantly nonglycosylated, but unexpectedly showed an N-glycosylated component (Fig. 3B), implying that the N terminus resides extracellularly, inconsistent with the glycosylation of NG2. The topogenic orientation imposed on the N terminus and adjacent hydrophobic signal-anchor of integral membrane proteins is evidently determined by the distribution of positive charge near the signal anchor (30, 31). Insertion of the large NG sequence (three positive, three negative charges) in mutants NG1 and NG2 near the signal-anchor MS1 possibly perturbed the initial translocation/orientation event.

N-Terminal Region Analysis, and Nested mRNA Truncations

Another NNSS mutant was constructed to resolve inconsistencies presented by glycosylations of NG1, NG2, and NNSS3/NG3. Mutant NNSS13 (Fig. 1 and Fig. 2) encoded a consensus at the amino terminus, distant from the signal anchor. Expression of NNSS13 in oocytes resulted in its full glycosylation (Fig. 4A). Most importantly, only NNSS13 was both glycosylated and retained full cotransporter function (see below), firmly establishing the extracellular localization of the amino terminus and also suggesting that the glycosylation of NG2 reflected mistranslocation of its signal anchor. An extracellular N terminus and the distribution of charged residues about MS1 are consistent with the empirical rule of N terminus membrane orientation determined by Lodish (30).

N-glycosylations of NNSS13, NG2, and NG3 were topologically inconsistent with two membrane spans flanking NG2, suggesting that the 48-amino acid insertion in NG2 had perturbed the membrane topology, or that spans MS1 and MS2 were actually extracellular domains. Therefore, three truncated mRNA species were synthesized (Fig. 4B), from SGLT1 templates cut with MscI, BsmI, or BamHI, yielding peptides of 82, 103, and 141 amino acids, respectively, upon translation in vitro (Fig. 4C). Microsomes of translation reactions were pelleted and resuspended, and aliquots were washed with pH 11.5 carbonate to open microsomes and remove secreted and adsorbed translation products (20). All three truncated peptides remained associated with the pelleted membrane fraction after alkaline washing, although a fraction of the smallest peptide (MscI), bearing two putative membrane spans, appeared in the supernatant. Significantly, the next truncation (BsmI) was fully retained in the membrane sheets, although no additional putative membrane spans were translated (cf. Fig. 4B). We conclude that the domain probed by NG2 resides intracellularly in native SGLT1, and that the 48-amino acid insertion in NG2 had perturbed translocation of the first transmembrane span, inverting local topology.
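
The topological inconsistency noted above follows from simple parity: each membrane span crossed flips the side of the membrane, and a glycosylated site must face the endoplasmic reticulum lumen (topologically extracellular). The sketch below is illustrative only. The span counts assigned to each site (0 for NNSS13, 1 for NG2, 2 for NG3) are read from the description of MS1 and MS2 flanking NG2, and all function names are hypothetical.

```python
def side_after_spans(n_term_side, spans_crossed):
    """Side of the membrane reached after crossing a number of spans."""
    other = {"out": "in", "in": "out"}
    return n_term_side if spans_crossed % 2 == 0 else other[n_term_side]


def conflicts(n_term_side, sites):
    """Return the sites whose glycosylation contradicts the model.

    sites maps name -> (spans preceding the site, glycosylated?).
    Only glycosylated sites are treated as evidence (they must face
    "out"); absence of glycosylation is not taken as proof of "in".
    """
    bad = []
    for name, (spans, glycosylated) in sites.items():
        if glycosylated and side_after_spans(n_term_side, spans) != "out":
            bad.append(name)
    return bad


# Assumed span counts: MS1 lies before NG2, and MS1 + MS2 before NG3.
observed = {"NNSS13": (0, True), "NG2": (1, True), "NG3": (2, True)}
print(conflicts("out", observed))  # ['NG2']
print(conflicts("in", observed))   # ['NNSS13', 'NG3']
```

Under either orientation of the N terminus at least one glycosylated site is misplaced, which is why the truncation experiments were needed to decide which observation reflected a perturbed topology.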

Fig. 5B schematizes derivation of a corrected secondary structure model from that in Fig. 1.


Figure 5: Summary and alignment of results of N-glycosylation mutant scanning of SGLT1, predictions of transmembrane helices by the neural network PredictProtein (25) and the program MEMSAT (26), and the results of a hydrophobicity/reverse-turn analysis of the transmembrane helix ends. The N-glycosylation status, after expression in Xenopus oocytes, of the mutants NNSS1-NNSS12 and NG1-NG12 is indicated in the paired boxes, with the NG status appearing above the NNSS status. Mutants that were N-glycosylated are indicated by a +. The two natively occurring N-glycosylation consensus sequences and their natural N-glycosylation status are indicated by the two triangles. Membrane spans predicted by the PredictProtein neural network are indicated by the hatched bars. The spans indicated by the upper hatched bars were predicted from the sequence alignment S6 of one isoform each of the six eukaryotic SGLT1 homologues, including human SGLT1, pSGLT2 (39), the rabbit Na+/nucleoside cotransporter (40), the human Na+/myoinositol cotransporter (41), and two sequences of unknown function, RK-D (42) and ST1 (43). The spans indicated by the lower hatched bars were predicted from a second sequence alignment S6B5, which includes the six eukaryotic homologues and five bacterial homologues, the Na+/proline cotransporter (38), the Na+/pantothenate cotransporter (44), and three hypothetical proteins known from genomic sequencing, with SwissProt numbers P32705 and P31448 and GenPept number X86084. MEMSAT predictions are shown for SGLT1 and the Na+/myoinositol cotransporter SMIT1 (upper and lower gradient-filled bars, respectively) using the following parameters: minloop, 2 amino acids; minhelix, 19 amino acids; maxhelix, 31 amino acids; and minhelixscore = -100. MEMSAT predictions of helix orientations are indicated by the gray-to-white (intracellular-to-extracellular) shading. Black bars show predictions of SGLT1 transmembrane helix ends (methodologies of White and Jacobs (27), applied as outlined in Fig. 6).




Figure 6: IFH and RT propensity analysis of human SGLT1. A, the IFH averages of a scanning 19-amino acid window are shown for three assumptions of side chain-side chain satisfaction of hydrogen bonds ranging from none (h = 0) to complete (h = 1). Helix centers of the transmembrane spans (MS1-MS13) correspond to the amino acid residue at the peak in a broad hydrophobic peak and are indicated by filled cross-marks 19 amino acids wide. Intracellular polar interhelix regions are indicated by a star. B, the RT propensity averages of a scanning window 19 amino acids wide. Helix centers of the transmembrane spans correspond to the amino acid of the minimum value in a broad RT minimum, and are indicated by unfilled cross-marks 19 amino acids wide. C, the IFH averages of a scanning trapezoidal window 5 amino acids wide are shown for h = 0, 0.5, and 1. The 5-amino acid window averages were iteratively scan-averaged using a 3-amino acid window, thereby trapezoidally weighting the central amino acid values of the resulting 7-amino acid window to smooth the curve. The window's nominal width at half-height remains 5 amino acids (see (27) ). Helix centers from (A) above are superimposed. Helix end zones are indicated by the gray boxes and were determined from the h = 0.5 curve. They are defined to lie between the edgemost secondary peak on a broad hydrophobic peak and the adjacent secondary peak. End zones that overlap completely are indicated by black boxes. D, RT(5) averages of a scanning trapezoidal window 5 amino acids wide are shown. Superimposed are the helix centers (cross-marks) localized in A and B above, and the helix end zones localized in (C). The RT(5) peak falling within each end zone was used to estimate the helix end, and is marked by a short vertical line.



Mutant Cotransporter Functional Activity

The 12 NNSS mutants were expressed in oocytes and assayed for cotransporter function by measuring radiotracer uptake from 50 µM D-[α-methyl-14C]glucopyranoside. The N248Q mutant, which lacks native N-glycosylation, was fully functional (295 ± 21 S.D. pmol oocyte⁻¹ h⁻¹) compared with native SGLT1 (305 ± 23 S.D. pmol oocyte⁻¹ h⁻¹), consistent with a previous report that N-glycosylation was unnecessary for function (13). Immunoblots of oocyte extracts indicated equivalent quantities of N248Q and wild-type protein synthesized. Preliminary electrophysiological results (not shown) indicated that the apparent affinities for α-methyl-D-glucopyranoside transport of the N248Q mutant and native SGLT1 were equal (Km = 0.47 ± 0.09 mM for N248Q and 0.49 ± 0.03 mM for wild type at -150 mV).

Ten of 13 NNSS mutants were devoid of cotransport activity. Only NNSS13, NNSS1, and NNSS11 retained activity, at levels of 100, 13, and 7% that of native SGLT1, respectively. It is interesting that a 4-amino acid insertion (2 amino acids in NNSS12) usually eliminated function. In contrast, four of four cystic fibrosis transmembrane conductance regulator scanning mutants (11) bearing small insertions were fully functional in mammalian cells, and 10 of 15 glucose carrier Glut 1 mutants (12) bearing a 41-amino acid insertion retained 2.5-30% deoxyglucose transport activity in oocytes. SGLT1 sensitivity to even small insertions likely reflects complex structural constraints imposed by the requirements of transport-coupling three solutes.

Computer Predictions of Membrane Topology

The unnatural topologies of NG1 and NG2 demonstrated that insertional probes of membrane topology can instead alter it, consistent with other work (31). Newer computer algorithms for topological predictions from the primary sequence can thus be of particular utility for assessment of experimentally determined topology. Two empirically based computer algorithms predictive of transmembrane α-helices were applied to evaluate the topology indicated by experiment. A third algorithm, based empirically and thermodynamically on the hydrophobic partitioning, hydrogen-bonding, and reverse-turn propensities of amino acid residues, was applied to SGLT1 in order to predict the actual transmembrane helix ends.

PredictProtein Algorithm

The PredictProtein neural network (25) accurately predicts transmembrane helices from multiple sequence alignments, ideally of homologues with differing levels of identity. The algorithm is trained with experimentally determined transmembrane helices. 95% of them are subsequently predicted correctly, 5% of globular proteins are incorrectly predicted to contain a membrane helix, and no helices are predicted for the 16 transmembrane β-strands of porin. Transmembrane predictions were made for two multiple sequence alignments of SGLT1 homologues, each representing a different range of sequence identity. Output per residue was a simple two-state determination (transmembrane helix/not transmembrane). The alignment "S6" included SGLT1 and five eukaryotic homologues of 75, 59, 54, 53, and 50% sequence identity to SGLT1 (Fig. 5); the alignment "S6B5" also included five bacterial homologues of 27, 26, 23, 21, and 20% identity. 13 membrane spans were predicted using either alignment (Fig. 5A). Span 2 was not predicted, and span 7 was rejected in secondary "filtered" predictions because too few contiguous amino acids scored a moderate transmembrane helix probability.

MEMSAT Algorithm

The program MEMSAT (26) utilizes empirical topological data from membrane proteins, assigning residues to one of five states (inner helix end/middle helix/outer helix end, and within cell/out-of-cell). Normalized statistical propensities (log likelihood ratios) of amino acid species occurrence in each state are incorporated in five frequency tables. (Chi-square evaluations show each state to be distinguishable from the other four at a confidence level of ≥98%, validating the arbitrary 5-fold division.) Each residue in a query protein is scored according to its state occurring in a series of dynamically generated models, with hypothetical topologies ranging from 1 to n (arbitrary) membrane spans. The highest scoring topology(s) represents the global optimum model(s).
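
The notion of a log likelihood ratio propensity can be made concrete with a toy calculation. The sketch below is a simplified illustration and not MEMSAT's implementation: the residue counts are invented, only three residue types are tabulated, and the natural logarithm of (frequency in a state)/(overall frequency) is assumed as the form of the ratio.

```python
import math


def log_likelihood_ratios(state_counts, background_counts):
    """ln(P(residue | state) / P(residue overall)) for each residue.

    Assumes every residue in state_counts has a nonzero count in both
    tables; zero counts would need pseudocounts, omitted here.
    """
    state_total = sum(state_counts.values())
    background_total = sum(background_counts.values())
    ratios = {}
    for residue, count in state_counts.items():
        p_state = count / state_total
        p_background = background_counts[residue] / background_total
        ratios[residue] = math.log(p_state / p_background)
    return ratios


# Invented counts for a hypothetical "middle helix" state.
middle_helix = {"L": 30, "K": 5, "G": 15}
overall = {"L": 100, "K": 100, "G": 100}
scores = log_likelihood_ratios(middle_helix, overall)
print({aa: round(s, 2) for aa, s in scores.items()})
```

A positive score marks a residue enriched in the state (here Leu), a negative score one that is depleted (here Lys); summing such scores along a candidate topology gives the kind of total that a dynamic programming search can maximize.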

Application of MEMSAT individually to six eukaryotic SGLT1 homologues (Fig. 5A) gave the highest scores to 13-, 14-, and 15-span topologies. The 13- and 15-span models were ruled out because they predicted an intracellular N terminus. Gradient-filled bars of Fig. 5A indicate the 14 transmembrane helices and orientations predicted by MEMSAT for human SGLT1 and, positioned below, those for the eukaryotic homologue of least identity, SMIT1. Predicted and experimental transmembrane orientations accord.

Experimental results and computer predictions agreed, except for MS2, which PredictProtein failed to predict (Fig. 5A). The two computer algorithms provided complementary information. The PredictProtein neural network utilized the evolutionary information content of multiple sequence alignments without assumptions of helix length, hydrophobicity, etc. MEMSAT's five-state scoring basis permitted predictions of transmembrane helical orientations.

IFH/RT Propensity Prediction of Transmembrane Helix Ends

Neither PredictProtein, MEMSAT, nor our experimental results indicate the transmembrane helix ends. The elegant work of White and Jacobs (27) showed that helix end prediction could be achieved by applying a set of algorithms based on (i) thermodynamics of the free energy changes accompanying helix formation, (ii) the hydrophobic propensities of side chains to partition into the membrane interface layers and the hydrocarbon interior, (iii) the satisfaction of H-bonding of polar side chains by residue-residue interactions upon partition into the hydrocarbon or interfacial layers of the membrane, and (iv) the empirical propensities of reverse-turn-promoting amino acids to flank transmembrane helix ends. Application of these algorithms to the three membrane proteins comprising the bacterial photosynthetic reaction center complex (PSRC), whose precise structure is known from x-ray crystallography, yielded impressively accurate predictions (27). All 22 transmembrane helix ends were correctly predicted, 80% within one amino acid.

The SGLT1 transmembrane helix centers were estimated from a plot of 19-amino acid-wide sliding window averages of the amino acid interfacial hydrophobicities, IFH(h) of Jacobs and White (28), where h represents the extent (range of 0-1) to which side chains capable of forming hydrogen bonds do so with one another rather than water (IFH plots resemble other hydrophobicity plots). Helix centers correspond to the peak amino acid within each broad hydrophobic peak. The IFH plot for h = 0.5 was used, because 50% of side chain potential H-bonds in the hydrophobic interiors of globular proteins are satisfied by side-chain/side-chain interactions (27, 32). Fig. 6A shows human SGLT1 hydrophobicity plots of IFH(h) for h = 0, 0.5, and 1. Helix centers are marked by solid cross-marks 19 amino acids wide. (Broad IFH peaks corresponding to MS4-MS6 were more clearly indicated using an 11-amino acid window.)
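
The sliding-window procedure can be sketched as follows. Because the IFH(h) values are not reproduced in this text, the sketch substitutes the Kyte-Doolittle hydropathy scale as a stand-in; this substitution, the function names, and the toy sequence are assumptions made for illustration. The sketch also locates only the single highest peak, whereas the analysis above assigns a center to each broad peak.

```python
# Kyte-Doolittle hydropathy values, used here as a stand-in for IFH(h).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_means(seq, width=19):
    """Mean scale value of every full window of the given width."""
    values = [KD[aa] for aa in seq]
    return [sum(values[i:i + width]) / width
            for i in range(len(values) - width + 1)]


def helix_center(seq, width=19):
    """0-based index of the central residue of the highest-scoring window."""
    means = window_means(seq, width)
    best_start = max(range(len(means)), key=means.__getitem__)
    return best_start + width // 2


# Toy sequence: a 19-residue Leu block flanked by Lys.
toy = "K" * 10 + "L" * 19 + "K" * 10
print(helix_center(toy))  # 19, the middle of the Leu block (indices 10-28)
```

Changing width to 11 mimics the narrower window that resolved closely spaced peaks more clearly.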

Fig. 6B shows a plot of average RT propensity using a 19-amino acid window and the RT preferences from Table V of Levitt (29). The plot inversely mirrors that of IFH hydrophobicity; broad valleys correspond to membrane spans, and the amino acid at each valley's lowest point also indicates the helix center (hollow cross-marks, Fig. 6B). The correspondence between helix centers identified from the IFH(0.5) and RT plots (Table 1) was good, differing by 8 amino acids or less (average 3.5 amino acids).



Span MS2 is very unusual. Its helix center was not apparent, since MS2 lacks a broad IFH peak or RT valley. The other eukaryotic SGLT1 homologues also lack clear IFH and RT indications of MS2. The presence of MS2 is, however, apparent in IFH and RT plots of the bacterial Na+/pantothenate cotransporter (SwissProt P16256) and hypothetical 62-kDa protein (SwissProt P31448) (not shown). The IFH maxima and RT minima of MS2 for both bacterial homologues all occur at the same amino acid position in aligned sequences. This position aligns with Asn-78 of SGLT1. The MS2 helix center of SGLT1, calculated as the midpoint between the helix ends (see below), corresponds to Ala-76, 2 amino acids upstream of Asn-78. This near coincidence of apparent helix centers restores some credibility to the MS2 helix end determination described below.

Helix "end zones" were next determined for each helix center indicated by the IFH(0.5) plot. Helix centers from IFH and RT plots were useful for determining the helix end zones of closely spaced helices. Averages of IFH(h) were replotted (Fig. 6C) using a centrally weighted trapezoidal window of 5 amino acids (27). Transmembrane helices appear as broad hydrophobic peaks with smaller secondary peaks superimposed on, and flanking, them. Each PSRC helix end fell between an edgemost secondary peak of the main IFH(0.5) peak and an adjacent secondary flanking peak. SGLT1 helix "end zones" are indicated by the gray boxes in Fig. 6C and are listed in Table 1.
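
The centrally weighted trapezoidal window is described in the legend to Fig. 6C as a 5-amino acid average that is then scan-averaged with a 3-amino acid window. A minimal sketch of the resulting weights follows; the function names are hypothetical, and exact fractions are used only to make the weights easy to read.

```python
from fractions import Fraction


def convolve(a, b):
    """Discrete convolution of two weight lists."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


# A flat 5-residue average followed by a flat 3-residue average.
BOX5 = [Fraction(1, 5)] * 5
BOX3 = [Fraction(1, 3)] * 3
WEIGHTS = convolve(BOX5, BOX3)  # 7 weights: 1, 2, 3, 3, 3, 2, 1 (all / 15)


def trapezoid_smooth(values):
    """Apply the 7-residue trapezoidal window to interior positions."""
    half = len(WEIGHTS) // 2
    return [sum(w * v for w, v in zip(WEIGHTS, values[i - half:i + half + 1]))
            for i in range(half, len(values) - half)]


print([int(w * 15) for w in WEIGHTS])  # [1, 2, 3, 3, 3, 2, 1]
print(sum(WEIGHTS))                    # 1
```

The weights fall from 3/15 at the center to 1/15 at the edges and cross half of the peak weight midway between the outermost two positions on each side, which is why the nominal width at half-height remains 5 amino acids even though 7 residues contribute.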

Each experimentally determined PSRC helix end was reliably indicated by an amino acid cluster with high RT propensity falling within the helix end zone. Fig. 6D shows a plot of the averaged RT propensities determined with a centrally weighted trapezoidal 5-amino acid window. Helix centers and end zones previously determined are superimposed above the RT(5) plot in Fig. 6D. An RT(5) maximum falls within each end zone or just outside it. In PSRC, the RT(5) maxima fall on average one amino acid before N ends and three amino acids after C ends. SGLT1 end zones that did not overlap an RT(5) maximum were nonetheless associated with one nearby that, after the +1 and -3 rules were applied, assigned the helix end to a point within the end zone (Table 1). Two exceptions were the helix C ends of MS3 and MS12, which lacked associated RT(5) maxima; these ends were instead defined by the helix end zone edge closest to a nearly appropriate RT(5) maximum (27). Helix midpoints defined by two ends were calculated (Table 1), comparing well with the helix centers indicated by the extrema of the IFH and RT plots.
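
The +1 and -3 rules amount to simple offsets from the flanking RT(5) maxima. The sketch below applies them to hypothetical residue numbers (100 and 125 are invented for illustration and are not taken from Table 1); the function names are likewise assumptions.

```python
def helix_ends(rt_max_n, rt_max_c):
    """Estimate helix ends from the flanking RT(5) maxima.

    The N end lies one residue after its RT(5) maximum (+1 rule);
    the C end lies three residues before its maximum (-3 rule).
    """
    n_end = rt_max_n + 1
    c_end = rt_max_c - 3
    return n_end, c_end


def helix_midpoint(n_end, c_end):
    """Midpoint of a helix defined by its two ends."""
    return (n_end + c_end) / 2


# Hypothetical RT(5) maxima at residues 100 and 125.
n_end, c_end = helix_ends(100, 125)
print(n_end, c_end)                  # 101 122
print(helix_midpoint(n_end, c_end))  # 111.5
```

A midpoint computed this way can then be compared with the helix center read from the 19-amino acid IFH and RT plots, as was done for Table 1.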

Clearly, the SGLT1 helix end predictions do not obviate the need for further physical experiments to refine our understanding of SGLT1 secondary structure, nor should overemphasis be placed on their presumed accuracy. Rather, the model that the helix ends predict can be used wisely to generate testable hypotheses of structure and function.

The IFH(h) plots of the PSRC L subunit reflect the topological orientation of the extra- and intracellular interhelical domains; the valleys of the IFH(h) plots representing the extracellular domains were characteristically shallower and more h sensitive (27). This was interpreted to indicate that the interhelical domains that must cross the membrane during topogenesis could decrease the unfavorable free energy change during translocation by (i) replacing water H-bonded to side chains with inter-side chain H-bonds (h sensitivity) and (ii) being less polar overall than cytoplasmic domains. The IFH(h) plots of SGLT1 show a general trend for the intracellular domains, indicated by asterisks in Fig. 6A, to correspond to the deepest valleys, but exceptions are apparent. No consistent difference was apparent between extra- and intracellular domain h sensitivity. This disparity in IFH(h) behavior between SGLT1 and the PSRC L subunit may reflect different mechanisms of membrane protein topogenesis utilized by eukaryotes and bacteria (33).

Implications of the New SGLT1 Model

In retrospect, only those NNSS mutants that presented the consensus farthest (14-27 amino acids) from the membrane interface were N-glycosylated. The distance ranged from 6 to 9 amino acids for the nonglycosylated extracellular NNSS insertions. A survey of native polytopic membrane proteins suggested that a minimal distance from the transmembrane span is necessary for a consensus to be N-glycosylated (14).
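The relationship between a consensus site and its distance from the membrane interface can be sketched as follows. The sketch uses the simple N-X-S/T consensus (X not Pro). How the distance is counted is not specified in the text, so the convention here (residues separating the Asn from the nearer flanking helix end) is an assumption, as are the function names and the example sequences in the usage note.

```python
import re
from typing import List

# N-X-S/T with X != Pro; the lookahead reports overlapping sequons separately.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(seq: str) -> List[int]:
    """1-based positions of the Asn in every N-X-S/T sequon of seq."""
    return [m.start() + 1 for m in SEQUON.finditer(seq)]


def distance_to_interface(asn_pos: int,
                          upstream_helix_end: int,
                          downstream_helix_start: int) -> int:
    """Residues between the Asn and the nearer flanking helix end.

    The counting convention is assumed for illustration only.
    """
    return min(asn_pos - upstream_helix_end, downstream_helix_start - asn_pos)
```

Note that an inserted NNSS tetrapeptide carries two overlapping sequons (N-N-S and N-S-S), so `find_sequons("AANNSSAA")` reports positions 3 and 4. Each reported position can then be passed to `distance_to_interface` along with the predicted helix ends flanking the loop.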

We note that the helix end zones of MS7 were difficult to call. The mid to C-terminal portion of MS7 is extremely h sensitive, more so than any other span. At h = 1, a prominent IFH(5) secondary peak appears, presenting as only a shallow shoulder at h = 0.5 (Fig. 6C). If in fact these side chain H-bonding functional groups are co-satisfying in vivo, the predicted C end zone shifts N-terminally, and a more N-terminally located RT(5) peak defines the helix C end. Conservation of adequate helix length then forces the predicted N end to be defined by the more N-terminally located of the two RT(5) peaks that fall within the N end zone. The helix consequently shifts upstream by 6 and 12 amino acids at the N- and C ends, respectively. This may indicate two quasi-stable states of MS7 that could correlate with a movement of the MS7 helix normal to the membrane accompanying Na/glucose cotransport.

The N end of helix MS7 is adjacent to the highest and widest peak of reverse-turn propensity of any membrane span. This, and the high polarity of the h-sensitive C-terminal portion, may explain why the PredictProtein neural network assigned only modest transmembrane helical probability to 10-11 amino acids near the helix center. The MS7 transmembrane span is clearly atypical. The C-terminal portion of the MS7 helix contains the most highly conserved sequence of amino acids (WYWCXDQVIVQR) found in 13 eukaryotic family members. Residues within and flanking MS7 have recently been shown to account for five of the 16 severest SGLT1 missense mutations found in patients diagnosed with glucose/galactose malabsorption (34).

The N end zone of MS11 was similarly problematic. Secondary IFH(5) peaks defining the N end zone, and thus the bracketed RT(5) peak defining the N end itself, were selected from two alternatives such that the calculated helix midpoint best corresponded with helix centers indicated by IFH and RT extrema (Table 1).

The revised secondary structure model of human SGLT1 incorporates topological information from all the foregoing analyses (Fig. 7). Two intrahelix salt bridges may be present, one in MS4 (Lys-157:Asp-161) and one in MS12 (Arg-499:Glu-503) (one helical turn separation per pair). Intrahelix salt bridges would decrease the unfavorable free energy change upon translocation into the membrane (35). The SGLT1 mutants R499H (34) and K157A (^3) display impaired transport in oocytes. An interhelix salt bridge is suggested between Asp-294 in MS7 and Lys-321 in MS8, consistent with the short interhelical domain (Fig. 7).
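Both proposed intrahelix pairs place a basic and an acidic residue four positions apart (157/161 and 499/503), that is, one helical turn. A minimal sketch of how such candidates might be enumerated within a predicted span is given below. The function name and the toy sequence in the usage note are hypothetical; the toy sequence is not taken from SGLT1. The scan flags sequence spacing only and says nothing about whether a salt bridge actually forms.

```python
from typing import List, Tuple

BASIC = set("KR")
ACIDIC = set("DE")


def salt_bridge_candidates(seq: str,
                           first_residue: int = 1,
                           spacing: int = 4) -> List[Tuple[str, str]]:
    """Oppositely charged residue pairs separated by `spacing` positions.

    first_residue is the residue number of seq[0], so that reported labels
    follow the numbering of the full-length protein.
    """
    pairs = []
    for i in range(len(seq) - spacing):
        a, b = seq[i], seq[i + spacing]
        if (a in BASIC and b in ACIDIC) or (a in ACIDIC and b in BASIC):
            pairs.append((f"{a}{first_residue + i}",
                          f"{b}{first_residue + i + spacing}"))
    return pairs
```

For the toy segment `"AKAAADAA"` numbered from residue 156, the function returns the single pair `("K157", "D161")`, mirroring the spacing of the pair proposed for MS4.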


Figure 7: Secondary structure model of human SGLT1 determined by N-glycosylation mutant scanning and incorporating the predictions of the transmembrane helix ends. The native N-glycan appears between MS5 and MS6. The probable SGLT1 helix ends as determined in Fig. 6 and tabulated in Table 1 have been incorporated into the model. The hydrocarbon interior is shown 30 Å thick, equivalent in length to a 20-amino acid alpha-helix, and the interfacial layer 9 Å thick (28).



MS10 and MS11 show near-mirror image profiles in the IFH and IFH(5) plots (Fig. 6, A and C). The extracellularly-oriented halves of their helices are similarly quite polar and h sensitive, suggesting close apposition in the bilayer and coordinated movement during cotransport, consistent with the short 10-amino acid interhelical domain. Most of the 14 spans show high h dependence, and of these most also show a high alpha-hydrophobic moment (not shown), implying contact with transported solutes and water.

The 31-amino acid region immediately upstream of MS3 deserves special mention. It resides partly extracellularly and comprises most of MS2. Nearly 50% of the residues are either Gly or Ala, the two smallest amino acid species; inclusion of Ser residues accounts for >60%. This intriguing pattern persists in a multiple sequence alignment of 13 known eukaryotic species (not shown). Enrichment in small amino acids of high RT propensity partly explains the lack of an IFH peak or RT valley typical of transmembrane helices, and PredictProtein's failure to identify MS2. The small amino acid size (low hydrophobic effect) and low polarity of the MS2 region could enable MS2 ``slippage'' transverse to the membrane with minimal free energy change. The upstream adjacent FFLAGRS sequence appears to be the domain best conserved between eukaryotic and bacterial homologues. Is the association of the two least well defined atypical transmembrane helices, MS2 and MS7, with the two most well conserved sequence motifs a coincidence?

Major revisions to the secondary structure model (Figs. 5B and 7) include (i) extracellular reassignment of the N terminus, (ii) two newly recognized membrane spans MS4 and MS12, (iii) antiparallel reorientation of helices MS1, MS2, MS3, MS13, and MS14, (iv) estimates of the helix ends, (v) contraction of several interhelical domains, particularly between MS5/MS6 (to 4 amino acids) and MS11/MS12 (to 3 amino acids), and (vi) cytoplasmic reassignment of the extremely polar region proximal to the C terminus, consistent with its immunohistochemical localization (36). Antibody studies place the C terminus of the E. coli Na/proline transporter putP in the cytoplasm (37).

We infer from SGLT1 topology that the bacterial Na/proline and Na/pantothenate transporters incorporate a periplasmic N terminus and not 12 but 13 transmembrane spans; only one was assigned in the MS2/MS3 region (38). Eukaryotic homologues end abruptly with a string of hydrophobic amino acids, which two algorithms predict form a 14th transmembrane helix. Nascent strand adsorption to the endoplasmic reticulum interface would hydrophobically precipitate backbone H-bonding (alpha-helix formation; the interface satisfies 60% of the hydrophobic effect) followed by partition into the hydrocarbon layer (28); MS14's low alpha-hydrophobic moment obviates a surface helix. The C terminus may either fully span the membrane with an emergent C-terminal carboxylate, or reside within the hydrocarbon layer as a helical buoy with depth of insertion determined by interhelical polar interactions. We are currently investigating whether MS14 spans the bilayer. Comparison with bacterial homologues suggests the 14th span evolved by extension of the C terminus.

In overview, a perusal of simple hydropathy plots directed the strategic placement of N-glycosylation consensuses in SGLT1. Glycosylation profiles of mutants elucidated the membrane topology, which was evaluated computationally. An important caveat bearing upon scanning mutagenesis emerged from this work: insertions near the signal anchor perturbed N-terminal topology, which advanced computational methods failed to correct. Functional expression of one glycosylated mutant contributed indispensably to the establishment of N-terminal extracellularity, augmented by truncation experiments. Knowledge of correct SGLT1 membrane topology and the helix end predictions help focus future experimental designs directed at uncovering substrate binding sites, helix-helix bundling within the membrane, posttranslational modifications, and other structure/function relationships.


FOOTNOTES

*
This work was supported by Grants DK-44582 and DK-44602 from NIDDK, National Institutes of Health. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

§
To whom correspondence should be addressed: UCLA School of Medicine, Box 951751, Los Angeles, CA 90095-1751. Tel.: 310-825-6968; Fax: 310-206-5886.

¶
Supported by a fellowship from the Departamento de Educación y Ciencia, Spain.

(^1)
The abbreviations used are: NG, native glycosylation; PNGase F, peptidyl N-glycanase F; IFH, interfacial hydrophobicity; RT, reverse turn; MS1 through MS14 refer to the experimentally determined membrane spans; PSRC, photosynthetic reaction center complex; PAGE, polyacrylamide gel electrophoresis.

(^2)
E. Turk and E. M. Wright, unpublished results.

(^3)
M. Panayotova-Heiermann, personal communication.


ACKNOWLEDGEMENTS

We thank Manuela Contreras for superb technical assistance with the oocyte injections and radiotracer uptake assays. We thank Drs. Martín G. Martín and Bruce Hirayama for many productive discussions and critical suggestions. We also thank Ana Herdocia, Edgar Gutierrez and Jason Lam for excellent technical assistance.


REFERENCES

  1. Wright, E. M., Hager, K. M., and Turk, E. (1992) Curr. Opin. Cell Biol. 4, 696-702
  2. Reizer, J., Reizer, A., and Saier, M. H., Jr. (1994) Biochim. Biophys. Acta 1197, 133-166
  3. Kyte, J., and Doolittle, R. F. (1982) J. Mol. Biol. 157, 105-132
  4. Eisenberg, D., Schwarz, E., Komaromy, M., and Wall, R. (1984) J. Mol. Biol. 179, 125-142
  5. White, S. H. (1994) in Membrane Protein Structure (White, S. H. ed) pp. 97-124, Oxford University Press, New York
  6. Hediger, M. A., Coady, M. J., Ikeda, T. S., and Wright, E. M. (1987) Nature 330, 379-381
  7. Hediger, M. A., Turk, E., and Wright, E. M. (1989) Proc. Natl. Acad. Sci. U. S. A. 86, 5748-5752
  8. Lee, W.-S., Kanai, Y., Wells, R. G., and Hediger, M. A. (1994) J. Biol. Chem. 269, 12032-12039
  9. Turk, E., Martín, M. G., and Wright, E. M. (1994) J. Biol. Chem. 269, 15204-15209
  10. Olender, E. H., and Simoni, R. D. (1992) J. Biol. Chem. 267, 4223-4235
  11. Chang, X. B., Hou, Y. X., Jensen, T. J., and Riordan, J. R. (1994) J. Biol. Chem. 269, 18572-18575
  12. Hresko, R. C., Kruse, M., Strube, M., and Mueckler, M. (1994) J. Biol. Chem. 269, 20482-20488
  13. Hediger, M. A., Mendlein, J., Lee, H. S., and Wright, E. M. (1991) Biochim. Biophys. Acta 1064, 360-364
  14. Landolt-Marticorena, C., and Reithmeier, R. A. (1994) Biochem. J. 302, 253-260
  15. Aubert, J. P., Biserte, G., and Loucheux-Lefebvre, M. H. (1976) Arch. Biochem. Biophys. 175, 410-418
  16. Aiyar, A., and Leis, J. (1993) BioTechniques 14, 366-369
  17. Sarkar, G., and Sommer, S. S. (1990) BioTechniques 8, 404-407
  18. Ausubel, F. M., Brent, R., Kingston, R. E., Moore, D. D., Seidman, J. G., Smith, J. A., and Struhl, K. (1994) Current Protocols in Molecular Biology, John Wiley & Sons, Massachusetts
  19. Laemmli, U. K. (1970) Nature 227, 680-685
  20. Fujiki, Y., Hubbard, A. L., Fowler, S., and Lazarow, P. B. (1982) J. Cell Biol. 93, 97-102
  21. Parent, L., Supplisson, S., Loo, D. D. F., and Wright, E. M. (1992) J. Membr. Biol. 125, 49-62
  22. Hirayama, B. A., Wong, H. C., Smith, C. D., Hagenbuch, B. A., Hediger, M. A., and Wright, E. M. (1991) Am. J. Physiol. 261, C296-C304
  23. Haycock, J. W. (1993) Anal. Biochem. 208, 397-399
  24. Lostao, M. P., Hirayama, B. A., Loo, D. D. F., and Wright, E. M. (1994) J. Membr. Biol. 142, 161-170
  25. Rost, B., Casadio, R., Fariselli, P., and Sander, C. (1995) Protein Sci. 4, 521-533
  26. Jones, D. T., Taylor, W. R., and Thornton, J. M. (1994) Biochemistry 33, 3038-3049
  27. White, S. H., and Jacobs, R. E. (1990) J. Membr. Biol. 115, 145-158
  28. Jacobs, R. E., and White, S. H. (1989) Biochemistry 28, 3421-3437
  29. Levitt, M. (1978) Biochemistry 17, 4277-4285
  30. Hartmann, E., Rapoport, T. A., and Lodish, H. F. (1989) Proc. Natl. Acad. Sci. U. S. A. 86, 5786-5790
  31. Parks, G. D., and Lamb, R. A. (1993) J. Biol. Chem. 268, 19101-19109
  32. Chothia, C. (1975) Nature 254, 304-308
  33. Geli, V., and Benedetti, H. (1994) Subcell. Biochem. 22, 21-69
  34. Martín, M. G., Turk, E., Lostao, M. P., Kerner, C., and Wright, E. M. (1995) Nature Genet., in press
  35. Engelman, D. M., Steitz, T. A., and Goldman, A. (1986) Annu. Rev. Biophys. Biophys. Chem. 15, 321-353
  36. Takata, K., Kasahara, T., Kasahara, M., Ezaki, O., and Hirano, H. (1991) J. Histochem. Cytochem. 39, 287-298
  37. Komeiji, Y., Hanada, K., Yamato, I., and Anraku, Y. (1989) FEBS Lett. 256, 135-138
  38. Nakao, T., Yamato, I., and Anraku, Y. (1987) Mol. & Gen. Genet. 208, 70-75
  39. Mackenzie, B., Panayotova-Heiermann, M., Loo, D. D. F., Lever, J. E., and Wright, E. M. (1994) J. Biol. Chem. 269, 22488-22491
  40. Pajor, A. M., and Wright, E. M. (1992) J. Biol. Chem. 267, 3557-3560
  41. Berry, G. T., Mallee, J. J., Kwon, H. M., Rim, J. S., Mulla, W. R., Muenke, M., and Spinner, N. B. (1995) Genomics 25, 507-513
  42. Pajor, A. M. (1994) Biochim. Biophys. Acta 1194, 349-351
  43. Hitomi, K., and Tsukagoshi, N. (1994) Biochim. Biophys. Acta 1190, 469-472
  44. Jackowski, S., and Alix, J. H. (1990) J. Bacteriol. 172, 3842-3848
