(Received for publication, September 19, 1996, and in revised form, December 2, 1996)
From the Department of Molecular Biology and Genetics, University of Guelph, Guelph, Ontario N1G 2W1, Canada and Department of Biochemistry Research, Hospital for Sick Children, 555 University Avenue, Toronto, Ontario M5G 1X8, Canada
A computational model of myelin basic protein (MBP) has been constructed based on the premise of a phylogenetically conserved β-sheet backbone and on electron microscopical three-dimensional reconstructions. Many residues subject to post-translational modification (phosphorylation, methylation, or conversion of arginines to citrullines) were located in loop regions and were thus accessible to modifying enzymes. The triproline segment (residues 99-101) is fully exposed on the back surface of the protein in a long crossover connection between two parallel β-strands. The proximity of this region to the underlying β-sheet suggests that post-translational modifications here could have synergistic effects on the entire structure. Post-translational modifications that reduce the surface charge could result first in a weakened attachment of the protein to the myelin membrane rather than in a gross conformational change of the protein itself. Such mechanisms could be operative in demyelinating diseases such as multiple sclerosis.
Myelin basic protein (MBP) is one of the most important proteins of the myelin sheath (1-6). Its significance is demonstrated in the shiverer mutant of mouse, which has only a small amount of structurally unstable myelin because the gene for MBP is mostly deleted (7, 8). This trait is recessive and inherited in a Mendelian manner, indicating that MBP is coded for by a single gene. In mammals, the gene consists of seven exons, and differential splicing of the primary MBP mRNA leads to different isoforms of MBP, i.e. forms of differing molecular weights (9-11). Alternative splicing of mRNA transcripts is a common mechanism for generating protein diversity. The MBP gene is thus similar to the genes for SV40 T and t antigens, fibrinogen, lens αA-crystallin, and troponin T, in all of which primary transcripts with identical termini are alternatively spliced to yield different mature mRNAs (10). The shark MBP gene has also been cloned and has revealed a similar exon structure, indicating that this protein arose early in vertebrate evolution (11).
In mammals, the 18.5- and 14-kDa isoforms of MBP are the most common, although their relative proportions vary during development and among species. Henceforth, unless otherwise specified, we shall be using "MBP" to refer to the 18.5-kDa form. Each isoform of MBP can exist as one of many possible charge isomers, due to various post-translational modifications (3, 4). These charge isomers are denoted C1, C2, C3, C4, C5, C6, and C8, according to their elution profile on a cation exchange column at pH 10.5 (12). Component C1 is the least modified and most basic component, while successive components differ sequentially by the loss of a positive charge. Component C8 is the charge isomer of MBP that does not bind to the resin; it is the most modified, containing several citrullinyl residues (13). The post-translational modifications include phosphorylation, ADP-ribosylation, and conversion of arginines to citrulline (3, 4, 13). The latter change is relevant to multiple sclerosis; often four or five arginines are so converted (14). In a recent case of the fulminating form of multiple sclerosis known as Marburg's disease, a young (26-year-old) woman presented with the disease and died within 6 weeks. In the MBP extracted from the autopsied brain, 18 of 19 arginine residues had been citrullinated (15).
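The charge bookkeeping behind the charge isomers can be illustrated with a Henderson-Hasselbalch estimate of net charge. This is only a sketch: the pKa values are generic, textbook-style assumptions (real values in a folded, membrane-associated protein differ), the peptides are hypothetical rather than MBP sequence, and citrulline is encoded with the placeholder letter `X` so that it contributes no charge. It shows why each citrullinated arginine removes roughly one positive charge, both near neutral pH and at the pH 10.5 used for cation exchange.

```python
# Assumed, generic pKa values (not measured for MBP).
PKA_BASIC = {"K": 10.5, "R": 12.5, "H": 6.0}
PKA_ACIDIC = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}
PKA_N_TERMINUS = 9.0
PKA_C_TERMINUS = 2.0


def positive_fraction(pka: float, ph: float) -> float:
    """Fraction of a basic group that is protonated (charge +1)."""
    return 1.0 / (1.0 + 10 ** (ph - pka))


def negative_fraction(pka: float, ph: float) -> float:
    """Fraction of an acidic group that is deprotonated (charge -1)."""
    return 1.0 / (1.0 + 10 ** (pka - ph))


def net_charge(sequence: str, ph: float) -> float:
    """Estimate the net charge of a peptide; 'X' marks citrulline (uncharged)."""
    charge = positive_fraction(PKA_N_TERMINUS, ph) - negative_fraction(PKA_C_TERMINUS, ph)
    for residue in sequence:
        if residue in PKA_BASIC:
            charge += positive_fraction(PKA_BASIC[residue], ph)
        elif residue in PKA_ACIDIC:
            charge -= negative_fraction(PKA_ACIDIC[residue], ph)
    return charge


# Hypothetical peptide, then the same peptide with one arginine citrullinated.
native = "GRKRGD"
citrullinated = "GXKRGD"
for ph in (7.0, 10.5):
    loss = net_charge(native, ph) - net_charge(citrullinated, ph)
    print(f"pH {ph}: charge lost per citrullination = {loss:.3f}")
```

Because the assumed arginine pKa lies well above both pH values, nearly the full +1 charge is lost per conversion, which is the stepwise change that separates successive components.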
The sequences of most forms of MBP from numerous species are known, with the human and bovine forms having been sequenced first (16, 17). We shall henceforth be referring to the human sequence. The single Trp116 serves as a focus for immunological properties of MBP and is proximal to a triproline (Pro99-Pro100-Pro101) segment (2). Residues around this triproline segment are often modified. Myelin basic protein has a number of sequence and other similarities with other protein families (18, 19). Properties such as charge density, post-translational modification by addition of fatty acids, and overall hydrophobicity are similar in MBP and proteolipid protein from the central nervous system and in the pulmonary surfactant proteins SP-B and SP-C (20). Other sequence similarities with viral proteins have stoked conjectures on viral involvement in multiple sclerosis (21-24).
Because of the reluctance of MBP to form crystals (25), its detailed tertiary structure is not known. The main structural models of this protein that exist are from the 1980s and represent abstract syntheses of biochemical data and secondary structure prediction algorithms (Refs. 26-29; see also Refs. 30, 31). The structure of a small subsegment (five residues) has recently been solved by nuclear magnetic resonance (32). The most sophisticated structural models of the whole protein remain those of Stoner (27) and Martenson (28), which were based on extensive biochemical and secondary structural data. These structures were represented at the time schematically, with parts thereof as plastic Corey-Pauling-Koltun space-filling models. Our own recent work has comprised electron microscopical investigations of the tertiary structure of bovine MBP, which is almost identical to human MBP (see accompanying paper (33)). In the process of recreating computational representations of both the Stoner and Martenson models for comparison with our experimental results, we designed a revised structure, incorporating our new electron microscopical reconstructions as tertiary structural constraints. This new model of human MBP is presented here.
The main tool used in this aspect of our studies was the INSIGHT
II molecular modeling software package (Biosym Corp., Parsippany, NJ)
running on an IBM RISC/6000 Powerstation 3AT (International Business
Machines, Markham, Ontario, Canada). This commercial program contains
utilities for building polymers of nucleic acids and proteins, multiple
sequence alignment and homology modeling, structure refinement by
rotamer rotation, energy function minimization, and graphic display of
specified sites. There are no crystallographic or NMR structures of any
proteins with significant overall sequence similarity to MBP in the
Brookhaven Protein Data Bank (PDB). We thus could not construct MBP by homology modeling and instead used a piecemeal strategy. We relied strongly on Stoner's (27, 29) and Martenson's (26, 28) definitions of the residues forming β-sheet secondary structure.
For Stoner's model (27), we required a flat antiparallel β-sheet (Fig. 1), which we derived initially from the structure of excitation energy transfer bacteriochlorophyll A protein (PDB accession code 3bcl). The INSIGHT program was used to build the putative α-helical regions directly from specified values of the backbone torsion angles (φ, ψ). These α-helices were placed approximately where Stoner indicated them to lie. Loops then joined all the secondary structure elements, and the entire structure was refined by rotamer selection and energy function minimization. Rotamer selection means choosing, from a series of known possible amino acid side chain conformations, the one that minimizes steric overlap with any other atoms in the structure. The energy function comprises electrostatic (including hydrogen bonding) and van der Waals interactions among the atoms of the model. Physically implausible models (e.g. with overlapping atoms) yield divergent energies at this step. This energy is not the free energy of folding of the protein but serves only as an internal guide to refining the structure.
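Rotamer selection and the energy function described above can be sketched in a few lines. This is a minimal illustration, not the INSIGHT II force field: the Lennard-Jones parameters (`epsilon`, `sigma`), the constant dielectric of 4, and the toy coordinates are all assumptions. Each candidate rotamer is scored by its pairwise van der Waals plus Coulomb energy against the fixed surrounding atoms, and the lowest-scoring one is kept. The 1/r^12 repulsion also shows why overlapping atoms make the energy diverge.

```python
import math
from typing import Dict, List, Tuple

# An atom is (x, y, z) in angstroms plus a partial charge in units of e.
Atom = Tuple[Tuple[float, float, float], float]

COULOMB_KCAL = 332.06  # Coulomb constant in kcal*Å/(mol*e^2)


def pair_energy(a: Atom, b: Atom, epsilon: float = 0.2, sigma: float = 3.4,
                dielectric: float = 4.0) -> float:
    """Lennard-Jones plus Coulomb energy (kcal/mol) for one atom pair (assumed parameters)."""
    (pos_a, q_a), (pos_b, q_b) = a, b
    r = math.dist(pos_a, pos_b)
    if r == 0.0:
        return math.inf  # coincident atoms: the energy diverges
    lennard_jones = 4.0 * epsilon * ((sigma / r) ** 12 - (sigma / r) ** 6)
    coulomb = COULOMB_KCAL * q_a * q_b / (dielectric * r)
    return lennard_jones + coulomb


def interaction_energy(side_chain: List[Atom], environment: List[Atom]) -> float:
    """Sum pair energies between a candidate side chain and the fixed atoms around it."""
    return sum(pair_energy(a, b) for a in side_chain for b in environment)


def select_rotamer(rotamers: Dict[str, List[Atom]], environment: List[Atom]) -> str:
    """Return the name of the rotamer with the lowest interaction energy."""
    return min(rotamers, key=lambda name: interaction_energy(rotamers[name], environment))


# Toy example: one neighboring atom at the origin and two candidate placements.
environment = [((0.0, 0.0, 0.0), 0.0)]
rotamers = {
    "clashing": [((1.0, 0.0, 0.0), 0.0)],  # 1 Å away: strong steric overlap
    "relaxed": [((4.0, 0.0, 0.0), 0.0)],   # close to the Lennard-Jones minimum
}
print(select_rotamer(rotamers, environment))
```

Real packages score many rotamers per residue against a full force field and iterate; the single-residue, exhaustive choice here only conveys the principle.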
For Martenson's model (28), we required two layered β-sheets. The relevant domain of the bacteriochlorophyll A protein was again the starting point, although this time two bacteriochlorophyll A sheets were placed over one another and then spliced together to form the twist indicated by Martenson in a schematic diagram (see Fig. 1). Loops again joined the β-strands. No α-helices were defined here.
Finally, our model was based on the previous work by Stoner and Martenson as well as on our new electron microscopical data (33). For this model, we first used the flat antiparallel β-sheet from the structure of bacteriochlorophyll A protein and later, for the results presented here, that from the structure of severin (PDB accession code 1svr). Where experimental data on the structure were available, such as the NMR structure of a tetradecapeptide repeat (32), they were incorporated into our model. Amino acids 98-102 in human MBP (Thr-Pro-Pro-Pro-Ser) showed 100% identity with a pentapeptide segment in endo-β-N-acetylglucosaminidase F1 (PDB accession code 2ebn), and the latter coordinates were used directly to model the former region.
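Finding a short template segment with 100% identity, as was done for Thr98-Ser102, amounts to an exact sliding-window search. The sketch below assumes plain one-letter sequences; the query and template strings in the example are hypothetical stand-ins, not the actual MBP or 2ebn sequences.

```python
from typing import List, Tuple


def identical_windows(query: str, template: str, length: int = 5) -> List[Tuple[int, int]]:
    """Return 1-based (query_start, template_start) pairs of identical windows."""
    hits = []
    for i in range(len(query) - length + 1):
        window = query[i:i + length]
        start = template.find(window)
        while start != -1:  # report every occurrence in the template
            hits.append((i + 1, start + 1))
            start = template.find(window, start + 1)
    return hits


# Hypothetical sequences that both contain the motif Thr-Pro-Pro-Pro-Ser (TPPPS).
print(identical_windows("AAATPPPSGG", "MKTPPPSLL"))
```

In practice one would scan every sequence in a structure database this way (or with an index), then check that the matched template segment adopts a conformation compatible with its intended position in the model.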
To begin, the relevant regions of the human MBP sequence were modeled as β-strands using calculated (φ, ψ) angles and overlaid onto the β-sheet of severin, and the β-strand coordinates of the known structure were used to form the nascent human MBP model. The 2ebn pentapeptide comprising the triproline segment was moved to fit into the right-handed crossover region between adjacent parallel β-strands β3 and β4. The (φ, ψ) angles of the tetradecapeptide solved recently by NMR were used to model this segment, move it into the correct position with respect to the β-sheet, and assign coordinates. Finally, these segments were joined by loops, using the electron microscopical reconstructions to constrain the loops' positions. As before, structure refinement comprised rotamer definition and energy function minimization.
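Building backbone coordinates from (φ, ψ) angles, as described above, can be illustrated with the natural extension reference frame (NeRF) method, which places each new atom from the three previously placed atoms plus a bond length, a bond angle, and a dihedral. This illustrates the general technique, not the INSIGHT II implementation; the ideal bond lengths and angles, the trans ω of 180°, and the β-strand values φ ≈ -120°, ψ ≈ +130° are assumptions. A `dihedral` function is included so that the built geometry can be checked.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def sub(u: Vec, v: Vec) -> Vec:
    return (u[0] - v[0], u[1] - v[1], u[2] - v[2])


def dot(u: Vec, v: Vec) -> float:
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def cross(u: Vec, v: Vec) -> Vec:
    return (u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0])


def unit(u: Vec) -> Vec:
    n = math.sqrt(dot(u, u))
    return (u[0] / n, u[1] / n, u[2] / n)


def place_atom(a: Vec, b: Vec, c: Vec, bond: float, angle: float, torsion: float) -> Vec:
    """NeRF: place atom d with |cd| = bond, angle b-c-d and dihedral a-b-c-d (degrees)."""
    th, ph = math.radians(angle), math.radians(torsion)
    bc = unit(sub(c, b))
    n = unit(cross(sub(b, a), bc))
    m = cross(n, bc)
    x = -bond * math.cos(th)
    y = bond * math.sin(th) * math.cos(ph)
    z = bond * math.sin(th) * math.sin(ph)
    return tuple(c[i] + x * bc[i] + y * m[i] + z * n[i] for i in range(3))


def dihedral(a: Vec, b: Vec, c: Vec, d: Vec) -> float:
    """Signed dihedral angle a-b-c-d in degrees."""
    b0, b1, b2 = sub(a, b), unit(sub(c, b)), sub(d, c)
    v = sub(b0, tuple(dot(b0, b1) * x for x in b1))
    w = sub(b2, tuple(dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(dot(cross(b1, v), w), dot(v, w)))


# Assumed ideal backbone geometry (Å, degrees).
N_CA, CA_C, C_N = 1.458, 1.525, 1.329
ANGLE_N_CA_C, ANGLE_CA_C_N, ANGLE_C_N_CA = 111.2, 116.2, 121.7


def build_backbone(phi_psi: Sequence[Tuple[float, float]], omega: float = 180.0) -> List[Vec]:
    """Return N, CA, C coordinates for each residue from (phi, psi) pairs."""
    t = math.radians(ANGLE_N_CA_C)
    atoms: List[Vec] = [(0.0, 0.0, 0.0), (N_CA, 0.0, 0.0),
                        (N_CA - CA_C * math.cos(t), CA_C * math.sin(t), 0.0)]
    for i in range(1, len(phi_psi)):
        psi_prev, phi = phi_psi[i - 1][1], phi_psi[i][0]
        n = place_atom(atoms[-3], atoms[-2], atoms[-1], C_N, ANGLE_CA_C_N, psi_prev)
        ca = place_atom(atoms[-2], atoms[-1], n, N_CA, ANGLE_C_N_CA, omega)
        c = place_atom(atoms[-1], n, ca, CA_C, ANGLE_N_CA_C, phi)
        atoms += [n, ca, c]
    return atoms


# A five-residue strand built with assumed typical beta-strand (phi, psi) values.
strand = build_backbone([(-120.0, 130.0)] * 5)
print(len(strand), round(dihedral(*strand[2:6]), 1))  # phi of residue 2
```

Substituting helical values (roughly φ ≈ -57°, ψ ≈ -47°) in the same function would generate an α-helical segment, which is how helices can be built directly from specified angles.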
Post-translational modifications were constructed interactively using the utilities of the INSIGHT II program. To perform a computational phosphorylation, for example, a phosphate group was selected from an internal library and then bonded onto the rest of the molecule using a built-in function. The bond type (e.g. single) was chosen by the user, as were the two hydrogen atoms to be removed by the bonding reaction. A computational citrullination was performed in essentially the same way, by deleting one of the terminal NH2 moieties of the guanidino group of arginine and replacing it with an oxygen atom.
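The two computational modifications can be mimicked as edits to a residue's atom list. This is a hypothetical, heavy-atom-only sketch, not the INSIGHT II library: hydrogens are omitted, the `Residue` class is invented for illustration, the added oxygen name `OZ` is a placeholder, and the formal charges (+1 for arginine, 0 for citrulline, -2 for a fully ionized phosphoserine) are assumptions about the dominant ionization states near neutral pH. The arginine atom names follow the usual PDB convention.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Residue:
    name: str
    atoms: List[str]  # heavy-atom names only; hydrogens are implicit
    formal_charge: int


def citrullinate(res: Residue) -> Residue:
    """Arg -> citrulline: replace one terminal guanidino nitrogen (NH1) with an oxygen."""
    if res.name != "ARG":
        raise ValueError("citrullination applies only to arginine")
    atoms = ["OZ" if a == "NH1" else a for a in res.atoms]  # OZ: placeholder atom name
    return replace(res, name="CIT", atoms=atoms, formal_charge=res.formal_charge - 1)


def phosphorylate(res: Residue) -> Residue:
    """Ser -> phosphoserine: bond a phosphate to OG (the OG hydrogen is implicitly lost)."""
    if res.name != "SER":
        raise ValueError("this sketch only phosphorylates serine")
    atoms = res.atoms + ["P", "O1P", "O2P", "O3P"]
    return replace(res, name="SEP", atoms=atoms, formal_charge=res.formal_charge - 2)


arg = Residue("ARG", ["N", "CA", "C", "O", "CB", "CG", "CD", "NE", "CZ", "NH1", "NH2"], +1)
ser = Residue("SER", ["N", "CA", "C", "O", "CB", "OG"], 0)
print(citrullinate(arg))
print(phosphorylate(ser).formal_charge)
```

A molecular modeling package additionally updates bond orders, hydrogens, and partial charges, and then re-minimizes the energy; the sketch tracks only the atom inventory and the net charge change.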
Although the sequences of MBP from many species are known, there is no accurate way to calculate a tertiary structure from them alone. If the atomic structure of a protein with a significantly similar amino acid sequence were known, then one could align the MBP sequence with this homologous sequence and assign atomic coordinates to the amino acids of MBP based on the known structure. Refinement of the predicted structure then involves rotating amino acid side chains to prevent steric overlap, filling in gaps in the structure with loops, and finally modifying bond lengths and atomic coordinates to minimize an energy function as defined above. Unfortunately, this homology modeling approach per se is not viable here because no atomic structures are known of proteins similar enough to MBP. Nonetheless, we have created here three quantitative models of MBP, starting from different premises and constructing each essentially interactively, residue by residue.
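The coordinate-assignment step of homology modeling can be sketched as a walk along a pairwise alignment, column by column. In this hypothetical sketch, sequences are gapped strings with `-` for gaps and the template supplies one coordinate per residue (for example a Cα position); target residues aligned to template gaps receive no coordinates and are returned separately as the segments that would have to be built as loops. The alignment itself is assumed to be given.

```python
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]


def transfer_coordinates(target_aln: str, template_aln: str,
                         template_coords: Sequence[Coord]) -> Tuple[Dict[int, Coord], List[int]]:
    """Copy template coordinates onto aligned target residues (0-based target indices)."""
    if len(target_aln) != len(template_aln):
        raise ValueError("aligned sequences must have equal length")
    assigned: Dict[int, Coord] = {}
    unassigned: List[int] = []
    t_idx = s_idx = 0  # running residue indices in target and template
    for t_res, s_res in zip(target_aln, template_aln):
        if t_res != "-":
            if s_res != "-":
                assigned[t_idx] = template_coords[s_idx]
            else:
                unassigned.append(t_idx)  # insertion in the target: model as a loop later
            t_idx += 1
        if s_res != "-":
            s_idx += 1
    return assigned, unassigned


# Hypothetical four-residue template and a five-residue target alignment.
template_xyz = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
coords, gaps = transfer_coordinates("AC-DE", "AGHD-", template_xyz)
print(coords, gaps)
```

Template residues opposite target gaps (here the `H`) are simply skipped, which is why the loop-closure and refinement steps described above are needed afterwards.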
Previous Structural Models of Human MBP
In the 1980s, three-dimensional models of MBP were proposed independently by Stoner (27, 29) and by Martenson (26, 28). Stoner (27) proposed a structural model for MBP comprising a backbone of five β-strands, whose locations were predicted using Chou-Fasman and other then-current secondary structure prediction algorithms and which were arranged in a Greek key formation with two small α-helical segments (Fig. 1a). Martenson (28) invoked hydrophobic packing considerations and developed several more complex MBP models based on orthogonally packed β-sheets. Immune studies and the properties of splice variants were used to formulate the relative arrangements of the β-strands, a series of rules that guide any modeling of MBP. Strand β3 had to interact with strand β2 and with amino acids further downstream (possibly on strand β4 or on strand β5). Strand β5 had to be on the exterior of the sheet, because a splice variant lacks this segment but must still have a similar tertiary structure; by this reasoning, it is unlikely that strand β5 is on the inside of the sheet. The β-sheet is mostly antiparallel because this arrangement is more stable than a parallel one. The folding over of the sheets was the result of β-bulges, and no α-helices were included. One of Martenson's six models is presented in Fig. 1b.
Neither Stoner's nor Martenson's model ever existed entirely in numerical form, i.e. as a file of atomic coordinates that could be visualized using molecular graphics programs. Moreover, these models were not unique, in that many potential arrangements of the β-sheets and the connections between them were possible. As part of our electron microscopical investigations of the structure of MBP (33), we required a manipulable form of each structure for comparison with our results.
To begin with, and partly as an academic exercise (in retrospect), we used a new secondary structure prediction method of Rost and Sander (34, 35), based on sequence alignment and neural network prediction and available to us via an automatic mail server (36). The results (not shown) generally confirmed the Stoner and Martenson positioning of the β-strands. However, many considerations other than computational secondary structure prediction (37) support the idea of the β-sheet backbone, including experimental circular dichroism data (38-40), and these arguments are presented well in the original papers (26-29). Our final decision was to remain conservative and retain residues 14-21 as β1 (β-strand 1), 37-45 as β2, 86-92 as β3, 109-116 as β4, and 149-157 as β5. We scanned the Brookhaven Protein Data Bank, a database of known protein structures derived by crystallography or NMR, for a structure comprising a flat, antiparallel β-sheet akin to that proposed by Stoner (27). The two best candidates found were excitation energy transfer bacteriochlorophyll A protein and severin. For both the Stoner (27) and Martenson (28) models, the putative β-sheet regions of human MBP were threaded into the bacteriochlorophyll A β-sheet. The putative α-helical regions for the Stoner model were generated de novo. The intervening segments in both models were allowed to form coils. In Fig. 2, we present space-filling representations of our creations of these two models. In both of these structures, several of the more interesting sites of post-translational modification, such as Ser7, Thr98, and the various arginines, are exposed on the surface.
A New Structural Model of Human MBP
The Stoner and Martenson
models of MBP represent thoughtful syntheses of biochemical data
available on this protein. Both structures are plausible within the
limitations of a rarefied computational representation. However, the
shapes of these models cannot easily be reconciled with our derivation
of the appearance of MBP from electron microscopical data (33). Our
three-dimensional reconstruction has an outer circumference of
approximately 15 nm and a thickness of 2.5 nm (roughly the difference
between the outer and inner radii). The Stoner and Martenson models both have the two longest loop regions (between strands β2 and β3 and between strands β4 and β5) on the same side of the molecule. As a result, the maximum length of these models is about 10 nm. Also, the Martenson model is too thick to fit into the experimentally determined volume. We therefore constructed a new model to fit our electron microscopical reconstruction.
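The length argument can be made semi-quantitative from the strand boundaries chosen for the models (β1 14-21, β2 37-45, β3 86-92, β4 109-116, β5 149-157). The sketch below counts the residues in each connecting loop and multiplies the number of residue-to-residue steps by about 0.38 nm, the approximate Cα-Cα distance between consecutive residues. This assumed figure gives only an upper bound on how far a fully extended loop could reach; real loops are far more compact.

```python
from typing import Dict, List, Tuple

# Strand boundaries (first and last residue) as defined in the text.
STRANDS: Dict[str, Tuple[int, int]] = {
    "b1": (14, 21), "b2": (37, 45), "b3": (86, 92), "b4": (109, 116), "b5": (149, 157),
}
CA_CA_NM = 0.38  # assumed Cα-Cα spacing (nm) between consecutive residues


def loop_spans(strands: Dict[str, Tuple[int, int]]) -> List[Tuple[str, str, int, float]]:
    """Residue count and maximum extended span (nm) of each loop between consecutive strands."""
    ordered = sorted(strands.items(), key=lambda item: item[1][0])
    spans = []
    for (name_a, (_, end_a)), (name_b, (start_b, _)) in zip(ordered, ordered[1:]):
        n_residues = start_b - end_a - 1
        # Cα of the last strand residue to Cα of the next strand's first residue.
        max_span = (start_b - end_a) * CA_CA_NM
        spans.append((name_a, name_b, n_residues, max_span))
    return spans


for a, b, n, span in loop_spans(STRANDS):
    print(f"{a}-{b}: {n} residues, at most ~{span:.1f} nm")
```

The two loops that stand out are β2-β3 and β4-β5, the long loops whose placement on the same side of the sheet limits the length of the earlier models.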
To begin our new model, we retained the idea of a β-sheet backbone envisaged by Stoner and Martenson but did not retain their strand order. The β-sheet coordinates were derived from the structure of severin (Fig. 3c) (41). We further reasoned on the basis of total length (circumference) considerations that the two long loops must lie on opposite sides of the β-sheet. The β-sheet was placed in the center of the electron microscopical volume to allow the long loops to fit into it. Martenson's rules (based on immunological and other data and described above) are still consistent with approximately 60 arrangements of the β-strands. Fortunately, the dimensions of the EM reconstruction constrained the number of potential arrangements of the β-strands to two (Fig. 3, a and b). The first topology has two crossovers and an uncommon antiparallel β-sheet arrangement. We chose the second topology, which has only one crossover connection, a right-handed one (right-handed crossovers are more common than left-handed ones in biological systems). The first topology, with two crossovers, would have made the molecule too thick; moreover, one of its crossovers would have been too short to traverse the required distance. The β-sheet was oriented so that the right-handed crossover connection was on the outer surface. In this orientation, surface "bumps" on the exterior of the volume fit the crossover better, and modifiable amino acids became accessible. Amino acids 5-11 were linked using (φ, ψ) angles derived from a recent NMR study (32).
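The way topological rules prune the possible strand orders can be illustrated by brute-force enumeration. This sketch is a hypothetical encoding, not the procedure used for the models: it considers only the left-to-right order of strands β1-β5 within one sheet (an order and its mirror image counted once), ignores strand direction and connection handedness, and reads "interacts with" strictly as "is adjacent to". Under these simplifications the number it returns is not expected to match the approximately 60 arrangements quoted above; it only shows how each rule acts as a filter on the search space, which the EM envelope then narrows further.

```python
from itertools import permutations
from typing import List, Tuple

Order = Tuple[int, ...]


def strand_orders(n_strands: int = 5) -> List[Order]:
    """All left-to-right strand orders, with each order and its mirror counted once."""
    unique = {min(p, p[::-1]) for p in permutations(range(1, n_strands + 1))}
    return sorted(unique)


def adjacent(order: Order, a: int, b: int) -> bool:
    return abs(order.index(a) - order.index(b)) == 1


def satisfies_rules(order: Order) -> bool:
    """Hypothetical reading of the rules: b3 pairs with b2 and with b4 or b5; b5 is an edge strand."""
    return (adjacent(order, 3, 2)
            and (adjacent(order, 3, 4) or adjacent(order, 3, 5))
            and order.index(5) in (0, len(order) - 1))


orders = strand_orders()
allowed = [o for o in orders if satisfies_rules(o)]
print(len(orders), len(allowed))
for o in allowed:
    print(" ".join(f"b{s}" for s in o))
```

Adding strand directions (parallel or antiparallel pairing) and connection handedness would multiply the candidates again, which is where a three-dimensional envelope becomes a useful additional constraint.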
In Fig. 4, we show the correspondence between our model
and the electron microscopical reconstruction of bovine MBP/C1 in low
salt buffer (33). This structure was an open "C" shape. A second
reconstruction in higher salt buffer yielded a more compact form of the
protein, but a form that could be seen to represent a closing in of the
"C". The human MBP model could be fit into this new volume simply
by cutting bonds to the loop regions, reorienting them, and resplicing.
In Fig. 5, a and b, more detailed space-filling representations show the general shape and especially the modifiable sites of both of these new human MBP models. As noted above, the backbone of the structure is a β-sheet modeled after that of severin. Both severin and MBP are actin- and lipid-binding proteins (41-44). The triproline segment is fully exposed on the back surface of the protein, in a crossover loop between adjacent parallel strands. Consistent with this placement, the identical pentapeptide segment in the bacterial endo-β-N-acetylglucosaminidase F1 (2ebn) is also located in a crossover between parallel β-strands. In our human MBP model, many sites of post-translational modification are clustered around the triproline segment. It is tempting to speculate that the clustering of modifiable amino acids at such a position, connecting two β-strands, has structural importance. We can suggest with somewhat more certainty that the reduced surface charge density upon citrullination of arginines (as occurs in multiple sclerosis) will weaken the interaction with lipids in the myelin membrane, accounting for a certain amount of destabilization.
Given the limited resolution of the first electron microscopical reconstructions (33), the correspondence that could be achieved between the experimental data and our model is remarkable. Although we do not wish to encourage the practice of formulating atomic models of proteins based solely on such electron microscopical structures, this strategy appears to have been fruitful here for human MBP. This is a small protein for which no direct structural information has hitherto been available, yet for which a wealth of biochemical data could be exploited to initiate and guide the model-building process. The envelope of the three-dimensional reconstruction within which we had to fit the atomic model was a valuable constraint, which reduced the number of possible topologies of β-strand and loop arrangement to only two. The extended "C" shape of the protein is credible given the peripheral membrane association of this protein, and its conformational flexibility under different conditions is also likely (38-40). The recent literature has examples of other proteins that have been described as "holy" (sic; meaning "holey", i.e. with a distinct nonglobular fold) (45-48). There is also a family of pleckstrin homology domains (49), identified in various membrane-associated and signal transduction proteins, whose fold resembles our MBP model; the two may be significantly related.
Multiple sclerosis is a human demyelinating disease. An autoimmune response to one or more of its protein components is thought to be part of the pathogenesis. It has been postulated that MBP is the agent of autoimmunity and that post-translational modifications of MBP play a key role in the demyelinating process at the molecular level. Knowledge of the tertiary structures of the MBP isoforms and charge isomers is essential to understanding the organization of the myelin membrane and the mechanisms of development of autoimmunity in multiple sclerosis. However, MBP has not been crystallized for high-resolution x-ray diffraction analysis, and it may never be.
A three-dimensional model of MBP structure can at present be formulated only with the assistance of structure prediction algorithms and on the basis of the extensive biochemical and biophysical data that are available. In the accompanying paper (33), we have described our structural studies of bovine MBP charge isomer C1 associated with lipid monolayers. By electron microscopical angular reconstitution, we found bovine MBP/C1 to possess an overall "C" shape. In this paper, we used molecular modeling software to create quantitative atomic coordinate models of human MBP and to localize certain post-translational modifications relevant to multiple sclerosis. The three-dimensional electron microscopical reconstruction served as an envelope within which an atomic model comprising a five-stranded β-sheet and a large proportion of irregular coil could be uniquely packaged. In this model, the most important modifiable amino acids are accessible to enzymes such as kinases. To this extent, the model is plausible.
The model for human MBP that we present here is the first of its kind for this very important protein. It was inspired by model-building exercises of the last decade, and we anticipate that details will change as new data are incorporated into it. Its value shall lie in its utility as a workbench for incorporating experimental evidence and for formulating experimentally testable hypotheses. One idea suggested here is that deimination of argininyl residues, with the accompanying loss of positive charge, does not destabilize MBP's tertiary structure per se but rather its interactions with negatively charged lipids.
We are especially grateful to Bob Creedy (Computing and Communication Services, University of Guelph), for efforts in maintaining the molecular modeling workstation.