From the Department of Genetics, School of Medicine, Stanford University, Stanford, California 94305
ABSTRACT
Although there are currently over 13,000 phenotypically annotated mutations in the Human Gene Mutation Database (20), we have little understanding of how most of these mutations produce the clinical picture associated with them. In combination with other methods, molecular dynamics methods can provide some insight into how disease-associated mutations confer a phenotype.
Type I collagen, the most abundant protein in animals, is a structural protein. Among other functions, it protects soft tissues, supports them, and, in vertebrates, connects them with the skeleton. Type I collagen provides animals with the ability to withstand forces such as pressure, torsion, and tension as well as to transmit forces from muscle to the skeleton. Its most distinctive structural feature is a triple helix characterized by X-Y-Gly repeating amino acid motifs in each of the three chains. This sequence naturally adopts a triple helix structure once nucleation has occurred.
Collagen molecules form a diverse family of 24 types encoded by over 30 genes. Fibrillar collagens such as type I, which is the major protein in bone, contain a long uninterrupted triple helix of over 1000 residues. Non-fibrillar collagens such as type IV, a basement membrane collagen, contain numerous interruptions along the length of the triple helix. Type I collagen is a heterotrimer formed by the products of two genes, COL1A1 and COL1A2. The COL1A1 gene is located on the long arm of chromosome 17 (17q21.31 to 17q22.05), and the COL1A2 gene is on chromosome 7 (7q21.3 to 7q22.1) (4). The coordinates of these genes in the draft human genome sequence are chr17:54653617-54671961 (COL1A1) and chr7:92756774-92793062 (COL1A2) (genome.ucsc.edu/).
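As a quick check on the scale of these loci, the genomic coordinates can be converted into span lengths. The Python sketch below is illustrative only: the helper `span_length` is hypothetical, and it assumes the coordinates are 1-based and inclusive at both ends (the convention of the UCSC browser position display). Under that assumption, COL1A1 (chr17:54653617-54671961) spans roughly 18 kb and COL1A2 (chr7:92756774-92793062) roughly 36 kb of the draft assembly.

```python
import re

# Hypothetical helper: parse "chrN:start-end" and return the span length,
# assuming 1-based, inclusive coordinates (UCSC display convention).
REGION = re.compile(r"^(chr\w+):(\d+)-(\d+)$")

def span_length(region: str) -> int:
    match = REGION.match(region.replace(",", ""))  # tolerate "1,000" style
    if match is None:
        raise ValueError(f"unrecognized region: {region!r}")
    start, end = int(match.group(2)), int(match.group(3))
    if end < start:
        raise ValueError("end precedes start")
    return end - start + 1  # both endpoints are part of the interval

for gene, region in [("COL1A1", "chr17:54653617-54671961"),
                     ("COL1A2", "chr7:92756774-92793062")]:
    print(gene, span_length(region))
# COL1A1 18345
# COL1A2 36289
```

If the coordinates were instead 0-based and half-open (as in BED files), the `+ 1` would be dropped; the source does not state the convention.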
Clinically, mutations in type I collagen genes are associated with osteogenesis imperfecta (OI)¹ and some forms of Ehlers-Danlos syndrome. Clinical, genetic, and radiological data have been used to classify OI into four types (for a review, see Ref. 5). OI type II is the most severe form and is usually lethal in the perinatal period. OI type I is characterized by autosomal dominant inheritance, blue sclera, and multiple bone fractures that usually result from minimal trauma. OI type III is a relatively severe form identified by very short stature, brittle bones, and blue sclera in infancy. OI type IV is intermediate in severity between OI type I and OI type III.
The majority of identified mutations are single nucleotide substitutions that result in alteration of glycine codons within the triple helical domain of either of the chains of type I procollagen. These mutations produce phenotypes that range from mild to lethal and appear to depend, in part, on the chain in which the substitution occurs, the position of the mutation in the chain, and the substituting amino acid. However, the rules that relate these characteristics, and the influence of additional factors, remain unspecified.
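Because glycine is encoded by the four GGN codons, only a limited set of residues can replace it through a single nucleotide substitution. The Python sketch below enumerates that set from the standard genetic code. It illustrates the codon arithmetic only (it is not an analysis of any mutation database), and the function name `single_nucleotide_missense` is hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
# "*" marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def single_nucleotide_missense(residue="G"):
    """Residues reachable from any codon of `residue` by one base change,
    excluding synonymous changes and stop codons."""
    reachable = set()
    for codon, aa in CODON_TABLE.items():
        if aa != residue:
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                new_aa = CODON_TABLE[mutant]
                if new_aa not in (residue, "*"):
                    reachable.add(new_aa)
    return reachable

print(sorted(single_nucleotide_missense("G")))
# ['A', 'C', 'D', 'E', 'R', 'S', 'V', 'W']
```

Serine, cysteine, and valine, the substitutions examined in detail later in this study, all fall within this set of eight residues.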
It has been difficult to study the effects of these mutations in tissues or cells from affected individuals. To reduce the complexity of the molecular population, one group used NMR spectroscopy to study short peptides incorporating the sequences surrounding two mutations, the lethal mutation G913S and the non-lethal mutation G901S (6, 7). (The numbers 901 and 913 refer to the position within the triple helix, and G and S are the single-letter codes for glycine and serine, respectively.) The sequences flanking the lethal mutation had lower thermal stability than the sequences flanking the non-lethal mutation. Another study showed that the identity of the amino acid substituted for the native glycine affects thermal stability (8). For example, substituting alanine or serine decreased the melting temperature of a triple helix by 35 °C, whereas substituting arginine decreased it by more than 45 °C. That study found a correlation between the degree of destabilization and the severity of OI (8), which led to the proposal that destabilization plays a major role in determining the severity of OI-associated mutations. This finding also suggests that short homotrimers can, in some cases, model the naturally occurring heterotrimers. Structural NMR studies on similar peptides suggested that the lethal mutation altered the structure of the triple helix asymmetrically (7), and this asymmetric loss of triple helical structure was attributed to disrupted folding of the triple helix.
Substitutions for glycine within the triple helix are severely destabilizing and are usually considered to interrupt the triple helix (10). We have reproduced the destabilization caused by introducing alanine at the central glycine position of short, idealized collagen-like peptide models (11). These models predict that alanine is destabilizing because of unfavorable steric and electrostatic interactions. As side chains of the substituting residues fill the interstitial region of the triple helix, main chain hydrogen bonds near that region break (12). The serine side chain can hydrogen bond with the carbonyl oxygen and amide nitrogen of the peptide backbone, either on an adjacent chain, within its own residue, or on a neighboring residue of the same chain (13).
This study continues our modeling of collagen-like peptides. We built simplified homotrimeric models of 57 OI-associated mutations in the COL1A1 gene and of their corresponding wild type peptides. We analyzed energetic and structural relationships in these abnormal collagen models and identified structural features that may be important to collagen biology. When mutations were introduced, we observed a loss of helical structure, specifically of main chain backbone hydrogen bonds, and an increase in water molecules bound to the main chain. Serine substitutions are of particular interest because the serine side chain can form hydrogen bonds; mutant serine residues were usually observed hydrogen bonding with a backbone atom of an adjacent chain.
EXPERIMENTAL PROCEDURES
Homotrimeric structures were modeled using Gencollagen (16), which builds an arbitrary collagen-like sequence into a triple helical conformation and writes the result as a Protein Data Bank (PDB) file. These peptide models contained regular triple helical structures with extended side chains. Each of the 57 resulting files was copied and edited so that the mutation was placed at the center of the peptide, giving 57 mutant structures and 57 native structures. We then used the Leap program of the AMBER simulation package (17) to add hydrogen atoms to each structure and to build side chains at the mutant positions. All simulations were performed with AMBER version 6.0 (17). Counter-ions (Na+ or Cl-) were added to neutralize each simulation system.
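The file-editing step can be sketched in Python. This is a minimal sketch, not the script used in this work: it assumes Gencollagen writes standard fixed-column PDB ATOM records, that the three chains carry the chain IDs A, B, and C, and that Leap rebuilds side chain and hydrogen atoms missing from a residue template. The function `mutate_glycine` is hypothetical. It keeps only the backbone heavy atoms of the target glycine in every chain and renames the residue, leaving Leap to construct the new side chain.

```python
# Backbone heavy atoms retained at the mutated position.
BACKBONE_HEAVY = {"N", "CA", "C", "O"}

def mutate_glycine(pdb_lines, resnum, new_resname, chains=("A", "B", "C")):
    """Return PDB lines with glycine `resnum` renamed to `new_resname` in
    each listed chain (hypothetical helper; chain IDs are an assumption)."""
    edited, kept = [], 0
    for line in pdb_lines:
        is_target = (line.startswith(("ATOM", "HETATM"))
                     and line[21] in chains             # chain ID, column 22
                     and int(line[22:26]) == resnum)    # residue number, 23-26
        if not is_target:
            edited.append(line)
            continue
        if line[17:20] != "GLY":                        # residue name, 18-20
            raise ValueError(f"residue {resnum} in chain {line[21]} is not GLY")
        if line[12:16].strip() not in BACKBONE_HEAVY:
            continue  # drop glycine hydrogens; Leap regenerates hydrogens
        kept += 1
        edited.append(line[:17] + f"{new_resname:>3}" + line[20:])
    if kept == 0:
        raise ValueError(f"no atoms found for residue {resnum}")
    return edited
```

For example, `mutate_glycine(lines, 16, "SER")` would rename glycine 16 in all three chains of a homotrimer; the residue number is illustrative only.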
Using Leap, each structure was placed in a box of solvent that extended at least 10 Å beyond the triple helical peptide. Each system was equilibrated with 200 steps of conjugate gradient minimization of the entire system, followed by a slow warm-up to 280 K over 20 ps of solvent equilibration. Solvent equilibration was followed by a 3000-step minimization of the entire system and another slow warm-up of the whole system to 280 K over 7.5 ps. Production dynamics were then run for 400 ps at 280 K with a 2-fs time step. We chose 280 K because previous simulations showed that higher temperatures were often unsuitable for simulating the more destabilizing mutations.
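Writing the protocol out as data makes the step counts explicit. The Python sketch below assumes that the 2-fs time step stated for production dynamics was also used in both warm-up stages (the text does not say), and the stage labels are descriptive, not AMBER input keywords.

```python
# Equilibration and production protocol as described in the text.
# Assumption: the 2-fs time step applies to every MD stage, not only production.
TIMESTEP_FS = 2.0
N_SIMULATIONS = 57 * 2  # 57 mutant and 57 wild type models

PROTOCOL = [
    # (stage, kind, length): minimization length in steps, MD length in ps
    ("conjugate gradient minimization, whole system", "min", 200),
    ("warm-up to 280 K, solvent equilibration", "md", 20.0),
    ("minimization, whole system", "min", 3000),
    ("warm-up to 280 K, whole system", "md", 7.5),
    ("production dynamics at 280 K", "md", 400.0),
]

def md_steps(length_ps, timestep_fs=TIMESTEP_FS):
    """Number of integration steps needed to cover `length_ps` picoseconds."""
    return round(length_ps * 1000.0 / timestep_fs)

for stage, kind, length in PROTOCOL:
    steps = length if kind == "min" else md_steps(length)
    print(f"{stage}: {steps} steps")

production_ns = N_SIMULATIONS * 400.0 / 1000.0
print(f"total production sampling: {production_ns} ns")
```

Under that assumption, production corresponds to 200,000 steps per simulation, and the 114 models together account for 45.6 ns of production sampling.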
After simulating each mutant and wild type peptide model, we calculated the structural differences between an idealized triple helix and the simulated structures. The r.m.s.d., defined as the square root of the mean squared distance between corresponding points in the two models being compared, was used as the metric of deviation from an idealized (POG)x triple helical structure. Triple helix ideality is defined using backbone parameters from a polyproline II helix (φ, ψ, and ω of Gly-X-Y are -74°, 170°, 180°; -75°, 168°, 180°; and -75°, 153°, 180°, respectively). We calculated the r.m.s.d. by aligning the central region of a simulated structure with that of an idealized triple helix and then calculating the α-carbon (Cα) r.m.s.d. using the 9 Cα atoms closest to the mutation site. We used central Cα atoms because movement at the ends of a peptide is very noisy and therefore does not reflect the perturbation caused by the mutation itself. This protocol was applied to each simulation by sampling structures every picosecond. Average r.m.s.d. values were calculated over 120 to 400 ps and compared between wild type and mutated peptides, across residue identities, and across clinical OI types. An r.m.s.d. of 0.0 corresponds to a perfectly idealized triple helix, and larger values represent larger deviations from ideality.
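The alignment and r.m.s.d. step can be sketched with the Kabsch algorithm, one standard way to find the rotation that best superimposes two coordinate sets. This is an illustration under stated assumptions, not the code used in this work: it uses NumPy, assumes the 9 Cα coordinates nearest the mutation have already been selected and matched atom for atom with the idealized helix, and the function names are hypothetical. For simplicity it superimposes on the same nine atoms used for the r.m.s.d., whereas the protocol above aligned on the central region, which may include more atoms.

```python
import numpy as np

def kabsch_rmsd(model, reference):
    """r.m.s.d. (input units, e.g. Å) between two (N, 3) coordinate arrays
    after optimal rigid superposition with the Kabsch algorithm."""
    P = np.asarray(model, dtype=float)
    Q = np.asarray(reference, dtype=float)
    P = P - P.mean(axis=0)  # remove translation
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q  # 3 x 3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a proper rotation, not a reflection.
    d = -1.0 if np.linalg.det(Vt.T @ U.T) < 0 else 1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))

def mean_rmsd(frames, times_ps, reference, start_ps=120.0, end_ps=400.0):
    """Average r.m.s.d. over frames sampled within [start_ps, end_ps]."""
    values = [kabsch_rmsd(frame, reference)
              for frame, t in zip(frames, times_ps)
              if start_ps <= t <= end_ps]
    return float(np.mean(values))
```

With one frame saved per picosecond, `mean_rmsd` averages the 281 frames from 120 to 400 ps inclusive.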
We also monitored interruptions of the regular hydrogen bonding network. Beginning after 100 ps of dynamics, we identified all solute-solute hydrogen bonds that were present more than 80% of the time. A hydrogen bond was labeled "main chain" if both donor and acceptor belonged to the main chain of the triple helix, and was therefore internal to the helix; all others were labeled "side chain" and were therefore external to the triple helix itself. The first two triplets at each end were excluded from the analysis. To determine whether solvent hydrogen bonding compensated for lost interchain hydrogen bonds, we counted the hydrogen bonds between solvent and the solute backbone in the wild type and mutant peptides. Because serines substituted for glycines in the triple helix have increased hydrogen bonding potential, we performed a separate analysis of their side chain interactions.
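The persistence filter and the main chain/side chain labeling can be sketched as follows. The sketch assumes hydrogen bonds have already been detected frame by frame (for example, with a distance and angle criterion in a trajectory analysis tool) and stored as hypothetical `(donor, acceptor)` pairs of `(chain, residue_number, atom_name)` tuples, with residues numbered from 1. It also assumes a bond is excluded when either partner lies in the first or last two triplets, a detail the text leaves open.

```python
from collections import Counter

# Main chain atom names (assumed AMBER-style naming).
BACKBONE_ATOMS = {"N", "H", "CA", "C", "O"}

def persistent_hbonds(frames, times_ps, start_ps=100.0, min_occupancy=0.80):
    """Map each hydrogen bond to its occupancy, keeping bonds present in more
    than `min_occupancy` of the frames sampled at or after `start_ps`."""
    window = [set(frame) for frame, t in zip(frames, times_ps) if t >= start_ps]
    if not window:
        raise ValueError("no frames at or after start_ps")
    counts = Counter(hb for frame in window for hb in frame)
    return {hb: n / len(window) for hb, n in counts.items()
            if n / len(window) > min_occupancy}

def classify_hbond(hbond, n_residues, end_triplets=2):
    """Return "main chain", "side chain", or None for bonds in excluded ends."""
    trim = 3 * end_triplets
    for _chain, resnum, _atom in hbond:
        if resnum <= trim or resnum > n_residues - trim:
            return None
    if all(atom in BACKBONE_ATOMS for _chain, _resnum, atom in hbond):
        return "main chain"
    return "side chain"
```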
Because of the large volume of data, we were not able to characterize every mutation rigorously. We chose three simulations for visualization and characterization, selecting models that showed many specifically bound solvent molecules, lost interchain hydrogen bonds, or interesting serine binding patterns.
RESULTS
The mutant peptides carrying the mutations G244C, G601S, and G844V were chosen for further analysis. G244C (18) is associated with OI type II; the mutant peptide had one fewer low-exchange backbone hydrogen bond than its wild type counterpart and formed five specific solvent hydrogen bonds. Analysis of the solvent molecules surrounding the mutation site showed that a single water molecule bridged two of the disrupted chains, while a second, fully solvated water molecule hydrogen bonded strongly with one of the mutant cysteines (Fig. 3a). This result suggests that solvent molecules may bind specifically either in a water-bridging pattern or in a configuration with only a single hydrogen bond to the peptide.
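Distinguishing bridging waters from singly bound ones amounts to grouping persistent water-peptide hydrogen bonds by water molecule and counting how many peptide chains each water contacts. A minimal Python sketch, assuming a hypothetical input of `(water_id, chain, residue_number, atom_name)` records, one per persistent hydrogen bond; the function name and the labels are illustrative, not taken from this study.

```python
from collections import defaultdict

def classify_bound_waters(water_contacts):
    """Label each water "bridging" if it hydrogen bonds to two or more
    peptide chains, otherwise "single chain"."""
    chains_by_water = defaultdict(set)
    for water_id, chain, _resnum, _atom in water_contacts:
        chains_by_water[water_id].add(chain)
    return {water: ("bridging" if len(chains) >= 2 else "single chain")
            for water, chains in chains_by_water.items()}

# Hypothetical records: water 1201 contacts chains A and B; water 1388
# contacts only chain C, as with a water bound to a single mutant cysteine.
contacts = [(1201, "A", 16, "O"), (1201, "B", 16, "N"), (1388, "C", 16, "SG")]
print(classify_bound_waters(contacts))
```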
DISCUSSION
Our studies indicate that structural parameters alone are not enough to predict the phenotype of a disease-associated mutation in type I collagen. Although the observed structural differences correlate with the identity of the substituting residue, their relationship to disease severity remains elusive. The lack of correlation, while disturbing, may not be surprising, because disease severity in collagen disorders results from several factors. In addition to folding interruptions, these likely include altered secretion of molecules that contain abnormal chains, abnormal molecular assembly into fibrils in the matrix, and, finally, defective mineralization of fibrils in bone. Although each of these could reflect an effect of altered structure, other cellular factors could modulate them, so structural studies alone probably cannot predict all of this variation.
Valine substitutions showed the largest r.m.s.d. difference from their respective wild type peptides. Valine has the largest volume of the substituting residues we analyzed, so the r.m.s.d. difference is likely attributable to the larger volume it occupies in the interstitial region of the triple helix. The r.m.s.d. differences varied widely within a given class of amino acid substitution, further evidence that the sequence environment around a mutation is as important as the identity of the substituting residue itself.
Serine can hydrogen bond with a serine on an opposite chain, with the main chain, with solvent, or with another nearby side chain. We found that serine residues usually formed rapidly exchanging hydrogen bonds with solvent and occasionally formed slowly exchanging hydrogen bonds with the adjacent main chain or, more rarely, with solvent. The hydrogen bond patterns that serine residues adopt shape the structure of the triple helix surface. Because we see variability within a given substituting amino acid type, neighboring residues likely account for these differences. These differences suggest that when the stability of the triple helix is not significantly altered and folding of the triple helical domain is not disrupted, differences in surface structure can still confer differences in function.
Most of the mutated molecules compensated for the loss of some intrinsic hydrogen bonds with solvent hydrogen bonds that had a low probability of exchange, many of them with specifically bound water molecules. This supports the hypothesis, proposed from earlier studies of idealized peptides, that solvent molecules compensate for lost main chain hydrogen bonds (9, 12). We never observed more than 10 specific solvent hydrogen bonds, and we observed none in seven of the 57 mutant structures (12%). That some simulations show decreased solvent binding when the mutation is present further illustrates why many mutations must be analyzed together to gain insight into the general structural changes that occur when mutations are introduced.
The thermodynamic effects of mutations are hard to analyze structurally without an accurate model of the unfolded state. We do, however, observe some differences between lethal and non-lethal peptides. Lethal mutations had slightly fewer backbone hydrogen bonds than non-lethal mutations and, perhaps surprisingly, were slightly less perturbing than non-lethal mutations. We think that more severe mutations may induce a loss of flexibility, so that disruptions along the chain cannot be compensated as readily. To find stronger correlations, we are applying machine learning methods to predict the phenotype of mutations, because phenotype reflects the accumulation of a large number of properties.
The next step in analyzing these mutations is to build an accurate energetic model of collagen-like peptides. Free energy techniques based on thermodynamic integration have been used successfully to study mutations at the glycine positions of collagen (11), but they are difficult to apply in a high-throughput manner because they require a large amount of computational time. The Molecular Mechanics Poisson-Boltzmann Surface Area (MM/PBSA) method has been applied to proteins and is a powerful tool for analyzing the energetic properties of protein structures. Although it is much faster than other free energy methods, its main drawback is a larger random error than that of thermodynamic integration calculations. To analyze a large number of peptides, we are calculating the MM/PBSA energies of the folded and unfolded states of both the mutant and wild type peptides. These energies can then be compared with the structural results and incorporated into machine learning methods to predict phenotype.
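Comparing folded and unfolded states of mutant and wild type peptides amounts to a thermodynamic cycle. A minimal Python sketch, assuming each state is summarized by an average MM/PBSA-style free energy (molecular mechanics energy plus polar and nonpolar solvation terms, with an optional entropy estimate) and that the four averages have independent standard errors. The function names are hypothetical, and the sign convention is chosen so that a positive ΔΔG means the mutation destabilizes the triple helix.

```python
import math

def mmpbsa_free_energy(e_mm, g_polar, g_nonpolar, t_s=0.0):
    """MM/PBSA-style estimate G = E_MM + G_PB + G_SA - TS (e.g. kcal/mol)."""
    return e_mm + g_polar + g_nonpolar - t_s

def folding_ddg(mut_folded, mut_unfolded, wt_folded, wt_unfolded):
    """ddG = dG_fold(mutant) - dG_fold(wild type), where
    dG_fold = G_folded - G_unfolded. Positive means a less stable mutant."""
    return (mut_folded - mut_unfolded) - (wt_folded - wt_unfolded)

def ddg_standard_error(*standard_errors):
    """Propagate independent standard errors of the averaged energies."""
    return math.sqrt(sum(se ** 2 for se in standard_errors))
```

Because four averaged energies enter each ΔΔG, their random errors combine in quadrature; with equal errors σ on each term, the ΔΔG error is 2σ. This is one reason the larger per-calculation error of MM/PBSA matters when many mutations are ranked.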
Alterations of structure and folding can change the phenotype associated with the COL1 gene products in many ways. We believe that the most significant alterations in structure and stability will have a substantial impact on phenotype. Without further experimental evidence, however, it is difficult to speculate on where in the functional pathway of collagen gene products these effects are most likely to arise. A structural change may affect the kinetics of folding or may inhibit critical inter- and intrahelical interactions. A mutated triple helix is likely to cause uneven packing of triple helices, together with their associated solvent, into a fibril and subsequently into a fiber. As we begin to model collagen fibrils at the atomic level, we may gain better insight into the downstream structural effects of a single point mutation.
Selected structures from these simulations are being stored to build a data base of collagen mutation models for the use of researchers. This data base could aid experimental researchers interested in characterizing disease-associated mutations, as well as researchers investigating other structural features of collagen, such as enzymatic binding. Eventually we would like to incorporate these models into an algorithm that can predict the phenotype of disease-associated mutations.
We have applied a method of high-throughput mutation analysis using molecular dynamics. This method reveals the range of structural interactions that occur when a glycine in the triple helical domain of collagen-like peptides is mutated. These peptides appeared to compensate for mutation-induced loss of stability with 1) a large number of solvent-backbone hydrogen bonds that exchange rapidly and 2) a small number of solvent-backbone hydrogen bonds that exchange very slowly. We believe that our method will have applications in analyzing the molecular consequences of disease-associated mutations in other systems. One limitation of the approach is uncertainty about how much mutations like these alter the initial folding of the molecules, because we assume that at least some of these molecules can complete, although imperfectly, the formation of the triple helix. Many phenotypically annotated mutations currently have no known molecular basis for their disease association. In combination with machine learning methods and experimental results, molecular mechanics methods show promise for providing insight into the underlying causes of disease.
FOOTNOTES
Published, MCP Papers in Press, October 11, 2002, DOI 10.1074/mcp.M200064-MCP200
1 The abbreviations used are: OI, osteogenesis imperfecta; r.m.s.d., root mean square deviation.
* This work was funded by National Institutes of Health Grants AR47720-01 (to T. E. K., principal investigator) and LM05652 (to R. B. A., principal investigator).
To whom correspondence should be addressed: Dept. of Genetics, Stanford University, 251 Campus Dr., MSOB x-215, Stanford, CA 94305-5479. Tel.: 650-736-0156; Fax: 650-725-7944; E-mail: teri.klein{at}stanford.edu
REFERENCES