Department of Surgery, School of Medicine, University of Pennsylvania, Philadelphia
Abstract |
---|
Introduction |
---|
The developmentally regulated expression of distinct sarcomeric MyHC isoforms modulates both the contractile and bioenergetic properties of individual striated muscle cells (Barany 1967; Schiaffino and Reggiani 1996). In mammals, the previously characterized sarcomeric MyHC isoforms are encoded by two families of tandemly linked genes, with the process of isoform switching under transcriptional control. In the human genome, six genes (chromosome 17p13) encode the skeletal MyHC isoforms (Weiss et al. 1999; Shrager et al. 2000) and two (chromosome 14q12) encode cardiac isoforms (Mahdavi, Chambers, and Nadal-Ginard 1984; Saez et al. 1987). The prototypical gene in this subclass is 25 kbp in length with 6 kbp of coding sequence interrupted by 40 introns (Strehler et al. 1986). All but one of the genes at the human skeletal and cardiac MyHC loci conform to this consensus: the recently characterized extraocular MyHC gene is approximately 65 kbp in length (Briggs and Schachat 2000).
Three other class-II MyHC genes are currently recognized in mammals: the smooth muscle, nonmuscle A, and nonmuscle B MyHC genes. Their human chromosomal loci are 16p13 (Deng et al. 1993), 22q11.2 (Saez et al. 1990), and 17p13 (Simons et al. 1991), respectively. The smooth muscle MyHC gene encodes at least four distinct proteins by a process of alternative RNA splicing, and these represent the dominant if not sole MyHCs of adult smooth muscle (Babu, Warshaw, and Periasamy 2000). As implied by the nomenclature, the nonmuscle MyHC gene products are primarily involved in actin-based motor functions in other cell types, such as fibroblasts, endothelial cells, and leukocytes (Babu, Warshaw, and Periasamy 2000; Kelley et al. 2000; Seri et al. 2000). It is widely assumed that smooth muscle represents the most primitive of the muscle cell types because of the structural similarity of its myofilaments to those of nonmuscle cells (Alberts et al. 1994). This assumption is supported by the evolutionary relationships among the known class-II MyHCs: the mammalian smooth and nonmuscle isoforms appear to have diverged from an ancestral gene long before the onset of sarcomeric isoform diversification. The structural similarity of the vertebrate sarcomeric MyHC genes further suggests that the series of gene duplications that created this subclass postdated the major evolutionary period of intron insertion or removal.
We now present the identification and analysis of three novel human sarcomeric MyHC genes that challenge these assumptions. The genes are on human chromosomes 3, 7, and 20 and are thus physically unlinked to each other and to the previously characterized sarcomeric MyHC genes. The largest of the genes is at least 140 kbp in length. Quantitation of nucleotide sequence divergence among these distantly related genes establishes an approximate timeframe for three sarcomeric MyHC gene duplications that predated the emergence of the dedicated smooth MyHC gene. The atypical structure of these genes provides an estimate of the approximate time at which individual introns were inserted or removed in ancestral genomes. Two of the genes are predicted to encode slow-contracting MyHC isoforms and the third a fast MyHC, providing important clues as to the functional significance of the gene products. The fast MyHC gene diverged first and has introns in positions seen thus far only in the recently described catchin and MyHC genes of the scallop Argopecten irradians. The invariant length of the rod-coding portions of the deduced cDNAs has implications for the use of these ancient sarcomeric MyHC isoforms as preferred molecular yardsticks in the study of early metazoan evolution. The interplay between the processes of repetitive element insertion and counterselection to restrict gene expansion is revealed by the pairwise comparison of orthologous mammalian MyHC genes, addressing an important theme in recent genome evolution.
Experimental Procedures |
---|
Loop Sequence Data
Universal primers were constructed from highly conserved coding sequences of both the ATP and actin-binding domains of previously characterized human MyHC isoforms. The polymerase chain reaction was used to amplify both domains from both subcloned and total genomic DNA, and the products were subsequently cloned into a TA-cloning vector (Invitrogen) for double-stranded sequencing. Confirmatory direct sequencing of the genomic cosmid was also performed using the same universal primers.
Raw Sequence Data
Additional sequences used in this study are based on (1) high throughput human genomic sequences provided to the public by the Baylor College of Medicine, the Sanger Center, the Washington University Genome Sequencing Center, and the Whitehead Institute ([skeletal locus: AC002347, AC005291, AC005323], [MYH 15: AC019169, AC020731, AC041004, AC069499], [MYH 11: AC011061, AC025518, AC026130], [MYH 16: AC004834 and AC005163, later NT_001651.2|Hs7_920], and [MYH 14: AL132825, later NT_011362.2|Hs20_11519]), and (2) expressed sequence tags (ESTs) from human cDNA libraries (as noted in table 2). On the basis of homology to sequences originally determined in our laboratory, all files were identified using the BLAST algorithms as implemented at http://www.ncbi.nlm.nih.gov:80/blast/blast.cgi?Jform=1.
Ionic Interactions
Ionic interaction distributions were determined by arranging the primary sequence in a 28-residue heptad repeat pattern described previously (Stedman et al. 1990). Ionic interaction analysis was based on pairing of residues at (i to i' + 5) positions (Bandman, Matsuda, and Strohman 1982).
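The (i to i' + 5) pairing can be illustrated with a minimal sketch. It assumes a parallel, in-register homodimer (so the partner strand has the same sequence), treats only K/R as positive and D/E as negative, and ignores skip residues that shift the heptad register in a real rod. The function name `count_ionic_pairs` and the test sequences are hypothetical and are not taken from the study.

```python
# Sketch: count potential interhelical g -> e' ionic pairs in a coiled coil.
# Assumption: parallel in-register homodimer, so residue i + 5 of the same
# sequence stands in for the e' residue on the partner strand.
HEPTAD = "abcdefg"
POSITIVE = set("KR")   # assumption: histidine is treated as neutral
NEGATIVE = set("DE")


def charge(residue):
    """Return +1, -1 or 0 for a one-letter amino acid code."""
    if residue in POSITIVE:
        return 1
    if residue in NEGATIVE:
        return -1
    return 0


def count_ionic_pairs(rod_seq, first_position="a"):
    """Return (attractive, repulsive) counts for g(i) -> e'(i + 5) pairs."""
    offset = HEPTAD.index(first_position)
    attractive = repulsive = 0
    for i, residue in enumerate(rod_seq.upper()):
        if HEPTAD[(i + offset) % 7] != "g":
            continue
        if i + 5 >= len(rod_seq):
            break  # no e' partner left in the sequence
        product = charge(residue) * charge(rod_seq[i + 5].upper())
        if product < 0:
            attractive += 1
        elif product > 0:
            repulsive += 1
    return attractive, repulsive


if __name__ == "__main__":
    # Hypothetical two-heptad peptide: E at g, K at the following e.
    print(count_ionic_pairs("AAAAKAE" * 2))
```

Tallying the counts in windows along the rod would give a distribution of the kind compared between isoforms, although the windowing used in the study is not specified here.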
Divergence Analysis
Coding sequence divergence was calculated by aligning cDNA sequence as described in the text for all 14 human and selected nonhuman class-II MyHC genes and running divergence software (DIVERGE: Genetics Computer Group, Inc., Wisconsin Package). DIVERGE is based on the Perler analysis (Perler et al. 1980), which has been further modified for unbiased estimation of nonsynonymous substitution rates (Li 1993). The nonsynonymous substitution rates were subsequently used to generate an evolutionary topogram based on branch lengths calculated by the unweighted pair group method with arithmetic mean (UPGMA) (Sneath and Sokal 1973, pp. 230–234) and the KITSCH method using default settings (Fitch-Margoliash method with contemporary tips) (Fitch and Margoliash 1967; Felsenstein 1993). Results obtained by these methods were compared to those obtained for cDNA and deduced peptide sequences aligned using Clustal W (Thompson, Higgins, and Gibson 1994) with calculation of branch lengths in the resulting phenograms by the Neighbor-Joining method (Saitou and Nei 1987) as implemented on the Biology Workbench website http://biowb.sdsc.edu/CGI/BW.cgi#!. Protein sequence divergence was also assessed using the parsimony algorithm of the PHYLIP program PROTPARS (Felsenstein 1993).
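For readers unfamiliar with UPGMA, the clustering step can be sketched as follows. This is a generic textbook implementation, not the software used in the study; the function name `upgma`, the tuple-based tree representation, and the three-taxon distance matrix are hypothetical.

```python
# Sketch of UPGMA: repeatedly merge the two closest clusters and place the
# new node at half their distance (this presumes a uniform molecular clock).
from itertools import combinations


def upgma(names, matrix):
    """Return a nested tuple (left, right, node_height) for a symmetric matrix."""
    clusters = {i: {"tree": names[i], "size": 1} for i in range(len(names))}
    dist = {(i, j): matrix[i][j] for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)
    while len(clusters) > 1:
        (i, j), d_ij = min(dist.items(), key=lambda kv: kv[1])
        ci, cj = clusters.pop(i), clusters.pop(j)
        size = ci["size"] + cj["size"]
        merged = {"tree": (ci["tree"], cj["tree"], d_ij / 2.0), "size": size}
        # Distance to every remaining cluster is the size-weighted mean.
        for k in clusters:
            d_ik = dist.pop((min(i, k), max(i, k)))
            d_jk = dist.pop((min(j, k), max(j, k)))
            dist[(k, next_id)] = (ci["size"] * d_ik + cj["size"] * d_jk) / size
        del dist[(i, j)]
        clusters[next_id] = merged
        next_id += 1
    return next(iter(clusters.values()))["tree"]


if __name__ == "__main__":
    # Hypothetical pairwise nonsynonymous distances for three sequences.
    demo = [[0, 2, 6],
            [2, 0, 6],
            [6, 6, 0]]
    print(upgma(["A", "B", "C"], demo))
```

Because each node height is simply half the distance between the merged clusters, the resulting tree is ultrametric, which is why the method is only appropriate when substitution rates are roughly uniform across lineages.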
Intron Position Analysis
Reconstructed cDNA and gene files were viewed in annotated format to facilitate the designation of intron position and phase. cDNA sequences were split at the head-rod junction based on the position of the conserved proline residue corresponding to residue 839 in the human embryonic MyHC peptide sequence. In the head domains, intron position is considered to have been conserved if the immediately flanking amino acid residues are homologous, despite change in the exact nucleotide position from the start codon. This is mandated by the variable lengths of the junctional loops. In the rod-coding sequence, intron position is based on strict preservation of nucleotide number 3' from the reference proline codon. Phases are designated as 1, 2, and 3 for introns interrupting the coding sequence before the first, second, and third bases of a codon, respectively. Graphical depictions of each of the annotated cDNAs were used as exported from MacVector with exon lengths shown on a proportional basis.
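The phase convention above (1, 2, 3 rather than the more common 0, 1, 2) can be made explicit with a short sketch. The function names are hypothetical; the only input is the number of coding nucleotides that precede the intron.

```python
# Sketch of the phase convention described above: phase 1, 2 or 3 means the
# intron interrupts the coding sequence before the first, second or third
# base of a codon, respectively.


def intron_phase(coding_bases_before_intron):
    """Phase (1, 2 or 3) given the count of coding bases 5' of the intron."""
    if coding_bases_before_intron < 0:
        raise ValueError("base count must be non-negative")
    return coding_bases_before_intron % 3 + 1


def to_zero_based_phase(phase):
    """Convert to the widely used 0/1/2 convention (0 = between codons)."""
    if phase not in (1, 2, 3):
        raise ValueError("phase must be 1, 2 or 3")
    return phase - 1


if __name__ == "__main__":
    for n in (300, 301, 302):  # hypothetical positions
        print(n, intron_phase(n), to_zero_based_phase(intron_phase(n)))
```

For rod-domain introns, the same function would be applied to the number of nucleotides 3' of the reference proline codon, provided that count starts at a codon boundary.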
Analysis of Intron Length
Intron positions for all the genes in the skeletal MyHC cluster were exported from MacVector into Microsoft Excel for calculation of intron lengths. RepeatMasker (http://ftp.genome.washington.edu/cgi-bin/RepeatMasker) was used to identify human repeat boundaries. A file corresponding to the extraocular MyHC gene deleted for all identifiable human repeats was used to generate a separate set of intron sizes. An anonymous sequence (AC019008) was found to represent the murine extraocular MyHC gene and was annotated to provide intron sizes for this orthologous gene comparison. This analysis also included the rat embryonic MyHC gene (X04267). Intron sizes were transferred into the statistical analysis application JMP (SAS Institute, Cary, NC) and used to calculate correlation coefficients, linear regression slopes, and 95% density ellipses for the bivariate normal distributions.
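The two summary statistics named above can be reproduced without a statistics package. The sketch below computes the Pearson correlation coefficient and the ordinary least-squares slope for paired intron sizes. The function name and the intron sizes are hypothetical, and the choice of which species goes on which axis is an assumption.

```python
# Sketch: Pearson correlation and least-squares slope for paired intron sizes.
from math import sqrt


def pearson_and_slope(x, y):
    """Return (r, slope) for the regression of y on x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sqrt(sxx * syy), sxy / sxx


if __name__ == "__main__":
    # Hypothetical intron sizes (bp) at matched positions in two orthologs.
    mouse = [100, 200, 400, 800]
    human = [150, 300, 600, 1200]
    r, slope = pearson_and_slope(mouse, human)
    print(f"r = {r:.4f}, slope = {slope:.3f}")
```

A slope above 1 with a high correlation indicates proportional expansion of the introns in the gene plotted on the y axis relative to its ortholog.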
Results |
---|
In addition to the interaction of hydrophobic residues in the a and d positions of the alpha helices, interhelical ionic interactions can theoretically form between the g residue of each heptad repeat on one strand and an oppositely charged residue five positions further from the N-terminus (e') on the opposite strand (i, i + 5) (McLachlan and Stewart 1975). The hypothetical (i, i + 5) interaction configuration has been confirmed from crystallographic analysis of GCN4, a DNA-binding domain containing a leucine zipper motif in yeast (O'Shea et al. 1991; Ellenberger et al. 1992). The distribution of potential ionic interactions within a single (Letai and Fuchs 1995) and between paired (Arrizubieta and Bandman 1998) rod domains of a specific type of MyHC is unique, providing a distinct ionic distribution fingerprint that may be used as a criterion for determining the propensity for higher order assembly. Side-by-side comparisons of these patterns reveal dramatically greater similarity between the novel MyHCs and the other human sarcomeric MyHCs than between either of these and the human nonsarcomeric class-II MyHCs (fig. 4). On the basis of the previously observed dichotomy in the pattern of ionic distributions in class-II myosin tail structures, these data predict that the novel MyHCs are sarcomeric (Arrizubieta and Bandman 1998).
Evolutionary Relationships Among 11 Sarcomeric and 3 Nonsarcomeric MyHC Genes
The requirement of lateral association of the alpha helical rod domains of the class-II MyHCs precludes the insertion or deletion of codons in the corresponding portion of the encoding genes. This structural constraint greatly facilitates the unambiguous alignment of deduced amino acid sequences across a large evolutionary distance. The rod-encoding portions of the novel MyHC cDNAs were aligned with all other human and selected nonhuman class-II MyHC cDNA sequences. Nonsynonymous substitution rates were determined using the coding sequence divergence algorithm of Perler et al. (1980) as modified by Li (1993) (fig. 6a). Figure 6b shows a rooted evolutionary topogram with branch lengths based on the pairwise divergence distances (Shrager et al. 2000). The branch lengths shown were calculated by the UPGMA (Sneath and Sokal 1973, pp. 230–234). Virtually identical results, shown as numbers in parentheses in figure 6b, were obtained using the KITSCH method with the default settings (Fitch-Margoliash method with contemporary tips) (Fitch and Margoliash 1967; Felsenstein 1993). In these analyses, the tacit assumption of a uniform molecular clock is addressed by two pairs of substitution rates. Full-length cDNA sequences are available for both sarcomeric and nonsarcomeric MyHCs for Drosophila and chicken. As indicated in both figure 6a and b, the divergence distances for both cDNAs are similar for both species, indicating that the nonsynonymous substitution rates have been relatively uniform for this family of proteins.
As a further check on the overall topology of the tree, the deduced polypeptide sequences were reanalyzed using a parsimony algorithm (PROTPARS) of the PHYLIP software package (Felsenstein 1993). This program relies on a strategy of adding sequences sequentially in the order in which they are listed, followed by the comparative evaluation of a limited number of local rearrangements. Regardless of the order in which the sequences were presented in the input file, the topology of the most parsimonious tree(s) identified differed from the illustrated evolutionary tree by no more than 1% of the overall number of substitutions required (data not shown). Despite the fact that this approach relies on a different set of assumptions about the weights of peptide substitutions than does the Perler analysis, the results support the general conclusions of the former approach.
Branch lengths defined by the foregoing methods suggest that the novel genes all trace to duplications that predated both the smooth muscle and nonmuscle MyHC gene divergence and the finned-fish–tetrapod phylogenetic divergence. Pairwise comparisons to genes from other species identify the orthologous relationships and allow an estimate of the relative timing of gene and species divergence events, as shown by the vertical lines in figure 6. A striking result from this analysis is the finding that MYH 16 is almost as divergent from the other human sarcomeric MyHCs as all of these human genes are from the single sarcomeric MyHC genes in Drosophila and Argopecten. This conclusion is further supported by the analysis of intron positions (see below) and recently available sequences of orthologous genes from other species (see Discussion). Among the ancestral human class-II genes, only the sarcomeric-nonsarcomeric divergence appears to have predated the invertebrate-vertebrate split. Our data (not shown) on divergence could not identify an orthologous relationship between any of the novel genes and the four sarcomeric MyHC genes of Caenorhabditis elegans (Dibb et al. 1989), suggesting that the latter diverged after the ancestral split between species.
Novel MyHC Genes Have Introns in Atypical Positions: A Timeline for Loss and Gain
As an initial step in the analysis of evolutionary relationships among the newly recognized members of this gene family, we focused attention on the conservation of intron position. The coding sequences of the eight previously characterized sarcomeric MyHC genes are interrupted by introns at 37 conserved positions, implying a similar structure in a common ancestral gene. Four of these eight genes (embryonic, extraocular, alpha, and beta) have an additional intron interrupting the coding sequence four to seven codons 5' of the stop signal. Figure 7 depicts the positions of the introns for each of the human class-II MyHC genes, relative to the start and stop codons in the assembled cDNA sequences.
Relationship Between Intron Size and Gene Size and Repetitive Sequence Content
MYH 14 is 24,206 bp in length (start to stop codon) and therefore similar to the majority of previously characterized mammalian sarcomeric MyHC genes (fig. 6a, lowermost row). The assembled draft sequence for MYH 15 reveals an unexpectedly large size of >142,000 bp, whereas MYH 16 is intermediate in size, an estimated 60,000 bp extrapolating from the 45,200 bp spanned by exons 16 through 40. The findings of anomalous size in these genes prompted us to annotate draft or completed sequence for all other identifiable human class-II MyHCs to assess the relationship between intron size, intron position, repetitive sequence content, and coding sequence divergence. Analysis of the reconstructed gene sequences revealed sizes of 63,577, >129,000, and >106,000 bp for the extraocular, smooth muscle, and nonmuscle MyHC B genes, respectively. In view of the topology of the evolutionary diagram in figure 6, data on gene size suggest that the human extraocular MyHC gene has undergone a recent expansion, MYH 16 and the nonmuscle A gene (MYH 9) have undergone recent contractions, and the other sarcomeric MyHC genes have been remarkably constant in size since the MYH 14 divergence.
Maps of representative genes and selected vertebrate orthologs were aligned to facilitate the further analysis of intron evolution (fig. 8). Despite their similar overall size, even the most closely related human sarcomeric MyHC genes have widely divergent intron sizes (fig. 8b). In contrast, intron size conservation is readily apparent in the orthologous gene comparisons (fig. 8c). The intron size distributions were plotted as scattergrams with density ellipses for selected pairwise gene comparisons (fig. 9). Correlation coefficients achieve a level of >0.5 only for the orthologous gene comparisons. The orthologous human and rat embryonic MyHC genes are of similar size (slope 0.978). Interestingly, the slope of the linear regression fit for the human-mouse extraocular gene comparison is 1.51, as suggested by the paired alignment in figure 8c (note scaling difference). Thus, there has been a process of proportional intron expansion in the human gene or contraction in the murine gene since the time of the species divergence. The human gene has 57,763 bp of intervening sequence spanning its 63,577-bp total length, of which 21,790 bp is repetitive (34.3% of total or 37.7% expressed as percentage of the intron sequence). If this repetitive DNA sequence is deleted in the human to mouse intron length comparison, the length correlation increases from 0.8555 to 0.8644, and the slope of the linear regression fit reduces from 1.51 to 0.818. This suggests that the length difference can be largely attributed to the insertion of repetitive elements during comparatively recent primate evolution.
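The repeat-content percentages quoted above follow directly from the three base counts given in the text, as this short check shows. The variable names are illustrative only.

```python
# Worked check of the repeat-content percentages for the human extraocular
# MyHC gene, using only the base counts quoted in the text.
total_gene_bp = 63_577   # start to stop codon
intron_bp = 57_763       # intervening sequence
repeat_bp = 21_790       # identifiable repetitive sequence

pct_of_gene = 100 * repeat_bp / total_gene_bp
pct_of_intron = 100 * repeat_bp / intron_bp
exon_bp = total_gene_bp - intron_bp  # coding sequence implied by the counts

if __name__ == "__main__":
    print(f"repeats as % of gene:   {pct_of_gene:.1f}")
    print(f"repeats as % of intron: {pct_of_intron:.1f}")
    print(f"implied exon length:    {exon_bp} bp")
```

The implied exon length of 5,814 bp is consistent with the roughly 6 kbp of coding sequence cited for the prototypical gene in the Introduction.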
Conserved Intron Sequence: Evidence for Concerted Evolution
We were intrigued by the finding of near-identical intron sizes at several positions during the original PCR screen for novel MyHC genes. When further analysis revealed >95% sequence identity in these introns, we broadened the search for evidence of recent gene conversion. The subsequent availability of high throughput DNA sequences matching all of the initial PCR products facilitated a locus-wide cross-comparison of all of the intron sequences for each of the skeletal MyHC genes. Figure 10 shows a sampling of alignments of introns from the other genes with those of the IId/x gene. In most cases, portions of the flanking exons have extraordinarily low synonymous substitution rates, as indicated by the extent of nucleotide similarity listed in figure 10. Note that in introns 13 and 23, homology exists between the IId/x and perinatal genes in the absence of homologous sequence in the intervening IIb gene.
Discussion |
---|
A Revised Evolutionary History for the Class-II MyHC Gene Family: Four Sarcomeric Genes Before the Emergence of a Dedicated Smooth Muscle MyHC Gene
The overall topology of the evolutionary tree provides evidence that, in the ancestral lineage leading to H. sapiens, stronger selective pressures have driven the diversification of sarcomeric than nonsarcomeric class-II myosins. The branch lengths in the evolutionary tree further imply that most vertebrate genomes contain orthologs to the three novel sarcomeric as well as the three nonsarcomeric MyHC genes. Tandem duplications of only one of the four primordial sarcomeric MyHC genes present at the time of the nonsarcomeric MyHC gene divergence were genetically fixed in this ancestral lineage. The divergence data for the carp-human cDNA comparisons indicate that at least three of the seven additional gene duplications had occurred by the time of the ray-finned fish–tetrapod phylogenetic divergence. The products of the most recently duplicated MyHC genes accumulate to extraordinary levels in the bodies of modern day vertebrates, comprising approximately 35% of the protein content of striated muscle cells (which in turn account for 40%–50% of the total body mass). Thermodynamic measurements suggest that fine tuning of the sarcomeric myosin ATPase is critical for energy conservation at both a cellular and organismic level (Schiaffino and Reggiani 1996). An ancestral gene with the proper cis-acting transcriptional control elements for widespread expression in the cardiac and the major locomotive muscles may have come under intense selective pressure for successive duplication. In contrast, the apparent absence of further duplication of the novel genes suggests that there has been little pressure for additional diversification of these isoforms, perhaps reflecting a restricted pattern of expression in an early metazoan ancestor.
The foregoing evolutionary reconstruction is based entirely on molecular evidence from the class-II myosin rod domains, a region chosen because of the unambiguous sequence alignment throughout the coiled-coil tail. The use of dot matrices in the comparison of distantly related myosin sequences has revealed the distinct patterns of divergence in the head and rod domains. Recent studies of molecular evolution in defined subregions of the myosin head have revealed unexpected sequence similarity in distantly related but kinetically similar myosins (Goodson, Warrick, and Spudich 1999), suggesting a process of convergent evolution. This prompted our studies of divergence in the motor domain and lends support to our contractile rate predictions for the novel gene products, despite evidence from the rod domains that these genes diverged before the diversification of the skeletal and cardiac MyHCs. As noted by Goodson, Warrick, and Spudich (1999), it is rarely feasible to perform kinetic analyses on intact fibers or homogeneous preparations of myosin from tissue sources.
While this manuscript was under review, we became aware of two cDNA sequences for cat (Felis catus) genes most closely related to MYH 7 and MYH 16 (accession numbers AF229810 and U51472, respectively). Interestingly, cDNA for the MYH 16 ortholog was isolated from the powerful jaw closing muscles and identified as a superfast isoform. Application of the neighbor-joining method of phylogenetic comparison to the orthologous human and cat cDNA sequences yields an unrooted phenogram with the following topology and branch lengths: ([Cat Beta:0.02797, Human Beta:0.03933]:0.41712, Cat Superfast:0.03472, Human Superfast:0.04496). This indicates that the MYH 16 genes in both species are currently diverging at approximately the same rate as the cardiac beta genes, consistent with our simplifying assumption of a relatively uniform molecular clock for the class-II MyHC rod-encoding sequences. These data argue against an alternative scenario, equally plausible a priori, in which MYH 14–16 originated with a comparatively recent series of gene duplications and subsequently diverged more rapidly than their widely expressed counterparts (i.e., MYH 1–11) as a result of relaxed selective pressures.
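The rate comparison drawn from the phenogram can be checked by summing branch lengths along the path between each pair of tips. The sketch below hard-codes the topology and branch lengths quoted above; the function and constant names are hypothetical.

```python
# Sketch: tip-to-tip (patristic) distances on the quoted unrooted phenogram.
TIP_BRANCH = {
    "Cat Beta": 0.02797,
    "Human Beta": 0.03933,
    "Cat Superfast": 0.03472,
    "Human Superfast": 0.04496,
}
INTERNAL_BRANCH = 0.41712          # branch subtending the beta clade
BETA_CLADE = {"Cat Beta", "Human Beta"}


def patristic(tip_a, tip_b):
    """Sum of branch lengths on the path joining two tips."""
    if tip_a == tip_b:
        return 0.0
    distance = TIP_BRANCH[tip_a] + TIP_BRANCH[tip_b]
    # Paths between a beta tip and a superfast tip cross the internal branch.
    if (tip_a in BETA_CLADE) != (tip_b in BETA_CLADE):
        distance += INTERNAL_BRANCH
    return distance


if __name__ == "__main__":
    print(f"cat-human beta:      {patristic('Cat Beta', 'Human Beta'):.5f}")
    print(f"cat-human superfast: {patristic('Cat Superfast', 'Human Superfast'):.5f}")
    print(f"beta vs superfast:   {patristic('Cat Beta', 'Cat Superfast'):.5f}")
```

The cat-human distances come to 0.06730 for the beta genes and 0.07968 for the superfast genes, both far smaller than the roughly 0.48 separating the two paralogs within one species.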
Branch lengths on our evolutionary trees imply that three duplications separating the novel sarcomeric MyHC genes from the previously described members of this subfamily predated the divergence of the smooth muscle from the nonmuscle MyHC genes. This suggests the counterintuitive possibility that these sarcomeric gene duplications predated the emergence of smooth muscle as a distinct cellular lineage in ancestral metazoans. The metabolic demands of locomotive muscle are likely to have restricted the size and shape of early metazoans until body plans evolved to amplify substrate throughput and improve circulatory homeostasis. In modern day vertebrates, smooth muscle cells expressing dedicated nonsarcomeric class-II MyHCs are indispensable for intestinal peristalsis and the regulation of vascular tone. How have the body plans for the largest of the modern day invertebrates (e.g., giant squid, lobster, giant crab) addressed demands for regulated oxygen and substrate distribution to the locomotive muscle mass without a cell lineage homologous to vertebrate smooth muscle? Molluscan smooth muscles are defined morphologically as unstriated but exhibit several features which more closely resemble the striated than the smooth muscles of vertebrates (Chantler 1983). Our data show that the gene encoding smooth muscle MyHC in the bay scallop A. irradians (Nyitray et al. 1994) is most closely related to the sarcomeric MyHC genes of vertebrates. Conversely, the vertebrate smooth muscle ortholog of the scallop Patinopecten yessoensis is expressed only in nonmuscle cells (Hasegawa 2000). Muscles lining the aortic wall of the lobster and crab appear to be striated (Davison, Wright, and DeMont 1995), suggesting the possibility that, by a process of convergent evolution, a striated muscle cell lineage functionally substitutes for vertebrate smooth muscle throughout the vascular tree of large arthropods. These observations have important implications for the study of the transcriptional networks involved in smooth muscle lineage specification (Cripps, Zhao, and Olson 1999b; Carson et al. 2000; Zhang et al. 2001).
Structures of the Novel Genes Provide a Timeline for Intron Loss and Gain
When considered in the context of the evolutionary topogram for the class-II MyHC gene family, the tabulation of intron positions in the novel genes provides clear evidence for intron loss and gain during well-defined time intervals. For instance, the intron interrupting the coding sequence homologous to exon 34 of the embryonic MyHC gene was present in a common ancestor to all three of the novel genes but was lost in a more recent ancestor to all of the modern day cardiac and skeletal MyHC genes. During the same time interval, there was reciprocal gain of an intron at the position occupied by intron 31 in the embryonic MyHC gene. The absolute number of introns interrupting the coding regions of all of the class-II MyHC genes is well conserved at 39 ± 2. Interestingly, 13 of the 20 introns in the head-encoding domains of these genes occupy positions conserved across the entire spectrum of duplicated genes, whereas this applies to only 1 of the 20 introns in the tail-encoding domain. Relative to the skeletal and cardiac MyHC genes there are six missing introns and seven new ones in MYH 16, numbers intermediate to those for the skeletal versus smooth comparison. As a result of the coiled-coil alpha helical structure of the rod domains there is a requirement for at most one gap element, corresponding to the second sarcomeric skip residue (fig. 3b), to achieve unambiguous alignment of the deduced amino acid sequences. This unique feature of the class-II MyHC rod serves to anchor the assigned intron positions. All of the intron positions and phases are rigidly conserved between the physically unlinked smooth, nonmuscle A, and nonmuscle B genes. Thus, in H. sapiens ancestors, the period of intron loss and gain spanned only the first 4 of the 13 duplications that created the modern day class-II MyHC gene subfamily.
Two opposing models for the origin of spliceosomal introns have been proposed: introns early, in which introns predated and facilitated the initial assembly of exons into genes (Gilbert, Marchionni, and McKnight 1986; de Souza et al. 1998) and introns late, in which introns were inserted into preexisting protein coding sequences (Logsdon and Palmer 1994; Logsdon 1998). The overall conservation of intron number, but head-specific conservation of intron position, is difficult to reconcile with either model in the extreme. The intermediate structures of the novel MyHC genes suggest an alternative model in which a primordial class-II MyHC gene had at least 25 introns, including all 13 of those at conserved positions in the head domain, but fewer than the 60 introns necessary to account for all of the sarcomeric and nonsarcomeric intron positions combined. Subsequent genes (in the genomes of direct ancestors of H. sapiens) underwent quasi-reciprocal intron losses and gains with preservation of the average exon size. In less complex organisms, there was increasing selective pressure for genome contraction, with net loss of introns exceeding gains. As exemplified by the sarcomeric MyHC genes of Drosophila melanogaster and A. irradians, this was partially offset in some ancestral lineages by the pressure for isoform diversification, with the resultant fixation of duplicated exons supporting productive alternative splicing. If, as recently proposed by Venkatesh, Ning, and Brenner (1999), a pair of duplicated exons acquired the elements required for splicing, a new intron position would be established, most likely at a protosplice site conforming to the MAG/R empirical rule. Although the exact register of codons is strictly conserved over most of the MyHC class-II rod-encoding domain, there is a wider range of neutral substitutions available than for most codons of the MyHC head. The resultant acceleration of point mutational drift in the rod-coding sequence would be expected to facilitate the process of intron insertion across spontaneously duplicated exons, accounting for the observed asymmetry in positional conservation.
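To make the MAG/R rule concrete, the sketch below scans a coding sequence for candidate protosplice sites, reading M as A or C and R as A or G (standard IUPAC codes) and placing the putative insertion point between the G and the R. The function name and the example sequences are hypothetical; this is an illustration of the rule, not an analysis performed in the study.

```python
# Sketch: locate candidate protosplice sites matching MAG|R in a coding
# sequence. A zero-width lookahead is used so overlapping sites are found.
import re

PROTOSPLICE = re.compile(r"(?=[AC]AG[AG])")


def protosplice_sites(cds):
    """Return, for each MAG|R match, the number of bases 5' of the '|'."""
    return [match.start() + 3 for match in PROTOSPLICE.finditer(cds.upper())]


if __name__ == "__main__":
    for seq in ("TTCAGGTT", "CAGAAGG", "TTTTTT"):  # hypothetical sequences
        print(seq, protosplice_sites(seq))
```

Each returned value is a count of coding bases preceding the candidate insertion point, so taking it modulo 3 gives the position within the codon at which a new intron would fall.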
Evolution of Intron and Gene Size
Our evolutionary analysis of the class-II MyHC genes contributes a new perspective to the general study of gene expansion and contraction. The 14 known members of this gene family vary in size by almost an order of magnitude, yet the majority of the genes fall into a narrow range approximating 25 kbp. The most parsimonious hypothetical scheme to account for the observed distribution of human MyHC gene sizes assigns a size of >100,000 bp to a primordial class-II MyHC gene, with comparatively rapid loss of intron sequence to yield a size of approximately 25,000 bp during the lineage connecting the ancestral gene that existed prior to the MYH 15 divergence to the last common ancestor to MYH 1–8 and 14. In this scenario, MYH 16 and the nonmuscle A gene (MYH 9) independently lost intron sequence after their respective divergences from last common ancestors with the other genes. Both the human-rat and human-chicken gene comparisons reveal an approximate rate at which intron size differences emerge, relative to the molecular clock for mutational drift in the coding sequence. The human-mouse comparison suggests that the human extraocular gene has uniquely undergone a recent reexpansion, driven in part by the appearance of additional species-specific repetitive DNA. The stochastic nature of this process likely accounts for the proportional reexpansion of most of the introns in this gene. The implied rate of this reexpansion further suggests that selective pressures have maintained the smaller size of the genes encoding the more abundantly expressed sarcomeric isoforms, restricting the local accumulation of repetitive DNA.
Phenotypic Implications of Concerted Evolution
The finding of intron sequence conservation in paralogous genes was especially surprising in regions adjacent to the exons encoding the MyHC junctional loops. Several lines of evidence suggest that variation in the junctional loop amino acid sequence has a profound effect on both the kinetics of the myosin ATPase and the velocity of unloaded muscle contraction (Uyeda, Ruppel, and Spudich 1994; Murphy and Spudich 1998). Loop 1 and loop 2 sequences are thought to have coevolved to allow thermodynamically optimal matching between the rates of nucleotide release and actin binding during the cross bridge cycle (Sweeney et al. 1998). Gene conversion facilitates swapping of transcriptional regulatory and protein-coding domains between closely linked paralogous genes, a process that in most cases neutralizes functional differences among the genes (reviewed in Papadakis and Patrinos 1999). Evidence for a recent conversion involving the loop 1 domain of the IIa and IId/x genes (and less recently the perinatal gene) implies a selective advantage associated with the elimination of mutational differences at this hypervariable site. It is notable that the physical proximity of the gene pairs correlates with the prevalence of converted domains, with the embryonic and extraocular genes seemingly out of range. Even among the adult type-II genes we could find no evidence for gene conversion in putative transcriptional regulatory regions, contrasting with recent findings at the human beta globin locus (Chiu et al. 1997). Because meiotic gene conversion is an unavoidable consequence of close tandem gene linkage, the organization of the skeletal MyHC locus suggests that fine tuning of the loop 1 and loop 2 sequences in the type-II myosins is somehow facilitated by this process. Alternatively, the genes may require close physical contiguity for proper transcriptional regulation (e.g., access to a locus control region), with large inserts between the genes resulting in aberrant expression.
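As a rough illustration of how a converted tract can stand out between two paralogs, the sketch below computes percent identity in fixed windows along a pair of pre-aligned sequences. It is a generic approach and is not claimed to be the procedure used in this study; the function name `window_identity`, the window size, and both toy sequences are assumptions made purely for illustration.

```python
def window_identity(a, b, window, step=1):
    """Return (start, identity) pairs for windows along two aligned sequences.

    Assumes `a` and `b` are already aligned and of equal length; any gap
    character simply counts as a mismatch unless both sequences share it.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    results = []
    for start in range(0, len(a) - window + 1, step):
        seg_a = a[start:start + window]
        seg_b = b[start:start + window]
        matches = sum(x == y for x, y in zip(seg_a, seg_b))
        results.append((start, matches / window))
    return results


# Toy sequences (hypothetical): the middle 8 bases are identical, mimicking
# a locally homogenized tract flanked by more divergent sequence.
paralog_1 = "ACGTACGT" + "TTGACCAG" + "TAGGCTAA"
paralog_2 = "ATGAACCT" + "TTGACCAG" + "TCGTCAAA"

profile = window_identity(paralog_1, paralog_2, window=8, step=8)
for start, identity in profile:
    print(f"window at {start:2d}: {identity:.3f}")
```

A window whose identity is markedly higher than its neighbors is only a candidate; distinguishing conversion from selective constraint requires further evidence of the kind discussed above.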
Possible Roles for the Newly Identified MyHC Genes
Historical Perspective
On completion of this study, the National Center for Biotechnology Information Website http://www.ncbi.nlm.nih.gov/genome/seq/page.cgi?F=HsProgress.shtml&ORG=Hs listed the total quantity of finished (32.5%) and draft (61%) human genome sequence at 2,991,716 for an estimated 93.5% representation of the human genome. On the basis of the methods used in the sequence query, we place comparable confidence limits on our current assessment of the total number of class-II MyHC genes in the human genome. While this manuscript was under review, we became aware of the millennial myosin census of Berg, Powell, and Cheney (2001), which differs from this report by listing the chromosome 7 gene as a pseudogene. Although no accession numbers are given, we presume that this corresponds to MYH 16, whose sequence is derived from accession numbers AC004834 and AC005163. Relative to the ORF in the cat superfast MyHC (U51472) we find a deletion of two bases in exon 18 at bp 5163 in AC005163. The close similarity between the coding sequence for this gene and the cat cDNA, as cited in an earlier section for the Clustal W alignment, suggests that the isolated discrepancy represents a sequencing artifact. An example of this is provided by the single base mismatch in exon 17 between the MYH 15 genomic (AC069499) and EST (AL039898) files. Further studies will be required to resolve this issue.
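The significance of a two-base deletion follows from codon arithmetic: removing a number of bases that is not a multiple of three shifts every downstream codon and can expose a premature stop. The sketch below shows this on a made-up 21-base fragment; the sequence, the deletion position, and the helper names are hypothetical and are not taken from AC005163 or U51472.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(seq):
    """Split a sequence into complete triplets, dropping a trailing partial codon."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def delete_bases(seq, pos, n):
    """Return `seq` with `n` bases removed starting at 0-based position `pos`."""
    return seq[:pos] + seq[pos + n:]


def first_stop(codon_list):
    """Return the index of the first stop codon, or None if there is none."""
    for index, codon in enumerate(codon_list):
        if codon in STOP_CODONS:
            return index
    return None


# Hypothetical open reading frame fragment (7 codons, no stop codon).
reference = "ATGGCTAAAGGTCCTGAACTG"
# Remove two bases from the third codon to mimic a two-base deletion.
observed = delete_bases(reference, 6, 2)

ref_codons = codons(reference)
obs_codons = codons(observed)

print("reference:", " ".join(ref_codons))
print("observed: ", " ".join(obs_codons))
print("first stop in reference:", first_stop(ref_codons))
print("first stop in observed: ", first_stop(obs_codons))
```

In this toy case the shifted frame reaches a stop codon two codons after the deletion. The same arithmetic explains why a single miscalled pair of bases in a draft sequence can make an otherwise intact gene look like a pseudogene.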
Our preliminary studies with RT-PCR amplification of RNA from selected human muscles indicate that the newly identified genes are not expressed at levels comparable to the previously characterized sarcomeric MyHC genes (data not shown). This is not surprising in view of the extraordinarily high level expression of the latter. In the future, this problem can be addressed by expanding the range of human tissues studied and by developing probes specific for orthologous genes in other species. The junctional loop sequences for each of the three novel genes are unprecedented, providing a point of departure for the development of monospecific antibodies but at the same time restricting confidence in the prediction of the physiological attributes of the native isoforms. Our data on sequence comparisons addressing the remainder of the motor domain suggest that the most ancient of these MyHC isoforms is fast, whereas the other two are slow. In view of historical evidence for the existence of atypical sarcomeric MyHC isoforms, we speculate that the novel genes encode one superfast isoform (distinct from the extraocular isoform), as previously seen in the jaw closing muscles of cats and nonhuman primates (Rowlerson et al. 1983), one slow-tonic isoform, as previously seen in the intrafusal bag and chain fibers (Pedrosa-Domellof et al. 1993), and one developmentally regulated slow isoform distinct from the embryonic and adult slow-cardiac beta isoforms (Hughes et al. 1993). The proposed MIM (McKusick 1998) numbers MYH 14, 15, and 16 provide an alternate designation for these genes until their specific functions and patterns of expression can be more fully characterized.
Supplementary Material |
---|
Acknowledgements |
---|
Footnotes |
---|
Keywords: myosin, human, sarcomeric, smooth, muscle, evolution, intron, gene
Address for correspondence and reprints: Hansell H. Stedman, Room 608, BRB 2/3, 421 Curie Blvd., Philadelphia, Pennsylvania 19104. hstedman{at}mail.med.upenn.edu.
References |
---|
Alberts B., D. Bray, J. Lewis, M. Raff, K. Roberts, J. Watson, 1994 Molecular biology of the cell Taylor & Francis Group, London
Arrizubieta M. J., E. Bandman, 1998 The role of interhelical ionic interactions in myosin rod assembly Biochem. Biophys. Res. Commun 244:588-593
Babu G. J., D. M. Warshaw, M. Periasamy, 2000 Smooth muscle myosin heavy chain isoforms and their role in muscle physiology Microsc. Res. Tech 50:532-540
Bandman E. R., R. Matsuda, R. C. Strohman, 1982 Developmental appearance of myosin heavy and light chain isoforms in vivo and in vitro in chicken skeletal muscle Dev. Biol 93:508-518
Barany M., 1967 ATPase activity of myosin correlated with speed of muscle shortening J. Gen. Physiol 50: (Suppl.) 197-218
Berg J. S., B. C. Powell, R. E. Cheney, 2001 A millennial myosin census Mol. Biol. Cell 12:780-794
Briggs M. M., F. Schachat, 2000 Early specialization of the superfast myosin in extraocular and laryngeal muscles J. Exp. Biol 203: (Part 16) 2485-2494
Carson J. A., R. A. Fillmore, R. J. Schwartz, W. E. Zimmer, 2000 The smooth muscle gamma-actin gene promoter is a molecular target for the mouse bagpipe homologue, mNkx3-1, and serum response factor J. Biol. Chem 275:39061-39072
Chantler P., 1983 Biochemical and structural aspects of molluscan muscle Pp. 77-154 in A. Saleuddin and K. Wilbur, eds. The Mollusca, Vol. 4. Academic Press, New York
Chiu C. H., H. Schneider, J. L. Slightom, D. L. Gumucio, M. Goodman, 1997 Dynamics of regulatory evolution in primate beta-globin gene clusters: cis-mediated acquisition of simian gamma fetal expression patterns Gene 205:47-57
Cope M., J. Whisstock, I. Rayment, J. Kendrick-Jones, 1996 Conservation within the myosin motor domain: implications for structure and function Structure 4:969-987
Cripps R. M., J. Suggs, S. Bernstein, 1999a. Assembly of thick filaments and myofibrils occurs in the absence of the myosin head EMBO J 18:1793-1804
Cripps R. M., B. Zhao, E. N. Olson, 1999b. Transcription of the myogenic regulatory gene Mef2 in cardiac, somatic, and visceral muscle cell lineages is regulated by a Tinman-dependent core enhancer Dev. Biol 215:420-430
Davison I. G., G. M. Wright, M. E. DeMont, 1995 The structure and physical properties of invertebrate and primitive vertebrate arteries J. Exp. Biol 198:2185-2196
Deng Z., P. Liu, P. Marlton, D. F. Claxton, S. Lane, D. F. Callen, F. S. Collins, M. J. Siciliano, 1993 Smooth muscle myosin heavy chain locus (MYH11) maps to 16p13.13-p13.12 and establishes a new region of conserved synteny between human 16p and mouse 16 Genomics 18:156-159
de Souza S. J., M. Long, R. J. Klein, S. Roy, S. Lin, W. Gilbert, 1998 Toward a resolution of the introns early/late debate: only phase zero introns are correlated with the structure of ancient proteins Proc. Natl. Acad. Sci. USA 95:5094-5099
Dibb N. J., I. N. Maruyama, M. Krause, J. Karn, 1989 Sequence analysis of the complete Caenorhabditis elegans myosin heavy chain gene family J. Mol. Biol 205:603-613
DiNardi C., S. Ausoni, P. Moretti, L. Gorza, M. Velleca, M. Buckingham, S. Schiaffino, 1993 Type 2X myosin heavy chain is coded by a muscle fiber type-specific and developmentally regulated gene J. Cell. Biol 123:823-835
Ellenberger T. E., C. J. Brandl, K. Struhl, S. C. Harrison, 1992 The GCN4 basic region leucine zipper binds DNA as a dimer of uninterrupted alpha helices: crystal structure of the protein-DNA complex Cell 71:1223-1237
Felsenstein J., 1993 PHYLIP (phylogeny inference package). Version 3.5c Distributed by the author. Department of Genetics, University of Washington, Seattle
Fitch W. M., E. Margoliash, 1967 Construction of phylogenetic trees Science 155:279-284
Gilbert W., M. Marchionni, G. McKnight, 1986 On the antiquity of introns Cell 46:151-153
Goodson H., H. Warrick, J. Spudich, 1999 Specialized conservation of surface loops of myosin: evidence that loops are involved in determining functional characteristics J. Mol. Biol 287:173-185
Hasegawa Y., 2000 Isolation of a cDNA encoding the motor domain of nonmuscle myosin which is specifically expressed in the mantle pallial cell layer of Scallop (Patinopecten yessoensis) J. Biochem. (Tokyo) 128:983-988
Hu J. C., E. K. O'Shea, P. S. Kim, R. T. Sauer, 1990 Sequence requirements for coiled-coils: analysis with lambda repressor-GCN4 leucine zipper fusions Science 250:1400-1403
Hughes S. M., M. Cho, I. Karsch-Mizrachi, M. Travis, L. Silberstein, L. A. Leinwand, H. M. Blau, 1993 Three slow myosin heavy chains sequentially expressed in developing mammalian skeletal muscle Dev. Biol 158:183-199
Kelley M. J., W. Jawien, T. L. Ortel, J. F. Korczak, 2000 Mutation of MYH9, encoding non-muscle myosin heavy chain A, in May-Hegglin anomaly Nat. Genet 26:106-108
Letai A., E. Fuchs, 1995 The importance of intramolecular ion pairing in intermediate filaments Proc. Natl. Acad. Sci. USA 92:92-96
Li W., 1993 Unbiased estimation of the rates of synonymous and nonsynonymous substitution J. Mol. Evol 36:96-99
Logsdon J. M., 1998 The recent origins of spliceosomal introns revisited Curr. Opin. Genet. Dev 8:637-648
Logsdon J. M., J. D. Palmer, 1994 Origin of introns: early or late? Nature 369:526 [see also discussion 527 and 528]
Lowey S., C. Cohen, 1962 Studies on the structure of myosin J. Mol. Biol 4:293-308
Mahdavi V., A. P. Chambers, B. Nadal-Ginard, 1984 Cardiac alpha and beta myosin heavy chain genes are organized in tandem Proc. Natl. Acad. Sci. USA 81:2626-2630
Mahdavi V., E. E. Strehler, M. Periasamy, D. Wieczorek, S. Izumo, S. Grund, M.-A. Strehler, B. Nadal-Ginard, 1986 Sarcomeric myosin heavy chain gene family: organization and pattern of expression Pp. 345-361 in F. D. Emerson, B. Nadal-Ginard, and M. A. Siddique, eds. Molecular biology of muscle development. Alan R. Liss, New York
McKusick V., 1998 Mendelian inheritance in man Catalogs of human genes and genetic disorders. Johns Hopkins University Press, Baltimore
McLachlan A. D., J. Karn, 1982 Periodic charge distributions in the myosin rod amino acid sequence match cross-bridge spacings in muscle Nature 299:226-231
McLachlan A. D., J. Karn, 1983 Periodic features in the amino acid sequence of nematode myosin rod J. Mol. Biol 164:605-626
McLachlan A. D., M. Stewart, 1975 Tropomyosin coiled-coil interactions: evidence for an unstaggered structure J. Mol. Biol 98:293-304
Murphy C., J. Spudich, 1998 Dictyostelium myosin 25-50K loop substitutions specifically affect ADP release rates Biochemistry 37:6738-6744
Nyitray L., A. Jancso, Y. Ochiai, L. Graf, A. G. Szent-Gyorgyi, 1994 Scallop striated and smooth muscle myosin heavy-chain isoforms are produced by alternative RNA splicing from a single gene Proc. Natl. Acad. Sci. USA 91:12686-12690
Offer G., 1990 Skip residues correlate with bends in the myosin tail J. Mol. Biol 216:213-218
O'Shea E. K., J. D. Klemm, P. S. Kim, T. Alber, 1991 X-ray structure of the GCN4 leucine zipper, a two-stranded, parallel coiled coil Science 254:539-544
Papadakis M. N., G. P. Patrinos, 1999 Contribution of gene conversion in the evolution of the human beta-like globin gene family Hum. Genet 104:117-125
Pedrosa-Domellof F., B. Gohlsch, L. E. Thornell, D. Pette, 1993 Electrophoretically defined myosin heavy chain patterns of single human muscle spindles FEBS Lett 335:239-242
Perler R., A. Efstratiadis, P. Lomedico, W. Gilbert, R. Kolodner, J. Dodgson, 1980 The evolution of genes: the chicken preproinsulin gene Cell 20:555-566
Rowlerson A., F. Mascarello, A. Veggetti, E. Carpene, 1983 The fibre-type composition of the first branchial arch muscles in Carnivora and Primates J. Muscle Res. Cell Motil 4:443-472
Saez L. J., K. M. Gianola, E. M. McNally, R. Feghali, R. Eddy, T. B. Shows, L. A. Leinwand, 1987 Human cardiac myosin heavy chain genes and their linkage in the genome Nucleic Acids Res 15:5443-5459
Saez C. G., J. C. Myers, T. B. Shows, L. A. Leinwand, 1990 Human nonmuscle myosin heavy chain mRNA: generation of diversity through alternative polyadenylylation Proc. Natl. Acad. Sci. USA 87:1164-1168
Saitou N., M. Nei, 1987 The neighbor-joining method: a new method for reconstructing phylogenetic trees Mol. Biol. Evol 4:406-425
Schiaffino S., C. Reggiani, 1996 Molecular diversity of myofibrillar proteins: gene regulation and functional significance Physiol. Rev 76:371-422
Sellers J., 1999 Myosins Oxford University Press, Oxford, U.K
Seri M., R. Cusano, S. Gangarossa, et al. (25 co-authors) 2000 Mutations in MYH9 result in the May-Hegglin anomaly, and Fechtner and Sebastian syndromes Nat. Genet 26:103-105
Shrager J., P. Desjardins, J. Burkman, et al. (14 co-authors) 2000 Human skeletal myosin heavy chain genes are tightly linked in the order Embryonic-IIa-IId/x-IIb-Perinatal-Extraocular J. Muscle Res. Cell Motil 21:345-355
Simons M., M. Wang, O. W. McBride, S. Kawamoto, K. Yamakawa, D. Gdula, R. S. Adelstein, L. Weir, 1991 Human nonmuscle myosin heavy chains are encoded by two genes located on different chromosomes Circ. Res 69:530-539
Sneath P., R. Sokal, 1973 Numerical taxonomy; the principles and practice of numerical classification W. H. Freeman, San Francisco
Stedman H. H., M. Eller, E. H. Jullian, S. H. Fertels, S. Sarkar, J. E. Sylvester, A. M. Kelly, N. A. Rubinstein, 1990 The human embryonic myosin heavy chain: complete primary structure reveals evolutionary relationships with other developmental isoforms J. Biol. Chem 265:3568-3576
Strehler E. E., M. A. Strehler-Page, J. C. Perriard, M. Periasamy, B. J. Nadal-Ginard, 1986 Complete nucleotide and encoded amino acid sequence of a mammalian myosin heavy chain gene: evidence against intron-dependent evolution of the rod J. Mol. Biol 190:291-317
Sweeney H., S. Rosenfeld, F. Brown, L. Faust, J. Smith, J. Xing, L. Stein, J. Sellers, 1998 Kinetic tuning of myosin via a flexible loop adjacent to the nucleotide binding pocket J. Biol. Chem 273:6262-6270
Thompson J. D., D. G. Higgins, T. J. Gibson, 1994 CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice Nucleic Acids Res 22:4673-4680
Uyeda T. Q. P., K. M. Ruppel, J. A. Spudich, 1994 Enzymatic activities correlate with chimaeric substitutions at the actin-binding face of myosin Nature 368:567-569
Venkatesh B., Y. Ning, S. Brenner, 1999 Late changes in spliceosomal introns define clades in vertebrate evolution Proc. Natl. Acad. Sci. USA 96:10267-10271
Weiss A., D. McDonough, B. Wertman, L. Acakpo-Satchivil, K. Montgomery, R. Kucherlapati, L. Leinwand, K. Krauter, 1999 Organization of human and mouse skeletal myosin heavy chain gene clusters is highly conserved Proc. Natl. Acad. Sci. USA 96:2958-2963
Wieczorek D., M. Periasamy, G. Butler-Browne, R. Whalen, B. Nadal-Ginard, 1985 Co-expression of multiple myosin heavy chain genes, in addition to a tissue-specific one, in extraocular muscle J. Cell Biol 101:618-629
Yamada A., M. Yoshio, K. Oiwa, L. Nyitray, 2000 Catchin, a novel protein in molluscan catch muscles, is produced by alternative splicing from the myosin heavy chain gene J. Mol. Biol 295:169-178
Zhang J. C., S. Kim, B. P. Helmke, W. W. Yu, K. L. Du, M. M. Lu, M. Strobeck, Q. C. Yu, M. S. Parmacek, 2001 Analysis of SM22alpha-deficient mice reveals unanticipated insights into smooth muscle cell differentiation and function Mol. Cell. Biol 21:1336-1344