Evolutionary Implications of Three Novel Members of the Human Sarcomeric Myosin Heavy Chain Gene Family

Philippe R. Desjardins, James M. Burkman, Joseph B. Shrager, Leonard A. Allmond and Hansell H. Stedman

Department of Surgery, School of Medicine, University of Pennsylvania, Philadelphia


    Abstract
Sarcomeric myosin heavy chain (MyHC) is the major contractile protein of striated muscle. Six tandemly linked skeletal MyHC genes on chromosome 17 and two cardiac MyHC genes on chromosome 14 have been previously described in the human genome. We report the identification of three novel human sarcomeric MyHC genes on chromosomes 3, 7, and 20, which are notable for their atypical size and intron-exon structure. Two of the encoded proteins are structurally most like the slow-beta MyHC, whereas the third one is closest to the adult fast IIb isoform. Data from pairwise comparisons of aligned coding sequences imply the existence of ancestral genomes with four sarcomeric genes before the emergence of a dedicated smooth muscle MyHC gene. To further address the evolutionary relationships of the distinct sarcomeric and nonsarcomeric rod sequences, we have identified and further annotated human genomic DNA sequences corresponding to 14 class-II MyHCs. An extensive analysis provides a timeline for intron gain and loss, gene contraction and expansion, and gene conversion among genes encoding class-II myosins. One of the novel human genes is found to have introns at positions shared only with the molluscan catchin/MyHC gene, providing evidence for the structure of a pre-Cambrian ancestral gene.


    Introduction
The class-II myosins are distinguished from other members of this motor protein superfamily by their ability to assemble into thick filaments after homodimerization (Cope et al. 1996). The rod domain of the myosin heavy chain (MyHC) contains the dimerization interface, an amphipathic alpha-helix stabilized in a coiled-coil configuration (Lowey and Cohen 1962) by a combination of van der Waals forces and electrostatic interactions between the interdigitating amino acid side chains (Hu et al. 1990; O'Shea et al. 1991). The alternating pattern of acidic and basic residues on the surface of the homodimeric rod has been implicated in the control of higher order assembly in filaments (McLachlan and Karn 1982, 1983). The primary structure of the rod allows a subclass of these so-called conventional myosins to further assemble into highly ordered sarcomeres (Cripps, Suggs, and Bernstein 1999a), as exemplified by skeletal and cardiac muscles.

The developmentally regulated expression of distinct sarcomeric MyHC isoforms modulates both the contractile and bioenergetic properties of individual striated muscle cells (Barany 1967; Schiaffino and Reggiani 1996). In mammals, the previously characterized sarcomeric MyHC isoforms are encoded by two families of tandemly linked genes, with the process of isoform switching under transcriptional control. In the human genome, six genes (chromosome 17p13) encode the skeletal MyHC isoforms (Weiss et al. 1999; Shrager et al. 2000) and two (chromosome 14q12) encode cardiac isoforms (Mahdavi, Chambers, and Nadal-Ginard 1984; Saez et al. 1987). The prototypical gene in this subclass is 25 kbp in length with 6 kbp of coding sequence interrupted by 40 introns (Strehler et al. 1986). All but one of the genes at the human skeletal and cardiac MyHC loci conform to this consensus: the recently characterized extraocular MyHC gene is approximately 65 kbp in length (Briggs and Schachat 2000).

Three other class-II MyHC genes are currently recognized in mammals: the smooth muscle, nonmuscle A, and nonmuscle B MyHC genes. Their human chromosomal loci are 16p13 (Deng et al. 1993), 22q11.2 (Saez et al. 1990), and 17p13 (Simons et al. 1991), respectively. The smooth muscle MyHC gene encodes at least four distinct proteins by a process of alternative RNA splicing, and these represent the dominant if not sole MyHCs of adult smooth muscle (Babu, Warshaw, and Periasamy 2000). As implied by the nomenclature, the nonmuscle MyHC gene products are primarily involved in actin-based motor functions in other cell types, such as fibroblasts, endothelial cells, and leukocytes (Babu, Warshaw, and Periasamy 2000; Kelley et al. 2000; Seri et al. 2000). It is widely assumed that smooth muscle represents the most primitive of the muscle cell types because of the structural similarity of its myofilaments to those of nonmuscle cells (Alberts et al. 1994). This assumption is supported by the evolutionary relationships among the known class-II MyHCs: the mammalian smooth and nonmuscle isoforms appear to have diverged from an ancestral gene long before the onset of sarcomeric isoform diversification. The structural similarity of the vertebrate sarcomeric MyHC genes further suggests that the series of gene duplications that created this subclass postdated the major evolutionary period of intron insertion or removal.

We now present the identification and analysis of three novel human sarcomeric MyHC genes which challenge these assumptions. The genes are on human chromosomes 3, 7, and 20 and are thus physically unlinked to each other and to the previously characterized sarcomeric MyHC genes. The largest of the genes is at least 140 kbp in length. Quantitation of nucleotide sequence divergence among these distantly related genes establishes an approximate timeframe for three sarcomeric MyHC gene duplications that predated the emergence of the dedicated smooth MyHC gene. The atypical structure of these genes provides an estimate of the approximate time at which individual introns were inserted or removed in ancestral genomes. Two of the genes are predicted to encode slow-contracting MyHC isoforms and the third a fast MyHC, providing important clues as to the functional significance of the gene products. The fast MyHC gene diverged first and has introns in positions seen thus far only in the recently described catchin and MyHC genes of the scallop Argopecten irradians. The invariant length of the rod-coding portions of the deduced cDNAs has implications for the use of these ancient sarcomeric MyHC isoforms as preferred molecular yardsticks in the study of early metazoan evolution. The interplay between the processes of repetitive element insertion and counterselection to restrict gene expansion is revealed by the pairwise comparison of orthologous mammalian MyHC genes, addressing an important theme in recent genome evolution.


    Experimental Procedures
Cosmid Libraries
Total genomic cosmid libraries were constructed using pWE15 and Supercos vectors as described (Shrager et al. 2000). A full-length human embryonic MyHC cDNA was used to screen primary platings of the libraries and to probe Southern blots of EcoRI digests of DNA from purified clones (Stedman et al. 1990). Genes for six isoforms were initially identified, and a high-resolution physical map of the skeletal MyHC locus on chromosome 17 was constructed. Additional cosmid-cloned fragments corresponding to MyHC genes at other loci were further characterized by a combination of cDNA hybridization, PCR amplification, and direct nucleotide sequence determination using standard methods.

Loop Sequence Data
Universal primers were constructed from highly conserved coding sequences of both the ATP and actin-binding domains of previously characterized human MyHC isoforms. The polymerase chain reaction was used to amplify both domains from both subcloned and total genomic DNA, and the products were subsequently cloned into a TA-cloning vector (Invitrogen) for double-stranded sequencing. Confirmatory direct sequencing of the genomic cosmid was also performed using the same universal primers.

Raw Sequence Data
Additional sequences used in this study are based on (1) high throughput human genomic sequences provided to the public by the Baylor College of Medicine, the Sanger Center, the Washington University Genome Sequencing Center, and the Whitehead Institute ([skeletal locus: AC002347, AC005291, AC005323], [MYH 15: AC019169, AC020731, AC041004, AC069499], [MYH 11: AC011061, AC025518, AC026130], [MYH 16: AC004834 and AC005163, later NT_001651.2|Hs7_920], and [MYH 14: AL132825, later NT_011362.2|Hs20_11519]), and (2) expressed sequence tags (ESTs) from human cDNA libraries (as noted in table 2). On the basis of homology to sequences originally determined in our laboratory, all files were identified using the BLAST algorithms as implemented at http://www.ncbi.nlm.nih.gov:80/blast/blast.cgi?Jform=1.


Table 2 ESTs Matching Portions of Each of the Deduced Coding Sequences Are Tabulated by Position from 5' to 3'

 
Gene and cDNA Reconstruction
Using hybridization and STS data that formed the basis of a comprehensive physical map of the chromosome 17 locus (Shrager et al. 2000), raw high throughput sequences were analyzed and a contiguous sequence was assembled for the entire chromosome 17 locus. We have fully annotated these sequences and provided them to the publisher as annotated MacVector sequence files (also available from H.H.S. upon request). Additional high throughput genomic DNA sequence files composed of unordered fragments were restructured by reordering the fragments from 5' to 3' on the basis of homology to the human embryonic MyHC cDNA sequence. Unordered fragments from overlapping files were then used to resolve ambiguities and reciprocally fill in the sequence between fragments. The regions of coding sequence homology were screened to identify flanking sequences conforming to the intron splice consensus. The putative exon sequences were annotated as peptide-coding sequences and assembled in separate files to reconstruct the cDNAs. The assembled cDNA sequences were then used to requery the databases in search of perfectly matched EST sequences.
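
The splice-consensus screen described above can be illustrated with a minimal sketch. It assumes only the simplest form of the consensus (an AG dinucleotide immediately upstream of the exon and a GT dinucleotide immediately downstream); the function name and the toy sequence are hypothetical and are not taken from the original analysis, which was carried out on annotated MacVector files.

```python
def has_splice_consensus(genomic: str, exon_start: int, exon_end: int) -> bool:
    """Return True if genomic[exon_start:exon_end] is flanked by the minimal
    intron consensus ...AG|exon|GT... (0-based, end-exclusive coordinates)."""
    seq = genomic.upper()
    # Two flanking bases are needed on each side of the candidate exon.
    if exon_start < 2 or exon_end + 2 > len(seq):
        return False
    acceptor = seq[exon_start - 2:exon_start]   # last two bases of upstream intron
    donor = seq[exon_end:exon_end + 2]          # first two bases of downstream intron
    return acceptor == "AG" and donor == "GT"


# Toy example (hypothetical sequence): intron tail, 9-nt exon, intron head.
toy = "ccttttcag" + "ATGGCTGAA" + "gtaagtcc"
print(has_splice_consensus(toy, 9, 18))    # boundaries placed at the true junctions
print(has_splice_consensus(toy, 10, 18))   # boundary shifted by one base
```

A real screen would also weigh the extended consensus around each junction and the preservation of the reading frame; this sketch checks the dinucleotides only.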

Ionic Interactions
Ionic interaction distributions were determined by arranging the primary sequence in the 28-residue (four-heptad) repeat pattern described previously (Stedman et al. 1990). Ionic interaction analysis was based on the pairing of residues at the (i, i' + 5) positions (Bandman, Matsuda, and Strohman 1982).
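
As an illustration of the (i, i' + 5) pairing rule, the following sketch lists potential interhelical ion pairs between a residue in heptad position g and the residue five positions later (position e). It makes several simplifying assumptions that are not from the original methods: only Asp/Glu and Lys/Arg are treated as charged, skip residues are ignored, and the heptad register of the first residue is supplied by the caller. The function names are hypothetical.

```python
ACIDIC = set("DE")
BASIC = set("KR")  # histidine is ignored; an assumption of this sketch


def charge(residue: str) -> int:
    """Return -1 for acidic, +1 for basic, and 0 for all other residues."""
    if residue in ACIDIC:
        return -1
    if residue in BASIC:
        return 1
    return 0


def g_to_e_pairs(rod: str, register: str = "a"):
    """List 0-based (i, i+5) pairs where residue i is in heptad position g
    and residues i and i+5 carry opposite charges.
    `register` is the heptad position (a-g) of rod[0]."""
    offset = "abcdefg".index(register)
    pairs = []
    for i in range(len(rod) - 5):
        if (i + offset) % 7 != 6:          # keep only g positions
            continue
        if charge(rod[i]) * charge(rod[i + 5]) == -1:
            pairs.append((i, i + 5))
    return pairs


# Toy rod of two identical heptads: g = Glu, e = Lys.
print(g_to_e_pairs("LAALKAE" * 2))
```

Plotting the positions returned for each rod sequence side by side would give a charge-pair pattern of the kind compared between isoforms.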

Divergence Analysis
Coding sequence divergence was calculated by aligning cDNA sequence as described in the text for all 14 human and selected nonhuman class-II MyHC genes and running divergence software (DIVERGE: Genetics Computer Group, Inc., Wisconsin Package). DIVERGE is based on the Perler analysis (Perler et al. 1980), which has been further modified for unbiased estimation of nonsynonymous substitution rates (Li 1993). The nonsynonymous substitution rates were subsequently used to generate an evolutionary topogram based on branch lengths calculated by the unweighted pair group method with arithmetic mean (UPGMA) (Sneath and Sokal 1973, pp. 230–234) and the KITSCH method using default settings (Fitch-Margoliash method with contemporary tips) (Fitch and Margoliash 1967; Felsenstein 1993). Results obtained by these methods were compared to those obtained for cDNA and deduced peptide sequences aligned using Clustal W (Thompson, Higgins, and Gibson 1994) with calculation of branch lengths in the resulting phenograms by the Neighbor-Joining method (Saitou and Nei 1987) as implemented on the Biology Workbench website http://biowb.sdsc.edu/CGI/BW.cgi#!. Protein sequence divergence was also assessed using the parsimony algorithm of the PHYLIP program PROTPARS (Felsenstein 1993).
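
For readers unfamiliar with UPGMA, the clustering step can be sketched in a few lines. This is a generic textbook implementation, not the software used in the study; the function name, the cluster labels, and the toy distances are hypothetical. Node heights are reported as half the inter-cluster distance.

```python
def upgma(names, dist):
    """Cluster tips by UPGMA.
    `dist` maps (name_a, name_b) tuples to pairwise distances.
    Returns a list of merges as (cluster_a, cluster_b, node_height)."""
    sizes = {n: 1 for n in names}                    # cluster label -> tip count
    d = {frozenset(k): v for k, v in dist.items()}   # unordered pair -> distance
    merges = []
    while len(sizes) > 1:
        pair = min(d, key=d.get)                     # closest pair of clusters
        a, b = sorted(pair)
        height = d[pair] / 2.0
        new = f"({a},{b})"
        for c in list(sizes):
            if c in (a, b):
                continue
            dac = d.pop(frozenset((a, c)))
            dbc = d.pop(frozenset((b, c)))
            # Size-weighted mean of the distances to the merged clusters.
            d[frozenset((new, c))] = (sizes[a] * dac + sizes[b] * dbc) / (sizes[a] + sizes[b])
        del d[pair]
        sizes[new] = sizes.pop(a) + sizes.pop(b)
        merges.append((a, b, height))
    return merges


# Toy distance matrix (hypothetical values, not data from this study).
toy = {("A", "B"): 2.0, ("A", "C"): 6.0, ("B", "C"): 6.0}
for merge in upgma(["A", "B", "C"], toy):
    print(merge)
```

Because UPGMA assumes a uniform rate of change along all lineages, it is sensitive to the molecular clock assumption that the text addresses with comparisons of orthologous gene pairs.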

Intron Position Analysis
Reconstructed cDNA and gene files were viewed in annotated format to facilitate the designation of intron position and phase. cDNA sequences were split at the head-rod junction based on the position of the conserved proline residue corresponding to residue 839 in the human embryonic MyHC peptide sequence. In the head domains, intron position is considered to have been conserved if the immediately flanking amino acid residues are homologous, despite change in the exact nucleotide position from the start codon. This is mandated by the variable lengths of the junctional loops. In the rod-coding sequence, intron position is based on strict preservation of nucleotide number 3' from the reference proline codon. Phases are designated as 1, 2, and 3 for introns interrupting the coding sequence before the first, second, and third bases of a codon, respectively. Graphical depictions of each of the annotated cDNAs were used as exported from MacVector with exon lengths shown on a proportional basis.
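
The phase convention stated above (1, 2, or 3 for an intron lying before the first, second, or third base of a codon) reduces to simple modular arithmetic. The sketch below assumes that the position is given as the number of coding nucleotides lying 5' of the intron, counted from the first base of a reference codon such as the head-rod proline; the function names are hypothetical.

```python
def intron_phase(coding_nt_before: int) -> int:
    """Phase in this paper's 1/2/3 convention, given the number of coding
    nucleotides between the first base of the reference codon and the intron."""
    if coding_nt_before < 0:
        raise ValueError("nucleotide count must be non-negative")
    return coding_nt_before % 3 + 1


def interrupted_codon(coding_nt_before: int) -> int:
    """1-based index of the codon that the intron precedes or interrupts."""
    return coding_nt_before // 3 + 1


# An intron after 4 coding nucleotides lies before the second base of codon 2.
print(intron_phase(4), interrupted_codon(4))
```

Two introns in different genes would then be scored as sharing a rod position when both the codon index and the phase agree.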

Analysis of Intron Length
Intron positions for all the genes in the skeletal MyHC cluster were exported from MacVector into Microsoft Excel for calculation of intron lengths. RepeatMasker (http://ftp.genome.washington.edu/cgi-bin/RepeatMasker) was used to identify human repeat boundaries. A file corresponding to the extraocular MyHC gene deleted for all identifiable human repeats was used to generate a separate set of intron sizes. An anonymous sequence (AC019008) was found to represent the murine extraocular MyHC gene and was annotated to provide intron sizes for this orthologous gene comparison. This analysis also included the rat embryonic MyHC gene (X04267). Intron sizes were transferred into the statistical analysis application JMP (SAS Institute, Cary, NC) and used to calculate correlation coefficients, linear regression slopes, and 95% density ellipses for the bivariate normal distributions.
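
The correlation and regression step can be reproduced without a statistics package. The sketch below computes the Pearson correlation coefficient and the least-squares slope for paired intron sizes; it is an illustration only (the original analysis used JMP), the function name is hypothetical, the intron sizes are invented toy values, and the 95% density ellipse is not computed.

```python
from math import sqrt


def pearson_and_slope(x, y):
    """Return (Pearson r, least-squares slope of y on x) for paired values."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length series with at least two points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("a series with zero variance has no defined correlation")
    return sxy / sqrt(sxx * syy), sxy / sxx


# Hypothetical intron sizes (bp) at matching positions in two orthologous genes.
gene_1_introns = [100, 200, 300, 400]
gene_2_introns = [210, 390, 610, 790]
r, slope = pearson_and_slope(gene_1_introns, gene_2_introns)
print(round(r, 4), round(slope, 2))
```

A slope above 1 with a high correlation would indicate proportional expansion of introns in the second gene relative to the first.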


    Results
Identification of Three Novel Class-II MyHC Genes in the Human Genome
Two complementary approaches were initially used to screen the human genome for previously unrecognized class-II MyHC genes: PCR amplification using degenerate primers and low stringency hybridization of cloned fragments to a full-length cDNA probe. The most informative of several PCR primer pairs targeted highly conserved coding sequences flanking the hypervariable 25- to 50-kDa loop region of the myosin head (fig. 1a). Products generated from reactions primed with human and nonhuman genomic DNA were size fractionated on the basis of intron length variability (fig. 1b). Sequences of the human PCR products confirmed the MyHC specificity of the amplifications. All but two of these products matched sarcomeric MyHC genes from the human cardiac and skeletal loci. Two novel sequences (fig. 1c) derived from these PCR products exactly matched signature sequences derived from two sets of MyHC cDNA-hybridizing cosmids. Further analysis of these cosmid clones revealed hybridization patterns distinct from those previously identified in a systematic analysis of the skeletal MyHC locus (Shrager et al. 2000).



Fig. 1. Loop 1 signature sequences for novel human MyHC genes. a, PCR primers were designed to target conserved portions of the coding sequence for the motor domain. The presence of an intron disrupting the coding sequence for the 25- to 50-kDa junction (loop 1) facilitates the isolation of individual PCR products on the basis of size. b, Gel electrophoresis of PCR products reveals a spectrum of loop 1 intron sizes in MyHC genes from human, mouse, hamster, and rat total genomic DNA and in cloned individual genes. c, Reaction products were subcloned and sequenced, yielding identical results with two sets of templates: total human genomic DNA and selected cosmid-cloned fragments.

 
During further analysis of these cosmids, high throughput sequence databanks were periodically screened for homology to the unique sequences until unannotated files containing exact matches appeared. Exon boundaries were established using intron splice donor-acceptor consensus sequences in conjunction with homology to other known human MyHC genes, facilitating assembly of deduced coding sequences for the novel genes. The complete genomic structure of one of the novel genes (MYH 14) and approximately two-thirds of the other (MYH 16) could then be established (fig. 2). The clones used to establish the raw data were physically mapped to human chromosomes 20 and 7, respectively. Additional BLAST hits prioritized by sequence similarity included BACs from chromosomes 3, 16, 17, and 22. The chromosome 3 and 17 BACs were each found to contain previously unrecognized MyHC gene sequences, and thus they were subjected to the same process of annotation and reassembly used for MYH 14 and 16. Throughout this process, the reconstructed coding sequences were used to rescreen human and nonhuman EST databases, with the resulting alignments used to verify exon boundary positions. The gene on chromosome 3 is also novel (MYH 15), with MyHC-coding sequences spanning four BACs in the high throughput databases (fig. 2). The genes on chromosomes 16, 17, and 22 correspond to the smooth muscle (MYH 11), nonmuscle B (MYH 10), and nonmuscle A (MYH 9) MyHC genes, respectively, demonstrating that genes for the full portfolio of previously recognized class-II MyHCs are contained in the draft and completed human genome databases screened. Following the initial review of this manuscript, we recognized an additional BLAST-identifiable MyHC homologue in fragmented draft sequence for chromosome 19. The sequence information in the available files (AC010515 and AC020906) suggests a possible nonsarcomeric class-II myosin and does not presently support the detailed analysis undertaken below for the other genes.

Alignment and comparison of deduced polypeptide sequences for all previously characterized human myosin genes reveals a series of class-specific motifs. The novel MyHCs match the class-II consensus sequence at all these sites (fig. 3a). Furthermore, the deduced coding sequences for the novel genes indicate that the encoded MyHCs have tails of approximately 1,100 amino acid residues. The sequence of each tail domain predicts an amphipathic alpha helix with homology to both the class-II myosins and the dimerization domain of the leucine zipper proteins. In the regions corresponding to amino acids 215–225 and 367–378 in the human embryonic skeletal MyHC, the homology data suggest that these genes encode sarcomeric MyHCs.



Fig. 2. Physical maps for novel human MyHC genes. Note the range in sizes: 25 kbp for MYH 14 and 140+ kbp for MYH 15. The first 15 coding exons of MYH 16 span approximately 20 kbp of cosmid D based on hybridization data. Vertical lines represent the positions and approximate sizes of exons. Horizontal lines represent the portions of each gene encoding distinct structural domains of the proteins. The asterisks indicate the positions of the target sequences shown in figure 1.

 


Fig. 3. a, Conserved sequence domains among class-II MyHCs. All three of the novel genes match the class-II consensus sequences (Sarc C) and show considerable divergence from human myosins of other classes. The sequence blocks beginning at residues 215 and 367 both show greater similarity to sarcomeric than nonsarcomeric class-II MyHCs. Unconventional myosins 1C and 9A are ubiquitously expressed, whereas 1 beta, 6 and 7A are expressed primarily in cochlear sensory hair cells and retinal pigment epithelium (for further information see annotations in GenBank and relevant Medline citations). Source of class-II myosin sequences as stated elsewhere in text. Accession numbers for non–class-II human myosins: Myo 1 beta, X98507; Myo 1C, U14391; Myo 6, U90236; Myo 7A, NM_000260; and Myo 9A, AF117888. b, Skip residues in the MyHC rods match the sarcomeric consensus. All previously characterized sarcomeric MyHCs have four skip residues in highly conserved positions relative to the hydrophobic face of the alpha helical coiled coil, namely residue numbers 351, 548, 745, and 970 relative to the rigidly conserved proline at the head-rod junction. All previously characterized smooth and nonmuscle class-II MyHCs have only three skip residues, lacking the one at position 548.

 
Further Structural Evidence That the Novel MyHCs Are Sarcomeric
Detailed analysis of the rod domains supports the identification of the novel MyHCs as sarcomeric. A 28 residue repeat pattern, composed of four 7 residue repeat segments, is universal among class-II myosin rods. The 7 and 28 residue repeat pattern is interrupted by essential skip residues that modulate the pitch of the supercoil structure (McLachlan and Karn 1982, 1983). A proline residue acts as the beginning reference residue to the heptad repeat motif of the sarcomeric rod domain. Designating this proline as residue 1 at the beginning of the rod domain, the skip residues among previously characterized vertebrate sarcomeric MyHCs occur at exactly the following positions: 351, 548, 745, and 970. Using the amino acid sequence alignment strategy of Offer (1990), the number of skip residues and the sequence of the deduced flanking amino acids of the novel myosins establish much greater similarity to the sarcomeric than to the nonsarcomeric class-II myosins (fig. 3b).
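
A short arithmetic check shows how the stated skip positions fit the 28 residue repeat: the number of residues lying strictly between consecutive skip residues is a whole multiple of 28. The sketch below uses only the four positions given in the text; the function name is hypothetical.

```python
SKIP_POSITIONS = [351, 548, 745, 970]  # numbered from the head-rod proline (residue 1)


def repeats_between_skips(skips, repeat=28):
    """For each pair of consecutive skip residues, return (whole repeats, remainder)
    for the residues lying strictly between them."""
    result = []
    for first, second in zip(skips, skips[1:]):
        between = second - first - 1
        result.append(divmod(between, repeat))
    return result


print(repeats_between_skips(SKIP_POSITIONS))
```

The intervals contain 196, 196, and 224 residues, that is, 7, 7, and 8 complete 28 residue repeats with no remainder, so each skip residue shifts the register without disturbing the repeat on either side.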

In addition to the interaction of hydrophobic residues in the a and d positions of the alpha helices, interhelical ionic interactions can theoretically form between the g residue of each heptad repeat on one strand and an oppositely charged residue five positions further from the N-terminus (e') on the opposite strand (i, i + 5) (McLachlan and Stewart 1975). The hypothetical (i, i + 5) interaction configuration has been confirmed from crystallographic analysis of GCN4, a DNA-binding domain containing a leucine zipper motif in yeast (O'Shea et al. 1991; Ellenberger et al. 1992). The distribution of potential ionic interactions within a single (Letai and Fuchs 1995) and between paired (Arrizubieta and Bandman 1998) rod domains of a specific type of MyHC is unique, providing a distinct ionic distribution fingerprint that may be used as a criterion for determining the propensity for higher order assembly. Side-by-side comparisons of these patterns reveal dramatically greater similarity between the novel MyHCs and the other human sarcomeric MyHCs than between either of these and the human nonsarcomeric class-II MyHCs (fig. 4). On the basis of the previously observed dichotomy in the pattern of ionic distributions in class-II myosin tail structures, these data predict that the novel MyHCs are sarcomeric (Arrizubieta and Bandman 1998).



Fig. 4. Ionic charge interactions in the MyHC rods. The positions of all pairs of oppositely charged residues conforming to the (i, i + 5) rule for interhelical salt-bridge formation are noted.

 
Structural Features of the Motor Domains Suggest Two Are Slow and One Fast
Sarcomeric myosin isoforms progress through the crossbridge cycle and hydrolyze ATP at widely varying rates depending on the primary structure of the motor domains (Sellers 1999, 237 pp). The best estimates of maximal contractile rate are provided by physiological measurements in intact skeletal muscle fibers of known MyHC content. We used the PAM 250 scoring matrix to compare the aligned amino acid sequences for the novel MyHC head domains with those for physiologically well-characterized MyHCs, namely the cardiac beta (slow skeletal type I), embryonic, adult fast type IIa, IId/x, and IIb, in order of increasing contractile Vmax (see Shrager et al. 2000 and references therein). The results (table 1) suggest that the MyHCs encoded by MYH 14 and 15 are slow, whereas the MYH 16 product is fast.


Table 1 Similarity Scores for Amino Acid Sequences

 
As anticipated, the regions of greatest sequence divergence in these alignments are in the N-terminal domain and the two surface loops at the 25- to 50-kDa (loop 1) and the 50- to 20-kDa (loop 2) junctions. The study of chimeric MyHCs has previously implicated these loop domains in the control of the enzymatic and biophysical properties of the motor domain. The loop domains were realigned with those from all of the other human sarcomeric MyHC sequences and colored by amino acid side chain subtype to facilitate pattern recognition (fig. 5). Notable features of the novel MyHC loops in these alignments are as follows. Loop 1: MYH 14, substitution of aliphatic C-terminus; MYH 15, loss of N-terminal positive charge cluster; MYH 16, reversal of charge in first residue and adjacent substitution of uncharged polar residues. Loop 2: MYH 14, paired proline residues; MYH 15, sequence not available; MYH 16, loss of SNYAG conserved motif at N-terminus.



Fig. 5. Aligned amino acid sequences for the loop 1 and 2 junctional domains. Conserved sequences were used to anchor the hypervariable portions of each loop, the junctional domains at the protease-sensitive boundaries between the 25-, 50-, and 20-kDa subfragments of the myosin head. For reference, the positions of residues relative to the amino acid sequence of MYH 1 are given in the figure. The order reflects the designation of the MYH 14 and 15 myosins as slow and the MYH 16 myosin as fast. (Sequence not available for the exon encoding the amino terminal half of MYH 14 loop 2.)

 
Evidence for Expression of the Novel Genes
Assembled cDNA sequences for each of the novel MyHCs were used to detect human expressed sequence tags (ESTs) according to the BLAST algorithm. Homologous sequences were identified and aligned to all the human class-II MyHC cDNA sequences with the finding that at least two ESTs uniquely matched (>97% identical) portions of each of the reconstructed cDNA sequences corresponding to the novel genes (table 2). In all cases, the ESTs were found to be much more divergent from other human sarcomeric MyHC cDNAs, indicating a one-to-one correspondence between gene and EST. The ESTs represent a variety of human source tissues, including 10-week-old whole embryo. The submitted sequences were either anonymous or misidentified as cardiac beta MyHC despite significant mismatch. None of the ESTs were found in human skeletal muscle cDNA libraries, reflecting the extraordinary abundance of the previously characterized skeletal MyHC isoforms.
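
The >97% identity criterion can be illustrated with a minimal percent-identity calculation over a pre-aligned EST and cDNA segment. How identity was computed in the original analysis is not stated; this sketch assumes that gapped columns are excluded from the comparison, and the function name and toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns, ignoring columns with a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy 100-nt segment with two mismatches introduced into the EST copy.
cdna = "ACGT" * 25
est = list(cdna)
est[0] = "C"
est[50] = "T"
est = "".join(est)
identity = percent_identity(cdna, est)
print(identity, identity > 97.0)
```

Applying the same function to the alignment of an EST with each of the other sarcomeric MyHC cDNAs would show whether the match is unique to one gene.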

Evolutionary Relationships Among 11 Sarcomeric and 3 Nonsarcomeric MyHC Genes
The requirement of lateral association of the alpha helical rod domains of the class-II MyHCs precludes the insertion or deletion of codons in the corresponding portion of the encoding genes. This structural constraint greatly facilitates the unambiguous alignment of deduced amino acid sequences across a large evolutionary distance. The rod-encoding portions of the novel MyHC cDNAs were aligned with all other human and selected nonhuman class-II MyHC cDNA sequences. Nonsynonymous substitution rates were determined using the coding sequence divergence algorithm of Perler et al. (1980) as modified by Li (1993) (fig. 6a). Figure 6b shows a rooted evolutionary topogram with branch lengths based on the pairwise divergence distances (Shrager et al. 2000). The branch lengths shown were calculated by the UPGMA (Sneath and Sokal 1973, pp. 230–234). Virtually identical results, shown as numbers in parentheses in figure 6b, were obtained using the KITSCH method with the default settings (Fitch-Margoliash method with contemporary tips) (Fitch and Margoliash 1967; Felsenstein 1993). In these analyses, the tacit assumption of a uniform molecular clock is addressed by two pairs of substitution rates. Full-length cDNA sequences are available for both sarcomeric and nonsarcomeric MyHCs for Drosophila and chicken. As indicated in both figure 6a and b, the divergence distances for both cDNAs are similar for both species, indicating that the nonsynonymous substitution rates have been relatively uniform for this family of proteins.



Fig. 6. a, Distance matrix for rod-encoding domain of class-II MyHC cDNAs. Accession numbers are given at right; asterisks indicate genomic DNA files from which we have reconstructed cDNA sequences as described in the text. The nonsynonymous substitution rate algorithm of Perler et al. (1980), as modified by Li (1993), is used. All sequences are aligned beginning with the invariant proline residue that marks the head-rod junction. To optimize alignment, the codon encoding the skip residue at amino acid position 548 is deleted from all sarcomeric myosin cDNAs (see fig. 3b). cDNA sequences are truncated after the codon for amino acid #1090 following the initial proline to eliminate the loss of alignment in the variable length unique C-terminus (removing an average of eight residues for sarcomeric and 40 residues for nonsarcomeric class-II myosins). Boxes with bold borders identify the orthologous gene substitution rates used for positioning the vertical lines in figure 6b. Sizes of the human genes, measured in basepairs from start to stop codon, are shown along the lowermost row of the table. The abbreviation kbp indicates kilobase pairs and indicates a lower limit on gene size when draft sequence files have not been finalized in GenBank. b, Evolutionary relationships among the class-II MyHC genes. All 14 human class-II MyHC genes are depicted. In the MIM numbering scheme (McKusick 1998), the previously characterized 11 genes have numbers from MYH 1–13 except for 5 (unused) and 12 (now MYO5A, see http://www.ncbi.nlm.nih.gov/omim/). The anchored topogram indicates the estimated time of divergence of individual genes relative to the divergence of species. Branch lengths calculated from the distance matrices by two different algorithms are provided, the leftmost number corresponding to the result provided by the UPGMA (Sneath and Sokal 1973) and the rightmost number (in parentheses) corresponding to the result provided by the KITSCH method with default settings (Fitch-Margoliash method with contemporary tips; Fitch and Margoliash 1967; Felsenstein 1993). Vertical bars correspond to the calculated nonsynonymous substitution rates for orthologous genes from other species, indicating the approximate timing of speciation events relative to the duplication of ancestral genes. For simplicity, numerical data are presented such that the numbers correspond to the additive branch lengths connecting tip to node and back to tip (i.e., eliminating the need to add branch lengths).
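
The sequence preparation described in the legend to figure 6a (removal of the codon for skip residue 548 from sarcomeric cDNAs, followed by truncation after 1,090 codons) can be sketched as below. This is an interpretation rather than the authors' procedure: it assumes that the input begins at the first base of the head-rod proline codon (residue 1), that the skip codon is removed before truncation, and that the 1,090 codon limit is applied to the skip-free sequence so that all outputs have equal length. The function name and the toy sequences are hypothetical.

```python
def trim_rod_cdna(rod_cdna: str, sarcomeric: bool,
                  skip_residue: int = 548, n_codons: int = 1090) -> str:
    """Return the rod-coding cDNA prepared for alignment.
    `rod_cdna` must start at the first base of the proline codon (residue 1)."""
    seq = rod_cdna
    if sarcomeric:
        start = 3 * (skip_residue - 1)        # first base of the skip codon
        seq = seq[:start] + seq[start + 3:]   # delete that single codon
    if len(seq) < 3 * n_codons:
        raise ValueError("rod sequence is shorter than the truncation point")
    return seq[:3 * n_codons]


# Toy sequences: the sarcomeric copy carries one extra codon (TTT) at position 548.
sarcomeric_toy = "AAA" * 547 + "TTT" + "CCC" * 560
nonsarcomeric_toy = "AAA" * 547 + "CCC" * 560
a = trim_rod_cdna(sarcomeric_toy, sarcomeric=True)
b = trim_rod_cdna(nonsarcomeric_toy, sarcomeric=False)
print(len(a), len(b), a == b)
```

With both classes reduced to the same codon count, the pairwise substitution rates can be computed column by column without gaps.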

 
Alignment of these cDNA and deduced peptide sequences using Clustal W followed by phenogram construction using the Neighbor-Joining method yielded a virtually identical topology with similar relative branch lengths. The only exception was the grouping of the extraocular MyHC gene with the other five skeletal MyHCs after divergence of the cardiac alpha-beta gene pair. This divergence order corresponds to the physical linkage of the two gene clusters and suggests that a primordial skeletal-cardiac MyHC locus was split by translocation before the majority of further duplications occurred. Among the eight skeletal-cardiac MyHC gene descendants, only one, the modern-day cardiac beta-skeletal type-I gene, has retained the transcriptional machinery necessary for expression in both cardiac and skeletal muscle.

As a further check on the overall topology of the tree, the deduced polypeptide sequences were reanalyzed using a parsimony algorithm (PROTPARS) of the PHYLIP software package (Felsenstein 1993). This program relies on a strategy of adding sequences sequentially in the order in which they are listed, followed by the comparative evaluation of a limited number of local rearrangements. Regardless of the order in which the sequences were presented in the input file, the topology of the most parsimonious tree(s) identified differed from the illustrated evolutionary tree by no more than 1% of the overall number of substitutions required (data not shown). Despite the fact that this approach relies on a different set of assumptions about the weights of peptide substitutions than does the Perler analysis, the results support the general conclusions of the former approach.

Branch lengths defined by the foregoing methods suggest that the novel genes all trace to duplications that predated both the smooth muscle and nonmuscle MyHC gene divergence and the finned-fish–tetrapod phylogenetic divergence. Pairwise comparisons to genes from other species identify the orthologous relationships and allow an estimate of the relative timing of gene and species divergence events, as shown by the vertical lines in figure 6. A striking result from this analysis is the finding that MYH 16 is almost as divergent from the other human sarcomeric MyHCs as all of these human genes are from the single sarcomeric MyHC genes in Drosophila and Argopecten. This conclusion is further supported by the analysis of intron positions (described below) and recently available sequences of orthologous genes from other species (see Discussion). Among the ancestral human class-II genes, only the sarcomeric-nonsarcomeric divergence appears to have predated the invertebrate-vertebrate split. Our data (not shown) on divergence could not identify an orthologous relationship between any of the novel genes and the four sarcomeric MyHC genes of Caenorhabditis elegans (Dibb et al. 1989), suggesting that the latter diverged after the ancestral split between species.
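The clustering step behind the UPGMA branch lengths can be illustrated with a minimal sketch. The tip labels A to D and their pairwise distances are hypothetical values chosen for illustration, not entries from the distance matrix of figure 6a, and the function names are our own. Each merge reports the node height (tip to node); doubling it gives the tip-to-node-to-tip number used in figure 6b.

```python
from itertools import combinations

def average_distance(a, b, dist):
    """Mean pairwise distance between the tips of clusters a and b."""
    return sum(dist[frozenset((x, y))] for x in a for y in b) / (len(a) * len(b))

def upgma(tips, dist):
    """Return the merges as (cluster_a, cluster_b, node_height), most recent split first."""
    clusters = [(t,) for t in tips]
    merges = []
    while len(clusters) > 1:
        # Join the two clusters with the smallest average distance.
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: average_distance(pair[0], pair[1], dist))
        height = average_distance(a, b, dist) / 2  # tip-to-node distance
        merges.append((a, b, height))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges

# Hypothetical distances between four genes (illustration only).
toy = {
    frozenset(("A", "B")): 0.04,
    frozenset(("A", "C")): 0.20,
    frozenset(("B", "C")): 0.22,
    frozenset(("A", "D")): 0.90,
    frozenset(("B", "D")): 0.92,
    frozenset(("C", "D")): 0.88,
}
merges = upgma(["A", "B", "C", "D"], toy)
for a, b, height in merges:
    print(a, b, round(height, 3))
```

UPGMA assumes a uniform molecular clock, which is the same simplifying assumption made for the rod-encoding sequences in this analysis.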

Novel MyHC Genes Have Introns in Atypical Positions: A Timeline for Loss and Gain
As an initial step in the analysis of evolutionary relationships among the newly recognized members of this gene family, we focused attention on the conservation of intron position. The coding sequences of the eight previously characterized sarcomeric MyHC genes are interrupted by introns at 37 conserved positions, implying a similar structure in a common ancestral gene. Four of these eight genes (embryonic, extraocular, alpha, and beta) have an additional intron interrupting the coding sequence four to seven codons 5' of the stop signal. Figure 7 depicts the positions of the introns for each of the human class-II MyHC genes, relative to the start and stop codons in the assembled cDNA sequences.



Fig. 7.—Positions of introns relative to conserved domains in the coding sequence. Intron positions and phases (one unless otherwise indicated) relative to the coding sequence are depicted for all human class-II MyHC genes and the sarcomeric MyHC genes of D. melanogaster and A. irradians. A widened bar represents the presence of an intron at an atypical position in a sarcomeric MyHC gene. Note that intron positions unique (in the human gene family) to MYH 16 are conserved with the A. irradians catchin-MyHC gene. Note also that there is virtually no discernable intron positional conservation in comparing the sarcomeric to nonsarcomeric MyHC rod-coding regions, whereas the conservation among nonsarcomeric MyHC genes is absolute

 
The coding sequences of MYH 14 and 15 are interrupted by introns at precisely the same positions as the embryonic MyHC at 35 and 37 sites, respectively, out of a total of 38. This contrasts sharply with a comparison to the coding sequence for the smooth MyHC gene in which only 17 of 38 intron positions are conserved. Fifteen of the conserved intron positions are in the head-encoding portion of the gene. All 39 of the intron positions are conserved in a three-way comparison of the human smooth, nonmuscle A, and nonmuscle B MyHC genes. MYH 14, 15, and 16 share extra introns interrupting the coding sequence homologous to embryonic MyHC exons 16 (the only site also shared with the nonsarcomeric MyHC genes) and 34 and lack intron 31. The sequence currently available for MYH 16 begins at a position corresponding to intron 15 in the human embryonic MyHC gene. The remainder of this gene has 28 introns, 20 of which (13 in the rod) are at positions shared with the embryonic MyHC gene. Five of the eight extra introns in the MYH 16 rod domain are shared only with the catchin and MyHC genes of the scallop A. irradians (Nyitray et al. 1994; Yamada et al. 2000), whereas none are shared with the smooth or nonmuscle MyHC genes.
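The tallies of conserved, missing, and extra introns reduce to set operations once each intron is assigned a position in a common alignment. The sketch below assumes each intron is recorded as a (codon index, phase) pair; the positions listed are hypothetical and are not taken from figure 7, and the function name is our own.

```python
def compare_intron_positions(reference, query):
    """Return (shared, lost, gained) intron positions of query relative to reference.

    Each position is a (codon_index, phase) pair in a common alignment.
    """
    reference, query = set(reference), set(query)
    return reference & query, reference - query, query - reference

# Hypothetical positions, for illustration only.
reference_gene = {(10, 1), (55, 0), (120, 1), (300, 2)}
query_gene = {(10, 1), (55, 0), (121, 1), (300, 2), (410, 1)}

shared, lost, gained = compare_intron_positions(reference_gene, query_gene)
print(f"{len(shared)} of {len(reference_gene)} reference positions conserved")
print("lost:", sorted(lost), "gained:", sorted(gained))
```

Requiring the phase to match as well as the codon index treats an intron that has shifted by a single nucleotide as a loss plus a gain, which is the stricter of the two reasonable conventions.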

Relationship Between Intron Size and Gene Size and Repetitive Sequence Content
MYH 14 is 24,206 bp in length (start to stop codon) and therefore similar to the majority of previously characterized mammalian sarcomeric MyHC genes (fig. 6a, lowermost row). The assembled draft sequence for MYH 15 reveals an unexpectedly large size of >142,000 bp, whereas MYH 16 is intermediate in size, an estimated 60,000 bp extrapolating from the 45,200 bp spanned by exons 16 through 40. The findings of anomalous size in these genes prompted us to annotate draft or completed sequence for all other identifiable human class-II MyHCs to assess the relationship between intron size, intron position, repetitive sequence content, and coding sequence divergence. Analysis of the reconstructed gene sequences revealed sizes of 63,577, >129,000, and >106,000 bp for the extraocular, smooth muscle, and nonmuscle MyHC B genes, respectively. In view of the topology of the evolutionary diagram in figure 6, data on gene size suggest that the human extraocular MyHC gene has undergone a recent expansion, MYH 16 and the nonmuscle A gene (MYH 9) have undergone recent contractions, and the other sarcomeric MyHC genes have been remarkably constant in size since the MYH 14 divergence.

Maps of representative genes and selected vertebrate orthologs were aligned to facilitate the further analysis of intron evolution (fig. 8). Despite their similar overall size, even the most closely related human sarcomeric MyHC genes have widely divergent intron sizes (fig. 8b). In contrast, intron size conservation is readily apparent in the orthologous gene comparisons (fig. 8c). The intron size distributions were plotted as scattergrams with density ellipses for selected pairwise gene comparisons (fig. 9). Correlation coefficients achieve a level of >0.5 only for the orthologous gene comparisons. The orthologous human and rat embryonic MyHC genes are of similar size (slope 0.978). Interestingly, the slope of the linear regression fit for the human-mouse extraocular gene comparison is 1.51, as suggested by the paired alignment in figure 8c (note scaling difference). Thus, there has been a process of proportional intron expansion in the human gene or contraction in the murine gene since the time of the species divergence. The human gene has 57,763 bp of intervening sequence spanning its 63,577-bp total length, of which 21,790 bp is repetitive (34.3% of total or 37.7% expressed as percentage of the intron sequence). If this repetitive DNA sequence is deleted in the human to mouse intron length comparison, the length correlation increases from 0.8555 to 0.8644, and the slope of the linear regression fit reduces from 1.51 to 0.818. This suggests that the length difference can be largely attributed to the insertion of repetitive elements during comparatively recent primate evolution.
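The slope and correlation coefficient for an intron-by-intron comparison can be computed as in the following sketch. We assume here that the mouse intron lengths are placed on the x axis and the human lengths on the y axis, so that a slope above one indicates longer human introns; the lengths themselves are hypothetical and chosen only to show how removing repeats lowers the slope.

```python
def slope_and_r(x, y):
    """Least squares slope of y on x and the Pearson correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx, sxy / (sxx * syy) ** 0.5

# Hypothetical intron lengths in bp, paired intron by intron.
mouse = [100, 200, 400, 800]
human = [150, 300, 600, 1200]            # with repetitive elements
human_without_repeats = [80, 160, 320, 640]

slope_full, r_full = slope_and_r(mouse, human)
slope_masked, r_masked = slope_and_r(mouse, human_without_repeats)
print(round(slope_full, 3), round(slope_masked, 3))
```

In this toy case the expansion is perfectly proportional, so the correlation is one in both comparisons; real data would show scatter as well as a change in slope.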



Fig. 8.—Intron length comparisons among closely related genes. a, Physical map of the skeletal MyHC gene cluster at human chromosome 17p13. b, Alignment of the most closely related genes at 17p13 illustrating the disparity in intron sizes. Large filled boxes depict exons, small open boxes depict repetitive sequences (A: Alu; L: LINE; M: MIR). Note the relative scarcity of repetitive sequences in these genes (compared with extraocular in [c]) and the similarity in their overall size (numbers at right). c, Alignment of the human and mouse extraocular MyHC genes as well as a version of the human extraocular gene from which all repetitive sequences have been deleted. Note the similarity in the relative lengths of introns, despite the high density of repetitive sequence elements.

 


Fig. 9.—Quantitative analysis of intron length correlation. Intron length comparisons are plotted on an intron-by-intron basis for selected gene pairs as indicated along the x and y axes. Density ellipses (95%) for the bivariate normal distributions and slopes for a least squares fit are shown. (a) Intron lengths are compared on an intron-by-intron basis for selected gene pairs from the skeletal MYH gene cluster. Correlation coefficients range from negative for the physically most distant paralogous genes (human embryonic and extraocular) to almost one for the orthologous human and mouse extraocular genes. (b) Representative plots of intron length comparisons with the identity of the gene pairs indicated along the x and y axes. Note slope approaching one for orthologous human-rat embryonic gene comparison. (c) Evidence for recent expansion of the human extraocular gene. Note slope of 1.51 for human-mouse orthologous gene comparison. If all identifiable repeats are deleted from the human sequence, the correlation coefficient increases and the slope decreases to less than one

 
The average density of repetitive DNA sequence elements in this portion of the human genome is illustrated by the 80,614-bp intergenic region between the embryonic stop codon and the IIa start codon. In this interval, 47,881 bp of the sequence is repetitive (59.4%), comparable to the 53.6% annotated by RepeatMasker for the first 100 kbp of the human smooth muscle MyHC gene. The density of repetitive sequence is reduced to 27.0% in the embryonic MyHC gene, 18.3% in the perinatal gene, and 8.96% in the IIb gene, suggesting that the developmental and organismic patterns of gene activation correlate with the selective pressure against repeat insertion.
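The repeat densities quoted in this section follow directly from the base-pair counts given in the text, as the short check below shows. Only the helper name percent is our own; the counts are those reported above for the embryonic-IIa intergenic region and for the human extraocular gene.

```python
def percent(part_bp, whole_bp):
    """Express part_bp as a percentage of whole_bp."""
    return 100.0 * part_bp / whole_bp

# Repetitive sequence in the 80,614-bp embryonic-IIa intergenic region.
intergenic = percent(47_881, 80_614)

# Repetitive sequence in the human extraocular gene, relative to the
# whole gene (63,577 bp) and to its intron sequence alone (57,763 bp).
extraocular_gene = percent(21_790, 63_577)
extraocular_introns = percent(21_790, 57_763)

print(round(intergenic, 1), round(extraocular_gene, 1), round(extraocular_introns, 1))
```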

Conserved Intron Sequence: Evidence for Concerted Evolution
We were intrigued by the finding of near-identical intron sizes at several positions during the original PCR screen for novel MyHC genes. When further analysis revealed >95% sequence identity in these introns, we broadened the search for evidence of recent gene conversion. The subsequent availability of high throughput DNA sequences matching all of the initial PCR products facilitated a locus-wide cross-comparison of all of the intron sequences for each of the skeletal MyHC genes. Figure 10 shows a sampling of alignments of introns from the other genes with those of the IId/x gene. In most cases, portions of the flanking exons have extraordinarily low synonymous substitution rates, as indicated by the extent of nucleotide similarity listed in figure 10. Note that in introns 13 and 23, homology exists between the IId/x and perinatal genes in the absence of homologous sequence in the intervening IIb gene.



Fig. 10.—Intron sequence identity among tandemly linked genes at the human skeletal MyHC locus: evidence for recent gene conversion. Selected introns with >95% nucleotide sequence identity in pairwise gene comparisons are shown. The fractions in bold font depict the degree of overall nucleotide similarity in the region of putative gene conversion, extending into the exons flanking the involved introns

 
The orthologous gene alignments reveal several introns of near-identical size across species, suggesting the possibility that functional constraints are responsible for the sequence similarity seen in the paralogous gene comparisons. To address this possibility, we used the human intron sequences to screen the orthologous genes for sequence similarity. No orthologous intron sequence alignment of greater than 50 bp had a sequence identity score over 50%, supporting the hypothesis that gene conversion is responsible for the sequence similarities shown in figure 10.
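A screen of this kind can be sketched as a sliding-window identity calculation over a pair of pre-aligned sequences. This is an illustration under our own assumptions, not the procedure actually used: the sequences are taken to be already aligned and of equal length, gaps are written as "-" and counted as mismatches, and the function names and toy sequences are hypothetical.

```python
def percent_identity(a, b):
    """Percent of aligned columns with identical, non-gap residues."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    matches = sum(x == y and x != "-" for x, y in zip(a, b))
    return 100.0 * matches / len(a)

def best_window_identity(a, b, window=50):
    """Highest percent identity found in any window of the given length."""
    if len(a) != len(b) or len(a) < window:
        raise ValueError("aligned sequences must be at least one window long")
    return max(percent_identity(a[i:i + window], b[i:i + window])
               for i in range(len(a) - window + 1))

# Toy aligned sequences (illustration only).
intron_a = "ACGT" * 20
intron_b = "A" * 80

identical = best_window_identity(intron_a, intron_a)
unrelated = best_window_identity(intron_a, intron_b)
print(identical, unrelated)
```

A pair of introns would pass the 50% criterion described above only if the best window score exceeded 50.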


    Discussion
 
Three Novel Sarcomeric MyHCs: An 11-membered Family in Homo sapiens
With the identification of three novel genes encoding sarcomeric MyHCs, the size of the gene family in H. sapiens now stands at 11. Our use of high throughput human genome sequence in its draft stages greatly expedited the reconstruction of full-length cDNA sequences for each of the novel genes. Seven sarcomeric MyHC genes have been recognized in mammals since 1985 (Wieczorek et al. 1985; Mahdavi et al. 1986), with the eighth, alternatively called the IId or IIx, identified in 1993 (DiNardi et al. 1993). Several factors contributed to the delay in recognition of the three novel genes. First, the nucleotide sequences are sufficiently divergent from those of the eight previously characterized genes as to require low stringency hybridization during the cloning process. Second, the genes are not physically linked to the skeletal and cardiac gene clusters. Third, MYH 15 and 16 are atypically large, spanning 3–6 times the physical size of the prototypical mammalian sarcomeric MyHC gene. Finally, transcription of these genes may be spatially or temporally restricted so as to limit their representation in commonly available human muscle cDNA libraries. The dominance of skeletal and cardiac MyHC transcripts would preclude enrichment for these rarer transcripts in hybridization-subtracted libraries.

A Revised Evolutionary History for the Class-II MyHC Gene Family—Four Sarcomeric Genes Before the Emergence of a Dedicated Smooth Muscle MyHC Gene
The overall topology of the evolutionary tree provides evidence that, in the ancestral lineage leading to H. sapiens, stronger selective pressures have driven the diversification of sarcomeric than nonsarcomeric class-II myosins. The branch lengths in the evolutionary tree further imply that most vertebrate genomes contain orthologs to the three novel sarcomeric as well as the three nonsarcomeric MyHC genes. Tandem duplications of only one of the four primordial sarcomeric MyHC genes present at the time of the nonsarcomeric MyHC gene divergence were genetically fixed in this ancestral lineage. The divergence data for the carp-human cDNA comparisons indicate that at least three of the seven additional gene duplications had occurred by the time of the ray-finned fish–tetrapod phylogenetic divergence. The products of the most recently duplicated MyHC genes accumulate to extraordinary levels in the bodies of modern day vertebrates, comprising approximately 35% of the protein content of striated muscle cells (which in turn account for 40%–50% of the total body mass). Thermodynamic measurements suggest that fine tuning of the sarcomeric myosin ATPase is critical for energy conservation at both a cellular and organismic level (Schiaffino and Reggiani 1996). An ancestral gene with the proper cis-acting transcriptional control elements for widespread expression in the cardiac and the major locomotive muscles may have come under intense selective pressure for successive duplication. In contrast, the apparent absence of further duplication of the novel genes suggests that there has been little pressure for additional diversification of these isoforms, perhaps reflecting a restricted pattern of expression in an early metazoan ancestor.

The foregoing evolutionary reconstruction is based entirely on molecular evidence from the class-II myosin rod domains, a region chosen because of the unambiguous sequence alignment throughout the coiled-coil tail. The use of dot matrices in the comparison of distantly related myosin sequences has revealed the distinct patterns of divergence in the head and rod domains. Recent studies of molecular evolution in defined subregions of the myosin head have revealed unexpected sequence similarity in distantly related but kinetically similar myosins (Goodson, Warrick, and Spudich 1999), suggesting a process of convergent evolution. This prompted our studies of divergence in the motor domain and lends support to our contractile rate predictions for the novel gene products, despite evidence from the rod domains that these genes diverged before the diversification of the skeletal and cardiac MyHCs. As noted by Goodson, Warrick, and Spudich (1999), it is rarely feasible to perform kinetic analyses on intact fibers or homogeneous preparations of myosin from tissue sources.

While this manuscript was under review we became aware of two cDNA sequences for cat (Felis catus) genes most closely related to MYH 7 and MYH 16 (accession numbers AF229810 and U51472, respectively). Interestingly, cDNA for the MYH 16 ortholog was isolated from the powerful jaw closing muscles and identified as a superfast isoform. Application of the neighbor-joining method of phylogenetic comparison to the orthologous human and cat cDNA sequences yields an unrooted phenogram with the following topology and branch lengths: ([Cat Beta:0.02797, Human Beta:0.03933]:0.41712, Cat Superfast:0.03472, Human Superfast:0.04496). This indicates that the MYH 16 genes in both species are currently diverging at approximately the same rate as the cardiac beta genes, consistent with our simplifying assumption of a relatively uniform molecular clock for the class-II MyHC rod-encoding sequences. These data argue against an alternative scenario, equally plausible a priori, in which MYH 14–16 originated with a comparatively recent series of gene duplications and subsequently diverged more rapidly than their widely expressed counterparts (i.e., MYH 1–11) as a result of relaxed selective pressures.
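The rate comparison can be made explicit by summing the tip branch lengths of the phenogram quoted above. In this sketch the branch lengths are those given in the text, whereas the variable names and the use of a simple ratio as a summary are our own choices.

```python
# Tip branch lengths from the neighbor-joining phenogram quoted in the text.
branch = {
    "Cat Beta": 0.02797,
    "Human Beta": 0.03933,
    "Cat Superfast": 0.03472,
    "Human Superfast": 0.04496,
}
internal_branch = 0.41712  # separates the beta pair from the superfast pair

# Cat-to-human path length within each orthologous pair.
beta_divergence = branch["Cat Beta"] + branch["Human Beta"]
superfast_divergence = branch["Cat Superfast"] + branch["Human Superfast"]

# Human beta to human superfast (MYH 16), crossing the internal branch.
paralog_divergence = branch["Human Beta"] + internal_branch + branch["Human Superfast"]

ratio = superfast_divergence / beta_divergence
print(round(beta_divergence, 5), round(superfast_divergence, 5), round(ratio, 2))
```

The two orthologous pairs have accumulated cat-human divergence within roughly 20% of one another, whereas the paralogous beta-superfast path is several times longer than either.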

Branch lengths on our evolutionary trees imply that three duplications separating the novel sarcomeric MyHC genes from the previously described members of this subfamily predated the divergence of the smooth muscle from the nonmuscle MyHC genes. This suggests the counterintuitive possibility that these sarcomeric gene duplications predated the emergence of smooth muscle as a distinct cellular lineage in ancestral metazoans. The metabolic demands of locomotive muscle are likely to have restricted the size and shape of early metazoans until body plans evolved to amplify substrate throughput and improve circulatory homeostasis. In modern day vertebrates, smooth muscle cells expressing dedicated nonsarcomeric class-II MyHCs are indispensable for intestinal peristalsis and the regulation of vascular tone. How have the body plans for the largest of the modern day invertebrates (e.g., giant squid, lobster, giant crab) addressed demands for regulated oxygen and substrate distribution to the locomotive muscle mass without a cell lineage homologous to vertebrate smooth muscle? Molluscan smooth muscles are defined morphologically as unstriated but exhibit several features which more closely resemble the striated than the smooth muscles of vertebrates (Chantler 1983). Our data show that the gene encoding smooth muscle MyHC in the bay scallop A. irradians (Nyitray et al. 1994) is most closely related to the sarcomeric MyHC genes of vertebrates. Conversely, the vertebrate smooth muscle ortholog of the scallop Patinopecten yessoensis is expressed only in nonmuscle cells (Hasegawa 2000). Muscles lining the aortic wall of the lobster and crab appear to be striated (Davison, Wright, and DeMont 1995), suggesting the possibility that, by a process of convergent evolution, a striated muscle cell lineage functionally substitutes for vertebrate smooth muscle throughout the vascular tree of large arthropods. These observations have important implications for the study of the transcriptional networks involved in smooth muscle lineage specification (Cripps, Zhao, and Olson 1999b; Carson et al. 2000; Zhang et al. 2001).

Structures of the Novel Genes Provide a Timeline for Intron Loss and Gain
When considered in the context of the evolutionary topogram for the class-II MyHC gene family, the tabulation of intron positions in the novel genes provides clear evidence for intron loss and gain during well-defined time intervals. For instance, the intron interrupting the coding sequence homologous to exon 34 of the embryonic MyHC gene was present in a common ancestor to all three of the novel genes but was lost in a more recent ancestor to all of the modern day cardiac and skeletal MyHC genes. During the same time interval, there was reciprocal gain of an intron at the position occupied by intron 31 in the embryonic MyHC gene. The absolute number of introns interrupting the coding regions of all of the class-II MyHC genes is well conserved at 39 ± 2. Interestingly, 13 of the 20 introns in the head-encoding domains of these genes occupy positions conserved across the entire spectrum of duplicated genes, whereas this applies to only 1 of the 20 introns in the tail-encoding domain. Relative to the skeletal and cardiac MyHC genes there are six missing introns and seven new ones in MYH 16, numbers intermediate to those for the skeletal versus smooth comparison. As a result of the coiled-coil alpha helical structure of the rod domains there is a requirement for at most one gap element, corresponding to the second sarcomeric skip residue (fig. 3b), to achieve unambiguous alignment of the deduced amino acid sequences. This unique feature of the class-II MyHC rod serves to anchor the assigned intron positions. All of the intron positions and phases are rigidly conserved between the physically unlinked smooth, nonmuscle A, and nonmuscle B genes. Thus, in H. sapiens ancestors, the period of intron loss and gain spanned only the first 4 of the 13 duplications that created the modern day class-II MyHC gene subfamily.

Two opposing models for the origin of spliceosomal introns have been proposed: introns early, in which introns predated and facilitated the initial assembly of exons into genes (Gilbert, Marchionni, and McKnight 1986; de Souza et al. 1998) and introns late, in which introns were inserted into preexisting protein coding sequences (Logsdon and Palmer 1994; Logsdon 1998). The overall conservation of intron number, but head-specific conservation of intron position, is difficult to reconcile with either model in the extreme. The intermediate structures of the novel MyHC genes suggest an alternative model in which a primordial class-II MyHC gene had at least 25 introns, including all 13 of those at conserved positions in the head domain, but fewer than the 60 introns necessary to account for all of the sarcomeric and nonsarcomeric intron positions combined. Subsequent genes (in the genomes of direct ancestors of H. sapiens) underwent quasi-reciprocal intron losses and gains with preservation of the average exon size. In less complex organisms, there was increasing selective pressure for genome contraction, with net loss of introns exceeding gains. As exemplified by the sarcomeric MyHC genes of Drosophila melanogaster and A. irradians, this was partially offset in some ancestral lineages by the pressure for isoform diversification, with the resultant fixation of duplicated exons supporting productive alternative splicing. If, as recently proposed by Venkatesh, Ning, and Brenner (1999), a pair of duplicated exons acquired the elements required for splicing, a new intron position would be established, most likely at a protosplice site conforming to the MAG/R empirical rule. Although the exact register of codons is strictly conserved over most of the MyHC class-II rod-encoding domain, there is a wider range of neutral substitutions available than for most codons of the MyHC head. The resultant acceleration of point mutational drift in the rod-coding sequence would be expected to facilitate the process of intron insertion across spontaneously duplicated exons, accounting for the observed asymmetry in positional conservation.
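The MAG/R rule can be made concrete with a short scan of a coding sequence for candidate protosplice sites. In the IUPAC nucleotide code M denotes A or C and R denotes A or G, so a candidate site is any occurrence of [AC]AG followed by [AG], with the intron expected between the G and the R. The function name and the test sequence below are hypothetical.

```python
import re

# MAG|R: M = A or C, R = A or G. A lookahead is used so that
# overlapping candidate sites are all reported.
PROTOSPLICE = re.compile(r"(?=[AC]AG[AG])")

def protosplice_sites(sequence):
    """Return 0-based indices of the R base that would follow an inserted intron."""
    return [match.start() + 3 for match in PROTOSPLICE.finditer(sequence.upper())]

# Hypothetical coding sequence fragment (illustration only).
sites = protosplice_sites("TTCAGGTTAAGATT")
print(sites)  # → [5, 11]
```

Because the motif is only four bases long, such sites are expected to be frequent in any coding sequence; the rule indicates where insertion is plausible, not where it has occurred.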

Evolution of Intron and Gene Size
Our evolutionary analysis of the class-II MyHC genes contributes a new perspective to the general study of gene expansion and contraction. The 14 known members of this gene family vary in size by almost an order of magnitude, yet the majority of the genes fall into a narrow range approximating 25 kbp. The most parsimonious hypothetical scheme to account for the observed distribution of human MyHC gene sizes assigns a size of >100,000 bp to a primordial class-II MyHC gene, with comparatively rapid loss of intron sequence to yield a size of approximately 25,000 during the lineage connecting the ancestral gene that existed prior to the MYH 15 divergence to the last common ancestor to MYH 1–8 and 14. In this scenario, MYH 16 and the nonmuscle A gene (MYH 9) independently lost intron sequence after their respective divergences from last common ancestors with the other genes. Both the human-rat and human-chicken gene comparisons reveal an approximate rate at which intron size differences emerge, relative to the molecular clock for mutational drift in the coding sequence. The human-mouse comparison suggests that the human extraocular gene has uniquely undergone a recent reexpansion, driven in part by the appearance of additional species-specific repetitive DNA. The stochastic nature of this process likely accounts for the proportional reexpansion of most of the introns in this gene. The implied rate of this reexpansion further suggests that selective pressures have maintained the smaller size of the genes encoding the more abundantly expressed sarcomeric isoforms, restricting the local accumulation of repetitive DNA.

Phenotypic Implications of Concerted Evolution
The finding of intron sequence conservation in paralogous genes was especially surprising in regions adjacent to the exons encoding the MyHC junctional loops. Several lines of evidence suggest that variation in the junctional loop amino acid sequence has a profound effect on both the kinetics of the myosin ATPase and the velocity of unloaded muscle contraction (Uyeda, Ruppel, and Spudich 1994; Murphy and Spudich 1998). Loop 1 and loop 2 sequences are thought to have coevolved to allow thermodynamically optimal matching between the rates of nucleotide release and actin binding during the cross bridge cycle (Sweeney et al. 1998). Gene conversion facilitates swapping of transcriptional regulatory and protein-coding domains between closely linked paralogous genes, a process that in most cases neutralizes functional differences among the genes (reviewed in Papadakis and Patrinos 1999). Evidence for a recent conversion involving the loop 1 domain of the IIa and IId/x genes (and less recently the perinatal gene) implies a selective advantage associated with the elimination of mutational differences at this hypervariable site. It is notable that the physical proximity of the gene pairs correlates with the prevalence of converted domains, with the embryonic and extraocular genes seemingly out of range. Even among the adult type-II genes we could find no evidence for gene conversion in putative transcriptional regulatory regions, contrasting with recent findings at the human beta globin locus (Chiu et al. 1997). Because meiotic gene conversion is an unavoidable consequence of close tandem gene linkage, the organization of the skeletal MyHC locus suggests that fine tuning of the loop 1 and loop 2 sequences in the type-II myosins is somehow facilitated by this process. Alternatively, the genes may require close physical contiguity for proper transcriptional regulation (e.g., access to a locus control region), with large inserts between the genes resulting in aberrant expression.

Possible Roles for the Newly Identified MyHC Genes—Historical Perspective
On completion of this study, the National Center for Biotechnology Information Website (http://www.ncbi.nlm.nih.gov/genome/seq/page.cgi?F=HsProgress.shtml&ORG=Hs) listed the total quantity of finished (32.5%) and draft (61%) human genome sequence at 2,991,716 for an estimated 93.5% representation of the human genome. On the basis of the methods used in the sequence query, we place comparable confidence limits on our current assessment of the total number of class-II MyHC genes in the human genome. While this manuscript was under review, we became aware of the millennial myosin census of Berg, Powell, and Cheney (2001), which differs from this report by listing the chromosome 7 gene as a pseudogene. Although no accession numbers are given, we presume that this corresponds to MYH 16, whose sequence is derived from accession numbers AC004834 and AC005163. Relative to the ORF in the cat superfast MyHC (U51472) we find a deletion of two bases in exon 18 at bp 5163 in AC005163. The close similarity between the coding sequence for this gene and the cat cDNA, as cited in an earlier section for the Clustal W alignment, suggests that the isolated discrepancy represents a sequencing artifact. An example of such an artifact is provided by the single base mismatch in exon 17 between the MYH 15 genomic (AC069499) and EST (AL039898) files. Further studies will be required to resolve this issue.

Our preliminary studies with RT-PCR amplification of RNA from selected human muscles indicate that the newly identified genes are not expressed at levels comparable to the previously characterized sarcomeric MyHC genes (data not shown). This is not surprising in view of the extraordinarily high level expression of the latter. In the future, this problem can be addressed by expanding the range of human tissues studied and by developing probes specific for orthologous genes in other species. The junctional loop sequences for each of the three novel genes are unprecedented, providing a point of departure for the development of monospecific antibodies but at the same time restricting confidence in the prediction of the physiological attributes of the native isoforms. Our data on sequence comparisons addressing the remainder of the motor domain suggest that the most ancient of these MyHC isoforms is fast, whereas the other two are slow. In view of historical evidence for the existence of atypical sarcomeric MyHC isoforms, we speculate that the novel genes encode one superfast isoform (distinct from the extraocular isoform), as previously seen in the jaw closing muscles of cats and nonhuman primates (Rowlerson et al. 1983), one slow-tonic isoform, as previously seen in the intrafusal bag and chain fibers (Pedrosa-Domellof et al. 1993), and one developmentally regulated slow isoform distinct from the embryonic and adult slow-cardiac beta isoforms (Hughes et al. 1993). The proposed MIM (McKusick 1998) numbers MYH 14, 15, and 16 provide an alternate designation for these genes until their specific functions and patterns of expression can be more fully characterized.


    Supplementary Material
 
Annotated sequence files for MYH 14, 15, and 16 are available through the MBE website under Supplementary Materials. These MacVector (Oxford Molecular Group) files correspond to our reconstructions of genomic and cDNA from draft nucleotide sequences, internally identified by GenBank accession number.


    Acknowledgements
 
The authors thank Neal Rubinstein for helpful comments on the manuscript and David Standiford for facilitating the quantitation of sequence divergence. Two anonymous reviewers are thanked for further suggestions that contributed to the quality of this manuscript. This work was supported by grants to H.H.S. from the Association Française contre les Myopathies, the Muscular Dystrophy Association, the NIH (NINDS and NIAMS), and the U.S. Veterans Administration. J.M.B. was supported by a fellowship from the Howard Hughes Medical Institute.


    Footnotes
 
William Jeffery, Reviewing Editor

Keywords: myosin human sarcomeric smooth muscle evolution intron gene

Address for correspondence and reprints: Hansell H. Stedman, Room 608, BRB 2/3, 421 Curie Blvd., Philadelphia, Pennsylvania 19104. hstedman{at}mail.med.upenn.edu.


    References
 

    Alberts B., D. Bray, J. Lewis, M. Raff, K. Roberts, J. Watson, 1994 Molecular biology of the cell Taylor & Francis Group, London

    Arrizubieta M. J., E. Bandman, 1998 The role of interhelical ionic interactions in myosin rod assembly Biochem. Biophys. Res. Commun 244:588-593

    Babu G. J., D. M. Warshaw, M. Periasamy, 2000 Smooth muscle myosin heavy chain isoforms and their role in muscle physiology Microsc. Res. Tech 50:532-540

    Bandman E. R., R. Matsuda, R. C. Strohman, 1982 Developmental appearance of myosin heavy and light chain isoforms in vivo and in vitro in chicken skeletal muscle Dev. Biol 93:508-518

    Barany M., 1967 ATPase activity of myosin correlated with speed of muscle shortening J. Gen. Physiol 50: (Suppl.) 197-218

    Berg J. S., B. C. Powell, R. E. Cheney, 2001 A millennial myosin census Mol. Biol. Cell 12:780-794

    Briggs M. M., F. Schachat, 2000 Early specialization of the superfast myosin in extraocular and laryngeal muscles J. Exp. Biol 203: (Part 16) 2485-2494

    Carson J. A., R. A. Fillmore, R. J. Schwartz, W. E. Zimmer, 2000 The smooth muscle gamma-actin gene promoter is a molecular target for the mouse bagpipe homologue, mNkx3-1, and serum response factor J. Biol. Chem 275:39061-39072

    Chantler P., 1983 Biochemical and structural aspects of molluscan muscle Pp. 77–154 in A. Saleuddin and K. Wilbur, eds. The Mollusca, Vol. 4. Academic Press, New York

    Chiu C. H., H. Schneider, J. L. Slightom, D. L. Gumucio, M. Goodman, 1997 Dynamics of regulatory evolution in primate beta-globin gene clusters: cis-mediated acquisition of simian gamma fetal expression patterns Gene 205:47-57

    Cope M., J. Whisstock, I. Rayment, J. Kendrick-Jones, 1996 Conservation within the myosin motor domain: implications for structure and function Structure 4:969-987

    Cripps R. M., J. Suggs, S. Bernstein, 1999a. Assembly of thick filaments and myofibrils occurs in the absence of the myosin head EMBO J 18:1793-1804

    Cripps R. M., B. Zhao, E. N. Olson, 1999b. Transcription of the myogenic regulatory gene Mef2 in cardiac, somatic, and visceral muscle cell lineages is regulated by a Tinman-dependent core enhancer Dev. Biol 215:420-430

    Davison I. G., G. M. Wright, M. E. DeMont, 1995 The structure and physical properties of invertebrate and primitive vertebrate arteries J. Exp. Biol 198:2185-2196

    Deng Z., P. Liu, P. Marlton, D. F. Claxton, S. Lane, D. F. Callen, F. S. Collins, M. J. Siciliano, 1993 Smooth muscle myosin heavy chain locus (MYH11) maps to 16p13.13-p13.12 and establishes a new region of conserved synteny between human 16p and mouse 16 Genomics 18:156-159

    de Souza S. J., M. Long, R. J. Klein, S. Roy, S. Lin, W. Gilbert, 1998 Toward a resolution of the introns early/late debate: only phase zero introns are correlated with the structure of ancient proteins Proc. Natl. Acad. Sci. USA 95:5094-5099

    Dibb N. J., I. N. Maruyama, M. Krause, J. Karn, 1989 Sequence analysis of the complete Caenorhabditis elegans myosin heavy chain gene family J. Mol. Biol 205:603-613

    DiNardi C., S. Ausoni, P. Moretti, L. Gorza, M. Velleca, M. Buckingham, S. Schiaffino, 1993 Type 2X myosin heavy chain is coded by a muscle fiber type-specific and developmentally regulated gene J. Cell Biol 123:823-835

    Ellenberger T. E., C. J. Brandl, K. Struhl, S. C. Harrison, 1992 The GCN4 basic region leucine zipper binds DNA as a dimer of uninterrupted alpha helices: crystal structure of the protein-DNA complex Cell 71:1223-1237

    Felsenstein J., 1993 PHYLIP (phylogeny inference package). Version 3.5c Distributed by the author. Department of Genetics, University of Washington, Seattle

    Fitch W. M., E. Margoliash, 1967 Construction of phylogenetic trees Science 155:279-284

    Gilbert W., M. Marchionni, G. McKnight, 1986 On the antiquity of introns Cell 46:151-153

    Goodson H., H. Warrick, J. Spudich, 1999 Specialized conservation of surface loops of myosin: evidence that loops are involved in determining functional characteristics J. Mol. Biol 287:173-185

    Hasegawa Y., 2000 Isolation of a cDNA encoding the motor domain of nonmuscle myosin which is specifically expressed in the mantle pallial cell layer of Scallop (Patinopecten yessoensis) J. Biochem. (Tokyo) 128:983-988

    Hu J. C., E. K. O'Shea, P. S. Kim, R. T. Sauer, 1990 Sequence requirements for coiled-coils: analysis with lambda repressor-GCN4 leucine zipper fusions Science 250:1400-1403

    Hughes S. M., M. Cho, I. Karsch-Mizrachi, M. Travis, L. Silberstein, L. A. Leinwand, H. M. Blau, 1993 Three slow myosin heavy chains sequentially expressed in developing mammalian skeletal muscle Dev. Biol 158:183-199

    Kelley M. J., W. Jawien, T. L. Ortel, J. F. Korczak, 2000 Mutation of MYH9, encoding non-muscle myosin heavy chain A, in May-Hegglin anomaly Nat. Genet 26:106-108

    Letai A., E. Fuchs, 1995 The importance of intramolecular ion pairing in intermediate filaments Proc. Natl. Acad. Sci. USA 92:92-96

    Li W., 1993 Unbiased estimation of the rates of synonymous and nonsynonymous substitution J. Mol. Evol 36:96-99

    Logsdon J. M., 1998 The recent origins of spliceosomal introns revisited Curr. Opin. Genet. Dev 8:637-648

    Logsdon J. M., J. D. Palmer, 1994 Origin of introns—early or late? Nature 369:526 [see also discussion 527 and 528]

    Lowey S., C. Cohen, 1962 Studies on the structure of myosin J. Mol. Biol 4:293-308

    Mahdavi V., A. P. Chambers, B. Nadal-Ginard, 1984 Cardiac alpha and beta myosin heavy chain genes are organized in tandem Proc. Natl. Acad. Sci. USA 81:2626-2630

    Mahdavi V., E. E. Strehler, M. Periasamy, D. Wieczorek, S. Izumo, S. Grund, M.-A. Strehler, B. Nadal-Ginard, 1986 Sarcomeric myosin heavy chain gene family: organization and pattern of expression Pp. 345–361 in F. D. Emerson, B. Nadal-Ginard, and M. A. Siddique, eds. Molecular biology of muscle development. Alan R. Liss, New York

    McKusick V., 1998 Mendelian inheritance in man Catalogs of human genes and genetic disorders. Johns Hopkins University Press, Baltimore

    McLachlan A. D., J. Karn, 1982 Periodic charge distributions in the myosin rod amino acid sequence match cross-bridge spacings in muscle Nature 299:226-231

    ———. 1983 Periodic features in the amino acid sequence of nematode myosin rod J. Mol. Biol 164:605-626

    McLachlan A. D., M. Stewart, 1975 Tropomyosin coiled-coil interactions: evidence for an unstaggered structure J. Mol. Biol 98:293-304

    Murphy C., J. Spudich, 1998 Dictyostelium myosin 25–50K loop substitutions specifically affect ADP release rates Biochemistry 37:6738-6744

    Nyitray L., A. Jancso, Y. Ochiai, L. Graf, A. G. Szent-Gyorgyi, 1994 Scallop striated and smooth muscle myosin heavy-chain isoforms are produced by alternative RNA splicing from a single gene Proc. Natl. Acad. Sci. USA 91:12686-12690

    Offer G., 1990 Skip residues correlate with bends in the myosin tail J. Mol. Biol 216:213-218

    O'Shea E. K., J. D. Klemm, P. S. Kim, T. Alber, 1991 X-ray structure of the GCN4 leucine zipper, a two-stranded, parallel coiled coil Science 254:539-544

    Papadakis M. N., G. P. Patrinos, 1999 Contribution of gene conversion in the evolution of the human beta-like globin gene family Hum. Genet 104:117-125

    Pedrosa-Domellof F., B. Gohlsch, L. E. Thornell, D. Pette, 1993 Electrophoretically defined myosin heavy chain patterns of single human muscle spindles FEBS Lett 335:239-242

    Perler R., A. Efstratiadis, P. Lomedico, W. Gilbert, R. Kolodner, J. Dodgson, 1980 The evolution of genes: the chicken preproinsulin gene Cell 20:555-566

    Rowlerson A., F. Mascarello, A. Veggetti, E. Carpene, 1983 The fibre-type composition of the first branchial arch muscles in Carnivora and Primates J. Muscle Res. Cell Motil 4:443-472

    Saez L. J., K. M. Gianola, E. M. McNally, R. Feghali, R. Eddy, T. B. Shows, L. A. Leinwand, 1987 Human cardiac myosin heavy chain genes and their linkage in the genome Nucleic Acids Res 15:5443-5459

    Saez C. G., J. C. Myers, T. B. Shows, L. A. Leinwand, 1990 Human nonmuscle myosin heavy chain mRNA: generation of diversity through alternative polyadenylylation Proc. Natl. Acad. Sci. USA 87:1164-1168

    Saitou N., M. Nei, 1987 The neighbor-joining method: a new method for reconstructing phylogenetic trees Mol. Biol. Evol 4:406-425

    Schiaffino S., C. Reggiani, 1996 Molecular diversity of myofibrillar proteins: gene regulation and functional significance Physiol. Rev 76:371-422

    Sellers J., 1999 Myosins Oxford University Press, Oxford, U.K

    Seri M., R. Cusano, S. Gangarossa, et al. (25 co-authors) 2000 Mutations in MYH9 result in the May-Hegglin anomaly, and Fechtner and Sebastian syndromes Nat. Genet 26:103-105

    Shrager J., P. Desjardins, J. Burkman, et al. (14 co-authors) 2000 Human skeletal myosin heavy chain genes are tightly linked in the order Embryonic-IIa-IId/x-IIb-Perinatal-Extraocular J. Muscle Res. Cell Motil 21:345-355

    Simons M., M. Wang, O. W. McBride, S. Kawamoto, K. Yamakawa, D. Gdula, R. S. Adelstein, L. Weir, 1991 Human nonmuscle myosin heavy chains are encoded by two genes located on different chromosomes Circ. Res 69:530-539

    Sneath P., R. Sokal, 1973 Numerical taxonomy; the principles and practice of numerical classification W. H. Freeman, San Francisco

    Stedman H. H., M. Eller, E. H. Jullian, S. H. Fertels, S. Sarkar, J. E. Sylvester, A. M. Kelly, N. A. Rubinstein, 1990 The human embryonic myosin heavy chain: complete primary structure reveals evolutionary relationships with other developmental isoforms J. Biol. Chem 265:3568-3576

    Strehler E. E., M. A. Strehler-Page, J. C. Perriard, M. Periasamy, B. J. Nadal-Ginard, 1986 Complete nucleotide and encoded amino acid sequence of a mammalian myosin heavy chain gene: evidence against intron-dependent evolution of the rod J. Mol. Biol 190:291-317

    Sweeney H., S. Rosenfeld, F. Brown, L. Faust, J. Smith, J. Xing, L. Stein, J. Sellers, 1998 Kinetic tuning of myosin via a flexible loop adjacent to the nucleotide binding pocket J. Biol. Chem 273:6262-6270

    Thompson J. D., D. G. Higgins, T. J. Gibson, 1994 CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice Nucleic Acids Res 22:4673-4680

    Uyeda T. Q. P., K. M. Ruppel, J. A. Spudich, 1994 Enzymatic activities correlate with chimaeric substitutions at the actin-binding face of myosin Nature 368:567-569

    Venkatesh B., Y. Ning, S. Brenner, 1999 Late changes in spliceosomal introns define clades in vertebrate evolution Proc. Natl. Acad. Sci. USA 96:10267-10271

    Weiss A., D. McDonough, B. Wertman, L. Acakpo-Satchivil, K. Montgomery, R. Kucherlapati, L. Leinwand, K. Krauter, 1999 Organization of human and mouse skeletal myosin heavy chain gene clusters is highly conserved Proc. Natl. Acad. Sci. USA 96:2958-2963

    Wieczorek D., M. Periasamy, G. Butler-Browne, R. Whalen, B. Nadal-Ginard, 1985 Co-expression of multiple myosin heavy chain genes, in addition to a tissue-specific one, in extraocular muscle J. Cell Biol 101:618-629

    Yamada A., M. Yoshio, K. Oiwa, L. Nyitray, 2000 Catchin, a novel protein in molluscan catch muscles, is produced by alternative splicing from the myosin heavy chain gene J. Mol. Biol 295:169-178

    Zhang J. C., S. Kim, B. P. Helmke, W. W. Yu, K. L. Du, M. M. Lu, M. Strobeck, Q. C. Yu, M. S. Parmacek, 2001 Analysis of SM22alpha-deficient mice reveals unanticipated insights into smooth muscle cell differentiation and function Mol. Cell. Biol 21:1336-1344

Accepted for publication October 8, 2001.