A novel PCR-based technique using expressed sequence tags and gene homology for murine genetic mapping: localization of the complement genes
Peter R. Lawson and
Kenneth B. M. Reid
MRC Immunochemistry Unit, Department of Biochemistry, South Parks Road, Oxford University, Oxford OX1 3QU, UK
Correspondence to:
K. B. M. Reid
Abstract
The complement system is a cascade of serum proteins and receptors which forms a vital arm of innate immunity and enhances the adaptive immune response. This work establishes the chromosomal localization of four key genes of the murine complement system. Mapping was performed using a novel and rapid PCR restriction fragment length polymorphism method which was developed to exploit the murine expressed sequence tag (EST) database. This technique circumvents the laborious cDNA or genomic cloning steps of other mapping methods by relying on EST data and the prediction of exon–intron boundaries. This method can be easily applied to the genes of other systems, ranging from the interests of the individual researcher to large-scale gene localization projects. Here the complement system, probably one of the best-characterized areas of immunology, was used as a model system. It was shown that the C3a receptor, C1r and C1s genes form an unexpected complement gene cluster towards the telomeric end of chromosome 6. The second mannose binding lectin-associated serine protease gene was mapped to the telomeric end of chromosome 4, a location distinct from those of the other complement-activating serine protease genes. These results provide new insights into the evolution of this group of proteins.
Keywords: complement genes, gene localization, genetic mapping, PCR
Introduction
The complement system acts as a vital part of the innate immune system, generating several mechanisms of immune protection (1) and complementing the adaptive clonal immune system (2). It also influences how the adaptive immune response is established (3). This highly reactive and potentially destructive arm of immunity contains regulatory components which normally prevent overactivation of the system. Activation proceeds through three different pathways. Firstly, a constitutive background activation is fostered by the alternative pathway, amplifying the gentle 'tick-over' of complement activation. In the presence of activating surfaces the complement cascade proceeds because activating complexes escape the control of the regulatory proteins that suppress activation. The remaining two pathways, i.e. the classical and MBLectin pathways, both utilize the same underlying machinery of the complement cascade, differing only in the recognition molecules used and the targets which lead to activation. The key recognition protein of the classical pathway is C1q, which recognizes antigen-aggregated antibody. Complement activation then proceeds through the enzymatic action of the C1q-associated serine proteases, C1r and C1s (4). The MBLectin pathway (5) allows the recognition of carbohydrate structures on potential pathogens via mannan-binding lectin (MBL) and constitutes what may be the most ancient pathway of complement activation, predating adaptive immunity (6). Activation through the MBLectin pathway is brought about via the MBL-associated serine proteases, MASP1 and MASP2, in a similar fashion to that seen in the C1 complex (7). Together all four of the associated serine proteases form a unique branch of the serine protease superfamily (6,8,9).
Two anaphylatoxin receptors, the C3a and C5a receptors (C3aR and C5aR), are present in the complement system and respond to fragments of the complement cascade generated during activation. These G protein-coupled, seven-transmembrane spanning proteins belong to a family whose members all act as chemoattractant receptors. Triggering of the two complement receptors is essential for eliciting various inflammatory reactions mediated by complement, such as the induction of smooth muscle contraction, increasing vascular permeability, directing the migration of phagocytic cells and inducing histamine release. These in turn enhance the adaptive immune response, as emphasized by the findings from C5aR knockout mice (10,11). The gene for C5aR is located in a cluster with other chemoattractant receptors on human chromosome 19q13.3–q13.4 (12). However, the gene for C3aR, which encodes a receptor with a uniquely large second extracellular loop (13), has a distinct localization, outside the chemoattractant gene cluster, on human chromosome 12p12–p13.3 (14,15).
From the complement gene localizations carried out to date, in humans and mice, three major gene clusters have been described which account for many of the 40 genes that encode the complement system proteins. First, the regulator of complement activation (RCA) gene cluster forms a linkage group for many of the complement proteins which control the activation and degradation of C3 and C4 (16). Second, some of the key early components of complement activation form the MHC class III complement cluster, which includes the serine proteases of the classical and alternative pathway, C2 and factor B, together with the two isotypes of C4, C4A and C4B (17). The third cluster constitutes some of the complement components of the membrane attack complex: C6, C7 and C9 (18). The three other complement gene arrays represent duplications: C1s and C1r (19); two of the genes which encode C8, C8A and C8B (20); and the three genes which encode the A, B and C polypeptide chains of C1q (21).
This paper describes for the first time a rapid technique for genetically mapping mouse genes without the laborious process of cDNA or genomic cloning. This method allows the chromosomal assignment of genes without requiring large amounts of DNA for hybridization or fluorescent in situ hybridization (FISH) expertise and equipment. By using expressed sequence tags (ESTs) as tools for rapid gene localization, it allows the mapping of genes which are known to be expressed onto the European Collaborative Interspecific Backcross (EUCIB) genetic map. This enhances the information content of genetic maps, which are mainly composed of anonymous markers of nucleotide repeats that generally have no known gene association. This technique has been used for the chromosomal mapping of four complement genes. The chromosomal distribution of these genes in the mouse has shed light upon the molecular evolution of the MASPs/C1s/C1r family and has also uncovered an unexpected complement gene cluster.
Methods
Identification of the murine complement homologues
Murine ESTs for C1r, C1s and Masp2 were found in the murine EST database [dbEST at the National Centre for Biotechnology Information (NCBI), Bethesda, MD] by searching, in these cases with the human homologues of these cDNAs, using BLAST 2.0 (NCBI). The ESTs were aligned against the human homologues using Seqman (DNAstar, London, UK), then adjusted by eye. All alignments of the partial cDNA sequences derived from murine ESTs showed between 74.6 and 80.2% identity at the nucleotide level with the human counterparts. In the case of MASP1, primers were designed from the known murine cDNA sequence, accession no. D16492 (22). For each gene, the intron–exon boundaries were predicted either from the human gene structure, for MASP1 (23) and C1s (24), or, in the case of Masp2 and C1r, from the known protein module architecture. Alternatively, gene structures from other species could also be used for this prediction. Primers were designed to generate PCR products which spanned two predicted exons and the intervening intron. Introns that were between 1 and 2 kb in size were chosen for amplification. The ESTs from dbEST that covered a suitable exon–intron boundary chosen for amplification of the genes for Masp2, C1r and C1s were AA895155, AA427242 and AA798057 respectively (see Fig. 1). However, for C3aR, the murine gene sequence was known (14,25,26). The only intron in the gene is >4 kb in length, so primers were designed to amplify a 663 bp region from the 5' end of exon 2 of the murine C3aR gene (accession nos U77461, U97537 and AF053757), running from base number 31 to 693 of U77461.
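The percentage identity between an aligned murine EST and its human homologue can be illustrated with a short calculation. The sketch below is only one plausible way of scoring identity (matches divided by ungapped alignment columns); the text does not state how the reported values were computed, and the function name and toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumption: gapped columns are ignored. Other conventions exist and
    would give slightly different values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy sequences for illustration only, not real EST data.
toy_mouse = "ATGC-TGACC"
toy_human = "ATGCATGTCC"
print(round(percent_identity(toy_mouse, toy_human), 1))
```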

Fig. 1. The domain structures of Masp1, Masp2, C1r and C1s superimposed onto the mRNA (shaded boxes), including the position of primers (arrows) used for chromosomal localization, position of the introns identified by PCR (arrow head) and the extent of the murine EST contigs (horizontal bars). The scale is shown at the bottom right. A key for the figure is shown below.
PCR for the murine complement gene
Each PCR was optimized for the MgCl2 concentration (1–2.0 mM) and the annealing temperature (55–65°C) in order to generate a single distinct product (see Table 1). The PCR was performed on genomic DNA samples from the two different inbred mouse species (C57BL/6 and Mus spretus). PCR was carried out for 30 cycles of amplification: 94°C denaturation for 30 s, 30 s at the optimized annealing temperature for each set of primers, followed by a 72°C extension of 1 or 2 min (shown in Table 1). The program finished with a final 5 min elongation reaction at 72°C. Genomic DNA (25 ng) was used in a 20 µl PCR mixture containing 16 mM (NH4)2SO4, 67 mM Tris–HCl, pH 8.8, 0.01% (v/v) Tween 20, 1 U Taq DNA polymerase, 200 µM each dNTP and 0.25 µM each primer supplemented with either 1.25 or 1.5 mM MgCl2 (see Table 1). Part of the reaction product (10 µl) was run on a 1% (w/v) agarose gel and visualized by ethidium bromide staining.
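As a quick arithmetic check on the conditions above, the following sketch converts the stated final concentrations into absolute amounts per 20 µl reaction and sums the programmed hold times of the cycling protocol. The function names are hypothetical, and the time estimate ignores thermocycler ramping and any initial denaturation step, neither of which is specified in the text.

```python
def amount_pmol(conc_uM: float, volume_ul: float) -> float:
    """Amount of solute in pmol: (µmol/l) x (µl) = pmol."""
    return conc_uM * volume_ul


def hold_time_s(cycles: int, denature_s: int, anneal_s: int,
                extend_s: int, final_extend_s: int) -> int:
    """Total programmed hold time in seconds, ignoring ramp times."""
    return cycles * (denature_s + anneal_s + extend_s) + final_extend_s


REACTION_UL = 20.0  # reaction volume stated in the text

primer_pmol = amount_pmol(0.25, REACTION_UL)   # each primer at 0.25 µM
dntp_pmol = amount_pmol(200.0, REACTION_UL)    # each dNTP at 200 µM

# 30 cycles of 30 s / 30 s / 1 or 2 min, plus a final 5 min extension
short_program = hold_time_s(30, 30, 30, 60, 300)
long_program = hold_time_s(30, 30, 30, 120, 300)

print(primer_pmol, dntp_pmol)                   # pmol per reaction
print(short_program / 60, long_program / 60)    # minutes of hold time
```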
Subcloning and sequence confirmation of the PCR products used for localization
The identity of the PCR products used in genetic mapping was confirmed by sequencing. The PCR product was first cloned into a 3'-deoxythymidine residue overhang plasmid (pCRII-TOPO; Invitrogen, Groningen, The Netherlands) following the manufacturer's instructions. For each gene of interest, one plasmid bearing the amplified section of the gene was selected for sequence analysis. Fluorescent dye–primer cycle sequencing with AmpliTaq FS (ABI Prism; Perkin-Elmer, Warrington, UK) was used to confirm the identity of the PCR product.
Screening for restriction polymorphisms
For genetic mapping of the gene of interest using backcross mice, the different parental alleles were monitored by the restriction fragment length polymorphisms (RFLP) contained within the PCR product of the gene of interest. The RFLP between the two mouse species were discovered by screening with 30 different restriction enzymes (New England Biolabs, Hitchin, UK) that each had 5 or 6 bp recognition sites. For speed and minimal experimental manipulation, 10 µl of the PCR product generated from the different mice was used directly from the PCR, without purification, in a 20 µl restriction digestion using 10 U of each enzyme. The presence of restriction polymorphisms was then judged by running the digestion products on a 1% (w/v) agarose gel and comparing the two different alleles. An ApaLI polymorphism was found in the C3aR gene, Masp2 has an XhoI polymorphism, and both C1s and Masp1 have a Bsu36I polymorphism, while an AvaII polymorphism was found in the gene for C1r. The same restriction fragment pattern was seen with both the purified PCR product and the unpurified PCR product. The polymorphisms identified in the PCR products of these complement genes were then used for genetic mapping of those loci.
Chromosomal localization
A random selection of genomic DNA from 50 backcross mice of the EUCIB panel (Human Genome Mapping Project Resource Centre, Cambridge, UK) (27) was used for mapping. All the animals used were from a C57BL/6 × M. spretus F1 generation backcrossed with either parental species. For fine-scale mapping, panels of informative recombinant mice were analysed for each gene. The inheritance of each allele was assessed by the PCR-based restriction polymorphism described above. The results were used to determine both the chromosomal location and the fine-scale mapping position of the gene with respect to other known EUCIB markers on the mouse genome.
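For readers unfamiliar with linkage statistics, the sketch below shows the standard two-point LOD score for a backcross, in which each animal is scored as recombinant or non-recombinant between the test gene and a marker. This is a textbook formula given for orientation; the text does not describe how the EUCIB analysis computed its LOD scores, and the function names are hypothetical.

```python
import math


def lod_score(n_mice: int, n_recombinants: int) -> float:
    """Two-point LOD score at the maximum-likelihood recombination fraction.

    LOD = log10[ L(theta_hat) / L(0.5) ], with theta_hat = recombinants / mice.
    """
    if n_mice <= 0 or not 0 <= n_recombinants <= n_mice:
        raise ValueError("need 0 <= n_recombinants <= n_mice and n_mice > 0")
    theta = n_recombinants / n_mice
    if theta >= 0.5:
        return 0.0  # no evidence for linkage
    non_recombinants = n_mice - n_recombinants
    log_likelihood = 0.0
    if n_recombinants:
        log_likelihood += n_recombinants * math.log10(theta)
    if non_recombinants:
        log_likelihood += non_recombinants * math.log10(1.0 - theta)
    return log_likelihood - n_mice * math.log10(0.5)


def distance_cM(n_mice: int, n_recombinants: int) -> float:
    """Approximate genetic distance, valid for small recombination fractions."""
    return 100.0 * n_recombinants / n_mice


# Hypothetical counts for a 50-mouse panel, not data from this study.
print(lod_score(50, 0), lod_score(50, 5), distance_cM(50, 5))
```

With 50 animals the score is capped at 50 × log10(2), roughly 15, which is reached only when the gene and marker never recombine.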
PCR cloning of the MASP2 serine protease domain
Murine ESTs from dbEST were aligned across the part of the cDNA sequence encoding the serine protease domain of human MASP2. All ESTs identified by database screening were aligned using the human sequence as reference. The translated murine protein sequence of the Masp2 serine protease domain shared an identity of 79.9 and 76.1% with that of human MASP2 protein and cDNA respectively. Primers MM2-1f (AAC AGC CGC TCA TGC TGT ATA TGA G) and MM2-2r (CCC CAC TGT CAC CTC TGC AGC TGT C) were designed from the 5' and 3' cDNA extent of the murine Masp2 serine protease domain, based on the sequences of the murine ESTs (AA244853 and AA530523 respectively). PCR was carried out on ~25 ng of genomic DNA from an M. spretus mouse, using the PCR conditions described above, except that the primers MM2-1f and MM2-2r were used. This reaction was optimized to 1 mM MgCl2, an annealing temperature of 59°C and an extension time of 90 s. The single, discrete PCR product of ~500 bp was cloned into pCRII-TOPO. Two of the plasmid clones were sequenced with both vector and internal primers by fluorescent cycle sequencing, and it was confirmed that they contained the entire cDNA for the murine Masp2 serine protease domain.
Results
Development of RFLP
Primers were designed to generate a PCR product from each gene (Masp1, Masp2, C1r and C1s) which spanned two exons and an intervening intron. The positions of the introns were predicted, in these cases, from the human gene structure or from conserved intron–exon boundaries within protein modules. Alternatively, gene structures from other species could be used. Introns were chosen which were 1–2 kb in length to facilitate amplification of a single PCR product with sufficient yield for visualization. In the case of C3aR, where the gene and cDNA sequences were known, an ~650 bp segment of the 5' region of the gene including 200 bp of exonic sequence was amplified, since the only intron was >4 kb in length. Intronic DNA was chosen for amplification since it is more divergent than exonic DNA between different species and strains; hence it was assumed to be a potent source of RFLP between the two species of mice used in backcross analysis (C57BL/6 and M. spretus). Amplification products were designed to include at least 200 bp of exonic DNA. This was used to confirm, by sequencing analysis, that the correct gene was being amplified from the murine genome. The PCR products for each gene gave a discrete single band, and these were subcloned and sequenced. The DNA sequence was in agreement with either the known gene sequence (C3aR) or that of the murine EST, confirming that the PCR was specific for the gene of interest (results not shown). The sequences which were interrupted by introns had boundaries which conformed to the GT–AG rule.
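The boundary check mentioned above can be sketched in code. Assuming a sequenced genomic PCR product that contains exactly one intron, and the matching EST-derived cDNA segment, the hypothetical helper below finds the split point at which the cDNA flanks an insertion that begins with GT and ends with AG. It is a simplified illustration that requires exact sequence matches and is not the analysis used in this work.

```python
from typing import Optional, Tuple


def find_gt_ag_intron(genomic: str, cdna: str) -> Optional[Tuple[int, str]]:
    """Locate a single intron obeying the GT-AG rule.

    Returns (exon_boundary_position_in_cdna, intron_sequence), or None if
    the genomic product is not the cDNA plus one GT...AG insertion.
    """
    g, c = genomic.upper(), cdna.upper()
    intron_len = len(g) - len(c)
    if intron_len < 4:
        return None  # same size as the cDNA: no intron in this product
    # Try every internal split of the cDNA into upstream and downstream exons.
    for i in range(1, len(c)):
        if g[:i] == c[:i] and g[i + intron_len:] == c[i:]:
            intron = g[i:i + intron_len]
            if intron.startswith("GT") and intron.endswith("AG"):
                return i, intron
    return None


# Toy sequences for illustration only.
toy_cdna = "ATGGCC" + "TTGACC"
toy_genomic = "ATGGCC" + "GTAAGTTTTCAG" + "TTGACC"
print(find_gt_ag_intron(toy_genomic, toy_cdna))
print(find_gt_ag_intron(toy_cdna, toy_cdna))  # intron-less product
```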
The use of 30 different restriction enzymes with either 5 or 6 bp recognition sites generally produced one or two RFLP between the two alleles. An RFLP was chosen if it could be clearly distinguished on a 1% (w/v) agarose gel. The enzymes used for each gene and the allele sizes for each species are given in Table 1, and examples of the restriction pattern observed for parental and heterozygous animals are shown at the top of Fig. 2.

Fig. 2. Chromosomal localization of four complement genes. The genetic mapping of the genes C3aR, C1r, C1s (A) and Masp2 (B). (Top) An example of a 1% (w/v) agarose gel electrophoresis of the digested PCR products from the C3aR gene (A) and the Masp2 gene (B) is shown for the wild-type mice (C57BL/6 and M. spretus) and a heterozygous mouse. The molecular sizes (bp) of the alleles are shown on the right. (Middle) Haplotyping of informative recombinant mice for the genes and flanking markers is shown. EUCIB mice numbers are listed above the columns of haplotype boxes; mice heterozygous for the parental alleles are shown as filled boxes, and open boxes represent a homozygous haplotype. (Bottom) A graphical representation of the chromosomal location of the genes, shown in cM to the left; an expansion of the region is shown to the right, including the neighbouring markers.
Chromosomal localization of C1r, C1s and C3aR
Using a random panel of 50 backcross mice, C1r and C1s were shown to map to the telomeric end of chromosome 6 with LOD scores of 6 and 9.7 respectively. Fine-scale mapping of mice with informative recombinants in the region D6Mit24–D6Mit14 revealed that C1s and C1r co-segregated between the markers D6Mit24 and D6Mit337, at a position ~65 cM from the centromere. This region of murine chromosome 6 containing the genes for C1r and C1s is syntenic with human chromosomal region 12p12–p13.3 which contains the human genes for C1r and C1s (28). The human genes have been shown to be in close proximity, arranged in a tail-to-tail fashion (19).
The murine gene for the complement C3a anaphylatoxin receptor, C3aR, has recently been localized to region F1 on chromosome 6 (14), a region which is syntenic to an area on the short arm of human chromosome 12. This region, 12p13.2, contains the human genes for C1r and C1s (28). To examine whether these genes formed an unexpected cluster at the genetic level or merely localized to the same chromosomal band, murine C3aR was also mapped onto the EUCIB genetic map. Initial mapping showed that C3aR localized to a region of chromosome 6, with a LOD score of 7.12, which confirmed in situ hybridization data (14). Genetic mapping data also suggested a localization for C3aR which was close to C1r and C1s. Fine-scale mapping showed that C3aR co-segregates with C1s and C1r, in a segment of DNA that spans ~0.17 cM, between the markers D6Mit24 and D6Mit337 towards the telomeric end of chromosome 6, at ~65 cM from the centromere.
The intron–exon boundary of the serine protease domain of MASP2
Alignment of murine ESTs with the human sequence for MASP2 revealed several ESTs which spanned the serine protease domain. The resulting coding sequence for the serine protease domain of murine Masp2 was initially used for developing a PCR-RFLP by predicting intron–exon boundaries from the human MASP1 gene (23). PCR across this region revealed a surprising feature of the murine Masp2 gene. PCR with two independent primer pairs across two predicted Masp2 introns yielded products from genomic DNA that were of the same size as the EST-assembled murine cDNA sequence (results not shown). This suggested that the serine protease domain of murine Masp2 is encoded by a single uninterrupted exon, similar to the gene structures of C1s and C1r (24), which is quite distinct from the six exon arrangement of the serine protease domain of human MASP1 (23). To further understand this finding, the entire serine protease encoding region of Masp2 was cloned by PCR from murine genomic DNA. PCR primers were designed at the 5' and 3' ends of the serine protease domain from murine ESTs (AA244853 and AA530523) and produced a PCR product of the expected size (~625 bp), as predicted from the cDNA sequence generated by alignment of Masp2 ESTs. This product was subcloned into a 3' deoxythymidine residue overhang vector (pCRII-TOPO). All of the 12 clones tested were of the correct size, as judged by agarose gel electrophoresis. Sequencing of one clone showed identity to the DNA sequence of Masp2 generated from ESTs (results not shown), with a 79.9% identity to the protease domain of human MASP2 at the protein level. This confirmed that the PCR product was the murine homologue of human MASP2 and that the genomic region encoding the serine protease domain of the murine gene for MASP2 did not contain introns. An intron-less encoded serine protease domain was also observed by PCR across the human MASP2 gene (results not shown).
Chromosomal localization of Masp2
A PCR-RFLP was developed which extended across the two exons that encode the first complement control protein domain in murine Masp2. The murine Masp2 gene was mapped to chromosome 4 with a LOD score of 6.28. Fine-scale mapping with mice recombinant between D4Mit179 and D4Nds16 showed that Masp2 maps to the 0.11 cM region between D4Mit48 and D4Mit160, at the telomeric end of chromosome 4, 75 cM from the centromere. The chromosomal localization of Masp2 is distinct from that observed for Masp1, which has been shown by FISH to map to the B2–B3 region of mouse chromosome 16 (29). This was confirmed by localizing the gene onto the EUCIB genetic map using a PCR-RFLP that was developed for Masp1. Using a random selection of only 50 mice, Masp1 was clearly shown to map to the telomeric tip of chromosome 16 (results not shown). This shows that, unlike the C1q-associated serine proteases, C1r and C1s, the MBL-associated serine proteases do not form a linkage group, and in fact map to different chromosomes in mice and probably in humans.
Discussion
This paper describes the first use of a novel and rapid method of localizing murine genes onto a genetic map by utilizing data from the murine EST project. This method has been successfully used to rapidly assign high-resolution chromosomal localizations for four complement genes. It does not rely upon technical expertise or reagents beyond those commonly found in most molecular biology laboratories. More importantly, it does not require cloning of the genes or cDNAs. Instead, this technique relies upon the availability of murine ESTs generated from this large-scale sequencing project. One potential drawback of this method is finding relevant EST information for the gene of interest; however, the large-scale murine EST sequencing effort has already amassed over 650,000 sequences (dbEST release 070999) and over the next few years it will provide sequence information on the vast majority of expressed genes (30). The second difficulty in using this method is the prediction of introns, but with the constant release of improving genome–EST alignment software (31), and the expanding genetic knowledge of protein modules and gene structures from humans and other species, this decision process is relatively simple. A third limitation is uncovering polymorphisms, which is only a potential drawback with single exon genes, or in genes with small introns, where less frequent untranslated and coding region polymorphisms are relied upon. Care must also be taken in the identification of genes across species due to the possibility of gene duplication and the existence of pseudogenes.
On searching for protein systems in which the murine genes have not been identified or localized, but where murine ESTs exist, the complement system was identified as an ideal candidate for testing this new method (see Table 2). Herein, the chromosomal localization of four complement genes has been assigned by this rapid localization method and an unexpected clustering of these genes has been revealed. The murine genes for C1r and C1s map well to the telomeric end of chromosome 6, at ~65 cM, with LOD scores of 6.0 and 9.7 respectively. This region of the mouse genome is syntenic with human chromosome 12p13 which contains the two linked human genes for C1r and C1s (29). Recent mapping of the murine C3aR gene to chromosome 6 band F1 (14), a broad region that encompasses C1r and C1s, provoked the idea that C3aR may map to this cluster. This idea was tested quickly by developing a PCR-RFLP for C3aR. Mapping showed that C3aR localized to the same region as C1r and C1s on chromosome 6, and fine-scale mapping revealed that the three genes were linked and that C3aR co-segregates with C1r and C1s in an ~0.17 cM segment of the murine genome. This is an unexpected complement gene cluster because within the other complement gene clusters all the genes share high sequence homology with each other (Table 2: the RCA cluster, C2 and factor B of the MHC class III cluster, the MAC cluster, the C1q chain genes and the C8 genes). C3aR is very distinct from C1r and C1s at the protein sequence level and even at the gene level: C3aR has a single structural exon while C1r and C1s have dispersed structural genes covering several exons. Comparison of the recent FISH mapping for the human C3aR gene to 12p13.2–12p13.3 (14,15) and the radiation human–hamster hybrid mapping (28) of C1s and C1r implies that a similar clustering of genes exists in humans.
C3aR is the only seven-transmembrane receptor gene so far found on this region of chromosome 6. The other complement anaphylatoxin receptor gene, C5aR, in humans maps to the cluster of seven-transmembrane spanning receptors of the chemotactic family (32,33) on chromosome 19q13.3–q13.4 (12). From mapping this cluster of chemotactic receptors in mice (34) and by extrapolating from syntenic chromosomal regions, murine C5aR would be expected to map to chromosome 17, at ~10 cM from the centromere. So, despite the homology of these complement anaphylatoxin receptors at both the functional and protein family level, these two complement anaphylatoxin genes do not form a gene cluster.
The C1r and C1s genes in humans (19) and mice (shown here) are closely linked. By analogy the two MASP genes could be expected to be linked. Alternatively, Masp2 could localize with C1r and C1s, as expected from their similarity at the genomic level: the intron-less state of their serine protease domains, shown here for Masp2 (Fig. 3A), and also from the conserved functional elements in the serine protease regions, see below (Fig. 3B). Genetic mapping of Masp2 by developing a PCR-RFLP (Fig. 2B, top) revealed that Masp2 maps to chromosome 4 with a LOD score of 6.28. Fine-scale mapping of informative recombinants (Fig. 2B, middle and bottom) indicates that the murine gene lies in an ~0.11 cM region of DNA towards the telomeric end of chromosome 4, between D4Mit48 and D4Mit160. Masp2 localizes to a different position from its partner MBL-associated serine protease, Masp1, which localizes to the telomeric region of murine chromosome 16, region B2–B3, by FISH (35) and by genetic mapping described herein. The non-association of the two MBL-associated serine protease genes, Masp1 and Masp2, is in marked contrast to the position and clustering of the two C1q-associated serine protease genes, C1r and C1s.

Fig. 3. The genomic, cDNA and protein organization of the serine protease domains of Masp1, Masp2, C1r and C1s. (A) The genomic structure of the human MASP1 gene (23) is shown above the human genes for C1r and C1s (24) and the murine gene for Masp2 (data from this paper). (B) The cDNA structure of the serine protease domain (boxes) of murine Masp1, above those for Masp2, C1r and C1s. The sequence was obtained from cDNA sequence for Masp1 (22), and EST data for Masp2 (AA244853, AA764061, AA530523 and AA1266911), C1s (AA895834 and AA763800) and C1r (AA427242, AA175226, AA733400 and AA871944). Lines crossing the boxes indicate the intron–exon boundaries predicted from the analogous human genes (C1r, C1s and Masp1) and data from this paper (Masp2). The three amino acids essential for serine protease specificity are indicated (His, Asp and Ser), along with the codon usage for the active site serine residue and the cysteine bond formation. The asparagine residue 6 amino acids upstream from the serine residue, which is important for trypsin-like substrate specificity, is shown circled. The residue 3 amino acids C-terminal to the active site serine residue differentiates C1s from C1r and Masp1 from Masp2. Amino acid numbering is based on the murine Masp1 cDNA and Masp2 EST-derived sequences.
The chromosomal positions of the MASP genes discount the idea that C1r and C1s arose through a simple duplication of the more ancient MASP genes. This is highlighted at the functional level, where human MASP1 and MASP2 do not appear to be functional analogues of C1r and C1s, despite the elegant symmetry seen between the MBLectin and classical pathways (for review, see 1,7). After auto-activation of C1r, via C1q binding to clustered antibodies, C1r activates the serine protease activity of C1s. C1s proceeds to cleave the first two components of the classical pathway, C4 followed by C2. The resulting complex, composed of activated C4 and C2, is the classical pathway C3 convertase. In contrast, despite the MBLectin pathway impinging on the same key components and forming an identical classical pathway C3 convertase, the functions of the MASPs cannot be superimposed onto those of C1r and C1s. The activation process involving the MASPs appears far more complex. MASP2 has been reported to have a similar substrate specificity to that of C1s, cleaving both C4 (7) and C2 (36), but may also possess an auto-activation capacity (37), whereas MASP1 has been shown to cleave C2 (36) and perhaps C3, thus circumventing the action of the classical pathway C3 convertase (8,38,39). The cleavage of C3 by MASP1 remains controversial, as it could not be confirmed by others (40).
It is clear from phylogenetic analysis that MASP1 and MASP2/C1r/C1s form divergent arms of the serine protease family of the complement system (6,8,9). This is echoed at many different levels (see Fig. 3). The functional elements of the serine protease domain, which support the evolutionary theory of Endo et al. (9), are important for clarifying that the ESTs identified are the murine counterparts of the human sequences. MASP1 is a TCN serine protease, since the active site serine is encoded by a TCN codon, which is TCT in mice, while MASP2/C1r/C1s are AGY types, with the active site serines being encoded respectively by AGT, AGT and AGC in mice. The histidine loop, typical of MASP1, is present in the murine Masp1, and absent in the murine Masp2, C1r and C1s. The residue 3 amino acids C-terminal to the active site serine is a proline in murine MASP1, which is generally conserved in chymotrypsinogen-like proteases but replaced with a valine or an alanine in MASP2/C1r/C1s (see Fig. 3B). The idea of the two lineages of MASP/C1r/C1s (9) is reflected at the genetic level.
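The TCN/AGY distinction used above can be expressed as a small classifier. The sketch below applies it to the active site serine codons quoted in the text for the four murine genes; the function name is hypothetical, and the dictionary simply restates those codons.

```python
from typing import Optional


def serine_codon_class(codon: str) -> Optional[str]:
    """Classify a serine codon as 'TCN' or 'AGY'; None if it is not serine."""
    c = codon.upper().replace("U", "T")
    if len(c) != 3 or any(base not in "ACGT" for base in c):
        raise ValueError(f"not a valid codon: {codon!r}")
    if c[:2] == "TC":
        return "TCN"  # TCA, TCC, TCG, TCT
    if c[:2] == "AG" and c[2] in "CT":
        return "AGY"  # AGC, AGT
    return None


# Active site serine codons reported in the text for the murine genes.
active_site_codons = {"Masp1": "TCT", "Masp2": "AGT", "C1r": "AGT", "C1s": "AGC"}
for gene, codon in active_site_codons.items():
    print(gene, codon, serine_codon_class(codon))
```

Because TCN and AGY codons differ at both of the first two positions, no single nucleotide substitution converts one class into the other while still encoding serine, which is why the codon class is a useful lineage marker.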
The intron-exon organization of Masp1 displays a split exon arrangement for the serine protease domain (23), which is not seen in other serine protease genes (41,42). The serine protease domains of Masp2, C1r and C1s are, however, encoded by single exons, as seen in the non-functional serine protease, haptoglobin (43). The exon arrangement of the serine protease domains, taken together with the chromosomal localizations, allows speculation as to the evolution of these proteins (Fig. 4). The most parsimonious interpretation of this evidence is that the ancestral MASP/C1s/C1r gene had a split exon-encoded serine protease domain, with the active site serine encoded by a TCN codon. This is supported by the tunicate genes, which have been shown to have a split exon-encoded serine protease domain of the TCN type (6,8,9). The model proposed here can explain how this ancestral gene with a split exon-encoded serine protease domain could lead to the current gene and exon arrangements of the four activating serine protease genes. Duplication of the ancestral serine protease domain gene occurs after the divergence of cyclostomes from urochordates (tunicates), probably by an RNA-mediated event as suggested by Endo et al. (9). This generates a second gene with a single exon-encoded serine protease domain and an active site serine codon mutated from TCN to AGY, which is speculated to have arisen through retrotransposition of a partially processed mRNA (9). This type of mutation in the serine protease domain from a TCN to an AGY type has occurred at least twice in evolutionary history (44) and two different mechanisms have been proposed (44,45). RNA-mediated gene duplication or intron loss during gene family evolution is well documented (41,46). This would leave an ancestral MASP2/C1r/C1s-like gene and an ancestral MASP1 gene, residing on two different chromosomes (Fig. 4). Incidentally, the two tunicate MASP genes are TCN-type serine proteases, which probably arose from an independent gene duplication event within their own lineage. In addition, the apparent absence of protein for the TCN-type serine protease (Masp1) in carp, sharks and lamprey (8,9) is believed to be due to a gene silencing effect at the retrotransposition step (9) (see Fig. 4, asterisk). The second step of the model is a duplication of the ancestral MASP2/C1r/C1s gene, which would generate two single exon-encoded serine protease genes, MASP2 and the ancestral C1r/C1s, residing in two different genomic regions. This event probably occurred prior to the divergence of cartilaginous fish, as demonstrated by the presence of a C1-like complex in the nurse shark (47). The final evolutionary event, a duplication of the C1r/C1s-like gene occurring prior to the divergence of amphibians, would lead to the generation of the tandemly arranged, closely linked, C1-associated serine protease genes, C1r and C1s. A similar model of gene family evolution and chromosomal dispersal, through RNA-mediated transposition and gene duplications, has been proposed for other systems, such as the neuropeptide Y receptors (48). This model, built from chromosomal localization data, gene structures and the phylogenetic studies of others, helps explain the different chromosomal and exon arrangements of the two lineages of MBL/C1q-associated serine proteases.
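The TCN and AGY serine codon classes referred to in this model can be illustrated with a short sketch. It is not part of the mapping method described in this paper; it only enumerates codons under the standard genetic code, and the function names (`serine_codon_type`, `single_step_intermediates`) are hypothetical names chosen for illustration. It shows that no TCN codon can be converted to an AGY codon by a single nucleotide substitution, and lists the codons that lie one substitution from both.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def serine_codon_type(codon):
    """Return 'TCN' or 'AGY' for a serine codon, or None for any other codon."""
    codon = codon.upper().replace("U", "T")
    if CODON_TABLE.get(codon) != "S":
        return None
    return "TCN" if codon.startswith("TC") else "AGY"


def hamming(a, b):
    """Number of positions at which two equal-length codons differ."""
    return sum(x != y for x, y in zip(a, b))


def single_step_intermediates(start, end):
    """Codons exactly one substitution away from both start and end."""
    return {
        codon: aa
        for codon, aa in CODON_TABLE.items()
        if hamming(start, codon) == 1 and hamming(codon, end) == 1
    }


tcn = [c for c in CODON_TABLE if serine_codon_type(c) == "TCN"]
agy = [c for c in CODON_TABLE if serine_codon_type(c) == "AGY"]
min_changes = min(hamming(a, b) for a in tcn for b in agy)

print(sorted(tcn), sorted(agy), min_changes)
print(single_step_intermediates("TCT", "AGT"))
```

Under these assumptions, any stepwise path between the two classes passes through a non-serine codon, which is consistent with the text's note that more than one mechanism has been proposed for this change (44,45).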

Fig. 4. A graphical representation of the molecular theory of evolution for Masp1, Masp2, C1r and C1s. The exon-intron arrangement of only the serine protease-encoding region (boxes and lines) of each gene is shown expanded below the chromosome (lollipops). The asterisk represents a putative gene silencing mutation in the TCN-type Masp1, proposed in an evolutionary model by Endo et al. (9); this is believed to be responsible for the lack of a gene product for a TCN-type Masp1 in sharks, carp and lamprey.
 |
Data deposition
|
---|
The complete linkage data is available from the EUCIB home page (http://www.hgmp.mrc.ac.uk/MBx/MBxHomepage.html).
 |
Acknowledgments
|
---|
We thank Dr Alister Dodds for critical discussion of this manuscript, and Dr Yvonne Boyd, MRC Mammalian Genetics Unit, Harwell, UK, for both helpful discussion and clear advice on the mouse backcross analysis. The authors acknowledge the rapid and free EUCIB service provided by the MRC-funded HGMP resource centre, Hinxton, UK.
 |
Abbreviations
|
---|
C1 | first component of complement, a complex of C1q, C1r2 and C1s2 |
C1q | q subcomponent of complement component 1 |
C1r | r subcomponent of complement component 1 |
C1s | s subcomponent of complement component 1 |
C3aR | C3a receptor |
C5aR | C5a receptor |
EST | expressed sequence tag |
EUCIB | European Collaborative Interspecific Backcross |
FISH | fluorescence in situ hybridization |
MASP | MBL-associated serine protease |
MBL | mannan-binding lectin |
RCA | regulator of complement activation |
RFLP | restriction fragment length polymorphism |
 |
Notes
|
---|
Transmitting editor: D. Fearon
Received 7 October 1999,
accepted 25 October 1999.
 |
References
|
---|
1. Reid, K. B. M. and Law, S. K. A. 1995. Complement. IRL Press, Oxford.
2. Carroll, M. C. 1998. The role of complement and complement receptors in induction and regulation of immunity. Annu. Rev. Immunol. 16:545.
3. Fearon, D. T. and Locksley, R. M. 1996. The instructive role of innate immunity in the acquired immune response. Science 272:50.
4. Schumaker, V. N., Zavodszky, P. and Poon, P. H. 1987. Activation of the first component of complement. Annu. Rev. Immunol. 5:21.
5. Matsushita, M. 1996. The lectin pathway of the complement system. Microbiol. Immunol. 40:887.
6. Ji, X., Azumi, K., Sasaki, M. and Nonaka, M. 1997. Ancient origin of the complement lectin pathway revealed by molecular cloning of mannan binding protein-associated serine protease from a urochordate, the Japanese ascidian, Halocynthia roretzi. Proc. Natl Acad. Sci. USA 94:6340.
7. Thiel, S., Vorup Jensen, T., Stover, C. M., Schwaeble, W., Laursen, S. B., Poulsen, K., Willis, A. C., Eggleton, P., Hansen, S., Holmskov, U., Reid, K. B. M. and Jensenius, J. C. 1997. A second serine protease associated with mannan-binding lectin that activates complement. Nature 386:506.
8. Matsushita, M., Endo, Y., Nonaka, M. and Fujita, T. 1998. Complement-related serine proteases in tunicates and vertebrates. Curr. Opin. Immunol. 10:29.
9. Endo, Y., Takahashi, M., Nakao, M., Saiga, H., Sekine, H., Matsushita, M., Nonaka, M. and Fujita, T. 1998. Two lineages of mannose-binding lectin-associated serine protease (MASP) in vertebrates. J. Immunol. 161:4924.
10. Hopken, U. E., Lu, B., Gerard, N. P. and Gerard, C. 1996. The C5a chemoattractant receptor mediates mucosal defence to infection. Nature 383:86.
11. Hopken, U. E., Lu, B., Gerard, N. P. and Gerard, C. 1997. Impaired inflammatory responses in the reverse Arthus reaction through genetic deletion of the C5a receptor. J. Exp. Med. 186:749.
12. Gerard, N. P., Bao, L., Xiao Ping, H., Eddy, R. L., Jr, Shows, T. B. and Gerard, C. 1993. Human chemotaxis receptor genes cluster at 19q13.3-13.4. Characterization of the human C5a receptor gene. Biochemistry 32:1243.
13. Crass, T., Raffetseder, U., Martin, U., Grove, M., Klos, A., Kohl, J. and Bautsch, W. 1996. Expression cloning of the human C3a anaphylatoxin receptor (C3aR) from differentiated U-937 cells. Eur. J. Immunol. 26:1944.
14. Hollmann, T. J., Haviland, D. L., Kildsgaard, J., Watts, K. and Wetsel, R. A. 1998. Cloning, expression, sequence determination, and chromosome localization of the mouse complement C3a anaphylatoxin receptor gene. Mol. Immunol. 35:137.
15. Paral, D., Sohns, B., Crass, T., Grove, M., Kohl, J., Klos, A. and Bautsch, W. 1998. Genomic organization of the human C3a receptor. Eur. J. Immunol. 28:2417.
16. Heine Suner, D., Diaz Guillen, M. A., de Villena, F. P., Robledo, M., Benitez, J. and Rodriguez de Cordoba, S. 1997. A high-resolution map of the regulator of the complement activation gene cluster on 1q32 that integrates new genes and markers. Immunogenetics 45:422.
17. Carroll, M. C., Campbell, R. D., Bentley, D. R. and Porter, R. R. 1984. A molecular map of the human major histocompatibility complex class III region linking complement genes C4, C2 and factor B. Nature 307:237.
18. Coto, E., Martinez Naves, E., Dominguez, O., DiScipio, R. G., Urra, J. M. and Lopez Larrea, C. 1991. DNA polymorphisms and linkage relationship of the human complement component C6, C7, and C9 genes. Immunogenetics 33:184.
19. Kusumoto, H., Hirosawa, S., Salier, J. P., Hagen, F. S. and Kurachi, K. 1988. Human genes for complement components C1r and C1s in a close tail-to-tail arrangement. Proc. Natl Acad. Sci. USA 85:7307.
20. Rogde, S., Olaisen, B., Gedde Dahl, T., Jr and Teisberg, P. 1986. The C8A and C8B loci are closely linked on chromosome 1. Ann. Hum. Genet. 50:139.
21. Sellar, G. C., Blake, D. J. and Reid, K. B. M. 1991. Characterization and organization of the genes encoding the A-, B- and C-chains of human complement subcomponent C1q. The complete derived amino acid sequence of human C1q. Biochem. J. 274:481.
22. Takahashi, A., Takayama, Y., Hatsuse, H. and Kawakami, M. 1993. Presence of a serine protease in the complement-activating component of the complement-dependent bactericidal factor, RaRF, in mouse serum. Biochem. Biophys. Res. Commun. 190:681.
23. Endo, Y., Sato, T., Matsushita, M. and Fujita, T. 1996. Exon structure of the gene encoding the human mannose-binding protein-associated serine protease light chain: comparison with complement C1r and C1s genes. Int. Immunol. 8:1355.
24. Tosi, M., Duponchel, C., Meo, T. and Couture Tosi, E. 1989. Complement genes C1r and C1s feature an intronless serine protease domain closely related to haptoglobin. J. Mol. Biol. 208:709.
25. Tornetta, M. A., Foley, J. J., Sarau, H. M. and Ames, R. S. 1997. The mouse anaphylatoxin C3a receptor: molecular cloning, genomic organization, and functional expression. J. Immunol. 158:5277.
26. Hsu, M. H., Ember, J. A., Wang, M., Prossnitz, E. R., Hugli, T. E. and Ye, R. D. 1998. Cloning and functional characterization of the mouse C3a anaphylatoxin receptor gene. Immunogenetics 47:64.
27. Breen, M., Deakin, L., Macdonald, B., Miller, S., Sibson, R., Tarttelin, E., Avner, P., Bourgade, F., Guenet, J. L., Montagutelli, X., Poirier, C., Simon, D., Tailor, D., Bishop, M., Kelly, M., Rysavy, F., Rastan, S., Norris, D., Shepherd, D., Abbott, C., Pilz, A., Hodge, S., Jackson, I., Boyd, Y., Blair, H., Maslen, G., Todd, J. A., Reed, P. W., Stoye, J., Ashworth, A., McCarthy, L., Cox, R., Schalkwyk, L., Lehrach, H., Klose, J., Gangadharan, U. and Brown, S. 1994. Towards high resolution maps of the mouse and human genomes - a facility for ordering markers to 0.1 cM resolution. European Backcross Collaborative Group. Hum. Mol. Genet. 3:621.
28. Nguyen, V. C., Tosi, M., Gross, M. S., Cohen Haguenauer, O., Jegou Foubert, C., de Tand, M. F., Meo, T. and Frezal, J. 1988. Assignment of the complement serine protease genes C1r and C1s to chromosome 12 region 12p13. Hum. Genet. 78:363.
29. Takada, F., Takayama, Y., Hatsuse, H. and Kawakami, M. 1993. A new member of the C1s family of complement proteins found in a bactericidal factor, Ra-reactive factor, in human serum. Biochem. Biophys. Res. Commun. 196:1003.
30. Marra, M. A., Hillier, L. and Waterston, R. H. 1998. Expressed sequence tags - ESTablishing bridges between genomes. Trends Genet. 14:4.
31. Mott, R. 1997. EST_GENOME: a program to align spliced DNA sequences to unspliced genomic DNA. Comp. Appl. Biosci. 13:477.
32. Murphy, P. M., Ozcelik, T., Kenney, R. T., Tiffany, H. L., McDermott, D. and Francke, U. 1992. A structural homologue of the N-formyl peptide receptor. Characterization and chromosome mapping of a peptide chemoattractant receptor family. J. Biol. Chem. 267:7637.
33. Bao, L., Gerard, N. P., Eddy, R. L., Jr, Shows, T. B. and Gerard, C. 1992. Mapping of genes for the human C5a receptor (C5AR), human FMLP receptor (FPR), and two FMLP receptor homologue orphan receptors (FPRH1, FPRH2) to chromosome 19. Genomics 13:437.
34. Stubbs, L., Carver, E. A., Shannon, M. E., Kim, J., Geisler, J., Generoso, E. E., Stanford, B. G., Dunn, W. C., Mohrenweiser, H., Zimmermann, W., Watt, S. M. and Ashworth, L. K. 1996. Detailed comparative map of human chromosome 19q and related regions of the mouse genome. Genomics 35:499.
35. Takada, F., Seki, N., Matsuda, Y., Takayama, Y. and Kawakami, M. 1995. Localization of the genes for the 100-kDa complement-activating components of Ra-reactive factor (CRARF and Crarf) to human 3q27-q28 and mouse 16B2-B3. Genomics 25:757.
36. Matsushita, M. and Fujita, T. 1992. Activation of the classical complement pathway by mannose-binding protein in association with a novel C1s-like serine protease. J. Exp. Med. 176:1497.
37. Vorup Jensen, T., Davis, S. J., Poulsen, K., Schwaeble, W., Stover, C. M., Sim, R., Reid, K. B. M., Peterson, S. V., Thiel, S. and Jensenius, J. C. 1998. Studies of recombinant MBL and MASP-2: is MASP-2 self-activating? Mol. Immunol. 35:400 (Abstr).
38. Matsushita, M. and Fujita, T. 1995. Cleavage of the third component of complement (C3) by mannose-binding protein-associated serine protease (MASP) with subsequent complement activation. Immunobiology 194:443.
39. Ogata, R. T., Low, P. J. and Kawakami, M. 1995. Substrate specificities of the protease of mouse serum Ra-reactive factor. J. Immunol. 154:2351.
40. Wong, N. K. H., Dobo, J. and Sim, R. B. 1998. Interaction of MBL-associated serine proteases (MASPs) with synthetic and natural substrates and inhibitors. Mol. Immunol. 35:375 (Abstr).
41. Rogers, J. 1985. Exon shuffling and intron insertion in serine protease genes [News]. Nature 315:458.
42. Irwin, D. M., Robertson, K. A. and MacGillivray, R. T. 1988. Structure and evolution of the bovine prothrombin gene. J. Mol. Biol. 200:31.
43. Maeda, N., Yang, F., Barnett, D. R., Bowman, B. H. and Smithies, O. 1984. Duplication within the haptoglobin Hp2 gene. Nature 309:131.
44. Irwin, D. M. 1988. Evolution of an active-site codon in serine proteases [Letter]. Nature 336:429.
45. Brenner, S. 1988. The molecular evolution of genes and proteins: a tale of two serines. Nature 334:528.
46. Weiner, A. M., Deininger, P. L. and Efstratiadis, A. 1986. Nonviral retroposons: genes, pseudogenes, and transposable elements generated by the reverse flow of genetic information. Annu. Rev. Biochem. 55:631.
47. Smith, S. L. 1999. Shark complement: an assessment. Immunol. Rev. 166:67.
48. Darby, K., Eyre, H. J., Lapsys, N., Copeland, N. G., Gilbert, D. J., Couzens, M., Antonova, O., Sutherland, G. R., Jenkins, N. A. and Herzog, H. 1997. Assignment of the Y4 receptor gene (PPYR1) to human chromosome 10q11.2 and mouse chromosome 14. Genomics 46:513.