Two Ancient Classes of MIKC-type MADS-box Genes are Present in the Moss Physcomitrella patens

Katrin Henschel, Rumiko Kofuji, Mitsuyasu Hasebe, Heinz Saedler, Thomas Münster and Günter Theißen

*Department of Molecular Plant Genetics, Max Planck Institute for Breeding Research, Cologne, Germany;
†Faculty of Science, Kanazawa University, Kanazawa, Japan;
‡National Institute for Basic Biology, Okazaki, Japan


    Abstract
Characterization of seven MADS-box genes, termed PPM1–PPM4 and PpMADS1–PpMADS3, from the moss model species Physcomitrella patens is reported. Phylogeny reconstructions and comparison of exon-intron structures revealed that the genes described here represent two different classes of homologous, yet distinct, MIKC-type MADS-box genes, termed MIKCc-type genes, where "c" stands for "classic" (PPM1, PPM2, PpMADS1), and MIKC*-type genes (PPM3, PPM4, PpMADS2, PpMADS3). The two gene classes deviate from each other in a characteristic way, especially in a sequence stretch termed the intervening region. MIKCc-type genes are abundantly present in all land plants which have been investigated in this respect, and give rise to well-known gene types such as floral meristem and organ identity genes. In contrast, LAMB1 from the clubmoss Lycopodium annotinum was identified as the only other MIKC*-type gene published so far. Our findings strongly suggest that the most recent common ancestor of mosses and vascular plants contained at least one MIKCc-type and one MIKC*-type gene. Our studies thus reveal an ancient duplication of an MIKC-type gene that occurred before the separation of the lineages that led to extant mosses and vascular plants more than about 450 MYA. The identification of bona fide K-domains in both MIKC*-type and MIKCc-type proteins suggests that the K-domain is more ancient than is suggested by a recent alternative hypothesis. MIKC*-type genes may have escaped identification in ferns and seed plants so far. It seems more likely, however, that they represent a class of genes which has been lost in the lineage which led to extant ferns and seed plants. The high number of P. patens MADS-box genes and the presence of a K-box in the coding region and of some potential binding sites for MADS-domain proteins and other transcription factors in the putative promoter regions of these genes suggest that MADS-box genes in mosses are involved in complex gene regulatory networks similar to those in flowering plants.


    Introduction
Although comparative morphological and phylogenetic analyses may have revealed the major patterns of land plant evolution, the mechanisms which generated the enormous diversity of land plant body plans have remained elusive. However, promising approaches are currently being pursued to solve the mysteries of macroevolution, one of which is evolutionary developmental genetics ("evo-devo"). In brief, evo-devo assumes that there is a close interrelationship between developmental and evolutionary processes (Gilbert, Opitz, and Raff 1996). This is so because even the most complex organisms are generated anew in each generation by developmental processes that usually start with a single cell, the fertilized egg cell. In the case of multicellular organisms, evolution of form is thus always the evolution of developmental processes. Because development is to a very large extent under genetic control, changes in developmental control genes may be a major aspect of evolutionary changes in morphology (Theißen and Saedler 1995; Gilbert, Opitz, and Raff 1996; Theißen et al. 2000; Carroll, Grenier, and Weatherbee 2001, p. 13; Davidson 2001, p. 18). Understanding the phylogeny of developmental control genes may thus significantly contribute to understanding the evolution of plant and animal form.

In recent years it has been demonstrated that many of the key developmental control genes of plants and animals are members of multigene families which encode transcription factors. One of the most important gene families for understanding the interrelationships between development and evolution in green plants is arguably the family of MADS-box genes (Theißen and Saedler 1995; Theißen, Kim, and Saedler 1996; Hasebe and Banks 1997; Hasebe 1999; Theißen et al. 2000; Vergara-Silva, Martinez-Castilla, and Alvarez-Buylla 2000).

The defining characteristic of all MADS-box genes is the presence of a highly conserved DNA sequence of approximately 180 nucleotides, termed the MADS-box. It encodes the DNA-binding domain of the respective MADS-domain (M-domain) transcription factors (Theißen, Kim, and Saedler 1996; Theißen et al. 2000). Outside the DNA-binding domains, MADS-domain proteins are structurally quite diverse (Alvarez-Buylla et al. 2000b). However, like many other eukaryotic transcription factors, M-domain proteins have a modular structural organization, which provides the basis for a further subdivision of these proteins and their genes.

The majority of the plant M-domain proteins known so far belong to a single clade with a conserved structural organization, the so-called MIKC-type domain structure. It comprises (from N- to C-terminal) an N-terminal domain, which is, however, present only in a minority of proteins; an M-domain, which is the major determinant of DNA-binding but which also performs dimerization and accessory factor binding functions; an intervening (I) domain, which constitutes a key molecular determinant for the selective formation of DNA-binding dimers; a keratin-like (K) domain, which promotes protein dimerization; and a C-terminal domain (C-domain), which is involved in transcriptional activation or in the formation of ternary or quaternary protein complexes (Riechmann and Meyerowitz 1997; Egea-Cortines, Saedler, and Sommer 1999; Becker et al. 2000; Theißen et al. 2000; Honma and Goto 2001). By phylogeny reconstruction, MIKC-type genes can be further subdivided into a considerable number of defined gene clades. Most clade members share highly related functions and similar expression patterns during vegetative or reproductive growth (Theißen, Kim, and Saedler 1996; Theißen et al. 2000).

The best-known MIKC-type MADS-box genes function as homeotic selector genes in the specification of floral organ identity (Schwarz-Sommer et al. 1990; Weigel and Meyerowitz 1994; Theißen and Saedler 1999, 2001; Theißen 2001). These floral organ identity genes are involved in differentiating plant organs from each other. They do so by encoding transcription factors which bind to regulatory DNA sequences of their target genes and either activate or repress these genes as appropriate for the development of the respective floral organs. The target genes, or genes further downstream, function as "realizator genes," meaning that they encode the enzymes and structural proteins needed for any floral organ to develop its characteristic identity.

The floral organ identity genes are generally expressed in those organs whose identity they specify, such as sepals and petals, petals and stamens, or stamens and carpels in the case of the class A, class B, and class C floral organ identity genes, respectively (Weigel and Meyerowitz 1994; Theißen and Saedler 1999; Theißen 2001; Theißen and Saedler 2001). However, the flowering plant Arabidopsis alone contains more than 80 different MADS-box genes (Riechmann et al. 2000), many of which are involved in other developmental processes inside and outside of the flower, including fruit, leaf, and root development (Alvarez-Buylla et al. 2000a; Liljegren et al. 2000; Theißen et al. 2000).

Because MADS-box genes play such important roles in diverse aspects of flowering plant development, understanding their phylogeny may strongly improve our understanding of the origin and evolution of flowering plant architecture. For example, floral organ identity totally depends on the activity of some MADS-box genes. The origin of the hallmark of the flowering plants, the flower, can therefore only be understood in the context of MADS-box gene phylogeny. The same may be true for more ancient structural features, such as mega- and microsporophylls or even vegetative organs. Thus, the question arises as to when and how the genes that today control the development of flowering plants arose during evolution. To answer this, the phylogeny of MADS-box genes has to be reconstructed and superimposed on the phylogeny of land plants. Toward that goal, MADS-box genes have to be studied in phylogenetically informative taxa, including all major groups of land plants (embryophytes).

Extant land plants very likely represent a monophyletic group of organisms whose closest relatives are a group of green freshwater algae termed stoneworts (charophytes) (Bhattacharya and Medlin 1998; Graham and Wilcox 2001, pp. 497–498). Land plants comprise structurally relatively simple bryophytes as well as more complex tracheophytes (vascular plants) (Kenrick and Crane 1997). Extant bryophytes comprise liverworts, hornworts, and mosses. Extant vascular plants range from clubmosses (lycophytes) and ferns and their allies (such as whisk ferns and horsetails) to complex seed plants (spermatophytes), comprising gymnosperms and angiosperms (Kenrick and Crane 1997; Nickrent et al. 2000; Pryer et al. 2001). Bryophytes are very likely not a monophyletic group, but the question as to which of the bryophyte lineages is the most basal is still highly controversial (Qiu and Palmer 1999; Nickrent et al. 2000). A widely held view places liverworts as the first branch of the land plant tree, whereas some studies suggest that hornworts are the earliest land plants, with mosses and liverworts jointly forming the second deepest lineage (Kenrick and Crane 1997; Nishiyama and Kato 1999; Nickrent et al. 2000).

A detailed characterization of the MADS-box gene family in the pteridophytes Ceratopteris pteroides and C. richardii (leptosporangiate ferns) and Ophioglossum pedunculosum (eusporangiate fern) (Kofuji and Yamaguchi 1997; Münster et al. 1997; Hasebe et al. 1998; Theißen et al. 2000; Münster, Faigl, and Theißen, personal communication) suggested that the most recent common ancestor of ferns and seed plants about 400 MYA already contained at least two different MIKC-type MADS-box genes (Münster et al. 1997; Hasebe et al. 1998; Theißen et al. 2000). These genes probably had expression patterns and functions that were more ubiquitous than those of the highly specialized floral organ identity genes from extant flowering plants, as suggested by the expression patterns of genes from extant ferns. No clear evidence has been obtained so far that there are orthologs of floral homeotic genes in ferns. Thus, the most recent common ancestors of extant ferns and seed plants possibly did not have orthologs of floral homeotic genes yet (Theißen et al. 2000). However, the deep branching of the MADS-box gene tree is not well resolved, and molecular clock estimates based on angiosperm sequences suggested that the floral homeotic genes diverged before the seed plant–fern split (Purugganan 1997). It cannot be ruled out, therefore, that floral homeotic gene orthologs existed already in the most recent common ancestor of ferns and seed plants but have either been lost near the base of the lineage that led to extant ferns or exist in ferns but have escaped identification so far (Hasebe 1999). It would be interesting to see, therefore, what kind of MADS-box genes are present in even more basal, nonvascular plants, such as mosses, which separated from the lineage that led to extant vascular plants about 450 MYA.

Because of its unprecedented technical advantages as a green plant model system (Cove and Knight 1993; Schaefer and Zryd 1997; Reski 1999; Schaefer 2001), we and others (Krogan and Ashton 2000) have chosen Physcomitrella patens for a characterization of moss MADS-box genes. In the following, we outline structural data for seven different MADS-box genes, but we also report that the total number of MADS-box genes in P. patens is even higher. These MADS-box genes can be subdivided into at least two different classes, classical MIKC-type genes (henceforth termed MIKCc-type genes) and a deviant type of MIKC-type genes with structurally abnormal I- and K-regions, termed MIKC*-type genes. We identified LAMB1 from the lycophyte Lycopodium annotinum (Svensson, Johannesson, and Engström 2000) as the only other MIKC* gene published so far. Our data imply that the complexity of the MADS-box gene family in the moss Physcomitrella is higher than was previously assumed (Krogan and Ashton 2000). The implications of our findings for a reconstruction of MADS-box gene phylogeny and the prospects of future research are discussed.


    Materials and Methods
Plant Material
Physcomitrella patens (Hedw.) B. S. G. was grown under standard conditions as described by D. G. Schaefer (protocol available via the World Wide Web at http://www.unil.ch/lpc/docs/PPprotocols2001.pdf).

DNA Synthesis and Sequencing
Oligonucleotides were purchased from LifeTech. All DNA sequences corresponding to PPM and PpMADS genes were determined by the MPIZ DNA core facility and the NIBB Center for Analytical Instruments, respectively, on Applied Biosystems (Weiterstadt, Germany) ABI Prism 377 and 3700 sequencers using BigDye-terminator chemistry. Premixed reagents were from Applied Biosystems.

Isolation of cDNAs
For the isolation of cDNAs of PPM1–PPM4, poly(A)+-RNA was isolated from plant material, including all major stages of the P. patens life cycle (protonema, gametophores, and sporophytes of different ages), according to a method published by Chomczynski and Sacchi (1987), using Total RNA Reagent (BIOMOL). Partial cDNAs of PPM2c and PPM3 were isolated by 3' rapid amplification of cDNA ends (RACE) as generally described (Frohman, Dush, and Martin 1988; Münster et al. 1997). Oligonucleotide PRQVT2 (5'-CGR CAR GTG ACS TTC TSC AAR CG-3'), which had been derived from fern, gymnosperm, and angiosperm sequences, was used as a MADS-box-specific primer. RACE products were cloned into a standard plasmid vector, pGEM-T (Promega, Mannheim, Germany).
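To illustrate what the ambiguity codes in PRQVT2 imply, the following minimal sketch expands the standard IUPAC nucleotide codes and counts how many distinct oligonucleotides the degenerate primer represents. Only the primer sequence is taken from the text; the names `IUPAC`, `degeneracy`, and `expand` are hypothetical helpers introduced here and are not part of the original protocol.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

def degeneracy(primer: str) -> int:
    """Number of distinct sequences encoded by a degenerate primer."""
    n = 1
    for base in primer.replace(" ", "").upper():
        n *= len(IUPAC[base])
    return n

def expand(primer: str) -> list[str]:
    """All unambiguous sequences encoded by a degenerate primer."""
    pools = [IUPAC[base] for base in primer.replace(" ", "").upper()]
    return ["".join(combo) for combo in product(*pools)]

# Primer sequence as given in the text (5' to 3', spaces mark codons).
PRQVT2 = "CGR CAR GTG ACS TTC TSC AAR CG"

primer_length = len(PRQVT2.replace(" ", ""))
print(primer_length, degeneracy(PRQVT2))  # prints "23 32"
```

The primer contains five two-fold degenerate positions (three R and two S), so the mixture comprises 2^5 = 32 different 23-mers.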

The PPM4 cDNA was isolated by 3' RACE with primers derived from genomic sequence fragments of PPM4 (see later) and a RACE-adapter primer.

Additional cDNAs of PPM3 and partial cDNAs of PPM1 were isolated by screening of a cDNA phage library representing P. patens protonema mRNA (kindly provided by R. Strepp and R. Reski) under conditions of moderate stringency. Radioactively labeled cDNAs of PPM2c and PPM3 were used as hybridization probes.

Upstream sequences overlapping with the 3' fragments of PPM1, PPM2c and PPM3 were isolated by 5' RACE, employing a commercially available kit (5'/3'-RACE Kit; Boehringer Mannheim, Mannheim, Germany).

cDNAs of PpMADS1–PpMADS3 were isolated in the following way. Poly(A)+-RNA was isolated from a mixture of protonemata and gametophores cultured at 25°C and gametophores cultured at 16°C using Dynabeads mRNA Direct Kit (DYNAL, Oslo, Norway) and further purified with ISOGEN-LS (Wako Pure Chemical, Osaka, Japan). The Marathon cDNA Amplification Kit (Clontech, Palo Alto, Calif.) and the 3' RACE System (Life Technologies, Inc., Rockville, Md.) were used to synthesize cDNA of each poly(A)+-RNA, using nested M-domain–specific primers (M1: 5'-SAR MTN AAR MGG ATM GAG AAC-3' and duMADS2: 5'-CAU CAU CAU CAU AAR AAR GCI TAY GAR CTI TCN TCN GT-3') and modified poly-T primers supplied by the kits. The PCR-amplified candidate cDNAs were separated on 1% agarose gels, and fragments of over 500 bp were cloned into the pAMP1 vector (Life Technologies). From candidate clones obtained by 3'-RACE, 480 clones (288 clones for the mixture of cDNA from protonemata and gametophores cultured at 25°C and 192 clones for cDNA from gametophores cultured at 16°C) were characterized. All clones were digested with Sau3AI, and the 288 and 192 clones were classified into groups of 46 and 32, respectively. In the group of 46, subsets of 2, 7, and 37 corresponded to PpMADS1, PpMADS2, and unrelated DNA fragments, respectively. In the group of 32, subsets of 7, 8, and 17 corresponded to PpMADS2, PpMADS3, and unrelated DNA fragments, respectively. The 5' region of each gene was obtained using the Marathon cDNA Amplification Kit and sequenced. The nucleotide sequences of the gene-specific primers are deposited in the DNA database, with the sequence of each gene as additional information. The nucleotide sequence of each gene was confirmed by sequencing PCR products using two gene-specific primers located close to each end of the putative mRNA.
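The clone numbers reported above can be checked for internal consistency with a few lines of arithmetic. The sketch below uses only the counts given in the text; the dictionary keys and variable names are illustrative labels chosen here, and the "MADS fraction" is simply the share of restriction-pattern groups that were not unrelated DNA fragments.

```python
# Clone counts per cDNA source, as stated in the text.
clones = {
    "25C protonemata + gametophores": 288,
    "16C gametophores": 192,
}

# Restriction-pattern groups per source and what they corresponded to.
groups = {
    "25C protonemata + gametophores": {"PpMADS1": 2, "PpMADS2": 7, "unrelated": 37},
    "16C gametophores": {"PpMADS2": 7, "PpMADS3": 8, "unrelated": 17},
}

total_clones = sum(clones.values())
group_totals = {src: sum(parts.values()) for src, parts in groups.items()}

# Share of groups in each source that represented MADS-box genes.
mads_fraction = {
    src: 1 - parts["unrelated"] / sum(parts.values())
    for src, parts in groups.items()
}

print(total_clones)   # prints 480
print(group_totals)   # 46 and 32 groups, matching the text
```

Under this reading, 9 of 46 groups from the 25°C material and 15 of 32 groups from the 16°C material corresponded to MADS-box genes.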

Nucleotide sequence data of the cDNAs have been deposited in the EMBL, GenBank, and DDBJ Nucleotide Sequence Databases under the accession numbers AF150931 (PPM1; Krogan and Ashton 2000), AJ419328 (PPM2c), AJ419329 (PPM3), AJ419330 (PPM4), AB067688 (PpMADS1), AB067689 (PpMADS2), and AB067690 (PpMADS3). Genomic sequences are available under AF150932 (PPM1; Krogan and Ashton 2000), AF150934 (PPM2; Krogan and Ashton 2000), AJ421637 (PPM3), and AJ421638 (PPM4).

Genomic DNA-based PCR Techniques
Genomic DNA was isolated from 3-week-old gametophytes using a DIECA-based protocol. Genomic DNA fragments corresponding to isolated cDNAs were amplified with primers derived from the known cDNA sequences. DNA (400 ng) was used as template in a standard PCR program (denaturation at 94°C, annealing at 66°C, elongation at 72°C) of 40 cycles.

Sequence information of putative promoter regions and 5' untranslated regions (UTRs) was obtained by the rapid amplification of genomic ends (RAGE) technique using primers of the 5' parts of known sequences and an adapter-specific oligonucleotide, as described elsewhere (Siebert et al. 1995; Cormack and Somssich 1997). Second rounds of PCRs with nested primers were carried out employing primary amplification products as templates.

A PPM4 fragment was isolated by 3' RAGE, running the PCR with oligonucleotides PKH05 (MADS-box primer) (5'-AGG CAR GTG ACS TWC TSC AAR MG-3'), which had been derived from known MADS-box sequences, and an adapter-specific oligonucleotide PAP1 (5'-GTA ATA CGA CTC ACT ATA GGG C-3'). PCR products were cloned into the plasmid vector pGEM-T (Promega).

To complete the genomic 5' part of PPM4, 5' RAGE was performed. After cloning the PPM4 cDNA, the corresponding genomic DNA fragments were amplified with primers derived from the cDNA sequences.

Southern Analysis
Southern blots were prepared by standard methods with 10 µg of total DNA from P. patens gametophytes per lane, digested with different restriction enzymes. Gene-specific hybridization probes were mainly obtained from the region downstream of the MADS-box to avoid cross-hybridization with other gene family members even under stringent hybridization conditions. All probes were radioactively labeled by oligo-probe labeling using Klenow enzyme and α-32P-dCTP (Sambrook, Fritsch, and Maniatis 1989, pp. 10.13–10.17). The filters were hybridized at 68°C (high stringency) or at 60°C (moderate stringency) for 16 h in 5× SSC, 5× Denhardt's solution, 0.5% SDS, and 1 mg/ml herring sperm DNA and washed at 68°C in 0.1× SSPE/0.1% SDS (high stringency) or at 55°C in 1× SSPE/0.1% SDS (moderate stringency).

Sequence Alignments and Construction of Phylogenetic Trees
Multiple alignments of conceptual amino acid sequences were generated by using the Genetics Computer Group program PILEUP (version 10.0), with a gap weight of 8 and a gap length weight of 2 (default parameters), and were optimized by hand, if necessary. Only the M-domains (amino acid [aa] position 1–60) and part of the K-domains (aa position 95–145; numbering according to the SQUA protein), fused together, were used for the reconstruction of phylogenetic trees because these represent the only sequences which could be unambiguously aligned in the case of the data sets used. In addition, the 51st amino acid of LAMB1, which represents an autapomorphic insertion (fig. 1; Svensson, Johannesson, and Engström 2000), was removed for phylogeny reconstructions. Trees were constructed by the neighbor-joining method and statistically evaluated by bootstrap analysis as described (Münster et al. 1997). Accession numbers of the sequences used can be found at the MADS homepage (http://www.mpiz-koeln.mpg.de/mads).
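The domain fusion described above can be sketched as follows. This is a minimal illustration, not the procedure actually used: it assumes that the regions are defined by residue numbers of a reference sequence (SQUA in the text) and, as a simplification, keeps only alignment columns in which the reference carries a residue, so that insertions relative to the reference are dropped automatically. The function names and the toy alignment are hypothetical.

```python
def reference_columns(ref_aligned: str, start: int, end: int) -> list[int]:
    """Alignment columns that hold reference residues start..end (1-based, inclusive)."""
    cols, pos = [], 0
    for col, ch in enumerate(ref_aligned):
        if ch != "-":          # gaps do not advance the residue counter
            pos += 1
            if start <= pos <= end:
                cols.append(col)
    return cols

def fuse_regions(alignment: dict[str, str], ref: str,
                 regions: list[tuple[int, int]]) -> dict[str, str]:
    """Cut the given reference-numbered regions out of every sequence and join them."""
    cols = [c for s, e in regions for c in reference_columns(alignment[ref], s, e)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}

# Toy alignment (invented sequences, for illustration only).
toy = {
    "REF":  "MGR-GKVQLK",
    "SEQ1": "MGRAGKIELK",
    "SEQ2": "MAR-GKVEMK",
}

# Keep reference residues 1-3 and 6-8 and fuse them.
fused = fuse_regions(toy, "REF", [(1, 3), (6, 8)])
print(fused)  # {'REF': 'MGRVQL', 'SEQ1': 'MGRIEL', 'SEQ2': 'MARVEM'}
```

With a real alignment, the call corresponding to the text would use the SQUA sequence as reference and the regions `[(1, 60), (95, 145)]`.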



Fig. 1. Sequence alignment of M-domain proteins. Conceptual amino acid sequences of MADS-box genes from P. patens (PPM and PpMADS sequences) were aligned to representatives of different MADS-box gene subfamilies. A "<" sign indicates that the sequence of AGL17 is incomplete at the N terminus. The N-, M-, I-, K-, and C-domains are marked; the hydrophobic amino acids (L, I, V, and M) of the K-domain are highlighted in bold. Brackets below the sequences indicate the positions of introns which are located between codons (zero-phase introns). Triangles indicate the position of introns which are located within codons. Except for the last intron of PPM1, PPM2, and PpMADS1, they are all phase 2 introns, i.e., they are located between the second and the third codon positions. DEF, GLO, PLE, and SQUA are derived from Antirrhinum majus; AGL2, AGL6, AGL12, and AGL17 from Arabidopsis thaliana; AGL15-1 from Brassica napus; TM8 from Lycopersicon esculentum; TOBMADS1 from Nicotiana tabacum; CRM1 and CRM3 from C. richardii; and LAMB1 from L. annotinum.

 

    Results
Cloning of MADS-box Genes from P. patens
Employing a diversity of molecular cloning techniques, sequence information about seven different MADS-box genes of the moss P. patens was obtained. Two of the genes are identical to PPM1 and PPM2 as described by Krogan and Ashton (2000), so we adopt these names here. In the case of PPM2, however, a novel cDNA isoform (PPM2c) was found; its sequence is identical to that of the PPM2 cDNA (sensu Krogan and Ashton 2000), except for the sequence downstream of the 10th exon where PPM2c is identical to the cDNA termed PPM2b (Krogan and Ashton 2000). Complete sequencing of representative cDNAs and conceptual translation of the open reading frames yielded proteins of lengths of 283 aa (PPM1), 284 aa (PPM2c), 372 aa (PPM3), 380 aa (PPM4), 281 aa (PpMADS1), 306 aa (PpMADS2), and 320 aa (PpMADS3) (fig. 1).

Hybridization of Southern blots containing genomic DNA of P. patens, with different probes specific for the moss genes PPM1–PPM4, under stringent conditions gave single, but different, bands for each gene, indicating that PPM1–PPM4 represent four different single-copy genes (fig. 2).



Fig. 2. Genomic Southern blot analysis of some of the P. patens MADS-box genes. Genomic DNA was digested with the restriction enzymes EcoRI, EcoRV, HindII, or HindIII, as indicated above the lanes, separated on an agarose gel, and blotted onto a nylon membrane. Probes were used which were specific for the different PPM genes under the high-stringency hybridization and washing conditions applied.

 
Sequence Comparisons Reveal Two Distinct Classes of MIKC-type Genes in P. patens
Multiple sequence alignments of the conceptual PPM and PpMADS amino acid sequences with sequences of diverse MIKC-type M-domain proteins from ferns and seed plants demonstrated that all the M-domain proteins from P. patens have recognizable M- and K-domains linked together by an I-domain and followed by a C-domain (fig. 1). Thus, all these conceptual gene products may be considered as MIKC-type proteins. However, the sequence alignment also revealed that two different classes of P. patens proteins can be distinguished. PPM1, PPM2, and PpMADS1 have a primary structure which is very similar to that of the other MIKC-type proteins known (fig. 1). We refer here to proteins with such a structure as MIKCc-type proteins (or, in brief, MIKCc proteins), where "c" stands for "classic." In contrast, PPM3, PPM4, PpMADS2, and PpMADS3 share some features which distinguish them from the MIKCc-type proteins. Proteins with at least some of these features will henceforth be termed MIKC*-type (or MIKC*) proteins.

The unusual features of the MIKC*-type proteins concern mainly the I-domain, but the K-domain also shows some peculiarities. The I-domains of the MIKC*-type proteins are considerably longer than those of the MIKCc-type proteins. For example, in the alignment presented here the I-domain of PPM3 is 62 aa long, and that of PpMADS2 is as long as 87 aa, whereas that of the typical MIKCc-type protein SQUAMOSA (SQUA) from Antirrhinum majus has a length of 35 aa (Huijser et al. 1992) (fig. 1).

The K-domains of MIKC*-type proteins also show regularly spaced hydrophobic amino acids as the K-domains of the MIKCc-type proteins do, but some of the hydrophobic amino acids may be shifted by one or two positions or may be less conserved (fig. 1). In addition, there are three indels where the K-domains of the P. patens MIKC*-type proteins are longer than the K-domains of the MIKCc-type proteins. The indels are very small: two comprise two amino acids and one just one amino acid residue. Interestingly, one insertion of two amino acids is also found in two closely related MIKCc-type proteins, CRM3 and CRM9, from the fern Ceratopteris (fig. 1; Münster et al. 1997; Theißen et al. 2000).

Taken together, a longer I-domain seems to be the most characteristic hallmark of the MIKC*-type proteins, whereas a deviating K-domain, with respect to both sequence and length, is a typical, but less diagnostic, feature.

There is one published M-domain protein, LAMB1, from the lycopod (clubmoss) L. annotinum (Svensson, Johannesson, and Engström 2000) that shares the most remarkable feature of PPM3, PPM4, PpMADS2, and PpMADS3 (fig. 1): the I-domain of LAMB1 is significantly longer than those of the MIKCc-type proteins (fig. 1). This raises the intriguing question as to whether this similarity between the P. patens MIKC*-type proteins and LAMB1 indicates a close evolutionary relationship between the respective genes. In the following, we consider LAMB1 tentatively as an MIKC*-type protein.

Exon-intron Structures of MIKCc- and MIKC*-type Genes
Interpretations of structural differences between MIKCc- and MIKC*-type proteins critically depend on proper recognition of the different domains and thus on assumptions about homologies. The hypothesis about the domain structures of the MIKC-type proteins implied by figure 1 (henceforth termed "longer I-domain hypothesis") was not easy to corroborate by simple sequence comparisons. There was an alternative hypothesis ("K-domain insertion hypothesis") which suggested that the length difference between MIKCc- and MIKC*-type proteins upstream of the C-domain is mainly due to an insertion within the K-domain rather than an elongation of the I-domain. Determination of sequence similarities between both sets of putative K-domains of MIKC*-type proteins and the K-domains of MIKCc-type proteins, as well as the analysis of patterns of conserved amino acids (data not shown), slightly favored the longer I-domain hypothesis, as did pairwise and multiple sequence alignments involving different samples of MIKC-type proteins (an example is given in fig. 1). However, these analyses could not completely rule out the K-domain insertion hypothesis.

To distinguish between the two hypotheses and to better understand the differences between the two classes of MIKC-type genes, the sequences and exon-intron structures of the genomic loci of the P. patens MADS-box genes were determined. The positions of all introns with respect to the protein sequences are depicted in figure 1. The exon-intron structures of PPM1–PPM4 are compared with those of SQUA and LAMB1 in figure 3. It becomes obvious that under the longer I-domain hypothesis, the exon-intron structures of both classes of MIKC-type proteins appear generally quite conserved. Most important, the positions of introns within the K-domains are identical for all MIKC-type genes under that hypothesis, except that one intron is missing in LAMB1 (fig. 1). In contrast, under the K-domain insertion hypothesis, intron positions differ dramatically in MIKCc- and MIKC*-type genes (data not shown). Moreover, the K-domain insertion hypothesis requires the split of a domain assumed to be of some functional relevance (for protein-protein interactions). For these reasons we currently strongly favor the longer I-domain hypothesis over its alternative.
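Comparing intron positions between genes requires expressing each intron as a position in the protein sequence together with its phase. The following sketch shows one way to derive both from the coding lengths of consecutive exons, using the usual convention that a phase 0 intron lies between codons and a phase 2 intron lies between the second and third codon positions. The function name and the exon lengths in the example are hypothetical and do not correspond to any gene described here.

```python
def intron_positions(exon_cds_lengths: list[int]) -> list[tuple[int, int]]:
    """For each intron, return (complete codons upstream, phase).

    exon_cds_lengths holds the number of coding nucleotides in each exon,
    in 5' to 3' order. Phase is the number of bases of the interrupted
    codon that lie upstream of the intron (0, 1, or 2).
    """
    positions, total = [], 0
    for length in exon_cds_lengths[:-1]:   # no intron after the last exon
        total += length
        positions.append((total // 3, total % 3))
    return positions

# Hypothetical coding lengths of four consecutive exons (illustration only).
example = [185, 79, 62, 100]
print(intron_positions(example))  # [(61, 2), (88, 0), (108, 2)]
```

Two genes share an intron position under a given alignment if the intron falls after the same aligned codon with the same phase, which is the kind of comparison that separates the two domain hypotheses discussed above.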



Fig. 3. Analysis of the exon-intron structures of the P. patens MADS-box genes PPM1–PPM4. Exons are shown as colored boxes, the 5'- and 3'-UTRs as white boxes, and the introns as bold black lines. Homologous exons are connected with thin vertical lines. In addition, the genomic structures of SQUAMOSA as a representative of an MIKCc-type gene and of LAMB1 as an MIKC*-type MADS-box gene are presented.

 
Under the longer I-domain hypothesis, the major structural difference between the two subtypes of MIKC genes is the following: whereas the I-domain of MIKCc-type proteins is largely encoded by just one exon, that of MIKC*-type proteins is encoded by four (PPM3, PPM4, LAMB1) or five (PpMADS2, PpMADS3) exons, thus explaining the increase in I-domain length in MIKC*-type proteins (figs. 1 and 3). As already described for LAMB1 (Svensson, Johannesson, and Engström 2000), the last and third last of the I-domain exons of each MIKC*-type gene are remarkably small, encoding only a few amino acids (figs. 1 and 3). The analysis of exon-intron structures thus supports the view that a deviant I-domain is the most consistent feature that distinguishes the MIKC*-type proteins from the MIKCc-type proteins.

The similarity in exon-intron structures between the P. patens MIKC*-type genes and LAMB1 in the I-domain (figs. 1 and 3) strongly suggests a relatively close relationship between the respective genes. Despite the high structural similarity of the P. patens MIKC* proteins and LAMB1, the overall sequence identity between these proteins is quite low. PPM3 and LAMB1, for example, show an overall identity of only 26.9% (a similarity of 37.3%), probably reflecting the fact that the lineages that led to extant mosses and vascular plants (including lycophytes) were already separated about 450 MYA. In line with this, and despite the general similarity among the MIKC*-type genes, LAMB1 shows some unique features which distinguish it from the P. patens MIKC* genes: downstream of the first intron within the K-domain, all other introns are missing (figs. 1 and 3); the three indels found in the K-domain of the P. patens MIKC* proteins are not found; and there is an exceptionally long C-domain that seems to be composed of three quite imperfect repeat units (Svensson, Johannesson, and Engström 2000).
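How an overall identity value of this kind can be obtained from a pairwise alignment is sketched below. The text does not state which denominator was used, so the sketch assumes identity is counted over aligned positions in which neither sequence has a gap; other conventions (for example, dividing by the length of the shorter sequence) give different numbers. The function name and the toy sequences are hypothetical, and a similarity value would additionally require a substitution matrix or residue grouping, which is not shown.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identical residues over gap-free columns of a pairwise alignment."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    # Assumption: columns with a gap in either sequence are ignored.
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)

# Toy pairwise alignment (invented sequences, for illustration only).
seq_a = "MGRGKIEIKR"
seq_b = "MGR-KIQIRR"
identity = percent_identity(seq_a, seq_b)
print(round(identity, 1))  # prints 77.8 (7 matches in 9 gap-free columns)
```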

MIKC*-type Proteins are Closely Related
The similarities between PPM3, PPM4, PpMADS2, PpMADS3, and LAMB1 suggest that they are more closely related to each other than to any other MADS-box gene known. To test this hypothesis, the evolutionary relationships between the moss genes and the other known MADS-box genes were determined by phylogeny reconstructions involving a diverse set of MIKC-type proteins from vascular plants (including LAMB1) and the conceptual gene products of all the P. patens MADS-box genes reported here. The set of seed plant MIKCc proteins used comprised representatives of all the major subfamilies of plant M-domain proteins known (Theißen, Kim, and Saedler 1996; Becker et al. 2000; Theißen et al. 2000). Only sequences of M- and K-domains were used for these kinds of studies because they produce the most unambiguous alignments.

A representative example of the phylogenetic trees obtained is shown in figure 4. It indicates that both the MIKCc- and the MIKC*-type genes cluster together with reasonable bootstrap support. Given that the tree in figure 4 is unrooted, this suggests that at least one of the two types of genes (MIKC* or MIKCc) represents a clade, meaning that the respective genes share a most recent common ancestor not shared with any of the other MIKC-type genes known. In addition, it becomes apparent that the P. patens MIKC proteins constitute two strongly supported clades (one containing MIKCc, the other containing MIKC* proteins). This suggests that the respective genes within each clade originated by relatively recent gene, chromosome, or genome duplications in the lineage that led to extant mosses. It cannot be completely ruled out, however, that the different loci within one clade are more ancient than they seem but were kept similar by gene conversion events.
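The idea behind a bootstrap percentage can be sketched in a few lines of Python. The example below resamples alignment columns with replacement and asks how often one sequence remains the nearest neighbour of another by simple p-distance. This is a deliberately simplified stand-in for clade support: it does not build trees and is not the neighbor-joining analysis used for figure 4. The sequence names and the toy alignment are invented for illustration.

```python
import random


def p_distance(a: str, b: str) -> float:
    """Fraction of differing positions between two equal-length aligned sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def bootstrap_pair_support(alignment: dict, first: str, second: str,
                           replicates: int = 1000, seed: int = 0) -> float:
    """Percentage of replicates in which `second` is the nearest neighbour of `first`."""
    rng = random.Random(seed)
    names = list(alignment)
    length = len(alignment[first])
    others = [n for n in names if n != first]
    hits = 0
    for _ in range(replicates):
        # Resample alignment columns with replacement (one bootstrap pseudo-replicate).
        cols = [rng.randrange(length) for _ in range(length)]
        resampled = {n: "".join(alignment[n][c] for c in cols) for n in names}
        # On ties, min() keeps the first candidate in dictionary order.
        nearest = min(others, key=lambda n: p_distance(resampled[first], resampled[n]))
        if nearest == second:
            hits += 1
    return 100.0 * hits / replicates


# Toy alignment with hypothetical names (invented; not real MADS-domain sequences).
toy_alignment = {
    "starA":    "MGRGKIEIKRIENAT",
    "starB":    "MGRGKIEMKRIENKT",
    "classicA": "MGRGKVELKRIENKI",
    "classicB": "MARGKVQLKRIENKI",
}
support = bootstrap_pair_support(toy_alignment, "starA", "starB", replicates=500, seed=1)
print(f"starA/starB grouped in {support:.1f}% of replicates")
```

In a real analysis each pseudo-replicate would be used to rebuild a full tree, and the support value for a node would be the percentage of replicate trees containing that node.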



Fig. 4.—Phylogenetic tree showing the relationships between some MIKCc-type and all MIKC*-type M-domain proteins known. Names of species from which the respective genes were isolated are given in parentheses after the protein names. Proteins from the moss P. patens are shown by inverted boxes, the LAMB1 protein from the lycophyte L. annotinum is highlighted by a shaded box, and those of ferns by open boxes. All other genes have been isolated from seed plants (angiosperms or gymnosperms). The numbers next to some nodes give bootstrap percentages, shown only for relevant nodes (mainly those defining gene subfamilies or clades of special interest for this work). Subfamilies and gene classes (MIKC* or MIKCc) are labeled by brackets at the right margin.

 
In addition, it is noteworthy that the phylogeny reconstructions could not be used to assign the P. patens MIKCc genes to any of the well-defined subfamilies of fern or seed plant MADS-box genes (fig. 4). In figure 4 the P. patens MIKCc proteins form a sister clade to CRM1-like proteins from the fern Ceratopteris, but in several phylogeny reconstructions with different data sets, this was never supported by high bootstrap values (data not shown). The significance of the observation thus remains unclear. Therefore, orthology relationships between the MADS-box genes from the moss P. patens on the one hand and from seed plants on the other could not be identified.

All the features outlined here were supported not only by neighbor-joining trees such as the one shown in figure 4 but also by maximum-likelihood trees (unpublished data).


    Discussion
Homology of P. patens MADS-box Genes
We report here the cDNA and genomic sequences of seven different MADS-box genes from the moss model system P. patens. Their conceptual gene products are all recognized as MIKC-type proteins (fig. 1). As suggested by their characteristic and unique domain structure, all MIKC-type genes are probably homologous genes which originated from a single common ancestor gene; they were probably generated by gene duplications, sequence diversification, and fixation events, whereas domain swapping played only a minor role, if any (Theißen, Kim, and Saedler 1996; Theißen et al. 2000). This implies that the MADS-box genes reported here are genuine homologs of the many well-characterized MADS-box genes from flowering plants, including almost all of the floral homeotic genes and some of the floral meristem identity genes. However, by phylogeny reconstructions, these genes could not be assigned to any of the well-defined MIKC-type gene subfamilies known from seed plants (fig. 4; Theißen, Kim, and Saedler 1996; Becker et al. 2000). Orthology between the moss genes and any MADS-box gene (or clade of genes) from seed plants could thus not be identified here. The long independent evolutionary history of mosses and seed plants, and many independent gene duplications both in the lineage that led to extant mosses and in that which led to extant seed plants, may well have obscured any traces of orthology.

A Novel Class of MIKC-type Genes
Despite the homology of all the P. patens MADS-box genes reported here, the structures of four proteins (PPM3, PPM4, PpMADS2, and PpMADS3) showed some unusual features, which we interpret as a longer I-domain and a less strict conservation of the length and the hydrophobic amino acids of the K-domain. Phylogeny reconstructions (fig. 4) corroborate the view that the respective genes, together with LAMB1 from the lycophyte L. annotinum, are a hitherto unknown class of MIKC-type genes, termed MIKC*-type genes. Sequence comparisons and the analysis of exon-intron structures (figs. 1 and 3) suggest that an extended I-domain, encoded by four or five exons rather than one (as in the case of MIKCc genes), is the most characteristic feature of the MIKC*-type genes.

Are MIKC*-type Genes Restricted to "Lower" Land Plants?
PPM3, PPM4, PpMADS2, PpMADS3, and LAMB1 are the only MIKC*-type genes identified so far; this class of genes has thus not been reported for ferns and seed plants (fig. 5). This raises the intriguing question as to whether MIKC*-type genes are really absent in these taxa or whether they have just not been identified yet. Given the intensive characterization of the MADS-box gene family in some ferns, gymnosperms, and flowering plants (for references, see Introduction and Theißen et al. 2000), we think it is quite unlikely that MIKC* genes exist in these taxa but have not been isolated or recognized yet. However, a low expression level could have helped to hide MIKC* genes, because most of the MADS-box genes were first identified at the cDNA level.



Fig. 5.—The ancestry of MIKCc- and MIKC*-type genes in the evolution of land plants. A phylogenetic tree of some major taxa of land plants is shown; the topology of the tree is according to recent publications (Kenrick and Crane 1997; Nickrent et al. 2000; Pryer et al. 2001). Seed plants comprise angiosperms (flowering plants) and gymnosperms. Together with ferns (including allies such as horsetails and whisk ferns) and the lycophytes, they constitute the vascular plants. At the terminal branches the identification of MIKCc and MIKC* genes in representative taxa is indicated. At the root of the tree a minimal set of MADS-box genes that was very likely present in the last common ancestor of mosses and vascular plants is shown. Branches are not drawn to scale. The separation of the lineages that led to extant mosses from the other taxa (vascular plants) occurred about 450 MYA. The separation of the lineage that led to extant ferns from the lineage that led to seed plants occurred about 400 MYA, as revealed by fossil and molecular evidence.

 
In the case of the flowering plant Arabidopsis thaliana, where almost the complete genome has been sequenced, the presence or absence of MIKC* genes can now be checked more rigorously. However, there are still limitations with respect to the identification of these genes. MIKC* genes may be recognized by their membership in the respective phylogenetic cluster or by the number of exons encoding the I-domain. Both require reliable information about the exon-intron and domain structures of the genes, which is available for many, but far from all, A. thaliana MADS-box genes. The computer prediction of the short exons in the I- and K-regions of MIKC-type genes often fails (unpublished data), so that in general a cDNA sequence is required to accurately determine the exon-intron structure of an MIKC-type gene.

There are at least 82 MADS-box genes in the A. thaliana genome (Riechmann et al. 2000), at least 32 of which are Type II (plant MIKC or animal and fungal MEF2) genes and thus have to be considered here (Alvarez-Buylla et al. 2000b). For all of these, MADS- and K-domains could be predicted with sufficient accuracy, aligned, and used for phylogeny reconstructions analogous to the one shown in figure 4. None of these was identified as an MIKC*-type protein (unpublished data). Thus, there is no evidence so far that corresponding genes exist in A. thaliana. More rigorous statements about the absence or presence of MIKC*-type genes in A. thaliana require the determination of the exon-intron structure for all the genes in the genome.

An Ancient Duplication of an MIKC-type Gene
The data presented here demonstrate that MIKCc-type as well as MIKC*-type genes are present in both a moss and a clubmoss (lycophyte). Recently, dramatic progress has been made in understanding the relationships between the major land plant taxa, including mosses and clubmosses (fig. 5). There is evidence that there are three clades of extant vascular plants: lycophytes, seed plants, and ferns plus allies such as horsetails (equisetophytes) and whisk ferns (psilophytes) (Pryer et al. 2001). Ferns (sensu lato) and seed plants form a clade, so that lycophytes appear as the most basal vascular plants; mosses represent an outgroup of the vascular plants (fig. 5). The presence of MIKCc and MIKC* genes in both mosses and clubmosses thus strongly suggests that the last common ancestor of mosses and vascular plants, which probably existed about 450 MYA, already contained an MIKCc as well as an MIKC* gene (fig. 5). Our studies, therefore, reveal an ancient duplication of an MIKC-type gene that occurred before the separation of the lineages that led to extant mosses and vascular plants more than 450 MYA. But which kind of MIKC gene was duplicated at that time?

Are MIKCc or MIKC* Genes Ancestral?
Because MIKCc as well as MIKC* genes have been isolated from both mosses and vascular plants, it is difficult to say which class of genes is ancestral. In principle, MIKC* genes may have originated from MIKCc genes, or vice versa, or both genes may have originated from a common ancestor which was neither a clear MIKCc nor an MIKC* gene. We will restrict our considerations here mainly to the I-domain, the most important character distinguishing the two gene types. Svensson, Johannesson, and Engström (2000) argued that the MIKC* genes may represent the ancestral gene type and that MIKCc genes evolved through the loss of three exons in the I-region. However, the fact that MIKCc, but not MIKC*, genes have been isolated from a grade of green algae (Y. Tanabe, M. Hasebe, and M. Ito, personal communication) makes it appear much more likely that MIKCc rather than MIKC* genes are ancestral. Together with the phylogeny reconstructions shown in figure 4, this would imply that the MIKC* genes are monophyletic (a clade), whereas the MIKCc genes are paraphyletic. It cannot be completely ruled out, however, that MIKC* genes have just been overlooked in algae so far. Under the hypothesis that MIKCc genes are ancestral, an increase in the number of I-domain exons from one to four or five has to be postulated. A formal third possibility places the root between the MIKC* and the MIKCc genes. In that case, it also seems not unreasonable that the most recent common ancestor of MIKCc and MIKC* genes had neither one nor four I-domain exons. Rather, it may have had two, a relatively long and a short exon, and duplication of that arrangement may have led to the situation in extant MIKC* genes, whereas loss of the short exon occurred in the lineage that led to extant MIKCc genes. However, sequence similarity between the different exons of the I-domain does not support this scenario. In contrast, similarity between the first and second exon of the I-domain of PpMADS2 and PpMADS3 indicates that these originated by an exon duplication, implying that four rather than five I-domain exons is the ancestral state within the MIKC* genes.

Have MIKC* Genes Been Lost During the Evolution of Higher Vascular Plants?
If MIKC*-type genes are really absent from extant ferns and seed plants and are not just awaiting isolation from these taxa, this class of genes must have been lost in the lineage that led to extant ferns (sensu lato) and seed plants; alternatively (but less likely), this class of genes was lost several times independently, e.g., at the base of extant ferns and at the base of extant seed plants. For the loss of a complete class of genes to be a likely event, it must be assumed that only a few MIKC*-type genes, probably just one, were present in the respective lineage. Even then, it raises the intriguing question as to what function MIKC* genes had and have, so that this gene type was conserved in mosses and clubmosses for about 450 million years but became dispensable in the "higher" vascular plants.
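The parsimony reasoning in this paragraph can be made explicit with a small Python sketch. It assumes that the gene class was present in the ancestor at the root of the tree and cannot be regained once lost (a Dollo-style assumption), and then counts the minimum number of losses needed to explain the reported distribution. The nested-tuple tree encoding and the function names are illustrative choices; the topology and the presence data are taken from the text, and an "absent" entry means only that no such gene has been reported.

```python
# Land plant tree as nested tuples; leaves are taxon names. Mosses are the
# outgroup and lycophytes are sister to (ferns + seed plants), as in the text.
TREE = ("mosses", ("lycophytes", ("ferns", "seed plants")))

# Reported presence of MIKC*-type genes per taxon (False = not reported so far).
MIKC_STAR = {"mosses": True, "lycophytes": True, "ferns": False, "seed plants": False}


def has_gene(node, presence):
    """True if at least one leaf at or below `node` carries the gene."""
    if isinstance(node, str):
        return presence[node]
    return any(has_gene(child, presence) for child in node)


def count_losses(node, presence):
    """Minimum number of losses below `node`, assuming the ancestor at `node`
    carried the gene and that a lost gene is never regained."""
    if not has_gene(node, presence):
        return 1  # the whole subtree lacks the gene: one loss on its stem branch
    if isinstance(node, str):
        return 0
    return sum(count_losses(child, presence) for child in node)


print("Minimum number of MIKC* losses:", count_losses(TREE, MIKC_STAR))
```

With this topology a single loss on the branch leading to ferns plus seed plants is sufficient, whereas the alternative of independent losses in ferns and in seed plants requires two events.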

Definition, Ancestry, and Functional Importance of the K-domain
The K-domain is a domain of about 70 amino acids roughly spanning positions 110–180 of MIKCc-type M-domain proteins. It is shown here that MIKC*-type proteins also contain a K-domain and that the K-box was thus combined with the MADS-box before the gene duplication which generated both classes of genes more than 450 MYA. In line with this, MIKC-type genes with bona fide K-domains have even been isolated from charophyte algae (Y. Tanabe, M. Hasebe, and M. Ito, personal communication), suggesting that this gene type existed even before green plants colonized the land.

Alvarez-Buylla et al. (2000b) have maintained that some Type II MADS-box genes (comprising MIKCc-type genes according to our definition and MEF2-like genes from fungi and animals) from Arabidopsis, such as AGL12 and AGL25-like genes, do not contain a K-box. They concluded that the K-box was acquired by a MADS-box gene during the course of Type II gene evolution, after the gene lineages that led to AGL12 and AGL25-like genes had already split off, because phylogeny reconstructions suggested that these genes were basal within the MIKCc gene tree (Alvarez-Buylla et al. 2000b).

However, one should note that different definitions of K-domains are used in different studies. The K-domain (or box) was originally defined as a domain in some early-identified M-domain proteins "which has a low but significant similarity to a portion of keratin sequences" (Ma, Yanofsky, and Meyerowitz 1991). It was known that the region of keratin with similarity to the K-domains of the plant proteins is part of the coiled-coil sequence that constitutes the central rod-shaped domain of keratin, and thus it was postulated (but not demanded by definition!) that the K-domain adopts a coiled-coil structure. It was noted that K-domains can potentially form two amphipathic helices (Ma, Yanofsky, and Meyerowitz 1991) because of a regular spacing of hydrophobic amino acids. In the following years, sequence similarity to the early-identified K-domains and the regular spacing of hydrophobic amino acids were taken as criteria for identifying K-domains by other researchers, including ourselves (e.g., Davies and Schwarz-Sommer 1994; Rounsley, Ditta, and Yanofsky 1995; Theißen et al. 1995; Münster et al. 1997), partly because these criteria, which also clearly apply to AGL12 (fig. 1), are relatively straightforward.

Whether the similarity between keratin and K-domains reflects common ancestry (homology) is unclear. We can be quite confident, however, that the similarity among the different K-domains of plant MADS-box genes is based on homology, because phylogenetic reconstructions employing M-domain sequences or I- and K-domain sequences give almost the same results (Theißen, Kim, and Saedler 1996). Thus, the original definition of the K-box (or domain) refers to a homologous stretch of DNA (or protein). We stick to that definition here because homology is certainly an appropriate concept in evolutionary considerations. For example, according to our definition and findings, there were two sequence elements, M-domain and K-domain, present in the most recent common ancestor of MIKCc and MIKC* genes more than 450 MYA that have since then been coinherited through many gene duplications, sequence divergence, fixation, and speciation events, which yielded extant MIKC gene diversity (including AGL12 and AGL25-like genes). Nevertheless, there may have been rare cases of recombination (Alvarez-Buylla et al. 2000b). Note that these statements are independent of the actual higher-order structure that the different K-domains adopt.

In contrast, Alvarez-Buylla et al. (2000b) use the formation of a coiled-coil structure as the defining character of a K-domain. However, because a few mutations may be able to change that structure (as suggested for AGL12), this may not yield an especially stable phylogenetic signal. Worse still, there is not one K-domain for which the structure has been determined by physical methods, so that there is currently no way to assess the reliability of the computer methods used. Moreover, computer predictions of protein structures are notoriously unreliable when applied to novel types of sequences, as in this case, so that it may appear premature to switch already to a novel definition of the K-domain.

But let us assume that computer modeling is able to accurately predict the structure of K-domain-like sequences and use the formation of a coiled-coil structure for a definition of the K-domain. Even that would not imply a relatively recent origin of the M- and K-domain connection, because preliminary attempts to root the MIKCc gene tree employing algal sequences did not place AGL12 near the base of the tree (Y. Tanabe, M. Hasebe, and M. Ito, personal communication). Interestingly, the programs that do not predict coiled-coil structures in AGL12 strongly predict such structures in all MIKC*-type proteins except LAMB1 (unpublished data). All this would argue that the situation in AGL12 (and possibly in some other proteins) is derived rather than basal.

One should also take into consideration, however, that the computer programs predicting coiled-coils may need some refinements, especially when applied to principally novel types of sequences. Physical determination of K-domain structures (e.g., via X-ray diffraction or high-resolution NMR) will be required to clarify that case.

Although the three-dimensional structure of the K-domain is highly speculative, its functional importance is clearer. A number of studies have demonstrated that the K-domain is important for the formation of protein dimers (Schwarz-Sommer et al. 1992; Tröbner et al. 1992; Riechmann, Krizek, and Meyerowitz 1996; Pelaz et al. 2001). Given the differences between P. patens MIKCc and MIKC* proteins, it could well be that the two protein classes have different dimerization specificities. For example, it might be that MIKC*-type proteins dimerize only with each other, and likewise MIKCc-type proteins only with MIKCc-type proteins, or that MIKC*-type proteins heterodimerize with non-M-domain proteins, something that has not been reported so far for MIKCc-type proteins. Gel retardation assays and the yeast two-hybrid system, techniques that have been successfully applied to study dimerization of MIKCc proteins (Fan et al. 1997; Egea-Cortines, Saedler, and Sommer 1999), could be used to test the dimerization behavior of MIKC*-type proteins.

Alternative Processing of PPM2 Transcripts
The cDNA sequence of PPM2 reported here (PPM2c) differs from the two forms cloned by Krogan and Ashton (2000), who named these isoforms PPM2 and PPM2b. Throughout the coding region, it is identical to PPM2, but in the downstream part of the 3' UTR, PPM2c is identical to PPM2b. The sequence diversity in the 3' region can be explained by two elementary events: (1) alternative splicing of the splice-or-nonsplice type (intron retention), affecting the 10th intron (numbering as described; Krogan and Ashton 2000); and (2) as a consequence of intron 10 retention, the use of an alternative 3' processing and polyadenylation site within intron 10. The isolation of three different cDNA isoforms (PPM2, PPM2b, PPM2c) corresponding to the PPM2 gene thus documents alternative splicing and differential processing in the 3' region. Extensive alternative splicing is well known for the MEF2-like MADS-box genes from animals (Black and Olson 1998) but has been only rarely described for plant MIKC-type MADS-box genes. However, the taxa in which alternative splicing of MIKC-type genes has been observed include ferns and seed plants (Kyozuka et al. 1997; Theißen et al. 2000; Gocal et al. 2001) as well as mosses (Krogan and Ashton 2000; this work), suggesting that this is an ancient and important phenomenon. Its functional relevance, however, has remained elusive.

MADS-box Gene Networks in Mosses?
An analysis of the putative promoter region of PPM2 showed that it contains a putative CArG-box motif (Schwarz-Sommer et al. 1992), a sequence element known to bind M-domain proteins, 394 bp upstream of the translation start codon (ATG). Moreover, there are consensus binding sites (CCANTG) for FLORICAULA/LEAFY-like proteins (Busch, Bomblies, and Weigel 1999) 1,095 and 1,373 bp upstream of the translation start point (unpublished data), which may indicate that PPM2 transcription is controlled by M-domain proteins and FLORICAULA/LEAFY-like proteins. In addition, all genes reported here encode K-domains, which may promote protein-protein interactions with other proteins containing K-domains. These observations, together with the high number of MADS-box genes in P. patens (at least five, and probably more, of each of the MIKCc and MIKC* types, as indicated by Southern hybridization experiments under conditions of moderate stringency; unpublished data), suggest that MADS-box genes in mosses are involved in complex gene regulatory networks similar to those in flowering plants.
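A promoter scan of the kind described above can be sketched in Python with regular expressions. The CCANTG pattern is the consensus quoted in the text; the CArG-box pattern CC(A/T)6GG is the commonly cited consensus and is an assumption here, since the exact pattern used for PPM2 is not stated. The function name, the convention of measuring distance from the first base of the motif, the restriction to the forward strand, and the toy sequence (which is not the PPM2 promoter) are likewise illustrative assumptions.

```python
import re

# Assumed consensus patterns written as regular expressions (N = any base).
MOTIFS = {
    "CArG-box (CC[AT]6GG)": "CC[AT]{6}GG",
    "LFY-like site (CCANTG)": "CCA[ACGT]TG",
}


def scan_upstream(upstream: str, motifs: dict = MOTIFS) -> list:
    """Return (motif name, distance in bp upstream of the ATG, matched text).

    `upstream` is the sequence immediately 5' of the start codon, written
    5'->3', so its last base lies directly before the A of the ATG. Distance
    is counted from the first base of the motif; only the given strand is scanned.
    """
    seq = upstream.upper()
    hits = []
    for name, pattern in motifs.items():
        # The lookahead allows overlapping occurrences to be reported as well.
        for match in re.finditer(f"(?=({pattern}))", seq):
            distance = len(seq) - match.start()
            hits.append((name, distance, match.group(1)))
    # Most distal site first.
    return sorted(hits, key=lambda hit: -hit[1])


# Invented toy sequence for illustration (NOT the PPM2 promoter).
toy_upstream = "TTGACCATTTAAGGCTAGCCACTGATCGA"
for name, distance, text in scan_upstream(toy_upstream):
    print(f"{name}: {text} at {distance} bp upstream of ATG")
```

Because CCANTG is not its own reverse complement, a complete scan would also search the reverse-complemented sequence; that step is omitted here for brevity.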

Implications for Functional Studies
The functions of the genes reported here still have to be determined. The fact that P. patens is the moss model system (Cove and Knight 1993; Reski 1999; Schaefer 2001) should greatly facilitate the respective attempts. Not only can P. patens be stably transformed and is amenable to many other techniques of molecular genetics, but it is actually the only land plant so far in which gene knock-out and allele replacement approaches via homologous recombination are directly applicable to the study of plant development (Schaefer and Zryd 1997). However, the fact that there are several quite similar and closely related genes in P. patens (both MIKCc and MIKC* types; figs. 1 and 4) makes it conceivable that these genes have completely or partially redundant functions. Functional redundancy is well known for closely related MADS-box genes in the flowering plant A. thaliana, such as CAULIFLOWER and APETALA1, the SHATTERPROOF genes (SHP1 and SHP2), and the SEPALLATA genes (SEP1, SEP2, SEP3); in several cases, single-gene knock-outs did not show any obvious phenotype (Kempin, Savidge, and Yanofsky 1995; Liljegren et al. 2000; Pelaz et al. 2001). It could well be, therefore, that more than one of the P. patens MADS-box genes reported here has to be knocked out to obtain a mutant phenotype. Preliminary evidence that this is indeed the case has already been obtained (unpublished data).

In any case, even more laborious efforts to determine the functions of P. patens MIKC genes may be worthwhile and ultimately revealing: the phenotypes of P. patens MIKCc gene knock-outs may give us no less than a clue about the ancestral functions of genes whose extant relatives are now involved in very special and important ontogenetic processes such as generating flowers. Because flowers are the hallmark of the flowering plants, the developmental processes governed by MIKCc genes are of crucial importance for the architecture of the flowering plant sporophyte, which is very different from the structure of both the moss sporophyte and the dominating moss gametophyte. Thus, by studying the functions of MIKCc genes in P. patens and comparing them with those of their flowering plant homologs, fascinating new insights into the evolution of developmental control gene functions may be obtained (Theißen, Münster, and Henschel 2001). Moreover, because the clubmoss L. annotinum may not be a suitable model system for functional studies, P. patens may currently provide the only chance to determine the functions of MIKC* genes.


    Acknowledgements
We thank René Strepp and Ralf Reski for a cDNA phage library representing P. patens protonema mRNA. We also thank the Automatic DNA Isolation and Sequencing team of the MPIZ for some of the sequencing work. This research was partly supported by grants from the Ministry of Education, Science, Culture, and Sports, Japan (M.H., R.K.).


    Footnotes
 
Diethard Tautz, Reviewing Editor

1 Present address: Lehrstuhl for Genetics, Friedrich Schiller University, Philosophenweg 12, D-07743 Jena, Germany

Abbreviations: aa, amino acid(s); C-domain, C-terminal domain; I, intervening; K, keratin-like; M-domain, MADS domain; RACE, rapid amplification of cDNA ends; RAGE, rapid amplification of genomic ends; UTR, untranslated region.

Keywords: MADS-box gene, Physcomitrella, moss, evolution

Address for correspondence and reprints: Günter Theißen, Lehrstuhl for Genetics, Friedrich Schiller University, Philosophenweg 12, D-07743 Jena, Germany. guenter.theissen@uni-jena.de


    References

    Alvarez-Buylla E. R., S. J. Liljegren, S. Pelaz, S. E. Gold, C. Burgeff, G. S. Ditta, F. Vergara-Silva, M. F. Yanofsky, 2000a. MADS-box gene evolution beyond flowers: expression in pollen, endosperm, guard cells, roots and trichomes Plant J 24:457-466[ISI][Medline]

    Alvarez-Buylla E. R., S. Pelaz, S. J. Liljegren, S. E. Gold, C. Burgeff, G. S. Ditta, L. R. De Pouplana, L. Martinez-Castilla, M. F. Yanofsky, 2000b. An ancestral MADS-box gene duplication occurred before the divergence of plants and animals Proc. Natl. Acad. Sci. USA 97:5328-5333[Abstract/Free Full Text]

    Becker A., K.-U. Winter, B. Meyer, H. Saedler, G. Theißen, 2000 MADS-box gene diversity in seed plants 300 million years ago Mol. Biol. Evol 17:1425-1434[Abstract/Free Full Text]

    Bhattacharya D., L. Medlin, 1998 Algal phylogeny and the origin of land plants Plant Physiol 116:9-15[Free Full Text]

    Black B. L., E. N. Olson, 1998 Transcriptional control of muscle development by myocyte enhancer factor-2 (MEF2) proteins Annu. Rev. Cell Dev. Biol 14:167-196[ISI][Medline]

    Busch M. A., K. Bomblies, D. Weigel, 1999 Activation of a floral homeotic gene in Arabidopsis Science 285:585-587[Abstract/Free Full Text]

    Carroll S. B., J. K. Grenier, S. D. Weatherbee, 2001 From DNA to diversity Blackwell Science, Malden, Mass

    Chomczynski P., N. Sacchi, 1987 Single-step method of RNA-isolation by acid guanidinium thiocyanate phenol-chloroform extraction Anal. Biochem 162:156-159[ISI][Medline]

    Cormack R. S., I. E. Somssich, 1997 Rapid amplification of genomic ends (RAGE) as a simple method to clone flanking genomic DNA Gene 194:273-276[ISI][Medline]

    Cove D. J., C. D. Knight, 1993 The moss Physcomitrella patens, a model system with potential for the study of plant reproduction Plant Cell 5:1483-1488[Free Full Text]

    Davidson E. H., 2001 Genomic regulatory systems Academic Press, San Diego, Calif

    Davies B., Z. Schwarz-Sommer, 1994 Control of floral organ identity by homeotic MADS-box transcription factors Pp. 235–258 in L. Nover, ed. Results and problems in cell differentiation, Vol. 20. Plant promoters and transcription factors. Springer, Berlin

    Egea-Cortines M., H. Saedler, H. Sommer, 1999 Ternary complex formation between the MADS-box proteins SQUAMOSA, DEFICIENS and GLOBOSA is involved in the control of floral architecture in Antirrhinum majus EMBO J 18:5370-5379[Abstract/Free Full Text]

    Fan H. Y., Y. Hu, M. Tudor, H. Ma, 1997 Specific interactions between the K domains of AG and AGLs, members of the MADS domain family of DNA binding proteins Plant J 12:999-1010[ISI][Medline]

    Frohman M. A., M. K. Dush, G. R. Martin, 1988 Rapid production of full-length cDNAs from rare transcripts: amplification using a single gene-specific oligonucleotide primer Proc. Natl. Acad. Sci. USA 85:8998-9002[Abstract]

    Gilbert S. F., J. M. Opitz, R. A. Raff, 1996 Resynthesizing evolutionary and developmental biology Dev. Biol 173:357-372[ISI][Medline]

    Gocal G. F. W., R. W. King, C. A. Blundell, O. M. Schwartz, C. H. Andersen, D. Weigel, 2001 Evolution of floral meristem identity genes. Analysis of Lolium temulentum genes related to APETALA1 and LEAFY of Arabidopsis Plant Physiol 125:1788-1801[Abstract/Free Full Text]

    Graham L. E., L. W. Wilcox, 2001 Algae Prince Hall, New Jersey

    Hasebe M., 1999 Evolution of reproductive organs in land plants J. Plant Res 112:463-474[ISI]

    Hasebe M., J. A. Banks, 1997 Evolution of MADS gene family in plants Pp. 179–197 in K. Iwatsuki and P. H. Raven, eds. Evolution and diversification of land plants. Springer-Verlag, Tokyo

    Hasebe M., C.-K. Wen, M. Kato, J. A. Banks, 1998 Characterization of MADS homeotic genes in the fern Ceratopteris richardii Proc. Natl. Acad. Sci. USA 95:6222-6227[Abstract/Free Full Text]

    Honma T., K. Goto, 2001 Complexes of MADS-box proteins are sufficient to convert leaves into floral organs Nature 409:525-529[ISI][Medline]

    Huijser P., J. Klein, W. E. Lönnig, H. Meijer, H. Saedler, H. Sommer, 1992 Bractomania, an inflorescence anomaly, is caused by the loss of function of the MADS-box gene squamosa in Antirrhinum majus EMBO J 11:1239-1249[Abstract]

    Kempin S. A., B. Savidge, M. F. Yanofsky, 1995 Molecular basis of the cauliflower phenotype in Arabidopsis Science 267:522-525[ISI][Medline]

    Kenrick P., P. R. Crane, 1997 The origin and early evolution of plants on land Nature 389:33-39[ISI]

    Kofuji R., K. Yamaguchi, 1997 Isolation and phylogenetic analysis of MADS genes from the fern Ceratopteris richardii J. Phytogeogr. Taxon 45:83-91

    Krogan N. T., N. W. Ashton, 2000 Ancestry of plant MADS-box genes revealed by bryophyte (Physcomitrella patens) homologues New Phytol 147:505-517[ISI]

    Kyozuka J., R. Harcourt, W. J. Peacock, E. S. Dennis, 1997 Eucalyptus has functional equivalents of the Arabidopsis AP1 gene Plant Mol. Biol 35:573-584[ISI][Medline]

    Liljegren S. J., G. S. Ditta, Y. Eshed, B. Savidge, J. L. Bowman, M. F. Yanofsky, 2000 SHATTERPROOF MADS-box genes control seed dispersal in Arabidopsis Nature 404:766-770[ISI][Medline]

    Ma H., M. F. Yanofsky, E. M. Meyerowitz, 1991 AGL1-AGL6, an Arabidopsis gene family with similarity to floral homeotic and transcription factor genes Genes Dev 5:484-495[Abstract]

    Münster T., J. Pahnke, A. Di Rosa, J. T. Kim, W. Martin, H. Saedler, G. Theißen, 1997 Floral homeotic genes were recruited from homologous MADS-box genes preexisting in the common ancestor of ferns and seed plants Proc. Natl. Acad. Sci. USA 94:2415-2420

    Nickrent D. L., C. L. Parkinson, J. D. Palmer, R. J. Duff, 2000 Multigene phylogeny of land plants with special reference to bryophytes and the earliest land plants Mol. Biol. Evol 17:1885-1895

    Nishiyama T., M. Kato, 1999 Molecular phylogenetic analysis among bryophytes and tracheophytes based on combined data of plastid coded genes and the 18S rRNA gene Mol. Biol. Evol 16:1027-1036

    Pelaz S., C. Gustafson-Brown, S. E. Kohalmi, W. L. Crosby, M. F. Yanofsky, 2001 APETALA1 and SEPALLATA3 interact to promote flower development Plant J 26:385-394

    Pryer K. M., H. Schneider, A. R. Smith, R. Cranfill, P. G. Wolf, J. S. Hunt, S. D. Sipes, 2001 Horsetails and ferns are a monophyletic group and the closest living relatives to seed plants Nature 409:618-622

    Purugganan M. D., 1997 The MADS-box floral homeotic gene lineages predate the origin of seed plants: phylogenetic and molecular clock estimates J. Mol. Evol 45:392-396

    Qiu Y.-L., J. D. Palmer, 1999 Phylogeny of early land plants: insights from genes and genomes Trends Plant Sci 4:26-30

    Reski R., 1999 Molecular genetics of Physcomitrella Planta 208:301-309

    Riechmann J. L., J. Heard, G. Martin, et al. (14 co-authors) 2000 Arabidopsis transcription factors: genome-wide comparative analysis among eukaryotes Science 290:2105-2110

    Riechmann J. L., B. A. Krizek, E. M. Meyerowitz, 1996 Dimerization specificity of Arabidopsis MADS domain homeotic proteins APETALA1, APETALA3, PISTILLATA, and AGAMOUS Proc. Natl. Acad. Sci. USA 93:4793-4798

    Riechmann J. L., E. M. Meyerowitz, 1997 MADS domain proteins in plant development Biol. Chem 378:1079-1101

    Rounsley S. D., G. S. Ditta, M. F. Yanofsky, 1995 Diverse roles for MADS box genes in Arabidopsis development Plant Cell 7:1259-1269

    Sambrook J., E. F. Fritsch, T. Maniatis, 1989 Molecular cloning: a laboratory manual. 2nd edition Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY

    Schaefer D. G., 2001 Gene targeting in Physcomitrella patens Curr. Opin. Plant Biol 4:143-150

    Schaefer D. G., J.-P. Zryd, 1997 Efficient gene targeting in the moss Physcomitrella patens Plant J 11:1195-1206

    Schwarz-Sommer Z., I. Hue, P. Huijser, P. J. Flor, R. Hansen, F. Tetens, W. E. Lönnig, H. Saedler, H. Sommer, 1992 Characterization of the Antirrhinum floral homeotic MADS-box gene deficiens: evidence for DNA binding and autoregulation of its persistent expression throughout flower development EMBO J 11:251-263

    Schwarz-Sommer Z., P. Huijser, W. Nacken, H. Saedler, H. Sommer, 1990 Genetic control of flower development by homeotic genes in Antirrhinum majus Science 250:931-936

    Siebert P. D., A. Chenchik, D. E. Kellogg, K. A. Lukyanov, S. A. Lukyanov, 1995 An improved PCR method for walking in uncloned genomic DNA Nucleic Acids Res 23:1087-1088

    Svensson M. E., H. Johannesson, P. Engström, 2000 The LAMB1 gene from the clubmoss, Lycopodium annotinum, is a divergent MADS-box gene, expressed specifically in sporogenic structures Gene 253:31-43

    Theißen G., 2001 Development of floral organ identity: stories from the MADS house Curr. Opin. Plant Biol 4:75-85

    Theißen G., A. Becker, A. Di Rosa, A. Kanno, J. T. Kim, T. Münster, K.-U. Winter, H. Saedler, 2000 A short history of MADS-box genes in plants Plant Mol. Biol 42:115-149

    Theißen G., J. Kim, H. Saedler, 1996 Classification and phylogeny of the MADS-box multigene family suggest defined roles of MADS-box gene subfamilies in the morphological evolution of eukaryotes J. Mol. Evol 43:484-516

    Theißen G., T. Münster, K. Henschel, 2001 Why don't mosses flower? New Phytol 150:1-5

    Theißen G., H. Saedler, 1995 MADS-box genes in plant ontogeny and phylogeny: Haeckel's ‘biogenetic law’ revisited Curr. Opin. Genet. Dev 5:628-639

    ———. 1999 The golden decade of molecular floral development (1990–1999): a cheerful obituary Dev. Genet 25:181-193

    ———. 2001 Floral quartets Nature 409:469-471

    Theißen G., T. Strater, A. Fischer, H. Saedler, 1995 Structural characterization, chromosomal localization and phylogenetic evaluation of two pairs of AGAMOUS-like MADS-box genes from maize Gene 156:155-166

    Tröbner W., L. Ramirez, P. Motte, I. Hue, P. Huijser, W. E. Lönnig, H. Saedler, H. Sommer, Z. Schwarz-Sommer, 1992 GLOBOSA: a homeotic gene which interacts with DEFICIENS in the control of Antirrhinum floral organogenesis EMBO J 11:4693-4704

    Vergara-Silva F., L. Martinez-Castilla, E. R. Alvarez-Buylla, 2000 MADS-box genes: development and evolution of plant body plans J. Phycol 36:803-812

    Weigel D., E. M. Meyerowitz, 1994 The ABCs of floral homeotic genes Cell 78:203-209

Accepted for publication January 2, 2002.