1 Department of Veterinary Pathobiology, University of Illinois at Urbana-Champaign, Urbana, IL 61802, USA
2 Department of Biological Sciences, University of Iowa, Iowa City, IA 52242, USA
Correspondence
Lois L. Hoyer
lhoyer{at}uiuc.edu
ABSTRACT
The GenBank accession numbers for the sequences reported in this paper are AY227440 (ALS5-1), AY227439 (ALS5-2), AY269422 (ALS9-2), AY269423 (ALS9-1) and AY296650 (ALS7).
INTRODUCTION
Characterization of ALS genes is necessary to place functional observations into an appropriate context. Previous data have shown that sequence variability occurs at various levels within the ALS family. In addition to the variability in the tandem repeat domain discussed above, certain ALS genes also encode other repeated regions that differ between alleles; the best example is ALS7, which has a complex repeated region in its 3' domain (Hoyer & Hecht, 2000; Zhang et al., 2002). Also, examination of various C. albicans isolates showed that certain strains lack ALS genes that are present in others (Hoyer & Hecht, 2000, 2001). Finally, allelic sequence polymorphisms can be found outside of repeated regions. These sequence differences are more pronounced within the 3' end of the gene, which encodes the heavily glycosylated portion of the mature protein (Hoyer et al., 1998b; Hoyer & Hecht, 2001). However, examination of two alleles of ALS5 from different strains indicated that sequence polymorphisms were also found in the 5' domain. Proteins encoded by these alleles have 2·8 % amino acid differences within the N-terminal domain. This figure was at least double the frequency of sequence differences observed for other C. albicans genes for which alleles had been sequenced (Hoyer & Hecht, 2001). At this time, it is unknown whether these sequence differences are sufficient to result in altered Als protein function.
In C. albicans, ALS genes are found on three different chromosomes: chromosome 3 (ALS6 and ALS7), R (ALS3) and 6 (ALS1, ALS2, ALS4, ALS5; Wickes et al., 1991; Hoyer et al., 2001). Additional mapping efforts showed that although several ALS genes are found on chromosome 6, ALS2 and ALS4 are on SfiI fragment C while ALS1 and ALS5 are on SfiI fragment O (Chu et al., 1993; Hoyer et al., 1998b). Work with a C. albicans genomic fosmid library indicated that ALS1 and ALS5 were found on the same fosmids and, therefore, located within approximately 50 kb of each other (Hoyer & Hecht, 2001).
Indications of the existence of ALS9 were first found among preliminary data from the C. albicans genome project (http://www-sequence.stanford.edu/group/candida). Although this sequence has been recognized as an ALS gene (Hoyer, 2001), this manuscript is the first description of its structure. Genome annotation (http://genolist.pasteur.fr/CandidaDB) suggested that strain SC5314 encodes two divergent ALS9-like sequences. These sources also suggested that ALS5, ALS1 and ALS9 were contiguous in the C. albicans genome. The goal of the work presented here was to characterize the variability within the ALS9 coding region and to establish its physical location with respect to other ALS genes. Here we confirm the sequence divergence of the ALS9 alleles in strain SC5314 and demonstrate that ALS5, ALS1 and ALS9 are contiguous on C. albicans chromosome 6. We also examined the allelic diversity of ALS9 among a larger collection of C. albicans strains and the allelic arrangement of ALS genes on the homologous copies of chromosome 6. These data show that the sequence divergence observed for ALS9 alleles is much greater than that previously documented for other ALS genes and reinforce the conclusion that sequence divergence is not limited to the repeated regions of ALS genes.
METHODS
PCR reactions.
Throughout this manuscript, primers are designated by numbers and the corresponding sequence is listed in Table 2. Primers were synthesized by Integrated DNA Technologies (Coralville, IA, USA). PCR amplifications were performed using an I-Cycler (Bio-Rad). Reactions varied depending on the experimental goal and product size. In general, a typical reaction contained 200 ng genomic DNA, 1·5 mM (for fragments less than 2 kb) or 3 mM (for fragments larger than 2 kb) MgCl2, 2 µM each primer, 0·2 mM each dNTP and 2·5 units Taq (Invitrogen) or Pfu Turbo polymerase (Stratagene). Pfu polymerase was used for amplification of larger fragments or when a proofreading polymerase was needed to maintain sequence fidelity. PCR reactions were incubated at 95 °C for 5 min, followed by 25–35 cycles of 95 °C denaturation for 30 s, primer annealing at 50–57 °C for 45 s and elongation at 72 °C for 1–10 min depending on the length of expected products. Each reaction included a final 7 min extension at 72 °C to complete elongation of products.
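The reagent choices described above can be sketched as a small decision rule. This is only an illustration, not the authors' protocol software: the names `PcrSetup` and `choose_pcr_setup` are hypothetical, and two points are assumptions because the text does not specify them: a product of exactly 2 kb is treated as "large", and the same 2 kb cutoff is used to decide when Pfu Turbo replaces Taq.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PcrSetup:
    """Reagent choices for one reaction (hypothetical container)."""
    mgcl2_mM: float
    polymerase: str


def choose_pcr_setup(product_kb: float, need_proofreading: bool = False) -> PcrSetup:
    """Pick MgCl2 concentration and polymerase from the expected product size.

    Assumptions (not stated in the text): exactly 2 kb counts as "large",
    and "larger fragments" for Pfu means the same 2 kb cutoff.
    """
    if product_kb <= 0:
        raise ValueError("product size must be positive")
    large = product_kb >= 2.0
    # 1.5 mM MgCl2 for fragments under 2 kb, 3 mM for larger fragments.
    mgcl2 = 3.0 if large else 1.5
    # Pfu Turbo for larger fragments or when sequence fidelity matters.
    polymerase = "Pfu Turbo" if (large or need_proofreading) else "Taq"
    return PcrSetup(mgcl2_mM=mgcl2, polymerase=polymerase)
```

For example, `choose_pcr_setup(2.9)` selects 3 mM MgCl2 and Pfu Turbo, while a short product that will be sequenced, `choose_pcr_setup(1.0, need_proofreading=True)`, keeps 1·5 mM MgCl2 but switches to the proofreading enzyme.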
Phylogenetic analysis.
Phylogenetic analysis utilized DNA sequences of the 5' domains (approx. 1300 nt) from each C. albicans ALS gene and the two alleles of ALS9. All gene sequences were from strain SC5314. Sequences were aligned using CLUSTALW (Combet et al., 2000) with a gap opening penalty of 10 and a gap extension penalty of 0·05. Phylogenetic analysis used PAUP (Phylogenetic Analysis Using Parsimony) in the Wisconsin Package. The maximum-parsimony (MP), distance with minimum-evolution (ME) model and maximum-likelihood (ML) criteria were employed. The F84 model was chosen for ME and ML analysis to ameliorate AT-rich biases for DNA data. The gamma model was used to correct site-to-site heterogeneity. The gamma parameter α was estimated from each dataset by PAUP. Amino acid sequences were translated from the nucleotide sequences using the alternate yeast genetic code (Santos & Tuite, 1995). Amino acid sequences were aligned by specifying a gap opening penalty of 10 and a gap extension penalty of 0·1. The protein phylogenetic trees were constructed using parsimony and distance criteria. The best trees were searched by evaluating all possible trees using an exhaustive search for MP and ME analysis. A heuristic search was used for ML analysis. Bootstrap with 1000 replicates was performed for all the analyses. TREEVIEW software (Page, 1996) was used for printing phylogenetic trees.
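The translation step relies on the alternate yeast genetic code, in which CUG (CTG in DNA) is decoded as serine rather than leucine (Santos & Tuite, 1995). A minimal sketch of that difference follows; it is not the tool used in the study, and the function names `build_codon_table` and `translate` are hypothetical. Incomplete trailing codons are ignored and unrecognized codons are reported as `X`, both of which are choices made here for illustration.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
STANDARD_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def build_codon_table(ctg_as_serine: bool = True) -> dict:
    """Return a codon -> amino acid map; optionally reassign CTG to serine."""
    table = {}
    index = 0
    for b1 in BASES:
        for b2 in BASES:
            for b3 in BASES:
                table[b1 + b2 + b3] = STANDARD_AAS[index]
                index += 1
    if ctg_as_serine:
        # Alternate yeast nuclear code: CUG is read as Ser, not Leu.
        table["CTG"] = "S"
    return table


def translate(dna: str, ctg_as_serine: bool = True) -> str:
    """Translate a coding sequence in frame 1; '*' marks stop codons."""
    dna = dna.upper().replace("U", "T")
    table = build_codon_table(ctg_as_serine)
    usable = len(dna) - len(dna) % 3  # drop an incomplete trailing codon
    return "".join(table.get(dna[i:i + 3], "X") for i in range(0, usable, 3))
```

Calling `translate("ATGCTGTAA")` gives `MS*`, whereas `translate("ATGCTGTAA", ctg_as_serine=False)` gives `ML*`, which shows why the choice of code matters before protein sequences are aligned.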
RESULTS
The ALS gene order was corroborated by construction of mutants that demonstrated ALS5 and ALS1 could be removed using a single deletion step. The deletion cassette was amplified from pDDB57 using primers specific for the region 5' of ALS5 and 3' of ALS1 (Fig. 2). Transformation of a deletion cassette amplified with primers 9 and 10 (Table 2) removed either 12·5 (clone 2027) or 15 kb (clone 2025) of sequence from strain CAI4. The size differences in these bands were dependent on which alleles of ALS5 and ALS1 were removed. A similar cassette amplified with primers 15 and 16 (Table 2) was transformed into CAI4 to delete ALS1 and ALS9 simultaneously. This transformation yielded clone 2166 in which 14·5 kb of sequence containing the ALS1-1 and ALS9-1 alleles was deleted (Fig. 3). The strains created using this technique were evaluated further by PCR using primers 21 and 22 for clones 2027 and 2025 (Fig. 2), and primers 20 and 23 for clone 2166 (Fig. 3). In each case, the PCR primers amplified across the deleted region and produced fragments of the expected size. DNA sequencing of these products verified that the ALS5–ALS1 deletions removed the region from -39 of ALS5 to +1 of ALS1. The ALS1–ALS9 deletion removed the region from -57 of ALS1 to +30 of ALS9. Removal of each pair of genes in a single transformation step supported the conclusion that the ALS5, ALS1 and ALS9 genes were contiguous on C. albicans SC5314 chromosome 6.
Unlike some ALS genes, alleles of ALS9 from strain SC5314 could not be distinguished from each other based on agarose gel mobility (see below). The first ALS9 allele (ALS9-2) was isolated by PCR amplification using primers 24 and 25 (Table 2) and SC5314 genomic DNA (Fig. 4). To isolate the second allele (ALS9-1), we created a strain lacking the previously isolated sequence. Strain 1976 was constructed by transformation of CAI4 with a disruption cassette amplified from plasmid pDDB57 using primers 26 and 16 and verified by Southern blotting (Fig. 4). PCR primers 27 and 28 were used to amplify the second ALS9 allele from strain 1976 genomic DNA (Fig. 4). Both ALS9-encoding fragments were cloned into the vector pCRBlunt (Invitrogen) for DNA sequence analysis.
The physical location of the two sequences within the SC5314 genome supported the conclusion that these divergent sequences encoded ALS9 alleles rather than different genes. Since the 5' domains of the two coding regions were divergent (see below), we designed primers specific for each coding region (primer 29 for ALS9-1 and primer 30 for ALS9-2; Table 2; Fig. 4). When paired with primer 3 (hybridizes within the 3' end of ALS1; Fig. 1), the ALS9-specific primers each amplified the expected 2·9 kb fragment from SC5314 genomic DNA (Fig. 5, lanes 1 and 2). However, amplification of genomic DNA from strain 2166 only yielded a product with primer 30, demonstrating that the primer 29-specific sequences were absent in this strain (Fig. 5, lanes 3 and 4). The converse result was obtained with genomic DNA from strain 1976, where only the primer 3/primer 29 pair generated a product (Fig. 5, lanes 5 and 6). These results provide further evidence that the two ALS9 sequences are allelic and that each is located 2·8 kb downstream of the ALS1 coding region on homologous copies of chromosome 6.
One heterozygous ALS9-1/ALS9-2 strain from each genetic clade (P52037, P22095, P52098, P22059 and OKP109) was selected for DNA sequencing of a portion of the 5' domain from each allele. Primers 41 and 42 were used to amplify ALS9-1 and primers 43 and 44 amplified ALS9-2 (Fig. 6). Alignment of the resulting sequences indicated that, for each allele, there were only 5 nt changes across the approximately 1050 bp of sequence derived from each strain (data not shown). Therefore, although the 5' domains of the ALS9-1 and ALS9-2 groups of alleles are quite divergent from each other, their sequences are highly conserved across a diverse group of strains. For comparative purposes, the same analysis was performed for ALS5, since non-tandem-repeat sequence differences in this gene had previously represented the greatest allelic sequence divergence (Hoyer & Hecht, 2001). In the previous work, the sequence differences were between alleles from strains 1161 (GenBank accession number AF068866) and CA1 (GenBank accession number AF025429). In addition to the strains from which ALS9 alleles were analysed above, one additional strain from each genetic clade was added to the analysis (K221, P57047, P57069, P57096 and OKP77). PCR amplification and DNA sequence analysis of approximately the first 850 bp of the ALS5 coding region showed that the alleles from 1161 and CA1 are the most divergent (19 nt changes) and that in positions of nucleotide changes, the sequence of the CA1 allele was more representative of the strains examined than the allele from 1161 (data not shown). Only the sequence from strain P52098 more closely resembled that from 1161. These data support the conclusion that the ALS9 allelic sequence divergence is greater than that previously observed for other ALS genes.
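To put the counts above on a common scale, the nucleotide changes can be expressed as a percentage of the sequenced length. The sketch below is a back-of-the-envelope calculation only: the helper `percent_divergence` is a hypothetical name, and the lengths are the approximate figures quoted in the text (about 1050 bp for ALS9 and about 850 bp for ALS5), so the results are estimates.

```python
def percent_divergence(nt_changes: int, length_bp: int) -> float:
    """Nucleotide changes as a percentage of the compared sequence length."""
    if length_bp <= 0:
        raise ValueError("sequence length must be positive")
    return 100.0 * nt_changes / length_bp


# Variation within one ALS9 allele group across strains (approx. 1050 bp).
als9_within_group = percent_divergence(5, 1050)   # roughly 0.5 %

# Most divergent pair of ALS5 alleles, strains 1161 and CA1 (approx. 850 bp).
als5_most_divergent = percent_divergence(19, 850)  # roughly 2.2 %
```

On these approximate lengths, the most divergent ALS5 alleles differ at roughly 2 % of positions, while sequences within each ALS9 allele group differ at about 0·5 %.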
Phylogenetic analysis of ALS family
Across the ALS family, the 5' domain is the most conserved portion of the genes in both length and sequence (Hoyer, 2001). By analogy to the structure of S. cerevisiae α-agglutinin (Chen et al., 1995), this region of the protein is most likely to be involved in adhesive interactions and, therefore, carry functional significance. Sequence alignment showed that the 5' domains of ALS9-1 and ALS9-2 were only 89 % identical. The significance of this observation is brought into perspective considering that the 5' domains of ALS1 and ALS5 are 90 % identical. On the amino acid level, the N-terminal domains of Als9-1p and Als9-2p are 84 % identical compared to the 87 % identity between Als1p and Als5p and the 85 % identity between Als1p and Als3p. Therefore, the ALS9 allelic sequences and their encoded proteins are less similar than sequences from different loci in the ALS family. To better understand the relationship between the ALS9 alleles and their relationship to the remainder of the ALS genes, we conducted a phylogenetic analysis.
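Percent identity figures such as those above are computed from a pairwise alignment. The following sketch shows one common way to do so; the function name `percent_identity` is hypothetical, and the convention used here (columns containing a gap in either sequence are excluded from the denominator) is an assumption, since the text does not state which convention was applied.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # assumption: gapped columns are not counted
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared
```

The same function works for nucleotide or amino acid alignments, so it could be applied both to the 5' domain DNA sequences and to the translated N-terminal domains. Other conventions (for example, counting gapped columns as mismatches) would give somewhat lower values.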
Phylogenetic analysis utilized DNA sequences from the 5' domains of each C. albicans ALS gene and the two alleles of ALS9. All sequences were from strain SC5314. The DNA dataset generated a single identical tree (Fig. 9) using MP, distance with ME (distance measured by F84 model) and ML (gamma model, α=0·7). This tree was based on 1320 characters of which 602 were constant and 558 were parsimony-informative. The parsimony tree length was 1341 with a consistency index of 0·76, an ME score of 0·88 and -ln likelihood of 7393. In this tree, ALS9-1 and ALS9-2 grouped together with 100 % bootstrap support. Protein sequence phylogenetic trees were constructed using parsimony and distance criteria. The protein dataset produced three most parsimonious trees and a single distance tree that was identical to one of the three best parsimony trees. These trees were similar to the best DNA tree in the major branch pattern. Als9-1p and Als9-2p formed a clade with 100 % bootstrap support in all these trees. These results indicated that, despite their large sequence differences, the two ALS9 alleles were more similar to each other than to any other ALS genes.
DISCUSSION
Clues to the existence of ALS9 were first found among data from the C. albicans genome sequencing project (http://www-sequence.stanford.edu/group/candida) and the potential for allelic divergence was evident in the CandidaDB annotation (http://genolist.pasteur.fr/CandidaDB). Data from our studies have refined the information from these sources. Within the current release of the genome sequence (assembly 19), the assembly of individual ALS alleles was largely correct, but their organization onto homologous copies of chromosome 6 was not. In addition, our PCR data indicated that the region upstream of ALS5 is variable in length between alleles and the longer sequence we detected was absent from the genome assembly. The high degree of allelic sequence variability in these loci contributes to the challenge of accurate diploid sequence assembly.
Previously characterized examples demonstrated that different manifestations of allelic diversity are found in C. albicans. Indications of allelism in C. albicans were initially deduced using a spheroplast fusion approach to studying resistance to 5-fluorocytosine (Whelan et al., 1986; Whelan, 1987). Molecular biological techniques have offered several examples of C. albicans allelism, including detection of non-functional alleles in a study of the ERG3 gene (Miyazaki et al., 1999). In this work, nucleotide changes led to a premature stop codon in one allele and three amino acid changes in another that rendered both incapable of producing functional C-5 sterol desaturase. Allelic sequence variation has been detected outside coding regions as demonstrated by Yesland & Fonzi (2000) who showed that nucleotide differences in regions flanking PHR1 were responsible for biased targeting of disruption cassettes during mutant construction. Staib et al. (2002) documented allelic sequence variation in the promoter region of SAP2 and showed that the divergent sequences resulted in differential regulation of the alleles. To date, the largest published example of C. albicans allelic variation is between the a and alpha alleles of PAP, OBP and PIK in the mating type locus, where alleles are only approximately 60 % identical (Hull & Johnson, 1999). The net effect of this sequence divergence on protein function has yet to be determined.
The strong allelic variation observed for ALS9 may suggest distinct functional specificities. Sequence differences between the alleles are not random and it appears that the two alleles evolved independently, perhaps for different functions. Evidence presented here also highlights the potential for recombination between the alleles, suggesting that ALS9-1 and ALS9-2 may not be completely independent at this time. Understanding the effect of sequence differences on protein function is of primary importance as we continue to characterize the ALS family. Future work using the strains and approaches presented here will explore these relationships.
ACKNOWLEDGEMENTS
REFERENCES
Chen, M. H., Shen, Z. M., Bobin, S., Kahn, P. C. & Lipke, P. N. (1995). Structure of Saccharomyces cerevisiae alpha-agglutinin. Evidence for a yeast cell wall protein with multiple immunoglobulin-like domains with atypical disulfides. J Biol Chem 270, 26168–26177.
Chu, W. S., Magee, B. B. & Magee, P. T. (1993). Construction of an SfiI macrorestriction map of the Candida albicans genome. J Bacteriol 175, 6637–6651.
Combet, C., Blanchet, C., Geourjon, C. & Deleage, G. (2000). NPS@: network protein sequence analysis. Trends Biochem Sci 25, 147–150.
De Bernardis, F., Sullivan, P. A. & Cassone, A. (2001). Aspartyl proteinases of Candida albicans and their role in pathogenicity. Med Mycol 39, 303–313.
Fonzi, W. A. & Irwin, M. Y. (1993). Isogenic strain construction and gene mapping in Candida albicans. Genetics 134, 717–728.
Fu, Y., Ibrahim, A. S., Sheppard, D. C., Chen, Y. C., French, S. W., Cutler, J. E., Filler, S. G. & Edwards, J. E., Jr (2002). Candida albicans Als1p: an adhesin that is a downstream effector of the EFG1 filamentation pathway. Mol Microbiol 44, 61–72.
Gaur, N. K. & Klotz, S. A. (1997). Expression, cloning, and characterization of a Candida albicans gene, ALA1, that confers adherence properties upon Saccharomyces cerevisiae for extracellular matrix proteins. Infect Immun 65, 5289–5294.
Gaur, N. K., Klotz, S. A. & Henderson, R. L. (1999). Overexpression of the Candida albicans ALA1 gene in Saccharomyces cerevisiae results in aggregation following attachment of yeast cells to extracellular matrix proteins, adherence properties similar to those of Candida albicans. Infect Immun 67, 6040–6047.
Gillum, A. M., Tsay, E. Y. & Kirsch, D. R. (1984). Isolation of the Candida albicans genes for orotidine-5'-phosphate decarboxylase by complementation of S. cerevisiae ura3 and E. coli pyrF mutations. Mol Gen Genet 198, 179–182.
Hicks, J. B. & Herskowitz, I. (1976). Interconversion of yeast mating types. I. Direct observations of the action of the homothallism (HO) gene. Genetics 83, 245–258.
Hoyer, L. L. (2001). The ALS gene family of Candida albicans. Trends Microbiol 9, 176–180.
Hoyer, L. L. & Hecht, J. E. (2000). The ALS6 and ALS7 genes of Candida albicans. Yeast 16, 847–855.
Hoyer, L. L. & Hecht, J. E. (2001). The ALS5 gene of Candida albicans and analysis of the Als5p N-terminal domain. Yeast 18, 49–60.
Hoyer, L. L., Scherer, S., Shatzman, A. R. & Livi, G. P. (1995). Candida albicans ALS1: domains related to a Saccharomyces cerevisiae sexual agglutinin separated by a repeating motif. Mol Microbiol 15, 39–54.
Hoyer, L. L., Payne, T. L., Bell, M., Myers, A. M. & Scherer, S. (1998a). Candida albicans ALS3 and insights into the nature of the ALS gene family. Curr Genet 33, 451–459.
Hoyer, L. L., Payne, T. L. & Hecht, J. E. (1998b). Identification of Candida albicans ALS2 and ALS4 and localization of Als proteins to the fungal cell surface. J Bacteriol 180, 5334–5343.
Hoyer, L. L., Fundyga, R., Hecht, J. E., Kapteyn, J. C., Klis, F. M. & Arnold, J. (2001). Characterization of agglutinin-like sequence genes from non-albicans Candida and phylogenetic analysis of the ALS family. Genetics 157, 1555–1567.
Hube, B. & Naglik, J. (2001). Candida albicans proteinases: resolving the mystery of a gene family. Microbiology 147, 1997–2005.
Hube, B., Stehr, F., Bossenz, M., Mazur, A., Kretschmar, M. & Schafer, W. (2000). Secreted lipases of Candida albicans: cloning, characterisation and expression analysis of a new gene family with at least ten members. Arch Microbiol 174, 362–374.
Hull, C. M. & Johnson, A. D. (1999). Identification of a mating type-like locus in the asexual pathogenic yeast Candida albicans. Science 285, 1271–1275.
Jentoft, N. (1990). Why are proteins O-glycosylated? Trends Biochem Sci 15, 291–294.
Kapteyn, J. C., Hoyer, L. L., Hecht, J. E., Muller, W. H., Andel, A., Verkleij, A. J., Makarow, M., Van Den Ende, H. & Klis, F. M. (2000). The cell wall architecture of Candida albicans wild-type cells and cell wall-defective mutants. Mol Microbiol 35, 601–611.
Miyazaki, Y., Geber, A., Miyazaki, H., Falconer, D., Parkinson, T., Hitchcock, C., Grimberg, B., Nyswaner, K. & Bennett, J. E. (1999). Cloning, sequencing, expression and allelic sequence diversity of ERG3 (C-5 sterol desaturase gene) in Candida albicans. Gene 236, 43–51.
Monod, M. & Borg-von Zepelin, M. (2002). Secreted proteinases and other virulence mechanisms of Candida albicans. Chem Immunol 81, 114–128.
Odds, F. C. (1988). Candida and Candidosis, 2nd edn. London: Baillière Tindall.
Page, R. D. (1996). TreeView: an application to display phylogenetic trees on personal computers. Comput Appl Biosci 12, 357–358.
Pujol, C., Pfaller, M. & Soll, D. R. (2002). Ca3 fingerprinting of Candida albicans bloodstream isolates from the United States, Canada, South America, and Europe reveals a European clade. J Clin Microbiol 40, 2729–2740.
Santos, M. A. & Tuite, M. F. (1995). The CUG codon is decoded in vivo as serine and not leucine in Candida albicans. Nucleic Acids Res 23, 1481–1486.
Staib, P., Kretschmar, M., Nichterlein, T., Hof, H. & Morschhauser, J. (2002). Host versus in vitro signals and intrastrain allelic differences in the expression of a Candida albicans virulence gene. Mol Microbiol 44, 1351–1366.
Whelan, W. L. (1987). The genetic basis of resistance to 5-fluorocytosine in Candida species and Cryptococcus neoformans. Crit Rev Microbiol 15, 45–56.
Whelan, W. L., Markie, D. & Kwon-Chung, K. J. (1986). Complementation analysis of resistance to 5-fluorocytosine in Candida albicans. Antimicrob Agents Chemother 29, 726–729.
Wickes, B., Staudinger, J., Magee, B. B., Kwon-Chung, K. J., Magee, P. T. & Scherer, S. (1991). Physical and genetic mapping of Candida albicans: several genes previously assigned to chromosome 1 map to chromosome R, the rDNA-containing linkage group. Infect Immun 59, 2480–2484.
Wilson, R. B., Davis, D., Enloe, B. M. & Mitchell, A. P. (2000). A recyclable Candida albicans URA3 cassette for PCR product-directed gene disruption. Yeast 16, 65–70.
Yesland, K. & Fonzi, W. A. (2000). Allele-specific gene targeting in Candida albicans results from heterology between alleles. Microbiology 146, 2097–2104.
Zhang, N., Harrex, A. L., Holland, B., Cannon, R. D. & Schmid, J. (2002). Genomic markers of pathogenicity of Candida albicans. Abstract S-9. ASM Conference on Candida and Candidiasis, Tampa, FL, USA. Washington, DC: American Society for Microbiology.
Received 18 May 2003;
revised 2 July 2003;
accepted 3 July 2003.