Allelic variation in the contiguous loci encoding Candida albicans ALS5, ALS1 and ALS9

Xiaomin Zhao1, Claude Pujol2, David R. Soll2 and Lois L. Hoyer1

1 Department of Veterinary Pathobiology, University of Illinois at Urbana-Champaign, Urbana, IL 61802, USA
2 Department of Biological Sciences, University of Iowa, Iowa City, IA 52242, USA

Correspondence
Lois L. Hoyer
lhoyer{at}uiuc.edu


   ABSTRACT
 
The ALS gene family of Candida albicans consists of eight genes (ALS1 to ALS7 and ALS9) that encode cell-wall glycoproteins involved in adhesion to host surfaces. Considerable allelic sequence variability has been documented for regions of ALS genes encoding repeated sequences. Although regions of ALS genes encoding non-repeated sequences tend to be more conserved, some sequence divergence has been noted, particularly for alleles of ALS5. Data from the C. albicans genome sequencing project provided the first indication that strain SC5314 encoded two divergent ALS9-like sequences and that three of the ALS genes (ALS5, ALS1 and ALS9) were contiguous on chromosome 6. Data from PCR analysis and construction of both single and double deletion mutants indicated that the divergent sequences were alleles of ALS9, and located downstream of ALS5 and ALS1. Sequences within the 5' domain of ALS9-1 and ALS9-2 varied by 11 %. Within the 3' domain of each allele, extra nucleotides were present in two regions of ALS9-2, designated Variable Block 1 (VB1) and Variable Block 2 (VB2). Analysis of strains from the five major C. albicans genetic clades showed that both ALS9 alleles are widespread among these strains, that the sequences of ALS9-1 and ALS9-2 are conserved among diverse strains and that recombinant ALS9 alleles have been generated during C. albicans evolution. Phylogenetic analysis showed that, although divergent in sequence, ALS9 alleles are more similar to each other than to any other ALS genes. The degree of sequence divergence for ALS9 greatly exceeds that observed previously for other ALS genes and may result in functional differences for the proteins encoded by the two alleles.


Abbreviations: ALS, agglutinin-like sequence; ME, distance with minimum-evolution; ML, maximum-likelihood; MP, maximum-parsimony; VB, variable block

The GenBank accession numbers for the sequences reported in this paper are AY227440 (ALS5-1), AY227439 (ALS5-2), AY269422 (ALS9-2), AY269423 (ALS9-1) and AY296650 (ALS7).


   INTRODUCTION
 
Candida albicans is an opportunistic fungal pathogen that causes oral and vaginal mucosal infections as well as systemic disease (Odds, 1988). C. albicans has several gene families that encode proteins involved in pathogenesis (De Bernardis et al., 2001; Hube & Naglik, 2001; Hube et al., 2000; Monod & Borg-von Zepelin, 2002). Among these is the ALS (Agglutinin-Like Sequence) family that encodes large cell-surface glycoproteins (Hoyer, 2001). The ALS family includes eight genes named ALS1 to ALS7 and ALS9. Recent work showed that ALS3 and ALS8, which were presumed to represent separate physical loci, are the same gene, eliminating ALS8 from the family nomenclature (X. Zhao, J. A. Nuessen, R. P. Leng, A. J. P. Brown & L. L. Hoyer, unpublished). Characterization of the ALS family revealed a common three-domain gene structure (Hoyer, 2001). In the centre of each coding region is a domain composed entirely of tandem copies of a highly conserved 108 bp unit. Within a given ALS gene, the size of this region varies considerably between alleles due to differences in the number of copies of the 108 bp sequence present (Hoyer, 2001). Flanking this central tandem repeat region are a 5' domain of approximately 1·3 kb that is 55–90 % conserved across the family and a 3' domain of variable length and sequence. The tandem repeat region and 3' domain encode serine/threonine-rich amino acid sequences that are heavily glycosylated in the mature Als protein. Our current working model of the Als proteins is that they comprise a relatively non-glycosylated N-terminal domain of approximately 320 aa that is displayed on the cell surface by the remainder of the mature protein, which, due to its heavy glycosylation, assumes an elongated conformation (Jentoft, 1990; Kapteyn et al., 2000). In this model, the variable length of the tandem repeat domain results in display of the N terminus either closer to or at a greater distance from the C. albicans cell surface. Recent studies have demonstrated a role in adherence for certain Als proteins (Gaur et al., 1999; Fu et al., 2002), but a broad study of function across the family is still required.

Characterization of ALS genes is necessary to place functional observations into an appropriate context. Previous data have shown that sequence variability occurs at various levels within the ALS family. In addition to the variability in the tandem repeat domain discussed above, certain ALS genes also encode other repeated regions that differ between alleles; the best example is ALS7, which has a complex repeated region in its 3' domain (Hoyer & Hecht, 2000; Zhang et al., 2002). Also, examination of various C. albicans isolates showed that certain strains lack ALS genes that are present in others (Hoyer & Hecht, 2000, 2001). Finally, allelic sequence polymorphisms can be found outside of repeated regions. These sequence differences are more pronounced within the 3' end of the gene, which encodes the heavily glycosylated portion of the mature protein (Hoyer et al., 1998b; Hoyer & Hecht, 2001). However, examination of two alleles of ALS5 from different strains indicated that sequence polymorphisms were also found in the 5' domain. Proteins encoded by these alleles have 2·8 % amino acid differences within the N-terminal domain. This figure was at least double the frequency of sequence differences observed for other C. albicans genes for which alleles had been sequenced (Hoyer & Hecht, 2001). At this time, it is unknown whether these sequence differences are sufficient to result in altered Als protein function.

In C. albicans, ALS genes are found on three different chromosomes: chromosome 3 (ALS6 and ALS7), R (ALS3) and 6 (ALS1, ALS2, ALS4, ALS5; Wickes et al., 1991; Hoyer et al., 2001). Additional mapping efforts showed that although several ALS genes are found on chromosome 6, ALS2 and ALS4 are on SfiI fragment ‘C’ while ALS1 and ALS5 are on SfiI fragment ‘O’ (Chu et al., 1993; Hoyer et al., 1998b). Work with a C. albicans genomic fosmid library indicated that ALS1 and ALS5 were found on the same fosmids and, therefore, located within approximately 50 kb of each other (Hoyer & Hecht, 2001).

Indications of the existence of ALS9 were first found among preliminary data from the C. albicans genome project (http://www-sequence.stanford.edu/group/candida). Although this sequence has been recognized as an ALS gene (Hoyer, 2001), this manuscript is the first description of its structure. Genome annotation (http://genolist.pasteur.fr/CandidaDB) suggested that strain SC5314 encodes two divergent ALS9-like sequences. These sources also suggested that ALS5, ALS1 and ALS9 were contiguous in the C. albicans genome. The goal of the work presented here was to characterize the variability within the ALS9 coding region and to establish its physical location with respect to other ALS genes. Here we confirm the sequence divergence of the ALS9 alleles in strain SC5314 and demonstrate that ALS5, ALS1 and ALS9 are contiguous on C. albicans chromosome 6. We also examined the allelic diversity of ALS9 among a larger collection of C. albicans strains and the allelic arrangement of ALS genes on the homologous copies of chromosome 6. These data show that the sequence divergence observed for ALS9 alleles is much greater than that previously documented for other ALS genes and reinforce the conclusion that sequence divergence is not limited to the repeated regions of ALS genes.


   METHODS
 
Strains and growth conditions.
C. albicans constructs and their parent strains are listed in Table 1. Other C. albicans isolates were defined previously (Hoyer et al., 1995). Yeast cells were cultured in YPD medium (1 % yeast extract, 2 % peptone, 2 % glucose) at 37 °C. Transformants were selected on synthetic complete medium without uridine (SC-Uri; Hicks & Herskowitz, 1976) and grown in liquid SC-Uri for DNA extraction.


 
Table 1. C. albicans constructs and parent strains

 
Gene deletions.
Strains lacking ALS alleles were created using the PCR product-directed method of Wilson et al. (2000). Plasmid pDDB57 was a gift from Aaron Mitchell (Columbia University, New York, USA) and was used as the template for amplification of disruption cassettes. Disruption cassette DNA was purified on agarose gels and extracted with a GeneClean kit (Qbiogene). About 10 µg DNA was transformed into spheroplasted CAI4 cells and transformants were selected on SC-Uri plates containing 1 M sorbitol. Correct transformants were identified using PCR and Southern blots. Genomic DNA was extracted from C. albicans using the MasterPure Yeast DNA Purification Kit (Epicentre). Southern blotting used Genius reagents and chemiluminescent detection (Roche).

PCR reactions.
Throughout this manuscript, primers are designated by numbers and the corresponding sequence is listed in Table 2. Primers were synthesized by Integrated DNA Technologies (Coralville, IA, USA). PCR amplifications were performed using an I-Cycler (Bio-Rad). Reactions varied depending on the experimental goal and product size. In general, a typical reaction contained 200 ng genomic DNA, 1·5 mM (for fragments less than 2 kb) or 3 mM (for fragments larger than 2 kb) MgCl2, 2 µM each primer, 0·2 mM each dNTP and 2·5 units Taq (Invitrogen) or Pfu Turbo polymerase (Stratagene). Pfu polymerase was used for amplification of larger fragments or when a proofreading polymerase was needed to maintain sequence fidelity. PCR reactions were incubated at 95 °C for 5 min, followed by 25–35 cycles of 95 °C denaturation for 30 s, primer annealing at 50–57 °C for 45 s and elongation at 72 °C for 1–10 min depending on the length of expected products. Each reaction included a final 7 min extension at 72 °C to complete elongation of products.
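The reaction set-up and cycling programme described above can be summarised in a short Python sketch. It is illustrative only: the function names are hypothetical, a product of exactly 2 kb is treated as 'larger' (the text does not specify this boundary), and ramp times between temperatures are ignored.

```python
def mgcl2_mM(product_bp):
    """Return the MgCl2 concentration (mM) for the expected product size."""
    # 1.5 mM below 2 kb, 3 mM otherwise; the 2 kb boundary itself is an assumption.
    return 1.5 if product_bp < 2000 else 3.0


def programmed_minutes(cycles, extension_min):
    """Total programmed hold time in minutes, excluding ramping."""
    if not 25 <= cycles <= 35:
        raise ValueError("cycle number outside the 25-35 range described")
    if not 1 <= extension_min <= 10:
        raise ValueError("extension time outside the 1-10 min range described")
    initial_denaturation = 5.0                # 95 C for 5 min
    per_cycle = 0.5 + 0.75 + extension_min    # 30 s denaturation + 45 s annealing + extension
    final_extension = 7.0                     # 72 C for 7 min
    return initial_denaturation + cycles * per_cycle + final_extension
```

For example, 30 cycles with a 1 min elongation step correspond to 79·5 min of programmed hold time.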


 
Table 2. Oligonucleotide primers used in this study

 
DNA sequencing and analysis.
Genes for DNA sequencing were amplified from genomic DNA using Pfu Turbo polymerase (Stratagene). PCR products representing full-length ALS9 genes were cloned into pCRBlunt (Invitrogen) according to manufacturer's instructions. PCR products encoding a portion of ALS genes were purified using Wizard PCR Preps DNA Purification System (Promega). DNA sequencing was performed by Elim Biopharmaceuticals (Hayward, CA, USA). DNA sequence comparisons used Wisconsin Package Version 9.1 [Genetics Computer Group (GCG), Madison, WI, USA].

Phylogenetic analysis.
Phylogenetic analysis utilized DNA sequences of the 5' domains (approx. 1300 nt) from each C. albicans ALS gene and the two alleles of ALS9. All gene sequences were from strain SC5314. Sequences were aligned using CLUSTALW (Combet et al., 2000) with a gap opening penalty of 10 and a gap extension penalty of 0·05. Phylogenetic analysis used PAUP (Phylogenetic Analysis Using Parsimony) in the Wisconsin Package. Maximum-parsimony (MP), distance with minimum-evolution (ME) and maximum-likelihood (ML) criteria were employed. The F84 model was chosen for ME and ML analysis to ameliorate AT-rich biases for DNA data. The gamma model was used to correct site-to-site heterogeneity. The gamma parameter α was estimated from each dataset by PAUP. Amino acid sequences were translated from the nucleotide sequences using the alternate yeast genetic code (Santos & Tuite, 1995). Amino acid sequences were aligned by specifying a gap opening penalty of 10 and a gap extension penalty of 0·1. The protein phylogenetic trees were constructed using parsimony and distance criteria. For MP and ME analysis, the best trees were identified by an exhaustive search that evaluated all possible trees. An heuristic search was used for ML analysis. Bootstrap with 1000 replicates was performed for all the analyses. TREEVIEW software (Page, 1996) was used for printing phylogenetic trees.
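The alternate yeast genetic code is needed because C. albicans decodes the CUG codon as serine rather than leucine. The sketch below illustrates that single difference from the standard code; the table and function names are hypothetical and are not part of the Wisconsin Package or PAUP.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), STANDARD_AA)
}
# Alternate yeast code: identical except that CTG (CUG in mRNA) encodes serine.
ALT_YEAST_CODE = dict(STANDARD_CODE, CTG="S")


def translate(dna, table=ALT_YEAST_CODE):
    """Translate a coding DNA sequence, ignoring any trailing partial codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(table[dna[i:i + 3]] for i in range(0, usable, 3))
```

Translating `ATGCTGTAA` gives `MS*` with the alternate code and `ML*` with the standard code, so every CTG codon in an ALS sequence changes the predicted protein.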


   RESULTS
 
ALS5, ALS1 and ALS9 are contiguous in the C. albicans genome
Our first step in investigating gene organization and allelic variation within the region encoding ALS5, ALS1 and ALS9 was to verify that the genes were contiguous in the C. albicans genome. This gene order was initially verified using PCR (Fig. 1). Primers were designed to hybridize within the 5' end or 3' end of the various coding regions and paired to verify gene order and distance between ORFs. Primers 1 and 2 (Table 2) were used to amplify genomic DNA from strain SC5314. The resulting fragment was approximately 5 kb, consistent with the 5064 bp fragment predicted from the genome assembly and the conclusion that ALS5 and ALS1 were contiguous in the genome. A similar strategy was used to support the conclusion that ALS1 and ALS9 were contiguous. PCR amplification with primers 3 and 4 yielded the predicted 3·1 kb fragment (Fig. 1). These results were consistent with the conclusion that ALS5 and ALS1 were separated by 4·8 kb of non-coding sequence and that ALS1 and ALS9 had 2·8 kb of sequence between them.



 
Fig. 1. PCR amplification of genomic DNA from C. albicans strain SC5314 to demonstrate the gene order and spacing of ALS5, ALS1 and ALS9. (a) Line drawing showing gene order and spacing of the ALS genes and flanking coding regions. Arrows within the coding region indicate the direction of transcription. The copy of chromosome 6 encoding the small alleles of each ALS gene (named ALSn-2) is depicted. Numbers in parentheses are PCR primer numbers specified in Table 2. Forward primer numbers are above the line diagram and reverse primer numbers are below the diagram. (b) Ethidium-bromide-stained PCR products separated on an agarose gel. Primer pairs for each reaction are indicated above each lane. Molecular sizes are shown on the left.

 
Genome assembly of regions flanking the ALS loci was also verified by PCR. Primer 5 from the CDC50 coding region upstream of ALS5, paired with primer 6 from within the 5' end of ALS5 (Table 2), produced two PCR products from SC5314 genomic DNA. The size of these PCR products differed by approximately 1 kb. The size of the smaller fragment (3·8 kb) was consistent with predictions from the genome assembly (http://www-sequence.stanford.edu/group/candida). Additional experimentation with strains lacking one ALS5 allele (see below) indicated that the two fragments were derived from sequences on homologous chromosomes (data not shown). The 3·8 kb fragment was immediately upstream of one allele, ALS5-2, and the 4·8 kb fragment upstream of the other allele, ALS5-1. Confirmation of the identity of the gene downstream of ALS9 was accomplished in a similar manner. Ironically, the coding sequence immediately downstream of ALS9 was ALA1. In this case, ALA1 was not a synonym for ALS5 (Gaur & Klotz, 1997), but encoded alanyl-tRNA synthetase. Primers 7 and 8 (Table 2) produced the predicted 1·3 kb fragment (Fig. 1). These data demonstrated that the genome assembly was essentially accurate and that no other ALS genes were located within this region.

The ALS gene order was corroborated by construction of mutants that demonstrated ALS5 and ALS1 could be removed using a single deletion step. The deletion cassette was amplified from pDDB57 using primers specific for the region 5' of ALS5 and 3' of ALS1 (Fig. 2). Transformation of a deletion cassette amplified with primers 9 and 10 (Table 2) removed either 12·5 (clone 2027) or 15 kb (clone 2025) of sequence from strain CAI4. The size differences in these bands were dependent on which alleles of ALS5 and ALS1 were removed. A similar cassette amplified with primers 15 and 16 (Table 2) was transformed into CAI4 to delete ALS1 and ALS9 simultaneously. This transformation yielded clone 2166 in which 14·5 kb of sequence containing the ALS1-1 and ALS9-1 alleles was deleted (Fig. 3). The strains created using this technique were evaluated further by PCR using primers 21 and 22 for clones 2027 and 2025 (Fig. 2), and primers 20 and 23, for clone 2166 (Fig. 3). In each case, the PCR primers amplified across the deleted region and produced fragments of the expected size. DNA sequencing of these products verified that the ALS5ALS1 deletions removed the region from -39 of ALS5 to +1 of ALS1. The ALS1ALS9 deletion removed the region from -57 of ALS1 to +30 of ALS9. Removal of each pair of genes in a single transformation step supported the conclusion that the ALS5, ALS1 and ALS9 genes were contiguous on C. albicans SC5314 chromosome 6.



 
Fig. 2. Construction of C. albicans strains with the ALS5ALS1 region deleted. (a) Schematic of the ALS5 and ALS1 loci in C. albicans strain SC5314. Alleles on both homologous chromosomes are shown. The deletion cassette and arrows to indicate its integration to delete ALS5 and ALS1 are shown below the allelic maps. Numbers in parentheses denote locations of PCR primers specified in Table 2. Solid black bars show the position of the ALS5 upstream and ALS1 downstream probes for Southern blots. (b) Southern blot of genomic DNA from strains CAI4, 2025 and 2027. Genomic DNA was digested with BglII and hybridized with the ALS5 upstream probe. The probe was generated using PCR primers 11 and 12 (Table 2). (c) Southern blots of BglII-digested genomic DNA hybridized with the ALS1 downstream probe. The probe was amplified by PCR using primers 13 and 14 (Table 2). Molecular sizes are shown on the left.

 


 
Fig. 3. Construction of a C. albicans strain with the ALS1ALS9 region deleted. (a) Schematic of the ALS1 and ALS9 loci in C. albicans strain SC5314. Alleles on both homologous chromosomes are shown. The deletion cassette and arrows to indicate its integration to delete ALS1 and ALS9 are shown below the allelic maps. Numbers in parentheses denote locations of PCR primers specified in Table 2. Solid black bars show the position of the ALS1 upstream and ALS9 downstream probes for Southern blots. (b) Southern blot of BglII-digested genomic DNA from strains CAI4 and 2166 hybridized with the ALS1 upstream probe. The probe was generated using primers 17 and 18 (Table 2). (c) Southern blot of BglII-digested genomic DNA hybridized with the ALS9 downstream probe. This probe was generated by PCR using primers 19 and 20 (Table 2). Molecular sizes are shown on the left.

 
Divergent gene sequences are alleles of ALS9
DNA sequences indicating the presence of ALS9 in the C. albicans genome were initially detected among data from the genome sequencing project (http://www-sequence.stanford.edu/group/candida). The potential for the presence of two highly divergent alleles was obvious within the CandidaDB genome annotation (http://genolist.pasteur.fr/CandidaDB). CandidaDB listed three sequences (ALS9.5eoc, ALS11.5f and ALS11.3f) with similarity to the ALS9 sequence from strain 1161, which we previously deposited in the GenBank database (accession numbers AF229989 for the 5' domain and AF229990 for the 3' domain). Of the CandidaDB sequences, ALS9.5eoc clearly matched our AF229989 sequence (5' end of the gene) and ALS11.3f corresponded to AF229990 (3' end of the gene). The largest difference was observed for ALS11.5f which resembled AF229989, but showed such sequence divergence that it was possible this sequence corresponded to a different locus. Therefore, our efforts were focused on determining whether the alleles of ALS9 in strain SC5314 were as divergent as the genome sequence data indicated.

Unlike some ALS genes, alleles of ALS9 from strain SC5314 could not be distinguished from each other based on agarose gel mobility (see below). The first ALS9 allele (ALS9-2) was isolated by PCR amplification using primers 24 and 25 (Table 2) and SC5314 genomic DNA (Fig. 4). To isolate the second allele (ALS9-1), we created a strain lacking the previously isolated sequence. Strain 1976 was constructed by transformation of CAI4 with a disruption cassette amplified from plasmid pDDB57 using primers 26 and 16 and verified by Southern blotting (Fig. 4). PCR primers 27 and 28 were used to amplify the second ALS9 allele from strain 1976 genomic DNA (Fig. 4). Both ALS9-encoding fragments were cloned into the vector pCRBlunt (Invitrogen) for DNA sequence analysis.



 
Fig. 4. Construction of a C. albicans strain with one ALS9 allele deleted. (a) Schematic of the ALS9 locus in C. albicans strain SC5314. Alleles on homologous chromosomes are shown. The deletion cassette and arrows to indicate its integration to delete ALS9-2 are shown below the allelic maps. Numbers in parentheses denote locations of PCR primers specified in Table 2. The ALS9 downstream probe is shown with solid black bars. (b) Southern blot of genomic DNA from strains CAI4 and 1976. Genomic DNA was digested with BglII and hybridized with the ALS9 downstream probe. The probe was generated using PCR primers 19 and 20. Molecular sizes are shown on the left.

 
DNA sequencing of the resulting clones showed that the first allele amplified was most similar to the ALS9 sequence from strain 1161 (GenBank accession numbers AF229989 and AF229990). This sequence was deposited into GenBank, assigned accession number AY269422 and named ALS9-2. DNA sequence analysis of the allele amplified from strain 1976 revealed that it was almost identical to the ALS11.5f sequence from CandidaDB. The DNA sequence of this allele was deposited into GenBank with the accession number AY269423 and named ALS9-1. Consistent with our previous nomenclature for ALS alleles (Hoyer et al., 1998b), the allele with the longer tandem repeat domain was named ALS9-1 (17 tandem copies of the 108 bp sequence) and the allele with the shorter tandem repeat region called ALS9-2 (14 tandem copies).

The physical location of the two sequences within the SC5314 genome supported the conclusion that these divergent sequences encoded ALS9 alleles rather than different genes. Since the 5' domains of the two coding regions were divergent (see below), we designed primers specific for each coding region (primer 29 for ALS9-1 and primer 30 for ALS9-2; Table 2; Fig. 4). When paired with primer 3 (hybridizes within the 3' end of ALS1; Fig. 1), the ALS9-specific primers each amplified the expected 2·9 kb fragment from SC5314 genomic DNA (Fig. 5, lanes 1 and 2). However, amplification of genomic DNA from strain 2166 only yielded a product with primer 30, demonstrating that the primer 29-specific sequences were absent in this strain (Fig. 5, lanes 3 and 4). The converse result was obtained with genomic DNA from strain 1976, where only the primer 3/primer 29 pair generated a product (Fig. 5, lanes 5 and 6). These results provide further evidence that the two ALS9 sequences are allelic and that each is located 2·8 kb downstream of the ALS1 coding region on homologous copies of chromosome 6.
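The logic of the allele-specific PCR can be written as a small presence/absence table. This is a sketch of the reasoning only; the dictionary and function names are hypothetical, and each strain is reduced to the ALS9 alleles it retains according to the constructions described above.

```python
# Allele recognised by each ALS9-specific primer (each is paired with primer 3 in ALS1).
PRIMER_TARGET = {29: "ALS9-1", 30: "ALS9-2"}

# ALS9 alleles remaining in each strain.
STRAIN_ALLELES = {
    "SC5314": {"ALS9-1", "ALS9-2"},  # wild type, both alleles present
    "2166": {"ALS9-2"},              # ALS1-1 and ALS9-1 deleted
    "1976": {"ALS9-1"},              # ALS9-2 deleted
}


def product_expected(strain, primer):
    """True if the primer 3 / allele-specific primer pair should yield a product."""
    return PRIMER_TARGET[primer] in STRAIN_ALLELES[strain]


def expected_lanes():
    """Expected result for every strain and primer combination."""
    return {
        (strain, primer): product_expected(strain, primer)
        for strain in STRAIN_ALLELES
        for primer in PRIMER_TARGET
    }
```

The predicted pattern (both products from SC5314, only the primer 30 product from 2166, only the primer 29 product from 1976) is the pattern reported for Fig. 5.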



 
Fig. 5. Ethidium-bromide-stained PCR products amplified using ALS9 allele-specific primer pairs to demonstrate which alleles are present in wild-type and mutant strains. Genomic DNA from strains SC5314, 2166 and 1976 was amplified using the primer pairs indicated above each lane. The location of primer 3 is shown in Fig. 1; primers 29 and 30 are shown in Fig. 4. Molecular sizes are shown on the left.

 
Comparison of ALS9 alleles from SC5314
Line diagrams comparing ALS9-1 and ALS9-2 are shown in Fig. 6. Alignment of the ALS9-1 and ALS9-2 sequences from SC5314 using the GAP program of the Wisconsin software package revealed that the sequences were 93 % identical at the DNA level and 87 % identical at the amino acid level. The magnitude of the sequence differences was most pronounced within the 5'/N-terminal domain (nt 1–1296; aa 1–432), where sequences were 89 % identical at the nucleotide level and only 84 % identical at the amino acid level. The tandem repeat sequences of the ALS9 alleles were very similar to each other and exhibited the 108 bp unit length found in other ALS gene tandem repeat regions. ALS9-1 contained 17 copies of the tandemly repeated sequence and ALS9-2 contained 14 copies. The 3' ends of the genes were similar, but ALS9-2 contained additional sequence in two regions; these regions were named Variable Block 1 (VB1) and Variable Block 2 (VB2) (Fig. 6). In these regions, ALS9-2 contained an extra 144 bp (VB1; nt 3195–3338) and 117 bp (VB2; nt 4176–4292) not present in ALS9-1.
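The coordinates and lengths quoted above can be checked with simple arithmetic, sketched here in Python. The helper names are hypothetical, and the identity function counts only columns without gaps, which is an assumption and may differ from the way the GAP program reports identity.

```python
REPEAT_UNIT_BP = 108  # length of one tandem repeat unit


def span_length(first_nt, last_nt):
    """Length of a region given inclusive 1-based coordinates."""
    return last_nt - first_nt + 1


def repeat_domain_bp(copies):
    """Length of a tandem repeat domain with the given number of copies."""
    return copies * REPEAT_UNIT_BP


def percent_identity(seq_a, seq_b):
    """Percent identity over ungapped columns of two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b) for a, b in zip(seq_a.upper(), seq_b.upper()) if a != "-" and b != "-"
    ]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)
```

With these helpers, nt 3195–3338 spans 144 bp (VB1), nt 4176–4292 spans 117 bp (VB2), nt 1–1296 corresponds to 432 codons, and the 17-copy and 14-copy repeat domains differ by 324 bp.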



 
Fig. 6. Schematic alignment of ALS9 alleles from strain SC5314 to highlight sequence features, probe and primer locations. Numbers in parentheses indicate the positions of primers specified in Table 2. Locations of the tandem repeat domain, VB1 and VB2 are shown. The length of the tandem repeat domain was verified by PCR amplification of genomic DNA with primers 31 and 32. Positions of Southern blot probes for Fig. 7 are indicated by the solid black bars at the bottom of the figure.

 
Although the ALS9 tandem repeat unit was the same length as in other ALS genes, the consensus sequence of the motif was different from that observed in ALS1 and ALS5. The tandem repeats of ALS1 and ALS5 have been used to define subfamilies of ALS genes (Hoyer et al., 1998b; Hoyer, 2001). ALS1 tandem repeats cross-hybridize with those from ALS2, ALS3 and ALS4, while the ALS5 repeats cross-hybridize with those in ALS6 and ALS7. Hybridization of the ALS9 tandem repeats to BglII-digested genomic DNA from various C. albicans strains demonstrated that the fragment does not cross-hybridize with other ALS sequences at high stringency (Fig. 7). The hybridizing sequences are the same as those detected by an ALS9-specific fragment from the 3' end of the gene, suggesting that in the strains tested, there were no additional genes belonging to the ALS9 subfamily. The two probes used in Fig. 7 cross-hybridized with genomic DNA from Candida dubliniensis, suggesting that ALS9-like sequences were present. Similar to results for C. albicans, the ALS9 tandem repeat and 3' end probes hybridized to fragments of the same size, suggesting that both sequences were contiguous in the C. dubliniensis genome (data not shown).



 
Fig. 7. Southern blots of BglII-digested genomic DNA from various C. albicans strains hybridized with ALS9-encoding fragments. Probes for this analysis are shown in Fig. 6. Strains used are the same as those analysed during characterization of other ALS genes (Hoyer et al., 1995, 1998a, b, 2001; Hoyer & Hecht, 2000). (a) Hybridization with the ALS9 tandem repeat probe. (b) The same blot hybridized with the ALS9 3' end probe. Molecular sizes are shown on the left.

 
Which alleles of ALS5, ALS1 and ALS9 are on the same chromosome in strain SC5314?
Strains created in this study were analysed to determine which alleles of ALS5, ALS1 and ALS9 were located on each homologue of chromosome 6. Previous work demonstrated that the alleles of ALS5 in strain SC5314 differ by one copy of the tandem repeat within the central domain (GenBank accession numbers AY227439 and AY227440). This difference can be detected by agarose gel electrophoresis of PCR products generated using primers 45 and 46. Alleles of ALS1 in strain SC5314 were shown by Southern blotting to differ by 12 copies of the tandem repeat within the central domain; this difference is large enough to be detected easily on Southern blots of genomic DNA (Hoyer et al., 1995). ALS9 alleles can be discriminated in strain SC5314 by PCR amplification of the VB2 region using primers 39 and 40. Previous Southern blots of genomic DNA from strain 2025 showed it was lacking the small allele of ALS1 while the ALS1 large allele was deleted in strain 2027 (Fig. 2). PCR analysis revealed the presence of the larger ALS5 allele in strain 2025 and the smaller ALS5 allele in strain 2027 (Fig. 8). From these data, we concluded that the large alleles of ALS5 and ALS1 were on the same chromosome while the small alleles of each gene occupied the homologous chromosome. The ALS9 allele for each chromosome was identified by PCR. Genomic DNA from strain 2166 yielded the larger fragment that was characteristic of ALS9-2 compared to a control reaction with genomic DNA from strain 1976, which yielded the smaller band found in ALS9-1 (Fig. 8). Therefore, the large alleles of each gene occupied the same chromosome while all three small alleles were found on the other. All figures in this paper were drawn to reflect this conclusion.
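The deduction of allele linkage can be sketched as a merging problem: alleles that survive a two-gene deletion must lie together on the intact homologue, and blocks that share an allele belong to the same chromosome. The code below is a simplified illustration with hypothetical function names; it encodes only the retained alleles of the deleted gene pair in strains 2025, 2027 and 2166 and places the remaining ALS9 allele by elimination, assuming two homologues with one allele of each gene.

```python
def merge_linked(groups):
    """Merge allele sets that share a member into haplotype blocks."""
    haplotypes = []
    for group in groups:
        merged = set(group)
        for block in [h for h in haplotypes if h & merged]:
            merged |= block
            haplotypes.remove(block)
        haplotypes.append(merged)
    return haplotypes


def gene_of(allele):
    """'ALS9-1' -> 'ALS9'."""
    return allele.rsplit("-", 1)[0]


def assign_by_elimination(haplotypes, alleles):
    """Place each unassigned allele on the only block lacking that gene."""
    for allele in alleles:
        if any(allele in h for h in haplotypes):
            continue
        open_blocks = [
            h for h in haplotypes if gene_of(allele) not in {gene_of(a) for a in h}
        ]
        if len(open_blocks) == 1:
            open_blocks[0].add(allele)
    return haplotypes


# Alleles of the deleted gene pair that remain on the intact homologue.
RETAINED = [
    {"ALS5-1", "ALS1-1"},  # strain 2025: large ALS5 and large ALS1 remain
    {"ALS5-2", "ALS1-2"},  # strain 2027: small ALS5 and small ALS1 remain
    {"ALS1-2", "ALS9-2"},  # strain 2166: ALS1-1 and ALS9-1 were deleted
]
ALL_ALLELES = ["ALS5-1", "ALS5-2", "ALS1-1", "ALS1-2", "ALS9-1", "ALS9-2"]

HAPLOTYPES = assign_by_elimination(merge_linked(RETAINED), ALL_ALLELES)
```

The two resulting blocks are ALS5-1/ALS1-1/ALS9-1 and ALS5-2/ALS1-2/ALS9-2, matching the conclusion that the large alleles share one homologue and the small alleles the other.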



 
Fig. 8. Agarose gel of ethidium-bromide-stained PCR products to determine which ALS alleles are on the same homologue of chromosome 6. (a) Amplification of genomic DNA from strains SC5314, 2025 and 2027 using primers 45 and 46. (b) Amplification of genomic DNA from strains SC5314, 1976 and 2166 using primers 39 and 40. Molecular sizes are shown on the left.

 
Allelic diversity of ALS9 in a larger strain collection
To determine if ALS9 allelic diversity was specific to strain SC5314 or a common feature of the species, we tested 30 C. albicans isolates from a geographically diverse collection of the five major genetic clades described for C. albicans (Blignaut et al., 2002; Pujol et al., 2002). PCR primers were designed to assess which allele was present in the 5' end of the gene and to deduce the size of VB1 and VB2 in each strain. The position of PCR primers used in this analysis is indicated in Fig. 6. Primers 41 and 42 are specific for the 5' domain of the ALS9-2 allele while primers 43 and 44 only amplify ALS9-1. Primers 37 and 38 amplify the VB1 region; primers 39 and 40 amplify the VB2 region. The size of each VB was compared to that for the sequenced alleles from strain SC5314. Results for each strain are shown in Table 3 and are indicated as ‘9-1’ or ‘9-2’ depending on whether they were similar to the sequences found in ALS9-1 or ALS9-2.


 
Table 3. ALS9 allelic diversity in a collection of geographically diverse C. albicans isolates

 
Within the 5' end of the gene, both ALS9-1 and ALS9-2 alleles were found. In each case where ALS9-1 was present, the ALS9-2 product was also detected. The ALS9-1 allele was not found by itself in any strains tested. This conclusion was the same for the VB1 region where either the heterozygous alleles were detected or only ALS9-2 was found. Within the VB2 region, however, ALS9-1 and ALS9-2 could be detected within the same strain, or each by itself. These results indicated that the allelic variability observed in SC5314 was present across a larger population of C. albicans strains and suggest that recombinant ALS9 alleles have been generated during C. albicans evolution.
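The inference of recombinant alleles rests on a simple consistency argument: if a strain carried only non-recombinant ALS9-1 and/or ALS9-2 alleles, the 5' domain, VB1 and VB2 assays would all report the same allele type(s). The sketch below expresses that check. The function name is hypothetical and the two example genotypes are invented for illustration; they are not entries from Table 3. Discordance could also arise from a failed PCR or a polymorphism at a primer site, so it is evidence for recombination rather than proof.

```python
def regions_discordant(calls):
    """True if the allele types detected differ between the assayed regions.

    calls maps a region name to the set of allele types ('9-1', '9-2')
    detected by PCR in that region.
    """
    return len({frozenset(found) for found in calls.values()}) > 1


# Hypothetical strain: every region reports both allele types.
concordant_example = {
    "5' domain": {"9-1", "9-2"},
    "VB1": {"9-1", "9-2"},
    "VB2": {"9-1", "9-2"},
}

# Hypothetical strain: 9-2 type at the 5' end and VB1, 9-1 type only at VB2.
discordant_example = {
    "5' domain": {"9-2"},
    "VB1": {"9-2"},
    "VB2": {"9-1"},
}
```

A pattern like the second example, in which the VB2 region reports ALS9-1 alone while the 5' domain reports only ALS9-2, is the kind of result that suggests a recombinant allele.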

One heterozygous ALS9-1/ALS9-2 strain from each genetic clade (P52037, P22095, P52098, P22059 and OKP109) was selected for DNA sequencing of a portion of the 5' domain from each allele. Primers 41 and 42 were used to amplify ALS9-1 and primers 43 and 44 amplified ALS9-2 (Fig. 6). Alignment of the resulting sequences indicated that, for each allele, there were only 5 nt changes across the approximately 1050 bp of sequence derived from each strain (data not shown). Therefore, although the 5' domain of the ALS9-1 and ALS9-2 groups of alleles are quite divergent from each other, their sequences are highly conserved across a diverse group of strains. For comparative purposes, the same analysis was performed for ALS5, since non-tandem-repeat sequence differences in this gene had previously represented the greatest allelic sequence divergence (Hoyer & Hecht, 2001). In the previous work, the sequence differences were between alleles from strains 1161 (GenBank accession number AF068866) and CA1 (GenBank accession number AF025429). In addition to the strains from which ALS9 alleles were analysed above, one additional strain from each genetic clade was added to the analysis (K221, P57047, P57069, P57096 and OKP77). PCR amplification and DNA sequence analysis of approximately the first 850 bp of the ALS5 coding region showed that the alleles from 1161 and CA1 are the most divergent (19 nt changes) and that in positions of nucleotide changes, the sequence of the CA1 allele was more representative of the strains examined than the allele from 1161 (data not shown). Only the sequence from strain P52098 more closely resembled that from 1161. These data support the conclusion that the ALS9 allelic sequence divergence is greater than that previously observed for other ALS genes.

Phylogenetic analysis of ALS family
Across the ALS family, the 5' domain is the most conserved portion of the genes in both length and sequence (Hoyer, 2001). By analogy to the structure of S. cerevisiae α-agglutinin (Chen et al., 1995), the protein region encoded by this domain is the most likely to be involved in adhesive interactions and, therefore, to carry functional significance. Sequence alignment showed that the 5' domains of ALS9-1 and ALS9-2 were only 89 % identical. This observation is put into perspective by the fact that the 5' domains of ALS1 and ALS5, which are separate genes, are 90 % identical. At the amino acid level, the N-terminal domains of Als9-1p and Als9-2p are 84 % identical, compared to 87 % identity between Als1p and Als5p and 85 % identity between Als1p and Als3p. Therefore, the ALS9 allelic sequences and their encoded proteins are less similar to each other than are sequences from some different loci in the ALS family. To better understand the relationship between the ALS9 alleles and their relationship to the remainder of the ALS genes, we conducted a phylogenetic analysis.
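
Percentage identity values such as those above are derived from a pairwise alignment. The Python sketch below shows one common way to compute such a value. It is an assumption that gapped columns are excluded from the comparison, since the text does not state how gaps were treated, and the toy sequences are invented for illustration and are not ALS sequence.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # assumption: gapped columns are not counted
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Invented toy fragments, already aligned ('-' marks a gap).
toy_1 = "ATGCTTCAAC-ATT"
toy_2 = "ATGCTACAACGATA"
print(round(percent_identity(toy_1, toy_2), 1))
```

The same function works for aligned amino acid sequences, which is how a nucleotide-level and a protein-level identity can be compared for the same pair of genes or alleles.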

Phylogenetic analysis utilized DNA sequences from the 5' domains of each C. albicans ALS gene and the two alleles of ALS9. All sequences were from strain SC5314. The DNA dataset generated a single identical tree (Fig. 9) using MP, distance with ME (distance measured by the F84 model) and ML (gamma model, α=0·7). This tree was based on 1320 characters, of which 602 were constant and 558 were parsimony-informative. The parsimony tree length was 1341 with a consistency index of 0·76; the ME score was 0·88 and the -ln likelihood was 7393. In this tree, ALS9-1 and ALS9-2 grouped together with 100 % bootstrap support. Protein sequence phylogenetic trees were constructed using the parsimony and distance criteria. The protein dataset produced three most parsimonious trees and a single distance tree that was identical to one of the three best parsimony trees. These trees were similar to the best DNA tree in the major branch pattern. Als9-1p and Als9-2p formed a clade with 100 % bootstrap support in all these trees. These results indicated that, despite their large sequence differences, the two ALS9 alleles were more similar to each other than to any other ALS genes.
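
The character counts reported above follow the usual parsimony bookkeeping, in which each alignment column is constant, variable but uninformative, or parsimony-informative (at least two character states, each present in at least two sequences). The Python sketch below illustrates that classification on an invented toy alignment. It is not the software used for the analysis, and ignoring gaps and ambiguity codes is an assumption made here for simplicity.

```python
from collections import Counter
from typing import Sequence


def classify_column(column: str) -> str:
    """Classify one alignment column as 'constant', 'uninformative' or 'informative'."""
    # Assumption: only unambiguous nucleotides are counted; gaps are ignored.
    counts = Counter(c for c in column.upper() if c in "ACGT")
    if len(counts) <= 1:
        return "constant"
    # Parsimony-informative: at least two states, each seen in at least two sequences.
    shared_states = sum(1 for n in counts.values() if n >= 2)
    return "informative" if shared_states >= 2 else "uninformative"


def summarize(alignment: Sequence[str]) -> Counter:
    """Count column classes across an alignment of equal-length sequences."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("alignment must contain sequences of one common length")
    return Counter(classify_column("".join(col)) for col in zip(*alignment))


# Invented toy alignment of four sequences (not ALS data).
toy = ["ACGTA", "ACGTA", "ATGAA", "ATGAC"]
summary = summarize(toy)
print(dict(summary))

# Bookkeeping for the reported dataset: characters that are variable
# but not parsimony-informative.
reported_uninformative = 1320 - 602 - 558
print(reported_uninformative)
```

Applied to the reported totals, the subtraction gives 160 characters that vary among the sequences without being parsimony-informative.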



Fig. 9. Phylogenetic tree of nucleotide sequences from the 5' domain of C. albicans ALS genes. All sequences were from strain SC5314. Sequences of ALS1 (contig 19-10233) and ALS2 (contig 19-20090) were obtained from the C. albicans genome project data (http://www-sequence.stanford.edu/group/candida). Other sequences were from GenBank: ALS3 (AY223551), ALS4 (AF272027), ALS5 (AY227439), ALS6 (AY225310), ALS7 (AY296650), ALS9-1 (AY269423) and ALS9-2 (AY269422). The same tree was generated using MP, the ME model and ML criteria. The bar at the bottom of the figure indicates the branch length that corresponds to 10 substitutions per 100 nt. Bootstrap values with 1000 replicates are shown on the tree.

 

   DISCUSSION
 
PCR and gene deletion experiments demonstrated that ALS5, ALS1 and ALS9 are contiguous on C. albicans chromosome 6. All three genes are transcribed in the same direction. The two ALS9-like sequences found among data from the C. albicans genome project occupy the same physical location on the homologous copies of chromosome 6 in strain SC5314 and, therefore, are alleles of ALS9. Within strain SC5314, the alleles of ALS5, ALS1 and ALS9 with the larger tandem repeat domain are found on the same chromosome. More C. albicans isolates would need to be examined to determine whether this co-localization of large alleles is unique to strain SC5314 or is common in other strains. The most marked differences between the ALS9 alleles in strain SC5314 were within the 5' domain and in the VBs in the 3' end of the coding region. Sequences within the 5' domain of the two alleles differed by 11 %. Analysis of a geographically diverse set of C. albicans isolates showed that both ALS9 alleles are represented within the major genetic clades and that there is little sequence divergence for the same allele between strains. This analysis also showed that the structure of ALS9 alleles is variable since the different 5' domains can be combined with either VB1 or VB2 sequence in the same allele, suggesting that recombination occurs between alleles. Previous phylogenetic analysis of the ALS family indicated that recombination is likely to play a large role in the evolution of ALS genes (Hoyer et al., 2001).

Results from the current work must be interpreted in light of the possibility that the strains examined encode additional ALS9 variant alleles that were not recognized by our PCR primers. It is possible that strains apparently homozygous for various ALS9 sequences encode alleles that remain to be characterized and that the ALS9 allelic variability is greater than we document here. Within the strains examined, however, the VB1 and VB2 regions produce a consistent-sized PCR product. These results support the conclusion that there is little added variability within the VB regions and contrast with observations for other genes such as ALS7, where 3' domain sequence differences take the form of a highly divergent, complex repeated region (Hoyer & Hecht, 2000; Zhang et al., 2002).

Clues to the existence of ALS9 were first found among data from the C. albicans genome sequencing project (http://www-sequence.stanford.edu/group/candida) and the potential for allelic divergence was evident in the CandidaDB annotation (http://genolist.pasteur.fr/CandidaDB). Data from our studies have refined the information from these sources. Within the current release of the genome sequence (assembly 19), the assembly of individual ALS alleles was largely correct, but their organization onto homologous copies of chromosome 6 was not. In addition, our PCR data indicated that the region upstream of ALS5 is variable in length between alleles and that the longer sequence we detected was absent from the genome assembly. The high degree of allelic sequence variability in these loci contributes to the challenge of accurate diploid sequence assembly.

Previously characterized examples demonstrated that different manifestations of allelic diversity are found in C. albicans. Indications of allelism in C. albicans were initially deduced using a spheroplast fusion approach to studying resistance to 5-fluorocytosine (Whelan et al., 1986; Whelan, 1987). Molecular biological techniques have offered several examples of C. albicans allelism, including detection of non-functional alleles in a study of the ERG3 gene (Miyazaki et al., 1999). In this work, nucleotide changes led to a premature stop codon in one allele and three amino acid changes in another that rendered both incapable of producing functional C-5 sterol desaturase. Allelic sequence variation has been detected outside coding regions as demonstrated by Yesland & Fonzi (2000) who showed that nucleotide differences in regions flanking PHR1 were responsible for biased targeting of disruption cassettes during mutant construction. Staib et al. (2002) documented allelic sequence variation in the promoter region of SAP2 and showed that the divergent sequences resulted in differential regulation of the alleles. To date, the largest published example of C. albicans allelic variation is between the a and alpha alleles of PAP, OBP and PIK in the mating type locus, where alleles are only approximately 60 % identical (Hull & Johnson, 1999). The net effect of this sequence divergence on protein function has yet to be determined.

The strong allelic variation observed for ALS9 may suggest distinct functional specificities. Sequence differences between the alleles are not random and it appears that the two alleles evolved independently, perhaps for different functions. Evidence presented here also highlights the potential for recombination between the alleles, suggesting that ALS9-1 and ALS9-2 may not be completely independent at this time. Understanding the effect of sequence differences on protein function is of primary importance as we continue to characterize the ALS family. Future work using the strains and approaches presented here will explore these relationships.


   ACKNOWLEDGEMENTS
 
We thank Aaron Mitchell for plasmid pDDB57 and Bill Fonzi for C. albicans strain CAI4. Nucleotide sequence data for C. albicans were obtained from the Stanford Genome Technology Center at http://www-sequence.stanford.edu/group/candida. Sequencing of C. albicans was accomplished with the support of the National Institute of Dental and Craniofacial Research and the Burroughs Wellcome Fund. Information about coding sequences and proteins was obtained from CandidaDB available at http://genolist.pasteur.fr/CandidaDB, which has been developed by the Galar Fungail European Consortium (QLK2-2000-00795). This research was supported by Public Health Service grants DE14158 and AI39735 from the National Institutes of Health.


   REFERENCES
 
Blignaut, E., Pujol, C., Lockhart, S., Joly, S. & Soll, D. R. (2002). Ca3 fingerprinting of Candida albicans isolates from human immunodeficiency virus-positive and healthy individuals reveals a new clade in South Africa. J Clin Microbiol 40, 826–836.

Chen, M. H., Shen, Z. M., Bobin, S., Kahn, P. C. & Lipke, P. N. (1995). Structure of Saccharomyces cerevisiae alpha-agglutinin. Evidence for a yeast cell wall protein with multiple immunoglobulin-like domains with atypical disulfides. J Biol Chem 270, 26168–26177.

Chu, W. S., Magee, B. B. & Magee, P. T. (1993). Construction of an SfiI macrorestriction map of the Candida albicans genome. J Bacteriol 175, 6637–6651.

Combet, C., Blanchet, C., Geourjon, C. & Deleage, G. (2000). NPS@: network protein sequence analysis. Trends Biochem Sci 25, 147–150.

De Bernardis, F., Sullivan, P. A. & Cassone, A. (2001). Aspartyl proteinases of Candida albicans and their role in pathogenicity. Med Mycol 39, 303–313.

Fonzi, W. A. & Irwin, M. Y. (1993). Isogenic strain construction and gene mapping in Candida albicans. Genetics 134, 717–728.

Fu, Y., Ibrahim, A. S., Sheppard, D. C., Chen, Y. C., French, S. W., Cutler, J. E., Filler, S. G. & Edwards, J. E., Jr (2002). Candida albicans Als1p: an adhesin that is a downstream effector of the EFG1 filamentation pathway. Mol Microbiol 44, 61–72.

Gaur, N. K. & Klotz, S. A. (1997). Expression, cloning, and characterization of a Candida albicans gene, ALA1, that confers adherence properties upon Saccharomyces cerevisiae for extracellular matrix proteins. Infect Immun 65, 5289–5294.

Gaur, N. K., Klotz, S. A. & Henderson, R. L. (1999). Overexpression of the Candida albicans ALA1 gene in Saccharomyces cerevisiae results in aggregation following attachment of yeast cells to extracellular matrix proteins, adherence properties similar to those of Candida albicans. Infect Immun 67, 6040–6047.

Gillum, A. M., Tsay, E. Y. & Kirsch, D. R. (1984). Isolation of the Candida albicans genes for orotidine-5'-phosphate decarboxylase by complementation of S. cerevisiae ura3 and E. coli pyrF mutations. Mol Gen Genet 198, 179–182.

Hicks, J. B. & Herskowitz, I. (1976). Interconversion of yeast mating types. I. Direct observations of the action of the homothallism (HO) gene. Genetics 83, 245–258.

Hoyer, L. L. (2001). The ALS gene family of Candida albicans. Trends Microbiol 9, 176–180.

Hoyer, L. L. & Hecht, J. E. (2000). The ALS6 and ALS7 genes of Candida albicans. Yeast 16, 847–855.

Hoyer, L. L. & Hecht, J. E. (2001). The ALS5 gene of Candida albicans and analysis of the Als5p N-terminal domain. Yeast 18, 49–60.

Hoyer, L. L., Scherer, S., Shatzman, A. R. & Livi, G. P. (1995). Candida albicans ALS1: domains related to a Saccharomyces cerevisiae sexual agglutinin separated by a repeating motif. Mol Microbiol 15, 39–54.

Hoyer, L. L., Payne, T. L., Bell, M., Myers, A. M. & Scherer, S. (1998a). Candida albicans ALS3 and insights into the nature of the ALS gene family. Curr Genet 33, 451–459.

Hoyer, L. L., Payne, T. L. & Hecht, J. E. (1998b). Identification of Candida albicans ALS2 and ALS4 and localization of Als proteins to the fungal cell surface. J Bacteriol 180, 5334–5343.

Hoyer, L. L., Fundyga, R., Hecht, J. E., Kapteyn, J. C., Klis, F. M. & Arnold, J. (2001). Characterization of agglutinin-like sequence genes from non-albicans Candida and phylogenetic analysis of the ALS family. Genetics 157, 1555–1567.

Hube, B. & Naglik, J. (2001). Candida albicans proteinases: resolving the mystery of a gene family. Microbiology 147, 1997–2005.

Hube, B., Stehr, F., Bossenz, M., Mazur, A., Kretschmar, M. & Schafer, W. (2000). Secreted lipases of Candida albicans: cloning, characterisation and expression analysis of a new gene family with at least ten members. Arch Microbiol 174, 362–374.

Hull, C. M. & Johnson, A. D. (1999). Identification of a mating type-like locus in the asexual pathogenic yeast Candida albicans. Science 285, 1271–1275.

Jentoft, N. (1990). Why are proteins O-glycosylated? Trends Biochem Sci 15, 291–294.

Kapteyn, J. C., Hoyer, L. L., Hecht, J. E., Muller, W. H., Andel, A., Verkleij, A. J., Makarow, M., Van Den Ende, H. & Klis, F. M. (2000). The cell wall architecture of Candida albicans wild-type cells and cell wall-defective mutants. Mol Microbiol 35, 601–611.

Miyazaki, Y., Geber, A., Miyazaki, H., Falconer, D., Parkinson, T., Hitchcock, C., Grimberg, B., Nyswaner, K. & Bennett, J. E. (1999). Cloning, sequencing, expression and allelic sequence diversity of ERG3 (C-5 sterol desaturase gene) in Candida albicans. Gene 236, 43–51.

Monod, M. & Borg-von Zepelin, M. (2002). Secreted proteinases and other virulence mechanisms of Candida albicans. Chem Immunol 81, 114–128.

Odds, F. C. (1988). Candida and Candidosis, 2nd edn. London: Baillière Tindall.

Page, R. D. (1996). TreeView: an application to display phylogenetic trees on personal computers. Comput Appl Biosci 12, 357–358.

Pujol, C., Pfaller, M. & Soll, D. R. (2002). Ca3 fingerprinting of Candida albicans bloodstream isolates from the United States, Canada, South America, and Europe reveals a European clade. J Clin Microbiol 40, 2729–2740.

Santos, M. A. & Tuite, M. F. (1995). The CUG codon is decoded in vivo as serine and not leucine in Candida albicans. Nucleic Acids Res 23, 1481–1486.

Staib, P., Kretschmar, M., Nichterlein, T., Hof, H. & Morschhauser, J. (2002). Host versus in vitro signals and intrastrain allelic differences in the expression of a Candida albicans virulence gene. Mol Microbiol 44, 1351–1366.

Whelan, W. L. (1987). The genetic basis of resistance to 5-fluorocytosine in Candida species and Cryptococcus neoformans. Crit Rev Microbiol 15, 45–56.

Whelan, W. L., Markie, D. & Kwon-Chung, K. J. (1986). Complementation analysis of resistance to 5-fluorocytosine in Candida albicans. Antimicrob Agents Chemother 29, 726–729.

Wickes, B., Staudinger, J., Magee, B. B., Kwon-Chung, K. J., Magee, P. T. & Scherer, S. (1991). Physical and genetic mapping of Candida albicans: several genes previously assigned to chromosome 1 map to chromosome R, the rDNA-containing linkage group. Infect Immun 59, 2480–2484.

Wilson, R. B., Davis, D., Enloe, B. M. & Mitchell, A. P. (2000). A recyclable Candida albicans URA3 cassette for PCR product-directed gene disruption. Yeast 16, 65–70.

Yesland, K. & Fonzi, W. A. (2000). Allele-specific gene targeting in Candida albicans results from heterology between alleles. Microbiology 146, 2097–2104.

Zhang, N., Harrex, A. L., Holland, B., Cannon, R. D. & Schmid, J. (2002). Genomic markers of pathogenicity of Candida albicans. Abstract S-9. ASM Conference on Candida and Candidiasis, Tampa, FL, USA. Washington, DC: American Society for Microbiology.

Received 18 May 2003; revised 2 July 2003; accepted 3 July 2003.