ABSTRACT
The frequent variations of human complement
component C4 gene size and gene numbers, plus the extensive
polymorphism of the proteins, render C4 an excellent marker for major
histocompatibility complex disease associations. As shown by definitive
RFLPs, the tandemly arranged genes RP, C4,
CYP21, and TNX are duplicated together as a
discrete genetic unit termed the RCCX module. Duplications of the RCCX
modules occurred by the addition of genomic fragments containing a long
(L) or a short (S) C4 gene, a CYP21A or a
CYP21B gene, and the gene fragments TNXA and
RP2. Four major RCCX structures with bimodular L-L,
bimodular L-S, monomodular L, and monomodular S are present in the
Caucasian population. These modules are readily detectable by
TaqI RFLPs. The RCCX modular variations appear to be a root
cause for the acquisition of deleterious mutations from pseudogenes or
gene segments in the RCCX to their corresponding functional genes. In a
patient with congenital adrenal hyperplasia, we discovered a
TNXB-TNXA recombinant with the deletion of
RP2-C4B-CYP21B. Elucidation of the
DNA sequence for the recombination breakpoint region and sequence
analyses yielded definitive proof for an unequal crossover between
TNXA from a bimodular chromosome and TNXB from a monomodular chromosome.
INTRODUCTION
Besides the immunoglobulins, complement component C4 is probably
the most polymorphic serum protein. There are two isotypes, C4A and
C4B, that manifest remarkable differences in chemical reactivities and
serological properties (reviewed in Ref. 1). More than 34 allotypes for
C4A and C4B have been demonstrated by agarose gel electrophoresis,
based on gross differences in electric charge (2). Similar to the
protein, the complement C4 genes are unusually complex with
frequent variations in gene size and gene number. In addition, the
genes surrounding C4A or C4B also exhibit
considerable variations. These neighboring genes include RP1
or RP2 at the 5' region, CYP21A, or
CYP21B and TNXA or TNXB at the 3'
region (Fig. 1). The complex organizations of the C4A and
C4B genes, together with the extensive polymorphisms of the
C4A and C4B proteins, render C4 an excellent marker for MHC-associated diseases (1,
3). For instance, congenital adrenal hyperplasia (CAH) is mainly caused
by mutations or deletions of CYP21B (4), and systemic lupus
erythematosus is correlated with C4A deficiencies (5). In
addition, insulin-dependent diabetes mellitus (6, 7);
sudden infant death syndrome and spontaneous recurrent abortion (8, 9);
IgA deficiency and common variable immunodeficiency (10, 11); IgA
nephropathy (12); skin vitiligo and pemphigus vulgaris (13, 14); and
autism and narcolepsy (15, 16) have all been suggested to be associated
with specific alleles or null alleles of C4.
The human C4 genes are either 21 kb (long, L) or 14.6 kb
(short, S) in size (17). This dichotomous size variation is due to the
presence of an endogenous retrovirus HERV-K(C4) in intron 9 of the long
gene (18-20). There may be one, two, or three C4 genes in
the MHC class III region of chromosome 6 (21). Most people have two C4
genes in the MHC with one coding for a C4A protein and the other coding
for a C4B protein (Fig. 1). C4A has higher affinities to amino
group-containing targets; C4B has higher affinities to hydroxyl
group-containing targets. These differences are the result of four
amino acid changes between positions 1101 and 1106 (22-24). A
significant proportion of the population has a single C4
gene in chromosome 6 coding for C4A or for C4B. Deletion or duplication
of the C4 genes is always concurrent with that of the downstream steroid 21-hydroxylase genes, CYP21A or
CYP21B (25, 26).
RP1 is one of the four novel genes,
RD-SKI2W-DOM3Z-RP1, present in the 30-kb genomic region
between complement component genes factor B (Bf)
and C4 (27-31). RP2 is a partially duplicated
gene segment that contains only 913 bp of the sequence corresponding to
the last two and one-half exons of RP1. RP1 transcripts are ubiquitously expressed. Derived amino acid sequence suggested that
RP1 codes for a nuclear protein that is probably a
serine/threonine kinase (27, 30, 32).
The cytochrome P450 steroid 21-hydroxylase genes CYP21A or
CYP21B are located 3028 bp downstream of C4A or
C4B, respectively (17). CYP21 is essential for the
biosynthesis of glucocorticoid and mineralocorticoid hormones. The
complete absence of CYP21 leads to salt wasting, low activity of CYP21
causes simple virilizing, and below average CYP21 activity causes
androgen excess (reviewed in Ref. 4). CYP21A is a
pseudogene, because it contains three deleterious mutations: an 8-bp
deletion in exon 3 and a T nucleotide insertion in exon 7 that result
in frameshift mutations as well as a C to T transition in exon 8 that
generates a premature stop codon. In addition, there are many other
point mutations in coding and noncoding sequences (33-35). If on both
copies of chromosome 6, the deleterious mutations in CYP21A
are incorporated into CYP21B or the CYP21B genes
are deleted, the subject suffers CAH.
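The recessive logic of the preceding paragraph can be illustrated with a short Python sketch. It is a deliberate simplification and not part of the original study: each chromosome 6 is reduced to a list of CYP21 gene labels, partial enzyme activity is ignored, and the label `CYP21B_mut` is a hypothetical name for a CYP21B gene that has acquired deleterious CYP21A-type mutations.

```python
# Illustrative sketch only; gene labels and function names are hypothetical.
# Only an intact CYP21B is treated as encoding active 21-hydroxylase.
FUNCTIONAL_GENES = {"CYP21B"}


def has_functional_cyp21(haplotype):
    """Return True if one chromosome carries at least one intact CYP21B."""
    return any(gene in FUNCTIONAL_GENES for gene in haplotype)


def predicts_cah(haplotype_1, haplotype_2):
    """CAH is predicted only when neither chromosome has a functional gene."""
    return not (has_functional_cyp21(haplotype_1)
                or has_functional_cyp21(haplotype_2))


bimodular_normal = ["CYP21A", "CYP21B"]      # pseudogene plus functional gene
cyp21b_deleted = ["CYP21A"]                  # CYP21B lost by deletion
cyp21b_converted = ["CYP21A", "CYP21B_mut"]  # CYP21B carrying CYP21A mutations

print(predicts_cah(cyp21b_deleted, cyp21b_deleted))    # True
print(predicts_cah(bimodular_normal, cyp21b_deleted))  # False
print(predicts_cah(cyp21b_converted, cyp21b_deleted))  # True
```

The model treats a carrier of one functional CYP21B as unaffected, which matches the statement that both copies of chromosome 6 must be compromised.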
The 3'-ends of CYP21A or CYP21B overlap with the
3'-ends of extracellular matrix protein tenascin TNXA or
TNXB by 444 bp, respectively (36). The gene configurations
of TNXA and TNXB are opposite to those of
RP, C4, and CYP21. The TNXB
gene is 68.2 kb in size, consists of 45 exons and encodes a protein of
4289 amino acids (38 and Footnote 2). The
derived amino acid sequence of TNXB reveals a heptad, 18.5 epidermal
growth factor repeats, 32 fibronectin type III repeats, and a
fibrinogen domain. The overall structure of TNXB shows a striking
similarity to extracellular matrix proteins tenascin/cytostatin (TN-C)
and restrictin (TN-R) (39-41). TN-C is present in the central and
peripheral nervous system and in smooth muscle and tendon. It is
probably involved in cell adhesion and cell morphology (42). TN-R is
expressed in the nervous system and implicated in neural cell
attachment (43). The function of TNXB is yet to be determined. Its
transcripts are ubiquitously expressed in the fetus (38). TNXA is a partially duplicated gene segment that corresponds
to intron 32 to exon 45 of TNXB. In addition, there is a
120-bp deletion at exon 36-intron 36 that results in a frameshift
mutation and premature termination of translation (30, 40).
The demonstration of the endogenous retrovirus HERV-K(C4) mediating the
size variation of C4 genes (18) and the elucidation of DNA
sequences for RP1 and RP2 (30), C4A
and C4B (22, 17), CYP21A and CYP21B
(33-35), and TNXA and TNXB (30, 38, and Footnote 2)
provide important information for resolving the fine structures and the
complex organizations of the consecutive genes RP, C4, CYP21, and TNX. The concurrent deletions/duplications
of C4 and CYP21 genes (25, 26) prompted us to
investigate if RP and TNX also undergo
rearrangements in normal individuals and in selected CAH patients.
Diagnostic RFLPs for RP1 and RP2, and for
TNXA and TNXB, have been devised.
RP-C4-CYP21-TNX genes are organized in variable, modular
fashions. The unusually frequent modular variation appears to be the
root cause for unequal crossovers and exchange of sequences between the
functional and nonfunctional genes of the RCCX.
EXPERIMENTAL PROCEDURES
Oligonucleotides--
Oligonucleotides were synthesized by an
Applied Biosystems model 380B DNA Synthesis machine. The sequences
added to facilitate cloning are represented in lowercase type. For
amplification and sequencing of TNX genes (38)
the following oligonucleotides were used: RDX-5, aga gAA TTC
AGT GAA ATC AGG GAG ACC; RDX-3, gag gaa TTC CAG TGC AGC ACG
GCG AA; SDX-52, GGA GCC TCA GAG TGT GCA; SDX-32,
CAA TCG GAG CCT CCA CCA; XB54H, gtg gaa ttc AAG CGA GCA CCT
GAC TCA; and XA31H, gtt gaa ttc TTT TCT TGA CTC CCA CCT G. For amplification of CYP21A probe (35), the following were used: 21A5, TGT GGC CAT TGA GGA GGA A; and 21A3,
TGC CAC CGA TCA GGA GGT C.
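As a convenience for readers who wish to reuse these primers, the following Python sketch parses the sequences exactly as printed above, measures the lowercase stretch added for cloning, and looks for an EcoRI recognition site (GAATTC). The function name and the dictionary layout are assumptions made for illustration. That the added sequences carry an EcoRI site is an observation from the sequences themselves, not a statement made in the text.

```python
# Illustrative sketch; names are hypothetical, sequences are copied from the text.
ECORI_SITE = "GAATTC"  # EcoRI recognition sequence

PRIMERS = {
    "RDX-5": "aga gAA TTC AGT GAA ATC AGG GAG ACC",
    "RDX-3": "gag gaa TTC CAG TGC AGC ACG GCG AA",
    "SDX-52": "GGA GCC TCA GAG TGT GCA",
    "SDX-32": "CAA TCG GAG CCT CCA CCA",
    "XB54H": "gtg gaa ttc AAG CGA GCA CCT GAC TCA",
    "XA31H": "gtt gaa ttc TTT TCT TGA CTC CCA CCT G",
    "21A5": "TGT GGC CAT TGA GGA GGA A",
    "21A3": "TGC CAC CGA TCA GGA GGT C",
}


def describe_primer(raw):
    """Summarize one primer written in triplet groups with lowercase tails."""
    seq = raw.replace(" ", "")
    # Leading lowercase bases are the nucleotides added to facilitate cloning.
    tail_length = len(seq) - len(seq.lstrip("acgt"))
    return {
        "length": len(seq),
        "tail_length": tail_length,
        # Index of the first EcoRI site, or -1 when the primer has none.
        "ecori_index": seq.upper().find(ECORI_SITE),
    }


for name, raw in PRIMERS.items():
    print(name, describe_primer(raw))
```

In this listing the four primers with lowercase tails each contain GAATTC starting at the fourth nucleotide, while the untailed primers contain no such site.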
Isolation of Human Genomic DNA--
Genomic DNAs were isolated
following standard protocols from cultured cell lines HepG2 (liver
carcinoma) and MOLT4 (T-cell leukemia), peripheral blood of normal
individuals, and a congenital adrenal hyperplasia patient (CAH-E1).
Appropriate consents from blood donors were obtained according to
approved protocols by the Institutional Board of the Columbus
Children's Hospital.
Complement C4 Allotyping--
Complement C4A and C4B allotypes
from EDTA-blood plasma were determined as described in Refs. 44 and 45.
Briefly, 10 µl of plasma was digested with 0.1 unit of neuraminidase
(Sigma) at 4 °C overnight and with 0.1 unit of carboxypeptidase B
(Sigma) at room temperature for 30 min. Two agarose gels were prepared. Four µl of digested plasma was loaded to each agarose gel and resolved by high voltage gel electrophoresis (46). One of the agarose
gels was subjected to immunofixation using goat antiserum against human
C4 (Incstar, Stillwater, MN). Plasma proteins in the other agarose gel
were subjected to immunoblot analyses using anti-Ch1 or anti-Rg1
monoclonal antibodies (anti-C4B, catalog no. C057-325.2, lot no.
120287; anti-Rg1, RGd1; kindly provided by Dr. Joann M. Moulds,
Houston, TX) at a dilution of 1:5000 and 1:1000, respectively. Immune
complexes were detected by the chemiluminescence method using the ECL
Plus reagents (Amersham Pharmacia Biotech).
HLA Typing--
HLA typing of the E family was kindly performed
by The Ohio State University Tissue Typing Laboratory.
PCR of Cosmid and Genomic DNA--
PCR of cosmid and genomic DNA
were performed following standard procedures (47). PCR products were
purified from 0.8% low gelling temperature agarose and cloned into
pBluescript vectors.
DNA Probes--
DNA probes used were as follows: for
RP, RP1.1, a 1.1-kb insert of RP1
cDNA (30) and RP1 3' probe, a 651-bp
NheI-EcoRI fragment of RP1.1; for
C4, PA, a 476-bp
BamHI-KpnI cDNA fragment isolated from pAT-A
(pAT-A contains the almost full-length cDNA insert for the human
C4A4 allele (22, 48)) and PB, a
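C4d-specific 926-bp BamHI DNA fragment subcloned
from JM-2a that contains a C4B5 gene (22); for
CYP21, a 757-bp fragment of CYP21A, amplified from cos 2 using primers 21A5 and 21A3 (cos 2 is a cosmid isolated
from a human genomic library DA (49); it spans from the 5' region of the C4A1 gene to the 3' region of the
TNXB gene, and its genomic DNA derives from
the HLA haplotype A3 B47 DR7 that contains a deletion of the
CYP21B gene); for TNX, a 600-bp fragment of TNXB, corresponding to exons 35-37 of TNXB,
amplified from cos 2 using primers RDX-5 and RDX-3.
Southern Blot Analysis--
Ten µg of genomic DNA was digested
to completion with the appropriate restriction enzymes for 16 h,
resolved on an agarose gel, blotted onto Hybond-N+ membrane
(Amersham Pharmacia Biotech), and hybridized with an appropriate
[α-32P]dCTP-labeled probe, as described in Ref. 50.
DNA Sequencing and Sequence Analysis--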
Sequencing reactions
were performed using a Sequenase kit (U.S. Biochemical Corp.,
Cleveland, OH) and [35S]ATP following the dideoxy
sequencing method (51), or by the automated sequencing method using an
ABI 377 machine. DNA sequences were compiled using PC/Gene software
(Intelligenetics, Mountain View, CA). Comparisons of the sequences were
performed by FASTA, BESTFIT, PILEUP, and PRETTY programs in the GCG
package through the Pittsburgh Supercomputing Center (52).
RESULTS
Allotyping of C4--
Allotyping of the plasma C4 proteins from
the E family is shown in Fig.
2I. Phenotypically, E1 has
C4A1, C4A3, and C4BQ0 (lane 1); E2 has C4A3 and
C4BQ0 (lane 2); and E3 has C4A1, C4A3, and C4B1
(lane 3). Lane 4 shows a
control plasma with C4A3 and C4B1. Immunoblot analyses revealed
positive reactions of C4A1 in E1 (lane 1), C4A1
and C4B1 in E3 (lane 3), and C4B1 in the control plasma (lane 4) with anti-Ch1 monoclonal
(panel II). Similarly, positive reactions were
observed for the C4A3 allotype in E1, E2, E3, and a control plasma with
the anti-Rg1 monoclonal (lanes 1-4,
panel III). In separate experiments on family B
(data not shown), C4 allotyping showed that B1 has C4AQ0, C4B1; B2 has
C4A3, C4B1, C4BQ0; and B3 has C4A3, C4AQ0, C4B1. Immunoblot analyses showed that the C4A3 allotype reacted with anti-Rg1 and the C4B1 reacted with anti-Ch1.
Modular Variations for RP, C4, CYP21, and TNX--
Whether the
flanking genes RP (located 5' to C4) and
TNX (located 3' to CYP21) are involved in the
C4-CYP21 gene deletion events was investigated. Diagnostic
RFLPs were devised to detect and distinguish the presence of the
RP1 and RP2 as well as TNXA and
TNXB. By using an RP1 3' probe for hybridization,
the RP1 gene can be represented by a 9.6-kb BamHI
fragment, and the RP2 (and TNXA gene segments)
can be represented by a 5.0-kb BamHI fragment. By using a
600-bp TNX probe corresponding to TNXB exons 35-37, the presence of the TNXB gene can be detected by a
9.0-kb ScaI fragment, and the TNXA gene can be
detected by a 4.0-kb ScaI fragment (Fig.
3I). Previously, techniques
were established for detecting the presence of the C4A and
C4B genes by NlaIV RFLP, and their associations
with Rg1 or Ch1 antigenic determinants by EcoO 109I RFLP
(50). In addition, the presence of CYP21A and
CYP21B can be detected by the 3.2- and 3.7-kb
TaqI fragments, respectively (34). Fig. 3 (panels
II-IV) shows a series of Southern blot analyses of DNA
samples isolated from two families, B and E. Members of the B family
appear to be normal. For the E family, there is a CAH patient E1.
As shown in Fig. 3II, restriction fragments corresponding to
RP1 (9.6-kb BamHI fragment) and RP2
(5.0-kb BamHI fragment) were detected in all individuals
except B1 and E1. In these two individuals, only the
RP1-specific 9.6-kb fragment is present (lanes
1 and 5).
The presence of the C4A or C4B gene was analyzed
by NlaIV restriction patterns shown in Fig.
3III (A). B1 contains the 467-bp fragment
corresponding to C4B genes (lane 1).
E1 and E2 have the 276- and 191-bp fragments specific for
C4A genes (lanes 5 and 6).
Other individuals contain fragments corresponding to both
C4A and C4B genes. The associations of
C4 genes with the major Chido (Ch1) and Rodgers (Rg1)
antigens were revealed in Fig. 3III (B) by the
EcoO 109I restriction patterns. The C4B genes in
B1 express the Ch1 antigen, since only the 458-bp fragment can be
detected (lane 1). The C4A genes in E2
express the Rg1 epitope, since only the 565-bp fragment is present
(lane 6). Both Rg1- and Ch1-specific fragments
are present in E1 (lane 5). Therefore, one of the
C4A genes (C4A1) in E1 expresses Ch1 (that is frequently associated with C4B), and the other C4A gene
(C4A3) expresses Rg1. All other individuals contain both C4A
and C4B genes. These C4 genes express Rg1 and Ch1
because both 565- and 458-bp fragments were detectable. An additional
344-bp EcoO 109I fragment exists in all individuals, since
the 926-bp C4d-specific probe (PB) was used.
This fragment is common to C4 genes with Rg1 or
Ch1 (50).
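The reading of the NlaIV and EcoO 109I patterns described above can be written as a small lookup, sketched here in Python. Fragment sizes are taken from the text; the function names are hypothetical, and the sketch reports only presence or absence, not gene dosage. The observation that 276 + 191 = 467 bp is consistent with the C4A-specific pair arising from an additional NlaIV site within the C4B-type fragment, which is an inference and not a statement from the text.

```python
# Illustrative sketch; function names are hypothetical.
NLAIV_C4B_BP = 467           # single fragment diagnostic of C4B
NLAIV_C4A_BP = (276, 191)    # fragment pair diagnostic of C4A
ECOO109I_BP = {565: "Rg1", 458: "Ch1"}  # the 344-bp fragment is common to both


def c4_genes_from_nlaiv(fragments_bp):
    """Return the C4 gene types indicated by NlaIV fragments."""
    observed = set(fragments_bp)
    genes = set()
    if NLAIV_C4B_BP in observed:
        genes.add("C4B")
    if set(NLAIV_C4A_BP) <= observed:
        genes.add("C4A")
    return genes


def determinants_from_ecoo109i(fragments_bp):
    """Return the Rg1/Ch1 determinants indicated by EcoO 109I fragments."""
    return {ECOO109I_BP[bp] for bp in fragments_bp if bp in ECOO109I_BP}


# B1: only the C4B-type NlaIV fragment and only the Ch1-type fragment.
print(c4_genes_from_nlaiv([467]), determinants_from_ecoo109i([458, 344]))
# E1: only C4A-type NlaIV fragments, yet both Rg1- and Ch1-type fragments.
print(c4_genes_from_nlaiv([276, 191]), determinants_from_ecoo109i([565, 458, 344]))
```

The E1 result reproduces the situation discussed above, in which a C4A gene is associated with the Ch1 determinant.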
The presence of CYP21A and CYP21B was determined
by TaqI RFLP (Fig. 3IV). Both CYP21A
(3.2-kb fragment) and CYP21B (3.7-kb fragment) were detected
in all samples except B1 and E1. B1 only contains the functional
CYP21B gene (lane 1), while the CAH
patient E1 only contains the pseudogene CYP21A
(lane 5).
The presence of TNXA and TNXB genes was
determined by ScaI RFLP (Fig. 3V). Restriction
fragments for both TNXA (4.0-kb ScaI fragment)
and TNXB (9.0-kb ScaI fragment) were detected in
all individuals except B1 and E1 (Fig. 3II (B)).
These two individuals have the 9.0-kb ScaI restriction
fragment corresponding to TNXB but no 4.0-kb ScaI
fragment corresponding to TNXA (lanes
1 and 5).
From the above results, it becomes clear that individuals who have a
single locus for RP also have single gene loci for
C4, CYP21, and TNX. Individuals with
both RP1 and RP2 loci also have duplicated loci
for C4, CYP21, and TNX. Hence, the
four tandemly arranged genes RP, C4,
CYP21, and TNX are duplicated or deleted together
as a discrete genetic unit. This genetic unit is designated as the
RCCX module.
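The inference above can be sketched in Python as a lookup from diagnostic fragments to genes, followed by a check for the duplicated module. Fragment sizes are those given in the text; the dictionary layout and function names are assumptions for illustration. Because a Southern blot pools both chromosomes, the sketch reports only which genes are present in the diploid sample, not their arrangement on each chromosome.

```python
# Illustrative sketch; data layout and function names are hypothetical.
FRAGMENT_TO_GENE = {
    ("BamHI", 9.6): "RP1",
    ("BamHI", 5.0): "RP2",
    ("TaqI", 3.7): "CYP21B",
    ("TaqI", 3.2): "CYP21A",
    ("ScaI", 9.0): "TNXB",
    ("ScaI", 4.0): "TNXA",
}


def genes_detected(observed):
    """Map observed fragments (enzyme -> sizes in kb) to gene names."""
    return {
        FRAGMENT_TO_GENE[(enzyme, kb)]
        for enzyme, sizes in observed.items()
        for kb in sizes
        if (enzyme, kb) in FRAGMENT_TO_GENE
    }


def has_duplicated_module(genes):
    """RP2 and TNXA are found only in the added copy of the RCCX module."""
    return "RP2" in genes and "TNXA" in genes


b1 = genes_detected({"BamHI": [9.6], "TaqI": [3.7], "ScaI": [9.0]})
e1 = genes_detected({"BamHI": [9.6], "TaqI": [3.2], "ScaI": [9.0]})
b2 = genes_detected({"BamHI": [9.6, 5.0], "TaqI": [3.7, 3.2], "ScaI": [9.0, 4.0]})

for name, genes in (("B1", b1), ("E1", e1), ("B2", b2)):
    print(name, sorted(genes), has_duplicated_module(genes))
```

For B1 and E1 the check is negative for every marker at once, which is the pattern that led to treating RP, C4, CYP21, and TNX as one unit.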
Molecular Basis of the C4 and RP TaqI RFLP and Variations of the
RP, C4, CYP21, and TNX (RCCX) Loci in the B and E Families--
The
TaqI RFLP is the most widely used technique to illustrate
the complexities of polymorphisms in the number and size of C4 genes. Four fragments of 7.0, 6.4, 6.0, and 5.4 kb in
size can be detected by Southern blot analysis of
TaqI-digested genomic DNA hybridized to PA, a
C4 5' probe (26, 50, 53). At the same time, all these
TaqI fragments can also be detected by an RP1.1
probe, suggesting that the TaqI RFLP is caused by both
RP and C4 gene variations. There is a
BamHI site at the 5'-untranslated region of the
C4A and C4B genes. To segregate the
RP1/RP2 variation from the C4 long/short
(L/S) size dichotomy, the BamHI-TaqI
double digests of genomic DNA were performed (Fig.
4I). As shown in Fig. 4II, the single 6.4-kb TaqI fragment for B1 (Fig.
4II (A), lane 1) was split
into a 3.1-kb TaqI-BamHI fragment corresponding
to an RP1 gene (Fig. 4II (B),
lane 1) and a 3.2-kb
BamHI-TaqI fragment corresponding to a short
C4 gene (Fig. 4II (C), lane
1). The 7.0- and 6.0-kb TaqI fragments detected
in HepG2 (Fig. 4II (A), lane 2) were split into 3.1- and 2.1-kb
TaqI-BamHI fragments corresponding to
RP1 and RP2 genes (Fig. 4II
(B), lane 2) and a single 4.0-kb BamHI-TaqI fragment corresponding to long
C4 genes (Fig. 4II (C), lane 2). The 7.0- and 5.4-kb TaqI
fragments detected in MOLT4 (Fig. 4II (A),
lane 3) were divided into the RP1-associated
3.1-kb TaqI-BamHI fragment and the
RP2-associated 2.1-kb TaqI-BamHI
fragment (Fig. 4II (B), lane 3) and
the 4.0- and 3.2-kb BamHI-TaqI fragments associated with long C4 genes and short C4 genes,
respectively (Fig. 4II (C), lane 3).
Therefore, the restriction patterns of the TaqI genomic
Southern blot for RP or C4 genes result from a combination of variations of RP1 or RP2,
one or two loci of C4 genes, and long or short C4
genes. The basis of the TaqI RFLP is interpreted as follows.
The 7.0-kb fragment indicates the presence of an RP1 gene
linked to a long C4 gene; the 6.4-kb fragment corresponds to
an RP1 gene linked to a short C4 gene; the 6.0-kb
fragment represents an RP2 linked to a long C4
gene; and the 5.4-kb fragment stands for an RP2 linked to a
short C4 gene. The actual sizes of these TaqI
restriction fragments derived from DNA sequences are 0.1 kb greater or
smaller than the apparent sizes described above.
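The interpretation above can be checked arithmetically: each TaqI fragment should equal the sum of its RP-side and C4-side products from the BamHI-TaqI double digest, within the stated 0.1-kb uncertainty. The Python sketch below uses the apparent sizes reported in this section, expressed in bp to avoid rounding artifacts; the variable and function names are hypothetical.

```python
# Illustrative sketch; names are hypothetical, sizes are the apparent sizes in bp.
TAQI_FRAGMENTS_BP = {
    7000: ("RP1", "L"),
    6400: ("RP1", "S"),
    6000: ("RP2", "L"),
    5400: ("RP2", "S"),
}
RP_SIDE_BP = {"RP1": 3100, "RP2": 2100}  # TaqI-BamHI fragments (RP1.1 probe)
C4_SIDE_BP = {"L": 4000, "S": 3200}      # BamHI-TaqI fragments (C4 5' probe PA)


def double_digest_discrepancy(taqi_bp):
    """Sum of the two double-digest products minus the TaqI fragment size."""
    rp, c4 = TAQI_FRAGMENTS_BP[taqi_bp]
    return RP_SIDE_BP[rp] + C4_SIDE_BP[c4] - taqi_bp


for taqi_bp, (rp, c4) in TAQI_FRAGMENTS_BP.items():
    print(f"{taqi_bp} bp = {rp}-C4({c4}); discrepancy "
          f"{double_digest_discrepancy(taqi_bp):+d} bp")
```

All four assignments agree to within 100 bp, which is the tolerance quoted for the apparent fragment sizes.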
The specific combinations of RP1/RP2 genes with
C4(L)/C4(S) genes in the B and E family members were
examined by TaqI digests and RP1.1 probe as shown
in Fig. 5I. The
RP1-C4(S) haplotype is present in B1, as represented by the
single 6.4-kb TaqI fragment (lane 1).
The RP1-C4(L) haplotype is present in E1, as shown by the
single 7.0-kb TaqI fragment (lane 5).
Therefore, B1 and E1 are each homozygous for a single RP1 gene and a single
C4 gene. B2 has the homozygous
RP1-C4(L)-RP2-C4(L) haplotype as revealed by the presence of
the 7.0- and 6.0-kb TaqI fragments (lane
2). B3 and B4 (children of B1 and B2) are heterozygous for
the RP1-C4(S) haplotype and the
RP1-C4(L)-RP2-C4(L) haplotype, since they both contain the
7.0-, 6.4-, and 6.0-kb TaqI fragments (lanes
3 and 4). E2 and E3 (parents of E1) are
heterozygous for the RP1-C4(L)-RP2-C4(L) haplotype and the
RP1-C4(L) haplotype, since they both have the 7.0- and
6.0-kb TaqI fragments (lanes 6 and
7).
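The haplotype assignments above can be explored with a short enumeration, sketched here in Python. It assumes that only the four common RCCX structures (monomodular L, monomodular S, bimodular L-L, and bimodular L-S) are possible and that only the presence or absence of each TaqI fragment is scored; trimodular haplotypes and band intensities are ignored. The names are hypothetical.

```python
# Illustrative sketch; assumes only four RCCX structures and presence/absence data.
from itertools import combinations_with_replacement

# TaqI fragments (bp, RP1.1 probe) expected from each RCCX structure.
HAPLOTYPE_FRAGMENTS = {
    "L": {7000},          # RP1-C4(L)
    "S": {6400},          # RP1-C4(S)
    "L-L": {7000, 6000},  # RP1-C4(L)-RP2-C4(L)
    "L-S": {7000, 5400},  # RP1-C4(L)-RP2-C4(S)
}


def compatible_genotypes(observed_bp):
    """List haplotype pairs whose combined fragments equal the observed set."""
    observed = set(observed_bp)
    return [
        (h1, h2)
        for h1, h2 in combinations_with_replacement(HAPLOTYPE_FRAGMENTS, 2)
        if HAPLOTYPE_FRAGMENTS[h1] | HAPLOTYPE_FRAGMENTS[h2] == observed
    ]


print(compatible_genotypes([6400]))              # B1
print(compatible_genotypes([7000]))              # E1
print(compatible_genotypes([7000, 6400, 6000]))  # B3 and B4
print(compatible_genotypes([7000, 6000]))        # B2, E2, and E3
```

Under these assumptions the 7.0- plus 6.0-kb pattern is compatible with both a homozygous L-L genotype and an L/L-L heterozygote, so fragment presence alone does not separate B2 from E2 and E3; additional information such as family segregation or relative band intensity is needed.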
Detection of the TNXA-associated 120-bp Deletion in a TNXB Gene of
CAH Patient E1--
Compared with TNXB, TNXA has
not only a 63-kb truncation of the 5' region (30, 38, and Footnote 2) but
also a 120-bp deletion spanning across the junction of exon 36 and
intron 36 (30). This deletion gives rise to a TaqI RFLP when
a TNX 3' probe is used for Southern blot analysis;
TNXA associates with a 2.4-kb fragment (TNX-2.4), and
TNXB associates with a 2.5-kb fragment (TNX-2.5). The
analysis of this TNX TaqI polymorphism in the B and E
families is shown in Fig. 5III. B1, who has a monomodular
RCCX (S) structure, exhibited TNX-2.5 (lane 1)
that is consistent with the presence of TNXB genes only. B2,
who has a bimodular RCCX (L-L) structure, revealed both TNX-2.4 and
TNX-2.5 (lane 2), which is concordant with the
presence of both TNXA and TNXB. B3 and B4, who
are heterozygous for bimodular RCCX (L-L) and monomodular RCCX (S),
also revealed both TNX-2.5 and TNX-2.4 fragments.
E3 is heterozygous for monomodular RCCX (L) and bimodular RCCX (L-L),
implying that she has two TNXB genes and one TNXA
gene. As expected, the relative band intensity of TNX-2.5 to TNX-2.4 is
2:1 (lane 7). E2 is also heterozygous for
monomodular RCCX (L) and bimodular RCCX (L-L). He has two
TNXB genes and one TNXA gene in a diploid genome.
However, the band intensity of TNX-2.5 is only half of that for TNX-2.4
(lane 6), which is opposite from the expected
result. The CAH patient E1 is homozygous for monomodular RCCX (L) and
has only TNXB but no TNXA. Unexpectedly, he
manifested both the TNXB-related 2.5-kb fragment and
TNXA-related 2.4-kb fragment. The relative band intensity
for TNX-2.5 and TNX-2.4 is 1:1. These results suggest that in both E1
and E2, one of the TNXB genes contains the 120-bp deletion.
Hence, the aberrant TNXB gene in E1 originates from the
paternal chromosome. Correlation of the HLA typing data for the E
family (Table I) and the TaqI RFLP data suggests that the aberrant TNXB gene is present in
HLA haplotype A3 B35 DR1.
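The band-intensity argument above can be restated as simple gene counting, sketched below in Python. The sketch assumes that band intensity is proportional to gene copy number and that a TNXB gene carrying the 120-bp deletion contributes to the 2.4-kb band; the function name and argument names are hypothetical.

```python
# Illustrative sketch; assumes intensity is proportional to gene copy number.
from math import gcd


def expected_band_ratio(intact_tnxb, deleted_tnxb, tnxa):
    """Return the expected TNX-2.5 : TNX-2.4 ratio in lowest terms."""
    band_2_5 = intact_tnxb          # only intact TNXB yields the 2.5-kb fragment
    band_2_4 = tnxa + deleted_tnxb  # TNXA and deleted TNXB yield 2.4 kb
    divisor = gcd(band_2_5, band_2_4) or 1
    return band_2_5 // divisor, band_2_4 // divisor


print("E3:", expected_band_ratio(intact_tnxb=2, deleted_tnxb=0, tnxa=1))  # (2, 1)
print("E2:", expected_band_ratio(intact_tnxb=1, deleted_tnxb=1, tnxa=1))  # (1, 2)
print("E1:", expected_band_ratio(intact_tnxb=1, deleted_tnxb=1, tnxa=0))  # (1, 1)
```

The observed ratios for E2 and E1 match the counts obtained when one TNXB gene in each individual is assumed to carry the deletion.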
Based on the phenotypes of the C4A and C4B proteins and haplotypes of
the RCCX modules, the organizations of RCCX genotypes for the B and E
families were deduced and are shown in Table I. The haplotype
c of the B family and the haplotype a of the E
family both have bimodular RCCX structures coding for C4A3 protein from the two C4 loci.
An Unequal Crossover between Monomodular and Bimodular RCCX
Chromosomes Leading to Gene Deletion and Gene Duplication--
The
presence of a monomodular RCCX structure in E1 with
RP1-C4(L), CYP21A, and a
TNXB gene with TNXA-associated 120-bp deletion leads us to hypothesize that there was an unequal crossover between TNXA and TNXB from the homologous chromosomes.
This unequal crossover resulted in the formation of an XB-XA
recombinant and the deletion of the RP2-C4B-CYP21B genes
(Fig. 6I). To test this
hypothesis, the genomic region spanning the putative 120-bp deletion
was amplified by PCR using the CAH-E1 genomic DNA. The strategies for
PCR are depicted in Fig. 6II. When primers RDX-5 and RDX-3
were used, two PCR products of 493 and 613 bp were obtained. These products were cloned and sequenced. Both sequences correspond to exons
35-37 of TNX. The sequence of the 613 bp is identical to
that of the regular TNXB, while the 493-bp product appears to contain a 120-bp deletion similar to that observed in
TNXA. To further prove that this is an aberrant
TNXB gene with a TNXA-associated 120-bp deletion,
a 2.6-kb genomic DNA fragment was amplified using RYM25 and XA31H.
RYM25 is a TNXB-specific primer because it is located in
exon 32 and is 220 bp upstream of the gene duplication breakpoint for
TNXA and TNXB. The XA31H primer spans across the 120-bp deletion and therefore is TNXA-specific. The 2.6-kb
PCR product was cloned and sequenced to completion. The sequences of
the 2.6- and 493-bp fragments overlap and so were compiled. The
resulting sequence E1XB-A was aligned with two normal TNXB sequences (TNXB-H and TNXB-M) (38 and Footnote 2) and two normal
TNXA sequences (TNXA-M and TNXA-Y) (30, 54) obtained from
three different laboratories. In addition, we included a recombinant
TNXA sequence in a pauciarticular juvenile rheumatoid arthritis (JRA) patient, L1XA-B, that acquired the described 120-bp genomic DNA sequence (64). The alignment starts at the breakpoint of
gene duplications for TNXA-TNXB. The alignment reveals a
picture for the original DNA recombination leading to the reciprocal
recombinant sequences in E1XB-A and L1XA-B (Fig.
7I). There are nine single nucleotide changes or informative sites (marked by
asterisks) which can differentiate TNXA from
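TNXB. A Chi sequence related to DNA recombination
in bacteriophage λ is located at nucleotides 450-457. Five of the
TNXA/TNXB informative sites (nucleotides 14, 60, 63, 73, and
278) are 5' to the Chi site. The other four sites
(nucleotides 2234, 2268, 2273, and 2423) are 3' to the Chi site and flank the 120-bp deletion. For E1XB-A, the sequence is TNXB-specific at the first five informative
sites but TNXA-specific at the last four sites. It also
carries the TNXA-specific 120-bp deletion (Fig.
7II). This suggests that E1XB-A is the result of a
recombination between TNXB and TNXA, occurring
between nucleotides 278 and 2234.
L1XA-B has sequences characteristic of TNXA at the
first five informative sites but sequences characteristic of TNXB at the last four sites, and it lacks the 120-bp deletion.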
This aberrant TNXA sequence in L1XA-B appears to be the
reciprocal product of E1XB-A resulting from DNA recombination between
TNXA and TNXB.
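The way a crossover interval is bracketed by informative sites can be sketched in Python. The site positions and the Chi coordinates are those reported for the alignment (nucleotides 14, 60, 63, 73, 278, 2234, 2268, 2273, and 2423; Chi at 450-457). The single-letter encoding of each site as TNXA-like ("A") or TNXB-like ("B") and the function names are assumptions made for illustration.

```python
# Illustrative sketch; the "A"/"B" encoding and function names are hypothetical.
INFORMATIVE_SITES = [14, 60, 63, 73, 278, 2234, 2268, 2273, 2423]
CHI_SITE = (450, 457)


def crossover_intervals(origins):
    """Return intervals between adjacent sites whose parental origin differs."""
    if len(origins) != len(INFORMATIVE_SITES):
        raise ValueError("one origin letter is required per informative site")
    return [
        (INFORMATIVE_SITES[i], INFORMATIVE_SITES[i + 1])
        for i in range(len(origins) - 1)
        if origins[i] != origins[i + 1]
    ]


def contains_chi(interval):
    """True when the Chi sequence lies entirely inside the interval."""
    start, end = interval
    return start < CHI_SITE[0] and CHI_SITE[1] < end


e1xb_a = "BBBBBAAAA"  # TNXB-like at the first five sites, TNXA-like at the rest
l1xa_b = "AAAAABBBB"  # the reciprocal pattern

for name, origins in (("E1XB-A", e1xb_a), ("L1XA-B", l1xa_b)):
    intervals = crossover_intervals(origins)
    print(name, intervals, [contains_chi(iv) for iv in intervals])
```

Both recombinants give a single switch between nucleotides 278 and 2234, an interval that includes the Chi sequence.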
DISCUSSION
Here we show definitively that in the human MHC class III region, four
tandemly arranged genes serine/threonine kinase RP,
complement C4, steroid 21-hydroxylase CYP21, and
tenascin TNX are organized as a genetic unit designated as
an RCCX module. In a monomodular RCCX haplotype, the "full-length"
genes RP1 and TNXB are always present, implying
relevant cellular functions for RP1 and for TNXB. In an RCCX bimodular
haplotype, duplication of the RCCX module occurs by the addition of a
C4 gene and a CYP21 gene together with the
TNXA and RP2 gene segments. This additional,
modular genomic fragment is either 32.5 or 26.2 kb in size, depending on whether the C4 gene contains the endogenous retrovirus
HERV-K(C4). The three pseudogenes/gene segments, CYP21A,
TNXA, and RP2, present between the two
C4 loci, probably do not encode for functional protein
products. The concurrent deletion of a C4A or
C4B gene with a CYP21A or CYP21B gene
is a well-established phenomenon (25, 26). This report provides the
detailed documentation for the modular deletions or duplications of
RP and TNX genes together with C4 and
CYP21.
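The two module sizes quoted above can be cross-checked against the two C4 gene sizes given in the Introduction (21 and 14.6 kb). The Python sketch below does only that arithmetic; the constant and function names are hypothetical, and all values are the rounded figures from the text.

```python
# Illustrative sketch; values are the rounded sizes quoted in the text, in bp.
C4_LONG_BP = 21_000
C4_SHORT_BP = 14_600
MODULE_WITH_LONG_C4_BP = 32_500
MODULE_WITH_SHORT_C4_BP = 26_200


def non_c4_portion(module_bp, c4_bp):
    """Size of the added module that is not accounted for by the C4 gene."""
    return module_bp - c4_bp


gene_difference = C4_LONG_BP - C4_SHORT_BP
module_difference = MODULE_WITH_LONG_C4_BP - MODULE_WITH_SHORT_C4_BP

print("long minus short C4 gene:", gene_difference, "bp")
print("long minus short module: ", module_difference, "bp")
print("non-C4 portion (long): ", non_c4_portion(MODULE_WITH_LONG_C4_BP, C4_LONG_BP), "bp")
print("non-C4 portion (short):", non_c4_portion(MODULE_WITH_SHORT_C4_BP, C4_SHORT_BP), "bp")
```

The two differences agree to within 100 bp, as expected if the size difference between the modules is accounted for by the presence or absence of HERV-K(C4) in the C4 gene.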
Although this multiple-gene modular variation observed in RCCX is
uncommon in mammalian genetics, a similar phenomenon with concurrent
variations of at least three tandem genes has been observed in a
genomic region at chromosome 5q12-13 (reviewed in Ref. 55). The three
genes are BTFp44 (a p44 subunit of transcriptional factor
TFIIH), NAIP (neuronal apoptosis inhibitory protein), and SMN (survival motor neuron). One, two, or three modules of
these three genes (each module spans about 500 kb in size) are present in the population (56). The divergence of sequences in C4A
and C4B is analogous to that in SMNT
and SMNC; the presence of the pseudogene
CYP21A and the functional CYP21B is analogous to
the pseudogene NAIP (with deletion of exon 5) and the intact NAIP.
The frequency of the RCCX modular variations has been studied in a
population of 150 normal Caucasian females. It has been discovered that
75.4% of the C4 genes are long, and 24.6% are short.
Bimodular and monomodular RCCX organizations are present in about 71.6 and 16.2% of chromosomes 6, respectively. Trimodular RCCX
haplotypes have a frequency of about 12.2%
(58 and Footnote 3). Excluding the
trimodular haplotypes (2, 59), there are four major RCCX modular
structures in the Caucasian population: bimodular long-long (L-L),
bimodular long-short (L-S), monomodular long (L), and monomodular short
(S) (Fig. 8). These four RCCX structures can be detected conveniently by TaqI RFLPs. The widely
applied TaqI RFLP analysis of C4 genes yields
information on the combination of RP1 or RP2 with
C4(L) or C4(S). It does not yield
definitive information, however, on whether the C4 gene
codes for C4A or C4B proteins. The information for the presence of
C4A and C4B genes may be obtained by
NlaIV RFLP analysis or by direct DNA sequencing. From these
four RCCX organizations, 12 haplotypes for RP1/RP2,
C4A/C4B, CYP21A/CYP21B, TNXA/TNXB are
observable in the normal and in disease populations. Six of these
haplotypes are more common in the normal population and they are
highlighted (haplotypes 1, 2, 5, 9, 10, and 12). RCCX bimodular
haplotypes with two C4B genes (haplotypes 4 and 8) are yet
to be shown definitively. Bimodular haplotypes with two
CYP21A genes (haplotypes 3 and 6) and a monomodular
haplotype with a CYP21A gene (haplotype 11) are present in
CAH patients. In two Brazilian tribes, a fifth RCCX organization with
two short C4 genes is found (60). This bimodular
C4(S)-C4(S) combination (haplotype 13) is extremely rare in
other ethnic groups.
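The haplotype frequencies quoted above (16.2% monomodular, 71.6% bimodular, and 12.2% trimodular) can be turned into a rough expectation for the number of RCCX modules, and hence C4 genes, per individual. The Python sketch below is an illustration only: it assumes random pairing of chromosomes, an assumption that is not made or tested in the text, and the names are hypothetical.

```python
# Illustrative sketch; assumes random pairing of chromosomes (not from the text).
from fractions import Fraction
from itertools import product

# Modules per chromosome 6 -> reported haplotype frequency.
MODULE_FREQ = {
    1: Fraction(162, 1000),
    2: Fraction(716, 1000),
    3: Fraction(122, 1000),
}


def mean_modules_per_chromosome(freq):
    """Frequency-weighted mean number of RCCX modules per chromosome."""
    return sum(n * p for n, p in freq.items())


def diploid_distribution(freq):
    """Distribution of total modules per individual under random pairing."""
    dist = {}
    for (n1, p1), (n2, p2) in product(freq.items(), repeat=2):
        dist[n1 + n2] = dist.get(n1 + n2, Fraction(0)) + p1 * p2
    return dist


print("mean modules per chromosome:", float(mean_modules_per_chromosome(MODULE_FREQ)))
for total, prob in sorted(diploid_distribution(MODULE_FREQ).items()):
    print(f"{total} modules per individual: {float(prob):.3f}")
```

Because each module carries one C4 gene, the same distribution applies to C4 gene number under the stated assumption.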
Children's Hospital Research Foundation,
Columbus, Ohio 43205, the § Molecular,
Department of Medical Microbiology and
Immunology, Ohio State University, Columbus, Ohio 43210, and the
** University of Cincinnati Medical Center, Cincinnati, Ohio 45229
Fig. 1.
A molecular map of the human MHC complement
gene cluster. This map represents the most common organization of
the genes in the normal population from complement component
C2 gene to extracellular matrix protein TNXB
gene. The horizontal arrows represent the direction of gene
transcription. Pseudogenes or gene segments are shaded. The
negative signs for the intergenic distances between CYP21A
and TNXA and between CYP21B and TNXB
represent overlaps at the 3'-ends of these genes (data adapted from
Ref. 27).
JM-2a that contains a C4B5 gene (22); for
CYP21, a 757-bp fragment of CYP21A, amplified from cos 2 using primers 21A5 and 21A3 (a cosmid isolated
from a human genomic library DA (49), cos 2 spans from the 5' region of C4A1 gene to the 3' region of
TNXB gene; the genomic DNA of cos 2 derives from
the HLA haplotype A3 B47 DR7 that contains a deletion of the
CYP21B gene); for TNX, a 600-bp fragment of TNXB, corresponding to exons 35-37 of TNXB,
amplified from cos 2 using primers RDX-5 and RDX-3.
-32P]dCTP-labeled probe, as described in Ref. 50.
Fig. 2.
C4 allotypes of the E family.
I, human C4A and C4B allotypes detected by immunofixation.
Immunoblot analysis is shown of the C4 allotypes using anti-Ch1
monoclonal antibody (II) and anti-Rg1 monoclonal antibody
(III). The C4A1 allotype with reverse association of Ch1
epitope is marked by an arrow. Lane 4 is from a
control plasma with C4A3 and C4B1.
Fig. 3.
Modular variations of
RP, C4, CYP21, and
TNX. DNA samples were isolated from family B (B1-B4)
and family E (E1-E4), digested by appropriate restriction enzymes, and
subjected to Southern blot analyses. Panel I illustrates
the structural basis to distinguish RP1 and
RP2 by BamHI RFLP, and
TNXA and TNXB by ScaI
RFLP. Solid bars represent locations of the probes used.
Panel II shows detection of RP1 and
RP2 genes by BamHI RFLP. The probe used
was RP1.1. Panel III-A shows the variation of
C4A and C4B isotypes by
NlaIV RFLP; panel III-B reveals the
association of the Rg1 and Ch1 antigenic determinants in
C4 genes by EcoO109I RFLP. The probe used
was PB. Panel IV exhibits the presence of
CYP21A and CYP21B genes by
TaqI RFLP. The probe used was a 757-bp fragment
amplified from CYP21A. Panel V shows the detection of
TNXA and TNXB by ScaI
RFLP. The probe used was a 600-bp fragment corresponding to exons
35-37 of TNXB.
Fig. 4.
Southern blot analysis to segregate the
RP1/RP2 variation from the C4
long/short (L/S) size dichotomy. Panel
I is a schematic diagram for the molecular basis of the
TaqI and TaqI-BamHI RFLP. In
panel II (A), genomic DNA isolated
from B1 (lane 1), HepG2 (liver carcinoma,
lane 2) and MOLT4 (T-cell leukemia,
lane 3) were digested with TaqI,
blotted, and hybridized with RP1.1 probe. In
panel II (B and C), DNAs
were double-digested with TaqI and BamHI and
subjected to RP1.1 probe (B) or C4 5'
probe PA (C).
Fig. 5.
TaqI polymorphism to detect the
TNXA-associated 120-bp deletion in the TNXB
gene of CAH patient E1. Genomic DNAs isolated from the B and E
families were digested with TaqI, blotted, and hybridized
with appropriate probes to demonstrate different RP-C4
combinations (panel I), the presence of
CYP21A and CYP21B genes (panel
II), and the variations of TNX genes
(panel III). The 2.5-kb TaqI fragment
is usually associated with TNXB, while the 2.4-kb
TaqI fragment is usually associated with TNXA.
An arrow indicates the unusual association of a 2.4-kb
fragment with TNXB in E1.
Table I.
RCCX modular structures of the B and E families
is located at nucleotides 450-457. Five of the
TNXA/TNXB informative sites (nucleotides 14, 60, 63, 73, and
278) are 5' to the Chi site. The other four sites
(nucleotides 2234, 2268, 2273, and 2423) are 3' to the Chi site and they are flanking the 120-bp deletion. For E1XB-A, the sequence is TNXB-specific for the first five informative
sites but is TNXA-specific for the last four sites. It also
acquires the TNXA-specific 120-bp deletion (Fig.
7II). This suggests that E1XB-A is the result of a
recombination between TNXB and TNXA, occurring
between nucleotides 278 and 2234.
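The inference above can be sketched in a few lines of Python. This is only an illustration of the reasoning, not the analysis procedure used in this work: the site positions and their TNXB/TNXA assignments for E1XB-A are taken from the text, while the function name `breakpoint_interval` and the dictionary layout are hypothetical.

```python
from typing import Dict, Tuple

# Informative site positions quoted in the text (5' and 3' to the Chi site).
SITES_5P = [14, 60, 63, 73, 278]
SITES_3P = [2234, 2268, 2273, 2423]


def breakpoint_interval(calls: Dict[int, str]) -> Tuple[int, int]:
    """Return the pair of adjacent informative sites between which the
    sequence switches from one parental type to the other.

    `calls` maps a nucleotide position to "TNXB" or "TNXA" (hypothetical layout).
    Exactly one switch is expected for a single crossover.
    """
    positions = sorted(calls)
    switches = [
        (left, right)
        for left, right in zip(positions, positions[1:])
        if calls[left] != calls[right]
    ]
    if len(switches) != 1:
        raise ValueError(f"expected one switch, found {len(switches)}")
    return switches[0]


# E1XB-A as described: TNXB-specific at the first five sites,
# TNXA-specific at the last four.
e1xb_a = {pos: "TNXB" for pos in SITES_5P}
e1xb_a.update({pos: "TNXA" for pos in SITES_3P})

print(breakpoint_interval(e1xb_a))  # (278, 2234)
```

The interval is bounded only by the nearest informative sites on either side, so the crossover point itself cannot be placed more precisely than between nucleotides 278 and 2234.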
Fig. 6.
I, a model for an unequal crossover
between a bimodular RCCX chromosome and a monomodular chromosome to
generate a TNXB/XA recombinant and a TNXA/XB
recombinant; II, PCR strategy to amplify the breakpoint
region of gene recombination and its corresponding exon-intron
structure.
Fig. 7.
I, an alignment of the normal and
recombinant TNXA and TNXB genomic DNA sequences.
Only the sequences around the informative sites are shown. The
nucleotide numbering starts at the breakpoint region
(boldface and underlined) of the TNXB and TNXA
gene duplication. The normal TNXB sequences, TNXB-H
(GenBank™ accession no. U89337) and TNXB-M
(GenBank™ accession no. X71937), are taken from Footnote 2 and Ref. 38, respectively. The normal TNXA sequences, TNXA-Y
(GenBank™ accession no. L20263) and TNXA-M
(GenBank™ accession no. S38953), are taken from Ref. 30 and
Ref. 38, respectively. E1XB-XA was generated by this work. L1XA-XB
(GenBank™ accession no. AF077974) is from Ref. 72.
TNXA- and TNXB-specific sequences or informative
sites are marked by asterisks. II, a
schematic diagram showing the reciprocal
locations of TNXA and TNXB informative sites in
the TNX recombinants of CAH-E1 and JRA-L1. Informative sites
are shown as vertical strokes; an open box
represents the 120-bp deletion. The continuous DNA sequence for the
breakpoint region of the TNXB/TNXA recombinant (E1XB-A) is
available from GenBank™ under the accession number
AF086641.
DISCUSSION
(with deletion of exon 5)
and the intact NAIP. The absence of the
SMNT gene is associated with most spinal
muscular atrophy, and deletion of the NAIP gene in most
severe forms of spinal muscular atrophy (57).
Fig. 8.
RCCX modular variations in the human
population. Monomodular and bimodular RCCX structures with 13 different haplotypes of RP, C4, CYP21,
and TNX gene combinations are shown. The common haplotypes
are in boldface type. The bimodular and monomodular
haplotypes are present in 71.6 and 16.2% of the normal Caucasian
population (C.A. Blanchong and C. Y. Yu, manuscript in
preparation). Disease haplotypes associated with CAH are marked with
asterisks. 21A, CYP21A; XA,
TNXA. The presence of haplotypes 4 and 8 (indicated by
question marks) has not been definitively
demonstrated.
The diversities in the number and size of the RCCX modules probably contribute to the genetic variability and instability of the HLA class III region. A recombination between a bimodular RCCX chromosome and a monomodular RCCX chromosome may lead to the exchange or homogenization of polymorphic or mutant sequences between complement C4A and C4B loci. This has been demonstrated by the presence of the C4A-associated amino acid residues in many C4B allotypes, and vice versa. The typical examples are the reverse associations of Ch1 antigenic determinant with C4A1 and C4A13 and of Rg1 antigenic determinant with C4B5 (22, 61, 62). It was also shown that in a systemic lupus erythematosus patient, there is an acquisition of a 2-bp insertion in exon 29 from C4AQ0 to C4BQ0 (63, 64). The same type of recombination may also lead to the acquisition of mutations from pseudogene CYP21A or gene segment TNXA to their corresponding functional genes (65, 66). This is manifested in many disease-associated haplotypes such as the presence of two CYP21A pseudogenes in RCCX bimodular haplotypes (65) and a CYP21A/CYP21B hybrid gene in the HLA haplotype A3 B47 DR7 with RCCX monomodular structure of CAH patients (67, 68). Another example comes from the presence of a TNXB/TNXA hybrid gene with the 120-bp deletion together with the deletion of the RP2-C4B-CYP21B gene in CAH patient(s), as demonstrated in this paper.
In CAH patient E1, the 120-bp deletion in the TNXB-XA recombinant will cause premature termination and therefore truncation of the carboxyl-terminal sequences. The truncation includes three fibronectin type III repeats (with four N-linked glycosylation sites) and the entire fibrinogen domain. This mutation may diminish or knock out the function of TNX. The haplotype of the recombinant chromosome with monomodular RCCX, characterized by the presence of a CYP21A pseudogene linked to the TNXB/XA hybrid, is HLA A3 B35 DR1, RP1 C4A3 CYP21A TNXB-XA.
Three independent observations of deletions of the CYP21B gene, which could have arisen by a mechanism similar to that for haplotype b of CAH-E1, have been reported. In the first study, a salt-losing CAH patient was found to have haplotype HLA B35 DR1, a single C4A3 gene and a CYP21A gene, and a deletion of C4B with CYP21B (69). This is similar to haplotype b of CAH-E1. Whether this patient has a recombinant TNXB-XA in the monomodular RCCX was not determined.
In the second study, a de novo deletion of C4B together with CYP21B was suggested to derive from a meiotic unequal crossover between the maternal homologous chromosomes (70). From our current knowledge, this de novo deletion probably resulted from an unequal crossover between TNXA from a bimodular L-S chromosome (HLA A30 B13 DR7) and TNXB from a monomodular S chromosome (HLA A1 B8 DR3). The recombinant has a monomodular L chromosome (HLA A30 B13 DR3) with CYP21A and a 2.4-kb TaqI fragment at its 3'-end, which is indicative of a 120-bp deletion in the TNXB gene.
In the third study, one chromosome 6 of a CAH patient appears to have a monomodular RCCX with the 120-bp deletion in TNXB. The other chromosome 6 has a bimodular RCCX with two CYP21A genes and no CYP21B gene, as revealed by Southern blot analyses using TaqI-, BglII-, and BssHII-digested genomic DNA. Immunoblot analysis showed that this patient did not produce any TNXB protein. Therefore, both TNXB genes from the homologous chromosomes were presumed to be nonfunctional. Since the patient suffers from Ehlers-Danlos syndrome in addition to CAH, it is proposed that malfunction of TNXB is associated with the connective tissue disease (71).
Another important piece of evidence for the described unequal crossover comes from studies on the molecular genetics of a pauciarticular JRA patient in our laboratory. This JRA patient has an RCCX bimodular haplotype with two CYP21B genes and a 5'-TNXA/XB-3' hybrid with the TNXA-specific truncation of exons 1-32 at the 5' region and the TNXB-specific 120-bp sequence at the 3' region (72). The TNXA/XB hybrid appears to be the reciprocal product of the 5'-TNXB/XA-3' hybrid in CAH-E1, attributable to genetic recombination between TNXA and TNXB. This notion is substantiated by the reciprocal associations of the informative sites for TNXB and for TNXA at the two ends of the hybrid sequences (Fig. 7).
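The reciprocal relationship between the two recombinants can be illustrated with a toy model of the unequal crossover. The sketch below is a simplification under stated assumptions: each chromosome is reduced to an ordered list of gene names, the gene content of the two parental chromosomes (in particular the unspecified `C4` gene on the monomodular chromosome) is illustrative, and the function name `unequal_crossover` is hypothetical. The hybrid labels follow the 5'-to-3' naming used in the text.

```python
from typing import List, Tuple

# Simplified parental chromosomes, listed in RP-C4-CYP21-TNX order.
# Gene content is an illustrative assumption, not a specific patient haplotype.
BIMODULAR = ["RP1", "C4A", "CYP21A", "TNXA", "RP2", "C4B", "CYP21B", "TNXB"]
MONOMODULAR = ["RP1", "C4", "CYP21B", "TNXB"]


def unequal_crossover(
    bimodular: List[str], monomodular: List[str]
) -> Tuple[List[str], List[str]]:
    """Model a crossover between TNXA of the bimodular chromosome and
    TNXB of the monomodular chromosome.

    Returns (deletion_product, duplication_product).
    """
    i = bimodular.index("TNXA")
    j = monomodular.index("TNXB")
    # Genes upstream of TNXA on the bimodular chromosome, joined at the hybrid:
    # RP2-C4B-CYP21B is lost, as described for CAH-E1.
    deletion_product = bimodular[:i] + ["TNXB/XA"]
    # Genes upstream of TNXB on the monomodular chromosome, the reciprocal
    # hybrid, then the second module of the bimodular chromosome.
    duplication_product = monomodular[:j] + ["TNXA/XB"] + bimodular[i + 1:]
    return deletion_product, duplication_product


deleted, duplicated = unequal_crossover(BIMODULAR, MONOMODULAR)
print(deleted)     # ['RP1', 'C4A', 'CYP21A', 'TNXB/XA']
print(duplicated)  # ['RP1', 'C4', 'CYP21B', 'TNXA/XB', 'RP2', 'C4B', 'CYP21B', 'TNXB']
```

In this model the first product is monomodular with CYP21A linked to the TNXB/XA hybrid, as in CAH-E1, and the second is bimodular with two CYP21B genes and a TNXA/XB hybrid, as in the JRA patient.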
In the bimodular (or trimodular) RCCX haplotypes, one of the duplicated genes or gene segments could undergo sequence mutations without the immediate deleterious effect of knocking out the gene function. The RCCX modular variations in the population allowed sequence variations and enhanced the incorporation of diversified or mutant sequences among the paralogous genes, pseudogenes, or gene segments. The selective advantage is probably the emergence of various polymorphic forms of complement C4A and C4B to tackle different microbial antigens (73). The burdens are the accompanying genetic or autoimmune diseases such as CAH, systemic lupus erythematosus, and possibly Ehlers-Danlos syndrome, caused by unequal crossovers and incorporations of deleterious mutations in the constituents of the RCCX.
In the HLA class II region between DRB1 and DRA
genes, there may be 1-3 DRB pseudogenes. In addition,
DRB3, DRB4, or DRB5 can be present
(74). The DR locus is about 350 kb centromeric to the RCCX
modules. Between heterozygous chromosomes with different DR gene number
and RCCX modules, misalignments and unequal crossovers would occur
during meiosis. This may result in deletion or duplication of
structural genes between these two variable regions. More than 10 structural genes with important functions have been discovered between
DR and RCCX (37; Footnote 2). Recombinant chromosomes with the
essential structural genes deleted would be lethal. Therefore, the
apparent productive recombination frequencies between certain MHC
haplotypes would be less than expected. This would have contributed to
the linkage disequilibrium of the MHC genes, since many of the
"ancestral" MHC haplotypes remain largely conserved in the population.
ACKNOWLEDGEMENTS
We are indebted to Dr. Joann Moulds for kind instruction in C4 allotyping techniques, the Ohio State University Tissue Typing Laboratory for HLA typing, Bi Zhou for excellent technical assistance, and Dr. Arthur Burghes for helpful discussion.
FOOTNOTES
* This work was supported by NIAMSD, National Institutes of Health (NIH), Grant R01 AR43969; March of Dimes Birth Defects Foundation Grant FY95-1087 (Basil O'Connor Starter Scholar Research Award); Children's Hospital Research Foundation (Columbus, OH) Grant 210698; and the Pittsburgh Supercomputing Center through the NIH Center for Research Resources Cooperative Agreement (Grant 1P41 RR06009) (to C. Y. Y.). The costs of publication of this article were defrayed in part by the payment of page charges. The article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Supported by an Ohio State University Presidential Fellowship.
§§ To whom all correspondence should be addressed: Room W208, Children's Hospital Research Foundation, 700 Children's Dr., Columbus, OH 43205. Tel.: 614-722-2821; Fax: 614-722-2774; E-mail: cyu{at}chi.osu.edu.
2 L. Rowen, C. Dankers, D. Baskin, J. Faust, C. Loretz, M. E. Ahearn, A. Banta, S. Schwartzell, T. M. Smith, T. Spies, and L. Hood, GenBank™ accession no. U89337.
3 C. A. Blanchong and C. Y. Yu, unpublished observation.
ABBREVIATIONS
The abbreviations used are: MHC, major histocompatibility complex; RCCX, RP, complement C4, steroid 21-hydroxylase CYP21, and tenascin TNX; CAH, congenital adrenal hyperplasia; bp, base pair(s); kb, kilobase pair(s); PCR, polymerase chain reaction; JRA, juvenile rheumatoid arthritis.
REFERENCES