(Received for publication, April 25, 1995; and in revised form, August 7, 1995)
The genes COL4A5 and COL4A6, coding for the basement membrane collagen chains α5(IV) and α6(IV), respectively, are located head-to-head in close proximity on human chromosome Xq22, and COL4A6 is transcribed from two alternative promoters in a tissue-specific fashion (Sugimoto, M., Oohashi, T., and Ninomiya, Y. (1994) Proc. Natl. Acad. Sci. U. S. A. 91, 11679-11683). Immunofluorescence studies using
chain-specific antibodies demonstrated that the two genes are expressed
in a tissue-specific manner (Ninomiya, Y., Kagawa, M., Iyama, K.,
Naito, I., Kishiro, Y., Seyer, J. M., Sugimoto, M., Oohashi, T., and
Sado, Y.(1995) J. Cell Biol. 130, 1219-1229). We report
here for the first time the isolation and the structural organization
of the human COL4A6 gene. The entire gene presumably exceeds
200 kilobase pairs and contains 46 exons. Exons 1′ and 1 encode the two different 5′-UTRs and the two amino-terminal parts of the signal peptide. The carboxyl part of the signal peptide and the 7 S domain are coded for by the following six exons, 2-7, whereas exons 7-42 encode the central COL 1 domain, which contains the Gly-X-Y repeats. The last three exons, 43-45, encode the carboxyl-terminal NC1 domain. The sizes of more than half of the exons of the gene are the same as those of Col4a2 but quite different from those of COL4A5. Within the COL4A6 gene we found three CA repeat markers that can be used for allele
detection. The detailed structure of the COL4A6 gene and the
high heterozygosity microsatellite markers located within the gene will
be useful for linkage analysis and familial diagnosis of diseases
caused by mutations of this gene.
Epithelial cells present a sheet-like extracellular structure, the basement membrane (BM), at their basal surfaces. This structure appears as a 100-nm-thick layer consisting of type IV collagen, laminin, heparan sulfate proteoglycan, and some other glycoproteins (1). Since BMs are attached to the underlying
extracellular matrix by anchoring fibrils formed by type VII collagen
and microfibrils, there may be functional interactions between matrix
molecules. On the other hand, BMs also play a crucial role in adhesion
of epithelial and other types of cells, differentiation of a variety of
cells, and tissue repair (2). BMs are known to be affected in certain disease states. For example, the epitope for the circulating antibody in Goodpasture syndrome, which causes glomerulonephritis and pulmonary hemorrhage, has been shown to be on the carboxyl-terminal part of the NC1 domain of the α3(IV) collagen chain (3). Also, hereditary Alport syndrome, characterized by sensorineural hearing loss, progressive glomerulonephritis, and, occasionally, ocular defects, has been shown to be caused by mutations of the gene COL4A5, coding for the α5(IV) collagen chain (4).
Type IV collagen was initially thought to be a heterotrimeric molecule of two α1(IV) chains and one α2(IV) chain. However, recent biochemical investigations and molecular biology approaches have made possible the identification of four additional α(IV) chains, α3, α4, α5, and α6 (5, 6, 7). Almost nothing is known, however, about how these chains are involved in the assembly of the individual molecules. The newest chain, α6(IV), was found by characterizing the region upstream of the neighboring gene, COL4A5 (8), and by cross-hybridization with the homologous gene, COL4A4 (7). Of interest is not only
that the two most recently discovered genes, COL4A5 and COL4A6, are arranged in a head-to-head fashion on chromosome Xq22 but also that the other set of "new" genes for collagen IV, COL4A3 and COL4A4, are located on chromosome 2 (9, 10), presumably in a similar fashion to the former two (11). Furthermore, α1(IV) and α2(IV) chains are known to exist in all BMs, and α3(IV) and α4(IV) chains appear to co-localize in certain BMs but not all, whereas the α5(IV) and α6(IV) chains are not necessarily distributed together in some tissues (12). Differential expression of the
latter two genes could be explained by the transcription from the two
alternative promoters in the COL4A6 gene (13). Precise characterization of the COL4A6 gene is essential for studies on the pathogenesis of certain diseases such as Alport's syndrome associated with diffuse esophageal leiomyomatosis (14). In the present study, we describe the isolation and characterization of the COL4A6 gene. The gene harbors 46 exons and spans at least 200 kb. We also determined the sequences of the exon/intron junctions of the entire gene.
Figure 1: Structure of the human α6(IV) gene, COL4A6, and relative location of genomic DNA clones. The COL4A6 gene contains 46 exons: exon 1′ (one white box) and exons 1 to 45 (black boxes). The COL4A5 gene is located next to the COL4A6 gene as indicated by arrows. Analysis of cDNAs revealed the presence of two transcripts: exon 1′ is spliced to exon 2 in one transcript and exon 1 to exon 2 in the other (13). The top part of the figure shows only one of the two transcripts, and the relative location of the coding regions is indicated along with the individual exons. The 7 S domain is encoded by the first 7 exons. The central COL1 domain containing 25 imperfections is coded for by exons 7-43, whereas the last three exons code for the NC1 domain. The gene spans a length of >200 kb. The relative location of the 13 phage clones is shown by short bars. Numbers in parentheses under the bars indicate the length of the individual phage clones. Four ovoid marks within the gene indicate the locations that are not covered by the 13 isolated phage clones. Sizes of the EcoRI fragments containing individual exons are shown by numbers under each exon. EcoRI fragments that contain multiple exons are shown by brackets above the numbers. Four exons, 4, 12, 29, and 39, contain EcoRI sites, whose locations are marked by E. The area covering exons 38-41 is enlarged in a square to show small EcoRI fragments.
The exon sizes vary between 36 and >980 bp, but if the 5′- and 3′-untranslated regions and the NC1 domain are excluded, they vary from 36 to 222 bp (Fig. 2). Exon sizes of the COL4A6 gene are summarized and compared with those of other type IV collagen genes, COL4A5 (20), Col4a2 (27), COL4A2 (28), and COL4A4 (29) (Fig. 3). Exon sizes of the gene are rather small; most are smaller than 150 bp. Exons coding for only Gly-X-Y repeats, characteristic of collagen, are especially small. Few exons are 54 or 45 bp in size, the sizes seen in fibrillar collagen genes (23).
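Because one Gly-X-Y repeat is three codons, or 9 bp, an exon size can be converted into a repeat count by simple division. The following is a minimal sketch of that arithmetic; the function name `gly_xy_triplets` is hypothetical, and the sizes passed to it are only the values quoted in the text (36, 45, 54, and 222 bp), not a list of actual COL4A6 exons.

```python
from typing import Tuple

TRIPLET_BP = 9  # one Gly-X-Y repeat = three codons = 9 bp


def gly_xy_triplets(exon_bp: int) -> Tuple[int, int]:
    """Return (whole Gly-X-Y triplets, leftover bp) for an exon size.

    Hypothetical helper for illustration only. A non-zero leftover means
    the exon cannot consist of whole, unsplit Gly-X-Y repeats alone.
    """
    if exon_bp < 0:
        raise ValueError("exon size must be non-negative")
    return divmod(exon_bp, TRIPLET_BP)


if __name__ == "__main__":
    # Sizes quoted in the text, used purely as examples.
    for size in (36, 45, 54, 222):
        triplets, leftover = gly_xy_triplets(size)
        print(f"{size} bp: {triplets} triplets, {leftover} bp left over")
```

A leftover of zero is necessary but not sufficient for an exon to encode only uninterrupted repeats, so the result is a quick screen rather than a classification.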
Figure 2: Exon-intron boundaries of the COL4A6 gene. Nucleotide sequences of exon/intron boundaries are shown by capital letters for exons and by small letters for introns. Nucleotide sequences of exons are shown by three letters in a block to indicate codons. Noncoding sequence is shown continuously in exons 1 and 45. Note that many of the glycine codons are split (exons 18-45 with some exceptions) and that the splitting pattern is always after the first G. The ends of all introns conform to the GT-AG rule and are highlighted by boldface letters. In the middle of the figure the sizes of the individual exons are shown in parentheses.
Figure 3: Comparison of exon sizes in the COL4A6 gene and in the COL4A5, COL4A2, Col4a2, and COL4A4 genes. Exon numbers are indicated by the numbers in circles on the left side of each gene. Exon sizes of the COL4A6 gene are drawn on the right-hand side of the gene and compared with those of other collagen IV genes. As a representative of group A (29), the structure of the COL4A5 gene (20) is shown. All characterized genes that belong to group B, Col4a2 (27), COL4A2 (28), and COL4A4 (29), are shown. Note that many of the exon sizes of the COL4A6 gene are the same (shaded) as the corresponding exons among the group B genes. The NC1 domain is encoded by five exons in the group A gene but by only three exons in the group B genes. Exon sizes are not determined in the area drawn by split lines. Noncoding regions are indicated by open boxes.
Exon sizes of the COL4A6 gene were compared with those of the other type IV collagen genes. Exons 4-42 all encode Gly-X-Y repeats. Many of them contain imperfections in the Gly-X-Y repeats. As shown in Fig. 3, the sizes of many exons coding for the COL 1 domain in the COL4A6 gene are the same as those of the Col4a2, COL4A2, and COL4A4 genes (numbers are shaded); however, none of the exons encoding the corresponding domains in COL4A5 was the same size as in the COL4A6 gene. This indicates that the COL4A6 gene is more closely related to COL4A2 and COL4A4 than to the genes coding for the odd-numbered α(IV) chains, α1, α3, and α5 (29). The NC1 domain of the α6(IV) chain is encoded by three different exons, 43, 44, and 45. This pattern is common to the isolated and characterized even-numbered genes, COL4A2, Col4a2, and COL4A4, whereas COL4A1 and COL4A5 encode the NC1 domain by means of five different exons.
In the previous study of the cDNAs we did not notice a very small EcoRI fragment. However, when comparing the nucleotide sequences of the cDNA and the genomic DNA, we identified a 36-bp EcoRI fragment located in exon 39 (see Fig. 1). It contains AATTCCTGGACCTAAAGGGCCTAAGGGAGACCAAGGG, which should be between G and A of the cDNA sequence in the previous paper (12). This gives the amino acid sequence (G)IPGPKGPKGDQG(I) (amino acid residues in parentheses indicate that the 36 bp contained a part of their coding nucleotides).
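The relation between the inserted nucleotides and the peptide can be checked by translating the quoted sequence with the standard genetic code. The sketch below is illustrative only: the `translate` helper and the choice of a one-nucleotide reading-frame offset are assumptions, the latter made because it reproduces the peptide given in the text (the leading A would then complete the preceding glycine codon).

```python
# Standard genetic code, with codons enumerated in T, C, A, G order
# at each of the three positions.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Sequence of the small EcoRI fragment exactly as quoted in the text.
FRAGMENT = "AATTCCTGGACCTAAAGGGCCTAAGGGAGACCAAGGG"


def translate(dna: str, offset: int = 0) -> str:
    """Translate complete codons of `dna`, starting at `offset` (0-based).

    Hypothetical helper; trailing nucleotides that do not fill a codon
    are ignored.
    """
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(offset, len(dna) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


if __name__ == "__main__":
    # Assumed frame: skip the first nucleotide, then read whole codons.
    print(translate(FRAGMENT, offset=1))  # IPGPKGPKGDQG
```

The fragment also begins with AATTC, as expected for a DNA end produced by EcoRI cleavage of its GAATTC recognition site.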
All of the exon/intron boundary sequences are shown in Fig. 2. As highlighted, the dinucleotides at the beginning (gt) and end (ag) of the introns are all conserved. The first 15 exons, 3-17, of the COL4A6 gene begin with an intact glycine codon, whereas the last 28 exons all start with a two-thirds intact glycine codon, except for three exons (22, 23, and 32) that start with an intact glycine codon. Split glycine codons were also found in exons of other collagen genes (30, 31). It is intriguing that these genes all code for collagens with imperfections in the Gly-X-Y repeats and that the glycine codon is almost always split after the first G. This type of split codon is also found in noncollagenous genes that harbor Gly-X-Y repeat sequences with imperfections, such as those encoding mannose-binding protein (32), lung surfactant apoprotein (33), acetylcholinesterase (34), and complement C1q B chain (35).
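The two boundary checks described above, the GT-AG rule and the state of the glycine codon at the start of an exon, can be expressed as a short sketch. The function names, the `phase` parameter (the number of codon nucleotides contributed by the preceding exon), and all sequences in the usage section are hypothetical; they are made-up strings for illustration, not COL4A6 data.

```python
def follows_gt_ag_rule(intron: str) -> bool:
    """True if the intron starts with gt and ends with ag (case-insensitive)."""
    seq = intron.lower()
    return len(seq) >= 4 and seq.startswith("gt") and seq.endswith("ag")


def classify_exon_start(prev_exon_end: str, exon_start: str, phase: int) -> str:
    """Describe the first codon touched by an exon.

    phase 0: the exon begins with a whole codon.
    phase 1: the preceding exon supplies the first nucleotide of the codon.
    phase 2: the preceding exon supplies the first two nucleotides.
    Glycine codons are GGN, so a codon is glycine if it starts with GG.
    """
    if phase not in (0, 1, 2):
        raise ValueError("phase must be 0, 1, or 2")
    head = prev_exon_end[-phase:] if phase else ""
    codon = (head + exon_start[:3 - phase]).upper()
    if len(codon) != 3:
        raise ValueError("not enough sequence to reconstruct the codon")
    if not codon.startswith("GG"):
        return "no glycine codon at exon start"
    return (
        "intact glycine codon",
        "glycine codon split after first G",
        "glycine codon split after second G",
    )[phase]


if __name__ == "__main__":
    # Made-up example sequences, not taken from the gene.
    print(follows_gt_ag_rule("gtaagtcctttcag"))
    print(classify_exon_start("AAG", "GGACCT", phase=0))
    print(classify_exon_start("CCAG", "GTCCTCCA", phase=1))
```

In these terms, an exon that starts with a "two-thirds intact" glycine codon corresponds to the phase 1 case, in which the first G lies at the end of the preceding exon.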
More than 60 mutations in the COL4A5 gene have been identified in X-linked Alport syndrome patients (36). Recently, a new report showed mutations in the COL4A3 and COL4A4 genes from autosomal recessive-type patients (37). Intriguingly, deletions of the 5′ part of both the COL4A5 and COL4A6 genes have also been revealed in seven patients with Alport's syndrome associated with diffuse leiomyomatosis (14); however, how the COL4A6 gene is involved in the pathogenesis of diffuse leiomyomatosis has not yet been clarified. Therefore, the high heterozygosity microsatellite markers located within the COL4A6 gene will be a useful tool for linkage analysis and familial diagnosis of the diseases caused by COL4A6 gene mutations.
In conclusion, we have reported the isolation and structure of the COL4A6 gene, which is aligned together with COL4A5 in head-to-head fashion on chromosome X. The detailed structure of the COL4A6 gene determined in this study will be important for finding mutations in patients with X-linked Alport's syndrome and/or leiomyomatosis and in those with other diseases caused by the mutated gene.
The nucleotide sequences reported in this paper have been submitted to the GenBank/EMBL Data Bank with accession numbers D63525 to D63568.