(Received for publication, February 13, 1997, and in revised form, April 17, 1997)
From the Department of Biological Sciences,
The proteins of the primary cell walls of
suspension cultured cells of five plant species,
Arabidopsis, carrot, French bean, tomato, and tobacco, have
been compared. The approach that has been adopted is differential
extraction followed by SDS-polyacrylamide gel electrophoresis (PAGE),
rather than two-dimensional gel analysis, to facilitate protein
sequencing. Whole cells were washed sequentially with the following
aqueous solutions: CaCl2, CDTA (cyclohexane diaminotetraacetic acid), DTT (dithiothreitol), NaCl, and borate. SDS-PAGE analysis showed consistent differences between species. From
the 233 proteins that were selected for sequencing, 63% gave N-terminal data. This analysis shows that (i) patterns of proteins revealed by SDS-PAGE are strikingly different for all five species, (ii) a large number of these proteins cannot be identified by data base
searches indicating that a significant proportion of wall proteins have
not been previously described, (iii) the major proteins that can be
identified belong to very different classes of proteins, (iv) the
majority of proteins found in the extracellular growth media are absent
from their respective cell wall extracts, and (v) the results of the
extraction process are indicative of higher order structure. It appears
that aspects of speciation reside in the complement of extracellular
wall proteins. The data represent a protein resource for cell wall
studies complementary to EST (expressed sequence tag) and DNA
sequencing strategies.
The plant cell wall is a dynamic system generally considered to be
composed of more than 90% carbohydrate polymers. Proteins, phenolics
and possibly lipids make up the remainder of the wall (1-3). To date,
most research interest has been in the carbohydrate components because
of considerations of their structural role and commercial interest.
This has led to a number of models for the integration, interpolymeric
association, and assembly of the wall (3, 4). By comparison, our
knowledge of the complexity of protein in the plant cell wall is in a
less advanced state. Much of the understanding of the range of
structural wall proteins has come from cDNA and genomic cloning
exercises and has led to the identification of glycine-, cysteine-,
proline-, and hydroxyproline-rich subsets of wall proteins. In
addition, many extracellular enzymes have been identified that are
required for the restructuring and modification of this dynamic
extracellular matrix, processes that underpin its role in defense,
detoxification, signaling, cell-cell recognition, cell expansion, cell
adhesion, cell separation, translocation, differentiation, and
morphogenesis (2, 5, 6). However, there is a lack of direct studies on
the proteins themselves, and the true range of extracellular proteins
and their species differences remain to be elucidated. The present
work describes the systematic extraction and sequencing of the major
primary wall proteins from five species representing four families of
plants.
Since whole plant tissue is complicated by the presence of different
tissue types including lignified secondary walls, we have chosen to use
suspension cultures as a source of experimental material due to their
relative uniformity. There are a large number of studies which show
that molecular probes for proteins and enzymes derived from tissue
cultured cells locate in a predictable way in the intact plant, and
thus tissue cultures can be a reliable guide to a substantial number of
phenomena in the intact plant. To cover a diverse range of species, we
have used suspension cultures of Arabidopsis, carrot, French
bean, tomato, and tobacco. The reasons for choosing these species
relate to their academic and commercial interest.
Arabidopsis has not been as extensively used in tissue
culture but is currently the target of an extensive EST sequencing exercise, and an
international program aimed at the sequencing of its entire genome is
under way (7, 8). Carrot, a member of the Umbelliferae, has been an
important species in modeling embryogenesis and elongation growth (9).
Tissue cultures of the leguminous species, French bean, have been
extensively used as a model system for cell wall biosynthesis and
modifications during responses to pathogens (2). The two solanaceous
species tobacco and tomato allow a comparison of the extent of
conservation of wall proteins within one family. Tomato is important
commercially and was the subject of the first successful attempts
to modify the expression of wall proteins by transformation (10).
Tobacco is an important model plant for attempts to modify cellulose
extractability by modifying lignification (11).
The aim is to generate a protein resource for the plant cell wall
analogous to current efforts of EST and genomic sequencing, so allowing
for the future identification of potential biochemical function arising
from these exercises. Homology searches of the derived amino acid
sequence data from the present study should provide a firm indication
of the number of components of the cell wall to which function is still
to be ascribed.
Plant Material
The derivation and maintenance of cultures
of Arabidopsis (12), tobacco (13), tomato (14), and French
bean (15) have been described previously. The carrot cultures
(previously unpublished) were grown in Murashige and Skoog (16) basal
salts supplemented with 3% sucrose and 2 mg/ml 2,4-D. Suspension
cultures were grown in 100-ml batches in 250-ml Erlenmeyer flasks under
a 16-h photoperiod (Arabidopsis, tomato, and carrot) or in
the dark (French bean and tobacco) at 24 °C while rotary shaken at
130 rpm and subcultured every 7-10 days.
Cells growing 4-5 days after subculture were harvested
by filtration on Miracloth. The cells were washed three times with dH2O (3 ml/g fresh weight). All subsequent manipulations were
conducted at 4 °C. The cells were stirred in three volumes of 0.2 M CaCl2 for 30 min and collected by filtration
on Miracloth and washed three more times with dH2O as
before. Subsequent extractions, each with three volumes, were carried
out sequentially on the same cells for 30 min each, with 50 mM CDTA in 50 mM sodium acetate, pH 6.5, followed by 2 mM DTT, 1 M NaCl, and finally 0.2 M borate, pH 7.5. The borate extraction was conducted at
room temperature. Between extractions cells were washed on the filter
three times, each with three volumes of dH2O. Extracts and
the culture media were refiltered through GF/A paper before being
dialyzed against a 10-fold excess of dH2O with three
changes. The samples were lyophilized and reconstituted in SDS-PAGE
loading buffer (17) prior to analysis. The lyophilized culture
filtrates were reconstituted at 30 µg/µl, whereas all the other
extracts were reconstituted at 15 µg/µl in gel loading buffer.
SDS-PAGE was carried out on 10% gels
using Bio-Rad Mini-PROTEAN II apparatus, and the gels were stained with Coomassie
Brilliant Blue (17). Since protein recoveries varied between the
different extracts within each species and also between species, the
amount of extract loaded per lane was optimized by SDS-PAGE: typically
between 10 and 20 µl of reconstituted extract was used per lane. For
sequencing purposes, this process was also necessary due to the
complexity and uneven abundance of the components within each extract,
and more than one loading was used to maximize the acquisition of sequence data for the minor protein bands. For N-terminal amino acid
sequence determination, proteins were transferred onto Problot membrane
(Applied Biosystems, Inc. (ABI), Foster City, CA) and visualized by
Coomassie Blue staining as directed in the Problot manual. Amino acid
sequence analysis was performed in an ABI model 477 Sequencer. Sequence
similarities were determined using the electronic mail "BLAST"
series of programs (18) against the non-redundant protein,
non-redundant DNA, and non-redundant EST data bases maintained by the
National Center for Biotechnology Information.
Analytical Strategy
Although there have been a number of systematic protein
sequencing projects in plants, usually based on two-dimensional
isoelectric focusing/SDS-PAGE systems of whole plant extracts (19-22),
these are limited in the amount of information that can be obtained, owing to
mass limitations of the resolved material. Whereas the resolving power
of two-dimensional gels is much higher than that of SDS-PAGE gels, the
loading of each band per gel is higher for the latter, generating
sufficient mass for routine N-terminal analysis; the main problem is
resolution. Accordingly, we have chosen to reduce the complexity of the
initial mixture of proteins by differentially extracting the cell wall
from intact suspension cultured cells successively with
CaCl2, CDTA, DTT, NaCl, and borate. The rationale for the
initial wash was based on the successful use of CaCl2 to
extract wall proteins (23). CDTA would be expected to remove any
proteins associated with the pectin fraction since calcium would
promote such binding. DTT was chosen to reduce any protein-protein interactions based upon cysteine disulfide bonds. NaCl was used to
extract any strongly ionically bound proteins. Finally, borate was used
to disrupt any interactions due to glycoprotein side chains and other
saccharides in the wall. At all stages, intervening water washes were
used. For all species, secreted proteins were also characterized in the
culture filtrate. Microscopic examination of the cells after these
extractions showed them to be plasmolyzed, demonstrating that the
plasma membrane remained intact and indicating that none
of the extracted components were cytosolic.
Extraction and Patterns of Cell Wall Proteins
SDS-PAGE analysis reveals that the subsets of wall proteins
obtained by successively washing whole cells are distinct for each
reagent employed and for each species. Sequential extracts are shown
for Arabidopsis (Fig. 1A), carrot
(Fig. 1B), French bean (Fig. 1C), tomato (Fig.
1D), and tobacco (Fig. 1E). It can be seen for
each species that the proteins found in the culture filtrate exhibit a
strikingly different profile from that of their respective wall
fractions. Similarly, there are distinct subsets of proteins extracted
by CaCl2, CDTA, DTT, NaCl, and borate (Fig. 1,
A-E). Individual extractions, hereafter referred to as
non-sequential extractions, carried out with the same series of
reagents on the Arabidopsis and tomato cells yielded different subsets
of proteins to those acquired through sequential extraction (Fig.
2, A and B). Since there is
clearly a difference between the pattern of proteins extracted using
sequential and non-sequential extraction, both approaches were used to
maximize sequence data for these two species. It is obvious that within
the complex three-dimensional matrix that represents the cell wall
there must be some restriction on the dynamic state of the wall or else
one would not observe differential extraction.
Systematic Sequencing
A summary of the proteins from each species for which N-terminal
sequencing was attempted is shown in Table I. The
N-terminal sequences that were obtained, along with the amino acid
yield in the first sequencing cycle and the corresponding protein
molecular weights, are listed in Table II.
Table I.
A summary of the number of N-terminal protein sequences attempted and
determined from the individual plant species
Table II.
The N-terminal amino acid sequences obtained from cell wall and extra
cellular proteins from suspension cultured cells of Arabidopsis,
tomato, tobacco, French bean, and carrot
Fig. 1.
SDS-PAGE analysis of the extracellular and
cell wall proteins sequentially extracted from intact suspension
cultured cells of Arabidopsis (A), carrot
(B), French bean (C), tomato (D),
and tobacco (E). For each species the proteins were
extracted by stirring whole cells in the requisite solution for 30 min
beginning with 0.2 M CaCl2, then sequentially
with 50 mM CDTA, 2 mM DTT, 1 M
NaCl, and finally 0.2 M sodium borate, pH 7.5. Cross-contamination was prevented by rigorously washing the whole cells
with dH2O before and after each extraction. Each extract
including the culture filtrate was extensively dialyzed against
dH2O and lyophilized before SDS-PAGE was carried out on
10% (w/v) discontinuous gels, which were stained with Coomassie
Brilliant Blue. Lane 1, molecular weight markers as
indicated; lane 2, culture filtrate proteins; lane
3, proteins extracted with CaCl2; lane 4,
CDTA-extracted proteins; lane 5, DTT-extracted proteins;
lane 6, NaCl-extracted proteins; lane 7,
borate-extracted proteins; lane 8, molecular weight markers
as indicated. Bands that are labeled are those that were subjected to
protein sequencing.
Fig. 2.
SDS-PAGE analysis of the cell wall proteins
extracted non-sequentially from intact suspension cultured cells of
Arabidopsis (A) and tomato
(B). The proteins were extracted by stirring whole
cells in the requisite solution for 30 min. The extractants were 0.2 M CaCl2, 50 mM CDTA, 2 mM DTT, 1 M NaCl, and 0.2 M sodium borate, pH 7.5. Contamination from the culture filtrate proteins was
prevented by rigorously washing the cells with dH2O
beforehand. The extracts were extensively dialyzed against
dH2O and lyophilized before SDS-PAGE was carried out on
10% (w/v) discontinuous gels. Visualization was by Coomassie.
Lane 1, molecular weight markers as indicated; lane
2, proteins extracted with CaCl2; lane 3, CDTA-extracted proteins; lane 4,
DTT-extracted proteins; lane 5, NaCl-extracted proteins;
lane 6, borate-extracted proteins; lane 7,
molecular weight markers as indicated. Bands that are labeled are those
that were subjected to protein sequencing.
| Species | Total no. of proteins for which sequencing was attempted | No. of proteins which gave sequence data | No. of unique sequences obtained | No. of sequences unclassified by homology or function |
|---|---|---|---|---|
| Arabidopsis | 86 | 47 (55%) | 31 | 19 |
| Carrot | 16 | 10 (62%) | 10 | 9 |
| French bean | 26 | 21 (81%) | 17 | 13 |
| Tomato | 78 | 46 (59%) | 30 | 18 |
| Tobacco | 27 | 22 (81%) | 20 | 15 |
| Total | 233 | 146 (63%) | 108 | 74 |
Sequentially extracted Arabidopsis proteins

| Extract | Band | Accession no. | Mr | Initial level (pmol) | Sequence | Sequence similarity |
|---|---|---|---|---|---|---|
| Culture filtrate | B | [GenBank] | 70 | 28 | TTRTPLFLGLDEHTADLXFE | Arabidopsis subtilisin like protease (100%) (100%)^a |
| | C | [GenBank] | 65 | 30 | EDRTY | |
| | D | [GenBank] | 60 | 5 | KGVNDGT | |
| | E | [GenBank] | 54 | 13 | KVPVDDQFRRVNNGGATDTR | Carrot glycoprotein (75%) (75%)^a |
| | F | [GenBank] | 52 | 12 | EPFIGVNYGQVADNLP | Wheat β-1,3-glucanase (69%) (69%)^a |
| | G | [GenBank] | 42 | 10 | EYFIGVN | Wheat β-1,3-glucanase (57%) |
| | H | [GenBank] | 34 | 12 | EQDRR | |
| | I | [GenBank] | 31 | 20 | IALTV | |
| | J | [GenBank] | 30 | 15 | NFQRDVEITWGDMRR | Arabidopsis xyloglucan endotransglycosidase (87%) (87%)^a |
| | K | [GenBank] | 25 | 43 | ASSSSEDFDFFYFVQQGXP | Arabidopsis extracellular ribonuclease (84%) (84%)^a |
| | L | [GenBank] | 23 | 16 | ASSSSEDFDFFY | Arabidopsis extracellular ribonuclease (100%) |
| CaCl2 | A | [GenBank] | 96 | 12 | AVREYHWFVE | |
| | D | [GenBank] | 60 | 30 | NPNYKEALSK | Arabidopsis cellulase (100%) |
| | E | [GenBank] | 59 | 30 | KVPVDDQFRR | Carrot glycoprotein (100%) |
| | F | [GenBank] | 54 | 40 | KVPVD | Carrot glycoprotein (100%) |
| | | [GenBank] | | 30 | NPNYK | Arabidopsis cellulase (100%) |
| | G | [GenBank] | 52 | 20 | ATLTVFFRDN | |
| | I | [GenBank] | 44 | 5 | HLKYKDPEQG | |
| | K | [GenBank] | 36 | 30 | ADRELHRSKA | |
| | O | [GenBank] | 30 | 39 | NPNYKEALSKSLLFFQGQRR | Arabidopsis cellulase (95%) (95%)^b |
| | Q | [GenBank] | 25 | 100 | RIPGIYSGGAWQNAHATFYG | Arabidopsis expansin (75%) (90%)^a,b |
| | R | [GenBank] | 23 | 57 | IPCRKAIDVPFGXRYVVXTW | Arabidopsis xyloglucan endotransglycosidase (65%) (85%)^a,b |
| | S | [GenBank] | 18 | 13 | ADLTR | |
| | T | [GenBank] | 17 | 30 | ADREPNHFVA | |
| CDTA | A | [GenBank] | 55 | 10 | EATVDMPLD | |
| | D | [GenBank] | 36 | 8 | NPNY | Arabidopsis cellulase (100%) |
| DTT | B | [GenBank] | 45 | 30 | ARKFF | Arabidopsis triose-phosphate isomerase (100%) |
| NaCl | H | [GenBank] | 22 | 24 | ARKFFVG | Arabidopsis triose-phosphate isomerase (100%) |
Non-sequentially extracted Arabidopsis proteins

| Extract | Band | Accession no. | Mr | Initial level (pmol) | Sequence | Sequence similarity |
|---|---|---|---|---|---|---|
| CDTA | A | [GenBank] | 72 | 9 | KVPV | Carrot glycoprotein (100%) |
| | B | [GenBank] | 69 | 8 | KVPVDDQFR | Carrot glycoprotein (100%) |
| | I | [GenBank] | 39 | 16 | KDLXHRDDKT | |
| | J | [GenBank] | 36 | 16 | KDLXHRDDKT | |
| | K | [GenBank] | 33 | 18 | IPCRKAIDVVF | Arabidopsis xyloglucan endotransglycosidase (82%) |
| DTT | A | [GenBank] | 68 | 14 | AVPPRYGYTRG | |
| | B | [GenBank] | 64 | 8 | MREIEHIPPP | |
| | F | [GenBank] | 36 | 70 | ARKFFVGRNWPEL | Arabidopsis triose-phosphate isomerase (62%) (69%)^a,b |
| | G | [GenBank] | 30 | 20 | ARKFFV | Arabidopsis triose-phosphate isomerase (100%) |
| | H | [GenBank] | 25 | 10 | VLTIYA | |
| NaCl | A | [GenBank] | 62 | 5 | YKTIGKGYR | Mouse T cell receptor (55%) |
| | B | [GenBank] | 60 | 10 | DVGKFK | |
| | D | [GenBank] | 48 | 9 | DNPSSTPPLR | |
| | E | [GenBank] | 46 | 25 | NPNYKEALSKSLLFFQGQRR | Arabidopsis cellulase (95%) |
| | F | | 38 | 43 | APXSEGY / K TVRF | |
| | G | [GenBank] | 36 | 10 | EDLPEK | |
| | H | [GenBank] | 33 | 20 | APQEPNQFQLLKYH | Maize cytochrome P450 (50%) (75%)^a |
| | I | [GenBank] | 31 | 12 | SDRELHRSKAAYFF | |
| | J | [GenBank] | 27 | 10 | VDTSRLFLTVVNNPPTVV | Arabidopsis hypothetical protein (67%) |
| | K | [GenBank] | 25 | 70 | RIPGIYSGGAWQNAHATFYG | Arabidopsis expansin (65%) |
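Sequentially extracted carrot proteins

| Extract | Band | Accession no. | Mr | Initial level (pmol) | Sequence | Sequence similarity |
|---|---|---|---|---|---|---|
| CaCl2 | A | [GenBank] | 68 | 6 | EPPYRLVDN | |
| | B | [GenBank] | 66 | 6 | GPLNAQHQS | |
| | C | [GenBank] | 58 | 10 | QLAELKYVI | |
| | D | [GenBank] | 46 | 7 | DLSNLLSRVPNERSN | |
| | E | [GenBank] | 43 | 9 | GVREDTYPDVVXTA | |
| | F | [GenBank] | 32 | 8 | AEYPNDVNLTVYWDP | |
| | G | [GenBank] | 30 | 19 | SEVGALVFQPKTRF | |
| | H | [GenBank] | 24 | 8 | AHSDAVTPLPARSKV | Human genomic sequence (69%) |
| CDTA | C | [GenBank] | 56 | 5 | SQEDTPL | |
| | E | [GenBank] | 30 | 5 | ATNPSGQ | |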
Sequentially extracted French bean proteins

| Extract | Band | Accession no. | Mr | Initial level (pmol) | Sequence | Sequence similarity |
|---|---|---|---|---|---|---|
| Culture filtrate | A | [GenBank] | 41 | 20 | NYDKPOVEKPOVYKPOVEKPOVY | Proline-rich protein (contains 3 repeating blocks of 4 amino acids) |
| | B | [GenBank] | 36 | 3 | NYDKPOVEKP | Proline-rich protein |
| CaCl2 | A | [GenBank] | 230 | 8 | NMYLPOVOOOOVVPTF | |
| | B | [GenBank] | 140 | 48 | NHYSYSSOOOOOVVSS | Extensin |
| | C | [GenBank] | 136 | 10 | NYDKPOVEKPOVYK | Proline-rich protein (see bean culture filtrate, A) |
| | D | [GenBank] | 84 | 8 | NYDKPOVEKPOVYKP | Proline-rich protein (see bean culture filtrate, A) |
| | E | [GenBank] | 65 | 7 | EDPVRFNLG | |
| | F | [GenBank] | 60 | 8 | EDAYKFTTW | |
| | G | [GenBank] | 53 | 10 | VAGRSVVKIAEGYL | |
| | H | [GenBank] | 46 | 5 | KPDPEAVLIV | |
| | I | [GenBank] | 45 | 7 | NSKPPEALILVKXSQ | |
| | J | [GenBank] | 44 | 10 | SHDKPDHIRLFELKKDDLLISVHNA | |
| | K | [GenBank] | 43 | 24 | YDKKVDSIILFGVNG | |
| | L | [GenBank] | 42 | 30 | NYDKPOVEKPOVYKPOVEKPOVYKP | Proline-rich protein (see bean culture filtrate, A) |
| | M | [GenBank] | 35 | 12 | ELPVNFYALNLTADNINIGY | |
| | O | [GenBank] | 33 | 10 | NYDKNFYEDTLP | |
| | P | [GenBank] | 26 | 5 | EYPVVFVKGLFFGKG | |
| | Q | [GenBank] | 22 | 25 | QNQPPDFANOFIIPQNAA | |
| CDTA | C | [GenBank] | 36 | 100 | DVNGGGHTLPQPLYQTTVVL | Desulfovibrio desulfuricans periplasmic Fe hydrogenase (75%) |
| | D | [GenBank] | 33 | 20 | AGVDPAIPAYVKTNG | |
| | E | [GenBank] | 30 | 10 | MGQGAVEGQLFYNVQ | Pseudomonas fluorescens protein F (80%) |
Sequentially extracted tomato proteins

| Extract | Band | Accession no. | Mr | Initial level (pmol) | Sequence | Sequence similarity |
|---|---|---|---|---|---|---|
| Culture filtrate | A | [GenBank] | 65 | 23 | EGKAIGLAKPRMDST | |
| | D | [GenBank] | 35 | 20 | EQFDEEFDIT | |
| | E | [GenBank] | 31 | 200 | EQXGSQAGGALRAGL | French bean chitinase (73%) |
| | G | [GenBank] | 25 | 8 | STDFDFNN | |
| | | [GenBank] | | 14 | AKDFDFFYFVQQWP | Tomato extracellular ribonuclease (100%) |
| | H | [GenBank] | 24 | 173 | AKDFDFFYFVQQWPGGYYDTPKQPKQ | Tomato extracellular ribonuclease (69%) (65%)^a,b |
| | Ia^c | [GenBank] | 23 | 112 | AKDFD | Tomato extracellular ribonuclease (100%) |
| | Ib | [GenBank] | 22 | 30 | KSTDFDYNNKKANYD | |
| | Ic | [GenBank] | 21 | 15 | SNAVAVLNXXEXM | |
| CaCl2 | D | [GenBank] | 64 | 10 | ANAKVPSH | |
| | | | | 7 | KT PPR PSH | |
| | E | [GenBank] | 62 | 6 | EVPLDDTGL | |
| | F | [GenBank] | 50 | 8 | EVLYIPVTTDA | Human Ig heavy chain DJ region (64%) |
| | G | [GenBank] | 44 | 18 | VAGKSFVPIAAGRQ | Tobacco P7 curled leaf protein (64%) |
| | H | [GenBank] | 40 | 13 | VAGK | Tobacco P7 curled leaf protein (100%) |
| | I | [GenBank] | 39 | 5 | SPVEGGPXGXL | |
| | J | [GenBank] | 35 | 8 | EQXGRQRQGG | French bean chitinase (70%) |
| | K | [GenBank] | 34 | 9 | EQXGSQA | French bean chitinase (86%) |
| | L | [GenBank] | 33 | 7 | EQXGSQA | French bean chitinase (86%) |
| | O | [GenBank] | 28 | 15 | ADREP | |
| | | | | 35 | ALVED | |
| | P | [GenBank] | 27 | 18 | TGVNYGQLGNNLP | Tobacco β-1,3-glucanase (77%) (85%)^a |
| | Q | [GenBank] | 23 | 100 | TNPNFILTL | Tomato osmotin (100%) |
| CDTA | E | [GenBank] | 34 | 10 | EQXGS | French bean chitinase (100%) |
| | F | [GenBank] | 31 | 18 | EQXGS | French bean chitinase (100%) |
| | G | [GenBank] | 23 | 8 | TNPNF | Tomato osmotin (100%) |
| NaCl | E | [GenBank] | 34 | 38 | EQXGSQAGGA | French bean chitinase (90%) |
| | F | [GenBank] | 30 | 10 | SNPNFILTLV | Tomato osmotin (90%) |
| | G | [GenBank] | 23 | 8 | ANPEVRNNLP | |
Non-sequentially extracted tomato proteins

| Extract | Band | Accession no. | Mr | Initial level (pmol) | Sequence | Sequence similarity |
|---|---|---|---|---|---|---|
| DTT | A | [GenBank] | 62 | 3 | MEKGYYDLES | |
| | C | [GenBank] | 40 | 18 | ANDPDFPYTVQANRP | |
| | E | [GenBank] | 35 | 20 | EQXGSQ | French bean chitinase (100%) |
| | F | [GenBank] | 22 | 8 | MNIPPGD | |
| NaCl | A | [GenBank] | 80 | 30 | STHTSDFLKL | |
| | C | [GenBank] | 76 | 20 | STRTPEFLGLDNQCGVWA | Tomato subtilisin protease (50%) |
| | E | [GenBank] | 66 | 21 | GYMKYKDPKQPLLGRRXD | Barley -exoglucanase (55%) (61%)^b |
| | G | [GenBank] | 64 | 65 | ANAKVPSHTISNPF | |
| | H | [GenBank] | 49 | 22 | EVLYI | Human Ig heavy chain DJ region (60%) |
| | I | [GenBank] | 47 | 15 | VAGK | Tobacco P7 curled leaf protein (100%) |
| | J | [GenBank] | 44 | 60 | VAGKSFVPIALGRQSKQTPF | Tobacco P7 curled leaf protein (55%) |
| | K | [GenBank] | 40 | 60 | GPVEIYYLQSADAKG | |
| | L | [GenBank] | 38 | 25 | VKIGTYELLKGDFSV | |
| | | [GenBank] | | 38 | ELQLNYYTKSXXRAE | Tomato peroxidase (72%) |
| | M | [GenBank] | 35 | 12 | ELQLNYYTKSWXRAE | Tomato peroxidase (72%) |
| | O | [GenBank] | 23 | 12 | ALVEDPQMQKYHKH | |
| Borate | B | [GenBank] | 44 | 10 | VAGKSFV | Tobacco P7 curled leaf protein (100%) |
| | C | [GenBank] | 42 | 4 | HPVEI | |
| | D | [GenBank] | 40 | 8 | ELQLNY | Tomato peroxidase (67%) |
| | H | [GenBank] | 23 | 14 | SNPNFILTLVNNVPYTIWPA | Tomato osmotin (90%) (70%)^b |
| | I | [GenBank] | 20 | 5 | EYIPFIHEWV | |
Sequentially extracted tobacco proteins

| Extract | Band | Accession no. | Mr | Initial level (pmol) | Sequence | Sequence similarity |
|---|---|---|---|---|---|---|
| Culture filtrate | A | [GenBank] | 70 | 6 | GLVPPADKY | |
| | B | [GenBank] | 41 | 31 | AVNGGPATLPEYQI | Human 40-kDa urinary tract integral stone protein (64%) |
| | C | [GenBank] | 34 | 10 | AHVEVPNSLY | |
| CaCl2 | A | [GenBank] | 67 | 11 | STIEVRNNSPYYSVD | Tobacco osmotin (60%) |
| | B | [GenBank] | 66 | 9 | VPPAVWNSXNYNS | |
| | C | [GenBank] | 57 | 10 | GEQPGDQARGARNPXGNN | Tobacco chitinase (55%) (62%)^b |
| | D | [GenBank] | 56 | 7 | QDPYVDFLK | |
| | E | [GenBank] | 52 | 8 | AQPPQQADFL | |
| | F | [GenBank] | 50 | 5 | FYAGLILTLVNTFPYNISPASS | Tobacco protein P10 (64%) (59%)^b |
| | G | [GenBank] | 48 | 9 | QYVKDPDKQVVARIFLDLQLVQR | |
| | H | [GenBank] | 47 | 40 | ATIEVYNILPYYYYVSKAWSWNGN | |
| | I | [GenBank] | 46 | 8 | GEQPGDQARGARPXGNN | Tobacco chitinase (47%) |
| | J | [GenBank] | 45 | 8 | QDAYRFLXTHTYG | |
| | K | [GenBank] | 40 | 5 | QPEESVFFA | |
| | L | [GenBank] | 34 | 38 | WPXAQIFSAVRGXVN | |
| | M | [GenBank] | 33 | 7 | EQCQDMAGGR | French bean chitinase (72%) |
| | N | [GenBank] | 28 | 11 | IWVGISYKIHSLYFQ | |
| | O | [GenBank] | 27 | 28 | GYPRKXVDVFTFTN | |
| | P | [GenBank] | 15 | 10 | TIEEVLNLPPYVVAA | |
| | Q | [GenBank] | 14 | 12 | AVFVILTNVYT | |
| CDTA | B | [GenBank] | 46 | 3 | GPEEWVK | |
| | E | [GenBank] | 33 | 5 | EQCQDMAGGAR | French bean chitinase (72%) |
^a Sequences which have homologies to Arabidopsis ESTs.
^b Sequences which have homologies to ESTs from species other than Arabidopsis.
^c This band could be separated into 4 more by increasing the strength of the resolving gel from 10 to 14% (w/v) acrylamide.
From the 86 proteins selected for sequencing from the Arabidopsis extracts (Fig. 1A), 47 sequences were obtained, two of which yielded double sequences: band F in the CaCl2 extract and band F in the non-sequential NaCl extract. In the case of band F from the CaCl2 extract, it was possible to decipher the double sequence into separate sequences. This was performed by a process of subtraction, since both of the sequences within CaCl2 band F also appear as single sequences at other molecular weights within the same CaCl2 extract, i.e. CaCl2 band E and CaCl2 band O. Discounting sequences that were present either at more than one molecular weight or within other fractions left 31 unique sequences within the Arabidopsis wall extracts. The presumptive glycoprotein sequence starting "KVPV" is present in several extracts at more than one molecular weight (culture filtrate, band E; CaCl2, bands E and F; non-sequential CDTA, bands A and B) and may represent differential glycosylation. Evidence for microheterogeneity in primary sequence can be seen in bands F and G of the Arabidopsis culture filtrate proteins, which contained identical sequences except for the tyrosine or proline at position two. Bands K and L of the culture filtrate were found to contain identical sequences beginning with "ASSS." The sequence beginning "NPNY" was found in the CaCl2 extract, bands D, F, and O; the sequential CDTA extract, band D; and the non-sequential NaCl extract, band E. The sequence beginning with "IPCR" was found in the CaCl2 extract, band R, and the non-sequential CDTA extract, band K. The sequence beginning "RIPG" was found within the CaCl2 extract, band Q, and the non-sequential NaCl extract, band K. The sequence beginning with "ARKF" was seen in the sequential DTT and NaCl extracts, bands B and H, respectively, and also within bands F and G of the non-sequential DTT extract. Bands I and J from the non-sequential CDTA extract were also found to contain the same sequence beginning with "KDLX."
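The subtraction used to decipher a double sequence can be sketched as follows. This is a minimal illustration, not the procedure as run on the sequencer: the per-cycle residue calls for CaCl2 band F are reconstructed here from the two sequences reported for that band in Table II (KVPVD and NPNYK), and the function name `subtract_known` is hypothetical.

```python
def subtract_known(cycles, known):
    """Recover the second sequence from a mixed N-terminal run.

    cycles: residues called at each sequencing cycle, one string per cycle
            (two letters when the two proteins differ, one when they agree).
    known:  a sequence already seen as a single band elsewhere in the extract.
    """
    remainder = []
    for called, expected in zip(cycles, known):
        if expected not in called:
            raise ValueError(f"{expected!r} was not called in cycle {called!r}")
        # Remove one copy of the known residue; what is left belongs to the
        # other protein. If nothing is left, both proteins share this residue.
        left = called.replace(expected, "", 1)
        remainder.append(left if left else expected)
    return "".join(remainder)


# Assumed per-cycle calls for Arabidopsis CaCl2 band F (illustrative only).
mixed_band_f = ["KN", "VP", "PN", "VY", "DK"]

# Subtracting the band E sequence leaves the band O sequence, and vice versa.
print(subtract_known(mixed_band_f, "KVPVD"))
print(subtract_known(mixed_band_f, "NPNYK"))
```

The approach only works when at least one component of the mixture has been sequenced on its own, which is why the double sequence in the non-sequential NaCl band F remained undeciphered.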
Carrot Cell Wall Proteins
All of the 10 carrot sequences were unique to carrot and were different from each other.
French Bean Cell Wall Proteins
From the 26 French bean proteins that were processed for sequencing (Fig. 1C), 21 sequences were obtained, of which six shared the same N-terminal four amino acids: "NYDK." These sequences (culture filtrate, bands A and B; CaCl2, bands C, D, L, and O) were identical apart from the CaCl2 extract's band O, where the sequence diverged substantially after the fourth amino acid. All of the other bean sequences were only seen in this species and were not represented at multiple molecular weights. Not counting sequences that appeared more than once left a total of 17 unique sequences in the bean extracts.
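The count of unique sequences can be reproduced with a short sketch. It assumes a simple rule that is not stated in the text: two N-terminal reads are treated as the same protein when they agree over their overlapping length. The French bean reads are transcribed from Table II, and the function names are hypothetical.

```python
def same_protein(a, b):
    """True if two N-terminal reads agree over their overlapping length."""
    n = min(len(a), len(b))
    return a[:n] == b[:n]


def cluster_reads(reads):
    """Greedy clustering: a read joins the first cluster whose longest
    member it agrees with; otherwise it starts a new cluster."""
    clusters = []
    for read in reads:
        for cluster in clusters:
            if same_protein(read, cluster[0]):
                cluster.append(read)
                cluster.sort(key=len, reverse=True)  # keep longest read first
                break
        else:
            clusters.append([read])
    return clusters


bean_reads = [
    # culture filtrate, bands A and B
    "NYDKPOVEKPOVYKPOVEKPOVY", "NYDKPOVEKP",
    # CaCl2 extract, bands A to Q
    "NMYLPOVOOOOVVPTF", "NHYSYSSOOOOOVVSS", "NYDKPOVEKPOVYK",
    "NYDKPOVEKPOVYKP", "EDPVRFNLG", "EDAYKFTTW", "VAGRSVVKIAEGYL",
    "KPDPEAVLIV", "NSKPPEALILVKXSQ", "SHDKPDHIRLFELKKDDLLISVHNA",
    "YDKKVDSIILFGVNG", "NYDKPOVEKPOVYKPOVEKPOVYKP",
    "ELPVNFYALNLTADNINIGY", "NYDKNFYEDTLP", "EYPVVFVKGLFFGKG",
    "QNQPPDFANOFIIPQNAA",
    # CDTA extract, bands C, D, and E
    "DVNGGGHTLPQPLYQTTVVL", "AGVDPAIPAYVKTNG", "MGQGAVEGQLFYNVQ",
]

clusters = cluster_reads(bean_reads)
print(len(bean_reads), "reads,", len(clusters), "unique sequences")
```

Under this rule the five concordant "NYDK" reads collapse into one cluster while CaCl2 band O stays separate. Very short reads could match more than one cluster, so the rule is only a rough guide when runs were terminated after four or five cycles.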
Tomato Cell Wall Proteins
Seventy-eight proteins were selected for sequencing from the tomato extracts (Fig. 1D), of which 46 yielded sequence information. Four of these proteins contained double sequences: culture filtrate band G, CaCl2 bands D and O, and the non-sequential NaCl band L, all of which were deciphered into two individual sequences. This was possible by subtraction, in a similar way to deciphering the Arabidopsis double sequences. From the 46 sequences obtained, 30 were unique. The shortfall is due to a total of nine sequences that were found at more than one position or in more than one extract. Bands containing the sequence beginning "AKDF" were seen in the culture filtrate as bands G, H, and Ia (band I could be resolved into four more components by increasing the polyacrylamide concentration from 10 to 14% in the electrophoresis separating gel). The sequence beginning "KSTD" was also seen in band Ib of the culture filtrate and as the second component of band G of the same extract, except that the N-terminal residue Lys was absent in the latter case. The sequence beginning "EQXG" was seen a total of eight times: culture filtrate, band E; CaCl2, bands J, K, and L; the sequential CDTA extract, bands E and F; the sequential NaCl extract, band E; and the non-sequential DTT extract, band E. However, the CaCl2 band J sequence was only the same up to the sixth residue, after which it diverged compared with culture filtrate band E. The sequence beginning "ANAK" appeared twice: once in the non-sequential NaCl extract, band G, and as one of the two short sequences that belong to CaCl2 band D. Bands containing the sequence beginning with "VAGK" were CaCl2, bands G and H; non-sequential NaCl extract, bands I and J; and band B of the non-sequential borate extract. These sequences were identical except for residue 11, which was Ala in band G of the CaCl2 extract and Leu in band J of the non-sequential NaCl extract. Sequences beginning with either "SNPN" or "TNPN" appeared four times: twice beginning with TNPN, as band Q in the CaCl2 extract and as band G in the sequential CDTA extract, and twice beginning with SNPN, as band F in the sequential NaCl extract and as band H in the non-sequential borate extract. Apart from this heterogeneity at the N terminus, all four sequences were identical. The sequence beginning "ALVE" appeared as band O in the non-sequential NaCl extract and as one of the two short sequences that were identified from band O of the CaCl2 extract. Incidentally, the second of the short sequences from band O of the tomato CaCl2 extract (beginning "ADRE") was also noted within the Arabidopsis CaCl2 extract, band T. Bands containing the sequence beginning with "EVLY" were seen in the CaCl2 extract, band F, and in the non-sequential NaCl extract, band H. Finally, the sequence beginning with "ELQL" was seen in bands L and M of the non-sequential NaCl extract and in band D of the non-sequential borate extract.
Tobacco Cell Wall Proteins
Twenty-seven of the protein bands from the tobacco extracts were selected for sequencing (Fig. 1E). Only two of the sequences were found to be either in more than one of the tobacco extracts or at more than one molecular weight within the same extract. The first begins with "GEQP" and was in the CaCl2 extract as band C and band I. The sequence from band C was 18 amino acids long and that of band I, 17 amino acids long; both were identical, except in the case of band C's sequence, which had an extra Asn present at amino acid position 13. The second tobacco sequence to appear in two different extracts began with "EQCQ" and was found within the CaCl2 extract, band M and the CDTA extract, band E. All the other tobacco sequences are listed as they appear in Table II. Discounting any sequences that appeared more than once in these extracts left a total of 20 unique tobacco sequences.
Overall Series of Sequencing Cell Wall Proteins
On average 63% of the proteins from all of the plant species proved to be sequenceable, but this figure applies only to the bands selected for analysis in this study. Generally the culture filtrate proteins yielded a high success rate in terms of the numbers of proteins that were sequenced: 92%, 11 out of 12, for Arabidopsis; 67%, 8 out of 12, for tomato; 100% for bean and tobacco: two and three proteins, respectively. In contrast, the two proteins selected from the carrot culture filtrate did not sequence. Those proteins extracted with CaCl2 also gave a high success rate when sequenced: 60%, 12 out of 20, for Arabidopsis; 100%, eight out of eight, for carrot; 94%, 16 out of 17, for French bean; 71%, 12 out of 17, for tomato; and 100%, 17 out of 17, for tobacco. In those extracts that were made sequentially after the initial CaCl2 wash, the number of proteins that were successfully sequenced from those selected dropped off significantly. For example, in the sequential CDTA extracts 40%, two out of five, for Arabidopsis; 43%, three out of seven, for tomato; 40%, two out of five, for carrot; 60%, three out of five, for bean; and 33%, two out of six, for tobacco were sequenceable. In the subsequent sequential extracts where protein bands were targeted for sequencing, considerably fewer proteins were amenable to this type of analysis. It was because of this continuing decline in successfully obtaining sequences, which paralleled each step in the sequential extraction, that none of the sequential borate proteins were selected for sequencing.
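The success rates quoted above follow directly from the counts, as the following sketch shows. The per-species counts are transcribed from this paragraph; the pooled per-extract figures are computed here for illustration and are not reported in the text, and the helper names are hypothetical.

```python
# (sequenced, selected) per species for each extract, taken from the text.
results = {
    "culture filtrate": {
        "Arabidopsis": (11, 12), "tomato": (8, 12), "French bean": (2, 2),
        "tobacco": (3, 3), "carrot": (0, 2),
    },
    "CaCl2": {
        "Arabidopsis": (12, 20), "carrot": (8, 8), "French bean": (16, 17),
        "tomato": (12, 17), "tobacco": (17, 17),
    },
    "sequential CDTA": {
        "Arabidopsis": (2, 5), "tomato": (3, 7), "carrot": (2, 5),
        "French bean": (3, 5), "tobacco": (2, 6),
    },
}


def percent(sequenced, selected):
    """Success rate rounded to the nearest whole percent."""
    return round(100 * sequenced / selected)


def pooled(extract):
    """Pool all species for one extract: (sequenced, selected, percent)."""
    sequenced = sum(s for s, _ in results[extract].values())
    selected = sum(n for _, n in results[extract].values())
    return sequenced, selected, percent(sequenced, selected)


for extract, by_species in results.items():
    for species, (s, n) in by_species.items():
        print(f"{extract:17s} {species:12s} {s:2d}/{n:2d} = {percent(s, n)}%")
    print(f"{extract:17s} pooled       {pooled(extract)}")

# Overall figure from Table I: 146 of 233 selected bands gave sequence.
print("overall:", percent(146, 233), "%")
```

Pooling across species makes the decline between the CaCl2 wash and the subsequent CDTA extraction easier to see than the individual percentages do.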
Homology Searches
The identities given for each sequence similarity listed in Table II can be defined as the percentage of amino acid matches for that query sequence against a sequence found in a particular data base. It should be noted that this figure does not take into account any conservative substitution. Throughout this study there have been numerous examples of a single sequence that was either present at more than one molecular weight within the same extract, or was present in more than one extract. It was therefore general practice to terminate sequence runs after the first four or five sequencing reaction cycles if this was observed. However, it is clear that microheterogeneity is occasionally seen between almost identical sequences, especially when longer sequences were obtained.
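The identity measure defined above, which counts exact matches only and ignores conservative substitutions, can be sketched as follows. The values in Table II came from BLAST searches against data base entries; this sketch instead compares two reads from Table II directly, without gaps, and takes the overlapping length as the denominator. Both choices are assumptions made for illustration, and the function name is hypothetical.

```python
def percent_identity(query, subject):
    """Percentage of exact residue matches over the ungapped overlap.

    Conservative substitutions count as mismatches, in line with the
    definition used for Table II.
    """
    overlap = min(len(query), len(subject))
    if overlap == 0:
        raise ValueError("both sequences must be non-empty")
    matches = sum(q == s for q, s in zip(query, subject))
    return 100.0 * matches / overlap


# Tomato "VAGK" reads: CaCl2 band G and non-sequential NaCl band J,
# which differ only at residue 11 (Ala versus Leu).
band_g = "VAGKSFVPIAAGRQ"
band_j = "VAGKSFVPIALGRQSKQTPF"
print(round(percent_identity(band_g, band_j), 1))
```

A single mismatch has a large effect on short reads, which is one reason the 100% identities reported for four- or five-residue sequences should be read with caution.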
In the case of Arabidopsis, the classic model plant, which
is a member of the Brassicaceae (a family that also includes plants such as oil seed rape
and cauliflower), the proteins that could be identified were a
subtilisin-like protease, β-1,3-glucanase, two xyloglucan
endotransglycosidases, an extracellular ribonuclease, cellulase, a
carrot-like glycoprotein, an Arabidopsis hypothetical
protein, and an expansin. Those that showed lower sequence
similarities, with less than 65% identities, were a cytochrome P-450,
triose-phosphate isomerase, and a mouse T-cell receptor. Nineteen
proteins did not show any similarity to any member of the data bases
searched.
The member of the Umbelliferae, carrot, was included because it has been used to mimic elongation growth and embryogenesis. Only one of the 10 sequences obtained from the carrot wall fractions bore any similarity to data base sequences and, unlikely as it may seem, it corresponded to an unknown genomic DNA sequence from human.
The legume, French bean, has been used to model both differentiation and elicitor-induced pathogen stress (2). Identified wall proteins include extensin and a hybrid proline-rich, cysteine-rich chitin-binding protein (2). The bean culture was the only one in this study that led to the identification of hydroxyproline-rich glycoproteins. There were also two proteins that had a degree of similarity to two bacterial proteins, one of which was an iron hydrogenase. Thirteen of the French bean proteins were unlike any members present within the data bases searched.
For the two model solanaceous species, tobacco and tomato, there were
several proteins, common to both species, that could be identified in
similarity searches and based on these results may therefore have
similar functions. For example, both species appeared to have forms of
osmotin, chitinase, and pathogen-related proteins: tobacco P7 curled
leaf protein in the case of tomato extracts and tobacco protein P10 in
the case of the tobacco extracts. The tomato extracts were also found
to contain proteins that bore similarity to other known extracellular
proteins, which were an extracellular ribonuclease, β-1,3-glucanase,
subtilisin-like protease,
-exoglucanase, and a peroxidase. One of
the other tomato proteins that was found to have an analogue within the
data bases was a human IgG heavy chain fragment. Altogether, 18 of the
tomato proteins could not be identified by data base searches. Whereas
the tomato fractions only contained a single form of chitinase, similar
to one from French bean, the tobacco extracts were found to have two
forms, one resembling a French bean chitinase and the other a tobacco
chitinase. The tobacco extracts also contained a protein that bore
similarity to an integral stone protein associated with the human
urinary tract. Altogether, 15 of the tobacco proteins could not be
identified by data base searches.
The proteins of the plant extracellular matrix comprise a subset that contains both structural proteins and enzymes. Many of these have been identified at the protein level, but a large number have also been characterized from gene sequences on the basis of a repeat motif or a secretory leader sequence. In comparison to present knowledge of the proteins of some metabolic pathways such as the Calvin cycle, shikimic acid, and lignin biosynthetic pathways, the components of which have been completely cloned, the proteins of the extracellular matrix and their cognate genes are poorly characterized. This is probably due in part to the relative inaccessibility of wall proteins in the plant. Additionally, differentiation of the cells, which in part resides in profound changes in wall structure, and the extent to which individual proteins are immobilized within these structures increase complexity. One approach to increase knowledge of the range of wall proteins is to use tissue-cultured cells, which can be grown in bulk to allow characterization of proteins of walls that resemble primary walls. These cultures can also be stimulated to mimic developmental processes such as elongation growth and embryogenesis in carrot cells or xylogenesis and secondary wall formation as in Zinnia or French bean (24). A wide range of cells have also been used to mimic aspects of pathogen stress using fungal elicitors. A distinct advantage of cell suspensions is that they can be subjected to washing regimes that elute non-covalently bound proteins without disrupting the cells. Our studies demonstrate, by comparing the plant species, that (i) patterns of proteins revealed by SDS-PAGE are strikingly different, (ii) a large number of the proteins that were sequenced cannot be identified by data base searches, and (iii) the major proteins that can be identified belong to very different classes of proteins.
It appears that aspects of speciation reside in the complement of extracellular and cell wall proteins. Although it is difficult to discriminate between an extracellular protein and a cell wall protein in planta, this is not the case with suspension-cultured cells, as exemplified by the absence of most of the culture filtrate protein sequences from the subsequent salt-eluted extracts. Moreover, the culture filtrate profile for each individual species was different from any of its respective wall extracts (Fig. 1, A-E).
The use of tissue cultures to model structures and processes in the whole plant has been validated by a large number of studies. These show that proteins purified and cDNAs cloned by this route give rise to antibodies and probes that can be used on the whole plant and that locate to the predicted tissues and cells. Examples include structural proteins such as glycine-rich proteins (25), hydroxyproline-rich glycoproteins (5), proline-rich proteins (26), and enzymes such as peroxidases (23), chitinases, and laccases (27). Analyses of the sequences generated in this study validate the methodology, since many are known wall proteins and the cells remain intact. Only two potential cytoplasmic proteins have been revealed, a triose-phosphate isomerase and a P-450, and for both the degree of identity was relatively low. Indeed, the types of extracellular proteins we have identified through homology searches encompass the carbohydrate-modifying enzymes such as xyloglucan endotransglycosidases, cellulases, and glucanases; other examples of wall proteins are the expansin, peroxidase, protease, ribonuclease, chitinases, extensin, and proline-rich protein.
There are a number of reasons why certain known wall proteins may be absent from this study. The most conspicuous example is perhaps members of the hydroxyproline-rich glycoprotein family, only two of which were detected in the bean extracts. Their absence could be because they are present as minor components of the wall and consequently not detected here, or because they are only expressed in response to a particular environmental or developmental stimulus that is not present within the culture regimes (28). The proteins could be N-terminally blocked, or it may also be that they are not readily extracted by these reagents, or that the more heavily glycosylated types are not stained with the Coomassie dye. It would seem that many of the enzyme activities associated with the wall are known only through their function, so they may not yet be present within the data bases and are still waiting to be characterized by purification and subsequent sequencing.
For certain sequences within each species, there also existed a certain degree of heterogeneity in terms of the occasional amino acid substitution and also their appearance at different molecular weights. The former may be explained by the proteins originating from different genes and the latter by post-translational mechanisms such as glycosylation.
Cloning of any of the protein sequences reported here will undoubtedly be accelerated by the ever increasing numbers of ESTs that are being characterized. One surprising outcome of this research is that the sequenced Arabidopsis proteins did not have a larger proportion of "hits" in the EST data base. This could be because the majority of ESTs do not represent full-length cDNAs. Greater exploitation of the EST data base would thus require extensive internal amino acid sequence data to be generated if this were the case. It is envisaged that, within the foreseeable future, most if not all of the genes in several plants will be available in public access data bases. However, it has also been estimated that the function of at least 70% of the Arabidopsis genes alone, at present, cannot be identified by sequence homology to genes or proteins that have already been designated a function (7, 8). Therefore, having prior knowledge relating to a gene, such as its expression pattern or the eventual location of the expressed protein, may help in elucidating function. This information should therefore complement new data from systematic DNA sequencing exercises in that it gives a direct localization of the protein identified in this way.