Proteomics in reproductive research: The potential importance of proteomics to research in reproduction

Ian A. Brewis1

Department of Molecular Biology and Biotechnology and Department of Obstetrics and Gynaecology (Jessop Hospital for Women), The University of Sheffield, Sheffield, S10 2UH, UK

Recent developments in certain areas of reproductive technology, e.g. intracytoplasmic sperm injection (ICSI), animal cloning and human embryonic stem (ES) cell derivation, have been remarkable and advances are continuing at a considerable rate. However, our understanding of the molecular basis of most aspects of reproduction, particularly in the human, is still extremely poor. This is in marked contrast with the enormous developments in analysis of both genes and their products (proteins) that are currently taking place in the biosciences. For example, microscopic arrays (microarrays) of DNA or oligonucleotides, containing up to several hundred thousand different sequences arranged as individual spots on a 'chip', are beginning to be applied to genomic studies and investigations into gene expression (Graves, 1999). Over and above this, sequencing programmes of entire genomes are proceeding at a dramatic pace with the Human Genome Project fulfilling its promise as the single most important project in biomedical science. It is currently ahead of schedule and latest estimates predict a draft version by Spring 2000 with a polished highly accurate version by 2003 (Collins et al., 1998; Marshall, 1999).

Once completed, the human genome sequence will present unique scientific opportunities to researchers. However, this huge database will only be a starting point for the more challenging aim of understanding how a cell or whole tissue actually functions at a molecular level during health and disease. Until now, the molecular biology techniques available for high-throughput DNA analysis have resulted in an emphasis on the 'message' (mRNA or cDNA) rather than the product of the message (the protein) (Blackstock and Weir, 1999). While this approach will remain important, DNA sequence data alone reveals nothing about the level of protein expression, the protein isoforms that may be produced from each gene (by alternative splicing) or the extent to which proteins are post-translationally modified.

Therefore, many molecular researchers predict that attention will increasingly turn to the identification and characterization of the PROTEin products expressed by the genOME of an organism or tissue and the term 'proteome' has been coined to describe this approach (for reviews see Pennington et al., 1997; Celis et al., 1998; Blackstock and Weir, 1999; Dove, 1999). Unlike the genome, the proteome is not a fixed feature of an organism. Rather it differs from tissue to tissue and even between different cell types. The proteins expressed by a tissue or cell also depend on the stage of development and may vary as a result of the environment, e.g. due to a disease state. Although not being undertaken on the same scale as the genome sequencing programmes, there is much evidence to suggest that proteome analysis (proteomics) will have a significant impact on our understanding of the molecular composition and function of cells. Indeed, interest in this approach is increasing and it is rapidly becoming a central theme in molecular research. Currently there are a number of projects employing proteomics to study lower organisms or a particular human tissue, which typically contain several thousand proteins (Pennington et al., 1997). However, very few researchers in molecular reproduction currently employ proteome technology and the aim of this article is to introduce the approach and suggest some potential applications that might be important to research in human reproduction.

There are a number of variations in the methodology involved in proteome technology (for a comprehensive description, see Link, 1999). However, proteomics has been, and continues to be, dominated by two-dimensional (2D) electrophoresis and this will undoubtedly remain the core technology for the foreseeable future. This is because it is still the most effective and reproducible method of separating a complex mixture of proteins. The basic principles were derived 25 years ago, and they have remained essentially unchanged (O'Farrell, 1975). Proteins in a cell or tissue extract are separated first in one dimension on the basis of charge (isoelectric focusing, IEF) and then in a second dimension on the basis of molecular size (polyacrylamide gel electrophoresis, PAGE) resulting in a defined pattern of spots (Dove, 1999). For many years, 2D electrophoresis was beset by problems of spatial non-reproducibility and the difficulty of separating basic proteins. However, there have been some important recent advances facilitating marked improvements. In particular, the advent of immobilized pH gradient (IPG) strips for the IEF dimension now allows both consistency of resolution and also separation of basic proteins on the same gel, which was not previously possible (Görg et al., 1988; Celis and Gromov, 1999).
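As a rough illustration of the two separation axes described above, the following Python sketch estimates where a protein of known sequence might be expected to migrate on a 2D gel: an approximate isoelectric point for the IEF dimension and an average molecular mass for the PAGE dimension. The pKa values are approximate figures chosen for illustration (an assumption, not taken from this article; published sets differ), the function names are hypothetical, and real proteins may deviate considerably, for example because of post-translational modification.

```python
# Sketch: estimate the 2D-gel coordinates (pI, mass) of a protein from its sequence.
# ASSUMPTION: approximate pKa values for ionisable groups; published sets differ.
PKA_POSITIVE = {"N_TERM": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"C_TERM": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}

# Average residue masses in daltons.
AVERAGE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153  # added once per chain for the free N- and C-termini


def net_charge(sequence, ph):
    """Net charge of the unmodified chain at a given pH (Henderson-Hasselbalch)."""
    counts = {"N_TERM": 1, "C_TERM": 1}
    for aa in sequence:
        counts[aa] = counts.get(aa, 0) + 1
    positive = sum(
        counts.get(group, 0) / (1 + 10 ** (ph - pka))
        for group, pka in PKA_POSITIVE.items()
    )
    negative = sum(
        counts.get(group, 0) / (1 + 10 ** (pka - ph))
        for group, pka in PKA_NEGATIVE.items()
    )
    return positive - negative


def isoelectric_point(sequence, precision=0.001):
    """Find the pH at which the net charge is zero, by bisection over pH 0 to 14."""
    low, high = 0.0, 14.0
    while high - low > precision:
        mid = (low + high) / 2
        if net_charge(sequence, mid) > 0:
            low = mid   # still positively charged, so the pI lies at a higher pH
        else:
            high = mid
    return round((low + high) / 2, 2)


def average_mass(sequence):
    """Average molecular mass of the unmodified chain in daltons."""
    return sum(AVERAGE_MASS[aa] for aa in sequence) + WATER


if __name__ == "__main__":
    toy_sequence = "MKRDEHLLAG"  # hypothetical sequence, for illustration only
    print(isoelectric_point(toy_sequence), round(average_mass(toy_sequence)))
```

Bisection is used here simply because the net charge falls steadily as pH rises, so a single zero crossing can be located without any external library.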

Following 2D electrophoresis and visualization, e.g. by silver staining of proteins, there are then essentially two concurrent phases to proteome analysis. First, all of the proteins expressed by a particular tissue or cell under normal conditions may be identified and their position on 2D gels mapped (the constitutive proteome). This information can then be used to generate a 2D reference map and database and a number of these have been established for certain human tissues (for website addresses, see Pennington et al., 1997). Such well-maintained and readily accessible databases are imperative to the successful development of proteomics. The advent of affordable computer systems and commercial software packages has made it easier to scan and archive gels and to characterize protein expression quantitatively. 2D reference maps can then be used as a point of comparison for changes in protein expression, post-translational modification or even sub-cellular localization during development or due to changes in physiological conditions caused by disease or other stimuli. Secondly, in addition to mapping approaches, it is also possible to engage in targeted proteomics to address specific questions regarding the function of the cell or tissue. For example, proteomics may be used to study protein–protein, protein–nucleic acid or protein–small molecule interactions. In this respect such studies are likely to be focused on groups or clusters of proteins rather than the entire proteome and to involve antibody-based approaches.
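The comparison of a sample gel against a 2D reference map can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any commercial package: spot identifiers and volumes are invented, spots are assumed to have been matched between gels already, and each spot volume is expressed as a fraction of the total spot volume on its gel before a simple fold-change threshold is applied.

```python
# Sketch: flag spots whose relative abundance differs between a reference map
# and a sample gel. ASSUMPTION: spots are already matched and quantified.


def normalise(spot_volumes):
    """Express each spot volume as a fraction of the total volume on the gel."""
    total = sum(spot_volumes.values())
    return {spot_id: volume / total for spot_id, volume in spot_volumes.items()}


def changed_spots(reference, sample, fold=2.0):
    """Return {spot_id: ratio} for spots changed at least `fold` in either direction."""
    ref, smp = normalise(reference), normalise(sample)
    changed = {}
    for spot_id in ref.keys() & smp.keys():   # only spots present on both gels
        if ref[spot_id] == 0:
            continue                          # avoid division by zero
        ratio = smp[spot_id] / ref[spot_id]
        if ratio >= fold or ratio <= 1 / fold:
            changed[spot_id] = ratio
    return changed


# Hypothetical spot volumes (arbitrary units) for illustration only.
REFERENCE_GEL = {"spot_001": 100.0, "spot_002": 200.0, "spot_003": 700.0}
SAMPLE_GEL = {"spot_001": 300.0, "spot_002": 200.0, "spot_003": 500.0}

if __name__ == "__main__":
    for spot_id, ratio in sorted(changed_spots(REFERENCE_GEL, SAMPLE_GEL).items()):
        print(spot_id, round(ratio, 2))
```

Spots that appear on only one gel are ignored in this sketch; in practice the appearance or disappearance of a spot may itself be the finding of interest and would need separate handling.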

Central to proteome technology is the ability to obtain sequence data to identify individual protein spots. While it is still possible to use traditional Edman sequencing, mass spectrometry has become the method of choice for protein identification and for the identification of post-translational modifications as it is much more sensitive (<10 pmol protein needed). This is important because proteomics has no equivalent of the PCR amplification used in genomics and therefore correct sample handling and sensitivity are critical. An individual protein spot is first excised from the gel and digested with trypsin; the sample is then cleaned and subjected to peptide mass fingerprinting, usually using matrix-assisted laser desorption-ionization time-of-flight (MALDI-TOF) mass spectrometry, in which the masses of the peptides derived from the in-gel digestion are accurately measured (Jensen et al., 1998).
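The peptide masses that such an experiment would be expected to produce can be predicted from a protein sequence. The Python sketch below performs a simplified in-silico tryptic digest (cleavage after lysine or arginine unless the next residue is proline, with no missed cleavages) and computes the monoisotopic mass of each peptide. The function names and the example sequence are hypothetical, and modifications are ignored.

```python
# Sketch: in-silico tryptic digestion and monoisotopic peptide masses.
# ASSUMPTION: complete digestion, no missed cleavages, no modified residues.

# Monoisotopic residue masses in daltons.
MONOISOTOPIC = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # one water per peptide for the free termini
PROTON = 1.00728   # added for a singly protonated ion


def tryptic_digest(sequence):
    """Split a sequence after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])   # C-terminal peptide
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONOISOTOPIC[aa] for aa in peptide) + WATER


def protonated_mass(peptide):
    """Mass of the singly protonated peptide ion."""
    return peptide_mass(peptide) + PROTON


if __name__ == "__main__":
    toy_sequence = "AGKPRTLKM"   # hypothetical sequence, for illustration only
    for peptide in tryptic_digest(toy_sequence):
        print(peptide, round(protonated_mass(peptide), 4))
```

The list of predicted masses for a protein is what is compared with the measured spectrum during peptide mass fingerprinting.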

Protein identification from these mass spectral data may be accomplished by a variety of methods. The use of PeptideSearch software to screen a comprehensive non-redundant protein sequence database located in one dedicated centre, either the National Center for Biotechnology Information (NCBI), USA or the European Bioinformatics Institute (EBI), Hinxton, Cambridge, UK, is recommended although there are a number of other options available. If it is not possible to identify a protein unambiguously in this fashion then nanoelectrospray tandem mass spectrometry may be performed. Peptide sequence tags are constructed and are used for database searches in either full-length databases, e.g. SWISS-PROT, or back-translated to cDNA in expressed-sequence tag (EST) databases in order to identify the protein or a corresponding EST. As a last resort, in cases where the protein still remains unknown, it is then possible to compare sets of tandem mass spectra to obtain amino acid sequence in order to design oligonucleotide probes for cloning of the cognate gene (for further details and websites see Pennington et al., 1997; Jensen et al., 1998).
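The principle of identifying a protein from its peptide masses can be illustrated with a toy database search in Python. This is only a sketch under simplifying assumptions and is not the algorithm used by PeptideSearch or any other named package: candidate proteins are ranked by a plain count of observed masses that fall within a tolerance of a predicted tryptic peptide mass. The two database entries are invented sequences, and all function names are hypothetical.

```python
# Sketch: rank database proteins by how many observed peptide masses they explain.
# ASSUMPTION: simple match counting; real search software uses more elaborate scoring.

# Monoisotopic residue masses in daltons.
MONOISOTOPIC = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER, PROTON = 18.01056, 1.00728


def tryptic_digest(sequence):
    """Split after K or R unless followed by P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def protonated_mass(peptide):
    """Monoisotopic mass of the singly protonated, unmodified peptide."""
    return sum(MONOISOTOPIC[aa] for aa in peptide) + WATER + PROTON


def count_matches(observed, sequence, tolerance=0.2):
    """Number of observed masses within `tolerance` Da of a predicted peptide."""
    theoretical = [protonated_mass(p) for p in tryptic_digest(sequence)]
    return sum(
        any(abs(mass - t) <= tolerance for t in theoretical) for mass in observed
    )


def rank_candidates(observed, database, tolerance=0.2):
    """Return (name, matches) pairs, best-matching protein first."""
    scores = [
        (name, count_matches(observed, seq, tolerance))
        for name, seq in database.items()
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical two-entry "database"; both sequences are invented.
TOY_DATABASE = {
    "toy_protein_A": "ACDEFGHIKLMNPQRSTVWYK",
    "toy_protein_B": "GGSAKTTPLRVVDEK",
}

if __name__ == "__main__":
    # Simulate a measured spectrum from toy_protein_A with a small mass error.
    spectrum = [protonated_mass(p) + 0.05
                for p in tryptic_digest(TOY_DATABASE["toy_protein_A"])]
    print(rank_candidates(spectrum, TOY_DATABASE))
```

A count of matching masses becomes unreliable as the database grows, since chance matches accumulate. This is one reason an ambiguous fingerprint is followed up with tandem mass spectrometry, as described above.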

The potential impact of proteomics on molecular research in reproductive medicine is enormous. Mapping tissues important to reproductive function (e.g. testis, ovary, placenta, endometrium and oviduct) as well as gametes and embryos will enable us to carefully examine a variety of events at the molecular level. The effects of reproductive toxicants, particular disease states or hormones and cytokines might also be examined at a protein expression level to help us improve our understanding of these processes. Diseases that might be investigated include implantation failure and miscarriage, infertility and polycystic ovary (PCO) syndrome. In this context it is important to remember that many diseases, for example endometriosis, are not necessarily genetic in origin but may result from aberrant expression and localization of protein. In addition, stage-specific events, such as those related to the menstrual cycle or embryo and gamete development, might also be probed. To date, there have been relatively few studies examining proteomes of whole tissues of the reproductive tract. The only notable exception is a study of human endometrial tissue which illustrates the potential of proteomics in reproductive research (Byrjalsen et al., 1999). In the search for new markers of hyperplasia and adenocarcinoma, workers have demonstrated that certain proteins are more abundant in these conditions. A number of these have been identified and it is hoped that specific assays might be established for the early diagnosis, prognosis or choice of treatment of endometrial disorders.

No studies have been conducted on the human embryo but in the early 1990s Keith Latham and Davor Solter characterized a number of aspects of early mouse embryo development. They established proteome maps of the early embryo and analysed the extensive reprogramming in protein synthesis that occurs at the 1- and 2-cell stages (Latham et al., 1991). Alterations in protein synthesis following nuclear transfer of 8-cell nuclei to enucleated 1-cell embryos were probed (Latham et al., 1994) and differences in protein synthesis in different regions of the embryo were also investigated (Latham et al., 1993). However, these studies fell short of identifying individual protein spots. With the advent of the new technology for highly sensitive protein sequencing it would be very interesting to determine the nature of the proteins involved. In particular the identification of egg cytosolic factors involved in reprogramming the nucleus following fertilization and differences at a protein level between embryo tissue and embryonic stem (ES) cells would be very important. Recent work has also mapped different stages of embryo development in the mouse and clearly demonstrated that the approach is very sensitive. Whilst 20 embryos is considered to be the optimal loading for visualization of the entire proteome, it is possible to observe the major proteins present by analysing just a single embryo (Sasaki et al., 1999). This is important when considering the scarcity of certain human material available for research and it would be very interesting to use these approaches to examine human embryos.

In gamete research, The Center for Recombinant Gamete Contraceptive Vaccinogens have begun the long task of mapping the proteome of human spermatozoa and shown that these cells contain ~1400 distinct protein moieties (Naaby-Hansen et al., 1997). A few of the proteins present have been identified and they have recently used proteomics to investigate which sperm proteins are recognized by antisperm antibodies in the serum of infertile men and women (Shetty et al., 1999). Other examples where proteomics might be employed to study function include the identification of sperm proteins that are responsible for gamete recognition during primary zona interactions. Many events at the molecular level at fertilization rely on post-translational modifications of key proteins and hence proteomic studies will be more informative than using standard DNA/RNA technology.

In summary, the potential importance of proteomic approaches has been clearly demonstrated in other fields of human medical research, including liver and heart disease and certain forms of cancer (for websites see Pennington et al., 1997). The continued integration of proteomic and genomic data will have a fundamental impact on our understanding of the normal functioning of cells and organisms, will give insights into complex cellular processes and disease, and will provide new opportunities for the development of diagnostics and therapeutics. The challenge to researchers in the field of reproduction is to harness this new technology, as well as others that are available, to a greater extent than at present, as they have considerable potential to improve our understanding of the molecular aspects of reproduction in both health and disease.

Acknowledgments

The author thanks Professor Harry Moore (The University of Sheffield) and The Infertility Research Trust for their continued support.

Notes

1 To whom correspondence should be addressed at: Department of Molecular Biology and Biotechnology, The University of Sheffield, Sheffield, S10 2TN, UK.

This debate was previously published on Webtrack 88, September 28, 1999

References

Blackstock, W.P. and Weir, M.P. (1999) Proteomics: quantitative and physical mapping of cellular proteins. Trends Biotechnol., 17, 121–127.

Byrjalsen, I., Larsen, P.M., Fey, S.J. et al. (1999) Two-dimensional gel analysis of human endometrial proteins: characterization of proteins with increased expression in hyperplasia and adenocarcinoma. Mol. Hum. Reprod., 5, 748–756.

Celis, J.E. and Gromov, P. (1999) 2D protein electrophoresis: can it be perfected? Curr. Opinion Biotechnol., 10, 16–21.

Celis, J.E., Ostergaard, M., Jensen, N.A. et al. (1998) Human and mouse proteomic databases: novel resources in the protein universe. FEBS Letts., 430, 64–72.

Collins, F.S., Patrinos, A., Jordan, E. et al./DOE and NIH planning groups (1998) New goals for the U.S. Human Genome Project: 1998–2003. Science, 282, 682–689.

Dove, A.Q. (1999) Proteomics: translating genomics into products. Nature Biotechnol., 17, 233–236.

Görg, A., Postel, W. and Günther, S. (1988) The current state of two-dimensional electrophoresis with immobilised pH gradients. Electrophoresis, 9, 531–546.

Graves, D.J. (1999) Powerful tools for genetic analysis come of age. Trends Biotechnol., 17, 127–134.

Jensen, O.N., Larsen, M.R. and Roepstorff, P. (1998) Mass spectrometric detection and microcharacterisation of proteins from electrophoretic gels: strategies and applications. Proteins – Structure, Function and Genetics, S2, 74–89.

Latham, K.E., Garrels, J.I., Chang, C. and Solter, D. (1991) Quantitative analysis of protein synthesis in mouse embryos. I: Extensive reprogramming at the one and two-cell stages. Development, 112, 921–932.

Latham, K.E., Beddington, R.S., Solter, D. and Garrels, J.I. (1993) Quantitative analysis of protein synthesis in mouse embryos. II: Differentiation of endoderm, mesoderm, and ectoderm. Mol. Reprod. Dev., 35, 140–150.

Latham, K.E., Garrels, J.I. and Solter, D. (1994) Alterations in protein synthesis following transplantation of mouse 8-cell stage nuclei to enucleated 1-cell embryos. Dev. Biol., 163, 341–350.

Link, A.J. (1999) Methods in Molecular Biology. Vol. 112. 2D proteome analysis protocols. Humana Press Inc., Totowa, New Jersey, USA.

Marshall, E. (1999) Human Genome Project: Sequencers endorse plan for a draft in 1 year. Science, 284, 1439–1441.

Naaby-Hansen, S., Flickinger, C.J. and Herr, J.C. (1997) Two-dimensional gel electrophoretic analysis of vectorially labeled surface proteins of human spermatozoa. Biol. Reprod., 56, 771–787.

O'Farrell, P.H. (1975) High resolution two-dimensional electrophoresis of proteins. J. Biol. Chem., 250, 4007–4021.

Pennington, S.R., Wilkins, M.R., Hochstrasser, D.F. and Dunn, M.J. (1997) Proteome analysis: from protein characterisation to biological function. Trends Cell Biol., 7, 168–173.

Sasaki, R., Nakayama, T. and Kato, T. (1999) Microelectrophoretic analysis of changes in protein expression patterns in mouse oocytes and preimplantation embryos. Biol. Reprod., 60, 1410–1418.

Shetty, J., Naaby-Hansen, S., Shibahara, H. et al. (1999) Human sperm proteome: immunodominant sperm surface antigens identified with sera from infertile men and women. Biol. Reprod., 61, 61–69.