Division of Basic Science, Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111
ABSTRACT
Einarson, Margret B., and Erica A. Golemis. Encroaching genomics: adapting large-scale science to small academic laboratories. Physiol Genomics 2: 85–92, 2000. The process of conducting biological research is undergoing a profound metamorphosis due to the technological innovations and torrent of information resulting from the execution of multiple species genome projects. The further tasks of mapping polymorphisms and characterizing genome-wide protein-protein interaction (the characterization of the proteome) will continue to garner resources, talent, and public attention. Although some elements of these whole-genome-scale projects can only be addressed by large research groups, consortia, or industry, the impact of these projects has already begun to transform the process of research in many small laboratories. Although the impact of this transformation is generally positive, laboratories engaged in types of research destined to be dominated by the efforts of a genomic consortium may be negatively impacted if they cannot rapidly adjust strategies in the face of new large-scale competition. The focus of this report is to outline a series of strategies that have been productively utilized by a number of small academic laboratories that have attempted to integrate such genomic resources into research plans with the goal of developing novel physiological insights.
proteomics; two-hybrid analysis; genomic resources
IT IS AN INTERESTING TIME to be a young independent investigator. Of course, those of an optimistic temperament could probably make a similar statement at any random point throughout scientific history; further justifications need to be provided for the claim of interest to rise above the trite. In this report, we would like to make the argument that the opening of an era promising to be dominated by genomics, and the accompanying technological tsunami, are a particularly stimulating force for young investigators. Currently, large projects such as the genome-wide mapping of polymorphisms, characterization of the proteome via analysis of protein-protein interactions across complete organismal genomes, and the multi-organismal Genome Sequence Projects garner extensive media attention and claim a highly publicized share of available scientific funding [although in recent years, the actual percent of the National Institutes of Health budget devoted to the Human Genome Project has hovered between only 1–2% of total available funding (12)]. The initial justification for these projects was that they would lead to significant benefits for science and the general public. The cited benefits include advances in generally available information and the development of new technologies, originally to support overtly genomic efforts, that would be adaptable into standard research techniques. These projects are certainly fulfilling these promises, on or ahead of schedule, and their benefits to the public and the pursuit of science as a whole are undoubted. But what of their impact on traditional academic investigators, intent on pursuing science as a career as well as a calling?
"Traditionally" (that is, over the last 20 years), cellular and molecular biological research laboratories generally have grounded their efforts in one of two philosophical approaches. The first approach has been to focus on a single biological process, using genetic, biochemical, and descriptive analyses to generate a complete characterization of the elements involved in the process. The second approach has been to focus primarily on a specific biological molecule, whether protein, nucleic acid, or lipid, that was originally nominated as interesting because of its known association with a biological phenomenon. This molecule is then followed through multiple biological processes to gain a complete view of its potential functions. Whether the initiating impulse was the study of a process or a protein, the pursuit of the research goal led gradually over time to the establishment of a web of intersecting control mechanisms. Both of these approaches have been extremely successful in generating biological insights and have provided the conceptual framework for current understanding of organismal function.
Over the last few years, the genomics revolution has begun to impact the normal paradigm of conducting research in highly significant ways. One challenge emerging from the new technologies is that the large genomics projects under way seem very likely to render obsolete large components of standard research project design, causing investigators to have to rethink (in some cases, very rapidly) how to carry on productively in their core research areas. For example, if a laboratory's primary research interest has been to identify genes induced by a particular transcription factor implicated in stress response, groups using traditional approaches to identify individual targets are likely to find their work overwhelmed by the transcriptional profiles that can rapidly be developed by groups with access to gene chip technology. If a laboratory has been proceeding through the characterization of a novel oncoprotein Z, and Z is suddenly assigned a complete network of interactive partners and sphere of influence through the activity of mass spectrometric and two-hybrid proteome projects, what is left for the laboratory originally chasing Z to do, and how well positioned will it be for subsequent Z studies? Clearly, the answer depends on how creatively the original Z researchers can move from identification of networks back to the original biology and function that caused Z to be of interest in the first place.
On the positive side, the genomics tools that have become widely available in the last several years, including access to full sequence databases to facilitate cloning, structural modeling to predict function, cross-genome comparisons, and protein interaction approaches, which place otherwise uncharacterized open reading frames (ORFs) into specific pathways, have greatly reduced the activation energy required for investigators to launch into new project areas. In addition to these novel technologies is the emergence of a cadre of aggressively competing biotechnology supply companies ready to provide investigators with a broad set of clones, antibodies, and other tools for relatively minor financial expenditure. Previously, a laboratory would have to develop these resources alone, which would favor larger groups with more financial backing. In this environment of more equitable resource availability, a decision to attempt higher risk/higher impact science is not so strongly disfavored, because much less time is required to build or obtain the requisite tools. This carrot, coupled with the stick of large-project competition noted above, provides a second strong impetus toward creative functional analysis.
At this point, no single model for genomics-augmented research in small laboratories has emerged, and different groups are exploiting the new resources in different ways. It is difficult if not impossible to assign priority to any one group for devising successful genomic strategies. The following examples typify approaches utilized by our laboratory and by other small groups headed by relatively junior principal investigators; these groups have from inception developed research programs influenced heavily by genomic approaches and appear to be having some early success in providing insight into interesting biological problems.
Comparative analysis of defined genes in model organisms.
In devising a new research project, the intellectual progression is generally from the abstract to the concrete. First, an area of physiological interest is identified; next, a relatively limited number of central molecules are identified for intensive study; finally, the mechanism of experimental analysis is established. Until relatively recently, a primary limiting experimental parameter for small groups has been the determination of which organism would be suitable for the desired project. Saccharomyces cerevisiae studies would tend to support genetic approaches, Xenopus oocytes would be useful for biochemical analysis and studies of early development, work in mammalian cells or in mammals would be preferred for cell biological analyses of processes, and so forth. As each of these approaches has the potential to yield unique insights into a gene's function, it would be desirable to pursue analyses in multiple organisms, particularly for genes that are strongly conserved across evolution. Given the considerable demands of establishing expertise in specific organismal systems, it has been difficult for individual small laboratories to significantly exploit more than one or two organisms under their own purview, and this is not likely to change significantly. Nevertheless, by utilizing genomic resources to develop a core set of reagents suitable for studying a gene of interest in multiple organisms and by developing a set of collaborations among interested groups with relevant expertise, it is possible to make progress over a broad front.
The developing story of the Ste20/Pak kinases demonstrates the progress that can be made coordinating a gene-centered cross-species approach (Fig. 1). The STE20 kinase in S. cerevisiae was identified based on its physiological role as an essential upstream element in the mating pheromone signaling pathway (32, 43). The identification of the mammalian ortholog (gene of homologous function) of STE20, the p21-activated kinase (Pak) (37), as a kinase associated with activated Cdc42 and Rac, additionally implicated the STE20/Pak proteins as effectors of these small GTPases, which regulate actin organization, cell motility, DNA synthesis, and programmed cell death. For these reasons, it appeared that a focus on this ortholog group might provide insight into the coordination of central kinase cascades and cytoskeletal controls. In the Chernoff laboratory (Fox Chase Cancer Center), these findings were pursued by comparing S. cerevisiae STE20 and mammalian Pak to design degenerate primers to amplify Paks from a series of model organisms. They successfully isolated Paks from S. pombe, Caenorhabditis elegans, and Drosophila melanogaster, as well as Pak-related proteins from S. cerevisiae and mammals. This project resulted in the generation of an extended profile of the Pak family (reviewed in Ref. 48), confirming the ubiquitous presence of Paks in eukaryotes and providing the ability to assay Paks in each of these organisms. Studies in S. pombe for the first time indicated that loss of Pak function could result in lethality (42), a result obscured in S. cerevisiae because of the presence of genes of partially redundant function. Strikingly, loss of S. pombe Pak1 activity also resulted in loss of cell polarity, and had multiple similarities to defects in Cdc42, confirming an evolutionarily conserved role of Paks in the regulation of the actin cytoskeleton.
Examination of Pak1 in mammalian tissue culture cells showed that Pak1 overexpression results in dramatic changes in cytoskeletal architecture (49; also, see Ref. 16), and additionally, in an activity specific to multicellular organisms, induced polarized cell movement (47). Studies with mutagenized forms of Pak indicated that some of these changes in morphology and motility were independent of Pak1 kinase activity, suggesting the protein has multiple functional domains (49, 47). This led to the search for interacting partners utilizing copurification and two-hybrid analysis, which have begun to connect Pak to a web of interacting proteins (J. Chernoff, personal communication). These results can be compared and contrasted with the information available about the now well-studied STE20 protein in S. cerevisiae. Similarly, studies of Pak paralogs (genes of related, but not homologous function) such as the Mst kinases (13, 20) allow comparison of the biological function of closely related, but distinct regulatory genes. Finally, isolation of the Pak genes of worms and mice provides the tools with which to generate loss of function models. These knockout animals will be invaluable for understanding the physiological role of Pak proteins in humans. These experimental approaches (degenerate PCR, analysis of a candidate gene in the context of a defined family in one or more simple eukaryotes, followed up by examination in more complex organisms) have much to offer smaller laboratories striving to obtain as much information as possible before the creation of more costly reagents such as knockouts or transgenics.
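The degenerate PCR step described above depends on finding short peptide stretches that are conserved between known family members, and on preferring those stretches whose back-translation requires the fewest alternative codons. A minimal Python sketch of that selection is given here. The function names and the toy sequences are hypothetical illustrations, not the sequences or the procedure used by the Chernoff laboratory, and the two inputs are assumed to be pre-aligned, of equal length, and free of gaps.

```python
from math import prod

# Number of codons encoding each amino acid in the standard genetic code.
CODONS_PER_AA = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def degeneracy(peptide):
    """Return how many distinct DNA sequences can encode this peptide."""
    return prod(CODONS_PER_AA[aa] for aa in peptide)


def conserved_windows(seq_a, seq_b, width):
    """List windows identical in both aligned sequences, least degenerate first.

    Each result is a tuple (degeneracy, start_index, peptide).
    Assumption: seq_a and seq_b are already aligned without gaps.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    hits = []
    for start in range(len(seq_a) - width + 1):
        window = seq_a[start:start + width]
        same = window == seq_b[start:start + width]
        if same and all(aa in CODONS_PER_AA for aa in window):
            hits.append((degeneracy(window), start, window))
    return sorted(hits)


if __name__ == "__main__":
    # Toy, made-up peptide fragments standing in for two aligned kinases.
    yeast_like = "AMWKLGSG"
    mammal_like = "GMWKLGAG"
    for deg, start, pep in conserved_windows(yeast_like, mammal_like, 3):
        print(start, pep, deg)
```

Windows rich in methionine and tryptophan rank first because each is encoded by a single codon, which is the usual reason such residues are favored when choosing primer sites. A real design would also weigh primer length, melting temperature, and codon usage in the target organism, none of which is modeled here.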
Genomic conservation profiling as a guide to physiological significance.
To date, the genome projects have provided complete sequence information for >6,000 S. cerevisiae ORFs and >19,000 C. elegans ORFs, whereas the number of human genes may approach 100,000 upon the completion of genome sequencing. On the simplest level, the presence of completed sequences, available as full-length clones, is already and will increasingly be a great time saver for investigators. On the largest scale, systematic comparison across full genomes can provide insight into the nature of metazoan vs. single cell life (e.g., Ref. 10). In the middle ground, the available information can be mined in a third way, to nominate promising candidates for reverse genetic (gene to function, rather than function to gene) analysis. This will facilitate sorting those genes likely to be of significant function from those less vital for the life of an organism.
For example, a novel gene or class of genes "Z" may be of interest because overexpression of the gene/class leads to an important biological phenotype (cancer, altered development) or because sequence comparison indicates that it is related to a class of genes with essential function. Nevertheless, targeted deletion of Z1 may lead to no obvious phenotype, because the function of Z is redundantly specified by the Z1, Z2, and Z3 family; or because Z1 may be a relatively recent arrival to the Z family (e.g., arising by imperfect duplication) and in fact has no current significant function in the organism. Such inability to obtain an assayable loss of function can be a significant limitation to many types of genetic and biochemical analysis. Furthermore, even in advance of completed sequence databases, it was clear that particular families of metazoan genes were likely to be very large within a single species, including homeobox proteins (5, 6), chemosensory receptors (55), and others, making redundancy a significant potential problem. Although much information will ultimately be derived from analyzing the summed phenotype of a z1 z2 z3 deletion mutant, significantly fewer resources would be required to get a basic understanding of Z activity if a Z family member could be identified with a nonredundant function, as only a single deletion would be required to obtain a manipulatable phenotype. How might this most intelligently and effectively be accomplished?
The Sluder laboratory at the University of Georgia, Athens, has had a long-term interest in the nuclear hormone receptor (nhr) superfamily in the genetically amenable model organism C. elegans (Fig. 2). They began attacking the problem of identifying nhr candidates using a degenerate PCR approach (52), similar in concept to that noted above for STE20/PAK genes. Simultaneously, they monitored the C. elegans genome project as it moved toward completion, searching the database reiteratively with the conserved nhr DNA binding domain to identify more than 260 putative nuclear hormone receptors (Ref. 53; and A. Sluder, personal communication). Further analysis of the cDNAs and their predicted splicing patterns coupled with the representation of these sequences in expressed sequence tag (EST) databases further defined these putative ORFs as in vivo expressed nhrs. Although C. elegans is the "flagship" nematode used as an animal model, sequencing of a sibling species of worm, C. briggsae, is also under way, with the goal of facilitating comparison of both protein coding and regulatory regions (information at http://genome.wustl.edu/gsc/). Comparison of identified C. elegans nhrs with entries to date in the C. briggsae database, as well as nhrs defined in flies and vertebrates, revealed the presence of evolutionarily conserved orthologous classes of nhr but surprisingly indicated that a large proportion of the nhrs (>90%) were nematode specific. The degree of divergence indicated they would not have been identified by degenerate PCR or hybridization-based approaches, demonstrating an initial advantage of the database-rooted approach.
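The reiterative database search described above can be pictured as a loop in which every newly recovered family member becomes a query in its own right, so that divergent members too far from the original seed are still reached through intermediates. The following Python sketch illustrates only that loop. The similarity measure (fraction of identical positions), the threshold, and all names and sequences are hypothetical stand-ins; the actual searches would have used a sequence-database search tool and the real conserved nhr DNA binding domain.

```python
def identity(a, b):
    """Fraction of identical positions over the shorter of two sequences."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / n


def iterative_search(seed, database, threshold=0.6):
    """Collect database entries reachable from the seed by repeated searching.

    database maps an entry name to its domain sequence. Every new hit is
    queued as a further query until no additional entries are found.
    """
    found = {}
    queries = [seed]
    while queries:
        query = queries.pop()
        for name, domain in database.items():
            if name not in found and identity(query, domain) >= threshold:
                found[name] = domain
                queries.append(domain)  # search again with the new member
    return set(found)


if __name__ == "__main__":
    # Made-up six-residue "domains"; real binding domains are far longer.
    toy_db = {
        "nhr-a": "CKVCGE",   # close to the seed
        "nhr-b": "CRVCAE",   # reachable only through nhr-a
        "other": "MTTPLL",   # unrelated
    }
    print(sorted(iterative_search("CKVCGD", toy_db)))
```

In the toy data, "nhr-b" falls below the threshold when compared with the seed directly but is recovered once "nhr-a" is used as a query, which is the behavior that lets a database-rooted search reach sequences too divergent for a single probe. The threshold governs the trade-off between reach and false positives and would need to be set empirically.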
Finally, although a primary goal of the database project was to identify important targets for functional analysis, detailed analysis of the nhr database yielded significant unanticipated benefits. Most intriguing of these was a discovery bearing on the central area of nhr structure and regulatory control. Nhrs have a well-characterized structure, including a carboxy-terminal ligand-binding domain that regulates function, as well as their conserved amino-terminal Cys2-Cys2 zinc finger DNA binding domain (27). However, some members of the nhr superfamily conserve this characteristic DNA binding domain but lack a regulatory ligand binding domain. Scrutiny of the assembled C. elegans nhr data set for the first time allowed the identification of a third group: nhrs that appear to have a carboxy terminus which is intermediate between ligand-binding and non-ligand-binding carboxy-terminal domains (53). This finding promises to yield significant insights into the evolutionary processes governing the formation of the ligand binding structure, a problem of keen interest to the pharmacological industry in its quest to develop designer drugs. This discovery raises the intriguing biological prospect that some nhrs may in fact be governed by as yet undiscovered classes of nonhormonal small molecule ligands.
Further study of both conserved nhrs and divergent nhrs will inform and guide our understanding of the role of nhrs in multiple species. By using the available genomic information for a directed search, and then refining that search through biological experiments, single small research groups can clearly transform genomic sequence information into a series of niche databases suitable for study by individuals.
Cross-species functional complementation, molecular modeling, and web building.
The previous two examples commence with an interest in a particular gene or genes and proceed outward to an understanding of process. Alternatively, the starting point may be interest in a particular biological process, with the goal of identifying and understanding the interactions of genes involved in mediating the process. In some organisms, this process-oriented goal can be readily pursued using powerful genetic means. For example, yeast can be treated with mutagens, and colonies displaying specific phenotypes of interest can then be readily selected. In other more complex organisms, use of genetic approaches is intrinsically more difficult. One option that exploits the growing knowledge of the evolutionary conservation of central regulatory pathways is to speculatively cross species boundaries. One approach we have utilized is expressing genes from more intractable organisms (e.g., humans) to induce phenotypes in genetically manipulatable organisms (e.g., yeast) as a screening mechanism by which to nominate candidates of interest for study.
The starting point for our laboratory in 1993 was an interest in the mechanism by which control of cell morphology is coordinated with cell cycle control, particularly insofar as these processes are targets of dysregulation in mammalian cancer (Fig. 3). Toward developing a better understanding of this process, we wished to identify genes involved in the specification of cell shape and cell cycle. To identify such genes, we chose to express a human cDNA library in yeast and to screen for genes that altered the pattern of yeast budding. This phenotype (pseudohyphal growth) is easily assessed by visually inspecting the shape of the yeast colonies. This change in colony morphology is known to result from cellular hyper-elongation, altered polarity of cytoskeleton during cell division, changes in growth-related signal transduction, and extension of particular (G2) cell cycle compartments (18, 26, 39).
Genes nominated as candidates in our screen were compared with genome databases to determine completeness; available EST sequences were used to facilitate cloning of full-length sequences and to predict functional domains. This preliminary information would be used to design two-hybrid screens that identify mammalian partner proteins predicted to interact with our candidates and suggest spheres of activity. In some cases, where analysis of amino acid primary sequence information was relatively uninformative, we were able to make use of molecular modeling resources, to gain insight into possible function of candidates or their interactors based on predicted structures. Simultaneously, we could perform relatively simple tests in yeast to determine which of the genetic pathways associated with induction of pseudohyphal growth were being specifically targeted by overexpression of our candidates. Guided by the sum of the sequence analysis, predicted partners, and targeted genetic tests, it is possible to more rapidly design informative experiments considering the candidate gene in its native environment. Using such a strategy, we have been able to identify and characterize the novel human enhancer of filamentation (HEF1) protein as a signaling intermediary that connects information about cellular morphology and attachment at focal adhesions to control of progression through mitosis (30, 31, 41). Although clearly an important central element in these control pathways and targeted in cancers (14), HEF1 overexpression does not independently induce cancer, emphasizing the value of our approach. Other genes isolated by yeast screening clearly fall into similar categories of regulators of signaling and actin cytoskeleton (unpublished results), as anticipated, whereas others are still of unclear function. With the help of the available genomic resources, we have not ended up "stranded" but have been able to gain intriguing insights into cellular physiology. 
Finally, based on our experiences studying HEF1 and other genes using genome-oriented technologies such as the yeast two-hybrid system (17, 21), we were able to identify specific areas where it would be of interest to modify and improve the core technology. This has led us to spawn a technological project developing a dual-bait two-hybrid system (Ref. 51; also see Ref. 24) that would be more specific than the original. This improved two-hybrid analysis allows us in concept to screen libraries for proteins interacting with HEF1 vs. additional closely related family members (e.g., Refs. 45 and 22). This again bears on the issues of gene duplication and redundancy raised in earlier sections and emphasizes a final theme: with the application of genomic technologies to biological problems, the biological systems will in turn drive the evolution of the technologies.
Over the waterfall.
Clearly, this sampling of the application of genomic information to academic laboratory science reflects only a portion of the potential applications available. Other studies have performed comparative analysis of a single chromosomal region across species to determine the transferability of genomic information between species (36) or focused on tissue-specific expression (e.g., Ref. 3). One unique aspect of the field of genomics which increases the diversity of applications is that the majority of the sequence information and its analysis is accessible on the World Wide Web. The journal Nucleic Acids Research annually devotes its entire January issue to the publishing of a compendium of databases. Additional web resources are listed in publications directed at specific groups of researchers (4, 23, 46, 54, 57), and excellent reviews continue to summarize technologies as they become available (e.g., Refs. 11 and 40). As the sources of information proliferate, sites which are designed to integrate genomic information will also become more prevalent (for example, GeneCards; http://bioinformatics.weizmann.ac.il/cards/). These freely available resources provide researchers with an appealing challenge: to synthesize the information in a meaningful way and to employ it creatively to their best advantage. In this context, it seems likely that a major need of most researchers will not be a requirement for additional technology but, instead, will be the establishment of collaborative interactions with colleagues in different subfields of biological research, to avoid misusing or misinterpreting the floods of arriving data due to unfamiliarity with specific biological systems.
However, there are only ~100,000–140,000 genes in humans. It is not unreasonable to suggest that over the next 5–10 years, all will be sequenced, most grouped into associative networks by two-hybrid (38) or mass spectrometry (28) genome projects, and most or all characterized for expression profile on the RNA and protein level on a tissue-by-tissue (or even cell-by-cell) basis. Against this background, what will an "interesting" experiment become? It seems likely that the current emphasis on gene discovery and understanding control networks in processes such as cell cycle, signal transduction, and apoptosis will fade as most of the main control proteins are well defined by a combination of the large projects and small-scale research efforts including strategies such as those outlined above. Some research areas (e.g., study of intracellular vesicle trafficking, human neurobiology) that are not yet generally approachable by large-scale approaches will continue to be exciting and will be organized along current research paradigms, but enhanced by novel imaging technologies. For the new millennium, two trends may predominate. First, the sum of knowledge developed over the next decade should make it possible to approach problems of physiology previously inaccessible, in areas not entirely considered within the aegis of molecular biology at this time. As a single example, major psychosocial studies currently investigate problems such as the role of stress or depression in determining outcomes for cancer treatment or incidence of specific diseases. In the absence of convenient and noninvasive ways to explore the mechanisms of such possible effects, these studies remain focused on psychology and statistics, peripheral to basic science studies of disease progression. In the near future, use of RNA prepared from T cells of psychologically defined patient groups to probe gene chips may lead to surprisingly quantifiable changes in immune responses.
Or no significant effects may be found; but at least the experiment can be done, by building bridges from basic scientists to clinicians as has been urged for at least a decade by proponents of "translational research." Second, the prospect of databases containing overwhelming sets of information may be unnerving and demoralizing to "bicycle shop"-scale investigators (many scientists, personal communications). The sense may prevail that there is now a limit to the ability of an individual to make a unique and creative contribution to human understanding, which is, after all, the reason many are drawn into science in the first place. However, while some fields of discovery may be becoming crowded, the field of invention is always by definition wide open, and bicycle shops are the proverbial locale in which to dream up novel technologies. It seems likely that much of the displaced investigative effort may naturally move in the direction of applied biology, with great promise for gene therapy and fields as yet unimagined. Excitement is where you find it, or make it.
ACKNOWLEDGMENTS
Address for reprint requests and other correspondence: E. A. Golemis, W422, Fox Chase Cancer Center, 7701 Burholme Ave., Philadelphia, PA 19111 (E-mail: EA_Golemis@fccc.edu).
FOOTNOTES
Article published online before print. See web site for date of publication (http://physiolgenomics.physiology.org).
REFERENCES