About the US Tomato Sequencing Project
Project Manager and Educational Outreach Coordinator
Joyce Van Eck
Scientific objectives and approaches
The tomato genome comprises approximately 950 Mb of DNA, more than 75% of which is heterochromatin and largely devoid of genes. The majority of genes are found in long contiguous stretches of gene-dense euchromatin located on the distal portions of each chromosome arm. These gene-rich regions of the tomato genome will be sequenced by an international consortium using a minimal tiling path approach. The US project is geared towards laying the foundations for sequencing by constructing 2 additional BAC libraries, obtaining BAC end sequences (400,000 reads) and sequencing a sheared library. The Sol Genomics Network (SGN), an organism database devoted to the genomics of solanaceous species, will be expanded to accommodate and incorporate the sequencing, annotation and mapping information for all 12 tomato chromosomes. SGN will also begin to be integrated with other databases through shared software and algorithms, so as to create a network of plant genomic information. Currently, the US has been assigned chromosomes 1, 10 and 11 for full sequencing in a follow-up project.
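To illustrate the idea behind a minimal tiling path, here is a minimal sketch in Python. It assumes each BAC clone has already been placed on a region as a (name, start, end) interval, and it greedily picks the fewest clones that cover the region without gaps. The clone names, coordinates and the function name are hypothetical, chosen only for illustration. The sketch also ignores practical requirements such as a minimum overlap between neighboring clones, and it is not a description of the consortium's actual selection procedure.

```python
from typing import List, Tuple

# A clone is (name, start, end) in base-pair coordinates on one region.
# Names and coordinates are hypothetical illustration values.
Clone = Tuple[str, int, int]


def minimal_tiling_path(clones: List[Clone], region_start: int, region_end: int) -> List[Clone]:
    """Greedy interval cover: return the fewest clones spanning the region."""
    ordered = sorted(clones, key=lambda c: c[1])  # sort by start coordinate
    path: List[Clone] = []
    covered_to = region_start
    i = 0
    while covered_to < region_end:
        best = None
        # Among clones starting at or before the covered boundary,
        # keep the one that extends furthest to the right.
        while i < len(ordered) and ordered[i][1] <= covered_to:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered_to:
            raise ValueError(f"gap in clone coverage at position {covered_to}")
        path.append(best)
        covered_to = best[2]
    return path


example_clones = [
    ("A", 0, 100),
    ("B", 50, 120),
    ("C", 90, 200),
    ("D", 110, 180),
    ("E", 190, 300),
]
tiling = minimal_tiling_path(example_clones, 0, 300)
```

In this example, clones B and D are redundant because A, C and E already overlap end to end, so only those three would need to be fully sequenced.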
Broader impact of the project
Sequencing the tomato genome is the cornerstone of a larger international effort: "The International Solanaceae Genome Project". The goal is to establish a network of information, resources and scientists to tackle two of the most significant questions in plant biology/agriculture:
- How can a common set of genes/proteins give rise to a wide range of morphologically and ecologically distinct organisms that occupy our planet?
- How can a deeper understanding of the genetic basis of plant diversity be harnessed to better meet the needs of society in an environmentally-friendly and sustainable manner?
The family Solanaceae is ideally suited to answer both of these questions for reasons that will be enumerated in this proposal. Immediate application of the tomato genome sequence to other solanaceous species is possible because the tomato genome is connected to these species by comparative genetic maps, and microsynteny appears to be well conserved with respect to gene content and order. Finally, because the Solanaceae represents a distinct and divergent sector of flowering plants, distant from Arabidopsis, Medicago and rice, the tomato genome sequence will provide a rich resource for investigating the forces of gene and genome evolution over long periods of evolutionary time.
The mission of our educational outreach program is to provide research-training opportunities in computational genomics to undergraduates and high school students. By offering hands-on training in computational genomics to these students, we hope to expose them to the nature of genomics information/datasets and the myriad of fascinating biological questions that can be addressed through the application of computational tools to genomics information.
The introduction of high-capacity DNA sequencing has forever changed the nature of life sciences research. Biologists are no longer limited by the ability to collect genetic information, but rather by the ability to turn this information into discovery. Organizing, storing, curating and extracting biological insights from these data is the central challenge facing biology today. To meet this challenge, we must attract and train students who are mathematically and computationally savvy and yet have attained a level of biological intuition that can lead them to tackle important biological questions. These students will have hands-on experience in computational genomics as part of their undergraduate training. Their research experience will also be supplemented by a number of new courses in computational biology now offered at Cornell (or to be offered soon), as well as a genomics minor that is now being developed at Cornell.
Student Research Opportunities
Undergraduate positions are available at the Sol Genomics Network (SGN), a database for genomic information on the nightshade plant family, which includes important crop species such as tomato, potato and eggplant.
We are seeking highly motivated individuals with strong interests in computers and biology to work on different bioinformatics problems, including web-programming of new tools for plant scientists and designing and implementing relational databases for genomics applications.
Knowledge of Perl, SQL, and Linux, BSD or other UNIX-like operating systems is desirable, but not required.
Hourly paid positions and honor student positions are available. Work-study students are encouraged to apply. To apply, please send a summary of your prior experience and interests, a list of relevant coursework, and the names and e-mail addresses of at least 2 references by e-mail to: Joyce Van Eck, firstname.lastname@example.org.