Transcriptome sequences of the wild tomato species S. peruvianum

Soon-Ju Park, Ke Jiang, Michael C. Schatz, and Zachary B. Lippman
Cold Spring Harbor Laboratory

The research groups of Michael C. Schatz and Zachary B. Lippman at Cold Spring Harbor Laboratory in New York have generated transcriptome sequences of the green-fruited wild tomato species Solanum peruvianum. These transcriptome data provide a new resource for biological discovery on tomato development and evolution.

The transcriptome sequences were generated using Illumina sequencing technology with paired-end 50 base pair reads, which resulted in an estimated 5- to 1000-fold transcript coverage, depending on expression level. In addition to establishing the transcriptome using the open source de novo mRNA assembly algorithm Inchworm (inchworm.sourceforge.net), reference-guided cDNA reconstruction was also conducted using S. lycopersicum cv. Heinz as a reference. Both approaches produced high-quality transcriptome reconstructions. A selection of statistics from the de novo assembly and the reference-based cDNA reconstruction is given below. Only those contigs from the de novo assembly that match known annotated genes are provided.

de novo assembly:
    number of contigs: 177863
    transcriptome size: 61852880 bp
    N50: 633 bp
    minimum length: 100 bp
    maximum length: 15880 bp
    mean length: 347 bp
    median length: 156 bp

reference-based reconstruction:
    number of reconstructed cDNAs: 17430
    transcriptome size: 23325040 bp
    minimum length: 54 bp
    maximum length: 15320 bp
    mean length: 1338 bp
    median length: 1139 bp

Initial usage of the data has indicated excellent coverage and depth: the number of genes captured in S. peruvianum is very close to the number of genes captured in a similar experiment on domesticated tomato. Preliminary analysis indicates that more than 75% of the S. peruvianum raw sequence reads align to S. lycopersicum cv.
Heinz, suggesting high conservation in coding regions between the two tomato species. An initial estimate of divergence in coding regions between the two species found 834 genes with identical coding regions (zero divergence), a mean divergence of 0.00704, and a median divergence of 0.005522 (under the Jukes-Cantor model, which corrects for multiple substitutions at the same site).

We are pleased to provide these pre-publication transcriptome sequences of S. peruvianum to the Solanaceae and broader plant biology communities. Both data sets, the de novo assembled transcriptome and the reference-guided reconstruction of cDNAs, are freely available to the community without any restrictions on their use.

For further information, please contact Z.B. Lippman (lippman@cshl.edu).
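
For readers who want to recompute summary figures like those quoted above from their own copy of the data, the following is a minimal sketch of two of the quantities mentioned: the N50 of a set of contig lengths and the Jukes-Cantor corrected divergence. It uses the common textbook definitions; the exact procedures used to produce the numbers in this announcement are not described here, so the function names `n50` and `jukes_cantor` and the input conventions are illustrative assumptions only.

```python
import math


def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Common definition (assumed here): the largest length L such that
    contigs of length >= L together cover at least half of the total
    assembly size.
    """
    total = sum(lengths)
    running = 0
    # Walk from the longest contig downwards, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")


def jukes_cantor(p):
    """Jukes-Cantor distance from the observed proportion p of differing sites.

    d = -(3/4) * ln(1 - (4/3) * p), defined for 0 <= p < 0.75.
    The correction accounts for multiple substitutions at the same site,
    so d is slightly larger than p for any p > 0.
    """
    if not 0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)
```

For example, `n50([2, 2, 2, 3, 3, 4, 8, 8])` returns 8, because the two longest contigs already account for half of the total length of 32. At the low divergence levels reported for these two species, the Jukes-Cantor correction changes the observed proportion of differing sites only slightly.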