About the tomato unigene build 2

May 2008

A new unigene build for tomato has been assembled from the following data:
  • 323,277 ESTs from the following tomato species:
    • Solanum lycopersicum with 307,350 sequences
    • Solanum habrochaites with 8,255 sequences
    • Solanum pennellii with 7,812 sequences
    • Solanum pimpinellifolium with 8 sequences
    • Solanum peruvianum with 42 sequences
    • Solanum cheesmaniae with 4 sequences
    • Solanum lycopersicoides with 2 sequences
  • New EST sequences were obtained from:
    • the GenBank database (dbEST, plus mRNA records from the Nucleotide database)
  • The new build contains 42,257 unigenes, of which 24,020 are contigs and 18,237 are singletons.
  • Analyses performed on the unigenes:
    • ESTScan and Longest6frame.pl, for peptide prediction (39,967 and 43,366 peptides predicted, respectively)
    • InterProScan on the predicted peptides, to identify protein domains and assign Gene Ontology (GO) terms (6,626 and 1,482 distinct domains associated with the two peptide datasets produced by the two prediction methods)
    • BLAST against GenBank NR, Arabidopsis and Swiss-Prot proteins (30,791, 28,656 and 19,886 unigenes, respectively, have at least one match in these datasets)
  • The range of unigene IDs for this build is SGN-U562593 through SGN-U604849.
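The per-species EST counts listed above can be tallied in a few lines of Python. This is an illustrative sketch: the dictionary only copies the figures from this note, and `species_shares` is a hypothetical helper name. Summed as listed, the counts come to 323,473, which is 196 more than the 323,277 total given above, so the shares printed here are relative to the listed counts.

```python
# Per-species EST counts as listed in this note (tomato unigene build 2).
EST_COUNTS = {
    "Solanum lycopersicum": 307_350,
    "Solanum habrochaites": 8_255,
    "Solanum pennellii": 7_812,
    "Solanum pimpinellifolium": 8,
    "Solanum peruvianum": 42,
    "Solanum cheesmaniae": 4,
    "Solanum lycopersicoides": 2,
}


def species_shares(counts: dict[str, int]) -> dict[str, float]:
    """Return each species' share of the summed EST count, in percent."""
    total = sum(counts.values())
    return {species: 100.0 * n / total for species, n in counts.items()}


if __name__ == "__main__":
    print("Sum of listed counts:", sum(EST_COUNTS.values()))
    # Largest contributor first.
    ranked = sorted(species_shares(EST_COUNTS).items(), key=lambda kv: -kv[1])
    for species, pct in ranked:
        print(f"{species:28s} {pct:6.2f}%")
```

The counts are kept as plain integers so the same dictionary can be reused for other summaries without re-typing the figures.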
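The stated ID range is contiguous, and its size matches the unigene count: 604,849 - 562,593 + 1 = 42,257 = 24,020 contigs + 18,237 singletons. The sketch below encodes that check together with a simple range test for accessions. It assumes accessions always take the form `SGN-U` followed by digits; `parse_unigene_id` and `in_build2` are hypothetical helper names, and `in_build2` only checks the numeric range, not whether the ID actually exists in SGN.

```python
import re

# Build-2 figures as stated in this note.
FIRST_ID = 562_593
LAST_ID = 604_849
CONTIGS = 24_020
SINGLETONS = 18_237

# Assumed accession format: "SGN-U" followed by digits, e.g. SGN-U562593.
_SGN_U = re.compile(r"^SGN-U(\d+)$")


def parse_unigene_id(accession: str) -> int:
    """Return the numeric part of an SGN-U accession, or raise ValueError."""
    match = _SGN_U.match(accession.strip())
    if match is None:
        raise ValueError(f"not an SGN unigene accession: {accession!r}")
    return int(match.group(1))


def in_build2(accession: str) -> bool:
    """True if the accession's number lies in the build-2 range (inclusive)."""
    return FIRST_ID <= parse_unigene_id(accession) <= LAST_ID


if __name__ == "__main__":
    range_size = LAST_ID - FIRST_ID + 1
    print(range_size, CONTIGS + SINGLETONS)  # both are 42257
    print(in_build2("SGN-U562593"), in_build2("SGN-U604850"))  # True False
```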
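The BLAST figures are easier to compare as a fraction of the 42,257 unigenes in the build. A minimal sketch follows; `coverage` is a hypothetical helper, and because the three protein datasets overlap, the resulting percentages should not be added together.

```python
TOTAL_UNIGENES = 42_257

# Unigenes with at least one BLAST match, per protein dataset (from this note).
BLAST_HITS = {
    "GenBank NR": 30_791,
    "Arabidopsis": 28_656,
    "Swiss-Prot": 19_886,
}


def coverage(hits: dict[str, int], total: int) -> dict[str, float]:
    """Percentage of unigenes with at least one match in each dataset.

    The datasets overlap, so these values are not additive.
    """
    return {name: 100.0 * n / total for name, n in hits.items()}


if __name__ == "__main__":
    for name, pct in coverage(BLAST_HITS, TOTAL_UNIGENES).items():
        print(f"{name:12s} {pct:5.1f}%")
```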
The new tomato unigene build can be accessed in SGN in the following ways:
  • Sequence homology search using SGN BLAST.
  • Bulk download for a single unigene accession (or a list of accessions) using the SGN Bulk download tool.
  • Complete download of all unigene sequences and annotations from the SGN FTP site.
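Once sequences have been fetched with the Bulk download tool or from the FTP site, they can be filtered by accession locally. The sketch below assumes the files are in FASTA format with the SGN-U accession as the first token of each header line; the note does not specify the file layout, and the helper names `read_fasta` and `select` as well as the toy sequences are hypothetical.

```python
import io
from typing import Iterable, Iterator, TextIO


def read_fasta(handle: TextIO) -> Iterator[tuple[str, str]]:
    """Yield (identifier, sequence) pairs from a FASTA text stream.

    The identifier is the first whitespace-delimited token after '>'
    (assumed here to be the SGN-U accession).
    """
    ident = None
    chunks: list[str] = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if ident is not None:
                yield ident, "".join(chunks)
            header = line[1:].split()
            ident = header[0] if header else ""
            chunks = []
        elif ident is not None:
            chunks.append(line)  # a sequence may span several lines
    if ident is not None:
        yield ident, "".join(chunks)


def select(records: Iterable[tuple[str, str]], wanted: Iterable[str]) -> dict[str, str]:
    """Keep only the records whose identifier is in `wanted`."""
    wanted_set = set(wanted)
    return {ident: seq for ident, seq in records if ident in wanted_set}


if __name__ == "__main__":
    # Toy, made-up input standing in for a downloaded FASTA file.
    demo = io.StringIO(
        ">SGN-U562593 example description\nACGTACGT\nGGCC\n"
        ">SGN-U562594\nTTTT\n"
    )
    print(select(read_fasta(demo), ["SGN-U562593"]))
```

Reading the file as a stream keeps memory use proportional to the selected records rather than the whole download.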