There are three protein datasets for a unigene build:

+ Proteins predicted by longest6frame (an SGN script that translates the
  sequence in all six reading frames and keeps the longest ORF)
+ Proteins predicted by estscan
  (http://www.ch.embnet.org/software/ESTScan2.html)
+ Preferred proteins (for each unigene, the two predictions above are
  compared and the longest protein is kept)

Each dataset consists of two files (cds and protein).

A version of the preferred protein dataset with annotations compatible with
the ProteinPilot program is also provided.

(NOTE: This is the protein prediction analysis for the Solanum lycopersicum
unigene build #2. Previous Solanum lycopersicum unigene builds have no
protein prediction analysis.)

----------------------------------------
Report:
----------------------------------------

Files:

* cds sequences:
    - cds fasta file: Solanum_lycopersicum_cds_predicted_by_estscan.fasta
    - number of sequences: 38696
    - total bases: 25475376
    - average sequence length: 658
    - maximum sequence length: 5235
    - minimum sequence length: 51

* protein sequences:
    - protein fasta file: Solanum_lycopersicum_protein_predicted_by_estscan.fasta
    - number of sequences: 38696
    - total amino acids: 8491792
    - average sequence length: 219
    - maximum sequence length: 1745
    - minimum sequence length: 17

* cds sequences:
    - cds fasta file: Solanum_lycopersicum_cds_predicted_by_longest6frame.fasta
    - number of sequences: 41949
    - total bases: 23836881
    - average sequence length: 568
    - maximum sequence length: 4395
    - minimum sequence length: 51

* protein sequences:
    - protein fasta file: Solanum_lycopersicum_protein_predicted_by_longest6frame.fasta
    - number of sequences: 41949
    - total amino acids: 7934597
    - average sequence length: 189
    - maximum sequence length: 1465
    - minimum sequence length: 17

----------------------------------------
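
The per-file figures in the report (number of sequences, total bases or amino acids, average, maximum and minimum length) can be recomputed from any of the fasta files. The following is a minimal sketch and not the SGN script that produced the report. The function name `fasta_stats` is hypothetical, and the average is assumed to be truncated to an integer, which is consistent with 25475376 / 38696 (about 658.35) being reported as 658.

```python
from typing import Dict, Iterable


def fasta_stats(lines: Iterable[str]) -> Dict[str, int]:
    """Summarize sequence lengths found in FASTA-formatted lines.

    Hypothetical helper: accepts any iterable of lines, for example an
    open file handle or a list of strings.
    """
    lengths = []
    current = None  # length of the record being read; None before the first header
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A header closes the previous record and opens a new one.
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            # Sequence data may be wrapped over several lines.
            current += len(line)
    if current is not None:
        lengths.append(current)
    if not lengths:
        raise ValueError("no FASTA records found")

    total = sum(lengths)
    return {
        "number of sequences": len(lengths),
        "total residues": total,  # bases for cds files, amino acids for protein files
        # Assumption: the report truncates the average to an integer.
        "average sequence length": total // len(lengths),
        "maximum sequence length": max(lengths),
        "minimum sequence length": min(lengths),
    }
```

Passing an open handle for a file such as Solanum_lycopersicum_cds_predicted_by_estscan.fasta to `fasta_stats` returns a dictionary whose entries can be compared with the corresponding block of the report.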
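
The preferred protein dataset is described as keeping, for each unigene, the longest of the estscan and longest6frame predictions. The sketch below illustrates that selection under stated assumptions; it is not the SGN implementation. The function name `choose_preferred`, the dictionary layout (unigene id mapped to protein sequence) and the tie-breaking rule (estscan wins when both proteins have the same length) are all hypothetical. Because the two methods do not yield the same number of sequences (38696 versus 41949), a unigene with a single prediction simply keeps that prediction.

```python
from typing import Dict


def choose_preferred(
    estscan: Dict[str, str], longest6frame: Dict[str, str]
) -> Dict[str, str]:
    """Pick the longest predicted protein for each unigene.

    Hypothetical helper: both arguments map a unigene id to a protein
    sequence predicted by the corresponding method.
    """
    preferred = {}
    for unigene_id in sorted(set(estscan) | set(longest6frame)):
        # Collect the predictions that exist and are non-empty.
        candidates = [
            protein
            for protein in (estscan.get(unigene_id), longest6frame.get(unigene_id))
            if protein
        ]
        if not candidates:
            continue  # neither method produced a protein for this unigene
        # max() returns the first maximal item, so estscan wins ties
        # (an assumption, the source does not state a tie-breaking rule).
        preferred[unigene_id] = max(candidates, key=len)
    return preferred
```

The matching cds record would be taken from the same method as the chosen protein, so that the cds and protein files of the preferred dataset stay in step.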