There are 3 different protein datasets for a unigene build:

  + Proteins predicted by longest6frame (an SGN script that translates the
    sequence in the 6 reading frames and keeps the longest one)
  + Proteins predicted by estscan (http://www.ch.embnet.org/software/ESTScan2.html)
  + Preferred proteins (for each unigene, the predictions of both methods are
    compared and the longest protein is kept)

For each dataset there are two files (cds and protein). A version of the
preferred protein dataset with annotations compatible with the ProteinPilot
program is also provided.

(NOTE: This is the protein prediction analysis for the Solanum melongena
unigene build #2. Previous Solanum melongena unigene builds have no protein
prediction analysis.)

----------------------------------------
Report:
----------------------------------------

Files:

  * cds sequences:
      - cds fasta file: Solanum_melongena_cds_predicted_by_estscan.v1.fasta
      - number of sequences: 1719
      - total bases: 478497
      - average sequence length: 278
      - maximum sequence length: 1197
      - minimum sequence length: 51

  * protein sequences:
      - protein fasta file: Solanum_melongena_protein_predicted_by_estscan.v1.fasta
      - number of sequences: 1719
      - total amino acids: 157753
      - average sequence length: 91
      - maximum sequence length: 398
      - minimum sequence length: 16

  * cds sequences:
      - cds fasta file: Solanum_melongena_cds_predicted_by_longest6frame.v1.fasta
      - number of sequences: 1930
      - total bases: 519171
      - average sequence length: 269
      - maximum sequence length: 1227
      - minimum sequence length: 87

  * protein sequences:
      - protein fasta file: Solanum_melongena_protein_predicted_by_longest6frame.v1.fasta
      - number of sequences: 1930
      - total amino acids: 172325
      - average sequence length: 89
      - maximum sequence length: 409
      - minimum sequence length: 29

  * cds sequences:
      - cds fasta file: Solanum_melongena_cds_predicted_by_preferred.v1.fasta
      - number of sequences: 1841
      - total bases: 540966
      - average sequence length: 293
      - maximum sequence length: 1227
      - minimum sequence length: 87

  * protein sequences:
      - protein fasta file: Solanum_melongena_protein_predicted_by_preferred.v1.fasta
      - number of sequences: 1841
      - total amino acids: 179304
      - average sequence length: 97
      - maximum sequence length: 409
      - minimum sequence length: 29

----------------------------------------
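
The figures in the report, and the way the preferred dataset is described,
can be reproduced with a short script. The following is only a minimal
sketch in Python, not the SGN pipeline code. The function names
(read_fasta, summarize, pick_preferred) are hypothetical, and it assumes
that both prediction files use the same unigene identifier as the first word
of each FASTA header and that ties between the two methods go to the first
dataset. The reported averages look truncated to whole numbers (for example
157753 / 1719 is about 91.8 but is listed as 91), so the sketch uses integer
division.

```python
from typing import Dict, Iterable


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text lines into {sequence id: sequence}."""
    seqs: Dict[str, str] = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Assumption: the id is the first word after '>'.
            parts = line[1:].split()
            current = parts[0] if parts else ""
            seqs[current] = ""
        elif current is not None:
            seqs[current] += line
    return seqs


def summarize(seqs: Dict[str, str]) -> Dict[str, int]:
    """Compute the five figures listed for each file in the report."""
    lengths = [len(s) for s in seqs.values()]
    if not lengths:
        return {"count": 0, "total": 0, "average": 0, "maximum": 0, "minimum": 0}
    total = sum(lengths)
    return {
        "count": len(lengths),
        "total": total,
        "average": total // len(lengths),  # truncated, as the report appears to be
        "maximum": max(lengths),
        "minimum": min(lengths),
    }


def pick_preferred(first: Dict[str, str], second: Dict[str, str]) -> Dict[str, str]:
    """For each id keep the longer protein; ties keep the one from `first` (assumption)."""
    preferred = dict(first)
    for seq_id, seq in second.items():
        if seq_id not in preferred or len(seq) > len(preferred[seq_id]):
            preferred[seq_id] = seq
    return preferred


if __name__ == "__main__":
    # Toy data for illustration only; these are not sequences from the build.
    estscan = read_fasta([">u1 toy", "MKT", "AA", ">u2", "MK"])
    longest6frame = read_fasta([">u1", "MKTAAQ", ">u3", "MLS"])
    print(summarize(pick_preferred(estscan, longest6frame)))
```

To run it on the real files, pass an open file handle to read_fasta, for
example read_fasta(open("Solanum_melongena_protein_predicted_by_estscan.v1.fasta")).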