This is the canonical repository for BAC sequences produced as part of the International Tomato Genome Sequencing Project. For more information about the project, see http://www.sgn.cornell.edu/about/tomato_sequencing.pl . For the latest assembled tomato genome contigs, see ftp://ftp.sgn.cornell.edu/tomato_genome/contigs/ .

CONTENTS OF THIS DIRECTORY

* bacs.v*.seq
    FASTA-formatted sequences of all submitted BACs

* bacs_repeatmasked.v*.seq
    same as above, but with repetitive sequences masked with RepeatMasker, see http://sgn.cornell.edu/gbrowse/gbrowse/tomato_bacs/?help=citations#RepeatMasker

* bacs_accessions.v*.txt
    list of GenBank accessions for BAC sequences

* finished_bacs.v*.seq
    sequences of only finished BACs

* finished_bacs_repeatmasked.v*.seq
    repeat-masked sequences of only finished BACs

* finished_bacs_accessions.v*.txt
    list of GenBank accessions for finished BAC sequences

* validate_submission.pl
    Perl script used by sequencing centers to check the format of a BAC tar.gz file before uploading

* 2764_finished_bacs_htgs3.fas
    2764 non-redundant phase 3 BACs from finished_bacs_accessions.v676.txt and NCBI. Used for SL3.0.

chr##/ - materials for BACs on chromosome ##

nonstandard/ - additional data that do not conform to formatting guidelines

A NOTE ON SEQUENCE AND FILE VERSIONING

There are two systems of version numbering in place in the SGN BAC repository: one for files, and one for BAC sequences.

File versions track the history of each file in the BAC repository. When a change is made to a file, its version number is incremented and a copy of the previous version of the file is stored in a subdirectory called old/. For example, if a new version of the file C01HBa0216G16.1.v2.tar.gz is published, the new file will be called C01HBa0216G16.1.v3.tar.gz and the old version of the file will be archived in old/C01HBa0216G16.1.v2.tar.gz. Using this system, a full version history is kept for all files in the BAC repository.
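
As an illustration of the file versioning rule, here is a minimal Python sketch that splits a repository file name into its parts and works out the names involved when a new file version is published. The pattern (BAC name, sequence version, ".v" plus file version, extension) is inferred from the example names in this document and is an assumption, not an official specification. The function names are hypothetical and are not part of any SGN tool. The sketch covers only updates that leave the sequence version unchanged.

```python
import re

# Assumed layout, inferred from the examples in this README:
#   <BAC name>.<sequence version>.v<file version>.<extension>
FILE_NAME_RE = re.compile(
    r"^(?P<bac>[A-Za-z0-9]+)"
    r"\.(?P<seq_version>\d+)"
    r"\.v(?P<file_version>\d+)"
    r"\.(?P<ext>.+)$"
)


def parse_file_name(name):
    """Split a repository file name into its components (hypothetical helper)."""
    m = FILE_NAME_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognized file name: {name!r}")
    return {
        "bac": m["bac"],
        "seq_version": int(m["seq_version"]),
        "file_version": int(m["file_version"]),
        "ext": m["ext"],
    }


def next_file_version(name):
    """Return (new_name, archive_path) for an update that keeps the
    sequence version and increments only the file version."""
    p = parse_file_name(name)
    new_name = "{}.{}.v{}.{}".format(
        p["bac"], p["seq_version"], p["file_version"] + 1, p["ext"]
    )
    # The previous version is kept under old/ with its original name.
    return new_name, "old/" + name


if __name__ == "__main__":
    print(next_file_version("C01HBa0216G16.1.v2.tar.gz"))
```

Run on the example above, this yields the new name C01HBa0216G16.1.v3.tar.gz and the archive path old/C01HBa0216G16.1.v2.tar.gz.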

Sequence versions track the history of each BAC sequence. A sequence version is incremented only when a change is made to the sequence. For example, in C01HBa0216G16.1.v2.seq, the first line might be '>C01HBa0216G16.1', indicating that this is the first submitted version of the sequence for the BAC C01HBa0216G16. If an update to this BAC is submitted with a different sequence, the sequence version for the BAC would be incremented, resulting in a file named C01HBa0216G16.2.v1.seq, with a first line of '>C01HBa0216G16.2'. On the other hand, the sequence version will NOT change if an update is made that does not change the sequence, such as changing annotations or adding supporting data.

For BACs whose sequence is still in pieces, a given sequence version applies to the set of pieces, not to each individual piece. For example, suppose that C01HBa0088L02 is first submitted to SGN in three pieces. Its sequence file, C01HBa0088L02.1.v1.seq, will contain the fragments '>C01HBa0088L02.1-1', '>C01HBa0088L02.1-2', and '>C01HBa0088L02.1-3'. Then, if an update is made that changes the sequence of one fragment, the sequence version is changed for all the fragments, making '>C01HBa0088L02.2-1', '>C01HBa0088L02.2-2', and '>C01HBa0088L02.2-3'. If a third update then has a single, finished sequence, its identifier would be '>C01HBa0088L02.3', since it is the third version of this BAC's sequence.

--
Please note that this is a working document, subject to change at any time without notice.
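
The sequence identifier convention described in this document can be sketched in Python as follows. This is only an illustration: the identifier pattern (BAC name, sequence version, optional "-" plus fragment number) is inferred from the examples given here, and the function names are hypothetical rather than part of any SGN software.

```python
import re

# Assumed identifier layout, inferred from the examples in this README:
#   ><BAC name>.<sequence version>[-<fragment number>]
# Anything after the first whitespace on the FASTA header line is ignored.
IDENT_RE = re.compile(
    r"^>?(?P<bac>[A-Za-z0-9]+)\.(?P<version>\d+)(?:-(?P<fragment>\d+))?(?:\s|$)"
)


def parse_identifier(header):
    """Return (bac, sequence_version, fragment) for a FASTA header line.

    fragment is None when the identifier has no fragment suffix.
    """
    m = IDENT_RE.match(header.strip())
    if m is None:
        raise ValueError(f"unrecognized identifier: {header!r}")
    fragment = m["fragment"]
    return (
        m["bac"],
        int(m["version"]),
        int(fragment) if fragment is not None else None,
    )


def bump_sequence_version(headers):
    """Increment the sequence version of every piece of one BAC.

    The version applies to the whole set of pieces, so all fragments
    move to the new version together, even if only one of them changed.
    """
    bumped = []
    for header in headers:
        bac, version, fragment = parse_identifier(header)
        suffix = "" if fragment is None else "-{}".format(fragment)
        bumped.append(">{}.{}{}".format(bac, version + 1, suffix))
    return bumped


if __name__ == "__main__":
    pieces = [">C01HBa0088L02.1-1", ">C01HBa0088L02.1-2", ">C01HBa0088L02.1-3"]
    for line in bump_sequence_version(pieces):
        print(line)
```

For the three-piece example, the script prints the identifiers with sequence version 2 and the fragment numbers unchanged. A finished sequence such as '>C01HBa0088L02.3' parses with no fragment number.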