Structural and functional genome annotation

Should we submit annotation to GenBank/EMBL? The old Arabidopsis annotation that was submitted to EMBL/GenBank is still being used as a reference, even though it is completely outdated.
-The community wants to have annotation as soon as possible.
-Users want a common annotation standard across all chromosomes.
-All(?) sequence generators want a hand in performing annotation.
So we decided:
-annotation will not be submitted to EMBL/GenBank
-consistent annotation will always be available in a timely manner, at the best quality reasonably achievable at the time, through a central portal, e.g. SGN
-all partners will be involved in annotating their own sequences, implementing common pipelines/standards collaboratively and thus producing comparable results

Functional annotation:
-similarity-based approaches are not sufficient; context, comparative and functional genomics information needs to be included and will provide much better coverage as more data becomes available
-the most significant contribution would come from literature-based curation
-annotation by external experts should be considered
-functional annotation should be topic-based (e.g. Solanaceae-specific genes) or family-based
-curator positions are available at Imperial College, London (1) and Cornell (1)
-annotation procedures are to be discussed on the basis of commonly available standard operating procedures and similar documents

Web presentation
-We would like a consistent dataset available to the public, with one-stop access to all data. However, every project will want its own website, so "mirrors" or rewrapping of the genome data will happen there. Every website should show the same content, with clear references to sources and credit. Synchronization of the underlying data must be guaranteed, to ensure there are no inconsistencies or differences between the different portals.
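One possible way to check that mirrored portals stay synchronized with the central dataset is to compare per-file checksums of each data release. The sketch below is illustrative only: it assumes each portal keeps a release as a directory of flat files, and the function names and file names are hypothetical, not an agreed SOL mechanism.

```python
import hashlib
from pathlib import Path


def release_manifest(root):
    """Map each file path (relative to root) to its SHA-256 hex digest."""
    root = Path(root)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256()
            with path.open("rb") as fh:
                # Read in 1 MiB chunks so large sequence files need not fit in memory.
                for chunk in iter(lambda: fh.read(1 << 20), b""):
                    digest.update(chunk)
            manifest[path.relative_to(root).as_posix()] = digest.hexdigest()
    return manifest


def diff_manifests(reference, mirror):
    """Report files missing from, extra on, or changed on a mirror."""
    missing = sorted(set(reference) - set(mirror))
    extra = sorted(set(mirror) - set(reference))
    changed = sorted(
        name for name in set(reference) & set(mirror)
        if reference[name] != mirror[name]
    )
    return {"missing": missing, "extra": extra, "changed": changed}
```

A mirror would count as synchronized only when all three lists are empty; comparing digests rather than timestamps avoids false alarms from files that were merely re-copied.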
Quality control

The most important products for quality control are the BAC assemblies and the whole-genome (pseudomolecule/chromosome arm) assembly. Set up a list of standard procedures to be applied to all BACs and sequences, e.g.:
-compare a virtual restriction digest of the finished sequence with an in-gel digest
-compare with AFLP markers once they become available. However, with the currently available dataset (one or two enzyme combinations), BACs can only be partially covered.
-send the reads from two BACs from Wageningen to all BAC assembly groups to check their assembly pipelines, as Wageningen had very bad experiences when sending these out to be assembled elsewhere

Selection of next BACs: mainly through BAC end sequences. Seed BACs are selected based on OverGo probes and FPC contigs, choosing the longest BAC in an FPC contig. About 50k BAC end sequences from the HindIII library will be available by the end of the year.

SGN will try to evaluate the performance of different sequencing companies.

It would be useful to have the chloroplast and mitochondrial genome sequences. Possibly also FISH the chloroplast/mitochondrial genomes onto tomato chromosomes.

The USA project has been asked to start sequencing at the euchromatin/heterochromatin border.
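The virtual-digest check from the quality-control list could be sketched as follows. This is a minimal Python sketch, assuming HindIII (recognition site A^AGCTT) as the enzyme, a linear finished insert sequence, and gel band sizes already estimated in bp. The function names, the 5% size tolerance and the 500 bp cutoff for bands too small to resolve reliably are illustrative assumptions, not an agreed procedure.

```python
import re

# HindIII recognizes AAGCTT and cuts after the first base (A^AGCTT).
# The site is palindromic, so scanning one strand finds every site.
HINDIII_SITE = "AAGCTT"
HINDIII_CUT_OFFSET = 1


def virtual_digest(seq, site=HINDIII_SITE, cut_offset=HINDIII_CUT_OFFSET):
    """Return fragment lengths (bp, largest first) of a linear sequence."""
    seq = seq.upper()
    # Lookahead pattern so that overlapping sites would also be reported.
    cuts = [m.start() + cut_offset for m in re.finditer(f"(?={re.escape(site)})", seq)]
    bounds = [0] + cuts + [len(seq)]
    lengths = [b - a for a, b in zip(bounds, bounds[1:]) if b > a]
    return sorted(lengths, reverse=True)


def compare_to_gel(predicted, observed, tolerance=0.05, min_size=500):
    """Greedily match predicted fragments to gel bands within a relative tolerance.

    Returns (matched pairs, predicted fragments with no band, unexplained bands).
    Fragments and bands below min_size are ignored (assumed unreliable on the gel).
    """
    pred = sorted((p for p in predicted if p >= min_size), reverse=True)
    obs = sorted((o for o in observed if o >= min_size), reverse=True)
    used = set()
    matched, missing = [], []
    for p in pred:
        best = None
        for i, o in enumerate(obs):
            if i in used or abs(o - p) > tolerance * p:
                continue
            if best is None or abs(o - p) < abs(obs[best] - p):
                best = i
        if best is None:
            missing.append(p)
        else:
            used.add(best)
            matched.append((p, obs[best]))
    extra = [o for i, o in enumerate(obs) if i not in used]
    return matched, missing, extra
```

Unexplained bands may come from the cloning vector, which is part of the digested BAC DNA but not of the finished insert sequence; predicted fragments without a matching band are the more direct hint of a misassembly.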
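The seed-BAC rule (OverGo-positive BACs, longest one per FPC contig) can be expressed in a few lines. In this sketch the `Bac` record and its fields are hypothetical; in particular, `length` stands for whatever size estimate is available (e.g. estimated insert size or number of FPC bands), which is an assumption.

```python
from collections.abc import Iterable
from dataclasses import dataclass


@dataclass(frozen=True)
class Bac:
    bac_id: str
    fpc_contig: str
    length: int            # size estimate; units are an assumption (bp or FPC bands)
    overgo_positive: bool  # hybridized with an OverGo probe


def pick_seed_bacs(bacs: Iterable[Bac]) -> dict[str, str]:
    """Return {FPC contig: bac_id} of the longest OverGo-positive BAC per contig."""
    best: dict[str, Bac] = {}
    for bac in bacs:
        if not bac.overgo_positive:
            continue
        current = best.get(bac.fpc_contig)
        # Strict '>' keeps the first BAC seen when lengths tie.
        if current is None or bac.length > current.length:
            best[bac.fpc_contig] = bac
    return {contig: bac.bac_id for contig, bac in best.items()}
```

Filtering on OverGo hits before taking the maximum means a longer BAC without a probe hit is never chosen as a seed.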