Robin Buell: Potato Genomics Session
- Funded for 3 years, $3 million.
- Sequencing 50 BACs in the first year, 250 BACs in the second year and 270 BACs in the third year, at 8x coverage.
- Fiber-FISH on gaps.
- Year 1: BAC end sequencing of 70,000 BACs, and FISH.
- Denmark: not funded. Austria: not funded.
- Peru and other South American countries are seeking funding. Britain is seeking funding.

Tuesday

Aoki Koh: full-length cDNA for Micro-Tom
- metabolite annotation
- EMS mutant line
- genomic database, mutant database
- 66,500 EST sequences -> 16,000 unigenes
- full-length sequence for clones that don't have any Arabidopsis homologue: 676 by single-pass sequencing, using primer walking; 387 additional ones in the pipeline
- re-BLASTing 320 full-length cDNAs against Arabidopsis shows that 83 still have no similarity to Arabidopsis
- 49,376 ESTs submitted to GenBank; they will be available in November 2006.
- Mass spec work: 7,000 metabolite peaks detected; annotation not done yet. 58 flavonoids identified.
- Database of pathways and metabolites.

Todd Vision
- map: 1,000 markers around introns, overgos
- BAC libraries: 11x BAC library, 10x BAC library

Sanghyeob
- Euchromatin is actually 23 Mb, not 26 Mb as estimated on SGN; FISH used to estimate euchromatin.
- BAC extension: FISH of extended BACs confirms the location of the extension BAC.
- Big gap (2.1 Mb). How to fill it?

China (Eileen)
- first 20 seed BACs
- genome-wide physical map (FPC)
- PCR screening of the anchor BACs
- short arm: gap of 20 cM
- long arm: good seed BAC distribution
- T0072 in heterochromatin, boundary defined at the telomere
- http://tomato.genetics.ac.cn/TomatoFPC/
- down to 3,000 contigs
- PCR screening of positive BACs: 958 markers

Christine: mapping; Karen: finishing
- 193 BACs total (80% by end of the year)
- HTG: EMBL
- FISH: confirms location essentially in euchromatin; heterochromatin problematic
- Fingerprinting: 43,000 fingerprints from MboI, incorporated into the AGI build of the HindIII FPC (ftp://ftp.sanger.ac.uk/pub/tomato/map)

Karen
- assess patterns in sequence, etc.
- dotter dot plots used to QC repeats and the integrity of the BAC assembly

Jiten Khurana (chromosome 5)
BAC selection guidelines:
- >100 kb in size
- end sequence available
- purity check of bacterial stock
- PCR amplification of genetic marker
- BAC verification by size estimation

Zamir: lines
- sequenced 2 BACs on chromosome 7 (clones were not verified prior to sequencing)
- 3 BACs submitted to SGN/NCBI
- 22 BAC clones

Roeland
- extension BACs: multiple hits, repeats
- 15 BACs in SGN, 24 to be released on Aug 1
- start sequencing to phase 3 with EUSOL
- testing 454 BAC sequencing
- Cyrill2 annotation pipeline system

Erika (Kazusa)
- Finished 21 BACs (2.5 Mb); 42 in total currently in the pipeline or finished.
- Problem: a marker disappeared in the finished BAC; it was lost during the sequencing process.
- Lotus genome project: shotgun sequencing to fill the gaps.

Farid
- number of markers: 237
- 277 BACs estimated; funding secured
- finished by end of 2008; annotation in 2009

Toni
- few markers in the euchromatic region... ~40 cM gap
- sequenced ~12 seed BACs; not all could be extended
- geneid program trained

Silvana
- 18 seed BACs
- BacEnds Extension v0.1
- http://tomato.cribi.unipd.it/
- http://biosrv.cab.unina.it/

Discussion
- Question: how to estimate when we are finished?
- SGN -> stats page: how many unigenes are included in the BACs

Discussion at lunch
Karen:
- minimum sequence overlap: 2 kb
- TGP (tiled genome path)
- AGP file (Accessioned Golden Path) structure: chromosome, coords, coords, sequential number, N or F (finished or not), accession, type (contig or clone), direction

To do:
- sequence finishing workshop by Sanger: any interest?
- Rene to contact Syngenta for their additional markers.

Remy
- 2.7 full-length genes per BAC in 60 hand-annotated BACs
- tomato has longer introns than Arabidopsis (300 bp vs 100 bp)
- 500 gene models needed
- 160 BACs needed to obtain
- Web service adapter for Apollo
- www.fruitfly.org/annot/apollo/: user has to fill in a config file; implementation is generic

Naama: alignment
Heiko: EST alignment rules for the training set
- 5' end, exon coverage and 3' end covered by ESTs

To be added to guidelines:
o versioning of BAC-based gene names, consistent as far as possible
o three-letter codes for BAC libraries
o rules for the training set (guidelines for hand annotation):
  - SGN to distribute BACs (2 BACs overlap between two annotators)
  - done by end of August
  - everyone does their own alignments
  - 5' end, all splice sites and 3' end covered by ESTs
  - start and stop sites defined
  - all internal introns defined
  - non-standard splice sites below 5%
  - no hit to known repeats
  - if good protein hit: good coverage and alignment
  - no conflicting EST alignments
  - not closer than 500 bp to the end of reliable sequence (end of BAC, etc.)
  - remove redundancy from the training set
  - send email with annotations to SGN
  - some UTR to ensure a full-length gene
  - mark start and end of genomic sequence
  - check for poly-A signal

List of annotators:
- Naama
- Daniel
- Remy
- Erwin
- Cheol-Goo
- (Todd, maybe?)
- Heiko and/or Anika
- Maria Luisa
- India?
- Spain?
- Japan?
- China?

o make a wiki out of it
o AGP and TPF format description and how to submit
o every chromosome coordinator is responsible for producing these files and submitting them to SGN
o Annotation: the October meeting will define the best gene predictor, which will then be used; keep evaluating gene predictors.
o BAC-based annotation for now; chromosome-based later.
o Each step is centralized at one location, but different steps may be at different locations.
o Recalculate all annotations as required by updates.
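The training-set rules above read naturally as a filter over candidate gene models. What follows is a minimal Python sketch, not the project's pipeline: it assumes a hypothetical `GeneModel` record whose flags (EST support, repeat hit, distance to the end of reliable sequence) have already been computed from each annotator's alignments. The 500 bp and 5% thresholds come from the rules; reading the 5% rule as applying to all introns in the set, and treating identical protein sequences as "redundant", are assumptions.

```python
from dataclasses import dataclass

# Thresholds from the training-set rules; the GeneModel fields are hypothetical.
MIN_DIST_TO_SEQ_END = 500           # bp from the end of reliable sequence (end of BAC, etc.)
MAX_NONSTANDARD_SPLICE_FRAC = 0.05  # non-standard splice sites must stay below 5%
CANONICAL = ("GT", "AG")            # canonical donor/acceptor dinucleotides


@dataclass(frozen=True)
class GeneModel:
    name: str
    five_prime_est: bool      # 5' end covered by ESTs
    three_prime_est: bool     # 3' end covered by ESTs
    splice_sites_est: bool    # every splice site covered by ESTs
    structure_defined: bool   # start, stop and all internal introns defined
    conflicting_est: bool     # any conflicting EST alignment
    repeat_hit: bool          # hit to a known repeat
    dist_to_seq_end: int      # bp to the nearest end of reliable sequence
    introns: tuple            # (donor, acceptor) dinucleotide pairs, one per intron
    protein: str              # predicted protein, used as a crude redundancy key


def passes_rules(m):
    """Per-model rules from the hand-annotation guidelines."""
    return (m.five_prime_est and m.three_prime_est and m.splice_sites_est
            and m.structure_defined
            and not m.conflicting_est
            and not m.repeat_hit
            and m.dist_to_seq_end >= MIN_DIST_TO_SEQ_END)


def nonstandard_fraction(models):
    """Fraction of all introns in `models` whose splice sites are not GT...AG."""
    sites = [s for m in models for s in m.introns]
    if not sites:
        return 0.0
    return sum(1 for s in sites if s != CANONICAL) / len(sites)


def build_training_set(models):
    """Filter by the per-model rules, drop redundant models, then apply the 5% rule."""
    seen, kept = set(), []
    for m in models:
        # Assumption: an identical protein sequence marks a redundant model.
        if passes_rules(m) and m.protein not in seen:
            seen.add(m.protein)
            kept.append(m)
    frac = nonstandard_fraction(kept)
    if frac >= MAX_NONSTANDARD_SPLICE_FRAC:
        raise ValueError(f"{frac:.1%} non-standard splice sites; the rule requires below 5%")
    return kept
```

A real redundancy step would more likely cluster models by sequence similarity than require identical proteins; the identical-protein key only keeps the sketch short.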
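Since every chromosome coordinator produces AGP files for SGN, a quick consistency check before submission can catch coordinate errors. A minimal Python sketch, assuming the tab-separated column layout of the NCBI AGP specification: object, object_beg, object_end, part_number, component_type, then component_id, component_beg, component_end and orientation for sequence lines, or gap_length, gap_type and linkage for gap lines. In that specification N marks a gap of known size, while unfinished sequence uses other type codes (for example A, D, P or W). The function name `check_agp` and the example rows and accessions are hypothetical.

```python
# Hypothetical AGP consistency check; column layout follows the NCBI AGP spec.
ORIENTATIONS = {"+", "-", "?", "0", "na"}
GAP_TYPES = {"N", "U"}  # N: gap of known size, U: gap of unknown size


def check_agp(text):
    """Return a list of problems found in AGP text; an empty list means none found."""
    problems = []
    last = {}  # object name -> (last part_number, last object_end)
    for lineno, raw in enumerate(text.splitlines(), start=1):
        if not raw.strip() or raw.startswith("#"):
            continue  # skip blank lines and comments
        cols = raw.split("\t")
        if len(cols) < 8:
            problems.append(f"line {lineno}: expected at least 8 columns, got {len(cols)}")
            continue
        obj, ctype = cols[0], cols[4]
        beg, end, part = int(cols[1]), int(cols[2]), int(cols[3])
        prev_part, prev_end = last.get(obj, (0, 0))
        # Parts must be numbered 1, 2, 3, ... and tile the object without gaps or overlaps.
        if part != prev_part + 1:
            problems.append(f"line {lineno}: part_number {part}, expected {prev_part + 1}")
        if beg != prev_end + 1:
            problems.append(f"line {lineno}: object_beg {beg}, expected {prev_end + 1}")
        span = end - beg + 1
        if ctype in GAP_TYPES:
            # For gap lines, column 6 is the gap length and must equal the object span.
            if int(cols[5]) != span:
                problems.append(f"line {lineno}: gap_length {cols[5]} != object span {span}")
        elif len(cols) < 9:
            problems.append(f"line {lineno}: component line needs 9 columns")
        else:
            # Component coordinates need not start at 1, but the lengths must agree.
            comp_span = int(cols[7]) - int(cols[6]) + 1
            if comp_span != span:
                problems.append(f"line {lineno}: component span {comp_span} != object span {span}")
            if cols[8] not in ORIENTATIONS:
                problems.append(f"line {lineno}: unknown orientation {cols[8]!r}")
        last[obj] = (part, end)
    return problems


# Hypothetical rows: two finished BACs separated by a 50 kb clone gap.
EXAMPLE_AGP = "\n".join([
    "\t".join(["chr06", "1", "120000", "1", "F", "BAC_A.1", "1", "120000", "+"]),
    "\t".join(["chr06", "120001", "170000", "2", "N", "50000", "clone", "no"]),
    "\t".join(["chr06", "170001", "290000", "3", "F", "BAC_B.1", "2001", "122000", "-"]),
])
```

For the hypothetical rows in `EXAMPLE_AGP`, `check_agp` returns an empty list; shifting the second row's start by one base yields an object_beg complaint and a gap-length mismatch.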