Cross-species genomic sequencing

  1. Selection of BACs for sequence analysis: Multiple probes are used to identify each genomic interval in the various species, ensuring that the isolated BACs span the entire gene of interest and its flanking regions. Isolated BAC clones are restriction-digested, and their fragment patterns are compared to determine the relative position of each clone within a contig. The size of each BAC is also determined so that sequence coverage can be estimated.

  2. We have adapted the sequencing protocols and quality control (QC) steps from the Joint Genome Institute (JGI). These protocols have been rigorously tested and have consistently produced high-quality sequence in a large-scale sequencing environment. The QC steps for the three major stages (subclone library construction, shotgun sequencing, and sequence assembly) are briefly described below.
  3. Quality control of subclone library construction:
    1. The concentration of large DNA preparations is determined both by fluorometer readings and by the fluorescence intensity of ethidium bromide-stained gels.
    2. BAC DNA is mechanically sheared to the desired size range (2.5-3.5 kb), and the sheared fragments are end-repaired and size-selected on an agarose gel. The size of the purified DNA is checked on a gel before and after ligation into the pUC18 vector.
    3. Colonies are arrayed into 96-well plates using a picking robot. Library quality is assessed by sizing and sequencing one plate of 96 subclones. A library is considered acceptable if fewer than 5% of the clones are non-recombinant or contain E. coli DNA, and if more than 90% of the inserts differ from the expected size (2.5-3.5 kb) by no more than 20%.
  4. Quality control of shotgun sequencing:
    1. Subclones picked into 96-well plates are incubated at 37 °C for 20 hours and visually inspected to confirm that no wells are empty.
    2. Plasmid DNA is prepared with a rolling circle amplification kit (Amersham), and aliquots of the first 96 samples prepared with each new lot of reagents are examined on a gel.
    3. Sequencing reactions are performed with ABI BigDye terminator chemistry and run on an ABI PRISM 3700 sequencer. Base-call quality is estimated with Phred, which assigns each base a quality score Q = -10 log10(P), where P is the estimated probability that the base call is wrong. A quality file is generated for each sequence read. A read is considered successful if it contains more than 70 bases with a quality of Q20 or higher (an error probability of at most 1 in 100).
    4. For a run of 96 samples, we expect a pass rate above 80% and an average read length of over 500 bases. A run that falls below this level triggers an investigation of each sequencing reagent and instrument.
    5. Each BAC is sequenced to at least 6-fold coverage to achieve ordered and oriented contigs.
  5. Quality control of sequence assembly:
    1. Paired-end sequence information (reads from both ends of each subclone) is used by Phrap to produce ordered and oriented contigs.
    2. The quality of the contig assembly is checked in several ways: by comparison with matching cDNA sequences, by comparison with available orthologous sequences, and by reassembly with a second program, Paracel Assembler CAP4.
    3. Almost all BACs assemble into complete, ordered, and oriented contigs.
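The subclone library acceptance criteria under "Quality control of subclone library construction" can be expressed as a small check on one screened plate. This is a minimal sketch, not the lab's actual tooling: the function and field names are hypothetical, and the reading of "within 20% of the expected size (2.5-3.5 kb)" as "between 2.5 × 0.8 and 3.5 × 1.2 kb" is an assumption (a midpoint-based tolerance would be an equally plausible interpretation).

```python
from dataclasses import dataclass


@dataclass
class LibraryQCResult:
    contaminant_fraction: float  # (non-recombinant + E. coli clones) / clones screened
    in_range_fraction: float     # inserts within tolerance / inserts sized
    acceptable: bool


def assess_library(n_screened, n_nonrecombinant, n_ecoli, insert_sizes_kb,
                   size_range_kb=(2.5, 3.5), tolerance=0.20,
                   max_contaminant=0.05, min_in_range=0.90):
    """Apply the library acceptance criteria to one screened 96-clone plate.

    ASSUMPTION (hypothetical interpretation): an insert "does not differ from
    the expected size by more than 20%" if it lies within
    [low * (1 - tolerance), high * (1 + tolerance)] of the target range.
    """
    if n_screened <= 0 or not insert_sizes_kb:
        raise ValueError("need at least one screened clone and one sized insert")
    low, high = size_range_kb
    lo_limit, hi_limit = low * (1 - tolerance), high * (1 + tolerance)

    # Fraction of clones that are unusable (no insert, or E. coli DNA).
    contaminant_fraction = (n_nonrecombinant + n_ecoli) / n_screened

    # Fraction of sized inserts that fall inside the tolerated size window.
    in_range = sum(lo_limit <= size <= hi_limit for size in insert_sizes_kb)
    in_range_fraction = in_range / len(insert_sizes_kb)

    acceptable = (contaminant_fraction < max_contaminant
                  and in_range_fraction > min_in_range)
    return LibraryQCResult(contaminant_fraction, in_range_fraction, acceptable)
```

For example, a plate with 2 non-recombinant clones, 1 E. coli clone, and 93 sized inserts of which 88 fall in the window would pass both thresholds.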
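The per-read success criterion in the shotgun sequencing QC (more than 70 bases at Q20 or better) can be sketched directly from Phred quality values. This sketch assumes the quality file is in the FASTA-style `.qual` layout (a `>name` header followed by whitespace-separated integer quality values); the helper names are hypothetical, not part of Phred itself.

```python
def phred_to_error_prob(q):
    """Convert a Phred quality Q to its error probability P, using Q = -10 * log10(P)."""
    return 10 ** (-q / 10)


def parse_qual(text):
    """Parse FASTA-style quality text into {read_name: [quality values]}.

    ASSUMPTION: one '>name' header per read, quality values as
    whitespace-separated integers on the following lines.
    """
    reads, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            reads[name] = []
        elif name is not None:
            reads[name].extend(int(v) for v in line.split())
    return reads


def q20_base_count(qualities, threshold=20):
    """Number of bases at Q20 (error probability <= 1/100) or better."""
    return sum(1 for q in qualities if q >= threshold)


def read_passes(qualities, min_q20_bases=70):
    """A read is successful if it has MORE than 70 bases at Q20 or better."""
    return q20_base_count(qualities) > min_q20_bases
```

Note that the threshold is strict: a read with exactly 70 Q20 bases does not pass.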
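The run-level check for a 96-sample run (pass rate above 80%, average read length over 500 bases) can be sketched as follows. The protocol does not say which reads the average is taken over or how read length is measured, so this sketch assumes the average is computed over successful reads only, using whatever length metric the lab reports (for example, the number of Q20 bases); `run_passes` is a hypothetical name.

```python
from statistics import mean


def run_passes(reads, min_pass_rate=0.80, min_avg_length=500):
    """Evaluate one sequencing run.

    reads: list of (passed: bool, length: int), one entry per sample.
    ASSUMPTION: average read length is taken over successful reads only.
    Returns (pass_rate, average_length, run_ok).
    """
    if not reads:
        raise ValueError("empty run")
    passed_lengths = [length for ok, length in reads if ok]
    pass_rate = len(passed_lengths) / len(reads)
    avg_length = mean(passed_lengths) if passed_lengths else 0.0
    run_ok = pass_rate > min_pass_rate and avg_length > min_avg_length
    return pass_rate, avg_length, run_ok
```

A run flagged as failing here would correspond to the point at which the protocol calls for investigating reagents and instruments.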
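The BAC size measured during clone selection and the 6-fold coverage target can be combined into a rough sequencing plan using the standard relation C = N × L / G (coverage C, number of successful reads N, read length L, BAC size G). This is a back-of-the-envelope sketch under stated assumptions: every successful read is counted individually (each subclone yields two paired-end reads), the read length is the roughly 500-base average expected per run, and the pass rate inflates the number of reads that must be attempted. The function names and the 150 kb BAC size in the usage note are hypothetical.

```python
import math


def reads_for_coverage(bac_size_bp, coverage=6, read_length=500):
    """Successful reads N needed so that C = N * L / G reaches the target coverage."""
    return math.ceil(coverage * bac_size_bp / read_length)


def plan_bac_sequencing(bac_size_bp, coverage=6, read_length=500,
                        pass_rate=0.80, wells_per_plate=96):
    """Rough plan: successful reads needed, reads to attempt, and 96-well plates.

    ASSUMPTIONS: reads are counted individually; failed reads are
    simply replaced, so attempted = needed / pass_rate.
    """
    needed = reads_for_coverage(bac_size_bp, coverage, read_length)
    attempted = math.ceil(needed / pass_rate)
    plates = math.ceil(attempted / wells_per_plate)
    return needed, attempted, plates
```

For a hypothetical 150 kb BAC at 6-fold coverage with 500-base reads and an 85% pass rate, `plan_bac_sequencing(150_000, pass_rate=0.85)` gives 1,800 successful reads, 2,118 attempted reads, and 23 plates.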