
Exercises in using the UCSC Genome Browser

  1. This exercise uses sequence information to search for genomic targets and the corresponding large-insert clones for functional studies. What are the human chromosome 11 genomic coordinates (in the May 2004 assembly) flanked by the following primers? Describe the feature of the target sequence amplified by these primers. Find all BAC clones that contain this gene. How do you find the contact details of clone distributors?

    Primer 1: 5'-GAGAAGGCTCTATACGGACACACC-3'
    Primer 2: 5'-AGGCATAGAAGCGAGGTCCTTCA-3'

    Hint: use the BLAT function in the browser.

    Answer: chr11:116,209,966-116,213,925. These primers should amplify the last exon of the APOA1 gene. The BACs include CTD-2640P19, CTD-2530B14, RP11-728G20, RP11-1147E8, CTD-3009A19, CTD-2149O15 and RP11-599J10. A list of clone distributors can be found at http://www.ncbi.nlm.nih.gov/genome/clone/distributors.html

  2. This exercise shows the basic means of retrieving genomic data using the Table Browser. What are the genomic coordinates of the APOE gene exons and coding start/end sites?

    Hint: use the table browser.

    Answer: exon1: chr19:50,100,878-50,100,938
    exon2: chr19:50,101,698-50,101,764
    exon3: chr19:50,102,856-50,103,049
    exon4: chr19:50,103,629-50,104,489
    The coding sequence starts at 50,101,721 and ends at 50,104,347, and the gene is transcribed on the plus strand.

  3. This exercise shows an example of using the intersect function in the Table Browser to extract data from two tables. Go to the July 2003 genomic interval that contains the human ABCA1 gene (chr9:102,923,120-103,070,274) and find all the sequence variations (SNPs and indels) in this interval. How many variations did you find? How many are located in the coding region (coding SNPs, or cSNPs)? Can you get 30 bases of sequence flanking each cSNP? The current release of the SNP data in the UCSC Genome Browser is dbSNP Build 120, but NCBI has recently released dbSNP Build 122. How many more cSNPs were added to this region?

    Hint: use the table browser and the July 2003 assembly. Use the intersect function to locate the cSNPs. For the dbSNP question, use http://www.ncbi.nlm.nih.gov/SNP/, search for ABCA1, and then select GeneView to view the cSNPs.

    Answer: a total of 630 variations. Of them, 17 are coding polymorphisms. Sequences are shown below. 10 more cSNPs were added.

  4. This is another example of using the intersect function in the Table Browser to extract genomic information. Simple sequence repeats have been found to mark regions of instability in the genome. Can you find all the genes located within the chromosome 5q3 band that contain triple repeats in their exons?

    Hint: use the table browser with the intersect function.
    Answer: 11 genes

  5. This is an example of using the Gene Sorter interface to access expression data and other genomic information. Please find the genes that share protein sequence homology (at least 35% identity) with the BRCA1 gene in human and compare their expression profiles.

    Hint: use Gene Sorter in the UCSC browser.
    Answer: 17 genes contain GNF expression data on U133A and GNF1H chips and share at least 35% protein sequence homology with the BRCA1 gene

Exercises in finding Gene Trap resources

  6. Can you find gene-trapped mouse ES cells for the ABCA1 gene? It is recommended to use sequences to search for hits in the gene trap database. If this gene has been mutated by this effort, how do you find the web site to send your request for the cells? Can you find the insertion site of the gene trap vector in this cell line?

    Hint: use the international gene trap consortium blast server located at http://www.sanger.ac.uk/cgi-bin/blast/submitblast/genetrap.
    Answer: yes, one ES cell line (AC0693) is found. It is constructed by the Sanger group. The web site to request this cell line is linked to http://www.sanger.ac.uk/cgi-bin/PostGenomics/genetrap/browser?id=AC0693.
    The insertion site is exon 37.

  7. Now try to apply these exercises to your favorite genes.

Step-by-step keys

  1. S1: Go to http://genome.ucsc.edu and click "Blat" on the blue navigation bar. Select the "human May 2004" assembly and "DNA" as your "Query type", then type the primer sequences in the blank area as follows:

    >seq1
    GAGAAGGCTCTATACGGACACACC
    >seq2
    AGGCATAGAAGCGAGGTCCTTCA
    and click "Submit".

    S2: These primers are located between genomic coordinates 116,209,966 and 116,213,925 on chromosome 11, and they face each other on opposite DNA strands. To see the region flanked by these primers, copy these coordinates, click on "browser", enter "chr11:116,209,966-116,213,925" in the "position" field, and click "jump".

    S3: These primers flank an exon of the APOA1 gene. Zoom out 3x to see the entire gene structure. The primers flank the last exon of the APOA1 gene.

    S4: Go down on this page and select "pack" for the "BAC End Pairs" track. Click the "refresh" button to see a set of BACs containing this gene.

    S5: Click on any BAC to go to the clone information page. Find the "NCBI Clone Registry" button, which will take you to the Clone Registry page. You can find a list of clone distributors if you click "Distributor Information" on this page.
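
The bookkeeping in steps S1 and S2 can be sketched in Python. This is only an illustration, not part of the browser: the helper names (`to_fasta`, `parse_position`, `reverse_complement`) are made up for this sketch. It builds the FASTA text pasted into the Blat form, parses the position string, and computes the length of the flanked region and the reverse complement of primer 2 (the two primers sit on opposite strands).

```python
import re

# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def to_fasta(named_seqs: dict) -> str:
    """Format name/sequence pairs as FASTA text for the Blat query box."""
    return "\n".join(f">{name}\n{seq}" for name, seq in named_seqs.items())


def parse_position(position: str):
    """Split a browser position such as 'chr11:116,209,966-116,213,925'."""
    match = re.fullmatch(r"(\w+):([\d,]+)-([\d,]+)", position.strip())
    if match is None:
        raise ValueError(f"not a browser position: {position!r}")
    chrom, start, end = match.groups()
    return chrom, int(start.replace(",", "")), int(end.replace(",", ""))


primers = {
    "seq1": "GAGAAGGCTCTATACGGACACACC",
    "seq2": "AGGCATAGAAGCGAGGTCCTTCA",
}
fasta_text = to_fasta(primers)

chrom, start, end = parse_position("chr11:116,209,966-116,213,925")
span_length = end - start + 1  # both ends of the browser range are inclusive
primer2_rc = reverse_complement(primers["seq2"])

print(fasta_text)
print(f"{chrom}: {span_length} bp between the outer primer ends")
print(f"primer 2 as read on the other strand: {primer2_rc}")
```

The span works out to 3,960 bp, which is the expected size of the amplified product if both primers match exactly as reported.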

  2. S1: Click "Table Browser" from the UCSC home page. Select "Human" and the "May 2004" assembly, then click "Submit". (Warning: since the Table Browser remembers settings from previous queries, you should clear the settings before starting another query.)

    S2: Select "Genes and Gene Prediction Tracks" for "group", "RefSeq Genes" for "track", and "refGene" for "table".

    S3: Enter "APOE" in the "Position" field, and click "Look up" to select for the right target (use the RefSeq Gene coordinates).

    S4: Click "Describe table schema" to get the definition of each field in this table, then click "Get Output" to get the tab-delimited text (this text can be copied and pasted into an Excel file directly). This table shows that the human APOE gene contains 4 exons. The start and end coordinates of the transcribed sequence, coding sequence, and each exon are listed. Please note that the gene is transcribed on the plus strand, so the coding sequence starts at 50,101,721 and ends at 50,104,347.
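
The exon columns in the table output are comma-separated lists, which are awkward to read by eye. The sketch below shows one way to pair them up in Python. It assumes the column names `exonStarts` and `exonEnds` and a trailing comma in each field, and it uses the exon coordinates from the answer to exercise 2 as sample values, treated here as inclusive ranges. UCSC tables store start coordinates zero-based, so the raw values in your own output may differ by one from the positions shown in the browser.

```python
def parse_coordinate_list(field: str) -> list:
    """Split a comma-separated coordinate field, ignoring a trailing comma."""
    return [int(value) for value in field.split(",") if value.strip()]


def pair_exons(exon_starts: str, exon_ends: str) -> list:
    """Pair the start and end lists into (start, end) tuples, one per exon."""
    starts = parse_coordinate_list(exon_starts)
    ends = parse_coordinate_list(exon_ends)
    if len(starts) != len(ends):
        raise ValueError("exonStarts and exonEnds have different lengths")
    return list(zip(starts, ends))


def exon_containing(exons: list, position: int):
    """Return the 1-based number of the exon that contains a position, or None."""
    for number, (start, end) in enumerate(exons, start=1):
        if start <= position <= end:
            return number
    return None


# Sample values taken from the exercise 2 answer (APOE, chr19, plus strand).
exon_starts = "50100878,50101698,50102856,50103629,"
exon_ends = "50100938,50101764,50103049,50104489,"
cds_start, cds_end = 50101721, 50104347

exons = pair_exons(exon_starts, exon_ends)
for number, (start, end) in enumerate(exons, start=1):
    print(f"exon{number}: chr19:{start:,}-{end:,}")

print("coding sequence starts in exon", exon_containing(exons, cds_start))
print("coding sequence ends in exon", exon_containing(exons, cds_end))
```

With these values the coding sequence starts in exon 2 and ends in exon 4, so exon 1 and part of exon 2 are untranslated.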

  3. S1: Click "Table Browser" from the UCSC home page. Select "Human, July 2003 assembly", "Variations and Repeats group", "SNPs track", and "snpMap table". Please note that the May 2004 assembly does not have a SNPs track at this moment.

    S2: Enter "ABCA1" for position. Look up and select "chr9:102,923,120-103,070,274" as the target interval.

    S3: Select "all fields from primary table" as output format, and click "Get Output" to get all the variations found in this interval. Copy and paste the data into an Excel or Text Editor file to count the total number of variations. The total number of SNPs is 630.

    S4: Create a filter to limit the "SNPs" table to only the coding region of this gene. Set "chromStart" to ">= 102926433" and "chromEnd" to "<= 103045798". (Do not use "," in the coordinates. Follow exercise 2 to find the start and end coordinates of the coding sequence.) Click "Submit" to set the filter.

    S5: To gather only the coding SNPs, you also need to use the intersect function by clicking "create". Select "Genes and Gene Prediction Tracks group" and "RefSeq Genes track" as the intersecting table. Choose "All SNPs that have any overlap with RefSeq Genes" and click "Submit" to set the intersect function.

    S6: Select "sequence" as output format, and then click "Get Output".

    S7: Set "Sequence Retrieval Options" to add 30 bases on both upstream and downstream regions. After setting the "Sequence Formatting Options", click "Get sequence". You should see a list of 17 cSNPs with 30 bases of flanking sequences.

    S9: To find out which version of the SNP data is used in the browser, go to the home page and click on "Release Log". The SNP dataset is updated to dbSNP Build 120.

    S10: To find out which version of the SNP data is available at NCBI, go to the entrezSNP home page at http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=snp. The current data release of dbSNP is Build 122.

    S11: On the entrezSNP page, you can search for "homo sapiens ABCA1". It will give you a list of 689 variations in this region.

    S12: Click on any "GeneView" to see the entire region with SNP location. Select "cSNP" to see only the list of coding SNPs. The cSNP count is 27.
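
The filter and intersect steps above amount to two interval tests, which can be sketched in Python. The filter window comes from step S4; everything else here (the SNP names, SNP positions, exon intervals, and the stand-in sequence) is hypothetical and only illustrates the logic. Intervals are treated as zero-based and half-open, which is an assumption of this sketch.

```python
from typing import NamedTuple


class Snp(NamedTuple):
    name: str
    chrom_start: int
    chrom_end: int


def in_window(snp: Snp, window_start: int, window_end: int) -> bool:
    """Filter step: chromStart >= window start and chromEnd <= window end."""
    return snp.chrom_start >= window_start and snp.chrom_end <= window_end


def overlaps_any(snp: Snp, intervals: list) -> bool:
    """Intersect step: keep a SNP that has any overlap with any interval."""
    return any(snp.chrom_start < end and snp.chrom_end > start
               for start, end in intervals)


def with_flanks(sequence: str, index: int, flank: int) -> str:
    """Return the base at index plus up to `flank` bases on each side."""
    left = max(0, index - flank)
    return sequence[left:index + flank + 1]


# Coding-region window from step S4 (no commas in the coordinates).
CDS_START, CDS_END = 102926433, 103045798

# Hypothetical exon intervals and SNP records, for illustration only.
exons = [(102930000, 102930200), (103000000, 103000150)]
snps = [
    Snp("snpA", 102925000, 102925001),  # outside the filter window
    Snp("snpB", 102930050, 102930051),  # inside the window, in an exon
    Snp("snpC", 102950000, 102950001),  # inside the window, in an intron
    Snp("snpD", 103000100, 103000101),  # inside the window, in an exon
    Snp("snpE", 103060000, 103060001),  # outside the filter window
]

filtered = [s for s in snps if in_window(s, CDS_START, CDS_END)]
coding = [s for s in filtered if overlaps_any(s, exons)]
print("after filter:", [s.name for s in filtered])
print("after intersect:", [s.name for s in coding])

# 30 bases of flank on each side of one position in a stand-in sequence.
toy_sequence = "ACGT" * 20
flanked = with_flanks(toy_sequence, 40, 30)
print(len(flanked), "bases returned")

# Counts reported in the key: 27 cSNPs in Build 122, 17 in Build 120.
added_csnps = 27 - 17
print("cSNPs added between builds:", added_csnps)
```

The filter alone keeps intronic SNPs that fall between the coding start and end, which is why the intersection with the gene track is still needed to isolate the coding SNPs.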

  4. S1: Clear the filter and intersection settings, and select the human May 2004 assembly.

    S2: Select "Variation and Repeats group", "Simple Repeats track", and "simpleRepeat table" as the primary table parameters.

    S3: Set position to "5q3" and click the "Look up" button. The coordinates are provided automatically. The coordinates should be "chr5:132200001-180857866".

    S4: Create "filter" and set "period" "=" "3" to specify triple repeats. Click "Submit".

    S5: Create "intersection" and select "Genes and Gene Prediction Tracks group" and "RefSeq Genes track". Choose "All Simple Repeats that have any overlap with RefSeq Genes" and click "Submit" to set the intersect function.

    S6: Select "custom track" as the output format. Alternatively, choose "sequence" if you want the triple repeats and their genomic locations in tabular form.

    S7: Click "Get Output" to go to the "Genome Browser" display. Enter "chr5:132200001-180857866" in the position field and click "jump" to open the display of the 5q3 region. The total number of triple repeats located in the exons of the 5q3 region is 11. Please note that if you encounter a "duplicated custom track, aborting" message, you need to reset the table browser. To do so, first click "Tables" on the top panel, go to the "old Table Browser Page", and reset all user cart settings. Then click "Tables" on the top panel again to go back to the table browser page.
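
For readers who want to see what the period filter and the custom track output look like as data, here is a minimal Python sketch. The repeat records are hypothetical, and the field names (`period`, `sequence`) are assumed to match the simpleRepeat table. The output uses the standard custom track layout: a `track` line followed by tab-separated BED lines.

```python
from typing import NamedTuple


class Repeat(NamedTuple):
    chrom: str
    chrom_start: int
    chrom_end: int
    period: int
    sequence: str


def triple_repeats(repeats: list) -> list:
    """Filter step: keep only repeats whose period equals 3."""
    return [r for r in repeats if r.period == 3]


def to_custom_track(repeats: list, name: str, description: str) -> str:
    """Format repeats as a custom track: one track line, then BED lines."""
    lines = [f'track name="{name}" description="{description}"']
    for r in repeats:
        lines.append(f"{r.chrom}\t{r.chrom_start}\t{r.chrom_end}\t{r.sequence}")
    return "\n".join(lines)


# Hypothetical records inside chr5:132200001-180857866, for illustration only.
repeats = [
    Repeat("chr5", 132500000, 132500030, 3, "CAG"),
    Repeat("chr5", 140000000, 140000040, 2, "CA"),
    Repeat("chr5", 150000000, 150000027, 3, "GCC"),
    Repeat("chr5", 160000000, 160000048, 4, "AAAT"),
]

kept = triple_repeats(repeats)
track_text = to_custom_track(kept, "tripleRepeats", "period-3 repeats in 5q3")
print(track_text)
```

Text in this layout can also be pasted into the browser's custom track form, which is one way to reload a saved result without repeating the query.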

  5. S1: Click "Gene Sorter" from the UCSC home page. Click "configure" to select what to display.

    S2: Select "Name", "GNF Atlas 2", "E-Value", "%ID", and "Description", then click the "Submit" button.

    S3: Click on the "filter" button to set the minimum %ID at 35 for Blastp, and click the "submit" button.

    S4: Select "Protein Homology" for sorting the output list.

    S5: Enter "BRCA1" as the query gene. Click "Go!".
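
The filter and sort settings in this key can be mirrored on exported data with a few lines of Python. The records below are hypothetical placeholders, not Gene Sorter output, and sorting by ascending E-value is an assumption about what "Protein Homology" ordering means in practice.

```python
from typing import NamedTuple


class Neighbor(NamedTuple):
    name: str
    percent_id: float
    e_value: float


def filter_and_sort(neighbors: list, min_percent_id: float = 35.0) -> list:
    """Keep genes at or above the %ID cutoff, best (lowest) E-value first."""
    kept = [n for n in neighbors if n.percent_id >= min_percent_id]
    return sorted(kept, key=lambda n: n.e_value)


# Hypothetical placeholder records, for illustration only.
neighbors = [
    Neighbor("GENE_A", 100.0, 0.0),
    Neighbor("GENE_B", 28.5, 1e-4),
    Neighbor("GENE_C", 41.2, 2e-12),
    Neighbor("GENE_D", 35.0, 5e-9),
]

result = filter_and_sort(neighbors)
for n in result:
    print(f"{n.name}\t{n.percent_id:.1f}\t{n.e_value:.1e}")
```

Note that the cutoff is inclusive, matching "at least 35% identity" in the exercise.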

  6. S1: Go to the NCBI web site (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Nucleotide) and search for "Mus musculus Abca1".

    S2: Click on "NM_013454" to get to the mouse Abca1 cDNA sequence. Change "default" to "FASTA" and click "display" to go to the sequence page.

    S3: Copy the entire cDNA sequence and paste it to the query data field of the gene trap blast server (http://www.sanger.ac.uk/cgi-bin/blast/submitblast/genetrap). Click "Start Blast".

    S4: Click "retrieve" to get the blast result. You should see that only one ES cell (AC0693) insertion has a 100% sequence match with the sequence submitted.

    S5: Click "Tag Report" to get the cell line information. It is constructed by the Sanger group. The button "Request this cell line" is located at the bottom of this page.

    S6: The sequence generated from the insertion site of the gene trap vector matches the mouse genomic sequence at "4:52294141-52294309" (see the Map Location). Click on the genome coordinates to go to the "ContigView" of this sequence on the Ensembl web page. In the "Detailed view" section, you should see that the AC0693 sequence extends from an exon into an intron.

    S7: Click on the bar showing "Abca1 Ensembl known trans" to get to the "Ensembl Gene Report" page. Click the button "Exon information" to see all the exon-intron boundaries of this gene. The AC0693 insertion sequence coordinates "52294141-52294309" span part of exon 37.
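
Matching the tag coordinates against the exon table by eye is error-prone, so the comparison in S6 and S7 can be sketched in Python. The tag interval is the one given above; the exon boundaries are hypothetical stand-ins for the values you would copy from the "Exon information" page, and both ends of every range are treated as inclusive.

```python
def overlap(a: tuple, b: tuple):
    """Return the shared part of two inclusive ranges, or None if disjoint."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return (start, end) if start <= end else None


def exons_hit(tag: tuple, exons: dict) -> dict:
    """Map exon number to the overlapping range for every exon the tag touches."""
    hits = {}
    for number, exon in exons.items():
        shared = overlap(tag, exon)
        if shared is not None:
            hits[number] = shared
    return hits


# Tag interval from the Map Location: 4:52294141-52294309.
tag = (52294141, 52294309)
tag_length = tag[1] - tag[0] + 1

# Hypothetical exon boundaries; replace with the Ensembl exon table values.
exons = {
    36: (52293000, 52293120),
    37: (52294200, 52294350),
    38: (52295500, 52295640),
}

hits = exons_hit(tag, exons)
exonic_bases = sum(end - start + 1 for start, end in hits.values())
extends_into_intron = exonic_bases < tag_length

print(f"tag length: {tag_length} bp")
print("exons hit:", sorted(hits))
print("tag extends beyond the exon:", extends_into_intron)
```

A tag that is only partly exonic, as in this toy setup, is what you would expect when the sequence read runs from an exon into the adjacent intron.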
