Exercises on using the UCSC Genome Browser
-
This exercise leads you to use sequence information to search for genomic targets and
corresponding large-insert clones for functional studies. What are the human chromosome 11
genomic coordinates (in the May 2004 assembly) of the region flanked by the following primers?
Describe the feature of the target sequence amplified by these primers. Please find all
BAC clones that contain this gene. How do you find the contact information for clone distributors?
Primer 1: 5'-GAGAAGGCTCTATACGGACACACC-3'
Primer 2: 5'-AGGCATAGAAGCGAGGTCCTTCA-3'
Hint: use the BLAT function in the browser.
Answer: chr11:116,209,966-116,213,925. These primers should amplify the last exon
of the APOA1 gene. The BACs include CTD-2640P19, CTD-2530B14,
RP11-728G20, RP11-1147E8, CTD-3009A19, CTD-2149O15, and
RP11-599J10. A list of clone distributors can be found at
http://www.ncbi.nlm.nih.gov/genome/clone/distributors.html
This exercise shows the basic means of retrieving genomic data using the Table Browser.
What are the genomic coordinates of the APOE gene exons and coding start/end sites?
Hint: use the table browser.
Answer: exon1: chr19:50,100,878-50,100,938
exon2: chr19:50,101,698-50,101,764
exon3: chr19:50,102,856-50,103,049
exon4: chr19:50,103,629-50,104,489
The coding sequence starts at 50,101,721 and ends at 50,104,347, and the
gene is transcribed on the plus strand.
This exercise shows an example of using the intersect function in the Table Browser to
extract data from two tables. Go to the genomic interval in the July 2003 assembly that contains the human
ABCA1 gene (chr9:102,923,120-103,070,274) and find all the sequence variations
(SNPs and indels) in this interval. How many variations did you find? How many are
located in the coding region (coding SNPs, or cSNPs)? Can you get 30 bases of sequence
flanking each cSNP? The current release of the SNP data in the UCSC Genome Browser
is dbSNP Build 120, but NCBI has recently released dbSNP Build 122. How many
more cSNPs were added to this region?
Hint: use the table browser and the July 2003 assembly. Use the intersect function to
locate cSNPs. For the dbSNP question, please use http://www.ncbi.nlm.nih.gov/SNP/,
search for ABCA1, and then select GeneView to view the cSNPs.
Answer: a total of 630 variations. Of them, 17 are coding polymorphisms.
Sequences are shown below. 10 more cSNPs were added.
This is another example of using the intersect function in the Table Browser to extract
genomic information. Simple sequence repeats have been found to mark regions of
instability in the genome. Can you find all the genes located within chromosome
band 5q3 that contain triple repeats in their exons?
Hint: use the table browser with the intersect function.
Answer: 11 genes
This is an example of using the Gene Sorter interface to access expression data and
other genomic information. Please find the genes that share protein sequence homology
(at least 35% identity) with the BRCA1 gene in human and compare their expression
profiles.
Hint: use Gene Sorter in the UCSC browser.
Answer: 17 genes contain GNF expression data on U133A and GNF1H chips and
share at least 35% protein sequence homology with the BRCA1 gene
Exercises on finding Gene Trap resources
- Can you find gene-trapped mouse ES cells for the ABCA1 gene? It is recommended to
use sequences to search for hits in the gene trap database. If this gene has been mutated
by this effort, how do you find the web site to send your request for the cells? Try to find
the insertion site of the gene trap vector in this cell line.
Hint: use the international gene trap consortium blast server located at
http://www.sanger.ac.uk/cgi-bin/blast/submitblast/genetrap.
Answer: yes, one ES cell line (AC0693) is found. It was constructed by the Sanger
group. The web site to request this cell line is linked from
http://www.sanger.ac.uk/cgi-bin/PostGenomics/genetrap/browser?id=AC0693.
The insertion site is exon 37.
Now try to apply these exercises to your favorite genes.
Step-by-step keys
S1: Go to http://genome.ucsc.edu, click "Blat" on the blue navigation bar. Select "human
May 2004" assembly and "DNA" as your "Query type" and type the primer sequences
in the blank area as follows:
>seq1
GAGAAGGCTCTATACGGACACACC
>seq2
AGGCATAGAAGCGAGGTCCTTCA
and click "Submit".
S2: These primers are located between genomic coordinates 116,209,966 and 116,213,925
on chromosome 11, and they face each other on opposite DNA strands.
To see the region flanked by these primers, copy these coordinates,
click on "browser", enter "chr11:116,209,966-116,213,925" in the "position"
field, and click "jump".
S3: These primers flank an exon of the APOA1 gene. Zoom out 3x to see the entire gene
structure. The primers flank the last exon of the APOA1 gene.
S4: Go down on this page and select "pack" for the "BAC End Pairs" track. Click the
"refresh" button to see a set of BACs containing this gene.
S5: Click on any BAC to go to the clone information page. Find the "NCBI Clone
Registry" button, which will take you to the Clone Registry page. You can find a list
of clone distributors if you click "Distributor Information" on this page.
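The orientation described in S2 (two primers facing each other on opposite strands) can also be checked offline. The following is a minimal sketch, not a replacement for BLAT: the template is a made-up toy sequence rather than real chr11 sequence, the function names are illustrative, and primer 1 is placed on the plus strand purely by construction.

```python
# Minimal in-silico check of a primer pair against a template sequence.
# The template below is a made-up toy sequence, not real chr11 sequence.

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_amplicon(template: str, forward: str, reverse: str):
    """Return 1-based inclusive (start, end) of the product, or None.

    The forward primer must match the given strand, and the reverse
    complement of the reverse primer must match downstream of it.
    """
    start = template.find(forward)
    if start < 0:
        return None
    rc = reverse_complement(reverse)
    rc_pos = template.find(rc, start + len(forward))
    if rc_pos < 0:
        return None
    return start + 1, rc_pos + len(rc)


primer1 = "GAGAAGGCTCTATACGGACACACC"
primer2 = "AGGCATAGAAGCGAGGTCCTTCA"

# Toy template: filler + primer 1 + filler + reverse complement of primer 2.
template = "TTTT" + primer1 + "ACGTACGTAC" + reverse_complement(primer2) + "GG"

print(find_amplicon(template, primer1, primer2))

# Length of the interval reported in S2 (inclusive coordinates).
print(116_213_925 - 116_209_966 + 1)
```

If neither orientation of a primer is found in a real template, the pair would not be expected to amplify that region.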
S1: Click "Table Browser" from the UCSC home page. Select "Human" and "May 2004"
assembly, then click "Submit". (Warning: since the Table Browser remembers settings
from previous queries, you should clear the settings before starting another query.)
S2: Select "Genes and Gene Prediction Tracks" for "group", "RefSeq Genes" for "track",
and "RefGene" for "table".
S3: Enter "APOE" in the "Position" field, and click "Look up" to select the right
target (use the RefSeq Gene coordinates).
S4: Click "Describe table schema" to get the definition of each field in this table, then click
"Get Output" to get the tab-delimited text (this text can be copied and pasted into an
Excel file directly). This table shows that the human APOE gene contains 4 exons.
The start and end coordinates of the transcribed sequence, coding sequence, and each
exon are listed. Please note that the gene is transcribed on the plus strand, so
the coding sequence starts at 50,101,721 and ends at 50,104,347.
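The exon columns in the tab-delimited output are comma-separated lists, which are awkward to read by eye. The sketch below shows one way to turn them into browser-style positions. It assumes that the table stores start positions 0-based (so 1 is added for display) and uses field names in the style of the RefSeq Genes table; the row itself is rebuilt from the answer to this exercise and is not copied from real Table Browser output.

```python
# Sketch: turn refGene-style exon lists into browser-style coordinates.
# The row is rebuilt from the answer to this exercise; it is not real
# Table Browser output, and starts are assumed to be 0-based.


def parse_exons(exon_starts: str, exon_ends: str):
    """Convert comma-separated 0-based starts / ends to 1-based pairs."""
    starts = [int(s) for s in exon_starts.split(",") if s]
    ends = [int(e) for e in exon_ends.split(",") if e]
    if len(starts) != len(ends):
        raise ValueError("exonStarts and exonEnds differ in length")
    return [(s + 1, e) for s, e in zip(starts, ends)]


row = {
    "chrom": "chr19",
    "strand": "+",
    "cdsStart": 50101720,
    "cdsEnd": 50104347,
    "exonStarts": "50100877,50101697,50102855,50103628,",
    "exonEnds": "50100938,50101764,50103049,50104489,",
}

exons = parse_exons(row["exonStarts"], row["exonEnds"])
for number, (start, end) in enumerate(exons, start=1):
    print(f"exon{number}: {row['chrom']}:{start:,}-{end:,}")

print(f"coding sequence: {row['cdsStart'] + 1:,}-{row['cdsEnd']:,}")
```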
-
S1: Click "Table Browser" from the UCSC home page. Select "Human, July 2003
assembly", "Variations and Repeats group", "SNPs track", and "snpMap table".
Please note that the May 2004 assembly does not have a SNPs track at this moment.
S2: Enter "ABCA1" for position. Look up and select "chr9:102,923,120-103,070,274"
as the target interval.
S3: Select "all fields from primary table" as output format, and click "Get Output" to get all
the variations found in this interval. Copy and paste the data into an Excel or Text Editor
file to count the total number of variations. The total number of SNPs is 630.
S4: Create a filter to limit the "SNPs" table to only the coding region of this gene.
Set "chromStart" to ">= 102926433" and "chromEnd" to "<= 103045798". (Please
do not use "," in the coordinates. Follow exercise #2 to find the start
and end coordinates of the coding sequence.) Click "Submit" to set.
S5: To gather only the coding SNPs, you also need to use the intersect function by clicking
"create". Select "Genes and Gene Prediction Tracks group" and "RefSeq Genes track"
as the intersecting table. Choose "All SNPs that have any overlap with RefSeq Genes"
and click "Submit" to set the intersect function.
S6: Select "sequence" as output format, and then click "Get Output".
S7: Set "Sequence Retrieval Options" to add 30 bases on both upstream and downstream
regions. After setting the "Sequence Formatting Options", click "Get sequence".
You should see a list of 17 cSNPs with 30 bases of flanking sequences.
S9: To find out which version of the SNP data is used in the browser, go to the home
page and click on "Release Log". The SNP dataset is updated to dbSNP Build 120.
S10: To find out what version of the SNP data is available at NCBI, go to the entrezSNP
home page at http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=snp.
The current data release of dbSNP is Build 122.
S11: On the entrezSNP page, you can search for "homo sapiens ABCA1". It will give
you a list of 689 variations in this region.
S12: Click on any "GeneView" to see the entire region with SNP location. Select "cSNP"
to see only the list of coding SNPs. The cSNP count is 27.
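The filter in S4 and the intersect in S5 amount to a range check followed by an interval-overlap test. The sketch below illustrates that logic under stated assumptions: only the two filter bounds come from S4, while every SNP and exon record is invented for illustration, and the helper names are hypothetical. It is not how the Table Browser is implemented.

```python
# Sketch of the filter-plus-intersect logic from S4 and S5.
# All SNP and exon records below are invented for illustration.

CDS_START = 102926433  # filter bounds quoted in S4
CDS_END = 103045798


def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals [start, end) share a base."""
    return a_start < b_end and b_start < a_end


def coding_snps(snps, exons):
    """Keep SNPs inside the coding bounds that overlap any exon."""
    kept = []
    for name, start, end in snps:
        if start < CDS_START or end > CDS_END:
            continue  # fails the chromStart/chromEnd filter
        if any(overlaps(start, end, ex_s, ex_e) for ex_s, ex_e in exons):
            kept.append(name)
    return kept


# Hypothetical exons and SNPs (name, chromStart, chromEnd).
exons = [(102926400, 102926600), (102990000, 102990200)]
snps = [
    ("snpA", 102926500, 102926501),  # in an exon, inside the bounds
    ("snpB", 102926410, 102926411),  # in an exon, before CDS_START
    ("snpC", 102950000, 102950001),  # between the two exons
    ("snpD", 102990199, 102990200),  # last base of the second exon
]

print(coding_snps(snps, exons))

# cSNPs added between the two builds, using the counts from S7 and S12.
print(27 - 17)
```

The final subtraction is the comparison the exercise asks for: 27 cSNPs in Build 122 against 17 in Build 120.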
-
S1: Clear filter and intersect table, and select human May 2004 assembly.
S2: Select "Variation and Repeats group", "Simple Repeats track", and "simpleRepeat table" as the primary table parameters.
S3: Set position to "5q3" and click the "Look up" button. The coordinates are provided automatically. The coordinates should be "chr5:132200001-180857866".
S4: Create "filter" and set "period" "=" "3" to specify triple repeats. Click "Submit".
S5: Create "intersection" and select "Genes and Gene Prediction Tracks group" and "RefSeq Genes track". Choose "All Simple Repeats that have any overlap with RefSeq Genes" and click "Submit" to set the intersect function.
S6: Select "custom track" as the output format. Or you can choose "sequence" if you want to have the triple repeats and genomic location in a tabular form.
S7: Click "Get Output" to go to the "Genome Browser" display. Enter "chr5:132200001-180857866" in the position field and click "jump" to open the display of the 5q3 region. The total number of triple repeats located in the exons of the 5q3 region is 11. Please note that if you encounter a "duplicated custom track, aborting" message, it means you need to reset the table browser. To reset the table browser, first click "Tables" on the top panel, go to the "old Table Browser Page", and reset all user cart settings. Click "Tables" on the top panel again to go back to the table browser page.
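For readers who prefer to work from downloaded tables, the same idea (filter repeats on period, restrict to the region, then collect the genes whose exons overlap) can be sketched as follows. Only the region bounds come from the steps above; the repeat and exon records, the gene names, and the function name are all invented, and the region start is assumed to be 0-based.

```python
# Sketch: filter repeats by period, then collect genes whose exons overlap.
# Repeat and exon records are invented; only the region bounds come from S3.

REGION = ("chr5", 132200000, 180857866)  # 0-based start assumed


def genes_with_triple_repeats(repeats, exons):
    """Return sorted gene names with an exon overlapping a period-3 repeat.

    Exon records are assumed to lie on the same chromosome as REGION.
    """
    chrom, region_start, region_end = REGION
    hits = set()
    for rep_chrom, rep_start, rep_end, period in repeats:
        if rep_chrom != chrom or period != 3:
            continue  # wrong chromosome or not a triple repeat
        if rep_start < region_start or rep_end > region_end:
            continue  # outside the 5q3 interval
        for gene, ex_start, ex_end in exons:
            if rep_start < ex_end and ex_start < rep_end:
                hits.add(gene)  # a set counts each gene once
    return sorted(hits)


# Hypothetical records: (chrom, start, end, period) and (gene, start, end).
repeats = [
    ("chr5", 140000000, 140000030, 3),
    ("chr5", 150000000, 150000040, 2),  # period 2, filtered out
    ("chr5", 160000010, 160000025, 3),
    ("chr5", 100000000, 100000030, 3),  # outside the region
]
exons = [
    ("geneA", 139999900, 140000010),
    ("geneB", 150000000, 150000100),
    ("geneC", 160000000, 160000100),
    ("geneC", 160000020, 160000200),
]

print(genes_with_triple_repeats(repeats, exons))
```

Collecting names in a set matters when one gene has several exons or several repeats, since counting overlaps directly would count such a gene more than once.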
-
S1: Click "Gene Sorter" from the UCSC home page. Click "configure" to select what
to display.
S2: Select "Name", "GNF Atlas 2", "E-Value", "%ID", and "Description", then click the
"Submit" button.
S3: Click on the "filter" button to set the minimum %ID at 35 for Blastp, and click the
"submit" button.
S4: Select "Protein Homology" for sorting the output list.
S5: Enter "BRCA1" as the query gene. Click "Go!".
-
S1: Go to the NCBI web site (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Nucleotide)
and search for "Mus musculus Abca1".
S2: Click on "NM_013454" to get to the mouse Abca1 cDNA sequence. Change "default"
to "FASTA" and click "display" to go to the sequence page.
S3: Copy the entire cDNA sequence and paste it to the query data field of the gene trap
blast server (http://www.sanger.ac.uk/cgi-bin/blast/submitblast/genetrap). Click "Start
Blast".
S4: Click "retrieve" to get the blast result. You should see that only one ES cell (AC0693)
insertion has 100% sequence match with the sequence submitted.
S5: Click "Tag Report" to get the cell line information. It was constructed by the Sanger group.
The button "Request this cell line" is located at the bottom of this page.
S6: The sequence generated from the insertion site of the gene trap vector matches the
mouse genomic sequence at "4:52294141-52294309" (see the Map Location). Click
on the genome coordinates to go to the "ContigView" of this sequence on the Ensembl web
site. In the "Detailed view" section, you should see that the AC0693 sequence extends from
an exon into an intron.
S7: Click on the bar showing "Abca1 Ensembl known trans" to get to the "Ensembl Gene
Report" page. Click the button "Exon information" to see all the exon-intron boundaries
of this gene. The AC0693 insertion sequence coordinates "52294141-52294309" span
part of exon 37.