Day 3 Exercises

http://gsd.lbl.gov/vista

Vista Browser

Vista Browser allows users to interactively visualize a variety of whole genome alignments and quickly identify highly conserved regions.

1. Find the maximum percent conservation identity for which all of the exons on the LDLR gene are conserved between Human (July 2003 assembly), Mouse, Rat, and Lemur. Retrieve the coordinates of the conserved regions.

Hint: right-click on the curves to change parameters. You can select each curve and use the "I" button to get detailed information regarding the alignment.

Answer: All the exons are conserved at 60% minimum conservation identity.

2. Identify the human coordinates of the non-coding regions in the HOXA3 gene that are conserved between Human (July 2003) and Chicken. Find the coordinates of the chicken genome region that aligns to human HOXA3 (hint: use the change base genome function).

Answer: chr7:26892652-26892841, chr7:26893662-26893821, chr7:26895304-26895530, chr7:26898569-26898713, chr7:26905975-26906095, chr7:26906158-26906299, chr7:26906492-26906674.

The chicken region is chr2:31769340-31785183

Bonus: Try also adding the Fugu alignment and adjusting Human/Mouse parameters so that the conserved regions match those of Human/Fugu.

rVISTA

rVista is a tool that predicts transcription factor binding sites by combining a TRANSFAC database search with comparative sequence analysis.

We will now perform rVISTA analysis on the HOXA3 alignment to find predicted transcription factor binding sites. Note that this is not the only way to use rVISTA: in addition to using it through this page, you can submit to rVISTA directly by going to the main Vista site and submitting an existing alignment, or you can align two sequences with the main Vista program (mVista) and automatically submit to rVISTA from there. Please remember that rVISTA has a 20 kb limit on the length of aligned sequences.

3. Submit the human-chicken HOXA3 alignment to rVista, and look for HOXA3 transcription factor binding sites. Find the coordinates of the first three predicted conserved transcription factor binding sites that correspond to the first CNS on the Vista graph.

Hint: click on "summary of data" to get detailed information about the rVISTA predictions

Answer: chr7:26892697-26892705, chr7:26892700-26892708, chr7:26892706-26892714

GenomeVISTA Exercises

GenomeVista is a tool that allows biomedical researchers to align their own sequences or NCBI contigs to a number of genomes. The results can be browsed just like the whole genome alignments, using the Vista Browser.

4. Locate the human genome coordinates to which the 6 ordered pieces of Didelphis virginiana clone LB3-274H11 align. Get a list of all the conserved regions and their alignments.

Hint: Look up the clone on GenBank to find its GenBank accession number.

Answer: contig1: chr5:88169426-88195449
contig2: chr5:88204198-88215370
contig3: not aligned
contig4: chr5:88259722-88263380
contig5: chr5:88263600-88272851
contig6: chr5:88281117-88285691

mVISTA exercises

mVista is a tool for aligning two or more sequences to each other (as opposed to aligning them to a whole genome, as in the previous exercise). This is helpful when investigators want to analyze sequences they have generated themselves.

A help file and instructions for using mVista are available at http://www-gsd.lbl.gov/vista/mvista/instructions.shtml.

5. Get all the conserved cow exon sequences from a three-way human-pig-cow mVista alignment of the sequences found at http://pga.lbl.gov/Workshop/mvista_files

Hint: Use ExtractSeq from Day 1 exercises.

STEP-BY-STEP

1.1 Go to http://gsd.lbl.gov/vista . Click on the browser link located in the light blue line at the top of the page. Make sure "Human July 2003" is selected in the base genome box, and enter "LDLR" in the position box. A new window will open with several matches to this gene name. Click on the first match. Vista Browser will load the LDLR alignment.
Note: downloading the applet may take a while - be patient. If you experience any difficulties, ask one of the lab assistants to help you.

1.2 Identify the strand of the LDLR gene, and the exons and UTRs (they are marked on the annotation track above the curve, and colored according to the color legend in the lower left-hand corner). Are all the exons and UTRs conserved?

1.3 Right now you are only looking at Human-Mouse and Human-Rat alignments. You can use the second drop-down menu on the left to add the Human-Lemur alignment.

1.4 Since this is not a very well-conserved gene, we might want to adjust the parameters to require a lower amount of conservation for a region to be considered conserved. To do this, right-click on the curve that you want to adjust and select "Parameters." An explanation of the parameters is available at http://pipeline.lbl.gov/vgb2help.shtml . In this case, you will probably want to try lowering the "Cons Identity" value. Experiment with parameter values until all the exons are marked as conserved.
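
To make the idea of an identity threshold concrete, here is a minimal Python sketch of a sliding-window percent identity over a gapped pairwise alignment. It is only an illustration and not Vista's actual implementation: the function names, the toy sequences, the window size, and the rule that an exon counts as conserved when at least one overlapping window passes the threshold are all assumptions made for this example.

```python
def window_identity(seq_a, seq_b, window):
    """Percent identity of every full window along a gapped pairwise alignment."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    # 1 where the aligned characters match and are not gaps, otherwise 0
    matches = [int(a == b and a != "-")
               for a, b in zip(seq_a.upper(), seq_b.upper())]
    if len(matches) < window:
        return []
    current = sum(matches[:window])
    scores = [100.0 * current / window]
    for i in range(window, len(matches)):
        # slide the window one column to the right
        current += matches[i] - matches[i - window]
        scores.append(100.0 * current / window)
    return scores


def max_threshold_for_all(scores, window, exons):
    """Highest threshold at which every exon overlaps a passing window.

    exons: (start, end) pairs in 0-based, half-open alignment columns.
    """
    best_per_exon = []
    for start, end in exons:
        # window i covers columns [i, i + window); keep those overlapping the exon
        first = max(0, start - window + 1)
        last = min(len(scores), end)
        best_per_exon.append(max(scores[first:last]))
    return min(best_per_exon)


# Toy alignment and hypothetical exon positions, for illustration only.
scores = window_identity("ACGTACGTAC", "ACGTTCGAAC", window=4)
print(scores)
print(max_threshold_for_all(scores, 4, [(0, 4), (4, 8)]))
```

Under these assumptions, lowering the threshold to the value returned by max_threshold_for_all is the point at which the last exon becomes marked as conserved, which mirrors what you are doing by hand in the "Parameters" dialog.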


2.1 In the position box of the Vista Browser, enter "HOXA3" and click "Go." Three matches will come up - double-click the last match (the other two matches are alternative splicings of the gene, which cover only a part of the region we want). Identify the strand of the HOXA3 gene, its exons and UTRs.

2.2 When looking at a highly conserved gene, it is sometimes useful to gain some evolutionary distance in order to identify the most persistently conserved regions. Add the chicken alignment to the display (use the second drop-down menu on the left, or the "+" button). Identify regions that are highly conserved in all 4 species (human, mouse, rat, and chicken).

You will notice that some of the highly conserved sequences are non-coding (salmon-colored). Those areas might seem like good candidates for further analysis.

2.3 Click on the third (human-chicken) curve to select it. Now click on the "I" button ("alignment details") in the toolbar at the top of the screen. A new browser window called "Text Browser" will open with detailed information regarding the segment of the human-chicken alignment you were looking at.

2.4 In this window, you can see detailed information about the aligned regions, including their genomic coordinates. The coordinates of the Chicken region that aligned to human can be found in the second column.

2.5 Click on the "Conserved Regions: human-chicken" link. The CNS coordinates are those marked as non-coding. Note that clicking on the links on this page will give you alignments of the conserved regions.

3.1 You should still have the TextBrowser window open for the Human-Chicken HOXA3 alignment (if not, bring up the alignment again in the browser and click the "I" button).

3.2 Click on the rVista link in the "Alignment" column. Enter your email as prompted. You have now started the rVista submission process. The default values filled in on the next screen are sufficient for our purposes. If you wish to learn about these options, a description is available at http://gsd.lbl.gov/vista/rvista/instructions.shtml#options .

3.3 Click on Submit to go to a list of possible transcription factor binding sites for which to check. There is a large number of matrices here; find the box labeled "HOXA3" and check it. Click Submit. Within a few moments you should get an email with a link to a webpage that contains your results.

3.4 The various visualization options are described at http://gsd.lbl.gov/vista/rvista/instructions.shtml#out . Check the "conserved," "aligned," and "all" boxes in the "Binding sites to visualize" column. Click Submit to look at the predicted transcription factor binding sites (shown as tick marks above a regular Vista curve). The conserved predicted sites are shown in green.

Note that the first CNS is somewhere between 4500 and 5000 bp from the start of the region.

3.5 Click on "summary of data" at the bottom of the page.

3.6 Click on "aligned and conserved"

3.7 Write down the H_POS values of the first three entries in the table. These coordinates are not genomic - they show the distance from the start of the submitted sequence.

3.8 To get genomic coordinates, look at the TextBrowser once again to find the start coordinate of the HOXA3 region on human. The genomic coordinates of the predicted transcription factor binding sites can be found by adding their sequence coordinates to the start of the HOXA3 region.
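
The conversion in steps 3.7 and 3.8 is a simple offset, sketched here in Python. The function name, the region start, and the site positions are hypothetical placeholders and are not values from the exercise; substitute the start coordinate you read from the TextBrowser and the H_POS values from the rVista table.

```python
def to_genomic(chrom, region_start, site_start, site_end):
    """Shift sequence-relative positions by the genomic start of the region."""
    return f"{chrom}:{region_start + site_start}-{region_start + site_end}"


# Hypothetical values, for illustration only (not taken from the exercise).
region_start = 1_000_000
sites = [(4501, 4509), (4504, 4512)]

for site_start, site_end in sites:
    print(to_genomic("chr7", region_start, site_start, site_end))
```

This follows the plain addition described above. Depending on whether the positions in the table count from 0 or from 1, the result may be off by one base, so it is worth checking one converted site against a feature whose genomic coordinates you already know.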

4.1 Go to http://www.ncbi.nlm.nih.gov/ . In the drop-down selection box, choose "Nucleotide." In the search box, enter "Didelphis virginiana clone LB3-274H11" (or just LB3-274H11) and click Go.

4.2 Write down the accession number for this clone.

4.3 Go to http://gsd.lbl.gov/vista and click on "GenomeVista."

4.4 In the "GenBank" field, enter the accession number.

4.5 Make sure "Human July 2003" is selected as the base genome. In the "advanced options" enter your email, and name your request - "opossum" will do. Click "Submit."

4.6 In a few moments you will receive an e-mail with a link to your processed results. Clicking on the link will take you to a page that summarizes all the matches of the submitted sequence to the base genome (only one in this case). Click on "Vista Browser" to see the alignment (human-mouse and human-rat alignments will be shown under your alignment in order to assist your analysis).

4.7 Select the human-opossum curve and click the "I" button.

4.8 You can figure out which contig is which by looking at their names.

5.1 Go to http://pga.lbl.gov/Workshop/mvista_files and save all the files to your hard drive.

5.2 Go to http://gsd.lbl.gov/vista and click on mVista.

5.3 You will be submitting 3 sequences (human, cow, pig). Enter 3 and click "Submit."

5.4 Fill out the submission form. Enter your email, then input human.fasta as the base sequence, pig.fasta as the second sequence, and cow.fasta as the third. In the "Options" section enter human.anno as the annotation file. You can also enter the names of the organisms (human, pig, and cow, in that order).

5.5 Note that if you wished, you could get an rVista analysis for your sequences here by clicking the rVista checkbox. We will not be doing that in this exercise.

5.6 Click submit. You will shortly receive an email notifying you that the Vista job has completed. Click on the email link to see your results. From this page you can download the pdf Vista graph, the alignment, and the coordinates of the conserved sequences.

5.7 Since we are interested in the human-cow conserved regions, download the "seq1_seq2.regions.txt" file.

5.8 Find all the lines that list conserved regions within a human exon. The numbers listed in parentheses are coordinates on the cow sequence that was submitted. Given these coordinates and the cow sequence you downloaded earlier, use the Day 1 instructions to extract the conserved exon DNA.
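
If you would rather script the extraction than use ExtractSeq, the following Python sketch shows the idea. It is a stand-in written for this handout, not the Day 1 tool: the function names and the toy sequence are hypothetical, and it assumes the coordinates are 1-based and inclusive, which you should confirm against the regions file.

```python
def read_fasta(text):
    """Return the sequence of the first record in FASTA-formatted text."""
    parts = []
    seen_header = False
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seen_header:
                break  # stop at the start of a second record
            seen_header = True
            continue
        parts.append(line)
    return "".join(parts)


def extract(seq, start, end):
    """Return seq[start..end], assuming 1-based inclusive coordinates."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError(f"coordinates {start}-{end} fall outside the sequence")
    return seq[start - 1:end]


# Toy record, for illustration only; in the exercise, read cow.fasta instead.
toy = ">cow\nACGTAC\nGTTTGA\n"
sequence = read_fasta(toy)
print(extract(sequence, 3, 8))
```

To apply this to the exercise, read the contents of cow.fasta into a string, pass it to read_fasta, and call extract once for each pair of cow coordinates you noted in step 5.8.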
