As you can see, there are three ways to find genes in the database. You can find a gene by name, if you know it; you can browse the alphabetical listing of all the genes in the database; or you can look at genes associated with different groups of diseases.
A) Let us first look at a gene we have already partly examined in Part 1. In the search box, enter "APOE" and click on "find". The website will give you a table with one entry -- the Apolipoprotein E gene. Click on the link under the heading "Gene Name" to find detailed information about this gene.
Click on the Gene ID link (107741) to find information about this gene in the OMIM database (notice that it will open in a new window).
What is the function of the gene?
What disorders are associated with the gene?
Switch back to the previous window and click on the GenBank accession numbers for the human and mouse DNA. These links also open in a new window, and provide you with publications about the gene, a short summary of information concerning the gene, and the DNA sequence as well as the translated protein sequence of the gene itself.
Go back to the previous window again. Open the "annotation" link in a new window to get a detailed annotation of the APOE cluster. Here you will see a simplified graphical representation of the locations of the genes in the cluster, as well as a list of these genes and the genes to which they are identical or similar.
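To see how a protein sequence like the one shown on the GenBank page follows from the DNA sequence, here is a minimal Python sketch of translation with the standard genetic code. The function name `translate` and the input sequence are made up for illustration (the sequence is a toy example, not copied from the GenBank record), and the sketch assumes the reading frame starts at the first base.

```python
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a coding DNA sequence, stopping at the first stop codon.

    Assumes the reading frame starts at position 0 (hypothetical helper).
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        # Codons with ambiguous bases (e.g. N) are reported as X.
        aa = CODON_TABLE.get(dna[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


toy_cds = "ATGAAGGTTCTGTGGTAA"  # toy sequence for illustration only
print(translate(toy_cds))
```

You could paste the first few dozen bases of a coding sequence from GenBank into `toy_cds` and compare the result with the translation shown in the record.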
Click on the "TOM40" link to see the DNA sequence and the protein sequence.
Click on the "Back to Table" link, then click on the gi|5174723 link (opposite TOM40). This takes you to the NCBI sequence viewer, as we saw with the human and mouse accession numbers. Similar information can be retrieved.
When you are done perusing this information, click on the "back" button to go back to the annotation page. Scroll up to see the annotation diagram. The blue rectangles representing the genes are clickable. Click on the blue rectangle under "TOM40". This takes you to a page which provides detailed information concerning gene prediction and annotation in AceDB format.
Go back to the previous window. By clicking on the human_mouse alignment link, you can access VISTA alignment plots. Go ahead and click on any of the linked intervals to get a VISTA plot. You will notice that the graph is clickable. Clicking on a region of the graph brings up the alignment of the corresponding 400 basepair-long region. At the bottom of the alignment, there are links that allow you to move forward and backward in the alignment. Experiment with this feature. Try clicking on peaks in the VISTA graph. Can you see the high correspondence between the two sequences?
Try the smaller peaks, as well as the regions where no homology is shown. Notice the differences.
What do you find is the reason for the low homology in most cases?
Explore this feature further by using the "next/previous 400 basepairs" links at the bottom of the page. Notice also that you can get the human/mouse sequences you are looking at by clicking the appropriate links -- a feature useful if you are designing primers, for example.
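The height of a peak in a plot of this kind reflects how well the two sequences match over a stretch of the alignment. The Python sketch below shows the general idea with a sliding window over two aligned sequences. The function name, the toy sequences, and the window and step sizes are assumptions chosen for illustration; this is not the code or the parameters the VISTA server uses.

```python
def window_identity(seq_a, seq_b, window=100, step=1):
    """Return (start, percent_identity) for each window over two aligned sequences.

    Gap columns ("-") never count as matches (hypothetical helper).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    results = []
    for start in range(0, len(seq_a) - window + 1, step):
        pairs = zip(seq_a[start:start + window], seq_b[start:start + window])
        matches = sum(1 for a, b in pairs if a == b and a != "-")
        results.append((start, 100.0 * matches / window))
    return results


# Toy aligned sequences (made up, 20 columns each).
human = "ACGTACGTAC--GTTTGACA"
mouse = "ACGTACGAACTTGTT-GACC"

for start, pct in window_identity(human, mouse, window=10, step=5):
    print(start, pct)
# prints:
# 0 90.0
# 5 70.0
# 10 60.0
```

Windows that contain mismatches or gaps score lower, which is one way a dip between two peaks can arise.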
Go back to the page with all the links again. Click on the last unvisited one -- the list of conserved regions. This provides you with exact data that may not be as clear on the VISTA graph. Exact boundaries and percentages of conservation are given, as well as the alignments (available by clicking the links). Experiment with this feature.
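A list of conserved regions with boundaries and percentages can be derived from an alignment in several ways. The Python sketch below shows one simple approach: mark every column covered by a window whose identity meets a threshold, then report each contiguous marked stretch. The function name, the toy sequences, the window size, and the threshold are all assumptions for illustration, and the website may define conserved regions differently.

```python
def conserved_regions(seq_a, seq_b, window=10, min_identity=80.0):
    """Return (start, end, percent_identity) for conserved stretches.

    A column is kept if it lies inside at least one window whose identity
    is >= min_identity. Coordinates are 0-based alignment columns, end
    exclusive (hypothetical helper).
    """
    n = len(seq_a)
    if n != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    match = [a == b and a != "-" for a, b in zip(seq_a, seq_b)]

    # Mark columns covered by any window that passes the threshold.
    covered = [False] * n
    for start in range(n - window + 1):
        if 100.0 * sum(match[start:start + window]) / window >= min_identity:
            for i in range(start, start + window):
                covered[i] = True

    # Collect contiguous runs of covered columns.
    regions = []
    i = 0
    while i < n:
        if covered[i]:
            j = i
            while j < n and covered[j]:
                j += 1
            pct = 100.0 * sum(match[i:j]) / (j - i)
            regions.append((i, j, round(pct, 1)))
            i = j
        else:
            i += 1
    return regions


# Toy aligned sequences (made up, 20 columns each).
human = "ACGTACGTAC--GTTTGACA"
mouse = "ACGTACGAACTTGTT-GACC"
print(conserved_regions(human, mouse, window=10, min_identity=80.0))
```

Lowering `min_identity` or shortening `window` makes the regions longer, which is worth keeping in mind when comparing the listed boundaries with what you see on the graph.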
B) Go back to the CVCGD main page. Look for a gene called MEF2C. Click on the OMIM link to find out about this gene. View the VISTA graph for the whole sequence.
What kind of protein does the gene code for?
There are two distinct functions for the MEF2 family -- what are they?
What are the functions of MEF2C?
Go back to the CVCGD window and look at the VISTA plot for the gene.
Where are the areas of high homology?
Low homology?
Do you think the mouse and the human sequences are closely related?
Check the alignment by clicking on interesting regions on the graph. Was your hypothesis correct?
Go back one page and select smaller intervals if you need to see more detail.
Identify conserved non-coding sequences.
Identify UTRs.
Identify the exons.
Which of these areas belong to the gene?
Where are the areas of high homology outside of the gene?
What could be the reason for this high homology?
C) Now, look for the Tumor Necrosis Factor genes.
Why are C-genes displayed separately from the alpha and the beta?
Look at the VISTA plots. What is different about this plot when compared to the MEF2C we saw previously?
Identify the regions that belong to genes (which genes?).
Why do some genes appear to overlap?
Identify the CNSs, exons, and introns.
Compare the sections of alignment at the very peaks of the graph to ones where the homology isn't quite as high.
Identify gaps.
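If you copy a stretch of the alignment as text, gaps can also be located programmatically. The Python sketch below is a minimal example that assumes gaps are written as "-" characters; the function name and the toy sequences are made up for illustration.

```python
import re


def gap_runs(aligned):
    """Return (start, end) for each run of gap characters in an aligned sequence.

    Coordinates are 0-based alignment columns, end exclusive (hypothetical helper).
    """
    return [(m.start(), m.end()) for m in re.finditer(r"-+", aligned)]


# Toy aligned sequences (made up, 20 columns each).
human = "ACGTACGTAC--GTTTGACA"
mouse = "ACGTACGAACTTGTT-GACC"

print("gaps in human row:", gap_runs(human))  # prints [(10, 12)]
print("gaps in mouse row:", gap_runs(mouse))  # prints [(15, 16)]
```

A gap in the human row means the mouse sequence has extra bases at that position, and vice versa.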
D) Go back to the CVCGD main page. Try clicking on the various groups of diseases to get the corresponding genes. For the genes that are available, look at the annotations and alignments. In particular, look at the ATP-binding cassette transporter gene and the unpublished genes like PTK-7 and TOM-40. How could this be useful to a researcher?