First, some background information to make the story "biologically relevant." Picture yourself working with a photosynthetic bacterium. You have isolated deletion mutants missing a periplasmic electron carrier, cytochrome c2. These mutants (to no one's surprise) cannot grow photosynthetically. They are, however, still wonderfully capable of growth on a variety of substrates under aerobic conditions.
When you plate 10^8 cells under conditions demanding photosynthetic growth, you find "suppressor" mutations occur that allow the cells to regain photosynthetic function EVEN though the gene encoding cytochrome c2 has been deleted. You do some biochemistry on these "suppressors" and find they appear to contain high levels of a "new" cytochrome that you believe substitutes functionally for cytochrome c2 in photosynthetic electron transfer. You name this new protein "isocytochrome c2" and postulate that the protein has a similar structure to the wild-type cytochrome c2 protein.
You then clone the gene for isocytochrome c2 and set out to do some alignments to confirm your hypothesis.
Log in to the Biology Workbench at http://workbench.sdsc.edu/ and create a session named "cytochromes."
Using Protein Tools, import the amino acid sequence (from Swiss-Prot) for the Rhodobacter sphaeroides cytochrome c2 protein. Also import the amino acid sequence for the Rhodobacter sphaeroides isocytochrome c2 protein.
Make sure you have only two protein sequences in your session. Delete any others. To be sure that you have the correct sequences, check the length of the proteins. The (precursor) isocytochrome c2 protein is 144 AA's long and the cytochrome c2 protein is 145 AA's long.
Carefully read the header information for each of these two files. To do this, you will need to use the tool "View database records for imported sequences."
Perform an alignment by using the program BL2SEQ. Before you "run" the alignment, examine the default parameters used and answer the following questions.
Question 1. What is the default scoring matrix for this search? What is the penalty for opening a gap? What is the penalty for extending a gap (per skipped AA)?
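For orientation, the two gap parameters in Question 1 are normally combined into a single "affine" gap penalty. The sketch below assumes one common convention, in which a gap of k residues costs the opening penalty plus k times the extension penalty; some programs instead charge the opening penalty for the first residue and the extension penalty only for each additional one. The function name and the numeric values are hypothetical and are not the Workbench defaults, which you should read from the BL2SEQ form.

```python
def affine_gap_cost(gap_length, gap_open, gap_extend):
    """Cost of one gap, assuming the convention open + length * extend."""
    if gap_length <= 0:
        return 0
    return gap_open + gap_length * gap_extend


# Illustrative values only (not the BL2SEQ or ALIGN defaults).
one_long_gap = affine_gap_cost(8, gap_open=10, gap_extend=1)
eight_short_gaps = 8 * affine_gap_cost(1, gap_open=10, gap_extend=1)

print("one gap of 8 residues:", one_long_gap)
print("eight gaps of 1 residue:", eight_short_gaps)
```

Under this scheme, one long gap is much cheaper than many short ones that skip the same number of residues, which is why the opening and extension penalties are reported separately.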
Submit the alignment. Examine the output (you may wish to print it) and answer the following questions.
Question 2. Does the program align all amino acids in both proteins? Which specific residues were aligned in each protein? (e.g., 22-104 in cytochrome c2, with 12-94 in isocytochrome c2).
Question 3. How many of the residues were identical in each protein? How many were "positives?" HINT: These are either identical residues or conservative substitutions.
Question 4. Examine the alignment carefully. In how many regions of the alignment were gaps introduced? (Count them by hand; the "Gaps =" output above the alignment counts the total number of residues gapped, not the number of gaps.)
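If you would like to check your hand counts for Questions 3 and 4, the distinction between gapped residues and gap regions can be sketched in a few lines of Python. This is a minimal sketch that assumes you have copied the two aligned rows into equal-length strings with "-" marking gap positions. The function name and the toy sequences are made up for illustration and are not the cytochrome sequences. Counting "positives" would additionally need the scoring matrix, so it is left out here.

```python
import re


def alignment_stats(row_a, row_b):
    """Summarize two aligned rows of equal length ('-' marks a gap)."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    # Identical residues: same letter in both rows, ignoring gap columns.
    identities = sum(1 for a, b in zip(row_a, row_b) if a == b and a != "-")
    # Total number of gap characters (what a "Gaps =" style total reports).
    gapped_residues = row_a.count("-") + row_b.count("-")
    # Number of separate gap regions: each run of '-' counts once.
    gap_regions = len(re.findall(r"-+", row_a)) + len(re.findall(r"-+", row_b))
    return {
        "aligned_length": len(row_a),
        "identities": identities,
        "gapped_residues": gapped_residues,
        "gap_regions": gap_regions,
    }


# Toy rows for illustration only.
stats = alignment_stats("MKT--AYIAKQR", "MKSLLAY---QR")
print(stats)
```

Here five residues are gapped in total, but they fall in only two regions, which is the number Question 4 asks for.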
When you perform an alignment in the Biology Workbench, the alignment output is automatically stored under "Alignment Tools." Go to Alignment Tools and select the alignment you have just performed.
Select "TEXTSHADE" as an alignment tool and then "Run" and then "Submit" to view your sequence alignment in color.
Question 5. What color are the residues that are "similar" when your alignment is displayed with TEXTSHADE? What color are residues that are identical?
Return to "Protein Tools." Run another alignment, but this time use the algorithm "ALIGN." For consistency, change the default scoring matrix to BLOSUM62. This is the scoring matrix you used in the first search with the BL2SEQ algorithm. Examine the output (you may wish to print it). Compare this alignment output to the one you generated with BL2SEQ.
Question 6. How much of the two sequences was aligned using ALIGN? That is, were all AA's used in the alignment? What do you think the term "Global Alignment" means?
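To see how the two kinds of alignment can score the same pair of sequences differently, here is a small dynamic-programming sketch. It is not the code used by ALIGN or BL2SEQ: it assumes a toy scoring scheme (match +2, mismatch -1, a flat -2 per gapped residue) instead of BLOSUM62 with separate opening and extension penalties, and it returns only the best score, not the alignment itself. With `local=False` every residue of both sequences must be accounted for; with `local=True` the score may restart at zero, so only the best-matching stretch counts.

```python
def align_score(seq_a, seq_b, local, match=2, mismatch=-1, gap=-2):
    """Best alignment score under a toy scheme with a linear gap penalty."""
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # End-to-end alignment: leading gaps are penalized.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            cell = max(
                score[i - 1][j - 1] + pair,  # align the two residues
                score[i - 1][j] + gap,       # gap in seq_b
                score[i][j - 1] + gap,       # gap in seq_a
            )
            if local:
                cell = max(cell, 0)          # a local alignment may start anywhere
                best = max(best, cell)
            score[i][j] = cell
    return best if local else score[-1][-1]


# Toy sequences: the short one matches a stretch inside the long one.
end_to_end = align_score("TTTTGCATCC", "GCAT", local=False)
best_stretch = align_score("TTTTGCATCC", "GCAT", local=True)
print("end-to-end score:", end_to_end)
print("best-stretch score:", best_stretch)
```

The end-to-end score is dragged down by the residues that have no partner, while the best-stretch score ignores them. Keep this in mind when you compare how much of each protein ALIGN and BL2SEQ report.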
Question 7. Are the gap penalties the same in ALIGN as they were in BL2SEQ (when both used BLOSUM62 as a scoring matrix)? Which program has the stiffer penalty?
Question 8. How are identical residues indicated in the output when using ALIGN? Which algorithm (ALIGN or BL2SEQ) gave the higher percentage identity between the two proteins (when both used BLOSUM62 as a scoring matrix)? What is the percentage identity between these two proteins using each algorithm (ALIGN and BL2SEQ)?
Question 9. In your opinion, which algorithm gave the most relevant alignment? Why?
Now log into the NCBI server below to Blast 2 Sequences: http://www.ncbi.nlm.nih.gov/blast/bl2seq/bl2.html
Retrieve the DNA sequence files corresponding to the two cytochromes listed above and align the DNA sequences using the default settings. Use the accession number M14501 for the cytochrome c2 DNA sequence and L02104 for the isocytochrome c2 DNA sequence. NOTE: Align only the coding sequences from each file.
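If you prefer to trim the sequences yourself before pasting them in, the idea of "coding sequence only" can be sketched as below. This is a minimal sketch assuming you have read the CDS start and end positions from the feature table of each record; such coordinates are 1-based and inclusive. The function name, the toy record, and the coordinates are hypothetical and are not taken from M14501 or L02104.

```python
def extract_cds(record_sequence, start, end):
    """Return the coding region for 1-based, inclusive CDS coordinates."""
    if not (1 <= start <= end <= len(record_sequence)):
        raise ValueError("CDS coordinates fall outside the record")
    # Python slices are 0-based and end-exclusive, hence start - 1.
    return record_sequence[start - 1:end]


# Toy record: lowercase flanking sequence around an uppercase reading frame.
toy_record = "ccgaATGAAATTTTAAggc"
cds = extract_cds(toy_record, 5, 16)
print(cds, len(cds))
```

A quick sanity check is that the extracted length is a multiple of three and that the region begins with a start codon and ends with a stop codon.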
Question 10. What was the result? Why do you think you got this result given the previous results you got when comparing the protein sequences?
Return to the Biology Workbench. Run a Blast search of the Swiss-Prot database to find the best matches to the isocytochrome c2 protein sequence.
Question 11. From what organism does the best-aligning sequence come?
Import the best matching sequence into your session. Run a global alignment of the imported sequence against the isocytochrome c2 protein sequence using ALIGN. Carefully examine the output (you may wish to print it).
Question 12. Why do you believe the first 21 residues of the isocytochrome c2 protein sequence do not align with anything in the CY2_AGRTC sequence? HINT: There is a biologically relevant answer to this question. Carefully examine the DNA sequence file for the isocytochrome c2 protein. What do the first 21 AA's encode?
Return to the ALIGN output. Compare this alignment to the ALIGN output between the R. sphaeroides isocytochrome c2 protein and the cytochrome c2 protein (you previously printed it).
Question 13. The optimal global alignment between the R. sphaeroides isocytochrome c2 and cytochrome c2 protein required two large gaps (7 and 8 residues) in the isocytochrome c2 sequence. Does the alignment between the isocytochrome c2 protein sequence and the CY2_AGRTC sequence require gaps at similar positions?
Repeat steps 10 and 11 above to produce a global alignment between the best matching Swiss-Prot sequence to the R. sphaeroides cytochrome c2 protein (last time you used the isocytochrome c2 protein as a query in the Blast search). Make sure your "best match" is a different sequence than the query!
Question 14. Which sequence from the Swiss-Prot database scored highest in the Blast search using the R. sphaeroides cytochrome c2 protein sequence as a query? From what organism is this sequence derived?
Question 15. Does the alignment between the cytochrome c2 protein sequence and the C551_ERYSP sequence require gaps at positions similar to those introduced in the isocytochrome c2 protein (when aligned with cytochrome c2)? Speculate on where within the 3-D structure of these small, globular proteins the AA's corresponding to those missing in the gapped regions are. Do you believe these residues would be buried within the protein's globular structure or on the surface near the aqueous environment? Why?
You also have a "homolog" cytochrome in the mitochondria of your cells. The Swiss-Prot protein sequence file is CYC_HUMAN. Import this file and perform the alignments required to answer the next question.
Question 16. Does your cytochrome c have these two loops of AA's (like the R. sphaeroides cytochrome c2 protein), or does it lack these loops (like the R. sphaeroides isocytochrome c2 protein)? Print out any alignments you use to answer this question.
Now align the three sequences (CYC_HUMAN and the R. sphaeroides cytochrome c2 and isocytochrome c2 proteins) using the MSA Multiple Sequence Alignment (Sum-of-Pairs Criterion) program in the Biology Workbench. (Print this alignment).
Question 17. Does the multiple sequence alignment confirm or refute your answer to Question 16? Besides the lack of a signal sequence in the human cytochrome c, and the loops/gaps we have discussed, what feature stands out as being unique to the R. sphaeroides isocytochrome c2 protein?
Alignments can also be used to detect mutations.
Go to BLAST 2 Sequences on the NCBI Blast server.
Align the wild-type BRCA1 mRNA sequence (Accession number U14680.1) with a mutant version of the mRNA (Accession number U64805.1). NOTE: To make the alignment, simply enter the accession numbers and use the default settings. You do not have to paste in any sequence. Print out the alignment and examine the output.
Question 18. Given the results of your alignment, what type of mutation do you think has occurred to produce the mutant mRNA? (e.g., insertion, deletion, point mutation, etc.). Retrieve the mutant version of the BRCA1 mRNA and check your answer.
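The way an alignment exposes each type of mutation can be sketched in code. The example below assumes you have two equal-length aligned rows with "-" marking gaps, the wild type first. The function name and the toy sequences are invented for illustration and say nothing about the BRCA1 result. Each differing column is reported separately, so an insertion or deletion of several bases appears as a run of consecutive events.

```python
def classify_differences(wild_type_row, mutant_row):
    """List the differences between two aligned rows ('-' marks a gap)."""
    if len(wild_type_row) != len(mutant_row):
        raise ValueError("aligned rows must have the same length")
    events = []
    position = 0  # 1-based position in the ungapped wild-type sequence
    for wt, mut in zip(wild_type_row, mutant_row):
        if wt != "-":
            position += 1
        if wt == mut:
            continue
        if wt == "-":
            # Extra base in the mutant, placed after wild-type `position`.
            events.append(("insertion", position, mut))
        elif mut == "-":
            # Wild-type base at `position` is missing from the mutant.
            events.append(("deletion", position, wt))
        else:
            events.append(("substitution", position, wt + ">" + mut))
    return events


# Toy rows for illustration only.
events = classify_differences("ATGGC-TTAG", "ATGACCT-AG")
for event in events:
    print(event)
```

A gap in the wild-type row points to an insertion in the mutant, a gap in the mutant row points to a deletion, and a mismatched column with no gap points to a point mutation.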
You have heard recently from a friend that there is a newly discovered homolog for the human BRCA1 protein in the plant Arabidopsis thaliana. Your friend has submitted the sequence to the protein database using the accession number BAB03174. Use the NCBI Entrez search tool to retrieve the protein file.
Return to the Biology Workbench and create a session entitled "BRCA1." Go to "Protein Tools" and import the Swiss-Prot sequence for the human BRCA1 protein. The sequence file name is BRC1_HUMAN.
Using the "Add New Protein Sequence" tool in the Biology Workbench, paste the Arabidopsis thaliana unknown protein sequence into the "Sequence" field and label it "Arabidopsis thaliana unknown protein." When you hit "Save," the file will be loaded into your session. Using the ALIGN algorithm, produce a global alignment of these two proteins. Examine the output.
Question 19. What is the percentage identity between these two protein sequences? In your opinion, is this biologically significant or random?
Import the alignment. Using the alignment tool "TEXTSHADE" display the alignment.
Question 20. Do the regions of identity/similarity cluster in certain spots within the alignment or are they scattered throughout the alignment? Do you believe these proteins are "homologs" as your friend suggests?
Your friend has also indicated that the putative erythrocyte binding protein, EBL-1 from Plasmodium falciparum (accession number AAD33018) may also be a homolog. Paste this sequence into the Biology Workbench.
Question 21. Which two of the three putative homologs share the highest percentage identity?