More Lab Exercises

Using the Biology Workbench

 

              

First, some background information to make the story "biologically relevant." Picture yourself working with a photosynthetic bacterium. You have isolated deletion mutants missing a periplasmic electron carrier, cytochrome c2. These mutants (not to your surprise) cannot grow photosynthetically. They are, however, still wonderfully capable of growth on a variety of substrates under aerobic conditions.

 

When you plate 10^8 cells under conditions demanding photosynthetic growth, you find that "suppressor" mutations occur that allow the cells to regain photosynthetic function even though the gene encoding cytochrome c2 has been deleted. You do some biochemistry on these "suppressors" and find they appear to contain high levels of a "new" cytochrome that you believe substitutes functionally for cytochrome c2 in photosynthetic electron transfer. You name this new protein "isocytochrome c2" and postulate that the protein has a similar structure to the wild-type cytochrome c2 protein.

 

              You then clone the gene for isocytochrome c2 and set out to do some alignments to confirm your hypothesis.

 

Exercise 1:

 

Log in to the Biology Workbench at http://workbench.sdsc.edu/ and create a session named "cytochromes."

 

Using Protein Tools, import the amino acid sequence (from Swiss-Prot) for the Rhodobacter sphaeroides cytochrome c2 protein. Also import the amino acid sequence for the Rhodobacter sphaeroides  isocytochrome c2 protein.

               

Make sure you have only two protein sequences in your session. Delete any others. Important: Be sure that you have the correct sequences: check the length of the proteins. The (precursor) isocytochrome c2 protein is 144 AA's  long and the cytochrome c2 protein is 145 AA's long.

            

                 Carefully read the header information for each of these two files. To do this, you will need to use the tool "View database records for imported sequences."

                

Perform an alignment by using the program BL2SEQ. Before you "run" the alignment, examine the default  parameters used and answer the following questions.

 

Question 1. What is the default scoring matrix for this search? What is the penalty for opening a gap? What is the penalty for extending a gap (per skipped AA)?
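
To make the two gap penalties concrete, here is a minimal Python sketch of an "affine" gap cost, in which opening a gap carries a one-time charge and every skipped residue carries a smaller one. The function name and the numeric penalties are illustrative assumptions, not the Workbench defaults; read the real values from the BL2SEQ parameter form.

```python
def affine_gap_cost(gap_length, open_penalty, extend_penalty):
    """Cost of a single gap under an affine penalty scheme.

    Assumed convention: cost = open_penalty + extend_penalty * gap_length.
    Some programs charge the extension only from the second residue on,
    so check the documentation of the program you are using.
    """
    if gap_length < 0:
        raise ValueError("gap_length must be non-negative")
    if gap_length == 0:
        return 0
    return open_penalty + extend_penalty * gap_length


# Hypothetical penalties, chosen only for illustration.
OPEN, EXTEND = 10, 2

one_long_gap = affine_gap_cost(8, OPEN, EXTEND)        # 10 + 2*8 = 26
two_short_gaps = 2 * affine_gap_cost(4, OPEN, EXTEND)  # 2 * (10 + 2*4) = 36
print(one_long_gap, two_short_gaps)
```

With these made-up numbers, skipping eight residues in one stretch is cheaper than skipping the same eight residues in two separate stretches, which is why alignment programs tend to prefer a few long gaps over many short ones.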

 

                Submit the alignment. Examine the output (you may wish to save it in another window) and answer the following questions.

 

Question 2. Does the program align all amino acids in both proteins? Which specific residues were aligned in each protein? (For example, 22-104 in cytochrome c2 with 12-94 in isocytochrome c2.)

 

    

Question 3. How many of the residues were identical in each protein? How many were "positives"? HINT: Positives are either identical residues or conservative substitutions.
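
The distinction between identities and positives can be sketched in a few lines of Python. The set of conservative pairs below is a small hand-picked assumption for illustration; a real program counts a pair as "positive" when its score in the chosen scoring matrix is greater than zero. The two aligned fragments in the test call are made up and are not the cytochrome sequences.

```python
# Hand-picked conservative pairs, for illustration only. A real alignment
# program looks each pair up in its scoring matrix instead.
CONSERVATIVE_PAIRS = {
    frozenset("IL"), frozenset("IV"), frozenset("LV"),
    frozenset("KR"), frozenset("DE"), frozenset("FY"), frozenset("ST"),
}


def count_identities_and_positives(aligned_a, aligned_b):
    """Return (identities, positives, columns) for two aligned rows.

    '-' marks a gap. Positives include identities, as in BLAST-style output.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned rows must have the same length")
    identities = positives = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # a gapped column is neither identical nor positive
        if a == b:
            identities += 1
            positives += 1
        elif frozenset((a, b)) in CONSERVATIVE_PAIRS:
            positives += 1
    return identities, positives, len(aligned_a)


# Toy rows: 6 identical columns, 2 conservative columns (K/R and L/I),
# 2 non-conservative columns and 1 gapped column.
print(count_identities_and_positives("GDAEKGK-LVF", "GDVERGKQIAF"))
```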

 

Question 4. Examine the alignment carefully. In how many regions of the alignment were gaps introduced? (Count them by hand: the "Gaps =" output above the alignment counts the total number of residues gapped, not the number of gaps.)
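
The difference between the number of gap regions and the number of gapped residues is easy to see in code. The following Python sketch assumes an aligned row is given as a string with "-" for each gapped position; the function name and the example row are hypothetical.

```python
import re


def gap_summary(aligned_row):
    """Return (gap_regions, gapped_residues) for one row of an alignment.

    A gap region is an unbroken run of '-' characters; the gapped-residue
    count is the total number of '-' characters in the row.
    """
    runs = re.findall(r"-+", aligned_row)
    return len(runs), sum(len(run) for run in runs)


# Made-up row with two gap regions covering four positions in total.
print(gap_summary("ACD---EFG-HIK"))
```

A "Gaps =" figure of the second kind would report 4 for this row, while a count by hand of the regions gives 2.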

 

Return to "Protein Tools." Run another alignment, but this time use the algorithm "ALIGN" with the default scoring matrix, BLOSUM 50. Submit the alignment, then examine the output (you may wish to save it). Compare this alignment output to the one you generated with BL2SEQ. Select "Import Alignment." This "imports" the resulting alignment to the "Alignment Tools" section of the Workbench and automatically takes you there.

 Select "TEXSHADE" as an alignment tool and then "Run" and then "Submit" to view your sequence alignment in color.

 

              Question 5. What color are the residues that are "similar" when your alignment is displayed with TEXSHADE?  What color are residues that are identical?

 

Question 6. Return to the comparison of the two alignments, one from BL2SEQ, the other from ALIGN. How much of the two sequences was aligned using ALIGN? That is, were all AA's used in the alignment? What do you think the term "Global Alignment" means?
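
For readers who want to see the two alignment styles side by side, here is a minimal dynamic-programming sketch in Python. It is a simplification under stated assumptions: a flat match/mismatch score instead of a BLOSUM matrix and a linear gap penalty instead of separate open and extend penalties, so it is not how ALIGN or BL2SEQ are implemented. The function name, the scores, and the toy sequences are all hypothetical.

```python
def alignment_score(a, b, local=False, match=2, mismatch=-1, gap=-2):
    """Best alignment score of a and b by dynamic programming.

    local=False: every residue of both sequences must be accounted for
                 (Needleman-Wunsch style).
    local=True:  only the best-scoring sub-region is scored
                 (Smith-Waterman style); cells are never allowed below 0.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # Leading residues that pair with nothing are charged as gaps.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            cell = max(score[i - 1][j - 1] + pair,  # align the two residues
                       score[i - 1][j] + gap,       # gap in b
                       score[i][j - 1] + gap)       # gap in a
            if local:
                cell = max(cell, 0)
                best = max(best, cell)
            score[i][j] = cell
    return best if local else score[-1][-1]


long_seq, short_seq = "MKTLLAGDAEKG", "GDAEKG"
print(alignment_score(long_seq, short_seq, local=True))   # 12: six matches
print(alignment_score(long_seq, short_seq, local=False))  # 0: six matches, six gaps
```

In this toy case the local score ignores the six unmatched leading residues, while the other score pays a gap penalty for each of them.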

Question 7. Are the gap penalties the same in ALIGN as they were in BL2SEQ (when both used a BLOSUM scoring matrix)? Which alignment program has the stiffer penalty?

 

Question 8. How are identical residues indicated in the output when using ALIGN? Which algorithm (ALIGN or BL2SEQ) gave the higher percentage identity between the two proteins (when both used a BLOSUM scoring matrix)? What is the percentage identity between these two proteins using each algorithm (ALIGN and BL2SEQ)?

 

Question 9. In your opinion, which algorithm gave the more relevant alignment? Why?

 

Now go to the Blast 2 Sequences page on the NCBI server: http://www.ncbi.nlm.nih.gov/blast/bl2seq/bl2.html

 

              Retrieve the DNA sequence files corresponding to the two cytochromes listed above and align the DNA sequences using the default settings. Use the accession number 117780 for the cytochrome c2 DNA sequence and   986950 for the isocytochrome c2 DNA sequence. (NOTE: If you wish to be meticulous, align only the coding sequences from each file.)

 

              Question 10. What was the result? Why do you think you got this result given the previous results you got when comparing the protein sequences?
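
One effect worth keeping in mind when comparing DNA-level and protein-level results is that several codons can encode the same amino acid. The Python sketch below uses two made-up 18-base sequences and a small subset of the standard genetic code; the names and sequences are hypothetical and have nothing to do with the cytochrome genes.

```python
# Subset of the standard genetic code, just enough for the toy genes below.
CODON_TABLE = {
    "ATG": "M", "AAA": "K", "AAG": "K", "CTG": "L", "TTA": "L",
    "GCT": "A", "GCG": "A", "GAA": "E", "GAG": "E", "TCT": "S", "AGC": "S",
}


def translate(dna):
    """Translate an in-frame DNA string codon by codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def percent_identity(x, y):
    """Percentage of identical positions in two ungapped, equal-length strings."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    same = sum(1 for p, q in zip(x, y) if p == q)
    return 100.0 * same / len(x)


gene_1 = "ATGAAACTGGCTGAATCT"  # M K L A E S
gene_2 = "ATGAAGTTAGCGGAGAGC"  # M K L A E S, with different codons

print(translate(gene_1), translate(gene_2))
print(percent_identity(translate(gene_1), translate(gene_2)))  # protein level
print(round(percent_identity(gene_1, gene_2), 1))              # DNA level
```

The two toy genes encode the same six residues, yet only 10 of their 18 bases match.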

 

Return to the Biology Workbench. Run a Blast search of the Swiss-Prot database to find the best matches to the isocytochrome c2 protein sequence.

 

Question 11. From what organism was the best aligning sequence derived?

 

Import the best matching sequence into your session. Run a global alignment of the imported sequence against the isocytochrome c2 protein sequence using ALIGN. Carefully examine the output (you may wish to "import" it).

 

Question 12. Why do you believe the first 21 residues of the isocytochrome c2 protein sequence do not align with anything in the CY2_AGRTC sequence? HINT: There is a biologically relevant answer to this question. Carefully examine the DNA sequence file for the isocytochrome c2 protein. What do the first 21 AA's correspond to?

 

               Return to the ALIGN output. Compare this alignment to the ALIGN output between the R. sphaeroides   isocytochrome c2 protein and the cytochrome c2 protein (you previously saved it).

 

               Question 13. The optimal global alignment between the R. sphaeroides isocytochrome c2 and cytochrome c2 protein  required two large gaps (7 and 8 residues) in the isocytochrome c2 sequence. Does the alignment between the   isocytochrome c2 protein sequence and the CY2_AGRTC sequence require gaps at similar positions?

 

Repeat steps 10 and 11 above to produce a global alignment between the R. sphaeroides cytochrome c2 protein and its best-matching Swiss-Prot sequence (last time you used the isocytochrome c2 protein as the query in the Blast search). Make sure your "best match" is a different sequence from the query!

               

 Question 14. Which sequence from the Swiss-Prot database scored highest in the Blast search using the R. sphaeroides cytochrome c2 protein sequence as a query? From what organism is this sequence derived?

 

Question 15. Does the alignment between the cytochrome c2 protein sequence and the C551_ERYSP sequence require gaps at positions similar to those introduced in the isocytochrome c2 protein (when it was aligned with cytochrome c2)? Speculate on where, within the 3-D structure of these small, globular proteins, the AA's corresponding to those missing in the gapped regions are located. Do you believe these residues would be buried within the protein's globular structure or on the surface near the aqueous environment? Why?

 

You also have a "homolog" cytochrome in the mitochondria of your cells. The Swiss-Prot protein sequence file is  CYC_HUMAN. Import this file and perform the alignments required to answer the next question.

                 

Question 16. Does your cytochrome c have these two loops of AA's (like the R. sphaeroides cytochrome c2  protein), or does it lack these loops (like the R. sphaeroides isocytochrome c2 protein)? Save (import) any alignments you use to answer this question.

 

Now align the three sequences (CYC_HUMAN and the R. sphaeroides cytochrome c2 and isocytochrome   c2 proteins) using the MSA Multiple Sequence Alignment (Sum-of-Pairs Criterion) program in the Biology Workbench. (Import/save this alignment).

                   

Question 17. Does the multiple sequence alignment confirm or refute your answer to Question 16? Besides the lack of a signal sequence in the human cytochrome c, and the loops/gaps we have discussed, what feature stands out as being unique to the R. sphaeroides isocytochrome c2 protein?

               

 

Exercise 2:

 

Alignments can also be used to detect mutations.

Go to BLAST 2 Sequences on the NCBI Blast server:

http://www.ncbi.nlm.nih.gov/blast/bl2seq/bl2.html

 

                Align the wild-type BRCA1 mRNA sequence (Accession number U14680.1) with a mutant version of the  mRNA (Accession number U64805.1). NOTE: To make the alignment, simply enter the accession numbers and use the default settings. You do not have to paste in any sequence.  Save the alignment and examine the output.

 

Question 18. Given the results of your alignment, what type of mutation do you think occurred to produce the mutant mRNA (e.g., insertion, deletion, point mutation)? Retrieve the mutant version of the BRCA1 mRNA and check your answer.
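
Reading a mutation off a pairwise alignment can be sketched as a scan over the aligned columns. The Python function below is a minimal illustration that assumes the two aligned rows are given as equal-length strings with "-" for gaps; the function name, the output format, and the toy sequences are hypothetical and are not the BRCA1 records.

```python
def classify_differences(aligned_ref, aligned_mut):
    """List differences between a reference row and a mutant row.

    Returns tuples of (kind, reference_position, detail), with 1-based
    reference positions. For an insertion, the position is the last
    reference base before the inserted bases.
    """
    if len(aligned_ref) != len(aligned_mut):
        raise ValueError("aligned rows must have the same length")
    events = []
    i, ref_pos, n = 0, 0, len(aligned_ref)
    while i < n:
        r, m = aligned_ref[i], aligned_mut[i]
        if r == "-":
            # Gap in the reference row: the mutant carries extra bases.
            j = i
            while j < n and aligned_ref[j] == "-":
                j += 1
            events.append(("insertion", ref_pos, aligned_mut[i:j]))
            i = j
        elif m == "-":
            # Gap in the mutant row: reference bases are missing.
            j = i
            while j < n and aligned_mut[j] == "-":
                j += 1
            events.append(("deletion", ref_pos + 1, aligned_ref[i:j]))
            ref_pos += j - i
            i = j
        else:
            ref_pos += 1
            if r != m:
                events.append(("substitution", ref_pos, r + ">" + m))
            i += 1
    return events


# Made-up rows containing one point mutation, one insertion and one deletion.
for event in classify_differences("ATGGCT--GAAC", "ATGACTTTG--C"):
    print(event)
```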

 

You have heard recently from a friend that there is a newly discovered homolog of the human BRCA1 protein in the plant Arabidopsis thaliana. Your friend has submitted the sequence to the protein database under the accession number BAB03174. Use the NCBI Entrez search tool to retrieve the protein file, and select FASTA under "Display" to bring up its sequence in FASTA format, ready for pasting.
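
FASTA format is simply a header line beginning with ">" followed by one or more lines of sequence. As a minimal sketch, the Python function below splits FASTA text into headers and sequences; the function name and the example record are hypothetical, and the example sequence is not the BAB03174 protein.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Made-up record used only to show the layout.
example = """>example_protein hypothetical record
MKTAYIAKQR
QISFVKSHFS
"""
print(parse_fasta(example))
```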

 

Return to the Biology Workbench and create a session entitled "BRCA1." Go to "Protein Tools" and import the Swiss-Prot sequence for the human BRCA1 protein. The sequence file name is BRC1_HUMAN.

 

Using the "Add New Protein Sequence" tool in the Biology Workbench, paste the Arabidopsis thaliana unknown protein sequence into the "Sequence" field and label it "Arabidopsis thaliana unknown protein." When you hit "Save," the file will be loaded into your session. Using the ALIGN algorithm, produce a global alignment of these two proteins. Examine the output.

 

Question 19. What is the percentage identity between these two protein sequences? In your opinion, is this    biologically significant or random?

 

                Import the alignment. Using the alignment tool "TEXSHADE" display the alignment.

 

                Question 20. Do the regions of identity/similarity cluster in certain spots within the alignment or are they scattered throughout the alignment? Do you believe these  proteins are "homologs" as your friend suggests?

 

            Your friend has also indicated that the putative erythrocyte binding protein, EBL-1 from Plasmodium falciparum  (accession number AAD33018) may also be a homolog.  Paste this sequence into the Biology Workbench.

 

Question 21. Which two of the three putative homologs share the highest percentage identity?