1. The BLAST programs
employ the SEG algorithm to filter low-complexity regions from proteins before
executing a database search.
a. How many low-complexity
regions can you find in the PAX-6 protein of humans? Use Protein BLAST at NCBI.
Answer:
5 low-complexity regions. The results page shows an image of the motifs found
along the query, which also marks the low-complexity regions. In the sequence
alignment, each masked low-complexity residue of the query is shown as X.
Query: 1   HSGVNQLGGVFVNGRPLPDSTRQKIVELAHSGARPCDISRILQVSNGCVSKILGRYYETG 60
           HSGVNQLGGVFVNGRPLPDSTRQKIVELAHSGARPCDISRILQVSNGCVSKILGRYYETG
Sbjct: 5   HSGVNQLGGVFVNGRPLPDSTRQKIVELAHSGARPCDISRILQVSNGCVSKILGRYYETG 64

Query: 61  SIRPRAIGGSKPRVATPEVVSKIAQYKRECPSIFAWEIRDRLLSEGVCTNDNIPSVSSIN 120
           SIRPRAIGGSKPRVATPEVVSKIAQYKRECPSIFAWEIRDRLLSEGVCTNDNIPSVSSIN
Sbjct: 65  SIRPRAIGGSKPRVATPEVVSKIAQYKRECPSIFAWEIRDRLLSEGVCTNDNIPSVSSIN 124

Query: 121 RVLRNLASEKQQMGADGMYDKLRMLNGQTGSWGTRPGWYPGTSVPGQPTXXXXXXXXXXX 180
           RVLRNLASEKQQMGADGMYDKLRMLNGQTGSWGTRPGWYPGTSVPGQPT
Sbjct: 125 RVLRNLASEKQQMGADGMYDKLRMLNGQTGSWGTRPGWYPGTSVPGQPTQDGCQQQEGGG 184

Query: 181 XNTNSISSNGEDSDEAQMXXXXXXXXXXNRTSFTQEQIEALEKEFERTHYPDVFARERLA 240
            NTNSISSNGEDSDEAQM          NRTSFTQEQIEALEKEFERTHYPDVFARERLA
Sbjct: 185 ENTNSISSNGEDSDEAQMRLQLKRKLQRNRTSFTQEQIEALEKEFERTHYPDVFARERLA 244

Query: 241 AKIDLPEARIQVWFSNRRAKWRREEKLRNQRRQASNXXXXXXXXXXXXXXVYQPIPQPTT 300
           AKIDLPEARIQVWFSNRRAKWRREEKLRNQRRQASN              VYQPIPQPTT
Sbjct: 245 AKIDLPEARIQVWFSNRRAKWRREEKLRNQRRQASNTPSHIPISSSFSTSVYQPIPQPTT 304

Query: 301 PVSSFTSGSMLGRTDTALTNTYSALPPMPSFTMANNLPMQPPVPSQTSSYSCMLPTSPSV 360
           PVSSFTSGSMLGRTDTALTNTYSALPPMPSFTMANNLPMQPPVPSQTSSYSCMLPTSPSV
Sbjct: 305 PVSSFTSGSMLGRTDTALTNTYSALPPMPSFTMANNLPMQPPVPSQTSSYSCMLPTSPSV 364

Query: 361 NGRSYDTYTPPHMQTHMNSQPMXXXXXXXXXLIXXXXXXXXXXXXXXXDMSQYWPRLQ 418
           NGRSYDTYTPPHMQTHMNSQPM         LI               DMSQYWPRLQ
Sbjct: 365 NGRSYDTYTPPHMQTHMNSQPMGTSGTTSTGLISPGVSVPVQVPGSEPDMSQYWPRLQ 422
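The masked regions can also be counted programmatically instead of by eye. The following is a minimal Python sketch, not part of BLAST or SEG: it joins the Query rows 121 to 418 of the alignment above and reports each run of X as a 1-based (start, end) pair. The function name masked_regions and the offset argument are assumptions made for this illustration.

```python
import re

def masked_regions(masked_seq, offset=1):
    """Return (start, end) pairs, 1-based and inclusive, for each run of X.

    offset is the sequence position of the first character of masked_seq.
    """
    return [(m.start() + offset, m.end() + offset - 1)
            for m in re.finditer(r"X+", masked_seq.upper())]

# Query rows 121-418 copied from the alignment above and joined in order.
QUERY_ROWS = [
    "RVLRNLASEKQQMGADGMYDKLRMLNGQTGSWGTRPGWYPGTSVPGQPTXXXXXXXXXXX",
    "XNTNSISSNGEDSDEAQMXXXXXXXXXXNRTSFTQEQIEALEKEFERTHYPDVFARERLA",
    "AKIDLPEARIQVWFSNRRAKWRREEKLRNQRRQASNXXXXXXXXXXXXXXVYQPIPQPTT",
    "PVSSFTSGSMLGRTDTALTNTYSALPPMPSFTMANNLPMQPPVPSQTSSYSCMLPTSPSV",
    "NGRSYDTYTPPHMQTHMNSQPMXXXXXXXXXLIXXXXXXXXXXXXXXXDMSQYWPRLQ",
]

regions = masked_regions("".join(QUERY_ROWS), offset=121)
for start, end in regions:
    print(f"low-complexity region: {start}-{end} ({end - start + 1} residues)")
print("number of regions:", len(regions))
```

Joining the rows before searching matters here: the run of X that ends row 121-180 continues into the first residue of row 181-240, so it is one region rather than two.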
b. Does this sequence
contain any sequence motifs?
Answer:
Yes, a homeobox domain and a paired box domain.
2. Pattern searching. Use this sequence:
MMTAKAVDKIPVTLSGFVHQLSDNIYPVEDLAATSVTIFPNAELGGPFDQ
MNGVAGDGMINIDMTGEKRSLDLPYPSSFAPVSAPRNQTFTYMGKFSIDP
QYPGASCYPEGIINIVSAGILQGVTSPASTTASSSVTSASPNPLATGPLG
VCTMSQTQPDLDHLYSPPPPPPPYSGCAGDLYQDPSAFLSAATTSTSSSL
AYPPPPSYPSPKPATDPGLFPMIPDYPGFFPSQCQRDLHGTAGPDRKPFP
CPLDTLRVPPPLTPLSTIRNFTLGGPSAGMTGPGASGGSEGPRLPGSSSAA
AAAAAAAAYNPHHLPLRPILRPRKYPNRPSKTPVHERPYPCPAEGCDRRFS
RSDELTRHIRIHTGHKPFQCRICMRNFSRSDHLTTHIRTHTGEKPFACDYCGR
KFARSDERKRHTKIHLRQKERKSSAPSASVPAPSTASCSGGVQPGGTLCSS
NSSSLGGGPLAPCSSRTRTP.
The Prosite and Pfam databases contain a lot
of information on protein families and functional domains. These
databases can often give a good hint of the function of a particular protein:
you can, for example, search for known functional motifs and domains in your
protein (or DNA) sequence.
a) Use ScanProsite at http://www.expasy.ch/tools/scnpsite.html
to find out if the protein sequence contains any motifs from the Prosite
database. Follow the different links to find out, for example, what function
the motif could have and whether the motif is present in other proteins.
b) Use the Pfam database
at http://www.sanger.ac.uk/Software/Pfam/ to
find out if the protein sequence contains any domains from this database. Follow
the different links to find out, for example, what function the domain could
have and whether it is present in other proteins.
c) Based on the results
from these different analyses, which functional motifs/domains do you think the
protein contains?
3. Consider the following partial amino acid sequence of a protein from
Saccharomyces cerevisiae:
MSSVAENIIQHATHNSTLHQ
a) What is the likely
function of this protein?
Answer:
Cytochrome P450 involved in the C-22 desaturation of the ergosterol side-chain.
b) What is the molecular
weight (in kilodaltons, kDa) and predicted isoelectric point (pI) for the protein?
Answer:
Molecular weight 61334.39 Daltons (about 61.3 kDa);
theoretical pI 8.1. Use the
services at ExPASy.
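The molecular weight reported by such a service is, in essence, the sum of the average residue masses plus one water molecule for the free termini. The Python sketch below illustrates that sum under stated assumptions: the residue masses are standard average values, the function name molecular_weight is made up for this example, and the result for a full-length protein may differ slightly from the ExPASy figure depending on the mass table used. It does not compute the pI.

```python
# Average masses (Da) of amino acid residues, i.e. the free amino acid
# minus one water. Standard textbook values; treat the last digits as approximate.
RESIDUE_MASS = {
    "G": 57.0519,  "A": 71.0788,  "S": 87.0782,  "P": 97.1167,
    "V": 99.1326,  "T": 101.1051, "C": 103.1388, "L": 113.1594,
    "I": 113.1594, "N": 114.1038, "D": 115.0886, "Q": 128.1307,
    "K": 128.1741, "E": 129.1155, "M": 131.1926, "H": 137.1411,
    "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # one water is added back for the N- and C-terminal groups

def molecular_weight(seq):
    """Average molecular weight in Daltons of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq.upper()) + WATER

# The partial sequence given in this question (first 20 residues only).
partial = "MSSVAENIIQHATHNSTLHQ"
print(f"{molecular_weight(partial):.2f} Da for the 20-residue fragment")
```

Only the 20-residue fragment is available in the question, so the number printed is for that fragment, not for the whole protein; the full sequence has to be retrieved from a database first.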
c) Which chromosome is the
gene located on?
Answer:
Chromosome XIII; coordinates 302484-300868.
One way to answer this is to go to the SGD database at Stanford, which is
devoted to Saccharomyces. Many organisms have
community databases that are hand-curated.
Another way is to go to the Genomes division of Entrez at NCBI.
d) Which genes are located
upstream and downstream of the gene?
Answer:
BUD22 and SOK2
e) Does this protein have
a sequence motif that belongs to a certain protein group (family)? Give the
name of the protein group and the sequence of this motif.
Answer:
The cytochrome P450 family. The motif is the cysteine heme-iron ligand
signature; its consensus pattern is given in the PROSITE documentation below.
NiceSite View of PROSITE: PDOC00081 (documentation)
Documentation:
Cytochrome P450's [1,2,3,E1] are a group of enzymes involved in the oxidative metabolism of a high number of natural compounds (such as steroids, fatty acids, prostaglandins, leukotrienes, etc) as well as drugs, carcinogens and mutagens. Based on sequence similarities, P450's have been classified into about forty different families [4,5]. P450's are proteins of 400 to 530 amino acids; the only exception is Bacillus BM-3 (CYP102) which is a protein of 1048 residues that contains a N-terminal P450 domain followed by a reductase domain. P450's are heme proteins. A conserved cysteine residue in the C-terminal part of P450's is involved in binding the heme iron in the fifth coordination site. From a region around this residue, we developed a ten residue signature specific to P450's.
Consensus pattern:
[FW]-[SGNH]-x-[GD]-x-[RKHPT]-x-C-[LIVMFAP]-[GAD] [C is the heme iron ligand]
Sequences known to belong to this class detected by the pattern:
ALL, except for P450 IIB10 from mouse, which has Lys in the first position of the pattern.
Other sequence(s) detected in SWISS-PROT:
9.
Note:
The term 'cytochrome' P450, while commonly used, is incorrect as P450 are not
electron-transfer proteins; the appropriate name is P450 'heme-thiolate proteins'.
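A PROSITE consensus pattern such as the P450 signature can be turned into a regular expression and searched for directly. The Python sketch below is an illustration only, not the ScanProsite implementation: the function prosite_to_regex is a hypothetical helper that handles the common pattern syntax (x, [..], {..}, repeat counts, and the < and > anchors), and the test peptide is made up to satisfy the pattern.

```python
import re

def prosite_to_regex(pattern):
    """Convert a PROSITE pattern (elements separated by '-') to a Python regex."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        element = element.strip()
        prefix = suffix = repeat = ""
        if element.startswith("<"):      # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):        # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Repeat counts such as x(3) or x(2,4)
        m = re.fullmatch(r"(.+?)\((\d+(?:,\d+)?)\)", element)
        if m:
            element, repeat = m.group(1), "{" + m.group(2) + "}"
        if element == "x":               # any residue
            core = "."
        elif element.startswith("{") and element.endswith("}"):
            core = "[^" + element[1:-1] + "]"   # any residue except these
        else:
            core = element               # a single residue or a [..] set
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)

P450_PATTERN = "[FW]-[SGNH]-x-[GD]-x-[RKHPT]-x-C-[LIVMFAP]-[GAD]"
p450_regex = re.compile(prosite_to_regex(P450_PATTERN))

# Made-up peptide for illustration; it is not taken from any real protein.
hit = p450_regex.search("AAFGAGPRICLGAA")
print(p450_regex.pattern, hit.group(0) if hit else None)
```

To answer the question for the yeast protein, the same compiled pattern would be searched against its full-length sequence once that has been retrieved.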
4. Run
the E. coli RecA protein against the yeast genome on the BLAST server. Choose
basic BLAST and carefully review the various option windows on the page that
comes up. Choose BLASTP as the program and yeast as the sequence
database (all of the yeast proteins). Enter the sequence in FASTA format, or
enter the PIR identifier of the query sequence, RECA, into the input data window,
and indicate which you chose in the small option window just above the input data
window. Otherwise, use the default parameters provided by the program.
Answer the following questions:
a) In the diagram that comes up, click the mouse on the yeast
sequence which best matches the RecA query sequence. Identify the name and gi
(GenInfo Identifier) of the highest scoring sequence and the score in bits.
Answer:
Rad51, gi|6320942. Score 62.8 bits.
b) What scoring matrix and
gap penalties were used?
Answer:
BLOSUM62;
gap opening penalty 11, gap extension penalty 1.
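c) What values of K and lambda
were used for calculating the Expect scores for the gapped alignment (please
note that there are two sets of these parameters - one for ungapped and one for
gapped alignments)? Where do these values come from?
Answer:
            Lambda   K        H
Ungapped    0.314    0.134    0.367
Gapped      0.267    0.0410   0.140
The ungapped values are calculated from the scoring matrix and the amino acid
frequencies (Karlin-Altschul statistics). The gapped values cannot be derived
analytically; they are estimated in advance by simulation for each combination
of scoring matrix and gap penalties.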
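The gapped Lambda and K connect the raw alignment score to the bit score printed in the report: bit score = (Lambda x raw score - ln K) / ln 2. The short Python sketch below checks this against the two hits shown in this exercise (raw scores 151 and 86). The function names are chosen for this illustration. The expect_value function applies the standard relation E = m x n x 2^(-bit score), but the effective query and database lengths m and n are not given in this document, so it is defined without being called on real numbers.

```python
import math

def bit_score(raw_score, lam, k):
    """Normalize a raw alignment score to bits using Karlin-Altschul parameters."""
    return (lam * raw_score - math.log(k)) / math.log(2)

def expect_value(bits, m, n):
    """Expected number of chance hits with at least this bit score.

    m and n are the effective query and database lengths; they are inputs
    the caller must supply, since the report excerpt does not list them.
    """
    return m * n * 2 ** (-bits)

GAPPED_LAMBDA, GAPPED_K = 0.267, 0.0410  # gapped values from the answer above

for name, raw in [("Rad51", 151), ("Rad57", 86)]:
    print(name, round(bit_score(raw, GAPPED_LAMBDA, GAPPED_K), 1), "bits")
```

The computed values agree with the bit scores that BLAST reports next to the raw scores in parentheses, which is a quick way to confirm which parameter set was applied.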
d) How many database sequences were searched?
Answer:
6304 sequences from yeast.
e) Is the alignment of the highest scoring sequence with RecA
protein significant and why? What biological information (protein structure and
function) does this match suggest about the bacterial RecA protein and the
yeast protein?
>pir||A44348 RAD51 protein - yeast (Saccharomyces cerevisiae)
emb|CAA45563.1| similarities to procaryotic RecA [Saccharomyces cerevisiae]
dbj|BAA00913.1| Rad51 protein [Saccharomyces cerevisiae]
ref|NP_011021.1| Involved in processing ds breaks, synaptonemal complex formation,
meiotic gene conversion and reciprocal recombination.; Rad51p [Saccharomyces cerevisiae]
sp|P25454|RA51_YEAST DNA repair protein RAD51
gb|AAB64650.1| Rad51p: RecA-like protein [Saccharomyces cerevisiae]
gb|AAA34948.1| RAD51 protein
Length = 400
Score = 62.8 bits (151), Expect = 8e-11
Identities = 62/217 (28%), Positives = 105/217 (48%), Gaps = 29/217 (13%)
f) What was the lowest
reported score in this search, and is this score
significant?
>ref|NP_010287.1| Required for X-ray damage repair, meiotic recombination, wild-type
levels of sporulation and viable spores; Rad57p [Saccharomyces cerevisiae]
sp|P25301|RA57_YEAST DNA repair protein RAD57
emb|CAA88064.1| Rad57p [Saccharomyces cerevisiae]
gb|AAA34950.1| DNA repair protein
pir||JQ1275 RAD57 protein - yeast (Saccharomyces cerevisiae)
Length = 460
Score = 37.7 bits (86), Expect = 0.003
Identities = 36/133 (27%), Positives = 61/133 (45%), Gaps = 25/133 (18%)

Query: 38  ETISTGSLSLDIALGAGGLPMGRIVEIYGPESSGKTTLTLQVIAAAQRE------GKTCA 91
           E  +T  +++D  LG G    G I EI+G  S+GK+ L +Q+  + Q        G  C
Sbjct: 98  ECFTTADVAMDELLGGGIFTHG-ITEIFGESSTGKSQLLMQLALSVQLSEPAGGLGGKCV 156

Query: 92  FIDAEHALD-----------PIYARKLGVDIDNLL---CSQPDTGEQAL--EICDALARS 135
           +I  E  L            P Y  KLG+   N+    C+     E  +  ++   L RS
Sbjct: 157 YITTEGDLPTQRLESMLSSRPAY-EKLGITQSNIFTVSCNDLINQEHIINVQLPILLERS 215

Query: 136 -GAVDVIVVDSVA 147
            G++ ++++DS++
Sbjct: 216 KGSIKLVIIDSIS 228
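Answer:
The lowest reported score is 37.7 bits (Expect = 0.003), for Rad57. The Expect
value is relatively high, so on statistical grounds alone the match is only
marginally significant, but the function of the protein (DNA repair and
recombination) is similar to that of RecA.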
5. In
many cases sequence databases include experimental artifacts. Databases are
known to include vector sequences and other errors, including
contaminants, chimeric sequences, and shifts in reading frame due to insertions
or deletions.
From a colleague you have obtained a stretch of
DNA (see sequence below) that is supposed to be from the bacterium Bacillus
subtilis:
accgcacctgtggcgccggtgatgccggccacgatgcgtccggcgtagaggatcgagatctcgatcccgcgaaattaatacgactcactataggggaattgtgagcggataacaattcccctctagaaataattttgtttaactttaagaaggagatataccatgggacaatcgtttaacgcaccttatgaagcgattggagaggaacttctatcgcaacttgttgatactttttatgagcgtgtcgcgtctcatcctttgctgaagccgatttttccaagcgatttgacagaaaccgccaggaaacagaagcaattcttaactcagtatttaggcgggcctcctctttatactgaggaacacggccatcctatgctcagagcaaggcatcttccctttccaattacaaacgagagagctgatgcgtggctcagctgtatgaaggacgcaatggaccatgtagggctggagggcgaaattcgtgagtttttgtttggccggctggagttgacagcaaggcatatggtgaatcaaacggaagcggaggatcgatcatcttgacaagcttggatccggctgctaacaaagcccgaaaggaagctgagttggctgctgccaccgctgagcaataactagcataaccccttggggcctctaaacgggtcttgaggggttttttgctgaaaggaggaactatatccggatatcccgcaagaggcccggcagtaccggcataaccaagcctatgcctacagcatccagggtgacggtgccg
a) Is the information
correct?
Answer:
Only partly. The sequence contains B. subtilis DNA, but it also contains vector sequence.
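One quick way to spot this kind of contamination is to look for well-known cloning vector elements in the fragment. The Python sketch below is a rough illustration under stated assumptions, not a substitute for a proper vector screen such as a BLAST search against a vector database: it checks only the forward strand for exact matches to two standard elements, the T7 promoter and the lac operator, both of which occur in the fragment above. The names VECTOR_FEATURES and find_vector_features are hypothetical.

```python
# Two standard expression-vector elements (lower case, forward strand only).
VECTOR_FEATURES = {
    "T7 promoter": "taatacgactcactataggg",
    "lac operator": "aattgtgagcggataacaatt",
}

def find_vector_features(seq, features=VECTOR_FEATURES):
    """Return {feature name: [0-based start positions]} for exact matches."""
    seq = seq.lower()
    hits = {}
    for name, motif in features.items():
        positions = []
        start = seq.find(motif)
        while start != -1:
            positions.append(start)
            start = seq.find(motif, start + 1)
        if positions:
            hits[name] = positions
    return hits

# Excerpt copied from the fragment in this question.
excerpt = "gatcccgcgaaattaatacgactcactataggggaattgtgagcggataacaattcccctctag"
for name, positions in find_vector_features(excerpt).items():
    print(name, "found at", positions)
```

Exact string matching misses elements on the reverse strand or with sequencing errors, which is why a similarity search is the usual choice in practice.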
b) What gene is encoded on
the fragment?
Answer:
yjbI
c) Which protein family is
the gene product likely to be in?
Answer:
GLOBINS
6. Using QuickGO, the GO browser at the EBI (http://www.ebi.ac.uk/ego/), research a
biological protein or topic of interest to you. Browse up and down the GO
trees. Now use the AmiGO browser (http://www.godatabase.org/cgi-bin/go.cgi)
to view some of the same information. Compare and contrast the capabilities of each browser.
For the protein you chose, is there a GO process, function, and location?
Using AmiGO Advanced
Query, find the GO terms associated with specific gene products. Human genes can be
found using LocusLink. Can you find a human ortholog of the protein you chose
above?