newsgroups-index (beta)

Current group: bionet.microbiology

Comparing protein sequences.

From: Stefek Borkowski
Subject: Comparing protein sequences.
Date: Wed, 1 Dec 2004 20:43:31 +0100
Hi,
I have a problem. I would like to compare two known sequences of ca. 130
amino acid residues each, namely human epidermal fatty acid-binding protein
vs. human ileal fatty acid-binding protein. I have tried WWW BLAST at
http://www.ncbi.nlm.nih.gov/blast/ - choosing "Protein-protein BLAST
(blastp)", but unfortunately cannot find my match in the report of the
program. I suspect that the homology may be too low, so BLAST skips this
pair of proteins in the report. How do I do it then? Maybe I should change
something in the settings of the BLAST interface? Is there any software
(maybe working offline) which accepts two sequences as input and simply
compares the two of them?
Thank you for any help you can offer.
Kind regards from Poland,
Stefek

From: Scott Coutts
Subject: Re: Comparing protein sequences.
Date: Thu, 02 Dec 2004 09:48:03 +1100
Stefek Borkowski wrote:

> Hi,
> I have a problem. I would like to compare two known sequences of ca. 130
> amino acid residues each, namely human epidermal fatty acid-binding
> protein vs. human ileal fatty acid-binding protein. I have tried WWW
> BLAST at http://www.ncbi.nlm.nih.gov/blast/ - choosing "Protein-protein
> BLAST (blastp)", but unfortunately cannot find my match in the report of
> the program. I suspect that the homology may be too low, so BLAST skips
> this pair of proteins in the report. How do I do it then? Maybe I should
> change something in the settings of the BLAST interface? Is there any
> software (maybe working offline) which accepts two sequences as input
> and simply compares the two of them?
> Thank you for any help you can offer.
> Kind regards from Poland,
> Stefek

You'd be better off finding both sequences (you can do this by simply
using a keyword search) and then doing an alignment of the two
sequences. You can do this on the web, using one of the 'clustal'
programs, or you can download a stand-alone version of clustal and view
your alignment using another downloadable program called 'genedoc'. I
don't have the web addresses on hand at the moment, but you can easily
find them with a Google search.

Good luck!

Scott.
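
For readers who want to see what "doing an alignment of the two sequences" involves, here is a minimal sketch of a global pairwise alignment using the textbook Needleman-Wunsch dynamic programming method. It is not what Clustal or BLAST run internally. The function name, the toy input strings, and the match/mismatch/gap scores are all assumptions chosen for illustration; real protein alignments normally use a substitution matrix and affine gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align strings a and b.

    Returns (aligned_a, aligned_b, score). The scoring values are
    illustrative assumptions, not the defaults of any real tool.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # First column and first row: aligning a prefix against nothing costs gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    # Fill the table: each cell is the best of diagonal, up, or left.
    for i in range(1, rows):
        for j in range(1, cols):
            step = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + step,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )

    # Trace back from the bottom-right corner to recover one best alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            step = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + step:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[len(a)][len(b)]


# Toy strings only; these are not the FABP sequences discussed in the thread.
aligned_a, aligned_b, total = needleman_wunsch("ACGT", "AGT")
print(aligned_a)
print(aligned_b)
print(total)
```

With the toy inputs the two printed rows have equal length, and a "-" marks the position where one sequence has no counterpart in the other.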

From: Stefek Borkowski
Subject: Re: Comparing protein sequences.
Date: Wed, 1 Dec 2004 23:56:19 +0100
Scott Coutts wrote:
>
> You'd be better off finding both sequences (you can do this by simply
> using a keyword search) and then doing an alignment of the two
> sequences. You can do this on the web, using one of the 'clustal'
> programs, or you can download a stand-alone version of clustal and
> view your alignment using another downloadable program called
> 'genedoc'. I don't have the web addresses on hand at the moment, but
> you can easily find them with a Google search.
>
Thanks, Scott, for your quick answer. I just figured out that I can use the
WWW module of BLAST called "BLAST 2 Sequences". I still have an
interpretation problem, though. Would you care to comment on the following,
please?

I would like to know how the BLAST report should really be interpreted when
comparing two sequences with the BLAST 2 Sequences online module. The report
reads as follows:
Identities = 32/114 (28%), Positives = 53/114 (46%), Gaps = 1/114 (0%)
I would say that the homology of the two proteins is equal to the value of
"Identities", so it would be 28%. What about the "Positives" then? I came
across an estimate of the homology between the two proteins somewhere in
the literature, stating that it is equal to 36%. This seems to be more or
less the arithmetic mean of "Identities" and "Positives", namely
(28 + 46)/2 = 37%, which is close to the literature value. Is my way of
thinking correct or not? In other words, what is the recommended way of
estimating the homology of two sequences on the basis of a BLAST report?
Thanks for all your help. Kind regards,
Stefek
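
As a quick check of the arithmetic in the report line quoted above, the following sketch recomputes each percentage from its count and the alignment length. The function name `parse_blast_summary` and the regular expression are assumptions made for this example; only the report line itself comes from the post. Note that every figure is relative to the 114 aligned positions, not to the full length of either protein (ca. 130 residues).

```python
import re

# Summary line exactly as quoted in the post.
REPORT = "Identities = 32/114 (28%), Positives = 53/114 (46%), Gaps = 1/114 (0%)"


def parse_blast_summary(line):
    """Return {label: (count, alignment_length, percent)} for a summary line."""
    stats = {}
    for label, count, length in re.findall(r"(\w+) = (\d+)/(\d+)", line):
        count, length = int(count), int(length)
        stats[label] = (count, length, 100.0 * count / length)
    return stats


stats = parse_blast_summary(REPORT)
for label, (count, length, pct) in stats.items():
    print(f"{label}: {count}/{length} = {pct:.1f}%")

# The mean proposed in the post, computed from the unrounded values.
mean_pct = (stats["Identities"][2] + stats["Positives"][2]) / 2
print(f"mean of identities and positives: {mean_pct:.1f}%")
```

The mean of the two figures can be computed, but the positives count already includes the identical positions (an identical pair also scores positive), so the mean counts the identities twice and is not a quantity BLAST itself reports.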

From: Scott Coutts
Subject: Re: Comparing protein sequences.
Date: Thu, 02 Dec 2004 10:46:28 +1100
Stefek Borkowski wrote:

> Scott Coutts wrote:
>
>>
>> You'd be better off finding both sequences (you can do this by simply
>> using a keyword search) and then doing an alignment of the two
>> sequences. You can do this on the web, using one of the 'clustal'
>> programs, or you can download a stand-alone version of clustal and
>> view your alignment using another downloadable program called
>> 'genedoc'. I don't have the web addresses on hand at the moment, but
>> you can easily find them with a Google search.
>>
> Thanks, Scott, for your quick answer. I just figured out that I can use
> the WWW module of BLAST called "BLAST 2 Sequences". I still have an
> interpretation problem, though. Would you care to comment on the
> following, please?
>
> I would like to know how the BLAST report should really be interpreted
> when comparing two sequences with the BLAST 2 Sequences online module.
> The report reads as follows:
> Identities = 32/114 (28%), Positives = 53/114 (46%), Gaps = 1/114 (0%)
> I would say that the homology of the two proteins is equal to the value

Firstly, a technical point here... when you're talking about genes, you
should say 'similarity' rather than 'homology'. Either a gene is a
homolog of another, or it's not.

http://homepage.usask.ca/~ctl271/857/def_homolog.shtml
http://www.biomedcentral.com/news/20040309/01

But anyway...

>
> of "Identities", so it would be 28%. What about the "Positives" then? I
> came across an estimate of the homology between the two proteins
> somewhere in the literature, stating that it is equal to 36%. This seems
> to be more or less the arithmetic mean of "Identities" and "Positives",
> namely (28 + 46)/2 = 37%, which is close to the literature value. Is my
> way of thinking correct or not? In other words, what is the recommended
> way of estimating the homology of two sequences on the basis of a BLAST
> report?

I'm not sure what figure they were quoting, but if it is properly quoted
in the literature as a percentage, then it should include a statement of
whether it is identities or similarities (positives), and of the region
over which the count was obtained (if it's not mentioned, then usually
it's the whole protein).

You should read the documentation that comes with BLAST to understand
how it works. 'Identities' is the number of amino acids that are exactly
the same, and 'positives' is the number that are similar (i.e. maintain
similar properties; for example, both are hydrophobic). You should also
consider the E value that you're given.


Scott.
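
To make the identities/positives distinction concrete, here is a small sketch that counts both over an already-aligned pair of sequences. BLAST counts a pair as "positive" when its substitution-matrix score is greater than zero (BLOSUM62 is the usual blastp default), which also covers identical pairs. To keep the example self-contained, the sketch swaps in a crude set of property groups instead of a real matrix. The groups, the function names, and the toy aligned strings are all assumptions for illustration and will not reproduce BLAST's exact counts.

```python
# Crude residue property groups standing in for a substitution matrix.
# ASSUMPTION: these groups are illustrative only, not BLOSUM62.
TOY_GROUPS = (
    set("AVLIMFW"),  # hydrophobic
    set("STNQ"),     # polar, uncharged
    set("KRH"),      # basic
    set("DE"),       # acidic
)


def similar(x, y):
    """True if x and y are identical or share one of the toy groups."""
    return x == y or any(x in g and y in g for g in TOY_GROUPS)


def alignment_stats(aln_a, aln_b):
    """Count identities, positives, and gap columns in an aligned pair.

    Both strings must have the same length, with '-' marking gaps.
    Positives include identities, as in a BLAST report.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    identities = positives = gaps = 0
    for x, y in zip(aln_a, aln_b):
        if x == "-" or y == "-":
            gaps += 1
        elif x == y:
            identities += 1
            positives += 1
        elif similar(x, y):
            positives += 1
    return identities, positives, gaps, len(aln_a)


# Toy aligned pair; these are not the FABP sequences from the thread.
ident, pos, gaps, length = alignment_stats("MKV-LDE", "MRVALEE")
print(f"Identities = {ident}/{length}, Positives = {pos}/{length}, Gaps = {gaps}/{length}")
```

Swapping `similar` for a lookup in a real substitution matrix (testing for a score greater than zero) would bring the counts in line with how BLAST defines positives.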

From: Stefek Borkowski
Subject: Re: Comparing protein sequences.
Date: Thu, 2 Dec 2004 12:14:08 +0100
Scott Coutts wrote:
>> ...
> Firstly, a technical point here... when you're talking about genes, you
> should say 'similarity' rather than 'homology'. Either a gene is a
> homolog of another, or it's not.
>
> http://homepage.usask.ca/~ctl271/857/def_homolog.shtml
> http://www.biomedcentral.com/news/20040309/01
>
> But anyway...
>
>>
>> of "Identities", so it would be 28%. What about the "Positives"
>> then? I came across an estimate of the homology between the two
>> proteins somewhere in the literature, stating that it is equal to
>> 36%. This seems to be more or less the arithmetic mean of
>> "Identities" and "Positives", namely (28 + 46)/2 = 37%, which is
>> close to the literature value. Is my way of thinking correct or
>> not? In other words, what is the recommended way of estimating the
>> homology of two sequences on the basis of a BLAST report?
>
> I'm not sure what figure they were quoting, but if it is properly
> quoted in the literature as a percentage, then it should include a
> statement of whether it is identities or similarities (positives),
> and of the region over which the count was obtained (if it's not
> mentioned, then usually it's the whole protein).
>
> You should read the documentation that comes with BLAST to understand
> how it works. 'Identities' is the number of amino acids that are
> exactly the same, and 'positives' is the number that are similar
> (i.e. maintain similar properties; for example, both are
> hydrophobic). You should also consider the E value that you're given.

Thank you so much, Scott. Your explanation helped me a lot! I visited the
links you gave me and have already benefited from understanding the "E"
value calculated by BLAST. Though not everything is clear to me yet, I have
made a big step forward. Thanks again. Have a nice day :)
Special regards from Stefek