BLAST

From The School of Biomedical Sciences Wiki
Revision as of 15:32, 3 December 2015

BLAST (Basic Local Alignment Search Tool) is a primary sequence database searching algorithm that lets you compare an unknown DNA or protein sequence of interest with known sequences and find regions of local similarity between them.[1] The known proteins are stored in protein databases. This enables one of the most important applications of BLAST: using known proteins to quickly identify homologies that can help assess the function of the query protein.[2]
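BLAST finds local similarities quickly by first looking for short exact "words" shared by the query and a database sequence, then extending those seeds into longer alignments. The sketch below is a simplified illustration of that seed-and-extend idea, not NCBI's implementation: it uses exact-match words (as in nucleotide BLAST; protein BLAST also uses scored "neighbourhood" words, which are omitted here), extends without gaps, and uses hypothetical scores (match +2, mismatch -1) and a hypothetical X-drop cutoff of 3.

```python
from collections import defaultdict


def word_index(query: str, w: int) -> dict[str, list[int]]:
    """Map every length-w word in the query to the positions where it starts."""
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(query) - w + 1):
        index[query[i:i + w]].append(i)
    return index


def seed_hits(query: str, subject: str, w: int) -> list[tuple[int, int]]:
    """Return (query_pos, subject_pos) pairs where an exact length-w word is shared."""
    index = word_index(query, w)
    hits = []
    for j in range(len(subject) - w + 1):
        for i in index.get(subject[j:j + w], []):
            hits.append((i, j))
    return hits


def extend_hit(query, subject, i, j, w, match=2, mismatch=-1, x_drop=3):
    """Ungapped extension of a seed in both directions with an X-drop cutoff.

    Returns (score, (q_start, q_end), (s_start, s_end)) with exclusive ends.
    Scores and x_drop are illustrative assumptions, not BLAST defaults.
    """
    seed_score = w * match  # the seed itself is an exact match
    # Extend to the right, remembering the best running score seen.
    best_right = right_len = score = 0
    k = 0
    while i + w + k < len(query) and j + w + k < len(subject):
        score += match if query[i + w + k] == subject[j + w + k] else mismatch
        k += 1
        if score > best_right:
            best_right, right_len = score, k
        elif best_right - score > x_drop:
            break  # score has dropped too far below the best: stop
    # Extend to the left in the same way.
    best_left = left_len = score = 0
    k = 1
    while i - k >= 0 and j - k >= 0:
        score += match if query[i - k] == subject[j - k] else mismatch
        if score > best_left:
            best_left, left_len = score, k
        elif best_left - score > x_drop:
            break
        k += 1
    total = seed_score + best_right + best_left
    return total, (i - left_len, i + w + right_len), (j - left_len, j + w + right_len)


for qi, sj in seed_hits("TTACGTAA", "GGACGTAC", w=4):
    print(qi, sj, extend_hit("TTACGTAA", "GGACGTAC", qi, sj, w=4))
```

The X-drop rule is what keeps the search local: extension stops once the alignment has clearly stopped improving, so unrelated flanking sequence does not drag the score down.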

The results of a BLAST search give a graphic summary of the extent of alignment between the query sequence and the sequence hits from the database. The database sequences that produce significant alignments are listed in a table, which describes each hit, for example the protein that the sequence codes for. Each hit has an accession number; clicking on it shows the source organism of the protein and gives links to papers on the protein. The E value (Expect value) measures statistical significance: it is the number of hits with an equal or better score that would be expected to occur by chance in a search of a database of that size. A lower E value therefore indicates a better match, because the probability of a chance result is lower.
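The E value for a local alignment score S is commonly described by the Karlin-Altschul formula E = K·m·n·e^(-λS), where m and n are the query and database lengths and λ and K depend on the scoring system. The sketch below assumes that formula and uses placeholder values for λ and K (real values come from the scoring matrix and gap costs, and BLAST uses edge-corrected "effective" lengths). It also shows the equivalent bit-score form.

```python
import math


def evalue(raw_score: float, m: int, n: int, lam: float, k: float) -> float:
    """Karlin-Altschul expected number of chance hits: E = K * m * n * exp(-lambda * S)."""
    return k * m * n * math.exp(-lam * raw_score)


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalised score S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def evalue_from_bits(bits: float, m: int, n: int) -> float:
    """Equivalent form using the bit score: E = m * n * 2^(-S')."""
    return m * n * 2.0 ** (-bits)


# Placeholder parameters for illustration only (assumed, not BLAST's values).
LAM, K = 0.3, 0.1
query_len, db_len = 300, 1_000_000

for s in (40, 60, 80):
    e = evalue(s, query_len, db_len, LAM, K)
    print(s, round(bit_score(s, LAM, K), 1), f"{e:.2e}")
```

Two consequences follow directly from the formula: a higher score gives an exponentially smaller E value, and searching a database twice as large doubles the E value of the same score.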

In a protein BLAST alignment, the middle line between the query and the database sequence shows a letter where the two residues match, a (+) where they differ by a conservative substitution (one that scores positively in the substitution matrix), and a blank where there is a mismatch or a gap (insertion). A DNA BLAST alignment is displayed in much the same way, except that a match is shown as a vertical bar (|) and a blank shows no match. Note that conservative substitutions are not possible in DNA alignments, because nucleotides are only scored as matches or mismatches.
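The midline rules above can be sketched as a small function over two already-aligned sequences (gaps written as "-"). This is an illustration rather than BLAST's own code: it uses only a handful of BLOSUM62 pairs that score positively, as a stand-in for the full substitution matrix.

```python
# Small illustrative subset of BLOSUM62 pairs with positive scores
# (I/V, I/L, L/V, K/R, D/E, S/T, F/Y); real BLAST consults the full matrix.
POSITIVE_PAIRS = {
    frozenset(pair)
    for pair in [("I", "V"), ("I", "L"), ("L", "V"), ("K", "R"),
                 ("D", "E"), ("S", "T"), ("F", "Y")]
}


def midline(top: str, bottom: str, protein: bool = True) -> str:
    """Build the middle line of a pairwise alignment display."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have equal length")
    out = []
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            out.append(" ")  # gap: blank midline
        elif a == b:
            out.append(a if protein else "|")  # identity: letter (protein) or bar (DNA)
        elif protein and frozenset((a, b)) in POSITIVE_PAIRS:
            out.append("+")  # conservative substitution
        else:
            out.append(" ")  # mismatch
    return "".join(out)


print("MKVLA-E")
print(midline("MKVLA-E", "MRILAQD"))
print("MRILAQD")
```

The DNA branch never produces "+", which reflects the point above: with simple match/mismatch scoring there is no notion of a conservative nucleotide substitution.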

The BLAST web page can be found at: http://blast.ncbi.nlm.nih.gov/Blast.cgi

Basic Blast Programs

There are five basic BLAST programs; which one to use depends on the type of query sequence and the type of database being searched:

A Table Showing a Brief Description of the Five Basic Blast Programs
Program | Description
Nucleotide BLAST (blastn) | Search a nucleotide database using a nucleotide query
Protein BLAST (blastp) | Search a protein database using a protein query
blastx | Search a protein database using a translated nucleotide query
tblastn | Search a translated nucleotide database using a protein query
tblastx | Search a translated nucleotide database using a translated nucleotide query

  [3]
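The table amounts to a simple decision rule on the query type and the database type. The function below is a hypothetical helper that encodes it; the only extra input is a flag to choose tblastx over blastn when you want two nucleotide sequences compared at the protein level.

```python
def choose_blast_program(query_type: str, database_type: str,
                         translate_both: bool = False) -> str:
    """Pick one of the five basic BLAST programs (hypothetical helper).

    query_type, database_type: "nucleotide" or "protein".
    translate_both: only used for nucleotide vs nucleotide; True selects
    tblastx (compare translations of both) instead of blastn.
    """
    table = {
        ("nucleotide", "nucleotide"): "tblastx" if translate_both else "blastn",
        ("protein", "protein"): "blastp",
        ("nucleotide", "protein"): "blastx",
        ("protein", "nucleotide"): "tblastn",
    }
    try:
        return table[(query_type, database_type)]
    except KeyError:
        raise ValueError(
            f"unsupported combination: {query_type!r} vs {database_type!r}"
        ) from None


print(choose_blast_program("nucleotide", "protein"))
```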

BlastP compares an amino acid query sequence against a protein database. BlastN compares a nucleotide query sequence against a nucleotide sequence database. BlastX compares a nucleotide query sequence, translated in all reading frames, against a protein sequence database. tBlastN compares a protein query sequence against nucleotide sequences from the database, dynamically translated in all reading frames. Finally, tBlastX compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database.[1] Links to these programs can be found in the table above.
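"Translated in all reading frames" means six frames: three offsets on the given strand and three on its reverse complement. The sketch below assumes the standard genetic code (NCBI translation table 1) and builds the codon table from the usual TCAG-ordered string. Frame-numbering conventions for the reverse strand vary between tools; here frame -1 simply starts at the first base of the reverse complement, and any codon containing an unrecognised base is translated as "X" (an assumption of this sketch).

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(seq: str) -> str:
    """Complement each base and reverse the sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq: str) -> str:
    """Translate complete codons from the start of seq; '*' marks a stop codon."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frames(seq: str) -> dict[str, str]:
    """Return translations of frames +1..+3 and -1..-3."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


for name, protein in six_frames("ATGAAATAG").items():
    print(name, protein)
```

BlastX applies this to the query only, tBlastN to the database sequences only, and tBlastX to both, which is why tBlastX is the slowest of the five.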

A low-complexity region is a stretch of sequence with a biased, non-random composition that can cause problems when searching for sequence similarity. Such regions can often be spotted by eye because their sequences are usually repetitive, e.g. CCCCCCCGGCCCCCCGGGG. In BLAST output, these regions are often shown as lower-case grey letters. They need to be filtered out of the search because they can produce high-scoring matches that are not due to shared homology. The SEG program within BLAST filters out low-complexity regions, such as homopolymeric runs and short-period repeats, in amino acid sequences (nucleotide queries are filtered by a separate program, DUST).[4]
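To show the general idea of filtering, the sketch below slides a fixed-length window along a sequence, measures the Shannon entropy (in bits) of the residue composition in each window, and lower-cases every residue covered by a window whose entropy falls below a threshold, mirroring the lower-case display in BLAST output. This is a simplified stand-in, not the SEG algorithm itself, and the window length (12) and threshold (2.2 bits) are illustrative values.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Shannon entropy (bits) of the residue composition of a window."""
    counts = Counter(window)
    n = len(window)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def mask_low_complexity(seq: str, window: int = 12, threshold: float = 2.2) -> str:
    """Lower-case every residue covered by a window with entropy below threshold.

    Simplified illustration only; window and threshold are assumed values.
    """
    masked = [False] * len(seq)
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < threshold:
            for p in range(start, start + window):
                masked[p] = True
    return "".join(ch.lower() if m else ch for ch, m in zip(seq, masked))


print(mask_low_complexity("C" * 12 + "ADEFGHIKLMNP"))
```

A homopolymeric run has entropy 0, while a window of twelve different amino acids reaches log2(12) ≈ 3.58 bits. Windows that straddle the boundary still score low enough to be masked, so the masked region spills a few residues into the neighbouring sequence; real filters refine the boundaries more carefully.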

Specialised Blast Programs

There is also the option to perform more specialised BLAST searches, such as a Primer-BLAST search. Links to these searches can be found below the main BLAST search links.

PubMed is a search engine from the NLM (National Library of Medicine) that provides access to over 18 million MEDLINE citations, alongside analysis tools such as the BLAST programme.[5]

References

  1. http://blast.ncbi.nlm.nih.gov/Blast.cgi
  2. Frederic Dardel and Francois Kepes - Translation by Noah Hardly (2006). Bioinformatics: Genomic and post-genomics. West Sussex: John Wiley & Sons, Ltd. 43-44.
  3. Information for the table taken from http://blast.ncbi.nlm.nih.gov/Blast.cgi
  4. Fassler J, Cooper P. BLAST Glossary. 2011 Jul 14. In: BLAST Help [Internet]. Bethesda (MD): National Center for Biotechnology Information (US); 2008-. Available from: http://www.ncbi.nlm.nih.gov/books/NBK62051/
  5. Bioinformatics and Functional Genomics (2nd ed.). Wiley-Blackwell, 22/04/2009. ISBN: 9780470451489. 913 pages.