BLAST
Revision as of 20:12, 23 October 2014
BLAST (Basic Local Alignment Search Tool) is an algorithm for searching primary sequence databases. It lets you compare an unknown DNA or protein sequence of interest with known sequences and find regions of similarity between them [1]. The known protein sequences are held in protein databases, which enables one of the most important applications of BLAST: using known proteins to quickly identify homologues of the query, from which the function of the unknown protein can be inferred. [2]
The results of a BLAST search include a graphic summary of how much of the query sequence aligns with each sequence hit from the database. The database sequences that produce significant alignments are listed in a table, which describes each hit (for example, the protein that the sequence codes for). Each hit has an accession number; clicking on it shows the source organism of the protein and gives links to papers on the protein. The E value (Expect value) measures statistical significance: it gives the number of hits with an equally good or better score that would be expected by chance in a database of that size. A lower E value therefore indicates a more significant match, because the hit is less likely to be a chance result.
Identities are identical matches and are shown as a letter on the line between the two aligned sequences. Positives are conservative substitutions and are shown as a plus sign (+) on the line between the two aligned sequences.
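The relationship between score, search space and E value can be sketched with the Karlin-Altschul formula, E = K·m·n·e^(−λS), where m is the query length and n is the database length. The sketch below is only illustrative: the function names are hypothetical, the values of λ and K depend on the scoring system and are passed in as assumptions, and the real BLAST programs use corrected "effective" lengths rather than the raw lengths used here.

```python
import math


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Convert a raw alignment score S to a bit score S' = (lam*S - ln K) / ln 2.

    lam and k are statistical parameters of the scoring system; the caller
    must supply them (any values used in examples are assumptions).
    """
    return (lam * raw_score - math.log(k)) / math.log(2)


def expect_value(bits: float, query_len: int, db_len: int) -> float:
    """Expected number of chance hits scoring at least `bits`: E = m * n * 2**(-S')."""
    return query_len * db_len * 2.0 ** (-bits)


if __name__ == "__main__":
    # A higher score gives a lower E value; a larger database gives a higher one.
    for bits in (30, 40, 50):
        print(bits, expect_value(bits, query_len=100, db_len=1_000_000))
```

The same alignment score therefore becomes less significant as the database grows, which is why an E value is only meaningful for the database it was computed against.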
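As a rough illustration of how the middle line of a protein alignment is built, the sketch below marks identities with the residue letter and positives with a plus sign. It assumes the two sequences are already aligned (gaps written as "-"), and it uses a small hand-picked set of residue pairs that score positively in BLOSUM62 rather than the full matrix, so it will miss some positives that BLAST would report. The function names are hypothetical.

```python
# Illustrative subset of residue pairs with a positive BLOSUM62 score.
# This is NOT the full matrix; it is only enough for a small demonstration.
POSITIVE_PAIRS = {frozenset(p) for p in ("IL", "IV", "LM", "KR", "DE", "FY", "ST")}


def midline(query: str, subject: str) -> str:
    """Build the line shown between two aligned protein sequences."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")
    out = []
    for q, s in zip(query, subject):
        if q == "-" or s == "-":
            out.append(" ")          # gap: nothing is shown
        elif q == s:
            out.append(q)            # identity: the residue letter
        elif frozenset((q, s)) in POSITIVE_PAIRS:
            out.append("+")          # positive: conservative substitution
        else:
            out.append(" ")          # mismatch
    return "".join(out)


def count_identities_positives(query: str, subject: str) -> tuple[int, int]:
    """Return (identities, positives); positives include the identities."""
    line = midline(query, subject)
    identities = sum(1 for c in line if c.isalpha())
    return identities, identities + line.count("+")


if __name__ == "__main__":
    q, s = "MKVLAT-E", "MRILGTGD"
    print(q)
    print(midline(q, s))
    print(s)
    print(count_identities_positives(q, s))
```

Counting identities as part of the positives follows the way BLAST reports the two figures, where the positives total is always at least as large as the identities total.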
The Blast web page can be found at: http://blast.ncbi.nlm.nih.gov/Blast.cgi
Basic Blast Programs
There are five basic BLAST programs; which one you use depends on the type of query sequence and the type of database you are searching:
Nucleotide blast (blastn) | Search a nucleotide database using a nucleotide query |
Protein blast (blastp) | Search a protein database using a protein query |
blastx | Search a protein database using a translated nucleotide query |
tblastn | Search a translated nucleotide database using a protein query |
tblastx | Search a translated nucleotide database using a translated nucleotide query |
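The table above can be read as a lookup from query type and database type to program name. The following is a minimal sketch of that lookup; the function name and the `translate_both` flag are hypothetical and exist only for this illustration, they are not part of any BLAST interface.

```python
# (query type, database type) -> program, as listed in the table above.
BLAST_PROGRAMS = {
    ("nucleotide", "nucleotide"): "blastn",
    ("protein", "protein"): "blastp",
    ("nucleotide", "protein"): "blastx",
    ("protein", "nucleotide"): "tblastn",
}


def choose_program(query_type: str, database_type: str, translate_both: bool = False) -> str:
    """Pick a basic BLAST program for the given query and database types.

    translate_both selects tblastx, which compares translated nucleotide
    sequences on both sides instead of the nucleotides themselves.
    """
    if translate_both:
        if (query_type, database_type) != ("nucleotide", "nucleotide"):
            raise ValueError("tblastx needs a nucleotide query and a nucleotide database")
        return "tblastx"
    try:
        return BLAST_PROGRAMS[(query_type, database_type)]
    except KeyError:
        raise ValueError(f"unknown combination: {query_type!r}, {database_type!r}") from None


if __name__ == "__main__":
    print(choose_program("nucleotide", "protein"))
    print(choose_program("nucleotide", "nucleotide", translate_both=True))
```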
The blastp program compares an amino acid query sequence against a protein database. The blastn program compares a nucleotide query sequence against a nucleotide sequence database. With blastx you can compare a nucleotide query sequence, translated in all reading frames, against a protein sequence database. With tblastn you can find similarities between a protein query sequence and the nucleotide sequences in a database, which are dynamically translated in all reading frames. Finally, tblastx compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database [1]. Links to these programs can be found in the table above.
A low complexity sequence is a region of biased composition that can cause problems when searching for sequence similarity. Such regions can often be spotted by eye because they are usually repetitive, e.g. CCCCCCCGGCCCCCCGGGG. In BLAST results, these regions are often shown as lower case grey letters. They need to be filtered out of the search because they can produce high-scoring matches that reflect the biased composition rather than shared homology. The SEG program within BLAST can filter out low complexity regions, such as homopolymeric runs and short period repeats, in amino acid sequences; for nucleotide queries BLAST uses the DUST program for the same purpose.[4]
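The six-frame translation used by the translated searches can be sketched as follows: three frames are read from the query strand and three from its reverse complement. This sketch assumes the standard genetic code and a DNA sequence containing only A, C, G and T (any other codon is written as X); the function names are hypothetical and the code only illustrates the idea, not how BLAST implements it.

```python
# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons from position 0; '*' marks a stop codon."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna: str) -> dict[str, str]:
    """Translate a DNA sequence in frames +1..+3 and -1..-3."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in six_frame_translations("ATGGCCTAA").items():
        print(frame, protein)
```

Translating in all six frames matters because the correct reading frame and strand of an unknown nucleotide sequence are usually not known in advance.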
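To make the idea of low complexity concrete, the sketch below masks regions whose composition is strongly biased, writing them in lower case in the same way that BLAST displays filtered regions. It is not the SEG algorithm: it uses a plain sliding-window Shannon entropy as a stand-in measure, and the window size and threshold are arbitrary assumptions chosen for the illustration.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Shannon entropy (in bits) of the letter composition of a window."""
    total = len(window)
    return -sum(
        (n / total) * math.log2(n / total) for n in Counter(window).values()
    )


def mask_low_complexity(seq: str, window: int = 12, threshold: float = 1.5) -> str:
    """Lower-case every position covered by a window with entropy below threshold.

    window and threshold are illustrative defaults, not values used by BLAST.
    """
    seq = seq.upper()
    masked = [False] * len(seq)
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < threshold:
            for i in range(start, start + window):
                masked[i] = True
    return "".join(c.lower() if m else c for c, m in zip(seq, masked))


if __name__ == "__main__":
    print(mask_low_complexity("CCCCCCCGGCCCCCCGGGG"))
    print(mask_low_complexity("ACGTACGTACGT" + "A" * 15 + "ACGTACGTACGT"))
```

A repetitive stretch uses only one or two letters, so its entropy is low and it is masked, while a region that uses all letters evenly is left in upper case.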
Specialised Blast Programs
There is also the option to perform more specialised BLAST searches, such as a Primer-BLAST search. Links for these searches can be found below the main BLAST search links on the BLAST web page.
References
- [1] http://blast.ncbi.nlm.nih.gov/Blast.cgi
- [2] Frederic Dardel and Francois Kepes - Translation by Noah Hardly (2006). Bioinformatics: Genomics and Post-Genomics. West Sussex: John Wiley & Sons, Ltd. 43-44.
- [3] Information for the table taken from http://blast.ncbi.nlm.nih.gov/Blast.cgi
- [4] Fassler J, Cooper P. BLAST Glossary. 2011 Jul 14. In: BLAST Help [Internet]. Bethesda (MD): National Center for Biotechnology Information (US); 2008-. Available from: http://www.ncbi.nlm.nih.gov/books/NBK62051/