BLAST

From The School of Biomedical Sciences Wiki
BLAST (Basic Local Alignment Search Tool) is a sequence similarity search tool rather than a database in its own right. Its search algorithm compares an unknown DNA or protein sequence of interest (the query) against known sequences held in sequence databases, finding regions of local similarity between them[1]. For protein searches, the comparison or 'subject' sequences come from protein databases. This supports one of the most important applications of BLAST: using known proteins to quickly identify homologies, from which the likely function of the query protein can be inferred[2].

The results of a BLAST search include a graphic summary showing how much of the query sequence aligns with each sequence hit from the database. The database sequences that produce significant alignments are listed in a table, which describes each hit, for example the protein that the sequence codes for. Each hit has an accession number; clicking on it shows the source organism of the sequence and gives links to papers about it. The E value (Expect value) of a hit is the number of alignments with an equally good or better score that would be expected to occur by chance in a database of that size, so a lower E value indicates a more significant match. The Expect threshold is the cut-off for reporting: only hits with an E value at or below it are shown. The threshold can be changed manually, for example from 10 to 1000 when the query sequences are very short, because short alignments cannot reach low E values. Raising it increases the likelihood of a hit being reported, although the extra hits are more likely to be chance matches. Sometimes no match is returned even after increasing the threshold, simply because nothing in the database is similar to the query.
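The relationship between score, search space and E value can be sketched with the standard approximation E = m × n × 2^(−S'), where S' is the bit score of an alignment, m the query length and n the total length of the database. The function and parameter names below are illustrative assumptions, and real BLAST uses corrected "effective" lengths, so the numbers will not match a real report exactly.

```python
def expect_value(bit_score: float, query_len: int, db_len: int) -> float:
    """Approximate E value from a bit score: E = m * n * 2**(-S')."""
    return query_len * db_len * 2.0 ** (-bit_score)


def is_reported(bit_score: float, query_len: int, db_len: int,
                threshold: float = 10.0) -> bool:
    """A hit is reported only if its E value is at or below the threshold."""
    return expect_value(bit_score, query_len, db_len) <= threshold


if __name__ == "__main__":
    db_len = 1_000_000_000  # hypothetical database size, in residues
    for score in (20, 40, 60):
        print(score, expect_value(score, query_len=20, db_len=db_len))
```

Each extra bit of score halves the E value. A short query caps the score an alignment can reach, which is why a very short query may need a larger Expect threshold before any hit is reported.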

When running a protein BLAST, the sequence you input is referred to as the 'QUERY' sequence, and the database sequence it is aligned with is the 'SUBJECT' sequence. On the line between the two sequences, a letter indicates an identical residue, a (+) indicates a conservative substitution, and a blank space indicates a mismatch or a position opposite a gap; the gaps themselves (insertions or deletions) appear as dashes (-) within the sequences. A DNA BLAST is displayed in almost the same way: a match is shown as a vertical bar (|), and a blank space between the two sequences shows no match. Note that conservative substitutions are not shown in DNA alignments.
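The middle line of an alignment can be reproduced with a short sketch. The set of "conservative" pairs below is only a small illustrative subset of residue pairs that score positively in the BLOSUM62 matrix; a real implementation would consult the full scoring matrix, and the function names are assumptions for this example.

```python
# Illustrative subset of residue pairs with a positive BLOSUM62 score.
POSITIVE_PAIRS = {frozenset(pair) for pair in
                  ("IV", "IL", "LM", "KR", "DE", "FY", "ST", "EQ")}


def protein_midline(query: str, subject: str) -> str:
    """Letter = identity, '+' = conservative substitution, ' ' = other."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have equal length")
    line = []
    for q, s in zip(query, subject):
        if q == "-" or s == "-":
            line.append(" ")   # position opposite a gap
        elif q == s:
            line.append(q)     # identical residue
        elif frozenset((q, s)) in POSITIVE_PAIRS:
            line.append("+")   # conservative substitution
        else:
            line.append(" ")   # mismatch
    return "".join(line)


def dna_midline(query: str, subject: str) -> str:
    """'|' = match, ' ' = mismatch or gap (no '+' in DNA alignments)."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have equal length")
    return "".join("|" if q == s and q != "-" else " "
                   for q, s in zip(query, subject))
```

Both functions expect sequences that are already aligned, in upper case, with gaps written as dashes.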

Lower-case grey letters in the query sequence on the BLAST results page mark regions that were automatically filtered as low-complexity sequence[3]. This filter is on for the BLASTN pages to prevent matches that are probably artefacts. Any low-complexity sequence is shown in lower-case grey characters, so you can still see the sequence that was filtered, rather than the "X"s and "N"s that replaced it in previous BLAST output[4].

The Blast web page can be found at: http://blast.ncbi.nlm.nih.gov/Blast.cgi

Basic Blast Programs

There are five basic BLAST programs; which one to use depends on the type of sequence you are studying:

A table showing a brief description of the five basic BLAST programs[5].

Program                      Description
Nucleotide blast (blastn)    Searches a nucleotide database using a nucleotide query
Protein blast (blastp)       Searches a protein database using a protein query
blastx                       Searches a protein database using a translated nucleotide query
tblastn                      Searches a translated nucleotide database using a protein query
tblastx                      Searches a translated nucleotide database using a translated nucleotide query

BlastP allows you to compare an amino acid query sequence against a protein sequence database. BlastN compares a nucleotide query sequence against a nucleotide sequence database. BlastX compares a nucleotide query sequence, translated in all six reading frames, against a protein sequence database. tBlastN compares a protein query sequence against a nucleotide sequence database that is dynamically translated in all six reading frames. Finally, tBlastX compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database[1]. Links to these programs can be found in the table above.
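The choice between the five programs can be summarised as a lookup on the query type, the database type, and whether the comparison is made at the protein level. The sketch below is only an illustration of that decision; the dictionary and function names are assumptions, not part of BLAST.

```python
# (query type, database type, compared as protein?) -> program
BLAST_PROGRAMS = {
    ("nucleotide", "nucleotide", False): "blastn",
    ("protein", "protein", True): "blastp",
    ("nucleotide", "protein", True): "blastx",
    ("protein", "nucleotide", True): "tblastn",
    ("nucleotide", "nucleotide", True): "tblastx",
}


def choose_program(query_type, database_type, compare_as_protein=None):
    """Return the basic BLAST program for a query/database combination."""
    if compare_as_protein is None:
        # By default only a nucleotide-vs-nucleotide search stays untranslated.
        compare_as_protein = not (query_type == database_type == "nucleotide")
    key = (query_type, database_type, compare_as_protein)
    try:
        return BLAST_PROGRAMS[key]
    except KeyError:
        raise ValueError(f"no basic BLAST program for {key}") from None
```

For example, `choose_program("nucleotide", "protein")` selects blastx, while passing `compare_as_protein=True` for a nucleotide query against a nucleotide database selects tblastx.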

A low-complexity sequence is a region of biased composition that can cause problems when searching for sequence similarity. Such regions can often be spotted visually because they are usually repetitive, e.g. CCCCCCCGGCCCCCCGGGG. In a BLAST search they are often shown as lower-case grey letters. They need to be masked from the search because they can produce high-scoring hits that reflect the biased composition rather than shared homology. The SEG program within BLAST filters out low-complexity regions, such as homopolymeric runs and short-period repeats, found in amino acid sequences[6].
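The idea of low complexity can be illustrated by measuring the Shannon entropy of a sliding window and lower-casing windows that fall below a cut-off. This is a simplified sketch and not the SEG algorithm itself; the window size, threshold and function names are assumptions chosen for the example.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Entropy in bits per residue; 0 for a homopolymer run."""
    total = len(window)
    counts = Counter(window)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def mask_low_complexity(seq: str, window: int = 12,
                        threshold: float = 1.0) -> str:
    """Lower-case every residue that lies in a low-entropy window."""
    seq = seq.upper()
    masked = [False] * len(seq)
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < threshold:
            for i in range(start, start + window):
                masked[i] = True
    return "".join(ch.lower() if m else ch for ch, m in zip(seq, masked))
```

A run such as CCCCCCCC has an entropy of zero and is masked, whereas a window in which all four bases are equally frequent reaches the maximum of 2 bits for DNA and is left untouched.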

Specialised Blast Programs

There is also the option to perform more specialised BLAST searches, such as a Primer-BLAST search. Links for these searches can be found below the main BLAST search links.

PubMed is a search engine from the NLM (United States National Library of Medicine) that provides access to over 26 million citations ranging from MEDLINE analysis to BLAST programmes and online books[7]. It has offered free access since June 1997 and is updated with new citations daily[8].

Errors in Blast Programs

A common problem in BLAST is the message "No significant similarity found". There are two main reasons for this output. The first is that a short query sequence was used with a low Expect (E) threshold (commonly the default of 10); increasing the Expect threshold should counteract this. The second is the filtering of low-complexity regions, which can be resolved by changing the 'filters and masking' settings in the algorithm parameters[9].
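The same two adjustments can be made outside the web page. The sketch below assumes a local installation of the NCBI BLAST+ command-line tools, which this article does not cover, and only builds the argument list for a blastp run; the helper name, file name and database name are hypothetical. The flags used (-query, -db, -evalue, -seg, -task) are standard blastp options.

```python
def build_blastp_command(query_path, database, evalue=10,
                         low_complexity_filter=True, short_query=False):
    """Build (but do not run) a blastp argument list."""
    command = [
        "blastp",
        "-query", query_path,
        "-db", database,
        "-evalue", str(evalue),  # Expect threshold
        "-seg", "yes" if low_complexity_filter else "no",  # SEG masking
    ]
    if short_query:
        # Task preset intended for short protein queries
        command += ["-task", "blastp-short"]
    return command


if __name__ == "__main__":
    # Hypothetical file and database names
    print(" ".join(build_blastp_command("peptide.fasta", "nr",
                                        evalue=1000,
                                        low_complexity_filter=False)))
```

The resulting list could be passed to `subprocess.run` once a real query file and database are available.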

References

  1. http://blast.ncbi.nlm.nih.gov/Blast.cgi
  2. Frederic Dardel and Francois Kepes - Translation by Noah Hardly (2006). Bioinformatics: Genomics and Post-Genomics. West Sussex: John Wiley and Sons, Ltd. 43-44.
  3. https://blast.ncbi.nlm.nih.gov/Blast.cgi
  4. https://blast.ncbi.nlm.nih.gov/
  5. Information for the table taken from http://blast.ncbi.nlm.nih.gov/Blast.cgi
  6. Fassler J, Cooper P. BLAST Glossary. 2011 Jul 14. In: BLAST Help [Internet]. Bethesda (MD): National Center for Biotechnology Information (US); 2008-. Available from: http://www.ncbi.nlm.nih.gov/books/NBK62051/
  7. Bioinformatics and Functional Genomics. 2nd ed. Wiley-Blackwell; 22/04/2009. ISBN: 9780470451489. Pages: 913.
  8. U.S. National Library of Medicine (2016, October). PubMed Quick Start Guide. Retrieved from PubMed: https://www.ncbi.nlm.nih.gov/books/NBK3827/#pubmedhelp.PubMed_Quick_Start
  9. https://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=FAQ#nohits