Installation of BLAST 2.2.22

Additional libraries

None needed

Procedure

For the past couple of years, there have been two versions of NCBI BLAST in the wild: BLAST and BLAST+, a complete rewrite of BLAST in C++. Although BLAST+ has many advantages (it is easier to use and faster, among other things), it has one big problem: it takes up far more disk space than BLAST, which matters for a VM-based distro like Impilo whose goal is to keep its footprint small. Because of that, I chose to install the original BLAST. This will probably change in the near future, though, simply because I want to put the best tools in Impilo…

Another choice had to be made: should I use the source or the pre-compiled binaries? Because the source code has a lot of stuff pertaining to GUIs and other extra libraries, I decided to use the pre-compiled binaries.

I also installed the corresponding NETBLAST, a network-based BLAST client, since we might not always have the databases locally for the courses ;-)

Here is my procedure to install BLAST/NETBLAST from an archive that has the pre-compiled binaries:

% wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/release/2.2.22/blast-2.2.22-x64-linux.tar.gz
% wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/release/2.2.22/netblast-2.2.22-x64-linux.tar.gz
% tar -zxvf blast-2.2.22-x64-linux.tar.gz
% tar -zxvf netblast-2.2.22-x64-linux.tar.gz
% sudo chown -R root:root /opt/bio/sources/blast-2.2.22
% sudo chown -R root:root /opt/bio/sources/netblast-2.2.22
% sudo nano /etc/profile
#
# BLAST/NETBLAST specific environment variables 
#
PATH=/opt/bio/sources/blast-2.2.22/bin:/opt/bio/sources/netblast-2.2.22/bin:$PATH
% nano ~/.ncbirc
[NCBI]
DATA=/opt/bio/sources/blast-2.2.22/data
 
[BLAST]
BLASTDB=/opt/bio/data/blastdb
BLASTMAT=/opt/bio/sources/blast-2.2.22/data
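
The [NCBI] and [BLAST] lines above can also be generated from a script instead of being typed into an editor. The sketch below assumes they are the contents of the .ncbirc file mentioned in the next section; write_ncbirc is a hypothetical helper name, not something shipped with BLAST, and the example writes to a scratch file rather than to a real .ncbirc.

```shell
#!/bin/sh
# write_ncbirc is a hypothetical helper (not part of BLAST).
# Arguments: BLAST install directory, database directory, output file.
write_ncbirc() {
    blast_home=$1   # e.g. /opt/bio/sources/blast-2.2.22
    db_dir=$2       # e.g. /opt/bio/data/blastdb
    out_file=$3     # file to write the settings to
    cat > "$out_file" <<EOF
[NCBI]
DATA=$blast_home/data

[BLAST]
BLASTDB=$db_dir
BLASTMAT=$blast_home/data
EOF
}

# Example run against a scratch file, using the paths from this page.
demo_file=$(mktemp)
write_ncbirc /opt/bio/sources/blast-2.2.22 /opt/bio/data/blastdb "$demo_file"
cat "$demo_file"
rm -f "$demo_file"
```

Keeping the install directory as a parameter means only one argument changes if a different BLAST release is unpacked later.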

Creating the BLAST databases

BLAST uses specially formatted databases created from text files written in FASTA format. There is no specific place where these databases must live (this is why you have the BLASTDB… line inside the .ncbirc file), but to keep things orderly, Impilo puts all data files used by the various applications under /opt/bio/data, in this case /opt/bio/data/blastdb. Because space is at a premium in this project, I only provide two small databases, created from the nucleotide and amino acid sequences of the E. coli DH10B bacterium.
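
For readers who have not seen FASTA before, here is a minimal sketch of what such a file looks like and a quick sanity check to run before building a database from it. The record names and sequences are made up for illustration, and count_fasta_records is a hypothetical helper, not a BLAST tool.

```shell
#!/bin/sh
# A FASTA file is plain text: each record is a header line starting
# with ">" followed by one or more lines of sequence.
# The two records below are invented examples.
demo=$(mktemp)
cat > "$demo" <<'EOF'
>seq1 hypothetical protein A
MKTAYIAKQRQISFVKSHFSRQ
>seq2 hypothetical protein B
MSDNELLKRAEELGVDLSG
EOF

# count_fasta_records is a hypothetical helper: it counts header lines,
# which equals the number of sequences in a well-formed file.
count_fasta_records() {
    awk '/^>/ { n++ } END { print n + 0 }' "$1"
}

count_fasta_records "$demo"   # prints 2
rm -f "$demo"
```

A count of 0 usually means the file is not FASTA at all, which is worth catching before formatting it.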

Here is the recipe used to create them. Apply it to any FASTA-formatted file that you want to turn into a BLAST database.

% sudo su
% cd /opt/bio/data/blastdb
% mv /home/bioubuntu/this_is_example.fa .
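# -p T builds a protein database; use -p F for nucleotide sequences
% formatdb -i this_is_example.fa -p T -n <local_db_name>
# blastp searches a protein database with a protein query; use -p blastn for nucleotide against nucleotide
% blastall -p blastp -d <local_db_name> -i <your_sequence>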
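
Since the recipe is meant to be reused for both nucleotide and protein files, the choice of -p flag can be scripted. The sketch below is a rough heuristic, not part of BLAST: guess_p_flag is a hypothetical helper that treats a file as nucleotide only if every sequence character is A, C, G, T, U or N, so nucleotide files using other ambiguity codes would be misclassified as protein. It only prints the formatdb command, so the result can be reviewed before anything is run.

```shell
#!/bin/sh
# guess_p_flag is a hypothetical helper (not part of BLAST).
# Prints F (nucleotide) if all non-header characters are A, C, G, T, U or N
# in either case; prints T (protein) otherwise.
guess_p_flag() {
    awk '!/^>/ && /[^ACGTUNacgtun \t\r]/ { protein = 1; exit }
         END { print (protein ? "T" : "F") }' "$1"
}

# Dry run: print the formatdb command for a given FASTA file.
# The default file name is the example used in the recipe above.
fasta=${1:-this_is_example.fa}
if [ -f "$fasta" ]; then
    db_name=$(basename "$fasta" .fa)
    flag=$(guess_p_flag "$fasta")
    echo "formatdb -i $fasta -p $flag -n $db_name"
fi
```

The printed line can then be pasted into the root shell from the recipe, after checking that the guessed flag matches the file's contents.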