None needed
For the past couple of years, there have been two versions of NCBI BLAST in the wild: BLAST and BLAST+, a complete rewrite of BLAST in C++. Although it has many advantages (it is easier to use and faster, among other things), BLAST+ has one big problem: it takes up a lot more space than BLAST, which is an issue for a VM-based distro like Impilo, whose goal is to keep its footprint small. Because of that, I chose to install the original BLAST. This will probably change in the near future though, just because I want to put the best tools in Impilo…
Another choice had to be made: should I use the source or the pre-compiled binaries? Because the source code carries a lot of stuff pertaining to GUIs and other extra libraries, I decided to use the pre-compiled binaries.
I also installed the corresponding NETBLAST, a network-based BLAST client, since we might not always have the databases locally for the courses.
Here is my procedure to install BLAST/NETBLAST from an archive that has the pre-compiled binaries:
Download the archives into /home/bioubuntu (choose either blast-2.2.22-ia32-linux.tar.gz or blast-2.2.22-x64-linux.tar.gz, depending on whether you want to build a 32-bit or a 64-bit Impilo), extract their contents and move the resulting directories to /opt/bio/sources/.
% wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/release/2.2.22/blast-2.2.22-x64-linux.tar.gz
% wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/release/2.2.22/netblast-2.2.22-x64-linux.tar.gz
% tar -zxvf blast-2.2.22-x64-linux.tar.gz
% tar -zxvf netblast-2.2.22-x64-linux.tar.gz
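The choice between the two archives can be derived from the architecture string printed by `uname -m`. What follows is only a minimal sketch and not part of the original procedure: the function name `blast_archive` is hypothetical, and the mapping assumes that `x86_64` calls for the x64 build and `i386` to `i686` for the ia32 build.

```shell
#!/bin/sh
# Hypothetical helper: print the BLAST 2.2.22 archive name that matches
# an architecture string such as the one printed by `uname -m`.
blast_archive() {
    case "$1" in
        x86_64)
            echo "blast-2.2.22-x64-linux.tar.gz" ;;
        i386|i486|i586|i686)
            echo "blast-2.2.22-ia32-linux.tar.gz" ;;
        *)
            echo "unsupported architecture: $1" >&2
            return 1 ;;
    esac
}

# Show which archive would be picked on this machine (nothing is downloaded).
blast_archive "$(uname -m)" || true
```

The printed name can then be appended to the FTP directory used in the wget commands above.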
/opt/bio/sources/blast-2.2.22 should belong to root and its permissions should be 755.
% sudo chown -R root:root /opt/bio/sources/blast-2.2.22
% sudo chown -R root:root /opt/bio/sources/netblast-2.2.22
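The chown commands only set the owner; the 755 mode is stated but no command for it is shown. As a hedged sketch, the check below verifies both properties at once. The function name `has_owner_and_mode` is hypothetical, it relies on GNU `stat` (the `-c` option), and the commented `chmod` line is an assumption about how the mode would be applied, since the original does not say whether it was set recursively.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if DIR is owned by the numeric user id
# UID_WANTED and has the octal mode MODE_WANTED. Requires GNU stat.
has_owner_and_mode() {
    dir=$1
    uid_wanted=$2
    mode_wanted=$3
    [ "$(stat -c '%u %a' "$dir")" = "$uid_wanted $mode_wanted" ]
}

# Example, commented out because it needs the real install tree
# (uid 0 is root):
# sudo chmod 755 /opt/bio/sources/blast-2.2.22
# has_owner_and_mode /opt/bio/sources/blast-2.2.22 0 755 && echo "ok"
```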
Since the executables live in /opt/bio/sources/blast-2.2.22/bin or /opt/bio/sources/netblast-2.2.22/bin, we need to add these two locations to our PATH. There is more than one way of doing this, but I chose to put it in /etc/profile.
% sudo nano /etc/profile
#
# BLAST/NETBLAST specific environment variables
#
PATH=/opt/bio/sources/blast-2.2.22/bin:/opt/bio/sources/netblast-2.2.22/bin:$PATH
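Because /etc/profile can be sourced more than once, a guarded variant avoids listing the same directory twice. This is a sketch of one possible approach, not what the author put in /etc/profile; `path_prepend` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: prepend a directory to PATH unless it is already listed.
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already present, nothing to do
        *) PATH="$1:$PATH" ;;
    esac
}

path_prepend /opt/bio/sources/blast-2.2.22/bin
path_prepend /opt/bio/sources/netblast-2.2.22/bin
export PATH
```

Calling `path_prepend` again with the same directory leaves PATH unchanged.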
Create a file named .ncbirc inside the bioubuntu home folder. This file must contain the following lines:
[NCBI]
DATA=/opt/bio/sources/blast-2.2.22/data
[BLAST]
BLASTDB=/opt/bio/data/blastdb
BLASTMAT=/opt/bio/sources/blast-2.2.22/data
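If you would rather script this step than type the file by hand, a here-document does the job. This is a small sketch; `write_ncbirc` is a hypothetical helper name, and the target path is passed in so that nothing is written until you call it.

```shell
#!/bin/sh
# Hypothetical helper: write the .ncbirc content shown above to the file
# given as the first argument.
write_ncbirc() {
    cat > "$1" <<'EOF'
[NCBI]
DATA=/opt/bio/sources/blast-2.2.22/data
[BLAST]
BLASTDB=/opt/bio/data/blastdb
BLASTMAT=/opt/bio/sources/blast-2.2.22/data
EOF
}

# Usage (writes into the current user's home folder):
# write_ncbirc "$HOME/.ncbirc"
```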
BLAST uses specially formatted databases created from text files written in FASTA format. There is no specific place where these databases must live (this is why you have the BLASTDB… line inside the .ncbirc file), but to keep some order Impilo puts all data files used by the various applications under /opt/bio/data, in this case /opt/bio/data/blastdb. Because space is at a premium in this project, I only provide two small databases created from nucleotide and amino acid sequences from the E. coli DH10B bacterium.
Here is the recipe used to create them. Apply it to any FASTA-formatted file that you want to turn into a BLAST database.
Because only root can write into /opt/bio/data/blastdb, you need to become root:
% sudo su
Go to /opt/bio/data/blastdb and move any FASTA-formatted file there. These files will be used to create a new database:
% cd /opt/bio/data/blastdb
% mv /home/bioubuntu/this_is_example.fa .
formatdb will be in charge of creating the database. The database type (nucleotide or amino acid) is selected using the -p flag, with either T (amino acid) or F (nucleotide) as parameter:
% formatdb -i <this_is_example.fa> -p T -n <local_db_name>
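Since T and F are easy to mix up, a thin wrapper can translate a readable sequence type into the value expected by -p. This is a sketch only: `formatdb_type_flag` and `make_blast_db` are hypothetical names, and the wrapper uses just the -i, -p and -n options shown above.

```shell
#!/bin/sh
# Hypothetical helper: translate a readable sequence type into the value
# that formatdb expects for its -p flag (T = amino acid, F = nucleotide).
formatdb_type_flag() {
    case "$1" in
        protein|aa)    echo T ;;
        nucleotide|nt) echo F ;;
        *)
            echo "unknown sequence type: $1" >&2
            return 1 ;;
    esac
}

# Hypothetical wrapper: make_blast_db FASTA_FILE TYPE DB_NAME
make_blast_db() {
    flag=$(formatdb_type_flag "$2") || return 1
    formatdb -i "$1" -p "$flag" -n "$3"
}
```

For example, `make_blast_db this_is_example.fa nucleotide <local_db_name>` would run formatdb with -p F.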
You can now query the new database with blastall:
% blastall -p blastp -d <local_db_name> -i <your_sequence>
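The value given to -p depends on both the query and the database type: blastp in the example above compares a protein query against a protein database. The sketch below maps the common combinations to the matching blastall program name; `blast_program` is a hypothetical helper, and it deliberately leaves out tblastx.

```shell
#!/bin/sh
# Hypothetical helper: pick the blastall program (-p value) from the
# query type ($1) and the database type ($2).
blast_program() {
    case "$1/$2" in
        protein/protein)       echo blastp ;;
        nucleotide/nucleotide) echo blastn ;;
        nucleotide/protein)    echo blastx ;;   # translated query vs protein db
        protein/nucleotide)    echo tblastn ;;  # protein query vs translated db
        *)
            echo "unsupported combination: $1/$2" >&2
            return 1 ;;
    esac
}

# Example (commented out because it needs BLAST and a formatted database):
# blastall -p "$(blast_program nucleotide nucleotide)" -d <local_db_name> -i <your_sequence>
```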