Installation of BLAST 2.2.22
Additional libraries
None needed
Procedure
For the past couple of years, there have been two versions of NCBI BLAST in the wild: BLAST and BLAST+, a complete rewrite of BLAST in C++. Although it has many advantages (it is easier to use and faster, among other things), BLAST+ has one big problem: it consumes a hell of a lot more space than BLAST, which matters for a VM-based distro like Impilo, whose goal is to keep its footprint small. Because of that, I chose to install the original BLAST. This will probably change in the near future, though, simply because I want to put the best tools in Impilo…
Another choice had to be made: should I use the source or pre-compiled binaries? Because the source code contains a lot of material pertaining to GUIs and other extra libraries, I decided to use the pre-compiled binaries.
I also installed the corresponding NETBLAST, a network-based BLAST client, since we might not always have the databases available locally for the courses.
Here is my procedure to install BLAST/NETBLAST from an archive that has the pre-compiled binaries:
- Let's download the appropriate archives into /home/bioubuntu (one would choose either blast-2.2.22-ia32-linux.tar.gz or blast-2.2.22-x64-linux.tar.gz, depending on whether we want to build a 32-bit or a 64-bit Impilo), extract their contents and move this material into /opt/bio/sources/.
% wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/release/2.2.22/blast-2.2.22-x64-linux.tar.gz
% wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/release/2.2.22/netblast-2.2.22-x64-linux.tar.gz
% tar -zxvf blast-2.2.22-x64-linux.tar.gz
% tar -zxvf netblast-2.2.22-x64-linux.tar.gz
% sudo mv blast-2.2.22 netblast-2.2.22 /opt/bio/sources/
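The choice between the 32-bit and 64-bit archives can also be scripted. The following is only a sketch: the function name blast_archive_name is made up for this example, it assumes that `uname -m` prints x86_64 on a 64-bit system and something like i686 on a 32-bit one, and it assumes that the ia32 NETBLAST archive follows the same naming pattern as the x64 one shown above.

```shell
# Hypothetical helper: build the archive name for a tool ("blast" or
# "netblast") from a machine name as printed by `uname -m`.
blast_archive_name() {
    tool="$1"
    machine="$2"
    case "$machine" in
        x86_64)              arch="x64" ;;
        i386|i486|i586|i686) arch="ia32" ;;
        *) echo "no archive known for machine type: $machine" >&2; return 1 ;;
    esac
    echo "${tool}-2.2.22-${arch}-linux.tar.gz"
}

# Print the two archive names that match the current machine, if any.
for tool in blast netblast; do
    blast_archive_name "$tool" "$(uname -m)" || true
done
```

The printed names can then be appended to the FTP release URL used in the wget commands.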
- The folder named /opt/bio/sources/blast-2.2.22 should belong to root and its permissions should be 755.
% sudo chown -R root:root /opt/bio/sources/blast-2.2.22
% sudo chown -R root:root /opt/bio/sources/netblast-2.2.22
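To confirm the ownership and permissions, something like the following sketch can be used. It assumes GNU stat (the one shipped with Ubuntu), and the helper name owner_and_mode is invented for this example.

```shell
# Hypothetical helper: print "owner:group mode" for a path (GNU stat syntax).
owner_and_mode() {
    stat -c '%U:%G %a' "$1"
}

for dir in /opt/bio/sources/blast-2.2.22 /opt/bio/sources/netblast-2.2.22; do
    if [ -d "$dir" ]; then
        # The step above asks for root as owner and 755 as mode.
        echo "$dir: $(owner_and_mode "$dir")"
    else
        echo "$dir: not found" >&2
    fi
done
```

If the reported mode is not 755, running `sudo chmod 755` on the folder sets it.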
- Since the applications that we need are under /opt/bio/sources/blast-2.2.22/bin or /opt/bio/sources/netblast-2.2.22/bin, we need to add these two locations to our PATH. There is more than one way of doing this, but I chose to put this in /etc/profile.
% sudo nano /etc/profile
- At the very end of the file, add the following lines:
#
# BLAST/NETBLAST specific environment variables
#
PATH=/opt/bio/sources/blast-2.2.22/bin:/opt/bio/sources/netblast-2.2.22/bin:$PATH
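After logging in again (so that /etc/profile is re-read), a quick check along these lines shows whether both locations made it into the PATH. This is a sketch only; path_has_dir is a name invented for this example.

```shell
# Hypothetical helper: succeed if directory $1 is an entry of the
# colon-separated list $2 (defaults to the current PATH).
path_has_dir() {
    case ":${2:-$PATH}:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

for dir in /opt/bio/sources/blast-2.2.22/bin /opt/bio/sources/netblast-2.2.22/bin; do
    if path_has_dir "$dir"; then
        echo "in PATH: $dir"
    else
        echo "missing from PATH: $dir"
    fi
done
```

Wrapping the list in colons before matching avoids false positives on partial names, for example a longer directory name that merely starts with the one being looked for.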
- The last thing to do is to create a text file named .ncbirc inside the bioubuntu home folder. This file must contain the following lines:
[NCBI]
DATA=/opt/bio/sources/blast-2.2.22/data
[BLAST]
BLASTDB=/opt/bio/data/blastdb
BLASTMAT=/opt/bio/sources/blast-2.2.22/data
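Instead of typing the file in an editor, it can be written in one go with a here-document. The sketch below wraps this in a function named write_ncbirc, a name chosen only for this example; the content is the one given above.

```shell
# Hypothetical helper: write the .ncbirc content described above to file $1.
write_ncbirc() {
    cat > "$1" <<'EOF'
[NCBI]
DATA=/opt/bio/sources/blast-2.2.22/data
[BLAST]
BLASTDB=/opt/bio/data/blastdb
BLASTMAT=/opt/bio/sources/blast-2.2.22/data
EOF
}

# Usage, run as the bioubuntu user:
#   write_ncbirc /home/bioubuntu/.ncbirc
```

The delimiter is quoted ('EOF') so that the shell copies the lines literally, without trying to expand anything inside them.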
Creating the BLAST databases
BLAST uses specially formatted databases created from text files written in FASTA format. There is no specific place where these databases must live (this is why you have the BLASTDB… line inside the .ncbirc file), but to keep things in order Impilo puts all data files used by the various applications under /opt/bio/data, in this case /opt/bio/data/blastdb. Because space is at a premium in this project, I only provide two small databases, created from the nucleotide and amino acid sequences of the E. coli DH10B bacterium.
Here is the recipe used to create them. Apply it to any FASTA-formatted file that you want to turn into a BLAST database.
- First, since only root can write into /opt/bio/data/blastdb, you need to become root:
% sudo su
- Navigate to /opt/bio/data/blastdb and move any FASTA-formatted files there. They will be used to create a new database:
% cd /opt/bio/data/blastdb
% mv /home/bioubuntu/this_is_example.fa .
- The application named formatdb will be in charge of creating the database. The database type (nucleotide or amino acid) is selected using the -p flag with either T (amino acid) or F (nucleotide) as its parameter:
% formatdb -i <this_is_example.fa> -p T -n <local_db_name>
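Since it is easy to mix up the T and F values, the mapping can be captured in a small wrapper. The sketch below only prints the formatdb command instead of running it; the function name formatdb_command and the file and database names in the two calls are made up for illustration.

```shell
# Hypothetical helper: print the formatdb command for a FASTA file.
# $1 = "protein" or "nucleotide", $2 = FASTA file, $3 = database name.
formatdb_command() {
    case "$1" in
        protein)    flag=T ;;
        nucleotide) flag=F ;;
        *) echo "unknown sequence type: $1" >&2; return 1 ;;
    esac
    echo "formatdb -i $2 -p $flag -n $3"
}

# Illustrative file and database names only.
formatdb_command protein    my_proteins.fa my_prot_db
formatdb_command nucleotide my_genome.fa   my_nucl_db
```

Once the printed command looks right, it can be pasted into the root shell opened earlier.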
- You can test this new database using blastall:
% blastall -p blastp -d <local_db_name> -i <your_sequence>
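The blastp program used above compares an amino acid query against an amino acid database. For other combinations, the value given to -p has to change. The sketch below shows the usual pairing of query type and database type with the standard blastall program names; the function name blast_program is invented for this example, and it only prints the resulting command.

```shell
# Hypothetical helper: pick the blastall program from the query type ($1)
# and the database type ($2), each "protein" or "nucleotide".
blast_program() {
    case "$1/$2" in
        protein/protein)       echo blastp ;;   # protein query, protein db
        nucleotide/nucleotide) echo blastn ;;   # nucleotide query, nucleotide db
        nucleotide/protein)    echo blastx ;;   # translated query, protein db
        protein/nucleotide)    echo tblastn ;;  # protein query, translated db
        *) echo "unknown combination: $1/$2" >&2; return 1 ;;
    esac
}

program=$(blast_program nucleotide nucleotide)
echo "blastall -p $program -d <local_db_name> -i <your_sequence>"
```

A nucleotide database built with `-p F` would therefore be tested with blastn (or tblastn for a protein query) rather than blastp.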