====== Installation of BLAST 2.2.22 ======
===== Additional libraries =====
None needed
===== Procedure =====
For the past couple of years, there have been two versions of NCBI BLAST in the wild: BLAST and BLAST+, a complete rewrite of BLAST in C++. Although BLAST+ has many advantages (it is easier to use and faster, among other things), it has one big problem: it takes up far more disk space than BLAST, which matters for a VM-based distro like Impilo whose goal is to keep its footprint small. Because of that, I chose to install the original BLAST. This will probably change in the near future, though, because I want to put the best tools in Impilo.
Another choice had to be made: should I use the source or pre-compiled binaries? Because the source code carries a lot of material pertaining to GUIs and other extra libraries, I decided to use pre-compiled binaries.
I also installed the corresponding NETBLAST, a network-based BLAST client, since we might not always have the databases locally for the courses ;-)
Here is my procedure to install BLAST/NETBLAST from an archive that has the pre-compiled binaries:
* Download the appropriate archives into ''/home/bioubuntu'' (choose either ''blast-2.2.22-ia32-linux.tar.gz'' or ''blast-2.2.22-x64-linux.tar.gz'', depending on whether you are building a 32-bit or a 64-bit Impilo), extract their contents and move the resulting folders into ''/opt/bio/sources/''.
% wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/release/2.2.22/blast-2.2.22-x64-linux.tar.gz
% wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/release/2.2.22/netblast-2.2.22-x64-linux.tar.gz
% tar -zxvf blast-2.2.22-x64-linux.tar.gz
% tar -zxvf netblast-2.2.22-x64-linux.tar.gz
% sudo mv blast-2.2.22 netblast-2.2.22 /opt/bio/sources/
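If you do not want to pick the archive by hand, the choice between the 32-bit and 64-bit files can be sketched as a small shell helper. This is only an illustration: the function names ''blast_arch_tag'' and ''blast_archive_name'' are made up for this page, and I am assuming that the NETBLAST archives follow the same ''ia32''/''x64'' naming as the BLAST ones. The script only prints file names; it downloads nothing.

```shell
#!/bin/sh
# Hypothetical helper: map a machine type (as printed by `uname -m`)
# to the architecture tag used in the archive names.
blast_arch_tag() {
    case "$1" in
        x86_64|amd64) echo "x64" ;;
        i?86)         echo "ia32" ;;
        *)            echo "unsupported machine type: $1" >&2; return 1 ;;
    esac
}

# Hypothetical helper: build the archive name for a tool (blast or netblast)
# and a machine type. The netblast naming is assumed to mirror blast's.
blast_archive_name() {
    archive_tool=$1
    archive_tag=$(blast_arch_tag "$2") || return 1
    echo "${archive_tool}-2.2.22-${archive_tag}-linux.tar.gz"
}

# Print the archive names matching this machine (nothing is downloaded).
for t in blast netblast; do
    blast_archive_name "$t" "$(uname -m)" || true
done
```

The printed names can then be appended to the FTP folder used in the ''wget'' commands above.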
* The folders named ''/opt/bio/sources/blast-2.2.22'' and ''/opt/bio/sources/netblast-2.2.22'' should belong to ''root'' and their permissions should be ''755''.
% sudo chown -R root:root /opt/bio/sources/blast-2.2.22
% sudo chown -R root:root /opt/bio/sources/netblast-2.2.22
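To double-check the result, something like the following sketch can be used. The helper names ''mode_string'' and ''owner_group'' are my own invention for this example; they simply parse the output of ''ls -ld''. For a folder set up as described, I would expect to see ''drwxr-xr-x root:root''.

```shell
#!/bin/sh
# Hypothetical helper: print the 10-character mode string of a path,
# e.g. drwxr-xr-x for a directory with permissions 755.
mode_string() {
    ls -ld "$1" | cut -c1-10
}

# Hypothetical helper: print owner and group of a path as owner:group.
owner_group() {
    ls -ld "$1" | awk '{ print $3 ":" $4 }'
}

# Report on the two install folders (prints "not found" if they are absent).
for dir in /opt/bio/sources/blast-2.2.22 /opt/bio/sources/netblast-2.2.22; do
    if [ -d "$dir" ]; then
        echo "$dir: $(mode_string "$dir") $(owner_group "$dir")"
    else
        echo "$dir: not found"
    fi
done
```

If the mode string differs, ''sudo chmod 755'' on the folder should bring it back in line.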
* Since the applications that we need are under ''/opt/bio/sources/blast-2.2.22/bin'' and ''/opt/bio/sources/netblast-2.2.22/bin'', we need to add these two locations to our ''PATH''. There is more than one way of doing this, but I chose to put them in ''/etc/profile''.
% sudo nano /etc/profile
* At **the very end of the file**, add the following lines:
#
# BLAST/NETBLAST specific environment variables
#
PATH=/opt/bio/sources/blast-2.2.22/bin:/opt/bio/sources/netblast-2.2.22/bin:$PATH
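After logging in again (so that ''/etc/profile'' is re-read), the change can be verified with a small check along these lines. This is a sketch only; ''path_contains'' is a hypothetical helper written for this page, not something that ships with BLAST.

```shell
#!/bin/sh
# Hypothetical helper: succeed if directory $1 is one of the
# colon-separated entries of the PATH-like string $2.
path_contains() {
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Check both BLAST folders against the current PATH.
for dir in /opt/bio/sources/blast-2.2.22/bin /opt/bio/sources/netblast-2.2.22/bin; do
    if path_contains "$dir" "$PATH"; then
        echo "ok: $dir"
    else
        echo "missing from PATH: $dir"
    fi
done
```

Wrapping both strings in colons means that only whole entries match, so ''/opt/bio'' alone would not be mistaken for one of the ''bin'' folders.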
* The last thing to do is to create a text file named ''.ncbirc'' inside the ''bioubuntu'' home folder. This file must contain the following lines:
[NCBI]
DATA=/opt/bio/sources/blast-2.2.22/data
[BLAST]
BLASTDB=/opt/bio/data/blastdb
BLASTMAT=/opt/bio/sources/blast-2.2.22/data
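If you would rather script this step than type the file by hand, a here-document does the job. The sketch below uses a hypothetical function, ''write_ncbirc'', which writes the five lines above to whatever path it is given; the demonstration at the end writes to a temporary file so that nothing in your home folder is touched.

```shell
#!/bin/sh
# Hypothetical helper: write the .ncbirc content shown above to the file $1.
write_ncbirc() {
    cat > "$1" <<'EOF'
[NCBI]
DATA=/opt/bio/sources/blast-2.2.22/data
[BLAST]
BLASTDB=/opt/bio/data/blastdb
BLASTMAT=/opt/bio/sources/blast-2.2.22/data
EOF
}

# Preview the generated content in a temporary file, then clean up.
preview=$(mktemp)
write_ncbirc "$preview"
cat "$preview"
rm -f "$preview"
```

To create the real file, call ''write_ncbirc /home/bioubuntu/.ncbirc'' instead of using the temporary file.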
===== Creating the BLAST databases =====
BLAST uses specially formatted databases created from text files written in ''FASTA'' format. There is no required location for these databases (this is why there is a ''BLASTDB...'' line inside the ''.ncbirc'' file), but to keep things orderly Impilo puts all data files used by the various applications under ''/opt/bio/data'', in this case ''/opt/bio/data/blastdb''. Because space is at a premium in this project, I only provide two small databases, created from the nucleotide and amino acid sequences of the //E. coli// DH10B bacterium:
* [[ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/Escherichia_coli_K_12_substr__DH10B/NC_010473.faa|ecoli_dh10b.faa]]
* [[ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/Escherichia_coli_K_12_substr__DH10B/NC_010473.fna|ecoli_dh10b.fna]]
Here is the recipe used to create them. Apply it to any FASTA-formatted file that you want to turn into a BLAST database.
* First, since only ''root'' can write into ''/opt/bio/data/blastdb'', you need to become ''root'':
% sudo su
* Navigate to ''/opt/bio/data/blastdb'' and move any ''FASTA''-formatted files there. They will be used to create the new database:
% cd /opt/bio/data/blastdb
% mv /home/bioubuntu/this_is_example.fa .
* The application named ''formatdb'' is in charge of creating the database. The database type (nucleotide or amino acid) is selected using the ''-p'' flag, with either ''T'' (amino acid) or ''F'' (nucleotide) as its parameter. The ''-i'' flag takes the input ''FASTA'' file and the ''-n'' flag takes the name to give to the database:
% formatdb -i INPUT_FASTA_FILE -p T -n DATABASE_NAME
* You can test this new database using ''blastall'':
% blastall -p blastp -d DATABASE_NAME -i QUERY_FASTA_FILE
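As an illustration of how the ''-p'' parameter could be chosen automatically, here is a small sketch. It assumes the naming convention of the two files listed above (''.faa'' for amino acid sequences, ''.fna'' for nucleotide sequences), and the helper names ''formatdb_type'' and ''formatdb_command'' are hypothetical. The script only prints the ''formatdb'' commands; it does not run them.

```shell
#!/bin/sh
# Hypothetical helper: print T for an amino acid FASTA file (.faa)
# or F for a nucleotide FASTA file (.fna), based on the extension only.
formatdb_type() {
    case "$1" in
        *.faa) echo "T" ;;
        *.fna) echo "F" ;;
        *)     echo "cannot guess sequence type of: $1" >&2; return 1 ;;
    esac
}

# Hypothetical helper: print (not run) the formatdb call for a file,
# naming the database after the file without its extension.
formatdb_command() {
    fasta=$1
    seqtype=$(formatdb_type "$fasta") || return 1
    echo "formatdb -i $fasta -p $seqtype -n ${fasta%.*}"
}

formatdb_command ecoli_dh10b.faa
formatdb_command ecoli_dh10b.fna
```

Once the printed commands look right, they can be copied to the ''root'' shell opened earlier and run from ''/opt/bio/data/blastdb''.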