Installation of BLAST+ 2.2.25

Additional libraries

No additional libraries are required.

Procedure

For the past few years, two concurrent versions of NCBI BLAST were available: the legacy BLAST and BLAST+, a complete C++ rewrite of the BLAST code. Although BLAST+ offered significant advantages (easier-to-remember options and faster analysis), it needed a lot of storage space, which matters a great deal for a distro project like Impilo. However, the C++ recoding effort now seems to be complete and the legacy BLAST has been declared obsolete. Therefore, long live BLAST+!

Another choice needs to be made: build from source or use the binaries? Once again, the decision was driven by our need to minimize the hard drive footprint: we chose the pre-compiled binaries, since they take up less than a tenth of the space that the source tree occupies after compilation.

Here is the procedure to install BLAST+ from the archive that contains the pre-compiled executables:

% wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.2.25/ncbi-blast-2.2.25+-x64-linux.tar.gz
% tar -zxvf ncbi-blast-2.2.25+-x64-linux.tar.gz
% sudo mv ncbi-blast-2.2.25+ /opt/bio/sources
% sudo chown -R root:root /opt/bio/sources/ncbi-blast-2.2.25+
% sudo nano /etc/profile
#
# BLAST/NETBLAST specific environment variables
#
PATH="/opt/bio/sources/ncbi-blast-2.2.25+/bin:$PATH"
% nano .ncbirc
[BLAST]
BLASTDB=/opt/bio/data/blastdb
% sudo cp .ncbirc /etc/skel
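
Once the steps above are done, it can be useful to check that the shell actually sees the new executables and that the .ncbirc file points to the expected directory. The following is only a minimal sketch, not part of the official Impilo procedure: the function names read_blastdb and check_blast_program are hypothetical helpers, and the sketch assumes the .ncbirc file uses the exact BLASTDB=... form shown above, with no spaces around the equals sign.

```shell
# Sketch: sanity checks for a BLAST+ setup (helper names are hypothetical).

# Print the BLASTDB value stored in an .ncbirc-style file.
# Defaults to ~/.ncbirc when no file is given.
read_blastdb() {
    rcfile="${1:-$HOME/.ncbirc}"
    # Keep only the text after "BLASTDB=" on the first matching line.
    sed -n 's/^BLASTDB=//p' "$rcfile" | head -n 1
}

# Report whether a program (for example blastn) is reachable through PATH.
check_blast_program() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found at $(command -v "$1")"
    else
        echo "$1: not found in PATH"
        return 1
    fi
}
```

After loading these functions into a new login shell (so that /etc/profile has been read again), they could be used like this:

% check_blast_program blastn
% read_blastdb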

Creating databases to use with BLAST+

BLAST+ uses specially formatted databases, created from FASTA-formatted multi-sequence files. These files do not have to be stored in any particular location (that is what the BLASTDB… line in the .ncbirc file is for), but to keep things orderly, Impilo keeps them under /opt/bio/data, more specifically in /opt/bio/data/blastdb. Because we seek to keep Impilo lean, a default Impilo ships only a basic set of databases, one for nucleotides and one for proteins, both built from the genome of the E. coli DH10B strain.

Here is the recipe to create them; use the same recipe for your own databases.

% cd /opt/bio/data/blastdb
% mv /home/bioubuntu/NC_010473.fna .
% sudo makeblastdb -in NC_010473.fna -dbtype nucl -title e_coli_dh10b_nuc -input_type fasta -out e_coli_dh10b_nuc -max_file_sz 2GB
% blastn -query <your_e_coli_sequence> -db e_coli_dh10b_nuc
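
The recipe above only covers the nucleotide database. The protein database follows the same pattern with -dbtype prot. As a rough sketch, the small helper below prints the makeblastdb command for either type so it can be reviewed before being run. The function name makeblastdb_cmd, the input file NC_010473.faa and the database name e_coli_dh10b_prot are assumptions made for illustration; they do not come from the Impilo build itself.

```shell
# Sketch: print (do not run) a makeblastdb command modeled on the recipe above.
# The helper name and any file or database names passed to it are hypothetical.
makeblastdb_cmd() {
    fasta="$1"    # FASTA input file
    dbtype="$2"   # nucl or prot
    name="$3"     # used for both -title and -out, as in the recipe above

    # Reject anything other than the two database types used here.
    case "$dbtype" in
        nucl|prot) ;;
        *) echo "dbtype must be nucl or prot" >&2; return 1 ;;
    esac

    echo "makeblastdb -in $fasta -dbtype $dbtype -title $name -input_type fasta -out $name -max_file_sz 2GB"
}
```

For example, the following prints the command for a protein database, which can then be run with sudo from /opt/bio/data/blastdb and queried with blastp instead of blastn:

% makeblastdb_cmd NC_010473.faa prot e_coli_dh10b_prot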