Installation of BLAST+ 2.2.25
Additional libraries
No other library is necessary.
Procedure
For the past few years, two concurrent versions of NCBI BLAST were available: BLAST and BLAST+, a complete C++ refactoring of the BLAST code. At the time, although it offered significant advantages (easier mnemonics and faster analysis), BLAST+ needed a lot of storage space, which has a profound impact on a distro project like Impilo. However, the C++ recoding effort now seems to be complete and the legacy BLAST has been declared obsolete. Therefore, long live BLAST+!
Another choice needs to be made: build from source or use binaries? Once again, the decision was driven by our need to minimize the hard drive footprint: we chose the pre-compiled binaries since they take less than a tenth of the space that the source code takes after compilation.
Here is the procedure to install BLAST+ from the archive that contains the pre-compiled executables:
- Download the desired archive under /home/bioubuntu (choose the appropriate version: ncbi-blast-2.2.25+-ia32-linux.tar.gz or ncbi-blast-2.2.25+-x64-linux.tar.gz if you are building impilo32 or impilo64 respectively), decompress it and move it under /opt/bio/sources/.
% wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.2.25/ncbi-blast-2.2.25+-x64-linux.tar.gz
% tar -zxvf ncbi-blast-2.2.25+-x64-linux.tar.gz
% sudo mv ncbi-blast-2.2.25+ /opt/bio/sources
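The choice between the two archives can be sketched in a few lines of shell. This is only an illustration: blast_archive_for is a hypothetical helper name, and it assumes the machine you run it on has the same architecture as the Impilo flavour you are building. The download command is printed, not executed, so it can be reviewed first.

```shell
#!/bin/sh
# Sketch: choose the BLAST+ 2.2.25 archive name from a machine architecture.
# blast_archive_for is a hypothetical helper, not part of BLAST+ or Impilo.
blast_archive_for() {
    case "$1" in
        x86_64|amd64)
            echo "ncbi-blast-2.2.25+-x64-linux.tar.gz" ;;
        i386|i486|i586|i686)
            echo "ncbi-blast-2.2.25+-ia32-linux.tar.gz" ;;
        *)
            echo "unsupported architecture: $1" >&2
            return 1 ;;
    esac
}

base_url="ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.2.25"

# uname -m reports the architecture of the running machine.
if archive=$(blast_archive_for "$(uname -m)"); then
    # Print the command instead of running it.
    echo "wget $base_url/$archive"
fi
```

On a 64-bit build machine this should print the same wget line as the one used above.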
- The /opt/bio/sources/ncbi-blast-2.2.25+ folder should belong to root and its permissions should be 755.
% sudo chown -R root:root /opt/bio/sources/ncbi-blast-2.2.25+
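The command above only changes the ownership. If the permissions also need to be set, a minimal sketch could look like the following. check_install_dir is a hypothetical helper name, and the stat -c syntax assumes GNU coreutils, as found on Ubuntu-based systems.

```shell
#!/bin/sh
# Sketch: set mode 755 on a folder, then print its owner, group and octal mode.
# check_install_dir is a hypothetical helper; stat -c is the GNU coreutils form.
check_install_dir() {
    dir=$1
    chmod 755 "$dir" || return 1
    # %U = owner name, %G = group name, %a = permissions in octal
    stat -c '%U:%G %a' "$dir"
}
```

Run from a root shell as check_install_dir /opt/bio/sources/ncbi-blast-2.2.25+, it should report root:root 755 once the chown step has been applied.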
- Since the applications are under /opt/bio/sources/ncbi-blast-2.2.25+/bin, you need to add this directory to the $PATH variable. There are many ways of doing this, but I decided to put it all into /etc/profile.
% sudo nano /etc/profile
- At the very end of the file, add these lines:
#
# BLAST/NETBLAST specific environment variables
#
PATH="/opt/bio/sources/ncbi-blast-2.2.25+/bin:$PATH"
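If /etc/profile is read more than once in a session, the plain assignment adds the same directory to PATH each time. One possible variation, shown here only as a sketch, guards against that. path_prepend is a hypothetical helper name and is not something Impilo ships.

```shell
# Sketch: prepend a directory to PATH only if it is not already listed.
# path_prepend is a hypothetical helper name.
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;               # already present, nothing to do
        *) PATH="$1:$PATH" ;;
    esac
}

path_prepend "/opt/bio/sources/ncbi-blast-2.2.25+/bin"
export PATH
```

After logging in again, command -v blastn should print the full path of the blastn executable if the setup worked.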
- The last thing to do is to create a file called .ncbirc under /home/bioubuntu. The file needs to contain the following lines:
[BLAST]
BLASTDB=/opt/bio/data/blastdb
- To make sure that all new users get this file automatically when their account is created, you need to copy it under /etc/skel:
% sudo cp .ncbirc /etc/skel
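Instead of typing the file in an editor, the .ncbirc file can also be written from the shell. The sketch below assumes the same BLASTDB path as above. write_ncbirc is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: write a .ncbirc file into the directory given as first argument.
# write_ncbirc is a hypothetical helper; the BLASTDB path is the one used above.
write_ncbirc() {
    target_dir=$1
    cat > "$target_dir/.ncbirc" <<'EOF'
[BLAST]
BLASTDB=/opt/bio/data/blastdb
EOF
}
```

For example, write_ncbirc /home/bioubuntu creates the file for the default user, after which the copy to /etc/skel works as shown above.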
Creating databases to use with BLAST+
BLAST+ uses specially formatted databases, created from FASTA-formatted multi-sequence files. There is no special location where these files must be stored (that is what the BLASTDB… line in the .ncbirc file is for), but to keep things ordered and clean, Impilo keeps them in /opt/bio/data, more specifically in /opt/bio/data/blastdb. Because we seek to keep Impilo slim and lean, only a basic set of databases is provided in a default Impilo, one for nucleotides and one for proteins, both taken from the genome of the E. coli DH10B strain.
Here is the recipe to create them; use the same recipe for your own databases.
- Go into /opt/bio/data/blastdb and move your FASTA-formatted multi-sequence files there:
% cd /opt/bio/data/blastdb
% mv /home/bioubuntu/this_is_an_example.fa .
- The program called makeblastdb will create the database files. The database type (nucleotide or protein) is chosen with the -dbtype option, with nucl (nucleotide) or prot (protein) as parameter:
% sudo makeblastdb -in NC_010473.fna -dbtype nucl -title e_coli_dh10b_nuc -input_type fasta -out e_coli_dh10b_nuc -max_file_sz 2GB
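The protein database follows the same pattern with -dbtype prot. As a sketch, the wrapper below checks the database type and the input file before calling makeblastdb. build_blast_db is a hypothetical helper name, and the file name NC_010473.faa in the usage line is an assumption, not something taken from the Impilo setup.

```shell
#!/bin/sh
# Sketch: check the arguments, then call makeblastdb.
# build_blast_db is a hypothetical helper, not a BLAST+ program.
# Arguments: FASTA file, database type (nucl or prot), database name.
build_blast_db() {
    fasta=$1
    dbtype=$2
    name=$3
    case "$dbtype" in
        nucl|prot) ;;
        *)
            echo "dbtype must be nucl or prot, got: $dbtype" >&2
            return 2 ;;
    esac
    if [ ! -r "$fasta" ]; then
        echo "cannot read input file: $fasta" >&2
        return 1
    fi
    makeblastdb -in "$fasta" -dbtype "$dbtype" -title "$name" \
        -input_type fasta -out "$name"
}
```

A call such as build_blast_db NC_010473.faa prot e_coli_dh10b_prot would then create the protein database. It needs to run with enough rights to write into /opt/bio/data/blastdb.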
- You can test your new database with blastn:
% blastn -query <your_e_coli_sequence> -db e_coli_dh10b_nuc
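For a quick look at the results, the tabular output format (-outfmt 6) is often easier to scan than the default report. The sketch below keeps only the highest-scoring hit per query. best_hits is a hypothetical helper name, and it assumes the standard twelve-column layout of -outfmt 6, where column 1 is the query id and column 12 the bit score.

```shell
#!/bin/sh
# Sketch: keep the highest bit score line for each query in -outfmt 6 output.
# best_hits is a hypothetical helper; it reads tab-separated lines on stdin.
best_hits() {
    awk -F '\t' '
        # Column 1 is the query id, column 12 the bit score.
        !($1 in best) || ($12 + 0) > best[$1] {
            best[$1] = $12 + 0
            line[$1] = $0
        }
        END { for (q in line) print line[q] }
    '
}
```

It could be used as blastn -query my_seq.fa -db e_coli_dh10b_nuc -outfmt 6 | best_hits, where my_seq.fa stands for your own query file. The order of the printed queries is not guaranteed.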