====== BLAST 2.2.26+ Installation instructions ======

===== Additional Libraries =====

  * No additional library is required.

===== Procedure =====

For the past few years, two NCBI BLAST versions have been available: BLAST and BLAST+, a full rewrite of the source code in C++. The regular BLAST was preferred because it was more compact; however, the BLAST development team has stopped all work on the old BLAST. Since the Impilo policy is to use the latest version, BLAST+ is now the version used in Impilo.

Another choice had to be made: source or binary? Again, the criterion was occupied space, and the binary occupies much less space. Here is the procedure used to install BLAST, from archive to executable:

  * Download the appropriate archive into ''/home/bioubuntu'', decompress it and move the resulting folder into ''/opt/bio/sources/''.

<code>
% wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.2.26/ncbi-blast-2.2.26+-x64-linux.tar.gz
% tar -zxvf ncbi-blast-2.2.26+-x64-linux.tar.gz
% sudo mv ncbi-blast-2.2.26+ /opt/bio/sources
</code>

  * The ''/opt/bio/sources/ncbi-blast-2.2.26+'' folder should belong to ''root'' and its permissions should be set to ''755''.

<code>
% sudo chown -R root:root /opt/bio/sources/ncbi-blast-2.2.26+
% sudo chmod 755 /opt/bio/sources/ncbi-blast-2.2.26+
</code>

  * Because the applications are in ''/opt/bio/sources/ncbi-blast-2.2.26+/bin'', this directory needs to be added to ''PATH''. There are many ways to do so, but I decided to put this information in ''/etc/profile.d/impilo.sh'':

<code>
% sudo nano /etc/profile.d/impilo.sh
</code>

  * At **the very end of the file**, add the following lines:

<code>
#
# BLAST specific environment variables
#
export PATH=$PATH:/opt/bio/sources/ncbi-blast-2.2.26+/bin
</code>

  * The last thing to do is to add, in the same file, the location of the sequence databases used by BLAST:

<code>
export BLASTDB=/opt/bio/data/blast_db
</code>

The scripts in ''/etc/profile.d'' are read at login, so log out and back in (or run ''source /etc/profile.d/impilo.sh'') for these settings to take effect.

===== Creating sequence databases to be used by BLAST =====

BLAST uses specially formatted database files, usually created from source files written in ''FASTA'' format.
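Because every database starts from a ''FASTA'' file, a quick sanity check before calling ''makeblastdb'' can catch a wrong or truncated download early. The shell function below is a minimal sketch; ''check_fasta'' is a hypothetical helper, not part of BLAST+. It only checks that the file is non-empty and that its first non-blank line is a ''>'' header, then prints the number of records; it does not validate the sequences themselves.

```shell
#!/usr/bin/env bash
# check_fasta: hypothetical helper, NOT part of BLAST+.
# Quick sanity check on a file before it is given to makeblastdb:
#   - fails if the file is missing or empty,
#   - fails if its first non-blank line is not a '>' header,
#   - otherwise prints the number of records (lines starting with '>').
check_fasta() {
    local file="$1"
    local first

    if [ ! -s "$file" ]; then
        echo "check_fasta: '$file' is missing or empty" >&2
        return 1
    fi

    # First non-blank line of the file.
    first=$(grep -v '^[[:space:]]*$' "$file" | head -n 1)
    case "$first" in
        '>'*) ;;
        *)
            echo "check_fasta: '$file' does not start with a '>' header" >&2
            return 1
            ;;
    esac

    # One header line per sequence record.
    grep -c '^>' "$file"
}
```

For example, ''check_fasta /home/bioubuntu/NC_010473.faa'' prints the number of proteins in the file, or an error message if the file does not look like ''FASTA''.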
There is no strict location for the database files (that's what the ''export BLASTDB…'' line in ''/etc/profile.d/impilo.sh'' is for), but to keep things tidy, Impilo puts them in ''/opt/bio/data'', more precisely in ''/opt/bio/data/blast_db''.

Because we want to keep things small, only two small sequence files, derived from E. coli DH10B, are provided for teaching:

  * [[ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/Escherichia_coli_K_12_substr__DH10B_uid58979/NC_010473.faa|E. coli DH10B strain, all proteins]]
  * [[ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/Escherichia_coli_K_12_substr__DH10B_uid58979/NC_010473.fna|E. coli DH10B strain, whole genome]]

Here is a general recipe to create these database files.

  * First, since only ''root'' can write into ''/opt/bio/data/blast_db'', you need to become ''root'':

<code>
% sudo su -
</code>

  * Go to ''/opt/bio/data/blast_db'' and move the ''FASTA'' files into it:

<code>
% cd /opt/bio/data/blast_db
% mv /home/bioubuntu/this_is_an_example.fa .
</code>

  * The ''makeblastdb'' program creates the database files. The type of database is selected with the ''-dbtype'' parameter, using ''prot'' (amino acid) or ''nucl'' (nucleotide) as the value:

<code>
% makeblastdb -in <FASTA file> -dbtype <prot|nucl> -out <database name> -title <database title>
</code>

  * You can test your new database with ''blastn'' (nucleotide) or ''blastp'' (protein):

<code>
% blastp -db <database name> -query <query FASTA file>
</code>
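The recipe above can be applied to the two teaching files in one go. The script below is a sketch under several assumptions: ''NC_010473.faa'' and ''NC_010473.fna'' have already been downloaded into ''$BLASTDB'', BLAST+ is on ''PATH'', and it is run as ''root''. The helpers ''dbtype_for'', ''build_db'' and ''first_record'' and the file ''test_query.faa'' are hypothetical names used only for illustration. The ''.faa'' / ''.fna'' extension is used to pick ''prot'' or ''nucl'', and each database is named after its file without the extension.

```shell
#!/usr/bin/env bash
# Sketch: build the two teaching databases and run a quick self-test.
# Assumes NC_010473.faa and NC_010473.fna are already in $BLASTDB,
# BLAST+ is on PATH, and the script is run as root.

# Hypothetical helper: pick the makeblastdb -dbtype from the file extension.
#   .faa = FASTA amino acids  -> prot
#   .fna = FASTA nucleic acids -> nucl
dbtype_for() {
    case "$1" in
        *.faa) echo prot ;;
        *.fna) echo nucl ;;
        *)
            echo "dbtype_for: cannot guess the type of '$1'" >&2
            return 1
            ;;
    esac
}

# Hypothetical helper: build one database named after the file minus its extension.
build_db() {
    local fasta="$1"
    local type name
    type=$(dbtype_for "$fasta") || return 1
    name="${fasta%.*}"
    makeblastdb -in "$fasta" -dbtype "$type" -out "$name" -title "$name ($type)"
}

# Hypothetical helper: print only the first record of a FASTA file.
first_record() {
    awk '/^>/ { n++ } n == 1' "$1"
}

# Only run the real BLAST+ commands when they are installed.
if command -v makeblastdb >/dev/null 2>&1; then
    cd "${BLASTDB:-/opt/bio/data/blast_db}" || exit 1
    for f in NC_010473.faa NC_010473.fna; do
        build_db "$f"
    done
    # Self-test: query the protein database with its own first protein.
    first_record NC_010473.faa > test_query.faa
    blastp -db NC_010473 -query test_query.faa -outfmt 6 | head -n 5
fi
```

Both databases can share the name ''NC_010473'' because protein and nucleotide databases are written with different file extensions (''.phr''/''.pin''/''.psq'' versus ''.nhr''/''.nin''/''.nsq''), and ''blastp'' only looks for the protein one. Since the query is taken from the database itself, the first line of the tabular (''-outfmt 6'') output should show the query protein matching itself.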