| Online BLAST Info | Local BLAST Info | Autoblast.pl Info | FASTA File Manipulation Commands | Masking Repeats Info |
Use the online BLAST services to BLAST one or a few sequences against large sequence databases.
Make Your Own BLAST Database
To create your own FASTA-format database file for local use with BLAST programs,
see the WU-BLAST 2.0 README file or follow these simple instructions:
Example format for a FASTA-format database file:
>name-sequence1
AATTGGGCGCTCGCACGCAT
TTAGGCACGCAGCCCCCGGG
AATTGGGCGCTCGCACGCAT
AATTGGGCGCTCGCACGCAT
ATG
>name-sequence2
gttattaccttgatatggcc
aaagggaaaaaaaaaattcg
gttt
>name-sequence3
gttaccttttttttttgAAG
accccacaacccccccTACT
gttattaccttgatatggcc
tggggggcgacgactcag
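If you build a database file by hand, it can help to confirm that it parses as shown in the example before you index it. The following is a minimal Python sketch and is not part of WU-BLAST. The function name `read_fasta` is hypothetical. It reads FASTA-format text into (name, sequence) pairs and rejects sequence data that appears before the first name line:

```python
import io
from typing import Iterator, List, Optional, TextIO, Tuple

def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (name line without '>', sequence) pairs from FASTA-format text."""
    name: Optional[str] = None
    chunks: List[str] = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)  # finish the previous record
            name, chunks = line[1:], []
        elif name is None:
            raise ValueError("sequence data found before the first '>' name line")
        else:
            chunks.append(line)  # sequence lines are joined without spaces
    if name is not None:
        yield name, "".join(chunks)  # last record in the file

example = """>name-sequence1
AATTGGGCGCTCGCACGCAT
ATG
>name-sequence2
gttattaccttgatatggcc
gttt
"""

for name, seq in read_fasta(io.StringIO(example)):
    print(name, len(seq))  # name-sequence1 23, then name-sequence2 24
```

To check a real file, open it and pass the handle instead, for example `with open("filename") as handle:` followed by a loop over `read_fasta(handle)`.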
Indexing the file for BLAST nucleotide comparisons creates three index files (*.csq, *.nhd, *.ntb).
Indexing it for amino acid comparisons creates three additional index files (*.ahd, *.bsq, *.atb).
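To confirm that indexing produced every file listed, a small Python check can look for each extension. This is a sketch. `missing_index_files` is a hypothetical helper, and it assumes the index files sit next to the database file and are named after the database plus the extension (for example `filename.csq`). Check the WU-BLAST README for your version:

```python
import os
from typing import List

# Index file extensions, as listed in the text.
NUCLEOTIDE_INDEX = (".csq", ".nhd", ".ntb")
PROTEIN_INDEX = (".ahd", ".bsq", ".atb")

def missing_index_files(database: str, protein: bool = False) -> List[str]:
    """Return the expected index files for `database` that do not exist.

    Assumption: each index file is named <database><extension>.
    """
    extensions = PROTEIN_INDEX if protein else NUCLEOTIDE_INDEX
    return [database + ext for ext in extensions if not os.path.exists(database + ext)]

if __name__ == "__main__":
    missing = missing_index_files("filename")  # 'filename' is the example database name
    print("all nucleotide index files present" if not missing else "missing: " + ", ".join(missing))
```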
To search the new database, run a BLAST program against it and redirect the output to a file.
For example, to run the query sequence 'test-seq' against a newly created FASTA-format database called 'filename', the command would be:
The output from the BLASTN run will be saved to a file called 'results'.
Autoblast.pl is used to BLAST multiple FASTA files against any
BLAST database on our system.
The arguments to autoblast.pl are:
For example,
will run a 'list' of FASTA files through the 'BLASTN' program using the
'ecoli.nt' database.
Output consists of one file per FASTA file name in the list.
These files are named for the corresponding FASTA file with a '.blastn'
extension, and are created in the working directory.
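The naming rule can be sketched in Python to see which output files a run will write before you start it. This is useful for spotting files a new run would overwrite. This is not the autoblast.pl source. `read_list`, `planned_outputs`, and `would_overwrite` are hypothetical helpers. The list format (one FASTA file name per line) and the use of each file's base name are also assumptions:

```python
import os
from typing import List, Tuple

def read_list(list_path: str) -> List[str]:
    """Read a list file: one FASTA file name per line; blank lines are ignored."""
    with open(list_path) as handle:
        return [line.strip() for line in handle if line.strip()]

def planned_outputs(list_path: str, extension: str = ".blastn") -> List[Tuple[str, str]]:
    """Pair each FASTA file in the list with the output file a run would create.

    Outputs land in the current working directory, named after the FASTA
    file (its base name, by assumption) plus the extension.
    """
    cwd = os.getcwd()
    return [(fasta, os.path.join(cwd, os.path.basename(fasta) + extension))
            for fasta in read_list(list_path)]

def would_overwrite(list_path: str, extension: str = ".blastn") -> List[str]:
    """Output files already present that a new run would overwrite."""
    return [out for _, out in planned_outputs(list_path, extension) if os.path.exists(out)]
```

For example, a non-empty `would_overwrite("list")` is a reminder to archive the earlier results first.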
For any run, a summary report of the BLAST results (header/hits)
may be created using:
To view the summary file:
To print (hard copy) the summary file:
Note: First check the size of the file with the ls -al command. If the file is very
large (which can happen, for example, if the test sequence contains L1 or SINE repeats), edit it before printing.
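The size check can also be scripted. Below is a minimal Python sketch. The helper name `size_check` and the 1000-line limit are arbitrary choices for illustration, not a system setting. The helper reports a file's size in bytes and lines and flags files that are too long to print comfortably:

```python
import os
from typing import Tuple

# Arbitrary illustrative limit; pick what you are willing to print.
MAX_LINES_TO_PRINT = 1000

def size_check(path: str, max_lines: int = MAX_LINES_TO_PRINT) -> Tuple[int, int, bool]:
    """Return (size in bytes, number of lines, True if longer than max_lines)."""
    size = os.path.getsize(path)
    with open(path) as handle:
        lines = sum(1 for _ in handle)  # count lines without loading the whole file
    return size, lines, lines > max_lines
```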
Before doing another autoblast run with the same sequences, create a new
directory (with the mkdir command) and move (mv) or copy (cp) the *.blastn and summary files
into it, so that the new run does not overwrite the previous results.
Then run autoblast.pl against the new database.
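The mkdir and mv/cp steps can be combined into one helper. Below is a Python sketch that assumes it is run from the working directory holding the autoblast output. `archive_results` is hypothetical. The summary file's name depends on how you created it, so it is passed in explicitly:

```python
import glob
import os
import shutil
from typing import Iterable, List

def archive_results(dest_dir: str, summary_files: Iterable[str] = (), copy: bool = False) -> List[str]:
    """Move (or, with copy=True, copy) *.blastn and summary files into dest_dir.

    Like a plain mkdir, this fails if dest_dir already exists, so earlier
    archives are never mixed with new results. Returns the new paths.
    """
    os.mkdir(dest_dir)  # raises FileExistsError if the directory exists
    transfer = shutil.copy2 if copy else shutil.move
    sources = sorted(glob.glob("*.blastn")) + list(summary_files)
    return [transfer(path, dest_dir) for path in sources]
```

For example, `archive_results("ecoli-run1", ["summary"])` moves the files, where both names are illustrative. Pass `copy=True` to keep the originals in place.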
Note: Files will be created in the working directory.
Fasta-split names the files with information derived
from the FASTA name line. All characters between the ">" and the first
space are used for the new file name.
For example, a FASTA record whose name line is ">alcohol dehydrogenase, cytochrome c, complete cds"
would be written to a file named "alcohol".
The list would then be used as one of the arguments to the
autoblast.pl program.
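The naming rule and the list can be illustrated with a short Python sketch. It is not the fasta-split program itself. `file_name_from_header` and `fasta_split` are hypothetical names. Writing the names to a file called `list`, one per line, is an assumption about the list format autoblast.pl expects:

```python
from typing import List, Tuple

def file_name_from_header(header: str) -> str:
    """Characters between the leading '>' and the first space (or end of line)."""
    return header.rstrip("\n")[1:].split(" ", 1)[0]

def fasta_split(fasta_path: str, list_path: str = "list") -> List[str]:
    """Write each record of a multi-record FASTA file to its own file.

    Files are created in the working directory and named from their name
    lines; the names are also written, one per line, to list_path.
    """
    records: List[Tuple[str, List[str]]] = []
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                records.append((file_name_from_header(line), [line]))
            elif records:
                records[-1][1].append(line)  # sequence line of the current record
            # text before the first '>' is ignored
    for name, lines in records:
        with open(name, "w") as out:
            out.writelines(lines)
    with open(list_path, "w") as listing:
        listing.writelines(name + "\n" for name, _ in records)
    return [name for name, _ in records]
```

For example, `fasta_split("sequences.fa")` would create one file per record plus `list`, which could then be passed to autoblast.pl. The file name "sequences.fa" is illustrative.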
To replace stretches of sequence that have vector and/or human repeat homology
with N's, use the RepeatMasker program. The program has many options and
reports details about the repeats it finds. Instructions for our system's version
of RepeatMasker can be found by typing at the prompt:
Your masked sequences may be used to create the BLAST databases or
to run against unmasked databases.
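RepeatMasker decides where the repeats are, but the replacement step itself is simple. Below is a Python sketch of that step only. It assumes the repeat intervals are already known as 1-based, inclusive (start, end) coordinates. The intervals below are made up for illustration and are not RepeatMasker output. `mask_with_n` is a hypothetical name:

```python
from typing import Iterable, Tuple

def mask_with_n(sequence: str, intervals: Iterable[Tuple[int, int]]) -> str:
    """Replace each 1-based, inclusive (start, end) stretch of sequence with N's."""
    bases = list(sequence)
    for start, end in intervals:
        if not 1 <= start <= end <= len(bases):
            raise ValueError(f"interval ({start}, {end}) is outside the sequence")
        bases[start - 1:end] = "N" * (end - start + 1)  # same length, so positions do not shift
    return "".join(bases)

print(mask_with_n("GAATTCCGAACTATCCGTTT", [(3, 6), (15, 17)]))  # prints GANNNNCGAACTATNNNTTT
```

Masking keeps the sequence length unchanged, so coordinates in the masked sequence still match the original.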
An online version of RepeatMasker exists:
Last updated May 1, 2000
Autoblast.pl, for Multiple Sequence BLASTing
Commands to create/manipulate FASTA files and file lists.
Example FASTA file for fasta-split (its name line gives the file name "alcohol"):
>alcohol dehydrogenase, cytochrome c, complete cds
GAATTCCGAACTATCCGTTTCATTGCTTATGCGACAGCATGTTCACTTTTTAGTGAGGCTGAACACTAAA
ATGTCAGGAGACGAGCGTGCTAGCCTCAGTATGTTGCCATGAAACGGACCACCTGCTTTGTCTTTCCTGC
CTGAAGCCGGTTTCTGTCTGGCCGGAAAAGAAGCGCTAGCGCGTTTTTTTGCCGGATACATTCAGAAAGC
TGCTCCGGGCAGAAAGTTGCAGCGGCGGCATCCTGAATTCGAAACCGTTAGTTTTCTGAGGACATCACAT
Masking Repeats