Humgen Sequence Analysis Programs Information


On this page:
Online BLAST Info | Local BLAST Info | Autoblast.pl Info | FASTA File Manipulation Commands | Masking Repeats Info

Online BLAST Programs

Use these to BLAST one or a few sequences against large sequence databases.

  • BLAST at NCBI. Accepts GenBank accession numbers or sequence in FASTA format. Includes a tutorial and a help manual.

  • BLAST Sanger Sequences (WU-BLAST)
  • BLAST Genome Sequencing Center Sequences (WU-BLAST)
  • BLAST Human Contigs at IBC

  • Make Your Own BLAST Database

    To create your own FASTA-format database for use locally with BLAST programs, see this
    WU-BLAST 2.0 README file or follow these simple instructions:

    1. Prepare a FASTA-format database file containing the sequences you want to compare against.

    2. There should be no blank lines between the FASTA sequences.

    3. All sequence lines (except the last line of each sequence) should be of equal length.

      Example format for a FASTA-format database file:

      >name-sequence1
      AATTGGGCGCTCGCACGCAT
      TTAGGCACGCAGCCCCCGGG
      AATTGGGCGCTCGCACGCAT
      AATTGGGCGCTCGCACGCAT
      ATG
      >name-sequence2
      gttattaccttgatatggcc
      aaagggaaaaaaaaaattcg
      gttt
      >name-sequence3
      gttaccttttttttttgAAG
      accccacaacccccccTACT
      gttattaccttgatatggcc
      tggggggcgacgactcag
      
      

    4. Save the file in its own directory.

    5. While in that directory, at your user prompt, type:

      pressdb filename

      This creates the three index files needed for BLAST nucleotide comparisons (*.csq, *.nhd, *.ntb).

    6. For amino-acid sequence comparisons, also use:

      setdb filename

      This creates three additional index files needed for amino-acid comparisons (*.ahd, *.bsq, *.atb).

    7. To run BLAST against this newly created database, first put your test sequence(s) in FASTA format, then type at your prompt:

      blast-program-command path/filename path/test-sequence

      and redirect the output to a file.

      For example, to run the query file 'test-seq' against a newly created FASTA-format database called 'filename', the command would be:

      blastn filename test-seq > results

      The output from the BLASTN run will be saved to a file called results.
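      The formatting rules in steps 2 and 3 can be checked before running pressdb. The following is a minimal Python sketch, not one of our installed tools; the script and function name (check_fasta_db) are hypothetical. It is deliberately strict: it flags every blank line (including one at the end of the file), sequence data before the first '>' header, and any sequence line other than the last line of an entry whose length differs from the first such line in the file.

```python
import sys


def check_fasta_db(text):
    """Return a list of problems that break the formatting rules above.

    Reports blank lines, sequence data before the first '>' header, and
    sequence lines (other than the last line of each entry) whose length
    differs from the first such line in the file.
    """
    problems = []
    entries = []  # one list of (line number, length) pairs per sequence
    for num, line in enumerate(text.splitlines(), start=1):
        line = line.rstrip()
        if not line:
            problems.append(f"line {num}: blank line")
        elif line.startswith(">"):
            entries.append([])
        elif not entries:
            problems.append(f"line {num}: sequence before first '>' header")
        else:
            entries[-1].append((num, len(line)))

    width = None  # common line width, taken from the first non-final line
    for seq_lines in entries:
        for num, length in seq_lines[:-1]:  # the last line may be shorter
            if width is None:
                width = length
            elif length != width:
                problems.append(f"line {num}: length {length}, expected {width}")
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python check_fasta_db.py filename
    with open(sys.argv[1]) as handle:
        for problem in check_fasta_db(handle.read()):
            print(problem)
```

      An empty result means the file follows both rules; otherwise each problem is reported with its line number so it can be fixed before pressdb is run.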


      Autoblast.pl, for Multiple Sequence BLASTing

      Autoblast.pl is used to BLAST multiple FASTA files against any BLAST database on our system.

      autoblast.pl blast-program database list-file

      The arguments to autoblast.pl are:

      1. the BLAST program you want to use,
      2. the name of the FASTA-format database (include the full path if the database is not in your current directory),
      3. a file listing the test files (the list file must be in the same directory as the test files; include its full path if it is not in your current directory).

      For example,

      autoblast.pl blastn /genbank/ecoli/ecoli.nt list

      runs each FASTA file named in 'list' through the BLASTN program, using the 'ecoli.nt' database.

      Output consists of one file per FASTA file name in the list. These files are named for the corresponding FASTA file with a '.blastn' extension, and are created in the working directory.
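      The loop that autoblast.pl performs can be sketched in Python for anyone adapting the workflow elsewhere. This is not the autoblast.pl source: it assumes the list file holds one test-file name per line, that the test files sit in the list file's directory, that each output name is the test-file name with the program name appended (seq1.fa becomes seq1.fa.blastn), and that the BLAST program takes "database query" arguments as in the blastn example above. The script and function names are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def read_list(list_file):
    """Return the test-file names in list_file (assumed one per line)."""
    text = Path(list_file).read_text()
    return [line.strip() for line in text.splitlines() if line.strip()]


def build_job(program, database, query_dir, name):
    """Return (command, output file name) for one test file.

    The output name appends the program name, e.g. seq1.fa -> seq1.fa.blastn
    (an assumption about how autoblast.pl names its files).
    """
    query = Path(query_dir) / name
    command = [program, database, str(query)]
    return command, f"{Path(name).name}.{program}"


def run_autoblast(program, database, list_file):
    """BLAST every listed file, writing each report to the working directory."""
    query_dir = Path(list_file).parent  # test files sit beside the list file
    for name in read_list(list_file):
        command, output_name = build_job(program, database, query_dir, name)
        with open(output_name, "w") as handle:
            # Equivalent to: program database query > output; keep going on errors.
            subprocess.run(command, stdout=handle, check=False)


if __name__ == "__main__" and len(sys.argv) == 4:
    # Usage (hypothetical): python autoblast_sketch.py blastn /genbank/ecoli/ecoli.nt list
    run_autoblast(*sys.argv[1:])
```

      Keeping build_job separate from the subprocess call makes it easy to print the commands first and check them before starting a long run.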

      For any run, a summary report of the BLAST results (header/hits) may be created using:

      ls *.blastn > blastn-list
      blastn-reports blastn-list > summary
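      The output format of our blastn-reports tool is not described here, so the following is only a rough Python stand-in that gives a one-line summary per report. It assumes plain-text BLAST output in which the query is named on a line starting with "Query=" and each aligned database sequence begins with ">". The script and function names are hypothetical.

```python
import sys
from pathlib import Path


def summarize_report(text):
    """Return (query description, number of database hits) for one report.

    Assumes plain-text BLAST output: the query is named on a line starting
    with 'Query=', and each aligned database sequence starts with '>'.
    """
    query = ""
    hits = 0
    for line in text.splitlines():
        if line.startswith("Query=") and not query:
            query = line[len("Query="):].strip()
        elif line.startswith(">"):
            hits += 1
    return query, hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical): python summarize_reports.py *.blastn > summary
    for name in sys.argv[1:]:
        query, hits = summarize_report(Path(name).read_text())
        print(f"{name}\t{query}\t{hits} hits")
```

      Because the shell expands *.blastn directly, this sketch skips the intermediate blastn-list file.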

      To view the summary file:

      less summary

      To print (hard copy) the summary file:

      lpr summary

      Note: First check the size of the file with the ls -al command. If the file is very large (which can happen, for example, if the test sequence contains L1 or SINE repeats), edit it before printing.

      Before doing another autoblast run with the same sequences, create a new directory (with the mkdir command) and move (mv) or copy (cp) the *.blastn and summary files into it, so that the earlier results are not overwritten.

      Then run autoblast.pl against the new database, and so on.
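      The mkdir-and-move step can also be scripted. The following Python sketch is hypothetical (script and function names are ours, not installed tools). It moves every *.blastn file plus summary into a new directory and refuses to reuse an existing one, so an earlier archive cannot be overwritten.

```python
import shutil
import sys
from pathlib import Path


def select_results(names, extension=".blastn", extra=("summary",)):
    """Return, sorted, the file names that belong to one autoblast run."""
    return sorted(n for n in names if n.endswith(extension) or n in extra)


def archive_results(run_dir, source_dir="."):
    """Move one run's results from source_dir into a brand-new run_dir.

    mkdir without exist_ok raises FileExistsError if run_dir already
    exists, so an earlier archive can never be overwritten.
    """
    source, target = Path(source_dir), Path(run_dir)
    target.mkdir(parents=True)
    names = [p.name for p in source.iterdir() if p.is_file()]
    for name in select_results(names):
        # shutil.copy2 would keep the originals, like cp instead of mv.
        shutil.move(str(source / name), str(target / name))


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage (hypothetical): python archive_run.py ecoli-run1
    archive_results(sys.argv[1])
```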


      Commands to create/manipulate FASTA files and file lists.


      Masking Repeats

      To replace stretches of sequence that are homologous to vector and/or human repeat sequences with N's, use the RepeatMasker program. The program has many options and reports details about the repeats it finds. Instructions for our system's version of RepeatMasker can be displayed by typing at the prompt:

      repeatmasker -h

      Your masked sequences may be used to create BLAST databases or to run against unmasked databases.

      An online version of RepeatMasker is available at:

      http://repeatmasker.genome.washington.edu/cgi-bin/RepeatMasker
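      RepeatMasker itself finds the repeats. The following sketch only illustrates what masking does to a sequence once repeat positions are known: each interval is overwritten with N's and the sequence length is unchanged. It is hypothetical, not RepeatMasker code, and assumes 1-based, inclusive start/end coordinates for the intervals.

```python
def mask_intervals(sequence, intervals, mask_char="N"):
    """Return sequence with each (start, end) interval replaced by mask_char.

    Coordinates are 1-based and inclusive (an assumption; check the
    convention of whatever tool produced the intervals). The length is
    preserved, so positions in the masked sequence match the original.
    """
    bases = list(sequence)
    for start, end in intervals:
        if not 1 <= start <= end <= len(bases):
            raise ValueError(f"interval {start}-{end} outside sequence")
        for i in range(start - 1, end):
            bases[i] = mask_char
    return "".join(bases)


# Example: mask positions 3-5 of a 10-base sequence.
print(mask_intervals("ACGTACGTAC", [(3, 5)]))
```

      Because masked bases stay in place as N's rather than being deleted, BLAST hit coordinates on a masked query still refer to positions in the original sequence.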




      Last updated May 1, 2000