JDSA is a Java program for degenerate searching of DNA sequences. It searches sequences allowing multiple nucleotides at a given position (positional degeneracy), limited overall sequence accuracy (group degeneracy), or variable spacing between multiple DNA sequences (spacing degeneracy). The program was designed to search the S. pombe or D. melanogaster genome, but custom searches can be performed provided the input is in the correct format. It was written in Java 2 (SDK 1.4.2_03), has a graphical interface (Swing v1.1), and has been tested on Win98/XP PCs, Mac OS 9.1 and Mac OS X (10.1.4).
Win 9x/2000/XP:
1. You will need to download and install the Java Runtime Environment (JRE). It is available here.
2. Download the program file (JDSAv01.jar) and run it (see the instructions section below).
3. You may also need the Java Foundation Classes (JFC)/Swing package. That is available here.
Mac OS 9.x:
1. You will need to download and install the Macintosh Runtime for Java (MRJ) 2.2.5. It is available here.
2. You will need to download a file called swingall.jar. This file must be placed in the folder System Folder: Extensions: MRJ Libraries: MRJClasses.
3. Download the program file and run it (see the instructions section below).
Mac OS X:
1. Download the file: JDSAv01.jar.
Double-click the icon and go.
Additional Files: If you want to search an entire genome and the files are not stored locally, you must create a file containing the GI numbers of every genome sequence file and point the file chooser to that file. Here are the Drosophila melanogaster Release v3.2 and S. pombe genome GI lists. These may not be the most up-to-date annotations of the sequencing results.
1. Positional degeneracy: Suppose you want to search for a DNA element that is degenerate at a given position, for example the initiator (Inr) region of Drosophila. The Drosophila initiator is the consensus sequence at which transcription starts (the +1 of an RNA transcript). This sequence is T-C-A-(G or T)-T-(T or C). Searching each permutation separately (by BLAST search, for example) is very inefficient. Instead, using the standard IUPAC designations for degenerate nucleotides, the JDSA algorithm will find every permutation of the Inr sequence in a single search. For example, the above Inr sequence can be written TCAKTY. The Downstream Promoter Element (Burke & Kadonaga (1996); Kutach & Kadonaga (2000)) would be written as RGWYG. For more information regarding the DPE and promoters, click here.
The IUPAC degeneracy codes are below:
IUPAC | Nucleotide(s) | Complement Nucleotide(s) |
A | A | T |
C | C | G |
G | G | C |
T | T | A |
M | A OR C | K |
R | A OR G | Y |
W | A OR T | W |
S | C OR G | S |
Y | C OR T | R |
K | G OR T | M |
V | A OR C OR G | B |
H | A OR C OR T | D |
D | A OR G OR T | H |
B | C OR G OR T | V |
N | A OR C OR T OR G | N |
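To illustrate what "searching all permutations at once" means, here is a minimal sketch (not JDSA's actual source) that encodes the table above and scans a sequence for a degenerate pattern position by position, without ever expanding the permutations. The class `IupacMatcher` and its methods are hypothetical names, and the code targets a current JDK rather than the 1.4 SDK the program was built with.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not part of JDSA): matches IUPAC degenerate patterns against DNA. */
final class IupacMatcher {
    // Each IUPAC code maps to the concrete nucleotides it stands for (from the table above).
    private static final Map<Character, String> CODES = new HashMap<>();
    static {
        CODES.put('A', "A");   CODES.put('C', "C");   CODES.put('G', "G");   CODES.put('T', "T");
        CODES.put('M', "AC");  CODES.put('R', "AG");  CODES.put('W', "AT");  CODES.put('S', "CG");
        CODES.put('Y', "CT");  CODES.put('K', "GT");  CODES.put('V', "ACG"); CODES.put('H', "ACT");
        CODES.put('D', "AGT"); CODES.put('B', "CGT"); CODES.put('N', "ACGT");
    }

    /** True if the IUPAC code allows the given nucleotide (case-insensitive). */
    static boolean allows(char code, char base) {
        String set = CODES.get(Character.toUpperCase(code));
        return set != null && set.indexOf(Character.toUpperCase(base)) >= 0;
    }

    /** Returns every 0-based start position in seq where the whole pattern matches exactly. */
    static List<Integer> findAll(String pattern, String seq) {
        List<Integer> hits = new ArrayList<>();
        for (int start = 0; start + pattern.length() <= seq.length(); start++) {
            boolean ok = true;
            for (int i = 0; i < pattern.length() && ok; i++) {
                ok = allows(pattern.charAt(i), seq.charAt(start + i));
            }
            if (ok) hits.add(start);
        }
        return hits;
    }
}
```

For example, `findAll("TCAKTY", "GGTCAGTCAATCATTTCC")` reports hits at positions 2 (TCAGTC) and 10 (TCATTT): two different permutations of the Inr found in one pass.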
3. Spacing degeneracy:
This type of degeneracy concerns the spacing between two DNA elements. Returning to the promoter example, if you wanted to look for a TATA box close to an initiator, you could do so with the JDSA program. A TATA box is most biologically pertinent when it lies no more than 30 nucleotides upstream of the initiator and no closer than 10. To have the JDSA program search for these results, you would enter the following:
- New Search, 2 DNA elements.
- For element #1: TATAAA (the TATA box consensus) with any desired maximum allowable mismatch (see group degeneracy).
- For element #2: TCAKTY (the Drosophila initiator sequence) with the given maximum allowable mismatch.
- For element #2: maximum distance away from element #1 would be 30.
- For element #2: minimum distance would be 10.
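The settings above combine a per-element mismatch allowance with a minimum and maximum spacing. A minimal sketch of that combination follows; it is not JDSA's implementation. The class `SpacingSearch` is hypothetical, and it ASSUMES that "distance" means the number of nucleotides between the end of element 1 and the start of element 2, which may differ from how JDSA measures it.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not JDSA's code) of a two-element spacing search.
 * ASSUMPTION: distance = gap in nucleotides between the end of element 1
 * and the start of element 2.
 */
final class SpacingSearch {
    // IUPAC code -> nucleotides it allows.
    private static String basesFor(char code) {
        switch (Character.toUpperCase(code)) {
            case 'A': return "A";   case 'C': return "C";   case 'G': return "G";   case 'T': return "T";
            case 'M': return "AC";  case 'R': return "AG";  case 'W': return "AT";  case 'S': return "CG";
            case 'Y': return "CT";  case 'K': return "GT";  case 'V': return "ACG"; case 'H': return "ACT";
            case 'D': return "AGT"; case 'B': return "CGT"; case 'N': return "ACGT";
            default: throw new IllegalArgumentException("Not an IUPAC code: " + code);
        }
    }

    /** Start positions where pattern matches seq with at most maxMismatches mismatched positions. */
    static List<Integer> hits(String pattern, String seq, int maxMismatches) {
        List<Integer> result = new ArrayList<>();
        for (int s = 0; s + pattern.length() <= seq.length(); s++) {
            int mismatches = 0;
            // Stop scanning this window as soon as the allowance is exceeded.
            for (int i = 0; i < pattern.length() && mismatches <= maxMismatches; i++) {
                if (basesFor(pattern.charAt(i)).indexOf(Character.toUpperCase(seq.charAt(s + i))) < 0) {
                    mismatches++;
                }
            }
            if (mismatches <= maxMismatches) result.add(s);
        }
        return result;
    }

    /** Pairs {start1, start2} where element 2 lies minGap..maxGap nt downstream of element 1. */
    static List<int[]> pairs(String el1, int mm1, String el2, int mm2,
                             int minGap, int maxGap, String seq) {
        List<int[]> result = new ArrayList<>();
        List<Integer> second = hits(el2, seq, mm2);
        for (int a : hits(el1, seq, mm1)) {
            int end1 = a + el1.length();          // first position after element 1
            for (int b : second) {
                int gap = b - end1;               // nucleotides separating the two elements
                if (gap >= minGap && gap <= maxGap) result.add(new int[] {a, b});
            }
        }
        return result;
    }
}
```

With the values from the list, the call would be `pairs("TATAAA", mm1, "TCAKTY", mm2, 10, 30, sequence)`. Finding each element's hits once and then pairing them keeps the spacing check separate from the matching, at the cost of a nested loop over the two hit lists.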
Searching for a sequence: To search for a given sequence or set of sequences, click "New..." in the menu bar, and then "New JDSA Search..." from the pull-down menu.
A pop-up menu will appear and ask you "How many fragments?" From the pull-down menu, you should select the number of separate DNA elements that you wish to search. In the promoter example, if you just wanted to search for the initiator, you would enter 1. If you wanted to search for a TATA box and an initiator, you would enter 2, and so on. Then click "OK". Clicking "Cancel" will abort the search.
A new screen titled "JDSA input" will appear; its layout depends on how many fragments you said you needed to search. In the separate top subpanels, you can enter the needed information: the sequence of each fragment, how many mismatches you will allow, how far it is from the previous fragment, and so on.
Please note that when you start, the OK button (bottom right) is disabled. It becomes enabled only when you have entered enough VALID information to proceed. If you enter a character that the program does not recognize (a non-IUPAC character, or a letter where a number is expected), the OK button will be disabled and will remain disabled until the problem is corrected.
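The enable/disable rule described above amounts to a validity check run after every edit. Below is a minimal sketch of such a check; the class `InputValidator` and its methods are hypothetical, and the "minimum distance must not exceed maximum distance" rule is an added assumption rather than documented JDSA behavior.

```java
/** Hypothetical sketch of a check that could gate the OK button (not JDSA's code). */
final class InputValidator {
    private static final String IUPAC = "ACGTMRWSYKVHDBN";

    /** A fragment sequence is valid if it is non-empty and made only of IUPAC codes. */
    static boolean isValidSequence(String s) {
        if (s == null || s.trim().isEmpty()) return false;
        for (char c : s.trim().toUpperCase().toCharArray()) {
            if (IUPAC.indexOf(c) < 0) return false;   // non-IUPAC character
        }
        return true;
    }

    /** A numeric field (mismatches, min/max distance) must be a non-negative integer. */
    static boolean isValidCount(String s) {
        if (s == null) return false;
        try {
            return Integer.parseInt(s.trim()) >= 0;
        } catch (NumberFormatException e) {
            return false;                             // a letter where a number is expected
        }
    }

    /** True only when every field is valid; ASSUMPTION: min distance must not exceed max. */
    static boolean canProceed(String sequence, String mismatches, String minDist, String maxDist) {
        return isValidSequence(sequence) && isValidCount(mismatches)
            && isValidCount(minDist) && isValidCount(maxDist)
            && Integer.parseInt(minDist.trim()) <= Integer.parseInt(maxDist.trim());
    }
}
```

In a Swing dialog, one way to wire this up is to attach a `javax.swing.event.DocumentListener` to each text field and call `setEnabled(canProceed(...))` on the OK button from each listener callback.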
Parsing and Filtering: Several options are available to reduce unwanted results and to maximize the information returned, so that the results are more meaningful. These options are in the lower-right corner, just above the Cancel/Proceed buttons.
Parsing: This is an attempt to place the results of a genomic search in their genomic context. If the "Parse the results?" checkbox is checked, the results will look like this:
1. ggatggattgatttgcctattgcatttata
[C]SPAC7D4 {5646}:In ORF: SPAC7D4.12c; Start of exon <-- {1373 bp} SEQUENCE FOUND
complement strand {906bp} --> end of exon
6. tataaactgcatatttatactccttttccaatt
SPAC7D4 {13918}:Extragenic: Previous gene:SPAC7D4.15c [C]<-- {649 bp} SEQUENCE FOUND
{355bp} --> SPAC7D4.08 [C]
Both of these results show how the output is formatted. Each result lists the sequence, the strand (complement-strand results are marked with [C]), the file name (here the pombe file name SPAC7D4), the nucleotide number (5646), and where the sequence lies in the genome: in an ORF, extragenic, intronic, or some basic combination of these. It also lists how far the sequence is from the surrounding elements. In the extragenic example, the sequence is 649 bp from gene SPAC7D4.15c and 355 bp from gene SPAC7D4.08.
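The core of this "genomic context" step is comparing a hit's position with annotated gene coordinates. The sketch below shows that idea only. The `ContextParser` and `Gene` classes are hypothetical, the genes are ASSUMED to be sorted by start and non-overlapping with 1-based inclusive coordinates, and introns, exon boundaries and strand are deliberately left out, so this is much simpler than the output shown above.

```java
import java.util.List;

/** Hypothetical sketch of placing a hit in its genomic context (not JDSA's code). */
final class ContextParser {
    /** A gene annotation; 1-based inclusive coordinates (assumed convention). */
    static final class Gene {
        final String name; final int start; final int end;
        Gene(String name, int start, int end) { this.name = name; this.start = start; this.end = end; }
    }

    /**
     * Describes pos relative to genes sorted by start and non-overlapping.
     * Returns "In ORF: <name>" or "Extragenic: <prev> {d bp} / {d bp} <next>".
     */
    static String describe(int pos, List<Gene> genes) {
        Gene prev = null, next = null;
        for (Gene g : genes) {
            if (pos >= g.start && pos <= g.end) return "In ORF: " + g.name;
            if (g.end < pos) prev = g;                          // latest gene ending before pos
            else if (next == null && g.start > pos) next = g;   // first gene starting after pos
        }
        String left = prev == null ? "none" : prev.name + " {" + (pos - prev.end) + " bp}";
        String right = next == null ? "none" : "{" + (next.start - pos) + " bp} " + next.name;
        return "Extragenic: " + left + " / " + right;
    }
}
```

With genes A (100 to 200) and B (500 to 800), position 300 is described as `Extragenic: A {100 bp} / {200 bp} B`, analogous to the 649 bp / 355 bp distances in the extragenic example.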
Parse Filter: set by default to No Parse Filter, but can be changed to extragenic only, in ORF only, in intron only, or extragenic/in intron.
Strand Filter: set by default to No Strand Filter, but can be changed to Forward Strand Only (especially useful if you have a custom search to perform; see below) or Complement Strand Only.
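A common way to cover the complement strand is to search the forward sequence for the reverse complement of the pattern, built with the complement column of the IUPAC table. Whether JDSA does it this way is not stated; the sketch below, with the hypothetical class `StrandUtil`, only shows the technique.

```java
/** Hypothetical sketch: reverse complement of an IUPAC pattern (not JDSA's code). */
final class StrandUtil {
    // Codes and their complements, position by position, from the IUPAC table above.
    private static final String CODES       = "ACGTMRWSYKVHDBN";
    private static final String COMPLEMENTS = "TGCAKYWSRMBDHVN";

    /** Reverses the pattern and complements each code, e.g. TCAKTY -> RAMTGA. */
    static String reverseComplement(String pattern) {
        StringBuilder sb = new StringBuilder(pattern.length());
        for (int i = pattern.length() - 1; i >= 0; i--) {
            int idx = CODES.indexOf(Character.toUpperCase(pattern.charAt(i)));
            if (idx < 0) throw new IllegalArgumentException("Not an IUPAC code: " + pattern.charAt(i));
            sb.append(COMPLEMENTS.charAt(idx));
        }
        return sb.toString();
    }
}
```

Under this approach, Forward Strand Only would search for `TCAKTY` alone, Complement Strand Only for `reverseComplement("TCAKTY")` (which is `RAMTGA`) alone, and No Strand Filter for both.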
Starting the search: After you've entered all of the information, click Proceed.
That window will disappear and another small window will appear saying "Click here to START". Click it when you're ready to begin. The program will then search for your query and return the results when it is done.
Three types of files are allowed as valid inputs.
There must be only one entry per line. The GI number must precede any description you wish to provide. Descriptions are optional; if you include one, it must be enclosed in quotation marks. This way, when JDSA returns the results of the search, each result can be listed with its respective title. For example:
23095176 "AE003474"
23092840 "AE003475"
2894275 "Pombe cosmid c6B1"
6689257
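The GI-list rules above (one entry per line, GI number first, optional quoted description) are simple enough to check with a regular expression. A minimal sketch follows; the `GiListReader` class is hypothetical, and skipping blank lines is an assumption about what JDSA tolerates.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of reading a GI list in the format described above (not JDSA's code). */
final class GiListReader {
    /** One entry: a GI number and an optional description (null if absent). */
    static final class Entry {
        final long gi; final String description;
        Entry(long gi, String description) { this.gi = gi; this.description = description; }
    }

    // GI number first, then optionally a description in double quotes, nothing else.
    private static final Pattern LINE = Pattern.compile("^\\s*(\\d+)\\s*(?:\"([^\"]*)\")?\\s*$");

    static List<Entry> parse(String text) {
        List<Entry> entries = new ArrayList<>();
        String[] lines = text.split("\r?\n");
        for (int i = 0; i < lines.length; i++) {
            if (lines[i].trim().isEmpty()) continue;   // ASSUMPTION: blank lines are ignored
            Matcher m = LINE.matcher(lines[i]);
            if (!m.matches()) {
                throw new IllegalArgumentException("Bad entry on line " + (i + 1) + ": " + lines[i]);
            }
            // group(2) is null when no quoted description was given.
            entries.add(new Entry(Long.parseLong(m.group(1)), m.group(2)));
        }
        return entries;
    }
}
```

To read a file from disk, one option is to pass `String.join("\n", java.nio.file.Files.readAllLines(path))` to `parse`. A line with text outside the quotes, or a description before the number, is rejected rather than silently misread.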
The speed of this program depends on several factors. Because it is a degenerate search, it may have to do the functional equivalent of searching the genome multiple times to find what you're asking for, and this takes time.
Also, this program pulls the genome files from NCBI as it needs to search them. Therefore, internet traffic and the load on NCBI will impact performance.
Under heavy traffic, a complex search of the Drosophila genome using the Internet as the source can take, and has taken, up to an hour to run. There is a status bar, but be forewarned.
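One way to see why degeneracy costs time is to count how many concrete sequences a pattern stands for, which is how many separate exact searches a non-degenerate tool would need. The sketch below (hypothetical class `DegeneracyCount`) multiplies the per-position counts from the IUPAC table; it ignores mismatch allowances, which enlarge the space further, and it is not a model of JDSA's actual run time.

```java
/** Hypothetical sketch: how many concrete sequences a degenerate pattern represents. */
final class DegeneracyCount {
    // Number of nucleotides each IUPAC code stands for (from the IUPAC table).
    static int fold(char code) {
        switch (Character.toUpperCase(code)) {
            case 'A': case 'C': case 'G': case 'T': return 1;
            case 'M': case 'R': case 'W': case 'S': case 'Y': case 'K': return 2;
            case 'V': case 'H': case 'D': case 'B': return 3;
            case 'N': return 4;
            default: throw new IllegalArgumentException("Not an IUPAC code: " + code);
        }
    }

    /** Product of per-position folds; multiplyExact throws ArithmeticException on overflow. */
    static long permutations(String pattern) {
        long total = 1;
        for (char c : pattern.toCharArray()) total = Math.multiplyExact(total, fold(c));
        return total;
    }
}
```

For instance, TCAKTY expands to 2 × 2 = 4 sequences and RGWYG to 2 × 2 × 2 = 8, while every N multiplies the count by 4.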