
SVM RNAi 3.6 User Manual


1. Introduction

      RNAi has become an important method for down-regulating gene expression. Its active intermediate, siRNA, is increasingly used for functional studies in mammalian cells. One of the main obstacles to using siRNA as a tool is finding an "effective" target site on the mRNA. SVM RNAi 3.6 provides researchers with a user-friendly tool for finding the best siRNA target sequences. It designs siRNAs by selecting core 19-nt target sequences that match the profiles of known good siRNA sequences.

      SVM RNAi includes two methods for profiling good siRNA target sequences. The first uses a simple set of rules; the second uses a machine-learning method, the support vector machine (SVM). Each method has advantages and disadvantages, as explained below, and the two may be used in tandem to improve the prediction of high-potency siRNA target sequences.

1a. Functions of SVM RNAi 3.6

       1. Selects siRNA target sequences based on simple rules, an SVM learning model, or both.
       2. Automatically designs oligos for a variety of technical platforms (e.g., synthetic siRNA, in vitro transcribed siRNA, shRNA). Users can readily add new platforms.
       3. Eliminates candidates with unintended targets by a "Blast" search of transcribed genome sequences or predicted genes.
       4. Avoids targeting single nucleotide polymorphism (SNP) sites.
       5. Generates scrambled controls.

1b. siRNA design rules

       The following rules are used in SVM RNAi 3.6. They are based on a number of recent studies (1-5).

Rules:
       1. GC content 30-52% preferred.
       2. At least three A/U bases at positions 15-19 of the sense strand preferred. (SVM RNAi 3.6 finds 19-nt target sequences that form double-stranded RNA; positions 15-19 are relative to this 19-nt core.)
       3. Internal hairpins are penalized.
       4. "A" at position 19 is preferred.
       5. "A" at position 3 is preferred.
       6. "U" at position 10 is preferred.
       7. "G/C" at position 19 is penalized.
       8. "G" at position 13 is penalized.
       9. Consecutive repeats (e.g., AAAA, GGGG) of more than 3 bases are penalized.
       10. The 5' end of the sense strand should form a more stable duplex than the 3' end.
       11. No unintended target genes. SVM RNAi 3.6 searches genome sequences to remove targets that are not unique; an siRNA with a 15-bp match to any unintended target gene may be removed.
       12. No SNP targeting. A SNP database may be searched to eliminate targets that include SNP sites.
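       As an illustration, the sequence-only criteria among these rules can be checked with simple string tests. The Python sketch below is not the program's implementation: the function name `rule_report` is hypothetical, it reports only pass/fail per rule because the manual does not give the weights SVM RNAi 3.6 uses to prefer or penalize features, and it skips rules 3 and 10 (hairpins and end stability), which need secondary-structure and duplex-stability calculations, as well as rules 11 and 12, which need external databases.

```python
import re
from typing import Dict

def rule_report(sense: str) -> Dict[int, bool]:
    """Check a 19-nt sense-strand target against rules 1, 2 and 4-9 (hypothetical helper).

    Positions are 1-based in the rules and 0-based in the code (position 19 is s[18]).
    True means the target is favourable for that rule (preferred feature present,
    or penalized feature absent). Rules 3 and 10 are not evaluated.
    """
    s = sense.upper().replace("T", "U")
    if len(s) != 19 or set(s) - set("ACGU"):
        raise ValueError("expected a 19-nt sequence of A, C, G, U/T")
    gc = (s.count("G") + s.count("C")) / len(s)
    return {
        1: 0.30 <= gc <= 0.52,                      # GC content 30-52%
        2: sum(b in "AU" for b in s[14:19]) >= 3,   # >= 3 A/U at positions 15-19
        4: s[18] == "A",                            # A at position 19
        5: s[2] == "A",                             # A at position 3
        6: s[9] == "U",                             # U at position 10
        7: s[18] not in "GC",                       # no G/C at position 19
        8: s[12] != "G",                            # no G at position 13
        9: re.search(r"(.)\1{3,}", s) is None,      # no run of 4+ identical bases
    }
```

       For example, `rule_report("GCATGCAGCTCAAGTATTA")` returns True for every rule checked, while a GC-rich sequence such as `GGGGCCCCGGGGCCCCGGG` fails most of them.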

       Rules 1-10 are applied automatically when the user chooses "Rules" as the method for target prediction. Other rules, such as choosing targets within the ORF and away from the 5' and 3' UTRs, are not used. Because the positions of the longest ORF and of each target are shown in a diagram, users can easily select such a target if desired.
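       If you want to apply the ORF preference yourself, the check amounts to locating the longest ORF and testing whether a target lies inside it. This minimal sketch assumes an ORF runs from an ATG to the first in-frame TAA, TAG, or TGA on the input (sense) strand only, and ignores ORFs that run off the end of the sequence; `longest_orf` and `target_in_orf` are hypothetical helpers, and the program's own ORF definition may differ.

```python
from typing import Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}

def longest_orf(cdna: str) -> Tuple[int, int]:
    """Return 0-based (start, end) of the longest ATG...stop ORF on the sense strand.

    `end` is exclusive and includes the stop codon; (-1, -1) means no ORF was found.
    """
    s = cdna.upper().replace("U", "T")
    best = (-1, -1)
    for frame in range(3):
        start = None
        for i in range(frame, len(s) - 2, 3):
            codon = s[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first ATG opens an ORF in this frame
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start > best[1] - best[0]:
                    best = (start, i + 3)
                start = None                   # look for the next ATG after this stop
    return best

def target_in_orf(cdna: str, target: str) -> bool:
    """True if the first occurrence of `target` lies entirely within the longest ORF."""
    s = cdna.upper().replace("U", "T")
    pos = s.find(target.upper().replace("U", "T"))
    start, end = longest_orf(s)
    return pos >= 0 and start >= 0 and start <= pos and pos + len(target) <= end
```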

      Note that rules 1-8 are based on synthetic siRNA transfection experiments (4). The advantage of using these rules is that a high success rate has been reported for them (4). However, that success rate may not carry over to the user's system, for the following reasons: (1) Rules 1-8 were derived only from synthetic siRNA transfection data (4, 6), and it is not known whether they apply to other technical platforms. (2) Success was defined as a reduction in mRNA levels measured by real-time RT-PCR, whereas most studies require a significant reduction at the protein level. (3) The siRNA concentration used in those experiments was 100 nM (4, 6). At such a high concentration, antisense RNA alone may also reduce gene expression (7-9). For most studies, users may need an siRNA that works at a much lower concentration (~5 nM).

1c. SVM machine learning

       Users may also use an SVM (support vector machine) for rational siRNA design. Briefly, the SVM is a well-established supervised machine-learning method (10). Here it is trained on published data to recognize good siRNA target sequences. The performance of the current SVM models is summarized in Table 1.

Model  Train  Test  Accuracy  +/+     -/-    (-/-)/(-/+)  Test  Accuracy  +/+    -/-  (-/-)/(-/+)
I      A1     A2    66%       69/103  13/22  1.8          C     74%       24/31  4/7  2.5
II     A2     A1    69%       77/109  10/19  1.8          C     74%       24/31  4/7  2.5
III    A1+A2  -     -         -       -      -            C     74%       24/31  4/7  2.5
IV     A1     A2    62%       61/103  16/22  1.8          C     55%       17/31  4/7  1.3
V      A2     A1    57%       63/109  11/19  1.4          C     60%       16/31  7/7  2.0
VI     A1+A2  -     -         -       -      -            C     66%       21/31  4/7  1.8

A dash means not applicable: models III and VI were trained on all of A1+A2, leaving no A1/A2 test set.


Table 1. SVM model performance statistics. A total of 253 published siRNA sequences (212 positives and 41 negatives) were randomly divided into two sets (A1 and A2). One set was used as training data and the other as testing data. A third data set of 38 published siRNA sequences (C; 31 positives and 7 negatives) was used to evaluate performance independently. Accuracy: overall accuracy. Details are given as predicted/experiment: +/+, number of positives predicted positive / number of positives in the testing set; -/-, number of negatives predicted negative / number of negatives in the testing set; (-/-)/(-/+), fraction of negatives predicted negative divided by fraction of positives predicted negative. The same SVM parameters were used in all models.
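       The summary columns of Table 1 can be recomputed from the raw counts. In this sketch (the function name `table1_stats` is hypothetical), overall accuracy is correctly predicted sequences divided by all sequences tested, and (-/-)/(-/+) is the fraction of negatives predicted negative divided by the fraction of positives predicted negative; these formulas reproduce the tabulated values, for example 66% and 1.8 for Model I tested on A2.

```python
from typing import Tuple

def table1_stats(pos_hit: int, n_pos: int, neg_hit: int, n_neg: int) -> Tuple[float, float]:
    """Return (overall accuracy in %, (-/-)/(-/+) ratio) from Table 1 style counts.

    pos_hit: positives predicted positive (+/+ numerator); n_pos: positives tested.
    neg_hit: negatives predicted negative (-/- numerator); n_neg: negatives tested.
    Assumes at least one positive was predicted negative (otherwise the ratio is undefined).
    """
    accuracy = 100.0 * (pos_hit + neg_hit) / (n_pos + n_neg)
    neg_rate_on_neg = neg_hit / n_neg              # (-/-): negatives called negative
    neg_rate_on_pos = (n_pos - pos_hit) / n_pos    # (-/+): positives called negative
    return accuracy, neg_rate_on_neg / neg_rate_on_pos
```

       For Model I on data set C, `table1_stats(24, 31, 4, 7)` gives about 73.7% and 2.53, which round to the 74% and 2.5 in the table.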

       The SVM models have two advantages: (1) They are trained on a large body of public data from different technical platforms, so they are more likely to capture conserved features of good siRNA targets. (2) A training target is considered positive only if a significant reduction in protein expression was observed. This sets a much higher standard for "positive" and may help find the most potent siRNAs, those that can both degrade mRNA and suppress translation. The disadvantage is that the estimated success rate (50-60% or less) is not as high as the rates reported for the rules (4, 6). This success rate is an estimate and has not been determined experimentally.

       We suggest using both the Rules and the SVM models for rational siRNA design; combining the two methods may avoid some of the disadvantages of each.
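       For readers new to SVMs, the sketch below shows one common way a 19-nt target could be turned into features and scored by a linear SVM. It is an illustration under stated assumptions only: the manual does not describe the feature encoding, kernel, or training procedure used by SVM RNAi 3.6 (which is distributed with LIBSVM), so the one-hot encoding (4 features per position, 76 in total) and the Pegasos-style sub-gradient training loop for a linear SVM without a bias term are choices made here. The helper names are hypothetical and the two training sequences are toy data, not published siRNAs.

```python
from typing import List, Sequence

BASES = "ACGU"

def one_hot(seq: str) -> List[float]:
    """Encode a sequence as 4 indicator features per position (A, C, G, U); T counts as U."""
    s = seq.upper().replace("T", "U")
    return [1.0 if base == b else 0.0 for base in s for b in BASES]

def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))

def train_linear_svm(xs: List[List[float]], ys: List[int],
                     lam: float = 0.01, epochs: int = 20) -> List[float]:
    """Pegasos-style sub-gradient descent on the L2-regularized hinge loss.

    Labels are +1 (good target) or -1 (poor target). Examples are visited in a
    fixed order rather than sampled at random, to keep the sketch deterministic.
    """
    w = [0.0] * len(xs[0])
    t = 0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            t += 1
            eta = 1.0 / (lam * t)                      # decreasing step size
            margin = y * dot(w, x)                     # margin before the update
            w = [(1.0 - eta * lam) * wi for wi in w]   # regularization shrink
            if margin < 1.0:                           # hinge loss active: move toward y*x
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w

def predict(w: Sequence[float], seq: str) -> int:
    """+1 if the linear score is positive, otherwise -1."""
    return 1 if dot(w, one_hot(seq)) > 0 else -1
```

       With `w = train_linear_svm([one_hot("A" * 19), one_hot("C" * 19)], [1, -1])`, `predict(w, "AAAAAAAAAAAAAAAAAAC")` returns 1. A real model would be trained on labelled published sequences such as those summarized in Table 1, typically with a library such as LIBSVM rather than a hand-written loop.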

1d. Technical platforms

       The software can be set up for a variety of RNAi technologies, some of which place platform-specific requirements on the siRNA target. SVM RNAi 3.6 first finds 19-nt targets. For platforms that require longer targets, the 19-nt core is extended in both directions. User-specified sequences may also be added 5' or 3' to the target.

       Certain platform-specific rules, such as the requirement for "G/C" at positions 1 and 19 for efficient transcription, are not implemented. Rules like this may conflict with other rules (e.g., the 5' end of the sense strand should form a more stable duplex than the 3' end).
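       The extension of a 19-nt core to a longer duplex can be pictured as widening a window on the cDNA. The manual does not say how the extra length is divided between the two sides, so this sketch assumes an even split with any odd base added on the 3' side, and rejects cores too close to either end of the sequence; `extend_core` is a hypothetical name.

```python
def extend_core(cdna: str, core_start: int, duplex_len: int, core_len: int = 19) -> str:
    """Extend a 19-nt core (0-based start) to duplex_len nt in both directions.

    Assumption: extra bases are split evenly, with the odd one (if any) added 3'.
    Raises ValueError if the extended window would run off either end of the cDNA.
    """
    if duplex_len < core_len:
        raise ValueError("duplex length must be at least the 19-nt core")
    extra = duplex_len - core_len
    left = extra // 2           # bases added on the 5' side
    right = extra - left        # bases added on the 3' side
    start, end = core_start - left, core_start + core_len + right
    if start < 0 or end > len(cdna):
        raise ValueError("not enough flanking sequence to extend this core")
    return cdna[start:end]
```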

2. Usage

2a. Set up technical platforms

       Step 1. To edit an existing platform, select the platform name and then click "Edit." A new window will appear.
       Step 2. To create a new platform, click "Edit," and then click "Clear" in the new window.
       Step 3. The default and minimum duplex length is 19 nt. If you choose a duplex length longer than 19 nt, the program first finds 19-nt targets and then extends them in both directions.
       Step 4. The program can design two oligos (Oligo A and Oligo B). Select "Yes" for each oligo you want the program to design.
       Step 5. Fill in the 5' sequence you would like to add. Use "x" if you want the program to find the corresponding sequence on the cDNA. Leave blank if no sequence is to be added.
       Step 6. Select the type and orientation of the 19-nt target sequence. There are five options: "DNA", "DNA reverse complement", "RNA", "RNA reverse complement", and "None". Select "None" if you don't need to add a target sequence.
       Step 7. Type in the loop or spacer sequence. Leave blank if there is no loop or spacer.
       Step 8. Select the type and orientation of the second 19-nt target sequence, placed after the loop or spacer. This is required for shRNA designs. Select "None" if not needed.
       Step 9. Fill in the 3' sequence you would like to add. Use "x" if you want the program to find the corresponding sequence on the cDNA. Leave blank if no sequence is to be added.
       Step 10. For certain applications, it may be desirable to introduce a few G-U pairs into the siRNA duplex (e.g., to increase clone stability in bacteria). This is achieved by mutating C to U in one strand of the siRNA. Be extremely careful to make the corresponding mutations in your DNA molecules: the sense strand of your oligo may correspond to either the sense or the antisense strand of your RNA!
       Step 11. Select "Check SNP" if you want to avoid targeting SNP sites.
       Step 12. A directory containing the SNP files for the species must be provided. Use "Browse" to locate the directory, click any file in it, and then click Open. The directory name should appear in the text field on the left. Note that the directory must contain only SNP files. For details on downloading SNP files, see Section 2c.
       Step 13. Save all your changes. For an existing platform, you may proceed to siRNA design. For a new platform, exit and restart the program; after the restart, the new platform will appear in the platform pull-down menu.
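       Steps 4-9 describe each oligo as a concatenation: 5' sequence, target (in the chosen type and orientation), loop or spacer, optional second target, and 3' sequence. The sketch below assembles an oligo that way. It is a hypothetical illustration, not the program's code: it assumes each "x" in the 5' or 3' field is replaced by the cDNA base at the same offset immediately upstream or downstream of the 19-nt target (with N past the ends of the cDNA), it omits the G-U substitution of Step 10, and the loop "TTCAAGAGA" in the usage note is only an example spacer.

```python
from typing import Optional

# Maps each base to its DNA complement (U is treated like T).
COMPLEMENT = str.maketrans("ACGTU", "TGCAA")

def render(target: str, kind: Optional[str]) -> str:
    """Render a DNA target as 'DNA', 'DNA reverse complement', 'RNA',
    'RNA reverse complement', or None (omitted)."""
    if kind is None:
        return ""
    seq = target
    if kind.endswith("reverse complement"):
        seq = seq.translate(COMPLEMENT)[::-1]
    if kind.startswith("RNA"):
        seq = seq.replace("T", "U")
    return seq

def fill_x(template: str, flank: str) -> str:
    """Replace each 'x' in template with the base at the same offset in flank."""
    return "".join(f if t == "x" else t for t, f in zip(template, flank))

def assemble_oligo(cdna: str, start: int, five: str, first: Optional[str],
                   loop: str, second: Optional[str], three: str) -> str:
    """5' field + target + loop/spacer + optional second target + 3' field."""
    s = cdna.upper().replace("U", "T")
    target = s[start:start + 19]
    # Bases just upstream/downstream of the target; 'N' pads past the cDNA ends.
    up = s[max(0, start - len(five)):start].rjust(len(five), "N")
    down = s[start + 19:start + 19 + len(three)].ljust(len(three), "N")
    return (fill_x(five, up) + render(target, first) + loop
            + render(target, second) + fill_x(three, down))
```

       For example, `assemble_oligo(cdna, start, "", "DNA", "TTCAAGAGA", "DNA reverse complement", "TTTTT")` yields a sense arm, the loop, an antisense arm, and a run of T's, the general shape of an shRNA template.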

2b. Download genome sequences

      SVM RNAi 3.6 searches genome sequences to remove targets that are not unique: an siRNA with a 15-bp match to any unintended target gene will be removed. Genome sequences can be downloaded from the NCBI Reference Sequence (RefSeq) project, which provides a comprehensive, non-redundant set of sequences for each species.
       Step 1. Go to the NCBI RefSeq web site: http://www.ncbi.nlm.nih.gov/RefSeq/.
       Step 2. Find the species you are interested in: ftp://ftp.ncbi.nih.gov/genomes/.
       Step 3. Go to the RNA directory: e.g., ftp://ftp.ncbi.nih.gov/genomes/H_sapiens/RNA/.
       Step 4. Download RNA sequences in FASTA format. Gnomon_mRNA.fsa.gz (ab initio transcript predictions) is suggested. Annotated RNA in FASTA format (e.g., rna.fa.gz) may also be used.
       Step 5. Uncompress the downloaded file and save the extracted file into the following folder:
       $home|genomes
       Here $home is where you have installed your program.
       Step 6. Rename the extracted RNA sequence file to something easy to remember, such as the species name (e.g., Human_pred). This file name will appear in the program's genome pull-down menu.
       Step 7. Periodically check NCBI RefSeq for updates.
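       The uniqueness filter can be approximated with exact 15-mer matching against the renamed FASTA file. This sketch is simplified and hypothetical: it reports every transcript, other than the intended one, that shares an exact 15-nt stretch with the 19-nt target on the sense strand; it does not model mismatches or BLAST scoring, and identifying the intended gene by its FASTA record ID is an assumption. `read_fasta` accepts any iterable of lines, such as an open file handle for the Human_pred file from Step 6.

```python
from typing import Dict, Iterable, List, Optional

def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into {record ID: sequence}; the ID is the first word after '>'."""
    records: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            fields = line[1:].split()
            current = fields[0] if fields else ""
            records[current] = []
        elif current is not None and line:
            records[current].append(line.upper())
    return {rid: "".join(parts) for rid, parts in records.items()}

def off_target_hits(target: str, transcripts: Dict[str, str],
                    intended_id: str, k: int = 15) -> List[str]:
    """IDs of transcripts, other than intended_id, sharing an exact k-mer with target."""
    t = target.upper().replace("U", "T")
    kmers = {t[i:i + k] for i in range(len(t) - k + 1)}
    return [rid for rid, seq in transcripts.items()
            if rid != intended_id and any(km in seq for km in kmers)]
```

       Scanning every transcript with substring tests is slow for a whole transcriptome; an index of all 15-mers would be the usual speed-up.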

2c. Download SNP sequences

      SNP data can be downloaded from the NCBI dbSNP site (http://www.ncbi.nlm.nih.gov/SNP/).
       Step 1. Go to the NCBI dbSNP web site: http://www.ncbi.nlm.nih.gov/SNP/.
       Step 2. Go to the FTP download site: ftp://ftp.ncbi.nih.gov/snp/.
       Step 3. Find your species of interest, e.g., ftp://ftp.ncbi.nih.gov/snp/human/.
       Step 4. Go to the rs_fasta directory, e.g., ftp://ftp.ncbi.nih.gov/snp/human/rs_fasta/.
       Step 5. Download the SNP data for each chromosome and save the files in a temporary folder.
       Step 6. Uncompress all the downloaded files and save them in a new directory. This directory can be anywhere on your computer; the suggested location is within the $home|snp folder, where $home is the folder in which you installed the program. For example, you may save all extracted SNP files in the $home|snp|human folder.
       Step 7. Set up the SNP search directory. See Section 2a, Step 12.
       Step 8. Periodically check NCBI dbSNP for updates.

2d. Design siRNA

       Step 1. Start SVM RNAi. Copy and paste the sense-strand cDNA sequence into the sequence input window; blanks and numbers are ignored. To copy and paste, use Ctrl-C and Ctrl-V on Windows, or Apple-C and Apple-V on Macs.
       Step 2. Select your technical platform.
       Step 3. If your technical platform needs changes, click "Edit." A new window will appear. Make the changes, then save and close the window. If you have entered a new platform name, you must exit and restart SVM RNAi for the changes to take effect; if the name field is unchanged, no restart is needed. See Section 2a for detailed instructions.
       Step 4. Select a genome. If a genome is selected, the program scans it for unwanted targets, and any siRNA sequence that may target more than one gene is eliminated. See Section 2b for instructions on adding a new genome.
       Step 5. Select the target. The default is "Sense", and the program assumes your input sequence is the sense strand. Select "Scramble" if you want a scrambled control.
       Step 6. Most researchers select siRNA targets starting with AA (corresponding to a TT 3' overhang on the antisense strand). Restricting targets to those starting with AA is not required but is strongly suggested, especially if you also use an SVM model for prediction: because most of the training data start with AA, SVM performance may be better for such targets.
       Step 7. Select a model for prediction; Rules plus an SVM model is suggested. Two SVM models are included: Model A rejects more candidates than Model B. Model B may be necessary when your gene is a member of a family of closely related genes.
       Step 8. Click "Predict" to run the program. For a typical sequence of a few kb, the run takes a few minutes on most systems.
       Step 9. When the run finishes, the results are shown in the lower window. In addition, the longest ORF and the position of each target are shown in a diagram (top right). Users may examine each predicted target and copy the predicted sequences into a word processor for record keeping or ordering.
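       Steps 1, 5, and 6 can be illustrated with three small helpers. This is a minimal sketch under stated assumptions, not the program's algorithm: `clean_input` keeps only letters, mirroring the note that blanks and numbers are ignored; `aa_targets` reads "targets starting with AA" as the common AA(N19) convention and returns the 19-nt core that follows each AA; and `scramble` shuffles a target's bases with a seeded random generator, which preserves base composition. The manual does not say how SVM RNAi 3.6 builds its scrambled controls, and all three names are hypothetical.

```python
import random
from typing import List, Tuple

def clean_input(text: str) -> str:
    """Drop blanks, digits and other non-letters; upper-case the rest."""
    return "".join(ch for ch in text if ch.isalpha()).upper()

def aa_targets(cdna: str, core_len: int = 19) -> List[Tuple[int, str]]:
    """Return (0-based core start, core) for every AA followed by a full 19-nt core."""
    s = clean_input(cdna).replace("U", "T")
    return [(i + 2, s[i + 2:i + 2 + core_len])
            for i in range(len(s) - 1 - core_len)   # ensures i + 2 + core_len <= len(s)
            if s[i:i + 2] == "AA"]

def scramble(target: str, seed: int = 0) -> str:
    """Shuffle the bases of a target, keeping its base composition."""
    bases = list(target)
    random.Random(seed).shuffle(bases)
    return "".join(bases)
```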

References

(1) Elbashir SM, Lendeckel W, Tuschl T. RNA interference is mediated by 21- and 22-nucleotide RNAs. Genes Dev. 2001 Jan 15;15(2):188-200.
(2) Schwarz DS, Hutvagner G, Du T, Xu Z, Aronin N, Zamore PD. Asymmetry in the assembly of the RNAi enzyme complex. Cell. 2003 Oct 17;115(2):199-208.
(3) Khvorova A, Reynolds A, Jayasena SD. Functional siRNAs and miRNAs exhibit strand bias. Cell. 2003 Oct 17;115(2):209-16. Erratum in: Cell. 2003 Nov 14;115(4):505.
(4) Reynolds A, Leake D, Boese Q, Scaringe S, Marshall WS, Khvorova A. Rational siRNA design for RNA interference. Nat Biotechnol. 2004 Feb 1.
(5) Holen T, Amarzguioui M, Wiiger MT, Babaie E, Prydz H. Positional effects of short interfering RNAs targeting the human coagulation trigger Tissue Factor. Nucleic Acids Res. 2002 Apr 15;30(8):1757-66.
(6) Ambion TechNotes 10(4), September 2003.
(7) Holen T, Amarzguioui M, Babaie E, Prydz H. Similar behaviour of single-strand and double-strand siRNAs suggests they act through a common RNAi pathway. Nucleic Acids Res. 2003 May 1;31(9):2401-7.
(8) Grunweller A, Wyszko E, Bieber B, Jahnel R, Erdmann VA, Kurreck J. Comparison of different antisense strategies in mammalian cells using locked nucleic acids, 2'-O-methyl RNA, phosphorothioates and small interfering RNA. Nucleic Acids Res. 2003 Jun 15;31(12):3185-93.
(9) Vickers TA, Koo S, Bennett CF, Crooke ST, Dean NM, Baker BF. Efficient reduction of target RNAs by small interfering RNA and RNase H-dependent antisense agents. A comparative analysis. J Biol Chem. 2003 Feb 28;278(9):7108-18.
(10) Cortes C, Vapnik V. Support-vector networks. Mach Learn. 1995;20:273-93.


SVM RNAi 3.6 Copyright (c) 2003-2004 Chang Bioscience, Inc. All rights reserved.


SVM RNAi 3.6 License Agreement
This is a legal agreement between you, the "END USER", and Chang Bioscience. Use of this software (the "SOFTWARE") written by Chang Bioscience indicates your acceptance of these terms.
1. GRANT OF LICENSE. Chang Bioscience hereby grants you the right to use the SOFTWARE on a single computer. The SOFTWARE is considered in use on a computer when it is loaded into temporary memory or installed into permanent memory. This License Agreement grants you a non-exclusive License to use the product for an evaluation period of sixty (60) days from the date you install the product. On the sixty-first (61st) day after you installed the product you must either register the product by means of purchasing a commercial License Agreement from Chang Bioscience or destroy all copies of the product in your possession, and any related documentation. You may make a single copy of the software only and no other part of the product in a machine readable form for backup use only.
2. PROPRIETARY RIGHTS. The SOFTWARE is owned exclusively by Chang Bioscience, and this license does not transfer any ownership of the SOFTWARE to you. You may not lend, loan, lease, rent, sell or distribute the product (or copies) in any form.
3. NON PERMITTED USES. You may not translate, reverse program, disassemble, decompile or otherwise reverse engineer the SOFTWARE.
4. NO WARRANTY. THIS SOFTWARE IS LICENSED TO YOU "AS IS," AND WITHOUT ANY WARRANTY OF ANY KIND, WHETHER ORAL, WRITTEN, EXPRESS, IMPLIED OR STATUTORY, INCLUDING BUT NOT LIMITED TO WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. CHANG BIOSCIENCE DOES NOT WARRANT THAT THIS SOFTWARE DOES NOT INFRINGE ANY RIGHTS OF THIRD PARTIES.
5. LIMITATION OF LIABILITY. Chang Bioscience grants the license to the SOFTWARE hereunder and the END USER accepts the use hereof on an "AS IS" and "WITH ALL FAULTS" basis. Furthermore, the END USER understands and agrees that IN NO EVENT WILL CHANG BIOSCIENCE OR ANY OF ITS SUPPLIERS BE LIABLE TO LICENSEE FOR SPECIAL OR CONSEQUENTIAL DAMAGES which might arise out of or in connection with the performance or non-performance of this Agreement or of the SOFTWARE hereunder, even if CHANG BIOSCIENCE has been advised of the possibility of such damages, INCLUDING, BUT NOT LIMITED TO, LOST PROFITS due to errors, inaccuracies, omissions, incompleteness or insufficiency of the SOFTWARE or materials, nor for the usefulness of the SOFTWARE to the END USER. In no event shall CHANG BIOSCIENCE'S liability related to any of the SOFTWARE exceed the license fees, if any, actually paid by you for the SOFTWARE. CHANG BIOSCIENCE shall not be liable for any damages whatsoever arising out of or related to the use of or inability to use the SOFTWARE, including but not limited to direct, indirect, special, incidental, or consequential damages.


LIBSVM Copyright (c) 2000-2002 Chih-Chung Chang and Chih-Jen Lin All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
3. Neither name of copyright holders nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE (LIBSVM) IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Copyright 2002-2004 Chang Bioscience, Inc. All rights reserved.