Support Vector Machine (SVM Classifier 1.2): Learn to Classify
Support Vector Machine (SVM) is one of the
most effective statistical learning methods. In many cases, its performance (i.e., error rate on test data)
either matches or is significantly better than that of competing methods.
SVM Classifier 1.2 and SVM SeqClassifier 1.2 are based on the Java implementation of LIBSVM,
a widely used library for support vector machines.
The goal is to provide biologists with
a friendly tool for testing their hypotheses without the need for programming. Users may find the light (online) version
satisfactory for many applications. See below for a few examples.
SVM Classifier is now included in BioToolKit 300.
SVM Online runs on Java-enabled browsers:
PC: Internet Explorer 5.5 or higher and Netscape 4.08 or higher
Mac: Internet Explorer 5.0, or 5.1.4 or higher (IE 5.1 will not work because of a bug.)
Example I: SVM Classification of Tumor Types
Test Drive SVM Classifier 1.2
1. Click here to download the training data.
2. Select from the second line, copy (Ctrl-c or Apple-c) and paste (Ctrl-v or Apple-v)
into the training data input window (top-left) of the SVM Classifier 1.2. Do not include the first line of column names.
The input data file should have the following format:
3. Select the model, the kernel, and whether to scale the data. If the scale box is checked, each
row will be scaled such that $\sum_i x_i^2 = N$, where N is the number of data
points in each row (vector).
4. Click on the Train button to train a model.
5. Click here to download the testing data.
6. Select from the second line, copy (Ctrl-c or Apple-c) and paste (Ctrl-v or Apple-v)
into the testing data input window (top-right) of the SVM Classifier 1.2. Do not include the first line of column names.
7. Click on the Predict button to predict.
8. Click on the Validate button to run cross-validation.
Warning: Because clipboard memory is limited, a large dataset cannot be copied and pasted.
For large datasets you will need the stand-alone version, which allows data upload.
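The scaling option in step 3 can be illustrated with a minimal Python sketch. It assumes the scaling is a plain multiplicative rescaling of each row (no centering), which is the simplest reading of the condition $\sum_i x_i^2 = N$; the function name `scale_row` is hypothetical and is not part of SVM Classifier.

```python
import math

def scale_row(row):
    """Rescale one data vector so that the sum of its squared values equals N.

    N is the number of data points in the row. Assumption: a single
    multiplicative factor is applied, with no centering.
    """
    n = len(row)
    sum_sq = sum(x * x for x in row)
    if sum_sq == 0:
        # An all-zero row cannot be rescaled; return it unchanged.
        return list(row)
    factor = math.sqrt(n / sum_sq)
    return [x * factor for x in row]

# Usage: after scaling, the sum of squares is (up to rounding) the row length.
scaled = scale_row([3.0, 4.0])
print(scaled, sum(x * x for x in scaled))
```

With this kind of scaling, rows with very different overall magnitudes contribute comparably to the kernel computation.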
The microarray data for this demo were published in Ramaswamy et al. PNAS 98(26):15149-54.
Supplementary material and data can be found at the Whitehead Institute Center for Genome Research.
For demo purposes, the data were selected and filtered as follows: (1) no additional
normalization; (2) top candidates in OVA (one-versus-all) were selected; (3) data values smaller than 10 were raised to 10
and data values greater than 5000 were lowered to 5000; (4) features with intermediate expression levels were selected (i.e., the
total expression in the training set is between 10000 and 30000 for each feature).
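Filtering steps (3) and (4) above can be sketched in Python as follows. This is only an illustration under stated assumptions: rows are samples and columns are features, the per-feature total is computed after clipping, and the range bounds are inclusive. The source does not specify these details, and the function names are hypothetical.

```python
def clip_values(matrix, low=10.0, high=5000.0):
    """Raise values below `low` to `low` and lower values above `high` to `high`."""
    return [[min(max(v, low), high) for v in row] for row in matrix]

def select_features(matrix, min_total=10000.0, max_total=30000.0):
    """Return the column indices whose total over all rows lies in the given range.

    Assumption: rows are training samples, columns are features,
    and both bounds are inclusive.
    """
    if not matrix:
        return []
    n_features = len(matrix[0])
    totals = [sum(row[j] for row in matrix) for j in range(n_features)]
    return [j for j, t in enumerate(totals) if min_total <= t <= max_total]

# Usage with a tiny made-up matrix (2 samples x 3 features).
raw = [[5.0, 6000.0, 8000.0],
       [2.0, 7000.0, 20000.0]]
clipped = clip_values(raw)
kept = select_features(clipped)
print(clipped, kept)
```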
Figure 2. Prediction and cross-validation accuracy. Features were selected
from the training data (144 primary tumors) and the test set (46 primary tumors, excluding metastatic samples)
as described above. Training, prediction, and 5-fold cross-validation used
the C-SVC model and a linear kernel. The data were scaled such that $\sum_i x_i^2 = N$.
The stand-alone version is needed when the number of features exceeds 100, because of the limit on clipboard memory.
Typical training and testing data sets with ~300 features can be downloaded here:
Training, Test. Please remove the first line of feature names before loading the files.
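For readers unfamiliar with the 5-fold cross-validation used by the Validate button and reported in Figure 2, the following Python sketch shows how a data set can be split into folds. It is a simplified illustration and not the tool's actual code: the function name `k_fold_indices` is hypothetical, and the sketch does not shuffle or stratify the samples, which a real implementation would typically do.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, test_indices) pairs, one per fold.

    Each sample appears in exactly one test fold; fold sizes differ by at most one.
    """
    indices = list(range(n_samples))
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

# Usage: split 144 training samples into 5 folds.
for train_idx, test_idx in k_fold_indices(144, k=5):
    print(len(train_idx), len(test_idx))
```

In each round, a model is trained on the training indices and evaluated on the held-out fold; the cross-validation accuracy is the fraction of all samples predicted correctly while held out.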
Example II: SVM Classification of Drugs Based on Mechanisms of Action (MOAs)
Click here for the drug activity data of 115 drugs against a panel
of 60 cell lines (A matrix, excluding 3 drugs with unknown MOA) (ref. 2). For details, please visit the authors' web page.
Drugs with different mechanisms of action have different activity profiles against the
60 cell lines. Given the drug activities, we may therefore predict a drug's mechanism
of action. After loading the data into SVM Classifier, the 5-fold cross-validation accuracy should be greater than 70% (C-SVC model, polynomial kernel, scaled data). Note that SVM will predict only
MOAs present in the data set. For this reason, the prediction accuracy may be low for a multiclass
classifier. The performance can be improved if we are only interested in knowing whether a drug belongs
to one particular class. In that case, we may divide the training set into two classes: drugs that belong to the class
being tested and drugs that do not.
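Dividing the training set into two classes, as described above, amounts to relabeling the data before training. A minimal Python sketch follows; the function name and the MOA labels "A", "B", and "C" are made up for illustration and do not come from the data set.

```python
def one_vs_rest_labels(labels, target):
    """Map multiclass labels to +1 (belongs to `target`) or -1 (does not)."""
    return [1 if label == target else -1 for label in labels]

# Usage with hypothetical MOA labels.
moa_labels = ["A", "B", "A", "C", "B"]
binary = one_vs_rest_labels(moa_labels, target="A")
print(binary)
```

The relabeled data can then be used to train a two-class model for the class of interest.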
Example III: SVM Classification of DNA or Protein Sequences
SVM SeqClassifier is an extension of SVM Classifier:
SeqClassifier accepts DNA or protein sequences as input. SeqClassifier can
be trained to recognize cis-controlling sequences (e.g., splicing sites, promoters,
translation initiation sequences, protein binding sequences, etc.) and
protein motifs (e.g., protease sites, phosphorylation sites, localization signals, interaction domains, etc.). Click here to test it.
Example V: siRNA Design
Click here to see how SVM can be applied to
siRNA selection.
Example VI: Protein Cellular Localization
Click here to download training/testing data for
protein cellular localization predictions. Choose a small data set, because you will not be able to copy/paste
a large one. The data are in svm-light format, which is accepted by this online tool.
Try changing the model and kernel to find the best prediction accuracy.
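For reference, each line in svm-light format consists of a label followed by sparse `index:value` pairs, optionally ending with a `#` comment. The Python sketch below parses one such line; the function name is hypothetical, and the sketch skips the optional `qid:` token rather than interpreting it.

```python
def parse_svmlight_line(line):
    """Parse one svm-light line into (label, {feature_index: value}).

    Returns None for blank or comment-only lines.
    """
    # Drop a trailing "# ..." comment, if any.
    line = line.split("#", 1)[0].strip()
    if not line:
        return None
    tokens = line.split()
    label = float(tokens[0])
    features = {}
    for token in tokens[1:]:
        index, value = token.split(":", 1)
        if index == "qid":
            continue  # ranking query id; not used in this sketch
        features[int(index)] = float(value)
    return label, features

# Usage with a made-up line.
print(parse_svmlight_line("1 1:0.5 3:2.0 # example"))
```

Features that are absent from a line are treated as zero, which keeps files small when most values are zero.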
1. Multiclass cancer diagnosis using tumor gene expression signatures.
Ramaswamy et al., Proc Natl Acad Sci U S A. 2001 Dec 18;98(26):15149-54.
2. A gene expression database for the molecular pharmacology of cancer.
Scherf et al., Nature Genetics 2000; 24: 236-244.
3. For more information about LIBSVM, please visit
the home page of Dr. Chih-Jen Lin.
THIS SOFTWARE IS PROVIDED BY CHANG BIOSCIENCE ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL CHANG BIOSCIENCE OR ITS SPONSORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Copyright © 2002-2004 Chang Bioscience, Inc. All rights