GeneStat 2.0 Online*
Statistical tools for biomedical scientists
Test of independence. Test of goodness of fit.
Z-test and error probability for normal distributions.
t-Test: Tests for differences in means.
Two sample, one sample, paired
Pearson (linear), Spearman rank (non-linear)
Estimates required sample size for achieving desired confidence intervals.
Calculates recombination frequency and corrected map distance.
BioDataFit 1.02: Linear and sigmoidal models for standard curves and Km, Vmax, and IC50.
Probability Distribution Calculator
Calculator for Poisson and binomial probability distributions.
Log-odds ratio: Converts between risk ratio, odds ratio, and log-odds ratio.
*Online versions may have limited functions and may not support certain web browsers. Internet Explorer is preferred, and Mac OS X users may experience problems (if so, switch to the Classic environment). Stand-alone versions, which do not depend on web browsers, are always recommended.
*Tell us what you think. Comments and suggestions are always appreciated.
Test of independence
A vaccine trial produced the following results:
            infected   uninfected
placebo        81        1427
vaccine       179        2824
The null hypothesis: there is no relationship between row and column frequencies,
i.e., vaccine/placebo has no effect on the infected/uninfected frequencies.
Using the test of independence, we find p = 0.42 and conclude that the null hypothesis cannot be rejected,
i.e., there is no evidence that the vaccine is effective.
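The calculation above can be sketched in a few lines of Python. This is a minimal illustration, not GeneStat's own code: it assumes the Pearson chi-square statistic without continuity correction (which reproduces p = 0.42), and the function name is hypothetical.

```python
import math

def chi_square_independence_2x2(table):
    """Pearson chi-square test (no continuity correction) for a 2x2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, the upper tail equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Rows: placebo, vaccine. Columns are assumed to be infected, uninfected.
chi2, p = chi_square_independence_2x2([[81, 1427], [179, 2824]])
print(f"chi2 = {chi2:.2f}, p = {p:.2f}")
```

The closed-form tail used here holds only for one degree of freedom, i.e., for 2x2 tables.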
Goodness of fit
The cross AaBb × aabb is used to determine whether the two loci are linked. Under the null hypothesis
the two loci are not linked, thus the ratio should be 1:1:1:1. The observed and expected frequencies are:
genotype   observed   expected
AaBb         310        300
aabb         315        300
Aabb         287        300
aaBb         288        300
Using the goodness-of-fit test, we find p = 0.55. So the null hypothesis cannot be rejected:
there is no evidence that the two loci are linked.
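As a rough check of the goodness-of-fit result, the chi-square statistic and its p-value can be computed directly. The sketch below assumes four categories (3 degrees of freedom), for which the chi-square tail has a closed form; the function name is hypothetical and this is not necessarily how GeneStat computes it.

```python
import math

def chi_square_gof_four_categories(observed, expected):
    """Chi-square goodness of fit for exactly four categories (3 degrees of freedom)."""
    if len(observed) != 4 or len(expected) != 4:
        raise ValueError("the closed-form p-value below only holds for four categories")
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of the chi-square distribution with 3 degrees of freedom.
    p_value = (math.erfc(math.sqrt(chi2 / 2))
               + math.sqrt(2 * chi2 / math.pi) * math.exp(-chi2 / 2))
    return chi2, p_value

observed = [310, 315, 287, 288]   # AaBb, aabb, Aabb, aaBb
expected = [300, 300, 300, 300]   # 1:1:1:1 ratio for 1200 progeny
chi2, p = chi_square_gof_four_categories(observed, expected)
print(f"chi2 = {chi2:.2f}, p = {p:.2f}")
```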
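For sample means, the z-score is defined as
$$ z = \frac{\langle X \rangle - \mu}{\sigma / \sqrt{n}} $$
where $\mu$ is the population mean, $\sigma$ the population standard deviation, and $n$ the sample size.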
For a single-observation z-test with known population mean and standard deviation, use sample size n = 1.
With the z-test calculator, users can easily calculate the z-score from the p-value or vice versa.
The population standard deviation $\sigma$ may be replaced with the sample
standard deviation $s$ if the sample size n is sufficiently large.
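The conversion between z-score and p-value can be sketched with Python's standard library. The function names below are hypothetical and the numbers in the usage lines are invented for illustration; this is only meant to show the relationship the calculator exposes.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def z_score(sample_mean, mu, sigma, n=1):
    """z = (sample mean - mu) / (sigma / sqrt(n))."""
    return (sample_mean - mu) / (sigma / n ** 0.5)

def p_from_z(z, two_tailed=False):
    """Tail probability for a z-score (one tail by default)."""
    tail = STANDARD_NORMAL.cdf(-abs(z))
    return 2 * tail if two_tailed else tail

def z_from_p(p, two_tailed=False):
    """Magnitude of the z-score whose tail probability is p."""
    tail = p / 2 if two_tailed else p
    return -STANDARD_NORMAL.inv_cdf(tail)

# Hypothetical numbers: sample mean 105, mu = 100, sigma = 15, n = 9.
z = z_score(105, 100, 15, n=9)
print(z, p_from_z(z), z_from_p(0.05, two_tailed=True))
```

`NormalDist` requires Python 3.8 or later.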
The national average annual incidence rate of ALS is 2.8 per 100,000 population
over 15 years of age (95% CI 2.4-3.1). The eastern region has a higher incidence rate
of 4.4 per 100,000 person-years. A one-tailed z-test can be used to determine whether
the eastern region has a higher ALS incidence rate.
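H0 (null hypothesis): the difference $\mu$ between the national ($\mu_n$)
and eastern-region ($\mu_e$) annual incidence rates is 0, i.e., $\mu = \mu_n - \mu_e = 0$.
Ha (alternative hypothesis): $\mu = \mu_n - \mu_e < 0$.
With observed difference $y = y_n - y_e$ and standard error $s$, we find
$$ z = \frac{y - \mu}{s} = \frac{y}{s} = \frac{y_n - y_e}{s_n + s_e} \approx \frac{y_n - y_e}{2 s_n} \approx \frac{2.8 - 4.4}{0.7} \approx -2.3 $$
Here $s_n$ is taken as roughly the half-width of the national 95% CI, (3.1 - 2.4)/2 = 0.35, and $s_e$ is assumed to be about the same. Thus
p = 0.01.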
The null hypothesis should be rejected in favor of the alternative hypothesis, i.e.,
the higher ALS incidence in the eastern region is statistically significant.
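The arithmetic of this example can be reproduced as follows. The sketch simply reuses the numbers from the text, including the value 0.7 used there for the combined error; the variable names are illustrative.

```python
from statistics import NormalDist

national_rate = 2.8    # per 100,000, from the text
eastern_rate = 4.4     # per 100,000, from the text
combined_error = 0.7   # the value the text uses for s_n + s_e

z = (national_rate - eastern_rate) / combined_error
# Lower tail, because the alternative hypothesis is mu_n - mu_e < 0.
p_one_tailed = NormalDist().cdf(z)
print(f"z = {z:.1f}, p = {p_one_tailed:.2f}")
```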
The t-test is frequently used to test for differences in means between two data sets.
Independent data set
A clinical trial testing the effect of a cholesterol-lowering drug gives
the following results:
H0 (null hypothesis): the mean cholesterol level for the drug-treated group = the mean cholesterol level for
the placebo group.
Ha (alternative hypothesis):
the mean cholesterol level for the drug-treated group < the mean cholesterol level for
the placebo group.
Using a one-tailed t-test for independent samples,
we find p = 0.12. The null hypothesis cannot be rejected in favor of the alternative hypothesis, i.e., there is no significant evidence that the drug is effective.
Dependent data set (paired)
A clinical trial testing the effect of a cholesterol-lowering drug gives
the following results:
H0 (null hypothesis): the mean cholesterol level is the same after drug treatment.
Ha (alternative hypothesis): the mean cholesterol level is lower after drug treatment.
Using a one-tailed t-test for dependent (paired) samples,
we find p = 0.20. The null hypothesis cannot be rejected in favor of the alternative hypothesis, i.e., there is no significant evidence that the drug is effective.
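To make the difference between the two designs concrete, here is a minimal sketch of the t statistics for independent and paired samples. The cholesterol readings are hypothetical values invented for illustration (they are not the trial data), the function names are assumptions, and the independent-samples version assumes equal variances (pooled Student's t). Python's standard library has no t distribution, so the p-value would still have to come from a t table or a statistics package.

```python
import math
from statistics import mean, variance

def independent_t(sample_a, sample_b):
    """Student's t statistic with pooled variance; returns (t, degrees of freedom)."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / standard_error, df

def paired_t(before, after):
    """Paired t statistic on the per-subject differences (after - before)."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    standard_error = math.sqrt(variance(diffs) / n)
    return mean(diffs) / standard_error, n - 1

# Hypothetical cholesterol readings (mg/dL), invented for illustration only.
placebo = [215, 220, 205, 225]
drug = [200, 210, 190, 205]

t_ind, df_ind = independent_t(drug, placebo)   # two separate groups
t_pair, df_pair = paired_t(placebo, drug)      # same subjects before/after
print(t_ind, df_ind, t_pair, df_pair)
```

With the same numbers, the paired statistic is larger in magnitude because pairing removes the between-subject variation.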
Statistical significance and biological significance
A small p-value is considered statistically significant, but the p-value is not a measure of biological
significance. Suppose we have found that an ant's weight is < 0.1 gram with absolute certainty (p = 0)
and that a whale's weight is > 100 tons with a p-value of 0.1. Whales are still heavier than ants no matter what the p-values are!
The Pearson correlation coefficient
$$ r = \frac{\sum z_x z_y}{N} $$
is frequently used to test whether two variables have a linear relationship. Here
$z_x = (X - \mu_x)/\sigma_x$,
$z_y = (Y - \mu_y)/\sigma_y$, and N is the sample size.
Spearman rank correlation is an alternative to Pearson correlation when the relationship is
monotonic but not linear. For example, the data shown below has a perfect Spearman rank correlation (r = -1, p < 0.05)
but an insignificant Pearson correlation (r = -0.85 to 0.13, p > 0.05).
Note that the Pearson correlation becomes significant if Y is on a log scale.
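The contrast between the two coefficients can be illustrated with a small sketch. The data are hypothetical (an exponentially decaying Y, invented for illustration), the helper names are assumptions, and the ranking helper assumes there are no tied values.

```python
import math

def pearson_r(xs, ys):
    """r = sum(z_x * z_y) / N, using population standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return sum((x - mean_x) / sd_x * (y - mean_y) / sd_y
               for x, y in zip(xs, ys)) / n

def ranks(values):
    """Rank from 1 (smallest) upward; assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman_r(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson_r(ranks(xs), ranks(ys))

# Hypothetical data: Y falls tenfold for each unit increase in X.
x = [1, 2, 3, 4, 5]
y = [1000, 100, 10, 1, 0.1]

r_pearson = pearson_r(x, y)
r_spearman = spearman_r(x, y)
r_pearson_log = pearson_r(x, [math.log10(v) for v in y])
print(r_pearson, r_spearman, r_pearson_log)
```

On these values the rank correlation is -1 while the Pearson coefficient on the raw data is noticeably weaker; taking the log of Y makes the relationship linear and the Pearson coefficient -1 as well.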
Estimate Sample size
Sample size for two samples
A pilot study found that the dissociation
constants (Kd) for ligands A and B are 9.5 +/- 1.2 nM and 5.7 +/- 1.5 nM. How many samples do we
need to show KdA - KdB > 3 nM at the 95% confidence level? We may
use the estimator for testing $\mu_1 - \mu_2 = \Delta$
(one-sided, independent). Reasonable estimates of the parameters are $\alpha$ = 0.05 (95% confidence level),
$\beta$ = 0.1, $\Delta$ = 3, $\sigma$ = 1.5. The required sample size is 5, i.e.,
the binding assays should be repeated at least 5 times.
For paired samples, the differences should be calculated first and then treated as a one-sample problem, i.e.,
testing for $\mu$ (difference) = $\Delta$.
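One common normal-approximation formula for this case is n = 2 ((z_alpha + z_beta) sigma / Delta)^2 per group. Whether GeneStat uses exactly this estimator is an assumption, but the sketch below reproduces the sample size of 5 quoted above; the function name is hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_two_sample(alpha, beta, delta, sigma):
    """Per-group n for a one-sided test of mu1 - mu2 = delta (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # one-sided critical value
    z_beta = NormalDist().inv_cdf(1 - beta)    # corresponds to power 1 - beta
    return math.ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)

# Parameters from the Kd example: alpha = 0.05, beta = 0.1, Delta = 3, sigma = 1.5.
n_per_group = sample_size_two_sample(0.05, 0.1, 3, 1.5)
print(n_per_group)
```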
Sample size for one sample
In a microarray experiment, we would like to find out
how many replicate arrays are needed to reliably claim that a gene's transcription is at least
2-fold up or down compared to the control. Suppose we are using cDNA arrays and the data are in log scale ($\log_2$ R/G).
Using the estimator for testing $\mu = \Delta$ (two-sided) and
the following parameters: $\alpha$ = 0.01 (99% confidence level),
$\beta$ = 0.1, $\Delta$ = $\log_2 2$ = 1, $\sigma$ = 0.4 (about 1.3-fold up/down, since $2^{0.4} \approx 1.3$),
we find the required sample size is 3.
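For the one-sample, two-sided case a common normal-approximation formula is n = ((z_{alpha/2} + z_beta) sigma / Delta)^2. As before, it is an assumption that GeneStat uses this form, but with the parameters above it gives the quoted sample size of 3. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_one_sample(alpha, beta, delta, sigma):
    """n for a two-sided one-sample test of mu = delta (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(1 - beta)
    return math.ceil(((z_alpha + z_beta) * sigma / delta) ** 2)

# Parameters from the microarray example: alpha = 0.01, beta = 0.1, Delta = 1, sigma = 0.4.
n_arrays = sample_size_one_sample(0.01, 0.1, 1, 0.4)
print(n_arrays)
```

The result is sensitive to the assumed spread: doubling sigma roughly quadruples the required number of arrays.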
Sample size for confidence interval
A researcher would like to determine a mutant fruit fly's lifetime to an accuracy of +/- 5 days.
On average, flies live for 70 +/- 10 days. Using the sample size estimator for confidence intervals,
he finds that he needs to measure the lifetime of at least 16 individual fruit flies ($\alpha$ = 0.05, E = 5, $\sigma$ = 10).
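The usual formula for this estimate is n = (z_{alpha/2} sigma / E)^2, rounded up. The sketch below assumes that formula (the function name is hypothetical) and reproduces the 16 flies quoted above.

```python
import math
from statistics import NormalDist

def sample_size_for_ci(alpha, margin, sigma):
    """n such that the 100(1 - alpha)% CI half-width is at most `margin`."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil((z * sigma / margin) ** 2)

# Fruit fly example: alpha = 0.05, E = 5 days, sigma = 10 days.
n_flies = sample_size_for_ci(0.05, 5, 10)
print(n_flies)
```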
Sample size for Pearson correlation
A researcher has found that gene A's activity is proportional to gene B's activity ($y_a \approx k\,y_b$). To quantify
the effect, she would like to determine the ratio k to a relative accuracy of +/- 20%. Note that k is proportional to the Pearson correlation r,
so the relative error in k is equivalent to the absolute error E in r (here E = 0.2). She used the sample size estimator for Pearson correlation
and found that, at the 95% confidence level ($\alpha$ = 0.05), the required sample size is 68.
Recombination can be used to determine the map distance between two loci. For small distances,
the recombination frequency (RF) is proportional to the map distance. For large distances,
multiple crossovers must be taken into account:
$$ \mathrm{RF} = \frac{1 - e^{-2d}}{2} $$
where $d$ is the map distance in Morgans (1 Morgan = 100 map units).
For example, a map distance of 40 map units (m.u.) corresponds to RF = 27.5%.
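The formula above is Haldane's mapping function. A minimal sketch of it and its inverse follows, with map distance expressed in map units (100 m.u. = 1 Morgan); the function names are hypothetical.

```python
import math

def recombination_frequency(map_units):
    """RF = (1 - exp(-2d)) / 2 with d in Morgans (100 map units = 1 Morgan)."""
    morgans = map_units / 100
    return (1 - math.exp(-2 * morgans)) / 2

def corrected_map_distance(rf):
    """Inverse of the function above: map units from a recombination frequency."""
    if not 0 <= rf < 0.5:
        raise ValueError("recombination frequency must be in [0, 0.5)")
    return -50 * math.log(1 - 2 * rf)

print(f"{recombination_frequency(40):.3f}")    # RF for 40 m.u.
print(f"{corrected_map_distance(0.275):.1f}")  # map units for RF = 27.5%
```

For small distances the correction is negligible (1 m.u. gives an RF of about 1%), while RF can never exceed 50% however far apart the loci are.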
BioDataFit can be used to model dose-response, ligand-binding, enzyme kinetics, and growth inhibition data. Emphasis is given to the four-parameter (sigmoidal) model,
which is frequently used to calculate EC50 (IC50, DC50, or GI50) values in dose-response experiments such as drug screening and inhibition assays.
It can also be used to calculate the Michaelis-Menten constant Km and the maximum reaction rate Vmax, and to model a standard curve.
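For orientation, the two model families mentioned above can be written as plain functions. This sketch uses one common parameterization of the four-parameter logistic model; the parameter names (bottom, top, ec50, hill) are assumptions and may not match BioDataFit's own, and no curve fitting is performed here.

```python
def four_parameter_logistic(x, bottom, top, ec50, hill):
    """Sigmoidal dose-response: falls from `top` to `bottom` as x increases (hill > 0)."""
    return bottom + (top - bottom) / (1 + (x / ec50) ** hill)

def michaelis_menten(substrate, vmax, km):
    """Reaction rate v = Vmax * [S] / (Km + [S])."""
    return vmax * substrate / (km + substrate)

# At x = EC50 the response is halfway between top and bottom.
print(four_parameter_logistic(10, bottom=0, top=100, ec50=10, hill=1))
# At [S] = Km the rate is half of Vmax.
print(michaelis_menten(5, vmax=2, km=5))
```

Fitting either model to data means searching for the parameter values that minimize the squared differences between these functions and the measurements.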
Probability Distribution Calculator
Calculates the probability for the binomial distribution
$$ P(n) = \frac{N!}{n!\,(N - n)!}\, p^{n} (1 - p)^{N - n} $$
and the Poisson distribution
$$ P(n) = \frac{\mu^{n}}{n!}\, e^{-\mu} $$
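Both probabilities are straightforward to evaluate with Python's standard library. The function names are hypothetical, and the numbers in the usage lines are invented for illustration.

```python
import math

def binomial_pmf(n, N, p):
    """Probability of exactly n successes in N trials with success probability p."""
    return math.comb(N, n) * p ** n * (1 - p) ** (N - n)

def poisson_pmf(n, mu):
    """Probability of exactly n events when the expected number of events is mu."""
    return mu ** n * math.exp(-mu) / math.factorial(n)

print(binomial_pmf(2, 4, 0.5))  # 2 successes in 4 fair trials
print(poisson_pmf(0, 2))        # no events when 2 are expected
```

`math.comb` requires Python 3.8 or later.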
Risk (called "risk ratio" in this tool) is the ratio of the number of subjects with the event in a group to
the total number in the group. Odds (called "odds ratio" here) is the ratio of the number of subjects with the event in a group to the number of subjects without the event.
Log-odds is the natural log of the odds. A risk of 0.2 is equivalent to odds of 0.25 and log-odds of -1.4.
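The conversions follow directly from the definitions: with risk p, the odds are p / (1 - p). A minimal sketch, with hypothetical function names:

```python
import math

def risk_to_odds(risk):
    """Odds = risk / (1 - risk); risk must be below 1."""
    return risk / (1 - risk)

def odds_to_risk(odds):
    """Risk = odds / (1 + odds)."""
    return odds / (1 + odds)

def risk_to_log_odds(risk):
    """Natural log of the odds; risk must be strictly between 0 and 1."""
    return math.log(risk_to_odds(risk))

print(risk_to_odds(0.2))                 # 0.25
print(f"{risk_to_log_odds(0.2):.1f}")    # -1.4
```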
Population --- any entire collection of subjects we are interested in studying. Population mean $\mu$ and standard deviation $\sigma$.
Sample --- a subset of a population. Sample mean $\langle y \rangle$ and standard deviation $s$.
Null hypothesis --- the hypothesis being tested.
Alternative hypothesis --- the hypothesis to be accepted if the null hypothesis is rejected.
p-value --- the probability, assuming the null hypothesis is true, of obtaining a result at least as extreme as the one observed.
Type I error or false-positive rate $\alpha$ --- the risk of rejecting a true null hypothesis.
Type II error or false-negative rate $\beta$ --- the risk of failing to reject a false null hypothesis.
One-sided --- the rejection region is located in only one tail of the distribution. Example: H0: $\mu$ = 20 and Ha: $\mu$ < 20.
Two-sided --- the rejection region is located in both tails of the distribution. Example: H0: $\mu$ = 20 and Ha: $\mu$ ≠ 20.
100 (1 - $\alpha$)% confidence interval --- an interval estimate of $\mu$, where $s_y$ below is the standard error of the sample mean.
95% confidence interval: $\langle y \rangle$ +/- 1.96 $s_y$.
99% confidence interval: $\langle y \rangle$ +/- 2.58 $s_y$.