GeneStat 2.0 Online*
Statistical tools for biomedical scientists

Chi-Square Test
• Test of independence. • Test of goodness of fit.
Z-Test
Z-test and error probability for normal distributions.
t-Test
Tests for differences in means.
• Two sample • One sample • Paired 
Correlation Coefficient
• Pearson (linear) • Spearman Rank (nonlinear) 
Sample Size
Estimates required sample size for achieving desired confidence intervals. 
Map Distance
Calculates recombination frequency and corrected map distance.

BioDataFit 1.02
Linear and sigmoidal models for standard curves and Km, Vmax, and IC50.
Probability Distribution Calculator
Calculator for Poisson and binomial probability distributions.

Log-odds ratio
Converts between risk ratio, odds ratio, and log-odds ratio.

*Online versions may have limited functions and may not support certain
web browsers. Internet Explorer is preferred, and Mac OS X users may experience problems (if so, switch to the Classic environment). Standalone versions, which do not depend on a web browser, are recommended.

*Tell us what you think. Comments and suggestions are always appreciated.

Examples

Chi-square Test
Test of independence
A vaccine trial produced the following results:
          infected  uninfected
placebo   81        1427
vaccine   179       2824
The null hypothesis: there is no relationship between row and column frequencies,
i.e., receiving the vaccine or the placebo has no effect on the infected/uninfected frequencies.
Using the test of independence, we find p = 0.42 and conclude that the null hypothesis cannot be rejected,
i.e., the trial provides no evidence that the vaccine is effective.
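The test of independence above can be reproduced with a few lines of Python. This is a minimal sketch, assuming the plain Pearson chi-square statistic without a continuity correction (the text does not say which variant GeneStat uses); the function name is illustrative only.

```python
import math

def chi_square_independence_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns are independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom the upper-tail probability has a closed form.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Rows: placebo, vaccine; columns: infected, uninfected (data from the example).
chi2, p = chi_square_independence_2x2([[81, 1427], [179, 2824]])
print(f"chi2 = {chi2:.3f}, p = {p:.2f}")
```

Under these assumptions the p-value rounds to the 0.42 quoted in the example.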
Goodness of fit
The cross AaBb x aabb is used to determine whether the two loci are linked. The results
are shown below:
AaBb 310
aabb 315
Aabb 287
aaBb 288
The null hypothesis: the two loci are not linked, thus the ratio should be 1:1:1:1.
The expected frequencies are:
AaBb 300
aabb 300
Aabb 300
aaBb 300
Using the goodness-of-fit test, we find p = 0.55, so the null hypothesis cannot be rejected:
there is no evidence that the two loci are linked.
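The goodness-of-fit calculation can be sketched as follows. The helper names are hypothetical; the tail probability is computed from a standard recurrence for the incomplete gamma function, so that only the Python standard library is needed.

```python
import math

def chi_square_sf(x2, df):
    """Upper-tail probability of a chi-square distribution with integer df."""
    x = x2 / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-x)               # Q(1, x)
    else:
        a, q = 0.5, math.erfc(math.sqrt(x))    # Q(1/2, x)
    # Recurrence: Q(a + 1, x) = Q(a, x) + x**a * exp(-x) / Gamma(a + 1)
    while a < df / 2.0:
        q += x ** a * math.exp(-x) / math.gamma(a + 1.0)
        a += 1.0
    return q

def goodness_of_fit(observed, expected):
    """Chi-square statistic, degrees of freedom, and p-value."""
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return chi2, df, chi_square_sf(chi2, df)

observed = [310, 315, 287, 288]   # AaBb, aabb, Aabb, aaBb
expected = [300, 300, 300, 300]   # 1:1:1:1 ratio for unlinked loci
chi2, df, p = goodness_of_fit(observed, expected)
print(f"chi2 = {chi2:.3f}, df = {df}, p = {p:.2f}")
```

With four categories there are 3 degrees of freedom, and the p-value agrees with the 0.55 reported in the example.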
Z-test
For sample means, the z-score is defined as
z = (<X> - μ) / (σ / n^{1/2}).
For a one-sample z-test of a single value against a known population mean and standard deviation, use n = 1.
With the z-test calculator, users can easily calculate the z-score from the p-value or vice versa.
The population standard deviation σ may be replaced with the sample
standard deviation s if the sample size n is sufficiently large.
The national average annual incidence rate of ALS is 2.8 per 100,000 population
over 15 years of age (95% CI 2.4-3.1). The eastern region has a higher incidence rate
of 4.4 per 100,000 person-years. A one-tailed z-test can be used to determine whether
the eastern region has a significantly higher ALS incidence rate.
H_{0} The null hypothesis: the annual incidence difference μ between the national (μ_{n})
and the eastern region (μ_{e}) is 0, i.e., μ = μ_{n} - μ_{e} = 0.
H_{a} The alternative hypothesis: μ = μ_{n} - μ_{e} < 0.
We find z = (y - μ)/σ = y/σ = (y_{n} - y_{e})/(σ_{n} + σ_{e})
~ (y_{n} - y_{e})/(2 * σ_{n}) ~ (2.8 - 4.4)/0.7 ~ -2.3. Thus
p = 0.01. Here σ_{e} is taken to be equal to σ_{n}, and the value 0.7 used for 2 * σ_{n} matches the width of the national 95% CI (3.1 - 2.4).
The null hypothesis should be rejected in favor of the alternative hypothesis, i.e.,
the higher ALS incidence in the eastern region is statistically significant.
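The arithmetic of the ALS example, and the conversion between z-score and p-value, can be checked with Python's standard library. This is only a sketch: the denominator 0.7 is taken directly from the worked example rather than derived, and the variable names are illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist()   # mean 0, standard deviation 1

national_rate = 2.8    # per 100,000
eastern_rate = 4.4     # per 100,000 person-years
denominator = 0.7      # value used for sigma_n + sigma_e in the example

z = (national_rate - eastern_rate) / denominator
# One-tailed p-value for H_a: mu < 0 is the lower-tail probability P(Z <= z).
p_one_tailed = standard_normal.cdf(z)

# Going the other way: recover the z-score from a p-value.
z_from_p = standard_normal.inv_cdf(p_one_tailed)

print(f"z = {z:.2f}, one-tailed p = {p_one_tailed:.3f}")
```

The result rounds to z of about -2.3 and p of about 0.01, as stated in the example.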
t-test
The t-test is frequently used to test differences in means for two data sets.
Independent data set
A clinical trial testing the effect of a cholesterol-lowering drug gives
the following results:
placebo  drug
200      205
215      220
225      220
230      225
210      220
210      205
200      205
220      195
         200
         195
         190
H_{0} The null hypothesis: the mean cholesterol level for the drug-treated group = mean cholesterol level for
the placebo group.
H_{a} The alternative hypothesis:
the mean cholesterol level for the drug-treated group < mean cholesterol level for
the placebo group.
Using a one-tailed t-test for independent samples,
we find p = 0.12. The null hypothesis cannot be rejected in favor of the alternative hypothesis, i.e., the data do not show that the drug is effective.
Dependent data set (paired)
A clinical trial testing the effect of a cholesterol-lowering drug gives
the following results:
Before treatment  After treatment
200 205
215 220
225 220
230 225
210 220
210 205
200 205
220 195
195 200
205 195
200 190
H_{0} The null hypothesis: the mean cholesterol level is the same after drug treatment.
H_{a} The alternative hypothesis: the mean cholesterol level is lower after drug treatment.
Using a one-tailed t-test for dependent (paired) samples,
we find p = 0.20. The null hypothesis cannot be rejected in favor of the alternative hypothesis, i.e., the data do not show that the drug is effective.
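Both t-test examples can be reproduced with the sketch below. It rests on several assumptions not stated in the text: the independent test uses the pooled-variance (equal variance) form, the three unpaired values in the independent table belong to the drug group, and the t distribution is evaluated by numerical integration so that only the standard library is needed. All function names are hypothetical.

```python
import math
from statistics import mean, variance

def student_t_cdf(t, df, steps=2000):
    """CDF of Student's t distribution, by Simpson integration of the density."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    upper = abs(t)
    h = upper / steps                      # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3                   # integral of the density from 0 to |t|
    return 0.5 + area if t >= 0 else 0.5 - area

def independent_t(sample_a, sample_b):
    """Pooled-variance t statistic for mean(a) - mean(b), with degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(sample_a) + (nb - 1) * variance(sample_b)) / df
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, df

def paired_t(before, after):
    """t statistic for the mean paired difference (after - before)."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    return mean(diffs) / math.sqrt(variance(diffs) / n), n - 1

placebo = [200, 215, 225, 230, 210, 210, 200, 220]
drug = [205, 220, 220, 225, 220, 205, 205, 195, 200, 195, 190]
before = [200, 215, 225, 230, 210, 210, 200, 220, 195, 205, 200]
after = [205, 220, 220, 225, 220, 205, 205, 195, 200, 195, 190]

t_ind, df_ind = independent_t(drug, placebo)
p_ind = student_t_cdf(t_ind, df_ind)       # one-tailed, H_a: drug < placebo
t_pair, df_pair = paired_t(before, after)
p_pair = student_t_cdf(t_pair, df_pair)    # one-tailed, H_a: after < before

print(f"independent: t = {t_ind:.2f}, df = {df_ind}, p = {p_ind:.2f}")
print(f"paired:      t = {t_pair:.2f}, df = {df_pair}, p = {p_pair:.2f}")
```

Under these assumptions the one-tailed p-values round to the 0.12 and 0.20 quoted in the two examples.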
Statistical significance and biological significance
A small p-value is considered statistically significant, but the p-value is not a measure of biological
significance. Suppose we have found that an ant's weight is < 0.1 gram with absolute certainty (p = 0)
and a whale's weight is > 100 tons with a p-value of 0.1. Whales are still heavier than ants no matter what the p-values are!


Correlation coefficient
Pearson correlation
r = Σ z_{x} z_{y} / N
is frequently used to test whether two variables have a linear relationship. Here
z_{x} = (X - μ_{x})/σ_{x},
z_{y} = (Y - μ_{y})/σ_{y}, and N is the sample size.
Spearman rank correlation is an alternative to Pearson correlation when the relationship is
monotonic but not linear. For example, the data shown below have a perfect Spearman rank correlation (r = -1, p < 0.05)
but an insignificant Pearson correlation (95% confidence interval for r: -0.85 to 0.13, p > 0.05).
X   Y
0   0.415616974
5   0.006357108
10  5.12865E-06
15  1.99272E-07
20  1.72342E-09
25  1.37486E-11
30  6.19116E-14
35  5.9692E-16
40  4.14682E-18
45  1.4239E-20
50  9.20835E-23
Note: the Pearson correlation becomes significant if Y is expressed in the log scale.
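The contrast between the two coefficients can be checked directly. The sketch below implements r = Σ z_{x} z_{y} / N with population standard deviations, computes Spearman's coefficient as the Pearson correlation of the ranks, and adds a Fisher z-transformation interval for r. The interval method is an assumption (the text does not say how its range for r was obtained), and the Y exponents are taken to be negative.

```python
import math
from statistics import NormalDist, mean, pstdev

def pearson_r(xs, ys):
    """r = sum(z_x * z_y) / N, using population standard deviations."""
    mx, my, sx, sy = mean(xs), mean(ys), pstdev(xs), pstdev(ys)
    return sum((x - mx) / sx * (y - my) / sy for x, y in zip(xs, ys)) / len(xs)

def ranks(values):
    """Ranks 1..N; assumes no tied values, which holds for this data set."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman_r(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson_r(ranks(xs), ranks(ys))

def fisher_ci(r, n, level=0.95):
    """Approximate confidence interval for r via the Fisher z-transformation."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z_crit / math.sqrt(n - 3)
    z = math.atanh(r)
    return math.tanh(z - half_width), math.tanh(z + half_width)

x = [0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50]
y = [0.415616974, 0.006357108, 5.12865e-06, 1.99272e-07, 1.72342e-09,
     1.37486e-11, 6.19116e-14, 5.9692e-16, 4.14682e-18, 1.4239e-20,
     9.20835e-23]

r_pearson = pearson_r(x, y)
ci_low, ci_high = fisher_ci(r_pearson, len(x))
r_spearman = spearman_r(x, y)
r_log = pearson_r(x, [math.log10(v) for v in y])   # Pearson r with Y in log scale

print(f"Pearson r = {r_pearson:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
print(f"Spearman r = {r_spearman:.2f}, Pearson r on log10(Y) = {r_log:.3f}")
```

Because the interval for the raw Pearson coefficient spans zero, the linear correlation is not significant, while the rank correlation and the log-scale correlation are both strong.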
Estimate Sample size
Sample size for two samples
A pilot study found that the dissociation
constants (Kd) for ligands A and B are 9.5 +/- 1.2 nM and 5.7 +/- 1.5 nM. How many samples do we
need to show Kd_{A} - Kd_{B} > 3 nM at the 95% confidence level? We may
use the estimator for testing μ_{1} - μ_{2} = Δ
(one-sided, independent). Reasonable estimates of the parameters are α = 0.05 (95% confidence level),
β = 0.1, Δ = 3, σ = 1.5. The required sample size is 5, i.e.,
the binding assays should be repeated at least 5 times.
For paired samples, the differences should be calculated first and then treated as a one-sample problem, i.e.,
testing for μ (difference) = Δ.
Sample size for one sample
In a microarray experiment, we would like to find out
how many replicate arrays are needed to reliably claim that a gene's transcription is at least
2-fold up or down compared to the control. Suppose we are using cDNA arrays and the data are in the log scale (log_{2} R/G).
Using the estimator for testing μ = Δ (two-sided) and
the following parameters: α = 0.01 (99% confidence level),
β = 0.1, Δ = log_{2}2 = 1, σ = 0.4 (about 1.5-fold up/down on the natural-log scale, since ln 1.5 ~ 0.4; note that log_{2}1.5 ~ 0.58),
we find the required sample size is 3.
Sample size for confidence interval
A researcher would like to determine a mutant fruit fly's lifetime to an accuracy of +/- 5 days.
On average, flies live for 70 +/- 10 days. Using the sample size estimator for confidence intervals,
the researcher finds that the lifetime must be measured for at least 16 individual fruit flies (α = 0.05, E = 5, σ = 10).
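The three sample sizes above (5, 3, and 16) can be reproduced with the usual normal-approximation formulas. The text does not state which formulas the estimator uses, so the functions below are an assumption that happens to match the quoted results; the function names are illustrative.

```python
import math
from statistics import NormalDist

def z_upper(tail_probability):
    """Standard normal quantile with the given upper-tail probability."""
    return NormalDist().inv_cdf(1.0 - tail_probability)

def n_two_sample(alpha, beta, delta, sigma):
    """Per-group n for a one-sided, independent two-sample test of mu_1 - mu_2 = delta."""
    return math.ceil(2 * ((z_upper(alpha) + z_upper(beta)) * sigma / delta) ** 2)

def n_one_sample(alpha, beta, delta, sigma):
    """n for a two-sided, one-sample test of mu = delta."""
    return math.ceil(((z_upper(alpha / 2) + z_upper(beta)) * sigma / delta) ** 2)

def n_confidence_interval(alpha, margin, sigma):
    """n so that the 100(1 - alpha)% confidence interval is mean +/- margin."""
    return math.ceil((z_upper(alpha / 2) * sigma / margin) ** 2)

# Parameters as given in the three examples.
n_binding = n_two_sample(alpha=0.05, beta=0.1, delta=3, sigma=1.5)
n_arrays = n_one_sample(alpha=0.01, beta=0.1, delta=1, sigma=0.4)
n_flies = n_confidence_interval(alpha=0.05, margin=5, sigma=10)

print(n_binding, n_arrays, n_flies)
```

Each result is rounded up to the next whole number, since a fractional sample cannot be collected.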
Sample size for Pearson correlation
A researcher has found that gene A's activity is proportional to gene B's activity (y_{a} ~ k y_{b}). To quantify
the effect, she would like to determine the ratio k to a relative accuracy of +/- 20%. Note that k is proportional to the Pearson correlation r,
and the relative error in k is equivalent to the absolute error E in r. Using the sample size estimator for Pearson correlation,
she found that at the 95% confidence level (α = 0.05, E = 0.2) the required sample size is 68.
Map distance
Recombination can be used to determine the map distance between two loci. For small distances,
the recombination frequency (RF) is proportional to the map distance. For large distances,
multiple crossovers must be taken into account:
RF = (1 - e^{-2 * map distance}) / 2
with the map distance expressed in Morgans (1 Morgan = 100 map units).
For example, the map distance is 40 map units (m.u.) for RF = 27.5%.
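The relation between recombination frequency and corrected map distance (Haldane's mapping function) is easy to invert. The sketch below assumes the distance in the exponent is measured in Morgans, which is consistent with the 40 m.u. example; the function names are illustrative.

```python
import math

def recombination_frequency(map_units):
    """RF = (1 - exp(-2 d)) / 2, with d in Morgans (100 map units = 1 Morgan)."""
    d = map_units / 100.0
    return (1.0 - math.exp(-2.0 * d)) / 2.0

def map_distance(rf):
    """Corrected map distance in map units for a recombination frequency 0 <= RF < 0.5."""
    if not 0.0 <= rf < 0.5:
        raise ValueError("RF must be in the range [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * rf)

distance = map_distance(0.275)
print(f"RF = 27.5% corresponds to about {distance:.1f} map units")
```

For small RF the corrected distance stays close to 100 * RF, and it grows without bound as RF approaches 50%.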
BioDataFit
BioDataFit can be used to model dose-response, ligand-binding, enzyme kinetics, and growth inhibition data. Emphasis is given to the four-parameter model, or sigmoidal model,
which is frequently used to calculate EC50 (IC50, DC50, or GI50) values in dose-response experiments such as drug screening and inhibition assays.
It can also be used to calculate the Michaelis-Menten constant Km and the maximum reaction rate Vmax, and to model a standard curve.
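For orientation, the two model families mentioned above are commonly written as shown in the sketch below. These are textbook forms, not necessarily the exact parameterization BioDataFit fits; the parameter names (`bottom`, `top`, `ec50`, `hill`) and the numeric values are hypothetical.

```python
def four_parameter_logistic(x, bottom, top, ec50, hill):
    """Sigmoidal response going from `bottom` to `top`, half-way at x = ec50 (x > 0)."""
    return bottom + (top - bottom) / (1.0 + (ec50 / x) ** hill)

def michaelis_menten(substrate, vmax, km):
    """Reaction rate v = Vmax * [S] / (Km + [S])."""
    return vmax * substrate / (km + substrate)

# Hypothetical parameter values, chosen only to illustrate the two curves.
response_at_ec50 = four_parameter_logistic(x=10.0, bottom=0.0, top=100.0,
                                           ec50=10.0, hill=1.5)
rate_at_km = michaelis_menten(substrate=2.0, vmax=8.0, km=2.0)

print(response_at_ec50, rate_at_km)
```

At a concentration equal to EC50 the response is midway between `bottom` and `top`, and at a substrate concentration equal to Km the rate is half of Vmax, which is how the two constants are usually read off a fitted curve.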
Probability Distribution Calculator
Calculates the probability for the binomial distribution N! / (n! (N - n)!) p^{n} (1 - p)^{N - n} and the Poisson distribution
μ^{n} / n! e^{-μ}.
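Both probability formulas translate directly into code. The following is a minimal sketch with illustrative function names and made-up inputs.

```python
import math

def binomial_pmf(n, N, p):
    """P(n successes in N trials) = N! / (n! (N - n)!) * p^n * (1 - p)^(N - n)."""
    return math.comb(N, n) * p ** n * (1.0 - p) ** (N - n)

def poisson_pmf(n, mu):
    """P(n events) = mu^n / n! * exp(-mu)."""
    return mu ** n / math.factorial(n) * math.exp(-mu)

# Illustrative inputs: 2 heads in 4 fair coin tosses; 0 events when 2 are expected.
p_two_heads = binomial_pmf(n=2, N=4, p=0.5)
p_no_events = poisson_pmf(n=0, mu=2.0)

print(p_two_heads, p_no_events)
```

A quick sanity check for either function is that the probabilities over all possible values of n sum to 1.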
Log-odds ratio
As used here, the risk ratio is the number of subjects with the event in a group divided by
the total number in the group. The odds ratio is the number of subjects with the event in a group divided by the number of subjects without the event.
The log-odds ratio is the natural log of the odds ratio. A risk ratio of 0.2 is equivalent to an odds ratio of 0.25 and a log-odds ratio of -1.4.
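The conversions follow directly from the definitions above, as the short sketch below shows. The function names are illustrative.

```python
import math

def risk_to_odds(risk):
    """Events / non-events, from events / total."""
    return risk / (1.0 - risk)

def odds_to_risk(odds):
    """Events / total, from events / non-events."""
    return odds / (1.0 + odds)

def log_odds(odds):
    """Natural log of the odds."""
    return math.log(odds)

odds = risk_to_odds(0.2)     # 0.2 / 0.8
logit = log_odds(odds)       # ln(0.25)

print(f"odds = {odds:.2f}, log-odds = {logit:.1f}")
```

This reproduces the example: a value of 0.2 corresponds to odds of 0.25 and a log-odds of about -1.4.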
Glossary
Population - any entire collection of subjects we are interested in studying. Population mean μ and standard deviation σ.
Sample - a subset of a population. Sample mean <y> and standard deviation s.
Null hypothesis - the hypothesis being tested.
Alternative hypothesis - the hypothesis to be accepted if the null hypothesis is rejected.
p-value - the probability, assuming the null hypothesis is true, of obtaining a result at least as extreme as the one observed.
Type I error or false-positive rate α - the risk of rejecting a true null hypothesis.
Type II error or false-negative rate β - the risk of failing to reject a false null hypothesis.
One-sided - the rejection region is located in only one tail of the distribution. Example: H_{0}: μ = 20 and H_{a}: μ < 20.
Two-sided - the rejection region is located in both tails of the distribution. Example: H_{0}: μ = 20 and H_{a}: μ ≠ 20.
100 (1 - α)% confidence interval - an interval estimate of μ.
95% confidence interval: <y> +/- 1.96 s_{y}, where s_{y} is the standard error of the mean.
99% confidence interval: <y> +/- 2.58 s_{y}.



