GeneStat 2.0 Online*
Statistical tools for biomedical scientists

Chi-Square Test: test of independence; test of goodness of fit.

Z-Test: z-test and error probability for normal distributions.

t-Test: tests for differences in means (two sample, one sample, paired).

Correlation Coefficient: Pearson (linear); Spearman rank (non-linear).

Sample Size: estimates the sample size required to achieve desired confidence intervals.

Map Distance: calculates recombination frequency and corrected map distance.

BioDataFit 1.02: linear and sigmoidal models for standard curves and Km, Vmax, and IC50.

Probability Distribution Calculator: calculator for Poisson and binomial probability distributions.

Log-odds ratio: converts between risk ratio, odds ratio, and log-odds ratio.

*Online versions may have limited functions and may not support certain web browsers. Internet Explorer is preferred, and Mac OS X users may experience problems (if so, switch to the Classic environment). Stand-alone versions, which do not depend on web browsers, are always recommended.

*Tell us what you think. Comments and suggestions are always appreciated.

Examples

Chi-square Test

Test of independence: A vaccine trial produced the following results:

        infected   uninfected
placebo   81          1427
vaccine  179          2824

The null hypothesis: there is no relationship between row and column frequencies, i.e., vaccine or placebo treatment has no effect on the infected/uninfected frequencies. Using the test of independence, we find p = 0.42 and conclude that the null hypothesis cannot be rejected, i.e., the data provide no evidence that the vaccine is effective.
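
The calculation behind the vaccine example can be sketched in a few lines of Python. This is a minimal illustration, not GeneStat's own code: it assumes the plain Pearson chi-square statistic without a continuity correction, and the function name is hypothetical. For a 2x2 table (1 degree of freedom) the p-value has a closed form in terms of the complementary error function.

```python
import math

def chi2_independence_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table.

    Assumption: no continuity (Yates) correction is applied.
    Returns (chi-square statistic, p-value) with 1 degree of freedom.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # For 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Rows: placebo, vaccine; columns: infected, uninfected.
chi2, p_value = chi2_independence_2x2([[81, 1427], [179, 2824]])
print(f"chi2 = {chi2:.3f}, p = {p_value:.2f}")
```

Under these assumptions the p-value rounds to the 0.42 quoted above.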
Goodness of fit: The cross AaBb × aabb is used to determine whether the two loci are linked. The null hypothesis is that the two loci are not linked, so the four progeny genotypes should occur in a 1:1:1:1 ratio. The observed and expected frequencies are:

genotype   observed   expected
AaBb       310        300
aabb       315        300
Aabb       287        300
aaBb       288        300

Using the goodness of fit test, we find p = 0.55. So the null hypothesis cannot be rejected: the data provide no evidence that the two loci are linked.
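
The goodness of fit example can be reproduced with the short sketch below. It is an illustration under stated assumptions rather than the calculator's implementation: the function name is hypothetical, and the p-value uses the closed-form tail probability that holds only for 3 degrees of freedom (four categories, no estimated parameters).

```python
import math

def chi2_goodness_of_fit_4(observed, expected):
    """Chi-square goodness of fit for exactly four categories (3 degrees of freedom)."""
    if len(observed) != 4 or len(expected) != 4:
        raise ValueError("this sketch handles exactly four categories")
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Closed form for 3 degrees of freedom:
    # P(chi-square > x) = erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2)
    p_value = math.erfc(math.sqrt(chi2 / 2)) + math.sqrt(2 * chi2 / math.pi) * math.exp(-chi2 / 2)
    return chi2, p_value

observed = [310, 315, 287, 288]
expected = [300, 300, 300, 300]
chi2, p_value = chi2_goodness_of_fit_4(observed, expected)
print(f"chi2 = {chi2:.3f}, p = {p_value:.2f}")
```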

Z-test

For sample means, the z-score is defined as z = ( <X> - μ ) / ( σ / n^1/2 ), where <X> is the sample mean, μ the population mean, σ the population standard deviation, and n the sample size. For a z-test of a single observation against a known population mean and standard deviation, n = 1. With the z-test calculator, users can easily calculate the z-score from the p-value or vice versa. The population standard deviation σ may be replaced with the sample standard deviation s if the sample size n is sufficiently large.
The national average annual incidence rate of ALS is 2.8 per 100,000 population over 15 years of age (95% CI 2.4-3.1). The eastern region has a higher incidence rate of 4.4 per 100,000 person-years. A one-tailed z-test can be used to determine whether the eastern region has a higher ALS incidence rate.
H₀ The null hypothesis: the annual incidence difference μ between the national (μ_n) and the eastern region (μ_e) is 0, i.e., μ = μ_n - μ_e = 0.
H_a The alternative hypothesis: μ = μ_n - μ_e < 0.
We find z = (y - μ) / σ = y / σ = (y_n - y_e) / (σ_n + σ_e) ~ (y_n - y_e) / (2 * σ_n) ~ (2.8 - 4.4) / 0.7 ~ -2.3, where y is the observed difference, σ_e is assumed to be about equal to σ_n, and σ_n ~ 0.35 is taken as half the width of the national 95% CI, (3.1 - 2.4) / 2. Thus p = 0.01. The null hypothesis should be rejected in favor of the alternative hypothesis, i.e., the higher ALS incidence in the eastern region is statistically significant.
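
As a quick check of the arithmetic in the ALS example, the sketch below converts the z-score to a one-tailed p-value with Python's standard library. It simply reuses the text's rough denominator of 0.7; the variable names are illustrative and not part of GeneStat.

```python
from statistics import NormalDist

national_rate = 2.8   # per 100,000, from the example
eastern_rate = 4.4    # per 100,000, from the example
combined_spread = 0.7 # denominator used in the example (2 * 0.35)

z = (national_rate - eastern_rate) / combined_spread
# One-tailed (lower tail) p-value: P(Z < z) under the standard normal distribution.
p_value = NormalDist().cdf(z)
print(f"z = {z:.2f}, one-tailed p = {p_value:.3f}")

# The reverse direction (p-value to z-score) uses the inverse CDF.
z_back = NormalDist().inv_cdf(p_value)
```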

t-test

The t-test is frequently used to test for differences in means between two datasets.
Independent data set: A clinical trial testing the effect of a cholesterol-lowering drug gives the following results:

placebo   drug
200        205
215        220
225        220
230        225
210        220
210        205
200        205
220        195
           200
           195
           190

H₀ The null hypothesis: the mean cholesterol level for the drug-treated group = mean cholesterol level for the placebo group.
H_a The alternative hypothesis: the mean cholesterol level for the drug-treated group < mean cholesterol level for the placebo group.
Using a one-tailed t-test for independent samples, we find p = 0.12. The null hypothesis cannot be rejected in favor of the alternative hypothesis, i.e., the data provide no evidence that the drug is effective.
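
The independent-samples result can be reproduced with the following sketch. It assumes the classical pooled-variance (equal variance) form of the t-test, which the text does not state explicitly, and it obtains the tail probability by numerically integrating the t density with Simpson's rule so that only the standard library is needed. The helper name `t_upper_tail` is hypothetical.

```python
import math
import statistics

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for Student's t with df degrees of freedom, for t >= 0.

    Integrates the density from 0 to t with Simpson's rule (steps must be even).
    """
    norm = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    pdf = lambda x: norm * (1 + x * x / df) ** (-(df + 1) / 2)
    h = t / steps
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 - total * h / 3

placebo = [200, 215, 225, 230, 210, 210, 200, 220]
drug = [205, 220, 220, 225, 220, 205, 205, 195, 200, 195, 190]

n1, n2 = len(placebo), len(drug)
df = n1 + n2 - 2
# Pooled variance (assumes both groups share a common variance).
pooled_var = ((n1 - 1) * statistics.variance(placebo)
              + (n2 - 1) * statistics.variance(drug)) / df
t_stat = (statistics.mean(placebo) - statistics.mean(drug)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
p_value = t_upper_tail(t_stat, df)  # one-tailed: drug mean < placebo mean
print(f"t = {t_stat:.3f}, df = {df}, one-tailed p = {p_value:.2f}")
```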
Dependent data set (paired): A clinical trial testing the effect of a cholesterol-lowering drug gives the following results:

Before     After
treatment  treatment
200        205
215        220
225        220
230        225
210        220
210        205
200        205
220        195
195        200
205        195
200        190

H₀ The null hypothesis: the mean cholesterol level is the same after drug treatment.
H_a The alternative hypothesis: the mean cholesterol level is lower after drug treatment.
Using a one-tailed t-test for dependent (paired) samples, we find p = 0.20. The null hypothesis cannot be rejected in favor of the alternative hypothesis, i.e., the data provide no evidence that the drug is effective.
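
For the paired case, the test reduces to a one-sample t-test on the before-minus-after differences. The sketch below illustrates this with the data above; as before, the tail probability comes from a small Simpson's-rule helper (a hypothetical name, not a GeneStat function) rather than a statistics library.

```python
import math
import statistics

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for Student's t with df degrees of freedom, for t >= 0 (Simpson's rule)."""
    norm = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    pdf = lambda x: norm * (1 + x * x / df) ** (-(df + 1) / 2)
    h = t / steps
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 - total * h / 3

before = [200, 215, 225, 230, 210, 210, 200, 220, 195, 205, 200]
after = [205, 220, 220, 225, 220, 205, 205, 195, 200, 195, 190]

# Work on the per-subject differences; a positive mean difference means a decrease.
diffs = [b - a for b, a in zip(before, after)]
n = len(diffs)
df = n - 1
t_stat = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
p_value = t_upper_tail(t_stat, df)  # one-tailed: level is lower after treatment
print(f"t = {t_stat:.3f}, df = {df}, one-tailed p = {p_value:.2f}")
```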

Significance and Significance

A small p-value is considered statistically significant. But the p-value is not a measure of biological significance. Suppose we have found that an ant's weight is < 0.1 gram with absolute certainty (p = 0) and that a whale's weight is > 100 tons with a p-value of 0.1. Whales are still heavier than ants no matter what the p-values are!

Correlation coefficient

Pearson correlation r = Σ z_x z_y / N is frequently used to test whether two variables have a linear relationship. Here z_x = (X - μ_x)/σ_x, z_y = (Y - μ_y)/σ_y, and N is the sample size.
Spearman rank correlation is an alternative to Pearson correlation when the relationship is not linear. For example, the data shown below have a perfect Spearman rank correlation (r = -1, p < 0.05) but an insignificant Pearson correlation (95% confidence interval for r from -0.85 to 0.13, p > 0.05).

X	     Y
0	0.415616974
5	0.006357108
10	5.12865E-06
15	1.99272E-07
20	1.72342E-09
25	1.37486E-11
30	6.19116E-14
35	5.9692E-16
40	4.14682E-18
45	1.4239E-20
50	9.20835E-23

Note: the Pearson correlation becomes significant if Y is on a log scale.
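
The contrast between the two coefficients can be seen by computing them directly on the table above. The following is a minimal standard-library sketch; the function names are illustrative, and the rank helper assumes there are no tied values (true for this data set).

```python
import math

x = [0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50]
y = [0.415616974, 0.006357108, 5.12865e-06, 1.99272e-07, 1.72342e-09,
     1.37486e-11, 6.19116e-14, 5.9692e-16, 4.14682e-18, 1.4239e-20, 9.20835e-23]

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    s_ab = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    s_aa = sum((u - mean_a) ** 2 for u in a)
    s_bb = sum((v - mean_b) ** 2 for v in b)
    return s_ab / math.sqrt(s_aa * s_bb)

def ranks(values):
    """Rank 1 = smallest value. Assumes no ties."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))

r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
r_pearson_log = pearson(x, [math.log10(v) for v in y])
print(f"Pearson r = {r_pearson:.2f}, Spearman r = {r_spearman:.2f}, "
      f"Pearson r on log10(Y) = {r_pearson_log:.3f}")
```

On the raw values the Pearson coefficient is only moderate, while the rank correlation is exactly -1 because Y decreases monotonically with X; after a log transform the relationship is close to linear.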

Estimate Sample size

Sample size for two samples: A pilot study found that the dissociation constants (Kd) for ligands A and B are 9.5 +/- 1.2 nM and 5.7 +/- 1.5 nM. How many samples do we need to show Kd_A - Kd_B > 3 nM at the 95% confidence level? We may use the estimator for testing μ₁ - μ₂ = Δ (one-sided, independent). Reasonable estimates of the parameters are α = 0.05 (95% CI), β = 0.1, Δ = 3, σ = 1.5. The required sample size is 5, i.e., the binding assays should be repeated at least 5 times.
For paired two samples, the differences should be calculated first and then treated as a one-sample problem, i.e., testing for μ (difference) = Δ.
Sample size for one sample: In a microarray experiment, we would like to find out how many replicate arrays are needed to reliably claim that a gene's transcription is at least 2-fold up or down compared to the control. Suppose we are using cDNA arrays and the data are on the log scale (log₂R/G). Using the estimator for testing μ = Δ (two-sided) and the following parameters: α = 0.01 (99% CI), β = 0.1, Δ = log₂2 = 1, σ = 0.4 (roughly a 1.3-fold variation up or down; a 1.5-fold variation would correspond to log₂1.5 ≈ 0.58), we find the required sample size is 3.
Sample size for confidence interval: A researcher would like to determine a mutant fruit fly's lifetime to an accuracy of +/- 5 days. On average flies live for 70 +/- 10 days. Using the sample size estimator for confidence intervals, he finds that he needs to measure the lifetime of at least 16 individual fruit flies (α = 0.05, E = 5, σ = 10).
Sample size for Pearson correlation: A researcher has found that gene A's activity is proportional to gene B's activity (y_a ~ k y_b). To quantitate the effect she would like to determine the ratio to a relative accuracy of +/- 20%. Note that k is proportional to the Pearson correlation r, and the relative error in k is equivalent to the absolute error E in r. She used the sample size estimator for Pearson correlation and found that at 95% CI (α = 0.05), the required sample size is 68.
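
The two-sample, one-sample, and confidence-interval estimates above are consistent with the textbook normal-approximation formulas, sketched below. This is an assumption about the method: the page does not state which formulas GeneStat uses, and the function names are hypothetical. The Pearson correlation estimate is not covered here.

```python
import math
from statistics import NormalDist

def z_quantile(prob):
    """Standard normal quantile."""
    return NormalDist().inv_cdf(prob)

def n_two_sample_one_sided(alpha, beta, delta, sigma):
    """Per-group n for a one-sided test of a difference delta between two independent means."""
    return math.ceil(2 * ((z_quantile(1 - alpha) + z_quantile(1 - beta)) * sigma / delta) ** 2)

def n_one_sample_two_sided(alpha, beta, delta, sigma):
    """n for a two-sided one-sample test of a shift delta."""
    return math.ceil(((z_quantile(1 - alpha / 2) + z_quantile(1 - beta)) * sigma / delta) ** 2)

def n_confidence_interval(alpha, half_width, sigma):
    """n so that the 100(1 - alpha)% confidence interval has the given half-width."""
    return math.ceil((z_quantile(1 - alpha / 2) * sigma / half_width) ** 2)

n_binding = n_two_sample_one_sided(alpha=0.05, beta=0.1, delta=3, sigma=1.5)
n_arrays = n_one_sample_two_sided(alpha=0.01, beta=0.1, delta=1, sigma=0.4)
n_flies = n_confidence_interval(alpha=0.05, half_width=5, sigma=10)
print(n_binding, n_arrays, n_flies)
```

With the parameters quoted in the examples, these formulas give 5, 3, and 16 respectively. For very small n, a t-based correction would give slightly larger values.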

Map distance

Recombination can be used to determine the map distance between two loci. For small distances, the recombination frequency (RF) is proportional to the map distance. For large distances, multiple cross-overs must be taken into account:
RF = (1 - e^{-2 * map distance}) / 2
Here the map distance is expressed in Morgans (1 Morgan = 100 map units). For example, the map distance is 40 map units (m.u.) for RF = 27.5%.
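
The relation above (known as Haldane's mapping function) is easy to evaluate and invert. The sketch below assumes map distances are given in map units and converted to Morgans internally; the function names are illustrative.

```python
import math

def rf_from_distance(map_units):
    """Recombination frequency for a map distance in map units (100 m.u. = 1 Morgan)."""
    morgans = map_units / 100.0
    return (1 - math.exp(-2 * morgans)) / 2

def distance_from_rf(rf):
    """Corrected map distance in map units for a recombination frequency in [0, 0.5)."""
    if not 0 <= rf < 0.5:
        raise ValueError("recombination frequency must be in [0, 0.5)")
    return -50.0 * math.log(1 - 2 * rf)

print(f"RF at 40 m.u. = {rf_from_distance(40):.3f}")
print(f"distance at RF 0.275 = {distance_from_rf(0.275):.1f} m.u.")
```

For short distances the correction is negligible (RF is close to the distance in Morgans), and RF approaches 50% as the distance grows.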

BioDataFit

BioDataFit can be used to model dose-response, ligand-binding, enzyme kinetics, and growth inhibition. Emphasis is given to the four-parameter model or sigmoidal model, which is frequently used to calculate EC50 (IC50, DC50, or GI50) values in dose-response experiments such as drug screening and inhibition assays. It can also be used to calculate the Michaelis-Menten constant Km and maximum reaction rate Vmax, and to model a standard curve.
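
For orientation, the two model families mentioned above can be written as simple functions. The parametrization of the four-parameter model shown here is one common choice and is an assumption; the page does not give the exact equations BioDataFit fits, and the function and parameter names are hypothetical.

```python
def four_parameter_logistic(x, bottom, top, ec50, hill):
    """One common four-parameter sigmoidal model.

    The response falls from `top` toward `bottom` as x increases (for hill > 0)
    and equals the midpoint of the two plateaus at x = ec50.
    """
    return bottom + (top - bottom) / (1 + (x / ec50) ** hill)

def michaelis_menten(substrate, vmax, km):
    """Michaelis-Menten rate law: v = Vmax * [S] / (Km + [S])."""
    return vmax * substrate / (km + substrate)

# At x = EC50 the response is halfway between the plateaus.
midpoint = four_parameter_logistic(10.0, bottom=0.0, top=100.0, ec50=10.0, hill=1.5)
# At [S] = Km the rate is half of Vmax.
half_rate = michaelis_menten(5.0, vmax=80.0, km=5.0)
print(midpoint, half_rate)
```

Fitting these models to data requires a nonlinear least-squares routine, which is outside the scope of this sketch.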

Probability Distribution Calculator

Calculates probabilities for the binomial distribution, N! / (n! (N - n)!) pⁿ (1 - p)^{N - n}, and the Poisson distribution, mⁿ e^{-m} / n!.
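
Both probability formulas translate directly into Python's standard library, as in the sketch below. The function names are illustrative; n is the number of events, N the number of trials, p the per-trial probability, and m the Poisson mean.

```python
import math

def binomial_pmf(n, N, p):
    """Probability of exactly n successes in N trials with success probability p."""
    return math.comb(N, n) * p ** n * (1 - p) ** (N - n)

def poisson_pmf(n, m):
    """Probability of exactly n events when the expected number of events is m."""
    return m ** n * math.exp(-m) / math.factorial(n)

# Example: two heads in four fair coin tosses.
two_of_four = binomial_pmf(2, 4, 0.5)
# Example: no events when two are expected on average.
none_of_two = poisson_pmf(0, 2.0)
print(two_of_four, none_of_two)
```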

Log-odds ratio

As used here, the risk ratio is the ratio of the number of subjects with the event in a group to the total number in the group. The odds ratio is the ratio of the number of subjects with the event in a group to the number of subjects without the event. The log-odds ratio is the natural log of the odds ratio. A risk ratio of 0.2 is equivalent to an odds ratio of 0.25 and a log-odds ratio of -1.4.
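
The conversions described above amount to two short formulas, sketched here with illustrative function names. The risk value must lie strictly between 0 and 1 for the odds and log-odds to be defined.

```python
import math

def risk_to_odds(risk):
    """Events / non-events, given events / total."""
    return risk / (1 - risk)

def odds_to_risk(odds):
    """Events / total, given events / non-events."""
    return odds / (1 + odds)

def odds_to_log_odds(odds):
    """Natural log of the odds."""
    return math.log(odds)

odds = risk_to_odds(0.2)
log_odds = odds_to_log_odds(odds)
print(f"odds = {odds:.2f}, log-odds = {log_odds:.1f}")
```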

Glossary

Population --- any entire collection of subjects we are interested in studying. Population mean μ and standard deviation σ.
Sample --- a subset of a population. Sample mean <y> and standard deviation s.
Null hypothesis --- the hypothesis being tested.
Alternative hypothesis --- the hypothesis to be accepted if the null hypothesis is rejected.
p-value --- the probability, assuming the null hypothesis is true, of obtaining a result at least as extreme as the one observed.
Type I error or false-positive rate α --- the risk of rejecting a true null hypothesis.
Type II error or false-negative rate β --- the risk of failing to reject a false null hypothesis.
One-sided --- the rejection region is located in only one tail of the distribution. Example: H₀ μ = 20 and H_a μ < 20.
Two-sided --- the rejection region is located in both tails of the distribution. Example: H₀ μ = 20 and H_a μ ≠ 20.
100 (1 - α)% confidence interval --- an interval estimate of μ. 95% confidence interval: <y> +/- 1.96 s_y. 99% confidence interval: <y> +/- 2.58 s_y. Here s_y is the standard error of the mean.
