I. Comparing Means
Often we are interested in the question, "How different are the means of these two groups of data?" There are several ways we can compare two samples of experimental units. One common approach compares the means of two samples by performing a t-test. If the samples are naturally paired in some way, such as an assessment being performed on the same subjects twice, a paired t-test is appropriate. If instead you want to compare average measurements for two separate groups, an unpaired t-test (independent samples) is appropriate. These are further explained below. In either case, the t-test compares two samples and estimates how likely it is that a difference at least as large as the one observed would occur by chance alone. This chance is reported as the p-value. A small p-value (for example, 0.01) means it is unlikely (only a 1 in 100 chance) that such a mean difference would occur by chance. In such a case we would say that there is a statistically significant difference between the two groups. Often, a p-value of 0.05 (5%) or smaller is required to show statistical significance. The main factors that allow us to detect a significant difference between two groups are a large sample size and low variability in our measurements within a sample.
A. Comparing differences between two means (paired data): paired t-test
The most common use of a paired t-test is the comparison of two measurements from the same individual or experimental unit. The two measurements can be made at different times or under different conditions, such as blood cholesterol levels before and after a diet low in fats, as depicted in the table to the right. In a paired t-test the measures can vary tremendously between subjects, but if they all change in the same manner you may still see a significant difference. Because nearly every subject's cholesterol drops, it is likely that diet has a significant effect on cholesterol level, regardless of the initial value. The test is run to determine if the change is statistically significant.
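As a rough illustration of the paired comparison described above, the following Python sketch computes a paired t statistic and a two-tailed p-value using only the standard library. The cholesterol readings are hypothetical values invented for this example, and the helper names (t_pdf, two_tailed_p) are not from any package; the p-value is approximated by numerically integrating the t distribution.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=10_000):
    """Two-tailed p-value: 1 minus the area of the density between -|t| and |t|.
    The area is approximated with Simpson's rule (steps must be even)."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    half_area = total * h / 3  # area from 0 to |t|
    return max(0.0, 1.0 - 2.0 * half_area)

# Hypothetical cholesterol readings (mg/dL) for six subjects
before = [220, 180, 260, 300, 200, 240]
after = [210, 172, 248, 291, 189, 230]

# A paired test works on the per-subject differences, not the raw values
diffs = [b - a for b, a in zip(before, after)]
n = len(diffs)
mean_diff = statistics.mean(diffs)
se_diff = statistics.stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
t_stat = mean_diff / se_diff
p_value = two_tailed_p(t_stat, df=n - 1)

print(f"t = {t_stat:.2f}, df = {n - 1}, p = {p_value:.5f}")
```

Note that the starting values range from 180 to 300, yet every subject drops by a similar amount, so the differences have very little spread and the test statistic is large.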

B. Comparing differences between two means (unpaired data / independent samples): unpaired t-test
Unpaired t-tests are used to compare separate (independent) groups, which have experienced some different condition. For example, trained and untrained subjects may be compared for their resting heart rate. There is no pairing between the two samples since they represent two separate groups of subjects; that is, the first untrained subject is not associated with any particular trained subject, nor is the second, and so on. Thus, the variability (within a treatment or sample) in the data is much more important for determining if a difference actually exists. Alternatively, these data would be considered paired if we compared the same subject before and after training.
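The unpaired comparison can be sketched in the same way. The Python example below assumes equal variance in the two groups and pools the within-group variances; the resting heart rates are hypothetical values made up for illustration, and the helper functions are not from any package.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=10_000):
    """Two-tailed p-value via Simpson's rule integration of the t density."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * (total * h / 3))

# Hypothetical resting heart rates (beats per minute), two separate groups
untrained = [72, 78, 70, 75, 80, 74]
trained = [62, 66, 60, 68, 64, 58]

n1, n2 = len(untrained), len(trained)
df = n1 + n2 - 2

# Pooled variance: within-group variability, weighted by degrees of freedom
pooled_var = ((n1 - 1) * statistics.variance(untrained)
              + (n2 - 1) * statistics.variance(trained)) / df
se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))

t_stat = (statistics.mean(untrained) - statistics.mean(trained)) / se
p_value = two_tailed_p(t_stat, df)

print(f"t = {t_stat:.2f}, df = {df}, p = {p_value:.5f}")
```

Because the within-group variance sits directly in the denominator, noisier measurements in either group shrink the t statistic and raise the p-value.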

Type I Error: Saying there is a difference when there isn't. What would happen if we were measuring the risk of heart attacks when taking a certain antihistamine? If we said there were significantly more heart attacks in people using the antihistamine than in a control group taking a placebo, and this conclusion was based on a small sample size with a resultant p = .045, our drug company would have to pull the drug off the market. Yet there is a 4.5% chance that a difference this large in the observed number of heart attacks would occur randomly even if the antihistamine had nothing to do with it. If that is what happened, the conclusion would waste millions of dollars and years of research and development.
By choosing a low level of significance we can reduce the risk of Type I Errors, and often a p < .01 is required for pharmaceutical "proof". So, why not choose a significance level of 0.00000001? Then we could be even more sure that we would not see a false difference! The problem is that choosing a very low significance level increases the risk of a Type II Error.
Type II Error: Saying there isn't a difference when there is. There is a great amount of variation in nature. This includes natural variation in any measure, including an individual's specific reaction to a drug. Some react strongly, others not as much. There is also an inherent error in every measurement made; no condition is perfectly measured. By requiring too much evidence before inferring a difference (a very low significance level), we lower the "power" of our statistical test to detect a difference. If the drug company above required a p < .01 to determine that there actually was a higher rate of heart attacks, this study would allow them to say that "there was not a significantly increased risk of heart attacks when using this new antihistamine," because p > .01.
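The trade-off between the two error types can be seen in a small simulation. The Python sketch below is only an illustration under simplifying assumptions: it draws normally distributed data with a known standard deviation and uses a z-test rather than a t-test so that the standard library suffices. The sample size, effect size, and number of trials are arbitrary choices.

```python
import math
import random
from statistics import NormalDist, mean

random.seed(1)          # fixed seed so the run is repeatable
N_PER_GROUP = 20        # assumed sample size per group
SIGMA = 1.0             # assumed known standard deviation
TRIALS = 2000           # number of simulated experiments
STD_NORMAL = NormalDist()

def z_test_p(group_a, group_b):
    """Two-tailed p-value for a difference in means, sigma assumed known."""
    se = SIGMA * math.sqrt(1 / len(group_a) + 1 / len(group_b))
    z = (mean(group_a) - mean(group_b)) / se
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))

def simulate_p_values(true_difference):
    """Run TRIALS experiments and return the p-value from each one."""
    p_values = []
    for _ in range(TRIALS):
        a = [random.gauss(0.0, SIGMA) for _ in range(N_PER_GROUP)]
        b = [random.gauss(true_difference, SIGMA) for _ in range(N_PER_GROUP)]
        p_values.append(z_test_p(a, b))
    return p_values

null_p = simulate_p_values(0.0)    # no real difference exists
effect_p = simulate_p_values(0.5)  # a real difference exists

error_rates = {}
for alpha in (0.05, 0.01):
    type_1 = sum(p < alpha for p in null_p) / TRIALS     # false "difference"
    type_2 = sum(p >= alpha for p in effect_p) / TRIALS  # missed real difference
    error_rates[alpha] = (type_1, type_2)
    print(f"alpha = {alpha}: Type I rate = {type_1:.3f}, Type II rate = {type_2:.3f}")
```

Tightening the significance level from 0.05 to 0.01 lowers the Type I rate, but the same experiments then miss the real difference more often.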
What is the correct level of significance? Knowing which type of risk to avoid is very important. For example, one might think that the drug company should be more concerned about a type II error in this example, saying there isn't an increased risk of side effects (heart attacks) when there actually is an increased risk of side effects.
Alternatively, if one were evaluating the effectiveness of a heart attack medicine (measuring the number of heart attacks experienced by one group on a placebo and another on the medicine), and the number of heart attacks was significantly different in the medicated group (presumably fewer heart attacks), one should worry about a Type I Error: saying there is a decreased risk of heart attacks when there isn't really a decreased risk. One should be especially concerned with this conclusion if the drug has other harmful side effects, as there often are.
The best way of increasing the power of your test is to increase your sample size, make an accurate measurement, and use a paired comparison if at all possible!
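To make the effect of sample size and measurement variability concrete, here is a hedged Python sketch that computes the approximate power of a two-tailed, two-sample z-test. The z-test (known standard deviation) is a simplification chosen so the standard library is enough; the true difference of 5 units and standard deviation of 10 units are hypothetical numbers.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def power_two_sample_z(true_difference, sigma, n_per_group, alpha=0.05):
    """Approximate power of a two-tailed, two-sample z-test.

    Power is the chance of getting p < alpha when the groups really do
    differ by true_difference (sigma is assumed known and equal)."""
    se = sigma * math.sqrt(2 / n_per_group)
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = true_difference / se
    return (1 - STD_NORMAL.cdf(z_crit - shift)) + STD_NORMAL.cdf(-z_crit - shift)

# Power for several sample sizes, same true difference and variability
powers = {n: power_two_sample_z(5.0, 10.0, n) for n in (10, 20, 40, 80)}
for n, power in powers.items():
    print(f"n per group = {n:3d}: power = {power:.2f}")

# Halving the measurement variability at a fixed sample size
noisy = power_two_sample_z(5.0, 10.0, 20)
precise = power_two_sample_z(5.0, 5.0, 20)
print(f"sigma = 10: power = {noisy:.2f}; sigma = 5: power = {precise:.2f}")
```

Power rises steadily with sample size, and a more precise measurement raises it without adding a single subject.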
In nature, variation is often great and difficult to control. Sample size is often limited as well, not just due to a lack of animals or ecosystems, but even such factors as funding come into play. Pharmaceutical companies have vast resources whereas the World Wildlife Fund may be more limited. Because of this, a significance level of 0.05 is normally accepted in most of biology.
II. Analysis of Variance (ANOVA)
ANOVA is simply a way of comparing multiple groups instead of only two groups (as with a t-test). Like a t-test, it yields a p-value, which tells you whether at least one of the groups differs from the others. If differences are found, post-hoc tests can be used to determine between which groups the differences lie.
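The quantity behind a one-way ANOVA is the F statistic, the ratio of between-group variability to within-group variability. The Python sketch below computes it for three hypothetical groups; the function name and data are made up for this example, and converting F to a p-value (which requires the F distribution) is left to a statistics package.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F statistic, df between groups, df within groups)."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]

    # Variability of the group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variability of individual values around their own group mean
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Three hypothetical treatment groups
groups = [[4, 5, 6], [6, 7, 8], [9, 10, 11]]
f_stat, df_between, df_within = one_way_anova_f(groups)
print(f"F = {f_stat:.2f} with ({df_between}, {df_within}) degrees of freedom")
```

A large F means the group means are spread far apart relative to the scatter inside each group, which is what leads to a small p-value.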
III. Bivariate Relationships: R-squared, Regressions, and Correlations
A scatterplot shows one variable plotted against a second variable. Below is a plot showing the relationship between grip strength and arm strength for 147 people. The plot shows a very strong but certainly not a perfect relationship between these two variables. Scatterplots should almost always be constructed when the relationship between two variables is of interest. Statistical summaries are no substitute for a full plot and a good look at the data.
The correlation between two variables reflects the degree to which the variables are related. The most common measure of correlation is the Pearson Product Moment Correlation, which is designated by the letter "r" and is sometimes called "Pearson's r." This reflects the degree of linear relationship between two variables. It ranges from +1 to -1. A correlation of +1 means that there is a perfect positive linear relationship between variables. The scatterplot shown depicts a positive relationship because high scores on the X-axis are associated with high scores on the Y-axis. A correlation of -1 means that there is a perfect negative linear relationship between variables. A correlation of 0 means there is no linear relationship between the two variables. Correlations are rarely if ever 0, 1, or -1.
By squaring this r-value we get an "r-squared" value, which is never negative and is said to reflect the amount of variation in Y which is accounted for by variation in X. In the figure above, if the r-squared value were 0.75 it would be said that 75% of the variation in arm strength (Y) can be predicted by the variation in grip strength (X). The remaining 25% of the variation would be due to other factors not accounted for by grip strength. These could be factors such as height, weight, genetics, or measurement error.
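For readers who want to see the arithmetic, this Python sketch computes Pearson's r and r-squared from first principles. The grip and arm strength values are hypothetical (five made-up subjects, not the 147 people in the plot), and pearson_r is a name chosen for this example.

```python
import math
from statistics import mean

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    mean_x, mean_y = mean(xs), mean(ys)
    # Sum of cross-products and sums of squares of the deviations
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical grip strength (X) and arm strength (Y) for five people
grip = [30, 35, 40, 45, 50]
arm = [52, 64, 70, 64, 70]

r = pearson_r(grip, arm)
r_squared = r ** 2
print(f"r = {r:.3f}, r-squared = {r_squared:.2f}")
```

With these numbers r-squared works out to 0.6, so 60% of the variation in arm strength would be said to be accounted for by grip strength.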

Pearson's r by itself does not determine the significance of the relationship; a statistical test is done and again a p-value is obtained. A low p-value (< .05) indicates that there is a significant relationship between the two variables, while a high p-value means the data do not provide enough evidence that the two measures are related. It is apparent in the graph above that arm strength is related to grip strength, but a statistical test is required to confirm this observation. Often an apparently low r-value can still be significant, especially when sample size is high.
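One common way to test a correlation is to convert r into a t statistic with n - 2 degrees of freedom, t = r * sqrt(n - 2) / sqrt(1 - r^2). The Python sketch below applies that conversion to the same hypothetical r-value at two sample sizes; the helper functions are written for this example and approximate the p-value by numerical integration.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=10_000):
    """Two-tailed p-value via Simpson's rule integration of the t density."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * (total * h / 3))

def correlation_p_value(r, n):
    """Two-tailed p-value for Pearson's r from a sample of n pairs (|r| < 1)."""
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r * r)
    return two_tailed_p(t, df)

# The same modest correlation, observed in a small and a large sample
p_small_sample = correlation_p_value(0.2, 10)
p_large_sample = correlation_p_value(0.2, 200)
print(f"r = 0.2, n = 10:  p = {p_small_sample:.3f}")
print(f"r = 0.2, n = 200: p = {p_large_sample:.4f}")
```

The r-value is identical in both cases; only the amount of data behind it changes, and that alone moves the result from non-significant to significant at the .05 level.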
The two specific tests used to test significance in a relationship are regressions and correlations. Deciding which is the appropriate test is sometimes difficult. A regression indicates that one variable is dependent on another variable (metabolic rate is a function of weight) and each variable is designated as either the independent variable (weight) or the dependent variable (metabolic rate). Metabolic rate might vary with the weight of an individual, but weight is not dependent on metabolic rate.
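A regression fits a line that predicts the dependent variable from the independent variable. As a minimal sketch, the Python code below fits a least-squares line to hypothetical weight and metabolic rate values (units and numbers invented for illustration); fit_line is a name chosen for this example.

```python
from statistics import mean

def fit_line(xs, ys):
    """Least-squares slope and intercept for predicting y from x."""
    mean_x, mean_y = mean(xs), mean(ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: weight is the independent variable,
# metabolic rate is the dependent variable
weight = [10, 20, 30, 40, 50]
metabolic_rate = [20, 40, 50, 40, 50]

slope, intercept = fit_line(weight, metabolic_rate)
predicted = slope * 35 + intercept  # predicted metabolic rate at weight 35
print(f"metabolic rate = {slope:.2f} * weight + {intercept:.2f}")
print(f"predicted rate at weight 35: {predicted:.1f}")
```

Swapping the two lists would fit a different line, which is why deciding which variable is dependent matters for a regression but not for a correlation.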
A correlation simply says two measures relate to each other, but no causation is implied. Finger length may correlate to nose length, but one does not cause the other to vary. Statistically speaking, the tests are identical and result in the same significance level being assigned to the relationship. The assumptions that go into a regression are much more exacting, however, and one might say that the conclusions are more powerful as well. A classic example is that the number of murders in a town correlates to the number of liquor stores. Of course there are more murders and more liquor stores in larger towns, and fewer of each in smaller towns. The two correlate, but it would be more difficult to prove that more liquor stores cause more violent crime.
IV. Non-Linear Relationships
Sometimes a bivariate relationship is very strong but non-linear. Statistical tests normally assume you are testing for a linear relationship, and may not prove to be significant in the graph shown to the right or may yield a low r-squared. This would seem to indicate that the variables are unrelated, but this is actually a very predictable and significant relationship. Other tests exist to test for these types of relationships (power, exponential, logarithmic, etc.). Alternatively, the data may be transformed such that the relationship becomes linear, allowing us to use more common and comfortable linear statistics to examine the transformed data set. For example, logarithmic transformations might be used to "linearize" the above data. Tests exist to determine what type of "fit" is best suited to the data, but sometimes just plotting the data and looking at it is your first and best method for determining the basic relationship.
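The effect of a logarithmic transformation can be sketched with made-up data. In the Python example below, Y grows exponentially with X (the constants 2 and 0.8 are arbitrary assumptions); Pearson's r is computed on the raw values and again after taking the natural log of Y.

```python
import math
from statistics import mean

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    mean_x, mean_y = mean(xs), mean(ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical, perfectly predictable exponential relationship
xs = list(range(10))
ys = [2 * math.exp(0.8 * x) for x in xs]

r_raw = pearson_r(xs, ys)

# Taking the log of Y turns the exponential curve into a straight line
log_ys = [math.log(y) for y in ys]
r_log = pearson_r(xs, log_ys)

print(f"r-squared on raw data:         {r_raw ** 2:.3f}")
print(f"r-squared after log transform: {r_log ** 2:.3f}")
```

Even though Y is determined exactly by X here, the linear r-squared on the raw data falls short of 1; after the transformation the relationship is a straight line and the fit is essentially perfect.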

There are many different and useful statistical packages available for evaluating collected data. While Excel is certainly not the best program for analyzing data, it is a useful tool that we will use extensively in this class for performing simple statistical comparisons. Follow the steps below to run the statistical tests that we will use throughout this semester in the laboratory:

A. Performing a t-test:
1. The data that you are analyzing should first be carefully put into a proper spreadsheet. A spreadsheet will likely be provided by your instructor for you to input data. Once your data are all entered and you are ready to perform a Student's t-test, put your cursor in the cell where you want your p-value to be returned.
2. When the cursor is at the chosen location, move to the tool bar and click on the function button (fx).
3. Once in the function menu, choose "statistical" from the options on the left and "TTEST" from the options on the right. Once you have made your selections, click "OK".
4. This will bring up the t-test menu. In the box marked "Array 1" you will highlight the first column of data involved in the comparison. In the box marked "Array 2" you will highlight the second column of data in the comparison. You will then enter the number of "tails" that you want for your comparison. Without going into detail, I will ask you to choose a two-tailed distribution (2) for the purposes of this class. You will then enter the type of t-test: 1 for paired, or 2 for unpaired with equal variance. This will be according to the data that you have collected. If you choose an unpaired test, for the purposes of this class we will assume equal variance. We have no reason to suspect otherwise.
5. After you have chosen the parameters for your t-test, press "enter" and you will find your p-value in the cell where you originally placed your cursor. This is the p-value for your t-test.
6. Depending on your chosen significance level and your actual p-value, you will then make inferences about your data. From the inference(s), you can then arrive at your conclusions.
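For checking a spreadsheet result by hand, the steps above can be mirrored in a short Python function. This is only a sketch: ttest_like_excel is a hypothetical name, its arguments imitate the Array 1, Array 2, tails, and type boxes of the Excel menu, it covers only type 1 (paired) and type 2 (unpaired, equal variance), and the cholesterol values are invented. It assumes the one-tailed result is half of the two-tailed result.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=10_000):
    """Two-tailed p-value via Simpson's rule integration of the t density."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * (total * h / 3))

def ttest_like_excel(array1, array2, tails, test_type):
    """Return a p-value. tails: 1 or 2. test_type: 1 paired, 2 unpaired equal variance."""
    if test_type == 1:
        if len(array1) != len(array2):
            raise ValueError("paired data need equal-length arrays")
        diffs = [a - b for a, b in zip(array1, array2)]
        n = len(diffs)
        t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
        df = n - 1
    elif test_type == 2:
        n1, n2 = len(array1), len(array2)
        df = n1 + n2 - 2
        pooled = ((n1 - 1) * statistics.variance(array1)
                  + (n2 - 1) * statistics.variance(array2)) / df
        se = math.sqrt(pooled * (1 / n1 + 1 / n2))
        t = (statistics.mean(array1) - statistics.mean(array2)) / se
    else:
        raise ValueError("this sketch supports test_type 1 or 2 only")
    p_two = two_tailed_p(t, df)
    if tails == 2:
        return p_two
    if tails == 1:
        return p_two / 2
    raise ValueError("tails must be 1 or 2")

# Hypothetical cholesterol readings (mg/dL), same six subjects twice
before = [220, 180, 260, 300, 200, 240]
after = [210, 172, 248, 291, 189, 230]

p_paired = ttest_like_excel(before, after, tails=2, test_type=1)
p_unpaired = ttest_like_excel(before, after, tails=2, test_type=2)
print(f"paired:   p = {p_paired:.5f}")
print(f"unpaired: p = {p_unpaired:.3f}")
```

Running both types on the same numbers shows why the choice in step 4 matters: treated as paired, the consistent drop is highly significant, while treated as two unrelated groups the large subject-to-subject variation hides it.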
B. Performing a Regression Analysis
Remember first that the statistical output for a regression and a correlation is the same. If it is appropriate to run either a regression or a correlation, you will be required to get two different values. The first will be the r value (Pearson's r), the second will be the r-squared value. These are reflections of the strength of the relationship between the two variables.
1. Again, your data should be in the format provided by your instructor. Place your cursor where you want your output to be. Click on the function button (fx). Choose "statistical" on the left and correlation (CORREL) on the right.
2. In the correlation menu you will highlight both Array 1 and Array 2, as you did for the t-test. When you press "enter", you will have an r value for the relationship between these variables.
3. You should then place the cursor where you want to have the r-squared value. Click on the function button. Choose "statistical" on the left and r-squared (RSQ) on the right. When your r-squared menu pops up, highlight your two columns of data as before. When you press "enter", you will now have an r-squared value for this relationship.