Conducting Educational Research
Calculating Inferential Statistics

In educational research, it is rarely possible to sample all of the people that we want to draw a conclusion about. For example, the purpose of a research study may be to determine whether a new way of teaching mathematics improves mathematical achievement for all children in Primary 1. However, it would be impossible to test all children in Primary 1 because of time, resources, and other logistical factors. Instead, the researcher chooses a sample of the population to conduct a study. However, the researcher does not want to limit her conclusions to only the few children who participated in the study. Instead, she wants to say that because her new way of teaching mathematics improved mathematical achievement for the children in her sample, it will also improve mathematical achievement for all children in Primary 1. Thus, the researcher wants to draw conclusions - or inferences - about the entire population based on the results from the sample.

The purpose of inferential statistics is to determine whether the findings from the sample can generalize - or be applied - to the entire population. There will always be differences in scores between groups in a research study. For example, imagine the following scores on WAEC from students who attended either a government or a private school. (Note that the data was created by a random number generator and thus does not reflect actual scores.)

Course               Government   Private
English                   24.13     28.18
Mathematics               58.90     27.87
Integrated Science        61.19     47.89
Chemistry                 34.38     34.25
Social Studies            99.10     86.97
CRK/IRK                   70.90     70.53

In all courses listed in Table 5, there is a difference in the average scores between the government and private schools. However, which differences are large enough to be significant? In mathematics, there is a clear difference in the mean scores. On the other hand, the difference between government and private schools on CRK/IRK is less than 1 point. This clearly does not represent a meaningful difference between the two types of schools. How big must the difference be in order to conclude that there is a meaningful difference in the academic performance between government and private schools?

Inferential statistics must be used to determine whether the difference between the two groups in the sample is large enough to be able to say that the findings are significant. If the findings are indeed significant, then the conclusions can be applied, or generalized, to the entire population. On the other hand, if the difference between the groups is very small, then the findings are not significant and may simply be the result of chance.

Returning to the WAEC illustration, this research found that there is a difference in CRK/IRK performance in our sample: government schools had a slightly higher mean score than private schools. However, after conducting inferential statistics, we will find that the difference is not significant. This means that the difference between government and private schools in our sample was too small to conclude that an actual difference exists in the entire population of government and private schools. This tiny difference in CRK/IRK performance can be attributed to chance. If we collected data from another sample of government and private schools, we might well find the opposite of this small difference.

On the other hand, after conducting inferential statistics, imagine that we find that the difference in mathematics performance is significant. Therefore, the results from our sample - government schools have higher performance in mathematics than private schools - can also be applied to the entire population. Thus, we conclude that in general, government schools outperform private schools in mathematics. This means that if we collected data from another sample, we would expect to find the same pattern again.

When calculating inferential statistics, the key statistic is the p-value. Loosely speaking, the p-value is the probability that the result is due to chance; more precisely, it is the probability of finding a difference at least as large as the one in the sample if there were truly no difference in the population. The p-value can range from 0.000 to 1.000. The larger p is, the more likely it is that the results are due to chance. If p is 0.500, chance alone would produce a difference this large 5 times out of 10, so the sample gives little reason to believe that a real difference exists in the population. A p of 0.850 means that chance alone would produce such a result 85 times out of 100, meaning that the finding is most likely due to chance. A p of 0.050 means that chance alone would produce such a result only 5 times out of 100.
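To make the idea of "due to chance" concrete, the following is a minimal sketch of a permutation test, which estimates a p-value by repeatedly shuffling the scores between the two groups and counting how often chance alone produces a difference at least as large as the one observed. This is a different technique from the t-test that VassarStats calculates, and it is shown only as an illustration. The function name, the scores, and the number of shuffles are all assumptions made for this example.

```python
import random


def permutation_p_value(group_a, group_b, n_shuffles=10_000, seed=0):
    """Estimate a two-tailed p-value for a difference in means by shuffling."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n_a = len(group_a)
    n_b = len(group_b)
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)
    pooled = list(group_a) + list(group_b)
    at_least_as_large = 0
    for _ in range(n_shuffles):
        # Shuffling mimics a world where group membership does not matter.
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        if diff >= observed - 1e-12:
            at_least_as_large += 1
    return at_least_as_large / n_shuffles


# Hypothetical scores, invented purely for illustration.
government = [62, 58, 71, 65, 60, 68, 59, 66]
private = [45, 52, 48, 50, 41, 55, 47, 49]
similar = [61, 59, 70, 66, 58, 69, 60, 64]

p_large_gap = permutation_p_value(government, private)
p_small_gap = permutation_p_value(government, similar)
print(f"large gap: p = {p_large_gap:.3f}")
print(f"small gap: p = {p_small_gap:.3f}")
```

A large gap between groups is rarely reproduced by shuffling, so its p is small; a tiny gap is reproduced by most shuffles, so its p is large.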

Researchers want to be fairly certain of their conclusions, so they have decided that the probability that the result is due to chance should be very small - less than 5 out of 100. This is why the standard in research is that p must be less than .05 for the results to be significant. (This cut-off point for p is often called alpha, so the standard is sometimes stated as an alpha of .05.)

Recall from Writing Research Hypotheses that research hypotheses are generally stated as null hypotheses: There is no significant difference between government and private schools on mathematics performance. If the calculated p-value is less than 0.050, then the null hypothesis is rejected. If the statement "There is no significant difference" is rejected, this means that there is a significant difference. It can therefore be concluded that a difference exists between the two groups in the population, not just the sample. On the other hand, if p is greater than 0.050, this means that the null hypothesis is retained. This means the null hypothesis, as stated, is accurate: in the population, there is no significant difference…
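The decision rule can be written out in a few lines. The sketch below assumes the conventional alpha of .05; the function name and wording of the messages are made up for this illustration.

```python
def decide(p_value, alpha=0.05):
    """Return the decision about the null hypothesis for a given p-value."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p_value < alpha:
        return "reject the null hypothesis (significant)"
    return "retain the null hypothesis (not significant)"


# A few example p-values; note that p must be strictly less than alpha.
for p in (0.003, 0.049, 0.050, 0.850):
    print(f"p = {p:.3f}: {decide(p)}")
```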


Calculating Inferential Statistics

At this point, the data should be coded and total scores calculated for each variable. It is now time to calculate the inferential statistics. I highly recommend using VassarStats, a free website that calculates statistics for all but the most complex studies. The specific statistic to calculate for each research hypothesis should have already been identified in the Method of Data Analysis step. If this has been done, the data analysis should take less than an hour to complete. Each type of inferential statistic is slightly different, so read below for how to calculate each statistic.

t-test
The t-test should be used when comparing two groups on a dependent variable. Thus, a t-test will be used to compare a treatment group to a control group or to compare males and females. For the sake of illustration, refer back to the comparison of government and private schools on WAEC scores. To calculate the t-test, the data must be sorted according to the independent variable, in this case type of school. In other words, all of the government school scores are grouped together, and all of the private school scores are grouped together. In VassarStats, click on the t-test, then Two-sample t-test for independent or correlated samples. In the new screen, click Independent Samples. (Correlated samples would be for studies where the two groups were matched on another relevant variable, such as intelligence, or when both scores came from the same group of people, such as comparing English and maths scores within the government schools.) Then enter the mathematics scores for each student in the government school group in one sample area (Sample A) and the mathematics scores for each student in the private school group in the other (Sample B). Click Calculate and the results should pop up.

Scroll down the screen. The first statistic to look at is the p: two-tailed. If this p is greater than 0.050, then the null hypothesis is retained; the result is not significant. If the result is not significant, analysis and interpretation are finished because there is no significant difference between groups.

If this p is less than 0.050, then the null hypothesis is rejected and the result is significant. If the result is significant, then the next step is to look at the mean score for each group (scroll back up the page for this). Based on those mean scores, which group had the higher mathematics scores: government or private schools? This last step of examining the mean scores is often overlooked in data analysis, but it is very important to identify which group had the higher mean score. It is important to record the means and standard deviations for the two groups, the t, df, and two-tailed p. Click the Reset button and move to the next research hypothesis.
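For readers who want to see what lies behind the numbers to record, here is a minimal sketch of an independent-samples t-test using only the Python standard library. It assumes the classic pooled-variance (equal variances) form of the test; whether VassarStats follows exactly these steps internally is an assumption. The two-tailed p is approximated by numerically integrating the t distribution, which is adequate for illustration but loses precision for extremely small p-values. The scores and all function names are hypothetical.

```python
import math
from statistics import mean, stdev


def t_density(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def two_tailed_p(t, df, steps=20_000):
    """Area in both tails beyond |t|, via Simpson's rule (steps is even)."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_density(0.0, df) + t_density(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    central_area = 2 * (total * h / 3)  # area between -|t| and +|t|
    return max(0.0, 1.0 - central_area)


def independent_t_test(sample_a, sample_b):
    """Pooled-variance t-test for two independent samples."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    sd_a, sd_b = stdev(sample_a), stdev(sample_b)
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / df
    std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (mean(sample_a) - mean(sample_b)) / std_error
    return {
        "mean_a": mean(sample_a), "sd_a": sd_a,
        "mean_b": mean(sample_b), "sd_b": sd_b,
        "t": t, "df": df, "p": two_tailed_p(t, df),
    }


# Hypothetical mathematics scores, invented purely for illustration.
government = [62, 58, 71, 65, 60, 68, 59, 66]
private = [45, 52, 48, 50, 41, 55, 47, 49]

result = independent_t_test(government, private)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

The dictionary holds the same values the text recommends recording: the two means and standard deviations, the t, the df, and the two-tailed p.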

Analysis of Variance (ANOVA)
When comparing three or more groups on one dependent variable, an Analysis of Variance is the statistic to use. There are two basic types of ANOVAs that can be used.

One-way ANOVA: A one-way ANOVA compares multiple groups on the same variable. For example, perhaps the researcher decides to divide private schools into religious private and secular private. Now, there are three groups to be compared: government schools, religious private, and secular private. A one-way ANOVA is now necessary. To calculate the one-way ANOVA, the data must be sorted according to the independent variable - again, school type. In VassarStats, click on ANOVA, then One-Way ANOVA. Then enter the number of samples (aka the number of groups; in this example, 3). Then click Independent Samples. Enter the mathematics scores for each student in the appropriate column. For example, enter government students' scores in Sample A, religious private students' scores in Sample B, and secular private students' scores in Sample C. Then click Calculate.

Scroll down the screen. Again, the first statistic to look at is the p in the ANOVA summary table. If this p is greater than 0.050, then the null hypothesis is retained; the result is not significant. If the result is not significant, analysis and interpretation are finished because there is no significant difference between groups.

If this p is less than 0.050, then the result is significant. This only means, however, that there is a significant difference between groups somewhere, not that there is a significant difference between all groups. It is possible that government students were significantly higher than religious private and secular private students, but there are no significant differences between religious private and secular private students. Down at the bottom of the screen is the result of Tukey's HSD (Honestly Significant Difference) test. This test identifies which differences are really significant. It is important to record the means and standard deviations for all groups, the ANOVA summary table, and the results of Tukey's HSD. Click the Reset button and move to the next research hypothesis.
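The entries of the ANOVA summary table come from splitting the variation in scores into a part between groups and a part within groups. The sketch below computes those entries (sums of squares, df, mean squares, and F) with the Python standard library. It deliberately stops at F: the p-value requires the F distribution and Tukey's HSD requires the studentized range distribution, both of which are left to VassarStats here. The function name and the scores are hypothetical.

```python
from statistics import mean


def one_way_anova(groups):
    """Return the main entries of a one-way ANOVA summary table (no p)."""
    all_scores = [score for group in groups for score in group]
    grand_mean = mean(all_scores)
    group_means = [mean(group) for group in groups]

    # Between groups: how far each group mean lies from the grand mean.
    ss_between = sum(len(group) * (m - grand_mean) ** 2
                     for group, m in zip(groups, group_means))
    # Within groups: how far each score lies from its own group mean.
    ss_within = sum((score - m) ** 2
                    for group, m in zip(groups, group_means)
                    for score in group)

    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "group_means": group_means,
        "ss_between": ss_between, "df_between": df_between,
        "ss_within": ss_within, "df_within": df_within,
        "ms_between": ms_between, "ms_within": ms_within,
        "F": ms_between / ms_within,
    }


# Hypothetical mathematics scores for three school types.
government = [62, 58, 71, 65, 60]
religious_private = [50, 47, 55, 52, 49]
secular_private = [51, 45, 54, 50, 48]

table = one_way_anova([government, religious_private, secular_private])
for name, value in table.items():
    print(name, value)
```

A large F means the group means differ by much more than the scores vary inside each group, which is what a small p in the summary table reflects.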

Factorial ANOVA: The factorial ANOVA compares the effect of multiple independent variables on one dependent variable. For example, a 2x3 factorial ANOVA could compare the effects of gender and school type on academic performance. The first independent variable, gender, has two levels (male and female) and the second independent variable, school type, has three levels (government, religious private, and secular private), hence 2x3 (read "two by three"). Factorial ANOVAs can also be calculated on VassarStats (click on ANOVA then on Two-way factorial ANOVA for independent samples). However, interpreting the results is a bit more complex, so please consult an expert statistician for help.

Analysis of Covariance (ANCOVA)
When using a pre-post test research design, the Analysis of Covariance allows a comparison of post-test scores with pre-test scores factored out. For example, if comparing a treatment and control group on achievement motivation with a pre-post test design, the ANCOVA will compare the treatment and control groups' post-test scores by statistically adjusting for the pre-test scores. For an ANCOVA, you must have pre- and post-test scores for every person in the sample, and these scores must be sorted by group (i.e., treatment or control group).

To calculate an ANCOVA with VassarStats, click on ANCOVA. Then VassarStats will ask for the k. The k is the number of groups. If there is only one treatment and one control group, then k=2. Click on the correct k for data import. There are two things to bear in mind when doing an ANCOVA with VassarStats: it will ask for the concomitant variable and the dependent variable. The concomitant variable (CV) is the variable that should be controlled for. In the case of a pre-post test design, the concomitant variable is the pre-test. The dependent variable (DV) is the variable that you think has been affected by the independent variable. In the case of a pre-post test design, the dependent variable is the post-test. To use VassarStats, it is important that the CV and the DV are side-by-side for each of the two groups. Then enter the CV and DV into the correct columns and click Calculate.

Scroll down the screen. Just as before, the first statistic to look at is the p in the ANCOVA summary table. If this p is less than 0.050, then the null hypothesis is rejected and the result is significant. There are two sets of means that are important to understand in an ANCOVA. First, the Observed Means are the actual means for the dependent variable (post-test). Then the Adjusted Means are the means that have been statistically adjusted based on the pre-test scores. A simple way to imagine this is that the ANCOVA statistically forces the pre-test scores to be equal between the two groups (meaning that the two groups are now equal at the start of the study), and then re-calculates the post-test scores based on the adjusted pre-test scores. It is important to record the observed means, adjusted means, and standard deviations for all groups and the ANCOVA summary table. When creating the tables in the next step, report both the Observed and Adjusted Means. However, base any figures on the Adjusted Means. Add a note to the figure so that readers are clear that these are Adjusted Means. Click the Reset button and move to the next research hypothesis.
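The relationship between Observed and Adjusted Means can be sketched in code. A common textbook formula is: adjusted mean = observed post-test mean - b x (group pre-test mean - overall pre-test mean), where b is the pooled within-group slope of post-test on pre-test. The sketch below applies that formula; it is an illustration under that assumption, not a description of how VassarStats is implemented, and the function name and scores are hypothetical.

```python
from statistics import mean


def ancova_means(groups):
    """groups maps a group name to a (pre_scores, post_scores) pair.

    Returns the pooled within-group slope plus observed and adjusted means.
    """
    sum_products = 0.0  # within-group co-variation of pre and post
    sum_squares_pre = 0.0  # within-group variation of pre
    for pre, post in groups.values():
        pre_mean, post_mean = mean(pre), mean(post)
        sum_products += sum((x - pre_mean) * (y - post_mean)
                            for x, y in zip(pre, post))
        sum_squares_pre += sum((x - pre_mean) ** 2 for x in pre)
    slope = sum_products / sum_squares_pre

    grand_pre_mean = mean([x for pre, _ in groups.values() for x in pre])
    observed = {name: mean(post) for name, (_, post) in groups.items()}
    # Shift each post-test mean as if every group had started at the
    # overall pre-test mean.
    adjusted = {name: mean(post) - slope * (mean(pre) - grand_pre_mean)
                for name, (pre, post) in groups.items()}
    return {"slope": slope, "observed": observed, "adjusted": adjusted}


# Hypothetical pre-test and post-test scores, invented for illustration.
data = {
    "treatment": ([10, 12, 14], [20, 24, 28]),
    "control": ([8, 10, 12], [14, 16, 18]),
}

means = ancova_means(data)
print("observed:", means["observed"])
print("adjusted:", means["adjusted"])
```

In this made-up data the treatment group started with higher pre-test scores, so its adjusted mean is pulled down and the control group's adjusted mean is pulled up, narrowing the gap between them.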

Correlation
Correlations should be calculated to examine the relationship between two variables within the same group of participants. For example, the correlation would quantify the relationship between academic achievement and achievement motivation. To calculate a correlation, you must have scores for two variables for every participant in the sample. To calculate a correlation in VassarStats, click on Correlation & Regression, then Basic Linear Correlation and Regression, Data-Import Version. Enter the total scores for the two variables and click Calculate.

Scroll down the screen. Again, the first statistic to look at is the p: two-tailed. The null hypothesis for a correlation is stated as: There is no significant relationship between mathematics and English achievement. If the p is greater than 0.050, then the null hypothesis is retained; there is no significant relationship between variables. If the result is not significant, analysis and interpretation are finished because there is no significant relationship.

If this p is less than 0.050, then the null hypothesis is rejected and the correlation is significant. If the correlation is significant, then the next step is to look at the correlation itself, symbolized by r. For more information on how to interpret the correlation, see Method of Data Analysis. It is important to record the means and standard deviations for the two variables, the t, df, two-tailed p, and r. Click the Reset button and move to the next research hypothesis.
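The r, t, df, and two-tailed p for a correlation are linked by a standard formula: t = r x sqrt(df / (1 - r squared)), with df = n - 2. The sketch below computes a Pearson correlation and its significance with the Python standard library. The p is approximated by numerical integration of the t distribution, so it is suitable for illustration only; the function names and scores are hypothetical.

```python
import math
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    mean_x, mean_y = mean(x), mean(y)
    sum_products = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sum_products / math.sqrt(ss_x * ss_y)


def t_density(value, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(value ** 2 / df))


def two_tailed_p(t, df, steps=20_000):
    """Area in both tails beyond |t|, via Simpson's rule (steps is even)."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_density(0.0, df) + t_density(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    return max(0.0, 1.0 - 2 * (total * h / 3))


def correlation_test(x, y):
    """Return r, t, df and the two-tailed p for a Pearson correlation."""
    r = pearson_r(x, y)
    df = len(x) - 2
    if abs(r) >= 1.0:  # perfect correlation: t is unbounded
        return {"r": r, "t": math.copysign(math.inf, r), "df": df, "p": 0.0}
    t = r * math.sqrt(df / (1 - r * r))
    return {"r": r, "t": t, "df": df, "p": two_tailed_p(t, df)}


# Hypothetical total scores for the same eight students.
mathematics = [55, 62, 48, 70, 66, 59, 73, 51]
english = [58, 60, 50, 68, 71, 55, 69, 54]

for name, value in correlation_test(mathematics, english).items():
    print(f"{name}: {value:.3f}")
```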


Conclusion

Following the statistics identified in the Method of Data Analysis, calculate the correct inferential statistic for each Research Hypothesis. Once the inferential statistics have been calculated, the statistics will be organized in tables and figures as described in the next chapter. The final step in data analysis is to interpret the findings of the statistics when writing the Results section.



Copyright 2013, Katrina A. Korb, All Rights Reserved