Step 13: Write the Method of Data Analysis

The *Method of Data Analysis* section outlines exactly which statistic will be used to answer each Research Question and/or Research Hypothesis. To complete this section, refer to the Research Questions and Research Hypotheses. For every research question, describe the descriptive statistic that is appropriate for answering the question. For every research hypothesis, describe the inferential statistic that is appropriate for testing the hypothesis. For simple statistics (e.g., percentage, mean, t-test), it is also possible to give the formula for the statistic. However, for more advanced statistics such as ANCOVA, the formula is too complex to present. The following general guidelines should help you determine which statistic is appropriate for each research question and hypothesis.

Note that in many research studies, a range of different statistics will be necessary. This means that researchers should examine each research question and hypothesis separately to consider which statistic is appropriate.

Research Questions

Research questions are always answered with a descriptive statistic: generally either percentage or mean. Percentage is appropriate when it is important to know how many of the participants gave a particular answer. Generally, percentage is reported when the responses fall into discrete categories, such as female or male, Christian or Muslim, and smoker or non-smoker. Sometimes frequencies are also reported when the data has discrete categories. However, percentages are easier to understand than frequencies because a percentage can be interpreted as follows: if there were exactly 100 cases in the sample, how many of those 100 cases would fall in that category?
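The frequency and percentage for each category can be computed as in the following minimal sketch. The response data and the function name `frequencies_and_percentages` are hypothetical and used only for illustration.

```python
from collections import Counter

# Hypothetical responses to one questionnaire item (illustration only).
responses = ["smoker", "non-smoker", "non-smoker", "smoker", "non-smoker",
             "non-smoker", "non-smoker", "smoker", "non-smoker", "non-smoker"]


def frequencies_and_percentages(values):
    """Return {category: (frequency, percentage)} for discrete responses."""
    counts = Counter(values)
    total = len(values)
    return {category: (n, 100 * n / total) for category, n in counts.items()}


summary = frequencies_and_percentages(responses)
for category, (n, pct) in summary.items():
    # The percentage answers: "out of 100 cases, how many fall here?"
    print(f"{category}: {n} of {len(responses)} ({pct:.1f}%)")
```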

The mean is reported when it is important to understand the typical response of all the participants. Generally, mean is reported when the responses are continuous, meaning that the data are numbers that can fall anywhere along a range from a lowest point to a highest point. For example, age is continuous because it can range from 0 to 100 or so. Scores on an exam are also continuous. In these cases, the mean describes the typical score across all participants.
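As a small illustration, the mean of a set of continuous scores is the sum of the scores divided by the number of scores. The sketch below uses hypothetical exam scores and also computes the standard deviation, which is reported alongside the mean later in this step.

```python
import statistics

# Hypothetical exam scores out of 100 (illustration only).
scores = [62, 71, 55, 80, 67, 74, 59, 68]

# Mean: sum of the scores divided by the number of scores.
mean_score = statistics.mean(scores)

# Sample standard deviation (divides by n - 1): how spread out the scores are.
sd_score = statistics.stdev(scores)

print(f"Mean = {mean_score:.2f}, SD = {sd_score:.2f}, n = {len(scores)}")
```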

Research Hypotheses using "Relationship"

Whenever a research hypothesis uses the word "relationship," it generally means that a correlation will be calculated. The correlation statistic examines the relationship between two continuous variables within the same group of participants. For example, the correlation would quantify the relationship between academic achievement and achievement motivation. The null hypothesis of a correlation is stated as "there is no significant relationship between academic achievement and achievement motivation."

When calculating the correlation, it is important to report not just the correlation itself but also its significance. The **p-value** determines whether the relationship is significant. If the p-value is greater than 0.05, then the null hypothesis is retained: there is __no__ significant relationship between the two variables. Since no significant relationship exists between the variables, no further interpretation is necessary. If the p-value is less than 0.05, then the null hypothesis is rejected, meaning that there __is__ a significant relationship between the two variables. (Read below for more information about interpreting the significance of a p-value.) The correlation (symbolized as **r**) can then be interpreted.

The correlation has two dimensions. The **direction** of the correlation is indicated by the sign of the correlation. If the correlation is positive, that means that as one variable increases, the other variable also increases. For example, the greater the achievement motivation, the greater the academic achievement. However, a negative correlation means that as one variable increases, the other variable decreases. For example, the more time a person spends watching television, the lower their academic achievement.

The second dimension of a correlation is its **strength**. The strength of the correlation is indicated by the absolute value of the number (i.e., the value of the number itself without the positive or negative sign). The closer the absolute value is to 1, the stronger the relationship, while the closer the absolute value is to 0, the weaker the relationship. For example, correlations of -0.71 and 0.87 are both strong, while correlations of -0.18 and 0.09 are both weak.
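The direction and strength described above can be read directly off a computed correlation. The following is a minimal sketch of the Pearson correlation using hypothetical data. The function names `pearson_r` and `describe` are illustrative, and the p-value is not computed here; it would come from statistical software.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def describe(r):
    """Return (direction, strength): the sign and the absolute value of r."""
    direction = "positive" if r > 0 else "negative" if r < 0 else "none"
    return direction, abs(r)


# Hypothetical scores for five students (illustration only).
motivation = [1, 2, 3, 4, 5]
achievement = [2, 4, 5, 4, 5]
tv_hours = [1, 2, 3, 4, 5]
achievement_tv = [5, 4, 5, 4, 2]

r_pos = pearson_r(motivation, achievement)
r_neg = pearson_r(tv_hours, achievement_tv)
print(describe(r_pos))  # direction is the sign, strength is |r|
print(describe(r_neg))
```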

When the term "relationship" is used in a research hypothesis, sometimes a chi-square statistic may be calculated. Chi-square should be used when both of the variables are discrete, meaning that both variables are represented by categories, not numbers. For example, a chi-square would be used to determine if there is a relationship between gender and smoking status. Gender can only be represented as categories (male and female), and so can smoking status (smoker and non-smoker). However, chi-square is often misused. Some researchers will group participants into categories based on numerical data, such as taking academic achievement and grouping students into "high achievement" and "low achievement" categories based on their numerical scores on an examination. This is not correct. It is much better to keep the original scores on the exam and calculate a correlation, because doing so keeps the data in its original form. Researchers are more likely to get a significant result when the original data is used than when participants are grouped into artificial categories.
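For two discrete variables, the chi-square statistic compares the observed count in each cell of a table with the count expected if the two variables were unrelated. The sketch below uses a hypothetical gender-by-smoking table. The function name `chi_square` is illustrative, no continuity correction is applied, and the p-value would still need to be obtained from statistical software.

```python
def chi_square(table):
    """Chi-square statistic and degrees of freedom for a table of counts.

    `table` is a list of rows, e.g. rows = gender, columns = smoking status.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the variables were unrelated.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Hypothetical counts (illustration only):
#            smoker  non-smoker
observed = [[30, 20],   # male
            [10, 40]]   # female

chi2, df = chi_square(observed)
print(f"chi-square = {chi2:.2f}, df = {df}")
```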

Research Hypotheses using either "Effect" or "Difference"

When a research hypothesis looks at the "effect of a treatment" or "difference between groups," then there are three possible statistics that can be used. The specific statistic depends on the research design. First, consider whether the study will administer the instrument once or twice (e.g., pre-post test experimental or quasi-experimental design). If the study will use a pre-post test design, then an Analysis of Covariance (ANCOVA) should be used. If the instrument will only be administered once, then consider how many groups will be used in the study (either treatment/control group or various groups for the causal-comparative design). If there will be only two groups, then a t-test should be used to compare the two groups. If there will be three or more groups, then the Analysis of Variance (ANOVA) should be used. More details for each of the statistics are given below. Also read more about the theory behind p-values to help you understand what this statistic means.
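The decision rule in the paragraph above can be summarized as a short function. This is only a sketch of the guideline as stated here; the function name `choose_statistic` and its parameters are hypothetical.

```python
def choose_statistic(pre_post_design: bool, n_groups: int) -> str:
    """Pick the statistic for an "effect" or "difference" hypothesis.

    pre_post_design: True if the instrument is administered twice
                     (pre-test and post-test).
    n_groups:        number of groups being compared.
    """
    if n_groups < 2:
        raise ValueError("At least two groups are needed for a comparison")
    if pre_post_design:
        return "ANCOVA"
    if n_groups == 2:
        return "t-test"
    return "ANOVA"


print(choose_statistic(pre_post_design=True, n_groups=2))
print(choose_statistic(pre_post_design=False, n_groups=2))
print(choose_statistic(pre_post_design=False, n_groups=3))
```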

*t*-test

When comparing two groups on one dependent variable, a *t-test* should be used. For example, use a t-test to compare a treatment group to a control group or to compare males and females.
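As an illustration, the sketch below computes the *t* value and degrees of freedom for two independent groups. It assumes the two groups have roughly equal variances (the pooled-variance form of the t-test), uses hypothetical scores, and leaves the p-value to statistical software. The function name `independent_t` is illustrative.

```python
import math
import statistics


def independent_t(group_a, group_b):
    """t value and degrees of freedom for two independent groups
    (pooled-variance form, assuming roughly equal group variances)."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)

    df = n_a + n_b - 2
    pooled_variance = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    standard_error = math.sqrt(pooled_variance * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / standard_error, df


# Hypothetical post-test scores (illustration only).
treatment = [78, 85, 90, 72, 88]
control = [70, 75, 80, 68, 77]

t_value, df = independent_t(treatment, control)
print(f"t({df}) = {t_value:.2f}")
```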

__ANOVA__

When comparing three or more groups on one dependent variable, an *Analysis of Variance* is the statistic to use. There are two basic types of ANOVAs.

**One-way ANOVA:** A one-way ANOVA compares multiple groups on the same variable. For example, a one-way ANOVA would be used to compare the achievement motivation of students in JS1, JS2, and JS3.

**Factorial ANOVA:** The factorial ANOVA compares the effect of multiple independent variables on one dependent variable. For example, a 2x3 factorial ANOVA could compare the effects of gender and grade level on achievement motivation. The first independent variable, gender, has two levels (male and female) and the second independent variable, grade level, has three levels (JS1, JS2, and JS3). This makes the factorial ANOVA a 2x3. Another study might have three treatment groups and three grade levels. Because the independent variables each have three levels, it would be a 3x3 ANOVA.
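For the one-way case, the *F* value is the ratio of the variability between the group means to the variability within the groups. The following is a minimal sketch with hypothetical achievement motivation scores for JS1, JS2, and JS3. The function name `one_way_anova` is illustrative, and the p-value is not computed.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of scores."""
    all_scores = [score for group in groups for score in group]
    grand_mean = sum(all_scores) / len(all_scores)
    group_means = [sum(group) / len(group) for group in groups]

    # Variability of the group means around the grand mean.
    ss_between = sum(len(group) * (mean - grand_mean) ** 2
                     for group, mean in zip(groups, group_means))
    # Variability of individual scores around their own group mean.
    ss_within = sum((score - mean) ** 2
                    for group, mean in zip(groups, group_means)
                    for score in group)

    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Hypothetical achievement motivation scores (illustration only).
js1 = [4, 5, 6]
js2 = [6, 7, 8]
js3 = [8, 9, 10]

f_value, df_between, df_within = one_way_anova([js1, js2, js3])
print(f"F({df_between}, {df_within}) = {f_value:.2f}")
```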

__ANCOVA__

When using a pre-post test research design, the *Analysis of Covariance* allows a comparison of post-test scores with pre-test scores factored out. For example, if comparing a treatment and control group on achievement motivation with a pre-post test design, the ANCOVA will compare the treatment and control groups' post-test scores by statistically setting the pre-test scores as being equal.
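One way to see what "statistically setting the pre-test scores as being equal" means is to compute adjusted post-test means. The sketch below is not a full ANCOVA (it does not compute F or the p-value). It assumes a single covariate and a common within-group slope, and it uses hypothetical data. The function name `adjusted_post_means` is illustrative.

```python
def adjusted_post_means(groups):
    """Post-test means adjusted to a common pre-test mean.

    `groups` maps a group name to a list of (pre, post) score pairs.
    Assumes one covariate and the same pre/post slope in every group.
    """
    all_pre = [pre for pairs in groups.values() for pre, _ in pairs]
    grand_pre_mean = sum(all_pre) / len(all_pre)

    sxy = sxx = 0.0
    means = {}
    for name, pairs in groups.items():
        pre_mean = sum(pre for pre, _ in pairs) / len(pairs)
        post_mean = sum(post for _, post in pairs) / len(pairs)
        means[name] = (pre_mean, post_mean)
        sxy += sum((pre - pre_mean) * (post - post_mean) for pre, post in pairs)
        sxx += sum((pre - pre_mean) ** 2 for pre, _ in pairs)

    slope = sxy / sxx  # pooled within-group slope of post-test on pre-test
    return {name: post_mean - slope * (pre_mean - grand_pre_mean)
            for name, (pre_mean, post_mean) in means.items()}


# Hypothetical (pre-test, post-test) scores (illustration only).
data = {
    "treatment": [(10, 15), (12, 17), (14, 19)],
    "control": [(8, 9), (10, 11), (12, 13)],
}

adjusted = adjusted_post_means(data)
print(adjusted)
```

In this hypothetical data the treatment group started with higher pre-test scores, so the gap between the adjusted post-test means is smaller than the gap between the raw post-test means.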

The *p*-value

The statistics used to test research hypotheses (correlation, chi-square, t-test, ANOVA, and ANCOVA) are called inferential statistics. Educational researchers can almost never study the entire population. Instead, a sample is chosen to represent the population. However, the researcher still wants to draw conclusions about the entire population even though only a sample actually participated in the study. In other words, the researcher wants to make inferences about the population based on the results from the sample. The purpose of inferential statistics is to determine whether the findings from the sample can generalize to the entire population, or whether the findings were simply the result of chance.

Imagine a room full of socks - socks from the floor to the ceiling, from the back of the room clear to the front door. You want to determine whether there are more white socks than green socks in the room. However, there are too many socks to count, so you decide to take a sample of socks. You count the number of white and green socks in the sample. Then, you would like to draw a conclusion about whether there are more white socks in the entire room based on your sample. The purpose of inferential statistics is to determine whether the colors chosen in the sample likely reflect the entire room or if your results from the sample of socks were due to chance.

What factors will determine whether the sample of socks adequately represents the entire room? First, the size of the sample. If only two socks were picked, they would very likely not represent the entire room. The larger the sample is, the more representative the sample will be of the entire room and the more accurate the conclusions will be for the entire room. This is why when conducting experiments, a larger sample is generally better (although not always). With large samples, the results will more likely reflect the entire population.
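The effect of sample size can be illustrated with a small simulation of the sock example. The sketch below assumes, purely for illustration, that 60% of the socks in the room are white and that each sock is drawn independently. It repeats the sampling many times and measures how much the proportion of white socks varies from sample to sample. The function name and all parameter values are hypothetical.

```python
import random
import statistics


def spread_of_sample_proportions(sample_size, true_share_white=0.6,
                                 repetitions=500, seed=1):
    """Standard deviation of the share of white socks across many samples."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    proportions = []
    for _ in range(repetitions):
        whites = sum(rng.random() < true_share_white
                     for _ in range(sample_size))
        proportions.append(whites / sample_size)
    return statistics.pstdev(proportions)


small_sample_spread = spread_of_sample_proportions(sample_size=10)
large_sample_spread = spread_of_sample_proportions(sample_size=1000)

# Samples of 1000 socks stay much closer to the true share than samples of 10.
print(f"Spread with 10 socks:   {small_sample_spread:.3f}")
print(f"Spread with 1000 socks: {large_sample_spread:.3f}")
```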

The second factor that determines whether the sample of socks adequately represents the entire room is the actual size of the difference between white and green socks in the entire room. If there are only two more white socks than green socks in the entire room, then it will be very difficult to find a significant difference between white and green socks in the sample. In other words, because there is only a very small difference between green and white socks in reality, it will be practically impossible to find a significant difference in the sample. On the other hand, if there are thousands more white socks than green socks in the entire room, it should be relatively easy to find a significant difference in the sample. This means that when you are conducting a research study, you should try to ensure that there really might be a large difference between groups in reality. Otherwise, you are unlikely to find significant results. If conducting an experimental design, plan very well to make the treatment very effective. Very effective treatments result in large changes in the dependent variable and increase the chance of finding a significant difference in the study. This is also why large sample sizes are not always best: if the sample is too large for the treatment to be administered well, the treatment might not be very effective, which will decrease the chance of getting a significant result.

Another way of thinking about significance testing is this: imagine you wanted to determine if there was a difference between males and females in science achievement. To do this, you administer a science achievement test to 50 males and 50 females. Then you calculate the mean (average) science achievement score for the males and the mean (average) science achievement score for the females. It is practically impossible for the mean scores to be exactly identical. In other words, there will *always* be at least some small difference between the groups. However, this difference may be very small: perhaps the mean score for the males is 50.21 (out of 100) while the mean score for the females is 50.25. Yes, there is a difference between males and females. However, is this difference large enough to be significant, a meaningful difference? The inferential statistic will determine whether this difference is large enough to conclude that yes, the difference is significant and there is a meaningful difference between males and females in science achievement.

For the t-test, ANOVA, and ANCOVA, four statistics are important to report. First, the p-value determines whether the differences between the groups are significant. If the p-value is less than 0.05, then we say that the differences are significant and the null hypothesis can be rejected. For example, if the null hypothesis was that there is no significant difference between males and females on achievement motivation and the p-value is 0.02, then we reject the null hypothesis and say there __is__ a significant difference between males and females in achievement motivation. However, if the p-value is greater than 0.05, then the statistic is not significant. This means the null hypothesis is retained: there is __no__ significant difference between males and females in achievement motivation.
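The decision rule for the p-value can be written as a short function. This is a sketch of the 0.05 rule described above; the function name `decision` is hypothetical. The text does not say what to do when the p-value is exactly 0.05, so the sketch assumes the common convention of rejecting only when the p-value is strictly less than 0.05.

```python
ALPHA = 0.05  # significance level used throughout this step


def decision(p_value, alpha=ALPHA):
    """Return the decision about the null hypothesis for a given p-value."""
    if not 0 <= p_value <= 1:
        raise ValueError("A p-value must be between 0 and 1")
    # Assumption: reject only when p is strictly less than alpha.
    if p_value < alpha:
        return "reject the null hypothesis"
    return "retain the null hypothesis"


print(decision(0.02))  # the example from the text: p = 0.02
print(decision(0.20))
```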

When reporting the p-value, the value of **t** (for *t*-test) or **F** (for ANOVA and ANCOVA) and the number of **degrees of freedom** must also be included. The **mean** scores and **standard deviation** for each of the groups on the dependent variable must also be reported, which helps the reader to interpret which group has the highest average on the dependent variable.

Any of the previously mentioned statistics can be calculated using the VassarStats website for free.


Copyright 2013, Katrina A. Korb, All Rights Reserved