Chapter 6 Hypothesis Testing
Hypothesis testing is the process of using data to evaluate a claim.
6.1 Stating Hypotheses
In psychology, there are generally two hypotheses of interest: the alternative and the null. The alternative hypothesis is the researcher's claim, often that a treatment has an effect on the participants. The null hypothesis states the opposite: that there is no effect. They are written like this:
\(\Large H_0: \mu_1 - \mu_2 = 0\)
\(\Large H_1: \mu_1 - \mu_2 \neq 0\)
In the above, the null hypothesis (\(H_0\)) states that one mean minus a second mean equals zero. In other words, the two means are equal. If one population was treated differently than the other, the null hypothesis suggests that the treatment had no effect.
The alternative hypothesis (\(H_1\)) states the opposite: that the two means differ from one another. In other words, the treatment had an effect. This is a vast oversimplification but it gets the point across.
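To make this concrete, here is a minimal sketch of testing \(H_0: \mu_1 - \mu_2 = 0\) with a pooled two-sample t statistic. The group scores are made up for illustration, and the critical value assumes a two-tailed test at \(\alpha = .05\):

```python
# Hypothetical treatment and control scores (not from the text).
import statistics as st

treatment = [5.1, 6.3, 5.8, 7.0, 6.1]
control = [4.2, 5.0, 4.8, 5.5, 4.9]

n1, n2 = len(treatment), len(control)
m1, m2 = st.mean(treatment), st.mean(control)
v1, v2 = st.variance(treatment), st.variance(control)  # sample variances

# Pooled variance, assuming equal population variances.
sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
t = (m1 - m2) / (sp2 * (1 / n1 + 1 / n2)) ** 0.5

# Two-tailed critical value for df = 8 at alpha = .05 is about 2.306.
reject_null = abs(t) > 2.306
```

If `reject_null` comes out `True`, the data disagree with the null hypothesis enough for us to reject it; otherwise we fail to reject it.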
The hypothesis testing process aids us in deciding which hypothesis to believe. Based on the evidence provided by the data, we can reject the null hypothesis. That makes sense, right? If our data disagree that the two means are equal, we reject the hypothesis that the two means are equal. Pretty straightforward. But here is where it gets complicated (I totally understand if you have to read this over a couple times. Or \(20\)).
Even if we reject the null hypothesis, that does not mean we can accept the alternative hypothesis.
And to complicate it even further, we can never accept that null hypothesis either. We can only fail to reject the null hypothesis. How’s that for a triple negative?
Interested readers can read more elsewhere5, but here is the bottom line: traditional hypothesis testing can only help us to reject or fail to reject the null hypothesis. Nothing more, nothing less.
6.2 Rejecting the Null
To reject the null hypothesis, your data must disagree with what the null hypothesis claims to be true. But we know that our methods are prone to error. Think about the following example. Your friend says that the class average on the exam was \(90\)%. You disagree with him and think it’s something else. You state your hypotheses:
\(\Large H_0: \mu = 90\%\)
\(\Large H_1: \mu \neq 90\%\)
And say we asked \(5\) of the \(20\) students what they got on the test. The average of their scores was \(88\)%. That’s different from \(90\)%, but not by much. How much different does it have to be for us to reject the null hypothesis? Well, that typically has to do with how spread out the data is. If the scores varied by \(20\)% on average, then a \(2\)% difference might seem trivial. Your friend is probably right. But say that all the students in the sample scored \(88\)%. There is little variation there, so the \(2\)% difference might be meaningful.
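The two scenarios above can be sketched in code. The scores below are hypothetical, but both samples average \(88\)%; only their spread differs, and that alone changes the conclusion:

```python
# One-sample t-test of H0: mu = 90 under two amounts of spread.
import statistics as st

def t_stat(scores, mu0=90):
    n = len(scores)
    se = st.stdev(scores) / n ** 0.5  # standard error of the mean
    return (st.mean(scores) - mu0) / se

spread_out = [70, 95, 88, 100, 87]  # mean 88, lots of variation
tight = [88, 87, 89, 88, 88]        # mean 88, little variation

# Two-tailed critical value for df = 4 at alpha = .05 is about 2.776.
print(abs(t_stat(spread_out)) > 2.776)  # False: fail to reject H0
print(abs(t_stat(tight)) > 2.776)       # True: reject H0
```

Same \(2\)% difference from the null value in both cases, but only the low-variation sample lets us reject the null hypothesis.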
6.3 Alpha Levels
It’s safe to say that, in general, researchers want to reject the null hypothesis. They like to show that things are related or have an influence on other things. But remember how statistics is prone to error? Even decisions about hypothesis testing are prone to error. A researcher who rejects their null hypothesis can be wrong (Whaaaaaat, researchers can be wrong? What a concept.).
Alpha levels are important to hypothesis testing because they tell us how often we will reject the null hypothesis and be wrong. Psychological researchers often set the alpha level to \(.05\), meaning that they are willing to accept being wrong \(5\)% of the time. More accurately, they will reject the null hypothesis when it is actually true \(5\)% of the time.
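One way to see what an alpha of \(.05\) means is to simulate it: draw many samples from a population where the null is true, test each one, and count how often the test rejects anyway. The population values below are made up for illustration:

```python
# Simulating the Type I error rate: the null is true in every trial,
# yet about 5% of tests reject it.
import random
import statistics as st

random.seed(1)
rejections = 0
trials = 5000
for _ in range(trials):
    # Null is true: the population mean really is 90.
    sample = [random.gauss(90, 10) for _ in range(30)]
    se = st.stdev(sample) / 30 ** 0.5
    t = (st.mean(sample) - 90) / se
    # Two-tailed critical value for df = 29 at alpha = .05 is about 2.045.
    if abs(t) > 2.045:
        rejections += 1

print(rejections / trials)  # close to 0.05
```

The rejection rate hovers around \(.05\): that is the \(5\)% of the time we reject a true null hypothesis.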
6.4 Type I & Type II Errors
There are actually two types of errors in hypothesis testing. Type I errors (\(\alpha\)) happen when a researcher rejects the null hypothesis when it is actually true. Our alpha level reflects the rate of this type of error. In addition, Type II errors (\(\beta\)) happen when a researcher fails to reject the null hypothesis when it is actually not true. I remember them by thinking to myself alpha is Type I because 1 comes first and alpha comes first…
Which is worse? Psychology students are usually taught that Type I errors are typically worse. After all, if we incorrectly determine that a treatment is beneficial, when it is actually harmful, that would be a pretty serious error. On the flipside, if a treatment is actually beneficial but our hypothesis test tells us that it’s not, then people may suffer without that treatment. A more nuanced view of this dilemma asks us to look at these things in context and ask more questions. What are the possible benefits? What are the possible consequences?
6.5 Power and Sample Size
Just as Type I errors correspond to alpha levels, Type II errors correspond to power. Power is the rate at which we will correctly reject the null hypothesis when it is false. In other words, power is \(1 -\) the Type II error rate, or \(1 - \beta\). The following table will help make sense of these concepts and how they are related:
| | Reject the Null | Fail to Reject the Null |
|---|---|---|
| Null is True | Type I Error (\(\alpha\)) | Correct |
| Null is False | Power (\(1 - \beta\)) | Type II Error (\(\beta\)) |
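Power can also be estimated by simulation: make the null false on purpose, run many tests, and count how often they correctly reject it. The means, standard deviation, and sample sizes below are hypothetical, chosen only to show that larger samples buy more power:

```python
# Estimating power by simulation: the null (mu = 90) is false in every
# trial, so rejecting it is the correct decision.
import random
import statistics as st

random.seed(2)

def rejects(true_mean, n=30, mu0=90, crit=2.045):
    """One t-test of H0: mu = mu0 on a sample whose true mean is true_mean."""
    sample = [random.gauss(true_mean, 10) for _ in range(n)]
    se = st.stdev(sample) / n ** 0.5
    return abs((st.mean(sample) - mu0) / se) > crit

trials = 2000
# True mean is 85, not 90: the null is false.
power = sum(rejects(85) for _ in range(trials)) / trials
beta = 1 - power  # Type II error rate

# Same effect, larger sample (critical value for df = 99 is about 1.984).
power_n100 = sum(rejects(85, n=100, crit=1.984) for _ in range(trials)) / trials
```

With \(n = 30\) the test catches the effect only part of the time; with \(n = 100\) it almost always does. That trade-off between power and sample size is exactly what this section's title refers to.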
6.6 Directional and Non-Directional
insert footnote here↩︎