The main purpose of statistics is to test a hypothesis. For example, you might run an experiment and find that a certain drug is effective at treating headaches. But if you can’t repeat that experiment, no one will take your results seriously. A good example of this was the cold fusion discovery, which faded into obscurity because no one was able to duplicate the results.
A hypothesis can be almost anything at all, as long as you can put it to the test.
If you are going to propose a hypothesis, it’s customary to write a statement. Your statement will look like this:
“If I…(do this to an independent variable)….then (this will happen to the dependent variable).”
Hypothesis testing in statistics is a way for you to test the results of a survey or experiment to see if you have meaningful results. You’re basically testing whether your results are valid by figuring out the odds that your results have happened by chance. If your results may have happened by chance, the experiment won’t be repeatable and so has little use.
Hypothesis testing can be one of the most confusing aspects for students, mostly because before you can even perform a test, you have to know what your null hypothesis is. Often, the tricky word problems you are faced with can be difficult to decipher. But it’s easier than you think: the key is to figure out what your null hypothesis is before anything else.
If you trace back the history of science, the null hypothesis is always the accepted fact. A simple example of a null hypothesis that was generally accepted as being true is “Pluto is a planet” (before 2006).
You won’t be required to actually perform a real experiment or survey in elementary statistics (or even disprove a fact like “Pluto is a planet”!), so you’ll be given word problems from real-life situations. You’ll need to figure out what your hypothesis is from the problem. This can be a little trickier than just figuring out what the accepted fact is. With word problems, you are looking to find a fact that is nullifiable (i.e. something you can reject).
For example, suppose a researcher believes that the average recovery time for a certain treatment is more than 8.2 weeks. That hypothesis statement can be written in mathematical terms as:
H1: μ > 8.2
Next, you’ll need to state the null hypothesis (See: How to state the null hypothesis). That’s what will happen if the researcher is wrong. In the above example, if the researcher is wrong then the recovery time is less than or equal to 8.2 weeks. In math, that’s:
H0: μ ≤ 8.2
Ten or so years ago, we believed that there were 9 planets in the solar system. Pluto was demoted as a planet in 2006. The null hypothesis of “Pluto is a planet” was replaced by “Pluto is not a planet.” Of course, rejecting the null hypothesis isn’t always that easy — the hard part is usually figuring out what your null hypothesis is in the first place.
Hypothesis Testing Examples (One Sample Z Test)
The one sample z test isn’t used very often (because we rarely know the actual population standard deviation). However, it’s a good idea to understand how it works as it’s one of the simplest tests you can perform in hypothesis testing. In English class you got to learn the basics (like grammar and spelling) before you could write a story; think of one sample z tests as the foundation for understanding more complex hypothesis testing. This page contains two hypothesis testing examples for one sample z-tests.
One Sample Hypothesis Testing Examples: #2
A principal at a certain school claims that the students in his school are of above average intelligence. A random sample of thirty students’ IQ scores has a mean of 112. Is there sufficient evidence to support the principal’s claim? The mean population IQ is 100 with a standard deviation of 15.
Step 1: State the Null hypothesis. The accepted fact is that the population mean is 100, so: H0: μ=100.
Step 2: State the Alternate Hypothesis. The claim is that the students have above average IQ scores, so:
H1: μ > 100.
The fact that we are looking for scores “greater than” a certain point means that this is a one-tailed test.
Step 3: Draw a picture to help you visualize the problem.
Step 4: State the alpha level. If you aren’t given an alpha level, use 5% (0.05).
Step 5: Find the rejection region area (given by your alpha level above) from the z-table. An upper-tail area of .05 corresponds to a z-score of 1.645.
Step 6: Find the test statistic using this formula: z = (x̄ − μ) / (σ/√n).
For this set of data: z = (112 − 100) / (15/√30) ≈ 4.38.
Step 7: If the test statistic from Step 6 is greater than the critical value from Step 5, reject the null hypothesis. If it’s less, you cannot reject the null hypothesis. In this case, it is greater (4.38 > 1.645), so you can reject the null.
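The arithmetic in Steps 5 through 7 can be sketched in a few lines of Python (the function name `one_sample_z` is ours; `statistics.NormalDist` is in the standard library):

```python
from statistics import NormalDist

def one_sample_z(sample_mean, pop_mean, pop_sd, n, alpha=0.05):
    """Right-tailed one-sample z test.
    Returns (test statistic, critical value, reject H0?)."""
    z = (sample_mean - pop_mean) / (pop_sd / n ** 0.5)
    z_crit = NormalDist().inv_cdf(1 - alpha)  # 1.645 when alpha = 0.05
    return z, z_crit, z > z_crit

# IQ example: x-bar = 112, mu0 = 100, sigma = 15, n = 30
z, z_crit, reject = one_sample_z(112, 100, 15, 30)
```

Here `z` comes out around 4.38, which exceeds the 1.645 cutoff, so the null is rejected, just as in the worked example.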
One Sample Hypothesis Testing Examples: #3
Blood glucose levels for obese patients have a mean of 100 with a standard deviation of 15. A researcher thinks that a diet high in raw cornstarch will have a positive or negative effect on blood glucose levels. A sample of 30 patients who have tried the raw cornstarch diet has a mean glucose level of 140. Test the hypothesis that the raw cornstarch had an effect.
Step 1: State the null hypothesis: H0: μ = 100
Step 2: State the alternate hypothesis: H1: μ ≠ 100
Step 3: State your alpha level. We’ll use 0.05 for this example. As this is a two-tailed test, split the alpha into two (0.025 in each tail).
Step 4: Find the z-score associated with your alpha level. You’re looking for the area in one tail only. The z-score for an area of 0.975 (1 − 0.025 = 0.975) is 1.96. As this is a two-tailed test, you would also be considering the left tail (z = −1.96).
Step 5: Find the test statistic using this formula: z = (x̄ − μ) / (σ/√n). For this set of data: z = (140 − 100) / (15/√30) ≈ 14.60.
Step 6: If the test statistic from Step 5 is less than −1.96 or greater than 1.96 (the critical values from Step 4), reject the null hypothesis. In this case, 14.60 > 1.96, so you can reject the null.
*This process is made much easier if you use a TI-83 or Excel to calculate the z-score (the test statistic).
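A minimal two-tailed sketch of the same calculation in Python (variable names are ours):

```python
from statistics import NormalDist

# Cornstarch example: H0: mu = 100, H1: mu != 100
# x-bar = 140, sigma = 15, n = 30, alpha = 0.05
z = (140 - 100) / (15 / 30 ** 0.5)           # test statistic
z_crit = NormalDist().inv_cdf(1 - 0.05 / 2)  # split alpha across both tails
reject = abs(z) > z_crit                     # reject if z lands in either tail
```

Note that for a two-tailed test we compare |z| against the critical value computed from α/2, which is exactly the “split the alpha into two” step above.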
Hypothesis Testing Examples: Mean (Using TI 83)
You can use the TI 83 calculator for hypothesis testing, but the calculator won’t figure out the null and alternate hypotheses; that’s up to you to read the question and input it into the calculator.
Sample problem: A sample of 200 people has a mean age of 21 with a population standard deviation (σ) of 5. Test the hypothesis that the population mean is 18.9 at α = 0.05.
Step 1: State the null hypothesis. In this case, the null hypothesis is that the population mean is 18.9, so we write:
H0: μ = 18.9
Step 2: State the alternative hypothesis. We want to know if our sample, which has a mean of 21 instead of 18.9, really is different from the population, therefore our alternate hypothesis:
H1: μ ≠ 18.9
Step 3: Press Stat then press the right arrow twice to select TESTS.
Step 4: Press 1 to select 1:Z-Test…. Press ENTER.
Step 5: Use the right arrow to select Stats.
Step 6: Enter the data from the problem: μ0 = 18.9, σ = 5, x̄ = 21, n = 200.
Step 7: Arrow down to Calculate and press ENTER. The calculator shows the p-value:
p = 2.87 × 10⁻⁹
This is smaller than our alpha value of .05. That means we should reject the null hypothesis.
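If you don’t have a TI 83 handy, the same test can be reproduced with the Python standard library (a sketch; variable names are ours):

```python
from statistics import NormalDist

# Same test as the TI 83 steps above, done directly:
# H0: mu = 18.9, H1: mu != 18.9, sigma = 5, n = 200, x-bar = 21
z = (21 - 18.9) / (5 / 200 ** 0.5)  # test statistic, about 5.94
p = 2 * (1 - NormalDist().cdf(z))   # two-tailed p-value
reject = p < 0.05
```

The resulting p-value is on the order of 10⁻⁹, in line with the calculator’s output, so the null hypothesis is rejected.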
Bayesian Hypothesis Testing: What is it?
Bayesian hypothesis testing helps to answer the question: Can the results from a test or survey be repeated?
Why do we care if a test can be repeated? Let’s say twenty people in the same village came down with leukemia. A group of researchers find that cell-phone towers are to blame. However, a second study found that cell-phone towers had nothing to do with the cancer cluster in the village. In fact, they found that the cancers were completely random. If that sounds impossible, it actually can happen! Clusters of cancer can happen simply by chance. There could be many reasons why the first study was faulty. One of the main reasons could be that they just didn’t take into account that sometimes things happen randomly and we just don’t know why.
It’s good science to let people know if your study results are solid, or if they could have happened by chance. The usual way of doing this is to test your results with a p-value. A p-value is a number that you get by running a hypothesis test on your data. A p-value of 0.05 (5%) or less is usually enough to claim that your results are repeatable. However, there’s another way to test the validity of your results: Bayesian hypothesis testing. This type of testing gives you another way to test the strength of your results.
Bayesian Hypothesis Testing.
Traditional testing (the type you probably came across in elementary stats or AP stats) is called non-Bayesian. It is based on how often an outcome happens over repeated runs of the experiment. It’s an objective view of whether an experiment is repeatable.
Bayesian hypothesis testing is a subjective view of the same thing. It takes into account how much faith you have in your results. In other words, would you wager money on the outcome of your experiment?
Differences Between Traditional and Bayesian Hypothesis Testing.
Traditional testing (non-Bayesian) requires you to repeat sampling over and over, while Bayesian testing does not. The main difference between the two is in the first step of testing: stating a probability model. In Bayesian testing you add prior knowledge to this step. It also requires use of a posterior probability, which is the conditional probability given to a random event after all the evidence is considered.
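The posterior-probability idea can be illustrated with a toy Bayes’ theorem calculation (all the numbers below are invented for illustration; they are not from any study):

```python
# Toy Bayesian update: prior belief in H1 ("the treatment works"),
# plus how likely the observed data are under each hypothesis.
prior_h1 = 0.5                   # prior: 50/50 before seeing data (assumed)
lik_data_given_h1 = 0.8          # P(data | H1), assumed for illustration
lik_data_given_h0 = 0.2          # P(data | H0), assumed for illustration

# Bayes' theorem: P(H1 | data) = P(data | H1) P(H1) / P(data)
evidence = prior_h1 * lik_data_given_h1 + (1 - prior_h1) * lik_data_given_h0
posterior_h1 = prior_h1 * lik_data_given_h1 / evidence  # 0.8 with these numbers
```

The posterior (0.8 here) is the updated degree of belief after the evidence is considered, which is exactly the subjective quantity Bayesian testing works with.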
Arguments for Bayesian Testing.
Many researchers think that it is a better alternative to traditional testing, because it:
- Includes prior knowledge about the data.
- Takes into account personal beliefs about the results.
Arguments Against Bayesian Testing.
Other researchers argue against it, because:
- Including prior data or knowledge isn’t always justifiable.
- It is difficult to calculate compared to non-Bayesian testing.
Hypothesis Testing Articles
- What is Ad Hoc Testing?
- What is a Rejection Region?
- What is a Two Tailed Test?
- How to Decide if a Hypothesis Test is a One Tailed Test or a Two Tailed Test.
- How to Decide if a Hypothesis is a Left Tailed Test or a Right-Tailed Test.
- How to State the Null Hypothesis in Statistics.
- How to Find a Critical Value.
- How to Support or Reject a Null Hypothesis.
- Chi Square Test for Normality
- Cochran-Mantel-Haenszel Test
- F Test
- Granger Causality Test.
- Hotelling’s T-Squared
- KPSS Test.
- What is a Likelihood-Ratio Test?
- Log rank test.
- Sequential Probability Ratio Test
- How to Run a Sign Test.
- T Test: one sample.
- T-Test: Two sample.
- Welch’s ANOVA.
- Welch’s Test for Unequal Variances.
- Z-Test: one sample.
- Z Test: Two Proportion
- Wald Test.
- What is an Acceptance Region?
- How to Calculate Chebyshev’s Theorem.
- Decision Rule.
- Degrees of Freedom.
- False Discovery Rate
- How to calculate the Least Significant Difference.
- Levels in Statistics.
- How to Calculate Margin of Error.
- Mean Difference (Difference in Means)
- The Multiple Testing Problem.
- What is the Neyman-Pearson Lemma?
- How to Find a Sample Size (General Instructions).
- Sig 2(Tailed) meaning in results
- What is a Standardized Test Statistic?
- How to Find Standard Error
- Standardized values: Example.
- How to Calculate a T-Score.
- T-Score vs. Z-Score.
- Testing a Single Mean.
- Unequal Sample Sizes.
- Uniformly Most Powerful Tests.
- How to Calculate a Z-Score.
Hypothesis Testing: Upper-, Lower, and Two Tailed Tests
The procedure for hypothesis testing is based on the ideas described above. Specifically, we set up competing hypotheses, select a random sample from the population of interest and compute summary statistics. We then determine whether the sample data supports the null or alternative hypotheses. The procedure can be broken down into the following five steps.
- Step 1. Set up hypotheses and select the level of significance α.
H0: Null hypothesis (no change, no difference);
H1: Research hypothesis (investigator's belief); α =0.05
Upper-tailed, Lower-tailed, Two-tailed Tests
The research or alternative hypothesis can take one of three forms. An investigator might believe that the parameter has increased, decreased or changed. For example, an investigator might hypothesize that the mean has increased (H1: μ > μ0), decreased (H1: μ < μ0), or changed (H1: μ ≠ μ0).
The exact form of the research hypothesis depends on the investigator's belief about the parameter of interest and whether it has possibly increased, decreased or is different from the null value. The research hypothesis is set up by the investigator before any data are collected.
- Step 2. Select the appropriate test statistic.
The test statistic is a single number that summarizes the sample information. An example of a test statistic is the Z statistic, computed as follows: Z = (x̄ − μ0) / (s/√n).
When the sample size is small, we will use t statistics (just as we did when constructing confidence intervals for small samples). As we present each scenario, alternative test statistics are provided along with conditions for their appropriate use.
- Step 3. Set up decision rule.
The decision rule is a statement that tells under what circumstances to reject the null hypothesis. The decision rule is based on specific values of the test statistic (e.g., reject H0 if Z > 1.645). The decision rule for a specific test depends on 3 factors: the research or alternative hypothesis, the test statistic and the level of significance. Each is discussed below.
- The decision rule depends on whether an upper-tailed, lower-tailed, or two-tailed test is proposed. In an upper-tailed test the decision rule has investigators reject H0 if the test statistic is larger than the critical value. In a lower-tailed test the decision rule has investigators reject H0 if the test statistic is smaller than the critical value. In a two-tailed test the decision rule has investigators reject H0 if the test statistic is extreme, either larger than an upper critical value or smaller than a lower critical value.
- The exact form of the test statistic is also important in determining the decision rule. If the test statistic follows the standard normal distribution (Z), then the decision rule will be based on the standard normal distribution. If the test statistic follows the t distribution, then the decision rule will be based on the t distribution. The appropriate critical value will be selected from the t distribution again depending on the specific alternative hypothesis and the level of significance.
- The third factor is the level of significance. The level of significance which is selected in Step 1 (e.g., α =0.05) dictates the critical value. For example, in an upper tailed Z test, if α =0.05 then the critical value is Z=1.645.
The following figures illustrate the rejection regions defined by the decision rule for upper-, lower- and two-tailed Z tests with α=0.05. Notice that the rejection regions are in the upper, lower and both tails of the curves, respectively. The decision rules are written below each figure.
Rejection Region for Upper-Tailed Z Test (H1: μ > μ0 ) with α=0.05
The decision rule is: Reject H0 if Z > 1.645.
Rejection Region for Lower-Tailed Z Test (H1: μ < μ0) with α = 0.05
The decision rule is: Reject H0 if Z < -1.645.
Rejection Region for Two-Tailed Z Test (H1: μ ≠ μ0) with α = 0.05
The decision rule is: Reject H0 if Z < -1.960 or if Z > 1.960.
The complete table of critical values of Z for upper, lower and two-tailed tests can be found in the table of Z values to the right in "Other Resources."
Critical values of t for upper, lower and two-tailed tests can be found in the table of t values in "Other Resources."
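The decision rules above can be sketched as a small helper in Python (the function name `z_critical` is ours; `statistics.NormalDist` is in the standard library):

```python
from statistics import NormalDist

def z_critical(alpha, tail):
    """Critical z value for an 'upper', 'lower', or 'two' tailed test."""
    nd = NormalDist()
    if tail == "upper":
        return nd.inv_cdf(1 - alpha)      # reject if Z > this value
    if tail == "lower":
        return nd.inv_cdf(alpha)          # reject if Z < this value
    return nd.inv_cdf(1 - alpha / 2)      # two-tailed: reject if |Z| > this value

# With alpha = 0.05 this reproduces the decision rules above:
# upper -> 1.645, lower -> -1.645, two -> 1.960
```

This is the same lookup you would otherwise do in the Z table in "Other Resources."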
- Step 4. Compute the test statistic.
Here we compute the test statistic by substituting the observed sample data into the test statistic identified in Step 2.
- Step 5. Conclusion.
The final conclusion is made by comparing the test statistic (which is a summary of the information observed in the sample) to the decision rule. The final conclusion will be either to reject the null hypothesis (because the sample data are very unlikely if the null hypothesis is true) or not to reject the null hypothesis (because the sample data are not very unlikely).
If the null hypothesis is rejected, then an exact significance level is computed to describe the likelihood of observing the sample data assuming that the null hypothesis is true. The exact level of significance is called the p-value and it will be less than the chosen level of significance if we reject H0.
Statistical computing packages provide exact p-values as part of their standard output for hypothesis tests. In fact, when using a statistical computing package, the steps outlined above can be abbreviated. The hypotheses (Step 1) should always be set up in advance of any analysis and the significance criterion should also be determined (e.g., α = 0.05). Statistical computing packages will produce the test statistic (usually reporting the test statistic as t) and a p-value. The investigator can then determine statistical significance using the following: If p < α then reject H0.
We now use the five-step procedure to test the research hypothesis that the mean weight of men in 2006 is more than 191 pounds. We will assume the sample data are as follows: n = 100, x̄ = 197.1 and s = 25.6.
- Step 1. Set up hypotheses and determine level of significance
H0: μ = 191; H1: μ > 191; α = 0.05
The research hypothesis is that weights have increased, and therefore an upper tailed test is used.
- Step 2. Select the appropriate test statistic.
Because the sample size is large (n > 30), the appropriate test statistic is Z = (x̄ − μ0) / (s/√n).
- Step 3. Set up decision rule.
In this example, we are performing an upper tailed test (H1: μ> 191), with a Z test statistic and selected α =0.05. Reject H0 if Z > 1.645.
- Step 4. Compute the test statistic.
We now substitute the sample data into the formula for the test statistic identified in Step 2: Z = (197.1 − 191) / (25.6/√100) = 6.1/2.56 = 2.38.
- Step 5. Conclusion.
We reject H0 because 2.38 > 1.645. We have statistically significant evidence at α = 0.05 to show that the mean weight of men in 2006 is more than 191 pounds.

Because we rejected the null hypothesis, we now approximate the p-value, which is the likelihood of observing the sample data if the null hypothesis is true. An alternative definition of the p-value is the smallest level of significance where we can still reject H0. In this example, we observed Z = 2.38, and for α = 0.05 the critical value was 1.645. Because 2.38 exceeded 1.645, we rejected H0 and reported a statistically significant increase in mean weight at a 5% level of significance.

Using the table of critical values for upper-tailed tests, we can approximate the p-value. If we select α = 0.025, the critical value is 1.960, and we still reject H0 because 2.38 > 1.960. If we select α = 0.010, the critical value is 2.326, and we still reject H0 because 2.38 > 2.326. However, if we select α = 0.005, the critical value is 2.576, and we cannot reject H0 because 2.38 < 2.576. Therefore, the smallest α where we still reject H0 is 0.010. This is the p-value. A statistical computing package would produce a more precise p-value, which would be between 0.005 and 0.010. Here we are approximating the p-value and would report p < 0.010.
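The five steps for this example can be condensed into a short Python sketch (variable names are ours):

```python
from statistics import NormalDist

# Weight example: n = 100, x-bar = 197.1, s = 25.6, mu0 = 191, upper-tailed
z = (197.1 - 191) / (25.6 / 100 ** 0.5)  # = 6.1 / 2.56, about 2.38
p = 1 - NormalDist().cdf(z)              # exact upper-tailed p-value
reject = z > 1.645                       # decision rule from Step 3
```

The computed p-value falls between 0.005 and 0.010, matching the table-based approximation above.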
Type I and Type II Errors
In all tests of hypothesis, there are two types of errors that can be committed. The first is called a Type I error and refers to the situation where we incorrectly reject H0 when in fact it is true. This is also called a false positive result (as we incorrectly conclude that the research hypothesis is true when in fact it is not). When we run a test of hypothesis and decide to reject H0 (e.g., because the test statistic exceeds the critical value in an upper tailed test) then either we make a correct decision because the research hypothesis is true or we commit a Type I error. The different conclusions are summarized in the table below. Note that we will never know whether the null hypothesis is really true or false (i.e., we will never know which row of the following table reflects reality).
Table - Conclusions in Test of Hypothesis

|             | Do Not Reject H0 | Reject H0        |
|-------------|------------------|------------------|
| H0 is True  | Correct Decision | Type I Error     |
| H0 is False | Type II Error    | Correct Decision |
In the first step of the hypothesis test, we select a level of significance, α, and α= P(Type I error). Because we purposely select a small value for α, we control the probability of committing a Type I error. For example, if we select α=0.05, and our test tells us to reject H0, then there is a 5% probability that we commit a Type I error. Most investigators are very comfortable with this and are confident when rejecting H0 that the research hypothesis is true (as it is the more likely scenario when we reject H0).
When we run a test of hypothesis and decide not to reject H0 (e.g., because the test statistic is below the critical value in an upper tailed test) then either we make a correct decision because the null hypothesis is true or we commit a Type II error. Beta (β) represents the probability of a Type II error and is defined as follows: β=P(Type II error) = P(Do not Reject H0 | H0 is false). Unfortunately, we cannot choose β to be small (e.g., 0.05) to control the probability of committing a Type II error because β depends on several factors including the sample size, α, and the research hypothesis. When we do not reject H0, it may be very likely that we are committing a Type II error (i.e., failing to reject H0 when in fact it is false). Therefore, when tests are run and the null hypothesis is not rejected we often make a weak concluding statement allowing for the possibility that we might be committing a Type II error. If we do not reject H0, we conclude that we do not have significant evidence to show that H1 is true. We do not conclude that H0 is true.
The most common reason for a Type II error is a small sample size.
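The link between sample size and the Type II error rate can be illustrated with a quick simulation (a sketch: the effect size of 0.5, σ = 1, and the trial count are illustrative choices, not from the text):

```python
import random
from statistics import NormalDist

def type_ii_rate(n, true_mean, null_mean=0.0, sd=1.0, alpha=0.05, trials=2000):
    """Estimate beta for an upper-tailed z test by simulating samples under H1."""
    z_crit = NormalDist().inv_cdf(1 - alpha)
    misses = 0
    for _ in range(trials):
        xbar = random.gauss(true_mean, sd / n ** 0.5)  # sample mean when H1 is true
        z = (xbar - null_mean) / (sd / n ** 0.5)
        if z <= z_crit:                                # failed to reject a false H0
            misses += 1
    return misses / trials

random.seed(0)
beta_small = type_ii_rate(n=10, true_mean=0.5)   # small sample: large beta
beta_large = type_ii_rate(n=100, true_mean=0.5)  # large sample: small beta
```

With the larger sample, β drops to near zero for the same effect size and α, which is exactly why small samples are the most common cause of Type II errors.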