When you run a test of statistical significance, whether it is a correlation, an ANOVA, a regression, or some other kind of test, you are given a p-value somewhere in the output. If your test statistic is symmetrically distributed, you can select one of three alternative hypotheses. Two of these correspond to one-tailed tests and one corresponds to a two-tailed test. However, the p-value presented is (almost always) for a two-tailed test. But how do you choose which test to use? Is the p-value appropriate for your test? And if it is not, how can you calculate the correct p-value for your test from the p-value in your output?
What is a two-tailed test?
First, let's start with the meaning of a two-tailed test. If you use a significance level of 0.05, a two-tailed test allots half of your alpha to testing for statistical significance in one direction and half to testing for statistical significance in the other direction. This means that 0.025 is in each tail of the distribution of your test statistic. When using a two-tailed test, regardless of the direction of the relationship you hypothesize, you are testing for the possibility of the relationship in both directions. For example, we may wish to compare the mean of a sample to a given value x using a t-test. Our null hypothesis is that the mean is equal to x. A two-tailed test will test both whether the mean is significantly greater than x and whether the mean is significantly less than x. The mean is considered significantly different from x if the test statistic falls in the top 2.5% or the bottom 2.5% of its probability distribution, resulting in a p-value of less than 0.05.
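The split of alpha into two tails can be checked numerically. The sketch below is only an illustration: it assumes the test statistic follows a standard normal distribution (the t-test in the example would use Student's t, whose cutoffs are slightly wider), and the function name is hypothetical.

```python
from statistics import NormalDist

def two_tailed_cutoffs(alpha=0.05):
    """Return (lower, upper) cutoffs that leave alpha/2 in each tail
    of a standard normal distribution."""
    z = NormalDist()  # mean 0, standard deviation 1 (assumed for illustration)
    lower = z.inv_cdf(alpha / 2)      # 2.5th percentile when alpha = 0.05
    upper = z.inv_cdf(1 - alpha / 2)  # 97.5th percentile when alpha = 0.05
    return lower, upper

lower, upper = two_tailed_cutoffs(0.05)
print(round(lower, 2), round(upper, 2))  # -1.96 1.96
```

A statistic below the lower cutoff or above the upper cutoff lands in one of the two 2.5% tails and would be declared significant at the 0.05 level.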
What is a one-tailed test?
Next, let's discuss the meaning of a one-tailed test. If you use a significance level of 0.05, a one-tailed test allots all of your alpha to testing for statistical significance in the one direction of interest. This means that 0.05 is in one tail of the distribution of your test statistic. When using a one-tailed test, you are testing for the possibility of the relationship in one direction and completely disregarding the possibility of a relationship in the other direction. Let's return to our example comparing the mean of a sample to a given value x using a t-test. Our null hypothesis is that the mean is equal to x. A one-tailed test will test either whether the mean is significantly greater than x or whether the mean is significantly less than x, but not both. Then, depending on the chosen tail, the mean is significantly greater than or less than x if the test statistic falls in the top 5% or the bottom 5% of its probability distribution, resulting in a p-value of less than 0.05. The one-tailed test provides more power to detect an effect in one direction by not testing for the effect in the other direction. A discussion of when this is an appropriate option follows.
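To make the difference in rejection regions concrete, here is a minimal sketch of the decision rule for each alternative. It again assumes a standard normal test statistic rather than Student's t, and the function name and the statistic 1.80 are hypothetical choices for illustration.

```python
from statistics import NormalDist

def rejects_null(stat, alpha=0.05, alternative="two-sided"):
    """Decide whether a standard-normal test statistic rejects the null.

    alternative: "two-sided", "greater", or "less".
    """
    z = NormalDist()
    if alternative == "two-sided":
        return abs(stat) > z.inv_cdf(1 - alpha / 2)  # alpha/2 in each tail
    if alternative == "greater":
        return stat > z.inv_cdf(1 - alpha)           # all of alpha in the upper tail
    if alternative == "less":
        return stat < z.inv_cdf(alpha)               # all of alpha in the lower tail
    raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")

stat = 1.80  # hypothetical statistic between the cutoffs of about 1.645 and 1.96
for alt in ("two-sided", "greater", "less"):
    print(alt, rejects_null(stat, alternative=alt))
```

With this statistic, only the upper one-tailed test rejects the null: the statistic is beyond the one-tailed cutoff but short of the two-tailed one.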
When is a one-tailed test appropriate?
Because the one-tailed test provides more power to detect an effect, you may be tempted to use a one-tailed test whenever you have a hypothesis about the direction of an effect. Before doing so, consider the consequences of missing an effect in the other direction. Imagine you have developed a new drug that you believe is an improvement over an existing drug. You want to maximize your ability to detect the improvement, so you opt for a one-tailed test. In doing so, you fail to test for the possibility that the new drug is less effective than the existing drug. The consequences in this example are extreme, but they illustrate the danger of using a one-tailed test inappropriately.
So when is a one-tailed test appropriate? If you consider the consequences of missing an effect in the untested direction and conclude that they are negligible and in no way irresponsible or unethical, then you can proceed with a one-tailed test. For example, imagine again that you have developed a new drug. It is cheaper than the existing drug and, you believe, no less effective. In testing this drug, you are only interested in testing whether it is less effective than the existing drug. You do not care if it is significantly more effective. You only wish to show that it is not less effective. In this scenario, a one-tailed test would be appropriate.
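The trade-off between extra power in one direction and blindness in the other can be sketched numerically. The example below assumes a test statistic that is normally distributed with standard deviation 1 and a hypothetical true effect of 2 standard errors in either direction; the function name is also hypothetical.

```python
from statistics import NormalDist

def power(shift, alpha=0.05, alternative="two-sided"):
    """Probability of rejecting the null when the test statistic is
    normal with mean `shift` and standard deviation 1."""
    z = NormalDist()                        # distribution under the null
    actual = NormalDist(mu=shift, sigma=1)  # distribution under the true effect
    if alternative == "greater":
        return 1 - actual.cdf(z.inv_cdf(1 - alpha))
    if alternative == "less":
        return actual.cdf(z.inv_cdf(alpha))
    cutoff = z.inv_cdf(1 - alpha / 2)
    return actual.cdf(-cutoff) + (1 - actual.cdf(cutoff))

for shift in (2.0, -2.0):  # hypothetical true effects, in standard-error units
    print(shift,
          round(power(shift, alternative="greater"), 3),
          round(power(shift, alternative="two-sided"), 3))
```

Under these assumptions, the upper one-tailed test detects the positive effect more often than the two-tailed test (roughly 0.64 versus 0.52), but it almost never flags the negative effect, which the two-tailed test still detects about half the time.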
When is a one-tailed test NOT appropriate?
Choosing a one-tailed test for the sole purpose of attaining significance is not appropriate. Choosing a one-tailed test after running a two-tailed test that failed to reject the null hypothesis is not appropriate, no matter how "close" to significant the two-tailed test was. Using statistical tests inappropriately can lead to invalid results that are not replicable and highly questionable, a steep price to pay for a significance star in your results table!
Deriving a one-tailed test from two-tailed output
By default, statistical packages that run tests report two-sided p-values. Because the most commonly used distributions of test statistics (standard normal, Student's t) are symmetric about zero, most one-tailed p-values can be derived from two-tailed p-values.
Below is the output from a two-sample t-test in Stata. The test compares the mean male score to the mean female score. The null hypothesis is that the difference in means is zero. The two-tailed alternative is that the difference in means is not zero. There are two one-tailed alternatives that one could opt to test instead: that the male score is higher than the female score (diff > 0) or that the female score is higher than the male score (diff < 0). In this instance, Stata presents results for all three alternatives. Under the headings Ha: diff < 0 and Ha: diff > 0 are the results for the one-tailed tests. In the middle, under the heading Ha: diff != 0 (which means that the difference is not equal to zero), are the results for the two-tailed test.
Two-sample t test with equal variances

------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
    male |      91    50.12088    1.080274    10.30516    47.97473    52.26703
  female |     109    54.99083    .7790686    8.133715    53.44658    56.53507
---------+--------------------------------------------------------------------
combined |     200      52.775    .6702372    9.478586    51.45332    54.09668
---------+--------------------------------------------------------------------
    diff |           -4.869947    1.304191               -7.441835   -2.298059
------------------------------------------------------------------------------
Degrees of freedom: 198

                  Ho: mean(male) - mean(female) = diff = 0

     Ha: diff < 0               Ha: diff != 0              Ha: diff > 0
       t =  -3.7341                t =  -3.7341              t =  -3.7341
   P < t =   0.0001          P > |t| =   0.0002          P > t =   0.9999
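As a check on where the t statistic in this output comes from, it can be recomputed from the group means, standard deviations, and sample sizes alone. The sketch below uses the standard pooled-variance formula for an equal-variance two-sample t-test; the function name is hypothetical, and the inputs are copied from the male and female rows of the output.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Equal-variance two-sample t statistic from summary statistics.
    Returns (difference, standard error, t, degrees of freedom)."""
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two group variances
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    diff = mean1 - mean2
    return diff, se, diff / se, df

# Values copied from the male and female rows of the output above
diff, se, t, df = pooled_t(50.12088, 10.30516, 91, 54.99083, 8.133715, 109)
print(round(diff, 4), round(se, 4), round(t, 4), df)
```

The results should agree, up to rounding of the inputs, with the reported difference, its standard error, the t statistic of -3.7341, and 198 degrees of freedom.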
Note that the test statistic, -3.7341, is the same for all of these tests. The two-tailed p-value is P > |t|. This can be rewritten as P(t > 3.7341) + P(t < -3.7341). Because the t-distribution is symmetric about zero, these two probabilities are equal: P > |t| = 2 * P(t < -3.7341). Thus, the two-tailed p-value is twice the one-tailed p-value for the alternative hypothesis that (diff < 0). The other one-tailed alternative hypothesis has a p-value of P(t > -3.7341) = 1 - P(t < -3.7341) = 1 - 0.0001 = 0.9999. So, depending on the direction of the one-tailed hypothesis, its p-value is either 0.5 * (two-tailed p-value) or 1 - 0.5 * (two-tailed p-value), provided the test statistic is symmetrically distributed about zero.
In this example, the two-tailed p-value suggests rejecting the null hypothesis of no difference. Had we opted for the one-tailed test of (diff > 0), we would have failed to reject the null because of our choice of tails.
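The conversion rule can be written as a small helper. This is a minimal sketch that assumes, as the text does, a test statistic whose null distribution is symmetric about zero; the function name and its argument names are hypothetical.

```python
def one_tailed_p(two_tailed_p, stat, alternative):
    """Convert a two-tailed p-value to a one-tailed one.

    Assumes the test statistic's null distribution is symmetric about zero.
    alternative: "greater" (parameter > 0) or "less" (parameter < 0).
    """
    if alternative not in ("greater", "less"):
        raise ValueError("alternative must be 'greater' or 'less'")
    half = two_tailed_p / 2
    # The statistic's sign tells us whether the effect is in the predicted direction
    in_predicted_direction = (stat > 0) == (alternative == "greater")
    return half if in_predicted_direction else 1 - half

# Two-tailed p-value and t statistic from the t-test output
print(round(one_tailed_p(0.0002, -3.7341, "less"), 4))     # 0.0001
print(round(one_tailed_p(0.0002, -3.7341, "greater"), 4))  # 0.9999
```

Because the reported two-tailed p-value is itself rounded, the converted values are only as precise as the output they start from.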
The following result is from a regression analysis in Stata. Unlike the previous example, this result only shows two-sided p-values.
      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  2,   197) =   46.58
       Model |  7363.62077     2  3681.81039           Prob > F      =  0.0000
    Residual |  15572.5742   197  79.0486001           R-squared     =  0.3210
-------------+------------------------------           Adj R-squared =  0.3142
       Total |   22936.195   199  115.257261           Root MSE      =  8.8909

------------------------------------------------------------------------------
     Society |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     Science |   .2191144   .0820323     2.67   0.008     .0573403    .3808885
 Mathematics |   .4778911   .0866945     5.51   0.000     .3069228    .6488594
       _cons |   15.88534   3.850786     4.13   0.000     8.291287    23.47939
------------------------------------------------------------------------------
For each regression coefficient, the null hypothesis tested is that the coefficient is equal to zero. Thus, the one-tailed alternatives are that the coefficient is greater than zero and that the coefficient is less than zero. To get the p-value for the one-tailed test that the coefficient of the variable Science is greater than zero, you would divide 0.008 by 2, giving 0.004, because the effect is in the predicted direction. This is P(t > 2.67). If you had made your prediction in the other direction (opposite to the direction of the estimated effect), the p-value would have been 1 - 0.004 = 0.996. This is P(t < 2.67). For all three p-values, the test statistic is 2.67.
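The same arithmetic can be applied directly to the Science row of the regression output. The sketch below recomputes the t statistic as the coefficient divided by its standard error and then derives both one-tailed p-values from the reported two-tailed one; the variable names are hypothetical.

```python
# Coefficient, standard error, and two-tailed p-value copied from the Science row
coef, std_err, two_tailed_p = 0.2191144, 0.0820323, 0.008

t = coef / std_err       # reproduces the reported t statistic
half = two_tailed_p / 2

# The estimate is positive, so "greater than zero" is the predicted direction
p_greater = half if t > 0 else 1 - half
p_less = 1 - p_greater

print(round(t, 2), round(p_greater, 3), round(p_less, 3))  # 2.67 0.004 0.996
```

The reported p-values are rounded to three decimals, so a row shown as 0.000 only tells you that the two-tailed p-value is below 0.0005; halving it gives a bound, not an exact one-tailed value.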