P-value for Hypothesis Testing: Data Scientist's Guide (2024)

Statistics is an essential part of data science. It allows us to understand our data better, discover its properties and decide on the appropriate analysis methods. If we misjudge the properties of the data, we may choose the wrong methods, which renders the whole analysis useless.

Hypothesis testing is probably the most critical concept in statistics, as it allows data scientists to draw conclusions about a population based on sample data. Don’t worry if you don’t understand these words yet; we will cover them in a minute.

In this article, we will concentrate on hypothesis testing using the p-value. This is a popular method of hypothesis testing, and it is widely used by both statisticians and data scientists. What is a p-value, and how do we calculate it in hypothesis testing? Without any further ado, let’s begin!

Introduction

Hypothesis testing and the p-value are two essential concepts in statistics, and they are widely used in data science as well. In particular, hypothesis testing has proved to be helpful in feature selection. It allows us to choose the best variables from the dataset, leaving out the useless ones. The p-value, on the other hand, is a tool used for hypothesis testing. It allows us to draw a specific conclusion about the results we received.

The Intuition Behind The P-value

Now that we know these terms, we can take a closer look at hypothesis testing. Hypothesis testing is essentially about calculating how certain we can be about an inference drawn from a sample. Let’s take the example of determining the average height of men in Australia, and suppose our hypothesis is that this average equals 179cm. Naturally, it is impossible to measure every single man, so we choose a sample: a few thousand men from all over the country. Measuring their heights gives an average of 182cm. Does this mean that our hypothesis is wrong, and the actual average height is not 179cm? Or is this just a coincidence, and the difference does not mean anything? To be honest, we can’t say for sure. What we can do is check the probability of getting such a value if the hypothesis were true. Based on this probability, we can then reject or fail to reject our hypothesis.

The p-value allows us to do precisely that. Take a look at the figure below. The red point represents the value we calculated for the sample, 182cm in our case. The p-value is the probability of obtaining an outcome at least as extreme as the observed sample value, assuming that the null hypothesis (here, that the average height is 179cm) is true. In simple words, the p-value measures the probability of all the values to the right of the red point. In our case, it is the probability of getting an average height of 182cm or more.

A large p-value means that there is a high chance of getting such results, i.e., there is nothing unusual in the value we received. In this case, we have no evidence against our null hypothesis.

A small p-value, on the other hand, is evidence against the null hypothesis. A small p-value implies that there is a small probability of obtaining the results we received. It is very unusual, and we can, therefore, conclude that there is something wrong with our null hypothesis.

The p-value measures the likelihood of the values to the right of the obtained sample value, which equals the area marked by red points

Level Of Significance

But how do we know whether the obtained p-value is large or small? For this, we use a level of significance. The significance level (usually denoted as α) is the probability of rejecting the null hypothesis when it is true. For example, if we choose α=5%, it implies a 5% chance that we will make a mistake and reject the null hypothesis when it is actually true. The significance level is chosen for each task individually, but if you don’t know what to start with, take α=5%. This is the conventional value that is used most often.
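
The meaning of α as a false-rejection rate can be checked with a small simulation. This is only a sketch under stated assumptions: the test is one-sided, the test statistic follows a standard normal distribution when the null hypothesis is true, and the function name `false_rejection_rate` and its defaults are made up for illustration.

```python
import random
from statistics import NormalDist


def false_rejection_rate(alpha=0.05, trials=20000, seed=0):
    """Share of simulated tests that reject a null hypothesis that is true."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    standard_normal = NormalDist()     # mean 0, standard deviation 1
    rejections = 0
    for _ in range(trials):
        # Assumption: under a true null, the test statistic is standard normal.
        z = rng.gauss(0.0, 1.0)
        p_value = 1.0 - standard_normal.cdf(z)   # right-tail probability
        if p_value < alpha:
            rejections += 1
    return rejections / trials


# The printed share should land close to the chosen alpha of 0.05.
print(false_rejection_rate())
```

With α=5%, roughly one simulated test in twenty rejects the null hypothesis even though it is true, which is exactly the mistake the significance level describes.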

How Do We Get The P-value?

Let’s continue with the above example of measuring heights. You calculated the average height in your sample and received the value of 182cm. The next step is calculating a z-score, which expresses the difference between the measured sample parameter and the population parameter we are testing in units of standard deviation. The formula for calculating the z-score is:

The formula for calculating the z-score:

z = (x̄ − μ) / σ

Here:

μ is the mean of the population (the value we are testing)
x̄ is the mean of the sample (the value we received)
σ is the standard deviation of the population

Note: If the population’s standard deviation is unknown, sample standard deviation can be used as an estimate.
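
As a minimal sketch, the z-score calculation described above could be written as follows. The function name `z_score` is an assumption, and the 9cm standard deviation is the value the article uses later in its worked example. The sketch follows the article's formula, which divides by the standard deviation directly; a test on a sample mean would usually divide by the standard error σ/√n instead.

```python
def z_score(sample_mean, population_mean, standard_deviation):
    """Difference between sample mean and tested mean, in standard deviations."""
    if standard_deviation <= 0:
        raise ValueError("standard deviation must be positive")
    return (sample_mean - population_mean) / standard_deviation


# Heights example: sample mean 182cm, tested mean 179cm, standard deviation 9cm.
z = z_score(sample_mean=182, population_mean=179, standard_deviation=9)
print(round(z, 2))  # 0.33
```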

Finally, we can get the p-value. It is usually looked up in special tables, like the one presented in the figure below. This z-table shows the probability of getting a result to the left of the z-score. However, we need the probability of the region to the right of it. Since the total probability equals 1, we can simply subtract the value obtained from the table from 1. We now have the p-value.

Note: Human heights follow an approximately normal distribution: there are more people with a height close to the average and fewer people who are very tall or very short. The more extreme the height, the fewer people have it. Because of this, we use a table for a normal distribution. For other kinds of distributions, other tables are used.

A z-table for normal distribution
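
Instead of a printed table, the same lookup can be sketched with Python's standard library. `NormalDist().cdf` returns the area to the left of a z-score, which is what the z-table reports; the helper name `right_tail_p_value` is an assumption made for this illustration.

```python
from statistics import NormalDist


def right_tail_p_value(z):
    """Probability that a standard normal value is at least as large as z."""
    left_area = NormalDist().cdf(z)   # the number a z-table would give
    return 1.0 - left_area            # the region to the right of z


# A z-score of 0 sits exactly in the middle, so half the area lies to its right.
print(round(right_tail_p_value(0.0), 2))  # 0.5
```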

Final Steps

The final step in our hypothesis testing is comparing the p-value with the level of significance. If the p-value is larger than the significance level, we cannot reject the null hypothesis because there is not enough evidence to do so. If, on the other hand, the p-value is smaller than the level of significance, we reject the null hypothesis because the probability of getting such a result is too small to put it down to coincidence.
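
The comparison can be sketched as a small function. The name `decide` and the returned strings are illustrative assumptions. The paragraph above does not say what happens when the p-value exactly equals α; this sketch rejects in that case, which is a common convention.

```python
def decide(p_value, alpha=0.05):
    """Compare a p-value with the significance level and name the outcome."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p_value <= alpha:
        return "reject the null hypothesis"
    # Not rejecting is different from accepting: the evidence is just too weak.
    return "fail to reject the null hypothesis"


print(decide(0.37))  # fail to reject the null hypothesis
print(decide(0.01))  # reject the null hypothesis
```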

Let us finish the example with the heights of Australian men. Our null hypothesis says that the average height equals 179cm. The average height of the sample is 182cm. We now only need the standard deviation. Again, if the population’s standard deviation is unknown, we can use the sample standard deviation as an estimate. It can be calculated using the following formula:

The formula for calculating the standard deviation (s)

Here:

x is the height of one particular man
x̄ is the mean of the sample (average height)
n is the number of men in the sample
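
A minimal sketch of this calculation, assuming the formula referred to above is the usual sample standard deviation with n − 1 in the denominator, s = sqrt(Σ(x − x̄)² / (n − 1)). The six heights are hypothetical values chosen only to make the arithmetic easy to follow, and the function name is illustrative.

```python
import math
import statistics


def sample_standard_deviation(heights):
    """Sample standard deviation with n - 1 in the denominator (an assumption)."""
    n = len(heights)
    if n < 2:
        raise ValueError("at least two measurements are needed")
    mean = sum(heights) / n
    squared_deviations = sum((x - mean) ** 2 for x in heights)
    return math.sqrt(squared_deviations / (n - 1))


# Hypothetical heights in cm, not data from the article.
heights_cm = [170, 176, 179, 182, 188, 191]
s = sample_standard_deviation(heights_cm)
print(round(s, 2))

# The standard library's statistics.stdev uses the same n - 1 convention.
print(math.isclose(s, statistics.stdev(heights_cm)))
```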

Let us suppose that, using the above formula, we got the sample standard deviation to be equal to 9cm. Substituting all the numbers into the formula of the z-score yields (182 − 179) / 9 = 0.33. Looking at the z-table provided above, we find that the value for 0.33 is 0.6293. That is, the probability of obtaining a score smaller than 0.33 is roughly 62.93%. But we need the probability of getting a value larger than 0.33. So, we subtract 0.6293 from one, which yields 0.3707. This means that we can get a value larger than 0.33 with a probability of 37.07%. It is quite a large number, and it is definitely larger than the 5% significance level. Therefore, we cannot reject the null hypothesis. There is simply not enough evidence to claim that the average height is not 179cm.
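
The arithmetic of this worked example can be reproduced in a few lines. This is a sketch that mirrors the article's own steps: the z-score is rounded to two decimals as it would be for a table lookup, and `NormalDist().cdf` stands in for the z-table. The variable names are illustrative.

```python
from statistics import NormalDist

tested_mean = 179.0     # null hypothesis: the average height is 179cm
sample_mean = 182.0     # average height measured in the sample
sample_sd = 9.0         # sample standard deviation assumed in the example
alpha = 0.05            # 5% significance level

# Round to two decimals, as one would before looking the value up in a z-table.
z = round((sample_mean - tested_mean) / sample_sd, 2)
left_area = NormalDist().cdf(z)   # table value: area to the left of z
p_value = 1.0 - left_area         # area to the right of z

print(f"z = {z:.2f}, table value = {left_area:.4f}, p-value = {p_value:.4f}")
print("reject" if p_value <= alpha else "fail to reject")
```

The p-value is far above 5%, so the script arrives at the same conclusion: the null hypothesis cannot be rejected.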

Note that we do not say that we accept the null hypothesis; we simply cannot reject it. There is an important difference between these statements, and it is conventional for statisticians to talk in such a manner. The null hypothesis may be false, but we cannot confirm that with the available data.

If, on the contrary, the p-value appeared to be small, for example, 0.01, we would indeed reject the null hypothesis and say that the average height of Australian men is not 179cm.

If you find it challenging to use the provided z-table, here is how to read it. Simply speaking, the first column specifies the integer and the first decimal digit of the z-score, while the first row specifies the second decimal digit (the hundredths part of the z-score). For example, if we need to find the table value for a z-score equal to 0.34, we first look at the row that has 0.3 and then at the column with 0.04, because 0.3 + 0.04 = 0.34. Keep in mind that the table gives the probability to the left of the z-score, so it still has to be subtracted from 1 to get the p-value. Refer to the below picture for a graphical example:

The value at the intersection of the row and column with all parts of the z-score is the probability to the left of the z-score; subtracting it from 1 gives the p-value
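
The row-and-column lookup can be mimicked in code. In this sketch the helper `z_table_cell` is a hypothetical name, and the cell is computed with `NormalDist().cdf` and rounded to four decimals, the precision used by the table values quoted in this article.

```python
from statistics import NormalDist


def z_table_cell(row_label, column_label):
    """Value a z-table holds where the given row and column meet."""
    z = row_label + column_label          # e.g. 0.3 + 0.04 = 0.34
    return round(NormalDist().cdf(z), 4)  # area to the left of z


left_area = z_table_cell(0.3, 0.04)
print(left_area)                 # probability to the left of z = 0.34
print(round(1 - left_area, 4))   # right-tail p-value for z = 0.34
```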

Summary

Let us briefly summarize the concept of the p-value. The p-value is the probability of getting an outcome of an experiment at least as extreme as the sample value, assuming the null hypothesis is true. If the p-value is larger than the significance level, we cannot reject the null hypothesis; if it is smaller than the significance level, we reject the null hypothesis. The significance level depends on the task, but 5% is commonly used. The p-value allows us to test various assumptions at a chosen level of significance.

Overall, the concept of hypothesis testing presented in this article is essential for any data scientist or statistician. The p-value is only one of the ways to test hypotheses, but it is the most popular and the most widely used one. We hope we helped you to build an intuition for it. After all, the better we understand something, the better we can use it. In other articles, we will cover more statistical concepts. And for now, feel free to start using this one.

FAQs

How do you use a p-value in hypothesis testing?

To use a p-value in hypothesis testing, compare it to a predetermined significance level, such as 0.05. A p-value less than or equal to this level suggests enough evidence to reject the null hypothesis, indicating a significant effect or relationship.

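What to do if p-value is greater than alpha?

If the p-value is greater than alpha, you fail to reject the null hypothesis. This is not the same as accepting it: the data simply do not provide enough evidence against it. If the p-value is less than alpha, you reject the null hypothesis.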

What is the difference between p-value and t-value?

Unlike the p-value, the t-value is not a probability, and therefore its value is unbounded. Most of the time you will see t-values presented in a lookup table, where the value depends on several variables, one of which is the type of t-test you are running (one-tailed or two-tailed).

What does it mean to reject the null hypothesis?

We reject the null hypothesis when the data provide strong enough evidence to conclude that it is likely incorrect. This occurs when the p-value (the probability of observing data at least as extreme as ours, given that the null hypothesis is true) is below a predetermined significance level.

Would a p-value of 0.03 mean we know the null hypothesis is false?

No, we cannot know this for certain. A p-value of 0.03 means that if our null hypothesis is true, there would be a 3% chance of obtaining our observed test statistic or a more extreme one. Since this probability is below the conventional 5% significance level, we reject the null hypothesis and say that the means are significantly different.

What does a p-value of 0.3 mean?

A p-value of 0.3 means: "if I repeated the study many times, and the null hypothesis and all other assumptions were true, I would see the result I'm seeing (or a more extreme result) 30% of the time, so it wouldn't be very unusual."

How do you know if you accept or reject a null hypothesis?

If our test statistic is:

positive and greater than the critical value, then we have sufficient evidence to reject the null hypothesis in favor of the alternative hypothesis.
positive and lower than or equal to the critical value, we fail to reject the null hypothesis.
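
The critical-value rule above can be sketched for a one-sided z-test and compared with the p-value rule. This is an illustration under assumptions: the test statistic is standard normal under the null hypothesis, and the function names are hypothetical.

```python
from statistics import NormalDist


def critical_value(alpha=0.05):
    """z-score that leaves an area of alpha to its right."""
    return NormalDist().inv_cdf(1.0 - alpha)


def rejects_by_critical_value(z, alpha=0.05):
    """Reject when the test statistic exceeds the critical value."""
    return z > critical_value(alpha)


def rejects_by_p_value(z, alpha=0.05):
    """Reject when the right-tail p-value falls below alpha."""
    return 1.0 - NormalDist().cdf(z) < alpha


print(round(critical_value(0.05), 3))  # 1.645
for z in (0.33, 2.0):
    # Both rules lead to the same decision for the same z-score.
    print(z, rejects_by_critical_value(z), rejects_by_p_value(z))
```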

How do you know if the null hypothesis is rejected?

Rejecting the Null Hypothesis

Reject the null hypothesis when the p-value is less than or equal to your significance level. Your sample data favor the alternative hypothesis, which suggests that the effect exists in the population. For a mnemonic device, remember—when the p-value is low, the null must go!

Does a p-value of 0.01 mean we should definitely reject the null hypothesis?

No. Whether a p-value provides enough evidence to reject the null hypothesis depends on the alpha level. A p-value of 0.01 leads to rejection at α=5%, but not at a stricter level such as α=0.1%.

What is a rejected null hypothesis?

The null hypothesis states there is no significant difference in outcomes between the control (untreated) and the treatment condition. To reject the null hypothesis is to say that there is a significant difference. In other words, the treatment produces results that differ from doing nothing.

What is it called when the null hypothesis gets rejected?

The critical region, also called the rejection region, is that set of values of the test statistic for which the null hypothesis is rejected.
