In the world of research, we learn about the importance of statistical significance. The p-value (the p stands for “probability”) is crucial: usually, a p-value of less than 0.05 is considered significant, and the smaller the value, the more significant the result is taken to be.
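The idea of a p-value can be made concrete with a small simulation. The sketch below uses plain Python and made-up measurements for two hypothetical groups; it estimates a p-value by permutation, i.e. the probability of seeing a mean difference at least as large as the observed one if the group labels were assigned purely at random.

```python
import random

def perm_test(a, b, n_perm=10_000, seed=0):
    """Permutation test: estimate the probability of a group-mean gap at
    least as large as the observed one when labels are shuffled at random."""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        pa, pb = pooled[:len(a)], pooled[len(a):]
        if abs(sum(pa) / len(pa) - sum(pb) / len(pb)) >= observed:
            count += 1
    return count / n_perm

# made-up measurements for two hypothetical groups
treatment = [5.1, 4.9, 6.2, 5.8, 5.5]
control = [4.2, 4.4, 4.0, 4.8, 4.1]
print(f"p = {perm_test(treatment, control):.4f}")  # small: such a gap is unlikely by chance
```

A small p here means that random labelling almost never produces a gap this large, which is exactly what “statistically significant” asserts.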
In general, journals publish articles that report one or more statistically significant effects. This puts tremendous pressure on researchers to produce significant results. At times, researchers may turn to a malpractice called p-hacking to get the results they desire.
Here, two researchers, Tom and Lisa, discuss the pressure to produce statistically significant results and how it can harm the integrity of science.
Seeking Statistical Significance
Tom: “I can’t seem to get this method correct! I’m sure I should get a significant result, but my p-value is around 0.8. What am I doing wrong?”
Lisa: “Have you checked your protocol?”
Tom: “Yes, it seems ok. My supervisor wants to publish this in a high-impact journal – it could help us to secure funding for our next project. However, the results we’re getting aren’t statistically significant. I’m thinking of making changes to our method.”
Lisa: “What sort of changes?”
Tom: “I’m going to collect more data. I might drop some of the outlying data points – I’m sure they are wrong! Then I can try different ways of analyzing the data.”
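One part of Tom’s plan – collecting more data and re-testing until something comes out significant – can be simulated. This is a minimal sketch in plain Python (the batch sizes and the one-sample z-test are my own illustrative assumptions, not anything from the dialogue): even when there is no real effect, testing after every new batch and stopping at the first p < 0.05 inflates the false-positive rate well beyond the nominal 5%.

```python
import random
from math import erf, sqrt

def p_value(samples):
    """Two-sided p-value for a one-sample z-test of mean 0 (sd assumed 1)."""
    n = len(samples)
    z = (sum(samples) / n) * sqrt(n)
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

def run_experiment(rng, batch=10, max_n=100):
    """Add data in batches and test after each batch, stopping at p < 0.05."""
    data = []
    while len(data) < max_n:
        data.extend(rng.gauss(0, 1) for _ in range(batch))
        if p_value(data) < 0.05:
            return True  # "significant" -- a false positive: there is no effect
    return False

rng = random.Random(42)
trials = 2000
hits = sum(run_experiment(rng) for _ in range(trials))
print(f"False-positive rate with peeking: {hits / trials:.1%}")  # well above 5%
```

A single test at a fixed sample size keeps the error rate at 5%; peeking after every batch gives chance ten opportunities to cross the threshold.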
The Problem of P-Hacking
Lisa: “But isn’t that p-hacking?”
Tom: “What’s p-hacking?”
Lisa: “P-hacking is when a researcher, either knowingly or unknowingly, makes choices after seeing their data to help them get a significant result. This includes choices or tweaks like those you mentioned. For example, dropping outlying data points or changing the way you analyze data.”
Tom: “I haven’t heard of p-hacking before.”
Lisa: “As you know, there can be a lot of pressure to produce a low p-value. Studies with statistically significant results are much more likely to be published. This can directly affect the funding a researcher receives, as well as their career prospects. It can be very tempting to turn to research malpractices to get a low p-value.”
Types of P-Hacking
Tom: “I think I need to know more about p-hacking so that I can avoid it. Can you give me some tips?”
Lisa: “P-hacking often yields a p-value just under 0.05. This is because researchers often stop ‘tweaking’ once they have achieved a significant result, so a cluster of values just below p = 0.05 suggests p-hacking. However, other types of p-hacking can be more difficult to spot.”
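Lisa’s “cluster just under 0.05” heuristic works because honest p-values from null effects are uniformly distributed: each 0.01-wide bin should hold about 1% of results. The sketch below simulates a literature of honest null studies (a simple z-test; sample size and study count are illustrative) and checks that the bins just below and just above 0.05 are equally full. A published record with a bulge only on the “significant” side is the red flag.

```python
import random
from math import erf, sqrt

def p_value(samples):
    """Two-sided p-value for a one-sample z-test of mean 0 (sd assumed 1)."""
    n = len(samples)
    z = (sum(samples) / n) * sqrt(n)
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

rng = random.Random(0)
# 10,000 honest studies of a non-existent effect, n = 30 each
ps = [p_value([rng.gauss(0, 1) for _ in range(30)]) for _ in range(10_000)]

just_under = sum(0.04 <= p < 0.05 for p in ps) / len(ps)
just_over = sum(0.05 <= p < 0.06 for p in ps) / len(ps)
print(f"p in [0.04, 0.05): {just_under:.1%}")  # roughly 1%
print(f"p in [0.05, 0.06): {just_over:.1%}")   # roughly 1%
```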
Tom: “What are they?”
Lisa: “The first type is known as ‘overhacking.’ This is when a researcher continues to hack their data in an attempt to get a lower p-value. Instead of stopping once they reach a value below 0.05, they continue. They do this because a lower p-value suggests a more compelling result.”
Tom: “What are the other types?”
Lisa: “The next type is selection bias. This is when a researcher has different p-values as a result of carrying out different analyses on the data, or analyzing different variables. However, even if they have several p-values under 0.05, they choose only the lowest for publication. This does not give an accurate picture of the data.
“The third type is selective debugging. ‘Bugs’ can happen if a researcher chooses an unsuitable statistical test, or if there are problems with data coding. Researchers should always try to spot these errors. Selective debugging is when a researcher only corrects bugs when it helps to get a significant result. Once they have a significant result, they stop looking for bugs.”
Tom: “So the researchers are selecting for bugs that give a false positive result?”
Lisa: “Yes, that’s right.”
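The selection bias Lisa describes is easy to quantify. If a researcher runs k independent analyses on data with no real effect and reports only the best one, the chance of at least one “significant” p-value is 1 − 0.95^k. A quick sketch (the choice of k = 20 is illustrative):

```python
import random

k, trials = 20, 10_000
rng = random.Random(1)

# under the null, each analysis yields a p-value uniform on [0, 1];
# report only the smallest of k such p-values per "study"
hits = sum(min(rng.random() for _ in range(k)) < 0.05 for _ in range(trials))

print(f"Analytic:  {1 - 0.95 ** k:.2%}")   # 64.15%
print(f"Simulated: {hits / trials:.2%}")
```

With twenty analyses to choose from, a researcher studying nothing at all will find something “significant” nearly two times out of three.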
The Consequences of P-Hacking
Tom: “What can happen if a researcher is caught p-hacking?”
Lisa: “P-hacking can have grave consequences. It undermines the value of the research. It could lead to journals retracting suspect articles, and the loss of future funding. As well as wasting valuable time and money, p-hacking could even reduce public confidence in science.”
Tom: “That does sound serious. How common is p-hacking?”
Lisa: “Some studies have shown that p-hacking is widespread. P-hacking could also be a serious problem for meta-analyses. In these large-scale studies, researchers rely on earlier work for their analysis. If the earlier research has been p-hacked, the results will not be reproducible. A 2015 study asked 100 research groups to replicate 100 published results. Of these, only 40 replicated well. In the other 60, the replications found much smaller effects than the initial results.”
Tom: “I understand why p-hacking is a problem. How can it be prevented?”
Lisa: “The best way to avoid p-hacking is to avoid making changes after you have seen the data. Of course, it can be difficult to resist! You could consider pre-registration. This is when you prepare a detailed research plan, including the statistical analysis you plan to use. You then submit the plan to an online register, such as the Open Science Framework. If you end up publishing your results, anyone can check your method against your plan. This makes it much harder to p-hack your data.”
Tom: “That sounds great. Is there anything else I can do?”
Lisa: “Just pre-plan your work, and stick to it. Only make changes if you realize you have made a genuine error. You can also replicate your own work.”
Dealing with Non-Significant Results
Tom: “I know that I can’t change my method to get a significant result. Does this mean I won’t be able to publish my work?”
Lisa: “Not necessarily. As p-hacking becomes more widely recognized, journals may start to reduce their preference for significant results. Journals could also help by providing a platform for pre-registration. Researchers can consider other types of statistical analysis, as well as focusing on the quality of their research plan and data collection.”
Tom: “Thank you for your help!”
Have you encountered anyone p-hacking? Share your thoughts and experiences in the comments below.