- Ray G. Butler

# AutoDiscovery and False Discovery Rate

Did you know that even if you use a significance threshold of p=0.05 to suggest that you have made a discovery, **you will be wrong at least 30% of the time**?

Keep reading to learn why this happens and how to improve on it...

**How (not) to make a fool of yourself**

In his **great work** "An investigation of the false discovery rate and the misinterpretation of p-values", Dr. David Colquhoun stated that:

"*the function of significance tests is to prevent you from making a fool of yourself, and not to make unpublishable results publishable*"

And he goes deeper into this controversial point by saying that you make a fool of yourself if:

a) you declare that you have discovered something, when all you are observing is random chance.

or

b) you fail to detect a real effect, though that is less bad for your reputation.

So, in the end, if you are foolish enough to define ‘statistically significant’ as anything less than p=0.05, then you have at least a 30% chance of making a fool of yourself.
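To see how the chance of a false discovery can end up so far above 5%, here is a rough back-of-the-envelope sketch. The inputs (10% of the tested hypotheses being real effects, and 80% power) are illustrative assumptions chosen for this example, not figures stated in this article, and the function name is hypothetical.

```python
def false_discovery_rate(prior_real, power, alpha=0.05):
    """Fraction of 'significant' results that are false positives.

    prior_real: assumed fraction of tested hypotheses that are real effects
    power:      assumed probability of detecting a real effect
    alpha:      significance threshold
    """
    false_positives = (1.0 - prior_real) * alpha  # nulls wrongly declared significant
    true_positives = prior_real * power           # real effects correctly detected
    return false_positives / (false_positives + true_positives)


# Illustrative assumptions: 10% real effects, 80% power, p < 0.05.
fdr = false_discovery_rate(prior_real=0.10, power=0.80, alpha=0.05)
print(f"False discovery rate: {fdr:.0%}")
```

Under these assumptions, roughly a third of the "discoveries" are false, even though every one of them passed the p=0.05 threshold. Different assumptions about the prior or the power will give different numbers.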

**What is the False Discovery Rate (FDR)?**

The false discovery rate (FDR) is the expected proportion of **type I errors**, that is, **false positives**, among all the results you declare significant. This type of error is devastating when you are at the confirmatory stage of your research, but it is not so critical when you are doing **exploratory data analysis**.

In that sense, let me show you what Andrew Gelman, a political scientist and statistician at Columbia University in New York and author of one of the most renowned blogs on the matter, says in **this excellent article** published in Nature magazine:

“*… In this approach, exploratory and confirmatory analyses are approached differently and clearly labelled. … Researchers would first do two small exploratory studies and gather potentially interesting findings without worrying too much about false alarms. Then, on the basis of these results, the authors would decide exactly how they planned to confirm the findings.*”

If you repeat a test enough times, you will **almost always** get a number of false positives. One of the goals of multiple testing correction is to control the FDR: the proportion of these erroneous results among your discoveries.
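The arithmetic behind that statement is short. The sketch below assumes that every null hypothesis is true and that the tests are independent, which is a simplification; the function names are hypothetical.

```python
def prob_at_least_one_false_positive(num_tests, alpha=0.05):
    """Chance of one or more false positives across independent tests
    when no real effect exists (assumption: all nulls are true)."""
    return 1.0 - (1.0 - alpha) ** num_tests


def expected_false_positives(num_tests, alpha=0.05):
    """Average number of false positives under the same assumption."""
    return num_tests * alpha


for m in (1, 20, 100):
    p_any = prob_at_least_one_false_positive(m)
    n_exp = expected_false_positives(m)
    print(f"{m:>3} tests: P(at least one false positive) = {p_any:.2f}, "
          f"expected false positives = {n_exp:.1f}")
```

With just 20 tests at p=0.05, the chance of at least one false alarm is already about 64%.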

However, most of us will agree that any effort devoted to minimizing possible false positives is worthwhile, so... let's see how to do it.

**The Benjamini-Hochberg Procedure**

The Benjamini-Hochberg (BH) Procedure is a powerful statistical tool that controls the false discovery rate.

The best known method for correcting multiple tests is the Bonferroni correction, but arguably that method sets a criterion that is too harsh, and runs an excessive risk of not detecting true effects (it has low power). By contrast, the Benjamini-Hochberg method is based on setting a limit on the false discovery rate, and this is generally preferable.

Moreover, the BH procedure is simple to compute.
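To give an idea of how simple it is, here is a minimal sketch of the textbook BH step-up rule: sort the m p-values, compare the k-th smallest with (k/m)·q, and keep the largest p-value that passes. The function name and the target FDR level `q=0.05` are assumptions made for this example; this is not necessarily how AutoDiscovery implements it internally.

```python
def bh_threshold(p_values, q=0.05):
    """Benjamini-Hochberg step-up rule (textbook form).

    Returns the largest p-value p_(k) satisfying p_(k) <= (k / m) * q,
    or None if no p-value qualifies. In the standard procedure, every
    p-value at or below the returned value counts as a discovery.
    """
    m = len(p_values)
    threshold = None
    for k, p in enumerate(sorted(p_values), start=1):
        if p <= k / m * q:
            threshold = p  # keep the largest p-value that passes
    return threshold


# Made-up p-values, for illustration only.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print("BH threshold:", bh_threshold(p_values, q=0.05))
print("Bonferroni threshold:", 0.05 / len(p_values))
```

In this made-up example BH keeps the two smallest p-values, while Bonferroni (0.05/10 = 0.005) keeps only the smallest one, which illustrates why Bonferroni has lower power.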

All these features make the BH procedure perfect for biomedical research and the ideal candidate **to implement into AutoDiscovery**.

And that's what we've done!

**AutoDiscovery and False Discovery Rate**

That means that, apart from identifying exploratory statistical associations with high clinical relevance, AutoDiscovery can now also unveil associations with **high statistical significance**. From a methodological point of view, this group of statistical associations does not have to be confirmed at a later stage of the research process, which helps you to **accelerate the publication of the results**.

This new feature affects different aspects of the software:

**The discovery process**

When the discovery process ends, AutoDiscovery computes the BH p-value threshold for high-significance associations and shows it to you. All the potential associations identified during the process are then classified into two groups:

**a) Exploratory results**: associations with a p-value equal to or greater than the BH threshold.

**b) High significance results**: associations with a p-value lower than the BH threshold.
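The two-group split described above can be sketched in a few lines. This is only an illustration of the rule as stated here, not AutoDiscovery's actual code: the function name, the association labels, the p-values and the threshold value are all hypothetical, and the threshold is taken as already computed.

```python
def classify_associations(associations, bh_threshold):
    """Split associations into the two groups described in the text.

    associations: dict mapping an association label to its p-value
    bh_threshold: the BH p-value threshold (assumed to be given)
    """
    groups = {"exploratory": [], "high_significance": []}
    for label, p_value in associations.items():
        if p_value < bh_threshold:
            groups["high_significance"].append(label)
        else:  # p-value equal to or greater than the threshold
            groups["exploratory"].append(label)
    return groups


# Hypothetical associations and threshold, for illustration only.
associations = {
    "biomarker A vs. outcome": 0.0004,
    "biomarker B vs. outcome": 0.012,
    "biomarker C vs. outcome": 0.047,
}
print(classify_associations(associations, bh_threshold=0.01))
```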

**The Discovery Map tool**

The **Discovery Map** tool now highlights the high-significance statistical associations so that you can focus on them more easily.

**The Hypo Booster tool**

Finally, the **Hypo Booster** tool also allows you to refine your exploration and filter the high significance associations from the rest: