Thanks to the Parc Científic de Barcelona, we had the opportunity to show AutoDiscovery for the first time to researchers interested in how this software, and exploratory data analysis in general, may help unveil complex relationships and, ultimately, boost the impact of their scientific papers.
Discovery Is The Goal
The original idea of the presentation was to clarify the differences between data mining and knowledge extraction. To do that, I based my explanation on one of the best works ever published on the matter: "From Data Mining to Knowledge Discovery in Databases", authored by Fayyad, Piatetsky-Shapiro and Smyth nearly 20 years ago!
One of the key messages of that work is ...
"Knowledge is the end product of a data-driven discovery."
But I took the liberty of changing it slightly by ...
I did so because I think there is a clear difference between knowledge and discovery: the latter is the ultimate goal of doing science in general.
We research to discover. To discover things that may help our society live better.
A Real Success Case
The second idea I tried to explain was that AutoDiscovery is a specific tool for efficient exploratory research that has been successfully applied in a real research project.
Dr. Trejo's Lab focuses on both basic and therapeutic aspects of the formation of new neurons in the adult brain. In their work "Involvement of specific adult hippocampal neurogenic subpopulations on behavior acquisition and persistence abilities" (currently under peer review), the group analyzed the relationship between task acquisition scores and persistence in the acquired behavior. They also examined whether the composition and number of the different subpopulations of immature neurons in the adult murine hippocampus correlated with those acquisition and persistence parameters.
They worked full-time for more than 8 weeks analysing every correlation among the captured variables. To do that, they used classical statistical packages, but very few relevant relationships were identified, mainly because of the extremely large amount of manual work required.
The main challenge for AutoDiscovery was to strengthen their conclusions so that the work could be published in a journal with a higher impact factor.
I then explained that AutoDiscovery took less than 2 hours to find not only all the correlations that the group had identified during their 8 weeks of intensive work but also several key correlations that, after a further confirmatory phase, completed the original hypothesis.
Finally, a live demo of AutoDiscovery showed the features that helped Dr. Trejo's group improve their conclusions:
The automatic consolidation of the hippocampal neurogenic results with the learning (Morris Water Maze test) and anxiety (Plus Maze test) task information recorded in different Excel files.
The evaluation of relationships between ratios of variables. In fact, many of the key findings were based on them.
The automatic sorting of results, which greatly helped to minimize type II error and focus the scarce resources of the group on the most relevant findings.
The export of the relationship table, which made it easier for them to build their own plots for the final paper.
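For readers who like to see ideas as code, the four features above can be imitated in a few lines of Python. This is only a minimal sketch of the general workflow (consolidate, add ratios, rank, export) and not AutoDiscovery's actual code or algorithm. Everything in it is an assumption made for illustration: the function names, the variable names (cells_a, cells_b, latency), the animal ids and all the numbers are invented, and plain Pearson correlation stands in for whatever statistics the real tool applies.

```python
import csv
import io
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (NaN if one is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def consolidate(*tables):
    """Join tables (dicts keyed by animal id), keeping ids present in all of them."""
    shared = set(tables[0]).intersection(*tables[1:])
    merged = {}
    for key in sorted(shared):
        row = {}
        for table in tables:
            row.update(table[key])
        merged[key] = row
    return merged


def add_ratios(rows, names):
    """Add an 'a/b' column for every pair in names, skipping zero denominators."""
    for a, b in combinations(names, 2):
        if all(row[b] != 0 for row in rows.values()):
            for row in rows.values():
                row[f"{a}/{b}"] = row[a] / row[b]


def rank_relationships(rows):
    """Correlate every pair of columns and sort by absolute strength, strongest first."""
    columns = list(next(iter(rows.values())))
    results = []
    for a, b in combinations(columns, 2):
        r = pearson([row[a] for row in rows.values()],
                    [row[b] for row in rows.values()])
        if not math.isnan(r):
            results.append((a, b, r))
    return sorted(results, key=lambda item: abs(item[2]), reverse=True)


def to_csv(results):
    """Export the ranked relationship table as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["variable_a", "variable_b", "pearson_r"])
    for a, b, r in results:
        writer.writerow([a, b, f"{r:.3f}"])
    return buffer.getvalue()


# Invented toy data standing in for two separate spreadsheets.
neuro = {
    "m1": {"cells_a": 10.0, "cells_b": 5.0},
    "m2": {"cells_a": 20.0, "cells_b": 8.0},
    "m3": {"cells_a": 30.0, "cells_b": 9.0},
    "m4": {"cells_a": 40.0, "cells_b": 16.0},
    "m5": {"cells_a": 25.0, "cells_b": 7.0},  # no behavioural record, dropped on join
}
maze = {
    "m1": {"latency": 40.0},
    "m2": {"latency": 31.0},
    "m3": {"latency": 19.0},
    "m4": {"latency": 10.0},
}

merged = consolidate(neuro, maze)
add_ratios(merged, ["cells_a", "cells_b"])
ranked = rank_relationships(merged)
print(to_csv(ranked))
```

Note that a ratio such as cells_a/cells_b is correlated with its own numerator and denominator by construction, so a real analysis would filter those trivial pairs out, and any relationship surfaced this way would still need the confirmatory phase mentioned above.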
Dr. Trejo Discovered Something Great
Indeed. And much faster than they thought. But the best thing of all was that their work was finally submitted to a much better journal, which is the ultimate goal of AutoDiscovery.