What is exploratory data analysis?
In statistics, exploratory data analysis (EDA) is an approach to analyzing data sets to summarize their main characteristics, often with visual methods.
It aims to connect ideas in order to uncover the reasons behind potential cause/effect relationships. It is typically used when researchers are starting to understand what they are actually “observing” while building cause/effect models.
For researchers and data scientists, using EDA for initial exploratory studies is crucial in the early stages of an experiment. More detailed analysis follows from the initial discovery of interesting and significant correlations between parameters in complex, high-dimensional data.
Read more on the differences between EDA and confirmatory data analysis in our blog.
Requirements and Capabilities
What are the technical requirements of AutoDiscovery?
If you choose to download and install AutoDiscovery on your computer, make sure that it meets the following minimum requirements:
Operating System: Windows 7, Windows 8, Mac OS X(*) or Linux(*)
CPU: Core i3, 2 GHz.
RAM: 2 GB.
Monitor Resolution: ideally 1280x1024 pixels.
Internet access (**): required to log into the application.
(*) Mac OS X and Linux users must run AutoDiscovery under a Windows 7/8 virtual machine such as VMWare Fusion, Parallels Desktop or similar.
(**) Please contact your IT manager to make sure that no firewall or other network security device prevents the software from accessing the Internet.
Can I install and use AutoDiscovery on my Mac?
Yes. AutoDiscovery can be installed and used on a Mac, but it must always run under a previously installed Windows 7/8 virtual machine.
There are many virtual machine tools on the market (some of them free of charge), but we recommend using one of the following:
Please keep in mind that a virtual machine tool may raise the technical requirements your computer must meet for AutoDiscovery to run smoothly.
To install AutoDiscovery on a Mac:
Install a virtual machine tool on your Mac.
Install a Windows 7/8 operating system under the virtual machine environment.
Download the AutoDiscovery setup tool.
Execute the setup tool in the Windows 7/8 virtual machine.
Follow the steps to install AutoDiscovery.
To run AutoDiscovery:
Run the Windows 7/8 virtual machine.
Why do I have to download and install software on my computer?
Despite its simplicity, AutoDiscovery is powerful software that makes full use of your computer's resources.
The discovery process is a CPU-intensive task that requires the software to run in a controlled, exclusive environment such as your personal computer.
Moreover, this is the best way to keep your data secure.
What kind of relationships are evaluated by AutoDiscovery?
The main goal of AutoDiscovery is to identify the most promising cause-effect relationships between variables and data sources of your experiment.
To do that, AutoDiscovery uses basic statistical calculations, chosen according to the data types of the variables involved in a relationship:
Spearman's Rank Correlation: applied when both variables are numerical. It assesses how well the relationship between two variables can be described using a monotonic function.
One-Way ANOVA: applied when one of the variables is qualitative and the other is numerical. It is used to test for differences among the different groups determined by the qualitative variable.
Both kinds of relationships are evaluated in different segments of your data.
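As a rough illustration of the two calculations named above, here is a minimal sketch in plain Python. It is not AutoDiscovery's own implementation, which is not described in this FAQ; the function names and the sample data are hypothetical. It computes Spearman's rho as the Pearson correlation of the ranks, and the F statistic of a one-way ANOVA. It does not compute p-values.

```python
from collections import defaultdict
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rank correlation between two numerical variables."""
    return pearson(average_ranks(x), average_ranks(y))


def one_way_anova_f(labels, values):
    """F statistic for a numerical variable grouped by a qualitative one."""
    groups = defaultdict(list)
    for label, value in zip(labels, values):
        groups[label].append(value)
    grand_mean = mean(values)
    n, k = len(values), len(groups)
    ss_between = 0.0
    ss_within = 0.0
    for members in groups.values():
        group_mean = mean(members)
        ss_between += len(members) * (group_mean - grand_mean) ** 2
        ss_within += sum((v - group_mean) ** 2 for v in members)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical sample data, for illustration only.
dose = [1, 2, 3, 4, 5]
response = [1, 4, 9, 16, 25]  # monotonic but not linear
print(round(spearman_rho(dose, response), 6))  # 1.0

group = ["smoker", "smoker", "smoker", "non-smoker", "non-smoker", "non-smoker"]
score = [1, 2, 3, 4, 5, 6]
print(one_way_anova_f(group, score))  # 13.5
```

The first example shows why a rank correlation suits monotonic relationships: the response grows quadratically, yet rho is 1 because the ordering of the two variables matches exactly.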
How much information can AutoDiscovery handle?
The current limits for the consolidated dataset are:
90 columns (variables)
50,000 rows (records)
When you have more than 90 variables, most of the numerical variables are usually derived from others (e.g. "speed" is the quotient of "distance" and "time"), so they can be removed, as they would only produce trivial correlations with the variables they are derived from.
If you have more than 50,000 records, we suggest analyzing your data in separate files, splitting the original file by one of its qualitative variables (e.g. smoker patients and non-smoker patients).
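The splitting approach suggested above can be sketched as follows. This is a minimal example in plain Python, assuming the records have already been loaded as a list of dictionaries; the function names and the sample records are hypothetical, and the two constants simply restate the limits quoted in this FAQ.

```python
from collections import defaultdict

MAX_COLUMNS = 90     # limit on variables quoted in this FAQ
MAX_ROWS = 50_000    # limit on records quoted in this FAQ


def fits_limits(rows):
    """Check a list of record dictionaries against the dataset limits."""
    n_columns = len(rows[0]) if rows else 0
    return n_columns <= MAX_COLUMNS and len(rows) <= MAX_ROWS


def split_by_variable(rows, variable):
    """Group records by the value of one qualitative variable."""
    parts = defaultdict(list)
    for row in rows:
        parts[row[variable]].append(row)
    return dict(parts)


# Hypothetical records, for illustration only.
patients = [
    {"id": 1, "smoker": "yes", "age": 54},
    {"id": 2, "smoker": "no", "age": 41},
    {"id": 3, "smoker": "yes", "age": 37},
]
for value, part in split_by_variable(patients, "smoker").items():
    print(value, len(part), fits_limits(part))
```

Each resulting part can then be saved as its own file and analyzed separately, provided it fits within the limits.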
What kind of information can AutoDiscovery handle?
The current version of AutoDiscovery only supports Excel 97-2003 files, that is, files with XLS extension generated with any of these versions of Excel.
Unfortunately, files from Excel 2007 or later (XLSX extension) are not currently supported, but these versions of Microsoft Excel let you easily export your data files to the earlier format. Click here to learn how to do it.
If your research project uses other kinds of information (such as images, analog signals, etc.), we suggest reducing them to their key numerical parameters (e.g. distances between elements in an image, or the maximum and minimum values of a signal).
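For a sampled signal, that reduction could look like the minimal sketch below. The function name, the sample values and the extra "range" parameter are assumptions made for illustration; which parameters are actually meaningful depends on your experiment.

```python
def summarize_signal(samples):
    """Reduce a sampled signal to a few numerical parameters."""
    lowest, highest = min(samples), max(samples)
    return {
        "minimum": lowest,
        "maximum": highest,
        "range": highest - lowest,  # peak-to-peak amplitude (an added assumption)
    }


# Hypothetical signal samples, for illustration only.
signal = [0.2, 1.5, -0.7, 0.9]
print(summarize_signal(signal))
```

Each key of the returned dictionary would become one numerical column in the spreadsheet, with one row per signal.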
How AutoDiscovery Works
Where is my data stored?
Because AutoDiscovery is installed and runs on your computer, your data stays secure. The consolidated data is stored in a single project file located in the local "My Documents\AutoDiscovery Projects" folder.
How should the results of AutoDiscovery be interpreted?
When AutoDiscovery registers a relevant relationship, a new row is added to the "Discover" table.
Each row represents a relevant relationship between two variables (or more, if complex variables such as ratios are used) in a specific segment of your data.
The interactive Discovery Map and Hypo Booster tools are provided to explore the list of relevant relationships efficiently.
Subscriptions and Billing
How can I take advantage of the advanced exploratory capabilities of AutoDiscovery?
There are three different ways of enjoying the advanced exploratory capabilities of AutoDiscovery:
Do It Yourself: subscribe, download, install and use the software by yourself to enjoy a full-featured license including free updates and upgrades.
Have Us Handle It: contract our analytics services to receive advanced analysis reports. Our experts work with your data.
Extra Power: integrate our unique exploratory capabilities and boost your own tools to the next level by means of a programming library (API).
How much does AutoDiscovery cost?
It depends on the plan selected:
Do It Yourself: at first, it's free. You get a 15-day free trial period, and you don't need to give us your credit card information. Once the free trial ends, and only if you decide to continue with AutoDiscovery, you will be billed yearly or quarterly (first year subscriptions not eligible).
Have Us Handle It and Extra Power plans: please contact us to get a personalised quotation.
How do subscriptions work?
Subscriptions are the way you can keep your costs under tight control.
The idea is quite simple: Pay only for what you use.
Thanks to subscriptions, you can limit the use, and hence the financial impact, of AutoDiscovery by planning your analysis time frames ahead over the year, and benefit from our quarterly subscriptions (first year subscriptions not eligible) in order to optimize your budget.
Take a look at the available options to choose the plan that best fits your needs.