top of page

DATA FILE IMPORT HINTS

Importing and consolidating data files is a very challenging task carried out by AutoDiscovery in a easy and straightforward way.

Understanding data consolidation

 

AutoDiscovery works with the Excel data files in which you store the results of your experiment trials.

 

When we talk about "data consolidation" we refer to the task of joining different related data files into a single table.

 

Data consolidation is a very common process carried out almost every day in the most of labs. The main reason behind this need is that your experiment design usually requires capturing and analysing a variety of information that a single equipment cannot provide by itself. As each equipment generates its own data files, someone (or something) must carefully combine the information so that you can analyse it later.

 

Although AutoDiscovery carries out this task is automatically, it must be properly configured and the imported data files must fulfill with a range of conditions that are explained now.

Data file format

The current version of AutoDiscovery only supports Excel 97-2003 files, that is, files with XLS extension generated with any of these versions of Excel.

 

Unfortunately, Excel 2007 or higher files (XLSX extension) are not currently supported but these versions of Microsoft Excel allows you to export your data files to the older format.

 

To do that: 

 

  1. Open your data file with Microsoft Excel 2007 or higher.

  2. Select the "Save as" option in Excel.

  3. Select the "Excel 97-2003 workbook" option in the "File type" list.

  4. Click the "Save as" button.

 

A new data file with the same name but the XLS extension is generated. Now try to import that new file.

Structure of the information

When a data file is imported, AutoDiscovery analyses the structure and tries to find the data table within the first sheet of the Excel file. That implies that your experiment data must be arranged in the following way:

 

  • It must be stored under the first sheet of the Excel file. The rest of the sheets will not be considered.

  • It must be located in a single table arranged in rows and columns. Special comments, plots or isolated cells will be discarded.

  • The first row of that table must contain the column headers, that is, the names of the variables that will be imported.

  • Each variable you measure should be in one column.

  • Each different observation of such variable should be in a different row.

  • If you have multiple data files, they should include one or more common columns that allow them to be linked.

 

The following image shows typical well-structured data files to be imported and consolidated in AutoDiscovery:

Notice that consolidation variables ("Subject" and "Group") are found in every data file as the experimental subjects in this example are lab's animals.

Number of variables and data samples

A minimum number of variables and data samples are required for the discovery process to be statistically relevant. On the other hand, a maximum quantity of information is supported:

 

  • From 1 to 90 variables (columns).

  • From 5 to 50.000 data samples (rows).

 

If your experiment data files contain more than 90 variables it'd be a good idea to remove some of them. In theses cases, most of the numerical variables are a linear transformation of others (e.g. "speed" is a quotient between "distance" and "time") so they might be removed as evident correlations between them will be found.

 

On the other hand, if your data files do not include more than 5 rows you might consider carrying out additional trials.

Variable names

The names used for the data variables (that is, the titles of the column headers) must fulfill the following conditions:

 

  • They cannot be empty.

  • They cannot be duplicated.

  • They must start with a letter (A..Z, a..z) or an underscore character (_).

 

If a particular variable name cannot be used by AutoDiscovery, it automatically changes it during the import process by adding a starting underscore character (_). Don't worry: your original Excel data file is not modified at all.

bottom of page