Do you have a large list of numbers and you would like to group them into ranges or categories? Do you want to change numerical codes by more descriptive names to facilitate your data analysis?
In this article I will give you some funny superfast tricks to do these kind of pre-processing taks just using Microsoft Excel.
# The Goal
In statistics, a categorical variable is a variable that can take on one of a limited, and usually fixed, number of possible values, assigning each individual or other unit of observation to a particular group or nominal category on the basis of some qualitative property.
Many times, the values of our categorial variables are stored in the data table as numbers as this is the most efficient way from a computational point of view. However, we humans usually feel more comfortable reading and processing text labels ("Large" or "Small" instead of "17" or "5").
# Trick #1. Numerical Codes to Text Labels
Let's imagine that Dr. Discovery performed a study to assess how a variety of known gen mutations impact the clinical evolution of 200 bed patients with mechanical ventilation.
These patients are arranged into 4 different groups depending on the kind of infection they suffered. Groups are coded in a numerical variable called "Group Code" with the following values:
1 = Bronchitis
2 = Pneumonia
3 = Colonization
4 = Carriage
Our goal is to generate a new categorical variable called "Group Name" with the proper group name for each patient.
To do that, Microsoft Excel provides a powerful easy to use function called
Technically, this function returns the value of an element in a list, selected by the position in that list.
Given that our group codes are stored in the B column, the INDEX function should be used in the following way:
1. Insert a new column C called "Group_Name".
2. Copy the following function and paste it within the C2 cell:
=INDEX({"Bronchitis";"Pneumonia";"Colonization";"Carriage"}; B2)
NOTE: Be sure to use the proper character to separate the arguments in the function (semicolon or comma) depending on your setup.
As you see, the group code "1" is associated to the first element in the list, that is, "Bronchitis".
3. Double-clic the bottom right corner of the cell C2 to extend the function to the rest of the column.
Et voilà! You now have a new categorical variable which may be considered by AutoDiscovery in the analysis of variances or contingencies.
# Trick #2. Numerical Values to Ranges
Each patient in our study stayed at ICU for a particular period of time that ranges from 0 to 120 days. That information is stored in the variable "ICU Length Stay":
Our goal now is to classify each patient depending on the length of the ICU stay in the following way:
Short: between 0 and 10 days
Regular: stay between 11 and 30 days
Large: more than 30 days at ICU
To do that, Microsoft Excel provides another powerful function called
This function searches for a specified item in a list, and then returns the relative position of that item in the range. The great thing is that the MATCH function can be combined with the above mentioned function INDEX to return the name of that item (instead of just the relative position)!
So let's start.
1. Insert a new column H called "ICU_Stay".
2. Copy the following function and paste it within the H2 cell:
=INDEX({"Short";"Regular";"Large"}; MATCH(G2;{0;10;30}))
Ok, this expression is a bit more complicated but the idea is that MATCH looks for the range in which the ICU length stay of the patient falls and, then, the INDEX function returns the name of that range.
In the case of the first patient (26 days at ICU), the categorical value is "Regular".
3. Extend the function in this cell to the rest of the column by double-clicking the bottom right corner of the H2 cell.
# The Next Steps
In our first article of this series we learnt how to arrange our data in a simple plain table. Now we have adapted the numerical data to our analytic goals.
The next step before an exhaustive exploratory data analysis will be to check whether the cells contain the proper values. I will talk about this in the next chapter of this series.
Stay tuned!
In the meanwhile, try AutoDiscovery by your own (of course for free!):