Do you have a large list of numbers and you would like to group them into ranges or categories? Do you want to change numerical codes by more descriptive names to facilitate your data analysis?
In this article I will give you some funny superfast tricks to do these kind of pre-processing taks just using Microsoft Excel.
# The Goal
In statistics, a categorical variable is a variable that can take on one of a limited, and usually fixed, number of possible values, assigning each individual or other unit of observation to a particular group or nominal category on the basis of some qualitative property.
Many times, the values of our categorial variables are stored in the data table as numbers as this is the most efficient way from a computational point of view. However, we humans usually feel more comfortable reading and processing text labels ("Large" or "Small" instead of "17" or "5").
#Trick #1. Numerical Codes to Text Labels
Let's imagine that Dr. Discovery performed a study to assess how a variety of known gen mutations impact the clinical evolution of 200 bed patients with mechanical ventilation.
These patients are arranged into 4 different groups depending on the kind of infection they suffered. Groups are coded in a numerical variable called "Group Code" with the following values:
1 = Bronchitis
2 = Pneumonia
3 = Colonization
4 = Carriage
Our goal is to generate a new categorical variable called "Group Name" with the proper group name for each patient.
To do that, Microsoft Excel provides a powerful easy to use function called
This function searches for a specified item in a list, and then returns the relative position of that item in the range. The great thing is that the MATCH function can be combined with the above mentioned function INDEX to return the name of that item (instead of just the relative position)!
So let's start.
1. Insert a new column H called "ICU_Stay".
2. Copy the following function and paste it within the H2 cell:
Ok, this expression is a bit more complicated but the idea is that MATCH looks for the range in which the ICU length stay of the patient falls and, then, the INDEX function returns the name of that range.
In the case of the first patient (26 days at ICU), the categorical value is "Regular".
3. Extend the function in this cell to the rest of the column by double-clicking the bottom right corner of the H2 cell.