I wanted to start by letting everyone know that you can use ifelse statements within other ifelse statements to recode a multi-category variable, much like you can in SAS.
For example, you may have the number_of_glasses variable, which you want to recode into three categories: 0, 1-5, and more than 5 glasses per day, coded 1, 2, and 3, respectively.
You can do this by nesting multiple ifelse statements:
data$glasses.categories<-ifelse(data$number_of_glasses==0, 1,
ifelse(data$number_of_glasses>0 & data$number_of_glasses<6, 2, ifelse(data$number_of_glasses>5,3,NA)))
Broken down, this says: if the number of glasses equals zero, give the new variable glasses.categories a 1; otherwise, move to the next ifelse statement. That one says: if the number of glasses is greater than zero (i.e., starts at 1) AND less than 6 (i.e., 5 and below), give glasses.categories a 2; otherwise, move to the final ifelse statement, which gives glasses.categories a 3 if number_of_glasses is greater than 5. The final "else" part says that if a value doesn't meet any of these criteria, give it an NA, which is R's marker for a blank or missing value. You always need that last "else" part. Since my criteria should be exhaustive (i.e., every valid value of number_of_glasses is covered by one of the conditions), the final NA only catches values that fit none of them, such as a negative number. Missing values carry over on their own: when number_of_glasses is NA, the test itself is NA, and ifelse returns NA for that row.
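To see the nested ifelse in action, here is a small sketch you can paste into R. The data frame and its values are made up for illustration (including one missing value); only the variable names come from the example above.

```r
# Toy data frame; these values are invented for illustration
data <- data.frame(number_of_glasses = c(0, 3, 5, 8, NA))

# Nested ifelse: each "else" slot holds the next test
data$glasses.categories <- ifelse(data$number_of_glasses == 0, 1,
                           ifelse(data$number_of_glasses > 0 & data$number_of_glasses < 6, 2,
                           ifelse(data$number_of_glasses > 5, 3, NA)))

# Cross-tabulate old and new values to check the recode
table(data$number_of_glasses, data$glasses.categories, useNA = "ifany")
```

A quick cross-tab like this is a handy habit after any recode, since it shows at a glance which old values landed in which new category.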
There is another way to do this as well, without using these ifelse statements (actually there are multiple ways, but I'll show you the one I use most often).
To do the same recode as above, I'll start by making a new variable that is equal to 1.
data$glasses.categories<-1
So all values in the dataset for this variable will be equal to 1.
Next, recode over these 1's with your first criterion: if number_of_glasses falls between 1 and 5, give it a 2. You can select a subset of your data by using square brackets, like so:
data$glasses.categories[data$number_of_glasses>0 & data$number_of_glasses<6]<-2
This tells R to look at the glasses.categories variable in the dataset "data", but only at those rows or cases where the number_of_glasses variable is between 1 and 5 (inclusive), and to give those a 2. This will recode OVER TOP of the 1's that are already in that variable for those cases, that is, the values created in your first step, where you made everything equal to 1.
Finally, recode over the 1's in that variable with your final criterion.
data$glasses.categories[data$number_of_glasses>5]<-3
You will end up with the same values as you did using the ifelse. So the final block of code would be:
data$glasses.categories<-1
data$glasses.categories[data$number_of_glasses>0 & data$number_of_glasses<6]<-2
data$glasses.categories[data$number_of_glasses>5]<-3
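If you want to convince yourself that the two approaches agree, here is a rough sketch on made-up data with no missing values. The values and the "check" column are hypothetical, added only for the comparison.

```r
# Toy data with no missing values; invented for illustration
data <- data.frame(number_of_glasses = c(0, 1, 5, 6, 10))

# Bracket method: start everyone at 1, then overwrite subsets
data$glasses.categories <- 1
data$glasses.categories[data$number_of_glasses > 0 & data$number_of_glasses < 6] <- 2
data$glasses.categories[data$number_of_glasses > 5] <- 3

# Same recode with nested ifelse, stored in a hypothetical comparison column
data$check <- ifelse(data$number_of_glasses == 0, 1,
              ifelse(data$number_of_glasses > 0 & data$number_of_glasses < 6, 2,
              ifelse(data$number_of_glasses > 5, 3, NA)))

# TRUE if both methods agree on every row
all(data$glasses.categories == data$check)
```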
******** VERY IMPORTANT NOTE ***********
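You need to be careful using this final method if you have missing values in your dataset. Because the first step sets every row to 1, and the bracket conditions skip rows where number_of_glasses is missing, those rows would stay coded as 1. I highly recommend one more line of code if your variable has any missing values: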
data$glasses.categories[is.na(data$number_of_glasses)]<-NA
This will carry over any missing values from your old variable to your new variable.
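Here is a small sketch of what goes wrong without that line, again on invented data. The name before_fix is hypothetical and is kept only so you can compare the two versions side by side.

```r
# Toy data with one missing value; invented for illustration
data <- data.frame(number_of_glasses = c(0, 4, NA, 7))

data$glasses.categories <- 1
data$glasses.categories[data$number_of_glasses > 0 & data$number_of_glasses < 6] <- 2
data$glasses.categories[data$number_of_glasses > 5] <- 3

# Without the extra line, the row with a missing value is still coded 1
before_fix <- data$glasses.categories  # hypothetical name, for comparison only

# Carry the missing values over to the new variable
data$glasses.categories[is.na(data$number_of_glasses)] <- NA

# Side-by-side view of the original, the fixed recode, and the unfixed recode
data.frame(data, before_fix)
```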
Have fun! Next up... I don't know yet. I'm getting tired of recoding, so we may move on to basic contingency tables and measures of central tendency (mean, median) as well as variation (standard deviation, interquartile range).
Peace!