'Twas a long weekend that did not feel so long. Regardless, I'm back to work, so let's talk about an important function: subsetting. First, we have to be confident in our ability to call out specific variables in our data.
Since you can be working with multiple datasets at once in R, you always need to specify the dataset and the variable within that dataset. There are some other tricks to get around this (I won't talk about them because they don't always do what you think they do. If you are interested, look into ??attach and ??detach).
Regardless, if your dataset is called "data" and you want to do something to the "number_of_glasses" variable, you need to specify both of them in your code. The dollar sign ($) is what does this for you, as follows:
data$number_of_glasses
That's all there is to that. Always specify the dataset AND the variable with the dollar sign in between. If you don't, you will get an error saying that the object is not found. For example, if you just typed:
number_of_days
You would get the following error:
Error: object 'number_of_days' not found
Alternatively, if you type:
data$number_of_days
and run it, you will see the actual values for the number_of_days variable within the "data" dataset.
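Here is a minimal sketch of the two cases above, assuming a tiny made-up data frame. The values are invented purely for illustration; only the names "data", "number_of_days" and "number_of_glasses" come from this post.

```r
# Hypothetical toy dataset; the values are made up for illustration
data <- data.frame(
  number_of_days    = c(3, 7, 1),
  number_of_glasses = c(2, 0, 5)
)

# Dataset AND variable, joined by the dollar sign
days <- data$number_of_days
print(days)

# The bare variable name is not an object on its own, so R raises an error.
# tryCatch() captures the error message instead of stopping the script.
msg <- tryCatch(number_of_days, error = function(e) conditionMessage(e))
print(msg)
```

The first print shows the values stored in the column, and the second shows the "not found" message you would otherwise see as an error in the console.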
______________________________________________________________________
OK, I mentioned that first because we are now going to learn to subset, and without the above explanation, the subset function may not make sense to you.
Often, it is useful to make a subset of a dataset. For example, if you want a dataset containing only the rows where the heavy_metal_music variable is equal to 1, you can do this... very easily.
data.metal<-subset(data, data$heavy_metal_music==1)
Starting from the left:
1) data.metal - this is the new dataset that will contain the subsetted data from your original dataset
2) <- your friendly neighborhood gets operator! As always, this puts the information from the right side of the symbol into whatever you specify on the left side.
3) subset - this is a function built into the base R package... for... drumroll... subsetting!
4) data - this is your old dataset containing all of the information
5) data$heavy_metal_music - this is the dataset and the variable on which you want to subset. Now the earlier comments probably make sense.
6) ==1 - I told you early on that we don't use the equals sign as the gets operator (that's what <- is for). R uses the double equals sign to test whether two things are equal. So you are saying where heavy_metal_music is equal to 1.
In words:
the new dataset "data.metal" gets a subset of the dataset "data", where heavy_metal_music (in the "data" dataset) is equal to 1.
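To see it in action, here is a small sketch, again assuming a made-up data frame (the "id" column and all of the values are hypothetical, invented just for this example):

```r
# Hypothetical toy dataset; the values are made up for illustration
data <- data.frame(
  id                = 1:5,
  heavy_metal_music = c(1, 0, 1, 1, 0)
)

# Keep only the rows where heavy_metal_music is equal to 1
data.metal <- subset(data, data$heavy_metal_music == 1)

# Compare the number of rows before and after subsetting
print(nrow(data))
print(nrow(data.metal))
print(data.metal)

# subset() also looks up column names inside the dataset it is given,
# so the bare column name selects the same rows here
data.metal2 <- subset(data, heavy_metal_music == 1)
print(identical(data.metal, data.metal2))
```

Note that the original "data" is left untouched; subset only builds the new dataset. The bare column name works inside subset because the function evaluates the condition within the dataset you hand it, but outside of functions like this you still need the dollar sign.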
Hope that makes sense.
The next post will be on other basic R operators: for example, how to specify greater than, less than, greater than or equal to, not equal to, etc.
Following that, I think we can move on to some basic data recodes, using other subsetting functions and if-else statements.
Have a non-crappy Tuesday!