Here is a quick overview of the most common operators. By operators I mean things like "equal", "not equal", "greater than", etc.
This will be a rough one.
equal: ==
not equal: !=
greater than: >
less than: <
greater than or equal to: >=
less than or equal to: <=
I think that is about it.
So how do you use these? Let us take the subset function from the previous post:
Using equal (double equal sign):
data.metal<-subset(data, data$heavy_metal_music==1)
That would give you all of the cases in "data" where heavy_metal_music is equal to 1 (e.g. "yes").
Using not equal:
data.metal<-subset(data, data$heavy_metal_music!=1)
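That would give you all of the cases in "data" where heavy_metal_music is not equal to 1 (e.g. "no" or anything else entered into that cell -- this is where data quality becomes important!). One catch: cases where heavy_metal_music is missing (NA) are left out of both subsets, because subset() treats a missing condition as false.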
I won't give examples of the other operators; as you can probably tell by now, you just swap the operator in the statement for one of those listed above.
Yes, my font changed. I copied the data.metal subset statement from a prior post and I decided to not change the font back after the paste. Take that! I'm such a rebel.
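For completeness, here is a rough sketch of the remaining operators in the same subset statement. The data frame and its "age" column are made up for this example; they are not from the original data.

```r
# Hypothetical survey data; the "age" column is invented for illustration.
data <- data.frame(
  age = c(16, 18, 25, 40),
  heavy_metal_music = c(1, 1, 0, 1)
)

data.over18   <- subset(data, data$age > 18)   # greater than
data.under18  <- subset(data, data$age < 18)   # less than
data.18plus   <- subset(data, data$age >= 18)  # greater than or equal to
data.18orless <- subset(data, data$age <= 18)  # less than or equal to

data.over18$age    # 25 40
data.18plus$age    # 18 25 40
```

The only thing that changes from line to line is the operator, so the boundary value (18 here) is included only by >= and <=.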
__________________________________________________________
As you may know, R treats empty cells (i.e. missing data) as NA. When you read a file in (with read.csv, for example), R puts NA into the empty cells of columns that otherwise have numbers in them (we will get back to data types later). If the column has text in it or non-numeric characters (hyphen, slash, colon, etc.), R by default leaves the empty cell truly blank: it becomes an empty string ("") rather than NA.
To figure out what variable type you have, you can type the following:
class(data$variable)
Of course, you would change the word 'variable' to the name of the variable whose type (i.e. class) you want to see. R treats integer and numeric variables the same way here -- NA means missing. A factor can hold NA too, but blank cells in a text column that was read in as a factor usually end up as a blank "" level rather than NA.
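To see the difference between the two kinds of "empty", here is a small sketch. The CSV content and column names are made up, and it is read from a string with read.csv(text = ...) so it runs without a file.

```r
# Small made-up CSV, read from a string so the example is self-contained.
csv_text <- "id,score,genre
1,10,metal
2,,jazz
3,7,
"
data <- read.csv(text = csv_text, stringsAsFactors = FALSE)

class(data$score)  # "integer"
class(data$genre)  # "character"

is.na(data$score)  # FALSE  TRUE FALSE  -> the empty numeric cell became NA
data$genre == ""   # FALSE FALSE  TRUE  -> the empty text cell is an empty string
```

This relies on read.csv's default na.strings = "NA"; if you pass something like na.strings = c("NA", ""), empty text cells are read as NA as well.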
So if you want to keep all of the cases in your subset where heavy_metal_music is missing, you can do the following (again, this only works for variables that are numeric -- because R puts NA in the blank cell):
data.metal<-subset(data, is.na(data$heavy_metal_music))
The is.na(variable) part tells R to keep only the cases where heavy_metal_music is NA (i.e. missing).
To specify NOT MISSING:
data.metal<-subset(data, !is.na(data$heavy_metal_music))
This might look familiar -- just as "not equal" uses the exclamation mark for NOT, putting it in front of is.na means "not missing".
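Here is a minimal sketch of both statements on a toy data frame (the values are invented for illustration). It also shows why you need is.na rather than the equal operator when looking for missing values.

```r
# Toy data invented for illustration.
data <- data.frame(
  id = 1:4,
  heavy_metal_music = c(1, NA, 0, NA)
)

data.missing <- subset(data, is.na(data$heavy_metal_music))
data.present <- subset(data, !is.na(data$heavy_metal_music))

data.missing$id  # 2 4
data.present$id  # 1 3

# Comparing to NA with == does not work: the comparison is NA for every row,
# and subset() treats NA as FALSE, so no rows come back.
nrow(subset(data, data$heavy_metal_music == NA))  # 0
```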
_____________________________________________________________
So what if the variable is a 'character' variable (i.e. it has text in it)? Then you compare against an empty string:
data.metal<-subset(data, data$heavy_metal_music=="")
The pair of double quotes with nothing between them specifies a textual "blank" in the cell. You could also say:
data.metal<-subset(data, data$heavy_metal_music!="")
to mean not equal to blank.
You have to be careful here though, as sometimes a space in that cell (for example, if you had data in a cell in Excel, but hit the space bar to clear it out instead of backspace or delete) will not be caught by comparing to "". In this case, you need to compare to " " (double quotes with a space in between). If you have two spaces, you need double quotes with two spaces in between, etc.
It's kind of a pain, so make sure your data are clean before you put them into R.
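If you cannot clean the file first, one alternative (not the approach described above, just a sketch on made-up data) is to strip the whitespace with base R's trimws() before comparing, so that any number of spaces counts as blank.

```r
# Toy data invented for illustration: one truly empty cell and two
# cells containing only spaces.
data <- data.frame(
  id = 1:4,
  heavy_metal_music = c("yes", "", " ", "  "),
  stringsAsFactors = FALSE
)

# A plain comparison only catches the truly empty cell.
nrow(subset(data, data$heavy_metal_music == ""))  # 1

# trimws() removes leading and trailing whitespace before the comparison,
# so cells holding only spaces also count as blank.
data.blank  <- subset(data, trimws(data$heavy_metal_music) == "")
data.filled <- subset(data, trimws(data$heavy_metal_music) != "")

data.blank$id   # 2 3 4
data.filled$id  # 1
```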
Keep it realz.