Course: R for Data Science: Lunch Break Lessons

cut

- [Instructor] There are times when you need to break a collection of numbers into a set of buckets, and for that, R has a function called cut. I've created a numeric vector, which I've creatively named numericVector, and we can take a look at it here. It holds 100 random integer values, and I'd like to break it into three buckets. So let's use the cut command and tell it to cut numericVector into three buckets: cut(numericVector, 3). What comes back is a factor, not a numeric vector. You'll also see some odd-looking notation: the first value is (171,255]. That is interval notation, and cut uses it to label each value in numericVector with the bucket it falls into. A parenthesis means the endpoint is excluded and a square bracket means it is included, so (171,255] covers values greater than 171 up to and including 255. At the very bottom, you'll see Levels, with three values: (1.75,86.3], (86.3,171], and (171,255]. These are the labels cut generated for the low, medium, and high buckets.
Now, you can change those labels. Using the same command, cut(numericVector, 3, ...), you can supply labels of your own with labels = and a vector built with c(). I'm going to call my buckets low, medium, and high: cut(numericVector, 3, labels = c("low", "medium", "high")). If I run this now, instead of the interval notation I get high, low, medium, high, and so on. Again, each value in numericVector is labeled with the bucket it belongs to.
If you don't want string labels at all, you can turn them off. Going back to the same command, I'll replace the label vector with simply labels = FALSE. Now what comes back is just the number of the bucket that cut placed each value into, as a plain integer vector rather than a factor. Cut also has an alternative way to break things up into buckets: you can define the break points yourself.
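The three calls described above can be sketched in R as follows. This is a minimal sketch under stated assumptions: the lesson's numericVector is not shown, so the vector here is a hypothetical stand-in of 100 random integers from 1 to 255 generated with an arbitrary seed, and the exact interval labels you see will differ from the ones in the video.

```r
# Hypothetical stand-in for the lesson's numericVector:
# 100 random integers between 1 and 255 (seed and range are assumptions)
set.seed(42)
numericVector <- sample(1:255, 100, replace = TRUE)

# Three equal-width buckets; the result is a factor whose levels use
# interval notation such as "(a,b]"
buckets <- cut(numericVector, 3)
class(buckets)    # "factor"
levels(buckets)

# Same three buckets, but labeled with names of your own
named <- cut(numericVector, 3, labels = c("low", "medium", "high"))
head(named)
table(named)

# labels = FALSE skips the labels and returns integer bucket codes 1, 2, 3
codes <- cut(numericVector, 3, labels = FALSE)
head(codes)
```

The integer codes from labels = FALSE match the underlying codes of the labeled factor, so as.integer(buckets) and codes hold the same bucket numbers.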
To define the break points yourself, call up the same command, but instead of asking for three buckets, give it breaks = c(1, 100, 200, 256). Now I've told cut to break numericVector at those particular values. What we get back again uses the interval notation, but if you look at Levels on the very last line, there are three buckets: (1,100], (100,200], and (200,256]. So that's cut. Again, cut is used to break a numeric vector into separate buckets for later analysis.
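A comparable sketch for explicit break points, again using a hypothetical stand-in for numericVector (the seed and value range are assumptions). It also shows a detail the lesson doesn't cover: because each interval is open on the left, a value exactly equal to the lowest break (1 here) becomes NA unless you pass include.lowest = TRUE.

```r
# Hypothetical stand-in for numericVector (seed and range are assumptions)
set.seed(42)
numericVector <- sample(1:255, 100, replace = TRUE)

# Break at explicit points: three buckets (1,100], (100,200], (200,256]
byBreaks <- cut(numericVector, breaks = c(1, 100, 200, 256))
levels(byBreaks)   # "(1,100]" "(100,200]" "(200,256]"

# Left endpoints are excluded, so a value equal to the lowest break is NA
edge <- cut(c(1, 150), breaks = c(1, 100, 200, 256))
is.na(edge)        # TRUE FALSE

# include.lowest = TRUE closes the first interval so 1 is kept: "[1,100]"
inclusive <- cut(numericVector, breaks = c(1, 100, 200, 256),
                 include.lowest = TRUE)
levels(inclusive)
```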
