Course: Data Visualization in R with ggplot2
Scatterplots
- [Instructor] Scatterplots are one of the most basic visualizations, and they're often very useful when we're exploring data. Scatterplots begin with a simple x-y coordinate grid, and then we plot points on that grid by specifying their x and y coordinates. For example, the red point on the screen has coordinates (2, 1), meaning that it has a value of 2 on the x-axis and a value of 1 on the y-axis. Similarly, the blue point has coordinates (1, 3), meaning that it's at location 1 on the x-axis and location 3 on the y-axis.

In ggplot, we use the point geometry to create scatterplots using the geom_point function. At a minimum, we must tell geom_point where to get its x and y values, but we can also specify other aesthetics such as the shape, color, size, and transparency of the points on the scatterplot. Here's a reference table showing you the names of these aesthetics in ggplot. Most of them are intuitive, with the exception of remembering that the alpha aesthetic is used to set transparency.

Let's try working with these in R. You can go ahead and run the starting code for this video, which puts us where we were at the end of the last video. If we take a look at the code that we ended the last video with, you can see that the first line simply creates the empty plot and prepares us to use the college dataset. When I ran that alone, I got an empty grid. The second line adds a point geometry to the grid, specifying that we should plot a point for each college using tuition as the x-axis value and average SAT score as the y-axis value. When I run this entire statement, I get a simple scatterplot.

Now, let's try representing a different dimension here. What if we want to differentiate public versus private schools? We can do that using the shape aesthetic, for example. There's a variable in this dataset called control, which you might remember we changed to a factor earlier, and it has two values, public and private.
It describes what type of control the college has. So I'm going to go ahead and add that in. Where I have my x and y aesthetics in my mapping statement, I can just add in a shape aesthetic and say that I'd like to map that to the control variable. When I run that, I now have circles for private schools and triangles for public schools. It's really hard to see the difference between those circles and triangles, though, so you might think there's probably a better way to do this, and to me, and probably to you, color would be a lot easier to see. If I want to switch this from using shape to color, all I have to do is go back to my ggplot command and change the word shape to the word color, run it again, and now it's much easier to see.

When I'm working with scatterplots, I can also change the size of the points. Let's do that. Probably the most natural thing here is to change the size of the point based upon the size of the school, and we'll measure that by the number of undergraduates. So I'm just going to add another aesthetic mapping here. I'm going to say use the size aesthetic and map that to the undergrads variable. There we go. We see that different size schools have different size points, but my scatterplot has gotten a little bit harder to read because a lot of those points are now overlapping.

I'm going to experiment here with some transparency so that we can see through these points a little bit. I'm just going to try making them a little more or less transparent. I'm going to do that using the alpha setting. Now, here's an important point. I'm not going to put the alpha right here next to size=undergrads, because I'm not changing the transparency based on the value of a variable. I'm not mapping transparency to anything. I don't want some schools to be more or less transparent than others. I want to set the transparency for all of these points to a single value. So I'm just going to move my cursor over one place here and put this outside of the mapping.
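The mapping steps described so far might look roughly like the sketch below. The course's college dataset is not reproduced here, so the `college` data frame is a small made-up stand-in, and the column names `tuition` and `sat_avg` are assumptions (the transcript only names `control` and `undergrads`).

```r
library(ggplot2)

# Hypothetical stand-in for the course's college dataset; all values are made up.
# Column names tuition and sat_avg are assumed, not taken from the transcript.
college <- data.frame(
  tuition    = c(9000, 12000, 31000, 45000, 52000, 15000),
  sat_avg    = c(1010, 1080, 1190, 1320, 1450, 1120),
  undergrads = c(24000, 31000, 5200, 2100, 6800, 18000),
  control    = factor(c("Public", "Public", "Private",
                        "Private", "Private", "Public"))
)

# Simple scatterplot: an empty plot plus a point geometry with x and y mapped.
p_base <- ggplot(data = college) +
  geom_point(mapping = aes(x = tuition, y = sat_avg))

# Map the shape aesthetic to the control factor.
p_shape <- ggplot(data = college) +
  geom_point(mapping = aes(x = tuition, y = sat_avg, shape = control))

# Swap the word shape for color to distinguish the two groups by color instead.
p_color <- ggplot(data = college) +
  geom_point(mapping = aes(x = tuition, y = sat_avg, color = control))

# Add a second mapping: point size follows the number of undergraduates.
p_size <- ggplot(data = college) +
  geom_point(mapping = aes(x = tuition, y = sat_avg,
                           color = control, size = undergrads))

print(p_size)
```

Everything inside `aes()` is a mapping from a column to a visual property, which is why switching from shape to color only requires changing one word.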
I'm going to say alpha=, and then if we start with just the number 1 and run it, it gives me my same plot. Alpha is a fractional value that ranges between 0 and 1, and with alpha set to 1, we get completely opaque points. There's no transparency. On the other hand, if I put a really small number in here, say 1/100, or 0.01, and run it, now my points are almost invisible. If you look really carefully, you can see them there, but these points are 1% opaque. They're 99% transparent. Let's try making them a little more opaque, and instead of 99% transparent, let's try 90%, which is an alpha of 0.1. And now they're there. We can see them a lot more easily. They've become a little more visible. This still isn't that great, though. What if I make them 50%, or 1/2, transparent, with an alpha of 0.5? Now, that's a lot nicer and easier to read.

So you can see how I can start changing my aesthetic mappings, and then also set other parameters to fixed values, such as alpha, to create a nice, easy-to-read visualization. You now have the skills that you need to create some interesting scatterplots from a dataset in R.
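As a rough illustration of setting alpha rather than mapping it, the sketch below places `alpha` outside `aes()`, so one constant applies to every point. As before, the `college` data frame is a made-up stand-in, the column names `tuition` and `sat_avg` are assumptions, and `make_plot` is a hypothetical helper introduced only to compare alpha values.

```r
library(ggplot2)

# Hypothetical stand-in for the course's college dataset; all values are made up.
# Column names tuition and sat_avg are assumed, not taken from the transcript.
college <- data.frame(
  tuition    = c(9000, 12000, 31000, 45000, 52000, 15000),
  sat_avg    = c(1010, 1080, 1190, 1320, 1450, 1120),
  undergrads = c(24000, 31000, 5200, 2100, 6800, 18000),
  control    = factor(c("Public", "Public", "Private",
                        "Private", "Private", "Public"))
)

# Hypothetical helper: build the same scatterplot with a given fixed alpha.
# color and size are mapped inside aes(); alpha is set outside it,
# so it is a single value shared by all points.
make_plot <- function(alpha_value) {
  ggplot(data = college) +
    geom_point(
      mapping = aes(x = tuition, y = sat_avg,
                    color = control, size = undergrads),
      alpha = alpha_value
    )
}

p_opaque <- make_plot(1)     # fully opaque, same as the default look
p_faint  <- make_plot(0.01)  # 1% opaque, nearly invisible
p_light  <- make_plot(0.1)   # 90% transparent
p_half   <- make_plot(0.5)   # 50% transparent

print(p_half)
```

Had `alpha` been placed inside `aes()`, ggplot2 would treat it as a mapping and expect a variable to drive it, which is not the intent here.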