Course: Python Essential Training

CSV

- [Lecturer] It's time to look at the CSV module in Python. No need to install anything. It comes with Python. Just import csv at the top. And there you go. I have included a CSV file that we're going to be working with and this is 10_02_us.csv. And this is derived from a dataset from geonames.org which provides millions of place names in geographical data sets spanning the globe. And this particular data set contains every zip code in the US along with information about the city or town that it represents and then the latitude and longitude of its location. We're going to open this file for reading with open 10_02_us.csv. Open it in read mode, obviously. And then we're going to take that file object and pass it to a new csv.reader. All right, now, this reader object isn't a list. If you look at the type, it's actually a CSV reader class, but we can loop over it like you would a list because it's an iterable. So we can do for row in reader print row, and there you go. You get all the rows printed out. Now notice something funny about this: it didn't quite parse this correctly, and that's because this is not the traditional comma-delimited, comma-separated values file that you're used to seeing. The CSV file actually contains tab-separated values. So all of these are tabs. So we can fix this output by taking a backslash T and passing it in as a delimiter argument, backslash T right there. And then you see that all those values get split up. By default this will parse comma-separated values correctly, but if you have something other than a comma, you need to put in that delimiter specifically. You can also see that the first row that gets printed out here is the header. And if you want to skip the header, there's a neat built-in function you can use with the CSV reader called next. So all we have to do is call next reader, and then that header gets skipped over. So our reader actually has sort of an internal bookmark that keeps track of where you are.
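As a rough sketch of the reader steps just described, the snippet below parses a small in-memory stand-in for 10_02_us.csv. The sample rows and the column names are assumptions made for illustration, not copied from the real file, and `read_rows` is a hypothetical helper. It shows what happens with the default comma delimiter, what changes with `delimiter="\t"`, and how `next` skips the header.

```python
import csv
import io

# Illustrative stand-in for 10_02_us.csv: tab-separated, header row first.
# The column names and rows are assumptions, not the real file's contents.
SAMPLE = (
    "postal code\tplace name\tstate code\tcounty\n"
    "02155\tMedford\tMA\tMiddlesex\n"
    "02111\tBoston\tMA\tSuffolk\n"
    "10007\tNew York\tNY\tNew York\n"
)


def read_rows(text, delimiter):
    """Parse text with csv.reader and return every row as a list of strings."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return list(reader)


# With the default comma delimiter, each line comes back as one unsplit field.
comma_rows = read_rows(SAMPLE, ",")

# With a tab delimiter, the fields are split as intended.
tab_rows = read_rows(SAMPLE, "\t")

# next() advances the reader's internal position past the header row.
reader = csv.reader(io.StringIO(SAMPLE), delimiter="\t")
header = next(reader)
body = list(reader)

print(comma_rows[1])
print(tab_rows[1])
print(header)
```

With a real file, the `io.StringIO(SAMPLE)` argument would be replaced by the file object returned from `open`.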
So you can call next multiple times and it'll skip over a row each time. Of course, you can also just convert this to a list. So if we say list CSV reader, we don't have to call next, we can just use the list slicing syntax like that and that will also skip over the header. The CSV module definitely has a concept of headers though, so if you want to use that header data you might consider the dict reader. So csv.DictReader, we're going to keep that same delimiter. All right, so notice that this header doesn't get printed out as a row of data, but it's actually used as the keys in each dictionary in this list. And this list of dictionaries can be a really handy data format to work with in Python. So let's convert this from a reader object to a list object. Let's just call it data. Okay, so now we have some data that we can work with. Now I'm in the market for some prime real estate. And so I'm really interested in finding postal codes that are only divisible by one and themselves. You know, prime. So I borrowed some code that we wrote previously and what this does is it just gets all the prime numbers between 2 and 99,999. Remember, postal codes can start with 0. So if a postal code is say 02155, my hometown of Medford, Massachusetts, that would be equivalent to 2,155 which is divisible by five and therefore not prime. So let's filter these to only the prime locations. So data equals row for row in data if int data postal code in primes. I also don't want to buy anything out of state so I'm going to limit my search to Massachusetts and row state code equals MA. And let's print out the length of data. Whoops, row, there we go. 91, so it looks like we found 91 prime postal codes in Massachusetts and I want to write this all back to a CSV file to send to my real estate agent. You know, real estate agents love CSV files. So with open 10_02, let's call it ma_prime.csv, we're going to open this for writing, as f. Okay.
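The DictReader and filtering steps might look roughly like the sketch below. It reuses the same invented sample rows and assumed column names (`postal code`, `state code`, and so on), and `primes_below` is a hypothetical sieve standing in for the prime-number code borrowed from an earlier lesson, which is not shown in this transcript.

```python
import csv
import io

# Illustrative stand-in for 10_02_us.csv; column names and rows are assumptions.
SAMPLE = (
    "postal code\tplace name\tstate code\tcounty\n"
    "02155\tMedford\tMA\tMiddlesex\n"
    "02111\tBoston\tMA\tSuffolk\n"
    "10007\tNew York\tNY\tNew York\n"
)


def primes_below(limit):
    """Sieve of Eratosthenes: return the set of primes smaller than limit."""
    is_prime = [True] * limit
    is_prime[0:2] = [False, False]  # 0 and 1 are not prime
    for n in range(2, int(limit ** 0.5) + 1):
        if is_prime[n]:
            for multiple in range(n * n, limit, n):
                is_prime[multiple] = False
    return {n for n, flag in enumerate(is_prime) if flag}


# Every prime from 2 through 99,999, enough to cover five-digit postal codes.
primes = primes_below(100_000)

# DictReader consumes the header row and uses it as the keys of each row dict.
data = list(csv.DictReader(io.StringIO(SAMPLE), delimiter="\t"))

# int("02155") drops the leading zero, giving 2155, which is not prime.
ma_prime = [
    row
    for row in data
    if int(row["postal code"]) in primes and row["state code"] == "MA"
]

print(len(ma_prime))
print([row["place name"] for row in ma_prime])
```

Using a set for `primes` keeps each membership test cheap, which matters more on the full file than on three sample rows.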
And then we're going to create a new CSV writer, csv.writer, and then pass this file object in. For row in data, so our data from up here, writer.writerow. Now we have to pass in the row as a list, so we get to decide which values we want here. What does my real estate agent need? Maybe a place name and a county. So we could make the delimiter tabs again by passing in a delimiter keyword argument to the writer here. But by default, a comma will be used, which is what I'd prefer anyway. So let's just use that. Okay, now let's go over here and see what we've got. Hmm, there you go. Look at all of this prime real estate. Barnstable. How many agricultural buildings can you fit into a town name?
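A minimal sketch of the writing step follows, assuming the filtered rows are dictionaries with `place name` and `county` keys. The two rows are invented for illustration, and the file is written to a temporary directory rather than next to the course files. Passing `newline=""` to `open` is what the csv module documentation recommends so the writer controls line endings itself.

```python
import csv
import os
import tempfile

# Illustrative filtered rows; the keys and values are assumptions.
data = [
    {"postal code": "02111", "place name": "Boston",
     "state code": "MA", "county": "Suffolk"},
    {"postal code": "02141", "place name": "Cambridge",
     "state code": "MA", "county": "Middlesex"},
]

with tempfile.TemporaryDirectory() as out_dir:
    out_path = os.path.join(out_dir, "10_02_ma_prime.csv")

    # Open for writing and hand the file object (not the name) to csv.writer.
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)  # comma is the default delimiter
        for row in data:
            # Each row is written as a list of the values we choose to keep.
            writer.writerow([row["place name"], row["county"]])

    # Read the file back to inspect exactly what was written.
    with open(out_path, newline="") as f:
        written = f.read()

print(written)
```

To produce tab-separated output instead, `csv.writer(f, delimiter="\t")` would be the corresponding call.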
