Does where we work out matter? Can we use code to live healthier? An exploration using Foursquare API and folium
Photo by Thabang on Unsplash

The Burning Question

Like many of you, I love food. Always have, and I’m thankful that it comes in so many wonderful flavors, shapes, and sizes! I’ve had a love-hate relationship with Fast Food over the years. I mean, who can resist a tasty burger?

Yet, while my mind enjoyed the idea of trying out every new special that was advertised, my body would go along silently for a while, until it got to a point where it couldn’t stay silent anymore.

The US is the birthplace of Fast Food. Growing up in the Eastern part of the world, I had no idea how in-your-face it was until I landed on US soil. In 2012 alone, the industry spent $4.6 B-like-boyz-n-the-hood Billion on advertising its products to you. That was 10 years ago. Given the growth we’re seeing, one can assume that those numbers would be considered conservative today.

Don’t get me wrong, I still enjoy a good Fast Food meal from time to time, but over the years I’ve learned to be a bit more conscious of, and caring toward, my body.

Which got me thinking… How can anyone who lives anywhere in today’s advertisement-filled world, especially in the US, stay healthy?

Sounds impossible, unless you put yourself through ridiculous amounts of self-discipline or go live on a remote island like Tom Hanks.

You walk out the door and there are billboards and temptations everywhere, all trying to draw you in with words carefully crafted by the marketing teams of these mammoth corporations. Sure, you can resist for a while, but imagine waking up to this every day.

Even worse is when you try to give in to the health craze. You decide to get fit and you head to a local gym to work out. You leave the gym after an intense session. You get in your car and start to head home, and then you see yet another Fast Food sign.

You decide to give in, because you deserve it after going through all that pain. Little do you realize, however, that what you just consumed not only cancels out all the work you just put in, but has also left you less healthy than before.

The Thought

All this got me thinking. What if I could navigate better in this temptation-filled world?

It’s not realistic to think that I can avoid Fast Food advertising completely. But what if where I choose to live, and more specifically where I choose to work out, affects whether I give in to cravings afterwards?

Enter Foursquare API & Folium.

Developer | Foursquare — Independent Location Data Platform

The Foursquare API is an independent, global, location-based platform that collects user-generated information on public points of interest, via its app and other sources, into a streamlined database. Developers can access this data by creating an account on the platform.

I used the API to return public venue information based on geospatial data (latitudes, longitudes).

Folium-Folium 0.12.1 documentation (python-visualization.github.io)

folium builds on the data wrangling strengths of the Python ecosystem and the mapping strengths of the leaflet.js library. Manipulate your data in Python, then visualize it on a Leaflet map via folium. [1]

Case Study

Anna is a 24-year-old from Raleigh, NC who works in healthcare. She enjoys the East Coast life but is considering a move to the West Coast, both as the next step in her career and out of curiosity to experience life elsewhere. Anna, like many of us, has struggled with the temptations of advertising over the years and wants to live a healthier lifestyle.

She doesn’t know much about which Neighborhoods in L.A. would be good to move to, so she comes to us for help.

Our goal is to find the ideal neighborhood areas that would foster healthy behaviors.

Data Sources:

In addition to the Public Venue data that we would obtain from the Foursquare API, we need more quantifiable information to build our solutions visually.

  • List of Neighborhoods in L.A.: The L.A. Times has compiled a list of all the Neighborhoods and Regions of Los Angeles.
  • Department of Public Health Data, L.A. County: Each Neighborhood has historical information that may be of value to us. L.A. County makes Health datasets available for public use. The two main indicators we will focus on are Obesity Rate and the Percentage of Adults that meet the Recommended Guidelines for Physical Activity.
  • Geospatial Data for Los Angeles: Based on the Neighborhood data we import, we will also collect latitude and longitude information using the geopy and opencage packages.
  • GeoJSON Data for L.A. County: To create choropleth maps in Folium, we need GeoJSON Data for Los Angeles, which we will acquire from the UCLA GeoPortal.

That was all a mouthful. Ok, let’s load it all in.

Data Imports

To import the Neighborhood List from L.A. Times, we will be using the BeautifulSoup package, which can pull data out of HTML and XML files.
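A minimal sketch of what this scraping step can look like. The HTML snippet and its table layout are made up for illustration; the real L.A. Times page will have different markup, so the tag lookups would need adjusting.

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for the downloaded page; the real markup will differ.
html = """
<table>
  <tr><th>Neighborhood</th><th>Region</th></tr>
  <tr><td>Bel-Air</td><td>Westside</td></tr>
  <tr><td>Manhattan Beach</td><td>South Bay</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

neighborhoods = []
for row in soup.find_all("tr"):
    cells = [td.get_text(strip=True) for td in row.find_all("td")]
    if len(cells) == 2:  # skips the header row, which only has <th> cells
        neighborhoods.append({"Neighborhood": cells[0], "Region": cells[1]})

print(neighborhoods)
```

A list of dictionaries like this can be passed straight to pandas.DataFrame for the cleaning steps that follow.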

Much of the time spent working with Data revolves around cleaning messy data and ensuring that it’s ready for manipulation. Skipping this step would only lead to inaccuracies during our analysis later on.

The Public Health Data can be downloaded as .xlsx files, which we can then import using pandas.

As previously mentioned, the two key indicators we want to focus on are Obesity Rate and Healthy Adult Percentage (or the Percentage of Adults who meet Recommended Guidelines for Physical Activity). Luckily, the Department of Public Health has this data split by indicator. All we have to do is bring them in and combine.
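Combining the two indicators could look something like the sketch below. The column names and numbers are placeholders rather than the real county data, and the DataFrames are built inline so the example runs on its own; in practice each one would come from pd.read_excel on a downloaded file.

```python
import pandas as pd

# Placeholder values for illustration only; in practice each frame would come
# from pd.read_excel(...) on one of the downloaded indicator files.
obesity = pd.DataFrame({
    "Neighborhood": ["Area A", "Area B", "Area C"],
    "Obesity Rate": [20.0, 30.0, 25.0],
})
activity = pd.DataFrame({
    "Neighborhood": ["Area A", "Area B", "Area D"],
    "Healthy Adult Percentage": [40.0, 30.0, 35.0],
})

# An inner join keeps only the neighborhoods present in both indicator tables.
health = obesity.merge(activity, on="Neighborhood", how="inner")
print(health)
```

An inner join is just one choice here. Using how="outer" would keep neighborhoods that are missing one indicator, at the cost of NaN values to clean up later.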

Next up, we need to get the boundaries.

GeoJSON is a JSON-based format that, once loaded in Python, is stored as a dictionary. It holds the co-ordinate data for the polygons that form the boundaries of a geographical region, along with other properties: essentially a mixture of spatial and non-spatial attributes for any location. In our case, this would be for L.A. County.
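To make that structure concrete, here is a tiny hand-written GeoJSON with a single made-up area, loaded with the standard json module. The real county file has the same overall shape but many more features, and its property names may differ.

```python
import json

# A tiny hand-written FeatureCollection; the real file has one feature per area.
geojson_text = """
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {"name": "Area A"},
      "geometry": {
        "type": "Polygon",
        "coordinates": [[[-118.5, 34.0], [-118.4, 34.0], [-118.4, 34.1], [-118.5, 34.0]]]
      }
    }
  ]
}
"""

la_geo = json.loads(geojson_text)  # for a file on disk, use json.load(file_object)

first = la_geo["features"][0]
ring = first["geometry"]["coordinates"][0]  # outer boundary of the polygon

print(first["properties"])        # non-spatial attributes
print(first["geometry"]["type"])  # spatial part: the geometry type
print(ring[0])                    # GeoJSON stores points as [longitude, latitude]
```

Note the co-ordinate order: GeoJSON uses [longitude, latitude], the reverse of the [latitude, longitude] order that folium expects for map locations.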

The data that came in from these sources was a bit messy. After ages of tinkering (which I won’t go into here) and staring at the screen, we were able to clean it up into a usable shape.

Getting Geographical Coordinates

To visualize Los Angeles on a Folium map, we need to get its co-ordinates first. This is where geopy and geocoder come in. These packages are able to convert geographical location names into their respective lat-long values. Pretty handy.

Our DataFrames contain the names of Neighborhoods as well. Let’s run them through a loop with the packages to get the necessary spatial values.
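The loop itself can be sketched as below. To keep the example runnable offline, the geocoder is passed in as a plain function and a stand-in with approximate, made-up co-ordinates is used; geocode_all and fake_geocode are hypothetical names, not part of any package. With geopy, one could instead wrap Nominatim(user_agent=...).geocode, which returns an object with latitude and longitude attributes, or None when nothing is found.

```python
from typing import Callable, Optional, Tuple

Coords = Optional[Tuple[float, float]]

def geocode_all(names, geocode: Callable[[str], Coords]):
    """Return {name: (lat, lon)}; names that cannot be resolved map to None."""
    results = {}
    for name in names:
        try:
            results[name] = geocode(f"{name}, Los Angeles, CA")
        except Exception:
            # A network or service error should not abort the whole loop.
            results[name] = None
    return results

# Stand-in geocoder with approximate co-ordinates so the sketch runs offline.
def fake_geocode(query: str) -> Coords:
    table = {"Bel-Air, Los Angeles, CA": (34.08, -118.45)}
    return table.get(query)

coords = geocode_all(["Bel-Air", "Nowhere"], fake_geocode)
print(coords)
```

Keeping the unresolved names as None makes it easy to spot which Neighborhoods need a retry or a manual lookup.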

The Magic of Foursquare

The Foursquare API gives access to a giant database of public points of interest, with especially strong coverage of the U.S. Having this information is valuable when it comes to generating insights for any industry.

To get going with the API, you first need to create a Dev account with the platform. Once logged in, the platform will allow you to create a new app. This new app will give you two key pieces of information, CLIENT_ID and CLIENT_SECRET, which you later need to copy into your code.

CLIENT_ID = This will be your Foursquare ID

CLIENT_SECRET = This will be your Foursquare Secret

More info on setup can be found here.

Now, onto the API Calls.

Once we have the required information declared, we need to make the API calls to get the venue information from the server.

You can do a simple call to get all the venues for a specific set of co-ordinates using the following URL:

url = https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}

If you want to filter the data and obtain venues of specific types only, such as Fast Food Restaurants and Fitness Centers in our case, you can refer to their Venue Category documentation, which lists the ID of each category.

You can then change the URL to include each category type.

url = https://api.foursquare.com/v2/venues/explore?categoryId={}&intent=browse&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}

where categoryId={} allows you to specify which IDs you want to request.
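Filling in those placeholders by hand gets error-prone, so here is a small sketch that builds either URL from its parts using only the standard library. build_explore_url is a hypothetical helper, and the credentials, version date, and category ID below are placeholders. Joining several category IDs with commas is an assumption that should be checked against the Foursquare documentation.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.foursquare.com/v2/venues/explore"

def build_explore_url(client_id, client_secret, version, lat, lng,
                      radius, limit, category_ids=None):
    """Build the explore URL; category_ids is an optional list of ID strings."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,
    }
    if category_ids:
        # Assumption: multiple IDs are passed as one comma-separated value.
        params["categoryId"] = ",".join(category_ids)
        params["intent"] = "browse"
    return f"{BASE_URL}?{urlencode(params)}"

# Placeholder credentials, version date, and category ID.
url = build_explore_url("YOUR_ID", "YOUR_SECRET", "20180605",
                        34.05, -118.24, radius=1000, limit=50,
                        category_ids=["EXAMPLE_CATEGORY_ID"])
print(url)
```

urlencode percent-encodes the comma in the ll value (it becomes %2C), which is a normal, equivalent way to send it. The resulting string can be handed to requests.get(url).json() to make the actual call.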

Oh, and one more thing to keep in mind: the API has a limit on the number of calls you can make per day on the Sandbox Tier Account. Bear that in mind for large datasets.

Let’s take a look at the API Data we received.
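The response comes back as nested JSON, so it helps to flatten it into simple rows first. The sketch below assumes the explore response nests venues under response, groups, items, and venue; the payload is hand-written with made-up values so the example runs without an API call, and extract_venues is a hypothetical helper.

```python
import json

# Hand-written stand-in for an explore response. The nesting
# (response -> groups -> items -> venue) is an assumption about the API's
# format, and all values are made up.
raw = """
{"response": {"groups": [{"items": [
  {"venue": {"name": "Example Gym",
             "location": {"lat": 34.01, "lng": -118.49},
             "categories": [{"name": "Gym / Fitness Center"}]}},
  {"venue": {"name": "Example Burger",
             "location": {"lat": 34.02, "lng": -118.48},
             "categories": []}}
]}]}}
"""

def extract_venues(payload):
    """Flatten the nested response into a list of simple dict rows."""
    rows = []
    for group in payload.get("response", {}).get("groups", []):
        for item in group.get("items", []):
            venue = item["venue"]
            # Fall back to an empty dict when a venue has no category listed.
            categories = venue.get("categories") or [{}]
            rows.append({
                "name": venue["name"],
                "lat": venue["location"]["lat"],
                "lng": venue["location"]["lng"],
                "category": categories[0].get("name"),
            })
    return rows

venues = extract_venues(json.loads(raw))
for row in venues:
    print(row)
```

Rows in this shape drop straight into a pandas DataFrame, one venue per line, ready for mapping.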

Great job! You’ve successfully got all the heavy lifting out of the way. Now onto the fun part.


Plotting the Maps using Folium

folium is a great visualization tool when it comes to geospatial data. What I love about it is how easily you can visualize your information in multiple ways.

Let’s look at a simple map generated with folium. There are several map types as well, which folium calls tiles. They each serve a different purpose, aesthetically or functionally.

For example, Stamen Terrain helps visualize the terrain and vegetation of each location, while Stamen Toner introduces a sharp contrast between land and water.

To start with our data, let’s plot a simple map with Folium using just the lat-long values of all the Neighborhoods in our DataFrame.

We can see how the Neighborhoods are spread out on the map. Even niftier, Folium lets us create a custom label for each point, such as one for the Neighborhood of Bel-Air in the Westside Region.

Now, let’s take a deeper look into our Case Study.

Anna wants to maintain a healthy lifestyle when moving to L.A. However, the location of Fast Food restaurants and Fitness Centers is probably not the first thing on her mind when it comes to considering Neighborhoods.

But we’ve already seen how much advertising prevails in the United States, with Billboards everywhere and many of them in front of Fast Food Restaurants, so we want to avoid areas with a high density of Fast Food Restaurants. We aren’t necessarily looking for an area with a high density of Fitness Centers either, because such a Neighborhood may contain an equally high density of Fast Food Restaurants. That would be counter-productive to our approach.

We want to isolate areas that would have a low density of Fast Food restaurants and the presence of at least a few Fitness Centers.
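One simple way to express that filter in code is to count venues per Neighborhood and keep the areas under a Fast Food threshold and over a Fitness threshold. This is a sketch of the idea, not the method used for the maps in this article; the venue list, the candidates helper, and both thresholds are made up for illustration.

```python
from collections import Counter

# Made-up (neighborhood, category) pairs standing in for the venue table.
venues = [
    ("Area A", "Fast Food Restaurant"),
    ("Area A", "Gym / Fitness Center"),
    ("Area A", "Gym / Fitness Center"),
    ("Area B", "Fast Food Restaurant"),
    ("Area B", "Fast Food Restaurant"),
    ("Area B", "Fast Food Restaurant"),
    ("Area B", "Gym / Fitness Center"),
    ("Area C", "Fast Food Restaurant"),
]

counts = Counter(venues)  # missing (area, category) pairs count as 0

def candidates(counts, max_fast_food, min_gyms):
    """Neighborhoods with few fast-food venues and at least min_gyms gyms."""
    areas = sorted({area for area, _ in counts})
    return [
        a for a in areas
        if counts[(a, "Fast Food Restaurant")] <= max_fast_food
        and counts[(a, "Gym / Fitness Center")] >= min_gyms
    ]

print(candidates(counts, max_fast_food=1, min_gyms=2))
```

Raw counts favor small Neighborhoods, so dividing by area or population would give a fairer density measure if that data is available.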

There are two assumptions that we base our approach on:

  1. Visiting and leaving Fitness Centers in the vicinity of a greater number of fast-food restaurants reduces the motivation of the participant to return to the fitness center OR increases the probability of visiting a fast-food restaurant after a workout.
  2. Neighborhoods with a larger number of fast-food restaurants have a higher obesity rate.

Let’s see what the data shows.

Now, to map every little fast food joint and gym on the map would probably blow up my computer. Hence, we’ve acquired a smaller sample size from the API.

The blue points represent Fitness Centers and the red points, Fast Food Restaurants.

From this initial observation alone, we can see some clear differences in the way the establishments are spread out.

We see five possible Neighborhoods that have a lower density of Fast Food options compared to the others.

Santa Monica | Manhattan Beach | Rancho Palos Verdes | Downtown L.A. | South Pasadena

Now, it’s easy to display a map showing the Obesity Rates or the Healthy Adult Percentages we’ve collected. We can even tell folium to control the size of each point that appears on the map, so that higher rates show up as larger points.

This is neat.

But where folium really shines is when it comes to Choropleth maps.

What’s a Choropleth map?

A choropleth map (from Greek χῶρος choros ‘area/region’ and πλῆθος plethos ‘multitude’) is a type of thematic map in which a set of pre-defined areas is colored or patterned in proportion to a statistical variable that represents an aggregate summary of a geographic characteristic within each area, such as population density or per-capita income. [2]

It’s a great way to visually see how the data is spread across the map.

But, to create a Choropleth map, we need the data of the boundaries of Los Angeles. Luckily, we did that earlier by importing in the GeoJSON file.

folium works seamlessly with both the geospatial data and our DataFrame so that we can get a visually functional map.

One key element to fill in within the folium structure is the key_on parameter. The key_on parameter points to the location within each GeoJSON feature that holds the value to match against the key column of our DataFrame. Usually, it’s the name of the boundary, found at a path similar to feature.properties.name

Explore your GeoJSON to get an idea where this may be.

Let’s take a look at how Obesity Rates vary across Los Angeles. You can see that folium automatically creates a legend for us as well.

From the Obesity Rate data, we can see that:

The highest Obesity Rates were found in these Neighborhoods

  • Carson, Harbor | Compton, Southeast | La Mirada, Southeast | Valinda, San Gabriel Valley

The lowest Obesity Rates were found in these Neighborhoods

  • Arcadia, San Gabriel Valley | San Gabriel, San Gabriel Valley | Manhattan Beach, South Bay | Pacific Palisades, Westside

Interesting... San Gabriel Valley shows up at both ends of the spectrum. Additionally, Southern L.A. seems to be an area that we might want to avoid.

On the other end, Manhattan Beach and the Westside areas, which looked like possible ideal locations at first glance, also show up on the low end of our Obesity data.

Now, let’s take a look at how Healthy Adult Percentage Rates vary across Los Angeles.

From the Healthy Adult data, we can see that:

The highest Healthy Adult Percentage Rates were found in these Neighborhoods

  • Beverly Hills, Westside | West Hollywood, Central LA | Manhattan Beach, South Bay | Santa Monica, Westside

The lowest Healthy Adult Percentage Rates were found in these Neighborhoods

  • Rosemead, San Gabriel Valley | Monterey Park, San Gabriel Valley | Cerritos, Southeast | Rowland Heights, San Gabriel Valley

That’s a third hit for Manhattan Beach! Could this be an ideal location for Anna? The other Westside areas seem promising too.

Finally, let’s take a look at one more cool feature of folium.

Heat Maps.

Yes, you heard that right. I don’t know why, but I’ve always had a fascination with heat maps. Maybe it’s their glorified use in so many of the movies we watched growing up. They’re also simply easy to understand, and that’s important when it comes to helping our audience connect with the data.

To get a more accurate visual using a Heat Map, we’ve run another API call to get a larger sample of just the Venue Categories involving Gym/Fitness Center and Fast Food Restaurants. This uses the method of filtering by category IDs described earlier. Bear in mind the longer loading times and save your work accordingly.

Some of the Neighborhoods that stand out in the Heat Map for Fast Food chains are:

  • West Hollywood | Inglewood | South Pasadena | Downey | Los Alamitos | Manhattan Beach | Rancho Palos Verdes

Note: We are looking for green spots as our ideal since we want a lower density of Fast Food chains for Anna.

Some of the Neighborhoods that stand out in the Heat Map for Fitness spots are:

  • West Hollywood | Inglewood | Torrance | Downey | San Gabriel Valley | Manhattan Beach | Long Beach


Recommendations & Conclusion

From our observations, it’s clear that there are a few Neighborhoods that seem to stand out as ideal for Anna.

Some ideal choices would be Neighborhoods on the West Side of L.A. such as the Santa Monica area, the Hollywood Hills, Rancho Palos Verdes, and Manhattan Beach, further south such as the Long Beach area, or further north such as the South Pasadena area.

These communities have a low Obesity Rate and a high Physical Activity rate among their inhabitants, which may prompt Anna to engage more in a healthier lifestyle. They also tend to have fewer Fast Food options while offering a decent number of Fitness options. This would probably leave her drives to and from a workout less exposed to the many establishments that may hinder her health goals.

If Anna can afford a more expensive lifestyle, she can choose to move to areas such as Santa Monica or Hollywood Hills. If she wants to reduce costs, she can consider areas such as Manhattan Beach, South Pasadena, or Rancho Palos Verdes.

Overall, Manhattan Beach seems to be the best location for her first move to L.A.

The approach we took can definitely be taken further or reconsidered. Some additional things to consider:

  1. The more Venues you can collect from Foursquare, the more accurate your results will be. Locations outside the U.S. may not be covered as extensively as L.A.
  2. We covered Los Angeles at a high level. One could take it even further by exploring each Region of L.A.
  3. Another approach could be to look into the route that Anna may take to and from Fitness Centers. This route can be optimized to minimize Fast Food restaurants. We have lat-long values for each establishment, and those could be put to use in this scenario.
  4. There may be other factors that correlate better than Obesity Rate, such as access to healthier food options in each Neighborhood.
  5. The venue data we pulled from Foursquare consisted of one category type for each side: Fast Food Restaurants and Gyms. There are other categories that we haven’t considered, such as other potentially unhealthy food establishments (Donut shops, Pizza places) or other Fitness options besides a gym (outdoor areas, pools, Tennis/Basketball courts, and so on). Including these would likely give a more accurate outcome.

The Foursquare API allows for great analysis of many geospatial problems. Other questions it could help answer include identifying the ideal location for a business such as a restaurant, identifying the safest areas to live, and so on.


Thank you

If you read this far, thank you. It truly means a lot that you took the time to read through this project of mine.

I hope you can agree with me how cool it is that we can use code to find interesting solutions to problems we may not even have thought of.

If you'd like to have a discussion on these, feel free to leave a comment or reach out to me!

Oh, and the source code can be found on my GitHub.

Cheers!

References

  1. Folium — Folium 0.12.1 documentation (python-visualization.github.io)
  2. Choropleth map — Wikipedia
