Technical Guide to Foursquare Places (Part 2): How does Foursquare Get Location Data Right?

Location data is hard to get right. In Part 1 of this Technical Guide, we outlined the importance of high-quality point-of-interest (POI) data and the factors to consider when choosing a location data provider. What sets Foursquare Places apart is the rigorous process we use to ensure the precision and reliability of our POI dataset.


Where does Foursquare POI data come from?


FSQ Places stands out as the sole provider of POI data with a comprehensive global representation of the world derived directly from firsthand information, drawing upon more than 14 billion distinct check-ins gathered through Foursquare’s consumer apps, City Guide and Swarm. These billions of check-ins power our dataset’s rich attributes, including tips, tastes, and photos. On top of our unique user-generated submissions, we programmatically crawl thousands of authoritative sources, including web resources, third-party partners, and feedback from business owners. To ensure near 100% coverage, we also team up with trusted listing syndicators who update millions of location records daily on behalf of brands, chains, and small business owners.


How does Foursquare ensure the accuracy of data?

Extract and Clean

Once we’ve curated the initial data, billions of raw data points undergo extraction, structuring, cleaning, and validation. The process kicks off by identifying potential matches with existing POIs in our database, using key attributes as indexing criteria. Our model then generates a similarity score for each candidate pair. This model, responsible for resolving similarities between two POIs, is continually improved through training on datasets from various countries and languages, all with human-reviewed labels. We use the similarity score to link the input data to the place it most closely resembles. When a source input doesn’t correspond to any existing POI, we create a new POI entry.
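To make the matching step concrete, here is a minimal sketch of linking an input record to the most similar existing POI, or signaling that a new one is needed. It is an illustration only, not Foursquare’s model: the hand-written `similarity` function stands in for the trained model, and the field names, the 0.6/0.4 weights, the 200 m distance cap, and the 0.8 threshold are all assumptions.

```python
import math
from difflib import SequenceMatcher


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    r = 6_371_000  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def similarity(a, b, max_dist_m=200.0):
    """Toy similarity in [0, 1]: a blend of name similarity and proximity.

    The weights and distance cap are hypothetical stand-ins for a trained model.
    """
    name_sim = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    dist = haversine_m(a["lat"], a["lon"], b["lat"], b["lon"])
    geo_sim = max(0.0, 1.0 - dist / max_dist_m)
    return 0.6 * name_sim + 0.4 * geo_sim


def resolve(record, existing, threshold=0.8):
    """Return the id of the best-matching POI, or None to signal a new POI."""
    best_id, best_score = None, 0.0
    for poi in existing:
        score = similarity(record, poi)
        if score > best_score:
            best_id, best_score = poi["id"], score
    return best_id if best_score >= threshold else None
```

In this sketch, a `None` result corresponds to the case where a new POI entry is created. A production system would compare each record only against candidates retrieved through an index on key attributes rather than scanning every existing POI.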


Cluster and Summarize

Our clustering process organizes the cleaned, non-unique data points into unique and accurate POI entities. Within each cluster, every attribute value is assigned a confidence score. At this stage we employ two methods: a) weighted mode summarization and b) model-based summarization. Weighted mode summarization uses consensus voting within a source cluster to pick the most frequently suggested value. Model-based summarization, meanwhile, relies on spatial context such as adjacent roads or buildings to determine the geocode from the list of candidate inputs supplied by various sources.
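Weighted mode summarization can be sketched in a few lines. The version below is a simplified assumption of how such voting might work: each source input carries a hypothetical trust weight, the value with the highest total weight wins, and the winner’s share of the total weight serves as its confidence score. How Foursquare actually weights sources is not described here.

```python
from collections import defaultdict


def weighted_mode(candidates):
    """Pick the value with the highest total source weight.

    candidates: list of (value, weight) pairs, one per source input.
    Returns (winning_value, confidence), where confidence is the winner's
    share of the total weight (an illustrative choice, not a known formula).
    """
    totals = defaultdict(float)
    for value, weight in candidates:
        totals[value] += weight
    total_weight = sum(totals.values())
    if total_weight <= 0:
        raise ValueError("candidates must carry positive total weight")
    winner = max(totals, key=totals.get)
    return winner, totals[winner] / total_weight
```

For example, if two sources with weights 1.0 and 0.5 suggest one phone number and a single source with weight 1.2 suggests another, the first number wins with a combined weight of 1.5 out of 2.7.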


Calibration and Filtration

Assessing the quality of every point of interest in the dataset is a crucial task when maintaining a dependable location dataset. Foursquare has engineered several models to gauge how accurately each place in our dataset matches its real-world counterpart. We utilize the following scores to consistently detect errors and refine our data.

  1. Venue Reality Score (VRS). The Venue Reality Score signifies how ‘real’ (i.e., accessible to the public at a fixed location) we think a place is. Our models generate a score based on various signals, inputs, and movement data. From there, each POI is scored from Very High to Low, indicating our confidence level. We regularly analyze places in this category to enhance the signal or identify miscalibrations by the model.
  2. Closed Score. Using time-sensitive features like check-ins, review/tip patterns, and feedback received from our API, the Closed Score denotes whether the place is currently open or closed. POIs are categorized under VeryLikelyClosed, LikelyClosed, Unsure, LikelyOpen, and VeryLikelyOpen.
  3. Attribute Accuracy Score. One of our newer measurements, the Attribute Accuracy Score helps determine the accuracy of the attributes attached to a place. The algorithm behind this score rewards or penalizes attributes based on agreeing or conflicting sources and applies a time decay factor to assign each attribute a confidence score. Attributes scoring in the top 99% help us to a) sort the POIs based on the accuracy score of specific attributes and b) identify which sources are trustworthy and which are not.
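As a rough illustration of the reward, penalty, and time decay idea behind the Attribute Accuracy Score, the sketch below scores a single attribute value against a list of source observations. The exponential decay, the 180-day half-life, and the ratio used as the confidence are all assumptions made for the example, not the published algorithm.

```python
def attribute_confidence(current_value, observations, half_life_days=180.0):
    """Confidence in [0, 1] that current_value is correct.

    observations: list of (value, age_days) pairs reported by sources.
    Agreeing observations reward the value, conflicting ones penalize it,
    and each observation is down-weighted exponentially by its age.
    The half-life is a hypothetical parameter.
    """
    agree = conflict = 0.0
    for value, age_days in observations:
        weight = 0.5 ** (age_days / half_life_days)
        if value == current_value:
            agree += weight
        else:
            conflict += weight
    total = agree + conflict
    return agree / total if total > 0 else 0.0
```

With this weighting, a fresh observation counts fully, one that is 180 days old counts half as much, and one that is 360 days old counts a quarter as much, so recent agreement outweighs stale disagreement.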

After assigning scores to all the place records in our dataset, we conduct a series of verifications to ensure that each record is eligible for inclusion in the final dataset we deliver. The factors or conditions we consider are: a) the key attributes of a place record are available; b) the reality score of a place and the accuracy of key attributes meet specific criteria; and c) each attribute has at least one credible source. We also verify that the details in a record logically fit together, such as the zip code matching the city. If a record successfully passes all these checks, we include it in the dataset we provide to our customers.
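The eligibility checks described above can be expressed as a simple filter. This is a sketch under stated assumptions: the record layout, the list of key attributes, the numeric thresholds, and the `zip_to_city` lookup table are hypothetical and chosen only to mirror conditions a) through c) and the consistency check.

```python
REQUIRED = ("name", "latitude", "longitude")  # hypothetical key attributes


def is_deliverable(record, zip_to_city, min_reality=0.5, min_attr_conf=0.7):
    """Return True if a place record passes every eligibility check."""
    attrs = record["attributes"]

    # a) The key attributes of the record are available.
    if any(attrs.get(key) is None for key in REQUIRED):
        return False

    # b) The reality score and key-attribute accuracy meet the thresholds.
    if record["reality_score"] < min_reality:
        return False
    if any(record["attr_confidence"].get(key, 0.0) < min_attr_conf for key in REQUIRED):
        return False

    # c) Each attribute has at least one credible source.
    if any(not record["credible_sources"].get(key) for key in attrs):
        return False

    # Consistency: the zip code, when present, must map to the record's city.
    zip_code = attrs.get("zip")
    if zip_code is not None and zip_to_city.get(zip_code) != attrs.get("city"):
        return False

    return True
```

Ordering the checks from cheapest to most involved lets a record fail fast, which matters when the filter runs over every record in a large dataset.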


Quality Assurance and Release

Foursquare ensures that meticulous QA checks are performed at every step of the entire process. All data changes are evaluated using our....

Read the full blog
