Technical Guide to Foursquare Places (Part 2): How does Foursquare Get Location Data Right?
Foursquare
We're a location technology platform, inventing the future with developer tools, enterprise solutions and consumer apps.
Location data is hard to get right. In Part 1 of this Technical Guide, we outlined the importance of high-quality points-of-interest (POI) data and the factors to consider when choosing a location data provider. What sets Foursquare Places apart is the rigorous process we use to ensure the precision and reliability of our POI dataset.
Where does Foursquare POI data come from?
FSQ Places stands out as the sole provider of POI data with a comprehensive global representation derived directly from firsthand information: more than 14 billion distinct check-ins gathered through Foursquare's consumer apps, City Guide and Swarm. These check-ins power our dataset's rich attributes, including tips, tastes, and photos. On top of these user-generated submissions, we programmatically crawl thousands of authoritative sources, including web resources, third-party partners, and feedback from business owners. To achieve near 100% coverage, we also team up with trusted listing syndicators who update millions of location records daily on behalf of brands, chains, and small business owners.
How does Foursquare ensure the accuracy of data?
Extract and Clean
Once we've curated the initial data, billions of raw data points go through extraction, structuring, cleaning, and validation. The process begins by identifying potential matches with existing POIs in our database, using key attributes as indexing criteria. Our models then generate a similarity score for each candidate match. This similarity model is continually improved by training on datasets from many countries and languages, all with human-reviewed labels. We use the similarity score to link the input data to the most similar existing place. When a source input does not correspond to any existing POI, we create a new POI entry.
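The match-or-create step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Foursquare's implementation: the coarse latitude/longitude grid cell used as the indexing key (cell_deg), the hand-weighted blend of name similarity (difflib's SequenceMatcher) and haversine distance, the 500 m distance cutoff, and the 0.75 link threshold are all hypothetical stand-ins for the trained similarity model described above.

```python
import math
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Poi:
    poi_id: str
    name: str
    lat: float
    lon: float


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in metres."""
    r = 6_371_000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def similarity(name: str, lat: float, lon: float, poi: Poi) -> float:
    """Hypothetical hand-weighted score in [0, 1]; stands in for a trained model."""
    name_sim = SequenceMatcher(None, name.lower(), poi.name.lower()).ratio()
    geo_sim = max(0.0, 1.0 - haversine_m(lat, lon, poi.lat, poi.lon) / 500.0)
    return 0.7 * name_sim + 0.3 * geo_sim


class PoiIndex:
    """Blocks POIs by a coarse lat/lon grid cell (the hypothetical indexing key)."""

    def __init__(self, cell_deg: float = 0.01, threshold: float = 0.75):
        self.cell_deg = cell_deg
        self.threshold = threshold
        self.cells: dict[tuple[int, int], list[Poi]] = {}
        self.count = 0

    def _cell(self, lat: float, lon: float) -> tuple[int, int]:
        return (math.floor(lat / self.cell_deg), math.floor(lon / self.cell_deg))

    def add(self, name: str, lat: float, lon: float) -> str:
        """Create a new POI entry and return its id."""
        poi = Poi(f"poi-{self.count}", name, lat, lon)
        self.count += 1
        self.cells.setdefault(self._cell(lat, lon), []).append(poi)
        return poi.poi_id

    def candidates(self, lat: float, lon: float):
        """Yield POIs in the input's grid cell and its eight neighbours."""
        cx, cy = self._cell(lat, lon)
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                yield from self.cells.get((cx + dx, cy + dy), [])

    def link_or_create(self, name: str, lat: float, lon: float) -> tuple[str, bool]:
        """Return (poi_id, created): link to the best match or create a new POI."""
        best, best_score = None, -1.0
        for cand in self.candidates(lat, lon):
            score = similarity(name, lat, lon, cand)
            if score > best_score:
                best, best_score = cand, score
        if best is not None and best_score >= self.threshold:
            return best.poi_id, False
        return self.add(name, lat, lon), True
```

For example, after index.add("Blue Bottle Coffee", 37.7765, -122.4233) returns "poi-0", the input ("Blue Bottle Coffee Co", 37.7766, -122.4232) links to poi-0, while ("Joe's Hardware", 37.7765, -122.4233) scores too low on name similarity and creates a new entry. Searching the neighbouring grid cells as well as the input's own cell keeps a place that sits near a cell boundary from missing its match.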
Cluster and Summarize
Our clustering process organizes the cleaned, non-unique data points into unique, accurate POI entities. Within each cluster, every attribute value is assigned a confidence score. At this stage we use two summarization strategies: a) weighted mode summarization; and b) model-based summarization. Weighted mode summarization applies consensus voting within a source cluster to pick the most frequently suggested value. Model-based summarization, by contrast, uses spatial context, such as adjacent roads or buildings, to choose the geocode from the list of candidate inputs supplied by various sources.
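Weighted mode summarization, strategy a) above, can be illustrated with a short Python sketch. The source names and trust weights in SOURCE_WEIGHTS are hypothetical, and so is defining the confidence score as the winning value's share of the total weight; the article does not say how Foursquare weights sources or computes confidence. Values are assumed to have already been normalized by the cleaning step, so that equivalent values compare equal.

```python
from collections import defaultdict
from collections.abc import Hashable, Iterable
from typing import Optional

# Hypothetical trust weights per source type; not Foursquare's actual values.
SOURCE_WEIGHTS = {
    "owner_feedback": 3.0,
    "partner": 2.0,
    "check_in": 1.0,
    "web_crawl": 0.5,
}


def weighted_mode(
    observations: Iterable[tuple[str, Hashable]],
    weights: dict[str, float] = SOURCE_WEIGHTS,
    default_weight: float = 0.5,
) -> tuple[Optional[Hashable], float]:
    """Pick the value with the largest summed source weight in one cluster.

    observations: (source_type, value) pairs for a single attribute.
    Returns (winning value, confidence), where confidence is the winner's
    share of the total weight (a hypothetical definition).
    """
    totals: dict[Hashable, float] = defaultdict(float)
    for source, value in observations:
        # Unknown source types fall back to default_weight.
        totals[value] += weights.get(source, default_weight)
    if not totals:
        return None, 0.0
    # max() keeps the first maximal key, so ties go to the value seen first.
    winner = max(totals, key=totals.get)
    return winner, totals[winner] / sum(totals.values())
```

For example, given three web-crawl observations of the phone number "(212) 555-0199" and a check-in plus an owner-feedback observation of "(212) 555-0100", a plain mode would return the crawled number, whereas this weighted mode returns "(212) 555-0100" with a confidence of 4.0 / 5.5, about 0.73.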
Calibration and Filtration
Assessing the quality of every point of interest is crucial to maintaining a dependable location dataset. Foursquare has engineered several models to gauge how accurately each place in our dataset matches its real-world counterpart. We use the resulting scores, including a reality score for each place and accuracy scores for its key attributes, to continually detect errors and refine our data.
After assigning scores to all place records in our dataset, we run a series of checks to determine whether each record is eligible for inclusion in the final dataset we deliver. We verify that: a) the key attributes of the place record are available; b) the place's reality score and the accuracy of its key attributes meet specific criteria; and c) each attribute has at least one credible source. We also verify that the details in a record are internally consistent, for example that the zip code matches the city. Only records that pass all of these checks are included in the dataset we provide to our customers.
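The release checks above amount to a per-record filter. The Python sketch below is one minimal way to express them; the KEY_ATTRIBUTES list, the 0.8 and 0.7 thresholds, and the tiny ZIP_TO_CITY lookup table are hypothetical placeholders, since the article does not publish Foursquare's actual criteria.

```python
from dataclasses import dataclass

# Hypothetical policy values; the article does not publish Foursquare's criteria.
KEY_ATTRIBUTES = ("name", "address", "geocode")
MIN_REALITY_SCORE = 0.8
MIN_KEY_ATTRIBUTE_ACCURACY = 0.7
# Hypothetical reference table for the zip-code/city consistency check.
ZIP_TO_CITY = {"10001": "New York", "94103": "San Francisco"}


@dataclass
class PlaceRecord:
    attributes: dict          # attribute name -> value
    reality_score: float      # score that the place exists in the real world
    attribute_accuracy: dict  # attribute name -> accuracy score
    attribute_sources: dict   # attribute name -> list of credible sources


def eligibility_failures(record: PlaceRecord) -> list[str]:
    """Return the reasons a record fails the release checks (empty if it passes)."""
    failures = []
    # a) key attributes are available
    missing = [a for a in KEY_ATTRIBUTES if record.attributes.get(a) in (None, "")]
    if missing:
        failures.append(f"missing key attributes: {missing}")
    # b) reality score and key-attribute accuracy meet the thresholds
    if record.reality_score < MIN_REALITY_SCORE:
        failures.append("reality score below threshold")
    inaccurate = [a for a in KEY_ATTRIBUTES
                  if record.attribute_accuracy.get(a, 0.0) < MIN_KEY_ATTRIBUTE_ACCURACY]
    if inaccurate:
        failures.append(f"low-accuracy key attributes: {inaccurate}")
    # c) every attribute has at least one credible source
    unsourced = [a for a in record.attributes if not record.attribute_sources.get(a)]
    if unsourced:
        failures.append(f"attributes without a credible source: {unsourced}")
    # Consistency: if the zip code is in the lookup table, it must match the city.
    zip_code = record.attributes.get("zip")
    city = record.attributes.get("city")
    if zip_code in ZIP_TO_CITY and city and ZIP_TO_CITY[zip_code] != city:
        failures.append("zip code does not match city")
    return failures


def is_eligible(record: PlaceRecord) -> bool:
    """A record is released only if it fails none of the checks."""
    return not eligibility_failures(record)
```

Returning the list of failed checks, rather than a bare boolean, makes it easy to see why a particular record was held back from the release.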
Quality Assurance and Release
Foursquare performs meticulous QA checks at every step of the process. All data changes are evaluated using our....