Altdata Ideas You Can Leverage Today
Here are some actionable ideas to get you thinking about how to leverage altdata

This is Part 3 of a series of articles on leveraging alternative datasets to provide lift. Part 1 is an overview of altdata and Part 2 is how data sharing is displacing ETL processes and shrinking time-to-value in data projects.

Data-driven companies are leveraging alternative datasets (I call it altdata) to provide "lift" to their analytics processes. In this article I'll give you some altdata ideas that you may want to integrate into your value-stream. Every day at the Microsoft Technology Center (MTC) we help our customers ingest altdata-sets and quickly determine whether they provide value. If you are a Microsoft customer we can help you too; contact me on LinkedIn. These ideas are in no particular order.

altdata Ideas for Any Industry

  • social media sentiment analysis. Most social media outlets allow you to programmatically query real-time feeds. We can do simple things like sentiment analysis against a hashtag, or we can look at general trends that may be of interest.
  • scrapes from websites. Websites contain lots of interesting information in an unstructured format. We can scrape that data and do Natural Language Processing (NLP) against it. For instance, we can look for terms or key phrases. Another common use case is doing a web search for an entity or person and then performing sentiment analysis against the first page of search results. This can tell us a lot of interesting things about the entity/person. Is the entity in the news? If so, for what?
  • satellite data. This data can give you a lot of valuable information about your customer. Let's say you sell pool supplies in your local community. You could do a mass mailing to the region, or you could target specific houses where you could identify pools from the satellite imagery. I actually did this work for a customer once. From a "ten thousand foot level" I could quickly determine what localities to target.

Zooming in to the street level, I used our Azure Cognitive Services and AI to identify the pools and correlated them with Bing Maps to get their addresses. With the addresses I could quickly do a lookup against my sales database to determine whether each address was already a customer (marked with a yellow square) or not a customer (marked with a red square). Note: it's not perfect (it clearly missed a pool), but this is something I created in just a few days.

  • hobbyist drone data (or any other image data). Similar to the satellite data, there are lots of use cases where a bird's-eye view of a space can provide you with value.

There are lots of stories about hedge funds using drone data to determine retail store traffic or the number of container ships in Chinese ports. This can provide broad strokes on the health of a region or the economy in general, or we can laser-focus on a particular address.

  • images: Image data can come from anywhere: web searches, smart phones. Many times the images have embedded metadata in Exif format (a 20-year-old standard) that will tell you interesting things about the image: the geotag, any comments, the make/model of the phone. All of this information might be valuable to your use case.
  • weather data. There are so many ways weather may provide lift. A quick internet search will give you lots of ideas.
  • real estate data. What could you do with real estate data? There are so many use cases. The MLS (Multiple Listing Service) has datasets available for residential real estate. They hold a monopoly on their data and charge accordingly. But if your company sold backyard pool supplies, wouldn't it be interesting to find all of the local homes for sale with in-ground pools? There is no equivalent MLS for commercial real estate. This would be an excellent business model for a startup to capitalize on. In the interim, you could create proxies for commercial real estate activity by looking for other altdata-sets. Home-sharing firms like Airbnb and Vrbo have datasets available for purchase. This may help determine regional trends and economic activity.
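
The pool-supply example above ends with a lookup of detected addresses against a sales database. Here is a minimal Python sketch of that last step, assuming the image model and the geocoder have already produced a list of street addresses. The function names, the normalization rules, and the sample addresses are all hypothetical:

```python
import re

# Hypothetical abbreviations used to normalize street addresses before matching.
_ABBREVIATIONS = {"street": "st", "avenue": "ave", "road": "rd", "drive": "dr"}


def normalize_address(address: str) -> str:
    """Lowercase, strip punctuation, and abbreviate common street suffixes."""
    words = re.sub(r"[^\w\s]", "", address.lower()).split()
    return " ".join(_ABBREVIATIONS.get(word, word) for word in words)


def classify_pool_addresses(detected, customers):
    """Split detected pool addresses into existing customers and prospects."""
    known = {normalize_address(a) for a in customers}
    result = {"customer": [], "prospect": []}
    for address in detected:
        key = "customer" if normalize_address(address) in known else "prospect"
        result[key].append(address)
    return result


# Made-up sample data: addresses where a pool was detected, and current customers.
detected_pools = ["12 Oak Street", "48 Elm Ave.", "7 Pine Road"]
sales_database = ["12 oak st", "7 Pine Rd"]

targets = classify_pool_addresses(detected_pools, sales_database)
print(targets["prospect"])
```

Real address matching is messier than this (unit numbers, misspellings, geocoder quirks), so treat the normalization as a starting point.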

The App Ecosystem

Think of all the apps you run on your smartphone. On the surface it would seem the business model for many of them is ad-driven. They monetize the ad impressions and are carefully targeting the CTR (clickthrough rate). But that's only part of the story. Many will sell this data to you to semantically enrich your existing data.

Thought experiment: out of the apps you use, which ones might be tracking users in a way that would provide lift to YOUR business?

Here's one example: Uber. With the user's permission, Uber (likely Lyft too, but I'm not sure) sells location data to food and retail industry players. Other companies can leverage this data to provide discounts and promotions personalized to the specific customer.

Companies that have already monetized their data

What is Digital Transformation? My definition is simple. A digitally-transformed company has learned how to monetize its data. That could mean leveraging its data to control costs or increase revenue, but at the extreme it means the company is selling its valuable data assets to others.

Thought experiment: If you had access to any one single company's data assets, which one would it be and why? Now, go research if that company will sell its data to you.

Here are some companies that have monetized their data:

  • SmartTV manufacturers have been capturing the IoT-style data from each TV for years. Every time you change the channel, turn the device ON/OFF, change the volume, etc., the manufacturer knows it. These manufacturers make more money selling the data YOU generate for them than they make selling the actual hardware. This is a major reason why prices are falling precipitously. The manufacturers NEED you to upgrade to even smarter, newer units with more built-in apps. Did you know that apps like Roku, Netflix, and Amazon Prime pay the manufacturers to have their apps installed in the factory? The data is _that_ valuable. How can you leverage this altdata?
  • Payment card processors make more money monetizing their datasets than they do on the transaction fees. I would think this would bring transaction fees down in the US, which has the highest fees in the world, but it hasn't.
  • American Airlines' data is now valued higher than the airline itself! This astounds me. Even with all of their capital assets (which are tangible assets on their balance sheet), the intangible assets (data) are worth more.

There is a general trend in the worldwide economy where intangible assets are a higher percentage of the balance sheet than ever. The biggest factor is likely monetized data, but someone should do some research to confirm this. What is really interesting is that tangible assets depreciate over time. Intangible assets, like data, don't. How can you leverage this asset class?

The Financial Services industry loves seeking "alpha" (the industry's equivalent term for "lift") in altdata sources. Some interesting altdata ideas for the finance industry:

  • financial reports and SEC filings. These documents are freely available from many government websites. The easiest to use is EDGAR. You can find various filings in PDF format. The data can be scraped and added to a data lake where we can do interesting analytics like `Named Entity Recognition` or simple sentiment analysis. We can also look for specific phrases and terms.
  • private company data. Dun & Bradstreet is the de facto standard on private company data and commercial credit.
  • carbon footprint measurements. ESG investing is hot right now and every company is trying to shape the perception that it is environmentally friendly. Even BP changed their logo to appear more "green". How can we measure the carbon footprint of a company given the available altdata in the marketplace?
  • LinkedIn provides lots of datasets. The investment industry leverages this data for simple use cases like monitoring employee counts and openings. How could you leverage LinkedIn data to semantically enrich your data?
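
Looking for specific phrases and terms in filings can start very simply. The sketch below assumes the filing text has already been extracted to plain strings. The watch-list phrases, the ticker keys, and the sample text are invented for illustration:

```python
import re
from collections import Counter


def count_phrases(text, phrases):
    """Count case-insensitive, whole-phrase occurrences of each phrase in text."""
    counts = Counter()
    for phrase in phrases:
        pattern = r"\b" + re.escape(phrase) + r"\b"
        counts[phrase] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts


# Hypothetical watch list and made-up filing excerpts keyed by ticker.
watch_list = ["going concern", "material weakness", "supply chain"]
filings = {
    "AAAA": "We identified a material weakness in internal controls. "
            "Supply chain disruptions continued; supply chain costs rose.",
    "BBBB": "There is substantial doubt about our ability to continue as a going concern.",
}

for ticker, text in filings.items():
    hits = {p: n for p, n in count_phrases(text, watch_list).items() if n}
    print(ticker, hits)
```

Counts like these are crude, but tracking how they change from one filing period to the next is often more telling than any single number.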

Risk Analytics

Customers always ask me for interesting ideas for altdata in their industry. Fact is, I don't know your industry as well as you do. Any ideas I may have, you've probably researched. My response is to think of use cases that your competition might not also be researching. A big area to focus on is Risk Management. Finding altdata that can mitigate risk should provide lift. Every industry has different risk management profiles, but let's look at an example to get you thinking creatively.

Cambridge Mobile Telematics recently acquired TrueMotion. Both companies provide vehicle telematics data to auto insurers to reduce risk. Well, why couldn't you leverage similar data? Traditional auto insurance risk rating factors such as age, gender, credit score, zip code data, moving violations, and type of vehicle are less predictive of accident risk than actually looking at driver behavior via on-board vehicle telematics from the OBD-II port (the diagnostic interface to your vehicle's computer). Those traditional risk rating factors are just proxies for likely driver behavior. Younger drivers tend to be more risky, as are middle-aged men driving red sports cars. Or, that's the theory.
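
To make the contrast with proxy factors concrete, here is a rough sketch of measuring behavior directly from telematics. It assumes speed samples taken once per second in miles per hour. The 7 mph-per-second harsh-braking threshold and the sample trip are assumptions for illustration, not any insurer's or vendor's actual rating method:

```python
def count_harsh_braking(speeds_mph, threshold_mph_per_s=7.0):
    """Count distinct braking events where speed drops faster than the threshold.

    speeds_mph holds one speed reading per second. Consecutive seconds of
    hard deceleration are treated as a single event.
    """
    events = 0
    in_event = False
    for previous, current in zip(speeds_mph, speeds_mph[1:]):
        braking_hard = (previous - current) >= threshold_mph_per_s
        if braking_hard and not in_event:
            events += 1
        in_event = braking_hard
    return events


# Made-up trip: steady driving, one hard stop lasting two seconds, then another.
trip = [30, 31, 32, 24, 15, 14, 14, 20, 28, 35, 26, 25]
print(count_harsh_braking(trip))
```

For this sample trip the function reports two events. Normalizing the count per mile or per hour driven would make drivers comparable.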

I will NEVER install a telematics device in my vehicle that will send data to my insurer. I can assure you that my risk profile using the traditional rating factors is much, Much, MUCH better than my actual driving behaviors. (I probably shouldn't admit that for fear LinkedIn will sell that data to my insurer).

CMT will likely create additional datasets to monetize for industries other than auto insurance. You might be able to glean valuable insights about your customers if you knew their driving habits. How can knowing my customers' risky behaviors provide me with competitive advantage? The bulk of CMT's employees are data professionals, so I'm sure they are dreaming up new data monetization avenues.

You can acquire telematic driving altdata from lots of vendors.

  • mobile-phone apps ask for your location data (and are likely tracking it)
  • insurance company-provided dongles
  • aftermarket blackboxes and in-car video

Thought experiment: Who better to provide auto insurance than the auto manufacturers that have access to all of your vehicle telematics, service history, credit, etc.? General Motors has announced they are planning to offer their own auto insurance that they will bundle with OnStar. Brill-yunt! They are monetizing their data. That is Digital Transformation!

Banking and insurance are highly regulated industries and tend to be slow to change by necessity. This has allowed innovators from micro-lenders to payment processors to leverage data and invest heavily in digital services. One of the enablers of this trend is better risk management from altdata.

These companies are leveraging altdata like:

  • prescription-drug histories
  • EHR/EMR records
  • DMV records
  • property records
  • life insurance clearinghouse data from people's previous applications.

Yep, all of this data, in some de-identified fashion, is available for purchase. Does that surprise you?

Consumption Data Analytics

Consumption Data Analytics is the aggregation of online and offline (brick-and-mortar) consumer purchase activity, merged with consumer behavioral datasets, geolocation data (where was your smart phone when you made that online purchase?), and other point-of-sale vendor data (also available for a fee).

Consumption data is its own category of altdata. Right now this is huge in financial services, but its potential is enormous. Quite simply, consumption data is business transaction-related information that can augment your predictive analytics.

Where can you get offline purchase activity? Well, the credit card companies (among many others) provide various levels of aggregated datasets for sale. This includes offline purchase activity.
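
As a rough sketch of what "aggregating online and offline purchase activity" can look like once you have both feeds, the snippet below combines two sets of already-aggregated records by region and category. The field names and figures are made up, and real vendor feeds will use other schemas:

```python
from collections import defaultdict


def combine_channels(online, offline):
    """Sum spend per (region, category), keeping online and offline totals apart."""
    totals = defaultdict(lambda: {"online": 0.0, "offline": 0.0})
    for channel, rows in (("online", online), ("offline", offline)):
        for row in rows:
            totals[(row["region"], row["category"])][channel] += row["spend"]
    return dict(totals)


def offline_share(totals):
    """Fraction of total spend that happened in brick-and-mortar stores."""
    return {
        key: value["offline"] / (value["online"] + value["offline"])
        for key, value in totals.items()
        if value["online"] + value["offline"] > 0
    }


# Made-up aggregated purchase records for illustration only.
online_rows = [
    {"region": "PA", "category": "grocery", "spend": 200.0},
    {"region": "PA", "category": "apparel", "spend": 300.0},
]
offline_rows = [
    {"region": "PA", "category": "grocery", "spend": 600.0},
    {"region": "PA", "category": "apparel", "spend": 100.0},
]

shares = offline_share(combine_channels(online_rows, offline_rows))
print(shares)
```

Keeping the channels separate until the last step makes it easy to add further cuts, such as geolocation or time of day, without reworking the aggregation.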

Consumption Data Analytics in 2021 focuses on consumer consumption. I expect that to slowly shift to B2B consumption behaviors. An example: right now we have a global computer chip shortage. There are theories as to why that is, but if I am an automobile manufacturer that relies on certain chips for my vehicles, I want to know if my chip supplier is themselves experiencing supply chain issues so I can plan accordingly.

Data Exhaust

Data exhaust is the trail of data that remains after a business activity has occurred on a computing system. Data exhaust provides valuable insights. Some examples:

  • web server logs: this can tell you how long a consumer browsed your site before making a purchase, how long an item remained in their shopping cart before it was abandoned, etc.
  • cookie data: both 1st and 3rd party cookie data will provide valuable information about your customers. Did you know that by default your browser throws off so much metadata about you that the average marketer can likely identify you with no additional data? This is called fingerprinting.

Data exhaust is a great way to understand the behaviors of your customers...and your potential customers.
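
The web server log idea above can be sketched in a few lines. This example assumes a simplified log format of timestamp, session id, and path, in chronological order; real server logs are formatted differently and need their own parsing. The purchase path is also a hypothetical placeholder:

```python
from datetime import datetime


def minutes_to_purchase(log_lines, purchase_path="/checkout/complete"):
    """Return minutes between each session's first request and its first purchase.

    Each line is assumed to look like "<ISO timestamp> <session id> <path>"
    and lines are assumed to be in chronological order. Sessions that never
    reach purchase_path are omitted.
    """
    first_seen = {}
    result = {}
    for line in log_lines:
        stamp, session, path = line.split()
        when = datetime.fromisoformat(stamp)
        first_seen.setdefault(session, when)
        if path == purchase_path and session not in result:
            result[session] = (when - first_seen[session]).total_seconds() / 60
    return result


# Made-up log excerpt in the simplified format described above.
log = [
    "2021-05-01T10:00:00 s1 /home",
    "2021-05-01T10:02:00 s2 /home",
    "2021-05-01T10:05:00 s1 /cart",
    "2021-05-01T10:12:00 s1 /checkout/complete",
    "2021-05-01T10:30:00 s2 /cart",
]

print(minutes_to_purchase(log))
```

Sessions missing from the result (like s2 here) reached the cart but never purchased, which is a starting point for abandoned-cart analysis.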

Treat your software like IoT data. It is throwing off a lot of interesting browsing events for your users. If you can ingest that data and react to it in real-time you should be able to provide a better experience for your users.

Consumer-profile data

If you are a B2C company where your customer is a consumer then you need to know as much about them as possible.

  • credit card transaction data: Who knows more about consumers than credit card companies? Card issuers provide altdata-sets of transaction history that are valuable for determining wants, desires, and trends. The data is always anonymized but you can still gain valuable insights depending on how you slice-and-dice the data.
  • credit reporting agencies: The Big Three credit reporting agencies will sell you data and services to help you target consumer demographics for your marketing campaigns based on interesting metrics like purchase data. Experian will actually provide you with software that performs the consumer targeting, but I'd rather have access to the raw data so I can make my own unique matching algorithms.
  • data aggregators: Acxiom is an example of a 3rd party data provider that will license data to you about consumers from various other 3rd party data sources. Then they validate the data and help you enrich your existing consumer profile data.

The grocery industry has mastered consumer-profile data and it might be worthwhile to research how they do customer analytics. Grocers and CPG suppliers have been sharing data for years to learn about shopper habits and their shopping journey. Stores are analyzing broad buying trends to prevent shortages like we saw with toilet paper and Lysol during the early days of the pandemic. CPG companies can leverage the POS data from the grocers to generate better consumer engagement and product offers and determine brand loyalty (which also suffered during the early pandemic).

Economy and Economic Data

Economic data that broadly shows the state of the economy and your industry is very valuable. Imagine you are a homebuilder...could you get a competitive advantage by knowing that lumber prices are forecast to rise substantially over the next few years because an invasive bug species is decimating Douglas Fir trees in the Pacific Northwest?

Jobs reports and inflation data are commonly used in many industries. If you are a QSR (Quick Service Restaurant) it's valuable to understand the prevailing wage in your area. How will this affect your margins?

Advertising Data

Nielsen is a century-old research firm that measures TV viewership, among MANY other things. They are a monopoly for this data and they provide different datasets for lots of different use cases. Recently they created a new dataset that allows them to compare how many people are streaming entertainment vs watching traditional broadcast channels. This could be beneficial to your next marketing campaign.

Advertisers have been using altdata for years (sometimes called incidental data); they just struggle to integrate it into their value-stream. Usually the integration is done on a one-time basis, often in Excel. We can do better.

Unstructured Data

All data has structure, otherwise it's worthless, but unstructured data has come to mean data like images, PDFs, and video where you can extract value creatively. I mentioned above that many images have metadata that you can extract.

Every organization has a wealth of data that doesn't sit in a traditional database. This means it's difficult to do analytics on it. I call this latent data. It has value, but it's difficult to extract. If you can find this latent data in your organization you can leverage it with your structured data. Examples:

  • PDFs, Word docs, Excel spreadsheets, business forms, etc. Azure's Cognitive Services can help you extract data from file-based data sources.
  • handwritten notes, operator logs, user journals, etc. Handwriting recognition is a solved problem; you should leverage it.

At the MTC we work with a lot of manufacturing companies. Each one has stressed that they have what I call a shifting demographics problem. They have older workers nearing retirement and the younger generations are not interested in doing those dirty, manual labor jobs anymore. Recently, companies have been deploying IoT solutions to understand how they can automate some of these processes. Another approach is to look at all of the handwritten operator logs that these workers have maintained for decades and may not be digitized even today. Azure's Cognitive Services can OCR even the worst handwriting, allowing you to use NLP to find the patterns in the notes.

  • web scrapes. Companies scrape webpage data for lots of reasons, but essentially they want to find valuable data that is locked up in the HTML. For example:
    • scraping pricing information from your competitors' websites. There are companies that scrape industry-specific websites (they probably scrape your website already) in order to resell the data back to you! Why? They provide analytics that compare your data to your competition and provide valuable intel. Sometimes it's as simple as telling you what the price for a particular product should be during a certain time of day. This is called competitive analysis. These altdata-sets will show industry trends, growth rates, and demographics.
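
To give a feel for the competitive pricing scrape described above, here is a small sketch using only Python's standard-library HTML parser. The page markup (a span with class "price" and a data-sku attribute), the SKUs, and the prices are all invented; a real competitor page will need its own parsing rules:

```python
from html.parser import HTMLParser


class PriceParser(HTMLParser):
    """Collect {sku: price} from <span class="price" data-sku="...">$9.99</span>."""

    def __init__(self):
        super().__init__()
        self.prices = {}
        self._current_sku = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "span" and attributes.get("class") == "price":
            self._current_sku = attributes.get("data-sku")

    def handle_data(self, data):
        if self._current_sku is not None and data.strip():
            self.prices[self._current_sku] = float(data.strip().lstrip("$"))
            self._current_sku = None

    def handle_endtag(self, tag):
        if tag == "span":
            self._current_sku = None


def price_gaps(our_prices, competitor_html):
    """Return our price minus the competitor's price for every shared SKU."""
    parser = PriceParser()
    parser.feed(competitor_html)
    return {
        sku: round(our_prices[sku] - theirs, 2)
        for sku, theirs in parser.prices.items()
        if sku in our_prices
    }


# Made-up competitor page fragment and our own price list.
page = """
<ul>
  <li>Chlorine tabs <span class="price" data-sku="CL-5">$42.50</span></li>
  <li>Pool skimmer <span class="price" data-sku="SK-1">$19.99</span></li>
</ul>
"""
ours = {"CL-5": 45.00, "SK-1": 18.99, "XX-9": 5.00}

print(price_gaps(ours, page))
```

A positive gap means we are priced above the competitor for that SKU. Capturing the scrape time alongside each price would support the time-of-day comparisons mentioned above.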

The Hottest altdata Trend Today

Don't value judge me... I think we are living in the most contentious, politically-charged environment ever. Probably everyone throughout history has said that though.

Now, imagine you are targeting me as a potential high value customer lead. Your analytics state that my CLV (customer lifetime value) is 2x your average customer lead. You've collected all of the common demographics about me using altdata and existing transactional data. Would you agree that you might want to tailor your advertising to me if you knew what my political views were? Well, you can't know who I voted for in the last Presidential election (supposedly we have a secret ballot), but in most areas you CAN determine my party registration. And voter registration lists are free in most areas (or there's a nominal processing fee). There are aggregator firms that will sell you this data too.

Voter registration data, I believe, will be the hottest altdata-set in the near future.

Become Data-Driven at the MTC

Are you convinced that your company is ready to leverage some of these altdata ideas?

I am a Microsoft Technology Center (MTC) Architect focused on data solutions. The MTCs are a service Microsoft provides to our customers. We strive to be the Trusted Advisors for our customers. Others have Know-How, we have Know-What. We want to understand your business problems and ideas for altdata analytics. Then, we'll help you ingest and enrich the data using our cloud solutions. Technology alone cannot solve these problems without smart people and processes that work. We offer services ranging from human-centered Design Thinking Workshops -- where we help you determine which use cases are the best for altdata -- to hackathons where we quickly ingest some altdata, do the semantic enrichment with you, and quickly determine if the altdata provides lift.

Listen, we aren't experts in your business, but we are great enablers. Within a few days we can build a rapid prototype and show you the Art of the Possible. We'll show you what it takes to start a data sharing initiative and we'll help you solve data problems in days that would've taken months in the past.

Does that sound compelling? Contact me on LinkedIn and we'll get you started on your journey.

