How AI and LinkedIn Can Transform Microsoft

To fully understand why LinkedIn's acquisition could be so transformative for Microsoft, we need to take a deep dive into the nerdy (yet quietly sexy!) world of data.

Here's the high-order bit: In a world of machine learning, uniquely valuable data is the new network effect. The right kind of data is now the force multiplier that can catapult organizations past any competitors who lack equivalent data... and if you have the right kind of data, it is VERY HARD to get equivalent data, so data is also the new barrier to entry.

How can we characterize this type of uniquely valuable data? Despite the marketing hype, the best data is not necessarily just "big"! The actual factors of uniquely valuable data that offer a sustainable advantage are:

  • Authoritative
  • Self-described
  • High coverage
  • Centralized
  • Perishable

The first two, authoritative and self-described, are highly correlated in the case of LinkedIn. Reading a businessperson's own description of her or his career gives you more signal than any secondhand account: it's one thing to know that a job candidate or potential customer went to a certain college, but more valuable to know s/he is proud enough of having been on the rowing team or in a particular fraternity to put it on a public resume. But LinkedIn also tries to recruit people who have worked with you to give their authoritative statements about your accomplishments and qualities. Note that self-reported data is not always accurate, but it is likely to be detailed; LinkedIn has done an admirable job improving data accuracy, and integration with Microsoft applications will likely make the data even more accurate and detailed.

High coverage area can vary quite a bit by domain -- for instance OpenTable's data includes only 40,000 restaurants -- but the closer the data is to 100% coverage, the better. I'm always in awe when I read about people who painstakingly construct full-coverage data sets by hand, like this professor who spent 25 years researching all 3000 known American serial killers and their 10,000 victims. LinkedIn's 450 million profiles (and growing!) give Microsoft the other half of the treasure map: an unmatched coverage area for professionals that is basically impossible to replicate. Connecting the dots, that coverage area also includes a comprehensive collection of organizations, schools, locations, job listings, job titles, and skills.

Data that is centralized, normalized, and deduped is going to be far easier to work with than federated data, even over a smaller number of records. As we will see later on, LinkedIn's centralized data would be more valuable than Salesforce's federated data even if it were not also richer and more authoritative. Data quality and integrity are just as important as quantity -- which is why bigliness is not the be-all and end-all for data.
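
To make "centralized, normalized, and deduped" concrete, here is a minimal Python sketch of folding two federated contact lists into one. Everything in it is an assumption for illustration: the `Contact` fields, the `normalize` and `centralize` helpers, the sample records, and especially the choice of email address as the dedup key. Real entity resolution is far messier than this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contact:
    """Hypothetical normalized record; field names are illustrative only."""
    name: str
    email: str
    company: str


def normalize(record: dict) -> Contact:
    """Collapse stray whitespace, title-case the name, lowercase the email."""
    return Contact(
        name=" ".join(record["name"].split()).title(),
        email=record["email"].strip().lower(),
        company=" ".join(record["company"].split()),
    )


def centralize(*sources: list) -> list:
    """Merge several sources, keeping the first record seen per email.

    Assumption: email is a good enough identity key for this toy example.
    """
    seen = {}
    for source in sources:
        for raw in source:
            contact = normalize(raw)
            seen.setdefault(contact.email, contact)
    return list(seen.values())


# Two "federated" sources that describe the same person slightly differently.
crm_a = [
    {"name": "ada  lovelace", "email": "Ada@Example.com ", "company": "Analytical Engines"},
]
crm_b = [
    {"name": "Ada Lovelace", "email": "ada@example.com", "company": "Analytical  Engines"},
    {"name": "grace hopper", "email": "grace@example.com", "company": "Navy"},
]

contacts = centralize(crm_a, crm_b)
for contact in contacts:
    print(contact)
```

Three raw records become two clean ones. In a federated setup each source would keep its own slightly different copy, and every downstream model would have to rediscover that they describe the same person.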

Finally, I need to shout out to Thomas Layton (of Metaweb, which led to Google's Knowledge Graph) for reminding me that the most valuable data is perishable and not static. New insights come from new information, and the metadata we get from tracking changes in datasets can ultimately provide as much signal as the data itself -- for example, many stock-picking strategies act on stock market movements as much as the actual prices of the stocks. And as far as sustained advantages go, static data can be copied but perishable data is ever-changing and hopefully ever-improving to reflect now.
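
As a rough illustration of changes carrying signal, the sketch below compares two snapshots of a profile dataset and emits one change event per field that differs. The `diff_snapshots` helper, the snapshot layout, and the sample profiles are all hypothetical, not anything LinkedIn or Microsoft is known to run.

```python
def diff_snapshots(old: dict, new: dict) -> list:
    """Return (record_id, field, old_value, new_value) for each added or changed field.

    Simplification: records or fields that disappear between snapshots
    are not reported.
    """
    changes = []
    for record_id, new_fields in new.items():
        old_fields = old.get(record_id, {})
        for field, value in new_fields.items():
            if old_fields.get(field) != value:
                changes.append((record_id, field, old_fields.get(field), value))
    return changes


# Hypothetical snapshots of the same dataset taken a few months apart.
january = {
    "p1": {"title": "Engineer", "company": "Acme"},
    "p2": {"title": "Recruiter", "company": "Globex"},
}
june = {
    "p1": {"title": "Engineering Manager", "company": "Acme"},
    "p2": {"title": "Recruiter", "company": "Globex"},
    "p3": {"title": "Analyst", "company": "Acme"},
}

events = diff_snapshots(january, june)
for event in events:
    print(event)
```

Neither snapshot on its own says that someone was just promoted or that Acme is hiring; the diff does. A competitor who copies one static snapshot gets none of these events.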

So why is data with these characteristics so valuable right now? Because software is eating the world, and AI is eating software -- or rather AI is eating data and pooping out software. Despite a mountain of hype about GPUs, tooling, and algorithms, almost all Artificial Intelligence ultimately comes down to having access to as much good, relevant data as possible. If AI is akin to building a rocket ship, then "the fuel is the huge amounts of data we can feed to these algorithms". It's cool that Google was able to train an AI to recognize videos of kittens... but that breakthrough was based on Google knowing where to get lots and lots of kitten videos (hint: YouTube!). It's cool that you can talk to your phone, your car, and a cylinder speaker thingie in your kitchen... but they are all built on top of the same proprietary database of phonemes, collected over decades by researchers making recordings of native speakers of dozens of languages. It's cool that IBM Watson can help oncologists treat cancer patients more efficiently... but a lot of how Watson helps is by keeping on top of the 160,000 cancer research papers published every year. Machine learning is a voracious consumer of structured data, and that might in fact be its most valuable feature in the long run because we humans kinda suck at it.

Okay, let's pop back up a level to the LinkedIn acquisition. Don't buy Satya's spectacularly bland public announcement that this purchase was about improving email or what have you. Satya is clever like a fox: Microsoft clearly sees the network effects in LinkedIn's data, and they are hiding that insight in the examples of that public announcement. Meaning: one of the key real world AI data sets Microsoft did not own until now is Economic Activity, which Google gets from search, Amazon gets from shopping carts (including B2B!), and Facebook gets from likes. No business AI can learn as well without knowing economic effects, and it is no longer sufficient to just house data like SAP and Oracle software do, because any credible machine learning must have insight into the macro economy as a whole. Net net, such insight can make Microsoft the strongest player in CRM services -- and if LinkedIn can do that, the acquisition will be richly worthwhile.

CRM software is ripe to be disrupted by machine learning for the same reason that Willie Sutton robbed banks: because that's where the money is. What businessperson wouldn't want a program that can predict ON FIRST CONTACT which potential customers will purchase quickly, be easy to upsell, or be worth the most over their lifetimes? I don't think it's a coincidence that the global CRM market in 2015 was worth $26.3 billion -- almost exactly the price Microsoft paid for LinkedIn! -- and it is the fastest-growing area of enterprise software. It is also not a coincidence that Marc Benioff of Salesforce, the market leader, wanted to acquire LinkedIn himself and was so disappointed when his part-stock offer was spurned for piles of lovely Microsoft cash that Salesforce pursued regulatory injunctions as well as looking into a potential acquisition of Twitter.

Don't underestimate how important it is for Microsoft to find a new mission and a new market with room to grow. I don't think I'm the only person who saw this great company losing relevance as it almost totally whiffed on the two biggest platforms of the last 25 years, web and mobile. Satya bought Microsoft some time by embracing the cloud via Azure (shout out to Mike Abbott and the .NET services team!); but let's face it, the majority of Microsoft's cloud services came courtesy of finally moving Office and Outlook servers online. AI-inflected cloud services would be the first truly new, homegrown, exciting product for Microsoft in a very long time.

Benioff is fighting back with the splashy announcement of Salesforce's own machine learning platform, the modestly named Einstein, but the company has a problem that can't be easily solved: it doesn't actually own most of the data it holds. What is it training on? Can the company use its customers' data for training purposes? Could Salesforce use one customer's data to improve another customer's Einstein results? Generally one would assume that customers with strong security and data integrity policies would prevent all of that. Perhaps Salesforce can use the data it holds, if properly anonymized, but it seems like all CRM vendors will have similar problems when embracing AI: How do we train across customers and not leak anything proprietary?

Furthermore, any one of Salesforce's customers has only a partial, relatively thin dataset of its own: federated data, not centralized; and neither authoritative nor self-described. Any models trained on those datasets will be poorer than those trained on LinkedIn's interconnected dataset, given equally good engineers and tools -- and let's not forget that LinkedIn practically invented "data scientist" as a job title.

Benioff himself, who is neither a fool nor prone to self-delusion, stated recently that companies buy each other for their data... and he said this knowing full well that his company doesn't own much data, so what is it going to be worth in the future? In that sense the LinkedIn acquisition marks an important turning point for the entire software business, which for most of its history measured itself by functionality. The software business saw itself as building trains that ran ever faster, with ever more features and more comfort, appealing to ever more passengers -- but without particularly caring what kind of cargo the train was hauling. Suddenly the world has changed in such a way that the cargo is more valuable than the train -- in fact the cargo might now be able to make the train. It's going to take some time to sink in.

Finally I want to go back to my point about how data is the new barrier to entry -- if only to emphasize to entrepreneurs how the bit has flipped for us too. Think about what Marc Benioff said: mergers and acquisitions are now for data. Even in academia, great research is now largely about getting access to unique high-coverage datasets. Entrepreneurs need to think from before Day One about what their unique high-coverage dataset is going to be. Don't be afraid to start collating by hand, and remember that the most valuable datasets -- LinkedIn, Amazon, Twitter, Facebook, Google -- took many years to assemble. You won't be able to compete with the bigs for breadth, so look for a new area that has not yet been mined where you can go for depth. And if your people don't yet know SQL or how to use Amazon Redshift, everyone in your company needs to learn those skills ASAP, as those tools help people connect the dots in a world of uniquely valuable data.
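
For anyone who has not yet written SQL, here is a small taste of what "connecting the dots" looks like, using Python's built-in `sqlite3` module rather than Redshift so it runs anywhere. The `people` and `companies` tables and their rows are made up for illustration.

```python
import sqlite3

# In-memory database with two hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT, industry TEXT);
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, company_id INTEGER);
    INSERT INTO companies VALUES (1, 'Acme', 'Manufacturing'), (2, 'Globex', 'Energy');
    INSERT INTO people VALUES (1, 'Ada', 1), (2, 'Grace', 1), (3, 'Linus', 2);
    """
)

# Join people to their companies and count heads per industry.
rows = conn.execute(
    """
    SELECT c.industry, COUNT(*) AS headcount
    FROM people AS p
    JOIN companies AS c ON c.id = p.company_id
    GROUP BY c.industry
    ORDER BY headcount DESC
    """
).fetchall()
conn.close()

for industry, headcount in rows:
    print(industry, headcount)
```

The same `JOIN` and `GROUP BY` pattern is standard SQL, so the query itself should carry over to a warehouse with little change.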
