The Data and The Story

Oscar Wilde wrote a short story called The Nightingale and the Rose. Read it if you have time - my precis below does it little justice. What this has to do with data will become clearer soon.

'She said that she would dance with me if I brought her red roses,' cried the young Student; 'but in all my garden there is no red rose.'
    'No red rose in all my garden!' he cried, and his beautiful eyes filled with tears.

The Nightingale hears him and feels for him. She scours the gardens for a red rose but finds only a barren bush, old and withered. The bush tells the Nightingale that she will have to sing the entire night to create a rose and colour it red with her heart's blood. She sings through the night of the elements of love, what makes it grow and how it reaches its zenith in sacrifice. Before the night is over, the Nightingale is finished and the Red Rose is complete. The Student takes it to the Professor's daughter, who replies that it won't go with her dress and that the Student's richer rival is giving her jewellery instead. Frustrated, the Student throws the Rose into the street and concludes...

    'What a silly thing Love is,' said the Student as he walked away. 'It is not half as useful as Logic, for it does not prove anything, and it is always telling one of things that are not going to happen, and making one believe things that are not true. In fact, it is quite unpractical, and, as in this age to be practical is everything, I shall go back to Philosophy and study Metaphysics.'
    So he returned to his room and pulled out a great dusty book, and began to read.

If you read the full story, you will see that both the Nightingale and the Student have ALL the DATA about what Love is. But only one of them is able to draw the STORY from it, the why and the how of Love.

The Economist has called data the new oil.

Words shape thoughts.

Since data is the new oil, the world has rushed off to explore the depths of its data stores. From these it will produce crude data, which it will refine by distilling insights, each of which can be pumped as fuel into strategies or even directly monetized.

This is Student thinking, and it results in A/B testing for the minutest of decisions. It also results in data being sold outright, because you don't know what to do with it but people keep insisting it has value.

The Nightingale's sisters reside in the unlikeliest of lands, the land of science.

In science, data takes us, step by step, towards a story, a full explanation of what is, beyond a simple observation of correlations in data. This is not to underestimate the importance of the correlation, because it is the first step and requires painstaking, disciplined work. But, if we miss the story, there is work that remains undone, truth unrevealed, and better courses of action left untraversed.

In the first stage of development of a theory, we observe macroscopic variables. Data points us to the relationships between them. Boyle's law, the claim that at constant temperature and mass the pressure and volume of a gas are inversely related, is one such observation that data brings forth.


Over time, we accumulate many such laws, purely by observation of data:

Boyle's Law PV = const if (m, T constant)

Charles's Law V/T = const if (m, P constant)

Gay-Lussac's Law P/T = const if (V, m constant)

which gave what may have been the Unified Theory of Gases, the

Combined Gas Law PV/T = const if (m constant)

The further observation of

Avogadro's Law V/n = const if (P, T constant)

and full combination of these brought us the

Ideal Gas Law PV = nRT

which related everything that could be measured about a gas across a wide range of conditions.
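
As a quick sanity check on the chain of laws above, the sketch below computes pressure from the Ideal Gas Law and confirms numerically that each named law falls out of it as a special case. The helper names (`pressure`, `is_constant`) and the sample values are illustrative assumptions, not part of the original laws.

```python
import math

R = 8.314462618  # molar gas constant in J/(mol*K)

def pressure(n, T, V):
    """Pressure in Pa from the Ideal Gas Law, PV = nRT (n in mol, T in K, V in m^3)."""
    return n * R * T / V

def is_constant(values):
    """True if all values agree to within floating-point tolerance."""
    return all(math.isclose(v, values[0], rel_tol=1e-9) for v in values)

# Boyle: fix n and T, vary V. The product P*V should not change.
boyle = [pressure(1.0, 300.0, V) * V for V in (0.01, 0.02, 0.05)]

# Gay-Lussac: fix n and V, vary T. The ratio P/T should not change.
gay_lussac = [pressure(1.0, T, 0.02) / T for T in (250.0, 300.0, 400.0)]

# Charles: fix n and P, vary T. Solve for V = nRT/P; V/T should not change.
P_fixed = 101325.0  # illustrative fixed pressure in Pa
charles = [(1.0 * R * T / P_fixed) / T for T in (250.0, 300.0, 400.0)]

# Avogadro: fix P and T, vary n. V/n should not change.
avogadro = [(n * R * 300.0 / P_fixed) / n for n in (0.5, 1.0, 2.0)]

for name, values in [("Boyle", boyle), ("Gay-Lussac", gay_lussac),
                     ("Charles", charles), ("Avogadro", avogadro)]:
    print(name, is_constant(values))
```

Note that this only runs the logic in the easy direction, from the combined law back to its parts. Historically the data had to lead the other way.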

However, all of this is still data being related to data by observation. The STORY here came from the kinetic theory of molecular motion. This theory makes a very small set of simplifying assumptions and then applies classical physics to explain the mechanism of why and how the Ideal Gas Law would come about.
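
For readers who want to see the mechanism, here is a compressed sketch of the standard textbook argument. The symbols N (number of molecules), m (molecular mass), L (side of a cubic box), k_B (Boltzmann's constant) and N_A (Avogadro's number) are introduced here for illustration and do not appear in the text above.

```latex
% Assumptions: N point-like molecules of mass m in a cubic box of side L (V = L^3),
% elastic collisions with the walls, no forces between molecules, no preferred direction.
\begin{align*}
\text{momentum given to a wall per hit} &= 2 m v_x, \qquad \text{time between hits} = \frac{2L}{v_x} \\
\text{average force from one molecule} &= \frac{2 m v_x}{2L / v_x} = \frac{m v_x^2}{L} \\
P &= \frac{N m \langle v_x^2 \rangle}{L \cdot L^2} = \frac{N m \langle v_x^2 \rangle}{V} \\
\langle v_x^2 \rangle &= \tfrac{1}{3} \langle v^2 \rangle \;\Rightarrow\; PV = \tfrac{1}{3} N m \langle v^2 \rangle \\
\tfrac{1}{2} m \langle v^2 \rangle &= \tfrac{3}{2} k_B T \;\Rightarrow\; PV = N k_B T = nRT, \quad R = N_A k_B
\end{align*}
```

The last line is where temperature acquires a meaning: it is a measure of the average kinetic energy of the molecules, which no amount of curve fitting on P, V and T alone would have revealed.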

You can keep observing how click-through rates vary with shades of blue, and you will certainly find the best shade of blue to use. But you may completely miss the STORY, the mechanism that explains the why and how of variations in user behavior. Once you understand the mechanism, you may have much better results with a different colour or indeed with a completely different scheme.

But isn't this a very difficult ask? Time and budget are limited. Running experiments on blues is doable. How will thinking about theories help us transcend the blue neighbourhood and make fundamental deductions about other colours or other schemes, without having to run the same A/B tests in combinatorially exploding numbers?

Gradient descent works, after all. Yes, one must make some jumps, but is there such a thing as an informed jump?
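
A toy sketch makes the worry concrete. Everything in it is made up for illustration: the `ctr` function is an invented click-through curve over button hue, with a small peak in the blues and a taller one elsewhere, and simple hill climbing stands in for any step-by-step local search such as gradient descent.

```python
import math

def ctr(hue):
    """Made-up click-through rate as a function of button hue (0-359 degrees)."""
    blue_bump = 0.030 * math.exp(-((hue - 240) / 15.0) ** 2)
    orange_bump = 0.050 * math.exp(-((hue - 30) / 15.0) ** 2)
    return 0.010 + blue_bump + orange_bump

def hill_climb(start, step=1):
    """Move to the better neighbouring hue until neither neighbour improves."""
    hue = start
    while True:
        neighbours = [h for h in (hue - step, hue + step) if 0 <= h < 360]
        best = max(neighbours, key=ctr)
        if ctr(best) <= ctr(hue):
            return hue
        hue = best

# Starting among the blues, local search settles on the best blue.
local_best = hill_climb(start=220)

# An exhaustive sweep, affordable only in a toy, finds the taller peak elsewhere.
global_best = max(range(360), key=ctr)

print(local_best, round(ctr(local_best), 3))
print(global_best, round(ctr(global_best), 3))
```

Local search never leaves the blue neighbourhood because every small step away from the best blue looks worse. Only a sweep of the whole range, or a theory of why users click, points to the other peak.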

That's where human ingenuity and Occam's razor save the day. Make small assumptions, make simplifying assumptions and you will often stumble upon the mechanisms. Here is an example of that.

In 1854, there was a severe outbreak of cholera in Soho, London. Two competing theories were offered for its cause. The miasma theory posited that some sort of airborne foul substance spread cholera from person to person. The germ theory posited that some sort of waterborne, self-reproducing entity was spreading the disease (this was before Louis Pasteur's work).

John Snow (sic!) was able to trace the source of the 1854 cholera outbreak by mapping the frequency of cholera cases as a function of location. The source turned out to be a handpump at the intersection of Broad Street and Cambridge Street. The pump was disabled by removing its handle and the disease rapidly declined. Fastidious researcher that Snow was, he acknowledged that the outbreak may already have been in decline because people had fled the area. But the whole sequence showed that the mechanism by which the disease spread was the water, not the air. With this, he was able to change the fate of several areas simply by changing the source of water. Without the STORY and the mechanism, this would not have been possible. The mere observation that the outbreak was centred on the handpump would not have led to any action in other areas where there were outbreaks.
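
The data step of this analysis can be sketched in a few lines: assign each case to its nearest pump and count. The coordinates below are hypothetical and are not Snow's recorded data. "Pump B" and "Pump C" are placeholder names, and straight-line distance is a simplifying assumption.

```python
from collections import Counter
from math import dist

# Hypothetical coordinates in arbitrary units. NOT Snow's recorded data.
pumps = {
    "Broad Street": (0.0, 0.0),
    "Pump B": (5.0, 1.0),
    "Pump C": (-4.0, 4.0),
}
cases = [(0.2, -0.1), (0.5, 0.4), (-0.3, 0.6), (1.0, -0.8), (-0.9, -0.2),
         (0.1, 1.1), (4.6, 1.2), (-3.5, 3.8), (0.7, 0.0), (-0.4, -0.7)]

def nearest_pump(case):
    """Name of the pump closest to a case, by straight-line distance."""
    return min(pumps, key=lambda name: dist(pumps[name], case))

# Tally how many cases fall closest to each pump.
cases_per_pump = Counter(nearest_pump(c) for c in cases)
for name, count in cases_per_pump.most_common():
    print(name, count)
```

The tally is only the correlation. It says which pump to suspect, not why. The step that let the finding travel to other neighbourhoods was the mechanism: the water carries the disease.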

One of the two theories of disease transmission turned out to be true. Why did researchers limit themselves to these two possible theories? This is Occam's razor in action. The choice between the two went to whichever had fewer inconsistencies with the observed data.

Data is useful to illuminate the path, but keep following the path to find the full story.

Himanshu Nautiyal

Chief Product Officer @ Fractal.ai

5 years ago

David Duvenaud, an assistant professor in the same department as Hinton at the University of Toronto, says deep learning has been somewhat like engineering before physics. “Someone writes a paper and says, ‘I made this bridge and it stood up!’ Another guy has a paper: ‘I made this bridge and it fell down—but then I added pillars, and then it stayed up.’ Then pillars are a hot new thing. Someone comes up with arches, and it’s like, ‘Arches are great!’” With physics, he says, “you can actually understand what’s going to work and why.” Only recently, he says, have we begun to move into that phase of actual understanding with artificial intelligence.

You have really hit the nail on the head. The real challenge in a large volume of data is to find the story to tell. Does exploratory data analysis provide some pointers in this direction?

Himanshu Nautiyal

Chief Product Officer @ Fractal.ai

5 years ago

To be clear... I am saying you must FIND the story in the data, not that you must CREATE one that advances your goals. The latter is the purview of marketing and not covered by this article.

Kaustubh Patekar

Product, Strategy, GTM, Venture Operator | MIT, IIT Bombay - Aerospace Engg | Mentor NASSCOM DeepTechClub

5 years ago

Nice article, Himanshu Nautiyal. Loved the opening. Science and maths are only one half of the solution. The attitudes that encourage it (an open mind and learning) or discourage it (a closed mind and ignoring) shape the use of data. Certainly storytelling is very powerful and can be far more effective at affecting, if not changing, attitudes.

Pure data is bland, but storytelling is spicy. A spin scarcely needs accurate data; in other words, data can be moulded to fit the story of convenience, especially when the real and virtual worlds are blurring.
