
RICH CONTENT POOR CONTENT

Ashutosh Trivedi


Published: July 5, 2018

I am a data scientist, and as data scientists, our job is to find information in data that helps businesses make better decisions. This means finding patterns with a special class of algorithms called machine learning. By understanding the patterns in this data, we also build intelligent applications with more decision-making power.

To build such intelligent systems, we quickly become demanding about data: it should have a structure suited to mining information, and it should be possible to transform (vectorize) it into a form to which machine learning algorithms can be applied.

Most importantly, there should be plenty of it. Thankfully, humanity generates an ample amount of digital data every day. We write, talk, create, and capture moments in images and videos. We leave digital footprints everywhere. Data is usually generated in two forms: digital footprints and content. Digital footprints are the data points any application stores about you as a user; for example, Amazon collects your click patterns on its website to learn your preferences. Here, however, I want to discuss data in the form of content.

How we collect and store content is governed by two major factors:

1. INFRASTRUCTURE

Operating systems (OS) store data in the form of files. Take audio as an example: the OS stores it in files using MP3, WAV, and many other formats and codecs (codec is a technical term for coder-decoder). Sound information has to be encoded to be stored in a file, and later decoded to reproduce the sound. Encoding is the bridge between the sound card and the OS, and decoding is the bridge between the OS and the human.
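To make the encode/decode round trip concrete, here is a small sketch using Python's standard `wave` module. It assumes uncompressed 16-bit mono PCM at an illustrative 8 kHz sample rate; MP3 and other compressed codecs add a lossy compression step that is not shown. The helper names `encode_wav` and `decode_wav` are hypothetical.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 8000  # samples per second (illustrative choice)

def encode_wav(samples: list[float]) -> bytes:
    """Encode floats in [-1, 1] as 16-bit mono PCM inside a WAV container."""
    pcm = b"".join(struct.pack("<h", int(s * 32767)) for s in samples)
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm)
    return buffer.getvalue()

def decode_wav(data: bytes) -> list[float]:
    """Decode WAV bytes back into amplitude values that could drive playback."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        frames = wav.readframes(wav.getnframes())
    count = len(frames) // 2  # two bytes per 16-bit sample
    return [v / 32767 for v in struct.unpack(f"<{count}h", frames)]

# 10 ms of a 440 Hz tone, encoded to WAV bytes and decoded again.
tone = [math.sin(2 * math.pi * 440 * t / SAMPLE_RATE) for t in range(SAMPLE_RATE // 100)]
restored = decode_wav(encode_wav(tone))
```

Note what the decoded file actually contains: a list of amplitudes and nothing about what is being said, which is why such formats are convenient for playback but opaque to search.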

2. CONSUMER BEHAVIOR

What people want to do with data also determines how it is stored. People wanted editable documents, so we got .doc and .txt files. Some documents did not need editing (think legal documents), so we got PDFs. Even early HTML was created primarily to put text on web pages. For media, we have countless audio and video formats: lossless encodings, small compressed formats, HD formats, and so on.

I believe that over the years we have missed or ignored another very important factor: business intelligence. Shouldn't data be stored in a way that lets us extract the maximum business intelligence from it?

3. INTELLIGENCE — THE THIRD MISSING FACTOR

We have largely ignored intelligence as a factor in data storage because the AI technologies to exploit it were lacking. With no such technologies in sight, storing data became the end goal. Now things are changing: AI technologies are improving rapidly, which means storing data is just the first step. Data is the digital fuel that powers intelligence, benefiting both businesses and their end customers.

Text data is comparatively easy to manage and to extract some intelligence from. It is searchable and indexable, and machine learning algorithms can understand it. But what about media? Media files are still big, fat files that cannot easily be searched or indexed, and machine learning algorithms find them notoriously hard to understand. These files are optimized for long-form entertainment, but is entertainment really the only purpose of that content?

CONTENT IS NOT JUST ENTERTAINMENT

Media carries voice information (in both audio and video) that is not just a source of entertainment but also a source of information, one that is never mined for business intelligence. This content just sits there, dark and inaccessible, inside big, fat media files.

Voice content in interviews, podcasts, conference videos, customer calls, and meetings is valuable to businesses. Such content is underutilized or not utilized at all, and we all lose out as a result.

To extract intelligence from this information, we need to convert the data into a form that ML algorithms can understand. Not all data we store has this property. Take audio files in MP3, WAV, FLAC, and various other formats: are they suited for mining information? No, they are suited for consumption. They are dumb media.

Currently, the only ways to extract intelligence from this media are to hire a team of data scientists, use cloud ML services, or hire services that tag the data manually.

But what if we flip this, so that the infrastructure itself becomes intelligent? As a data scientist, I am exploring alternatives here. Can we have business-first data formats?

A better term would be “intelligence-first” content: a kind of rich media.

If we want mass adoption of AI technologies and want our systems to perform intelligently, we should push intelligence one level down, into the infrastructure.

This will also help fight one of the biggest challenges of the current AI economy: centralization.

AI CENTRALIZATION

Content in the form of text, images, video, and audio contains a huge amount of information, and converting it into a form that can be analyzed is hard and expensive. We are seeing some success with images by feeding their pixel values to convolutional neural networks (ConvNets).

We are also seeing some success in machine understanding of natural language, English in particular. Advances in speech-to-text conversion are making it possible to find and organize information in audio and video.
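Once speech-to-text output exists, making audio searchable can be as simple as indexing the transcript. The sketch below assumes a transcription step (not shown) has already produced timestamped segments; the `Segment` class and the `normalize`, `build_index`, and `search` functions are hypothetical names for illustration, not part of Spext's product or any specific speech-to-text API. It builds an inverted index from words to segments and returns the time ranges where every query word occurs.

```python
from collections import defaultdict
from dataclasses import dataclass

PUNCTUATION = ".,?!"

@dataclass
class Segment:
    """One transcribed span of a recording (hypothetical structure)."""
    start_sec: float
    end_sec: float
    text: str

def normalize(word: str) -> str:
    # Lowercase and drop trailing/leading punctuation so "Churn," matches "churn".
    return word.lower().strip(PUNCTUATION)

def build_index(segments: list[Segment]) -> dict[str, set[int]]:
    """Inverted index: normalized word -> indices of segments that contain it."""
    index: dict[str, set[int]] = defaultdict(set)
    for i, seg in enumerate(segments):
        for word in seg.text.split():
            index[normalize(word)].add(i)
    return index

def search(query: str, segments: list[Segment],
           index: dict[str, set[int]]) -> list[tuple[float, float]]:
    """Return (start, end) times of segments containing every query word."""
    words = [normalize(w) for w in query.split()]
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return [(segments[i].start_sec, segments[i].end_sec) for i in sorted(hits)]

segments = [
    Segment(0.0, 4.5, "Welcome to the quarterly review."),
    Segment(4.5, 9.0, "Customer churn went down this quarter."),
    Segment(9.0, 13.0, "Churn in Europe is still a concern."),
]
index = build_index(segments)
print(search("churn", segments, index))         # [(4.5, 9.0), (9.0, 13.0)]
print(search("churn europe", segments, index))  # [(9.0, 13.0)]
```

A lookup like this jumps straight to the relevant moments in a recording instead of requiring someone to play the whole file.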

But only a few organizations have the capability to mine information in media content, because it is very expensive. Hence we are seeing chronic centralization of AI power.

Google, Amazon, and Microsoft use their infrastructure, money, and access to acquire huge amounts of data and build intelligence. They then offer it through their cloud services for others to build intelligent applications. This is good for them, but not for the rest of us.

Bringing the intelligence of AI technologies down to the infrastructure level can break this centralization of power.

Our entire team at Spext envisions an internet whose components are themselves smart, so that businesses can focus on use cases and distribution. The biggest such component, as we see it, is data itself.

NEW APPLICATIONS

Intelligent data formats will not only help businesses; they will also tap into new user behaviors and create new markets. The rapid adoption of Alexa is exciting. Is your website ready to interact with Alexa? Is your media smart enough to interact with bots?

When I discussed these ideas with a friend, he jokingly described intelligent media as socialist data formats: the complete opposite of the AI capitalism of the big players. Bringing intelligence to the infrastructure level also means sharing the cost equally.

In this upcoming series, I will discuss how we at Spext are approaching this. We are set to innovate in the space of smart media and are firm believers in this idea. For now, we see it as the only way to unlock value from media content at scale.

The time for intelligent content is here and this is Day 0.

