2016 - The Year of Fast Data

Rishi Yadav

Founder & CEO at Roost.ai

Published: January 11, 2016

Many technologies change so fast that the name given to them becomes a misnomer. Big data is one such technology. It's no longer big but fast. Most enterprises do not have petabytes of data, but they do have data that moves very fast. In other words, of the three Vs of big data (volume, velocity, variety), velocity has become predominant.

A glimpse into the history

Big data started with the advent of Hadoop, created by Doug Cutting, and its subsequent growth at Yahoo (a web-scale company). Hadoop was based on the Google File System and MapReduce papers published by Google (another web-scale company).

Facebook (yet another web-scale company) created Hive so that Hadoop could be accessed through a SQL-like interface. LinkedIn and Netflix (both web-scale companies) have also contributed to the ecosystem in various ways.

While these web-scale companies provided perfect lab environments, a few startups took charge of both implementing and contributing to the open-source effort. The most notable are Cloudera, MapR and Hortonworks.

Transition from web-scale to enterprise play

For big data technologies, specifically Hadoop, to gain wider adoption, they had to meet enterprise needs. Enterprises are nothing like web-scale companies. This led to a long period in which enterprises had a split-brain approach to big data adoption. One part of the company, full of geeks, would be excited by the promise of big data, arrange some funding and start playing with it in its own silo. The rest of the company would not touch it because it did not look enterprise-ready.

Enterprises have cross-cutting concerns such as security and data governance. These concerns, though important, were not really the stumbling block. The stumbling block was latency: Hadoop was just too slow. Enterprises had gotten used to the low-latency experience provided by database technologies, and they wanted the same experience from big data.

It all changed when memory became first-class storage. SAP saw this coming when it introduced SAP HANA about five years ago for enterprise data. Apache Spark has done the same for the big data ecosystem. Spark uses the memory of commodity worker nodes for both storage and compute. Spark works well with Hadoop as the underlying storage layer but is not limited to it.

Though 2015 was the year a lot of companies moved their Hadoop workloads to Spark, most of these workloads were still batch ETL/ELT type loads.

2016: I want it all and I want it now

As companies saw the power of Spark and in-memory analytics, they got more and more impatient (in a good way). In 2016 they want all workloads to be handled (which Spark does well with its comprehensive suite of libraries), and they want answers from all of them now (low latency, near real time). This has put streaming front and center in the big data game.

Another trend that has fueled the need for streaming is IoT. The evolution of IoT technologies (IoT = sensor data) has created an almost unlimited number of streaming use cases. You can now do (near) real-time analytics on data coming from programmable logic controllers (PLCs), airplanes, offshore drilling platforms and so on.

Regarding streaming for IoT data, there are plenty of opportunities to leverage the stream of IIoT (Industrial IoT) data and run monitoring rules against it in real time, opening new opportunities to bring value to our customers (plant floor, manufacturing, etc.). I see an evolution where features like security, multi-tenancy for the data, and a high-level UI to configure the rules will move the intelligence to the crowd (crowdsourcing analytics for the "fast torrential stream").

Juan Asenjo, Principal Engineer and IoT Evangelist, Rockwell Automation
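To make the idea of running monitoring rules against a sensor stream concrete, here is a minimal sketch in plain Python. It is an illustration only, not how any particular streaming engine or Rockwell product implements rules. The `Reading` shape, the `monitor` function, and the sliding-window mean rule are all assumptions made up for this example.

```python
from collections import deque
from typing import Deque, Dict, Iterable, Iterator, NamedTuple, Tuple


class Reading(NamedTuple):
    """One sensor reading (hypothetical shape, for illustration)."""
    sensor_id: str    # e.g. the name of a PLC tag
    timestamp: float  # seconds since some epoch
    value: float      # e.g. a temperature


def monitor(
    readings: Iterable[Reading],
    window_seconds: float,
    threshold: float,
) -> Iterator[Tuple[str, float, float]]:
    """Yield (sensor_id, timestamp, window_mean) each time the mean of a
    sensor's readings over the last `window_seconds` exceeds `threshold`."""
    windows: Dict[str, Deque[Reading]] = {}
    for reading in readings:
        window = windows.setdefault(reading.sensor_id, deque())
        window.append(reading)
        # Evict readings that have fallen out of the time window.
        cutoff = reading.timestamp - window_seconds
        while window[0].timestamp <= cutoff:
            window.popleft()
        mean = sum(r.value for r in window) / len(window)
        if mean > threshold:
            yield (reading.sensor_id, reading.timestamp, mean)


if __name__ == "__main__":
    stream = [
        Reading("pump-1", 0.0, 70.0),
        Reading("pump-1", 1.0, 80.0),
        Reading("pump-1", 2.0, 95.0),
    ]
    for alert in monitor(stream, window_seconds=2.0, threshold=85.0):
        print(alert)
```

Each reading is evaluated as it arrives, and only a small window per sensor is kept in memory, which is the essential difference from collecting the data first and checking it in a later batch job.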

Enterprise systems are not far behind. For two decades, enterprise data destined for analytics (which in reality is streaming data, e.g. point-of-sale data) was transformed into cubes and processed using data warehousing systems. This batching and slowing of data was needed because the technology could not keep up. Now, with in-memory analytics, all this data can be processed directly in memory and results produced in real time.
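The contrast between batching sales into a cube and processing them as they arrive can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions: the `Sale` tuple shape and the `running_revenue` function are hypothetical, and a real deployment would use a streaming engine rather than a single in-process dictionary.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical point-of-sale event: (store, product, amount).
Sale = Tuple[str, str, float]


def running_revenue(
    sales: Iterable[Sale],
) -> Iterator[Dict[Tuple[str, str], float]]:
    """Update in-memory revenue totals per (store, product) on every sale
    and yield a snapshot, instead of waiting for a nightly batch load."""
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    for store, product, amount in sales:
        totals[(store, product)] += amount
        # Copy so that callers see a stable snapshot of the totals.
        yield dict(totals)


if __name__ == "__main__":
    events = [
        ("s1", "milk", 2.5),
        ("s1", "milk", 2.5),
        ("s2", "bread", 3.0),
    ]
    for snapshot in running_revenue(events):
        print(snapshot)
```

The aggregate is always current as of the latest sale, so a query never has to wait for the data to be transformed and loaded first.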

Summary

Looking into the crystal ball for 2016, I see the following trends clearly emerging:

Memory becoming first-class storage, with all data landing in memory first and then being pipelined for various needs.
Streaming data use cases arriving at an accelerating pace. I want it all and I want it now.
Traditional data warehouses being disrupted and slowly replaced by an in-memory analytics layer.
Big data integration (I love to call it plumbing) becoming the primary consulting play.

We at InfoObjects, working at the bleeding edge of big data, see a consistent pattern emerging. This consistency is good for both us and our clients, and it is creating a new design pattern for streaming analytics. In fact, a design pattern called the Kappa architecture became very famous in 2015; it has now taken a backseat to streaming-first and streaming-only approaches.

Adel Bekhiet, MBA, PMP

Sr. Director, Infrastructure and Cloud Services at Northwestern Mutual

6 years ago

Well done Rishi! The fast data war has started already and we all have to be part of it soon!

Pritpal S.

Hands-on Gen AI Open AI Architect/Cloud Leader, Sr. Cloud Program Manager, Cloud Architect, Senior Manager, Customer Success Manager, AI evangelist/consultant, Senior Technical Program Manager

9 years ago

Great One Rishi!!!

Kranthi Meka

9 years ago

Rishi, like the clarity and phrases. Keep shedding the light.

John Furrier

Cofounder & CEO of SiliconANGLE Media; Executive Editor SiliconANGLE.com and Host of @theCUBE

9 years ago

Fast data is the key to the value in the data

Juan Asenjo

VP, IoT and Data Analytics @ Zoetic Global

9 years ago

Well done Rishi. Regarding streaming for IoT data, there are plenty of opportunities to leverage the stream of IIoT (Industrial IoT) data and run monitoring rules against it in real time, opening new opportunities to bring value to our customers (plant floor, manufacturing, etc.). I see an evolution where features like security, multi-tenancy for the data, and a high-level UI to configure the rules will move the intelligence to the crowd (crowdsourcing analytics for the "fast torrential stream").
