Data Vectorization [Series#2: I am Data!]

Mustafa Qizilbash

Open for New Opportunities (Globally), Author & Podcaster of “Let’s Talk About Data!”, Data & AI Practitioner & CDMP Certified, Innovator of DAC Architecture & PVP Approach

Published: April 9, 2022

Data Vectorization is very common nowadays, especially since the arrival of Big Data and Hadoop. It was in use before that, but I would say it was not widely known.

All of us must have heard about MPP (Massively Parallel Processing), right? But do we know how it works at the back end? Data Vectorization is about enabling parallel processing to fetch data.

Let’s decode it...

Flynn's taxonomy describes four ways a machine can combine instructions and data: SISD, SIMD, MISD and MIMD (all explained in a separate topic). In this topic, we will refer to SISD and SIMD only.

Traditionally, a computer, machine or server works in SISD mode, i.e., Single Instruction, Single Data, which means each instruction fetches the data it needs one item at a time.
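
To make the one-at-a-time idea concrete, here is a minimal Python sketch of SISD-style processing. The function name and the sample numbers are made up for illustration; the point is only that one value is handled per step.

```python
def sum_one_by_one(values):
    """Add the values one element per step, as a scalar (SISD-style) loop would."""
    total = 0
    steps = 0
    for value in values:
        total += value  # one instruction works on one data item
        steps += 1
    return total, steps


# Hypothetical sample data, e.g. eight sales amounts.
sales = [120, 80, 45, 300, 15, 60, 90, 10]
total, steps = sum_one_by_one(sales)
print(total, steps)
```

The number of steps grows with the number of values, which is exactly what hurts once the dataset gets large.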

In Data Vectorization, we change our approach to MPP mode: the computer, machine or server starts working in SIMD mode, i.e., Single Instruction, Multiple Data. This means that if one query is executed and the data resides on multiple data nodes, data from all nodes is pulled in parallel, making computation much faster than in SISD mode.

Data Vectorization has become a key component of any data solution, especially since Hadoop, NoSQL databases and the cloud surfaced. Most databases are now Data Vectorization or MPP enabled.
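
The "one query, many data nodes" pattern can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the node names and their contents are hypothetical, the "nodes" are plain in-memory lists rather than separate machines, and a thread pool stands in for the cluster. It shows the shape of the approach, not real MPP performance.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical "data nodes"; a real cluster would hold these on separate machines.
DATA_NODES = {
    "node_a": [120, 80, 45],
    "node_b": [300, 15],
    "node_c": [60, 90, 10],
}


def partial_sum(node_name):
    """Run the same operation (a sum) against one node's share of the data."""
    return sum(DATA_NODES[node_name])


def parallel_total(node_names):
    """Send the same instruction to every node at once, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=len(node_names)) as pool:
        partials = list(pool.map(partial_sum, node_names))
    return sum(partials), partials


total, partials = parallel_total(list(DATA_NODES))
print(total, partials)
```

Each node answers with its own partial result, and the final answer is just the combination of those partials. That split-then-combine step is what lets the work spread across many nodes.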

The question is, why is Data Vectorization so important? In the Big Data era, datasets such as social media, CCTV and audio can also produce valuable insights, so organizations have started to store and use them. But to use them, one must process them. For datasets this large, SISD was not a suitable technique, because processing gigabytes and terabytes of data sequentially would take days. So SIMD, MPP or Data Vectorization became the technique of choice: it processes data in massively parallel mode, making computation hundreds of times faster than SISD.
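
A quick back-of-the-envelope calculation shows where a figure like "hundreds of times faster" can come from. All numbers below are hypothetical, and the formula is idealized: it assumes the data splits evenly and ignores coordination and network overhead, so real clusters will land below this.

```python
def sequential_hours(total_gb, gb_per_hour):
    """Time for a single worker to process everything in sequence."""
    return total_gb / gb_per_hour


def ideal_parallel_hours(total_gb, gb_per_hour, workers):
    """Idealized time when the data is split evenly and there is no overhead."""
    return total_gb / (gb_per_hour * workers)


total_gb = 10_000   # hypothetical dataset of about 10 TB
gb_per_hour = 100   # hypothetical throughput of one worker
workers = 200       # hypothetical number of parallel workers

seq = sequential_hours(total_gb, gb_per_hour)
par = ideal_parallel_hours(total_gb, gb_per_hour, workers)
speedup = seq / par
print(seq, par, speedup)
```

With these made-up numbers, roughly four days of sequential work shrinks to half an hour in the ideal case, a speedup equal to the number of workers.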

Cheers.

KAMRAN AHMED

I am an Enterprise Data Management, Data Governance, Data Modeling Experienced Professional | As a Team Leader, I ensure the highest data quality, security, and compliance standards.

2 years ago

Well explained, Data Vectorization: I understood this topic in data science, where Python and R have vector-type variables. The implementation in databases makes a real difference. I believe in-memory databases incorporate the storage or table space in vector form, making a real difference when we retrieve or summarize a massive amount of data.
