Distributed Analytics: Enabling More Insightful Solutions

Chuck Freedman

Growing Productive & Successful Developer Communities through Advocacy, Partnerships, and Relationships; Director of Developer Advocacy and Enablement @ MongoDB

Published: November 11, 2016

At the recent Intel Analytics Summit, our first panel discussion featured a conversation on distributed analytics. This panel offered a lively look at topics that are close to the heart of anyone focused on extracting value from big data. While the topics were wide-ranging, from foundational technologies to applications like machine learning, the centerpiece of the discussion was a deep dive into the advantages of distributed analytics.

So what is distributed analytics? At the most basic level, distributed analytics spreads data analysis workloads over multiple nodes in a cluster of servers, rather than asking a single node to tackle a big problem. The same algorithm runs on each of the nodes, with each node processing its own subset of the data. When the processing concludes, the partial results are aggregated, or brought back together, to generate collective insights.
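The split, process, and aggregate pattern described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: worker threads on one machine stand in for cluster nodes, the function names (`partition`, `analyze_subset`, `distributed_mean`) and the sample readings are hypothetical, and a real deployment would rely on a framework such as Hadoop or Spark rather than this toy code.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(records, num_nodes):
    """Split records into num_nodes roughly equal subsets, one per node."""
    return [records[i::num_nodes] for i in range(num_nodes)]


def analyze_subset(subset):
    """The same algorithm runs on every node: return a partial (sum, count)."""
    return sum(subset), len(subset)


def distributed_mean(records, num_nodes=4):
    """Analyze each subset in parallel, then aggregate the partial results."""
    subsets = partition(records, num_nodes)
    # Assumption: threads simulate the nodes of a cluster.
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(analyze_subset, subsets))
    # Aggregation step: combine the per-node partial results.
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


# Hypothetical sample data, for illustration only.
readings = [72, 75, 71, 90, 88, 64, 79, 81]
print(distributed_mean(readings))  # prints 77.5
```

Note that each worker returns a small partial result (a sum and a count) rather than its raw data, so the final aggregation step stays cheap no matter how large each subset is.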

The advantage of the distributed approach boils down to a simple concept: faster time to insight. By putting multiple nodes to work on the same problem, you can gain insights from data much more quickly than would be possible when running an analytics application on a single node with sequential processing. This is a huge benefit when you want to get fast answers from massive amounts of data.

Let’s take an example. To improve patient safety and avoid costly patient readmissions, a hospital might want to put distributed analytics to work to compare a patient’s vital signs and symptoms with the records of thousands of other patients who have experienced a similar health issue, including those who were released and later readmitted. It wouldn’t be feasible to do this on a single node, even a very large one, because the results would come back far too slowly to help the clinical staff make timely decisions on releasing the patient. By distributing the workload of processing all relevant data, insights can be obtained fast enough to support this kind of solution, advising the hospital staff within minutes whether it is safe to release the patient.

At a technology level, the time is ripe for distributed analytics. It is a natural complement to Apache Hadoop and its distributed file system (HDFS), which many organizations use as a repository for large amounts of data. By design, HDFS spreads data over different nodes, which makes it relatively easy to plug in a distributed analytics application.

Distributed analytics is also a natural fit with the popular Apache Spark processing engine, which is often paired with a Hadoop environment. Spark includes built-in modules for data streaming, SQL, machine learning and graph processing. When you pair Spark with a distributed analytics application and a lot of processing power, you’re positioned to run analytics on data as it streams in, to generate insights and answers in near real time.

Both Hadoop and Spark are spearheaded by the Apache Software Foundation and enriched by code contributions from Intel and the broader analytics community. The code that Intel contributes to these projects helps application developers and data scientists take full advantage of the capabilities and performance of underlying Intel architecture.

To further boost performance, Intel’s contributions include the Intel® Data Analytics Acceleration Library (Intel® DAAL) and the Intel® Math Kernel Library (Intel® MKL). Intel DAAL provides highly tuned functions for deep learning, classical machine learning, and data analytics performance. Intel MKL provides highly optimized, threaded, and vectorized functions to increase performance on Intel processors. These optimization libraries are baked into related projects like the Trusted Analytics Platform (TAP), an Intel-initiated open-source platform that accelerates advanced analytics and machine learning solutions.

Ultimately, distributed analytics is an enabler of more advanced artificial intelligence (AI) solutions that need lightning-fast responses from data processing engines. AI gives us the ability to extend the reach of analytics to encompass not just data but also images, video, facial expressions, human speech and other sources of insight.

Let’s close with a key takeaway from the distributed analytics panel discussion at the Intel Analytics Summit: Emerging solutions today aren’t just about processing large amounts of data; they are also about leveraging computing performance and spreading the work across multiple machines or nodes. The practice of distributed analytics helps you capitalize on this data and can set you up to gain faster, more valuable insights.

For a closer look at Intel’s contributions to distributed analytics, including a range of resources for software developers, visit intel.com/machinelearning.

[This article was originally posted at https://itpeernetwork.intel.com/distributed-analytics-enabler-insightful-solutions/ on Nov 8, 2016.]
