Building Big Data Center of Excellence with IBM Cloud and Hadoop
Karan Sachdeva
IBM AWS Global Strategic Partnership Executive for AI @ IBM | NYU Stern MBA ‘27
I often ask customers what inhibits big data initiatives in their organization. Frequent answers include: no compelling business need, or difficulty identifying use cases; lack of data science skills; not enough staff to support them; and the complexity of collecting and managing the data. The concept of a center of excellence (CoE) for big data, which I attempt to demystify here, helps ensure these responses are not inhibitors in any organization.
The key to a data-driven business is bringing data and insight into every workflow and integrating them into decision making at every step. This approach enables organizations to take advantage of the longitudinal analytics made possible by newer technologies such as Hadoop and Spark, as well as machine learning, for past-, present- and future-looking analytics simultaneously.
Defining big data centers of excellence
A big data CoE is a framework that takes an organization from zero knowledge to having a fully functional practice of Hadoop, Spark and emerging open source technologies to deliver robust business results. A CoE is where organizations identify new technologies, learn new skills and develop appropriate processes that are then deployed into the business to accelerate adoption.
A centralized big data CoE can be the bedrock for establishing a data-driven company that treats data as a strategic asset. The CoE can partner with the business to identify its most valuable data, explore use cases that differentiate its products and services in the market and help jump-start the business with insights that can yield real-time client value. Data's strategic importance lies in the value it represents for the business, but success with big data is not just about data. The people and the organization also play a vital role in that success:
A) Building big data success stories with use cases
In many cases, the business comes up with the use cases, but the CoE has the responsibility of facilitating this work. The CoE needs to assume a leadership role in understanding which applications and use cases can be driven with the available data sources. Businesses can be proactive about bringing use cases to the CoE, but the resulting list can quickly become overwhelming and strain available resources. A transparent process for prioritizing these use cases is therefore important and should be adopted. The CoE needs to prioritize use cases based on parameters such as ease of data availability, data quality, business revenue–based value and impact, costs and risks.
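One way to make that prioritization transparent is a simple weighted scoring of each use case against the parameters above. The sketch below is a minimal illustration; the criteria names, weights and 1–5 ratings are assumptions for the example, not a prescribed CoE standard.

```python
# Illustrative sketch: rank candidate use cases by a weighted score.
# Weights and ratings below are hypothetical examples.

def prioritize(use_cases, weights):
    """Rank use cases by a weighted sum of 1-5 criterion ratings."""
    def score(uc):
        return sum(weights[c] * uc["ratings"][c] for c in weights)
    return sorted(use_cases, key=score, reverse=True)

weights = {
    "data_availability": 0.25,
    "data_quality": 0.20,
    "business_value": 0.35,
    "cost_risk": 0.20,  # higher rating = lower cost and risk
}

use_cases = [
    {"name": "Churn prediction",
     "ratings": {"data_availability": 4, "data_quality": 3,
                 "business_value": 5, "cost_risk": 3}},
    {"name": "Network log archiving",
     "ratings": {"data_availability": 5, "data_quality": 4,
                 "business_value": 2, "cost_risk": 4}},
]

ranked = prioritize(use_cases, weights)
print([uc["name"] for uc in ranked])
```

Publishing the weights alongside the ranking is what makes the process transparent: business units can see exactly why one use case was scheduled ahead of another.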
B) Applying agile methodology—the fail-fast approach
Agility and the ability to fail fast are essential to realizing the potential of big data. A lightweight agile process provides the tools to deliver outcomes quickly and transparently, typically within two- to three-week sprints. Failing fast matters because business and technical roadmaps for delivering big data value need to change far more often than in a traditional waterfall environment.
Data itself is also highly agile when it is collected in native form and transformed potentially many times to meet the needs of different use cases. Using the basic ideas of agile development methodology, a CoE can provide the leadership across the organization to ensure business users can quickly gain value from the data.
C) Developing financial models
At the heart of a big data CoE are creative financial models that support the innovation. The charge-back strategy can be framed as data as a service, insights as a service or analytics as a service.
As is often the case with shared services, a charge-back model is necessary to properly handle the maintenance and growth of the emerging technologies, which in this case can be Hadoop and Spark clusters. An organization needs to develop a charge-back model for the business units that will be engaging with the CoE for project, personnel, infrastructure and application resources. Some important questions need to be considered when determining the charge-back model for business units:
- How many users will access the application and cluster?
- How much data will be ingested initially?
- How much data growth is expected over time?
- What is the data retention policy?
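The four questions above map directly onto a simple charge-back estimate: a per-user charge for application and cluster access, plus a storage charge driven by initial ingest, growth and retention. The sketch below is a hypothetical model for illustration; the rates and the linear growth assumption are mine, not IBM or CoE pricing.

```python
# Illustrative sketch: estimate a business unit's monthly charge-back
# from users, initial data, growth and retention. Rates are hypothetical.

def monthly_chargeback(users, initial_tb, monthly_growth_tb,
                       retention_months, month,
                       per_user_rate=50.0, per_tb_rate=25.0):
    """Estimate the charge for a given month of the engagement."""
    # Stored data = initial load + growth, with old data aged out
    # once the retention window is full.
    retained_months = min(month, retention_months)
    stored_tb = initial_tb + monthly_growth_tb * retained_months
    return users * per_user_rate + stored_tb * per_tb_rate

# Example: 20 users, 10 TB initial ingest, 2 TB/month growth,
# 12-month retention, billed for month 6.
print(monthly_chargeback(20, 10.0, 2.0, 12, month=6))
```

Even a crude model like this gives business units a predictable number to budget against, and gives the CoE a basis for capacity planning as Hadoop and Spark clusters grow.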
Business leaders and decision makers acknowledge that creating a data-driven organization requires a change of culture. Big data CoEs can be the key to this culture change. An important recommendation for building a CoE framework is starting with a small, secure data lake—a Hadoop- or Spark-based service—that can store and process data from various internal groups to support multiple use cases. When building a data lake, organizations learn and employ operational best practices for a number of processes:
- Cluster build out
- Data exploration
- Data ingestion and processing
- Disaster recovery
- General operations and maintenance
- Hadoop and Spark development
- Infrastructure integration
- Model building and testing
- Multitenancy and security
- Third-party software evaluation and integration
- Use-case evaluation
A leading telecommunications firm, for example, began by developing a CoE that asked each business division to come up with business use cases that would generate powerful insights through analytics. It then established regular training boot camps in which business users learned how to use data with self-service tools, and it created a community of data scientists and data engineers to support line-of-business managers in their analyses and to validate findings. As a result, this CoE enabled big data as a shared service and opened the conversation about creative financial models that involve charge-backs and show-backs.
Leveraging big data centers of excellence
I foresee creative CoE adaptations such as the one just described helping businesses move beyond merely hoping to become data-driven organizations enabled by big data to actually operating with a data-ingrained business model.
This article has been adapted from my original post at the IBM Big Data Hub. The Big Data Hub is created and curated by IBM as the home for current content and conversation about big data and analytics for the enterprise from thought leaders, subject matter experts and big data practitioners.