How HP’s Workforce Experience Platform is Mastering the Art of Data Scale

The ability to analyze large amounts of telemetry data is what powers the HP Workforce Experience Platform and makes it one of the largest device-linked data processing cloud operations in the world.

Introduction

Part of the magic of leading teams at the crossroads of devices, data, and the cloud is the thrill of watching your solutions delight and enrich your customers. Having spent two decades launching Security, Mobile, and Cloud products, I can tell you that the joy of building industry-changing solutions only continues to grow over time.



At HP, my latest passion is helping employees become happier and more productive at work. The HP Workforce Experience Platform is our newest offering that learns from 25,000+ businesses to improve employee engagement and productivity with data and AI-assisted decisions.

How we scaled the HP Workforce Experience Platform

Today, our platform is a massive petabyte-scale data operation that ingests data from 38+ million devices, which send roughly 660 GB of telemetry data to HP daily. Getting there, however, required us to map out and then execute several critical strategies.

For example, one of my mantras for developing software is that innovation must always focus on differentiated value creation. Often, that means striking strategic partnerships to reuse leading inventions across the technology world.

That’s why HP collaborated with top cloud provider Amazon Web Services (AWS). Within a short period, we achieved a scalable and repeatable model on AWS with a differentiated approach to architecture.

We also needed to simplify our plans for data scale, follow rigorous design principles, and implement standardization across architectural patterns. To help inform our decisions, we first had to better understand the scaling challenges that modern tech infrastructures face.


The Rise of Data, but the Failure to Scale!

As many people know, the key to any analytics product is DATA. It is often the main currency, or input, that drives a virtuous cycle, where:

More data leads to better analytics, which leads to more customer value, which leads to more usage, which generates more data…


However, according to McKinsey, 92% of data analytics projects fail to scale effectively. This aligns with broader industry trends, where high failure rates are common across various data initiatives.

Why, then, can businesses not leverage the wealth of available data to impact business outcomes positively?

Where scaling goes wrong

When it comes to building any data analytics product, there’s a catch. The aforementioned virtuous cycle only works if the following three conditions are met:

  1. Data collection must be seamless.
  2. Scaling must be autonomous.
  3. Cloud costs must be low.

Further complicating matters is the rise of powerful cloud platforms like Amazon Web Services (AWS), Google Cloud Platform (GCP), and Microsoft Azure. With the promise of making big data infrastructure more accessible and scalable, each platform now offers roughly 200 services that can be mashed up, customized, and recombined into an almost unlimited number of configurations.


Source: Milan Cloud Mapping AWS vs Azure vs GCP vs Oracle

Unsurprisingly, complexity has soared without delivering proportional benefits. The resulting tangle of technologies and customizations has made tech infrastructures more expensive and complex, and less scalable, seamless, autonomous, performant, and reliable.

As a result, organizations quickly realize that their data—the valuable currency that drives the virtuous cycle—doesn’t provide the value or return on investment they expected.

A Different Approach: Simplify Data Scale to What Matters Most

With our platform, we took a novel approach from the very beginning. We needed a dramatically simpler cloud SaaS architecture to extract the maximum value from our data—without the downsides.

First, we focused our efforts on the data layer because that’s where most scalability challenges are concentrated. Ingesting, storing, accessing, archiving, and processing data all require enormous scalability.

Below is a diagram that shows how data moves through the input, processing, and output stages. As you can see, the scalability demands of the data layer dwarf those of the source-data and serving layers.


Once we nailed down our focus, we strategized on how to architect a repeatable pattern for ultimate simplicity. That meant:

  • Strictly following first-order principles for architectural design.
  • Seeking standardization across architectural patterns.

Rigorously designing by principles and measuring by metrics

When designing the HP Workforce Experience Platform’s software architecture, we followed five primary first-order principles that have always stood the test of time for me:

  1. Customer value focus
  2. Measurable by metrics (includes elastic scale, cost, quality, and compliance)
  3. Differentiated value focus
  4. Play to your skills and strengths
  5. Keep it simple, stupid (KISS)

With our platform, these principles allowed us to prioritize what truly mattered while ruthlessly eliminating everything else. Following the KISS principle, we then used standard architectures to build on and learn from those who had walked the path before us and already scaled with confidence.

Why Standardization Across Architectural Patterns Matters More Than You Think

Several established architectural patterns have emerged in big data storage. Popular architectures include Kappa, Data Lake, and Data Warehouse, but the two most suitable for our platform are:

  1. Medallion Architecture for batched processing
  2. Lambda Architecture for real-time event processing

The choices were based on our domain, metrics, need for standardization, and need for simplicity. This focus reduced HP customizations, allowing us to scale our data lake to millions of devices.


As you can see, Medallion and Lambda architectures combine to deliver the best of both worlds: batched and real-time data processing. This combination has yielded maximum scale at low cost, satisfying customer needs for insights. Let’s take a closer look.

Medallion architecture

Medallion architecture is used when you need a structured, layered approach to manage data quality, consistency, governance, auditability, and compliance. It organizes data into different layers or zones to streamline data processing and ensure data quality.

  • Bronze layer: Raw, unprocessed data
  • Silver layer: Cleaned and transformed data
  • Gold layer: Aggregated, highly curated, and ready-to-use data

The HP Workforce Experience Platform uses the Medallion architecture wherever data is processed in batches. Nearly all (~100%) of our 173 data classes, each with ~30 fields, follow it.
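To make the layering concrete, here is a minimal PySpark sketch of a bronze-to-silver-to-gold batch job. The bucket paths, field names (device_id, ts, cpu_util), and aggregation are illustrative assumptions, not our actual schema.

```python
# A minimal PySpark sketch of the bronze -> silver -> gold flow described above.
# Bucket paths and field names (device_id, ts, cpu_util) are illustrative assumptions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("medallion-batch-sketch").getOrCreate()

# Bronze: raw, unprocessed telemetry as landed on S3 (for example, by Kinesis Firehose).
bronze = spark.read.json("s3://example-telemetry/bronze/device_health/")

# Silver: clean and transform (drop malformed rows, deduplicate, normalize timestamps).
silver = (
    bronze.filter(F.col("device_id").isNotNull())
          .withColumn("ts", F.to_timestamp("ts"))
          .dropDuplicates(["device_id", "ts"])
)
silver.write.mode("overwrite").parquet("s3://example-telemetry/silver/device_health/")

# Gold: aggregated, ready-to-use data (daily average CPU utilization per device),
# which a downstream job could load into Redshift for serving.
gold = (
    silver.groupBy("device_id", F.to_date("ts").alias("day"))
          .agg(F.avg("cpu_util").alias("avg_cpu_util"))
)
gold.write.mode("overwrite").parquet("s3://example-telemetry/gold/device_health_daily/")
```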


In this illustration, we reused AWS service components like Kinesis and Kinesis Firehose, allowing for hands-off monitoring and autonomous scale. We also used AWS S3 to store bronze and silver data (raw data formats) and Redshift for analyzed data, giving us storage at scale. Regarding metrics, compliance and data quality are crucial in a hyper-organized Medallion architecture.
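For context on the ingestion side, below is a hedged sketch of how an agent or edge service might push a single telemetry record into a Kinesis Data Firehose delivery stream that lands raw (bronze) data on S3. The stream name and payload fields are assumptions for illustration.

```python
# Sketch: push one telemetry record into a Kinesis Data Firehose delivery stream
# that delivers raw (bronze) data to S3. Stream name and payload fields are
# illustrative assumptions.
import json
import boto3

firehose = boto3.client("firehose")

record = {
    "device_id": "example-device-001",
    "ts": "2024-05-01T12:00:00Z",
    "cpu_util": 41.5,
    "battery_health": "good",
}

# Firehose buffers records and writes them to the configured S3 prefix.
firehose.put_record(
    DeliveryStreamName="example-device-telemetry",  # assumed delivery stream name
    Record={"Data": (json.dumps(record) + "\n").encode("utf-8")},
)
```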

What about speed? It’s true that, at times, developer velocity is hampered by this multi-stage formal processing.

However, this cost is recouped in the long run because the data is better organized and the insights are of higher quality. Furthermore, a pragmatic but documented choice to selectively bypass the silver layer for data classes that are already high quality can also save on development costs, as sketched below.
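One simple way to keep that bypass documented rather than ad hoc is a per-data-class configuration. The sketch below is hypothetical; the class names and flag are not HP’s actual configuration.

```python
# Hypothetical per-data-class configuration documenting which classes may skip
# the silver layer because the source data is already high quality.
PIPELINE_CONFIG = {
    "device_health":   {"skip_silver": False},  # noisy source: keep the full bronze -> silver -> gold flow
    "warranty_events": {"skip_silver": True},   # validated upstream: go straight from bronze to gold
}

def stages_for(data_class: str) -> list:
    """Return the Medallion stages a given data class should pass through."""
    cfg = PIPELINE_CONFIG.get(data_class, {"skip_silver": False})
    return ["bronze", "gold"] if cfg["skip_silver"] else ["bronze", "silver", "gold"]

print(stages_for("warranty_events"))  # ['bronze', 'gold']
```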

Lambda architecture

Lambda architecture is designed to handle massive quantities of data. It is the right choice when you need both real-time and batch data processing, low-latency responses to real-time queries, and the data consistency that batch processing provides, and when you can manage the complexity of maintaining two separate processing paths.

The three layers of Lambda architecture are:

  • Batch layer: Stores all incoming data and processes it in batches (use medallion here)
  • Speed layer: Handles real-time data processing for immediate results
  • Serving layer: Merges outputs from both batch and speed layers for final consumption
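To illustrate the serving layer’s job, here is a minimal sketch of merging a precomputed batch view with recent speed-layer increments at query time. The function, field names, and values are assumptions, not our production code.

```python
# Minimal sketch of the serving layer: merge the batch view (computed by the
# Medallion/batch path) with increments accumulated by the speed layer since
# the last batch run. Names and values are illustrative.
def merge_views(batch_view: dict, speed_view: dict) -> dict:
    merged = dict(batch_view)
    for device_id, delta in speed_view.items():
        merged[device_id] = merged.get(device_id, 0) + delta
    return merged

batch_view = {"device-001": 120, "device-002": 87}  # e.g., event counts as of last night's batch
speed_view = {"device-002": 3, "device-003": 1}     # events seen in the last few minutes
print(merge_views(batch_view, speed_view))          # {'device-001': 120, 'device-002': 90, 'device-003': 1}
```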

For the HP Workforce Experience Platform, 15% of our customer insights and actions must surface in real time. These are mission-critical elements that must meet delivery SLAs of under 15 minutes. The rest of the insights are derived by machine learning over years of data in our data lake via the batch path (the Medallion architecture).

Diving a layer deeper, this standardization provides much-needed speed and scalability. In the following illustration, our platform leverages AWS Kinesis, Kinesis Firehose, and Elastic MapReduce (EMR) to ingest, store, process, and serve real-time data at scale. As you can see, the sporadic nature of these real-time events is better served by NoSQL databases than by an ETL-style, low-cost data warehouse like AWS Redshift.
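As a rough illustration of that speed path, the sketch below reads recent events from a Kinesis stream and upserts them into a NoSQL table (DynamoDB here) for low-latency serving. The stream, shard, table, and attribute names are assumptions.

```python
# Sketch of the speed layer: consume recent events from Kinesis and upsert them
# into a NoSQL table for low-latency queries. Stream, table, and attribute
# names are illustrative assumptions.
import json
import boto3

kinesis = boto3.client("kinesis")
table = boto3.resource("dynamodb").Table("example-realtime-insights")

shard_iterator = kinesis.get_shard_iterator(
    StreamName="example-device-events",
    ShardId="shardId-000000000000",
    ShardIteratorType="LATEST",
)["ShardIterator"]

response = kinesis.get_records(ShardIterator=shard_iterator, Limit=100)
for rec in response["Records"]:
    event = json.loads(rec["Data"])
    table.put_item(Item={
        "device_id": event["device_id"],
        "event_ts": event["ts"],
        "insight": event.get("insight", "unknown"),
    })
```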


“If you want to go far, go together.”

This old African proverb perfectly describes HP’s approach to collaboration. Collaborating with the AWS public cloud ecosystem gave us ample opportunity to share and review our progress.

How can you follow in our footsteps and leverage similar AWS capabilities for your organization?

First, connect with others who have walked the path before. Ask your AWS contact to connect you with internal Amazon leaders or customers who have faced similar challenges. Our ideation for the real-time/micro-batching Lambda architecture was kickstarted by an AWS re:Invent meeting with a software leader from Amazon Advertising, who operated in a parallel universe that required a similar solution.

Second, take advantage of the AWS Well-Architected Tool. AWS assigns solution architects to help periodically review and assess your framework’s security, scalability, and costs.


Source: AWS Well Architected

Finally, for key initiatives, AWS also provides prototyping engineers; we have run many such engagements with the AWS SaaS Factory team.

The HP Workforce Experience Platform Is Built to Last

Management historian Martin Gutmann points out that people tend to celebrate leaders and teams for their dramatic words and actions in times of crisis. However, they often overlook lessons from truly great case studies that avoid the crisis in the first place.

If scaling is planned, executed, and monitored well, it’s almost boringly efficient. There should be no surprises or fires to put out, and everything should work smoothly and uneventfully.

Few people know it, but our platform is now one of the largest device-linked data processing cloud operations in the world. Three petabytes of data are processed daily through over 300 EMR jobs, exposed by over 200 microservices and Lambda functions, running on over 1,000 server instances on AWS. 18 million corporate devices benefit from the pipeline across our smart warranty, security, and print products.

And we achieved this at record speed with a tiny engineering team.

In a massive petabyte-scale data operation like the HP Workforce Experience Platform, this is the power of simplicity, standardization, and differentiated value creation.


HP Workforce Experience Platform is a comprehensive and modular digital employee experience solution that enables organizations to resolve IT issues before they arise, protect against cyber threats, and personalize IT for every employee’s needs.

Subscribe to the HP Workforce Experience Blog to get notified about cutting-edge technology posts from our HP team, or to begin optimizing your IT capabilities today.

