Transforming Sports Data with Databricks

Jared Chavez

Manager, Data Engineering & Strategy | Pacers Sports & Entertainment

Published: July 25, 2024

How the Indiana Pacers and Fever are revolutionizing Basketball with the Databricks Data Intelligence Platform

Author: Jared Chavez, Senior Data Engineer, Pacers Sports & Entertainment

Pacers Sports & Entertainment (PS&E) is the engine behind the Indiana Pacers, Fever, and Mad Ants, along with a thriving eSports team in Pacers Gaming, an incredible charitable foundation, and a soon-to-be multi-venue entertainment business. Over the past two years, there has been a meteoric rise in development and demand for data within our organization. As we approach what might have previously been considered the peak for our organization, our sights are set on ever-higher goals and innovations in the sports and entertainment data space.

Our journey toward true modernization began in early 2022. Basketball Analytics looked to the cloud for its next evolution, and the organization turned toward centralization to dramatically reduce operational costs and improve synergy across our brands. My Pacers Basketball counterpart, Michaela Roberts, and I were given the rare opportunity to completely redesign the infrastructures of our respective departments and redefine how data operated within our organization.

We ultimately elected to move forward with Databricks. This decision quickly streamlined the workflows of our analysts and end-users (particularly on the business side) to such a high degree that we soon found ourselves unable to imagine returning to the old way of doing things. Soon, the days of supporting fragmented querying tools, notebook environments, and data silos were behind us, and everyone collectively moved to engage this new “Lakehouse” directly.

Reenvisioning our data platform and building a new ecosystem

The decentralized nature of PS&E’s data infrastructure and analytics teams led to disparate silos all over the business with limited awareness of each other's resources or projects. Tooling and skill sets varied wildly, with each silo practicing its own form of analytics and developing its own tribal knowledge base. The organization controlled two completely independent warehouses, each serving different areas of the business, and each becoming increasingly unstable under the weight of clashing ideologies and rapidly changing requirements over the years.

We didn’t need a new warehouse per se; most had never even seen the existing warehouses we had. Instead, we needed an ecosystem in which every department could engage data where they felt the most effective, something where this new data platform felt present but wasn’t the “be-all and end-all” that the previous warehouses had tried to be.

Finding a product that could fill that role—and do it well—was challenging. Our previous warehouses were built on Azure Synapse, which facilitated seamless connections to external platforms but proved prohibitively expensive for an organization of our size. Additionally, the need to integrate external tools like Azure Data Studio and our machine learning platform created a fragmented operation. Each team had its own workflow and preferred method of connecting everything, leading to a disjointed system and a chaotic work environment for me to support, so I went off to find a new home for our data.

What initially drew us to Databricks, and has kept us with them as the platform and company have continued to evolve, is relatively simple: they are every bit as aggressive in mindset as we are.

Reflecting on my time with Pacers Sports & Entertainment brings back many humorous memories of working with (and often around) the once-fledgling Workflows toolkit, watching the old Hive Metastore decide when it felt like connecting to our business intelligence tools, and seeing the SQL Workbench try its best to meet our organization's expectations with its limited syntax. With that said, the Databricks Platform today is head and shoulders above what it was just a few years ago, and their rapid, relentless innovation has not only mirrored our own ambitions but has also made them an invaluable partner throughout this journey. While there remains room for improvement, I’m very happy we decided to stick with them, and I can’t see us going anywhere else.

Growing alongside Databricks and redefining how business is conducted

A high-level overview of our data infrastructure.

We'll start with the Machine Learning Workbench, which undoubtedly had the most significant impact early on. Candidly, it democratized machine learning within our organization. We have a diverse group of data professionals across our brands—some who thrive in their IDEs, building, training, and deploying models at will, and others who have hardly touched languages like Python or R. Both groups needed predictions, which made the machine learning tool I mentioned earlier essential.

Databricks excelled from the start, with their AutoML suite providing our more technical users with a rapid prototyping tool that could produce a solid baseline model for further development. At the same time, it offered our less technical users a way to create reliable predictions quickly and confidently. The first models we deployed focused on ticket pricing and game demand, and they have remained core components of our ML workflows every season since. Databricks has been responsive to our feedback, continually refining how clusters are configured, improving the AutoML tool, and enhancing connectivity options to our schemas and the integration of model outputs back into our data tables, making the Machine Learning Workbench a breeze to navigate and utilize these days. Workflows now take hours instead of days, and I get to watch the AutoML tool compete against our ticketing team and their models each cycle. This competition leads to significantly better-informed predictions, with the occasional AutoML model making it straight to production without any further tweaking from our team, saving them quite a bit of time getting answers back to our decision-makers during their peak periods.

I’ll group the SQL Workbench and the Notebook environment together. To quote our newly onboarded Director of Business Analytics upon seeing Databricks for the first time: “This is the most advanced data platform I have seen.” The evolution of the notebook environment has been nothing short of fantastic. Features like parameterization, native Git support, and the ability to mix and match programming languages within cells have streamlined development, allowing us to leverage each language for its strengths. The SQL Workbench has also come a long way, enabling our analysts across various brands to interact seamlessly with data inside the platform. With the addition of the newer serverless warehouses, connecting and refreshing dashboards and other reports in external BI tools has also become effortless. Everything lives in the same place, and we no longer worry about difficulties with local environments, dependency clashes, and the endless stream of troubleshooting external development tools.

Before diving into what we’ve been building, let's wrap up with Workflows. Although they were rough in the early days, we stuck with them, and honestly, I don’t see us using anything else now. Nearly every quirk, issue, and nitpick we had has been addressed by the Databricks team, who have graciously taken an enormous amount of feedback from us. Compared to most alternatives, Workflows have made orchestration much easier to deploy, maintain, and navigate. The ability to quickly access the interface and begin work has significantly reduced overhead for my team, and I have no desire to entertain other tools that have come our way.

Continuing to push the envelope with Databricks

Databricks has given me the firepower needed to meet the organization's demands and the increasingly complex vision for what data could enable within our organization.

Alongside the firepower the platform has given us, Delta Lake has given us unrivaled flexibility and peace of mind. We have built our infrastructure in such a manner that I don't really have to worry about it anymore. Delta Lake has made everything so easy to build, maintain, and scale that I could honestly delete our entire metastore, and it would faithfully rebuild itself from our raw data storage overnight as if nothing had ever happened.

Additionally, when I say data lives in the lakehouse, people know what that means and how they can get to it; the platform’s flexibility has been the biggest hit within the organization. We're no longer sitting here trying to figure out “Where does this data live?” or “How do I best connect to it?” However people want to work, wherever they want to work, they can do it. And that's the first time I've really been able to say that, not just here at Pacers Sports & Entertainment, but in my career.

Their recent partnership with Salesforce, one of my closest partners, is a prime example of this, and it has opened up a whole new world of possibilities for us. Earlier this year, we began piloting the newly developed connectors to and from Salesforce Data Cloud alongside the "Bring Your Own Model" feature within the platform. This has enabled me to finally unify our various brands into a cohesive Customer 360 model that integrates directly with the tools and platforms our Sales & Marketing division loves. At the same time, I can leverage Databricks as a powerful computing and machine learning engine, continuously enriching, refining, merging, and predicting on our customer records across not just our brands but every system and revenue stream. Visibility on customer information will soon be at an all-time high, and the expected impact is unprecedented.

In tandem with the Salesforce project, we're developing our new text messaging workflow, leveraging Databricks to deliver data directly to our vendor. This process fuels engagement and attribution, allowing us to incorporate feedback from our vendor into our broader ecosystem promptly. This integration significantly enhances turnaround times and performance reporting for our advertising and marketing teams.

Delta Sharing has enabled me to significantly enhance support for the various collegiate programs we are collaborating with. We're organizing hackathons that leverage Delta Sharing to provide real-time access to participating organizations, avoiding the need to directly expose our own workspaces or create unnecessary overhead for them while preparing data for their competitions. Additionally, we are exploring Delta Sharing's potential for our consultants, eliminating their need to work directly within our Azure environment and significantly reducing project ramp-up time.

Finally, I want to touch on what was originally a pipe dream idea that has now taken off. We embarked on a project to model our entire arena and surrounding campus in a 3D space for use in virtual reality. Leveraging the locational data from our network infrastructure, we are poised to create traffic maps that will aid in security efforts, concessions staffing, signage placements, and more.

Once this project is complete, our next step involves integrating Ticketmaster into this 3D space via Databricks. In conjunction with the ML Workbench, we aim to develop a vision model that calculates the value of physical signage viewed by fans during games. This will revolutionize how we report physical assets to our sponsors.

Setting the Stage for Data in Women’s Basketball

The Indiana Fever are currently undergoing one of the most exciting transformations in our organization's history, both on and off the court.

Earlier this year, we embarked on a project to create the largest and most advanced data center in women's basketball, with Databricks at the core of this initiative. The dynamic nature of women's basketball globally is fascinating, as players frequently move between leagues and competitions, often traveling across the world throughout the year to do so.

Accurate and timely coverage of these competitions is crucial for our scouting efforts and performance evaluation of players abroad. However, two significant challenges quickly arise. First, box scores and play-by-play information, if at all available, are often incomplete or unreliable, making it challenging to gather consistent and accurate statistics. Second, the level and quality of play vary widely across these competitions, complicating the assessment of how a player's performance in a particular competition translates to the WNBA.

The project's first stage involves integrating video tracking systems like Hawk-Eye and Second Spectrum. These systems provide us with terabytes of highly detailed biomechanical and statistical play-by-play data for all WNBA games, allowing us to analyze basketball under a microscopic lens. This influx of data has fundamentally transformed our ability to analyze and break down the game. Although it may take another season or two for our staff to fully adapt to the potential this information offers, it has already significantly enhanced the detail of our pre-game and post-game reports, answering many questions that were previously nearly impossible to address with our typical play-by-play feeds.

In the next stage of the project, our coverage of women’s basketball will expand significantly as we begin capturing data from every major competition worldwide, from high school games to the Olympics. The full impact of this expansion on our organization is hard to quantify, but it will undoubtedly revolutionize our reporting and scouting efforts. We intend to start by developing a new scouting platform for our front office powered by this comprehensive global database. This platform will leverage a new in-house weighting system for each competition and league that will be designed and developed following the completion of global data capture, enabling us to accurately assess players’ performances abroad and predict how they will fare in the WNBA. This phase will also include an effort to categorize players by playstyle, allowing us to evaluate matchups better, predict on-court chemistry, and optimize roster composition. This modernized approach goes beyond traditional positional analysis, providing our team with a more flexible and dynamic system for evaluation, roster building, and strategic planning.
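The weighting system described above has not been designed yet, so what follows is only a minimal sketch of the general idea: scale a player's per-game production in another competition by a per-competition weight to get a rough WNBA-equivalent figure. The competition names, the weights, and the `translate_stat` and `blended_estimate` helpers are all hypothetical placeholders for illustration, not PS&E's method or real values.

```python
from dataclasses import dataclass

# Hypothetical weights: 1.0 means "same level of play as the WNBA".
# These numbers are illustrative placeholders, not real estimates.
COMPETITION_WEIGHTS = {
    "WNBA": 1.00,
    "League A": 0.80,
    "League B": 0.65,
}


@dataclass
class StatLine:
    competition: str
    games: int
    points_per_game: float


def translate_stat(line: StatLine, weights: dict[str, float]) -> float:
    """Scale points per game by the weight of the competition it came from."""
    if line.competition not in weights:
        raise KeyError(f"No weight defined for {line.competition!r}")
    return line.points_per_game * weights[line.competition]


def blended_estimate(lines: list[StatLine], weights: dict[str, float]) -> float:
    """Games-weighted average of translated points per game across competitions."""
    total_games = sum(line.games for line in lines)
    if total_games == 0:
        raise ValueError("No games played")
    weighted_sum = sum(translate_stat(line, weights) * line.games for line in lines)
    return weighted_sum / total_games


# A made-up season split across two hypothetical competitions.
season = [
    StatLine("League A", games=20, points_per_game=15.0),
    StatLine("League B", games=10, points_per_game=20.0),
]
estimate = blended_estimate(season, COMPETITION_WEIGHTS)
print(f"WNBA-equivalent points per game: {estimate:.2f}")
```

A single multiplier per competition is the simplest possible form; a real system would likely need separate weights per statistic and some way to handle small sample sizes.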

The "final" stage of this endeavor involves integrating another Databricks partner, Posit, into our ecosystem. Posit Workbench, Package Manager, and Connect will enable us to support Python and R as first-class languages in various IDEs. This will allow our users to publish using their favorite visualization libraries while eliminating dependency issues through centralized package version enforcement. There's a great deal of excitement around this initiative, as it will empower all our users to harness the power of the Databricks Platform while working with tools they are comfortable with. Just as we use Databricks as an engine for Salesforce, enabling our Sales & Marketing teams to handle data in their preferred environment, our goal here is to provide a similar experience for our basketball teams and most technical users.

The Financial Impact of the Databricks Data Intelligence Platform and the Lakehouse Paradigm

The transition to Databricks has resulted in some of the most remarkable year-over-year cost savings I've encountered in my career.

Starting with predictive workloads in our ticketing operations, Databricks enabled us to move away from a siloed machine learning platform that would have cost us $100,000 annually. The Machine Learning Workspace within Databricks provides seamless access to our data in an environment where it is significantly easier to configure, train, and deploy models. We are now entering our second year of using Databricks for ticketing predictions, and the financial difference has been astounding. The same range of predictive projects, with careful compute configurations, was produced this past season for an eye-watering $10. These projects were also deployed to production in a single day, much faster and with fewer issues than in previous years, allowing our team to focus substantially more time on driving sales.

Moving on to the impact Databricks has had on our sponsorships, we’ve achieved a level of automation that eliminates months of tedious manual work across multiple departments. These time savings significantly enhance our ability to report quickly to our sponsors and keep pace with the relentless sales cycles our brands face annually. Reports that once took days to weeks now take only seconds to minutes. Looking ahead to the next cycle, it’s hard to imagine how we ever operated without Databricks.

Finishing with the infrastructure as a whole and the metrics we are most proud of: we now have over 440 times more data in storage and eight times more data sources in production compared to our old infrastructure. Despite these significant increases, we are operating at just under 2% of our previous annual costs, which has led to numerous humorous conversations internally and externally as we try to grasp the level of efficiency we have achieved.
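To put the storage and cost figures side by side, here is a back-of-the-envelope calculation. It treats "just under 2%" as an upper bound of exactly 2%, which is an assumption, and the function and variable names are made up for this sketch.

```python
def relative_cost_per_unit(data_multiple: float, cost_fraction: float) -> float:
    """Cost per unit of stored data relative to the old infrastructure (old = 1.0)."""
    return cost_fraction / data_multiple


# Figures quoted above: 440x the data at (just under) 2% of the previous annual cost.
DATA_MULTIPLE = 440
COST_FRACTION_UPPER_BOUND = 0.02  # assumption: "just under 2%" rounded up to 2%

per_unit = relative_cost_per_unit(DATA_MULTIPLE, COST_FRACTION_UPPER_BOUND)
improvement = 1 / per_unit  # how many times cheaper each stored unit is

print(f"Relative cost per unit of data: {per_unit:.6f}")
print(f"At least {improvement:,.0f}x cheaper per unit of data stored")
```

Because the 2% figure is used as a ceiling, the result is a lower bound: each unit of stored data costs at most about 1/22,000 of what it did on the old infrastructure.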

The savings from our migration to Databricks have allowed us to reinvest in better tooling across the business and explore new ventures on and off the court, including many projects I could not have envisioned being able to experiment with while working for a sports team. Hopefully, we’ll be able to talk about them soon. In short, it’s sometimes hard to believe this is the same company I joined two years ago, and Databricks has been a major factor in this transformation.
