AWS re:Invent Recap 2023
LED Wall at re:Invent 2023


Overview

Unified BI emerged as the central theme at Amazon Web Services (AWS) re:Invent this year, showcasing Amazon's dedicated effort to integrate its various cloud products seamlessly. GoDaddy offered an example of this shift: it moved from a big data platform built on Hadoop and Spark to a comprehensive migration onto AWS cloud infrastructure.

Key Takeaways

Amazon is making significant strides toward a unified BI platform. This shows its commitment to eliminating data silos, the need for specialized tool training, fragmented UIs, and the overhead of managing multiple contracts and maintenance commitments. This move toward a unified platform represents a substantial leap forward. There are a few key takeaways we should consider:

  1. Staying in the Amazon ecosystem - Is it a good idea to stay exclusively within the Amazon ecosystem? The advantages of a unified approach must be weighed against potential limits on adopting future technologies in the ever-evolving BI landscape.
  2. Data Ownership - Is it more suitable to have a decentralized model, where different teams manage their data architecture? Tools like DataZone can assist in governance, but the responsibility lies with data owners to maintain up-to-date data glossaries.
  3. User Dashboard Creation - Should users be allowed to create their own dashboards in QuickSight? If so, training is needed to avoid past problems, such as the excess of dashboards created in Tableau.
  4. The role of Data Science - Despite the convenience of relying on a single tool that requires no code, we still need to recognize the importance of data science. Data scientists are the ones who ensure we experiment with different models and inputs before finalizing decisions.
  5. Managing Many Tools - Having all these tools at your fingertips is nice but can be overwhelming, which underscores the importance of a training framework to get the most out of each tool and ensure a user-friendly experience.

The success of DataZone and QuickSight is still unknown, but I look forward to the next developments from Amazon in this evolving space of Generative BI and Insights. Below are some items that were announced during the Analytics track of AWS re:Invent.

Data Governance

Let’s dive into the significance of data governance to see how these systems interact. Notably, BMW presented a case where they seamlessly integrated Amazon DataZone into their Cloud Data Hub.

However, what exactly is DataZone, and how does it facilitate the integration of these various products? DataZone is a data management service that provides the capability to catalog, discover, govern, and analyze data across your entire organization. The underlying concept of DataZone is to establish a data portal that enables universal access, allowing individuals to efficiently gain insights into their data.

Slide from Amazon's DataZone presentation.

Teams that generate data are considered data producers. This supports a decentralized model: instead of a single team owning all the data, multiple teams own different parts of it. On the other side, data consumers can request access to specific data segments. For instance, if someone wants to access financial data, they submit a request through the portal, and the finance team, as the data owner, reviews and approves it. However, this decentralized approach poses challenges, because the finance team must take on responsibilities such as tracking and cataloging the data, maintaining data definitions, and understanding its intricacies.

After gaining access, a simple button click will allow you to see the subscribed data.

Screenshot of the subscribed data set after being granted access to it.
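
The producer/consumer flow described earlier (request, owner review, access) can be modeled in a few lines. This is a minimal, hypothetical sketch of the idea only. It does not use the DataZone API, and the class names, asset names (`finance.revenue`), and team names are illustrative assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum


class RequestStatus(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class SubscriptionRequest:
    consumer: str
    asset: str
    reason: str
    status: RequestStatus = RequestStatus.PENDING


@dataclass
class DataPortal:
    # Hypothetical model: each asset maps to the producer team that owns it.
    owners: dict[str, str]
    requests: list[SubscriptionRequest] = field(default_factory=list)
    grants: set[tuple[str, str]] = field(default_factory=set)  # (consumer, asset) pairs

    def request_access(self, consumer: str, asset: str, reason: str) -> SubscriptionRequest:
        if asset not in self.owners:
            raise KeyError(f"unknown asset: {asset}")
        req = SubscriptionRequest(consumer, asset, reason)
        self.requests.append(req)
        return req

    def pending_for(self, team: str) -> list[SubscriptionRequest]:
        # Each producer team sees only the pending requests for assets it owns.
        return [r for r in self.requests
                if r.status is RequestStatus.PENDING and self.owners[r.asset] == team]

    def review(self, team: str, req: SubscriptionRequest, approve: bool) -> None:
        # Only the owning team may approve or reject a request.
        if self.owners[req.asset] != team:
            raise PermissionError(f"{team} does not own {req.asset}")
        req.status = RequestStatus.APPROVED if approve else RequestStatus.REJECTED
        if approve:
            self.grants.add((req.consumer, req.asset))

    def can_read(self, consumer: str, asset: str) -> bool:
        return (consumer, asset) in self.grants


portal = DataPortal(owners={"finance.revenue": "finance", "marketing.campaigns": "marketing"})
req = portal.request_access("analyst_a", "finance.revenue", "Quarterly revenue dashboard")
print(len(portal.pending_for("finance")))  # 1
portal.review("finance", req, approve=True)
print(portal.can_read("analyst_a", "finance.revenue"))  # True
```

Even in this toy form, every asset needs an owning team that actively reviews requests, which is the ongoing burden the decentralized model places on data owners.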

You can easily navigate through the business catalog to access pertinent information. This catalog is equipped with features such as asset search, providing technical and business metadata for data objects like tables, dashboards, or views. It also includes a business glossary featuring standard business and data-related terms with easily comprehensible definitions. Additionally, metadata forms are available to capture both technical and business metadata of assets in a standardized manner.
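
To make these catalog features concrete, here is a toy, in-memory sketch of asset search over business metadata and linked glossary terms, plus a check that an asset has filled in every field a metadata form requires. The data structures and field names are assumptions for illustration, not DataZone's schema or its search behavior.

```python
from dataclasses import dataclass, field


@dataclass
class GlossaryTerm:
    name: str
    definition: str


@dataclass
class Asset:
    name: str
    kind: str                           # e.g. "table", "view", "dashboard"
    technical_metadata: dict[str, str]  # e.g. storage format, schema details
    business_metadata: dict[str, str]   # e.g. owner, plain-language description
    terms: list[str] = field(default_factory=list)  # keys into the glossary


def search(assets: list[Asset], glossary: dict[str, GlossaryTerm], query: str) -> list[Asset]:
    """Case-insensitive substring match on asset names, business metadata, and linked glossary terms."""
    q = query.lower()
    hits = []
    for asset in assets:
        texts = [asset.name, *asset.business_metadata.values()]
        texts += [f"{glossary[t].name} {glossary[t].definition}" for t in asset.terms if t in glossary]
        if any(q in text.lower() for text in texts):
            hits.append(asset)
    return hits


def missing_fields(asset: Asset, form: list[str]) -> list[str]:
    """A metadata form is modeled as a list of required business fields; return the unfilled ones."""
    return [f for f in form if not asset.business_metadata.get(f)]


glossary = {"gmv": GlossaryTerm("Gross Merchandise Value",
                                "Total value of goods sold before discounts and returns.")}
assets = [
    Asset("orders_daily", "table", {"format": "parquet"}, {"owner": "commerce"}, ["gmv"]),
    Asset("web_sessions", "table", {"format": "parquet"}, {"owner": "growth"}),
]
print([a.name for a in search(assets, glossary, "merchandise")])  # ['orders_daily']
print(missing_fields(assets[1], ["owner", "description"]))         # ['description']
```

Linking assets to glossary terms is what lets a business phrase such as "merchandise" find a table whose own name never mentions it.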

Users who want to write raw SQL queries can launch a query editor in Amazon Athena with a click of a button.

Amazon Athena SQL Editor.
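
The editor is the interactive route. For readers who want to script the same kind of query, the sketch below uses boto3's Athena client (`start_query_execution`, `get_query_execution`, `get_query_results`). Some assumptions apply. The database, table, and S3 output location in the usage comment are placeholders. The client is passed in as a parameter so the logic can be exercised without AWS. Only the first page of results is read.

```python
import time


def run_athena_query(client, sql: str, database: str, output_location: str,
                     poll_seconds: float = 1.0, timeout_seconds: float = 300.0) -> str:
    """Submit a query, poll until it reaches a terminal state, and return its execution ID."""
    started = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = client.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]
        state = status["State"]  # QUEUED, RUNNING, SUCCEEDED, FAILED, or CANCELLED
        if state == "SUCCEEDED":
            return query_id
        if state in ("FAILED", "CANCELLED"):
            reason = status.get("StateChangeReason", "")
            raise RuntimeError(f"Athena query {query_id} {state}: {reason}")
        if time.monotonic() > deadline:
            raise TimeoutError(f"Athena query {query_id} still {state} after {timeout_seconds}s")
        time.sleep(poll_seconds)


def fetch_rows(client, query_id: str) -> list[list]:
    """Read the first page of results; for a SELECT, the first row holds the column names."""
    rows = client.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    return [[cell.get("VarCharValue") for cell in row["Data"]] for row in rows]


# Usage (requires AWS credentials; database, table, and bucket names are placeholders):
# import boto3
# athena = boto3.client("athena")
# qid = run_athena_query(athena, "SELECT region, SUM(revenue) FROM sales GROUP BY region",
#                        "my_database", "s3://my-bucket/athena-results/")
# print(fetch_rows(athena, qid))
```

Athena runs queries asynchronously, which is why the sketch submits, polls for a terminal state, and only then reads results.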

Generative AI within DataZone

With generative AI now enabled in DataZone, the platform automatically generates descriptions for your datasets, making it easier for users to locate relevant information. Using Athena, users can create tables and run automated metadata jobs within DataZone. These jobs import the columns into the catalog, allowing users to create business metadata in a user-friendly format.

Business Metadata tab summary.
Metadata that is generated automatically for users to review.
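
The import-then-review loop described above can be sketched as follows. This is a hypothetical model, not DataZone's metadata job. The table schema is passed in as (column, type) pairs, and generated descriptions arrive as a plain dictionary standing in for the GenAI output. It illustrates two properties. Generated text stays a draft until a person marks it reviewed. Re-running the import never overwrites reviewed work.

```python
from dataclasses import dataclass

Key = tuple[str, str]  # (table, column)


@dataclass
class ColumnEntry:
    table: str
    column: str
    data_type: str
    description: str = ""
    reviewed: bool = False


def import_columns(table: str, schema: list[tuple[str, str]], catalog: dict[Key, ColumnEntry]) -> None:
    """Hypothetical metadata job: add catalog entries for columns not yet imported."""
    for column, data_type in schema:
        key = (table, column)
        if key not in catalog:  # existing entries, including human edits, are left alone
            catalog[key] = ColumnEntry(table, column, data_type)


def apply_suggestions(catalog: dict[Key, ColumnEntry], suggestions: dict[Key, str]) -> None:
    """Attach generated descriptions as drafts; never overwrite a reviewed description."""
    for key, text in suggestions.items():
        entry = catalog.get(key)
        if entry is not None and not entry.reviewed:
            entry.description = text


def needs_review(catalog: dict[Key, ColumnEntry]) -> list[Key]:
    return sorted(key for key, entry in catalog.items() if not entry.reviewed)


catalog: dict[Key, ColumnEntry] = {}
import_columns("orders", [("order_id", "bigint"), ("gmv_usd", "double")], catalog)
apply_suggestions(catalog, {("orders", "gmv_usd"): "Gross merchandise value in US dollars."})
catalog[("orders", "gmv_usd")].reviewed = True  # a person approves the draft
print(needs_review(catalog))  # [('orders', 'order_id')]
```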

Amazon also announced upcoming enhancements for searching the data catalog with Large Language Models (LLMs), which will suggest relevant data for answering specific questions.

Unified BI through Amazon QuickSight

The case for Amazon QuickSight

Unified BI is the future for Amazon’s product lineup, with QuickSight serving as the visualization layer. Alongside this direction, numerous updates have been introduced to enhance QuickSight's visualization capabilities.

Updates to QuickSight.

However, the most significant update revolved around an AI-driven dashboarding experience fueled by Amazon Bedrock.

Using natural language processing (NLP), users can instruct QuickSight to build visualizations that align with their business needs. They can also modify visualizations by describing what they want to see.

Visualizations generated using NLP.

With a simple question, you can prompt QuickSight to produce executive summaries of your visualizations, and it will suggest follow-up questions and alternatives.

Executive summary of the data you are looking at.

My favorite feature is the ability to help users distill insights into a concise one-pager to present to executives.

One-pager executive view generated by QuickSight.

Data Science and QuickSight

An integration with Amazon SageMaker Canvas was introduced that allows users to build and train ML models without writing any code, enabling the incorporation of predictive data models into QuickSight. In your dashboard, you can configure a dataset to send to SageMaker and initiate runs across different models.

SageMaker analysis results.

The model runs and reports its accuracy, and the results can be fed into QuickSight to build a visualization.

Quickly using data out of SageMaker to build visualizations within QuickSight.
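
Canvas reports model accuracy without any code, but the comparison underneath is simple, and it is also why the experimentation called for in the Key Takeaways still matters. A minimal sketch, assuming classification models whose predictions have already been collected alongside the known outcomes. The model names and labels are made up for illustration. The ranked rows are the kind of small table that could then be loaded into a dashboard.

```python
def accuracy(predicted: list[str], actual: list[str]) -> float:
    """Fraction of predictions that match the known outcome."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("predicted and actual must be non-empty and the same length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def leaderboard(candidates: dict[str, list[str]], actual: list[str]) -> list[dict]:
    """Score every candidate model and return rows sorted from most to least accurate."""
    rows = [{"model": name, "accuracy": accuracy(preds, actual)} for name, preds in candidates.items()]
    return sorted(rows, key=lambda row: row["accuracy"], reverse=True)


actual = ["churn", "stay", "stay", "churn"]
candidates = {
    "model_a": ["churn", "stay", "churn", "churn"],  # 3 of 4 correct
    "model_b": ["stay", "stay", "stay", "stay"],     # 2 of 4 correct
}
for row in leaderboard(candidates, actual):
    print(row)
# {'model': 'model_a', 'accuracy': 0.75}
# {'model': 'model_b', 'accuracy': 0.5}
```

Accuracy alone can mislead on imbalanced data (a model that always predicts the majority class can score well), which is one reason to keep data scientists in the loop.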

Limitations

While all these tools are sound, I see notable limitations in the current state of Amazon’s data solutions:

  1. Multidimensional Cubes or OLAP - Amazon doesn't currently support these. They recommended using SPICE (Super-fast, Parallel, In-memory Calculation Engine) as a report cache instead, but it seemed limited by comparison.
  2. Databricks Delta Lake Architecture - I found that many conference attendees are also transitioning to the Databricks Delta Lake architecture. Amazon doesn’t currently support Databricks integration in its unified BI stack, but collaborating with Databricks on a solution is on its roadmap.
  3. Data Complexity in Demonstrations - The demonstrations primarily showcased simple datasets such as team sales or real estate data. I wish I could have seen more complex data models and datasets; as it was, I was left wondering whether a system like this can handle real-life use cases such as our massive and complex data stack at Slickdeals.
  4. Data Ownership Challenges - In their DataZone demos, Amazon assumes that different teams within a company own the datasets. Even where that is true, those teams may not own the cataloging and documentation of the data. In my many years of experience in the data space, companies more often than not do not follow a model where individual teams own specific datasets.
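
For readers less familiar with the first limitation: an OLAP cube pre-computes aggregates for every combination of dimensions, so slicing and drilling down become lookups rather than fresh scans. The sketch below illustrates that idea in plain Python, similar in spirit to SQL's `GROUP BY CUBE`. It is not how SPICE or any particular engine works, and the sales rows are invented. Each row contributes to 2^d cells for d dimensions, which is why cubes grow quickly as dimensions are added.

```python
from collections import defaultdict
from itertools import combinations

ALL = "*"  # marker meaning "aggregated over this dimension"


def cube(rows: list[dict], dims: list[str], measure: str) -> dict[tuple, float]:
    """Pre-compute SUM(measure) for every combination of dimensions."""
    totals: dict[tuple, float] = defaultdict(float)
    for row in rows:
        # Each row adds its measure to one cell per subset of dimensions (2^d cells).
        for k in range(len(dims) + 1):
            for subset in combinations(dims, k):
                key = tuple(row[d] if d in subset else ALL for d in dims)
                totals[key] += row[measure]
    return dict(totals)


sales = [
    {"region": "West", "product": "A", "revenue": 100.0},
    {"region": "West", "product": "B", "revenue": 50.0},
    {"region": "East", "product": "A", "revenue": 70.0},
]
c = cube(sales, ["region", "product"], "revenue")
print(c[("West", ALL)])  # 150.0
print(c[(ALL, "A")])     # 170.0
print(c[(ALL, ALL)])     # 220.0
```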

Miscellaneous Pictures

The LED wall, where you could create your own drawing, was pretty cool. Someone made Mario!

Greg Mabrito doing some PR for Slickdeals.

No, we didn't get tattoos or piercings, but the lines to get one were insane.

The Expo Center was massive; Greg Mabrito and I spent time speaking with so many vendors.

The AWS re:Play closing party was something to see.

They had the pickleball and ping pong finals during the AWS re:Play closing party.


More articles by Cyrus S.
