Bridging the (Analytics) Culinary Gap - Part 3

Let's break open the data pantry!

Happy Monday! Another week of tasty treats! Today’s Data Bytes calories: 890 words … 5 minutes.

Join Google Developers Group Atlanta NOW to keep up to date on our 2025 event calendar! Here’s some of what’s in store (details TBD):

  • Zero to Hero Hands-On Flutter series (starting in February; date TBA)
  • Black History Month on Campus event (February; date TBA)
  • International Women’s Day event in March or April
  • Road to Google Developers Certification study program - Cloud Architect
  • DevFest! Save the Date: October 31, 2025
  • Plus so much more!

What I’m reading

LLM Engineer’s Handbook

Umami (Mapa de las lenguas)

What I’m working on: my edge angles in Zermatt, Switzerland (I use Carv, check it out!)


One Big Thing: Bridging the (Analytics) Culinary Gap (Part 3)


Dataplex: Your Pantry Organizer

With our recipes crafted in Dataform and our sous chef Composer/Airflow orchestrating the cooking, we need a way to keep our data kitchen organized and efficient. That's where Dataplex comes in – our pantry organizer extraordinaire!

Why Dataplex is Essential for a Tidy Kitchen

Imagine a pantry where ingredients are scattered haphazardly, labels are missing, and expiration dates are a mystery. Chaos reigns! Dataplex brings order to our data pantry, ensuring our ingredients are fresh, easily accessible, and well-documented.

Here's how Dataplex keeps our data kitchen running smoothly:

  • Ingredient Inventory: Dataplex provides a clear inventory of all our data assets, much like a well-organized pantry. We can easily discover and access the data we need, whether it's raw ingredients or prepped dishes. No more rummaging through cluttered shelves!
  • Freshness Guarantee: Dataplex helps us track data lineage and freshness, ensuring we're always using the most up-to-date ingredients. It's like having a built-in expiration date checker for our data, preventing us from serving stale insights.
  • Recipe Documentation: Dataplex allows us to document our data assets with rich metadata, like nutritional labels for our ingredients. We can record details such as data origin, quality checks, and transformation processes. This ensures transparency and traceability in our data cooking.
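The expiration-date idea boils down to one comparison: how old is each table's latest refresh versus how old we allow it to be? Here's a minimal plain-Python sketch of that check. It is not a Dataplex API; it assumes you have already collected last-update timestamps for each table, and the SLA map, table names, and `stale_tables` helper are all hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical freshness SLAs: the maximum age each table may reach before we
# treat it as a stale ingredient. Table names are illustrative only.
FRESHNESS_SLA = {
    "raw_sales.orders": timedelta(hours=24),
    "cleansed_sales.orders_clean": timedelta(hours=26),
    "sales_reports.daily_revenue": timedelta(hours=30),
}


def stale_tables(last_updated, now, sla=FRESHNESS_SLA):
    """Return the sorted names of tables older than their SLA.

    `last_updated` maps table name -> timezone-aware datetime of the last refresh.
    Tables with no recorded refresh are treated as stale.
    """
    stale = []
    for table, max_age in sla.items():
        updated = last_updated.get(table)
        if updated is None or now - updated > max_age:
            stale.append(table)
    return sorted(stale)


if __name__ == "__main__":
    now = datetime(2025, 1, 20, 12, 0, tzinfo=timezone.utc)
    last_updated = {
        "raw_sales.orders": now - timedelta(hours=3),
        "cleansed_sales.orders_clean": now - timedelta(hours=40),
        # sales_reports.daily_revenue has no recorded refresh
    }
    print(stale_tables(last_updated, now))
    # ['cleansed_sales.orders_clean', 'sales_reports.daily_revenue']
```

Keeping the SLA per table lets downstream dishes tolerate a little more age than the raw ingredients they are built from.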

Dataplex in Action: A Well-Organized Pantry

Let's see how Dataplex keeps our data kitchen tidy. Imagine we have a variety of data sources – customer orders, sales transactions, and marketing campaigns. Dataplex helps us organize these ingredients into logical shelves, making it easy to find what we need.

Setting up your pantry with Dataplex

  • Create a Lake: Start by creating a Dataplex "lake." This is the main container for your data, like the pantry itself. Give it a descriptive name, like "Sales Data Lake."

  • Organize with Zones: Next, create "zones" within your lake. These are like shelves in your pantry, separating different types of data. Each zone is either a raw zone or a curated zone, so for our sales data we might have a raw "Raw Sales Data" zone plus curated "Cleansed Sales Data" and "Sales Reports" zones.


  • Add your Assets: Now, it's time to stock your shelves! Add your data sources as "assets" to the appropriate zones. In Dataplex, an asset is a BigQuery dataset or a Cloud Storage bucket, so this could be the dataset holding your raw sales tables, the dataset your Dataform project writes its transformed tables into, and the reporting dataset behind the final dashboards where your insights will be served.
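To make the lake, zone, and asset hierarchy concrete, here's a minimal plain-Python sketch that plans the pantry above as Dataplex-style resource names of the form projects/{project}/locations/{location}/lakes/{lake}/zones/{zone}/assets/{asset}. It does not call any Google Cloud API. The project ID, region, dataset names, and helper functions are hypothetical, and the ID rule encoded in `ID_PATTERN` is my reading of the documented naming constraints, so double-check it against the current docs.

```python
import re

# Documented Dataplex ID rules (lake, zone, asset), as assumed here: lowercase
# letters, digits and hyphens only; start with a letter; end with a letter or
# digit; 1-63 characters.
ID_PATTERN = re.compile(r"[a-z](?:[a-z0-9-]{0,61}[a-z0-9])?")

# Hypothetical pantry layout: zone display name -> BigQuery datasets to attach.
# "Raw Sales Data" would be a RAW zone; the other two would be CURATED zones.
PANTRY = {
    "Raw Sales Data": ["raw_sales"],
    "Cleansed Sales Data": ["cleansed_sales"],
    "Sales Reports": ["sales_reports"],
}


def to_resource_id(display_name):
    """Turn a friendly name like 'Sales Data Lake' into an ID like 'sales-data-lake'."""
    slug = re.sub(r"[^a-z0-9]+", "-", display_name.lower()).strip("-")
    if not ID_PATTERN.fullmatch(slug):
        raise ValueError(f"cannot derive a valid ID from {display_name!r}")
    return slug


def plan_pantry(project, location, lake_display_name, pantry=PANTRY):
    """List the resource name of every lake, zone, and asset we would create."""
    lake_id = to_resource_id(lake_display_name)
    lake = f"projects/{project}/locations/{location}/lakes/{lake_id}"
    names = [lake]
    for zone_display, datasets in pantry.items():
        zone = f"{lake}/zones/{to_resource_id(zone_display)}"
        names.append(zone)
        for dataset in datasets:
            names.append(f"{zone}/assets/{to_resource_id(dataset)}")
    return names


if __name__ == "__main__":
    for name in plan_pantry("my-project", "us-central1", "Sales Data Lake"):
        print(name)
```

With a plan like this in hand, the actual creation still happens in the console, with gcloud, or through the Dataplex client libraries.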


Navigating your pantry

Once you have your assets organized, Dataplex provides a clear view of your data landscape.

In this screenshot, we see how Dataplex automatically discovers the number of tables and total data size of the assets in our lake. In Dataplex Discover (search), we can easily navigate through our data pantry and find the specific ingredients (tables) we need for our ETL recipes.
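The lake overview totals and Discover search are Dataplex console features; this is not their API, only a minimal sketch of the bookkeeping behind them. It assumes you already have a list of table records, and the `TableEntry` schema, sample rows, and helper functions are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TableEntry:
    """One discovered table (hypothetical schema, e.g. rows from a catalog export)."""
    zone: str
    name: str
    size_bytes: int
    description: str = ""


def summarize_by_zone(entries):
    """Return {zone: (table_count, total_size_bytes)}, like the lake overview totals."""
    totals = defaultdict(lambda: [0, 0])
    for entry in entries:
        totals[entry.zone][0] += 1
        totals[entry.zone][1] += entry.size_bytes
    return {zone: (count, size) for zone, (count, size) in totals.items()}


def search(entries, keyword):
    """Case-insensitive keyword match on table name or description, sorted by name."""
    needle = keyword.lower()
    hits = [
        e for e in entries
        if needle in e.name.lower() or needle in e.description.lower()
    ]
    return sorted(hits, key=lambda e: e.name)


if __name__ == "__main__":
    pantry = [
        TableEntry("raw-sales-data", "orders", 5_000_000, "Raw customer orders"),
        TableEntry("raw-sales-data", "transactions", 12_000_000, "Point-of-sale transactions"),
        TableEntry("sales-reports", "daily_revenue", 40_000, "Revenue by day, built from orders"),
    ]
    print(summarize_by_zone(pantry))
    # {'raw-sales-data': (2, 17000000), 'sales-reports': (1, 40000)}
    print([e.name for e in search(pantry, "orders")])
    # ['daily_revenue', 'orders']
```

Searching descriptions as well as names is what lets "orders" surface the downstream revenue table too, which is exactly why good documentation makes the pantry easier to navigate.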

Dataplex also provides detailed metadata for each asset, like a comprehensive nutritional label.

This metadata helps us understand the origin, quality, and transformation history of our data, ensuring we're using the freshest and most reliable ingredients in our recipes.
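Dataplex records lineage and metadata for you; as a hedged illustration of what such a "nutritional label" might hold, here's a plain-Python sketch. The `NutritionLabel` fields, the sample tables, and the breadth-first `lineage` walk are assumptions for illustration, not Dataplex's metadata model or lineage API.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class NutritionLabel:
    """Hypothetical metadata 'label' for one table: origin, quality checks, upstream inputs."""
    table: str
    origin: str
    quality_checks: dict = field(default_factory=dict)  # check name -> passed?
    upstream: list = field(default_factory=list)  # tables this one is built from

    def is_trustworthy(self):
        """True only if at least one check was recorded and every check passed."""
        return bool(self.quality_checks) and all(self.quality_checks.values())


def lineage(labels, table):
    """Return all upstream ancestors of `table`, nearest first (breadth-first, no repeats)."""
    seen, order = set(), []
    queue = deque(labels[table].upstream)
    while queue:
        parent = queue.popleft()
        if parent in seen:
            continue
        seen.add(parent)
        order.append(parent)
        if parent in labels:  # tables without a label are leaves in this sketch
            queue.extend(labels[parent].upstream)
    return order


if __name__ == "__main__":
    labels = {
        "raw_sales.orders": NutritionLabel(
            "raw_sales.orders", "order service export", {"order_id_not_null": True}),
        "cleansed_sales.orders_clean": NutritionLabel(
            "cleansed_sales.orders_clean", "Dataform transformation",
            {"order_id_unique": True, "amount_non_negative": True}, ["raw_sales.orders"]),
        "sales_reports.daily_revenue": NutritionLabel(
            "sales_reports.daily_revenue", "Dataform transformation",
            {"row_count_positive": False}, ["cleansed_sales.orders_clean"]),
    }
    print(lineage(labels, "sales_reports.daily_revenue"))
    # ['cleansed_sales.orders_clean', 'raw_sales.orders']
    print(labels["sales_reports.daily_revenue"].is_trustworthy())
    # False
```

Treating "no checks recorded" as untrustworthy is a deliberate choice here: an unlabeled ingredient shouldn't be served just because nothing flagged it.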

With Dataplex as our pantry organizer, our data kitchen is a model of efficiency and order. We can easily find, understand, and manage our data assets, ensuring our ETL pipeline delivers delicious insights to our stakeholders.

Helpful Resources


Sweet & Sour Candy (this week’s good, bad, or weird of the tech world)

Mark Zuckerberg gave Meta's Llama team the OK to train on copyrighted works, filing claims | TechCrunch - Plaintiffs in a copyright lawsuit against Meta allege that CEO Mark Zuckerberg approved the use of a known pirated dataset, LibGen, to train the company's Llama AI models, despite internal concerns about copyright infringement. Meta is accused of stripping copyright information from the data and using torrenting to access and distribute the pirated works, further concealing its alleged infringement.

A foundation model of transcription across human cell types - Nature - Researchers have developed a new model called GET (general expression transformer) that accurately predicts gene expression across 213 human cell types using only chromatin accessibility data and sequence information. GET's adaptability across different sequencing platforms and assays enables the discovery of universal and cell-type-specific transcription factor interaction networks and facilitates regulatory inference across a broad range of cell types and conditions.


One last bite

"You will do foolish things, but do them with enthusiasm." ~Sidonie-Gabrielle Colette

Thank you for reading Data Bytes. This post is public, so feel free to share it.



More articles by Jessica Rudd, PhD, MPH, PStat

  • Bridging the (Analytics) Culinary Gap - Part 2
  • Bridging the Culinary Gap: A Delicious Data Journey with ETL!
  • Why Heuristics Still Matter
  • Take the Reigns of the Cloud
  • Hook, Line, and Cleaner
  • One Mac, Many Gits to Rule Them All
  • Feeling the heat (of my first app development)
  • The Art of the Chill Week
  • Going Mad for March Madness
  • JavaScript vs. Python: The Data Engineering Duel
