Data + LLM News - September 2024

Hi there,

I’m writing this newsletter under the Paris rain, well aware that summer is behind us. New York seems to be on the same page, with Halloween decorations already popping up around the city. The CastorDoc team is expecting a busy October between Big Data Paris, dbt Coalesce, and new client onboardings. But don't worry, we're still committed to bringing you the latest data news. Here's what you can expect:

  • One tool that we believe is worth digging into as a data person.
  • A curated list of the best articles we've read this month, along with brief teasers.
  • Show and Tell: Some exciting updates from the CastorDoc team.
  • A data meme, to brighten your day.

Let’s dive in.

Data Tool

A lot of people have been writing about whether or not AI would replace data teams. Well, we are already seeing some of this happen with the first “AI Data Engineer”. Revefi is a data observability tool that automatically establishes baselines, detects anomalies, and alerts users to unexpected behaviors in their data ecosystem. The tool's approach addresses common pain points like poor adoption of manual quality checks, alert fatigue, and escalating cloud costs. Investors seem to be betting on Revefi's AI-driven approach, as the startup recently secured $20 million in funding.
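To make "establishing baselines and detecting anomalies" concrete, here is a minimal sketch in Python. It is not Revefi's method, which we have not seen; it only illustrates the general idea with a trailing-window z-score. The function name, the window size, the threshold, and the sample row counts are all assumptions chosen for illustration.

```python
from statistics import mean, stdev


def find_anomalies(values, window=7, threshold=3.0):
    """Return the indices of values that stray too far from their baseline.

    The baseline for each point is the mean of the previous `window` points.
    A point is flagged when it sits more than `threshold` sample standard
    deviations away from that mean. (Hypothetical helper, for illustration.)
    """
    anomalies = []
    for i in range(window, len(values)):
        baseline = values[i - window:i]
        mu = mean(baseline)
        sigma = stdev(baseline)
        if sigma == 0:
            # Perfectly flat baseline: any change at all is unexpected.
            if values[i] != mu:
                anomalies.append(i)
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Made-up daily row counts for a table, with one obviously bad load.
daily_row_counts = [1000, 1010, 990, 1005, 995, 1002, 998, 1001, 250, 1003]
print(find_anomalies(daily_row_counts))  # prints [8]
```

A fixed rule like this is also a quick way to see where alert fatigue comes from: a threshold that is too tight pages someone for every ordinary fluctuation.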

Below, we have presented our understanding of where Revefi fits in the Data Observability ecosystem. You will find a more detailed analysis of the Data Observability landscape here, and a benchmark of different tools here.



Data News

  • The Analytics Development Lifecycle (ADLC). Tristan Handy has been teasing the data ecosystem with this whitepaper for a few months now. Well, it is finally out! The 25-page whitepaper outlines a framework for mature analytics workflows, drawing parallels with software engineering practices. Handy argues that while data transformation practices have improved, other areas like notebooks and dashboards still lack proper testing and SLAs. The ADLC model covers planning, development, testing, deployment, operations, and analysis. It is a must-read if you are looking to perfect the analytics workflow in your organization.
  • Data Teams Survey 2024: Jesse Anderson released his annual survey on the data industry. It covers LLM adoption in data engineering, the relationship between data teams and business needs, and the differences between teams that create high value and those that create low value. The results are interesting, and the survey is also a useful benchmark for data professionals who want to compare their practices against the rest of the industry.
  • Just do it: For a long time, Benn Stancil was a strong advocate of the “metric layer”. In his latest piece, he recognizes the concept might miss the mark. In fact, companies that put all their focus on measurable metrics sometimes neglect less quantifiable but equally important aspects of their business. This is exactly what happened to Nike. Stancil suggests we might be putting data on too high a pedestal in business. Instead of spending too much time looking at the data, he argues for a simpler approach: just try things out. This is a great read for anyone feeling stuck in analysis paralysis.


Show and Tell

  • Feature of the Month: SQL Copilot Table Auto-Select in CastorDoc App. Our SQL Copilot just got smarter! Now, when you write a query, you don't need to know exactly which tables to use. The Copilot will suggest the most relevant tables based on your question, using insights from past queries. This means you can focus on what you want to know, not on remembering database structures. While you might sometimes need to tweak the suggestions, this update makes writing SQL much easier, especially if you're not a database expert. If you're interested in trying it out - get in touch with the team.

  • Article of the Month: Dataset Lineage and Field Lineage: How to Compute Them? In September, I published the second piece in our series dedicated to data lineage. In this article, I look at the two main types of data lineage: dataset lineage and field lineage. I focus in particular on the computation methods for these lineage types, including SQL parsing, API integration, and pattern recognition. If you're interested in understanding the technical aspects of how lineage is calculated and the distinct benefits of each type, you'll like this read.
  • Event of the Month: Big Data Paris 2024. The CastorDoc team will be attending Big Data Paris on October 15-16, 2024. You can find us at Booth F14. We're excited to announce that Olivier Détriché, Lead Analytics Engineer at Payfit, will present a session on Payfit's journey around building robust data documentation. This is a good opportunity to learn from a real-world implementation of data documentation strategies. If you're planning to attend the event and would like to connect with our team or learn more about Olivier's presentation, please get in touch here. We look forward to seeing you in Paris!


Data Meme

Until next time,

Louise from CastorDoc
