I hope you've had a great month of August and are enjoying the back-to-school feeling. I’m personally very excited about September: the whole team will attend Big Data London, the largest European data gathering of the year. Now, let’s get to the data news!
As always, you can expect to find:
- One tool that we believe is worth digging into as a data person.
- A curated list of the best articles we've read this month, along with brief teasers.
- Show and Tell: Some exciting updates from the CastorDoc team.
- A data meme, to brighten your day.
Things are always moving in the data observability space. Definity just raised $4.5M to transform data pipeline management. Their approach consists of monitoring data in motion, allowing real-time observation and optimization of data transformations. This is important as businesses increasingly rely on complex data applications for decision-making. Definity's solution targets Apache Spark-based systems and works both on-premises and with popular cloud services. Founded by ex-PayPal and FIS experts, the company is well-positioned to address data engineering challenges. As AI drives demand for efficient, high-quality data pipelines, Definity's timing could be spot-on. You will find a more detailed analysis of the data observability landscape here, and a benchmark of different tools here.
- The Five Laws of Data Enablement: How the father of library science would make his data team indispensable. Can a 1930s librarian teach us about modern data management? Apparently, yes. Amalia Child draws parallels between library science and data work. She adapts S. R. Ranganathan's "Five Laws of Library Science" for today's data professionals, offering a perspective on how to make data teams indispensable. From prioritizing data use over governance to treating your data function as a growing organism, Child's insights are both practical and actionable. Who would have thought that the solutions to our modern data problems could come from a book written nearly a century ago?
- Personal Data Classification. Airbnb's blog post offers a peek behind the curtain of how they manage personal data. Sam Kim and team break down their in-house data classification system, a foundation for security, privacy, and compliance. They dive into the three pillars of their approach: cataloging, detection, and reconciliation, each with its own set of challenges and solutions. What I find interesting is their "shift left" strategy, pushing data classification earlier in the development process. It's a delicate balance between automation and human oversight, aiming to protect user data without restricting access. This piece illustrates the complexities of managing personal data at scale, and sheds light on how tech giants balance user trust with data utilization - an issue most data teams grapple with.
- Analytics Personas. Tristan Handy's piece looks at the human side of data work. Drawing from his experience with dbt, Handy challenges the conventional wisdom about data team structures. He argues that the best data professionals aren't confined to rigid roles but can flexibly wear different "hats" - engineer, analyst, and decision-maker - as needed. It's a case against treating analytics like an assembly line and for empowering individuals to tackle problems end-to-end. This piece is an interesting perspective on how to unleash the full potential of your analytics talent. It's a must-read for anyone looking to foster agility in their data organization.
- Feature of the Month: Unified Conversation History for Chrome Extension and App. This update ensures you never lose track of your interactions with the AI assistant, regardless of platform. Seamlessly access your conversation history across Dashboard Q&A, SQL CoPilot, and AI Search, even after navigating away or refreshing. Pick up right where you left off, eliminating frustrating context loss. This feature provides a smooth, uninterrupted experience with the AI assistant. If you’re interested in trying it out - get in touch with the team.
- Article of the Month: Tracing Value: 6 Use Cases for Data Lineage. In August I published the first piece of a series dedicated to data lineage. The article covers six key applications: data governance, migration, metadata propagation, debugging, impact analysis, and reducing technical debt. I look especially at how lineage can improve data team efficiency and deliver business value. If you’re interested in the practical use cases of lineage technology, you will like this piece.
- Event of the Month: Big Data London 2024. The whole team will be attending Big Data London on September 18-19, 2024. You can find us at Booth Y558. We will also be hosting a session with Thibault Gadiolet, Chief Data Officer at HomeServe, on September 18 at 2:40 PM in the Analytics and Decision Intelligence Theatre. Click here to bookmark the session in your calendar. Thibaut will share HomeServe’s journey towards Self-Service Analytics. If you're attending the event and would like to connect with our team, please get in touch here.
Thank you for the shoutout and the addition to your market map as the leading in-motion data pipeline observability platform! CastorDoc