DATA Pill #032 - Kubernetes, Data Analytics & 3 Ways to Extract Data Lineage with Airflow
Hi,
It's fascinating how much great content was created last week, despite the upcoming Christmas holidays and, probably, many people already being on vacation!
Here are the best of them:
ARTICLES
3 Ways to Extract Data Lineage with Airflow | 7 min | Airflow | Howard Yoo | Astronomer Blog
Howard shows how Airflow can send data lineage to your data observability tool. You will explore three ways that Airflow can emit data lineage information to a data observability backend:
1) using the already supported operators,
2) developing your own custom extractor, and
3) using inlet and outlet arguments in an operator.
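To make the inlet/outlet idea concrete, here is a toy, pure-Python model of what the third approach boils down to: each task declares the datasets it reads (inlets) and writes (outlets), and a lineage backend stitches those declarations into a graph it can query. The `Dataset` and `Task` classes and the `lineage_edges`/`upstream` helpers are hypothetical and are not Airflow's or OpenLineage's API; in a real DAG you would pass lineage entities to an operator's `inlets` and `outlets` arguments, as the article describes.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Dataset:
    """Hypothetical stand-in for a lineage entity such as a table or file."""
    namespace: str
    name: str


@dataclass
class Task:
    """Hypothetical task that declares what it reads (inlets) and writes (outlets)."""
    task_id: str
    inlets: list = field(default_factory=list)
    outlets: list = field(default_factory=list)


def lineage_edges(tasks):
    """Turn per-task inlet/outlet declarations into (input, task, output) edges."""
    edges = []
    for task in tasks:
        for src in task.inlets:
            for dst in task.outlets:
                edges.append((src.name, task.task_id, dst.name))
    return edges


def upstream(edges, target):
    """Return the names of all datasets that transitively feed `target`."""
    found = set()
    frontier = [target]
    while frontier:
        current = frontier.pop()
        for src, _task, dst in edges:
            if dst == current and src not in found:
                found.add(src)
                frontier.append(src)
    return found


if __name__ == "__main__":
    raw = Dataset("postgres", "raw.orders")
    staged = Dataset("warehouse", "staging.orders")
    report = Dataset("warehouse", "marts.daily_revenue")
    tasks = [
        Task("clean_orders", inlets=[raw], outlets=[staged]),
        Task("build_report", inlets=[staged], outlets=[report]),
    ]
    edges = lineage_edges(tasks)
    print(edges)
    print(sorted(upstream(edges, "marts.daily_revenue")))
```

Asking `upstream(edges, "marts.daily_revenue")` returns both `raw.orders` and `staging.orders`, which is the kind of impact question a data observability tool can answer once lineage is being emitted.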
Drug Discovery with Deep Learning | 7 min | Deep Learning | Kian Kenyon-Dean, Jake Schmidt, John Urbanik, Ayla Khan, Jess Leung, Berton Earnshaw | MLOps Community Blog
Instead of building a separate model for each disease they want to find treatments for, the authors have built models that generalize across diseases. Read about the challenges involved and how they use techniques such as deep learning, transfer learning and domain adaptation to design target-agnostic models.
Databricks vs Snowflake – December 2022 Take | 7 min | Blueprint Blog
Snowflake and Databricks are both solid data platforms for BI and analytics. Which one suits your business best depends on your data strategy, usage patterns, data needs, data volumes and workloads. This article compares the two, covering their differences and their pros and cons.
Introducing the dbt_project_evaluator: Automatically evaluate your dbt project for alignment with best practices | Data Analytics | 7 min | Grace Goheen | dbt Blog
Read how the dbt team condensed their ideas about best practices into a single, actionable tool that automatically discovers where a project deviates from them. Analytics engineers can now see exactly where their projects diverge from these best practices and are empowered to fix them on their own.
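As a rough illustration of the kind of rule such a tool automates, the sketch below flags two common misalignments in a toy project graph: models that depend on both a raw source and another model (similar in spirit to a "direct join to source" check) and models without a description. The `Model` class and the check functions are hypothetical; they are not dbt_project_evaluator's actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Model:
    """Hypothetical, simplified view of one model in a dbt project."""
    name: str
    refs: list = field(default_factory=list)     # upstream models (via ref)
    sources: list = field(default_factory=list)  # upstream raw sources (via source)
    description: str = ""


def direct_join_to_source(models):
    """Flag models that read from a raw source and from other models at the same time."""
    return sorted(m.name for m in models if m.refs and m.sources)


def undocumented(models):
    """Flag models whose description is empty or only whitespace."""
    return sorted(m.name for m in models if not m.description.strip())


if __name__ == "__main__":
    project = [
        Model("stg_orders", sources=["shop.orders"], description="Staged orders"),
        Model("stg_customers", sources=["shop.customers"]),
        Model("fct_orders", refs=["stg_orders"], sources=["shop.customers"],
              description="Order facts"),
    ]
    print("Direct joins to source:", direct_join_to_source(project))
    print("Undocumented models:", undocumented(project))
```

In this toy project, `fct_orders` is flagged because it reads `shop.customers` directly instead of going through `stg_customers`, and `stg_customers` is flagged for having no description.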
Are Data Silos Distorting Your Product Analytics? | Data Analytics & BI | 5 min | Abhishek Rai | NetSpring Blog
This article explains why modern cloud data warehouses are better than data silos for product analytics and discusses four areas that become problematic when you rely on data silos: inconsistent data, missing data, the data model and governance.
TUTORIALS
Migrate Google BigQuery to Amazon Redshift using AWS Schema Conversion Tool (SCT) | 15 min | BigQuery | Jagadish Kumar, Anusha Challa, Amit Arora & Cedrick Hoodye | AWS Blog
Migrating a data warehouse can be a challenging, complex, yet rewarding project, and AWS SCT reduces much of that complexity. By following the walkthrough in this blog post, you will see how a data migration task extracts, downloads and then migrates data from BigQuery to Amazon Redshift. The solution performs a one-time migration of database objects and data, so changes made in BigQuery while the migration is in progress won’t be reflected in Amazon Redshift. During the migration, either put your ETL jobs that load BigQuery on hold, or replay those ETLs against Amazon Redshift once the migration is complete.
PODCAST
What’s Next for Machine Learning in Time Series | 38 min | ML | host: Ben Lorica; guest: Ira Cohen | The Data Exchange
Ira Cohen is the co-founder and Chief Data Scientist at Anodot, a startup that uses time series tools to monitor business data in real time. In this episode, he discusses the many challenges faced by teams who work with time series data.
CONFS, EVENTS AND MEETUPS
Big Data Technology Warsaw Summit | 29-30 March | Early Birds Price Registration | Online & Onsite Conference
3 keynote speakers at Big Data Technology Warsaw Summit 2023!
Machine Learning by Doing | HacDC | 16 December | Online
During this meeting, the host and participants will choose a small ML problem or exercise and write code to solve it. The meetup is aimed at people with some coding and maths experience; it doesn't require much ML expertise.
________________________
Have any interesting content to share in the DATA Pill newsletter?
Join us on GitHub
Regards,
Adam from GetInData | Part of Xebia