DATA Pill #032 - Kubernetes, Data Analytics & 3 Ways to Extract Data Lineage with Airflow

DATA Pill #032 - Kubernetes, Data Analytics & 3 Ways to Extract Data Lineage with Airflow

Hi,

How fascinating is it that so much great content was created past week despite the upcoming Christmas holidays and probably many vacations!

Here are the best of them:


ARTICLES

3 Ways to Extract Data Lineage with Airflow | 7 min | Airflow | Howard Yoo | Astronomer Blog

Howard explains how to use Airflow’s operators, custom extractors and inlet/outlet arguments to send lineage to your data observability tool. You will explore three ways that Airflow can emit data lineage information to a data observability backend:?

1) using the already supported operators,

2) developing your own custom extractor

3) using inlet and outlet arguments in an operator.


Drug Discovery with Deep Learning | 7 min | Deep Learning | Kian Kenyon-Dean, Jake Schmidt, John Urbanik, Ayla Khan, Jess Leung, Berton Earnshaw | MLOps Community Blog?

Instead of building specific models for each disease we want to find treatments for, the experts have built models that generalize across diseases. Read more about the challenges, using techniques that include deep learning, transfer learning and domain adaptation to design target-agnostic models.


Databricks vs Snowflake – December 2022 Take | 7 min | Blueprint Blog?

Snowflake and Databricks are both good data platforms for BI and analysis purposes. Selecting the best platform for your business depends on your data strategy, usage patterns, data needs and volumes and workloads. This article shows the differences and pros and cons between the two of them.


Introducing the dbt_project_evaluator: Automatically evaluate your dbt project for alignment with best practices | Data Analytics | 7 min | Grace Goheen | dbt Blog

Read how the dbt team compressed all of their ideas about best practices into a single, actionable tool to automate the process of discovering these misalignments. From now on, analytics engineers can immediately understand exactly where their projects deviated from their best practices and are empowered to improve their projects on their own.


Are Data Silos Distorting Your Product Analytics? | Data Analytics & BI | 5 min |? Abhishek Rai | NetSpring Blog

This text explains to you why modern cloud data warehouses are better than data silos and discusses four areas: inconsistent data, missing data, the data model and governance that are problematic when you are using data silos.

{ MORE LINKS }



TUTORIALS

?

Migrate Google BigQuery to Amazon Redshift using AWS Schema Conversion tool (SCT) | 15 min | BigQuery | Jagadish Kumar, Anusha Challa, Amit Arora & Cedrick Hoodye | AWS Blog

Migrating a data warehouse can be a challenging, complex and yet rewarding project. AWS SCT reduces the complexity of data warehouse migrations. Following the walkthrough shown in this blog post, you can understand how a data migration task extracts, downloads and then migrates data from BigQuery to Amazon Redshift. This solution performs a one-time migration of database objects and data. Data changes made in BigQuery when the migration is in progress won’t be reflected in Amazon Redshift. When data migration is in progress, put your ETL jobs to BigQuery on hold, or replay the ETLs by pointing to Amazon Redshift after the migration.

No alt text provided for this image

{ MORE LINKS }



PODCAST

What’s Next for Machine Learning in Time Series | 38 min | ML | host: Ben Lorica; guest: Ira Cohen | The Data Exchange

Ira Cohen is the co-founder and Chief Data Scientist at Anodot1, a startup that uses time series tools to monitor business data in real time. In this episode, he discusses the many existing challenges faced by teams who deal with time series.

?

CONFS, EVENTS AND MEETUPS

Big Data Technology Warsaw Summit | 29-30 March | Early Birds Price Registration | Online & Onsite Conference?

3 keynote speakers at Big Data Technology Warsaw Summit 2023!

  • Maciej Dzi?cielak / Veolia - The chief IT Architect who will bridge the gap between business and technology. Good at active listening and experienced with gathering business requirements which translates into IT solutions, enjoys and thrives in challenging and demanding environments.
  • Lukasz Hunka / PKO Bank Polski - Director of SME Risk & Analytics PKO Bank Polski. Currently navigating the migration of the risk analytical environment to Google Cloud and setting up MLOps culture among data scientists.?
  • Nikhil Simha / Airbnb - Nikhil leads the Machine Learning infrastructure team at Airbnb. He is currently working on Chronon, an end-to-end feature engineering platform.?


Machine Learning by Doing | HacDC | 16 December | Online?

During this meeting, the host and participants will choose a small ML problem or exercise and write code to solve it. This meetup is targeted towards people who have some coding and maths experience but doesn't require much ML expertise.



________________________

Have any interesting content to share in the DATA Pill newsletter?

? Join us on GitHub


regards,

Adam from GetInData | Part of Xebia

要查看或添加评论,请登录

社区洞察

其他会员也浏览了