DATA Pill #068 - Amazon S3, Athena & AWS Glue + Iceberg, ClickHouse + DuckDB = OLAP2

Hi,

This week, the internet once again flooded us with tutorials, plus a ton of hot news on top.

Llama, AWS, announcements made by Google and much more in this week’s DATA Pill.

Are you ready?


ARTICLES

Zero Configuration Service Mesh with On-Demand Cluster Discovery | 9 min | Cloud | David Vroom, James Mulcahy, Ling Yuan, Rob Gulewich | Netflix TechBlog

How Netflix worked with Kinvolk and the Envoy community on on-demand cluster discovery - a feature that streamlines service mesh adoption in complex microservice environments.


Less data, less problems: Airbyte’s column selection is finally here | 14 min | dbt | Jakub Szafran | GetInData | Part of Xebia Blog

Airbyte 0.50 introduces platform changes, including checkpointing, automatic schema propagation and the highly anticipated column selection. To address community demand, the GetInData team tested this feature, exploring issues such as column extraction and CDC incremental ingestion handling. Find detailed insights in this blog post.



TUTORIALS

ClickHouse + DuckDB = OLAP2 | 4 min | BigData | Lorenzo Mangani | qryn dev

Explore how ClickHouse and DuckDB can work together in the OLAP ecosystem through the tool Quackpipe. This tutorial demonstrates how Quackpipe enables data exchange between the two platforms, offering installation guidance and example use cases that highlight their combined power for data analytics and manipulation.


AWS users: Amazon S3, Athena & AWS Glue + Iceberg | 15 min | Data Engineering | Anna Geller | AWS in Plain English

This tutorial walks you through getting started with Apache Iceberg on AWS. After reading, you will be able to create Iceberg tables, manipulate data stored in S3 in Parquet format, run SQL queries on both the data and the table metadata, and manage data ingestion efficiently.


In MORE LINKS you will find: using MLflow AI Gateway and Llama 2 to build generative AI apps, and high-performance computing on AWS.

{ MORE LINKS }



NEWS

OpenTF Announces Fork of Terraform | 5 min | Cloud | OpenTF Blog

HashiCorp changed the license for their core products, including Terraform, to BSL. In response, the community crafted the OpenTF manifesto, garnering support from 100+ companies, 10 projects and 400 individuals to create OpenTF.


Introducing Code Llama, a state-of-the-art large language model for coding | 6 min | LLM | Meta AI Engineering

Let’s explore the capabilities and implications of Code Llama, a Large Language Model designed to revolutionize coding practices. Code Llama is an LLM capable of generating code, and natural language about code, from both code and natural language prompts. In benchmark testing, Code Llama outperformed state-of-the-art publicly available LLMs on code tasks. Let's find out more.



In MORE LINKS you will find: supercharging Vertex AI with Colab Enterprise, and MLOps for generative AI.

{ MORE LINKS }



DATA TUBE

Achieving success with automation in enterprise architecture in big size digital transformation | 58 min | Data Architecture | Garima Singh | Iasa Official

The talk focuses on Garima’s experience and journey in executing company-wide digital transformation in large, decentralized and globally distributed enterprises, with the help of automated enterprise architecture.



PODCAST

How Azure Embraces Terraform For Infrastructure As Code | 46 min | Cloud | Hosts: Ned Bellavance, Ethan Banks; Guests: Mark Gray, Steven Ma | Day Two Cloud Podcast

Delve into the world of Infrastructure as Code (IaC) with Microsoft's Mark Gray and Steven Ma. Discover how Microsoft is embracing Terraform to enhance its Azure offerings, including the Terraform Export Tool, the AzAPI Provider and the thriving Terraform on Azure community. Explore the collaboration between Microsoft and HashiCorp, learn about the tools' capabilities and gain insights into the future of Terraform on Azure.

In MORE LINKS you will find an episode about navigating event streaming.

{ MORE LINKS }



CONFS, EVENTS AND MEETUPS

Build a Modern Data Stack with dbt and Databricks | Online | 26th September 2023

In this live hands-on workshop, you’ll follow a step-by-step guide to achieving production-grade data transformation using dbt Cloud with Databricks. You’ll build a scalable transformation pipeline for analytics, BI and ML – entirely from scratch.

You’ll learn how to:

  • Quickly connect dbt Cloud and Databricks in Databricks Partner Connect
  • Model data with dbt Cloud using data in Delta Lake, following software engineering best practices like version control, testing and documentation
  • Build highly scalable and reliable data transformation pipelines for analytics, BI and ML

________________________

Have any interesting content to share in the DATA Pill newsletter?

Join us on GitHub

Dig into previous editions of DATA Pill


Adam from GetInData | Part of Xebia

Lorenzo Mangani

Entrepreneur | CEO @qxip @gigapipe | Telecom Observability

1 year ago

Thanks for the mention Adam Kawa! I'm glad you found our research on ClickHouse + DuckDB = OLAP2 interesting!

Garima Singh

Global VP & Chief Architect IKEA INGKA GROUP | ISO standards co-author | "Visionary of the year" Sweden national awardee | Nordic data professional of year 2023 awardee |10+ years in automotive |International Speaker

1 year ago

Thank you Adam Kawa for featuring my article on enterprise architecture in your blogs :) I am truly honoured.
