DATA Pill #002 - 5 Machine Learning problems and 5 trends in data/AI

Hi everyone,

Time for the second dose of DATA Pill.

Be prepared for a large portion of valuable content ;)

Without further ado, let's begin:

ARTICLES

1. MLOps: 5 Machine Learning problems resulting in ineffective use of data | 10 min read | ML & MLOps | Jakub Jurczak | GetInData

5 Machine Learning areas at risk of inefficiency that could be solved by MLOps:

  • Data silos trap - Mismatching IDs between warehouses can make joining data from different sources difficult and sometimes even impossible.
  • Time goes by. So does data. - You cannot always know whether the data being processed is fresh or stale, so you need some TTL (time to live) information that says how long data remains valid.
  • Skewing data - If the value of a feature changes significantly over time, model performance could suffer.
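
To make the TTL point concrete, here is a minimal sketch in Python. It is not taken from the article: the record layout (a dict with an `event_time` field) and the helper names `is_fresh` and `split_by_ttl` are assumptions made purely for illustration.

```python
from datetime import datetime, timedelta, timezone


def is_fresh(event_time, ttl, now=None):
    """Return True if the record is no older than its TTL."""
    now = now or datetime.now(timezone.utc)
    return now - event_time <= ttl


def split_by_ttl(records, ttl, now=None):
    """Split records into (fresh, stale) lists.

    Each record is assumed to be a dict with a timezone-aware
    'event_time' datetime (hypothetical layout).
    """
    now = now or datetime.now(timezone.utc)
    fresh, stale = [], []
    for rec in records:
        target = fresh if is_fresh(rec["event_time"], ttl, now) else stale
        target.append(rec)
    return fresh, stale


# Usage: with a one-day TTL, a record from three days ago counts as stale.
now = datetime(2022, 5, 20, 12, 0, tzinfo=timezone.utc)
records = [
    {"id": 1, "event_time": now - timedelta(hours=1)},
    {"id": 2, "event_time": now - timedelta(days=3)},
]
fresh, stale = split_by_ttl(records, timedelta(days=1), now)
```

Passing `now` explicitly keeps the check reproducible, which helps when the same rule has to be applied both at training time and at serving time.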

2. Scaling data access by moving an exabyte of data to Google Cloud | 7 min | GCP, BigQuery | Wini Tran, Di Zhao | Twitter Blog

Technical dive into how Twitter approached migration to BigQuery, conclusions and results:

  • Decrease the development time required for new dataset ingestion down from two weeks to one hour.
  • Reduce the maintenance required for data engineers by leveraging managed services, including Airflow.

3. Orchestrate big data jobs on on-premises clusters | 5 min | AWS | AWS Blog

Step Functions enables thousands of workflows to run in parallel. Additionally, Lambda provides flexibility in implementing arbitrary interfaces to the on-premises infrastructure and its compute resources. With additional steps in the orchestration, the solution also lets operations teams monitor thousands of parallel jobs in a visual interface for easier debugging.

4. My Journey to Analytics Engineering: How I Got Started and You Can, Too | 10 min | dbt | Emily Hawkins - Data Engineering Manager

The Drizly data stack, and the conviction that Analytics Engineering is an empowering, fun and lucrative career.

MORE LINKS


____________________

NEWS

1. Announcing General Availability of Databricks Feature Store | 6 min | ML & MLOps | Databricks Blog

The first feature store co-designed with a data and MLOps platform is now generally available (GA).

2. Google Cloud launches AlloyDB, a new fully managed PostgreSQL database service | 5 min | TechCrunch Blog

Google announced the launch of AlloyDB, a new fully managed PostgreSQL-compatible database service. The company claims it is twice as fast for transactional workloads as AWS’s comparable Aurora PostgreSQL, four times faster than standard PostgreSQL for the same workloads, and up to 100 times faster for analytical queries.

MORE LINKS


____________________

PODCAST

1. 5 current trends in the data and AI landscape (H1 2022) | 22 min | Radio DaTa

  • Retail becomes a very hot sector for AI/ML (plus new data sources, Metaverse, MLOps, Responsible AI)
  • Modern Data Platforms (plus SQL, hiring, open-source, data engineering pipelines)
  • Public Cloud (plus cloud-native, platform unification, data residency)
  • Data quality and data auditing
  • Data access (data cataloging, data discovery, and data mesh).

All explained, with ideas on how to follow these trends.

2. Machine Learning for Optimization | 26 min | The Data Exchange

How machine learning can be used to learn constraints in optimization problems. Use cases and trends in the use of machine learning for optimization problems.

MORE LINKS


____________________

DATAtube

The future of Cloud databases | 28 min | Google Cloud Tech

75% of all databases are expected to be in the cloud this year. How is AlloyDB going to meet this trend?

MORE LINKS


__________

I hope you find it valuable. If so, I encourage you to subscribe to this newsletter.

See you tomorrow!


Adam Kawa from GetInData
