The Elmer Project, New Shiny Release for Python, Mastering NLP from Foundations to LLMs

This week's agenda:

  • Open Source of the Week - The Elmer project, a new Shiny release for Python, and the Narwhals library
  • Learning resources - The GitHub Universe and PyData Amsterdam conferences
  • Book of the week - Mastering NLP from Foundations to LLMs by Lior Gazit and Meysam Ghaffari, Ph.D.

I am also on Blue Sky


Open Source of the Week

Here are a couple of interesting projects I came across this week.

The Elmer Project

Elmer is a new R library that provides a user-friendly wrapper over core LLM frameworks. It supports LLM frameworks such as OpenAI ChatGPT, Anthropic Claude, Snowflake Cortex, Google Gemini, etc. This project is part of the Tidyverse framework, and it is currently at an early stage. It supports core LLM functionality such as streaming and async APIs and text summarization.

This project has one of the cutest hexagons! Image credit: The Elmer project


This project is a collaboration between Hadley Wickham and Joe Cheng. You can find more details about the project on Hadley's post and in the project documentation:

Shiny 1.2 for Python

Another announcement this week from Posit PBC was the release of Shiny version 1.2 for Python. The main feature of this release is the integration of the Python narwhals library, which provides a unifying layer for working with different data frame objects (more details below). The following short video provides a more detailed explanation.


More details are available in the library release notes.

The Narwhals Library

Thanks to the above Shiny release, I learned about the narwhals Python library. This library provides a lightweight compatibility layer between different Python dataframe libraries such as Pandas, Polars, Arrow, Dask, etc.

The Narwhals Library; Image credit: the project repository

You can find more details in the project documentation:


New Learning Resources

Here are some new learning resources that I came across this week.

GitHub Universe 2024

For those who missed GitHub Universe 2024, the annual GitHub developer conference, all the talks are now available online. This year, the conference was heavily focused on LLM and AI applications. Thanks to the GitHub team for making the talks available online.

PyData Amsterdam 2024

All the talks from the recent PyData Amsterdam 2024 conference are now available to watch online. This includes great talks about machine learning, data engineering, data visualization, AI, etc. Thanks to the PyData Amsterdam team for making the talks available online.

One talk I have watched so far, and highly recommend if you work on time series forecasting and Bayesian statistics, is Dr. Juan Camilo Orduz's talk, Time Series Forecasting with NumPyro:


Time Series Forecasting with NumPyro; Image credit: PyData Amsterdam 2024

You can learn more about the NumPyro Python library in edition 8 of this newsletter.


Book of the Week

This week's spotlight is Mastering NLP from Foundations to LLMs by Lior Gazit and Meysam Ghaffari, Ph.D. The book focuses, as the name implies, on the foundations of NLP and LLM modeling, and it covers the following topics:

  • Mathematical foundations of machine learning and NLP
  • Data preprocessing techniques for text data
  • Machine learning applications for NLP and text classification
  • Deep learning methods for NLP and text applications
  • Theory and design of Large Language Models
  • Applications of LLM models
  • LLM applications with Langchain


The book is for people who are interested in starting with NLP and those who wish to explore LLM applications. The book is available to purchase on the publisher's website and Amazon:


Have any questions? Please comment below!

See you next Tuesday!

Thanks,

Rami

Join my Data Science Channel for daily updates






