An Introduction to Statistical Learning, New Python Resources, Fine Tuning LLMs for Rag

I hope everyone had a great holiday break!

This week's agenda:

  • Open Source of the Week - the ebook2audiobook and Latexify projects
  • New learning resources - a guide for setting up a new machine, 10 Python concepts, introduction to tables in R with reactable, Airflow for Beginners, Ollama + Postgres, Fine-Tuning LLMs for RAG, Introduction to Latexify
  • Book of the week - An Introduction to Statistical Learning

Please share this if you find it useful!


In case you missed it, the New Year edition featured a curated list of free data science courses.

For daily updates, please subscribe to my Data Science Channel on Telegram or WhatsApp.


Open Source of the Week

This week's highlights are the ebook2audiobook and Latexify projects.

ebook2audiobook

ebook2audiobook is a new open-source project that converts ebooks into audiobooks.

According to the project documentation, it supports 1124 languages!

Project key features:

  • Web GUI
  • Docker image
  • Works with CPUs and GPUs

License: Apache 2.0

Latexify

Latexify is a Python library from Google that converts Python functions containing mathematical expressions into LaTeX expressions.

This is super practical if you create academic papers, tutorials, books, etc.

License: Apache 2.0

Examples of converting Python functions into LaTeX output; Image credit: Project documentation

The library has a notebook with a variety of examples. Also, see a video tutorial in the new learning resources section below.
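As a rough illustration of the conversion described above, here is a minimal sketch. It assumes the package is installed with `pip install latexify-py`. The function `solve` is a made-up example and is not taken from the project notebook. The guarded import is only there so the snippet still runs when the package is missing.

```python
import math

try:
    import latexify  # third-party; assumed installed via `pip install latexify-py`
except ImportError:  # lets the snippet run even without the package
    latexify = None


# Hypothetical example function: one root of a*x**2 + b*x + c = 0.
def solve(a, b, c):
    return (-b + math.sqrt(b**2 - 4 * a * c)) / (2 * a)


# The function still behaves like ordinary Python.
print(solve(1, -3, 2))

if latexify is not None:
    # get_latex reads the function's source code and returns a LaTeX string.
    print(latexify.get_latex(solve))
```

In a notebook, decorating the function with `@latexify.function` instead should render the formula directly in the cell output.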


New Learning Resources

Here are some new learning resources that I came across this week.

Awesome-DS-Settings

A few years ago, I documented the process of setting up a new machine with core data science tools. Over time, those notes grew into a full tutorial. During the holiday break, I found the time to refresh and update it, and it now covers the following topics:

  • Setting up git and ssh to GitHub
  • Installing and setting up CLI tools
  • Installing Docker
  • Setting up Postgres
  • Setting up VS Code
  • Installing tools for Python
  • Installing R and Positron
  • General tools

10 Python Concepts

This short tutorial from Tech With Tim covers core Python concepts such as f-strings, *args, **kwargs, etc.
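For readers who have not met these concepts yet, the following minimal sketch combines three of them in one function. The function name `describe` and its output format are made up for illustration and do not come from the tutorial.

```python
def describe(*args, **kwargs):
    """Format any positional and keyword arguments into a single string."""
    # *args collects extra positional arguments into a tuple.
    positional = ", ".join(str(arg) for arg in args)
    # **kwargs collects extra keyword arguments into a dict.
    keyword = ", ".join(f"{key}={value!r}" for key, value in kwargs.items())
    # An f-string embeds expressions directly inside the string literal.
    return f"args=({positional}) kwargs=({keyword})"


print(describe(1, 2, lang="Python"))
```

The `!r` conversion inside the f-string applies `repr()` to each value, which is why string values appear quoted in the result.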

Interactive Tables With R & Reactable

reactable is one of my favorite R libraries for making interactive tables. The following tutorial by Dr. Albert Rapp provides a deep dive into the library's functionality.

Airflow for Beginners

A short tutorial by Sunjana Ramana on building an ETL data pipeline with Apache Airflow.

Ollama + Postgres

A short tutorial on using Ollama embeddings with PostgreSQL.

Fine-Tuning LLMs for RAG

The following short tutorial by Louis-François Bouchard provides an introduction to fine-tuning LLMs for RAG applications.

Introduction to Latexify

The following short tutorial by NeuralNine provides an introduction to the latexify Python library. This library by Google converts Python functions into LaTeX math formulas.

Book of the Week

This week's focus is on one of my favorite books - An Introduction to Statistical Learning (ISLR), by Profs. James, Witten, Hastie, and Tibshirani. ISLR, in my opinion, is one of the best introductory books for data science and machine learning. The book focuses on the foundations of data science, and it covers the following topics:

  • Regression and classification
  • Linear model selection and regularization
  • Non-linear regression
  • Tree-based methods
  • Support vector machines
  • Deep learning
  • Unsupervised learning

The ISLR R and Python versions; Image credit: Book website

The book has two versions: one with R examples and a second with Python examples, which has an additional author, Prof. Taylor.

Both the R and Python versions are open and available for free online:

Printed versions are available for purchase on Amazon (R and Python).


Have any questions? Please comment below!

See you next Tuesday!

Thanks,

Rami



Patrick Georges

Associate Prof, University of Ottawa

2 months ago

Rami Krispin, awesome choice for the book of the week and an awesome newsletter, a true public good. Thank you so much for your efforts in assembling all this info!

Dr. Albert Rapp

Building data solutions at work, sharing R knowledge after hours.

2 months ago

Thank you for sharing my reactable tutorial, Rami Krispin

Muhammad Ahmad

AI | Machine Learning | Computer Vision Research | NLP | Language Modelling | GANs

2 months ago

Rami Krispin, I'm particularly excited about the ebook2audiobook project; its support for 1000+ languages is impressive. The inclusion of "Introduction to Statistical Learning" is crucial: it's been my go-to reference for teaching ML fundamentals to my teams. Quick suggestion: for those exploring the Ollama + Postgres integration, I'd recommend adding vector indexing for better RAG performance. It's made a significant difference in our production deployments.
