An Introduction to Statistical Learning, New Python Resources, Fine-Tuning LLMs for RAG
Rami Krispin
Senior Manager - Data Science and Engineering at Apple | Docker Captain | LinkedIn Learning Instructor
I hope everyone had a great holiday break!
This week's agenda:
Please share this if you find it useful!
In case you missed it, here is the New Year edition with a curated list of free data science courses:
Open Source of the Week
This week's highlights are the ebook2audiobook and Latexify projects.
ebook2audiobook
ebook2audiobook is a new open-source project that converts ebooks into audiobooks.
According to the project documentation, it supports 1124 languages!
Project key features:
License: Apache 2.0
Latexify
Latexify is a Python library from Google that converts mathematical functions written in Python into LaTeX expressions.
This is super practical if you create academic papers, tutorials, books, etc.
License: Apache 2.0
The library has a notebook with a variety of examples. Also, see a video tutorial in the new learning resources section below.
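To make the idea of turning a Python function into LaTeX more concrete, here is a toy sketch that walks the function's syntax tree with the standard `ast` module. It is not Latexify's code or API (in Latexify itself, the documented entry point is the `@latexify.function` decorator). The names `expr_to_latex` and `function_to_latex` are hypothetical, and the sketch only handles a single `return` statement built from names, numbers, and the operators +, -, *, / and **.

```python
import ast


def _wrap_sum(node: ast.expr) -> str:
    """Render a node, adding parentheses if it is a sum or a difference."""
    text = expr_to_latex(node)
    if isinstance(node, ast.BinOp) and isinstance(node.op, (ast.Add, ast.Sub)):
        return rf"\left({text}\right)"
    return text


def expr_to_latex(node: ast.expr) -> str:
    """Render a small subset of Python expressions as a LaTeX string."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return str(node.value)
    if isinstance(node, ast.BinOp):
        if isinstance(node.op, ast.Div):
            # The fraction bar groups both operands, so no parentheses needed.
            return rf"\frac{{{expr_to_latex(node.left)}}}{{{expr_to_latex(node.right)}}}"
        if isinstance(node.op, ast.Pow):
            base = expr_to_latex(node.left)
            if isinstance(node.left, ast.BinOp):
                base = rf"\left({base}\right)"
            return f"{base}^{{{expr_to_latex(node.right)}}}"
        if isinstance(node.op, ast.Mult):
            return rf"{_wrap_sum(node.left)} \cdot {_wrap_sum(node.right)}"
        if isinstance(node.op, ast.Add):
            return f"{expr_to_latex(node.left)} + {expr_to_latex(node.right)}"
        if isinstance(node.op, ast.Sub):
            # a - (b + c) must keep its parentheses on the right-hand side.
            return f"{expr_to_latex(node.left)} - {_wrap_sum(node.right)}"
    raise ValueError(f"unsupported syntax: {type(node).__name__}")


def function_to_latex(source: str) -> str:
    """Convert the source of a one-line `def ...: return <expr>` to LaTeX."""
    func = ast.parse(source).body[0]
    if not isinstance(func, ast.FunctionDef) or len(func.body) != 1:
        raise ValueError("expected a function with a single statement")
    stmt = func.body[0]
    if not isinstance(stmt, ast.Return) or stmt.value is None:
        raise ValueError("expected a single `return <expr>` statement")
    args = ", ".join(arg.arg for arg in func.args.args)
    return rf"\mathrm{{{func.name}}}({args}) = {expr_to_latex(stmt.value)}"


print(function_to_latex("def mean(x, y):\n    return (x + y) / 2"))
```

Running the sketch prints `\mathrm{mean}(x, y) = \frac{x + y}{2}`, which can be pasted into a LaTeX math environment. A real library has to cover far more syntax (function calls, conditionals, unary operators, and so on), which is where Latexify comes in.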
New Learning Resources
Here are some new learning resources that I came across this week.
Awesome-DS-Settings
A few years ago, I documented the process of setting up a new machine with core data science tools. Over time, those notes grew into a full tutorial. During the holiday break, I found the time to refresh and update it, and it now covers the following topics:
10 Python Concepts
This short tutorial from Tech With Tim covers core Python concepts such as f-strings, *args, and **kwargs.
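As a quick illustration of three of those concepts (this is not taken from the tutorial itself), here is a minimal sketch that combines `*args`, `**kwargs`, and f-strings. The function name `describe` and the sample values are made up for this example.

```python
def describe(*args, **kwargs):
    """Summarize whatever positional and keyword arguments were passed in."""
    # *args collects extra positional arguments into a tuple.
    positional = ", ".join(str(value) for value in args)
    # **kwargs collects extra keyword arguments into a dict.
    keyword = ", ".join(f"{key}={value!r}" for key, value in kwargs.items())
    # An f-string interpolates expressions directly inside the string literal.
    return f"{len(args)} positional ({positional}); {len(kwargs)} keyword ({keyword})"


summary = describe(1, 2, topic="python", level="beginner")
print(summary)
```

The `!r` conversion inside the f-string applies `repr()` to the value, which is why string values show up with quotes in the summary.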
Interactive Tables With R & Reactable
reactable is one of my favorite R libraries for making interactive tables. The following tutorial by Dr. Albert Rapp provides a deep dive into the library's functionality.
Airflow for Beginners
A short tutorial by Sunjana Ramana for building an ETL data pipeline using Apache Airflow:
Ollama + Postgres
A short tutorial for using Ollama embeddings with PostgreSQL
Fine-Tuning LLMs for RAG
The following short tutorial by Louis-François Bouchard provides an introduction to fine-tuning LLMs for RAG applications:
Introduction to Latexify
The following short tutorial by NeuralNine provides an introduction to the latexify Python library. This Google library converts Python functions into LaTeX math formulas.
Book of the Week
This week's focus is on one of my favorite books: An Introduction to Statistical Learning (ISLR), by Profs. James, Witten, Hastie, and Tibshirani. ISLR is, in my opinion, one of the best introductory books for data science and machine learning. The book focuses on the foundations of data science, and it covers the following topics:
The book has two versions: one with R examples and a second with Python examples, which adds Prof. Taylor as a co-author.
Both the R and Python versions are open and available for free online:
Have any questions? Please comment below!
See you next Tuesday!
Thanks,
Rami
Associate Prof, University of Ottawa
2 months ago: Rami Krispin, awesome choice for the book of the week and awesome newsletter, a true public good. Thank you so much for your efforts in assembling all this info!
Building data solutions at work, sharing R knowledge after hours.
2 months ago: Thank you for sharing my reactable tutorial, Rami Krispin!
AI | Machine Learning | Computer Vision Research | NLP | Language Modelling | GANs
2 months ago: Rami Krispin, I'm particularly excited about the ebook2audiobook project. Its support for 1000+ languages is impressive. The inclusion of "Introduction to Statistical Learning" is crucial: it's been my go-to reference for teaching ML fundamentals to my teams. Quick suggestion: for those exploring Ollama + Postgres integration, I'd recommend adding vector indexing for better RAG performance. It's made a significant difference in our production deployments.