GenAI n00b, Part 1

This article is Part 1 of the multi-part series “GenAI n00b”.

Generative AI is the ‘07 smartphone revolution combined with the late-’90s dot-com boom. It is a real page-turner for AI and technology at large, with everybody rushing in to do something with it, around it, and about it. What happens later is unclear, but it is truly an exciting new chapter.

A few weeks ago, I decided to roll up my sleeves and really learn, in a more practical way, what we are (industry-wide) going to be working with for the next few years. I spent about 20-30 hours combined, between some nights and weekends. My recommendation: if you are a technologist, take your time and learn it too. You might need more time (I know Linux and Python, and have decent DS/ML knowledge), but it’ll be well worth it.

This Part 1 is a combination of a brief summary and my takeaways from what I’ve done in the last few weeks. Later parts will expand on the actual details.

  • In summary, everything I did in the last few weeks was in Jupyter Notebooks, both locally and, later, in hosted notebooks. I started on VirtualBox (CPU only), which was a waste of time, then used my son’s old gaming laptop, which I rebuilt with Ubuntu 22 LTS. When that setup outlived itself, I moved to a hosted one. I had a few use cases in mind, which I implemented. I wanted to make sure to learn and use the available open-source LLMs and frameworks, which I did (a lot more to explore, obviously). What I figured out is:
  • Hugging Face is your friend! Basically, it’s the GitHub of all things “DS/ML models” (and more). Where they truly shine, though, imho, is their SDK; it takes usability to an entirely new level. I used it for everything I did in the last few weeks, including non-LLM model use. If it didn’t exist (or something similar), I’d probably have spent weeks just getting models from all over. My recommendation: spend some time on their site, read about things there, google what you don’t understand, and learn more (see the first sketch after this list).
  • Don’t plan on doing anything local (on your PC) without a decent GPU. CPU-only will work, if you have enough memory, but you’ll regret it very quickly; everything is just going to be unusably slow.
  • You can figure out how things work with your GTX 1050 Ti’s 4GB of memory, but doing anything more real will require something beefier. I found Google’s Colab to be perfect, with multiple GPU options (the Nvidia A100 is a beast!), plus CPU, memory, and disk options, and very reasonably priced for this kind of tinkering. My recommendation: use a local setup to learn the basics, how certain things work, and what things really mean and do, before moving to a hosted environment; you don't need to spend a lot of $ to learn.
  • Be careful unleashing LangChain on OpenAI: it can silently make hundreds of calls to OpenAI in one shot if you're not careful (depending on what your setup does). One run cost me $3 (at $0.01 per 1k tokens, that's a lot of tokens). This is not a problem with your local/hosted setup, but with the Colab hosted setup my recommendation is not to use the A100 much; it's too expensive. The V100 is ideal, and even the T4 is OK for some things. (See the cost-tracking sketch after this list.)
  • Most of the models are huge! Think 4GB+ for anything usable. This matters because you both need to download them and they need to fit into the GPU’s memory (sort of). My suggestion: if there is a GGUF-type model, use it. I’ll go into it in later parts, but in short, it’s a greatly reduced size, which in turn requires less memory, and it works just as well (I used the Q4_K_M type of GGUF). I used Hugging Face GGUF models created by TheBloke (funded by Andreessen Horowitz, it says); the models on his Hugging Face pages come with a lot of explanations. (See the GGUF sketch after this list.)
  • I had separate terminals monitoring GPU, CPU, memory, etc., which ended up being a very good idea, because things crash, mostly when the GPU runs out of memory, so knowing what’s going on at all times helps you not waste time debugging all over the place. Colab shows this too, which helps just as much (you can run out of 48GB of GPU memory too). (See the monitoring sketch after this list.)
  • LangChain is OK for OpenAI, but somewhat frustrating for everything else (and be careful with it making a ton of calls, as noted above). It also has a minimal amount of instrumentation and a short-sighted overall design for complex chains. My current thinking: don’t use it unless you are just learning; you will likely not want to make it part of your real project. But, and it’s a BIG but, LangChain is where I found most of the information on LLMs and the surrounding technology that I needed to understand. I read their documentation, which gave me leads that I researched further, and their code has more specific details on the “how”. I would likely have needed 4x more time to get through everything if they didn’t exist. I highly recommend at least reading through their documentation; they’re basically the current universe of related technology.
  • I tried LangFlow. It’s very interesting and might be useful if you stick with LangChain. It has one small issue: it lags behind LangChain’s functionality, some of which is very important new functionality, but I suppose it’ll keep catching up. I ended up not using it even while playing around with LangChain, as some of the things I was doing with Llama gave me issues via LangFlow. LangFlow also gave me an idea for a product that needs to exist, which I might eventually build somewhere too.
  • The ReAct (Reasoning and Acting) paradigm opened my eyes to how this technology can be made a lot more useful. Google it, read up on it. I think this is going to be the evolution of LLM use; imho, the path to AGI. (See the ReAct sketch after this list.)
  • I ended up using Llama 2 13B GGUF Q4_K_M most of the time, after comparing many (time suck!) other LLM types and sizes. Anything 13B seems to yield good results. 7B is always so-so; I am sure it can be used in specific cases, but I wouldn’t waste time on it, honestly. I didn’t try Llama 2 70B. I’m sure it’s phenomenal, but it’s way too big and would be expensive and/or slow. I’ll try it later; I am pretty sure I will need it to do ReAct properly, as my few primitive ReAct tries did not work very well (compared to ReAct with OpenAI’s GPT-4, which worked well).
  • One of the use cases I implemented was interrogating a custom structured data set with an LLM. The company where I work implemented this for a customer hackathon, but I wasn’t convinced by that solution; it felt too limiting. I tried another approach, and I can say with full confidence that, going forward, this is how one should interrogate structured content with an LLM (for structured data specifically; for unstructured data, a vector database is the way to go). Two words: "Pandas DataFrame". (See the DataFrame sketch after this list.)
  • Another use case I implemented ended up pleasantly surprising me with the quality of models that are not LLMs. This use case was listening to audio prompts, converting them to text, having the LLM respond, then converting the response back to audio. (See the audio sketch after this list.)
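
Since I keep pointing at the Hugging Face SDK, here is the flavor of it. This is a minimal sketch using their transformers pipeline; the model name is just an example (not one I used for the use cases above), and the first call downloads and caches the weights.

```python
# Minimal Hugging Face "transformers" usage; the model is an example.
from transformers import pipeline

# pipeline() pulls the model from the Hugging Face hub on first use, then
# wraps tokenization, inference, and decoding in a single call.
generator = pipeline("text-generation", model="gpt2")

result = generator("Generative AI is", max_new_tokens=30)
print(result[0]["generated_text"])
```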
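
On the LangChain/OpenAI cost point: LangChain ships a callback that counts requests, tokens, and dollars, and it was the easiest seatbelt I found. A minimal sketch, assuming the 2023-era langchain API and an OPENAI_API_KEY in your environment:

```python
# Track how many OpenAI calls (and how much $) a run actually burns.
from langchain.callbacks import get_openai_callback
from langchain.llms import OpenAI

llm = OpenAI(temperature=0)  # picks up OPENAI_API_KEY from the environment

with get_openai_callback() as cb:
    llm("Explain GGUF quantization in one sentence.")
    # Run your chains/agents inside this block, then check the damage:
    print(f"requests: {cb.successful_requests}, "
          f"tokens: {cb.total_tokens}, cost: ${cb.total_cost:.4f}")
```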
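
For the GGUF bullet, here is a minimal sketch of loading one of TheBloke’s Q4_K_M builds with llama-cpp-python. The file name follows his naming convention but is an assumption; download the .gguf file from Hugging Face first.

```python
# Run a quantized Llama 2 13B locally (pip install llama-cpp-python).
from llama_cpp import Llama

llm = Llama(
    model_path="llama-2-13b-chat.Q4_K_M.gguf",  # example file name
    n_ctx=4096,       # context window
    n_gpu_layers=-1,  # offload all layers to the GPU if you have one
)

out = llm("Q: What is GGUF? A:", max_tokens=128, stop=["Q:"])
print(out["choices"][0]["text"])
```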
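
For monitoring: besides keeping `watch -n1 nvidia-smi` running in a terminal, you can ask from inside the notebook. A small sketch, assuming a CUDA build of PyTorch:

```python
# Quick GPU memory check from inside a notebook.
import torch

def gpu_report() -> str:
    if not torch.cuda.is_available():
        return "no CUDA GPU visible"
    free, total = torch.cuda.mem_get_info()  # both in bytes
    return f"GPU memory used: {(total - free) / 1e9:.1f} / {total / 1e9:.1f} GB"

print(gpu_report())  # call this between model loads to see what's left
```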
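
For the ReAct bullet, a toy sketch of the loop, not any library’s implementation: the model alternates Thought/Action lines, you execute the action, feed the Observation back in, and repeat until it finishes. `ask_llm` is a hypothetical stand-in for whatever model you are calling.

```python
# Toy ReAct loop with a single calculator tool.
import re

def calculator(expression: str) -> str:
    # Toy tool; never eval untrusted input in real code.
    return str(eval(expression, {"__builtins__": {}}))

TOOLS = {"calculator": calculator}

def react(question: str, ask_llm, max_steps: int = 5) -> str:
    prompt = (
        "Answer the question using this format:\n"
        "Thought: <reasoning>\nAction: calculator[<expression>]\n"
        "Observation: <result>\n... repeat ...\nFinish: <final answer>\n\n"
        f"Question: {question}\n"
    )
    for _ in range(max_steps):
        reply = ask_llm(prompt)  # model emits Thought + Action (or Finish)
        if "Finish:" in reply:
            return reply.split("Finish:", 1)[1].strip()
        match = re.search(r"Action:\s*(\w+)\[(.*?)\]", reply)
        if not match:
            return reply.strip()
        tool, arg = match.groups()
        observation = TOOLS[tool](arg)  # act, then show the model the result
        prompt += f"{reply}\nObservation: {observation}\n"
    return "gave up"
```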
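
For the "Pandas DataFrame" bullet, the gist of the approach as a minimal sketch (not my exact code): show the LLM the schema, have it write a pandas expression, then run that expression against the real data. `ask_llm` is the same hypothetical stand-in as above.

```python
# Interrogate structured data by letting the LLM write pandas for you.
import pandas as pd

df = pd.DataFrame({
    "state": ["NY", "NJ", "NY", "CT"],
    "premium": [1200.0, 950.0, 1800.0, 700.0],
})

def interrogate(df: pd.DataFrame, question: str, ask_llm):
    prompt = (
        f"A pandas DataFrame `df` has columns {list(df.columns)} "
        f"with dtypes {df.dtypes.astype(str).to_dict()}.\n"
        f"Write one pandas expression that answers: {question}\n"
        "Reply with the expression only."
    )
    expression = ask_llm(prompt).strip()
    return eval(expression, {"df": df, "pd": pd})  # sandbox this in real code

# interrogate(df, "What is the average premium in NY?", ask_llm)
# would typically end up running: df[df.state == "NY"].premium.mean()
```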
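
Finally, the audio bullet as a minimal sketch of the round trip: speech to text, text to LLM, answer back to speech. The model names are examples (Whisper for recognition, Bark for synthesis; the text-to-speech pipeline needs a recent transformers version), and `prompt.wav` is a placeholder input file.

```python
# Speech -> text -> LLM -> speech, all via Hugging Face pipelines.
import soundfile as sf
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")
tts = pipeline("text-to-speech", model="suno/bark-small")

def ask_llm(text: str) -> str:
    # Stand-in for the Llama 2 / llama-cpp call sketched earlier.
    return "This is where the model's answer would go."

question = asr("prompt.wav")["text"]  # speech -> text
answer = ask_llm(question)            # text -> LLM answer
speech = tts(answer)                  # answer -> audio
sf.write("answer.wav", speech["audio"].squeeze(), speech["sampling_rate"])
```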

Part 2+ will go into a few of the topics above, with code and more explanation.
