HockeyStick #3 - Generative AI with Mark Liu
Miko Pawlikowski
I help technical leaders achieve HockeyStick growth | Head SRE | Co-founder SREday.com, Conf42.com & 5 more
In episode 3, host Mikolaj Pawlikowski interviews Mark Liu, a finance professor at the University of Kentucky, a seasoned coder, and the author of "Learn Generative AI with PyTorch" (Manning Publications).
We talk about his book, about learning to build generative AI technology from scratch, about Mark Liu's own journey into AI from the world of finance, and where the technology might lead us.
Follow HockeyStick Show to never miss an episode!
Podcast
Follow HockeyStick Show and find the episodes below:
Video
Audio
Summary
Here's what Generative AI has to say about the episode:
The HockeyStick podcast explores the journey of Mark Liu, delving into his transition from finance to AI and the significance of Python in this shift. Liu discusses the advancements in generative AI, focusing on GANs, transformers, and the societal impacts of AI deployment. Additionally, the podcast addresses the evolution of OpenAI from nonprofit to for-profit, analyzing Elon Musk's lawsuit and ethical concerns surrounding AI commercialization. It also highlights PyTorch's advantages in training speed and flexibility, particularly in generating anime faces, and emphasizes the importance of appropriate loss functions in machine learning models.
Transcript
Miko Pawlikowski: [00:00:00] I'm Miko Pawlikowski and this is HockeyStick. Generative AI is on everyone's mind. From essays to photorealistic pictures to high-quality videos, it has changed the way we think about creativity and intelligence forever. If the AI won't steal your job, but somebody using AI will, then the best defense is to learn how this technology works ASAP.
Miko Pawlikowski: Today, I'm bringing you Mark Liu, the author of Learn Generative AI with PyTorch, a tenured finance professor and the founding director of the Master of Science in Finance program at the University of Kentucky and a veteran coder with over 20 years of experience. In this conversation, we'll talk about learning through doing, how everybody can build generative AI models, the various breakthroughs that allowed for the current AI explosion to take place, and make some wild predictions about the future.
Miko Pawlikowski: Welcome to this episode and please enjoy.
Miko Pawlikowski: How are you doing today?
Mark Liu: Pretty good. Thank you, [00:01:00] Miko. Glad to be here.
Miko Pawlikowski: Yeah, I'm very excited, not only because I'm hoping to learn so many interesting things from your book, but also because I'm very curious: how does somebody who's a founding director of a Master of Science in Finance program and a tenured professor in finance decide to go into AI? Tell us a little bit about your story.
Mark Liu: It goes back to about five years ago, in 2017, when our department wanted to launch a Master of Science in Finance program. At that point, I had been tenured for about five years. I was always very adventurous, trying to do new things.
Mark Liu: I was appointed the founding director to start an academic graduate program from scratch. And I was very much into it. It was a lot of work, but I thoroughly enjoyed it. So our program launched in the fall of 2017. [00:02:00] It's a one-year program, so at the end of 2017 we started to place our students.
Mark Liu: The very first year we had 30 students in the program, which is a great number. And I talked to many employers, many companies, trying to place our MS Finance students. I heard the same thing again and again: they told me that they want somebody who not only knows finance, but also knows coding, programming, analytics. And the number one programming language in finance is Python. Now, I had been doing programming for many years,
Mark Liu: but mainly in statistical software, to run regressions for finance research. So I had to learn [00:03:00] Python from scratch in order to teach my students. And it turns out that Python is a very user-friendly programming language: even if you have never programmed before, you can guess what a block of code is trying to accomplish.
Mark Liu: I started to run Python workshops for MS Finance students, and gradually I accumulated a lot of teaching notes. I also had to convince my students to use Python, because some of the students said, "I can do everything in Excel, why should I learn Python?" And I told them that Excel is not exactly a programming language, and you do need a programming language in order to [00:04:00] automate things, to make life more convenient, to build bigger programs, that kind of stuff.
Mark Liu: So what I did was I started to create fun projects in finance, like speech recognition and text-to-speech. One example: I added those features to a finance calculator. What you can do is actually speak to the computer and ask it to do a finance calculation.
Mark Liu: You can tell the program in a human voice, "what is the present value of $1000 in five years?", and the program will do the calculation and tell you the answer in a human voice. That caught the students' attention, so I started to do those kinds of applications. And after a year or so, I had plenty of projects. And then [00:05:00] some students told me, "you should write a book about it". So I started to send the manuscript to No Starch Press to publish the book.
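For the curious, here is a minimal sketch of the present-value calculation Mark quotes; the 5% discount rate is an assumed example value, not a figure from the episode:

```python
# Present value: PV = FV / (1 + r) ** n
def present_value(future_value: float, rate: float, years: int) -> float:
    """Discount a future cash flow back to today's dollars."""
    return future_value / (1 + rate) ** years

# "What is the present value of $1000 in five years?" at an assumed 5% rate:
print(f"${present_value(1000, 0.05, 5):,.2f}")  # -> $783.53
```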
Mark Liu: The moment my colleagues, or my students, or a lot of my friends, even my family members, heard that I was writing a programming book in Python about speech recognition and text-to-speech, their first reaction was, "I thought you were a finance professor?"
Mark Liu: That question came up again and again. And then I gave them a famous quote from a chief risk officer at Deutsche Bank: "banks are essentially technology firms now". There is a lot of truth in that, because in order to be in the field of finance, you need to know a lot of technology, know programming, know analytics, and so forth.[00:06:00]
Mark Liu: So that was my first book. I signed a contract with them in 2019, and it was finally published in 2021. After that, I started to teach a course in the MS Finance program called Python Predictive Analytics, using Python to build machine learning models for business analytics. I taught students a lot of machine learning models, including deep neural networks. And again, I accumulated a lot of notes. And then,
Mark Liu: I came across a video from DeepMind showing how you can actually play Atari games like Breakout by training a computer program to play the [00:07:00] game at a superhuman level.
Mark Liu: What happened was, not only did the computer program learn to play the game, it actually figured out a way to score very efficiently, a way human beings didn't know before: you dig a tunnel at the side of the wall, and then you send the ball to the back of the wall to score very efficiently.
Mark Liu: When I saw that video, I was completely amazed. I told myself, "I gotta figure out how this works". I spent several months experimenting with different kinds of programs, trying to figure out how it works. And eventually I figured it out. And that became my second book, "Machine Learning, Animated".
Mark Liu: It was published with CRC Press last year. [00:08:00] And then recently, once ChatGPT was out, generative AI became very popular. I was very curious: I was trying to figure out how exactly a large language model works, and how a computer program can understand human language.
Mark Liu: I spent a lot of time trying to figure it out. Before, I was actually using TensorFlow. It worked pretty well for me with Atari games and so on and so forth.
Mark Liu: But apparently it's not great in terms of GPU training. You can do GPU training, but there is an overhead: you have to program everything on the CPU and then send it to the GPU, do the calculation, and then send it back, and the overhead is just too much. So it ended up not very [00:09:00] fast.
Mark Liu: Then I learned another AI framework called PyTorch. You can explicitly send a tensor to the GPU to do the calculation, and so on and so forth. It's a little more complicated than TensorFlow, because you do have to send things to the GPU and then get them back, so in terms of coding you have to do slightly more work, but in terms of performance it's amazing. I get to train models 7 to 10 times faster compared to CPU training. All those large language models have billions or hundreds of billions of parameters, right? So the speed is crucial. Right now I'm training models with [00:10:00] millions of parameters, which is fine. And that brings me to my third book, which is with Manning Publications.
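The explicit device placement Mark describes looks like this in practice; a minimal sketch using standard PyTorch APIs:

```python
import torch

# Pick the GPU if one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.randn(1024, 1024)   # tensor is created on the CPU
x = x.to(device)              # explicitly sent to the GPU
y = x @ x                     # the matrix multiply runs on the GPU
y = y.cpu()                   # brought back to the CPU when you need it
```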
Mark Liu: In this book, I'm doing generative AI with PyTorch. The reason I switched to PyTorch is the dynamic computational graph, and the GPU training. I can train most models in a matter of minutes; the larger ones sometimes take maybe a couple of hours. That's it.
Mark Liu: I can see the model in action, and then I can tune the model. So that's the third book. Let me conclude by quickly summarizing what I'm doing in it. The name, I think, you just mentioned at the beginning: Learn Generative AI with PyTorch.
Mark Liu: Readers learn to create generative AI models from scratch to [00:11:00] create different content like images, shapes, numbers, text, music, sound, and so forth, all with PyTorch and deep learning models. And in particular,
Mark Liu: readers learn how to create a ChatGPT-style transformer from scratch. In particular, I teach readers how to create GPT-2 XL, with 1.5 billion parameters. Of course, with 1.5 billion parameters, it's very hard to train, right? It's very slow, number one. Number two, GPT-2 was trained with huge amounts of data, and regular readers don't have access to this training data. But I also teach readers how to extract the pretrained [00:12:00] weights from OpenAI, load those weights into the GPT-2 model you created from scratch, and start to generate text. The text you generate is very coherent, without grammar errors. It's amazing. Of course, it's not as powerful as ChatGPT or GPT-4, but
Mark Liu: a normal person, without access to supercomputing facilities, without access to large amounts of training data, can create a ChatGPT-style deep neural network from scratch and use it to generate text and to generate lifelike music.
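The book builds GPT-2 from scratch and then loads OpenAI's released weights into it; that extraction pipeline is the book's own. As a quick taste of the same idea, here is a commonly used alternative route via the Hugging Face transformers library, which ships the same pretrained GPT-2 XL weights (the prompt text is just an example):

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Download the pretrained 1.5B-parameter GPT-2 XL weights and tokenizer.
tokenizer = GPT2Tokenizer.from_pretrained("gpt2-xl")
model = GPT2LMHeadModel.from_pretrained("gpt2-xl")

# Generate a short continuation of a prompt.
inputs = tokenizer("Generative AI is", return_tensors="pt")
outputs = model.generate(**inputs, max_length=40, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```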
Mark Liu: It's amazing. And that's the text part. On the image part, you can create color images. [00:13:00] You can also convert a horse to a zebra. You can convert blonde hair to black hair in images. You can add or remove glasses in images, and so forth. The whole experience is amazing; it worked better than anticipated.
Mark Liu: And the whole experience reminded me of the famous quote, "any sufficiently advanced technology is indistinguishable from magic". The whole thing is really magic. That's my long answer to your question.
Miko Pawlikowski: Thank you for that. Just for anybody who's not familiar with Manning: the book is currently available in what's called MEAP, the Manning Early Access Program, so you can read the chapters as they are produced by Mark.
Miko Pawlikowski: At the moment there are five chapters available, but I'm being told that 11 will be coming very soon. And the estimated [00:14:00] time for the whole book to be available is May 2024. So for anybody who's eager and who might be thinking that the book is not finished yet, you can actually start reading it right now. Speaking of the magic and the building from scratch, I think what I liked the most about your book, and what initially attracted me to actually go and read it, is that 'build from scratch' thing. And I love that you used Richard Feynman's philosophy, the quote, "What I cannot create, I do not understand".
Miko Pawlikowski: I think that's a very good motto to live by. It's absolutely great that you take us on this journey to build things up, even though I've only read the five chapters so far. All of a sudden, with ChatGPT, everybody started talking about this, and this explosion.
Miko Pawlikowski: What were some other moments, other than ChatGPT, where you realized, "oh man, this is going to blow up, this is going to be massive with generative AI"? I believe you [00:15:00] mentioned the Writers Guild of America versus AI story. Can we talk about that for a minute?
Mark Liu: Before I answer that question, I encourage you to read my chapter one for free; you don't even have to buy my book. Manning has a great feature: if you go to manning.com and look for my book, Learn Generative AI with PyTorch, you can find it. I have a fairly long chapter one summarizing the state of the art in generative AI and also what I'm doing in the book. Now, what Miko mentioned: the Writers Guild of America. A few months ago, they negotiated with the big studios about the threat of AI.
Mark Liu: And as a result, there's a contract to limit how much AI can be used in writing, in production, in order to protect the jobs of the [00:16:00] writers. This is just one example of the disruptive power of AI in many different industries.
Mark Liu: Writers are just one example; it threatens many other industries. Another example is Chegg, which is an online educational platform. College students go there to get tutoring services and so forth, and with ChatGPT, their business model is threatened, right?
Mark Liu: I think in the month after the release of ChatGPT, their stock price plunged by almost 40%. So that's how serious the competition is. Those are just a couple of examples. The potential of generative AI is huge, but at the same time, if you don't [00:17:00] catch up with the trend, there is a risk that your job might be replaced by AI.
Mark Liu: There is an interesting quote, and I think there is a lot of truth in it. It says, "AI will not take your job. Somebody using AI will". So in order to avoid being replaced by AI, I think the best strategy is to get in the game: to learn about generative AI, to protect yourself in terms of your future career. So that's
Mark Liu: the big motivation behind my books. The main motivation, of course, is intellectual curiosity. I'm by nature a very curious person. So when I saw ChatGPT working like magic, I really wanted to get to the bottom of it
Mark Liu: and figure out how it works. That's the main reason. But at the same [00:18:00] time, I'm trying to teach my students programming skills, machine learning skills, AI skills, generative AI skills, in order to prepare them for the job market, so that in the future their skill sets will not be outdated.
Mark Liu: That's my second motivation for writing the books.
Miko Pawlikowski: Do you buy into the comparison that AI is like personal computers? A lot of people were worried that personal computers were going to just remove jobs, but what ended up happening was that a small portion of jobs was eliminated, while most jobs were modified and became about operating computers.
Miko Pawlikowski: Do you think that's the most apt comparison for what we're likely to experience with AI in the coming years?
Mark Liu: The future is hard to predict, [00:19:00] but personally I think that's most likely what's going to happen in the near future. With generative AI, you can actually use it to increase your productivity, to have more job opportunities. On the other hand, if you completely stay away from it, your skill set might become outdated. But at the same time, I think technology will make all this AI stuff more accessible to most people, right? You don't necessarily have to be a programmer. One example is Midjourney: you can just go to a browser and use Midjourney or DALL-E 2, DALL-E 3, or whatever, to create very fancy images.
Mark Liu: You can use a text prompt to create an image of what you meant; you don't have to draw it yourself. In that [00:20:00] sense, I'm optimistic. I think for most people, generative AI will be a very valuable tool to increase their productivity, as long as you keep up with the technology.
Miko Pawlikowski: I'm glad you mentioned Midjourney, because I think for me personally that was where I realized, 'okay, this is the hockeystick moment'. I remember the tiny, blurry pictures from the GAN paper, and then all of a sudden I saw some pictures that were generated by Midjourney, and I went and tried it myself, and it was more or less able to produce almost everything I threw at it, other than some particular types of dinosaurs that it just didn't recognize.
Miko Pawlikowski: That was the one thing where I knew, 'okay, they didn't train it on that kind of dinosaur'. But that was definitely one of those moments where I realized, wow. And the other one: I live in London, and one way or another you end up using the tube a lot, and usually you're annoyed at people who play [00:21:00] music on public transport.
Miko Pawlikowski: And then at some point I realized that I was getting annoyed at people talking about generative AI on public transport and making noise. And that's when you realize, 'okay, so this has now gone mainstream and everybody's talking about it'. But let's talk a little bit about the actual underlying breakthroughs that brought us to where we are.
Miko Pawlikowski: In particular, I'm thinking about GANs, the generative adversarial networks, and transformers, and diffusion. Where should we start? What's the first important breakthrough that everybody should know about?
Mark Liu: I think all the generative AI models in my book are deep neural networks. Machine learning is a very wide field. There are many traditional machine learning models, random forests, linear regressions, this and that, but about [00:22:00] 20 years ago, deep neural networks became very powerful.
Mark Liu: One great thing about neural networks is that you can scale them, and a deep neural network can approximate any relationship, even if we human beings don't know what the exact relationship is, as long as you create a large enough model to capture it. So that's the foundation from 20 years ago.
Mark Liu: And then over the past 20 years or so, many people made breakthroughs in the deep learning field. Let's talk about ChatGPT. ChatGPT is a huge deep neural network trained on huge amounts of data.
Mark Liu: And before that, the state-of-the-art natural language processing models were recurrent [00:23:00] neural networks.
Mark Liu: The way they work is that they progress along the timeline. Let's say you have a sentence like "this is a sentence", right? So you have four words in the sentence. The model uses the first word, "this", to predict the second word, "is", and then it uses the first two words to predict the third word, and so on and so forth.
Mark Liu: It worked to some degree, but it's very slow, because you have to predict one word at a time. And then in 2017, there was a huge breakthrough: a paper called "Attention Is All You Need" by a group of Google researchers, who used a different mechanism to capture the relationships between different words in a sentence.
Mark Liu: It's called the attention mechanism, [00:24:00] and it's much more effective. On top of that, it's not sequential, which means one word can pay attention to all other words at the same time, and this allows for parallel training. This has huge implications. Number one, it works better in terms of capturing long-term relationships between different words in a sentence, so you can understand the meaning of a long sentence, a long text. Number two, because of the non-sequential nature of the attention mechanism, you can use parallel training: you can train the same model on many different devices. This makes training much faster.
Mark Liu: And this also allows you to train the model on more data. That's why ChatGPT became so powerful: you can train it [00:25:00] much faster, and you can train it on more data. On top of that, the mechanism works much better than recurrent neural networks, because it can capture
Mark Liu: really long-term relationships in a sequence, and text is a sequence, right?
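The core computation behind that mechanism is scaled dot-product attention, where every word scores its relevance to every other word in one parallel matrix operation. A minimal PyTorch sketch (toy sizes, single head, no masking):

```python
import math
import torch

def attention(q, k, v):
    """Every position attends to every other position at once,
    instead of stepping through the sequence one word at a time."""
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    weights = torch.softmax(scores, dim=-1)  # how much each word attends to every other word
    return weights @ v

# Four words ("this is a sentence"), each embedded in 8 dimensions:
x = torch.randn(4, 8)
out = attention(x, x, x)   # self-attention over the whole sentence in parallel
print(out.shape)           # torch.Size([4, 8])
```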
Mark Liu: That propelled OpenAI to build all these models, including ChatGPT. Now let's go to the recent development: text-to-image transformers. This is a new innovation in transformer models, called multimodal models. The original transformer model from
Mark Liu: "Attention Is All You Need", which powers ChatGPT, only uses text, right?
Mark Liu: So the input is a sequence of text and the output is also a sequence of text, but with multimodal models, the input and output can be different formats, right?
Mark Liu: With DALL-E 2 and DALL-E 3, the input is text [00:26:00] and the output is an image. You can have different inputs and outputs.
Mark Liu: You can have audio, you can have video, Sora has videos, that kind of stuff. But let's talk about the underlying mechanism behind multimodal models like DALL-E 2 and DALL-E 3: it has something to do with diffusion models. I think you mentioned that at first the generated image is very grainy, right?
Mark Liu: Diffusion models add noise to an image gradually. Let's say there are 1,000 time steps. At each time step, you add a little bit of noise to the image, so gradually you have 1,000 different images, each one progressively noisier, and at the end it becomes completely noisy. And then what you can do is give those images [00:27:00] to a machine learning model and train the model to remove those noises progressively, step by step. That's how DALL-E and all those text-to-image models work. The first step is that you use a text prompt to generate a very grainy image, and after that you use a model, which is very much like a diffusion model, to progressively refine it, so that you turn a very grainy image into a high-resolution image. That's why, when you enter a short prompt, DALL-E 2 can give you a high-resolution image
Mark Liu: capturing what you are trying to produce in the text prompt. That's actually chapter 14 of my book: I talk about how you can add a little bit of noise to an image, one step at a [00:28:00] time, and then use those images to train a model to remove the noise step by step, progressively, very much like DALL-E 2, making the image clearer and clearer.
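The forward (noising) half of that process is simple to write down. A minimal sketch in the standard DDPM style; the schedule values below are common illustrative choices, not taken from the book:

```python
import torch

T = 1000                                         # number of noising time steps
betas = torch.linspace(1e-4, 0.02, T)            # noise schedule (illustrative values)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)   # cumulative fraction of signal kept

def add_noise(image, t):
    """Return the image after t steps of noising, plus the noise that was used."""
    noise = torch.randn_like(image)
    a = alphas_bar[t]
    return a.sqrt() * image + (1 - a).sqrt() * noise, noise

clean = torch.rand(3, 64, 64)            # a toy 64x64 RGB image
noisy, eps = add_noise(clean, t=500)     # halfway toward pure noise
# A denoising network is then trained to predict `eps` from `noisy`,
# so at generation time it can remove noise step by step.
```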
Miko Pawlikowski: Generative adversarial networks were an interesting development, from Ian Goodfellow. How does that fit into the rest of what you just described?
Mark Liu: Generative adversarial networks are great at generating different forms of content. A lot of times when readers learn something, if you give them the end product, it's too complicated, right? They may get frustrated and just give up. As an author, my job is to make sure [00:29:00] that readers stay engaged throughout the book, never get tired, never get frustrated, and gradually learn, and finally learn to build state-of-the-art machine learning models, generative models like a ChatGPT-style transformer, to generate text and audio, right?
Mark Liu: So what is the idea behind GANs? You have two networks. One is a generator network; the other is a discriminator network. The job of the generator is to generate a piece of work similar to what's in the training dataset. Let's use grayscale images as an example.
Mark Liu: You have a training dataset of grayscale images of handwritten digits, 0 to 9. Those are the real images. And then you ask the generator to generate something similar to [00:30:00] them, so that it can pass as real in front of the discriminator.
Mark Liu: Before you train the model, the generator is terrible: whatever it generates is complete gibberish, like snow on a screen, that kind of stuff. But this is where training comes in. You have a training loop, and in each iteration, you ask the generator to generate a bunch of fake images. At the same time, you also have a bunch of real images from the training set, and you give all of those to the discriminator and ask it to determine whether each image is real or fake.
Mark Liu: And the generator's job is to create an image that the discriminator [00:31:00] would think is real. That's the generator's objective. So you have a loss function, and you train the model: you gradually fine-tune the model parameters so that in the next iteration, whatever image the generator produces will have a higher probability of passing as real. And you do this again and again; you can do thousands of iterations.
Mark Liu: And if you do that long enough, eventually the generator will be able to create images that look just like the images from the training set. So that's how a GAN works: you have a zero-sum game, two networks competing with each other, trying to outsmart each other, and eventually the generator gets better and better.
Mark Liu: So that's the [00:32:00] idea behind GANs, and it's a revolutionary idea. In 2014, Ian Goodfellow and his co-authors proposed the model. A great thing about the model is that it can generate different content: numbers, images, shapes, even music, and so on.
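As a minimal sketch of the adversarial loop Mark describes, here is a toy PyTorch GAN for 28x28 grayscale digits; the layer sizes, learning rates, and the random stand-in for real data are all illustrative, not the book's code:

```python
import torch
import torch.nn as nn

# Generator: noise vector -> flattened 28x28 image. Discriminator: image -> real/fake logit.
G = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 1))

loss_fn = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

real = torch.rand(32, 784)  # stand-in for a batch of real training images

for step in range(1000):
    # Discriminator step: label real images 1 and fake images 0.
    fake = G(torch.randn(32, 64))
    d_loss = (loss_fn(D(real), torch.ones(32, 1)) +
              loss_fn(D(fake.detach()), torch.zeros(32, 1)))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step: try to make the discriminator call the fakes real.
    g_loss = loss_fn(D(G(torch.randn(32, 64))), torch.ones(32, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```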
Miko Pawlikowski: I love this idea, because on top of that, you've got this built-in target point, right? You're finished when your discriminator can no longer discriminate between what you're generating and the real thing; it's not arbitrary. And the other reason why I love it is the anecdote attached to it. Legend has it, it was written one evening when Ian was celebrating in a pub; I think some fellow students were graduating. They were discussing a problem where they wanted to generate some pictures, and he came up with this idea: 'oh, what you're suggesting is too [00:33:00] complicated, you should put two networks against each other'. And they laughed. He went home and, still slightly drunk, wrote a proof of concept. And it turned out that it actually worked. I think in one of the interviews later, he said that if he hadn't been drunk, he probably wouldn't have done it, because it sounded like a silly idea.
Mark Liu: Okay. Yeah, that's right.
Miko Pawlikowski: It shows how random some of those things are, how weird and unpredicted. And one of the things I wanted to ask you about is what made all of those recent breakthroughs possible. What was missing?
Miko Pawlikowski: Because we've had neural networks since, what, the 80s or something like that? And all of a sudden, it looks like in the last few years, or maybe the last decade or so, it was just one breakthrough after another dropping. And if you try to keep up with the papers currently being written on AI, there are just so many of them.
Miko Pawlikowski: And [00:34:00] it looks like every other day there's something super interesting that's been developed, and it's literally hard to keep up just with other people's ideas. What do you think enabled this kind of explosion in recent years?
Mark Liu: Actually, neural networks were proposed even earlier than the 1980s. I think in the 1960s, researchers proposed artificial neural networks, basically modeled after the human brain. The idea was a great one, but at that point, we didn't have the hardware to support it. Then, starting in the 1990s and early 2000s, the hardware became much more powerful, number one. Number two, there was more research, more breakthroughs, in the field of artificial neural networks. One example is [00:35:00] LeCun's convolutional neural networks. Most neural networks are fully connected, dense neural networks, which means a neuron in one layer is connected to all the neurons in the next layer, and that works great.
Mark Liu: Except that once your model becomes larger, the number of parameters grows exponentially, and then it's very hard to train, right? So that's a problem. With convolutional neural networks, you localize the weights:
Mark Liu: you have a filter, and the weights in the filter are fixed as you move the filter across an image. This greatly reduces the number of parameters.
Mark Liu: It makes computer vision much more efficient. Because of that, in the early 2000s there were a lot of breakthroughs in computer vision, in [00:36:00] convolutional neural networks, and I think that's a huge breakthrough. And then
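Mark's point about weight sharing is easy to see by counting parameters. A small comparison on a 64x64 RGB input, with sizes chosen purely for illustration:

```python
import torch.nn as nn

# A dense layer mapping a flattened 64x64 RGB image to an output of the same size,
# versus a single 3x3 convolutional filter slid across the image.
dense = nn.Linear(3 * 64 * 64, 3 * 64 * 64)
conv = nn.Conv2d(in_channels=3, out_channels=3, kernel_size=3, padding=1)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(dense))  # 151,007,232 parameters
print(count(conv))   # 84 parameters -- the same small filter reused everywhere
```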
Mark Liu: After that, you also have GPU training. GPU training became very popular in the past maybe 10 years or so.
Mark Liu: And that was a huge game changer, because as deep neural networks became larger and larger, it became very hard to train them without extra help. When you train on a CPU: the CPU is a general-purpose processor, it has to do many things. But the GPU is specialized,
Mark Liu: so you can do machine learning jobs much faster.
Miko Pawlikowski: And of course, we also have more and more training data available, and that is also necessary for large language models to work. It takes time, but I think over the past 20 years or so, [00:37:00] everything suddenly came together to make it work. Basically, we've got gamers to thank for the breakthroughs in AI, because of the graphics cards, the GPUs, that they demanded, right?
Mark Liu: You have a very good point. I think the GPU was originally designed for gaming purposes, right? And now suddenly it has a completely different purpose. I have several GPUs at home, not very powerful, but I think powerful enough for me to experiment with different models. They cost maybe several hundred dollars, a thousand dollars. I have three of them. Two of them are from my son: he was playing video games, and now he doesn't use those computers anymore, so he just gave them to me, and I simply took the GPUs out and use them for my own work.
Mark Liu: But the cost is not that much.
Miko Pawlikowski: The cost is not that much, unless you go for the top-of-the-line 80 [00:38:00] gig ones, which are very hard to come by and also quite expensive. Yeah, so thank you, gamers. Thank you for enabling the AI revolution in many ways. It goes back to what I was saying about how random some of these things seem to be.
Miko Pawlikowski: So where do you think we're heading? Like you said, the future is notoriously difficult to predict, obviously. But if you were still going to venture a guess, one that will probably prove completely wrong a few years down the line, where do you think we're heading with all of this?
Mark Liu: If I had to venture a guess: the large language models will become even more powerful in the near future, not only in terms of generating cohesive text, but also generating images, generating videos. And multimodal models will become very popular.
Mark Liu: You [00:39:00] can generate not only images and text, you can also generate audio, video, sound, and so forth.
Mark Liu: Other than that, I think it really depends on which breakthrough comes through in the near future. You never know: one day there may suddenly be a huge breakthrough that completely changes the landscape of AI, just like ChatGPT did a couple of years ago, right?
Miko Pawlikowski: The future is very exciting, but at the same time, like you said, it's very hard to predict. But I think right now is a very fortunate time, a very exciting time for tech enthusiasts, for anybody who is passionate about AI, about technology. So, two follow-up questions then. One: like anything else, there are these fashion waves that come and go, and AI is now the latest, [00:40:00] hottest thing. All the VCs, everybody's throwing money at it. But at some point people will probably move on to the next thing, just like they did with crypto and smartphones and the internet and whatever else before, right?
Miko Pawlikowski: So I'm wondering where you think we are in that hype cycle, and what's going to happen when, all of a sudden, slapping 'AI-first' on your startup no longer guarantees you funding. That's follow-up question number one. And the second question is: if you were to plot a graph of how you expect the large language models to continue developing: I think we can all agree that there has been some kind of very exponential growth, where somebody figured out that with ChatGPT or one of those massive models, if you throw enough data at it and massage it for long enough, you can create this impression of 'oh, this is magic, how on earth is that even happening?' But then at some point it has to [00:41:00] plateau, right? It's not possible for it to keep going at that kind of speed, into the sky.
Mark Liu: Again, it's hard to predict in that sense.
Miko Pawlikowski: Of course, all the usual disclaimers about predictions apply. But what's your take on what it means about us as humans?
Miko Pawlikowski: Does it mean that what we cherish as one of the unique capabilities of humans, human intelligence, is not actually all that unique? Because it's hard not to have this feeling when you talk to one of those big large language models, at least during the times when it doesn't go haywire and start behaving weird, but on the occasions when it works well.
Miko Pawlikowski: It's really hard not to have the impression that you're talking to somebody with some amount of intelligence. So does it mean that we're all some kind of statistical models, and the intelligence that we demonstrate [00:42:00] is also an emergent property? What's your take on that?
Mark Liu: I don't think many people in the world right now have a good answer to that question. That said, I do want to point out that many people right now have concerns about AI, because of the potential damage it can do. It's all about the objective function. You give a task to the model in terms of the loss function, and then it tries again and again, and eventually it will become very good at whatever objective you want it to achieve. That is good, but at the same time, it can be bad:
Mark Liu: the AI may not even know it, right? It's just trying to accomplish a certain goal, and it just happens that a human being is standing in the way of that goal. So in that sense, I do think that human beings need to be [00:43:00] careful. I think AI needs to be regulated to some degree.
Mark Liu: We cannot let it do whatever it wants. It may have serious negative consequences for human beings.
Miko Pawlikowski: I think a lot of what you just described has been the main concern for everybody making sci-fi movies, from the Terminator and Skynet on. And I certainly get that, but I think I'm probably more worried about, going back to what we said earlier, that you won't be losing your job to AI, you'll be losing your job to someone using an AI. I think this probably applies here too: as an enabler, it scales up the amount of damage that a nefarious party can produce by using it to bad ends.
Miko Pawlikowski: A lot of the security that we [00:44:00] rely on is practical, right? For example, all the encryption keys that we use for everything work only because it would be computationally too expensive to actually figure them out. But when you've got tools like this, it's easy to be scared about the possibility of them figuring that out and making things possible that previously weren't. So I think I'm more worried about that scenario, where someone uses the AI to bad ends, and it enables them to do more damage than they would be able to do with traditional methods.
Mark Liu: Even at the current stage, if AI falls into the wrong hands, it can do a lot of damage. Maybe not catastrophic, but it can do a lot of damage to a lot of families, right? There were stories about people using generative AI to create a fake phone call to someone's parents and demand ransom [00:45:00] money. So it causes financial damage and also a lot of emotional distress. Fake news, fake video, a lot of deepfake stuff. So even at this stage, it can do a lot of harm if it falls into the wrong hands.
Miko Pawlikowski: Yeah, the phone call is a very good example. You can technically go and call people and scam them, and people do that, but there is a limit to how many people you can physically call in a day. If, on the other hand, you have a powerful enough AI, you can scale it up, and you can probably call everybody in the United States a certain number of times.
Mark Liu: That's right.
Miko Pawlikowski: Are you concerned about AI involvement in the upcoming election?
Mark Liu: We have to be careful, but I think so far the impact has been limited. At the same time, I think all the parties, all the politicians, need to pay attention to generative AI, [00:46:00] because of what it can do: fake news and so forth.
Mark Liu: Imagine you are running a political campaign, right? You must get to know analytics, how AI can influence your campaign, either positively or negatively. If your team can utilize AI to strengthen your position legally, you're in a very good position; it can help you. But on the other hand, if you're not careful, your opponents or somebody else can use deepfakes to disrupt your campaign or your cause.
Mark Liu: That's why I think AI is so powerful and also so widespread. It affects every single industry in the economy, not just a few isolated sectors. That's what's very unique about AI.
Miko Pawlikowski: Did you hear about the [00:47:00] Elon Musk lawsuit against OpenAI from a few days ago?
Miko Pawlikowski: Obviously, OpenAI initially started as an alternative to the big companies and the massive labs like Google, Facebook and so on. And their pitch, the initial mission statement, was to release everything open source; hence the name OpenAI.
Miko Pawlikowski: And then somewhere along the way that turned, and it's currently a for-profit, closed-source company worth, what, under a hundred billion at the moment. We're recording this on March the 4th; a few days ago, Elon Musk opened this lawsuit, where he alleges that he was basically scammed, because they turned the company around and went against the initial mission.
Miko Pawlikowski: And I think the opinions on the internet vary from 'okay, this is jealousy, because he's jealous of [00:48:00] the success that OpenAI has seen' to 'okay, this is a nice publicity stunt; he probably has a point, but this is probably not going to stand in court'. And I'm trying to make sense of how much of that is actually valid, and how much I should be worried about OpenAI being at the forefront of this as a big closed-source company.
Mark Liu: I also heard that many years ago, when Elon Musk and Sam Altman co-founded OpenAI, their objective was a nonprofit organization.
Mark Liu: Given the competition from other big players in the industry, I think OpenAI was under pressure to commercialize ChatGPT, and this may go against the original objective. So I can see the argument from both sides.
Mark Liu: On the one hand, we have to be careful, as we just discussed, about uses [00:49:00] of AI that may lead to the end of humanity as we know it, if we're not careful. But at the same time, if we use it properly, it can be a great tool. That's why there is such a great market for generative AI.
Mark Liu: So I think there is some tension within the company; you have different views. That's why, I think, a few months ago, within several days, Altman was fired and then hired back, and so on and so forth. In the background, I think it's really just those two forces at play: the force that wants to make sure AI does not go out of control and harm human beings, and at the same time, the huge pressure from industry peers to commercialize those applications and make profits.
Mark Liu: Actually, I'm glad that Elon Musk filed the lawsuit, in the sense that it [00:50:00] may swing the pendulum to the other side. Because eventually, the view that we should commercialize and make money out of it, that kind of view prevailed, right?
Mark Liu: That's why Sam Altman got hired back. But that can go too far, because in the process of competition, of making profits, you may sacrifice security. So I think the lawsuit by Elon Musk can potentially put the original mission in check, so to speak, and maybe force OpenAI and other tech companies to think more about guardrails around AI, to make sure it doesn't go out of control and harm human beings.
Miko Pawlikowski: Time will tell if anything comes out of it, other than one billionaire being upset at [00:51:00] another. But we'll see. So I'm going to ask you for one more prediction, and this time a little bit more down-to-earth: PyTorch. It appears to be still on the rise, and it appears to be the go-to option for any new papers.
Miko Pawlikowski: TensorFlow seems to be stagnating a little bit. You talked about the advantages of PyTorch and why you chose it for your book, and I'm wondering, do you see this being the prevailing platform? Because I think the main breakthroughs for PyTorch were, you mentioned, the GPU support, obviously, and also the built-in backpropagation, right, the autograd. Now the other frameworks also provide autograd, so I guess they're closing the gap a little bit in that respect. If you were to venture one more crazy prediction, would you see PyTorch leading the way going forward? Are you going to update your book in a couple of years to port it to [00:52:00] some other framework?
Mark Liu: I think PyTorch is going to prevail in the near future. I mention this in my book: what PyTorch does is use a dynamic computational graph, which means it creates the computational graph on the fly, so it's faster and more flexible. TensorFlow uses a static computational graph, so it's slower. That's the main difference, and it affects the training speed greatly. In TensorFlow, you don't really have to worry about which device to use;
Mark Liu: it's all done in the backend, automatically, by TensorFlow. But at a cost.
Mark Liu: If you have industry-scale models, and you have a lot of GPUs and you're doing a huge [00:53:00] calculation, maybe the overhead is negligible; it doesn't affect things much. But for a lot of researchers it makes a huge difference, because we are often working with toy models, nothing huge. If you use PyTorch, there is a little bit of inconvenience, in the sense that you have to specify whether to move a tensor to the GPU, and once you are done with it, you have to get it back. But the benefit is huge, because it greatly increases the training speed. So at least for small players, regular readers, and also for researchers around the world, I think PyTorch is much more convenient and much faster. Certain [00:54:00] large corporations may not care that much.
Mark Liu: For regular people, PyTorch is much more convenient, it's much faster, and in the near term it may win out.
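The "graph built on the fly" point is easy to see in a few lines: ordinary Python control flow becomes part of the computation, and autograd differentiates through whatever actually ran. A minimal sketch:

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
y = x
for _ in range(3):   # the graph is constructed as this Python loop executes
    y = y * x        # after three iterations, y = x ** 4

y.backward()         # autograd walks the graph that was just built
print(x.grad)        # dy/dx = 4 * x**3 = tensor(32.)
```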
Miko Pawlikowski: For anybody listening to this: I know that if I hadn't read your book before, I would probably be on manning.com looking at it right now. And at some point I would reach chapter 4, where you walk us through building a network that generates anime faces, which I thought was a pretty cool example.
Miko Pawlikowski: Can you give us a taste, for anybody who's going to be doing that? What is the training going to look like? What data are we going to use? How are we going to implement the network? And in terms of training, what kind of hardware do you need for the training to be quick, and how much time do you need?
Miko Pawlikowski: Give us an [00:55:00] idea of whether this is something that someone who is comfortable with Python can just pick up on a Sunday, on a random weekend, and go through, or whether there's any extra prep that's needed.
Mark Liu: In order to train a GAN model to produce color images of anime faces, obviously you need the training data, right? The research community has a lot of human-created data for us to experiment on, so you can actually go to a website and download the anime faces; I think there are tens of thousands of them. Then you need to create two neural networks:
Mark Liu: one is the generator, one is the discriminator, and the generator is trying to create an image that can pass as real in [00:56:00] front of the discriminator. You just train the model many rounds, and eventually you will see that the generator is able to generate an anime face very much like the ones from the training set.
Mark Liu: I want to mention that in order to generate color images of faces, you need to use convolutional neural networks, because, as we mentioned earlier, if you use fully connected, dense neural networks, there are just too many parameters and the training will be too slow.
Mark Liu: On the other hand, if you use convolutional neural networks, you localize the weights: the weights stay the same in a filter, and you move the filter around the image. It's a way to
Mark Liu: greatly reduce the number of parameters in the model and make the model training much faster.
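A DCGAN-style convolutional generator, roughly the kind of architecture a chapter like this builds toward, upsamples a noise vector into a 64x64 color image with transposed convolutions. The channel sizes below are illustrative, not the book's exact architecture:

```python
import torch
import torch.nn as nn

# Noise vector (100 channels, 1x1) -> 64x64 RGB image, doubling resolution per layer.
G = nn.Sequential(
    nn.ConvTranspose2d(100, 256, 4, 1, 0), nn.BatchNorm2d(256), nn.ReLU(),  # -> 4x4
    nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.ReLU(),  # -> 8x8
    nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(),    # -> 16x16
    nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.BatchNorm2d(32), nn.ReLU(),     # -> 32x32
    nn.ConvTranspose2d(32, 3, 4, 2, 1), nn.Tanh(),                          # -> 64x64
)

z = torch.randn(1, 100, 1, 1)   # one random noise vector
fake_face = G(z)                # shape: (1, 3, 64, 64)
```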
Mark Liu: This is on the software side, on [00:57:00] the training side. In terms of hardware, I trained it using a GeForce RTX 2060 GPU. I think right now the cost is three or four hundred bucks; it's not that expensive. You can easily buy one, or if you have an older gaming computer, you can just grab the card and put it in your computer.
Mark Liu: It's very easy to do; you don't really need a lot of knowledge about computer hardware. Nowadays, computers are very user-friendly: you can just pop them open and swap parts very fast, that kind of stuff. It took me 30 minutes to an hour to train the model. So it's very fast.
Mark Liu: However, if you don't want to bother with a GPU, you can train the same model on a CPU. What you can do is simply leave your computer on all night. It may take five, six, or seven [00:58:00] hours, but it can easily be done overnight. You just leave the program running, go to sleep, and the next morning you see the result.
Mark Liu: So in that sense, computationally, it's not that costly.
Mark Liu: I think the most complicated model would be in chapter six, where you have to convert a horse image into a zebra image. It's called CycleGAN, and you also convert blonde hair to black hair in images, or black hair to blonde hair.
Mark Liu: Those kinds of models are a little more time-consuming, because you are using higher resolution, number one. Number two, you are actually training two generators and two discriminators. How CycleGAN works is that you have two generators; let's use a horse and a zebra as the example of how to convert a horse image to a zebra image, [00:59:00] right?
Mark Liu: So you have two generators. One is called the horse generator; the other is called the zebra generator. What the horse generator does is take in a zebra image and convert it into a horse image. And what the zebra generator does is take a horse image and convert it into a zebra.
Mark Liu: You also have two discriminators. The horse discriminator tells whether an image is a horse image or not, and the zebra discriminator tells whether an image is a zebra image or not. And then CycleGAN has another element:
Mark Liu: the loss function has a component called the cycle loss.
Mark Liu: So what do you do? I think the idea is really ingenious; that's why I mentioned that with [01:00:00] the right loss function, you can achieve anything. Originally you have a horse image, right? And then you give that image to the zebra generator to create a fake
Miko Pawlikowski: Zebra image.
Mark Liu: Okay. Now, you use that fake zebra image as input to the horse generator, and ask the horse generator to convert the fake zebra image into a fake horse image. Now here is the key: if both generators do their job right, then
Mark Liu: the fake horse image you get will be identical to the original horse image. So that's called CycleGAN.
Mark Liu: The cycle loss tries to minimize the difference between
Mark Liu: the original horse [01:01:00] image and the fake horse image after a round trip. That's a very powerful tool, because it forces both models, both the zebra generator and the horse generator, to generate realistic images.
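Here is what that round-trip penalty looks like in code; a minimal sketch where G_zebra and G_horse stand in for the two trained generators Mark describes, and the L1 distance is the usual choice for cycle loss:

```python
import torch.nn as nn

l1 = nn.L1Loss()  # pixel-wise L1 distance, the common choice for cycle loss

def cycle_loss(horse_image, G_zebra, G_horse):
    """Penalize the difference between an image and its round-trip reconstruction."""
    fake_zebra = G_zebra(horse_image)        # horse -> fake zebra
    reconstructed = G_horse(fake_zebra)      # fake zebra -> back to a horse
    return l1(reconstructed, horse_image)    # a perfect round trip costs nothing
```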
Mark Liu: Since your show is called HockeyStick: when I was experimenting with different models, I think that was pretty much my hockeystick moment.
Mark Liu: When I saw that, I thought, this cycle loss is really ingenious, because that component in the loss function is crucial for successfully converting a horse image into a zebra and a zebra image into a horse. When I saw it, I was completely amazed, not just by how well the model works, but also by the ingenious [01:02:00] mechanism devised by the researchers. Again, there are tons of smart people in the profession. Sometimes I see what they are doing, and once I understand it, I am completely amazed.
Mark Liu: I said, this method is amazing, the author must be a genius. I think there are tons of geniuses in our profession.
Miko Pawlikowski: Love that story. And also, FYI, I'm totally stealing that quote from you: with the right loss function, you can achieve anything. I think this should go on a t-shirt.
Mark Liu: That's right, yeah. With the right loss function, you can achieve anything. That's my belief; the concept of the loss function is very powerful. The loss function is another way of saying the objective function, right? You are telling the model what to achieve, what to do. It's very powerful.
Miko Pawlikowski: Yeah, I think what keeps striking me is that once you go and look into these ideas, they're [01:03:00] not actually that complicated; there's not too much magic in them. But to come up with the idea initially, to be the first one to propose it, does require a certain level of genius.
Miko Pawlikowski: So I think, probably decades from now, kids will be learning a lot of this stuff in primary school or early in their education. And it just feels like we're really experiencing some kind of breakthrough in this profession, a hockeystick moment.
Mark Liu: Absolutely. It's good that a lot of smart researchers are working in the field. And sometimes when you get stuck on a question, you may work on it for years, right? Without any breakthrough. You struggle along, year after year, and then suddenly there is an aha moment, and you figure out the way to tackle the problem and it works.
Mark Liu: And then the method may become revolutionary; it may [01:04:00] completely change the field.
Miko Pawlikowski: You're about to finish your book. Is there anything that you would do differently if you were starting to write it today? Would you make any different choices?
Mark Liu: Good question. I don't think there are many things I would change. The reason is that even though it's a new book, I have actually been working on it for a couple of years now. I had a GitHub repository before I submitted the proposal to Manning; it's my way of working things out. A couple of years ago I started to use PyTorch for machine learning models and got into generative AI. I started to use PyTorch to generate shapes and images, and then [01:05:00] eventually I got into natural language processing and large language models, and I had a lot of projects on my computer.
Mark Liu: Writing a book is my way of organizing things, of thinking things through to make sure everything works out. But I knew that in order to write a compelling proposal, I needed to prepare well first, right?
Mark Liu: Especially since there are not too many good publishers out there, you only have one shot with a good publisher. Manning is one of the great publishers. Over the years I've read many books from Manning, and I really enjoyed them, and I knew that I needed to write a good proposal to make it work. I didn't want to lose the chance. So [01:06:00] what I did was, in the summer, I spent several months creating a huge GitHub repository. I laid out all the chapters initially, as a first draft; it had 17 chapters, and in each chapter I used a Jupyter notebook to explain everything to the best of my ability. All the code is there. So it's pretty much like a book.
Mark Liu: Once I had that, I spent another month converting it into an actual book, a PDF file. A lot of tech people use LaTeX. LaTeX is a typesetting system: especially if you have a lot of math, you can generate beautiful equations. My book has some equations, some math, but not a whole lot. [01:07:00] But it forced me to go through everything one more time, in the process of converting the GitHub repository into a PDF file. I spent a lot of months converting everything. And it also looks beautiful, because it looks exactly like a book.
Mark Liu: You have a template, you have a cover, you have a table of contents, you have each chapter, with section numbers, section titles, subsections and so forth, and you have images.
Mark Liu: In short, it's pretty much like a book ready to be published. And then I sent that to Manning in the summer: the PDF file, along with the proposal file and a link to the GitHub page. And what Manning did was send the book proposal to more than 10 reviewers in the profession. The [01:08:00] reviewers are all data scientists, people who know AI, and they give comments on whether the book should be published, and they gave a lot of very valuable feedback.
Mark Liu: The feedback was very positive, partly because it's a hot topic, and partly because I spent a lot of time preparing it, right?
Mark Liu: But I did receive a lot of good feedback.
Mark Liu: So to answer your question: because I have been through several rounds already, there's not much I would change, because I have already incorporated the great feedback from about 12 reviewers on the proposal.
Miko Pawlikowski: Fair enough.
Miko Pawlikowski: How many copies have you sold so far?
Mark Liu: It has already sold more than a thousand copies; I think its daily high was 58. That says a lot about the [01:09:00] demand for generative AI. If you look at the top 10 on the Manning website every week, you will see that generative AI is hot; there's a lot of demand. And another trend is PyTorch: a lot of people are switching to PyTorch. There is a book from Manning called "Deep Learning with PyTorch" that is selling very well, and there's another book on building a large language model from scratch. That book also uses PyTorch, just as I do, but it focuses on large language models, while my book covers many different kinds of content: large language models, music, images, shapes, numbers. And another thing I want to mention is that I did spend a lot of time thinking about how [01:10:00] to help
Mark Liu: readers learn progressively, step by step.
Mark Liu: Chapter one, of course, is an overview of the book, of the generative AI landscape and what the book is trying to accomplish. Chapter two is deep learning with PyTorch: even if readers have no background in PyTorch, after reading chapter two they will be able to use PyTorch to create deep learning models, from A to Z, the whole thing. That's very important. And in chapter three we get into GANs. You use GANs to generate numbers and shapes. The models are very simple, with only two or three layers of neurons, so they're very easy to understand, easy to create, and the training takes a matter of minutes. [01:11:00] Readers will not get frustrated, because everything is so simple. And then in chapter four,
Mark Liu: I kick things up a notch:
Mark Liu: instead of using fully connected dense layers, I use convolutional layers, which are needed for image processing. If you want to create high-resolution color images, fully connected dense layers won't really work; they may work, but they're very slow. On the other hand, if you use convolutional layers, it's much faster, because you use filters that move around the image, and you just train the weights in the filter itself.
Mark Liu: So that's much more efficient, and people learn to use convolutional layers in chapter four to generate color images. And then in chapter five, I kick things up another level: [01:12:00] readers learn to select characteristics in images. You can choose to generate an image with eyeglasses or without eyeglasses; you can transition from an image with glasses to an image without glasses.
Mark Liu: So all that arithmetic kind of stuff. Chapter six is not out yet, but it will cover CycleGAN, which is computationally costly, for the reason I just mentioned: you have two generators and two discriminators. And then chapter seven is about variational autoencoders. That's a different model from GANs, and it's important because it has an encoder-decoder architecture, which is very common in machine learning models. For example, ChatGPT is a decoder-only model, while the original transformer paper, "Attention Is All You Need", has an [01:13:00] encoder part and a decoder part.
Mark Liu: And after that, I get into transformers, natural language processing, how to do tokenization, how to create a transformer from scratch, including a ChatGPT-style one: you can create a GPT from scratch and you can train it. I saw that you have several posts on LinkedIn about how to create a GPT from scratch, right?
Mark Liu: My book does exactly that in chapter 10: how to create a GPT from scratch. And then chapter 11 is how to create a small GPT from scratch and train it to generate text. Its focus is not mainly on creating, but on training a GPT from scratch.
Mark Liu: Of course, it's much smaller; it only has 5 million parameters. But you learn how to train a model from scratch. After that it's music generation, then diffusion models, and then how [01:14:00] you can use LangChain to chain together different large language models.
Miko Pawlikowski: So that's the whole book. It's been a real pleasure to talk to you. I'm personally super excited and can't wait until the rest of the chapters become available, so, you know, hurry up! Before I let you go, I'm curious whether you already have the idea for your next book in mind, or whether you're going to take a small break before book number four.
Mark Liu: So far I'm very busy writing the current book, but I do get ideas from time to time. One example: I think this text-to-image, multimodal model thing is amazing. There could be another book there, focused purely on diffusion models and multimodal [01:15:00] transformers: how to convert text to image, or text to video.
Mark Liu: There could be a book there. I've thought about it, but I haven't spent a lot of time on it, because I'm busy writing the current book. The other idea I've thought about is also related to multimodal models. My first book is called Make Python Talk, right?
Mark Liu: But it actually uses the Google API to do the actual speech recognition and text-to-speech; I don't do any of the machine learning part. I just use the Google API to do all the heavy lifting. But there are open-source models out there; you can actually train a model to do speech recognition. That's actually a multimodal model, right?
Mark Liu: Because in speech recognition, basically, the input is audio and the output is text. And then you can also do text-to-[01:16:00]speech. That could be another interesting project.
Mark Liu: I have some ideas on how they work, but I do have to spend a lot of time experimenting. So I would say in another two or three years, I may venture into one of those ideas and maybe write another book about it.
Miko Pawlikowski: Awesome, you're going to have one reader already interested in that, so definitely go for it.
Mark Liu: Okay, let me ask you then: which idea do you like better, the speech recognition model, or a book about text-to-image multimodal transformers?
Miko Pawlikowski: I've been meaning to properly read the Whisper paper, so I think speech recognition is actually a pretty good use case, and I would definitely be interested in reading that.
Mark Liu: Good to know. I may put more emphasis on that project. Thank you for the feedback.
Miko Pawlikowski: Awesome.
Miko Pawlikowski: [01:17:00] All right, thank you so much. It's been a pleasure, and hopefully I'll get you next time with your next book. Thanks a lot.
Mark Liu: Thank you.