Demystifying GenAI: A Beginner’s Guide to Building Your Own ChatGPT-like System

ChatGPT has given wings to our imagination, at least to those of us in the tech industry. Now that most of us have a fair understanding of GenAI, it's about time we also sought answers to the next obvious question: how can I use GenAI to build something similar to ChatGPT but tailored to my own business need? Though I had a fair understanding of GenAI owing to my experience, I still wanted to double-check that I was on the right track. My basic instinct directed me to Google Search, which certainly gave me great insights. But then my creativity pushed me to ask ChatGPT itself! I was not surprised that it gave relevant high-level points, and I was rather encouraged that there is still a need for us humans to write articles. For all those who have been wanting to use #genai but are not sure where to start or what the best use case is, here is an attempt to demystify it through a theoretical discussion of what it may take to build a product like ChatGPT.

The Key Building Blocks

Large Language Model

One of the key building blocks of ChatGPT is the LLM (Large Language Model). An LLM is a deep learning model, trained on huge sets of data (hence the name "large"), designed to perform NLP (Natural Language Processing) tasks like classifying, generating, and transforming text. ChatGPT launched on the GPT-3.5 model, where GPT stands for Generative Pre-Trained Transformer. There are multiple LLMs currently available in the market, like GPT-4 (now used by ChatGPT), Llama 2, Inflection AI's Pi, MosaicML's MPT, BLOOM, et cetera. But an off-the-shelf LLM alone is not going to meet your needs. Most of these models are well trained to understand and generate text efficiently using publicly available data, but they would not be able to meet the specific requirements of your industry or domain, or your unique business demands.

You are now left with two choices: either create your own LLM from scratch, or use an existing LLM and customize it to meet your business needs.

Building your own LLM

Building your own LLM is going to be a humongous task and would require a large investment as well. If your long-term goal is to build the AI from scratch, training it bottom-up with your own data for reasons like compliance, novelty, research, or privacy, then go ahead and build your own LLM. Following is a high-level summary of what it would take for such an endeavor:

1. Approx. Training Cost – The cost of training an LLM depends on multiple factors. It is primarily driven by the number of parameters, the size of the training data, the choice of GPU, the training precision format, and the GPU utilization achieved. At an approximate level, it takes about 10,000 GPU hours to train a 1-billion-parameter model, and the math is roughly linear. If I were to be conservative and use A100 chips to train a 70B-parameter model on 2,000B tokens using the TF32 training format at approximately 30% GPU utilization, it would take 7,066 GPUs to complete the task in one week. The same can be reduced to 2,756 GPUs for one week with H100 chips. Now, if you did this exercise by renting GPUs on the cloud at, say, $1 per GPU-hour, it would cost you about $1.18 million. And if you managed to get A100s for, say, $10,000 each, buying them would cost you $70.66 million. The final choice between renting and purchasing would depend on how frequently you want to train your model. Another strategy to cut cost is to increase the training time: the math is linear again, so you can cut the hardware requirement to one fourth if you are okay with a training time of four weeks. Please note I have not included overheads like cooling/energy, real estate, or support staff in the calculation above.
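To make the rent-vs-buy arithmetic concrete, here is a minimal Python sketch using the rough assumptions from the paragraph above ($1 per GPU-hour rental, $10,000 per A100, and the ~7,066-GPU/one-week figure); these are illustrative constants, not vendor quotes.

```python
HOURS_PER_WEEK = 24 * 7

def cluster_cost(total_gpu_hours, weeks, rent_per_gpu_hour=1.0, buy_price_per_gpu=10_000):
    """Rent-vs-buy comparison for a fixed training workload.
    Stretching the schedule shrinks the number of GPUs you must buy,
    but the total rental bill (GPU hours) stays the same."""
    gpus = round(total_gpu_hours / (weeks * HOURS_PER_WEEK))
    return {"gpus_needed": gpus,
            "rent_usd": total_gpu_hours * rent_per_gpu_hour,
            "buy_usd": gpus * buy_price_per_gpu}

# The 70B-parameter A100 example from the text: ~7,066 GPUs for one week
gpu_hours_70b = 7_066 * HOURS_PER_WEEK        # ~1.19M GPU hours in total
print(cluster_cost(gpu_hours_70b, weeks=1))   # rent ~$1.19M, buy ~$70.66M
print(cluster_cost(gpu_hours_70b, weeks=4))   # purchase cost drops to ~1/4
```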

2. Data – Data is the most important aspect of all machine learning algorithms. Unfortunately, it is also the one that mostly sits at the bottom of many organizations' priority lists. Ask any data scientist who makes the transition from academia to industry, and he or she will tell you about the nightmare. Data has also been one of the bottlenecks for LLM development, as the neural networks at the backbone of an LLM still can't take raw text as input unless it is converted to numbers.

a. Data Source – A Large Language Model is obviously going to need a large amount of data, but how much is "large" here? To set the context, ChatGPT at release was trained on roughly 500 billion tokens! In GenAI, a token is a chunk of characters (it could be a word, part of a word, a special character, or a symbol) which the LLM reads and generates. In other words, though "Hello, World!" has two words, an LLM may read it as 4 tokens. Though there are many ways in which tokens can be created, as a rule of thumb 100 tokens correspond to about 75 words. By that arithmetic, training data equivalent to 6.25 million books (assuming 200-page books with 60,000 words each) would be needed to build something like ChatGPT. Aha! The internet came to the rescue of ChatGPT, as its main aim was to understand natural language. But who would come to your rescue? Well, you can use the internet too; I am sure you have heard of web crawlers. But remember, it may put you in trouble for copyright or privacy issues if not done cautiously. You still have other options: you can use public datasets like Hugging Face Datasets, or implement a private vector database like Pinecone. Last but not least, you can use GenAI itself to create synthetic data, as that is one of its use cases.
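As a quick sanity check of the book arithmetic above (a minimal sketch; the 500-billion-token figure and the 100-tokens-to-75-words rule are the assumptions stated in the text):

```python
tokens = 500e9                  # tokens ChatGPT was reportedly trained on
words = tokens * 75 / 100       # rule of thumb: 100 tokens ~ 75 words
words_per_book = 60_000         # a 200-page book at ~300 words per page
print(words / words_per_book)   # 6250000.0 -> 6.25 million books
```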

b. Data Preparation – The data collected using one of the options above needs to be examined to ensure your LLM gets superior-quality learning. You can start with removing outliers, imputing missing data, deduplicating to avoid possible bias, et cetera. Based on the purpose for which you are going to use your LLM, you need to carefully examine the data and act accordingly. If you are creating an LLM to be used as a chatbot, for example, then you need to ensure that your training data does not include toxic language, that it presents facts and not fantasy, and that it is as truthful as possible. Last but not least, make sure you leverage as many tools and techniques as possible for data preparation, as you are dealing with large data. A minimal flavor of such cleaning is sketched below.
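To give a flavor of what this looks like in code, here is a minimal cleaning sketch: whitespace normalization, exact-duplicate removal via hashing, and a naive blocklist filter. Real pipelines use fuzzy deduplication (e.g., MinHash) and trained toxicity classifiers; the blocklist below is a hypothetical stand-in.

```python
import hashlib

BLOCKLIST = {"badword1", "badword2"}  # hypothetical stand-in for a real toxicity classifier

def clean_corpus(docs):
    seen, kept = set(), []
    for doc in docs:
        text = " ".join(doc.split())                   # normalize whitespace
        digest = hashlib.sha256(text.lower().encode()).hexdigest()
        if digest in seen:                             # exact-duplicate removal
            continue
        if any(w in BLOCKLIST for w in text.lower().split()):
            continue                                   # crude toxicity filter
        seen.add(digest)
        kept.append(text)
    return kept

docs = ["Hello,  world!", "hello, world!", "clean sentence"]
print(clean_corpus(docs))  # case/whitespace duplicates collapse to one entry
```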

c. Data Transformation – You now need to convert your raw text data into a format your model can understand. As explained above, tokenization is the next step. One of the most used tokenization techniques is BPE (Byte-Pair Encoding), which merges the most frequently occurring character pairs or bytes into single tokens until a certain vocabulary size is reached. There are other methods like SentencePiece and WordPiece too.
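To see BPE tokenization in action, here is a minimal sketch using OpenAI's open-source tiktoken library, which ships the BPE vocabularies used by GPT-family models (the exact ids and token count depend on which encoding you load):

```python
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # a BPE vocabulary used by newer GPT models
ids = enc.encode("Hello, World!")
print(len(ids), ids)                        # 4 tokens, e.g. ids like [9906, 11, 4435, 0]
print([enc.decode([i]) for i in ids])       # ['Hello', ',', ' World', '!']
```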

3. Model Architecture – The Transformer architecture is what everybody uses, so be it. There were two key reasons why the Transformer took the spotlight from previously successful deep learning models and was widely accepted as a more parallelizable model requiring less time to train. First, it shunned the recurrence used in RNNs/LSTMs by using positional encoding: a fixed set of values added to the embeddings which encodes information about the position of each token in a sentence. Second, it relied solely on self-attention to draw global dependencies between input and output, allowing each word to attend to every other word in the sequence in parallel, which helped it capture long-range dependencies better than RNNs/LSTMs and even CNNs. You can choose an encoder-only Transformer (e.g., BERT) if you just want to do classification, or a decoder-only Transformer (e.g., GPT) if you just want to generate text, instead of the full encoder-decoder model used by the Google team for language translation, which marked the birth of the Transformer in 2017. Both ideas are sketched below.
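A minimal NumPy sketch of the two ideas just described, sinusoidal positional encoding and scaled dot-product self-attention (single head, Q = K = V, no learned projections, to keep it short):

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Fixed sinusoidal values added to token embeddings (Vaswani et al., 2017)."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model)[None, :]
    angle = pos / np.power(10000, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angle), np.cos(angle))

def self_attention(x):
    """Scaled dot-product attention with Q = K = V = x."""
    d_k = x.shape[-1]
    scores = x @ x.T / np.sqrt(d_k)                  # every token attends to every token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the sequence
    return weights @ x

seq_len, d_model = 5, 16
x = np.random.randn(seq_len, d_model) + positional_encoding(seq_len, d_model)
print(self_attention(x).shape)  # (5, 16): same shape, globally mixed in parallel
```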

4. Training the LLM – Once you have finalized the data, architecture, and hardware, the training process starts. The first thing to do is split the data into training and test sets, say 80:20, which most people use. Then pick the batch size based on the memory of the chosen GPU. Choose a relevant activation function, like ReLU, based on the architecture selected. Choose an optimizer like Adam to minimize the loss function, and then decide how you want to vary the learning rate over the training period. Finally, to avoid overfitting, use a regularization technique like dropout or weight decay. That should do for someone creating an LLM for learning, but feel free to correct me if I missed something. There are many more things that can be done for better results, but that choice is best made by the person developing the model. A skeletal version of this loop is sketched below.
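Tying those choices together, here is a skeletal PyTorch training loop: an 80:20 split, a ReLU activation, dropout and weight decay for regularization, the Adam optimizer, and a cosine learning-rate schedule. The tiny model and random data are placeholders; a real LLM run distributes this across many GPUs.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset, random_split

# Toy placeholder data; a real run would stream tokenized text
X, y = torch.randn(1000, 64), torch.randint(0, 10, (1000,))
train_set, test_set = random_split(TensorDataset(X, y), [800, 200])  # 80:20 split

model = nn.Sequential(                    # placeholder network, not a real LLM
    nn.Linear(64, 128), nn.ReLU(),        # ReLU activation
    nn.Dropout(0.1),                      # dropout for regularization
    nn.Linear(128, 10),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=0.01)  # Adam + weight decay
sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=10)       # vary LR over training
loss_fn = nn.CrossEntropyLoss()

for epoch in range(10):
    for xb, yb in DataLoader(train_set, batch_size=32, shuffle=True):  # batch size per GPU memory
        opt.zero_grad()
        loss_fn(model(xb), yb).backward()
        opt.step()
    sched.step()
```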

5. Evaluating the LLM – Last but not least comes model evaluation. It's crucial to know how your model performs, not just against its own objective but also against industry benchmarks. Amongst the various options available, you can pick intrinsic metrics like perplexity and the BLEU score. Perplexity measures how well an LLM predicts the next word in a sentence; the lower the better. The BLEU score measures how similar the text generated by an LLM is to a reference text; the higher the better. Next, you can do some market benchmarking using the AI2 Reasoning Challenge (ARC), MMLU, TruthfulQA, and HellaSwag. ARC is a collection of grade-school science questions. MMLU is a comprehensive test that evaluates the multitask accuracy of a text model. TruthfulQA assesses a model's tendency to generate accurate answers and avoid reproducing falsehoods commonly found online. HellaSwag challenges models to make common-sense inferences that are easy for humans, who score above 95% on it.
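Perplexity is simple to compute once you have the model's average per-token cross-entropy on held-out text: it is just the exponential of that loss. A minimal sketch:

```python
import math

def perplexity(avg_cross_entropy_nats):
    """Perplexity = exp(mean negative log-likelihood per token).
    Lower is better: a perplexity of k means the model is, on average,
    as uncertain as if it were choosing uniformly among k words."""
    return math.exp(avg_cross_entropy_nats)

print(perplexity(3.0))  # ~20.1: about as unsure as a 20-way guess per token
```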

Customizing an LLM Available in the Market

By now you have probably understood the reluctance around building your own LLM unless it's really needed. So let's now look at the second option: model customization. Start by picking one of the LLMs available in the market, and ensure you follow the guidelines and license agreement for that model. If you were to visualize it, "it's like hiring a computer science grad from a top institution like the Indian Institute of Technology and then training her or him until cleared to take up production tasks." The duration and nature of the training depend on your business need. The same logic applies when customizing an LLM. The key aspect to remember is the tradeoff between training effort and expected accuracy. If you want to use your model as a chatbot for level-1 customer support, you may not want to spend millions of dollars ensuring that the model always gives the correct answer. Following are some of the key LLM customization options, at a high level, in increasing order of accuracy:

1. Prompt Engineering: This works by experimenting with the prompt sent to an LLM without changing its parameters. The key idea is to ask the question in a better way to get the response you want, say by giving examples to the LLM, so that the model "learns" without training. There are many such techniques available which can be used to teach your LLM about your specific business domain. Let me try to explain a few in my own way (both are shown as literal prompts after this list):

a. Few-Shot Learning: This approach prepends the prompt sent to the LLM with examples of the desired output, to set a context relevant to your business. The easiest way for me to explain this would be: "you take your child to the zoo and point out 'a parrot', 'a crow', and 'a pigeon' one after the other, telling her each time that it's a bird, with the expectation that she will tell you it's a bird when she sees 'a dove' as the fourth object." It is called few-shot learning, where the number of shots is decided by how many birds you show your child.

b. Chain-of-Thought Reasoning: The above approach does not work well when math or logical reasoning is involved. For that there is another technique called chain-of-thought reasoning, wherein the model is guided to work the way humans do, by breaking the problem into smaller steps. Instead of prepending the prompt with "The baker had 12 apples, made a pie with 8 of them, and bought 6 more, so they are left with 10 apples", you prepend it with "The baker had 12 apples and made a pie with 8 of them. So they were left with 12 - 8 = 4 apples. Then they bought 6 more, so they now have 4 + 6 = 10 apples"; the model then follows the same step-by-step pattern.
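Here is what both techniques look like as literal prompt strings (a minimal sketch; the labels and the second question are made-up illustrations):

```python
few_shot_prompt = """Classify the animal.
Parrot -> bird
Crow -> bird
Pigeon -> bird
Dove ->"""  # few-shot: the model infers the pattern and should answer "bird"

chain_of_thought_prompt = """Q: The baker had 12 apples, used 8 for a pie, then bought 6 more.
How many apples are left?
A: They started with 12 and used 8, so 12 - 8 = 4 remained.
They bought 6 more, so 4 + 6 = 10. The answer is 10.
Q: The grocer had 15 oranges, sold 9, then bought 4 more. How many are left?
A:"""  # the worked example nudges the model to reason step by step

# Either string is sent to the LLM as-is; no model parameters are changed.
```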

2. Prompt Learning: The description above should give you a fair idea that prompt engineering, though creative, can only be used for limited tasks without many variables. On top of that, even though automation can help, creating such prompt modifiers remains labor intensive. In prompt learning, instead of prepending the prompt with text for better context, you prepend it with virtual token embeddings. This is achieved by using a small trainable model in front of the frozen LLM, ensuring that problems like catastrophic forgetting (the LLM losing its foundational learning) do not occur. This trainable model can be an LSTM or MLP, and you can even save the virtual token embeddings in a prompt table for future use. A minimal sketch follows.
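A minimal PyTorch sketch of the idea: a small set of trainable virtual-token embeddings is prepended to the input embeddings of a frozen model. The dummy one-layer "LLM" and the dimensions are placeholders for illustration.

```python
import torch
import torch.nn as nn

class PromptTuningWrapper(nn.Module):
    """Prepends trainable virtual-token embeddings; the LLM itself stays frozen."""
    def __init__(self, frozen_llm, d_model, num_virtual_tokens=20):
        super().__init__()
        self.llm = frozen_llm
        for p in self.llm.parameters():
            p.requires_grad = False            # frozen base: no catastrophic forgetting
        self.soft_prompt = nn.Parameter(torch.randn(num_virtual_tokens, d_model) * 0.02)

    def forward(self, input_embeds):           # (batch, seq, d_model)
        batch = input_embeds.size(0)
        prompt = self.soft_prompt.unsqueeze(0).expand(batch, -1, -1)
        return self.llm(torch.cat([prompt, input_embeds], dim=1))

d_model = 32
dummy_llm = nn.Sequential(nn.Linear(d_model, d_model))  # stand-in for a frozen LLM
tuner = PromptTuningWrapper(dummy_llm, d_model)
print(tuner(torch.randn(2, 10, d_model)).shape)  # (2, 30, 32): 20 virtual + 10 real tokens
# Only soft_prompt (20 x d_model numbers) receives gradients during tuning.
```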

3. Parameter-Efficient Fine-Tuning (PEFT): So far we saw how to add virtual prompts to the input without touching the LLM. Here, along similar lines, we add trainable layers, not in the prompt but in the LLM architecture itself (a LoRA sketch follows this list).

a. Adapter Learning: Adds small feed-forward layers between the layers of the core Transformer architecture. Only these layers are trained at fine-tuning time for specific downstream tasks. The adapter modules use near-identity initialization to ensure that the original network is unaffected when training starts.

b. LoRA: Low-Rank Adaptation, or LoRA, freezes the pre-trained model weights and injects trainable rank-decomposition matrices into each layer of the Transformer architecture, reducing the number of trainable parameters for downstream tasks. Besides being efficient, many LoRA modules can be built for different tasks on the same LLM, and different objectives can be achieved by just switching the matrices.
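A minimal sketch of the LoRA idea for a single linear layer: the pre-trained weight is frozen, and the update is factored into two small matrices of rank r, so only r x (d_in + d_out) parameters train. The layer size and rank below are illustrative, and the usual scaling factor alpha is omitted for brevity.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base weight plus a trainable low-rank update B @ A."""
    def __init__(self, base: nn.Linear, rank=8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False           # freeze the pre-trained weights
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init: no change at start

    def forward(self, x):
        return self.base(x) + x @ self.A.T @ self.B.T  # low-rank delta, swappable per task

layer = LoRALinear(nn.Linear(4096, 4096), rank=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 65,536 trainable vs ~16.8M frozen parameters
```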

4. Fine-Tuning: If data and resources are not a constraint, then this is the most effective approach.

a. SFT: Supervised Fine-Tuning tunes all the parameters of the model using an instruction dataset, which is essentially labelled input-output pairs. It teaches the model domain-specific terms and how to follow user-specified instructions.

b. RLHF: Reinforcement Learning from Human Feedback is the most accurate customization technique amongst all described here. Its first stage starts with the SFT described above. The second stage involves training a separate 'Reward Model' (RM) using feedback provided by humans: responses from the SFT model are rated by humans as desirable or not, which of course is labor intensive, and those preferences become the training signal for the RM. The final step involves fine-tuning the original policy model against the RM using reinforcement learning with PPO (Proximal Policy Optimization). The process is controlled using a KL-divergence term which penalizes the RL policy for moving substantially away from the original model; without it, the policy can drift into generating gibberish text that fools the reward model.
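The KL-controlled reward used in the RL stage fits in a few lines. A minimal sketch (the reward-model score and per-token log-probabilities are placeholder tensors; real implementations fold this into the PPO update):

```python
import torch

def rlhf_reward(rm_score, logp_policy, logp_ref, beta=0.02):
    """Reward-model score minus a KL penalty that keeps the RL policy
    close to the original SFT model, discouraging reward hacking via gibberish."""
    kl_per_token = logp_policy - logp_ref           # per-token KL contribution (approx.)
    return rm_score - beta * kl_per_token.sum()

# Placeholder values for one sampled response of 5 tokens
rm_score = torch.tensor(1.3)                        # reward model's scalar judgement
logp_policy = torch.tensor([-1.2, -0.8, -2.0, -0.5, -1.1])
logp_ref = torch.tensor([-1.0, -0.9, -2.2, -0.6, -1.4])
print(rlhf_reward(rm_score, logp_policy, logp_ref))  # 1.29: small penalty for drifting
```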

Data

I have covered data in detail as part of the LLM section above, but I would still caution you to ensure that you have the right volume and quality of data. Remember: "Garbage In, Garbage Out"!

Hardware Stack

As you will have understood by now, building something like ChatGPT requires a large investment in infrastructure. You would need a scalable, modular, high-performance infrastructure that can manage resource-intensive tasks like training an LLM from scratch.

1. GPU: At the core you would need GPUs, and since GPUs do not come cheap, it is worth spending time understanding exactly what your requirement is. The key components of a GPU to look at, in order of importance, are the cores, then the memory bandwidth, followed by the cache hierarchy, and finally FLOPS. Since the most expensive part of any deep learning algorithm is matrix multiplication, it is recommended to have a GPU with Tensor Cores. One key aspect to keep in mind is that GPUs are faster than CPUs for this workload because they are efficient at matrix multiplications and convolutions, and the main reason for that is memory bandwidth, not just parallelism. A 1 GHz processor can do 10^9 cycles per second, but that does not mean it completes that many useful computations, because of the latency of accessing global memory and caches, and of fused operations like a*b+c. Now, if we wanted to do a 32x32 matrix multiplication, it would translate to 64 Tensor Core operations which can be completed in about 235 cycles (200 cycles of latency for global memory access, 34 cycles for shared memory, and 1 cycle for the Tensor Core), assuming we use 8 SMs (Streaming Multiprocessors) with 8 Tensor Cores each. It is clearly evident from this example that the Tensor Cores are so fast that they sit idle most of the time, waiting for data to arrive from global memory. On average, Tensor Core TFLOPS utilization for LLMs is around 50% as of today, which suggests that ultimately the memory bandwidth decides how fast a GPU is. So look for a GPU with high memory bandwidth; bidirectional is even better. Since memory transfer to the Tensor Cores is the limiting factor in GPU performance, the key to performance tuning lies in the other GPU attributes, like the memory hierarchy, which ranges from slow global memory, to faster L2 cache, to fast local shared memory, to lightning-fast registers. If the same 32x32 matrix is chunked into smaller pieces, called tiles, we gain efficiency by loading those chunks into fast memory once, cutting down on repeated global-memory accesses. A TPU utilizes similar logic, using larger tiles per matrix unit, which makes it more efficient than a GPU at this. The tiling idea is sketched below.
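The tiling idea is easy to demonstrate in plain NumPy: multiply block by block so each tile can stay resident in fast memory instead of being re-fetched from slow global memory. A minimal sketch (tile size and matrix size are illustrative):

```python
import numpy as np

def tiled_matmul(A, B, tile=32):
    """Blocked matrix multiply: process one tile at a time so each tile
    can stay resident in fast memory (L2/shared) instead of being
    re-fetched from slow global memory on every access."""
    n = A.shape[0]
    C = np.zeros((n, n), dtype=A.dtype)
    for i in range(0, n, tile):
        for j in range(0, n, tile):
            for k in range(0, n, tile):
                # One tile-sized chunk of work; on a GPU this block would be
                # loaded into shared memory once and reused across the inner loop.
                C[i:i+tile, j:j+tile] += A[i:i+tile, k:k+tile] @ B[k:k+tile, j:j+tile]
    return C

A = np.random.rand(128, 128).astype(np.float32)
B = np.random.rand(128, 128).astype(np.float32)
assert np.allclose(tiled_matmul(A, B), A @ B, atol=1e-3)  # same result, tile by tile
```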

2. The Fabric: By now it should not be difficult to understand that parallelism is the key to working with LLMs, even with powerful GPUs like the H100 and GH200 in the market. So the next most critical thing to understand is the fabric underlying this parallel compute model, especially when you need to decide how many GPUs to pack in a server. Let us assume that the model parameters and data are distributed across all the GPUs in a multi-server cluster and that training happens in parallel. The first component of the underlying fabric is the GPU-to-GPU connection; the technologies available in this segment are NVLink and PCIe Gen 5. A single H100 Tensor Core GPU supports up to 18 NVLink connections for a total bandwidth of 900 gigabytes per second (GB/s), over 7x the bandwidth of PCIe Gen 5. Next comes the connection between the various nodes in a cluster. The third generation of NVSwitch builds on the communication capability of NVLink to deliver higher bandwidth and reduced latency for compute-intensive workloads; to enable high-speed collective operations, each NVSwitch has 64 NVLink ports equipped with SHARP engines for in-network reductions and multicast acceleration. There are other technologies like InfiniBand, RoCE lossless Ethernet, and DDC fully scheduled networks, which I will not discuss here but would encourage readers to study for more clarity.

3. Storage: Another key consideration is storage. When we talk about an LLM with 70B parameters and 2,000B tokens, it's obvious that we need storage to support such an endeavor. The key aspects of storage to consider are security, availability, resilience, scalability, performance, and simplicity. Though object stores fit these criteria best, SAN (Storage Area Network) and DFS (Distributed File System) storage are viable as well. There are many all-flash NVMe storage products available in the market, along with S3-compatible object storage, which can be used.

It would be out of scope for this article to provide an end-to-end view of the infrastructure across the tech stack, but the few key components specific to dealing with LLMs have been discussed above. Beyond those, a very important component worth attention is energy consumption and cooling, as training an LLM requires huge computation which both consumes more energy and heats up the GPUs.

Software Stack

Once you have chosen and set up the hardware stack, the next thing to do is set up the infrastructure management suite. The choice depends on how you want to set up the platform: bare metal vs. VMs, whether you want to use Kubernetes, et cetera. The idea is to ensure that infrastructure can be provisioned on demand and that you have high availability as well. Next in line is the model-serving framework, where you can choose from options like Triton Inference Server, SageMaker Neo, Cloud AI Platform, et cetera. This choice also depends on the deep learning framework you select, like PyTorch or TensorFlow. Last but not least, you need an observability framework to monitor your entire tech stack, from the hardware to the model that you build. The idea here is to ensure you are in complete control.

End User Interface

Before we proceed further, let's quickly analyze what made ChatGPT so successful. Amongst the various success factors, I think simplicity was the biggest contributor. Its simple user interface, made available to the public in a way the common man could understand, really stood out; man got his best virtual friend! Of course, at the core was another crucial factor: an AI which truly speaks your language, which understands context, nuance, and even humor. There were similar applications in the market, some even with the same GPT-3.5 at the core as ChatGPT, like Jasper AI, Chatsonic, and Rytr, but none of them had the user-friendliness ChatGPT showcased. And last but not least, the tried and tested marketing strategy of making ChatGPT available for free, with apps for both Android and iPhone, also played a key role in its success. That should summarize this section: no matter how complex or advanced the technology behind your creation, your success depends on presenting the product to its end user in the simplest possible form. You can design a web page using React, you can use a Grafana or similar dashboard, you can use UI/UX tools like Figma or Miro, or whatever else; the main thing to consider is how easily the end user can use it to achieve his or her business objective.

Things to Remember

The model of ChatGPT which first came out had many limitations as well, some of which are being fixed with subsequent releases. Before we end this discussion, it is worth highlighting a few that need to be kept in mind:

1. It does not guarantee the accuracy of the information it provides. Though this is a well-known fact, I would still remind you of it and suggest not using any LLM without supervision for use cases where model output can directly impact human life.

2. The free version of ChatGPT (GPT-3.5) was last trained/updated in January 2022, as per answers from ChatGPT itself. This suggests that you need to decide how frequently to retrain your model with the latest information, based on the business use case.

3. It can provide biased information, as it was built using web scraping. That is why we discussed the accuracy and diversity of the data used for training.

4.????? It can’t ask clarifying question and serves best either by the user asking the right question or by using prompt engineering. If you are planning to build your LLM from scratch, then this is one design aspect you may want to include.

5. It gives different answers to the same question when asked multiple times. Though that was one of the objectives, to make it sound original like humans, it's something to keep in mind while deciding the use case.

6.????? It can’t access Internet or any such information/ data in real time. This could be one of the improvement areas for you to think of while building your own product.

Other Key Considerations

Whenever you build a technology as disruptive as ChatGPT, there are a lot of non-technical considerations to keep in mind. I do not want to deviate from the topic of this article and go into human psychology, but allow me an analogy: "Though there are no guarantees, to a considerable extent the upbringing of a child decides what she or he becomes as an adult. There might be situations where things go out of control, but for the most part the upbringing can still be carefully monitored. An AI is very similar to a child and learns the same way children do." While we are taking baby steps in this era of unsupervised learning, it's high time we lay the moral foundations for AI to ensure a better future for the upcoming generations. I agree it's foolish to think that we as humans can control everything around us, but it would be worse if we did not even attempt to put some controls in place.

An AI system like ChatGPT should be taught the value system of our lives. Checkpoints and circuit breakers should be put in place at all possible avenues, including data, user interface, models, hardware, and power supply. An AI system:

1. Should diligently do what it is created for, e.g., 'help' if it's a chatbot: it should ask follow-up questions if the user is not clear, courteously deny unethical requests, avoid foul language, honestly inform the user when it does not have the right answer instead of fabricating things, present data and facts rather than fantasies, avoid bias, et cetera.

2. Should not deviate from its purpose: it should play the role it is designed to play and not assume an identity it doesn't have, it should not claim abilities it does not possess, it should not give political answers, it should not manipulate user sentiment, et cetera.

3. Should have the right observability: the AI system should never be put into production without thorough testing, it should be continuously monitored for any deviation in its behavior, and it should have an autonomous fail-safe mechanism in place.

4. Should be protected from bad influences, manipulators, and control freaks, and especially from falling into the wrong hands. It would be worth your time to check how adversarial training is helping ChatGPT handle situations like jailbreaking.

Conclusion

Building something like ChatGPT is an ambitious task which should not be taken up without a thorough understanding of all the distinct aspects. The success of any such endeavor depends on carefully choosing the right hardware and software stack, selecting the right quality and quantity of training data, testing thoroughly, fine-tuning the end product, and, last but not least, presenting it to the end user in the simplest possible form. The good news is that hardware and software vendors are already working together to make it easy for the masses to benefit from #genai technologies. As we move forward in this journey, let's be mindful to ensure that the required safety nets are also put in place for these disruptive technologies. The choice of whether to go ahead would of course depend on your personal or your organization's vision. Best of luck!


Disclaimer: The views shared here are my personal opinion and not my organization's. This write-up was prepared in a personal capacity based on experience and publicly available data; it should not be considered professional advice, and there are no guarantees of its completeness, accuracy, or usefulness.

References:

1. https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf

2. https://developer.nvidia.com/blog/selecting-large-language-model-customization-techniques/

3. https://timdettmers.com/2023/01/30/which-gpu-for-deep-learning/

4. https://www.dhirubhai.net/pulse/gpu-fabrics-genai-workloads-sharada-yeluri-j8ghc/

5. https://huggingface.co/blog/rlhf

6. https://www.promptingguide.ai/
