#34: Year-end reflection: TrueFoundry
Year-end reflection of our thesis on MLOps
It’s time to reflect on TrueFoundry’s journey over the past year. This reflection isn’t just a celebration of our achievements but also an acknowledgment of the challenges we’ve navigated, appreciation of the opportunities we have been presented with, and the learnings we’ve embraced.
This blog walks through the chronological journey of learnings and realizations, indexed on our thesis on MLOps, and how things played out in reality. It covers:
- GTM experiments to run based on our learnings from working with design partners.
- The hypotheses we validated through customer and prospecting calls.
- The uniformity of LLMOps, MLOps, and DevOps.
Enterprise GenAI and LLMOps with Labhesh Patel (ex-CTO, Jumio Corporation)
In this video, we have Labhesh Patel, ex-CTO at Jumio Corporation, talking about his stint at Jumio and the following topics:
- Challenges related to data management, data quality, and the crucial role of data in machine learning pipelines.
- Generative AI and Visual Q&A, including the use of segmentation maps and attention mechanisms in image-related tasks.
- Labhesh's extensive portfolio of more than 250 research papers and patents.
- Overcoming roadblocks post-implementation with a specific cloud provider.
- Generative AI applications in the identity verification industry.
- Hiring challenges and skillset disparities in ML teams.
- Small Language Models (SLMs) vs. Large Language Models (LLMs).
- Transitioning from very large models to very small models, considering factors like simplicity, efficiency, and latency.
Handpicked Resources on MLOps & LLMs
Below are summaries of some informative conversations from the most popular MLOps communities, along with research papers:
Avoiding the meltdown of a vector DB
Summary: Postgres can act as a vector database using the open-source pgvector extension. A useful pattern when using Postgres as a vector DB is to create a partial index over recent records, for example, records less than 7 days old. This keeps the index small, so similarity search over fresh data stays fast even as older records accumulate.
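To make the pattern concrete, here is a minimal sketch in Python using psycopg2; the items table, its columns, and the 768-dimension embeddings are illustrative assumptions, not from the thread. One practical wrinkle: Postgres requires a partial-index predicate to be immutable, so now() cannot appear in it; instead you bake in a literal cutoff and rebuild the index on a schedule.

```python
# Minimal sketch: Postgres as a vector DB with pgvector, plus a partial
# index over recent records. Table/column names ("items", "embedding",
# "created_at") and the embedding dimension are illustrative assumptions.
import datetime

import psycopg2

conn = psycopg2.connect("dbname=mydb")  # hypothetical connection string
cur = conn.cursor()

cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")
cur.execute("""
    CREATE TABLE IF NOT EXISTS items (
        id         bigserial PRIMARY KEY,
        embedding  vector(768),          -- dimension depends on your model
        created_at timestamptz NOT NULL DEFAULT now()
    );
""")

# The predicate must be immutable, so use a literal cutoff and rebuild
# the index periodically (e.g. from a daily cron job).
cutoff = datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(days=7)
cur.execute("DROP INDEX IF EXISTS items_recent_embedding_idx;")
cur.execute(
    """
    CREATE INDEX items_recent_embedding_idx
    ON items USING ivfflat (embedding vector_cosine_ops)
    WHERE created_at > %s;
    """,
    (cutoff,),
)
conn.commit()

# Similarity search over recent records only; <=> is pgvector's cosine
# distance operator. The WHERE clause matches the index predicate, so the
# small partial index is used instead of scanning the whole table.
query_vec = "[" + ",".join(["0"] * 768) + "]"  # stand-in for a real embedding
cur.execute(
    """
    SELECT id FROM items
    WHERE created_at > %s
    ORDER BY embedding <=> %s::vector
    LIMIT 5;
    """,
    (cutoff, query_vec),
)
print(cur.fetchall())
```

Because psycopg2 interpolates parameters into literal SQL on the client, the same cutoff literal appears in both the index predicate and the query, which lets the planner match the partial index.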
Prototyping with LLMs on AWS
Summary: To reduce inference times for Hugging Face models, you can use tools like vLLM or DeepSpeed. If you are handling concurrent requests, you can provision more GPU cards to support faster inference, but within a single server this eventually hits diminishing returns due to communication overheads. In that case, put your inference servers behind a load balancer and distribute requests using any load-balancing algorithm; throughput should then scale roughly linearly with the number of GPUs.
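As a concrete illustration, here is a minimal vLLM sketch in Python; the model id and sampling settings are assumptions for the example, not from the thread:

```python
# Minimal sketch: batched LLM inference with vLLM. The model id below is
# an illustrative assumption; any Hugging Face model vLLM supports works.
from vllm import LLM, SamplingParams

# vLLM's continuous batching and PagedAttention are where most of the
# speedup over naive per-request generation comes from.
llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2")
params = SamplingParams(temperature=0.7, max_tokens=128)

prompts = [
    "Explain continuous batching in one paragraph.",
    "Why does multi-GPU inference hit diminishing returns?",
]
outputs = llm.generate(prompts, params)
for out in outputs:
    print(out.outputs[0].text)
```

To scale past a single box, you would run one such server per GPU node and put them behind a standard load balancer (on AWS, for example, an ALB), which is exactly the pattern described above.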
Retrieval-Augmented Generation for LLMs: A Survey
Summary: This survey paper explores the state of RAG systems in detail. It covers the modules that can be added to a RAG pipeline: advanced data processing, various indexing techniques, and multiple, iterative, or hierarchical retrieval. It also discusses how to process corpora to obtain the best semantic representation, how to match the semantic representation of a query to that of the retrieved data, and post-processing techniques for retrieved documents.
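For readers new to RAG, here is a minimal sketch of the core retrieval step the survey builds on: embed a corpus, embed the query into the same space, and rank documents by cosine similarity. The embed() function here is a self-contained stand-in for a real embedding model (e.g. a sentence-transformers model) and is not from the paper.

```python
# Minimal sketch of naive RAG retrieval: rank documents by cosine
# similarity to the query, then stuff the top hits into the prompt.
import numpy as np

def embed(texts: list[str]) -> np.ndarray:
    # Placeholder: deterministic pseudo-embeddings so the sketch runs
    # standalone. Swap in a real embedding model in practice.
    rngs = [np.random.default_rng(abs(hash(t)) % (2**32)) for t in texts]
    return np.stack([r.standard_normal(384) for r in rngs])

corpus = [
    "Postgres can serve as a vector database via pgvector.",
    "vLLM speeds up LLM inference with continuous batching.",
    "Partial indexes keep recent-record lookups fast.",
]
doc_vecs = embed(corpus)

def retrieve(query: str, k: int = 2) -> list[str]:
    q = embed([query])[0]
    # Cosine similarity = dot product of L2-normalized vectors.
    sims = (doc_vecs @ q) / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q))
    return [corpus[i] for i in np.argsort(-sims)[:k]]

# Retrieved chunks are then prepended to the prompt sent to the LLM.
context = "\n".join(retrieve("How do I make LLM inference faster?"))
prompt = f"Answer using this context:\n{context}\n\nQuestion: ..."
print(prompt)
```

Everything the survey adds, from advanced indexing to iterative or hierarchical retrieval and post-processing, is elaboration on this basic loop.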
That's all for today!
A brief about TrueFoundry!
Just as a reminder for the new members of our community: TrueFoundry is a comprehensive ML/LLM deployment PaaS, empowering ML teams in enterprises to test and deploy ML models and LLMs with ease while ensuring the following benefits:
- Full security for the infra team.
- 40% lower costs through resource management.
- 90% faster deployments with SRE best practices.
For LLM/GPT-style model deployment, we let users select pre-configured models from our catalog and fine-tune them on their own datasets.