Data Phoenix Digest - ISSUE 6.2023
Dmytro Spodarets
DevOps Architect @ Grid Dynamics | Founder of Data Phoenix - The voice of the AI and Data industry
Hey folks,
I'm excited to share that the Data Phoenix Digest is back on a weekly schedule after a short break. We're turning Data Phoenix into an educational project and community focused on Data & AI.
We're going to change how we do our weekly updates a little, too. We will make sure you know about all the latest news and events in our community so you can get involved. Plus, we will share key insights from the best research papers, articles, and news, helping you keep up with what's new and learn more in the Data & AI field!
Be active in our community: join our Slack to discuss the latest news, top research papers, articles, events, jobs, and more.
Want to promote your company, conference, job, or event to the Data Phoenix community of Data & AI researchers and engineers? Click here for details.
Data Phoenix community news
Upcoming events:
Video recordings of past events:
Featured Article
Get ready for some thrilling updates on our Slack! By becoming a member, you'll be entered for a chance to win one of three copies of the book "Experimentation for Engineers." But that's not all! As a special bonus, we're also offering an exclusive 35% discount code (bldataphoenix23) on all Manning Publications products in any format. Don't let this incredible opportunity slip away!
Summary of the top papers and articles
Articles
This article explains how to implement a GPT from scratch in just 60 lines of NumPy. The trained GPT-2 model is then tried out in practice to generate text. If you are looking for a simple, educational introduction to GPT, this article is for you.
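To give a feel for what the article builds, here is a minimal sketch (our own illustration, not the article's code) of a single-head causal self-attention step in NumPy; all names and shapes are ours.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def causal_self_attention(x, w_qkv, w_out):
    # x: [seq_len, d_model]; w_qkv: [d_model, 3*d_model]; w_out: [d_model, d_model]
    seq_len, d_model = x.shape
    q, k, v = np.split(x @ w_qkv, 3, axis=-1)
    # Causal mask: position i may only attend to positions <= i.
    mask = (1 - np.tri(seq_len)) * -1e10
    scores = q @ k.T / np.sqrt(d_model) + mask
    return softmax(scores) @ v @ w_out

# Toy usage with random weights.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))
out = causal_self_attention(x, rng.normal(size=(8, 24)), rng.normal(size=(8, 8)))
```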
Koala is a chatbot trained by fine-tuning Meta’s LLaMA on dialogue data gathered from the web. This article describes the dataset curation and training process of the model, and also presents the results of a user study that compares the model to ChatGPT and Alpaca.
The goal of Spotify's ML Platform is to create a seamless user experience for AI/ML practitioners who want to take an ML application from development to production. In this article, you can find a comprehensive deep dive into how this ML Platform works. Check it out!
This article delves into the steps involved in training a LLaMA model to answer questions on Stack Exchange with RLHF, walking through Supervised Fine-Tuning (SFT), Reward/preference Modeling (RM), and Reinforcement Learning from Human Feedback (RLHF). Check it out!
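The RM stage commonly trains on pairs of ranked answers; below is a minimal NumPy sketch of the standard pairwise preference loss (our illustration, not code from the article).

```python
import numpy as np

def reward_model_loss(r_chosen, r_rejected):
    # Pairwise preference loss for the RM step:
    # -log sigmoid(r_chosen - r_rejected) == logaddexp(0, -(r_chosen - r_rejected)).
    # Pushes the reward of the human-preferred answer above the rejected one.
    return np.logaddexp(0.0, -(r_chosen - r_rejected)).mean()

# Toy usage: scalar rewards the model assigned to preferred vs. rejected answers.
loss = reward_model_loss(np.array([1.2, 0.3]), np.array([0.4, 0.9]))
```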
LLMs have made a significant impact in the field of AI, but most companies currently lack the ability to train these models themselves, relying instead on a few major tech firms as providers. Replit has made significant investments in developing the infrastructure necessary to train their own LLMs from scratch. In this blog post, they explain how they did it.
Papers & projects
SegGPT is a generalist model for segmenting everything in context. It can perform arbitrary segmentation tasks in images or videos via in-context inference, such as object instance, stuff, part, contour, and text. SegGPT is evaluated on a broad range of tasks.
This paper offers a somewhat comprehensive but simple catalog and classification of the most popular Transformer models. The paper also includes an introduction to the most important aspects of and innovations in Transformer models.
In this paper, the authors derive a taxonomy of graph transformer architectures, bringing some order to this emerging field. They overview their theoretical properties, survey structural and positional encodings, and discuss extensions for important graph classes.
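One of the surveyed ingredients, Laplacian eigenvector positional encodings, is easy to sketch; the function below is our minimal illustration, not code from the paper.

```python
import numpy as np

def laplacian_positional_encoding(adj, k):
    # adj: [n, n] symmetric adjacency matrix (no isolated nodes); returns [n, k].
    deg = adj.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # Symmetric normalized Laplacian: L = I - D^{-1/2} A D^{-1/2}.
    lap = np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    # Eigenvectors of the k smallest non-trivial eigenvalues act as node "positions".
    _, eigvecs = np.linalg.eigh(lap)
    return eigvecs[:, 1:k + 1]

# Toy usage: a 4-node cycle graph.
adj = np.array([[0, 1, 0, 1],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [1, 0, 1, 0]], dtype=float)
pe = laplacian_positional_encoding(adj, k=2)
```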
In this paper, the authors propose embodied language models to directly incorporate real-world continuous sensor modalities into language models and thereby establish the link between words and percepts. Learn how their new approach plays out!
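In spirit, the approach maps continuous observations into the same embedding space as language tokens; here is a toy sketch of that interface (our own simplification, and every name in it is illustrative).

```python
import numpy as np

def embody_observation(sensor_vec, w_proj, token_embs):
    # Project a continuous sensor reading into the LM's token-embedding
    # space and prepend it to the word embeddings, so the transformer
    # consumes percepts and words in a single sequence.
    obs_emb = sensor_vec @ w_proj            # [d_model]
    return np.vstack([obs_emb, token_embs])  # [1 + seq_len, d_model]

# Toy usage: a 16-dim sensor reading, 8-dim model, 5 word embeddings.
rng = np.random.default_rng(0)
seq = embody_observation(rng.normal(size=16),
                         rng.normal(size=(16, 8)),
                         rng.normal(size=(5, 8)))
```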
In this paper, the authors propose a method for removing the mask-annotation requirement in Video Instance Segmentation (VIS). MaskFreeVIS achieves highly competitive VIS performance while using only bounding box annotations for the object state. The Temporal KNN-patch Loss (TK-Loss) provides strong mask supervision without any mask labels.
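For intuition only, here is a drastically simplified 1-nearest-neighbor version of the temporal consistency idea behind TK-Loss (the actual loss uses top-K patch matching and other details from the paper; all names here are ours).

```python
import numpy as np

def temporal_patch_consistency(feat_t, feat_t1, mask_t, mask_t1):
    # Match every patch in frame t to its nearest patch (by appearance) in
    # frame t+1, then penalize mask disagreement at the matched patches.
    # feat_*: [n_patches, d] appearance features; mask_*: [n_patches] probs in [0, 1].
    dists = ((feat_t[:, None, :] - feat_t1[None, :, :]) ** 2).sum(axis=-1)
    nn = dists.argmin(axis=1)
    return np.mean((mask_t - mask_t1[nn]) ** 2)
```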
CodeT5+ is a family of encoder-decoder LLMs for code in which component modules can be flexibly combined to suit downstream code tasks. Such flexibility is enabled by a mixture of pretraining objectives to mitigate the pretrain-finetune discrepancy. These objectives cover span denoising, contrastive learning, text-code matching, and causal LM pre-training tasks.
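Of those objectives, span denoising is the easiest to picture; the snippet below is a minimal, self-contained sketch of the corruption step (our illustration, not CodeT5+ code).

```python
import random

def span_corrupt(tokens, span_len=3, corrupt_rate=0.15):
    # T5-style span denoising: cut random spans out of the input, replace
    # each with a sentinel token, and train the decoder to emit the spans.
    n_spans = max(1, int(len(tokens) * corrupt_rate / span_len))
    starts = sorted(random.sample(range(len(tokens) - span_len), n_spans))
    src, tgt, prev = [], [], 0
    for i, s in enumerate(starts):
        if s < prev:          # drop spans that would overlap the previous one
            continue
        sentinel = f"<extra_id_{i}>"
        src += tokens[prev:s] + [sentinel]
        tgt += [sentinel] + tokens[s:s + span_len]
        prev = s + span_len
    src += tokens[prev:]
    return src, tgt

# Toy usage on a whitespace-tokenized code snippet.
random.seed(0)
src, tgt = span_corrupt("def add ( a , b ) : return a + b".split())
```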
If you enjoy our work, we would greatly appreciate your support by sharing our digest with your friends on Twitter, LinkedIn, or Facebook using the hashtag #dataphoenix. Your help in reaching a wider audience is invaluable to us!