The Creative, Occasionally Messy World of Textual Data

For several years, the intersection of text and data stayed (more or less) within the realm of natural language processing (NLP): the wide range of machine learning tasks that leverage textual data for prediction, classification, and recommendation tools.

The rise of large language models has introduced a host of exciting new possibilities into the field, with novel use cases and innovative workflows popping up at a rapid clip. Our highlights this week represent a wide cross-section of concepts and approaches that dig deeper into this emerging area. From prompt engineering to text-to-image and text-to-speech applications, we’re thrilled to share work by authors who explore the creative possibilities of textual data as both inputs and outputs of these powerful models. Let’s dive in.

  • Lost in DALL-E 3 Translation. What happens when you use text-to-image tools like DALL-E 3 in languages other than English? Yennie Jun continues to explore the discrepancies in model performance for users working in under-resourced languages and the ways in which gender and other biases seep into the generated images.
  • How to Convert Any Text Into a Graph of Concepts. In his latest post, Rahul Nayak dives deep into the world of Knowledge-Graph Augmented Generation, walking us through the process of transforming a text corpus into a Graph of Concepts (GC) and then visualizing it to detect patterns and draw meaningful insights.
  • RAG: How to Talk to Your Data. We’ve covered retrieval-augmented generation many times in recent months, but Mariya Mansurova’s addition to the conversation is still very much worth your time: it presents a compelling, practical workflow for analyzing customer feedback using ChatGPT (a minimal sketch of the retrieval pattern follows this list).
  • FastSpeech: Paper Overview & Implementation. Text-to-speech tools have made major strides in recent years. To gain a solid understanding of how they work and how transformers are employed to improve their performance, don’t miss Essam Wisam’s accessible introduction to the FastSpeech paper from 2019, which facilitated much of the progress we’ve seen in this domain.
  • Unlocking the Power of Text Data with LLMs. If you’re a beginner who’d like to start experimenting with cutting-edge text-data techniques, Sofia Rosa’s step-by-step guide will get you rolling up your sleeves in no time. It walks us through an entire workflow, from downloading data to working with GPT-3 and analyzing results.
  • A Universal Roadmap for Prompt Engineering: The Contextual Scaffolds Framework (CSF). Prompt engineering has emerged as a crucial component in the interplay between human intuition and large language models’ capabilities. Giuseppe Scalamogna goes beyond basic prompting tips and tricks to present the contextual scaffolds framework (CSF), a “general purpose mental model for effective prompt engineering.”
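
For readers who want to see the retrieval-augmented pattern in miniature before diving into the articles above, here is a rough sketch: embed a few feedback snippets, retrieve the ones most similar to a question, and hand them to a chat model as context. The model names, the toy feedback strings, and the embed/answer helpers are illustrative assumptions, not code from any of the posts.

```python
# A minimal retrieval-augmented generation (RAG) loop over customer feedback.
# Model names and the toy feedback snippets are illustrative assumptions.
import numpy as np
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

feedback = [
    "The checkout page keeps timing out on mobile.",
    "Love the new dashboard, but exports are slow.",
    "Support resolved my billing issue within a day.",
]

def embed(texts):
    """Return one embedding vector per input string."""
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

doc_vectors = embed(feedback)

def answer(question, k=2):
    """Retrieve the k most similar snippets and ask the model to answer from them."""
    q_vec = embed([question])[0]
    # Cosine similarity between the question and every feedback snippet.
    sims = doc_vectors @ q_vec / (
        np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q_vec)
    )
    context = "\n".join(feedback[i] for i in sims.argsort()[::-1][:k])
    chat = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Answer using only the provided customer feedback."},
            {"role": "user", "content": f"Feedback:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return chat.choices[0].message.content

print(answer("What are users complaining about?"))
```

In a real workflow the retrieval step would run against a proper vector store rather than an in-memory array, but the shape of the loop (embed, retrieve, prompt) stays the same.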


We hope you have some time to branch out into other topics this week; here are some of our recent standouts on data visualization, generated-content detection, and more:


Thank you for supporting the work of our authors! If you enjoy the articles you read on TDS, consider becoming a Medium member; it unlocks our entire archive (and every other post on Medium, too).

Until the next Variable,

TDS Editors
