Text Summarization With Deep Learning; LoftQ: LoRA-Fine-Tuning-Aware Quantization; Generalizable CoT; Midjourney 6 vs. DALL-E 3; and More.
Danny Butvinik
Chief Data Scientist | 100K+ Followers | FinCrime | Writer | Author of AI Vanguard Newsletter
Editor's Paper Recommendations
Surveying the Landscape of Text Summarization with Deep Learning: A Comprehensive Review: In recent years, deep learning has revolutionized natural language processing (NLP) by enabling models that learn complex representations of language data, leading to significant performance improvements across a wide range of NLP tasks. Deep learning models for NLP typically train deep neural networks on large amounts of data, allowing them to learn the patterns and relationships in language. This contrasts with traditional NLP approaches, which rely on hand-engineered features and rules. The ability of deep neural networks to learn hierarchical representations of language, handle variable-length input sequences, and perform well on large datasets makes them well suited for NLP applications. Text summarization has become a crucial research area in NLP due to the exponential growth of textual data and the rising demand for condensed, coherent, and informative summaries. Applying deep learning to text summarization means using deep neural networks to perform summarization tasks. This survey reviews the text summarization tasks that have been popular in recent years, including extractive, abstractive, and multi-document summarization. Next, we discuss the most prominent deep learning-based models and their experimental results on these tasks. The paper also covers datasets and data representations for summarization tasks. Finally, we delve into the opportunities and challenges associated with summarization tasks and their corresponding methodologies, aiming to inspire future research efforts to advance the field. Our survey explains how these methods differ in their requirements, as understanding them is essential for choosing a technique suited to a specific setting.
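To make the extractive/abstractive distinction above concrete, here is a minimal, hypothetical sketch (not from the survey): the extractive function simply copies the highest-scoring sentences from the source, while the abstractive step uses a public seq2seq checkpoint (facebook/bart-large-cnn via Hugging Face transformers, an illustrative choice) to generate new phrasing.

```python
# Minimal sketch contrasting extractive and abstractive summarization.
# Assumes the `transformers` package and the public "facebook/bart-large-cnn"
# checkpoint; both are illustrative choices, not the survey's own code.
from collections import Counter
from transformers import pipeline

def extractive_summary(text: str, k: int = 2) -> str:
    """Score sentences by word frequency and keep the top-k (extractive)."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    freqs = Counter(w.lower() for s in sentences for w in s.split())
    ranked = sorted(sentences, key=lambda s: -sum(freqs[w.lower()] for w in s.split()))
    return ". ".join(ranked[:k]) + "."

# Abstractive: a seq2seq model generates new wording rather than copying sentences.
abstractive = pipeline("summarization", model="facebook/bart-large-cnn")

document = (
    "Deep learning has reshaped natural language processing. "
    "Neural models learn hierarchical representations from large corpora. "
    "Summarization systems condense long documents into short, informative texts."
)
print(extractive_summary(document))
print(abstractive(document, max_length=60, min_length=10, do_sample=False)[0]["summary_text"])
```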
LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models: Quantization is an indispensable technique for serving Large Language Models (LLMs) and has recently found its way into LoRA fine-tuning. This work focuses on the scenario where quantization and LoRA fine-tuning are applied together to a pre-trained model. In such cases, there is typically a consistent gap in downstream-task performance between full fine-tuning and the quantization-plus-LoRA approach. In response, we propose LoftQ (LoRA-Fine-Tuning-aware Quantization), a novel quantization framework that simultaneously quantizes an LLM and finds a suitable low-rank initialization for LoRA fine-tuning. This initialization narrows the discrepancy between the quantized and full-precision models and significantly improves generalization on downstream tasks. We evaluate our method on natural language understanding, question answering, summarization, and generation tasks. Experiments show that our method performs significantly better than existing quantization techniques, especially in the challenging 2-bit and 2/4-bit mixed-precision regimes. We will release our code.
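For readers curious how such an initialization can be computed, here is a minimal sketch of the alternating idea behind LoftQ: quantize the current residual, then fit a rank-r correction by SVD, so that Q + A Bᵀ stays close to the full-precision weight W before LoRA fine-tuning starts. The uniform 2-bit quantizer and hyperparameters below are simplified stand-ins for illustration, not the paper's implementation (which uses NormalFloat formats).

```python
# Sketch of a LoftQ-style initialization: alternate quantization and low-rank SVD
# so the quantized backbone plus the LoRA factors approximate the original weight.
import torch

def uniform_quantize(w: torch.Tensor, bits: int = 2) -> torch.Tensor:
    """Symmetric uniform quantize-dequantize (illustrative stand-in for NF2/NF4)."""
    levels = 2 ** bits - 1
    scale = w.abs().max() / (levels / 2)
    return torch.clamp(torch.round(w / scale), -levels // 2, levels // 2) * scale

def loftq_init(w: torch.Tensor, rank: int = 16, steps: int = 5):
    a = torch.zeros(w.shape[0], rank)   # LoRA factor A
    b = torch.zeros(w.shape[1], rank)   # LoRA factor B
    for _ in range(steps):
        q = uniform_quantize(w - a @ b.T)      # quantize the current residual
        u, s, vh = torch.linalg.svd(w - q)     # best rank-r fit to what is left
        a = u[:, :rank] * s[:rank]
        b = vh[:rank, :].T
    return q, a, b                             # quantized backbone + LoRA init

w = torch.randn(256, 128)
q, a, b = loftq_init(w)
print("approximation error:", torch.norm(w - (q + a @ b.T)).item())
```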
Meta-CoT: Generalizable Chain-of-Thought Prompting in Mixed-task Scenarios with Large Language Models: Chain-of-thought (CoT) prompting, which generates intermediate reasoning chains that serve as the rationale for arriving at an answer, has demonstrated the striking reasoning ability of large language models (LLMs). However, current CoT methods either employ general prompts such as "Let's think step by step" or rely heavily on handcrafted, task-specific demonstrations to attain strong performance, creating an unavoidable gap between performance and generalization. To bridge this gap, we propose Meta-CoT, a generalizable CoT prompting method for mixed-task scenarios where the type of input question is unknown. Meta-CoT first categorizes the scenario based on the input question and then automatically constructs diverse demonstrations from the corresponding data pool. Meta-CoT performs remarkably well on ten public benchmark reasoning tasks while showing superior generalization capabilities, achieving a state-of-the-art result on SVAMP (93.7%) without any additional program-aided methods. Further experiments on five out-of-distribution datasets verify the stability and generality of Meta-CoT.
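As a rough illustration of the flow described above (scenario categorization followed by automatic demonstration construction), here is a hypothetical sketch; classify_scenario, the demonstration pools, and the prompt format are placeholders, not the paper's components.

```python
# Toy sketch of a Meta-CoT-style prompt builder: pick a scenario for the incoming
# question, pull demonstrations from that scenario's pool, and append a CoT cue.
from typing import Dict, List

DEMO_POOLS: Dict[str, List[str]] = {
    "arithmetic": [
        "Q: A farm has 3 pens with 4 hens each. How many hens are there?\n"
        "A: Let's think step by step. 3 pens x 4 hens = 12. The answer is 12.",
    ],
    "commonsense": [
        "Q: Can a pencil write underwater?\n"
        "A: Let's think step by step. Graphite needs a dry surface to leave marks,"
        " so generally no. The answer is no.",
    ],
}

def classify_scenario(question: str) -> str:
    """Toy stand-in for Meta-CoT's scenario identification step."""
    return "arithmetic" if any(ch.isdigit() for ch in question) else "commonsense"

def build_meta_cot_prompt(question: str, shots: int = 1) -> str:
    pool = DEMO_POOLS[classify_scenario(question)]
    demos = "\n\n".join(pool[:shots])
    return f"{demos}\n\nQ: {question}\nA: Let's think step by step."

print(build_meta_cot_prompt("If 7 buses carry 30 kids each, how many kids ride?"))
```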
InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining: Pretraining auto-regressive large language models (LLMs) with retrieval over external databases yields better perplexity and factual accuracy. However, existing pre-trained retrieval-augmented LLMs remain small (for example, Retro has 7.5B parameters), which limits the effectiveness of instruction tuning and zero-shot generalization. This work introduces Retro 48B, the largest LLM pre-trained with retrieval before instruction tuning. Specifically, we continue to pre-train a 43B GPT model on an additional 100 billion tokens using the Retro augmentation method, retrieving from 1.2 trillion tokens. The resulting foundation model, Retro 48B, largely outperforms the original 43B GPT in perplexity. After instruction tuning on Retro 48B, the obtained model, InstructRetro, significantly improves over instruction-tuned GPT on zero-shot question-answering (QA) tasks: the average improvement is 7% over its GPT counterpart across 8 short-form QA tasks and 10% across 4 challenging long-form QA tasks. Surprisingly, one can ablate the encoder from the InstructRetro architecture and directly use its decoder backbone while achieving comparable results. We hypothesize that pretraining with retrieval makes the decoder better at incorporating context for QA. Our results highlight a promising direction: obtaining a better GPT decoder for QA through continued pretraining with retrieval before instruction tuning.
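The encoder-ablation finding suggests retrieved passages can simply be placed in the decoder's prompt. Below is a minimal, hypothetical sketch of that retrieval-then-prompt pattern; the toy word-overlap retriever and tiny corpus are illustrative only, and a real system would use a dense retriever over an external database.

```python
# Toy retrieval-then-prompt sketch: rank passages by word overlap with the
# question and prepend the best ones as context for a decoder-only model.
from typing import List

CORPUS: List[str] = [
    "Retro 48B was continued-pretrained on 100 billion tokens with retrieval from 1.2 trillion tokens.",
    "Instruction tuning aligns a pretrained model with natural-language task instructions.",
]

def retrieve(question: str, corpus: List[str], k: int = 1) -> List[str]:
    """Rank passages by simple word overlap with the question (illustrative only)."""
    q_words = set(question.lower().split())
    ranked = sorted(corpus, key=lambda p: -len(q_words & set(p.lower().split())))
    return ranked[:k]

def build_qa_prompt(question: str) -> str:
    context = "\n".join(retrieve(question, CORPUS))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

print(build_qa_prompt("How many tokens was Retro 48B pretrained on with retrieval?"))
```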
--
Are you looking to advertise a product, job opening, or event to an audience of over 40,000 AI researchers and engineers? Please reach out to us on LinkedIn to explore your options.
Enjoy the newsletter? Help us make it bigger and better by sharing it with colleagues and friends.
--
Industry Insights
Growth Zone
Expert Advice