Create Your Own Custom Text Summarization Tool with Python!

Jesse Russell, PhD

Senior Data & Research Scientist | Samsara | Ex-Meta

Published: August 17, 2024

Recently, I explored how to build a custom text summarization tool using Python, which can use AI to summarize research papers, articles, or any other documents. Here's a quick breakdown of how it works:

Download and Extract: Using the requests and pdfplumber libraries, the tool first downloads a PDF from a given URL and extracts the full text from it.
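Summarize with Transformers: Using BART, a pre-trained Transformer model developed by Facebook AI and available through the Hugging Face transformers library, the tool then generates a summary of the extracted text. BART is a sequence-to-sequence model well suited to text generation tasks such as summarization and translation; the facebook/bart-large-cnn checkpoint used here is fine-tuned for summarization.
Beam Search: The summarization process uses beam search, a decoding algorithm that keeps the few highest-scoring partial sequences (the "beams") at each step instead of committing to the single most likely next word. This usually finds a more probable overall sequence than greedy decoding, at the cost of extra computation.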
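To make the beam search idea concrete, here is a minimal sketch on a toy problem. The TOY_MODEL table, its probabilities, and the beam_search function are all made up for illustration; this is not how the transformers library implements decoding, and it leaves out details such as the length penalty.

```python
import math

# Hypothetical toy "model": next-token probabilities given the previous token.
TOY_MODEL = {
    "<s>": {"the": 0.5, "a": 0.4, "</s>": 0.1},
    "the": {"cat": 0.4, "dog": 0.35, "</s>": 0.25},
    "a": {"cat": 0.9, "</s>": 0.1},
    "cat": {"</s>": 1.0},
    "dog": {"</s>": 1.0},
}


def beam_search(model, num_beams=2, max_steps=5):
    """Return the highest-probability sequence found and its probability."""
    beams = [(0.0, ["<s>"])]  # (log probability, tokens so far)
    finished = []
    for _ in range(max_steps):
        # Extend every live beam by every possible next token.
        candidates = []
        for score, seq in beams:
            for token, prob in model[seq[-1]].items():
                candidates.append((score + math.log(prob), seq + [token]))
        # Keep only the num_beams best candidates.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, seq in candidates[:num_beams]:
            if seq[-1] == "</s>":
                finished.append((score, seq))
            else:
                beams.append((score, seq))
        if not beams:
            break
    best_score, best_seq = max(finished or beams, key=lambda c: c[0])
    return best_seq, math.exp(best_score)


# num_beams=1 is greedy decoding; a wider beam can find a better sequence.
print(beam_search(TOY_MODEL, num_beams=1))
print(beam_search(TOY_MODEL, num_beams=2))
```

With one beam the search commits to "the" because it is the most likely first word, and ends up with "the cat". With two beams it also keeps "a" alive and finds "a cat", which has a higher overall probability in this toy table.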

How can you build it? With just a few lines of code, you can set up your own summarization tool. Here’s a version of the script with some notes:

# Import required libraries
# For making HTTP requests, so we can download files from the web
import requests
# For extracting text from PDF files
import pdfplumber
# A library by Hugging Face that provides pre-trained models and tokenizers for natural language processing tasks
from transformers import BartTokenizer, BartForConditionalGeneration

# Define the URL of the PDF to access -- replace this with your own URL to a PDF you want to summarize
url = 'https://arxiv.org/pdf/2102.04342'

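# Download the PDF
# Make a GET request to the URL (with a timeout so the script cannot hang indefinitely)
# Stop with an error if the download failed, rather than saving an error page as a PDF
# Open a file in write-binary mode
# And write the content of the PDF to the file
response = requests.get(url, timeout=30)
response.raise_for_status()
pdf_path = 'document.pdf'
with open(pdf_path, 'wb') as f:
    f.write(response.content)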

# Extract text from the PDF
# Create an empty string to store the extracted text
text = ''
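# Open the PDF and extract the text from the document page by page into the text string
# Add a newline after each page so the last word of one page does not run into the first word of the next
with pdfplumber.open(pdf_path) as pdf:
    for page in pdf.pages:
        page_text = page.extract_text()
        if page_text:
            text += page_text + '\n'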

# Load the pre-trained summarization model and tokenizer
# Load the tokenizer for BART, which will convert text into tokens that the model can understand
tokenizer = BartTokenizer.from_pretrained('facebook/bart-large-cnn')
# Load the pre-trained BART model specifically fine-tuned for text summarization
model = BartForConditionalGeneration.from_pretrained('facebook/bart-large-cnn')

# Function to summarize text
# Convert the input text into token IDs that the model can process
# (BART does not need a task prefix such as "summarize: "; that is a T5 convention)
# Truncate the input to 1024 tokens, so only the start of a long document is summarized
# Specify that the token IDs should be returned as PyTorch tensors
# Generate a summary of the text based on the token IDs, using beam search to optimize the output
# Convert the generated token IDs back into human-readable text
def summarize_with_transformers(text, max_length=150, min_length=50):
    inputs = tokenizer.encode(text, return_tensors="pt", max_length=1024, truncation=True)
    summary_ids = model.generate(inputs, max_length=max_length, min_length=min_length, length_penalty=2.0, num_beams=4, early_stopping=True)
    summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
    return summary

# Generate and print summary
# Call the summarization function on the extracted text, specifying the desired length of the summary
summary = summarize_with_transformers(text, max_length=150, min_length=50)

# Finally, print your summary
print(summary)
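One caveat with the script above: because the input is truncated to 1024 tokens, only the beginning of a long paper is actually summarized. A possible workaround, sketched below, is to split the text into chunks and summarize each one. The helper names split_into_chunks and summarize_long_text are hypothetical, and the 600-word chunk size is a rough guess meant to stay under the token limit, not a guarantee. A stand-in summarizer is used here so the sketch runs without loading the model; in practice you would pass summarize_with_transformers instead.

```python
def split_into_chunks(text, max_words=600):
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


def summarize_long_text(text, summarize, max_words=600):
    """Summarize each chunk with the given function and join the results."""
    partial_summaries = [
        summarize(chunk) for chunk in split_into_chunks(text, max_words)
    ]
    return "\n".join(partial_summaries)


# Stand-in summarizer for demonstration only: keeps the first three words.
def first_three_words(chunk):
    return " ".join(chunk.split()[:3])


sample = "one two three four five six seven eight nine ten"
print(summarize_long_text(sample, first_three_words, max_words=5))
```

Splitting on word counts is crude, since it can cut a sentence in half and ignores section boundaries. Splitting on paragraphs or sentences would likely give better summaries, at the cost of a little more code.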

Why it matters:

Efficiency: Quickly get a summary of complex documents, saving time and effort.
Customization: Tailor the tool to your specific needs, adjusting parameters like summary length and beam width.

Whether you're a data scientist, researcher, or just curious about AI, this tool is a great example of how you can apply state-of-the-art NLP models in real-world scenarios. If you’re interested in diving deeper into this topic or building your own custom solution, feel free to reach out!

#AI #MachineLearning #NLP #Python #TextSummarization #DataScience

Johannes Poscharnig

Senior Consultant & Coach | Former professional athlete | EdTech Founder | Enabling people and organizations to learn and change

1 month ago

Jesse Russell, PhD, Gerhard Bruno Fehr: another easy way is to automate a process that combines Perplexity with Claude or ChatGPT. Why both? Perplexity can access the newest data from a variety of sources, while Claude and ChatGPT are better at summarizing, if you use the right prompt, obviously.
