Extract and Summarize Web Content using Python
To extract the main body text of a web page from its URL, Python developers can use the requests and BeautifulSoup libraries. The script sends a GET request with custom headers that mimic a browser, which fetches the page's HTML. Once the page is retrieved, BeautifulSoup parses the HTML and extracts the text inside the <body> tag.
import requests
from bs4 import BeautifulSoup

def extract_body_text(url):
    # Define headers to simulate a browser request
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'
    }
    # Send a GET request to the URL with the custom headers (give up after 10 seconds)
    response = requests.get(url, headers=headers, timeout=10)
    # Check if the request was successful
    if response.status_code == 200:
        # Parse the HTML content of the page
        soup = BeautifulSoup(response.text, 'html.parser')
        # Extract the text from the <body> tag, falling back to the whole document if there is none
        root = soup.body if soup.body is not None else soup
        return root.get_text(separator=' ', strip=True)
    else:
        return f"Failed to retrieve content (HTTP {response.status_code})"
# Example URL
url = "https://www.acast.com/"
# Get the body text of the domain
text = extract_body_text(url)
print("Body Text:", text)
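As the acast.com output later in this article shows, the <body> text still includes navigation menus, a newsletter form, and footer links alongside the page's actual content. A minimal sketch of one way to trim that noise is to remove layout elements before calling get_text, assuming the site marks those regions with semantic tags such as <nav>, <header>, <footer>, and <form> (many sites do not, so results vary). The function name extract_main_text and the NOISE_TAGS list are hypothetical choices for illustration, not part of the original code.

```python
from bs4 import BeautifulSoup

# Tags that usually hold navigation, scripts, forms or footers rather than the
# main content; this list is an assumption and may need tuning per site.
NOISE_TAGS = ["script", "style", "noscript", "nav", "header", "footer", "form", "aside"]


def extract_main_text(html):
    """Return visible text from <main> (or <body>) with common noise tags removed."""
    soup = BeautifulSoup(html, "html.parser")
    # Detach every noise element (and everything inside it) from the parse tree;
    # extract() is safe even when a noise tag sits inside another one
    for tag in soup.find_all(NOISE_TAGS):
        tag.extract()
    # Prefer an explicit <main> element, then <body>, then the whole document
    root = soup.find("main") or soup.body or soup
    return root.get_text(separator=" ", strip=True)


if __name__ == "__main__":
    sample = """
    <html><body>
      <nav>Skip to content Features Log in</nav>
      <main><h1>The home of podcasting</h1><p>Create and publish everywhere.</p></main>
      <footer>Privacy Cookie Settings</footer>
      <script>var x = 1;</script>
    </body></html>
    """
    print(extract_main_text(sample))
    # prints: The home of podcasting Create and publish everywhere.
```

The function takes an HTML string rather than a URL, so it can be fed the response.text of the same requests.get call used in extract_body_text.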
Developers can also use the transformers library to process the extracted text further. Pre-trained models handle tasks such as summarization (sequence-to-sequence models like BART or T5) and sentiment analysis (classifiers built on models like BERT). In the following example, a summarization pipeline condenses the extracted text into a short summary, combining web scraping and machine learning for practical data processing and analysis.
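from transformers import pipeline
# Load a summarization pipeline (downloads a default model on first use)
summarizer = pipeline("summarization")
# Generate a summary; truncation=True cuts off input longer than the model's limit instead of failing
summary = summarizer(text, max_length=130, min_length=30, do_sample=False, truncation=True)
print()
print("Summary:", summary[0]['summary_text'])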
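Summarization models accept a bounded input length: BART-based checkpoints, such as the DistilBART model the pipeline loads by default, read at most 1,024 tokens, so text from a long page is either cut off or causes an error. A common workaround, sketched here, splits the text into word chunks, summarizes each chunk, and joins the partial summaries. The helper names chunk_words and summarize_long and the 400-word chunk size are assumptions (chosen to stay under the token limit for typical English text); they are not part of transformers.

```python
def chunk_words(text, max_words=400):
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def summarize_long(text, summarizer, max_words=400, **kwargs):
    """Summarize each chunk separately and join the partial summaries.

    `summarizer` is any callable with the transformers pipeline's return shape:
    a list whose first item is a dict with a 'summary_text' key. Extra keyword
    arguments (max_length, min_length, ...) are passed through unchanged.
    """
    parts = []
    for chunk in chunk_words(text, max_words):
        result = summarizer(chunk, **kwargs)
        parts.append(result[0]["summary_text"].strip())
    return " ".join(parts)
```

With the pipeline from the previous block, a call such as summarize_long(text, summarizer, max_length=130, min_length=30, do_sample=False) returns one partial summary per chunk joined together; a very short final chunk may need a smaller min_length.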
For the website acast.com, this produces the following body text and summary:
Body Text: Skip to content Features Switch to Acast Monetize your podcast Blog Advertise with Acast Log in Podcaster Acast+ listener Advertiser Sign up The home of podcasting Whether you're looking to start a new podcast or move your existing podcast, Acast has you covered. Create and publish to every podcast app there is - including Apple Podcasts, Spotify and Amazon Music. Sign up Whether you're looking to start a new podcast or move your existing podcast, Acast has you covered. Create and publish to every podcast app there is - including Apple Podcasts, Spotify and Amazon Music. Sign up Our network 125,000 podcasts listened to more than 400 million times every month. Learn more The tools you need Host and distribute Your podcast, on every platform. Promote Find and grow your audience. Analyze Insights and data for faster, smarter decisions. Monetize Make the most of your content, on your terms. Advertising Podcast advertising that delivers results to shout about With Acast, you can grow your brand and business by reaching your perfect audience on any podcast listening app Ready to get started? Book host-read sponsorships or run pre-recorded ads using Acast's self-serve advertising platform today, Get Started The easiest way to record, edit and mix your podcast Get free access to Podcastle to create your episodes, right from your web browser. Learn more Why creators love Acast Daily Tech News Show Daily Tech News Show explains how Acast has helped the podcast grow. Equity Mates Equity Mates on what they love about Acast’s insights features. The Adam Buxton Podcast Adam Buxton explains how Acast helps him monetize his podcast. Unlock your podcast’s full potential with Acast Move to Acast to make the most of your podcast. It's a simple process. Learn about switching Your one-stop guide to creating an original podcast. 
Learn more Blog Inspiring, insightful stories from the Acast community Podcaster stories, learning resources, news and views from across the Acast community. How to make money podcasting A complete guide to podcast monetization, including ads, sponsorships and subscriptions. How to publish your podcast to every app Getting your podcast on all the popular app is an important first step for a successful podcast. Podcast Gear to Improve your Audio Quality From audio interfaces, to microphones, and recording methods, let’s take a look at gear you can use to level up your podcast sound quality. Read more Can't find what you're looking for? I'm a podcaster a brand or agency a candidate an investor an Acast+ listener interested in growing my audience hosting my show logging in Show me what you’ve got Sign up to our newsletters First name Email Address * Country Australia Canada Denmark France Germany Ireland Mexico New Zealand Norway Sweden UK US Others.. By checking this box you agree you may occasionally receive additional communications from Acast in accordance with our Privacy Policies . Subscribe Successful subscription Our story Podcasting Advertising News Careers Investor relations Business Partnerships Affiliates Legal Privacy Do not sell/share my personal information Security Cookie Settings English German Spanish French Swedish Norwegian Bokmål Italian Portuguese (Brazil) Get in touch Sign up
Summary: 125,000 podcasts listened to more than 400 million times every month . Create and publish to every podcast app there is - including Apple Podcasts, Spotify and Amazon Music . Monetize Make the most of your content, on your terms . Book host-read sponsorships or run pre-recorded ads using Acast's self-serve advertising platform .
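In the acast.com body text above, the paragraph beginning "Whether you're looking to start a new podcast" appears twice, probably because the page renders the same section in two layouts. Repeated passages like this feed the summarizer redundant input. The sketch below, which assumes the copies are identical text nodes in the HTML, walks the page's text fragments and keeps only the first occurrence of each long fragment, leaving short fragments such as button labels alone. The function name unique_text_blocks and the 40-character threshold are hypothetical.

```python
from bs4 import BeautifulSoup


def unique_text_blocks(html, min_chars=40):
    """Join the page's text fragments with spaces, skipping repeats of long fragments.

    Fragments shorter than min_chars (menu items, button labels) are always kept;
    longer fragments are kept only on their first occurrence.
    """
    soup = BeautifulSoup(html, "html.parser")
    # Use <body> when present, otherwise the whole document
    root = soup.body or soup
    seen = set()
    kept = []
    for fragment in root.stripped_strings:
        if len(fragment) >= min_chars:
            if fragment in seen:
                continue  # skip a verbatim repeat of a long fragment
            seen.add(fragment)
        kept.append(fragment)
    return " ".join(kept)
```

The result is space-joined like the output of get_text(separator=' ', strip=True), so it can replace that call in extract_body_text before the text is passed to the summarizer.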
Potential Applications
Summarizing website content offers numerous practical applications across various industries. For instance, digital marketers can leverage summaries to understand and monitor competitors' content strategy efficiently, enabling them to refine their SEO and content marketing efforts. In the realm of academia, researchers can streamline their review of literature by quickly extracting essential information from numerous publications. News organizations can use automated summaries to provide readers with concise versions of lengthy articles, enhancing user engagement and content accessibility. Additionally, customer service departments might employ summarization tools to quickly parse and respond to customer inquiries and feedback, improving response times and customer satisfaction. Overall, the ability to summarize web content automatically can significantly boost productivity and information accessibility in multiple fields.