In today's fast-paced information age, staying updated with accurate and relevant news is crucial for professionals across various sectors. Media monitoring is not just about keeping track of news; it's about quickly identifying the most reliable information and gaining insights without being overwhelmed. That's where my project comes in – a tool designed to transform a list of news URLs into a comprehensive, duplicate-free report with summaries and images, focusing solely on reputable sources.
Addressing User Needs:
- Efficient Report Creation: Users input a list of news URLs and receive a detailed report in return; previously this was a manual process that took around two hours per day.
- Quality Content Assurance: Only articles from reputable sources are kept, maintaining the credibility of the information.
- No Duplicates: Utilizes OpenAI's GPT models to detect and eliminate duplicate content.
- Summarisation & Visuals: Articles are translated and concisely summarised, each accompanied by its main image, providing quick yet comprehensive insights.
Why GPT for Deduplication and Summarisation?
- Deduplication: GPT models excel at recognising when two texts report the same story, even with different wording, making them well suited to identifying duplicate content.
- Summarisation: These models are adept at condensing large volumes of information into short, digestible summaries while retaining key points and context.
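To make both steps concrete, here is a minimal sketch using the OpenAI Python client. The model name, prompts, and function names are illustrative assumptions, not the project's exact implementation.

```python
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Assumed model; any chat-capable GPT model can be swapped in.
MODEL = "gpt-4o-mini"

def find_duplicate_groups(headlines: list[str]) -> list[list[int]]:
    """Ask the model to group headlines that cover the same story.
    Returns lists of indices; headlines sharing a list are duplicates."""
    numbered = "\n".join(f"{i}. {h}" for i, h in enumerate(headlines))
    response = client.chat.completions.create(
        model=MODEL,
        temperature=0,
        messages=[{
            "role": "user",
            "content": (
                "Group the numbered headlines below by story. "
                "Reply only with a JSON list of lists of indices.\n\n" + numbered
            ),
        }],
    )
    return json.loads(response.choices[0].message.content)

def translate_and_summarise(article_text: str) -> str:
    """Translate an article into English and condense it to a few sentences."""
    response = client.chat.completions.create(
        model=MODEL,
        temperature=0.3,
        messages=[{
            "role": "user",
            "content": (
                "Translate the following article into English and summarise it "
                "in 3-4 sentences, keeping key facts and context:\n\n" + article_text
            ),
        }],
    )
    return response.choices[0].message.content.strip()
```

The real tool may batch headlines differently or constrain the output format more strictly; the sketch only shows the shape of the two GPT calls.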
Tech Stack:
- Backend Processing: Python scripts for the core logic.
- AI Integration: OpenAI's GPT models for text analysis, translation and summarisation.
- Web Interface: Flask framework for a user-friendly experience (a minimal endpoint sketch follows this list).
- Document Assembly: Python-Docx for creating the final Word document.
- Hosting: Heroku, for a smooth and scalable user experience.
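As a rough illustration of the Flask layer, the sketch below shows how URL submission and report retrieval could be exposed. The routes, the in-memory job store, and the field names are assumptions made for this example, not the project's actual API.

```python
import uuid
from flask import Flask, jsonify, request

app = Flask(__name__)
jobs: dict[str, dict] = {}  # in-memory placeholder; a real deployment would use a queue or database

@app.route("/reports", methods=["POST"])
def create_report():
    """Accept a JSON list of news URLs and queue a report-generation job."""
    urls = (request.get_json(silent=True) or {}).get("urls", [])
    if not urls:
        return jsonify({"error": "no URLs provided"}), 400
    job_id = uuid.uuid4().hex
    jobs[job_id] = {"urls": urls, "status": "queued"}
    return jsonify({"job_id": job_id, "status": "queued"}), 202

@app.route("/reports/<job_id>", methods=["GET"])
def report_status(job_id):
    """Return job status; once finished, the .docx report would be served for download."""
    job = jobs.get(job_id)
    if job is None:
        return jsonify({"error": "unknown job"}), 404
    return jsonify({"job_id": job_id, "status": job["status"]})

if __name__ == "__main__":
    app.run()
```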
How It Works:
User Input: The workflow begins with a user-provided list of URLs. In this specific example, the news comes from Ukrainian sources extracted by a Telegram bot.
Content Processing: AI models handle deduplication and summarization.
Report Generation: Produces a well-organized document with summaries, images, and links; a sketch of the assembly step follows.
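For the document assembly, here is a minimal python-docx sketch. The item fields (title, summary, url, image_path) and the function name are assumptions for illustration; python-docx has no simple clickable-hyperlink helper, so the source link is written as plain text.

```python
from docx import Document
from docx.shared import Inches

def build_report(items: list[dict], path: str = "news_report.docx") -> str:
    """Assemble the Word report: one section per article with its summary,
    main image, and source link."""
    doc = Document()
    doc.add_heading("Media Monitoring Report", level=0)
    for item in items:
        doc.add_heading(item["title"], level=1)
        if item.get("image_path"):
            doc.add_picture(item["image_path"], width=Inches(4.5))
        doc.add_paragraph(item["summary"])
        doc.add_paragraph(f"Source: {item['url']}")
    doc.save(path)
    return path
```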
Challenges Tackled:
- Balancing AI token usage with high-volume data: sometimes several days of news have to be processed at once, so the URL list grows long very quickly (a batching sketch follows this list).
- Cost-efficient model selection for different tasks.
- Implementing asynchronous tasks for user convenience: the user can submit the files to process and come back later to download the report.
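One way to keep each GPT call within token limits when several days of news pile up is to batch the articles before sending them, as sketched below. Using character count as a token proxy and the 12,000-character threshold are simplifying assumptions; a tokenizer such as tiktoken would be more precise.

```python
def chunk_articles(articles: list[str], max_chars: int = 12_000) -> list[list[str]]:
    """Split a long list of article texts into batches small enough for one
    GPT request, using character count as a rough proxy for tokens."""
    batches: list[list[str]] = []
    current: list[str] = []
    size = 0
    for text in articles:
        if current and size + len(text) > max_chars:
            batches.append(current)
            current, size = [], 0
        current.append(text)
        size += len(text)
    if current:
        batches.append(current)
    return batches
```

Batches can then be routed to different models, for example a cheaper one for deduplication and a stronger one for the final summaries, which is where cost-efficient model selection comes in.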
Future Enhancements:
- Automated Data Collection: Implementing a feature where users can configure a list of keywords to monitor. The tool will then autonomously gather news articles related to these keywords, making the process even more hands-off and tailored to specific interests.
- Personalised AI Prompts for Each User: The tool will create summaries based on individual user preferences and interests.
- Automated Email Reports: Enhancing convenience by setting up an automated system to email the reports directly to users.
- Website Blacklisting: Adding the ability to blacklist certain websites. This ensures that users don’t receive content from sources they deem unreliable or irrelevant, further customizing their news feed.
- Real-Time Notifications: Implementing a system for real-time alerts or notifications for breaking news or highly relevant articles, keeping users informed instantaneously.