Leak Reveals Runway AI Trained on Stolen YouTube Videos
Hello and welcome to the Digital Frontiers of Data!
Get ready for some electrifying updates! We're diving into fascinating stories like how Instagram creators can now make AI doppelgangers, APNIC names a new director general, and much, much more.
So, buckle up and get ready for an exciting ride through the world of data and tech!
Instagram creators can now make AI doppelgangers to chat with their followers
The next time you DM a creator on Instagram, you might get a reply from their AI. Meta is rolling out AI Studio, a toolset that lets Instagram creators build AI personas to chat with their followers.
Meta introduced AI Studio at its Connect event last fall and recently began testing creator-made AIs with a few prominent Instagrammers. Now, the tools are being made available to more US-based creators, allowing users to experiment with specialized AI “characters.”
According to Meta, these new creator AIs are designed to help popular Instagram users manage the overwhelming number of messages they receive daily. “They’ll be able to make an AI that functions as an extension of themselves,” says Connor Hayes, VP of Product for AI Studio at Meta.
"Creators can use their comments, captions, Reel transcripts, and custom instructions or links to enable the AI to respond on their behalf," Hayes told Engadget.
Mark Zuckerberg has big ambitions for these chatbots, predicting "hundreds of millions" of creator-made AIs on Meta’s apps. However, it's uncertain if Instagram users will engage with AI versions of their favorite creators. Meta's previous experiment with AI chatbots modeled after celebrities like Snoop Dogg and Kendall Jenner was underwhelming, leading to their phase-out, as reported by The Information.
Lightrun launches its AI debugger to help developers fix their production code
Lightrun, a Tel Aviv-based startup that assists developers in debugging production code within their IDE, announced the launch of its first AI-based tool: the Runtime Autonomous AI Debugger. Currently in private beta, this tool aims to help developers fix production code issues in minutes rather than hours.
Lightrun also revealed an $18 million SAFE round raised last year from GTM Capital, with participation from existing investors Insight Partners and Glilot Capital. This brings Lightrun’s total funding to $45 million. The company plans to raise a Series B round next year.
Lightrun CEO Ilan Peleg stated that the company previously reduced Mean Time to Recovery to 30-45 minutes. Their new tool will automate the process from ticket creation to pinpointing the exact line of code causing issues.
Peleg hopes to eventually incorporate generative AI for automatic bug fixes, though this is not yet available. Currently, Lightrun is refining models for debugging, using insights from both code and monitoring data. They also plan to connect this system with enterprise tools like ticketing systems.
After several iterations, Lightrun’s system is now cost-effective and ready for regular use, a significant improvement from earlier, more expensive solutions.
Telcos are increasingly viewing automation as the key to generating more efficiency and value from their networks
Monetizing within the telecom industry is often complex and challenging. Automation offers a smoother path for communications service providers (CSPs) by simplifying the development and coordination of various network systems, applications, and services. CSPs increasingly see automation as a key to enhancing efficiency and value, potentially leading to cost savings, time reductions, and revenue growth.
At the recent DTW Ignite conference in Copenhagen, Dell Technologies' Global Director of Specialty Sales, Celwin Tirath, and Blue Planet's VP of Products, Alliances & Architectures, Gabriele Di Piazza, discussed their collaboration. They are streamlining the addition of orchestration and automation to virtual network functions (VNFs) and cloud-native network functions (CNFs), helping CSPs transition to digital businesses.
Blue Planet, a division of Ciena, aids CSPs in automating network and service operations through its intelligent automation platform. This platform allows CSPs to discover, map, and model network resources, then automates the configuration, provisioning, data collection, and monitoring of VNFs and CNFs across various network domains.
Google releases new 'open' AI models with a focus on safety
Google has released three new "open" generative AI models—Gemma 2 2B, ShieldGemma, and Gemma Scope—promising to be safer, smaller, and more transparent. These models, part of the Gemma 2 series introduced in May, offer different applications but all emphasize safety.
Unlike Google's proprietary Gemini models, the Gemma series is open-source, aimed at building goodwill within the developer community. Gemma 2 2B is a lightweight text-generation model compatible with various hardware and available through Google’s Vertex AI, Kaggle, and AI Studio. ShieldGemma, built on Gemma 2, includes safety classifiers to detect and filter harmful content.
Gemma Scope allows developers to closely examine specific aspects of a Gemma 2 model, making its inner workings more interpretable. According to Google, Gemma Scope uses specialized neural networks to break down complex data processed by Gemma 2 into more understandable forms, providing insights into how the model identifies patterns, processes information, and makes predictions.
This release follows a recent U.S. Commerce Department report endorsing open AI models, which noted that such models increase accessibility for smaller companies, researchers, and individual developers, while also emphasizing the need for monitoring to manage potential risks.
Asia's regional internet registry APNIC names new director general
The Asia Pacific Network Information Center (APNIC) has appointed Jia Rong Low as its new Director General. Low, formerly ICANN's regional managing director and vice president, will lead APNIC in managing internet resources—such as IP addresses and autonomous system numbers—across 56 regional economies, developing resource policies, and conducting educational and advocacy efforts.
Paul Wilson, who led APNIC for 26 years, stepped down last month after announcing his departure in March.
Wilson endorsed Jia Rong Low as his successor, praising his drive and deep regional knowledge. “I believe the APNIC EC has made an excellent choice in appointing him as the next APNIC DG,” Wilson said. “He is well-suited to lead the great team built over many years.”
APNIC EC chair Kenny Huang also welcomed Low, describing him as bringing "an exceptional blend of experience, leadership, and vision" to the role. Huang noted that Low’s extensive involvement with regional and global internet initiatives provides him with the strategic insight needed to tackle complex challenges. "Jia Rong's collaborative leadership and commitment to inclusivity are crucial for addressing the needs of APNIC's diverse membership," Huang added.
APNIC's membership includes a diverse range of organizations, from large telcos and ISPs in China to small nations in the Pacific.
Challenges facing the registry include ongoing discussions about the multi-stakeholder model of internet governance, especially with the United Nations potentially reducing the technical community's input. Other issues include managing the dwindling IPv4 address pool and implementing recent governance reforms.
Housing APNIC might also be a consideration for Low. The registry had planned to move into purpose-built headquarters funded by the Asia Pacific Internet Development Trust (APIDT), which was granted ownership of an IPv4 block previously held by Japan's WIDE project in 2020. APIDT sold the IP block for $396 million, intending to use the funds to support internet development in the Asia Pacific region, including constructing a new building and offering space to APNIC at market rates.
In February 2024, Wilson mentioned to The Register that APNIC's current office was outdated and welcomed the idea of renting from APIDT, keeping the rent within the regional internet governance community. However, the plan was scrapped due to escalating construction costs, and APNIC has decided to continue using its current office for the foreseeable future.
Microsoft says massive Azure outage was caused by DDoS attack
Microsoft confirmed that a nine-hour outage on Tuesday, which disrupted various Microsoft 365 and Azure services worldwide, was caused by a distributed denial-of-service (DDoS) attack.
The outage affected services including Microsoft Entra, Microsoft 365, Microsoft Purview (such as Intune, Power BI, and Power Platform), Azure App Services, Application Insights, Azure IoT Central, Azure Log Search Alerts, Azure Policy, and the Azure portal.
In a statement released today, Microsoft identified the DDoS attack as the cause of the disruption but has not yet identified the specific threat actor responsible.
Microsoft said a flaw in the implementation of its DDoS defenses amplified the attack's impact rather than mitigating it; the company adjusted its network configurations and switched traffic paths to contain the issue. The resulting unexpected usage spike caused Azure Front Door (AFD) and Azure CDN components to perform below acceptable thresholds, producing intermittent errors and latency. Microsoft plans to release a Preliminary Post-Incident Review within 72 hours and a Final Review within two weeks with further details; BleepingComputer's inquiry about the attack remains unanswered. The company previously faced similar disruptions from attacks by Anonymous Sudan and from configuration changes affecting Microsoft 365 services.
Google upgrades Search to combat deepfakes and demote sites posting them
Generative AI has made it harder to identify synthetic content and protect user privacy. To address this, Google has updated Search to better combat deepfakes and enhance data protection.
Starting Wednesday, Google will improve how it handles explicit fake content, including non-consensual deepfakes. Previously, users could request removal of such content from search results, but now Google will also filter out all duplicates of the image and similar explicit results, not just the original removal request. This update aims to more effectively remove harmful content from all relevant search results, including non-consensual and fake explicit imagery.
Google has updated its ranking systems to reduce the risk of explicit fake content appearing in Search. For queries with a higher risk of such content, the new system will prioritize high-quality, non-explicit results, especially for searches involving people’s names.
These updates have already cut explicit content exposure by over 70%. The changes are designed to promote educational content about deepfakes rather than the deepfakes themselves. Additionally, Google will demote sites with frequent removal requests.
As part of its Search enhancements, Google is now integrating the "About this image" contextual feature into both Circle to Search and Google Lens, making it easy to check an image's context from either tool.
CISA adds VMware ESXi bug to its Known Exploited Vulnerabilities catalog
The U.S. Cybersecurity and Infrastructure Security Agency (CISA) has added a VMware ESXi authentication bypass vulnerability (CVE-2024-37085) with a CVSS score of 6.8 to its Known Exploited Vulnerabilities catalog.
Microsoft has reported that multiple ransomware gangs are exploiting this flaw to gain full administrative access to ESXi hypervisors. The vulnerability allows attackers with sufficient Active Directory permissions to gain control of an ESXi host by re-creating a deleted AD group, such as 'ESX Admins'.
The company has released patches for ESXi 8.0 and VMware Cloud Foundation 5.x but will not provide updates for the older versions, ESXi 7.0 and VMware Cloud Foundation 4.x. Users of these outdated versions are advised to upgrade to receive security updates and support. Microsoft has reported that ransomware groups like Storm-0506, Storm-1175, and Octo Tempest are exploiting this vulnerability, using it to deploy ransomware such as Akira and Black Basta.
Binding Operational Directive (BOD) 22-01 requires FCEB agencies to address known exploited vulnerabilities by the specified deadlines to protect their networks. Experts also advise private organizations to review the Catalog and address these vulnerabilities in their own systems.
Leak Shows That Google-Funded AI Video Generator Runway Was Trained on Stolen YouTube Content, Pirated Films
A leaked internal spreadsheet reveals that Runway's popular Gen-3 Alpha text-to-video AI tool was trained using pirated content and stolen YouTube videos. Despite the tool's impressive performance, Runway had not disclosed the sources of its training data when it was launched.
The document obtained by 404 Media suggests there was a reason for Runway's lack of transparency. The spreadsheet contains extensive lists of popular content from major YouTube channels, including Disney, Netflix, and Sony, as well as links to websites known for hosting pirated content.
While 404 Media couldn’t verify if Gen-3 Alpha was trained on all the listed assets, the evidence strongly suggests it. This points to a troubling trend of AI companies using copyrighted content without regard for intellectual property rights—an ongoing issue in generative AI. Although it’s unclear which specific videos were included in the training data, 404 Media was able to generate convincing videos of popular YouTube personalities. Additionally, Runway is said to have used a proxy to obscure its activities and evade YouTube's detection.
An unnamed former employee revealed that the spreadsheet's channels were part of a company-wide effort to source high-quality videos for model training. A web crawler then downloaded all videos from these channels using proxies to bypass Google's blocks.
Last year, Runway secured $141 million in funding from major investors including Google, Salesforce, and NVIDIA, leading to a valuation of $1.5 billion.
Runway isn’t the only company facing scrutiny over the use of copyrighted material for AI training. OpenAI’s CTO Mira Murati admitted in a Wall Street Journal interview that she wasn’t sure if the training data for their Sora video generator included content from YouTube, Instagram, or Facebook—a statement that sparked skepticism. Shortly after, the New York Times reported that OpenAI had bypassed corporate policies and copyright laws by using tools to transcribe YouTube videos for training its AI chatbots.
YouTube CEO Neal Mohan has warned AI companies that using YouTube videos for training AI models breaches the platform’s terms of service.
This report adds to the growing evidence that companies like Runway and OpenAI are handling copyrighted material recklessly. Intellectual property issues are becoming a significant challenge for the development of generative AI, particularly for models that create full videos.
The technology is prompting lawmakers to reconsider "fair use," a doctrine allowing limited use of copyrighted content under U.S. law. While AI companies have argued that much of the data scraped is legally permissible, many copyright holders disagree, leading to an escalating legal conflict.
Runway's use of stolen and pirated videos has further intensified scrutiny and legal challenges against it.
How to Scrape Google Search Results with Python
In today's data-centric world, access to information is essential. Data scientists, marketers, and SEO experts can gain valuable insights by extracting and analyzing Google search results. This process helps in understanding patterns, evaluating competitors, and assessing keyword effectiveness.
However, scraping Google search results involves both technical and legal challenges. This article will guide you through the complexities of using Python to collect search data, addressing legal concerns, technical issues, and practical application using a generic API.
What is a Google SERP?
When you search on Google, the results page displays a variety of elements, including search results, ads, featured snippets, knowledge panels, images, and videos. Understanding the layout of this page is crucial for effectively scraping or finding information.
Components of a Google SERP
Organic Search Results: These are the unpaid listings that Google ranks by relevance; each typically includes a heading, web address, and a brief content excerpt.
Paid Advertisements: Google advertisements appear at both the top and bottom of the search engine results page (SERP) and are marked with a small "Ad" label.
Featured Snippets: Sometimes, you'll find boxes at the top of search results that provide a direct answer to your query.
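Extracting these components means parsing the SERP's HTML. The sketch below uses only Python's standard library and runs against a simplified, made-up markup sample; Google's real markup is obfuscated and changes frequently, so treat the tag and class names here purely as illustrative stand-ins:

```python
from html.parser import HTMLParser

# Illustrative only: Google's real SERP markup is obfuscated and changes
# frequently, so the class names below are simplified stand-ins.
SAMPLE_SERP = """
<div class="result"><h3>Example Domain</h3>
  <a href="https://example.com">example.com</a>
  <span class="snippet">This domain is for use in examples.</span></div>
<div class="result"><h3>Python Docs</h3>
  <a href="https://docs.python.org">docs.python.org</a>
  <span class="snippet">Official Python documentation.</span></div>
"""

class SerpParser(HTMLParser):
    """Collects title/url/snippet dicts from the simplified SERP HTML."""
    def __init__(self):
        super().__init__()
        self.results = []
        self._field = None      # which field the next text chunk belongs to
        self._current = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div" and attrs.get("class") == "result":
            self._current = {}
        elif tag == "h3":
            self._field = "title"
        elif tag == "a":
            self._current["url"] = attrs.get("href", "")
        elif tag == "span" and attrs.get("class") == "snippet":
            self._field = "snippet"

    def handle_data(self, data):
        if self._field and data.strip():
            self._current[self._field] = data.strip()
            self._field = None

    def handle_endtag(self, tag):
        if tag == "div" and self._current:
            self.results.append(self._current)
            self._current = {}

parser = SerpParser()
parser.feed(SAMPLE_SERP)
for r in parser.results:
    print(r["title"], "->", r["url"])
```

The same event-driven approach adapts to whatever markup your target actually serves; for production work a library like BeautifulSoup is more forgiving of malformed HTML.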
Is it Legal to Scrape Google Search Results?
Extracting data from Google search results involves legal and ethical risks. Google's terms of service prohibit unauthorized automated access, and violating these terms can lead to IP bans or legal action.
Additionally, Google's robots.txt file restricts certain areas from web crawlers, and privacy laws like GDPR may apply depending on the data collected. Ethically, it's important to respect Google's terms and avoid overloading their servers, which could affect service performance for others.
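If you do crawl, a polite first step is to consult robots.txt programmatically before fetching anything. Python's standard library ships `urllib.robotparser` for exactly this; the rules below are an illustrative sample, not Google's actual file:

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules only -- always fetch the live file
# (e.g. https://www.google.com/robots.txt) before crawling.
ROBOTS_TXT = """\
User-agent: *
Allow: /search/about
Disallow: /search
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# Python's parser applies rules in file order, first match wins.
print(rp.can_fetch("MyScraper/1.0", "https://www.google.com/search?q=python"))  # False
print(rp.can_fetch("MyScraper/1.0", "https://www.google.com/search/about"))     # True
```

Note that Python uses first-match semantics rather than the longest-match rule Google's own crawler follows, so rule order in the file matters here.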
Scrape Google Search Results: The Difficulties
Beyond the legal considerations above, extracting data from Google search results at scale poses several technical difficulties.
Technical Challenges: Google serves CAPTCHAs to suspected bots, temporarily blocks IP addresses that send too many requests, and changes its page markup frequently, which breaks scrapers that depend on fixed HTML selectors.
Overcoming Technical Challenges: Common mitigations include throttling your request rate, rotating proxies or IP addresses, and delegating the work to a dedicated SERP API that handles these obstacles for you.
Using Python to scrape Google search results can aid in gaining insights and competitive analysis but involves legal and technical challenges. Adhering to Google's terms of service and using an API responsibly are crucial. This guide provides experienced users with the tools and methods needed for ethical data extraction, helping to inform decision-making and strategic planning.
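As a concrete starting point for the API-based approach recommended above, here is a minimal sketch using only the standard library. The endpoint and parameter names are hypothetical placeholders for whichever SERP API provider you choose; substitute your provider's documented values:

```python
import json
import urllib.parse
import urllib.request

# "api.example-serp.com" and the query parameters are hypothetical
# stand-ins for a real SERP API provider's documented interface.
API_ENDPOINT = "https://api.example-serp.com/search"

def build_search_url(query, api_key, num=10):
    """Assemble the request URL for the (hypothetical) SERP API."""
    params = urllib.parse.urlencode({"q": query, "key": api_key, "num": num})
    return f"{API_ENDPOINT}?{params}"

def fetch_results(query, api_key):
    """Fetch and decode JSON results (needs a valid key and network access)."""
    with urllib.request.urlopen(build_search_url(query, api_key)) as resp:
        return json.loads(resp.read().decode("utf-8"))

print(build_search_url("python web scraping", "DEMO_KEY"))
```

Keeping URL construction separate from the network call makes the request logic easy to test offline and to swap between providers.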
Welcome to "Three Truths and a Lie"! We'll serve up four headlines, but only one is fibbing. Can you spot the rogue headline among the truth-tellers? Let the guessing games begin!
Answer: Nope, Reddit's CEO isn't trying to get Microsoft to pay in Dogecoin and pizza coupons—though that would make for a delicious tech deal! In reality, the CEO is just focusing on regular business matters. So, no Dogecoin pizza parties here—just good old-fashioned negotiations.
Until next time, stay curious, stay tech-savvy, and we'll catch you in the next edition!
Want to gather data without breaking a sweat? Jump on board with our proxy solutions and let's make data collection a breeze!
No boring stuff here – just tech with a side of swagger!