Exploring the AI Act Law using a ScrapFly-powered Web Scraping API and RAG Framework
EU AI Act


The Artificial Intelligence (AI) Act is a European Union (EU) law that establishes a legal framework for AI use. The law aims to ensure that AI systems are trustworthy, safe, and respect fundamental rights. It also aims to reduce administrative and financial burdens for businesses, particularly small and medium-sized enterprises (SMEs).

However, memorizing the entire set of laws is a tedious task because of its sheer size. Hence, we propose a ScrapFly-powered web scraping and RAG-based framework for easy exploration of all the laws present on the website.


Prerequisites:

  1. An OpenAI API Key: Please go to the site and generate your OpenAI API Key: https://openai.com/index/openai-api/
  2. A ScrapFly API Key: This is free up to a certain usage limit. Please generate your API Key from the below site: https://scrapfly.io/docs/scrape-api/getting-started


Outline of the Proposed Technique:

The following Figure 1 presents the outline of the proposed technique.

Figure 1:


Methodology:

Let us move to the experimentation.

Step 1: Install the libraries

!pip install scrapfly-sdk openai

Step 2: Import the libraries

from scrapfly import ScrapeConfig, ScrapflyClient, ScrapeApiResponse
import openai

Step 3: Specify your API Keys and create a ScrapflyClient instance

openai.api_key =  [Your OpenAI API Key]
scrapfly = ScrapflyClient(key=[Your Scrapfly API Key])        
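Rather than hard-coding keys in the notebook, it is safer to read them from environment variables. The variable names OPENAI_API_KEY and SCRAPFLY_API_KEY below are common conventions, not requirements of either SDK; this is only a sketch of the pattern.

```python
import os

# Read keys from environment variables instead of pasting them into code.
# The variable names here are conventions assumed for this example.
openai_api_key = os.environ.get("OPENAI_API_KEY", "")
scrapfly_api_key = os.environ.get("SCRAPFLY_API_KEY", "")

# These values would then be passed to the clients, e.g.:
# openai.api_key = openai_api_key
# scrapfly = ScrapflyClient(key=scrapfly_api_key)
print(bool(openai_api_key), bool(scrapfly_api_key))
```

This keeps secrets out of notebooks and version control.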

Step 4: Scrape the web content. Here we are scraping Article 5 of the AI Act from the portal at https://ai-act-law.eu/

api_response: ScrapeApiResponse = scrapfly.scrape(
    ScrapeConfig(
        # target website URL
        url="https://ai-act-law.eu/article/5/",
        # bypass anti scraping protection
        asp=True,
        # set the proxy location to a specific country
        country="US",
        # specify the proxy pool
        proxy_pool="public_residential_pool",
        # enable JavaScript rendering (use a cloud browser)
        render_js=True,
        # specify the web scraping format
        format="markdown"
    )
)

# get the results
data = api_response.scrape_result['content']
print(data)        

The output is presented in the following Figure 2.

Figure 2: Output of web scraping from Article 5

The actual content on the AI Act portal looks like below.

Article 5 of AI Act Portal

Hence, we can say that we have successfully scraped the content of Article 5 from the AI Act portal. We can do the same for other webpages.

Step 5: Now that we have received the content from the portal, we can deploy the classical RAG framework, where we save the scraped output in a vector DB.
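The indexing side of Step 5 can be sketched as follows. The scraped markdown is split into overlapping chunks and stored with an embedding vector per chunk. As a self-contained illustration, a hash-based bag-of-words vector stands in for a real embedding model (e.g. an OpenAI embedding endpoint), and a plain Python list stands in for a vector DB such as Chroma or FAISS; both substitutions are assumptions for this sketch, not the production setup.

```python
import hashlib
import math

DIM = 256  # toy embedding dimensionality (an assumption for illustration)

def chunk_text(text: str, size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into overlapping character chunks."""
    chunks = []
    step = size - overlap
    for start in range(0, max(len(text) - overlap, 1), step):
        chunks.append(text[start:start + size])
    return chunks

def toy_embed(text: str) -> list[float]:
    """Hash-based bag-of-words vector; a stand-in for a real embedding API."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        idx = int(hashlib.md5(word.encode()).hexdigest(), 16) % DIM
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

# "Vector DB": a list of (chunk, vector) pairs standing in for a real store.
vector_store = []

def index_document(text: str) -> None:
    """Chunk a scraped document and add each chunk with its vector."""
    for chunk in chunk_text(text):
        vector_store.append((chunk, toy_embed(chunk)))

# In the real pipeline, `data` from Step 4 would be indexed here; this
# placeholder text is invented for the example.
index_document("Article 5 prohibits certain AI practices. " * 40)
print(len(vector_store))
```

Swapping `toy_embed` for a real embedding call and the list for a vector DB client gives the actual Step 5 pipeline.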

Step 6: Using RAG, we retrieve the most relevant chunks from the vector DB based on the user query and pass them to the LLM, so that the user receives an appropriate response.
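The retrieval-and-prompt step in Step 6 can be sketched as below. The chunk texts are invented placeholders, and a simple word-overlap score stands in for cosine similarity over real embeddings; the final LLM call is shown only in a comment since it needs a live API key.

```python
# Placeholder chunks, standing in for entries retrieved from the vector DB.
chunks = [
    "Article 5 lists prohibited AI practices such as subliminal manipulation.",
    "Article 6 defines the classification rules for high-risk AI systems.",
    "Article 50 sets transparency obligations for providers of AI systems.",
]

def score(query: str, chunk: str) -> float:
    """Word-overlap similarity: shared words over query words (a stand-in
    for cosine similarity between embedding vectors)."""
    q = set(query.lower().split())
    c = set(chunk.lower().split())
    return len(q & c) / (len(q) or 1)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    return sorted(chunks, key=lambda ch: score(query, ch), reverse=True)[:k]

def build_prompt(query: str) -> str:
    """Assemble the retrieved context and the question into one prompt."""
    context = "\n\n".join(retrieve(query))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )

prompt = build_prompt("Which AI practices are prohibited?")
print(prompt)

# With a live key, the prompt would then be sent to the LLM, e.g.:
# response = openai.chat.completions.create(
#     model="gpt-4o-mini",
#     messages=[{"role": "user", "content": prompt}],
# )
```

The grounding instruction ("using only the context below") is what keeps the LLM's answer tied to the retrieved law text.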

Alternatively, we can use the LlamaIndex or LangChain frameworks to do the same via their ScrapflyReader/ScrapflyLoader classes.


Wish All a Happy Engineer's Day! Keep Experimenting!


