Exploring the AI Act Law using a ScrapFly-powered Web Scraping API and RAG Framework
EU AI Act


The Artificial Intelligence (AI) Act is a European Union (EU) law that establishes a legal framework for AI use. The law aims to ensure that AI systems are trustworthy, safe, and respect fundamental rights. It also aims to reduce administrative and financial burdens for businesses, particularly small and medium-sized enterprises (SMEs).

However, memorizing the entire set of laws is a tedious task because of its sheer size. Hence, we propose a ScrapFly-powered web scraping and RAG-based framework for easy exploration of all the laws present on the website.


Prerequisites:

  1. An OpenAI API Key: Please go to the site and generate your OpenAI API Key: https://openai.com/index/openai-api/
  2. A ScrapFly API Key: This is free up to a certain usage limit. Please generate your API Key from the below site: https://scrapfly.io/docs/scrape-api/getting-started


Outline of the Proposed Technique:

The following Figure 1 presents the outline of the proposed technique.

Figure 1:


Methodology:

Let us move to the experimentation.

Step 1: Install the libraries

!pip install scrapfly-sdk openai

Step 2: Import the libraries

from scrapfly import ScrapeConfig, ScrapflyClient, ScrapeApiResponse
import openai

Step 3: Specify your API Keys and create a ScrapflyClient instance

openai.api_key =  [Your OpenAI API Key]
scrapfly = ScrapflyClient(key=[Your Scrapfly API Key])        
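Rather than hard-coding keys in the notebook, it is safer to read them from environment variables. The variable names OPENAI_API_KEY and SCRAPFLY_API_KEY below are common conventions, not requirements of either SDK; this is only a sketch of the pattern.

```python
import os

# Read keys from environment variables instead of pasting them into code.
# The variable names here are conventions assumed for this example.
openai_api_key = os.environ.get("OPENAI_API_KEY", "")
scrapfly_api_key = os.environ.get("SCRAPFLY_API_KEY", "")

# These values would then be passed to the clients, e.g.:
# openai.api_key = openai_api_key
# scrapfly = ScrapflyClient(key=scrapfly_api_key)
print(bool(openai_api_key), bool(scrapfly_api_key))
```

This keeps secrets out of notebooks and version control.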

Step 4: Scrape the web content. Here we are scraping Article 5 of the AI Act from the portal at https://ai-act-law.eu/

api_response: ScrapeApiResponse = scrapfly.scrape(
    ScrapeConfig(
        # target website URL
        url="https://ai-act-law.eu/article/5/",
        # bypass anti scraping protection
        asp=True,
        # set the proxy location to a specific country
        country="US",
        # specify the proxy pool
        proxy_pool="public_residential_pool",
        # enable JavaScript rendering (use a cloud browser)
        render_js=True,
        # specify the web scraping format
        format="markdown"
    )
)

# get the results
data = api_response.scrape_result['content']
print(data)        

The output is presented in the following Figure 2.

Figure 2: Output of web scraping from Article 5

The actual content on the AI Act portal looks like below.

Article 5 of AI Act Portal

Hence, we can say that we have successfully scraped the content of Article 5 from the AI Act portal. We can do the same for other webpages.

Step 5: Now that we have received the content from the portal, we can deploy the classical RAG framework, where we save the scraped output in a vector DB.
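The indexing side of Step 5 can be sketched as follows. The scraped markdown is split into overlapping chunks and stored with an embedding vector per chunk. As a self-contained illustration, a hash-based bag-of-words vector stands in for a real embedding model (e.g. an OpenAI embedding endpoint), and a plain Python list stands in for a vector DB such as Chroma or FAISS; both substitutions are assumptions for this sketch, not the production setup.

```python
import hashlib
import math

DIM = 256  # toy embedding dimensionality (an assumption for illustration)

def chunk_text(text: str, size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into overlapping character chunks."""
    chunks = []
    step = size - overlap
    for start in range(0, max(len(text) - overlap, 1), step):
        chunks.append(text[start:start + size])
    return chunks

def toy_embed(text: str) -> list[float]:
    """Hash-based bag-of-words vector; a stand-in for a real embedding API."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        idx = int(hashlib.md5(word.encode()).hexdigest(), 16) % DIM
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

# "Vector DB": a list of (chunk, vector) pairs standing in for a real store.
vector_store = []

def index_document(text: str) -> None:
    """Chunk a scraped document and add each chunk with its vector."""
    for chunk in chunk_text(text):
        vector_store.append((chunk, toy_embed(chunk)))

# In the real pipeline, `data` from Step 4 would be indexed here; this
# placeholder text is invented for the example.
index_document("Article 5 prohibits certain AI practices. " * 40)
print(len(vector_store))
```

Swapping `toy_embed` for a real embedding call and the list for a vector DB client gives the actual Step 5 pipeline.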

Step 6: Using RAG, we retrieve the most relevant chunks from the vector DB based on the user query and pass them to the LLM, so that the user receives an appropriate response.
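The retrieval-and-prompt step in Step 6 can be sketched as below. The chunk texts are invented placeholders, and a simple word-overlap score stands in for cosine similarity over real embeddings; the final LLM call is shown only in a comment since it needs a live API key.

```python
# Placeholder chunks, standing in for entries retrieved from the vector DB.
chunks = [
    "Article 5 lists prohibited AI practices such as subliminal manipulation.",
    "Article 6 defines the classification rules for high-risk AI systems.",
    "Article 50 sets transparency obligations for providers of AI systems.",
]

def score(query: str, chunk: str) -> float:
    """Word-overlap similarity: shared words over query words (a stand-in
    for cosine similarity between embedding vectors)."""
    q = set(query.lower().split())
    c = set(chunk.lower().split())
    return len(q & c) / (len(q) or 1)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    return sorted(chunks, key=lambda ch: score(query, ch), reverse=True)[:k]

def build_prompt(query: str) -> str:
    """Assemble the retrieved context and the question into one prompt."""
    context = "\n\n".join(retrieve(query))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )

prompt = build_prompt("Which AI practices are prohibited?")
print(prompt)

# With a live key, the prompt would then be sent to the LLM, e.g.:
# response = openai.chat.completions.create(
#     model="gpt-4o-mini",
#     messages=[{"role": "user", "content": prompt}],
# )
```

The grounding instruction ("using only the context below") is what keeps the LLM's answer tied to the retrieved law text.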

Alternatively, we can use the LlamaIndex or LangChain frameworks to do the same via their ScrapflyReader/ScrapflyLoader classes.


Wish All a Happy Engineer's Day! Keep Experimenting!


