Exploring the AI Act Law Using a ScrapFly-Powered Web Scraping API and a RAG Framework
Anindita Desarkar, PhD
PhD in CSE (JU) || Product Owner || Gen AI Practitioner || Director @ LTIMindtree || Dedicated Researcher in Data Science, Gen AI || Mentor || Patents on AI/DS/Gen AI
The Artificial Intelligence (AI) Act is a European Union (EU) law that establishes a legal framework for AI use. The law aims to ensure that AI systems are trustworthy, safe, and respect fundamental rights. It also aims to reduce administrative and financial burdens for businesses, particularly small and medium-sized enterprises (SMEs).
However, memorizing the entire set of laws is tedious because of its sheer size. Hence, this article proposes a ScrapFly-powered web scraping and RAG-based framework for easy exploration of all the provisions published on the website.
Prerequisites:
Outline of the Proposed Technique:
The following Figure 1 presents the outline of the proposed technique.
Methodology:
Let us move to the experimentation.
Step 1: Install the libraries
!pip install scrapfly-sdk
Step 2: Import the libraries
import openai
from scrapfly import ScrapeConfig, ScrapflyClient, ScrapeApiResponse
Step 3: Specify your API Keys and create ScrapflyClient instance
openai.api_key = "<Your OpenAI API Key>"
scrapfly = ScrapflyClient(key="<Your Scrapfly API Key>")
Step 4: Scrape the web content. Here we scrape Article 5 of the AI Act from the portal https://ai-act-law.eu/
api_response: ScrapeApiResponse = scrapfly.scrape(
ScrapeConfig(
# target website URL
url="https://ai-act-law.eu/article/5/",
# bypass anti scraping protection
asp=True,
# set the proxy location to a specific country
country="US",
# specify the proxy pool
proxy_pool="public_residential_pool",
# enable JavaScript rendering (use a cloud browser)
render_js=True,
# specify the web scraping format
format="markdown"
)
)
# get the results
data = api_response.scrape_result['content']
print(data)
The output is presented in the following Figure 2.
The actual content on the AI Act portal looks like the following.
Hence, we can say that we have successfully scraped the content of Article 5 from the AI Act portal. We can do the same for the other articles/webpages.
Step 5: Now that we have received the content successfully from the portal, we can deploy the classical RAG framework: split the scraped output into chunks, embed them, and save the embeddings in a vector DB.
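A minimal sketch of the chunking stage of Step 5, assuming we split the scraped markdown into overlapping character windows before embedding. The chunk size and overlap below are illustrative defaults, not values from the article.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks so context survives chunk borders."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    step = chunk_size - overlap  # how far each window advances
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks

# Example: chunk the scraped Article 5 content (stand-in text shown here)
data = "Article 5 - Prohibited AI practices. " * 40
chunks = chunk_text(data, chunk_size=200, overlap=20)
```

Each chunk would then be embedded (e.g., with an OpenAI embedding model) and stored in the vector DB alongside its text.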
Step 6: Using RAG, we retrieve the appropriate chunks from the vector DB based on the user query and pass them to the LLM, so that the user receives an appropriate, grounded response.
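The retrieval logic of Step 6 can be illustrated in miniature. A real deployment would use an embedding model and a vector DB; here a bag-of-words vector and cosine similarity stand in for both, so the ranking step itself stays visible. The sample chunks are invented for the example.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words term-frequency vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

chunks = [
    "Article 5 prohibits certain AI practices such as social scoring.",
    "SMEs benefit from reduced administrative burdens under the Act.",
    "Real-time biometric identification in public spaces is restricted.",
]
top = retrieve("Which AI practices does Article 5 prohibit?", chunks, k=1)
# `top` would then be passed to the LLM as grounding context for the answer.
```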
Alternatively, we can use the LlamaIndex or LangChain framework to do the same via the ScrapflyReader and ScrapflyLoader classes, respectively.
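A hedged sketch of the LlamaIndex route, assuming the ScrapflyReader API from the llama-index-readers-web package; the parameter names follow Scrapfly's published examples and may differ across versions. The network call is guarded behind an environment variable so the snippet is safe to run without a key.

```python
import os

ARTICLE_URLS = ["https://ai-act-law.eu/article/5/"]

def build_reader_kwargs(api_key: str) -> dict:
    """Collect the ScrapflyReader constructor arguments in one place."""
    return {"api_key": api_key, "ignore_scrape_failures": True}

# Only hit the network when a Scrapfly key is actually configured.
if os.environ.get("SCRAPFLY_API_KEY"):
    from llama_index.readers.web import ScrapflyReader

    reader = ScrapflyReader(**build_reader_kwargs(os.environ["SCRAPFLY_API_KEY"]))
    documents = reader.load_data(urls=ARTICLE_URLS)
    print(documents[0].text[:200])
```

The returned documents can be fed directly into a LlamaIndex vector index, replacing the manual chunk-embed-store steps above; LangChain's ScrapflyLoader plays the analogous role in a LangChain pipeline.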
Wish All a Happy Engineer's Day! Keep Experimenting!