The Rise of AI Engineers With GitHub Models, LLM Web Scraping, and More

The Rise of AI Engineers With GitHub Models, LLM Web Scraping, and More

Hi there! Thanks for stopping by. Here, in Scraping Digest, we share top news on everything tech & public data gathering. If you’re new to the community, make sure to subscribe and join the conversation by sharing your thoughts and ideas for future editions in the comments below.


? Industry news

GitHub Models: A New Generation of AI Engineers Building on GitHub

A brand new launch from GitHub – GitHub Models, empowering a community of over 100 million developers to become AI engineers and create with the help of cutting-edge AI models.

Access a variety of models, including Llama 3.1, GPT-4o, GPT-4o mini, Phi 3, and Mistral Large 2, through a built-in playground in GitHub. The feature allows you to experiment with different prompts and model settings for free.

?? Check out the GitHub Models introduction video.


??Useful tutorials & tips

Web Scraping SDK: Definition and Benefits at a Glance

Building software efficiently without leaving major pitfalls is no picnic. That's why modern engineering leverages Software Development Kits (SDKs), which include everything needed to simplify development processes for specific platforms, operating systems, or frameworks.?

Not long ago, we released two software development kits – Python and Go SDKs. This will help simplify integrating with Oxylabs's APIs, which can help you with retrieving search engine results (SERP), eCommerce data, real estate data, and more.

LLM Web Scraping: Integrate Assistants API With Scraper Data

The Assistants API, created by OpenAI, allows developers to access the power of AI and connect external tools to build any assistant application.?

This guide showcases several different methods for integrating scraped public web data to the Assistants API with the help of Oxylabs Scraper APIs. In the end, you’ll have gained experience with both APIs by building a practical Product Assistant that performs data analysis of scraped web pages to answer questions.


??? Code & tools

ZeroIntensity/pyawaitable: CPython API for asynchronous functions

The only library that enables writing and invoking asynchronous Python functions directly from pure C code (apart from manually creating an awaitable class from scratch, which is essentially the functionality PyAwaitable provides).

iterative/datachain: AI-dataframe for ML training and LLM apps

DataChain is a modern Python data-frame library tailored for artificial intelligence, designed to structure unorganized data into datasets and efficiently manage it on your local machine at scale.

uname-n/deltabase: A lightweight database built on polars and deltalake

DeltaBase is a versatile tool for managing Delta Tables across local and cloud environments. Tailored for data engineers, analysts, and developers, it guarantees data consistency, effective versioning, and smooth integration into your workflows.


OxyCon 2024 agenda unveiled!

OxyCon enters its 5th anniversary year with new expert topics and presentations.?

The topics will range from deeply technical discussions, such as “Ensuring Scalability in Data Collection” and “Imitating Real User Behavior with Mouse Movements,” to legal insights like “Legal Compliance in the Age of AI.”

The full agenda is already available on our website, so make sure to check it out!

Review agenda

Haven’t registered for OxyCon yet?

Save your FREE spot


???? Join our Discord community and be the first to know exclusive events and updates!

Every month we host Discord Live or Q&A with an expert session — don’t miss out!

Join now


Have questions or suggestions for future issues? Reach out to me via LinkedIn.?

Looking forward to hearing from you!

Cheers,

Liza


要查看或添加评论,请登录

Oxylabs.cn的更多文章