Data Engineering Concepts: Using an API to Send Data
How To Send Data Using An API


Contents

  • Introduction
  • What is an API?
  • When to use an API instead of a data pipeline
  • Example
  • Setting up the environment
  • Building the API
  • Using the API
  • Improving the API
  • Adding an API key
  • Controlling the amount of data

Introduction


Moving data between a source and a destination is one of a data engineer's key duties, and there are many ways to do it.

Depending on the problem, this often calls for building and maintaining a sophisticated data pipeline. However, we can also send data between devices or services without a data pipeline at all.

Often, we can fulfill the requirement by creating a simple API that lets authorized users request data from our services.

What is an API?

An API is simply an interface that lets users send HTTP requests to a server over the internet. Through these requests, a user can interact with many services on the server, such as querying a database or running a function.

The developers who build the API control which operations users can trigger when they send HTTP requests.

For instance, we could design an API that, when it receives the right request, runs a query against a table called "customers" and returns the five customer ids that were most active in the past month.
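As a hypothetical sketch (the database, table, and column names here are illustrative assumptions, not from the original article), the function behind such an endpoint could look like this:

import sqlite3

def top_five_active_customers():
    # Fetch the five customer ids with the most activity in the past month,
    # assuming a column such as actions_last_month exists
    connection = sqlite3.connect("shop.db")
    rows = connection.execute(
        "SELECT customer_id FROM customers "
        "ORDER BY actions_last_month DESC LIMIT 5"
    ).fetchall()
    connection.close()
    return [row[0] for row in rows]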


When to use an API instead of a data pipeline

APIs can be a great alternative to pipelines, although we should be careful about when to use them.

First, because APIs send data across the internet, we can only transfer comparatively small amounts of data per request. Additionally, if the data has to go through heavy processing, the API will be slow and inefficient. In those circumstances, we should build a data pipeline instead.

However, when the required data is lightweight and no scheduling is needed, an API can take the place of a pipeline.

APIs also let users pull data independently. Users don't need to ask a data engineer to run a specific pipeline; they can interact with the service whenever they want.

Of course, a hybrid strategy is always an option. We can build a data pipeline to transport and process large volumes of data into the repository of our choice, then provide an API that lets people access small portions of the processed data.

Example

To put this into practice, let's use Flask to create a simple API. Users will be able to send a GET request to our service; if the request is valid, the API will scrape the website "example.com" and return the requested number of letters.

[Screenshot: the example.com page, titled "Example Domain".]

Setting up the environment

Let's start by setting up a virtual environment:
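Using Python's built-in venv module, for example (creating the environment directly in the project folder, since we activate it with source bin/activate below):

C:/projects/api_example# python -m venv .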

Then activate it:

C:/projects/api_example# source bin/activate


To confirm that we are now inside the virtual environment, the prompt should look like this:

(api_example) C:/projects/api_example#

Next, we need to pip-install three libraries: Flask, bs4, and requests.


(api_example) C:/projects/api_example# pip install flask bs4 requests


Next, make an "app" folder and a file called "app.py":
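For example:

(api_example) C:/projects/api_example# mkdir app
(api_example) C:/projects/api_example# touch app/app.py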


Great. We can now create the API.

Building the API

Let's start by writing the function that scrapes the website example.com and extracts all of the text it can.

The code:
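A minimal sketch, using the requests and bs4 libraries we just installed (the function name scrape_text is my own choice):

# app/app.py
import requests
from bs4 import BeautifulSoup

def scrape_text():
    # Request the page and raise an error if the connection fails
    response = requests.get("https://example.com")
    response.raise_for_status()
    # Parse the HTML and extract all of the visible text
    soup = BeautifulSoup(response.text, "html.parser")
    return soup.get_text()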

If the connection was successful:
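Calling print(scrape_text()) should show the page's text, which for example.com looks roughly like this:

Example Domain
This domain is for use in illustrative examples in documents. You may use this domain in literature without prior coordination or asking for permission.
More information...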

It appears to be working well. Now let's build the API itself. I'll use Flask to create a local app that listens on port 5000.
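A minimal sketch, continuing app/app.py from above:

from flask import Flask

app = Flask(__name__)

@app.route("/")
def index():
    # Run the scraper and return the extracted text to the caller
    return scrape_text()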

Any user who sends a GET request to localhost:5000/ will trigger the function above and receive the text we just scraped.

To run the app, go to the "app" folder and start the Flask development server:

(api_example) C:/projects/api_example# cd app
(api_example) C:/projects/api_example/app# flask run

Our simple app now serves the collected text at localhost:5000/.

Using the API

Let's now switch roles for a moment and assume we are users who need this data and want to use the API the developers built. To get the data, we must send a GET request to localhost:5000/.

There are many ways to do this and many tools available, but the most basic is the "curl" command-line tool.

The following curl command puts this data in a text file called "scraped_data.txt":

(api_example) C:/projects/api_example# curl -o scraped_data.txt localhost:5000/

Output:

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   175  100   175    0     0    634      0 --:--:-- --:--:-- --:--:--   636

All of the scraped text should now be in the text file.

Improving the API

Let's go back to playing the developers. As the programmers who built this API, we are tasked with adding some security: we can't allow just anyone to retrieve data with a simple GET request.

Adding an API key

Adding an API key is a very common way to increase security.

Let's say the API key in this simple example is 12345. We need to change the code so that only requests sent to the URL localhost:5000/api_key=12345 can access the data. All other requests will be rejected.
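One way to sketch this in Flask, continuing app/app.py (the api_key=... part is treated as a path segment to match the URLs used in this article, and the route replaces the unrestricted one from before; in production, keys are usually sent as headers or query parameters instead):

from flask import Flask, abort

app = Flask(__name__)

API_KEY = "12345"  # hard-coded only for this toy example

@app.route("/api_key=<key>")
def secured_index(key):
    # Reject any request that does not carry the expected key
    if key != API_KEY:
        abort(401)
    return scrape_text()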

This ensures that the only users who can successfully send GET requests are those who know the API key we chose.

Let's send a GET request again, this time using the API key 12345:

(api_example) C:/projects/api_example# curl -o scraped_data.txt localhost:5000/api_key=12345

Output:

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   175  100   175    0     0    653      0 --:--:-- --:--:-- --:--:--   655

Great. Now only authorized users who know the API key (12345) can scrape our data.

Controlling the amount of data

Next, let's give users the ability to control how much data they receive. Instead of getting all of the text, users will be able to choose how many letters they want. This might look as follows:
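A sketch extending app.py, with the number of letters captured from the path as an integer:

@app.route("/api_key=<key>/number_of_letters=<int:number_of_letters>")
def secured_letters(key, number_of_letters):
    # Validate the key, then return only the first number_of_letters characters
    if key != API_KEY:
        abort(401)
    return scrape_text()[:number_of_letters]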

Now suppose we only want the first 100 letters. We can make the following GET request:

(api_example) C:/projects/api_example# curl -o scraped_data.txt localhost:5000/api_key=12345/number_of_letters=100

The result is a text file containing only the first 100 letters.

As we can see, APIs are a useful way to let users access services and transfer small amounts of data over the internet.

That brings the article to an end. I hope you enjoyed it and learned something new. If you have any questions, feel free to ask them in the comments.
