Apache NiFi
Javad Shekarian
Full-stack developer | developing a messenger (bychat.one) | 4+ years of experience and still better than ChatGPT ??
Hello everyone!
In this article, I would like to introduce Apache NiFi to you and share my experience of using this great tool.
I have been programming for more than five years, and about two years ago I heard the name Apache NiFi for the first time. I was very surprised, because in four years of programming I had not heard anything about it, and after getting to know it I realized what amazing things can be done with it. That is why I decided to introduce this excellent tool to you: despite its great capabilities, it is not widely known among programmers, and it is mainly companies that build large projects with a modular or microservice architecture that are familiar with it!
What can be done with Apache NiFi?
If I had to answer in one word: everything! For example, you can build a web server with it, or use it the way you would use Postman, that is, send an HTTP request to a specific URL and receive the response (with the InvokeHTTP processor). You can publish messages to Kafka or RabbitMQ, and you can even listen on a specific physical or virtual port!
What are the most important things to do with Apache NiFi?
The most important use of Apache NiFi is moving data between different databases, which is very common in microservice and modular architectures. For example, suppose I want to build an app where anyone can log in as an admin and sell their products, and where normal users can log in and buy products.
To do this, I have four databases: MySQL, Redis, PostgreSQL, and MongoDB. I also have four separate services: a registration service for admins and regular users, connected to PostgreSQL; a token service, connected to Redis; a product registration service, connected to MySQL; and a financial service, connected to MongoDB.
All these services are separate and have no connection to each other. Suppose a person registers as an admin in the registration service. This data must also reach the three other databases, because each of those services needs to know who is using it and whether they have permission to do so. You can use Apache NiFi for this! When a new record is entered in one database, NiFi picks it up on its next scheduled run and transfers it to the other databases.
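The fan-out described above can be sketched in plain Python. This is only an illustration of the idea, not a NiFi configuration: in-memory SQLite databases stand in for PostgreSQL, Redis, MySQL, and MongoDB, and the table and column names (users, id, username, role) are assumptions made up for the example.

```python
import sqlite3


def make_store() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical users table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT, role TEXT)"
    )
    return conn


def replicate_user(source, targets, user_id):
    """Copy one user row from the source store into every target store."""
    row = source.execute(
        "SELECT id, username, role FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    if row is None:
        return 0
    for target in targets:
        # INSERT OR REPLACE keeps the copy idempotent if the flow runs twice.
        target.execute(
            "INSERT OR REPLACE INTO users (id, username, role) VALUES (?, ?, ?)",
            row,
        )
        target.commit()
    return len(targets)


# Stand-ins for the four services' databases (all SQLite here, by assumption).
registration = make_store()
tokens, products, finance = make_store(), make_store(), make_store()

registration.execute("INSERT INTO users VALUES (1, 'javad', 'admin')")
registration.commit()

copied = replicate_user(registration, [tokens, products, finance], 1)
print(copied)
```

In NiFi the same job would be split across processors (one reading from the source, others writing to each target), but the shape of the work is the same: read the new record once, write it to every store that needs it.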
Suppose I have another service that is connected to Kafka and is waiting to find out whether new information has entered the MySQL database (the product registration database). In Kafka, receiving messages is called consuming and sending messages is called publishing. So I "listen" to a specific database with NiFi to see whether a new product has entered it. Strictly speaking, Apache NiFi does not listen to the database; it queries it on a schedule, but I use the word listening here to make the idea easier to follow. As soon as someone registers a new product for purchase, Apache NiFi takes the buyer's personal information and the product information and publishes them to Kafka. After NiFi has published the message, the service that is consuming it sends the buyer's information and the product information as an email or notification to the seller's admin, so that the admin can give the buyer the necessary advice!
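The "listening" described above is really repeated polling: remember the newest row you have already seen and fetch only rows that are newer. NiFi's QueryDatabaseTable processor works along these lines by tracking a maximum-value column. Below is a minimal Python sketch of that pattern, assuming a hypothetical products table with an auto-incrementing id; a plain list stands in for the Kafka topic, so nothing here touches a real broker.

```python
import json
import sqlite3


def fetch_new_rows(conn, last_seen_id):
    """Return rows with an id above last_seen_id, plus the new high-water mark."""
    rows = conn.execute(
        "SELECT id, name, seller FROM products WHERE id > ? ORDER BY id",
        (last_seen_id,),
    ).fetchall()
    new_last = rows[-1][0] if rows else last_seen_id
    return rows, new_last


def publish(topic, rows):
    """Append each row to the topic as a JSON message (stand-in for a Kafka publish)."""
    for row_id, name, seller in rows:
        topic.append(json.dumps({"id": row_id, "name": name, "seller": seller}))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, seller TEXT)")
conn.execute("INSERT INTO products (name, seller) VALUES ('keyboard', 'shop-a')")

topic = []      # hypothetical in-memory topic
last_seen = 0   # state that survives between polls

# First poll: picks up the keyboard.
rows, last_seen = fetch_new_rows(conn, last_seen)
publish(topic, rows)

# Second poll: nothing new, so nothing is published.
rows, last_seen = fetch_new_rows(conn, last_seen)
publish(topic, rows)

# A new product arrives and the third poll publishes only that row.
conn.execute("INSERT INTO products (name, seller) VALUES ('mouse', 'shop-b')")
rows, last_seen = fetch_new_rows(conn, last_seen)
publish(topic, rows)
```

Keeping the high-water mark between runs is what prevents the same product from being published twice.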
This is just one example of the thousands of things that can be done with Apache NiFi!
More technical description:
Processor: This is the basic building block of an Apache NiFi data flow, and each processor does one specific job. For example, there are processors that read data from databases (such as ExecuteSQL and QueryDatabaseTable; these two have a small difference that I will explain in more detail later). There is a processor that changes the data format, for example from JSON to CSV or from Avro to JSON; its name is ConvertRecord. There is another processor that extracts values from JSON and stores them as keys and values in attributes; its name is EvaluateJsonPath.
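To make the EvaluateJsonPath idea concrete, here is a small Python sketch of "pull values out of JSON and store them as attributes". It is an illustration only: the function name is hypothetical, it supports just simple dotted paths such as $.contact.email rather than full JsonPath, and the sample document is made up.

```python
import json


def evaluate_json_path(content: str, paths: dict) -> dict:
    """Resolve simple '$.a.b' style paths against JSON content.

    Returns a dict of attribute name -> value, with values stored as strings.
    """
    document = json.loads(content)
    attributes = {}
    for attr_name, path in paths.items():
        # Drop the leading '$.' and walk the remaining keys one by one.
        trimmed = path[2:] if path.startswith("$.") else path
        value = document
        for key in trimmed.split("."):
            value = value[key]
        attributes[attr_name] = str(value)
    return attributes


# Sample content and attribute names are assumptions for the example.
content = '{"firstname": "javad", "contact": {"email": "javad@example.com"}}'
attrs = evaluate_json_path(
    content,
    {"user.firstname": "$.firstname", "user.email": "$.contact.email"},
)
print(attrs)
```

Once the values live in attributes, later steps can use them without parsing the JSON again.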
Queue: Processors are connected to each other by queues. The result of the work done by a processor is transferred to the next processor through a queue, and which queue it goes to depends on the type of result.
Relationship: After a processor does its task, the task may have succeeded or failed, and each possible outcome is a relationship of the processor (for example, success or failure). The interesting part is that each relationship you connect to another processor gets its own queue, so depending on the result, the data enters the success queue or the failure queue. In practice, "success queue" and "success relationship" refer to the same path, and the same goes for failure.
For example, suppose I want to query the users table in the shop database using Apache NiFi, and for this purpose I use the ExecuteSQL processor. If the query succeeds, the result goes to the success queue (or relationship), and if there is a problem during the query, it goes to the failure queue (or relationship).
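The success and failure routing described above can be modelled in a few lines of Python. This is a sketch of the concept, not of NiFi internals: the execute_sql function and the two deques are hypothetical stand-ins for the processor and its two queues, and SQLite stands in for the shop database.

```python
import sqlite3
from collections import deque


def execute_sql(conn, query, success, failure):
    """Run a query and route the outcome to the success or failure queue."""
    try:
        rows = conn.execute(query).fetchall()
    except sqlite3.Error as exc:
        # Something went wrong: keep the query and the error for later handling.
        failure.append({"query": query, "error": str(exc)})
    else:
        success.append({"query": query, "rows": rows})


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, firstname TEXT)")
conn.execute("INSERT INTO users (firstname) VALUES ('javad')")

success_queue, failure_queue = deque(), deque()

# A valid query lands in the success queue.
execute_sql(conn, "SELECT id, firstname FROM users", success_queue, failure_queue)

# A query against a table that does not exist lands in the failure queue.
execute_sql(conn, "SELECT * FROM missing_table", success_queue, failure_queue)
```

The next step in the flow only ever reads from one queue, so error handling can be wired up separately from the normal path.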
Service: This is one of the important components of Apache NiFi (NiFi calls it a controller service). You create a service once and use it many times, and each service performs a specific task. The task may be a connection to a database: for example, I create a connection to the shop database and use this one service in several processors that need to reach the shop database. Other examples are a service whose task is to read CSV files and another service that writes CSV files, each of which can be used in several processors.
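The "create once, use many times" idea can be sketched in Python as a small shared object that several functions receive. The class and function names below are hypothetical, and the sketch holds a single SQLite connection, whereas a real NiFi database service manages a pool of connections.

```python
import sqlite3


class ConnectionService:
    """A shared database connection, created once and reused by many callers."""

    def __init__(self, database: str) -> None:
        self._database = database
        self._conn = None

    def get_connection(self) -> sqlite3.Connection:
        # Open the connection lazily and hand the same one to every caller.
        if self._conn is None:
            self._conn = sqlite3.connect(self._database)
        return self._conn


# Three "processors" that all depend on the same service.
def create_users_table(service):
    service.get_connection().execute(
        "CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, firstname TEXT)"
    )


def insert_user(service, firstname):
    conn = service.get_connection()
    conn.execute("INSERT INTO users (firstname) VALUES (?)", (firstname,))
    conn.commit()


def count_users(service):
    return service.get_connection().execute("SELECT COUNT(*) FROM users").fetchone()[0]


shop = ConnectionService(":memory:")  # stand-in for the shop database
create_users_table(shop)
insert_user(shop, "javad")
insert_user(shop, "sara")
total = count_users(shop)
print(total)
```

The connection details live in one place, so changing the database address means editing the service and nothing else.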
Process group: To keep our work organized, we can put each task into its own group, called a process group. Inside each process group there are several processors connected to each other, each doing its own work. We can also create several process groups inside one process group and organize a whole project this way.
Apache NiFi has several other components, such as the funnel, which takes several incoming connections and merges them into one before handing the data to the next step, and the label, where we can write notes for different parts of the flow. I will cover these in more detail in the next article. The most important parts of Apache NiFi are the ones described above.
ONE EXAMPLE:
Explanation:
ExecuteSQL: In the example above, this processor executes a query on the database and passes the returned Avro data to ConvertAvroToJSON.
ConvertAvroToJSON: This processor takes the Avro data from ExecuteSQL, converts it to JSON (an array containing the profile information of the different users), and passes it to SplitJson.
SplitJson: This processor splits the array into its elements. For example, it turns one array of 16 users into 16 separate user objects and passes each of them to the next processor, EvaluateJsonPath.
EvaluateJsonPath: This processor extracts the fields of each element as keys and values (for example, firstname: javad), stores them in attributes, and passes the result to the next processor, a second ExecuteSQL.
ExecuteSQL: In this processor, I wrote a query that takes the data (the same values that EvaluateJsonPath extracted from the JSON and stored in attributes as keys and values) and inserts it into the linkdin table. There is also a processor called PutSQL that is meant for writing data to the database, and it is better to use PutSQL for inserts instead of ExecuteSQL.
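The five steps above can be imitated end to end in Python to show how data changes shape along the way. This is a rough sketch under several assumptions: each function is a hypothetical stand-in for the processor of the same name, a list of dicts replaces the Avro records, SQLite holds both the source users table and the linkdin target table, and the firstname and lastname columns and sample rows are made up.

```python
import json
import sqlite3


def execute_sql(conn, query):
    """Run a query and return rows as a list of dicts (stand-in for Avro records)."""
    cursor = conn.execute(query)
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


def convert_to_json(records):
    """Serialize all records into one JSON array."""
    return json.dumps(records)


def split_json(array_json):
    """Split a JSON array into one JSON document per element."""
    return [json.dumps(item) for item in json.loads(array_json)]


def evaluate_json_path(item_json):
    """Turn the top-level fields of one JSON object into string attributes."""
    return {key: str(value) for key, value in json.loads(item_json).items()}


def put_sql(conn, attributes):
    """Insert one user into the target table using a parameterized statement."""
    conn.execute(
        "INSERT INTO linkdin (firstname, lastname) VALUES (?, ?)",
        (attributes["firstname"], attributes["lastname"]),
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (firstname TEXT, lastname TEXT)")
conn.execute("CREATE TABLE linkdin (firstname TEXT, lastname TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("javad", "shekarian"), ("sara", "ahmadi")],
)

# Chain the steps in the same order as the flow.
records = execute_sql(conn, "SELECT firstname, lastname FROM users")
for item in split_json(convert_to_json(records)):
    put_sql(conn, evaluate_json_path(item))
conn.commit()
```

Passing the values as query parameters, rather than pasting them into the SQL text, avoids quoting problems and SQL injection when a name contains special characters.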
Avro is a compact binary format that carries its schema along with the data, which makes it efficient to store and process. Apache NiFi uses this format in its database processors: processors such as ExecuteSQL and QueryDatabaseTable return their results in Avro format, which is why the flow above starts by converting Avro to JSON.
I hope my article has encouraged you to learn Apache NiFi.
Thank you for taking the time to read my article!