Apache Kafka in the Kubernetes Era!
Alice Sophiya Samuel
Linux administrator | 2x RedHat Certified | Ansible | Linux | AWS | Azure | Datacenter Infrastructure Management | OpenShift | Docker
Red Hat keeps shipping updates back to back, and the pace has only accelerated in the new era that began with Kubernetes. As data volumes grow, storing, filtering, analyzing, and handling that data all become harder. This is where Apache Kafka comes in, providing a platform built for data streaming.
What is Apache Kafka?
Apache Kafka is a distributed data streaming platform that can publish, subscribe to, store, and process streams of records in real time. It is designed to handle data streams from multiple sources and deliver them to multiple consumers. In short, it moves massive amounts of data—not just from point A to B, but from points A to Z and anywhere else you need, all at the same time.
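To make the publish/subscribe/store model concrete, here is a toy, in-memory sketch (not real Kafka code; class and topic names are illustrative) of Kafka's core abstraction: an append-only log per topic, with each consumer group tracking its own read offset so multiple independent consumers can replay the same stream.

```python
from collections import defaultdict

class ToyLog:
    """Toy sketch of Kafka's commit-log model: records are appended
    to a per-topic log and retained; consumer groups track offsets."""

    def __init__(self):
        self.topics = defaultdict(list)   # topic -> list of records
        self.offsets = defaultdict(int)   # (group, topic) -> next offset to read

    def publish(self, topic, record):
        # Producers append records; the log retains them for replay.
        self.topics[topic].append(record)

    def consume(self, group, topic):
        # Each consumer group reads from its own offset, so many
        # independent consumers can process the same stream.
        offset = self.offsets[(group, topic)]
        records = self.topics[topic][offset:]
        self.offsets[(group, topic)] = len(self.topics[topic])
        return records

log = ToyLog()
log.publish("clicks", {"user": "alice", "page": "/home"})
log.publish("clicks", {"user": "bob", "page": "/cart"})

print(log.consume("analytics", "clicks"))  # both records
print(log.consume("analytics", "clicks"))  # [] - this group is caught up
print(log.consume("billing", "clicks"))    # both records again, independent offset
```

This is what lets Kafka move data "from points A to Z": every consumer group gets its own cursor into the same durable stream.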
Apache Kafka is an alternative to a traditional enterprise messaging system. It started out as an internal system developed by LinkedIn to handle 1.4 trillion messages per day, but now it's an open source data streaming solution with applications for a variety of enterprise needs.
When to use Apache Kafka?
Apache Kafka is built into streaming data pipelines that share data between systems and/or applications, and it is also built into the systems and applications that consume that data. Apache Kafka supports a range of use cases where high throughput and scalability are vital. Since Apache Kafka minimizes the need for point-to-point integrations for data sharing in certain applications, it can reduce latency to milliseconds. This means data is available to users faster, which can be advantageous in use cases that require real-time data availability, such as IT operations and e-commerce.
Apache Kafka can handle millions of data points per second, which makes it well-suited for big data challenges. However, Kafka also makes sense for companies that are not currently handling such extreme data scenarios. In many data processing use cases, such as the Internet of Things (IoT) and social media, data is increasing exponentially, and may quickly overwhelm an application you are building based on today's data volume. In terms of data processing, you must consider scalability, and that means planning for the increased proliferation of your data.
How does Kubernetes scale Apache Kafka applications?
Kubernetes is an ideal platform for Apache Kafka: developers need a scalable platform to host Kafka applications, and Kubernetes fits that need.
Like Apache Kafka, Kubernetes also makes your development process more agile. Kubernetes—the technology behind Google’s cloud services—is an open source system for managing containerized applications, and it eliminates many of the manual processes associated with containers. Using Apache Kafka in Kubernetes streamlines the deployment, configuration, management, and use of Apache Kafka.
By combining Kafka and Kubernetes, you gain all the benefits of Kafka, and also the advantages of Kubernetes: scalability, high availability, portability and easy deployment.
The scalability of Kubernetes is a natural complement to Kafka. In Kubernetes, you can scale resources up and down with a simple command, or scale automatically based on usage as needed to make the best use of your computing, networking, and storage infrastructure. This capability enables Apache Kafka to share a limited pool of resources with other applications. Kubernetes also offers Apache Kafka portability across infrastructure providers and operating systems. With Kubernetes, Apache Kafka clusters can span across on-site and public, private, or hybrid clouds, and use different operating systems.
How to install Kafka on RHEL 8?
STEP 1: To download Kafka from the closest mirror, consult the official download site and copy the URL of the .tgz file from there. We'll use wget with the pasted URL to download the package to the target machine:
# wget https://www-eu.apache.org/dist/kafka/2.1.0/kafka_2.11-2.1.0.tgz -O /opt/kafka_2.11-2.1.0.tgz
STEP 2: We enter the /opt directory and extract the archive:
# cd /opt
# tar -xvf kafka_2.11-2.1.0.tgz
STEP 3: Create a symlink called /opt/kafka that points to the newly created /opt/kafka_2.11-2.1.0 directory to make our lives easier:
# ln -s /opt/kafka_2.11-2.1.0 /opt/kafka
STEP 4: We create a non-privileged user that will run both the zookeeper and kafka services.
# useradd kafka
STEP 5: We set the new user as the owner of the extracted directory, recursively:
# chown -R kafka:kafka /opt/kafka*
STEP 6: We create the unit file /etc/systemd/system/zookeeper.service with the following content:
[Unit]
Description=zookeeper
After=syslog.target network.target
[Service]
Type=simple
User=kafka
Group=kafka
ExecStart=/opt/kafka/bin/zookeeper-server-start.sh /opt/kafka/config/zookeeper.properties
ExecStop=/opt/kafka/bin/zookeeper-server-stop.sh
[Install]
WantedBy=multi-user.target
Note that we don’t need to write the version number three times because of the symlink we created. The same applies to the next unit file for Kafka, /etc/systemd/system/kafka.service, that contains the following lines of configuration:
[Unit]
Description=Apache Kafka
Requires=zookeeper.service
After=zookeeper.service
[Service]
Type=simple
User=kafka
Group=kafka
ExecStart=/opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties
ExecStop=/opt/kafka/bin/kafka-server-stop.sh
[Install]
WantedBy=multi-user.target
STEP 7: We need to reload systemd so that it reads the new unit files:
# systemctl daemon-reload
Now we can start our new services (in this order):
# systemctl start zookeeper
# systemctl start kafka
If all goes well, systemd should report both services as active (running), similar to the output below:
# systemctl status zookeeper.service
zookeeper.service - zookeeper
Loaded: loaded (/etc/systemd/system/zookeeper.service; disabled; vendor preset: disabled)
Active: active (running) since Thu 2019-01-10 20:44:37 CET; 6s ago
Main PID: 11628 (java)
Tasks: 23 (limit: 12544)
Memory: 57.0M
CGroup: /system.slice/zookeeper.service
11628 java -Xmx512M -Xms512M -server [...]
# systemctl status kafka.service
kafka.service - Apache Kafka
Loaded: loaded (/etc/systemd/system/kafka.service; disabled; vendor preset: disabled)
Active: active (running) since Thu 2019-01-10 20:45:11 CET; 11s ago
Main PID: 11949 (java)
Tasks: 64 (limit: 12544)
Memory: 322.2M
CGroup: /system.slice/kafka.service
11949 java -Xmx1G -Xms1G -server [...]
Optionally we can enable automatic start on boot for both services:
# systemctl enable zookeeper.service
# systemctl enable kafka.service
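With both services running, we can optionally smoke-test the installation using the scripts shipped with Kafka. The commands below are a sketch matching the 2.1.0 release installed above (where topic creation still goes through ZooKeeper); the topic name FirstTopic and the default ports localhost:2181 and localhost:9092 are illustrative assumptions.

```shell
# Create a test topic with one partition and no replication (Kafka 2.1.x
# kafka-topics.sh talks to ZooKeeper, assumed on the default port 2181):
# /opt/kafka/bin/kafka-topics.sh --create --zookeeper localhost:2181 \
#     --replication-factor 1 --partitions 1 --topic FirstTopic

# Publish a message to the topic (broker assumed on the default port 9092):
# echo "hello kafka" | /opt/kafka/bin/kafka-console-producer.sh \
#     --broker-list localhost:9092 --topic FirstTopic

# Read it back from the beginning of the log:
# /opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
#     --topic FirstTopic --from-beginning --max-messages 1
```

If the consumer prints the message back, the broker is storing and serving records correctly.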
Deploying Kafka on OpenShift
Managed Services
There are multiple ways to use Kafka in the cloud. One way is to use IBM's managed Event Streams service or Red Hat's managed OpenShift Streams for Apache Kafka service. The big advantage of managed services is that you don't have to worry about managing, operating, and maintaining the messaging systems. As soon as you deploy services in your own clusters, you are usually responsible for managing them. Even if you use operators, which help with day-2 tasks, you will still have some extra work compared to managed services.
Operators
Another approach is to install Kafka in your own clusters. Especially in the early stages of a project, when developers simply want to try things out, this is a pragmatic way to get started. Multiple operators are available for Kafka, which you can find on the OperatorHub page in the OpenShift console, for example:
Strimzi is the open source upstream project for Red Hat’s AMQ Streams operator. It’s also the same code base used in Red Hat’s new managed Kafka service.
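With an operator such as Strimzi installed, deploying a cluster comes down to applying a custom resource. As a rough sketch (assuming the Strimzi operator is already installed in the cluster; the cluster name my-cluster, the replica counts, and the use of ephemeral storage are illustrative choices, not recommendations for production):

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    replicas: 3              # three brokers for availability
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
    storage:
      type: ephemeral        # fine for trying things out; use persistent storage in production
  zookeeper:
    replicas: 3
    storage:
      type: ephemeral
  entityOperator:
    topicOperator: {}        # manage topics via KafkaTopic custom resources
    userOperator: {}         # manage users via KafkaUser custom resources
```

Applying this with oc apply -f lets the operator create and manage the brokers, ZooKeeper ensemble, and supporting resources, which is exactly the day-2 help operators provide compared to the manual RHEL installation above.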
Conclusion
These use cases illustrate the versatility and power of OpenShift Streams in enabling organizations to build robust and scalable real-time data streaming applications. Red Hat OpenShift Streams for Apache Kafka simplifies the process of building real-time data streaming applications on Red Hat OpenShift.
References:
https://developers.redhat.com/blog/2018/10/29/how-to-run-kafka-on-openshift-the-enterprise-kubernetes-with-amq-streams#test_using_an_external_application
https://heidloff.net/article/deploying-kafka-on-openshift/
https://linuxconfig.org/how-to-install-kafka-on-redhat-8