Apache Kafka to the Kubernetes Era!

Red Hat's back-to-back updates show how rapidly the new era of Kubernetes has arrived. As data volumes grow, storing, filtering, analyzing, and handling that data becomes tougher. This is where Apache Kafka comes in, providing a platform for data streaming.

What is Apache Kafka?

Apache Kafka is a distributed data streaming platform that can publish, subscribe to, store, and process streams of records in real time. It is designed to handle data streams from multiple sources and deliver them to multiple consumers. In short, it moves massive amounts of data—not just from point A to B, but from points A to Z and anywhere else you need, all at the same time.

Apache Kafka is an alternative to a traditional enterprise messaging system. It started out as an internal system developed by LinkedIn to handle 1.4 trillion messages per day, but now it's an open source data streaming solution with applications for a variety of enterprise needs.


When to use Apache Kafka?

Apache Kafka is built into streaming data pipelines that share data between systems and/or applications, and it is also built into the systems and applications that consume that data. Apache Kafka supports a range of use cases where high throughput and scalability are vital. Since Apache Kafka minimizes the need for point-to-point integrations for data sharing in certain applications, it can reduce latency to milliseconds. This means data is available to users faster, which can be advantageous in use cases that require real-time data availability, such as IT operations and e-commerce.

Apache Kafka can handle millions of data points per second, which makes it well-suited for big data challenges. However, Kafka also makes sense for companies that are not currently handling such extreme data scenarios. In many data processing use cases, such as the Internet of Things (IoT) and social media, data is increasing exponentially, and may quickly overwhelm an application you are building based on today's data volume. In terms of data processing, you must consider scalability, and that means planning for the increased proliferation of your data.

How does Kubernetes scale Apache Kafka applications?

Kubernetes is the ideal platform for Apache Kafka. Developers need a scalable platform to host Kafka applications, and Kubernetes is the answer.

Like Apache Kafka, Kubernetes also makes your development process more agile. Kubernetes—the technology behind Google’s cloud services—is an open source system for managing containerized applications, and it eliminates many of the manual processes associated with containers. Using Apache Kafka in Kubernetes streamlines the deployment, configuration, management, and use of Apache Kafka.

By combining Kafka and Kubernetes, you gain all the benefits of Kafka, and also the advantages of Kubernetes: scalability, high availability, portability and easy deployment.

The scalability of Kubernetes is a natural complement to Kafka. In Kubernetes, you can scale resources up and down with a simple command, or scale automatically based on usage as needed to make the best use of your computing, networking, and storage infrastructure. This capability enables Apache Kafka to share a limited pool of resources with other applications. Kubernetes also offers Apache Kafka portability across infrastructure providers and operating systems. With Kubernetes, Apache Kafka clusters can span across on-site and public, private, or hybrid clouds, and use different operating systems.
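As a sketch of what this scaling model looks like in practice (the resource names and namespace below are hypothetical, and with a plain StatefulSet deployment rather than an operator-managed one), scaling up or down is a single command:

```shell
# Scale a hypothetical Kafka StatefulSet from 3 to 5 broker pods
kubectl scale statefulset kafka --replicas=5 -n messaging

# Autoscaling based on CPU usage typically applies to the consumer
# applications rather than the brokers themselves:
kubectl autoscale deployment kafka-consumer \
    --min=2 --max=10 --cpu-percent=80 -n messaging
```

Note that when an operator such as Strimzi manages the cluster, broker replicas are changed in the Kafka custom resource instead of scaling the StatefulSet directly.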


How to install Kafka on RHEL 8?

STEP 1: To download Kafka from the closest mirror, we consult the official download site and copy the URL of the .tar.gz file from there. We then use wget with the pasted URL to download the package to the target machine:

# wget https://www-eu.apache.org/dist/kafka/2.1.0/kafka_2.11-2.1.0.tgz -O /opt/kafka_2.11-2.1.0.tgz        

STEP 2: We enter the /opt directory and extract the archive:

# cd /opt
# tar -xvf kafka_2.11-2.1.0.tgz        

STEP 3: We create a symlink called /opt/kafka that points to the newly created /opt/kafka_2.11-2.1.0 directory to make our lives easier.

# ln -s /opt/kafka_2.11-2.1.0 /opt/kafka

STEP 4: We create a non-privileged user that will run both the ZooKeeper and Kafka services.

# useradd kafka        

STEP 5: We set the new user as owner of the whole directory we extracted, recursively:

# chown -R kafka:kafka /opt/kafka*        

STEP 6: We create the unit file /etc/systemd/system/zookeeper.service with the following content:

[Unit]
Description=zookeeper
After=syslog.target network.target

[Service]
Type=simple

User=kafka
Group=kafka

ExecStart=/opt/kafka/bin/zookeeper-server-start.sh /opt/kafka/config/zookeeper.properties
ExecStop=/opt/kafka/bin/zookeeper-server-stop.sh

[Install]
WantedBy=multi-user.target        

Note that we don’t need to write the version number three times because of the symlink we created. The same applies to the next unit file for Kafka, /etc/systemd/system/kafka.service, that contains the following lines of configuration:

[Unit]
Description=Apache Kafka
Requires=zookeeper.service
After=zookeeper.service

[Service]
Type=simple

User=kafka
Group=kafka

ExecStart=/opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties
ExecStop=/opt/kafka/bin/kafka-server-stop.sh

[Install]
WantedBy=multi-user.target        

STEP 7: We reload systemd so it reads the new unit files:

# systemctl daemon-reload        

Now we can start our new services (in this order):

# systemctl start zookeeper
# systemctl start kafka        

If all goes well, systemd should report a running state in both services' status, similar to the outputs below:

# systemctl status zookeeper.service
  zookeeper.service - zookeeper
   Loaded: loaded (/etc/systemd/system/zookeeper.service; disabled; vendor preset: disabled)
   Active: active (running) since Thu 2019-01-10 20:44:37 CET; 6s ago
 Main PID: 11628 (java)
    Tasks: 23 (limit: 12544)
   Memory: 57.0M
   CGroup: /system.slice/zookeeper.service
            11628 java -Xmx512M -Xms512M -server [...]

# systemctl status kafka.service
  kafka.service - Apache Kafka
   Loaded: loaded (/etc/systemd/system/kafka.service; disabled; vendor preset: disabled)
   Active: active (running) since Thu 2019-01-10 20:45:11 CET; 11s ago
 Main PID: 11949 (java)
    Tasks: 64 (limit: 12544)
   Memory: 322.2M
   CGroup: /system.slice/kafka.service
            11949 java -Xmx1G -Xms1G -server [...]        

Optionally we can enable automatic start on boot for both services:

# systemctl enable zookeeper.service
# systemctl enable kafka.service        
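With both services running, we can optionally verify the installation end to end by creating a test topic and passing a message through it. A sketch, using the paths from the steps above and assuming a default single-broker setup (in Kafka 2.1.0 the topic tool still talks to ZooKeeper; the topic name here is arbitrary):

```shell
# Create a single-partition test topic
# /opt/kafka/bin/kafka-topics.sh --create --zookeeper localhost:2181 \
#     --replication-factor 1 --partitions 1 --topic test

# Publish a message to the topic
# echo "hello kafka" | /opt/kafka/bin/kafka-console-producer.sh \
#     --broker-list localhost:9092 --topic test

# Read it back (Ctrl+C to stop the consumer)
# /opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
#     --topic test --from-beginning
```

If the consumer prints the message back, the broker, ZooKeeper, and the client tools are all working together.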

Deploying Kafka on OpenShift

Managed Services

There are multiple ways to use Kafka in the cloud. One is IBM's managed Event Streams service; another is Red Hat's managed OpenShift Streams for Apache Kafka. The big advantage of managed services is that you don't have to worry about managing, operating, and maintaining the messaging system yourself. As soon as you deploy services in your own clusters, you are usually responsible for managing them. Even if you use operators, which help with day-2 tasks, you will have to perform some extra work compared to managed services.

Operators

Another approach is to install Kafka in your own clusters. Especially in the early stages of a project, when developers simply want to try things out, this is a pragmatic way to get started. Multiple operators for Kafka are available on the OperatorHub page in the OpenShift Console, for example:

  1. Strimzi
  2. Red Hat Integration – AMQ Streams

Strimzi is the open source upstream project for Red Hat’s AMQ Streams operator. It’s also the same code base used in Red Hat’s new managed Kafka service.
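With the operator installed, a Kafka cluster is declared as a custom resource. As an illustration, a minimal Kafka resource for Strimzi/AMQ Streams might look like the following sketch (the cluster name and sizing are hypothetical, and the ephemeral storage type is only suitable for experiments, since data is lost when pods restart):

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    replicas: 3
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
    storage:
      type: ephemeral
  zookeeper:
    replicas: 3
    storage:
      type: ephemeral
```

Applying this resource with `oc apply -f` lets the operator create and manage the broker and ZooKeeper pods; scaling the cluster then becomes a matter of editing the `replicas` fields.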

Conclusion

These capabilities illustrate the versatility and power of OpenShift Streams in enabling organizations to build robust and scalable real-time data streaming applications. Red Hat OpenShift Streams for Apache Kafka simplifies the process of building such applications on Red Hat OpenShift.


Refer:

https://developers.redhat.com/blog/2018/10/29/how-to-run-kafka-on-openshift-the-enterprise-kubernetes-with-amq-streams#test_using_an_external_application

https://heidloff.net/article/deploying-kafka-on-openshift/

https://linuxconfig.org/how-to-install-kafka-on-redhat-8

