Apache Kafka in the Kubernetes Era!
Alice Sophiya Samuel
Linux administrator | 2x RedHat Certified | Ansible | Linux | AWS | Azure | Datacenter Infrastructure Management | OpenShift | Docker
Red Hat keeps shipping updates back to back, and the pace has only accelerated in the new era that began with Kubernetes. As data volumes grow, storing, filtering, analyzing, and handling that data all become harder. This is where Apache Kafka comes in, providing a platform built for data streaming.
What is Apache Kafka?
Apache Kafka is a distributed data streaming platform that can publish, subscribe to, store, and process streams of records in real time. It is designed to handle data streams from multiple sources and deliver them to multiple consumers. In short, it moves massive amounts of data—not just from point A to B, but from points A to Z and anywhere else you need, all at the same time.
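To make the publish/subscribe/store model concrete, here is a toy, in-memory sketch (not real Kafka code; class and topic names are illustrative) of Kafka's core abstraction: an append-only log per topic, with each consumer group tracking its own read offset so multiple independent consumers can replay the same stream.

```python
from collections import defaultdict

class ToyLog:
    """Toy sketch of Kafka's commit-log model: records are appended
    to a per-topic log and retained; consumer groups track offsets."""

    def __init__(self):
        self.topics = defaultdict(list)   # topic -> list of records
        self.offsets = defaultdict(int)   # (group, topic) -> next offset to read

    def publish(self, topic, record):
        # Producers append records; the log retains them for replay.
        self.topics[topic].append(record)

    def consume(self, group, topic):
        # Each consumer group reads from its own offset, so many
        # independent consumers can process the same stream.
        offset = self.offsets[(group, topic)]
        records = self.topics[topic][offset:]
        self.offsets[(group, topic)] = len(self.topics[topic])
        return records

log = ToyLog()
log.publish("clicks", {"user": "alice", "page": "/home"})
log.publish("clicks", {"user": "bob", "page": "/cart"})

print(log.consume("analytics", "clicks"))  # both records
print(log.consume("analytics", "clicks"))  # [] - this group is caught up
print(log.consume("billing", "clicks"))    # both records again, independent offset
```

This is what lets Kafka move data "from points A to Z": every consumer group gets its own cursor into the same durable stream.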
Apache Kafka is an alternative to a traditional enterprise messaging system. It started out as an internal system developed by LinkedIn to handle 1.4 trillion messages per day, but now it's an open source data streaming solution with applications for a variety of enterprise needs.
When to use Apache Kafka?
Apache Kafka is built into streaming data pipelines that share data between systems and/or applications, and it is also built into the systems and applications that consume that data. Apache Kafka supports a range of use cases where high throughput and scalability are vital. Since Apache Kafka minimizes the need for point-to-point integrations for data sharing in certain applications, it can reduce latency to milliseconds. This means data is available to users faster, which can be advantageous in use cases that require real-time data availability, such as IT operations and e-commerce.
Apache Kafka can handle millions of data points per second, which makes it well-suited for big data challenges. However, Kafka also makes sense for companies that are not currently handling such extreme data scenarios. In many data processing use cases, such as the Internet of Things (IoT) and social media, data is increasing exponentially, and may quickly overwhelm an application you are building based on today's data volume. In terms of data processing, you must consider scalability, and that means planning for the increased proliferation of your data.
How does Kubernetes scale Apache Kafka applications?
Kubernetes is an ideal platform for Apache Kafka: developers need a scalable platform to host Kafka applications, and Kubernetes fits that need.
Like Apache Kafka, Kubernetes also makes your development process more agile. Kubernetes—the technology behind Google’s cloud services—is an open source system for managing containerized applications, and it eliminates many of the manual processes associated with containers. Using Apache Kafka in Kubernetes streamlines the deployment, configuration, management, and use of Apache Kafka.
By combining Kafka and Kubernetes, you gain all the benefits of Kafka, and also the advantages of Kubernetes: scalability, high availability, portability and easy deployment.
The scalability of Kubernetes is a natural complement to Kafka. In Kubernetes, you can scale resources up and down with a simple command, or scale automatically based on usage as needed to make the best use of your computing, networking, and storage infrastructure. This capability enables Apache Kafka to share a limited pool of resources with other applications. Kubernetes also offers Apache Kafka portability across infrastructure providers and operating systems. With Kubernetes, Apache Kafka clusters can span across on-site and public, private, or hybrid clouds, and use different operating systems.
How to install Kafka on RHEL 8?
STEP 1: To download Kafka from the closest mirror, consult the official download site and copy the URL of the .tgz file from there. We'll use wget with the pasted URL to download the package to the target machine:
# wget https://www-eu.apache.org/dist/kafka/2.1.0/kafka_2.11-2.1.0.tgz -O /opt/kafka_2.11-2.1.0.tgz
STEP 2: We enter the /opt directory and extract the archive:
# cd /opt
# tar -xvf kafka_2.11-2.1.0.tgz
STEP 3: Create a symlink called /opt/kafka that points to the newly created /opt/kafka_2.11-2.1.0 directory to make our lives easier:
# ln -s /opt/kafka_2.11-2.1.0 /opt/kafka
STEP 4: We create a non-privileged user that will run both the zookeeper and kafka services.
# useradd kafka
STEP 5: We set the new user as the owner of the extracted directory, recursively:
# chown -R kafka:kafka /opt/kafka*
STEP 6: We create the unit file /etc/systemd/system/zookeeper.service with the following content:
[Unit]
Description=zookeeper
After=syslog.target network.target
[Service]
Type=simple
User=kafka
Group=kafka
ExecStart=/opt/kafka/bin/zookeeper-server-start.sh /opt/kafka/config/zookeeper.properties
ExecStop=/opt/kafka/bin/zookeeper-server-stop.sh
[Install]
WantedBy=multi-user.target
Note that we don’t need to write the version number three times because of the symlink we created. The same applies to the next unit file for Kafka, /etc/systemd/system/kafka.service, that contains the following lines of configuration:
[Unit]
Description=Apache Kafka
Requires=zookeeper.service
After=zookeeper.service
[Service]
Type=simple
User=kafka
Group=kafka
ExecStart=/opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties
ExecStop=/opt/kafka/bin/kafka-server-stop.sh
[Install]
WantedBy=multi-user.target
STEP 7: We need to reload systemd so that it reads the new unit files:
# systemctl daemon-reload
Now we can start our new services (in this order):
# systemctl start zookeeper
# systemctl start kafka
If all goes well, systemd should report both services as active (running), similar to the output below:
# systemctl status zookeeper.service
zookeeper.service - zookeeper
Loaded: loaded (/etc/systemd/system/zookeeper.service; disabled; vendor preset: disabled)
Active: active (running) since Thu 2019-01-10 20:44:37 CET; 6s ago
Main PID: 11628 (java)
Tasks: 23 (limit: 12544)
Memory: 57.0M
CGroup: /system.slice/zookeeper.service
11628 java -Xmx512M -Xms512M -server [...]
# systemctl status kafka.service
kafka.service - Apache Kafka
Loaded: loaded (/etc/systemd/system/kafka.service; disabled; vendor preset: disabled)
Active: active (running) since Thu 2019-01-10 20:45:11 CET; 11s ago
Main PID: 11949 (java)
Tasks: 64 (limit: 12544)
Memory: 322.2M
CGroup: /system.slice/kafka.service
11949 java -Xmx1G -Xms1G -server [...]
Optionally we can enable automatic start on boot for both services:
# systemctl enable zookeeper.service
# systemctl enable kafka.service
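With both services running, we can optionally smoke-test the installation using the scripts shipped with Kafka. The commands below are a sketch matching the 2.1.0 release installed above (where topic creation still goes through ZooKeeper); the topic name FirstTopic and the default ports localhost:2181 and localhost:9092 are illustrative assumptions.

```shell
# Create a test topic with one partition and no replication (Kafka 2.1.x
# kafka-topics.sh talks to ZooKeeper, assumed on the default port 2181):
# /opt/kafka/bin/kafka-topics.sh --create --zookeeper localhost:2181 \
#     --replication-factor 1 --partitions 1 --topic FirstTopic

# Publish a message to the topic (broker assumed on the default port 9092):
# echo "hello kafka" | /opt/kafka/bin/kafka-console-producer.sh \
#     --broker-list localhost:9092 --topic FirstTopic

# Read it back from the beginning of the log:
# /opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
#     --topic FirstTopic --from-beginning --max-messages 1
```

If the consumer prints the message back, the broker is storing and serving records correctly.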
Deploying Kafka on OpenShift
Managed Services
There are multiple ways to use Kafka in the cloud. One way is to use IBM's managed Event Streams service or Red Hat's managed OpenShift Streams for Apache Kafka service. The big advantage of managed services is that you don't have to worry about managing, operating, and maintaining the messaging systems. As soon as you deploy services in your own clusters, you are usually responsible for managing them. Even if you use operators, which help with day-2 tasks, you will still have some extra work compared to managed services.
Operators
Another approach is to install Kafka in your own clusters. Especially in the early stages of a project, when developers simply want to try things out, this is a pragmatic way to get started. Multiple operators are available for Kafka, which you can find on the OperatorHub page in the OpenShift console, for example:
Strimzi is the open source upstream project for Red Hat’s AMQ Streams operator. It’s also the same code base used in Red Hat’s new managed Kafka service.
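With an operator such as Strimzi installed, deploying a cluster comes down to applying a custom resource. As a rough sketch (assuming the Strimzi operator is already installed in the cluster; the cluster name my-cluster, the replica counts, and the use of ephemeral storage are illustrative choices, not recommendations for production):

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    replicas: 3              # three brokers for availability
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
    storage:
      type: ephemeral        # fine for trying things out; use persistent storage in production
  zookeeper:
    replicas: 3
    storage:
      type: ephemeral
  entityOperator:
    topicOperator: {}        # manage topics via KafkaTopic custom resources
    userOperator: {}         # manage users via KafkaUser custom resources
```

Applying this with oc apply -f lets the operator create and manage the brokers, ZooKeeper ensemble, and supporting resources, which is exactly the day-2 help operators provide compared to the manual RHEL installation above.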
Conclusion
These use cases illustrate the versatility and power of OpenShift Streams in enabling organizations to build robust and scalable real-time data streaming applications. Red Hat OpenShift Streams for Apache Kafka simplifies the process of building real-time data streaming applications on Red Hat OpenShift.
References:
https://developers.redhat.com/blog/2018/10/29/how-to-run-kafka-on-openshift-the-enterprise-kubernetes-with-amq-streams#test_using_an_external_application
https://heidloff.net/article/deploying-kafka-on-openshift/
https://linuxconfig.org/how-to-install-kafka-on-redhat-8