Load Balancing for K8S
When you deploy your application, system, or solution on cloud-native Kubernetes infrastructure, there are three levels of load balancing at play, as illustrated in the banner diagram above.
Traffic to your applications typically comes from one or more of three sources: a web browser accessing your frontend graphical user interface (GUI), a mobile client app, or an integrating solution invoking an API exposed by your system.
Most end consumers access the application over the Internet, while some WAN users reach it through a VPN.
Regardless of the channel used, the traffic arrives at a network load balancer (NLB) after being filtered by a web application firewall (WAF) and/or a conventional firewall, depending on where it originates.
The NLB distributes traffic among the configured cluster nodes, in this case the Kubernetes nodes.
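As a minimal sketch, on most cloud providers such an NLB is provisioned by declaring a Service of type `LoadBalancer`; the cloud controller then creates the external load balancer and wires it to the nodes. All names and ports below are illustrative assumptions, not values from this article:

```yaml
# Hypothetical Service that asks the cloud provider to provision an
# external network load balancer (NLB) in front of the cluster nodes.
apiVersion: v1
kind: Service
metadata:
  name: edge-lb            # illustrative name
spec:
  type: LoadBalancer       # cloud controller provisions the NLB
  selector:
    app: ingress-proxy     # forwards to the in-cluster proxy Pods
  ports:
    - name: https
      port: 443
      targetPort: 443
```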
Each cluster node is equipped with an application load balancer (ALB), such as an Istio ingress gateway or the NGINX ingress controller.
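To give a concrete flavor, the ALB layer is commonly configured through an Ingress resource; the sketch below assumes the NGINX ingress controller is installed, and the host, path, and Service name are hypothetical:

```yaml
# Illustrative Ingress that routes HTTP(S) traffic arriving at the ALB
# to the application's Service (names and host are assumptions).
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app-ingress
spec:
  ingressClassName: nginx        # handled by the NGINX ingress controller
  rules:
    - host: app.example.com      # hypothetical hostname
      http:
        paths:
          - path: /api
            pathType: Prefix
            backend:
              service:
                name: my-app     # the application's Service
                port:
                  number: 80
```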
The ALB does not know on which nodes the application Pods are actually deployed, but it can reach the application's designated Service through service discovery.
The application Service does know about, and can reach, all application Pods across the cluster nodes, so it can distribute requests among the configured application instances (Pods).
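This Pod-level distribution works through label selection: the Service's selector matches the Pod labels, so its endpoints automatically track every matching Pod on every node. A minimal sketch, with all names, images, and ports as illustrative assumptions:

```yaml
# Illustrative Deployment running three Pod replicas of the application.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: registry.example.com/my-app:1.0   # hypothetical image
          ports:
            - containerPort: 8080
---
# The Service load-balances requests across all Pods carrying the
# matching label, regardless of which node they run on.
apiVersion: v1
kind: Service
metadata:
  name: my-app
spec:
  selector:
    app: my-app          # endpoints = all Pods with this label
  ports:
    - port: 80
      targetPort: 8080
```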
The beauty of this configuration is that, by using Kubernetes' horizontal scaling features, we can keep load balancing traffic across a dynamically adjusted number of application instances (Pods), scaled up or down based on CPU and RAM utilization rules.
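Those utilization rules are typically expressed with a HorizontalPodAutoscaler. The sketch below assumes a Deployment named `my-app` exists; the replica bounds and utilization targets are illustrative values, not recommendations from this article:

```yaml
# Illustrative HorizontalPodAutoscaler scaling a hypothetical Deployment
# between 3 and 10 replicas based on CPU and memory utilization.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app          # assumed Deployment name
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out above 70% average CPU
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80   # scale out above 80% average RAM
```

As new Pods appear or disappear, the Service endpoints update automatically, so the load-balancing chain described above keeps working without reconfiguration.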