Building reliable, scalable applications on Kubernetes — Part 1


Kubernetes is an open source orchestrator for deploying containerized applications. It is a key component of cloud-native development and helps teams achieve high levels of velocity, agility, reliability and efficiency.

Cloud-native is an approach to building, deploying and managing applications in cloud computing environments. It allows companies to build highly scalable, flexible and resilient applications that can be updated quickly to meet customer demands.

The key building blocks when working with Kubernetes are immutable infrastructure and containers, two of the five technological building blocks of cloud-native. The other three are microservices, declarative APIs (Infrastructure as Code) and service meshes.

Infrastructure is immutable if its current state is fully represented by a single artifact. Any desired changes are implemented by creating a new artifact. With immutable infrastructure, scaling is simply running another copy of the infrastructure.

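Containers are the smallest deployable unit in a cloud-native architecture. Containerization is the act of packaging an application together with the environment it runs in (its libraries, dependencies and configuration) into an isolated sandbox. This allows the application to run consistently on different machines, independently of the libraries and configuration installed on the host, although containers still share the host's kernel.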

Reliability and scalability provisions

Kubernetes provides reliability and scalability provisions for both applications and the development teams that build them. It offers the following abstractions to support this:

  • Pods

    A pod is a group of containers. Pods are the smallest deployable unit in Kubernetes. They can be used to group related containers into a single unit that is scheduled and managed together.

  • Kubernetes services

    Services provide load balancing, naming and discovery for pods in a Kubernetes cluster. The Kubernetes Service object operates at layer 4 (the Transport Layer) of the OSI model.

  • Namespaces

    Namespaces are the isolation mechanism provided by Kubernetes. They divide a cluster into separate scopes for object names, access control policies and resource quotas, so that different teams or applications can share a cluster without interfering with each other.

  • Ingress

    Ingress allows exposing Kubernetes services outside the cluster, for example to the open internet. It is the Kubernetes provision for implementing virtual hosting, that is, hosting multiple HTTP sites on a single IP address. Ingress offers Layer 7 (HTTP) load balancing in a Kubernetes cluster.
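Virtual hosting with Ingress boils down to one rule per hostname, each pointing at a different Service. The sketch below builds such a manifest as a Python dict and prints it as JSON (kubectl accepts JSON as well as YAML). The hostnames, Service names and ports are hypothetical, and an Ingress controller must be installed in the cluster for the rules to take effect.

```python
import json

def ingress_for_hosts(name, hosts):
    """Build a networking.k8s.io/v1 Ingress that routes each hostname to its own Service.

    hosts maps a hostname to a (service_name, port) pair; all values are illustrative.
    """
    rules = []
    for host, (service, port) in hosts.items():
        rules.append({
            "host": host,  # requests are matched on the HTTP Host header (layer 7)
            "http": {
                "paths": [{
                    "path": "/",
                    "pathType": "Prefix",
                    "backend": {"service": {"name": service, "port": {"number": port}}},
                }]
            },
        })
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {"name": name},
        "spec": {"rules": rules},
    }

# Two sites served from the same Ingress (and therefore the same external IP).
manifest = ingress_for_hosts("web", {
    "shop.example.com": ("shop", 80),
    "blog.example.com": ("blog", 8080),
})
print(json.dumps(manifest, indent=2))
```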

Development teams can be scaled by adopting decoupled, service-oriented architectures. Kubernetes offers mechanisms to automatically scale the machines on which applications from different teams are deployed. Many applications can run transparently on the same machines without interfering with each other, which reduces the overhead and cost of microservice architectures.

Core components of a Kubernetes cluster

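A Kubernetes cluster consists of a control plane and one or more worker nodes. The control plane is the central nervous system of the cluster, whilst the worker nodes provide compute resources for the cluster. The API Server, Controller Manager, Scheduler and etcd make up the control plane; Kubelet and Kube-proxy run on every worker node.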

API Server

The API Server is the central component through which all other Kubernetes components interact. It exposes an HTTP API through which the state of the cluster is read and modified. Command-line tools such as kubectl interact with the Kubernetes cluster through this API.
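Every object the API Server manages lives at a predictable REST path: resources in the core group (pods, services) sit under /api/v1, while resources in named groups (such as apps for Deployments) sit under /apis/<group>/<version>. The helper below is a hypothetical sketch that builds these paths; it does not contact a cluster. A path like the first one can be fetched directly with `kubectl get --raw`.

```python
def api_path(group, version, plural, namespace=None, name=None):
    """Return the API Server REST path for a resource type or a single object.

    group is "" for the core ("legacy") group, e.g. "apps" for Deployments.
    """
    prefix = f"/api/{version}" if group == "" else f"/apis/{group}/{version}"
    parts = [prefix]
    if namespace is not None:
        parts.append(f"namespaces/{namespace}")  # namespaced resources only
    parts.append(plural)
    if name is not None:
        parts.append(name)  # a specific object rather than the whole collection
    return "/".join(parts)

print(api_path("", "v1", "pods", namespace="default"))
# /api/v1/namespaces/default/pods
print(api_path("apps", "v1", "deployments", namespace="shop", name="web"))
# /apis/apps/v1/namespaces/shop/deployments/web
```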

Controller Manager

The controller manager runs the various controllers that provide the self-healing properties of Kubernetes. Examples include the Deployment controller, the ReplicaSet controller and the DaemonSet controller.

Scheduler

The scheduler manages the process of assigning (scheduling) pods to worker nodes in the cluster. Its mandate ends when a pod has been bound to a worker node. The process of actually running the pod is left to another Kubernetes component — Kubelet.

Etcd

Etcd is a highly-available consistent key-value store that provides storage for the state/configuration of a Kubernetes cluster.

Kubelet

Kubelet runs on every worker node and is responsible for running the pods that the scheduler has bound to that node.

Kube-proxy

Kube-proxy maintains network rules (for example iptables rules) through which Kubernetes Services provide load balancing across pods. An example rule is a mapping of a Service's ClusterIP to the IP addresses of the pods associated with it.

Kubernetes Objects

Pod objects

A pod is a collection of application containers and volumes running in the same execution environment. Each container in a pod runs in its own cgroup, but the containers share a number of Linux namespaces, including:

  1. Network namespace: the containers share an IP address and port space, and can reach each other on localhost

  2. UTS namespace: the containers have the same hostname and domain name

  3. IPC namespace: the containers can communicate with each other using System V IPC (for example shared memory and message queues)

The Kubernetes scheduler is responsible for allocating (scheduling) pods onto available worker nodes. Note that once a pod has been scheduled onto a node, it is never rescheduled: if that node fails, the pod is not moved to another node.

Kubelet is responsible for running pods on the worker node. It restarts a pod's containers when they fail their health checks (the liveness probe), and it manages the resources used by the pod.
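A liveness probe is declared per container in the pod spec. The sketch below builds a minimal Pod manifest with an HTTP liveness probe; the pod name, image and probe timings are illustrative choices, not values from this article.

```python
import json

pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "web", "labels": {"app": "web"}},
    "spec": {
        "containers": [{
            "name": "web",
            "image": "nginx:1.25",  # example image
            "ports": [{"containerPort": 80}],
            # The kubelet sends GET / to port 80 every 10 seconds (after a 5 second
            # initial delay); after 3 consecutive failures it restarts this container.
            "livenessProbe": {
                "httpGet": {"path": "/", "port": 80},
                "initialDelaySeconds": 5,
                "periodSeconds": 10,
                "failureThreshold": 3,
            },
        }]
    },
}
print(json.dumps(pod, indent=2))
```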

Resource management in pods is configured on a per-container basis and is achieved in two ways:

  1. Resource requests

    These specify the minimum amount of resources required to run a container. A pod is only scheduled onto a node that has enough unreserved resources to satisfy the sum of the requests of all the containers in the pod.

  2. Resource limits

    In times of resource abundance, a container can be allocated more than its specified resource requests. Resource limits are used to cap the maximum amount of resources that should ever be allocated to the container.
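The request-based fit check can be sketched as follows. This is a toy model, not the real scheduler: CPU is expressed in millicores and memory in MiB as plain integers (an assumption to avoid parsing Kubernetes quantity strings), the node data is hypothetical, and only requests are compared, never actual usage.

```python
def pod_requests(containers):
    """Sum the CPU (millicores) and memory (MiB) requests of all containers in a pod."""
    cpu = sum(c["requests"].get("cpu_m", 0) for c in containers)
    mem = sum(c["requests"].get("memory_mi", 0) for c in containers)
    return cpu, mem

def feasible_nodes(containers, nodes):
    """Return the names of nodes whose unreserved capacity covers the pod's requests.

    Each node records its allocatable capacity and the requests already reserved
    by pods placed on it.
    """
    cpu, mem = pod_requests(containers)
    fits = []
    for node in nodes:
        free_cpu = node["allocatable_cpu_m"] - node["requested_cpu_m"]
        free_mem = node["allocatable_memory_mi"] - node["requested_memory_mi"]
        if cpu <= free_cpu and mem <= free_mem:
            fits.append(node["name"])
    return fits

containers = [
    {"name": "app", "requests": {"cpu_m": 500, "memory_mi": 256}},
    {"name": "sidecar", "requests": {"cpu_m": 100, "memory_mi": 64}},
]
nodes = [
    {"name": "node-a", "allocatable_cpu_m": 2000, "requested_cpu_m": 1800,
     "allocatable_memory_mi": 4096, "requested_memory_mi": 1024},
    {"name": "node-b", "allocatable_cpu_m": 2000, "requested_cpu_m": 1000,
     "allocatable_memory_mi": 4096, "requested_memory_mi": 3000},
]
print(feasible_nodes(containers, nodes))  # ['node-b']
```

node-a is rejected because only 200m of CPU is unreserved, even if its actual CPU usage happens to be low. The real scheduler also applies other filters and then scores the feasible nodes, which this sketch leaves out.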

Resource management in pods is most often used for CPU and memory (RAM). CPU requests are implemented using the cpu-shares functionality of the Linux kernel, so the amount of CPU a container receives can be adjusted as contention on the node changes, without restarting the container. Memory cannot be reclaimed from a running process in the same way: if a node runs out of memory, containers using more than their memory requests are the first candidates to be killed. A killed container is restarted, but with less memory available to it on that node.
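To see how cpu-shares turn requests into a CPU split, the sketch below assumes a node using cgroup v1, where the kubelet converts a request in millicores to cpu.shares at 1024 shares per CPU with a minimum of 2. Nodes using cgroup v2 use a different weight scale, and the split ignores other processes and the QoS cgroup hierarchy, so treat it as an illustration of proportional sharing only.

```python
MIN_SHARES = 2
SHARES_PER_CPU = 1024
MILLI_CPU_PER_CPU = 1000

def milli_cpu_to_shares(milli_cpu):
    """Convert a CPU request in millicores to a cgroup v1 cpu.shares value."""
    if milli_cpu == 0:
        return MIN_SHARES
    shares = milli_cpu * SHARES_PER_CPU // MILLI_CPU_PER_CPU
    return max(shares, MIN_SHARES)

def contended_split(requests_m):
    """Fraction of CPU each container gets when all of them are busy at the same time."""
    shares = {name: milli_cpu_to_shares(m) for name, m in requests_m.items()}
    total = sum(shares.values())
    return {name: s / total for name, s in shares.items()}

print(milli_cpu_to_shares(250))                # 256
print(contended_split({"a": 250, "b": 750}))   # {'a': 0.25, 'b': 0.75}
```

Shares only matter under contention: when container b is idle, container a may use the spare CPU (up to its limit, if one is set), and the kernel rebalances as soon as b becomes busy again, which is why no restart is needed.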

ReplicaSets, StatefulSets, DaemonSets and Deployments

ReplicaSets, StatefulSets, Deployments and DaemonSets are ways of managing pods. The diagram below illustrates how they each interact with pods.

ReplicaSets

A ReplicaSet is a cluster-wide pod manager (a pod reconciliation loop) that ensures the right type and number of related pods are running at all times. It identifies the pods it should manage through the label selector in its spec.selector property.
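The sketch below shows a ReplicaSet manifest alongside a small function that mimics how a matchLabels selector decides membership: a pod belongs to the set when every key/value pair in the selector appears in the pod's labels (extra labels are allowed). The names and image are illustrative, and the helper ignores matchExpressions, the other selector form Kubernetes supports.

```python
import json

def matches(selector, labels):
    """True if every key/value pair of a matchLabels selector is present in labels."""
    return all(labels.get(key) == value for key, value in selector.items())

replica_set = {
    "apiVersion": "apps/v1",
    "kind": "ReplicaSet",
    "metadata": {"name": "web"},
    "spec": {
        "replicas": 3,
        # The ReplicaSet manages every pod whose labels satisfy this selector.
        "selector": {"matchLabels": {"app": "web"}},
        # Pods created from this template must carry labels the selector matches.
        "template": {
            "metadata": {"labels": {"app": "web", "version": "1.0"}},
            "spec": {"containers": [{"name": "web", "image": "nginx:1.25"}]},
        },
    },
}

selector = replica_set["spec"]["selector"]["matchLabels"]
print(matches(selector, {"app": "web", "version": "1.0"}))  # True
print(matches(selector, {"app": "api"}))                    # False
print(json.dumps(replica_set, indent=2))
```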

Pods can be adopted by or quarantined from a ReplicaSet by adding or removing the labels that the ReplicaSet's selector matches. Quarantining is especially useful when troubleshooting a misbehaving pod, since the pod keeps running for inspection while the ReplicaSet creates a replacement. Adoption is useful when adding a ReplicaSet to manage pods that already exist.
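A heavily simplified reconciliation pass makes the quarantine behaviour concrete. The sketch is a hypothetical model: pods are plain dicts, pod names come from a counter, and real controllers additionally track ownership with ownerReferences and pick deletion victims more carefully.

```python
import itertools

_ids = itertools.count()

def matches(selector, labels):
    return all(labels.get(k) == v for k, v in selector.items())

def reconcile(selector, template_labels, desired, pods):
    """One pass of a simplified ReplicaSet control loop.

    Creates or deletes matching pods until exactly `desired` pods match the
    selector. Pods that do not match are left running and untouched.
    """
    owned = [p for p in pods if matches(selector, p["labels"])]
    others = [p for p in pods if not matches(selector, p["labels"])]
    if len(owned) < desired:
        for _ in range(desired - len(owned)):
            owned.append({"name": f"web-{next(_ids)}", "labels": dict(template_labels)})
    elif len(owned) > desired:
        owned = owned[:desired]
    return owned + others

selector = {"app": "web"}
pods = reconcile(selector, {"app": "web"}, 3, [])
# Quarantine the first pod: relabel it so the selector no longer matches.
pods[0]["labels"] = {"app": "web-debug"}
pods = reconcile(selector, {"app": "web"}, 3, pods)
print(len(pods))  # 4: three managed replicas plus the quarantined pod
```

Adoption is the same mechanism in reverse: giving an existing pod the matching labels makes it count towards the desired replicas, so the loop creates fewer new pods.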

In a production setup, pods should never be deployed outside the control of a ReplicaSet. Such pods do not withstand node failures: when the node on which the pod is running is destroyed, the pod has to be redeployed manually, whereas a ReplicaSet automatically creates a replacement on a healthy node.

ReplicaSets can be autoscaled by using a Horizontal Pod Autoscaler (HPA). Some of the metrics that can be used to trigger such autoscaling include CPU usage and memory utilization.
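The HPA's core scaling rule, as described in the Kubernetes documentation, is desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), with no change when the ratio is within a tolerance of 1.0 (0.1 by default). The sketch below implements only that rule; the real controller also clamps to minReplicas/maxReplicas and applies stabilization windows, which are omitted here.

```python
import math

def desired_replicas(current_replicas, current_value, target_value, tolerance=0.1):
    """Replica count suggested by the HPA scaling rule for a single metric."""
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: avoid flapping
    return math.ceil(current_replicas * ratio)

print(desired_replicas(3, 90, 60))  # 5  (CPU at 90% vs 60% target: ceil(4.5))
print(desired_replicas(4, 63, 60))  # 4  (ratio 1.05 is within tolerance)
print(desired_replicas(4, 20, 60))  # 2  (ceil(1.33))
```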

StatefulSets

StatefulSets are similar to ReplicaSets, except that they are suited to applications that require stable identifiers and ordered deployment, deletion or scaling. Examples of such applications include a Kafka cluster and a MongoDB cluster.

The following characteristics are unique to StatefulSets:

  1. Stable/predictable, unique network identifiers

  2. Ordered, graceful deployment and scaling

  3. Ordered, automated rolling updates

The pod names and hostnames in a StatefulSet take the form $.metadata.name-$ordinal. For example, if the name of a StatefulSet is kafka-broker and the number of replicas ($.spec.replicas) is 3, the created pods will be named kafka-broker-0, kafka-broker-1 and kafka-broker-2. The pods are created in ascending order of their ordinal (from 0 to 2) and destroyed in reverse order (from 2 to 0).
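The naming and ordering rules can be expressed in a few lines. This sketch assumes the default OrderedReady pod management policy; with the Parallel policy, pods are created and removed without waiting on each other.

```python
def pod_names(statefulset_name, replicas):
    """Pods are named <statefulset-name>-<ordinal>, with ordinals 0..replicas-1."""
    return [f"{statefulset_name}-{ordinal}" for ordinal in range(replicas)]

def creation_order(statefulset_name, replicas):
    # Under OrderedReady, each pod is created only after the previous one is
    # running and ready, lowest ordinal first.
    return pod_names(statefulset_name, replicas)

def deletion_order(statefulset_name, replicas):
    # Scaling down removes pods highest ordinal first.
    return pod_names(statefulset_name, replicas)[::-1]

print(creation_order("kafka-broker", 3))  # ['kafka-broker-0', 'kafka-broker-1', 'kafka-broker-2']
print(deletion_order("kafka-broker", 3))  # ['kafka-broker-2', 'kafka-broker-1', 'kafka-broker-0']
```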

Unlike in a ReplicaSet, each pod in a StatefulSet is unique, so clients often need to reach a specific pod rather than any one of them, and stateless load balancing through a ClusterIP is not suitable. Instead, a StatefulSet relies on a governing headless Service (services are discussed in the Service objects section), which gives each pod its own stable DNS name. These DNS names follow the format $(pod-name).$(service-name).$(namespace).svc.cluster.local
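The pairing between a StatefulSet and its governing headless Service can be sketched as two manifests plus a helper that derives the per-pod DNS names. The namespace messaging, the Service name kafka and the image are hypothetical; cluster.local is the default cluster domain but can differ per cluster.

```python
import json

NAMESPACE = "messaging"           # hypothetical namespace
CLUSTER_DOMAIN = "cluster.local"  # default cluster domain

# Governing headless Service: clusterIP "None" means no virtual IP is allocated.
headless_service = {
    "apiVersion": "v1",
    "kind": "Service",
    "metadata": {"name": "kafka", "namespace": NAMESPACE},
    "spec": {
        "clusterIP": "None",
        "selector": {"app": "kafka"},
        "ports": [{"name": "broker", "port": 9092}],
    },
}

# The StatefulSet names its governing Service in spec.serviceName.
stateful_set = {
    "apiVersion": "apps/v1",
    "kind": "StatefulSet",
    "metadata": {"name": "kafka-broker", "namespace": NAMESPACE},
    "spec": {
        "serviceName": "kafka",
        "replicas": 3,
        "selector": {"matchLabels": {"app": "kafka"}},
        "template": {
            "metadata": {"labels": {"app": "kafka"}},
            "spec": {"containers": [{"name": "broker", "image": "example.com/kafka:1.0"}]},
        },
    },
}

def pod_dns_names(sts, svc):
    """Per-pod DNS names: $(pod-name).$(service-name).$(namespace).svc.<cluster domain>."""
    name = sts["metadata"]["name"]
    ns = sts["metadata"]["namespace"]
    service = svc["metadata"]["name"]
    return [f"{name}-{i}.{service}.{ns}.svc.{CLUSTER_DOMAIN}"
            for i in range(sts["spec"]["replicas"])]

print(json.dumps(headless_service, indent=2))
for dns_name in pod_dns_names(stateful_set, headless_service):
    print(dns_name)  # e.g. kafka-broker-0.kafka.messaging.svc.cluster.local
```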

DaemonSets

DaemonSets are similar to ReplicaSets, except that they ensure a single copy of a pod runs on every node, or on a selected subset of nodes, in a cluster. Example use cases of DaemonSets include:

  1. Installing software (for example security and compliance software) on every node in the Kubernetes cluster

  2. Installing log collectors and monitoring agents
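A log-collector DaemonSet restricted to a subset of nodes might look like the sketch below. The node label logging=enabled, the node names and the image are hypothetical. The helper models only nodeSelector matching; real placement also takes taints and tolerations into account.

```python
import json

daemon_set = {
    "apiVersion": "apps/v1",
    "kind": "DaemonSet",
    "metadata": {"name": "log-collector"},
    "spec": {
        "selector": {"matchLabels": {"app": "log-collector"}},
        "template": {
            "metadata": {"labels": {"app": "log-collector"}},
            "spec": {
                # Only nodes carrying this label get a copy; omit nodeSelector to target every node.
                "nodeSelector": {"logging": "enabled"},
                "containers": [{"name": "collector", "image": "example.com/log-collector:1.0"}],
            },
        },
    },
}

def nodes_receiving_pod(node_labels, node_selector):
    """Return the nodes whose labels satisfy the nodeSelector; each gets exactly one pod."""
    return [name for name, labels in node_labels.items()
            if all(labels.get(k) == v for k, v in node_selector.items())]

nodes = {
    "node-a": {"logging": "enabled"},
    "node-b": {},
    "node-c": {"logging": "enabled", "zone": "eu-1"},
}
selector = daemon_set["spec"]["template"]["spec"]["nodeSelector"]
print(nodes_receiving_pod(nodes, selector))  # ['node-a', 'node-c']
print(json.dumps(daemon_set, indent=2))
```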

Deployments

Deployments are management objects for ReplicaSets. They allow incremental rollout of new application versions through two strategies: Recreate and RollingUpdate (the default).

During the rollout process, a Deployment has two ReplicaSets associated with it: the OldReplicaSet and the NewReplicaSet. The rollout process consists of one or more stages that involve scaling down the OldReplicaSet and scaling up the NewReplicaSet.

The Recreate strategy scales the OldReplicaSet down to 0 and then scales the NewReplicaSet up to the desired number of replicas in one go (a single rollout stage). Because all old pods are removed before new ones are created, this causes a short period of downtime.

RollingUpdate is the default strategy. It is controlled by two properties, maxSurge and maxUnavailable, both of which default to 25%. maxSurge specifies the maximum number of pods that can be created above the desired number of replicas during the rollout, and maxUnavailable specifies the maximum number of pods that can be unavailable during the rollout. Both can be given as an absolute number or as a percentage of the desired replicas.
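When maxSurge and maxUnavailable are percentages, Kubernetes converts them into pod counts, rounding maxSurge up and maxUnavailable down. The sketch below performs that conversion and reports the bounds a rollout must stay within; it is a model of the arithmetic, not of the controller itself.

```python
import math

def resolve(value, replicas, round_up):
    """Turn an absolute int or a percentage string like "25%" into a pod count."""
    if isinstance(value, str) and value.endswith("%"):
        exact = replicas * int(value[:-1]) / 100
        return math.ceil(exact) if round_up else math.floor(exact)
    return int(value)

def rolling_update_bounds(replicas, max_surge="25%", max_unavailable="25%"):
    surge = resolve(max_surge, replicas, round_up=True)                # maxSurge rounds up
    unavailable = resolve(max_unavailable, replicas, round_up=False)  # maxUnavailable rounds down
    return {
        "max_total_pods": replicas + surge,
        "min_available_pods": replicas - unavailable,
    }

print(rolling_update_bounds(10))  # {'max_total_pods': 13, 'min_available_pods': 8}
print(rolling_update_bounds(4, max_surge="100%", max_unavailable=0))
# {'max_total_pods': 8, 'min_available_pods': 4}
```

The second call shows why maxSurge 100% with maxUnavailable 0 resembles a blue-green deployment: a complete second copy of the application may run alongside the old one, and capacity never drops below the desired replica count.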

Pod readiness checks ($.spec.template.spec.containers[x].readinessProbe) are essential for Deployments to work correctly. A Deployment knows it is time to progress to the next rollout stage from the readiness status of the pods associated with the NewReplicaSet.

The speed at which a Deployment moves through the various rollout stages can be controlled using the minReadySeconds property ($.spec.minReadySeconds). It specifies how many seconds a newly created pod must be ready, without any of its containers crashing, before it is considered available, so the Deployment waits at least that long before progressing to the next rollout stage. This can be essential in instances where error conditions take some time to manifest.

progressDeadlineSeconds ($.spec.progressDeadlineSeconds, 600 by default) specifies the maximum amount of time the Deployment may go without making progress before the rollout is marked as failed.
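The rollout settings discussed above all live in the Deployment spec. The sketch below puts them together in one manifest; the name, image, probe path and the chosen values (30 seconds for minReadySeconds, the 600 second default for progressDeadlineSeconds) are examples rather than recommendations.

```python
import json

deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "web"},
    "spec": {
        "replicas": 4,
        "selector": {"matchLabels": {"app": "web"}},
        "strategy": {
            "type": "RollingUpdate",
            "rollingUpdate": {"maxSurge": "25%", "maxUnavailable": "25%"},
        },
        # A new pod must stay ready for 30s before it counts as available.
        "minReadySeconds": 30,
        # Mark the rollout as failed if it makes no progress for 10 minutes.
        "progressDeadlineSeconds": 600,
        "template": {
            "metadata": {"labels": {"app": "web"}},
            "spec": {
                "containers": [{
                    "name": "web",
                    "image": "nginx:1.25",
                    # The rollout only advances once new pods pass this check.
                    "readinessProbe": {
                        "httpGet": {"path": "/", "port": 80},
                        "periodSeconds": 5,
                    },
                }],
            },
        },
    },
}
print(json.dumps(deployment, indent=2))
```

After applying the manifest, `kubectl rollout status deployment/web` follows the rollout stage by stage and exits with an error if the progress deadline is exceeded.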

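A blue-green deployment can be approximated by setting the maxSurge property of a RollingUpdate to 100% and maxUnavailable to 0: the complete new version is started alongside the old one, and old pods are only removed as new pods become available.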

Service objects

Service objects provide load balancing, naming and discovery for pods. Services use label selectors to identify the pods they forward traffic to, and traffic is only forwarded to pods that pass their readiness checks. The DNS record of a Service follows the format $(service-name).$(service-namespace).svc.cluster.local. The table below shows the various DNS names that can be used to access a service:

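| DNS name | Accessible from |
| --- | --- |
| $(service-name).$(service-namespace).svc.cluster.local | Any namespace in the cluster |
| $(service-name).$(service-namespace) | Any namespace in the cluster |
| $(service-name) | The Service's own namespace only |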
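The short forms work because pods are normally configured with DNS search domains derived from their own namespace. The sketch below models that lookup against a hypothetical set of existing Service names; real resolvers also apply rules such as the ndots option, which this simplification ignores.

```python
def search_domains(namespace, cluster_domain="cluster.local"):
    """Search list a pod typically gets in /etc/resolv.conf."""
    return [f"{namespace}.svc.{cluster_domain}", f"svc.{cluster_domain}", cluster_domain]

def resolve_service(name, caller_namespace, services):
    """Return the Service FQDN a name resolves to from caller_namespace, or None."""
    if name in services:  # already fully qualified
        return name
    for domain in search_domains(caller_namespace):
        candidate = f"{name}.{domain}"
        if candidate in services:
            return candidate
    return None

services = {"db.shop.svc.cluster.local"}
print(resolve_service("db", "shop", services))          # db.shop.svc.cluster.local
print(resolve_service("db", "billing", services))       # None
print(resolve_service("db.shop", "billing", services))  # db.shop.svc.cluster.local
```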

There are two main types of Service objects:

  1. ClusterIP

    This is the default and most commonly used type of Service. It is assigned a stable virtual IP (the ClusterIP) by the API Server, and kube-proxy programs the network rules that load-balance traffic sent to that IP across the Service's pods.

  2. NodePort

    A NodePort Service is similar to a ClusterIP Service, except that it also exposes a port on every node in the cluster through which traffic can be forwarded to the Service. A NodePort Service is also automatically assigned a ClusterIP.

    NodePorts are one way of exposing a Service outside the Kubernetes cluster. Traffic sent to an IP address of any node in the cluster will reach the Service as long as it is directed to the port specified by the NodePort.
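Both Service types are declared the same way and differ only in spec.type (plus the optional nodePort). The helper below is a hypothetical convenience for building the two manifests; the names and ports are examples. Port 80 is the Service port, and targetPort 8080 is the port the pods listen on.

```python
import json

def service(name, app_label, port, target_port, service_type="ClusterIP", node_port=None):
    spec = {
        "type": service_type,
        "selector": {"app": app_label},  # traffic goes to ready pods with this label
        "ports": [{"port": port, "targetPort": target_port}],
    }
    if node_port is not None:
        # Must fall inside the cluster's NodePort range (30000-32767 by default);
        # leave it out to let Kubernetes pick a free port.
        spec["ports"][0]["nodePort"] = node_port
    return {"apiVersion": "v1", "kind": "Service", "metadata": {"name": name}, "spec": spec}

internal = service("web", "web", 80, 8080)
external = service("web-public", "web", 80, 8080, service_type="NodePort", node_port=30080)
print(json.dumps(external, indent=2))
```

With the NodePort Service applied, a request to port 30080 on any node's IP address reaches one of the pods labelled app=web.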

Other types of Service objects include:

  1. LoadBalancer

    This is used to provision an external load balancer for a Service on cloud providers that support one.

  2. ExternalName

    Instead of using labels to identify the pods to forward traffic to, a Service of type ExternalName maps the Service name to an explicit external DNS name. Cluster DNS answers lookups for the Service with a CNAME record pointing to that name.

    This type of Service is useful for exposing applications that are not running in the Kubernetes cluster as though they were. An example is exposing an RDS instance as though it were a pod running directly in the cluster. This improves the portability of applications: the external DNS name can be changed transparently without updating the applications, because they rely on the stable Service DNS name.

  3. Headless Service

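    The most common type of headless Service is one with its ClusterIP set to None ($.spec.clusterIP: None). Instead of a single virtual IP, DNS lookups for such a Service return the IP addresses of the individual pods it selects. This is useful when there is a need to reach the actual endpoints of the pods mapped to the Service object, for example when working with StatefulSets.

    The other variant is a Service without a selector, whose endpoints are maintained manually as IP addresses in EndpointSlices. Like the ExternalName Service above, it can point at resources outside the cluster, but it is configured using IP addresses instead of a DNS name.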
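The two selector-less variants above can be sketched as manifests. The RDS hostname, the Service names and the IP address 10.0.0.15 are all hypothetical. For the manually managed variant, the EndpointSlice is tied to its Service through the kubernetes.io/service-name label, and its port name matches the Service's port name.

```python
import json

# ExternalName: DNS lookups for "orders-db" return a CNAME to the external host.
external_name = {
    "apiVersion": "v1",
    "kind": "Service",
    "metadata": {"name": "orders-db"},
    "spec": {
        "type": "ExternalName",
        "externalName": "orders.abc123.eu-west-1.rds.amazonaws.com",  # hypothetical endpoint
    },
}

# Service without a selector (here also headless): Kubernetes will not populate
# its endpoints, so they are supplied by hand in the EndpointSlice below.
legacy_api = {
    "apiVersion": "v1",
    "kind": "Service",
    "metadata": {"name": "legacy-api"},
    "spec": {"clusterIP": "None", "ports": [{"name": "http", "port": 80}]},
}
legacy_api_endpoints = {
    "apiVersion": "discovery.k8s.io/v1",
    "kind": "EndpointSlice",
    "metadata": {
        "name": "legacy-api-1",
        "labels": {"kubernetes.io/service-name": "legacy-api"},  # links slice to Service
    },
    "addressType": "IPv4",
    "ports": [{"name": "http", "port": 80}],
    "endpoints": [{"addresses": ["10.0.0.15"]}],  # hypothetical external IP
}

for obj in (external_name, legacy_api, legacy_api_endpoints):
    print(json.dumps(obj, indent=2))
```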

To be continued!

In a follow-up post, a practical guide to the discussed concepts will be given. The following Kubernetes objects will also be discussed in detail:

  • Ingress Objects

  • ConfigMaps and Secrets

  • Jobs