Kubernetes Truly HA Cluster

Kubernetes Concepts 

  • SkyDNS is the DNS add-on that resolves Service names to Service IPs.
  • Jobs (kind:Job) are complementary to Replication Controllers. A Replication Controller manages pods which are not expected to terminate (e.g. web servers), and a Job manages pods that are expected to terminate (e.g. batch jobs). A Job can also be used to run multiple pods in parallel and one can control the parallelism.
  • Endpoints are simply a collection of pod_ip:port pairs.
  • Port: the abstracted Service port. A Service is backed by a group of pods, and these pods are exposed through Endpoints.
  • TargetPort: the port on which the container accepts traffic.
  • NodePort: when a Service of type NodePort is created in the cluster, kube-proxy opens the same port on every node (the node port). Connections to that port are proxied to the pods that the Service picks out using selectors and labels.
  • Services are a “layer 4” (TCP/UDP over IP) construct. In Kubernetes v1.1 the Ingress API was added (beta) to represent “layer 7” (HTTP) services.
  • A Service defines a set of pods and a means by which to access them, such as a single stable IP address (cluster IP or VIP) and a corresponding DNS name.
  • A replication controller ensures that a specified number of pod replicas are running at any one time, which provides both scaling and failover. Pods managed this way can reach each other inside the cluster.
  • A selector is an expression that matches labels in order to identify related resources, such as which pods are targeted by a load-balanced service.
  • A label is a key/value pair that is attached to a resource (e.g. pod).
  • A pod is a co-located group of containers and volumes.
  • A Namespace defines a scope for resources, resource policies, and resource constraints or limits (CPU, memory, etc.).
  • By default, Kubernetes creates a Deployment (a newer concept that supersedes the RC) for pods when no RC is defined. A Deployment supports rollback to a previous revision, which the RC lacked.
  • kube-proxy is responsible for implementing a form of virtual IP (the clusterIP). In Kubernetes v1.0 the proxy ran purely in userspace; Kubernetes v1.1 added an iptables proxy.
    • Proxy-mode: userspace: kube-proxy watches the Kubernetes master for the addition and removal of Service and Endpoints objects. For each Service it opens a randomly chosen port on the local node. Any connection to this “proxy port” is proxied to one of the Service’s backend Pods (as reported in Endpoints).
    • Proxy-mode: iptables: kube-proxy watches the Kubernetes master for the addition and removal of Service and Endpoints objects. For each Service it installs iptables rules that capture traffic to the Service’s clusterIP (which is virtual) and Port and redirect that traffic to one of the Service’s backend sets. For each Endpoints object it installs iptables rules that select a backend Pod.
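
The relationship between labels, selectors, Endpoints and the Service ports can be sketched in a few lines of Python. This is only an illustrative model, not kube-proxy's implementation: the `Pod` and `Service` classes, the pod names and all IP addresses are made up for the example, and a plain random choice stands in for the backend selection that the iptables rules perform.

```python
import random
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    ip: str
    labels: dict


@dataclass
class Service:
    name: str
    cluster_ip: str   # virtual IP, not bound to any real interface
    port: int         # abstracted Service port
    target_port: int  # port the container accepts traffic on
    selector: dict


def endpoints(service, pods):
    """Return pod_ip:port for every pod whose labels match the selector."""
    return [
        f"{pod.ip}:{service.target_port}"
        for pod in pods
        if all(pod.labels.get(k) == v for k, v in service.selector.items())
    ]


def route(service, pods, rng=random):
    """Pick one backend for a connection sent to cluster_ip:port."""
    backends = endpoints(service, pods)
    if not backends:
        raise RuntimeError(f"service {service.name} has no endpoints")
    return rng.choice(backends)


# Hypothetical cluster state, for illustration only.
pods = [
    Pod("web-1", "10.1.0.4", {"app": "web", "tier": "frontend"}),
    Pod("web-2", "10.1.0.5", {"app": "web", "tier": "frontend"}),
    Pod("db-1", "10.1.0.9", {"app": "db"}),
]
svc = Service("web", "10.0.0.20", port=80, target_port=8080,
              selector={"app": "web"})

print(endpoints(svc, pods))  # ['10.1.0.4:8080', '10.1.0.5:8080']
print(route(svc, pods))      # one of the two endpoints above
```

A client only ever sees `10.0.0.20:80`; which pod answers is decided per connection, which is why pods behind a Service can be replaced without clients noticing.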


  • Security in Kubernetes applies to four types of consumers (three infrastructure consumer types and one service consumer type):
    • When a human accesses the cluster (e.g. using kubectl), they are authenticated by the apiserver as a particular User Account.
    • All infrastructure components (kubelets, kube-proxies, controllers, scheduler) should have an infrastructure user that they can authenticate with and that is authorized to perform only the functions they require against the apiserver.
    • Processes in containers inside pods can also contact the apiserver. When they do, they are authenticated as a particular Service Account. This covers inter-container and container-to-apiserver communication.
    • When a consumer outside the cluster contacts a service through kube-proxy, it is authenticated via the service itself, according to the Service Account.
  • The apiserver is responsible for performing authentication and authorization for users of the Kubernetes infrastructure, e.g. kubectl.
  • The kubelet handles locating and authenticating to the apiserver.
  • A secret stores sensitive data, such as authentication tokens/certificates, which can be made available to containers/application upon request.
  • Namespace is a mechanism to partition resources created by users into a logically named group.
  • A security context is a set of constraints applied to a container or pod in order to achieve the following goals:
    • Ensure clear isolation between the container and the underlying host it runs on, using Docker’s user namespaces feature.
    • Limit the ability of the container to negatively impact the infrastructure or other containers, using Docker features such as the ability to add or remove Linux capabilities and to limit resources (CPU, memory, etc.).
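
As a rough illustration of the security context goals above, the sketch below builds a pod manifest as a Python dictionary and prints it as JSON. The pod name, image name and the specific capabilities and limits are placeholders chosen for the example, not recommendations; `securityContext`, `capabilities` and `resources.limits` are the container-level fields of the Pod spec.

```python
import json

# Hypothetical pod; name, image and values are placeholders.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "restricted-demo"},
    "spec": {
        "containers": [
            {
                "name": "app",
                "image": "example/app:1.0",
                "securityContext": {
                    # Run as an unprivileged user instead of root.
                    "runAsUser": 1000,
                    "privileged": False,
                    # Remove one Linux capability, grant another.
                    "capabilities": {
                        "drop": ["NET_RAW"],
                        "add": ["NET_BIND_SERVICE"],
                    },
                },
                # Resource limits are set separately from capabilities.
                "resources": {"limits": {"cpu": "500m", "memory": "128Mi"}},
            }
        ]
    },
}

manifest = json.dumps(pod, indent=2)
print(manifest)
```

Saved to a file, the output could be submitted with `kubectl create -f`, since kubectl accepts JSON as well as YAML.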

Security Implementation

  • Create a secure image registry server.
  • Run the apiserver with HTTPS and ABAC authorization.
  • Configure the kubelet and kube-proxy to contact the apiserver on its HTTPS port.
  • kube-proxy maintains iptables routing from the clusterIP (VIP) to the nodeport. We can define iptables firewall rules (e.g. allowed sources) to prevent insecure access.
  • A pod runs in a security context under a service account that is defined by an administrator, and the secrets a pod has access to are limited by that service account.
  • For infrastructure users, apiserver access would be secured as follows:
    • Create a namespace -> Set the cluster name and override cluster-level properties for this namespace -> Set credentials for the cluster and user in the namespace -> Create a security context for the “Cluster+Namespace+User” combination
  • For Service consumers
    • Create a service account -> Secure it with a secret -> Create the service under the service account -> Create the pods belonging to the service
    • Define iptables rules for service access
  • kube-up.sh creates the following certificates in /srv/kubernetes/:
    • First a CA is created; the result is a cert/key pair (ca.crt/ca.key). You can use easyrsa or OpenSSL to generate your PKI.
    • Then a certificate is requested and signed using this CA (server.cert/server.key). It is used
      • by the api server to enable HTTPS and verify service account tokens
      • by the controller manager to sign service account tokens, so that pods can authenticate against the API using these tokens
    • Another certificate (kubecfg.crt/kubecfg.key) is requested and signed using the same CA; you can use it to authenticate your clients.
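
To make the ABAC idea above more concrete, here is a simplified sketch of attribute-based authorization. It is modeled loosely on a policy file with one JSON object per line, but the matching rules here are an assumption made for the example (a missing attribute matches anything, `readonly` restricts a rule to read requests) and should not be read as the apiserver's exact semantics. The users and namespace are hypothetical.

```python
import json

# Hypothetical policy file: one JSON object per line.
POLICY_LINES = """\
{"user": "admin"}
{"user": "kubelet", "resource": "pods", "readonly": true}
{"user": "alice", "namespace": "projectCaribou"}
"""


def load_policies(text):
    """Parse one policy object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def authorized(policies, user, namespace, resource, readonly):
    """Allow the request if any single policy line matches all its attributes."""
    for policy in policies:
        if policy.get("user", user) != user:
            continue
        if policy.get("namespace", namespace) != namespace:
            continue
        if policy.get("resource", resource) != resource:
            continue
        # A readonly policy only permits read requests.
        if policy.get("readonly", False) and not readonly:
            continue
        return True
    return False


policies = load_policies(POLICY_LINES)
print(authorized(policies, "kubelet", "default", "pods", readonly=True))   # True
print(authorized(policies, "kubelet", "default", "pods", readonly=False))  # False
print(authorized(policies, "alice", "default", "pods", readonly=True))     # False
```

The point of the design is that each infrastructure user gets only the rules it needs, so a compromised kubelet credential in this example could read pods but not modify them.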

Kubernetes HA Cluster



  • flannel is used because we want an overlay network. Alternatives to flannel include Open vSwitch or any other SDN tool.
  • While configuring cluster/ubuntu/config-default.sh, make sure the private IP ranges do not conflict with the datacenter’s private IPs. Any of these ranges can be used: 10/8, 172.16/12, or 192.168/16.
  • As of Kubernetes 1.3, DNS is a built-in service (based on SkyDNS) launched automatically by the addon manager as a cluster add-on (/etc/kubernetes/addons). DNS is used to resolve hostnames like http://www.dns.com into machine IPs.
  • Etcd cluster: etcd provides both a TTL on objects and a compare-and-swap operation, which together can implement an election algorithm. Kubernetes uses both features for master election and HA.
  • Unelected instances watch “/election” (or some other well-known key) and, if it is empty, become elected by writing their ID to it. The written value is given a TTL that removes it after a set interval, and the elected instance must rewrite it periodically to remain elected. Because etcd’s compare-and-swap operation is atomic, a clash between two instances cannot go undetected.
  • Podmaster:
    • Podmaster’s job is to implement a master election protocol using etcd “compare and swap”. If the apiserver node wins the election, podmaster starts the master component it is managing (e.g. the scheduler); if the node loses the election, podmaster ensures that any master components running on the node (e.g. the scheduler) are stopped.
    • Podmaster is a small utility written in Go that uses etcd’s atomic “CompareAndSwap” functionality to implement master election. The first master to reach the etcd cluster wins the race and becomes the master node, marking itself with an expiring key that it periodically extends. If another instance finds the key has expired, it attempts to take over using an atomic request. If an instance is the current master, it copies the scheduler and controller-manager manifests into the kubelet directory; if it isn’t, it removes them. As all it does is copy files, it could be used for anything that requires leader election, not just Kubernetes!
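
The election scheme described above (a key with a TTL plus an atomic compare-and-swap) can be sketched as follows. This is a toy model and not podmaster's code: `FakeEtcd` is an in-memory stand-in invented for the example rather than an etcd client, the instance IDs are hypothetical, and time is passed in explicitly so the run is deterministic.

```python
class FakeEtcd:
    """In-memory stand-in for etcd: keys with a TTL and compare-and-swap."""

    def __init__(self):
        self._data = {}  # key -> (value, expires_at)

    def get(self, key, now):
        value, expires_at = self._data.get(key, (None, 0.0))
        return value if now < expires_at else None  # expired keys read as empty

    def compare_and_swap(self, key, expected, new, ttl, now):
        """Write `new` only if the current value equals `expected` (None = empty)."""
        if self.get(key, now) != expected:
            return False
        self._data[key] = (new, now + ttl)
        return True


def campaign(etcd, key, my_id, ttl, now):
    """One election round: renew our own lease, else try to claim an empty key."""
    if etcd.compare_and_swap(key, my_id, my_id, ttl, now):
        return True  # we were already elected and extended the TTL
    return etcd.compare_and_swap(key, None, my_id, ttl, now)


etcd = FakeEtcd()
KEY = "/election"

print(campaign(etcd, KEY, "master-a", ttl=10, now=0))   # True: key was empty
print(campaign(etcd, KEY, "master-b", ttl=10, now=1))   # False: master-a holds it
print(campaign(etcd, KEY, "master-a", ttl=10, now=5))   # True: lease renewed to t=15
print(campaign(etcd, KEY, "master-b", ttl=10, now=12))  # False: lease still valid
# master-a stops renewing; its lease expires at t=15.
print(campaign(etcd, KEY, "master-b", ttl=10, now=16))  # True: master-b takes over
```

In the podmaster setup, the boolean result would decide whether the scheduler and controller-manager manifests are copied into the kubelet directory or removed from it.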



  • Docker failover using monit
  • Kubelet failover using monit
  • Kube Master Process (apiserver, scheduler and controller) failover using kubelet
  • Kube Worker Process (Kube-proxy) failover using monit
  • Master Node Failover using podmaster and Loadbalancer
  • Etcd failover using etcd cluster

The easiest way to implement an HA Kubernetes cluster is to start with an existing single-master cluster. The instructions at https://get.k8s.io describe easy installation for single-master clusters on a variety of platforms.

Then follow the guide at http://kubernetes.io/docs/admin/high-availability/

