Control CPU and Memory Resources consumed by Pods in Kubernetes

By default, containers and pods are allocated unbounded resources in a Kubernetes cluster, which allows them to consume as much of a node's resources as they want. This is not an ideal scenario for cluster administrators. With resource quotas, admins can restrict the amount of CPU and memory available on a per-namespace basis. Within a namespace, the resources per container or pod can be controlled using limit ranges.

A Limit Range policy can also be used to enforce minimum and maximum storage requests per PersistentVolumeClaim.
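
As a hedged sketch (the object name and sizes below are illustrative, not from this demo), such a policy might look like:

apiVersion: v1
kind: LimitRange
metadata:
  name: pvc-lr-demo
spec:
  limits:
  - type: PersistentVolumeClaim
    max:
      storage: 2Gi
    min:
      storage: 1Gi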

If a pod or container does not define any resource requests or limits, a limit range policy can supply the default allocations.

Create a Limit Range for a namespace

Since a Limit Range policy is scoped at the namespace level, we need to create a namespace first. If no namespace is specified, the command is executed against the default namespace. We can create a new namespace using the kubectl create namespace command:

cloud_user@d7bfd02ab81c:~/workspace$ kubectl create namespace lrdemo
namespace/lrdemo created

To create a limit range, we need to describe the policy in a manifest. For example, we can define a limit range policy as below:

apiVersion: v1
kind: LimitRange
metadata:
  name: lr-demo
spec:
  limits:
  - type: Container
    max: 
      memory: 1Gi
      cpu: 1
    min:
      memory: 200Mi
      cpu: "200m"  

In the above manifest, min defines the minimum resources that can be allocated and max defines the maximum resources that can be allocated, for the Container resource type.

It is preferable to define the policy at the container level, as different containers have different requirements depending on the underlying application workload. The total resource consumption of a pod is the sum of the resource consumption of the containers within it.
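
If you do want to cap a pod as a whole, a limit range entry can also use the Pod type. A minimal sketch (the name and values are illustrative):

apiVersion: v1
kind: LimitRange
metadata:
  name: pod-total-lr-demo
spec:
  limits:
  - type: Pod
    max:
      memory: 2Gi
      cpu: 2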

Let's go ahead and create the object with the kubectl apply command:

cloud_user@d7bfd02ab81c:~/workspace$ kubectl apply -f limit-range.yaml -n lrdemo
limitrange/lr-demo created

cloud_user@d7bfd02ab81c:~/workspace$ kubectl get limitrange lr-demo -n lrdemo
NAME      CREATED AT
lr-demo   2021-05-01T22:18:14Z

cloud_user@d7bfd02ab81c:~/workspace$ kubectl describe limitrange lr-demo -n lrdemo
Name:       lr-demo
Namespace:  lrdemo
Type        Resource  Min    Max  Default Request  Default Limit  Max Limit/Request Ratio
----        --------  ---    ---  ---------------  -------------  -----------------------
Container   cpu       200m   1    1                1              -
Container   memory    200Mi  1Gi  1Gi              1Gi            -

As you can see above, even though we did not specify default values for the Container resource type, they are automatically set to the maximum values. This is, again, not desired behavior. You would often want the default to sit somewhere between min and max, so that more than one resource of its type can be accommodated.

To set default values for the allocation of CPU and memory, we can use the default property:

apiVersion: v1
kind: LimitRange
metadata:
  name: lr-demo
spec:
  limits:
  - type: Container
    max: 
      memory: 1Gi
      cpu: 1
    min:
      memory: 200Mi
      cpu: "200m"
    default:
      memory: 250Mi
      cpu: "250m"

After making this change, save the manifest and apply the configuration:

cloud_user@d7bfd02ab81c:~/workspace$ kubectl apply -f limit-range.yaml -n lrdemo
limitrange/lr-demo configured

cloud_user@d7bfd02ab81c:~/workspace$ kubectl describe limitrange lr-demo -n lrdemo
Name:       lr-demo
Namespace:  lrdemo
Type        Resource  Min    Max  Default Request  Default Limit  Max Limit/Request Ratio
----        --------  ---    ---  ---------------  -------------  -----------------------
Container   cpu       200m   1    250m             250m           -
Container   memory    200Mi  1Gi  250Mi            250Mi          -
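
Note that setting only default also sets the default request to the same value. If you want the two to differ, a limit range entry additionally supports a defaultRequest field. A sketch of the relevant fragment (the values are illustrative):

    default:            # default limit, applied when a container omits limits
      memory: 250Mi
      cpu: "250m"
    defaultRequest:     # default request, applied when a container omits requests
      memory: 200Mi
      cpu: "200m"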

Create a Pod without specifying resource constraints

If we create a pod in a namespace where a limit range is defined, and we have not defined any resource constraints in the pod manifest, it will be allocated resources as per the default values specified in the policy. For example, consider the below pod manifest:

apiVersion: v1
kind: Pod
metadata:
  name: pod-lr-demo
spec:
  restartPolicy: Never
  containers:
  - name: ui-demo
    image: docker.io/mohitgoyal/demo-ui01:1.0.0
    ports:
    - name: http
      containerPort: 80
      protocol: TCP

Now, create the pod and view the limits applicable to it:

cloud_user@d7bfd02ab81c:~/workspace$ kubectl apply -f pod-limits.yaml -n lrdemo
pod/pod-lr-demo created

cloud_user@d7bfd02ab81c:~/workspace$ kubectl describe pod pod-lr-demo -n lrdemo
Name:         pod-lr-demo
Namespace:    lrdemo
Priority:     0
Node:         k3d-worker-0/172.18.0.3
Start Time:   Sat, 01 May 2021 22:41:31 +0000
Labels:       <none>
Annotations:  kubernetes.io/limit-ranger: LimitRanger plugin set: cpu, memory request for container ui-demo; cpu, memory limit for container ui-demo
Status:       Running
IP:           10.42.1.11
IPs:
  IP:  10.42.1.11
Containers:
  ui-demo:
    Container ID:   containerd://de28174f20e0f098dfe937fc1446da0eadbb0c1db7d7b1f5c37c1ae2e2a4c0f4
    Image:          docker.io/mohitgoyal/demo-ui01:1.0.0
    Image ID:       docker.io/mohitgoyal/demo-ui01@sha256:29fcddd0b6b1a6e70b1a40b19834ef783d27622bb92461a42686dac1d81dd514
    Port:           80/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Sat, 01 May 2021 22:41:39 +0000
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     250m
      memory:  250Mi
    Requests:
      cpu:        250m
      memory:     250Mi
    Environment:  <none>
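
The same values are visible in the pod spec itself, since the LimitRanger admission plugin mutates the spec at creation time. One way to check them (jsonpath is just one of several output options):

cloud_user@d7bfd02ab81c:~/workspace$ kubectl get pod pod-lr-demo -n lrdemo -o jsonpath='{.spec.containers[0].resources}'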

Create a Pod with resource constraints defined

To specify a request for a container, include the resources.requests field in the container's manifest. To specify a limit, include resources.limits:

apiVersion: v1
kind: Pod
metadata:
  name: pod-lr-demo
spec:
  restartPolicy: Never
  containers:
  - name: ui-demo
    image: docker.io/mohitgoyal/demo-ui01:1.0.0
    ports:
    - name: http
      containerPort: 80
      protocol: TCP
    resources:
      requests: 
        cpu: "250m"
        memory: "300Mi"
      limits:
        cpu: "500m"
        memory: "500Mi"

Note that requests specifies the minimum amount of resources that must be allocated to the container. The minimum amount of resources for the pod is the sum of the minimums of all the containers within it. The Kubernetes scheduler uses this parameter to place the pod onto a node.
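
To see what the scheduler has to work with, we can inspect a node's allocatable capacity (the node name here is from this demo cluster; adjust it for yours):

cloud_user@d7bfd02ab81c:~/workspace$ kubectl describe node k3d-worker-0

In the output, the Allocatable section lists the CPU and memory the scheduler can hand out on that node, and the Allocated resources section shows how much of it is already claimed by pod requests.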

While requests specifies a minimum, limits specifies a maximum. If we do not define limits, the container is free to use resources up to the max defined in the limit range policy, or up to the maximum available on the node if no limit range policy is defined.

Let's apply the pod manifest and check the resources allocated with kubectl describe:

cloud_user@d7bfd02ab81c:~/workspace$ kubectl apply -f pod-limits.yaml -n lrdemo
pod/pod-lr-demo created

cloud_user@d7bfd02ab81c:~/workspace$ kubectl describe pod pod-lr-demo -n lrdemo
Name:         pod-lr-demo
Namespace:    lrdemo
Priority:     0
Node:         k3d-worker-0/172.18.0.3
Start Time:   Sun, 02 May 2021 06:57:12 +0000
Labels:       <none>
Annotations:  <none>
Status:       Running
IP:           10.42.1.13
IPs:
  IP:  10.42.1.13
Containers:
  ui-demo:
    Container ID:   containerd://47396b9ce2c6a796ffb1ac0e291d1018abee484c86513b2ae5faefc922648a35
    Image:          docker.io/mohitgoyal/demo-ui01:1.0.0
    Image ID:       docker.io/mohitgoyal/demo-ui01@sha256:29fcddd0b6b1a6e70b1a40b19834ef783d27622bb92461a42686dac1d81dd514
    Port:           80/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Sun, 02 May 2021 06:57:13 +0000
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     500m
      memory:  500Mi
    Requests:
      cpu:        250m
      memory:     300Mi
    Environment:  <none>

Crossing the limits defined

As discussed above, requests specifies a minimum and limits specifies a maximum; without explicit limits, the container can consume up to the max defined in the limit range policy, or up to what is available on the node if no policy is defined. If the container tries to allocate more memory than its defined limit, it becomes a target for termination by the kubelet (CPU usage beyond the limit, by contrast, is throttled rather than killed).

Let's create a pod with the below manifest:

apiVersion: v1
kind: Pod
metadata:
  name: memory-demo-2
spec:
  restartPolicy: Always
  containers:
  - name: memory-demo-2-ctr
    image: polinux/stress
    resources:
      requests:
        memory: "250Mi"
      limits:
        memory: "500Mi"
    command: ["stress"]
    args: ["--vm", "1", "--vm-bytes", "550M", "--vm-hang", "1"]

If we check the pod status after some time, it will be marked as OOMKilled:

cloud_user@d7bfd02ab81c:~/workspace$ kubectl apply -f pod-oom.yaml -n lrdemo
pod/memory-demo-2 created

cloud_user@d7bfd02ab81c:~/workspace$ kubectl get pod memory-demo-2 -n lrdemo
NAME            READY   STATUS      RESTARTS   AGE
memory-demo-2   0/1     OOMKilled   2          35s

The container can be restarted based on the restartPolicy defined for it. The default restart policy is Always, so the kubelet will try to restart the container. However, since it will cross the memory limit again, it will be killed again, and so on. After some time, the kubelet will mark the status as BackOff and wait before attempting a restart all over again:

Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  71s                default-scheduler  Successfully assigned lrdemo/memory-demo-2 to k3d-worker-0
  Normal   Pulled     69s                kubelet            Successfully pulled image "polinux/stress" in 1.761727056s
  Normal   Pulled     67s                kubelet            Successfully pulled image "polinux/stress" in 1.774770704s
  Normal   Pulled     49s                kubelet            Successfully pulled image "polinux/stress" in 1.772412722s
  Normal   Pulling    22s (x4 over 71s)  kubelet            Pulling image "polinux/stress"
  Normal   Pulled     20s                kubelet            Successfully pulled image "polinux/stress" in 1.766717291s
  Normal   Created    20s (x4 over 69s)  kubelet            Created container memory-demo-2-ctr
  Normal   Started    20s (x4 over 69s)  kubelet            Started container memory-demo-2-ctr
  Warning  BackOff    6s (x6 over 65s)   kubelet            Back-off restarting failed container
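
To confirm why the container keeps getting killed, we can query its last terminated state (using jsonpath here is just one way to do it):

cloud_user@d7bfd02ab81c:~/workspace$ kubectl get pod memory-demo-2 -n lrdemo -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'
OOMKilled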

If we choose to define restartPolicy as Never, the kubelet will not attempt to restart the container if it fails. For example, consider the below manifest:

apiVersion: v1
kind: Pod
metadata:
  name: memory-demo-2
spec:
  restartPolicy: Never
  containers:
  - name: memory-demo-2-ctr
    image: polinux/stress
    resources:
      requests:
        memory: "250Mi"
      limits:
        memory: "500Mi"
    command: ["stress"]
    args: ["--vm", "1", "--vm-bytes", "550M", "--vm-hang", "1"]

The events associated with the above pod would be:

Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  70s   default-scheduler  Successfully assigned lrdemo/memory-demo-2 to k3d-worker-0
  Normal  Pulling    69s   kubelet            Pulling image "polinux/stress"
  Normal  Pulled     67s   kubelet            Successfully pulled image "polinux/stress" in 1.78066592s
  Normal  Created    67s   kubelet            Created container memory-demo-2-ctr
  Normal  Started    67s   kubelet            Started container memory-demo-2-ctr

Specifying Requests that are too big

The memory request for a pod is the sum of the memory requests of all the containers in the pod. Likewise, the memory limit for a pod is the sum of the limits of all the containers in the pod. The same goes for CPU.
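
As an illustration, consider a hypothetical pod with two containers (the pod name, container names, and values below are made up, chosen to stay within our limit range):

apiVersion: v1
kind: Pod
metadata:
  name: pod-sum-demo
spec:
  containers:
  - name: app
    image: docker.io/mohitgoyal/demo-ui01:1.0.0
    resources:
      requests:
        cpu: "200m"     # pod total cpu request: 200m + 300m = 500m
        memory: "200Mi" # pod total memory request: 200Mi + 300Mi = 500Mi
  - name: sidecar
    image: docker.io/mohitgoyal/demo-ui01:1.0.0
    resources:
      requests:
        cpu: "300m"
        memory: "300Mi"

For scheduling purposes, this pod requests 500m of CPU and 500Mi of memory in total.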

Pod scheduling is based on requests. A pod is scheduled to run on a node only if the node has enough available resources to satisfy the pod's resource requests, and only if those requests fall within the limit range defined for the namespace (the latter check happens at admission time, before scheduling).

Let's consider the below pod manifest, where we request more resources than the limit range allows:

apiVersion: v1
kind: Pod
metadata:
  name: pod-lr-demo
spec:
  restartPolicy: Never
  containers:
  - name: ui-demo
    image: docker.io/mohitgoyal/demo-ui01:1.0.0
    ports:
    - name: http
      containerPort: 80
      protocol: TCP
    resources:
      requests: 
        cpu: 2
        memory: "1500Mi"
      limits:
        cpu: 2
        memory: "2500Mi"

If we try to create the pod, the API server will deny the request, as it crosses the thresholds defined:

cloud_user@d7bfd02ab81c:~/workspace$ kubectl apply -f pod-oor.yaml -n lrdemo
Error from server (Forbidden): error when creating "pod-oor.yaml": pods "pod-lr-demo" is forbidden: [maximum cpu usage per Container is 1, but limit is 2, maximum memory usage per Container is 1Gi, but limit is 2500Mi]

This is an easy situation to avoid, as the error message is quite helpful. In a more likely scenario, several users or teams share a cluster and are segregated into different namespaces. To divide resources fairly among the teams, an admin can use the ResourceQuota object.

A resource quota, defined by a ResourceQuota object, provides constraints that limit aggregate resource consumption per namespace. It can limit the quantity of objects that can be created in a namespace by type, as well as the total amount of compute resources that may be consumed by resources in that namespace.
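
A hedged sketch of such a quota (the object name and amounts are illustrative):

apiVersion: v1
kind: ResourceQuota
metadata:
  name: rq-demo
spec:
  hard:
    pods: "10"
    requests.cpu: "4"
    requests.memory: 4Gi
    limits.cpu: "8"
    limits.memory: 8Gi

After applying it with kubectl apply -f resource-quota.yaml -n lrdemo, current usage against the quota can be inspected with kubectl describe resourcequota rq-demo -n lrdemo.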

In practice, you are more likely to cross the quota defined for the namespace than the per-container limits of a limit range policy.
