
Optimizing Resource Usage in Kubernetes by Carlos Sanchez

Channel: Devoxx

Summary

Carlos Sanchez discusses practical ways to optimize Kubernetes resource usage at platform scale, based on Adobe Experience Manager Cloud Service. The talk focuses on resource requests and limits, autoscaling strategies, node provisioning, and workload-specific tradeoffs for Java-based services running across many clusters and namespaces. It also covers techniques for reducing cost and cluster pressure without requiring application changes in every team. The main theme is that optimization is not one tool, but a combination of scheduling, scaling, architecture, and workload management choices. The talk compares cluster autoscaling, Karpenter, VPA, HPA, ARM-based nodes, hibernation/suspend-to-zero patterns, and event-driven autoscaling with KEDA. It also warns about Kubernetes control-plane scaling issues and common mistakes such as using etcd or ConfigMaps like a database.

Key Takeaways

  • Kubernetes requests and limits behave differently: requests drive scheduling and relative CPU share, while limits are hard caps enforced through CPU throttling, out-of-memory kills, or eviction.
  • CPU limits can cause severe throttling for multi-threaded applications, especially Java services with short bursts of parallel work.
  • Cluster autoscaling and Karpenter reduce overprovisioning, but require headroom, node caps, and awareness of cloud-provider capacity constraints.
  • Vertical Pod Autoscaler can take advantage of in-place pod resizing (beta and enabled by default since Kubernetes 1.33), but Java memory behavior can limit its usefulness.
  • Horizontal Pod Autoscaler works well when the scaling metric matches the workload, such as requests per minute or queue depth.
  • ARM-based nodes provided meaningful cost savings for containerized Java workloads with minimal application changes.
  • Hibernation and scale-to-zero patterns can save substantial cost for intermittently used environments, especially when combined with faster JVM startup techniques.
  • Large-scale Kubernetes platforms must also manage control-plane load, avoiding excessive object counts and watch pressure on the API server.

Sections

Use case: large-scale platform operations

The talk is grounded in Adobe Experience Manager Cloud Service, which runs a large number of customer environments across many Kubernetes clusters and regions. The platform uses operator patterns, init containers, and sidecars to keep application containers simpler and to reduce the need for application-level changes when optimizing resource usage.

How Kubernetes requests and limits work

Requests are used by the scheduler to place pods on nodes and represent the resources a container is guaranteed. Limits are caps enforced by the kernel or the kubelet. Exceeding a CPU limit leads to throttling; exceeding a memory limit gets the container OOM-killed; exceeding an ephemeral-storage limit can get the pod evicted. The talk emphasizes that CPU requests are not hard caps: they act as weighted shares of node CPU time when the node is under contention.
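The "weighted share" behavior can be illustrated with a little arithmetic. This is a minimal sketch, not the kernel's scheduler: it assumes every container is CPU-hungry at the same time and that no CPU limits are set. The container names and numbers are made up for illustration.

```python
def cpu_share_under_contention(requests_millicores: dict[str, int],
                               node_millicores: int) -> dict[str, float]:
    """Split node CPU among busy containers in proportion to their CPU requests.

    Simplification: all containers want as much CPU as they can get,
    and none of them has a CPU limit.
    """
    total_requested = sum(requests_millicores.values())
    return {
        name: node_millicores * request / total_requested
        for name, request in requests_millicores.items()
    }


# Hypothetical node with 4 cores (4000m) and two busy containers.
shares = cpu_share_under_contention({"web": 500, "batch": 1500}, node_millicores=4000)
print(shares)  # {'web': 1000.0, 'batch': 3000.0}
```

The "web" container requested 500m but ends up with 1000m here, because the request only sets its weight relative to "batch"; nothing stops it from using idle or proportionally available CPU.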

CPU throttling and Java workloads

The speaker explains why CPU limits can be problematic for multi-threaded Java services. A container with several active threads can exhaust its CPU quota quickly and then be throttled for the rest of the CFS period, producing high tail latency and spikes in request latency. This makes it especially important to size CPU limits carefully for web services and other request-driven workloads.
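The throttling effect follows from how the quota is counted: CPU time is summed across all threads, so parallel work burns it faster than wall-clock time passes. The sketch below assumes the default 100 ms CFS period, threads that stay runnable for the whole period, and one free core per thread; the function name is illustrative.

```python
CFS_PERIOD_MS = 100.0  # default CFS quota period on Linux


def throttled_ms_per_period(cpu_limit_cores: float, busy_threads: int) -> float:
    """Wall-clock milliseconds per period during which the container is throttled.

    Assumes every thread is runnable for the whole period and each
    thread can run on its own core until the quota is used up.
    """
    quota_ms = cpu_limit_cores * CFS_PERIOD_MS   # CPU time allowed per period
    burn_wall_ms = quota_ms / busy_threads       # wall time until the quota is gone
    return max(0.0, CFS_PERIOD_MS - burn_wall_ms)


# A 1-core limit with 8 busy threads: quota is gone after 12.5 ms of wall time.
print(throttled_ms_per_period(1.0, 8))  # 87.5
# A 2-core limit with 2 busy threads never hits the quota.
print(throttled_ms_per_period(2.0, 2))  # 0.0
```

In the first case a request that arrives just after the quota is exhausted can wait up to 87.5 ms before any of its threads run again, which is where the tail latency comes from.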

Cluster autoscaler and node provisioning

Cluster autoscaling increases or decreases node count based on unschedulable pods and resource demand. The talk notes the importance of headroom for VM startup delays, max node limits, and cloud-provider capacity or quota constraints. It also highlights cost and availability tradeoffs when scaling across multiple zones and regions.

Karpenter for faster, more flexible node scaling

Karpenter is presented as a newer autoscaler that can provision matching instance types directly from pending pods, without requiring pre-created node groups. It can choose node shapes that better fit CPU/memory ratios, scale faster than the classic cluster autoscaler, and reduce waste by consolidating workloads onto more efficient nodes.
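The idea of picking a node shape from the pending pods, rather than from a fixed node group, can be sketched as follows. This is a toy illustration and not Karpenter's actual algorithm: the instance names and prices are hypothetical, and it only considers placing all pending pods on a single new node.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    name: str           # hypothetical name, not a real cloud SKU
    cpu: float          # cores
    memory_gib: float
    hourly_cost: float  # made-up price


CATALOG = [
    InstanceType("compute-4x8", 4, 8, 0.17),
    InstanceType("general-4x16", 4, 16, 0.20),
    InstanceType("memory-4x32", 4, 32, 0.26),
]


def cheapest_fit(pending, catalog):
    """Return the cheapest instance type that fits the summed requests.

    pending: list of (cpu_cores, memory_gib) requests of unschedulable pods.
    Returns None when nothing in the catalog is large enough.
    """
    need_cpu = sum(cpu for cpu, _ in pending)
    need_mem = sum(mem for _, mem in pending)
    fits = [i for i in catalog if i.cpu >= need_cpu and i.memory_gib >= need_mem]
    return min(fits, key=lambda i: i.hourly_cost, default=None)


# Memory-heavy pods (3 cores, 18 GiB in total) need the memory-optimized shape.
print(cheapest_fit([(1, 6), (1, 6), (1, 6)], CATALOG).name)  # memory-4x32
# CPU-heavy pods (3.5 cores, 4 GiB in total) fit on the cheaper compute shape.
print(cheapest_fit([(2, 2), (1.5, 2)], CATALOG).name)        # compute-4x8
```

With a single pre-created node group, both sets of pods would land on the same instance type, and one of them would waste either CPU or memory.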

Vertical Pod Autoscaler and in-place resizing

VPA adjusts CPU and memory allocations for existing pods. With in-place pod resizing (alpha in Kubernetes 1.27, beta and enabled by default since 1.33), resources can be increased without restarting pods, though the node hosting the pod must still have spare capacity. The speaker notes that automatic VPA restarts can be disruptive and that Java memory behavior makes memory resizing less straightforward than CPU resizing.
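The summary does not say which JVM behavior the speaker had in mind. One common factor is that the JVM fixes its maximum heap size at startup, by default at 25% of the container memory limit (`-XX:MaxRAMPercentage=25`) for containers of this size, so raising the limit in place does not grow the heap until the process restarts. A minimal sketch of that arithmetic, with illustrative numbers:

```python
def jvm_max_heap_mib(container_limit_mib: float,
                     max_ram_percentage: float = 25.0) -> float:
    """Max heap the JVM picks at startup from the container memory limit."""
    return container_limit_mib * max_ram_percentage / 100.0


# Heap is computed once, when the JVM starts with a 2 GiB limit.
heap_at_start = jvm_max_heap_mib(2048)

# The limit is later raised in place to 4 GiB, without a restart.
heap_after_resize = heap_at_start          # unchanged: the JVM does not re-read the limit
heap_if_restarted = jvm_max_heap_mib(4096)

print(heap_at_start, heap_after_resize, heap_if_restarted)  # 512.0 512.0 1024.0
```

CPU behaves differently because a running process can simply use more CPU time as soon as the quota is raised.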

Horizontal Pod Autoscaler and metric choice

HPA is used to add replicas based on metrics such as requests per minute. The talk recommends choosing metrics that reflect real workload pressure rather than raw CPU, because CPU spikes may not correlate with useful throughput and can cause scaling feedback loops. For the speaker’s use case, request rate combined with warm-up handling worked well.
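The replica count HPA aims for follows the formula in the Kubernetes documentation: desired = ceil(current replicas × current metric / target metric), with no change when the ratio is within a tolerance of 1 (0.1 by default). The sketch below applies it to a requests-per-minute metric; the numbers are illustrative and it ignores stabilization windows, min/max replica bounds, and pod readiness handling.

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     tolerance: float = 0.1) -> int:
    """Replica count from the documented HPA formula (simplified)."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: do not scale
    return math.ceil(current_replicas * ratio)


# 4 replicas averaging 900 requests/min each against a target of 600.
print(desired_replicas(4, 900, 600))  # 6
# 630 requests/min is within the 10% tolerance, so nothing changes.
print(desired_replicas(4, 630, 600))  # 4
# Load drops to 300 requests/min per replica.
print(desired_replicas(4, 300, 600))  # 2
```

Because the result scales linearly with the metric, a metric that tracks real demand (such as request rate) gives a stable answer, while a metric that spikes for unrelated reasons makes the replica count swing with it.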

ARM-based nodes

Switching to ARM was described as a relatively easy optimization for containerized Java applications, with reported cost savings and similar performance. The main caveat is verifying, before migrating, that all dependencies and native binaries are available for the ARM architecture.

Scale-to-zero and hibernation patterns

For intermittently used customer environments, the platform scales pods down and removes Kubernetes objects that create control-plane load, such as many ingress routes. This goes beyond simply stopping workloads and can significantly reduce cluster pressure. The speaker also references CRaC and container checkpoint/restore efforts as ways to make resume times much faster.

KEDA and event-driven autoscaling

KEDA is used to scale workloads based on external events and metrics such as Kafka queue depth or schedules. It is especially useful when scaling from zero, since CPU-based metrics cannot trigger scale-up if no pod is running. The speaker positions KEDA as a good fit for queue-driven and event-driven systems.
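Scaling from an external metric is what makes scale-from-zero possible: the backlog exists even when no pod does. The sketch below shows the general shape of that decision for a queue-driven workload. It is a simplification and does not use KEDA's API; the function and parameter names are hypothetical.

```python
import math


def replicas_for_lag(lag: int,
                     lag_per_replica: int,
                     max_replicas: int,
                     activation_lag: int = 0) -> int:
    """Replica count derived from queue backlog alone.

    Returns 0 while the backlog is at or below the activation threshold,
    otherwise enough replicas to give each one roughly `lag_per_replica`
    messages, capped at `max_replicas`.
    """
    if lag <= activation_lag:
        return 0
    return min(max_replicas, math.ceil(lag / lag_per_replica))


print(replicas_for_lag(0, 100, 20))     # 0  (idle: stay at zero)
print(replicas_for_lag(1, 100, 20))     # 1  (first message wakes the workload)
print(replicas_for_lag(950, 100, 20))   # 10
print(replicas_for_lag(5000, 100, 20))  # 20 (capped)
```

A CPU-based rule cannot produce the second line, since with zero pods there is no CPU usage to measure.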

Operational limits and control-plane scalability

The talk closes with warnings about using Kubernetes objects as persistent storage or databases. Too many objects, watches, or mutable ConfigMaps and Secrets can overload the API server and make the cluster unstable. A specific default limit of 110 pods per node is also called out as something operators should check when planning dense scheduling.
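The pods-per-node default matters most for small pods, where the pod cap is reached long before CPU runs out. A rough capacity sketch, assuming identical pods and ignoring memory, system-reserved resources, and DaemonSets (all numbers are illustrative):

```python
import math

DEFAULT_MAX_PODS_PER_NODE = 110  # kubelet default


def nodes_needed(pod_count: int,
                 pod_cpu_millicores: int,
                 node_cpu_millicores: int,
                 max_pods: int = DEFAULT_MAX_PODS_PER_NODE) -> int:
    """Nodes required when both CPU and the per-node pod cap apply."""
    pods_by_cpu = node_cpu_millicores // pod_cpu_millicores
    pods_per_node = min(pods_by_cpu, max_pods)
    if pods_per_node < 1:
        raise ValueError("a single pod does not fit on this node size")
    return math.ceil(pod_count / pods_per_node)


# 1000 pods requesting 50m each on 16-core nodes: CPU alone allows 320 per node.
print(nodes_needed(1000, 50, 16000))                # 10 (limited by the 110-pod cap)
print(nodes_needed(1000, 50, 16000, max_pods=320))  # 4  (limited by CPU only)
```

In this example the default cap more than doubles the node count, so it is worth checking the configured value (and the cloud provider's own per-node limits) before planning dense scheduling.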

Keywords: kubernetes resource optimization, kubernetes requests and limits, cpu throttling in kubernetes, kubernetes memory limits, ephemeral storage eviction, cluster autoscaler, karpenter, vertical pod autoscaler, horizontal pod autoscaler, keda, event-driven autoscaling, in-place pod resizing, java on kubernetes, jvm cpu throttling, arm nodes on kubernetes, kubernetes hibernation, scale to zero, checkpoint and restore, crac for java, kubernetes control plane scaling, etcd performance, pod scheduling, cloud capacity planning, multi-cluster kubernetes, adobe experience manager cloud service
