
Grow your own way: Introducing native support for custom metrics in GKE
When platform engineers, AI infrastructure leads, and developers think about autoscaling workloads running on Kubernetes, their goal is straightforward: get the capacity they need, when they need it, at the best price. But while scaling on CPU and memory is simple enough, scaling on application signals like queue depth or active requests is not. Historically, it has required a complex sequence of steps involving monitoring, IAM, and agent-specific configuration, adding significant operational overhead.

Today, we are removing that friction with native support for custom metrics in the Horizontal Pod Autoscaler (HPA) on Google Kubernetes Engine (GKE), a new feature that elevates custom workload signals to a native GKE capability.

The current challenge: The custom metric "tax"

If you’ve ever tried to scale a workload based on custom metrics (like active requests, KV cache utilization, or a game server's player count), you know the required architecture is surprisingly heavy.
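For context, scaling on an application signal like active requests uses the standard Kubernetes `autoscaling/v2` HPA API with a `Pods`-type metric. Here is a minimal sketch; the workload name, metric name, and target value are illustrative assumptions, not the exact identifiers GKE exposes:

```yaml
# Hypothetical HPA that scales a Deployment on a custom per-pod metric.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: inference-server-hpa      # assumed workload name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: inference-server
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Pods
    metric:
      name: active_requests       # assumed custom metric name
    target:
      type: AverageValue
      averageValue: "100"         # add pods when the per-pod average exceeds 100
```

The manifest itself is simple; the historical "tax" lies in everything needed to make `active_requests` visible to the HPA in the first place.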
Continue reading on Google Cloud Blog