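How the Adidas Platform Team Reduced the Cost of Running Kubernetes Clusters: AI Reading Summary

Baoyue Reading Summary

Keywords: Adidas, Kubernetes clusters, cost reduction, platform team, optimization measures

Summary: The Adidas platform team cut the cost of running Kubernetes clusters on AWS by up to 50% through several measures. It introduced Karpenter to lower EC2 instance costs, automatically created Vertical Pod Autoscalers (VPAs), set default VPA values, and used tooling to scale down resources outside office hours. The team also shared key considerations for successful cost optimization.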

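Main content:

– Reducing the cost of running Kubernetes clusters
– Introducing Karpenter to lower EC2 instance costs
  – Dynamically provisioning compute resources
  – Optimizing cluster resource utilization
  – Integrating with existing workflows
– Automatically creating Vertical Pod Autoscalers (VPAs)
  – Using Kyverno to generate default VPAs for development and staging clusters
  – Configuring Kyverno policies for checks and validation
– Setting default VPA values
  – Controlling resource requests with minimum and maximum allowed values
  – Known limitations
– Scaling down resources outside office hours
  – Using kube-downscaler to adjust replicas
– Addressing underutilized nodes
  – Implementing Kyverno policies
  – Establishing a cleanup policy
– Key considerations for successful cost optimization
  – Ensuring sufficient node capacity
  – Setting appropriate VPA configuration values
  – Informing users about changes
  – Maintaining comprehensive monitoring
– Related examples and references
  – Other organizations' examples of reducing cloud costs
  – Application optimization, cloud costs, and sustainability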

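Article URL: https://www.infoq.com/news/2024/07/adidas-kubernetes-cost-reduction/?utm_campaign=infoq_content&utm_source=infoq&utm_medium=feed&utm_term=global

Source: infoq.com

Author: Claudio Masolo

Published: 2024/7/31 0:00

Language: English

Word count: 857 words

Estimated reading time: 4 minutes

Rating: 91

Tags: Kubernetes, AWS, cloud cost management, Karpenter, Vertical Pod Autoscaler

The original article follows:

This content was republished at a user's recommendation to share knowledge and perspectives. For any infringement concerns, please contact media@ilingban.com to request removal.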

In a recent Medium post, platform engineer Iya Lang described how adidas reduced the cost of running Kubernetes clusters on AWS by up to 50%.

The multi-pronged approach the adidas team took may be useful for platform engineering teams at many other organizations: a recent CNCF report found that Kubernetes had driven up cloud spending for 49% of respondents.

The first measure introduced by the team focused on lowering EC2 instance costs. To achieve this, they implemented Karpenter, an AWS-developed cluster autoscaler that adjusts node counts based on application demand. Karpenter’s features include:

- Dynamically provisions compute resources (EC2 instances) based on real-time pod scheduling needs, so the cluster has the right nodes at the right time to handle application workloads.
- Optimizes cluster resource utilization by:
  - Launching only the instance types needed to meet pod requirements.
  - Identifying opportunities to remove under-utilized nodes.
  - Replacing expensive instances with more cost-effective options when possible.
  - Leveraging spot instances (unused AWS compute capacity available at a lower price) by identifying the least expensive options with minimal interruption risk.
  - Consolidating workloads onto more efficient compute resources.
- Integrates seamlessly with existing Kubernetes workflows. Configurable aspects of its behavior include:
  - The types of EC2 instances used for provisioning.
  - Launch template specifications for node configuration.
  - Scaling policies that tailor resource allocation to specific needs.
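To make the cost-driven selection above concrete, here is a minimal Python sketch of the kind of decision Karpenter automates: given the total CPU and memory requested by pending pods, choose the cheapest offering that fits, skipping spot capacity whose interruption risk is too high. The instance type names are real EC2 types, but the prices, the `interruption_risk` scores, and the `max_risk` threshold are hypothetical. This is not Karpenter's actual algorithm, which also handles scheduling constraints and bin-packing across several nodes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offering:
    instance_type: str
    capacity_type: str        # "spot" or "on-demand"
    vcpu: float
    memory_gib: float
    hourly_price: float       # hypothetical USD/hour, not real AWS pricing
    interruption_risk: float  # hypothetical 0.0-1.0 score; 0.0 for on-demand


def pick_offering(offerings, cpu_needed, mem_needed_gib, max_risk=0.2):
    """Return the cheapest offering that fits the pending requests, or None.

    Offerings above max_risk are skipped; price ties go to the lower risk.
    """
    candidates = [
        o for o in offerings
        if o.vcpu >= cpu_needed
        and o.memory_gib >= mem_needed_gib
        and o.interruption_risk <= max_risk
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda o: (o.hourly_price, o.interruption_risk))


# Hypothetical catalogue: prices and risk scores are illustrative only.
CATALOGUE = [
    Offering("m5.large", "on-demand", 2, 8, 0.096, 0.0),
    Offering("m5.large", "spot", 2, 8, 0.035, 0.10),
    Offering("m5.xlarge", "spot", 4, 16, 0.070, 0.05),
    Offering("c5.xlarge", "spot", 4, 8, 0.060, 0.30),
    Offering("m5.xlarge", "on-demand", 4, 16, 0.192, 0.0),
]

if __name__ == "__main__":
    # Pending pods request 3 vCPU and 6 GiB in total.
    choice = pick_offering(CATALOGUE, cpu_needed=3, mem_needed_gib=6)
    print(choice.instance_type, choice.capacity_type)  # prints "m5.xlarge spot"
```

Sorting by price first and interruption risk second is one simple way to express "least expensive with minimal interruption risk"; raising `max_risk` to 0.5 in this example would switch the choice to the cheaper but riskier `c5.xlarge` spot offering.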

Karpenter currently supports only AWS, but there are plans to include other cloud providers.

The second measure was the automatic creation of Vertical Pod Autoscalers (VPAs) to improve resource utilization: the platform team automated VPA creation for all workloads in the development and staging clusters. To generate these default VPAs, adidas chose Kyverno, a policy tool more typically used for application security.

Kyverno is a policy engine that runs as a dynamic admission controller in a Kubernetes cluster. It receives validating and mutating admission webhook HTTP callbacks from the Kubernetes API server and applies matching policies to enforce or reject admission requests. Kyverno policies can select resources by kind, name, label selectors, and other criteria. Mutating policies can be written as overlays (similar to Kustomize) or as JSON Patch operations. Validating policies also use an overlay-style syntax, with support for pattern matching and conditional (if-then-else) logic. Policy enforcement results are recorded as Kubernetes events. For requests that are allowed, and for resources that existed before a policy was introduced, Kyverno generates Policy Reports, which keep a running list of the resources matched by each policy, their status, and additional details.
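The overlay-style pattern matching used by validating policies can be illustrated with a heavily simplified Python model. It assumes Kyverno's wildcard convention, where `*` matches any characters and `?*` requires a non-empty value, and that a single-element list pattern must match every element of the corresponding list in the resource. It ignores operators and anchors (including the conditional ones), so treat it as an illustration rather than Kyverno's implementation. The rule and pods below are hypothetical.

```python
from fnmatch import fnmatchcase


def matches(pattern, resource):
    """Simplified Kyverno-style pattern check (illustrative only)."""
    if isinstance(pattern, dict):
        if not isinstance(resource, dict):
            return False
        # Every key in the pattern must exist in the resource and match.
        return all(
            key in resource and matches(sub, resource[key])
            for key, sub in pattern.items()
        )
    if isinstance(pattern, list):
        if not isinstance(resource, list):
            return False
        if not pattern:
            return True
        # A one-element list pattern applies to every element of the list.
        return all(matches(pattern[0], item) for item in resource)
    if isinstance(pattern, str):
        if resource is None or isinstance(resource, (dict, list)):
            return False
        # "*" = any characters, "?" = exactly one, so "?*" = non-empty.
        return fnmatchcase(str(resource), pattern)
    return pattern == resource


# Hypothetical rule: every container must declare a non-empty CPU request.
REQUIRE_CPU_REQUEST = {
    "spec": {"containers": [{"resources": {"requests": {"cpu": "?*"}}}]}
}

good_pod = {"spec": {"containers": [
    {"name": "app", "resources": {"requests": {"cpu": "100m"}}}
]}}
bad_pod = {"spec": {"containers": [
    {"name": "app", "resources": {"requests": {"memory": "64Mi"}}}
]}}

if __name__ == "__main__":
    print(matches(REQUIRE_CPU_REQUEST, good_pod))  # prints True
    print(matches(REQUIRE_CPU_REQUEST, bad_pod))   # prints False
```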

[Figure: Kyverno architecture]

The adidas team configured the Kyverno policies to:

- Check whether the resource already has a Horizontal Pod Autoscaler (HPA) or VPA.
- Verify whether automatic VPA creation is permitted for the resource and its namespace.
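The two checks above, followed by generation of the VPA object, can be sketched in Python. The opt-out annotation key `example.com/auto-vpa`, its `"disabled"` value, and the `-auto` name suffix are hypothetical (the article does not name the mechanism adidas used), and real Kyverno generate rules are written declaratively in YAML rather than as code. The generated object follows the upstream `autoscaling.k8s.io/v1` VerticalPodAutoscaler schema.

```python
# Hypothetical opt-out annotation; the article does not name the real mechanism.
OPT_OUT_ANNOTATION = "example.com/auto-vpa"


def autoscaled_targets(autoscalers):
    """Return (kind, name) pairs already targeted by an HPA or a VPA.

    HPAs reference their workload via spec.scaleTargetRef, VPAs via
    spec.targetRef. Assumes all autoscalers come from the workload's namespace.
    """
    targets = set()
    for obj in autoscalers:
        spec = obj.get("spec", {})
        ref = spec.get("scaleTargetRef") or spec.get("targetRef")
        if ref:
            targets.add((ref["kind"], ref["name"]))
    return targets


def opted_out(obj):
    annotations = obj.get("metadata", {}).get("annotations", {})
    return annotations.get(OPT_OUT_ANNOTATION) == "disabled"


def should_generate_vpa(workload, namespace, autoscalers):
    # Check 1: skip workloads that already have an HPA or VPA.
    key = (workload["kind"], workload["metadata"]["name"])
    if key in autoscaled_targets(autoscalers):
        return False
    # Check 2: both the namespace and the workload must allow automatic creation.
    return not opted_out(namespace) and not opted_out(workload)


def generate_vpa(workload):
    """Build a default VerticalPodAutoscaler manifest for the workload."""
    meta = workload["metadata"]
    return {
        "apiVersion": "autoscaling.k8s.io/v1",
        "kind": "VerticalPodAutoscaler",
        "metadata": {"name": f"{meta['name']}-auto", "namespace": meta["namespace"]},
        "spec": {
            "targetRef": {
                "apiVersion": workload["apiVersion"],
                "kind": workload["kind"],
                "name": meta["name"],
            },
        },
    }


if __name__ == "__main__":
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "web", "namespace": "dev"},
    }
    namespace = {"metadata": {"name": "dev"}}
    if should_generate_vpa(deployment, namespace, autoscalers=[]):
        print(generate_vpa(deployment)["metadata"]["name"])  # prints "web-auto"
```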

The third measure was setting default VPA values. Configuring VPAs without prior knowledge of the applications was challenging, so the adidas team decided to control only resource requests, to prevent application disruptions during usage spikes. They set the minimum allowed values very low (for example, 10 millicores of CPU and 32 megabytes of memory) and based the maximum values on the original requests or limits to ensure stability. For applications with multiple containers, the team did not set maxAllowed values, to avoid potential issues.
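These defaults map onto the VPA `resourcePolicy.containerPolicies` fields `minAllowed`, `maxAllowed`, and `controlledValues: RequestsOnly`. The Python sketch below builds that section from a pod template's containers. Exactly how adidas derived `maxAllowed` from requests or limits is not published, so preferring limits and falling back to requests is an assumption, as is writing the 32-megabyte floor as the Kubernetes quantity `32Mi`.

```python
# Floor values from the article: 10 millicores of CPU and 32 megabytes of memory.
# Writing the memory floor as "32Mi" (mebibytes) is an assumption.
MIN_ALLOWED = {"cpu": "10m", "memory": "32Mi"}


def resource_policy(containers):
    """Build a VPA spec.resourcePolicy section from a pod template's containers."""
    multi_container = len(containers) > 1
    policies = []
    for container in containers:
        resources = container.get("resources", {})
        policy = {
            "containerName": container["name"],
            # Adjust requests only; limits stay as the application team set them.
            "controlledValues": "RequestsOnly",
            "minAllowed": dict(MIN_ALLOWED),
        }
        # Assumption: cap at the limits if present, otherwise at the original requests.
        ceiling = resources.get("limits") or resources.get("requests")
        # Multi-container pods get no maxAllowed, as described in the article.
        if ceiling and not multi_container:
            policy["maxAllowed"] = dict(ceiling)
        policies.append(policy)
    return {"containerPolicies": policies}


if __name__ == "__main__":
    containers = [
        {"name": "app", "resources": {"requests": {"cpu": "500m", "memory": "256Mi"}}}
    ]
    print(resource_policy(containers)["containerPolicies"][0]["maxAllowed"])
    # prints {'cpu': '500m', 'memory': '256Mi'}
```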

Implementing default VPAs resulted in a 30% reduction in CPU and memory usage across development and staging clusters. However, some limitations exist:

- VPAs cannot be combined with HPAs that scale on resource metrics (CPU or memory).
- Older Java applications might not benefit because of their fixed heap sizes.
- Some applications require uninterrupted operation, so an opt-out option is necessary.
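The first limitation translates into a check a policy could run before creating a VPA. Because the default VPAs described above manage both CPU and memory requests, any HPA that scales on either resource conflicts with them. A minimal Python sketch, assuming HPA objects shaped like the `autoscaling/v2` API (`spec.metrics` entries of type `Resource` or `ContainerResource`), flags such HPAs; note that an `autoscaling/v2` HPA with no metrics defaults to average CPU utilization, and `autoscaling/v1` HPAs can only scale on CPU.

```python
def hpa_uses_resource_metrics(hpa):
    """True if an HPA scales on CPU or memory, which conflicts with a default VPA."""
    if hpa.get("apiVersion") == "autoscaling/v1":
        return True  # autoscaling/v1 HPAs can only target CPU utilization
    metrics = hpa.get("spec", {}).get("metrics")
    if not metrics:
        return True  # with no metrics, the HPA defaults to average CPU utilization
    for metric in metrics:
        kind = metric.get("type")
        if kind == "Resource":
            name = metric.get("resource", {}).get("name")
        elif kind == "ContainerResource":
            name = metric.get("containerResource", {}).get("name")
        else:
            continue  # Pods, Object, and External metrics are not CPU/memory based
        if name in ("cpu", "memory"):
            return True
    return False


if __name__ == "__main__":
    queue_hpa = {
        "apiVersion": "autoscaling/v2",
        "spec": {"metrics": [{
            "type": "External",
            "external": {
                "metric": {"name": "queue_depth"},
                "target": {"type": "AverageValue", "averageValue": "30"},
            },
        }]},
    }
    print(hpa_uses_resource_metrics(queue_hpa))  # prints False
```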

[Figure: CPU and memory usage after VPA creation on a large cluster]

The adidas team also aimed to save money and reduce its CO2 footprint by scaling down resources outside office hours. For this, it used kube-downscaler, a tool that adjusts replica counts according to a predefined schedule and allows customization for specific applications.
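A schedule-driven downscaler boils down to a time-window check. The Python sketch below is modeled on the uptime format that kube-downscaler documents (for example `Mon-Fri 07:30-20:30 Europe/Berlin`), but it is a simplified stand-in: it supports a single window, no day or time ranges that wrap around, and a hypothetical `downtime_replicas` default of 0. It is not kube-downscaler's own parser.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def in_uptime(spec, now):
    """Return True if `now` falls inside a window like 'Mon-Fri 07:30-20:30 Europe/Berlin'.

    Simplified: one window only, no ranges that wrap past Sunday or midnight.
    `now` must be timezone-aware.
    """
    day_range, time_range, tz_name = spec.split()
    first, last = (DAYS.index(day) for day in day_range.split("-"))
    start, end = time_range.split("-")
    local = now.astimezone(ZoneInfo(tz_name))
    if not first <= local.weekday() <= last:
        return False
    # Zero-padded "HH:MM" strings compare correctly as text.
    return start <= local.strftime("%H:%M") < end


def target_replicas(spec, original_replicas, now, downtime_replicas=0):
    """Keep the original replica count during uptime; scale down otherwise."""
    if in_uptime(spec, now):
        return original_replicas
    return downtime_replicas


if __name__ == "__main__":
    schedule = "Mon-Fri 07:30-20:30 Europe/Berlin"
    print(target_replicas(schedule, 3, datetime.now(timezone.utc)))
```

Converting `now` into the schedule's own time zone before comparing keeps the window tied to local office hours, regardless of the cluster's clock zone.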

After implementing these measures, the team faced a new problem: underutilized nodes. To address it, they implemented Kyverno policies that prevent problematic Pod Disruption Budget (PDB) configurations, which would otherwise hinder node removal. They also established a cleanup policy that periodically removes invalid PDBs.
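The article does not spell out which PDB configurations the Kyverno policies reject. As an assumption, the Python sketch below treats a PDB as blocking node removal when it can never permit a voluntary eviction: a `maxUnavailable` that resolves to 0, or a `minAvailable` that resolves to at least the workload's replica count. Percentages are resolved by rounding up, which is how Kubernetes scales these int-or-percent fields.

```python
def resolve(value, replicas):
    """Resolve an int-or-percent PDB field against a replica count, rounding up."""
    if isinstance(value, str) and value.endswith("%"):
        percent = int(value[:-1])
        return (percent * replicas + 99) // 100  # integer ceiling division
    return int(value)


def blocks_node_drain(pdb_spec, replicas):
    """True if the PDB can never permit a voluntary eviction at this replica count."""
    if "maxUnavailable" in pdb_spec:
        return resolve(pdb_spec["maxUnavailable"], replicas) == 0
    if "minAvailable" in pdb_spec:
        return resolve(pdb_spec["minAvailable"], replicas) >= replicas
    return False


if __name__ == "__main__":
    # A single-replica workload with minAvailable: 1 can never be evicted.
    print(blocks_node_drain({"minAvailable": 1}, replicas=1))  # prints True
```

A policy built on a check like this would reject such PDBs at admission time, so that node consolidation can still drain the node.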

The adidas team applied the cost optimization measures described above to the non-production clusters, while the PDB policies apply across all environments. This led to a 50% reduction in monthly costs for the development and staging clusters. For production clusters, the team adopted an opt-in model that lets application teams choose their own tools and configurations.

The adidas team shared some key considerations for successful cost optimization:

- Ensuring sufficient node capacity to handle the increased pod density.
- Setting appropriate VPA configuration values to balance cost savings against application performance.
- Informing users about changes in advance to avoid disruptions that lead to incidents.
- Maintaining comprehensive monitoring to measure the impact.
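The first consideration follows from simple arithmetic: when VPAs shrink requests, more pods fit on each node until a per-node pod limit becomes the binding constraint. The Python sketch below computes pods per node as the minimum of the CPU-bound, memory-bound, and `max_pods` limits. The node size, requests, and `max_pods` value are hypothetical inputs, not figures from the article; on EKS the real pod limit depends on the instance type and CNI configuration.

```python
def pods_per_node(alloc_cpu_m, alloc_mem_mi, req_cpu_m, req_mem_mi, max_pods):
    """Number of identical pods that fit on one node.

    CPU is in millicores and memory in MiB; requests must be positive.
    """
    fit_by_cpu = alloc_cpu_m // req_cpu_m
    fit_by_memory = alloc_mem_mi // req_mem_mi
    return min(fit_by_cpu, fit_by_memory, max_pods)


if __name__ == "__main__":
    # Hypothetical node: 3920m CPU and 14000 MiB allocatable, 29-pod limit.
    before = pods_per_node(3920, 14000, 500, 1024, max_pods=29)  # CPU-bound
    after = pods_per_node(3920, 14000, 50, 128, max_pods=29)     # pod-limit-bound
    print(before, after)  # prints "7 29"
```

In this example, shrinking requests tenfold quadruples density rather than multiplying it by ten, because the pod limit caps it; capacity planning after VPA rollout has to account for that ceiling as well as CPU and memory.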

The team also acknowledged that cost optimization is an ongoing process requiring continuous adjustments.

Additional examples of organizations attempting to reduce cloud costs can be found on Reddit, e.g. “Reducing Cloud Costs on Kubernetes Dev Envs by Over 95%” and “How to reduce the AWS costs?”

Application optimization can also reduce cloud costs and improve sustainability. Erik Peterson provided guidance about this at QCon SF and wrote a related article for InfoQ, “Million Dollar Lines of Code—an Engineering Perspective on Cloud Cost Optimization.”
