Slack Releases an Operator for Stateful Kubernetes Deployments (AI Reading Summary)

Reading guide summary

Slack, Kubernetes, StatefulSets, Bedrock Operator, Deployment

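Slack developed the Bedrock operator for Kubernetes StatefulSets to address limitations and problems in its deployments. The operator is deployed across Slack's extensive infrastructure. It has some limitations, but it gives better control and integration, and Slack plans to expand its use.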

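– Why Slack built the operator

  – To address limitations in managing StatefulSet deployments.

  – The existing update strategies fall short.

– Features of the Bedrock operator

  – Built with Kubebuilder.

  – Manages a custom resource called StatefulsetRollout.

  – Addresses slow deployments, lack of control, and related problems.

  – Integrates with internal service discovery and sends notifications.

– Deployment and workflow

  – Engineers define configuration in a bedrock.yaml file.

  – Deployments are started through the internal platform.

  – The operator continuously monitors and reconciles state.

  – It provides real-time updates and status reports.

– Limitations of the operator

  – Handling large StatefulSets required changes to the notification system.

  – A "version leak" problem exists.

– Future plans

  – Expand the use of the operator model.

  – Explore other related projects.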

Article URL: https://www.infoq.com/news/2024/08/slack-kubernetes-operator-bedroc/?utm_campaign=infoq_content&utm_source=infoq&utm_medium=feed&utm_term=global

Source: infoq.com

Author: Matt Saunders

Published: 2024/8/7 0:00

Language: English

Total word count: 775 words

Estimated reading time: 4 minutes

Score: 87

Tags: Kubernetes, stateful deployments, Slack, Bedrock Rollout Operator, deployment strategies

The original article follows

Slack, the popular workplace communication platform, has developed a custom Kubernetes operator to address limitations in managing StatefulSet deployments. In an article on Slack’s Engineering blog, Clément Labbe (Senior Software Engineer, Cloud) introduces the Bedrock Rollout Operator, written to offer improved control and features for deploying stateful applications in Kubernetes clusters.

Engineers commonly use StatefulSets to run applications which need persistent storage and unique pod identities. However, Slack’s engineering teams found existing update strategies for StatefulSets to be lacking. The default RollingUpdate strategy, while automated, only updates one pod at a time, leading to slow deployments for applications with numerous pods. The OnDelete strategy allows manual control but lacks advanced features like percent-based rollouts.
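To make the speed difference concrete, here is a small back-of-the-envelope sketch. It is not taken from Slack's article: the function names and the 25% batch size are assumptions chosen for illustration. It simply counts how many sequential update steps are needed when pods are replaced one at a time versus in percent-sized batches.

```python
import math


def rollout_steps(total_pods: int, batch_size: int) -> int:
    """Number of sequential batches needed to replace every pod."""
    if total_pods < 0 or batch_size < 1:
        raise ValueError("total_pods must be >= 0 and batch_size >= 1")
    return math.ceil(total_pods / batch_size)


def percent_batch_size(total_pods: int, percent: int) -> int:
    """Pods per batch for a percent-based rollout (always at least one pod)."""
    if not 1 <= percent <= 100:
        raise ValueError("percent must be between 1 and 100")
    return max(1, total_pods * percent // 100)


pods = 1000
one_at_a_time = rollout_steps(pods, 1)
quarter_batches = rollout_steps(pods, percent_batch_size(pods, 25))
print(one_at_a_time, quarter_batches)
```

Under these assumptions a 1,000-pod StatefulSet needs 1,000 sequential steps one pod at a time, but only four steps in 25% batches. Real rollout time also depends on how long each pod takes to become ready, which this sketch ignores.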

Slack developed the Bedrock Rollout Operator using Kubebuilder to meet their internal teams’ needs. This operator manages a custom resource called StatefulsetRollout, which encapsulates the StatefulSet specification along with additional parameters for enhanced functionality.
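The article does not publish the schema of StatefulsetRollout, so the following is only a guess at its general shape: a wrapper that carries a StatefulSet specification plus a few rollout parameters. The field names `target_percent` and `paused`, the resource name, and the `bedrock.example.com/v1` API group are all hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class StatefulsetRollout:
    """Hypothetical shape of the custom resource; field names are illustrative."""

    name: str
    # The wrapped StatefulSet specification, kept as a plain mapping.
    statefulset_spec: Dict[str, Any]
    # Assumed extra rollout parameters, not taken from Slack's actual schema.
    target_percent: int = 100
    paused: bool = False

    def to_manifest(self) -> Dict[str, Any]:
        """Render the resource as a Kubernetes-style manifest dictionary."""
        if not 0 <= self.target_percent <= 100:
            raise ValueError("target_percent must be between 0 and 100")
        return {
            "apiVersion": "bedrock.example.com/v1",  # placeholder API group
            "kind": "StatefulsetRollout",
            "metadata": {"name": self.name},
            "spec": {
                "statefulset": self.statefulset_spec,
                "targetPercent": self.target_percent,
                "paused": self.paused,
            },
        }


rollout = StatefulsetRollout(
    name="search-index",
    statefulset_spec={"replicas": 4, "updateStrategy": {"type": "OnDelete"}},
    target_percent=50,
)
manifest = rollout.to_manifest()
print(manifest["kind"], manifest["spec"]["targetPercent"])
```

The point of the wrapper is that rollout intent (how far, paused or not) lives next to the StatefulSet definition, so a single resource describes both what to run and how to roll it out.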

The Bedrock Rollout Operator solves several fundamental problems:

– Slow deployments: It addresses the limitation of the default RollingUpdate strategy, which updates only one pod at a time, making it very slow for applications with many pods.
– Lack of control: It provides more controlled rollouts than the native Kubernetes options, allowing for faster percent-based rollouts and the ability to pause rollouts.
– Limited rollback capabilities: It enables quicker rollbacks when needed.
– Integration gaps: It integrates with Slack’s internal service discovery (Consul) and provides Slack notifications about rollout status, filling gaps in their existing workflow.
– Customisation needs: It allows Slack to implement custom rollout logic that fits their specific requirements, which weren’t met by standard Kubernetes features.
– Visibility: It improves visibility into the rollout process through real-time Slack notifications and integration with their internal release management UI.
– Large-scale management: Although it required some adjustments, the solution helps manage large StatefulSets with up to 1,000 pods.
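A percent-based, pausable rollout can be sketched as a function that decides which pods to terminate next. This is an illustration under stated assumptions rather than Slack's logic: the function names are invented, and picking the highest ordinals first is borrowed from how the native RollingUpdate strategy orders pods.

```python
from typing import Dict, List


def ordinal(pod_name: str) -> int:
    """Extract the StatefulSet ordinal from a name such as 'web-12'."""
    return int(pod_name.rsplit("-", 1)[1])


def next_pods_to_terminate(
    pod_versions: Dict[str, str],
    target_version: str,
    target_percent: int,
    paused: bool = False,
) -> List[str]:
    """Pick outdated pods to terminate so target_percent run target_version."""
    if not 0 <= target_percent <= 100:
        raise ValueError("target_percent must be between 0 and 100")
    if paused:
        return []  # a paused rollout takes no action
    total = len(pod_versions)
    wanted = (total * target_percent + 99) // 100  # integer ceiling
    updated = sum(1 for v in pod_versions.values() if v == target_version)
    outdated = sorted(
        (name for name, v in pod_versions.items() if v != target_version),
        key=ordinal,
        reverse=True,  # highest ordinal first (an assumption)
    )
    return outdated[: max(0, wanted - updated)]


pods = {"web-0": "v1", "web-1": "v1", "web-2": "v1", "web-3": "v1"}
print(next_pods_to_terminate(pods, "v2", 50))
print(next_pods_to_terminate(pods, "v2", 50, paused=True))
```

Calling the function again after the first batch has been replaced returns an empty list at the same percentage, and raising the percentage selects the remaining pods. That is what makes the step safe to repeat.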

The operator is deployed across Slack’s extensive Kubernetes infrastructure, which comprises over 200 clusters and manages nearly 100 stateful services.

[Figure: Slack rollout architecture]

The rollout process begins with Slack engineers defining their application configuration in a bedrock.yaml file. When a developer initiates a deployment through Slack’s internal release platform, the Bedrock API transforms this configuration into a StatefulsetRollout resource.

The Bedrock Rollout Operator continuously monitors the StatefulsetRollout resource and reconciles the desired state with the actual state of the cluster. To facilitate the rollout, it performs actions such as creating or updating StatefulSets and terminating pods. Rather than operating in an event-driven fashion, the operator uses a self-enqueuing reconciliation loop. This approach allows for sequential processing of custom resources, reducing the risk of race conditions and simplifying the overall reconciliation process.
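The idea of a self-enqueuing loop can be shown with a toy queue. This is a simplified model, not the operator's code: `run_loop` and `reconcile` are hypothetical names, and a real controller would wait between attempts instead of looping immediately. Each resource is processed one at a time, and a resource that has not yet converged puts itself back on the queue.

```python
from collections import deque
from typing import Callable, Deque, Dict, Iterable, List


def run_loop(
    names: Iterable[str],
    reconcile: Callable[[str], bool],
    max_iterations: int = 100,
) -> List[str]:
    """Process resources sequentially; re-enqueue any that ask for another pass."""
    queue: Deque[str] = deque(names)
    processed: List[str] = []
    while queue and len(processed) < max_iterations:
        name = queue.popleft()
        processed.append(name)
        if reconcile(name):  # True means "not converged yet, run me again"
            queue.append(name)
    return processed


# Toy state: how many more reconcile passes each rollout needs.
remaining: Dict[str, int] = {"rollout-a": 2, "rollout-b": 1}


def reconcile(name: str) -> bool:
    """Do one unit of work and report whether more work remains."""
    remaining[name] -= 1
    return remaining[name] > 0


order = run_loop(["rollout-a", "rollout-b"], reconcile)
print(order)
```

Because only one resource is handled at a time and each pass re-reads the current state, two passes never act on the same rollout concurrently. That is the property the article credits with reducing race conditions.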

The operator provides real-time updates to users through rich-text Slack notifications, which include details such as version numbers and the list of pods being rolled out. Additionally, it communicates with the Bedrock API to report the success or failure of rollouts, ensuring that Slack’s release management UI reflects the status.

While the custom operator has proven effective for Slack’s needs, it does have some limitations. One challenge arose when dealing with extremely large StatefulSets containing up to 1,000 pods. This required modifications to the notification system to avoid rate-limiting problems. Another limitation is the “version leak” problem inherent in using the OnDelete strategy for StatefulSets. In scenarios where rollouts are paused or only partially completed, pods running the previous version that are terminated for reasons other than the rollout may be replaced by pods running the new version. This can lead to gradual, unintended convergence towards a full rollout over time. Slack mitigates this issue by encouraging teams to complete their rollouts promptly.
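The version leak can be reproduced with a tiny simulation. The helper names and pod names are made up for illustration. The one Kubernetes behaviour it relies on is that, with OnDelete, a pod recreated after deletion is built from the StatefulSet's current (already updated) template.

```python
from typing import Dict


def recreate_pod(
    pod_versions: Dict[str, str], pod_name: str, template_version: str
) -> Dict[str, str]:
    """Model a deleted pod coming back from the StatefulSet's current template."""
    updated = dict(pod_versions)
    updated[pod_name] = template_version
    return updated


def share_on(pod_versions: Dict[str, str], version: str) -> float:
    """Fraction of pods currently running the given version."""
    matching = sum(1 for v in pod_versions.values() if v == version)
    return matching / len(pod_versions)


# Rollout paused at 50%: the StatefulSet template already points at v2.
paused_rollout = {"web-0": "v1", "web-1": "v1", "web-2": "v2", "web-3": "v2"}
before = share_on(paused_rollout, "v2")

# web-0 is terminated for an unrelated reason, for example a node failure.
after_eviction = recreate_pod(paused_rollout, "web-0", template_version="v2")
after = share_on(after_eviction, "v2")
print(before, after)
```

Nothing in the rollout logic ran, yet the share of pods on the new version grew from half to three quarters. Repeated over days, unrelated restarts push a paused rollout towards completion.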

Slack has achieved greater control over StatefulSet deployments by creating a custom solution that integrates seamlessly with its existing internal systems and communication channels. As Kubernetes evolves, its maintainers may incorporate some of this functionality into core Kubernetes features. However, in the meantime, the flexibility and integration capabilities offered by the operator model are likely to be valuable for organisations with complex deployment needs and custom infrastructure.

Slack plans to expand its use of the operator model for managing Kubernetes deployments. The company is exploring existing CNCF projects such as Argo Rollouts and OpenKruise for non-stateful Deployment resources. Other organisations have also developed rollout operators – for example Grafana Labs offer an operator providing finer-grained control over rollouts.

Other products, such as Argo Rollouts, provide similar functionality, additionally offering blue-green, canary, canary analysis, experimentation, and progressive delivery features, but focus on Deployments. Meanwhile, Flagger offers canary releases, with or without session affinity, as well as blue-green and A/B testing for similar needs. Bikram Kundu at jstobigdata talks through the complexity and limitations of StatefulSets in Kubernetes, and also summarises best practice in this area.
