How GitHub Improved the Reliability of Code Push Processing: AI Reading Summary (包阅AI)

包阅 Reading Guide Summary

1. Keywords: GitHub, Code Push, Reliability, Upgrade, Architecture

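2. Summary: To improve the reliability and efficiency of code push processing, GitHub carried out a technical upgrade. It broke the former monolithic job into multiple independent, parallel processes, introduced a new Kafka topic, and improved the supporting mechanisms. The new architecture improves stability and reduces both the blast radius of problems and the dependencies between components.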

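3. Main points:

– GitHub upgraded its code push processing to improve reliability and efficiency

  – Background: pushing code to GitHub triggers a series of actions, and more than 60 internal processes are activated

  – Previous problem: everything ran through a single RepositoryPushJob, retries were difficult, and early errors affected later steps

– The new approach to push processing

  – Broke the job into multiple independent, parallel processes

  – Introduced a new Kafka topic to broadcast push events

  – Analyzed and categorized the processing tasks, assigning each group a new background job with its own retry settings

  – Used an internal system to respond to Kafka events, and made several supporting improvements

– Results and impact

  – Smaller blast radius for problems and improved stability

  – Clearer ownership, making it easier for teams to add and iterate on features

  – More reliable processing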

Article URL: https://www.infoq.com/news/2024/06/github-push-process-enhancement/?utm_campaign=infoq_content&utm_source=infoq&utm_medium=feed&utm_term=global

Source: infoq.com

Author: Aditya Kulkarni

Published: 2024/6/28 0:00

Language: English

Word count: 606

Estimated reading time: 3 minutes

Score: 88

Tags: GitHub push process enhancement, Architecture & Design, DevOps, Design, Architecture


The original article follows.

GitHub has rolled out several technical upgrades to make code pushes, one of the most frequent actions developers perform on the platform, more reliable and efficient. The change targets failure modes in how pushes are processed and aims to provide a smoother experience for users who regularly push code to GitHub.

William Haltom, software engineer at GitHub, elaborated on the background of this technical upgrade. To set the stage, Haltom explained that pushing code to GitHub sets off a series of actions, such as synchronizing pull requests, dispatching webhooks, triggering workflows, installing apps, publishing GitHub Pages, and updating Codespaces configuration. In addition, over 60 internal processes within GitHub are activated with each push, enabling different features and automated tools for developers.

Previously, all the actions triggered by a code push were handled by a monolithic background job known as the RepositoryPushJob. This job, inside GitHub’s Ruby on Rails monolith, executed all push processing logic sequentially. Its size and complexity caused problems: retrying individual tasks within the job was difficult, and most steps weren’t retried at all.

This lack of reliable retry mechanisms meant that errors in the early stages of the job could cascade and impact subsequent steps, creating a wide range of potential problems.

Source: How we improved push processing on GitHub
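
The failure mode described above can be illustrated with a minimal Python sketch. It is not GitHub's code (the real RepositoryPushJob lives in a Ruby on Rails monolith). The function `run_monolithic_push_job`, the three step functions, and the sample push payload are hypothetical names chosen for illustration, and stopping at the first error is only one simple way an early failure can affect later steps.

```python
from typing import Callable

Step = Callable[[dict], None]


def run_monolithic_push_job(push: dict, steps: list[tuple[str, Step]]) -> list[str]:
    """Run every step in order and stop at the first failure.

    Returns the names of the steps that completed.
    """
    completed: list[str] = []
    for name, step in steps:
        try:
            step(push)
        except Exception:
            # No per-step retry: an early error means later steps never run.
            break
        completed.append(name)
    return completed


def sync_pull_requests(push: dict) -> None:
    raise RuntimeError("transient failure")  # simulated early error


def dispatch_webhooks(push: dict) -> None:
    push.setdefault("log", []).append("webhooks dispatched")


def trigger_workflows(push: dict) -> None:
    push.setdefault("log", []).append("workflows triggered")


steps = [
    ("sync_pull_requests", sync_pull_requests),
    ("dispatch_webhooks", dispatch_webhooks),
    ("trigger_workflows", trigger_workflows),
]
push_event = {"repo": "octo/example", "ref": "refs/heads/main"}  # made-up payload
completed_steps = run_monolithic_push_job(push_event, steps)
print(completed_steps)
```

In this sketch a single transient error in the first step is enough to keep webhooks and workflows from running at all, which is the kind of coupling the redesign set out to remove.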

GitHub revamped its code push process by breaking down the long, sequential job into multiple independent, parallel processes. To achieve this, they implemented a new Kafka topic to broadcast push events. Then, they analyzed and categorized the numerous push processing tasks based on their owning services or logical relationships, such as dependencies and retry requirements.

Each group of tasks was assigned to a new background job with a designated owner and appropriate retry settings. These jobs were then configured to be triggered by the new Kafka events.
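
The idea of independently owned jobs with their own retry settings can be sketched in a few lines of Python. This is an in-memory illustration under stated assumptions, not GitHub's implementation: `broadcast` merely stands in for a Kafka topic, the jobs run one after another rather than in parallel, and the `PushJob` class, the job names, and the owner names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PushJob:
    """One independently owned job triggered by a push event (hypothetical shape)."""

    name: str
    owner: str
    handler: Callable[[dict], None]
    max_attempts: int = 1
    attempts: int = field(default=0, init=False)

    def run(self, event: dict) -> bool:
        """Try the handler up to max_attempts times and report success."""
        for _ in range(self.max_attempts):
            self.attempts += 1
            try:
                self.handler(event)
                return True
            except Exception:
                continue  # retry only this job; other jobs are unaffected
        return False


def broadcast(event: dict, jobs: list[PushJob]) -> dict[str, bool]:
    """Stand-in for a Kafka topic: every job receives its own copy of the event."""
    return {job.name: job.run(dict(event)) for job in jobs}


def make_flaky(failures: int) -> Callable[[dict], None]:
    """Build a handler that fails `failures` times, then succeeds."""
    remaining = {"count": failures}

    def handler(event: dict) -> None:
        if remaining["count"] > 0:
            remaining["count"] -= 1
            raise RuntimeError("transient failure")

    return handler


def always_fails(event: dict) -> None:
    raise RuntimeError("permanent failure")


def succeeds(event: dict) -> None:
    pass


jobs = [
    PushJob("sync_pull_requests", "pull-requests-team", make_flaky(2), max_attempts=3),
    PushJob("dispatch_webhooks", "webhooks-team", always_fails, max_attempts=2),
    PushJob("trigger_workflows", "actions-team", succeeds),
]
results = broadcast({"repo": "octo/example", "ref": "refs/heads/main"}, jobs)
print(results)
```

Here the flaky job recovers on its third attempt, the permanently failing job exhausts its own retries, and the third job succeeds regardless of what happened to the other two.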

For this architecture, GitHub utilized an internal system to queue background jobs in response to Kafka events. Several improvements were made, including developing a reliable publisher for the Kafka events, setting up a dedicated worker pool to manage the increased number of jobs, enhancing observability to monitor the push event flow, and establishing a system for consistent feature flagging to ensure a safe and controlled rollout of the new system.

Source: How we improved push processing on GitHub
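
The article does not say how the consistent feature flagging works. One plausible reading, sketched below in Python purely as an assumption, is that the decision for a given push is a deterministic function of the flag and the push, so the legacy path and the new Kafka-driven path always agree on which of them handles it. The names `in_rollout`, `route`, `FLAG`, and the integer push identifiers are hypothetical.

```python
import hashlib

FLAG = "new_push_pipeline"  # hypothetical flag name


def in_rollout(flag: str, push_id: int, percent: int) -> bool:
    """Deterministically place push_id in a bucket 0..99 and compare to percent."""
    digest = hashlib.sha256(f"{flag}:{push_id}".encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < percent


def route(push_id: int, percent: int) -> str:
    """Pick exactly one processing path for a push (assumed path names)."""
    return "kafka_jobs" if in_rollout(FLAG, push_id, percent) else "legacy_job"


# The same push always gets the same answer, however often it is evaluated.
decisions = {route(12345, 25) for _ in range(5)}
print(len(decisions))
```

Because the bucket depends only on the flag name and the push identifier, raising the percentage only ever moves pushes from the legacy path to the new one, never back, which suits a gradual rollout.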

GitHub recently made news by introducing Arm64 support on GitHub Actions, providing developers with Arm-built images to release their software on Arm architecture. The announcement sparked a conversation on Hacker News. Obviyus, a GitHub and HN user, welcomed the introduction of Arm64 support, saying they had been relying on self-hosted Arm runners for their projects. They noted that compiling code on their small Arm VPS could significantly slow down other tasks and called official Arm64 support a much-needed improvement.

Earlier this year, a Hacker News post also discussed Copilot Workspace, a tool designed to streamline the development process by enabling developers to use natural language to brainstorm, plan, code, test, and execute projects.

Haltom further explained the results of the architectural revamp, stating that the smaller, decoupled processes have led to a reduced blast radius for problems. Issues with one part of the push handling logic no longer cascade and impact other areas, leading to improved stability and reliability. This decoupling has also decreased dependencies.

Additionally, the new architecture has clarified ownership, distributing responsibility for push processing code among more than 15 service owners. This allows teams to add and iterate on push functionality without unintended consequences for others. Lastly, the smaller, less complex jobs allow for more reliable push processing.