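包阅 Reading Guide Summary
1. Keywords: TikTok, Monorepo, Sparo, performance improvement, Git
2. Summary: TikTok engineers have released a new tool, Sparo, to improve monorepo performance by addressing the slow Git commands that are common in large repositories. The article also covers related optimizations and similar efforts by other teams.
3. Main points:
- TikTok faced performance problems as its TypeScript monorepo grew
  - What a monorepo is, and the debate around using one
  - Engineers ran into problems such as long git clone times
- After trying several mitigations, the team created Sparo
  - Uses "sparse checkout" and "partial clone" to speed up Git commands
  - Offers features such as defining the set of projects needed for a task
  - Delivers large performance gains, such as shorter git clone and git checkout times
- Related efforts by other teams
  - GitHub introduced FSMonitor and related optimizations
  - Graphite stresses best practices for maintaining scalable Git monorepos
- How the TikTok team developed Sparo
  - The decision to open source and its benefits
  - Security considerations
  - Plans for future development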
Source: infoq.com
Author: Matt Saunders
Published: 2024/9/4 0:00
Language: English
Word count: 935
Estimated reading time: 4 minutes
Score: 89
Tags: monorepo, Git performance, TikTok, Sparo, open source
The original article follows.
Engineers from TikTok have announced a new tool, Sparo, to help deal with the problems associated with using monorepos. It addresses many of the performance issues that come with larger repos.
A monorepo is a single Git repository that houses multiple projects, from applications to microservices. Its benefits include better visibility, collaboration, and tooling standardization across teams. Moving to a monorepo is a widely used but often debated choice for engineering teams, especially as their code base grows in scale and complexity. However, as monorepos balloon in size, developers can run into significant performance issues when running common Git commands like status, diff, and checkout. TikTok’s front-end team recently faced this challenge as their TypeScript monorepo grew to over 1,000 projects and 200,000 source files.
In a post on TikTok’s developer blog, Adrian Zhang, an engineer on TikTok’s front-end infrastructure team, explained the issues that engineers were experiencing with monorepo performance:
“People with slow internet frequently reported git clone taking more than 40 minutes. It is a scalability problem: Git stores everything forever, which means a high-traffic repository will steadily increase in every metric – file writes, disk storage, download size. Git will eventually become slow for everyone – it’s just a question of when!”
The TikTok team tried various techniques to mitigate the slowness, including partial clone, shallow clone, and Git Large File Storage (LFS). However, they ultimately created a new open-source tool named Sparo to address the performance issues.
Sparo leverages two key Git features, “sparse checkout” and “partial clone”, to dramatically improve the speed of common Git commands. “Sparse checkout” allows developers to check out only the subset of files they need rather than the entire repository. “Partial clone” optimizes this further by fetching file contents on demand, so a clone does not have to download every historical version of every file up front.
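As a rough illustration of how the two Git features combine, the sketch below builds the Git invocations for a blobless partial clone followed by a cone-mode sparse checkout. It is not Sparo's implementation. The repository URL, the folder names, and the helper names `partial_clone_cmd`, `sparse_checkout_cmd`, and `clone_subset` are hypothetical, and `git sparse-checkout set --cone` assumes a reasonably recent Git.

```python
import subprocess
from typing import List


def partial_clone_cmd(url: str, dest: str) -> List[str]:
    """Build a blobless partial clone that starts with a sparse working tree."""
    # --filter=blob:none: download commits and trees now, file contents on demand.
    # --sparse: initially check out only the files in the repository root.
    return ["git", "clone", "--filter=blob:none", "--sparse", url, dest]


def sparse_checkout_cmd(dest: str, folders: List[str]) -> List[str]:
    """Build the command that limits the working tree to the given folders."""
    if not folders:
        raise ValueError("at least one folder is required")
    return ["git", "-C", dest, "sparse-checkout", "set", "--cone", *folders]


def clone_subset(url: str, dest: str, folders: List[str]) -> None:
    """Run both steps; needs git on PATH and access to the remote."""
    subprocess.run(partial_clone_cmd(url, dest), check=True)
    subprocess.run(sparse_checkout_cmd(dest, folders), check=True)


if __name__ == "__main__":
    # Hypothetical repository and folders; the commands are printed, not executed.
    print(" ".join(partial_clone_cmd("https://example.com/big-monorepo.git", "big-monorepo")))
    print(" ".join(sparse_checkout_cmd("big-monorepo", ["apps/web", "libs/ui"])))
```

Keeping command construction separate from execution makes the sketch easy to inspect without touching the network. Calling `clone_subset` would perform the actual clone.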
“Sparo follows the spirit of [Microsoft’s] Scalar and [Twitter’s] Focus, adding a couple other details,” says Zhang in the blog post. “Checkout profiles allow teams to define the set of projects and dependencies their developers typically work on. And we designed the Sparo CLI to be a drop-in replacement for the git CLI, intercepting every command to ensure Git is invoked optimally.”
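The idea of a checkout profile can be sketched as a small graph traversal: start from the projects a developer selects, follow their dependencies, and collect the folders to check out. This is only an illustration of the concept under assumed data. The `PROJECTS` table, its project and folder names, and `resolve_profile` are hypothetical and do not reflect Sparo's actual profile format or API.

```python
from typing import Dict, List, Set

# Hypothetical project graph: project name -> folder and direct dependencies.
PROJECTS: Dict[str, dict] = {
    "web-app": {"folder": "apps/web", "deps": ["ui-kit", "api-client"]},
    "ui-kit": {"folder": "libs/ui", "deps": ["design-tokens"]},
    "api-client": {"folder": "libs/api", "deps": []},
    "design-tokens": {"folder": "libs/tokens", "deps": []},
    "admin-app": {"folder": "apps/admin", "deps": ["ui-kit"]},
}


def resolve_profile(selected: List[str], projects: Dict[str, dict]) -> List[str]:
    """Return sorted folders for the selected projects and their transitive deps."""
    seen: Set[str] = set()
    stack = list(selected)
    while stack:
        name = stack.pop()
        if name in seen:
            continue  # already visited; also guards against dependency cycles
        if name not in projects:
            raise KeyError(f"unknown project: {name}")
        seen.add(name)
        stack.extend(projects[name]["deps"])
    return sorted(projects[name]["folder"] for name in seen)


if __name__ == "__main__":
    # Folders a developer working on the hypothetical "web-app" would check out.
    print(resolve_profile(["web-app"], PROJECTS))
```

The resulting folder list is the kind of input a sparse checkout needs, which is why deriving it from project names rather than hand-written path patterns is less error-prone.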
The team’s rationale for developing Sparo was clear – Git’s built-in features, while powerful, were proving cumbersome for their large-scale monorepo. “When we advised people to configure Git directly, they found it awkward to use,” explained Zhang. “Sparse checkout requires you to determine which folders you need, expressed using cone mode globs that are error-prone. It’s feasible to educate a small team about Git best practices, but when you reach 6-digit merge request IDs, things really need to be as simple as possible.”
With Sparo, the TikTok team achieved significant performance improvements. For example, a git clone operation that previously took 23 minutes was reduced to just over 2 minutes using Sparo. Similarly, a git checkout operation went from 1 minute and 26 seconds to 30 seconds.
The GitHub engineering team has also been working to improve monorepo performance, introducing the new built-in file system monitor (FSMonitor) feature in Git version 2.37.0. FSMonitor reduces the time required for commands like git status by only searching for changes in recently modified files, rather than scanning the entire working tree.
Jeff Hostetler, a software engineer at GitHub, explained that FSMonitor works by registering with the operating system to receive change notification events, so it knows exactly which files have been modified without having to do a full search.
“When FSMonitor is enabled, git status takes less than a second on worktrees with millions of files” – Jeff Hostetler
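The difference between a full scan and a notification-driven check can be shown with a toy model. This is not how Git or FSMonitor is implemented: the dictionaries standing in for the index and the working tree, and the function names `full_scan_status` and `event_based_status`, are assumptions made for illustration, and untracked files are ignored.

```python
from typing import Dict, Iterable, List


def full_scan_status(index: Dict[str, int], worktree: Dict[str, int]) -> List[str]:
    """Compare every tracked path with the working tree; cost grows with repo size."""
    return sorted(path for path, mtime in index.items() if worktree.get(path) != mtime)


def event_based_status(
    index: Dict[str, int], worktree: Dict[str, int], changed_paths: Iterable[str]
) -> List[str]:
    """Check only the paths that a file system watcher reported as touched."""
    return sorted(
        path
        for path in set(changed_paths)
        if path in index and worktree.get(path) != index[path]
    )


if __name__ == "__main__":
    # Toy repository: path -> recorded modification time.
    index = {f"src/file{i}.ts": 100 for i in range(1000)}
    worktree = dict(index)
    worktree["src/file7.ts"] = 200  # edited after the index was written
    events = ["src/file7.ts", "src/file8.ts"]  # file8 was touched but not changed
    print(full_scan_status(index, worktree))
    print(event_based_status(index, worktree, events))
```

Both functions report the same modified file, but the second one inspects two paths instead of a thousand, which is the effect the change notifications are meant to achieve.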
FSMonitor can be further optimized by enabling the core.untrackedCache feature, which remembers the results of previous untracked file searches. Combined with FSMonitor, this can result in a 10x speedup for the untracked file portion of git status.
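A minimal sketch of turning both settings on for one repository is shown below. It builds `git config` commands for `core.fsmonitor` and `core.untrackedCache`. The helper names `fsmonitor_config_cmds` and `enable_fsmonitor` and the repository path are hypothetical, and the sketch assumes a Git version and platform where the built-in monitor is available.

```python
import subprocess
from typing import List


def fsmonitor_config_cmds(repo_dir: str) -> List[List[str]]:
    """Build the commands that enable FSMonitor and the untracked cache."""
    return [
        # Ask Git to use its built-in file system monitor for this repository.
        ["git", "-C", repo_dir, "config", "core.fsmonitor", "true"],
        # Cache the results of untracked-file searches between runs.
        ["git", "-C", repo_dir, "config", "core.untrackedCache", "true"],
    ]


def enable_fsmonitor(repo_dir: str) -> None:
    """Apply the settings; needs git on PATH and an existing repository."""
    for cmd in fsmonitor_config_cmds(repo_dir):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical repository path; the commands are printed, not executed.
    for cmd in fsmonitor_config_cmds("big-monorepo"):
        print(" ".join(cmd))
```

Both settings are written to the repository's local configuration, so they affect only that working copy.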
In a blog post, code review software vendor Graphite also emphasizes the importance of best practices for maintaining scalable Git monorepos. These include:
- Keeping commit history clean and linear using rebase
- Managing tags and references to prevent performance degradation
- Organizing the directory structure for easy navigation
- Maintaining a clean branch management strategy, such as trunk-based development
“As a Git monorepo grows, commands like git log or git blame can slow down due to the large number of commits. To mitigate this, you can use tools that bypass the performance issues, and carefully manage your refs to ensure operations involving them are not hindered by the sheer volume.” – Greg Foster, Graphite engineer
For the TikTok team, open-sourcing Sparo was an essential step in the development process. “Although it seems natural to open source this project, by January, the growing concerns about Git slowness pressured our team to deliver a fix as quickly as possible,” said Zhang. “We decided to start closed but pursue open source concurrently as long as those efforts didn’t impact our timeline.”
According to Zhang, working on GitHub brought some important benefits, with documentation turning out more professional and written for a broader audience. “Engineers seem to write better code when they know it’s public! We shared our designs and demos with the Rush Stack community, receiving valuable input from senior engineers at other companies,” added Zhang.
The TikTok team also had to consider security implications when developing Sparo. In the blog post, Zhang explained that TikTok’s approval workflow includes a review by a security expert, which added an interesting new perspective to this project. Looking ahead, the TikTok team plans to focus on two key features for Sparo’s future development: a telemetry plugin system to power monitoring dashboards, and support for other frontend workspaces beyond their current RushJS implementation.