
What GitHub Pull Requests Reveal About Your Team’s Dev Habits: AI Reading Summary (包阅AI)

包阅AI Reading Summary

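1. Keywords: GitHub, Pull Requests, Dev Habits, Workflow, Code Review

2. Summary: By analyzing the issues and pull requests of GitHub projects, researchers identified distinct workflow types and behavior patterns that reveal a team’s development habits, described their impact on projects, and suggested ways to optimize workflows.

3. Main content:

– Research background

– Duplicate issues and competing pull requests burden teams; the way a team communicates and delegates work may be part of the problem.

– Research method

– The researchers studied 56 GitHub projects, building a graph model in which issues and pull requests are nodes and the links between them are edges; authors and timestamps were also collected.

– Workflow types found

– Eight types, including simple issue resolution, competing pull requests, and duplicate issues, each associated with a work practice.

– For example, competing pull requests may signal poor communication, while decomposed pull requests help large projects.

– Project impact and recommendations

– Workflow types are related to project maturity and can be used to improve code review and project management.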


Article URL: https://thenewstack.io/what-github-pull-requests-reveal-about-your-teams-dev-habits/

Source: thenewstack.io

Author: Joab Jackson

Published: 2024/6/24 11:47

Language: English

Word count: 1,120

Estimated reading time: 5 minutes

Rating: 84

Tags: CI/CD, DevOps


The original article follows.


Overwhelmed by duplicate issues being filed on your GitHub project? Or by competing pull requests that are chewing up your team’s time? Your communication style, or way of delegating work, may be part of the problem, researchers have found.

As any project manager knows, developers organize their work around issues and pull requests (PRs) on git hosting services such as GitHub and GitLab.

A group of researchers from the Federal University of Pará in Brazil and the University of British Columbia studied these coding behaviors, charting them on a graph to see what “hidden patterns” could be found.

Searching for these patterns in your own team’s development may reveal areas where your workflow can be optimized.

“If your project has a disproportionate amount of competing PRs, or ‘duplicate issue hubs,’ it might be a sign to revisit your code review or bug reporting practices,” noted Emilie Ma, one of the researchers, who spoke at the Linux Foundation Open Source Summit earlier this year.

How Researchers Tracked GitHub Behavior

The team looked at 56 GitHub projects, capturing all the issues and PRs these projects generated. In a graph model (stored in Neo4j), issues and PRs were represented as nodes, and the links between them were represented as edges.

In a git workflow, issues are created to identify work to be done. The resulting code is then bundled into PRs, which are typically reviewed before being merged into the main codebase. In an ideal world, a single PR resolves the issue.

In terms of status, issues can be open (meaning work or discussion is under way) or closed. The same goes for PRs, except that once finished, they can reach one more status: merged. Both issues and PRs can be marked as duplicates. The researchers also collected authors and timestamps.
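To make this data model concrete, here is a minimal in-memory sketch in Python. It is not the researchers’ Neo4j schema: the class names Node and PRIssueGraph, the field names, and the status strings are assumptions chosen for illustration. Issues and PRs become nodes carrying a status, an author and a timestamp, and each link between them becomes an undirected edge.

```python
from dataclasses import dataclass, field
from datetime import datetime

# Status values as described in the article (the strings are illustrative):
# issues are open or closed; PRs can additionally end up merged.
ISSUE_STATUSES = {"open", "closed"}
PR_STATUSES = {"open", "closed", "merged"}


@dataclass(frozen=True)
class Node:
    """An issue or a pull request (hypothetical fields, not the paper's schema)."""
    number: int
    kind: str  # "issue" or "pr"
    status: str
    author: str
    created_at: datetime
    duplicate: bool = False

    def __post_init__(self) -> None:
        # Reject unknown kinds and statuses that do not apply to this kind.
        allowed = ISSUE_STATUSES if self.kind == "issue" else PR_STATUSES
        if self.kind not in ("issue", "pr") or self.status not in allowed:
            raise ValueError(f"invalid node #{self.number}: {self.kind}/{self.status}")


@dataclass
class PRIssueGraph:
    """Undirected graph: nodes keyed by number, edges stored as adjacency sets."""
    nodes: dict[int, Node] = field(default_factory=dict)
    edges: dict[int, set[int]] = field(default_factory=dict)

    def add_node(self, node: Node) -> None:
        self.nodes[node.number] = node
        self.edges.setdefault(node.number, set())

    def link(self, a: int, b: int) -> None:
        # A reference such as "PR #12 fixes issue #10" becomes one edge.
        if a not in self.nodes or b not in self.nodes:
            raise KeyError("add both nodes before linking them")
        self.edges[a].add(b)
        self.edges[b].add(a)

    def neighbors(self, number: int, kind: str) -> list[Node]:
        """Linked nodes of the given kind, sorted by number."""
        linked = (self.nodes[n] for n in self.edges[number])
        return sorted((n for n in linked if n.kind == kind), key=lambda n: n.number)


g = PRIssueGraph()
g.add_node(Node(10, "issue", "closed", "alice", datetime(2024, 1, 2)))
g.add_node(Node(12, "pr", "merged", "bob", datetime(2024, 1, 3)))
g.link(12, 10)
print([n.number for n in g.neighbors(10, "pr")])  # [12]
```

Keying nodes by number works within one repository because GitHub issues and PRs share a single numbering sequence there; spanning several repositories would need a compound key.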

“This graph-based approach provides a window into a set of collaborative software engineering practices that have not been previously described,” the researchers wrote.

In the process, they built a visualization tool, WorkflowsExplorer, to display the results.

Previous studies looked at issues and pull requests independently, though there is value in studying them in tandem. “Issues and PRs are coupled in practice: Issues are frequently resolved with PRs, and PRs are associated with Issues,” the researchers wrote.
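One plausible way to derive issue-to-PR edges is to scan PR descriptions for GitHub’s closing keywords (“Fixes #12,” “Closes #15,” and so on). The article does not say how the researchers extracted links, so treat this as an assumption. The sketch also misses links made through plain cross-references or the GitHub UI. The function name linked_issues is hypothetical.

```python
import re

# Closing keywords GitHub recognizes in a PR description:
# close/closes/closed, fix/fixes/fixed, resolve/resolves/resolved.
CLOSING_KEYWORDS = r"close[sd]?|fix(?:e[sd])?|resolve[sd]?"
# Keyword, optional colon, whitespace, then "#<number>" (e.g. "Fixes #12", "Closes: #3").
LINK_RE = re.compile(rf"\b(?:{CLOSING_KEYWORDS})\s*:?\s+#(\d+)\b", re.IGNORECASE)


def linked_issues(pr_body: str) -> set[int]:
    """Return the issue numbers a PR body references with a closing keyword."""
    return {int(num) for num in LINK_RE.findall(pr_body)}


print(sorted(linked_issues("Fixes #12 and closes #15. See also #20.")))  # [12, 15]
```

Plain mentions such as “See also #20” are deliberately ignored here. A fuller extractor would record them as a weaker kind of edge.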


The researchers’ methodology (“Revealing Software Development Work Patterns with PR-Issue Graph Topologies”)

Basic GitHub Workflow Patterns

As a result of these labors, the researchers found eight distinct workflow types, or behavior patterns, which together accounted for more than 1,000 instances of developer activity.

“Each of these workflow type definitions is associated with a work practice,” Ma said.

Graphs of the different types of workflow patterns, or behaviors, found by the research team (“Revealing Software Development Work Patterns with PR-Issue Graph Topologies”).

Not surprisingly, 35.7% of the relationships found were simple resolutions of an issue. But there were plenty of other patterns, some good and others not so much.
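As a rough illustration, a “simple resolution” can be read as an issue linked to exactly one PR that was merged and touches no other issue. The matching rule, the function name simple_resolutions, and the toy data are assumptions. The share printed below is computed over the issues in a made-up dataset and has nothing to do with the paper’s 35.7% figure.

```python
from collections import defaultdict


def simple_resolutions(links: list[tuple[int, int]], merged: set[int]) -> set[int]:
    """Issues resolved by exactly one merged PR that links to no other issue.

    links: (pr_number, issue_number) pairs; merged: numbers of merged PRs.
    """
    prs_of_issue: dict[int, set[int]] = defaultdict(set)
    issues_of_pr: dict[int, set[int]] = defaultdict(set)
    for pr, issue in links:
        prs_of_issue[issue].add(pr)
        issues_of_pr[pr].add(issue)
    result = set()
    for issue, prs in prs_of_issue.items():
        if len(prs) == 1:
            (pr,) = prs
            if pr in merged and issues_of_pr[pr] == {issue}:
                result.add(issue)
    return result


# Toy data: issue 1 has one merged PR; issue 2 has two competing PRs;
# PR 30 addresses issues 3 and 4 at once.
links = [(10, 1), (20, 2), (21, 2), (30, 3), (30, 4)]
merged = {10, 20, 30}
issues = {issue for _, issue in links}
simple = simple_resolutions(links, merged)
print(sorted(simple), f"{len(simple) / len(issues):.0%}")  # [1] 25%
```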

One workflow type they found is “Competing PRs”: two or more coders separately implement the same feature, and each submits a PR.

In the case of Competing PRs, “Contributors tend to be overeager to contribute their own implementations of a task without otherwise communicating,” Ma said.

Because only one of the PRs is accepted, the pattern suggests that the project’s communication is less than optimal: work is being duplicated. One PR may be rejected because it hampers performance too much, for instance, while another is accepted.

On the upside, however, this behavior allows the project to “be more picky” about accepting PRs, Ma said.
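A Competing PRs check might look like the following sketch. It groups PRs by the issue they target and flags issues that drew PRs from two or more authors, at most one of which was merged. The PR record and the competing_prs function are hypothetical, and the paper’s exact definition may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PR:
    number: int
    author: str
    merged: bool
    issues: frozenset[int]  # issues this PR links to


def competing_prs(prs: list[PR]) -> dict[int, list[int]]:
    """Map issue -> PR numbers for issues targeted by PRs from 2+ distinct authors
    where at most one of those PRs was merged (a hypothetical reading of the pattern)."""
    by_issue: dict[int, list[PR]] = {}
    for pr in prs:
        for issue in pr.issues:
            by_issue.setdefault(issue, []).append(pr)
    found = {}
    for issue, candidates in by_issue.items():
        authors = {pr.author for pr in candidates}
        merged_count = sum(pr.merged for pr in candidates)
        if len(authors) >= 2 and merged_count <= 1:
            found[issue] = sorted(pr.number for pr in candidates)
    return found


prs = [
    PR(20, "alice", merged=True, issues=frozenset({2})),
    PR(21, "bob", merged=False, issues=frozenset({2})),
    PR(30, "carol", merged=True, issues=frozenset({3})),
]
print(competing_prs(prs))  # {2: [20, 21]}
```

Requiring distinct authors separates this pattern from one developer iterating on successive PRs for the same issue.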

Another pattern is Duplicate Issues, in which multiple issues about the same problem are raised independently of one another. If you get several duplicate issues, you have a “duplicate issue hub,” Ma explained. This is another potential negative for the project.
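Detecting a duplicate issue hub can be sketched as counting how many issues were marked as duplicates of the same original, following chains (A duplicates B, B duplicates C) back to the root. The input mapping and the threshold of two duplicates are assumptions for illustration; the article only says “a few.”

```python
from collections import Counter


def duplicate_issue_hubs(
    duplicate_of: dict[int, int], min_duplicates: int = 2
) -> dict[int, int]:
    """Return {canonical_issue: duplicate_count} for issues with enough duplicates.

    duplicate_of maps each duplicate issue to the issue it was marked a duplicate of.
    The min_duplicates threshold is a hypothetical parameter, not the paper's.
    """
    def root(issue: int) -> int:
        # Follow duplicate chains to the original; the seen set guards against cycles.
        seen = set()
        while issue in duplicate_of and issue not in seen:
            seen.add(issue)
            issue = duplicate_of[issue]
        return issue

    counts = Counter(root(dup) for dup in duplicate_of)
    return {issue: n for issue, n in counts.items() if n >= min_duplicates}


# Issues 41 and 42 duplicate issue 40; issue 43 was marked a duplicate of 42.
dups = {41: 40, 42: 40, 43: 42, 51: 50}
print(duplicate_issue_hubs(dups))  # {40: 3}
```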

Breaking changes are a frequent cause of duplicate issue hubs. Such hubs are a sign that the project could be more articulate in its messaging about upcoming changes.

“Duplicate issue hubs tend to arise if contributors aren’t aware of the work going on in a project, or if they just haven’t bothered to search through previous issues. And this causes additional maintenance burden,” Ma said.

“It might be a sign to reevaluate how you’re messaging the change that’s causing those duplicate issues, to better inform users,” Ma said.

Overall, however, duplicate issues come up less frequently than you might guess. The researchers found only 15 instances of duplicate issue nodes across the 90,000 nodes studied.

She pointed to one Apache project that created a weekly bot to compile an issue listing all the PRs merged that week, letting everyone know what updates had been made.
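The article does not describe how that bot works. The sketch below covers only the digest step. Given merged-PR records, fetched however your project prefers (no GitHub API calls are shown), it renders the text of a weekly summary issue. The MergedPR record, the weekly_digest function, and the output layout are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class MergedPR:
    number: int
    title: str
    author: str
    merged_on: date


def weekly_digest(prs: list[MergedPR], week_start: date) -> str:
    """Render a summary of PRs merged in [week_start, week_start + 7 days)."""
    week_end = week_start + timedelta(days=7)
    in_week = sorted(
        (pr for pr in prs if week_start <= pr.merged_on < week_end),
        key=lambda pr: (pr.merged_on, pr.number),
    )
    lines = [f"PRs merged in the week of {week_start.isoformat()}:"]
    lines += [f"- #{pr.number} {pr.title} (@{pr.author})" for pr in in_week]
    if not in_week:
        lines.append("- (none)")
    return "\n".join(lines)


prs = [
    MergedPR(101, "Fix crash in parser", "alice", date(2024, 6, 17)),
    MergedPR(102, "Deprecate old config key", "bob", date(2024, 6, 19)),
    MergedPR(99, "Bump dependency", "carol", date(2024, 6, 10)),
]
print(weekly_digest(prs, date(2024, 6, 17)))
```

Posting the rendered text as a new issue is left to whatever client or automation the project already uses.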

An overeager developer may lead to another problematic pattern: solving several issues in a single PR (Divergent PR). This can slow down the team, because these hydra-headed PRs require more time to review, as the reviewer may not be conversant with all the issues being addressed.
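Reviewers could flag potential Divergent PRs by listing every PR linked to more than one issue, largest first. The input mapping, the divergent_prs name, and the one-issue threshold are assumptions for this sketch.

```python
def divergent_prs(
    pr_issues: dict[int, set[int]], max_issues: int = 1
) -> list[tuple[int, int]]:
    """Return (pr_number, issue_count) for PRs linked to more than max_issues issues,
    sorted by issue count (descending), then PR number."""
    flagged = [
        (pr, len(issues)) for pr, issues in pr_issues.items() if len(issues) > max_issues
    ]
    return sorted(flagged, key=lambda item: (-item[1], item[0]))


# PR number -> issues it links to (toy data).
links = {30: {3, 4}, 31: {5}, 32: {6, 7, 8}}
print(divergent_prs(links))  # [(32, 3), (30, 2)]
```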

Not all patterns are problematic, though.

For adding big features, a shop may use the Decomposed PR pattern, in which not one but multiple chained PRs address the issue. Each one covers part of the job (such as the “frontend”) and may rely on other PRs (such as the “backend”) to complete the issue. Often these are completed by a single author, who can submit one PR and then start on the next while the first is being reviewed.

This approach is often regarded as a positive pattern, as it makes code easier to write, review and commit, especially for larger projects.
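Here is a sketch of the Decomposed PR pattern under simple assumptions. An issue is completed by two or more merged PRs, and each PR records which sibling PRs it depends on. Python’s standard graphlib module then yields an order in which the chain can be reviewed and merged. The PR record, the depends_on field, and the decomposed_plan function are hypothetical; GitHub itself does not store PR dependencies in this form.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter
from typing import Optional


@dataclass(frozen=True)
class PR:
    number: int
    author: str
    merged: bool
    depends_on: frozenset[int] = frozenset()  # sibling PRs that must land first


def decomposed_plan(issue_prs: list[PR]) -> Optional[list[int]]:
    """If two or more merged PRs completed the issue, return a merge order that
    respects their dependencies; otherwise return None (hypothetical rule)."""
    if len(issue_prs) < 2 or not all(pr.merged for pr in issue_prs):
        return None
    numbers = {pr.number for pr in issue_prs}
    # graphlib expects {node: predecessors}; ignore dependencies outside this issue.
    graph = {pr.number: pr.depends_on & numbers for pr in issue_prs}
    return list(TopologicalSorter(graph).static_order())


issue_prs = [
    PR(61, "dana", True, frozenset({60})),  # frontend, builds on the backend PR
    PR(60, "dana", True),                   # backend
    PR(62, "dana", True, frozenset({61})),  # docs for the finished feature
]
print(decomposed_plan(issue_prs))  # [60, 61, 62]
print(len({pr.author for pr in issue_prs}) == 1)  # True: a single author
```

A dependency cycle would make static_order raise graphlib.CycleError, which is arguably the right signal that the chain was not really decomposed.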

What Workflow Types Say About Your Project

Overall, workflow types were found in all the projects studied, though larger projects had more instances of these patterns. The largest of these projects had more than 150 workflow instances.

“There’s a link between the maturity of a project and its need for structured and highly organized collaboration that manifests itself in these workflow types,” Ma said.

To validate their findings, the research team interviewed a number of project developers, who saw value in the approach for improving code review and project management practices. Divergent PRs, for example, could be a signal to prioritize code review.

“Think of this PR/Issue Graph as a sort of Grafana to monitor your project’s collaboration health, [one] that can help identify problem areas and serve as a global reference point to understand your project as a whole,” Ma said.

The paper detailing all the work, “Revealing Software Development Work Patterns with PR-Issue Graph Topologies,” will be presented July 18 at the ACM International Conference on the Foundations of Software Engineering, taking place in Brazil. Cleidson de Souza, Jesse Wong, Dongwook Yoon, and Ivan Beschastnikh were the other authors of this work.
