包阅 Reading Digest
1. Keywords: GitHub Copilot, Fine-tuned Models, Enterprise, Customization, Security
2. Summary: Fine-tuned models for GitHub Copilot Enterprise are now in limited public beta. They are designed to meet each organization's needs: customization improves relevance and personalizes assistance, the models adapt to unique coding environments, data security and privacy are taken seriously, and users can join the waitlist to participate.
3. Main points:
– GitHub Copilot Enterprise
  – The mission is to build an AI assistant tailored to each organization, integrating features that enhance assistance.
  – Fine-tuned models are in limited public beta; enterprise users can customize Copilot with their own codebases and coding practices.
– Benefits of fine-tuned models
  – Research shows GitHub Copilot already has a positive impact on development.
  – Fine-tuning is a major leap in customization: it adapts to an organization's unique coding environment, becomes familiar with its components, and delivers code suggestions that better fit its needs.
– Customization for specific needs
  – Organizations that rely on internal APIs and similar assets can all benefit.
  – Less code rework, faster onboarding for new team members, and developers can focus more on creating.
– How fine-tuned models are built
  – Models are customized with the LoRA method, which is more efficient and incorporates how the team interacts with suggestions.
  – The process runs on the Azure OpenAI Service, with base models upgraded automatically.
– Security and privacy
  – Critical steps safeguard data security; the data remains the customer's and the custom model stays private.
  – Training data is used only temporarily and removed once the process ends, and the correct model is routed for code completion.
– Joining the limited public beta
  – Customers are onboarded gradually from the waitlist; those not yet on it can join via the provided link.
Source: github.blog
Author: Shuyin Zhao
Published: 2024/9/10 16:00
Language: English
Word count: 986
Estimated reading time: 4 minutes
Score: 89
Tags: GitHub Copilot, fine-tuned models, AI assistant, customization, enterprise software
The original article content follows.
Our mission is to create an AI assistant that’s not only ubiquitous and trustworthy but also uniquely tailored to each organization’s needs. By integrating features like repository indexing and knowledge bases, we’ve enhanced GitHub Copilot to deliver more contextual and personalized assistance.
Now, we’re taking this mission a step further with the introduction of fine-tuned models, available in limited public beta for Copilot Enterprise users today. This new capability lets enterprise users customize Copilot with their proprietary codebases and coding practices, providing more relevant, higher-quality, and consistent code completion support that’s aligned with their specific needs.
The power of a fine-tuned GitHub Copilot
GitHub Copilot is already making a significant impact on how developers and organizations build software, with impressive results. Our research with Accenture shows that developers using Copilot experienced an 8% increase in pull requests, a 15% boost in pull request merge rates, and an 84% increase in build success rate. The research also found that 90% of developers were more fulfilled in their job when using GitHub Copilot, and 95% said they enjoyed coding more with the help of Copilot. These results highlight the effectiveness Copilot has in driving productivity, code quality, and developer happiness across organizations.
Fine-tuned models represent the next big leap in customization, now extending these capabilities directly into the in-line code completion experience. By training Copilot on your private codebases and also incorporating telemetry from how your team interacts with Copilot suggestions, fine-tuned models enable Copilot to adapt to your organization’s unique coding environment with latency that works seamlessly in real-time coding. Copilot becomes intimately familiar with your modules, functions, rare languages like legacy or proprietary languages, and internal libraries—delivering code suggestions that are not just syntactically correct, but more deeply aligned with your team’s coding style and standards.
Customize GitHub Copilot for your needs
Any organization that relies on internal APIs, specialized frameworks, proprietary languages, or strict coding styles can benefit from fine-tuned models. For example, finance organizations that work with legacy languages like COBOL can leverage fine-tuned models to address their specific coding requirements. On the other hand, sectors such as technology or healthcare, which often rely on internal libraries to enforce compliance and security standards—like ensuring cloud resources are deployed in line with organizational policies—can see meaningful improvements in coding accuracy and efficiency. Whether your team is operating in these industries or others, fine-tuned models help Copilot become a powerful, customized tool tailored to your unique needs.
With fine-tuned models, your developers receive code that requires fewer adjustments, which can help new team members onboard more quickly and experienced developers focus more on creating rather than correcting. This is a major step forward in making Copilot an even more helpful tool for your organizations, by providing more relevant, high-quality, and consistent coding support.
How we fine-tune custom models for GitHub Copilot
It’s true that we’ve been on a customization journey with GitHub Copilot, and we know that how we implement customization really matters. Copilot Enterprise first received repository indexing and knowledge bases—both powered by retrieval-augmented generation (RAG), a lightweight retrieval method used to gather information from sources outside of the model’s original training data without needing to retrain the entire model. While RAG has been effective for enhancing the chat experience with up-to-date outputs, it doesn’t meet the performance demands required for real-time code completion. With fine-tuning, we’re bringing customization to the code completion experience for the first time, allowing Copilot to deliver contextualized suggestions with the speed necessary for inline coding.
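To make the distinction concrete, here is a minimal, illustrative sketch of the RAG pattern: relevant snippets are retrieved from an index and prepended to the prompt, so the model itself is never retrained. The `embed` and `generate` callables and the in-memory corpus are hypothetical placeholders, not part of Copilot's implementation.

```python
# Minimal sketch of the RAG pattern described above (not GitHub's implementation).
# `embed` and `generate` stand in for an embedding model and a chat model;
# the snippet corpus is a hypothetical in-memory store of (text, embedding) pairs.
from typing import Callable, List, Tuple

def retrieve(query: str,
             corpus: List[Tuple[str, List[float]]],
             embed: Callable[[str], List[float]],
             k: int = 3) -> List[str]:
    """Rank indexed snippets by cosine similarity to the query and return the top k."""
    def cosine(a: List[float], b: List[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
        return dot / norm if norm else 0.0

    q = embed(query)
    ranked = sorted(corpus, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def answer_with_rag(question: str,
                    corpus: List[Tuple[str, List[float]]],
                    embed: Callable[[str], List[float]],
                    generate: Callable[[str], str]) -> str:
    """Prepend retrieved repository context to the prompt instead of retraining the model."""
    context = "\n\n".join(retrieve(question, corpus, embed))
    prompt = f"Use the following repository context to answer.\n\n{context}\n\nQuestion: {question}"
    return generate(prompt)
```

The retrieval step adds latency on every request, which is acceptable for chat but harder to hide in inline completion, which is the gap fine-tuning addresses.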
We customize each model using the LoRA (Low-Rank Adaptation) method, which fine-tunes a smaller subset of the most important model parameters during the supervised training phase, making the model more manageable and efficient. For users, this means faster and more affordable training compared to traditional fine-tuning techniques. Additionally, the fine-tuning process incorporates insights from how your team interacts with suggestions from Copilot, ensuring that the model aligns more closely with your organization’s specific needs.
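For readers unfamiliar with LoRA, the sketch below shows one common way to apply it using the Hugging Face `peft` library. The base checkpoint, target modules, and hyperparameters are illustrative assumptions only; they are not the model or settings used for Copilot.

```python
# A hedged sketch of LoRA fine-tuning with the Hugging Face `peft` library.
# The base checkpoint, target modules, and hyperparameters are placeholders,
# not the model or settings GitHub Copilot uses.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base_id = "deepseek-ai/deepseek-coder-1.3b-base"   # placeholder code model
base_model = AutoModelForCausalLM.from_pretrained(base_id)

lora_config = LoraConfig(
    r=16,                                # rank of the low-rank update matrices
    lora_alpha=32,                       # scaling factor applied to the adapter output
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"], # attention projections to adapt
    task_type="CAUSAL_LM",
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()       # only a small fraction of weights will train

# From here, a standard supervised training loop (e.g., transformers.Trainer)
# updates just the adapter weights on the organization's code completion examples.
```

Because only the small adapter matrices are trained, the job is faster and cheaper than updating every parameter of the base model, which is the efficiency gain described above.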
Every time you train a custom model, it starts with our base model for in-line code completion. As we upgrade our base models, future custom model retrainings will automatically roll forward to these new models as soon as fine-tuning is available. The fine-tuning process is powered by the Azure OpenAI Service, which provides scalability and security throughout the training pipeline.
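As a rough illustration of what a fine-tuning job on the Azure OpenAI Service looks like in general, here is a hedged sketch using the standard `openai` Python SDK. The endpoint, API version, file names, and base model are placeholders and do not reflect Copilot's actual pipeline.

```python
# A hedged sketch of submitting a fine-tuning job through the Azure OpenAI Service
# with the `openai` Python SDK. Endpoint, API version, file names, and base model
# are placeholders; this illustrates the service, not GitHub's training pipeline.
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],  # e.g. https://example.openai.azure.com
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-05-01-preview",
)

# Upload supervised training data in JSONL format.
training_file = client.files.create(
    file=open("code_completion_train.jsonl", "rb"),
    purpose="fine-tune",
)

# Start the fine-tuning job against a base model; when the base model is upgraded,
# a retraining run against the new base rolls the custom model forward with it.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-35-turbo-0613",  # illustrative base model name
)
print(job.id, job.status)
```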
Security and privacy in fine-tuned models
We know privacy and security are paramount, so we’ve taken critical steps to build data security into fine-tuned models at the scale that enterprises need. Importantly, your data is always yours. It is never used to train another customer’s model, and your custom model remains private, giving you full control.
When you initiate a training process, your repository data and telemetry data are tokenized and temporarily copied to the Azure training pipeline. Some of this data is used for training, while another set is reserved for validation and quality assessment. Once the fine-tuning process is complete, the model undergoes a series of quality evaluations to ensure it outperforms the baseline model. This includes testing against your validation data to confirm that the new model will improve code suggestions specific to your repositories.
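The flow can be sketched roughly as below; the split ratio, data format, and exact-match scoring are hypothetical stand-ins for quality evaluations that are not public.

```python
# A hedged sketch of the train/validation split and baseline comparison described
# above. The scoring function and acceptance rule are hypothetical placeholders.
import random
from typing import Callable, List, Tuple

def split_examples(examples: List[dict],
                   holdout: float = 0.1,
                   seed: int = 0) -> Tuple[List[dict], List[dict]]:
    """Reserve a slice of the prepared repository data for validation."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - holdout))
    return shuffled[:cut], shuffled[cut:]

def passes_quality_gate(candidate: Callable[[str], str],
                        baseline: Callable[[str], str],
                        validation: List[dict]) -> bool:
    """Accept the custom model only if it beats the baseline on held-out completions."""
    def exact_match_rate(model: Callable[[str], str]) -> float:
        hits = sum(model(ex["prompt"]) == ex["completion"] for ex in validation)
        return hits / max(len(validation), 1)
    return exact_match_rate(candidate) > exact_match_rate(baseline)
```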
If your model passes these quality checks, it is deployed to Azure OpenAI. This setup allows us to host multiple LoRA models at scale while keeping them network isolated from one another. After the process is complete, your temporary training data is removed from all surfaces, and data flow resumes through the normal inference channels. The Copilot proxy services ensure that the correct custom model is used for your developers’ code completions.
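Conceptually, the routing step looks something like the sketch below, where a registry maps an organization to its private deployment and every other request falls back to the base model; the registry and deployment names are hypothetical.

```python
# A hedged sketch of the routing idea: a proxy maps each organization to its own
# private fine-tuned deployment and falls back to the base model otherwise.
# The registry and deployment names are hypothetical.
from typing import Dict

BASE_DEPLOYMENT = "copilot-base-completion"

def resolve_deployment(org_id: str, registry: Dict[str, str]) -> str:
    """Return the org's custom deployment if one passed the quality gate."""
    return registry.get(org_id, BASE_DEPLOYMENT)

# Example: only orgs with an approved custom model are routed to it.
registry = {"org-123": "copilot-custom-org-123"}
assert resolve_deployment("org-123", registry) == "copilot-custom-org-123"
assert resolve_deployment("org-999", registry) == BASE_DEPLOYMENT
```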
How to participate in the limited public beta
Fine-tuned models are currently in limited public beta, and we are gradually onboarding customers from our waitlist. If you’re already on the waitlist, we sincerely appreciate your patience as we proceed with the onboarding process. If you haven’t joined the waitlist yet, you can do so by clicking here.