包阅 Digest Summary
1. Keywords:
– Kubernetes
– AI workflows
– Language models
– Model deployment
– KAITO
2. Summary:
Against the backdrop of booming generative AI, this article looks at Kubernetes as an open source platform well suited to AI/ML applications, and at how the Kubernetes AI Toolchain Operator (KAITO) streamlines the workflow of deploying, scaling and managing AI models on Kubernetes. KAITO supports a range of models, provides data security controls, and continues to evolve.
3. Main points:
– Generative AI is booming, and new language models are advancing rapidly in size, performance and use cases
– Open source language models are cost-effective, and Kubernetes is well suited to scaling and automating AI/ML applications
– Advantages of containers: easy model distribution, consistent behavior across environments, simpler version control, resource isolation, easier scaling
– Kubernetes AI Toolchain Operator (KAITO):
– Developed by Microsoft to simplify AI deployment
– A workflow that once took weeks takes only minutes with KAITO
– Supports a variety of models through a simple interface
– How it works: after installation, select a preset and the infrastructure is configured automatically
– Provides data security controls for applications with compliance requirements
– Currently provisions GPUs on Azure, with support for other providers planned
– The latest KAITO release supports model fine-tuning
Article URL: https://thenewstack.io/jumpstart-ai-workflows-with-kubernetes-ai-toolchain-operator/
Source: thenewstack.io
Author: Sachi Desai
Published: 2024/7/26 14:40
Language: English
Word count: 1,393
Estimated reading time: 6 minutes
Score: 87
Tags: AI deployment, Kubernetes, KAITO, open source, AI automation
The original article follows.
Generative AI is booming as industries and innovators look for ways to transform digital experiences. From AI chatbots to language translation tools, people are interacting with a common set of AI/ML models, known as language models, in everyday scenarios. As a result, new language models have advanced rapidly in size, performance and use cases in recent years.
As an application developer, you may want to integrate a high-performance model by simply making a REST call and pointing your app to an inferencing endpoint for options like Falcon, Mistral, Phi and other models. Just like that, you’ve unlocked the doors to the AI kingdom and all kinds of intelligent applications.
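For instance, such a call can be as small as the sketch below. The host, route and JSON payload shape are illustrative assumptions, not any particular provider's API; substitute the contract of whatever inference endpoint you use.

# Illustrative sketch: the host, route and payload shape are assumptions,
# not any particular provider's API.
curl -X POST "https://inference.example.com/v1/completions" \
  -H "Content-Type: application/json" \
  -d '{"model": "falcon-7b", "prompt": "Write a haiku about Kubernetes."}'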
Open source language models are a cost-effective way to experiment with AI, and Kubernetes has emerged as the open source platform best suited for scaling and automating AI/ML applications without compromising the security of user and company data.
“Let’s make Kubernetes the engine of AI transformation,” said Jorge Palma, Microsoft principal product manager lead. He gave the keynote at KubeCon Europe 2024, where AI use cases were discussed everywhere. Palma talked about the number of developers he’s met who are deploying models locally in their own infrastructure, putting them in containers and using Kubernetes clusters to host them.
“Container images are a great format for models. They’re easy to distribute,” Palma told KubeCon. “Then you can deploy them to Kubernetes and leverage all the nice primitives and abstractions that it gives you — for example, managing that heterogeneous infrastructure, and at scale.”
Containers also help you avoid the annoying “but it runs fine on my machine” issue. They’re portable, so your models run consistently across environments. They simplify version control to better maintain iterations of your model as you fine-tune for performance improvements. Containers provide resource isolation, so you can run different AI projects without mixing up components. And, of course, running containers in Kubernetes clusters makes it easy to scale out — a crucial factor when working with large models.
“If you aren’t going to use Kubernetes, what are you going to do?” asked Ishaan Sehgal, a Microsoft software engineer. He is a contributor to the Kubernetes AI Toolchain Operator (KAITO) project and has helped develop its major components to simplify AI deployment on a given cluster. KAITO is a Kubernetes operator and open source project developed at Microsoft that runs in your cluster and automates the deployment of large AI models.
As Sehgal pointed out, Kubernetes gives you the scale and resiliency you need when running AI workloads. Otherwise, if a virtual machine (VM) fails or your inferencing endpoint goes down, you must attach another node and set everything up again. “The resiliency aspect, the data management — Kubernetes is great for running AI workloads, for those reasons,” he said.
Kubernetes Makes It Easier, KAITO Takes It Further
Kubernetes makes it easier to scale out AI models, but it’s not exactly easy. In my article “Bring your AI/ML workloads to Kubernetes and leverage KAITO,” I highlight some of the hurdles developers face in this process. For example, just getting started is complicated. Without prior experience, you might need several weeks to correctly set up your environment. Downloading and storing the large model weights, upwards of 200 GB in size, is just the beginning, since model files bring storage and loading-time requirements of their own. Then you need to efficiently containerize your models and host them, choosing the right GPU size for your model while keeping costs in mind. And then there’s troubleshooting pesky quota limits on compute hardware.
Using KAITO, a workflow that previously could span weeks now takes only minutes. This tool streamlines the tedious details of deploying, scaling, and managing AI workloads on Kubernetes, so you can focus on other aspects of the ML life cycle. You can choose from a range of popular open source models or onboard your custom option, and KAITO tunes the deployment parameters and automatically provisions GPU nodes for you. Today, KAITO supports five model families and over 10 containerized models, ranging from small to large language models.
For an ML engineer like Sehgal, KAITO overcomes the hassle of managing different tools, add-ons and versions. You get a simple, declarative interface “that encapsulates all the requirements you need for running your inferencing model. Everything gets set up,” he explained.
How KAITO Works
Using KAITO is a two-step process. First, install KAITO on your cluster, and then select a preset that encapsulates all the requirements needed for inference with your model. Within the associated workspace custom resource definition (CRD), a minimum GPU size is recommended so you don’t have to search for the ideal hardware. You can always customize the CRD to your needs. After deploying the workspace, KAITO uses the node provisioner controller to automate the rest.
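In practice, the preset is expressed through the workspace itself. The snippet below is a minimal sketch adapted from the KAITO project’s public examples; the preset name, instance type and labels are illustrative, and the supported values vary by KAITO version.

# A minimal inference Workspace, adapted from KAITO's public examples.
# The preset name, instance type and labels are illustrative; check the
# KAITO docs for the presets and GPU sizes your version supports.
cat <<EOF | kubectl apply -f -
apiVersion: kaito.sh/v1alpha1
kind: Workspace
metadata:
  name: workspace-falcon-7b
resource:
  instanceType: "Standard_NC12s_v3"   # recommended minimum GPU size for this preset
  labelSelector:
    matchLabels:
      apps: falcon-7b
inference:
  preset:
    name: "falcon-7b"
EOF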
“KAITO is basically going to provision GPU nodes and add them to your cluster on your behalf,” explained Microsoft senior cloud evangelist Paul Yu. “As a user, I just have to deploy my workspace into the AKS cluster, and that creates the additional CR.”
In the KAITO architecture, the workspace invokes a provisioner to create and configure right-sized infrastructure for you, and it even distributes your workload across smaller GPU nodes to reduce costs. The project uses the open source Karpenter APIs to configure VMs of your requested size, installing the right drivers and device plug-ins for Kubernetes.
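Once the workspace is applied, you can watch provisioning complete and then call the model from inside the cluster. The commands below are a sketch following the project’s README examples; the service naming and the /chat route may differ across KAITO versions.

# Watch the workspace until the GPU node is provisioned and the model is ready.
kubectl get workspace workspace-falcon-7b -w

# The preset exposes a ClusterIP service named after the workspace; the
# /chat route follows the project's README examples and may vary by version.
export CLUSTERIP=$(kubectl get svc workspace-falcon-7b -o jsonpath='{.spec.clusterIP}')
kubectl run -it --rm --restart=Never curl --image=curlimages/curl -- \
  curl -X POST "http://$CLUSTERIP/chat" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "What is Kubernetes?"}'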
For applications with compliance requirements, KAITO provides granular control over data security and privacy. You can ensure that models are ring-fenced within your organization’s network and that your data never leaves the Kubernetes cluster.
Check out this tutorial on how to bring your own AI models to intelligent apps on Azure Kubernetes Service, where Sehgal and Yu integrate KAITO in a common e-commerce application in a matter of minutes.
Working With Managed Kubernetes
Currently, you can use KAITO to provision GPUs on Azure, but the project is evolving quickly. The roadmap includes support for other managed Kubernetes providers. When you use a managed Kubernetes service, you can interact with other services from that cloud platform more easily to add capabilities to your workflow or applications.
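On AKS, for example, KAITO is also surfaced as a managed add-on. The command below reflects the preview documentation at the time of writing and should be treated as an assumption subject to change; a self-managed install uses the Helm chart from the KAITO repository instead.

# Enable the managed KAITO add-on on an existing AKS cluster.
# The flag follows the 2024 preview docs and may change; self-managed
# installs use the Helm chart from the KAITO repository instead.
az aks update --resource-group myResourceGroup \
  --name myAKSCluster \
  --enable-ai-toolchain-operator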
Earlier this year, at Microsoft Build 2024, BMW talked about its use of generative AI and Azure OpenAI Service in the company’s connected car app, My BMW, which runs on AKS.
Brendan Burns, co-founder of the Kubernetes open source project, introduced the demo. “What we’re seeing is, as people are using Azure OpenAI Service, they’re building the rest of the application on top of AKS,” he told the audience. “But, of course, just using OpenAI Service isn’t the only thing you might want to use. There are a lot of reasons why you might want to use open source large language models, including situations like data and security compliance, and fine-tuning what you want to do. Maybe there’s just a model out there that’s better suited to your task. But doing this inside of AKS can be tricky, so we integrated the Kubernetes AI Toolchain Operator as an open source project.”
Make Your Language Models Smarter With KAITO
Open source language models are trained on extensive amounts of text from a variety of sources, so the output for domain-specific prompts may not always meet your needs. Take a pet store application, for example. If a customer asks the integrated pre-trained model for dog food recommendations, it might give different pricing options across a few popular dog breeds. This is informative but not necessarily useful as the customer shops. After fine-tuning the model on historical data from the pet store, the recommendations can instead be tailored to well-reviewed, affordable options available in the store.
As a step in your ML life cycle, fine-tuning helps customize open source models to your own data and use cases. The latest release of KAITO, v0.3.0, supports fine-tuning, inference with adapters, and a broader range of models. You can simply define your tuning method and data source in a KAITO workspace CR and watch your intelligent app become more context-aware, while maintaining data security and compliance requirements in your cluster.
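As a sketch, a tuning workspace looks roughly like the following, adapted from the v0.3.0 examples in the KAITO repository. The dataset URL, registry and secret names are placeholders, and the field names should be verified against the KAITO version you run.

# A fine-tuning Workspace sketch based on KAITO's v0.3.0 examples.
# The dataset URL, registry and secret are placeholders; verify the field
# names against the KAITO version you run.
cat <<EOF | kubectl apply -f -
apiVersion: kaito.sh/v1alpha1
kind: Workspace
metadata:
  name: workspace-tuning-phi-3
resource:
  instanceType: "Standard_NC6s_v3"
  labelSelector:
    matchLabels:
      app: tuning-phi-3
tuning:
  preset:
    name: phi-3-mini-128k-instruct
  method: qlora                      # parameter-efficient fine-tuning
  input:
    urls:
      - "https://example.com/datasets/pet-store-train.parquet"
  output:
    image: "myregistry.azurecr.io/adapters/phi-3-petstore:0.1.0"
    imagePushSecret: myregistry-secret
EOF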
To stay up to date on the project roadmap and test these new features, check out KAITO on GitHub.