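Baoyue reading guide: summary
1. Keywords: Netflix, Maestro, Workflow Orchestrator, Open Source, Data Workflow
2. Summary: Netflix has open-sourced Maestro, the workflow orchestrator its data scientists and analysts use every day. Maestro is highly scalable and meets strict service level objectives. It is built on several open source technologies, supports multiple workflow types and formats, and can handle large-scale data processing.
3. Main points:
– Netflix open-sources the workflow orchestrator Maestro
  – Used internally by data scientists and analysts to understand user behavior and other large-scale data trends
  – Released under the Apache 2.0 license
– Maestro's features and capabilities
  – Highly scalable and extensible, able to meet strict service level objectives
  – Built on open source technologies such as Git and Java
  – Can be driven from the command line with cURL and supports multiple formats
  – Manages the full workflow lifecycle and supports multiple workflow types
– The birth of Maestro
  – Its predecessor, Meson, performed poorly under heavy load
  – Maestro was designed for high scalability from the start
  – Presented in more detail at the AWS re:Invent 2023 conference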
Article URL: https://thenewstack.io/netflix-open-sources-maestro-a-next-gen-data-workflow-engine/
Source: thenewstack.io
Author: Joab Jackson
Published: 2024/8/2 13:29
Language: English
Tags: data workflow engine, open source, Netflix, workflow orchestration, scalability
Original article follows
Video and gaming streaming service Netflix has released as open source the workflow orchestrator that its army of data scientists and analysts use every day to understand user behaviors and other large-scale data-driven trends.
The Maestro workflow orchestrator, released under an Apache 2.0 license, was designed to support hundreds of thousands of workflows and has completed up to 2 million jobs in a single day for the media company.
How Maestro Works
According to company engineers, it is highly scalable, extensible and able to meet strict service level objectives (SLOs) even during traffic spikes.
It is built on top of a range of open source technologies, namely Git, Java (21), Gradle and Docker.
Maestro can be invoked from the command line with cURL to create, run, and delete a workflow and an associated batch of data. The workflow is defined in JSON, and the user’s business logic can be packaged into Docker images, Jupyter notebooks, bash scripts, SQL, Python, and other formats.
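As a rough illustration of what a JSON-defined workflow driven over HTTP could look like, the sketch below builds a small definition in Python and renders the matching cURL commands as strings without sending anything. The field names (id, steps, type, image, script), the localhost URL, and the /workflows path are hypothetical placeholders chosen for this example. They are not taken from Maestro's actual schema or API, which is defined in the project's own documentation.

```python
import json
import shlex
from typing import Optional

# Hypothetical workflow definition: field names are illustrative only,
# not Maestro's real schema.
workflow = {
    "id": "daily_engagement_report",
    "steps": [
        {"id": "extract", "type": "sql", "script": "SELECT * FROM plays"},
        {"id": "train", "type": "docker", "image": "example/trainer:latest"},
        {"id": "publish", "type": "bash", "script": "echo done"},
    ],
}


def curl_command(method: str, url: str, body: Optional[dict] = None) -> str:
    """Render a cURL invocation as a shell-safe string (nothing is sent)."""
    parts = ["curl", "-X", method, url]
    if body is not None:
        parts += ["-H", "Content-Type: application/json", "-d", json.dumps(body)]
    return " ".join(shlex.quote(p) for p in parts)


BASE_URL = "http://localhost:8080/workflows"  # assumed placeholder endpoint

create_cmd = curl_command("POST", BASE_URL, workflow)
delete_cmd = curl_command("DELETE", f"{BASE_URL}/{workflow['id']}")

print(create_cmd)
print(delete_cmd)
```

Keeping the definition as plain data means the same JSON document can be stored in Git, reviewed, and submitted unchanged, while the business logic stays in whatever the step points to (an image, a script, a query).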
Behind the scenes, Maestro manages the entire lifecycle of a workflow, handling retries, queuing, and task distribution to compute engines. It supports not only directed acyclic graphs (DAGs), which are table stakes in the AI-driven world of 2024, but also cyclic workflows and multiple reusable patterns, including foreach loops, subworkflows, and conditional branching.
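The patterns named above (retries, foreach loops, and conditional branching) can be sketched generically in a few lines of Python. This is a toy model under stated assumptions, not Maestro's implementation: the helper names run_with_retries and foreach, the flaky_load step, and the step names "publish" and "alert" are all invented for illustration.

```python
from typing import Callable, Iterable, List


def run_with_retries(task: Callable[[], str], max_attempts: int = 3) -> str:
    """Call task until it succeeds or max_attempts is exhausted."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return task()
        except RuntimeError as err:
            last_error = err  # a real engine would also back off and requeue
    raise RuntimeError(f"failed after {max_attempts} attempts") from last_error


def foreach(items: Iterable[str], body: Callable[[str], str]) -> List[str]:
    """Foreach pattern: run the same step body once per item, with retries."""
    return [run_with_retries(lambda item=item: body(item)) for item in items]


# A hypothetical flaky step that fails on its first call, to exercise retries.
calls = {"count": 0}


def flaky_load(region: str) -> str:
    calls["count"] += 1
    if calls["count"] == 1:
        raise RuntimeError("transient failure")
    return f"loaded:{region}"


results = foreach(["us", "eu"], flaky_load)

# Conditional branching: pick the next step based on upstream results.
next_step = "publish" if all(r.startswith("loaded:") for r in results) else "alert"
print(results, next_step)
```

In an orchestrator, this bookkeeping lives in the engine rather than in user code, so the step body only has to describe the work itself.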
“It supports a wide range of workflow use cases, including ETL pipelines, ML workflows, AB test pipelines, pipelines to move data between different storages,” a group of Netflix engineers collectively wrote in a recent blog post announcing the release. “Maestro’s horizontal scalability ensures it can manage both a large number of workflows and a large number of jobs within a single workflow.”
Birth of Maestro
Netflix is no stranger to open source software, having released many tools it developed internally as open source. System stress-testing tool Chaos Monkey was released in 2011, and inspired a whole generation of chaos testing tools. Other open source projects that Netflix has spun off include the routing gateway Zuul and the microservices orchestration engine Conductor, since deprecated.
Netflix first let the world know about Maestro in 2022 in a blog post that explained its origins. The orchestrator then being used, called Meson, was straining under the workloads of thousands of daily jobs, particularly around peak usage time.
“Meson was based on a single leader architecture with high availability. As the usage increased, we had to vertically scale the system to keep up and were approaching AWS instance type limits,” the engineers wrote in the 2022 post.
Worse, the workloads were expected to increase by at least 100% per year, and the sizes of the workflows were expected to grow as well.
From the start, Maestro was designed to be highly scalable and extensible. It is built on a DAG architecture in which each workflow consists of a series of steps, and each step can have dependencies, triggers, and other conditionals. The business logic of each workflow runs in isolation, guaranteeing that SLOs are met. All the services are designed to be stateless so they can be scaled out as needed.
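To make the idea of steps with dependencies concrete, here is a minimal sketch that uses Python's standard graphlib module to group a workflow's steps into "waves" that can run once their dependencies have finished. The step names and the dependency map are hypothetical, and this shows a generic DAG scheduling technique, not Maestro's own scheduler.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each step maps to the set of steps it depends on.
dependencies = {
    "extract": set(),
    "clean": {"extract"},
    "train": {"clean"},
    "report": {"clean"},
    "publish": {"train", "report"},
}

sorter = TopologicalSorter(dependencies)
sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle

waves = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())  # steps whose dependencies are all done
    waves.append(ready)
    sorter.done(*ready)  # mark as finished so dependents become ready

print(waves)
```

Steps within the same wave do not depend on each other, so a scheduler is free to hand them to any available worker. That is one reason stateless services pair well with this model.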
At Amazon Web Services' 2023 re:Invent conference, the Netflix engineering team described Maestro in more detail.