
Canva Chooses KDS over SNS+SQS: 25 Billion Events per Day, Costs Cut by 85%

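Summary overview

1. Keywords: Canva, Amazon KDS, Product Analytics, Cost Savings, Data Pipeline

2. Summary: Canva evaluated data processing options for its product analytics platform and ultimately chose Amazon KDS over SNS+SQS, mainly because of its lower cost. Along the way, the team took steps to optimize costs and resolve operational issues, and used tooling to safeguard data quality.

3. Main points:

– Canva evaluated several data processing options for its product analytics platform, including AWS SNS and SQS, MSK, and Amazon KDS.

– The early version used a combination of SQS and SNS, which was easy to set up, resilient, and scalable, but accounted for 80% of the cost of running the architecture.

– After the comparison, the team chose Amazon KDS for its low cost and low maintenance, accepting somewhat higher latency.

– To improve cost-effectiveness, the team adopted event batching and zstd compression, saving an estimated $600k per year.

– To address high tail latency and throttling in KDS, the team added fallback logic backed by an SQS queue, achieving p99 latency below 20ms.

– Protocol Buffers are used to define and evolve event schemas, and the Datumgen tool verifies compatibility and generates code, which helps maintain data quality.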


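Article URL: https://www.infoq.com/news/2024/08/canva-amazon-kinesis-data-stream/?utm_campaign=infoq_content&utm_source=infoq&utm_medium=feed&utm_term=global

Source: infoq.com

Author: Rafal Gancarz

Published: 2024/8/7 0:00

Language: English

Word count: 499

Estimated reading time: 2 minutes

Score: 89

Tags: data engineering, AWS services, cost optimization, event-driven architecture, product analytics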


The original article follows


Canva evaluated different data messaging solutions for its Product Analytics Platform, including the combination of AWS SNS and SQS, Amazon MSK, and Amazon KDS, and eventually chose KDS, primarily because of its much lower cost. The company compared several aspects of these solutions, including performance, maintenance effort, and cost.

Canva processes around 25 billion product analytics events per day to power many user-facing features, such as personalization and recommendations, usage statistics and insights. The data captured is also key to supporting A/B testing for any new product features.

The data pipeline that collects and distributes product analytics events needs to support not only very high throughput but also high availability (99.999% uptime), and it must be cost-effective, reliable and user-friendly. The team responsible for delivering the event-driven architecture (EDA) for product analytics used the combination of AWS SQS and SNS in the early stages of the MVP. These services were easy to set up and provided excellent resiliency and scalability, but they accounted for 80% of the cost of running the architecture.

Product Analytics Data Pipeline Using Amazon KDS (Source: Canva Engineering Blog)

Based on the initial MVP experience, the team decided to look for alternatives that would meet performance requirements at lower cost and considered two other AWS services: Amazon Managed Streaming for Apache Kafka (MSK) and Amazon Kinesis Data Streams (KDS). Engineers compared cost, performance and maintenance between these services and opted for KDS for its low cost (85% cheaper than SQS+SNS) and extremely low maintenance, despite its higher latency compared to MSK (10-20ms higher, but acceptable).
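To put the two percentages together, the short calculation below is a rough sketch rather than a figure from Canva. It assumes that the 85% saving applies to the same messaging workload that made up 80% of the total, and that the remaining 20% of costs stays unchanged; the function name is purely illustrative.

```python
def remaining_cost_fraction(messaging_share: float, messaging_saving: float) -> float:
    """Fraction of the original total cost left after cutting only the messaging part.

    Assumption: every non-messaging cost stays exactly as it was.
    """
    other_share = 1.0 - messaging_share
    return other_share + messaging_share * (1.0 - messaging_saving)


# 80% of the bill was messaging; the replacement is 85% cheaper.
fraction = remaining_cost_fraction(messaging_share=0.80, messaging_saving=0.85)
print(f"total cost falls to about {fraction:.0%} of the original")
```

Under those assumptions the overall bill would drop to roughly a third of its original size, which shows why the messaging layer was the obvious place to optimize.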

To improve the cost-effectiveness of the KDS-based solution, the team batched events and applied zstd compression, achieving a 10x compression ratio at a compression latency of about 100ms per batch. Engineers estimate that compression alone resulted in $600k of annual savings.
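The batching-plus-compression idea can be sketched in a few lines. This is a minimal illustration, not Canva's implementation: it assumes JSON-encoded events (Canva uses Protocol Buffers), uses hypothetical event fields, and substitutes `zlib` for zstd so that it runs on the Python standard library alone. A zstd binding could be passed in through the `compress` and `decompress` parameters.

```python
import json
import zlib
from typing import Callable, Iterable, Iterator, List


def batch_events(events: Iterable[dict], max_batch: int) -> Iterator[List[dict]]:
    """Group events into lists of at most `max_batch` items."""
    batch: List[dict] = []
    for event in events:
        batch.append(event)
        if len(batch) == max_batch:
            yield batch
            batch = []
    if batch:  # flush the final, possibly smaller, batch
        yield batch


def encode_batch(batch: List[dict],
                 compress: Callable[[bytes], bytes] = zlib.compress) -> bytes:
    """Serialize a batch as newline-delimited JSON and compress it.

    Assumption: zlib is a stand-in for zstd here.
    """
    raw = "\n".join(json.dumps(e, sort_keys=True) for e in batch).encode("utf-8")
    return compress(raw)


def decode_batch(payload: bytes,
                 decompress: Callable[[bytes], bytes] = zlib.decompress) -> List[dict]:
    """Reverse encode_batch: decompress, then parse one JSON object per line."""
    raw = decompress(payload).decode("utf-8")
    if not raw:
        return []
    return [json.loads(line) for line in raw.split("\n")]


# Hypothetical events: 1000 small, repetitive records sent as 4 stream records.
events = [{"event": "design_opened", "user_id": i % 10} for i in range(1000)]
payloads = [encode_batch(b) for b in batch_events(events, max_batch=250)]
raw_bytes = sum(len(json.dumps(e)) for e in events)
sent_bytes = sum(len(p) for p in payloads)
```

Batching matters as much as the codec: analytics events tend to repeat the same field names and values, so compressing many events together gives the compressor far more redundancy to exploit than compressing each event on its own.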

One area that required special attention while using KDS was high tail latency (over 500ms) and throttling when throughput spikes exceeded the hard limit of 1MB/s per shard. Engineers implemented fallback logic that uses an SQS queue and, as a result, achieved p99 latency below 20ms while paying less than $100 per month for SQS. The fallback option additionally doubles as a failover mechanism in case KDS experiences severe service degradation or an outage.
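The fallback pattern described above can be sketched as follows. This is an illustration under stated assumptions, not Canva's code: `ThrottledError`, `FakeShard` and `send_with_fallback` are hypothetical names, the primary and fallback writers are injected as plain callables instead of real AWS clients, and the byte budget only loosely imitates a per-shard write limit.

```python
from typing import Callable, List


class ThrottledError(Exception):
    """Hypothetical error raised when the primary stream rejects a write."""


def send_with_fallback(record: bytes,
                       send_primary: Callable[[bytes], None],
                       send_fallback: Callable[[bytes], None]) -> str:
    """Try the primary stream first; on throttling or timeout use the fallback.

    Returns the name of the route that accepted the record.
    """
    try:
        send_primary(record)
        return "primary"
    except (ThrottledError, TimeoutError):
        send_fallback(record)
        return "fallback"


class FakeShard:
    """Stand-in for a stream shard with a fixed write budget in bytes."""

    def __init__(self, byte_budget: int) -> None:
        self.remaining = byte_budget
        self.accepted: List[bytes] = []

    def put(self, record: bytes) -> None:
        if len(record) > self.remaining:
            raise ThrottledError("write budget exceeded")
        self.remaining -= len(record)
        self.accepted.append(record)


shard = FakeShard(byte_budget=10)
fallback_queue: List[bytes] = []
routes = [
    send_with_fallback(record, shard.put, fallback_queue.append)
    for record in (b"12345678", b"abcdef", b"xy")
]
```

In this run the second record does not fit in the remaining budget and is diverted to the queue, while the smaller third record still goes to the shard. Because the fallback path is taken only for rejected or slow writes, it carries a small share of the traffic, which is consistent with the low monthly SQS bill reported above.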

Fallback to SQS in Case of KDS Throttling (Source: Canva Engineering Blog)

The team used Protocol Buffers so that event definitions could be described and evolved over time. Canva had already been using Protocol Buffers to define contracts between microservices, but for event definitions it additionally required full backward and forward compatibility. Engineers also created Datumgen, a home-grown code generation tool built on top of protoc.

Datumgen is used to verify compatibility requirements and generate code in multiple languages. Furthermore, the tool extracts metadata from event definitions to enhance the event catalog data with details about technical and business owners, as well as field descriptions. Well-documented and up-to-date event schemas help Canva maintain data quality, avoid costly issues with schema incompatibility at runtime, and empower engineers to discover available product analytics events.
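The article does not describe which rules Datumgen enforces, so the sketch below only illustrates the kind of check such a tool might run. It assumes a simplified schema model (field number mapped to name and type) and applies two common Protocol Buffers guidelines: a field number must keep its type, and a removed field's number must be reserved so it is never reused. `MessageSchema` and `compatibility_errors` are hypothetical names, as are the event fields.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class MessageSchema:
    """Simplified model of one message version (an assumption, not protoc output)."""
    fields: Dict[int, Tuple[str, str]]  # field number -> (name, type)
    reserved: Set[int] = field(default_factory=set)


def compatibility_errors(old: MessageSchema, new: MessageSchema) -> List[str]:
    """List the changes between two versions that would break old or new readers."""
    errors: List[str] = []
    for number, (name, ftype) in sorted(old.fields.items()):
        if number not in new.fields:
            # Removing a field is fine only if its number can never come back.
            if number not in new.reserved:
                errors.append(f"field {number} ({name}) removed without reserving its number")
            continue
        _, new_type = new.fields[number]
        if new_type != ftype:
            errors.append(f"field {number} ({name}) changed type {ftype} -> {new_type}")
    for number in sorted(new.fields):
        if number in old.reserved:
            errors.append(f"field {number} reuses a reserved number")
    return errors


v1 = MessageSchema({1: ("user_id", "string"), 2: ("design_id", "string")})
v2_added = MessageSchema({**v1.fields, 3: ("locale", "string")})
v2_retyped = MessageSchema({1: ("user_id", "string"), 2: ("design_id", "int64")})
v2_removed = MessageSchema({1: ("user_id", "string")})
v2_reserved = MessageSchema({1: ("user_id", "string")}, reserved={2})
```

Here `compatibility_errors(v1, v2_added)` and `compatibility_errors(v1, v2_reserved)` return empty lists, while the retyped and unreserved-removal variants each report one problem. Running a check like this at build time is one way to catch schema incompatibilities before they surface at runtime.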