Posted in

如何在 AWS 的事件驱动架构中使用生存时间_AI阅读总结 — 包阅AI

包阅导读总结

1. 关键词:

– Time To Live

– Distributed Systems

– AWS

– Event-Driven Architecture

– Data Expiration

2. 总结:

本文介绍了在分布式系统(特别是 AWS 环境)中的 Time To Live(TTL)概念,包括其定义、适用与不适用场景,在消息队列、对象存储系统、数据库和事件驱动架构中的应用,TTL 可帮助管理数据存储和过期,与事件驱动架构结合能实现数据转换。

3. 主要内容:

– 什么是分布式系统中的 Time To Live(TTL)

– TTL 是数据在分布式系统或其组件中保持相关或存储的时间量

– 影响系统设计、成本和扩展

– TTL 的适用场景

– 缓存数据

– 分析数据

– 索引数据

– 短内容社交媒体应用

– TTL 的不适用场景

– 流媒体平台存储的媒体

– 银行交易数据

– 在 AWS 中的应用

– 在消息队列(AWS SQS)中

– 可设置自定义 TTL,默认保留 4 天,最大 14 天

– 注意与可见性超时的区别

– 在对象存储系统(AWS S3)中

– 可通过生命周期规则设置

– 在数据库(AWS DynamoDB)中

– 可对某些数据设置,通过特定属性和时间戳自动删除

– 在事件驱动架构中

– 数据过期可触发事件,如社交媒体应用中消息过期的处理

– 总结

– TTL 的定义和作用

– 适用与不适用场景

– 在多种分布式系统中的应用及与事件驱动架构的结合

思维导图:

文章地址:https://www.freecodecamp.org/news/how-to-use-time-to-live-in-event-driven-architecture/

文章来源:freecodecamp.org

作者:Anant Chowdhary

发布时间:2024/7/16 16:53

语言:英文

总字数:1632字

预计阅读时间:7分钟

评分:88分

标签:分布式系统,软件架构,AWS


以下为原文内容

本内容来源于用户推荐转载,旨在分享知识与观点,如有侵权请联系删除 联系邮箱 media@ilingban.com

Distributed systems generally involve the storage and exchange of huge amounts of data. Not all data is created the same, and some of it can even expire – by design.

As the Buddha said, “All conditioned things are impermanent.”

In this article, we’ll look at how the concept of time to live can help us with this type of data and when it makes sense to use it.

Table of Contents

  1. What is Time to Live (TTL) in Distributed Systems?
  2. How to use TTL in Message Queues (AWS SQS)
  3. How to use TTL in Object Storage Systems (AWS S3)
  4. How to use TTL in Databases (AWS DynamoDB)
  5. How to use TTL in Event Based Architecture
  6. Summary

What is Time to Live (TTL) in Distributed Systems?

TTL, as the name suggests, is the amount of time a piece of data stays relevant or stays stored in a distributed system or a component of a distributed system. A TTL may be set on any piece of data that isn’t needed indefinitely.

Knowing when and when not to use a TTL can sometimes be tricky. It can also affect the way a system is designed, cost and scaling considerations. In the following sections, we learn about when and when not to use TTL.

Where does TTL make sense?

As mentioned above, it makes sense to use TTL for any piece of data that is ephemeral. Some common examples of use cases where you can set a TTL on data are:

  1. Cached data: Cached data is pretty much omnipresent in distributed systems. For instance, a very popular social media post’s resources (image, video, audio) may be cached on a CDN (Content Delivery Network)’s servers. You don’t want this data to live forever on the server, so in some cases it may make sense to add a TTL to this data, so that it is automatically removed after a certain period of time.
  2. Analytics Data: Most if not all large scale systems store some form of metrics that help analyze things like latency, system health, and product metrics amongst others. In a large number of cases, you wouldn’t want these metrics to be stored in systems forever. Only recent data (say 60 days or 180 days) may be useful in most cases. A TTL on data in this case makes sense, especially if you have constraints on memory.
  3. Indexed data: Search is a feature that’s ubiquitous across products. Be it social media apps, e-mail or search engines – indexed data is vital to blazing fast searches. Indexed data, however, can become stale after a while, so it makes sense for the index to expire after some time. Hence, a TTL here can be useful.
  4. Social media apps with short lived content: Social media apps with short lived content are extremely popular and images/videos posted are often short lived. In case these images do not need to be stored for posterity, they can benefit from a TTL being set on them. In addition to being memory efficient, it also aids privacy.

Where does TTL not make sense?

In the above section we looked at a few cases where TTL makes sense. What about cases where TTL isn’t common and isn’t useful? Let’s look at some examples:

  1. Media stored for streaming platforms: Streaming platforms often use cloud storage solutions such as Amazon AWS S3 to store objects that correspond to the media they stream to customers. These forms of media are generally not ephemeral and are expected to stay on platforms for years if not decades. Since such data isn’t expected to expire anytime, TTL does not make sense here.
  2. Bank transactions: Bank transactions produce some of the most sensitive data that are stored in cloud-based and distributed systems. For audit and book-keeping purposes, these pieces of data are generally stored for decades. So, since this data seemingly never expires, there’s generally no use for a TTL here. This isn’t to say that this form of data can’t be moved from fast access databases/caches to slower and cheaper forms of data storage, though.

How to Use TTL in Message Queues (AWS SQS)

AWS SQS is a distributed message queuing solution that is the backbone of many versatile distributed systems across the world. Message queues can process billions of messages and are used almost universally across distributed systems around the world.

In this section, we’ll look at how TTLs can be useful while we consider design options with respect to message queues.

What happens if a message queue’s consumers have been backed up for several days, or messages simply haven’t been consumed for a while? We have the option of setting a custom Time To Live on SQS messages.

By default, the retention period is 4 days. The maximum TTL at the time of writing is 14 days. So it’s important to be aware of constraints such as these while using AWS SQS to design systems.

Note that with AWS SQS, a retention period is a set on the queue itself, and not individually for each message.

Boto is an AWS SDK for Python that enables developers to create, configure, and manage AWS services and resources programmatically. Boto is widely used for prototyping, production systems, and in general offers a user-friendly interface for accessing services like S3, EC2, and DynamoDB.

Here’s a code snippet using Boto that will help you set the MessageRetentionPeriod attribute which is the formal name for TTL in this context.

sqs = boto3.client('sqs', aws_access_key_id=your_aws_access_key_id, aws_secret_access_key=aws_secret_access_key, region_name='your_region')retention_period_seconds = 86400  response = sqs.set_queue_attributes(    QueueUrl=your_queue_url,    Attributes={        'MessageRetentionPeriod': str(retention_period_seconds)    })

Visibility Timeout in Message Queues (AWS SQS)

Note that while it’s tempting to think of Visibility Timeout in SQS as Time To Live, these aren’t the same. Time To Live or Retention Period is different from Visibility Timeout.

Visibility timeout instead refers to a generally shorter period of time by which a message should be processed (once picked up by a consumer). If not, it is back in the SQS queue and visible to consumers again, with its receive count having been increased by one.

How to Use TTL in Object Storage Systems (AWS S3)

The all-versatile AWS S3, which is an object storage solution, gives users the ability to set a Time To Live on objects stored in S3 buckets.

S3 is extremely flexible with the way TTLs are set on objects / buckets. You can set Lifecycle rules to specify what objects or what versions of an object you’d like to remove.

Managing your storage lifecycle is a great read on the AWS Documentation website.

How to Use TTL in Databases (AWS DynamoDB)

Some types of data in databases are prime candidates to have a TTL set on them. Pieces of data such as logs and analytics data may become stale very fast, and/or they may lose utility with time.

TTL in DynamoDB provides a cost-effective approach that lets you automatically remove items that are no longer relevant. It is supported natively and can be set on the whole DynamoDB table.

Here’s a code snippet that lets you set the TTL on a DynamoDB table (again, using Boto):

ddb_client = boto3.client('dynamodb')ttl_response = ddb_client.update_time_to_live(    TableName=your_table_name,    TimeToLiveSpecification={        'Enabled': True,        'AttributeName': your_ttl_attribute_name    })if ttl_response['ResponseMetadata']['HTTPStatusCode'] == 200:    print("Time To Live (TTL) has been successfully enabled.")else:    print(f"Failed to enable Time To Live (TTL)")

Here, the your_ttl_attribute_name attribute is the attribute that DynamoDB looks at to determine whether or not the item is to be deleted. The attribute is generally set to some timestamp in the future. When that timestamp is reached, DynamoDB removes the item from the table.

How to Use TTL in Event-Based Architecture

So far we’ve discussed Time To Live and where it can be useful. What about its implications? Lots of cloud based solutions provide notifications that can indicate that a piece of data has indeed reached it’s expiration, and allow you to take actions based on the expiration of that data.

Let’s look at a common use case. Suppose you have a social media app that you’re building that lets users send each other ephemeral messages. Now while the contents of these messages themselves are ephemeral, you may still want to retain a log of what users a particular user exchanged messages with, even though the contents of the message (audio/video/image) may have expired.

The diagram below explains a possible architecture in a little more detail:

ImageSocial Media App Architecture Example

Suppose a user exchanges messages with another user. An entry corresponding to a message is stored in ActiveMessageDB which, for the purpose of simplicity, we’ll suppose is a NoSQL database that stores messages.

If the app here allows for expiring messages, you could set a TTL on the entry. While the message entry itself is deleted after the TTL is reached, an event can be fired off to let a system know that the message is being deleted.

In the above diagram, the event is picked up by an AWS Lambda instance and a much smaller amount of data is written to another database MessageLogDB which isn’t as frequently accessed as ActiveMessageDB. What we just saw is an instance of event-based architecture being coupled with TTL.

Summary

  1. TTL is the amount of time a piece of data stays relevant or stays stored in a distributed system or a component of a distributed system.
  2. TTL makes sense in use cases where data can be deleted, can expire, or its form can change after a certain period of time.
  3. TTL is popular and generally easy to set on many distributed systems offerings.
  4. TTL can be paired with event driven architecture to transform data.