包阅导读总结
1. 关键词:Datastream、Stream Recovery、Data Loss、Downtime、Log Position
2. 总结:文本介绍了 Datastream 的流恢复功能,包括在金融机构和在线零售商等场景中的应用,阐述了其带来的减少数据丢失、减少停机时间、简化恢复等好处,还说明了使用方式和在特定位置启动流的情况,目前该功能在谷歌云所有区域可用。
3. 主要内容:
– Datastream 流恢复场景
– 金融机构因硬件故障数据库实例故障切换,流恢复确保交易数据不丢失。
– 在线零售商网络中断,恢复后部分变化不可用,流恢复能尽快恢复复制。
– 流恢复的好处
– 减少数据丢失,应对多种事故导致的数据丢失。
– 减少停机时间,快速恢复流和继续摄取。
– 简化恢复,通过简单界面操作。
– 如何使用流恢复
– MySQL 和 Oracle 有多种选择,如重试、跳过当前位置等,还可指定特定日志位置。
– PostgreSQL 可创建新复制槽恢复。
– 从指定位置启动流
– 如数据库升级、迁移或结合特定时间的历史数据时,可通过流恢复 API 指定起始位置。
– 功能可用性
– 谷歌云所有区域的控制台和 API 中均可用。
思维导图:
文章地址:https://cloud.google.com/blog/products/data-analytics/datastream-gets-stream-recovery/
文章来源:cloud.google.com
作者:Etai Margolin,Sagi Yosefia
发布时间:2024/7/1 0:00
语言:英文
总字数:550字
预计阅读时间:3分钟
评分:83分
标签:数据分析,流式传输,Datastream,流恢复,数据库故障转移
以下为原文内容
本内容来源于用户推荐转载,旨在分享知识与观点,如有侵权请联系删除 联系邮箱 media@ilingban.com
Consider a financial institution that uses Datastream to replicate transaction data from their operational database to BigQuery for analytics. Due to a hardware failure, the primary database instance undergoes an unplanned failover to a replica. The replication pipeline in Datastream is broken because the original source is now unavailable. Stream recovery allows the replication to resume from the failover database instance, ensuring no transaction data is lost.
Or think of an online retailer that uses Datastream to replicate customer feedback to BigQuery for sentiment analysis using BigQuery ML. A prolonged network outage disrupts the connection to the source database. By the time network connectivity is restored, some of the changes are no longer available on the database server. In this case, stream recovery allows the user to quickly resume the replication from the first available log position. While some feedback may be lost, the retailer prioritizes having the most recent data for ongoing sentiment analysis and trend identification.
Benefits of stream recovery
Stream recovery offers a number of benefits, including:
-
Reduced data loss: Recover from data loss caused by database instance failovers, unintended log file deletion, and other incidents.
-
Reduced downtime: Minimize downtime by quickly recovering your stream and resuming ongoing CDC ingestion.
-
Simplified recovery: Easily recover your stream via a simple and intuitive interface.
How to use stream recovery
Stream recovery provides a few options for you to choose from, depending on the specific failure scenario and the availability of recent log files. For MySQL and Oracle, you can choose to retry from the current log position, skip the current position and stream from the next available position, or skip the current position and stream from the most recent position. You also have the option to provide a specific log position, e.g., the Log Sequence Number (LSN) or Change Sequence Number (CSN), for the stream to resume from, which gives you fine-grained control to ensure that no data is lost or duplicated in the destination.
For PostgreSQL sources, you can create a new replication slot in your PostgreSQL database and instruct Datastream to resume streaming from the new replication slot.
Starting a stream from a specified position
In addition to stream recovery, there are a number of scenarios where you may need to start or resume a stream from a specific log position. For example, if the source database is upgraded or migrated, or when historical data already exists in the destination and you’d like to combine CDC from a specific point in time (where the historical data ends). The stream recovery API can be used in these cases to specify a starting position before starting the stream.
Get started
Stream recovery is now generally available in the Google Cloud console and API for all available Datastream sources in all Google Cloud regions.
To learn more about stream recovery, please visit the Datastream documentation.