Posted in

Bigtable 分布式计数_AI阅读总结 — 包阅AI

包阅导读总结

1. 关键词:Bigtable、计数、工作流、使用案例、优势

2. 总结:文本介绍了 Bigtable 在分布式计数方面的应用,包括其能优化分析工作流,列举了多种使用案例及带来的好处,还提到构建计数器的方法。

3. 主要内容:

– Bigtable 分布式计数

– 提供常见计数器示例,如跟踪社交平台的喜欢、不喜欢和浏览量

– 优化分析工作流

– 对比过去处理数据的方式,凸显 Bigtable 能即时更新计数的优势

– 示例使用案例

– 涵盖社交、Web 分析、AdTech、IoT 等多个领域

– 计数的优势

– 高效数据聚合

– 巨大吞吐量

– 全球一致性

– 构建首个计数器

– 以跟踪社交媒体帖子独特交互者为例,介绍创建聚合列族等操作

思维导图:

文章地址:https://cloud.google.com/blog/products/databases/distributed-counting-with-bigtable/

文章来源:cloud.google.com

作者:Bora Beran,Steve Niemitz

发布时间:2024/8/20 0:00

语言:英文

总字数:1300字

预计阅读时间:6分钟

评分:89分

标签:Bigtable,分布式计数器,数据聚合,实时分析,Google Cloud


以下为原文内容

本内容来源于用户推荐转载,旨在分享知识与观点,如有侵权请联系删除 联系邮箱 media@ilingban.com

Image provides an example of a counter that you have probably seen before that is tracking engagement as likes, dislikes, and views over time.

Let’s now take a look at how Bigtable can streamline your analytics workflows.

Example use cases

Imagine you’re running a social media app and need to track billions of interactions with content every day throughout the world, such as the total number of impressions, views, and clicks. This use case is not much different than tracking interactions with ads in marketing, product listings in retail or page views on any website. In the past, you had two main options. Either you could write each activity to the database and then run multiple queries to calculate the totals; that can be expensive, result in delayed metrics, and negatively impact serving performance during processing, even on specialized columnar systems. Or you could turn to a streaming framework, which, while capable of handling real-time data, has a complicated architecture with management overhead and significant additional costs.

With Bigtable, you can simply increment a counter for each activity as it occurs, which makes the total count available within Bigtable instantly. You can also decrement, even delete or reset counts, and the numbers will synchronize. The results can be read from a dashboard or application with single-digit millisecond latency, providing your global workforce, partners, advertisers, and customers with up-to-date key engagement metrics for various campaigns and marketing initiatives.

While seemingly simple, counting can be quite powerful. Consider these additional use cases:

  • Web analytics: Measure Unique visitors, daily/weekly/monthly active users.

  • AdTech: Track impressions, engagements, clicks for ads served.

  • IoT & telemetry: Aggregate sensor data, track device usage, or identify trends and anomalies.

  • Social media: Measure user interactions, analyze content popularity, or track trending topics.

  • Gaming: Monitor player activity, track achievements, or analyze game performance.

  • E-commerce: Track sales, inventory levels, or customer engagement metrics in real time.

Benefits

  • Efficient data aggregation: Automatically generate metrics in Bigtable on any table. Merge values (+/-) or delete/reset counters. Metrics can be stored across time windows and retrieved with low-latency lookups.

  • Massive throughput: Build metrics on streaming data with tens of thousands of mutations per second per Bigtable node even on a single metric.

  • Global consistency: Metrics can be consolidated throughout the globe and any inconsistencies are automatically resolved and won’t diverge (i.e. they’re eventually consistent).

Building your first counter

Building a counter in Bigtable is straightforward. For this walkthrough, let’s build on our previous example that is tracking billions of daily user interactions throughout the world. Let’s say we now want to track the unique individuals that are engaging with each post.

First, create an aggregate column family, specifying the desired aggregation function. This may be a SUM, MIN, MAX, or, for our example, the HyperLogLog (HLL) data type, which lets us approximately count by unique values. Bigtable relies on HLL++ sketch for this approximation.

In the code snippets below, we assume we have an existing table called posts where each row is a social media post that’s already storing the text copy, author, etc. We will add a counter directly to this table.

We will use Bigtable’s command line tool (cbt) to add a column family called count that has no garbage collection policy (never) and is of HLL type, that will accept integers as inputs (inthll).