Datadog Brings Big Observability Directly to Your Phone: AI Reading Summary | 包阅AI

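包阅 Reading Summary

1. Keywords:

Datadog, Observability, On-Call, Phone, Alert

2. Summary:

Datadog's On-Call service, currently in beta, pushes observability data directly to the phone. It shortens the time needed to receive and act on alerts, adds functionality and streamlines on-call management, with the goal of resolving issues more efficiently and reducing alert fatigue.

3. Main points:

– Datadog is working to address the IT pain point of being on call at night or on weekends.

– Alerts used to be mostly pager alerts; with On-Call, the phone can receive calls, texts and other kinds of alerts.

– Users can access the Datadog backend directly and use extensive observability functionality from the phone.

– It offers features such as minimizing context switching and ensuring clear service and team ownership.

– Comparison with earlier tools.

– Previously the on-call experience could not be configured in Datadog; now it can be managed centrally on the phone.

– Less time is spent searching for solutions or finding the right person.

– Issues can be escalated or handed off.

– It helps reduce alert fatigue and streamlines on-call management.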

Article URL: https://thenewstack.io/datadog-brings-big-observability-directly-to-your-phone/

Source: thenewstack.io

Author: B. Cameron Gain

Published: 2024/7/3 14:35

Language: English

Word count: 807

Estimated reading time: 4 minutes

Score: 84

Tags: DevOps, Observability


Original article follows

NEW YORK - Having to be on call at night or on weekends has been one of the major pain points of IT for decades. Moving beyond page-only alerts that the recipient then has to act on elsewhere, Datadog is looking at adding more functionality to your phone. With its On-Call service, available in beta, you no longer just get an alert through a pager service like PagerDuty. You get a call, a text or another alert on your smartphone.

You can then access the Datadog backend directly. This means less time is lost between receiving the alert and having to go to your PC, or even to another application on your phone. The flow is streamlined so that you get the data you need directly on your phone.

You can take action and have a lot of observability functionality on your phone as well, allowing you to manage incidents as they occur. This integration provides the data you require, making the process more efficient and effective.

On-Call offers everything that a pager solution provides, allowing you to receive alerts on your phone, while also supporting functionality that can be run on Datadog, Daljeet Sandu, a project manager for Datadog, said during the keynote at DASH 2024, Datadog’s annual user conference here. “Essentially, you can run Datadog from your phone in ways you couldn’t before,” Sandu said.

A blog post Sandu co-wrote explained that On-Call enables users to:

  • Minimize context switching by consolidating monitoring, paging, and resolution into a single platform.
  • Ensure clear service and team ownership to break down knowledge silos and avoid confusion.
  • Implement intuitive scheduling and escalation policies for timely responses.
  • Gain actionable insights from pages with detailed analytics.

PagerDuty users could already configure on-call schedule rotations, including paging rotations. Within Datadog itself, however, configuring the on-call experience was not possible until now. Instead of using a different tool to manage the on-call experience, this can now be done on the phone, where all teams, countries and service catalogs are managed. Indeed, Datadog wasn’t delivering the alerts to users before. It generated them, but they were usually consumed by PagerDuty or another vendor, which would then call the phone to notify the user.

Metrics, traces, logs and dashboards are pushed directly to the phone. The product also provides context with an evaluation graph. “All the load-bearing charts that people use are now available in the mobile app,” Michael Whetten, vice president of project management, told me on the sidelines of the conference.

Better Sleep

The idea is for the right data to be sent to the right team members at the right time, all done in a centralized way through Datadog without bypassing the paging system. You might think that being always reachable would contribute to alert fatigue. For those who are “on call,” however, it helps reduce that fatigue. The aim is to improve context so that when you’re on call, you don’t have to look at different data sources on your PC or other devices. Instead, you have everything on your phone.

If you get that dreaded call at 3 a.m., you can quickly determine if you’re the right person to handle the issue. If not, you can alert the proper team who can remediate the problem, allowing you to go back to sleep. This streamlined approach ensures that you’re always connected to operations through Datadog’s observability platform. Ultimately, if configured properly, your time spent searching for solutions or finding the right person when issues occur should be minimized.

“Escalation or handoffs are now possible,” Whetten said. “Escalations to someone else can be done. This is new, so it can be checked right there on the phone if there’s an issue going on.” By looking through the dashboard in Datadog on the phone, the person on call can determine whether the issue is theirs to handle. If it is not, they can hand it off to someone else or to a team, or escalate it to a manager or someone more familiar with this kind of issue, Whetten said.

Managing the on-call rotation is important because if someone is on call, usually there’s someone behind them, Whetten said. “The person on call is not the only one on call. So, it is handed off to the right person,” he noted. “Usually, one would have to get up, get their laptop, acknowledge the page in PagerDuty, and then open up Datadog to figure out who the right person to call is. Here, the aim is to reduce the amount of work.”
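
To make the idea of a rotation with "someone behind" the primary responder concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not Datadog's implementation or API: the `Rotation` class, the `escalation_chain` function, the weekly shift length and every name in the example are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Rotation:
    """Hypothetical rotation: members take turns, one shift each."""
    members: list[str]
    start: datetime
    shift: timedelta = timedelta(days=7)  # assumed weekly shifts

    def on_call(self, at: datetime) -> str:
        # Whole shifts elapsed since the rotation started.
        index = (at - self.start) // self.shift
        return self.members[index % len(self.members)]


def escalation_chain(primary: Rotation, secondary: Rotation,
                     manager: str, at: datetime) -> list[str]:
    """Return who is paged, in order, if earlier responders do not acknowledge."""
    chain = [primary.on_call(at), secondary.on_call(at), manager]
    # Remove duplicates while keeping order, so nobody is paged twice.
    return list(dict.fromkeys(chain))


# Usage: who is paged for an incident at 3 a.m. UTC on July 3, 2024?
start = datetime(2024, 7, 1, tzinfo=timezone.utc)
primary = Rotation(["alice", "bob", "carol"], start)
secondary = Rotation(["bob", "carol", "alice"], start)
incident = datetime(2024, 7, 3, 3, 0, tzinfo=timezone.utc)
print(escalation_chain(primary, secondary, "dana", incident))
# prints ['alice', 'bob', 'dana']
```

Offsetting the secondary rotation by one position is one simple way to guarantee that the backup is never the same person as the primary in a given week.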
