
Troubleshooting Google Cloud Load Balancing Backends | Google Cloud Blog (AI reading summary, 包阅AI)

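Reading guide summary

1.

Keywords: Google Cloud, Load Balancing, Health Check, Backend VM, Troubleshooting

2.

Summary: This article covers how to troubleshoot problems with Google Cloud load balancing backends, including possible root causes, checking the backend VM configuration, verifying service IP and port binding, verifying the local route, capturing and analyzing traffic, consulting the documentation, and contacting Cloud Support.

3.

Main points:

– Symptom

– The probe result text is “Connection refused”, which indicates that the health check probe's SYN packet reached the backend VM but the connection was reset or refused.

– Possible root causes

– The port specified in the health check configuration is not open on the destination machine.

– The port is open, but its backlog of pending connections is full.

– The local firewall on the destination machine is blocking access.

– Backend VM configuration

– For proxy load balancers, verify that the specified port is open on the backend machine.

– For passthrough load balancers, test the service IP and port binding.

– Verify the local route, and add it manually if necessary.

– Capture and analyze traffic.

– Other

– Read the manual for a more complete understanding.

– Contact Cloud Support when necessary.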


Article URL: https://cloud.google.com/blog/products/networking/troubleshooting-google-cloud-load-balancing-backends/

Source: cloud.google.com

Author: Disha Madaan

Published: 2024/6/20 0:00

Language: English

Word count: 2282

Estimated reading time: 10 minutes

Score: 86

Tags: Networking, Developers & Practitioners


Original article content follows


We received a probeResultText of “Connection refused”, which indicates that the health check probe's SYN packet arrived at the backend VM, but the backend reset or refused the connection instead of replying with a SYN-ACK.

Possible root causes:

  • The port specified in the Health check configuration (port 80 in this example) is not open on the destination machine.

  • The port is open on the destination machine, but its backlog of pending connections is full.

  • The local firewall at the destination machine is blocking access.
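To tell a refused connection apart from a silently dropped one, it can help to reproduce the probe's TCP handshake from another machine in the same network. The following is a minimal sketch, not the health checker's actual implementation; the function name classify_tcp_probe and the 2-second timeout are illustrative assumptions.

```python
import errno
import socket


def classify_tcp_probe(host: str, port: int, timeout: float = 2.0) -> str:
    """Attempt one TCP connection and describe how the handshake ended.

    Hypothetical helper for illustration; it only mimics the TCP part of a probe.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect((host, port))
        return "open"      # SYN-ACK received, handshake completed
    except ConnectionRefusedError:
        return "refused"   # the peer answered the SYN with a reset
    except socket.timeout:
        return "timeout"   # no answer at all, for example packets dropped
    except OSError as exc:
        return "error: " + errno.errorcode.get(exc.errno, "unknown")
    finally:
        sock.close()
```

A result of "refused" matches the symptom described above, while "timeout" more often points to packets being dropped before they reach a listening socket.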

More details about other health checks:

Success criteria for gRPC

Success criteria for legacy health checks

3. Backend VM configuration

If the health check logs indicate that the backend is not responding to the health check (HC) probes, use this section to verify that the backend VM is configured correctly to handle them.

For proxy load balancers:

The following checks apply to Application Load Balancers:

  • Verify that the port specified in the Health check configuration (port 80 in this example) is open on the backend machine.

  • On Linux machines, run netstat -tulpn | grep <port-number>, replacing <port-number> with the port specified in the health check configuration.

  • For HTTP health checks, verify that the application running on the backend machine is responding on the configured HC path and port.
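The last check can be approximated from the backend VM itself. The sketch below sends a single GET request and treats only an HTTP 200 response as healthy, which is an assumption based on the documented success criteria for HTTP health checks; it does not replicate other probe settings such as expected response content. The function name http_health_check and the path /healthz used in the example are hypothetical.

```python
import http.client


def http_health_check(host: str, port: int, path: str = "/", timeout: float = 5.0) -> bool:
    """Return True if a GET for path on host:port is answered with HTTP 200.

    Hypothetical helper: a rough local stand-in for an HTTP health check probe.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        return conn.getresponse().status == 200
    except (OSError, http.client.HTTPException):
        # Refused, timed out, or malformed response: treat as unhealthy.
        return False
    finally:
        conn.close()
```

For example, http_health_check("127.0.0.1", 80, "/healthz") run on the backend VM shows whether the application answers on the configured path and port, independently of the load balancer.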

For passthrough load balancers:

The following checks apply to passthrough Network Load Balancers. Packets sent to a passthrough Network Load Balancer will arrive at backend VMs with the destination IP of the load balancer itself. This type of load balancer is not a proxy, and this is expected behavior.

Verifying service IP and port binding

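The software running on the backend VM must be listening on (bound to) the IP address of the load balancer’s forwarding rule, or on all addresses, and on the port that the load balancer forwards.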

To test this empirically, connect (via SSH or RDP) to a backend VM, then perform the following tests (using curl, telnet, or similar):

  • Attempt to reach the service by contacting it using the internal IP address of the backend VM itself, 127.0.0.1, or localhost.

  • Attempt to reach the service by contacting it using the IP address of the Load Balancer’s forwarding rule.

If you cannot reach the service using the internal IP address of the backend VM itself, 127.0.0.1, or localhost, then the problem is with the service software running on the backend VM. If you can reach the service using those addresses but not the IP address of the load balancer’s forwarding rule, then the software isn’t listening on (bound to) the load balancer’s IP address.

You can verify that the software running on the backend VM is properly bound by inspecting the output of this command on a Linux system: netstat -tulpn

If the software is bound to the wrong port, or is bound to just the IP address of the backend VM, you must reconfigure the software.
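Reading the netstat output by eye is error-prone on busy machines, so the binding check can be scripted. The following is a rough sketch that parses text in the usual netstat -tulpn column layout; the function names and the sample addresses are hypothetical, and treating the IPv6 wildcard (::) as also accepting IPv4 traffic is an assumption that depends on the socket's settings.

```python
def listening_addresses(netstat_output: str, port: int) -> list:
    """Return the local addresses with a TCP socket in LISTEN state on port."""
    addresses = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected columns: Proto Recv-Q Send-Q Local-Address Foreign-Address State ...
        if len(fields) < 6 or not fields[0].startswith("tcp") or fields[5] != "LISTEN":
            continue
        address, _, local_port = fields[3].rpartition(":")
        if local_port == str(port):
            addresses.append(address)
    return addresses


def accepts_load_balancer_traffic(addresses: list, lb_ip: str) -> bool:
    """True if a socket is bound to a wildcard address or to the load balancer IP."""
    wildcards = {"0.0.0.0", "::"}  # assumption: "::" also serves IPv4 clients
    return any(addr in wildcards or addr == lb_ip for addr in addresses)
```

If listening_addresses returns only the backend VM's own internal IP for the service port, the situation matches the misconfiguration described above: the service is reachable locally but not through the forwarding rule's address.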

Verify local route

Each backend VM must be configured to accept packets “sent to the load balancer” — that is, the destination of packets delivered is the IP address of the load balancer. Under most circumstances, this is implemented with a local route.

For VMs created from Google Cloud images, the Guest agent (formerly, the Windows Guest Environment or Linux Guest Environment) installs the local route for the Load Balancer’s IP address. GKE nodes based on COS implement this in iptables instead. If you are using a custom VM image, the local route is not installed by default.

On a Linux backend VM, you can verify the presence of the local route by running the following command, replacing [LOAD_BALANCER_IP] with the load balancer’s IP:

sudo ip route list table local | grep [LOAD_BALANCER_IP]

If the local route is missing, you can manually add the local route to the VMs:

sudo ip route add to local [LOAD_BALANCER_IP] dev [INTERFACE] proto 66

Replace [LOAD_BALANCER_IP] with the load balancer’s IP and [INTERFACE] with the interface where the route is missing.
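The same check can be scripted when many backends need to be inspected. The sketch below parses the text printed by ip route list table local and builds the argument list for the add command shown above; the function names are hypothetical and the code does not execute anything itself. It compares whole fields rather than substrings, because a plain grep for 10.128.0.10 would also match 10.128.0.100.

```python
def has_local_route(route_table_output: str, lb_ip: str) -> bool:
    """True if the output of 'ip route list table local' has a local route for lb_ip."""
    for line in route_table_output.splitlines():
        fields = line.split()
        # A matching entry looks like: local <ip> dev <interface> ...
        if len(fields) >= 2 and fields[0] == "local" and fields[1] == lb_ip:
            return True
    return False


def add_local_route_command(lb_ip: str, interface: str) -> list:
    """Build the argument list for the manual fix from the text; does not run it."""
    return ["sudo", "ip", "route", "add", "to", "local", lb_ip,
            "dev", interface, "proto", "66"]
```

The returned list can be passed to subprocess.run on the backend VM once has_local_route has confirmed that the route is actually missing.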

If you are running a Google-provided image, check why the Guest Agent was unable to install the local route. Check whether the Guest Agent is running. If it is not installed, install it.

There is a known issue in which the registered local routes are deleted when the NIC is reset by a systemd-networkd restart (this affects systemd-based systems, including Ubuntu). It can happen if the instance has automatic updates enabled (apt upgrade), still runs an old version of the Guest Agent, and an update requires a restart of systemd-networkd. Update the Guest Agent to version 20210408.00 or newer to prevent this issue. Detailed information: GitHub Issue and GitHub PR
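Whether an installed Guest Agent is at or past the fixed release can be checked by comparing version strings. This is a small sketch that assumes versions follow the YYYYMMDD.NN pattern used in the text, possibly wrapped in packaging prefixes or suffixes; the function names and that wider format are assumptions, not taken from the source.

```python
import re

# First release that contains the fix, according to the text above.
FIXED_VERSION = (20210408, 0)


def parse_guest_agent_version(text: str) -> tuple:
    """Extract (date, revision) from a string such as '20210408.00'."""
    match = re.search(r"(\d{8})\.(\d+)", text)
    if match is None:
        raise ValueError("no YYYYMMDD.NN version found in: " + repr(text))
    return int(match.group(1)), int(match.group(2))


def has_local_route_fix(version_text: str) -> bool:
    """True if the version is 20210408.00 or newer."""
    return parse_guest_agent_version(version_text) >= FIXED_VERSION
```

The version string itself would come from the system's package manager; how it is obtained is left out here because it differs between distributions.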

Perform traffic capture and analysis:

You can use packet capture to inspect the communication between the health check probes and the backend VM. Packet capture can be done with tcpdump as follows:

  1. Install tcpdump on the backend VM.

  2. Start tcpdump capture.

  3. Analyze the tcpdump output to identify the problem.
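The capture step benefits from a filter that keeps only probe traffic. The sketch below builds a tcpdump argument list restricted to the health check port and to the probe source ranges 35.191.0.0/16 and 130.211.0.0/22; those ranges come from the Google Cloud health check documentation rather than from this article, and some load balancer types use additional ranges, so treat them as an assumption to verify. The function name, the interface name ens4 in the example, and the packet count are hypothetical.

```python
# Assumed probe source ranges; confirm them for your load balancer type.
HEALTH_CHECK_RANGES = ("35.191.0.0/16", "130.211.0.0/22")


def tcpdump_command(interface: str, port: int,
                    source_ranges: tuple = HEALTH_CHECK_RANGES,
                    packet_count: int = 50) -> list:
    """Build a tcpdump argument list that captures health check traffic.

    'net' (not 'src net') is used so that the backend's replies are captured too.
    """
    networks = " or ".join("net " + cidr for cidr in source_ranges)
    capture_filter = "tcp port {} and ({})".format(port, networks)
    return ["sudo", "tcpdump", "-n", "-i", interface,
            "-c", str(packet_count), capture_filter]
```

Running the result of tcpdump_command("ens4", 80) on the backend VM shows each probe's SYN and the backend's answer. A reset sent in reply to the SYN corresponds to the “Connection refused” result discussed earlier, while SYNs with no reply at all suggest that packets are being dropped on the VM.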

4. Read the manual

The troubleshooting steps in the sections above can help identify the reasons for health check failures. To gain a comprehensive understanding of health checks, including their purpose, mechanisms, operational dynamics, limitations, and supported configurations, you should always consult the Google Cloud documentation.

Working with Cloud Support: Once you have pinpointed and analyzed the issue, you may need to reach out to Cloud Support for further assistance. To facilitate a smooth experience, be sure to explain your needs, clearly describe the business impact, and give enough context with all the information you have collected.