
Ambient Mesh: Can Sidecar-less Istio Make Applications Faster?

包阅AI Reading Summary

1. Keywords: Ambient Mesh, Istio, performance, applications, testing

2. Summary: This article examines the performance of Istio's ambient mode. Testing found that it can make applications slightly faster in some cases, and the article analyzes possible reasons, including ztunnel's optimizations and the enabling of the TCP_NODELAY flag.

3. Main points:

– Ambient mode is the sidecar-less data plane introduced in Istio in 2022; it reached Beta status in May of this year

– The test environment was a three-worker-node Kubernetes cluster with a specific configuration

– Initial test results: after adding the Bookinfo app to the ambient mesh, the impact on average or P90 latency was near zero, and the details service was slightly faster in ambient mode

– Load testing the details service: compared with no mesh, ambient mode improved average latency by 6-11%

– Why applications are sometimes faster in the ambient mesh:

– First theory: ztunnel's connection pooling and HTTP CONNECT were suspected, but this theory has challenges

– Second theory: a performance-related PR for the details service in the new Istio release improved its performance

– Third theory: ztunnel's excellent read/write buffer management and HTTP/2 multiplexing reduce overhead, and the Fortio service makes fewer syscalls, explaining the latency improvement and CPU reduction

Article URL: https://thenewstack.io/ambient-mesh-can-sidecar-less-istio-make-applications-faster/

Source: thenewstack.io

Author: Lin Sun

Published: 2024/8/19 22:54

Language: English

Word count: 2,432

Estimated reading time: 10 minutes

Rating: 88/100

Tags: Istio, ambient mesh, Kubernetes, performance testing, service mesh


The original article follows.


Ambient mode is the new sidecar-less data plane introduced in Istio in 2022. When ambient mode reached Beta status in May this year, I watched users kick the tires and run load tests to understand the performance implications after adding their applications to the mesh.

Inspired by Quentin Joly’s blog about the incredible performance of Istio in ambient mode and similar feedback from other users in the community that sometimes applications are slightly faster in ambient mode, I decided to validate these results myself.

Test Environment

I used a three-worker-node Kubernetes cluster, with 256GB of RAM and a 32-core CPU on each node.

Istio uses a few tools to make consistent benchmarking easy. First, we use a load-testing tool called Fortio, which runs at a specified number of requests per second (RPS), records a histogram of execution times and calculates percentiles — e.g., P99, the response time below which 99% of requests complete.
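For reference, here is what such a Fortio run can look like from the command line (a sketch; the in-cluster details URL is taken from the tests below, and the flags mirror this article's 100 RPS, 10-connection, 2,000-call setup):

# Sketch: 100 RPS over 10 connections for 2,000 calls, printing latency percentiles
fortio load -qps 100 -c 10 -n 2000 http://details:9080/details/0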

We also provide a sample app called Bookinfo, which includes microservices written in Python, Java, Node.js and Ruby.

Each of the Bookinfo deployments has two replicas, which are evenly distributed across the three worker nodes. Using a pod anti-affinity rule, I made sure that Fortio was placed on a different node than the details service, as sketched below.
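A minimal sketch of such an anti-affinity rule, assuming the standard Bookinfo app=details label (illustrative, not the exact manifest used here):

kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fortio
spec:
  replicas: 1
  selector:
    matchLabels:
      app: fortio
  template:
    metadata:
      labels:
        app: fortio
    spec:
      affinity:
        podAntiAffinity:
          # Never schedule Fortio on a node that already runs a details pod
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values: ["details"]
            topologyKey: kubernetes.io/hostname
      containers:
      - name: fortio
        image: fortio/fortio
EOF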

Initial Test Result

I installed the Bookinfo application from the Istio v1.22.3 release. Using the Fortio tool to drive load to individual Bookinfo services (for example, details) or the full Bookinfo app, I noticed near-zero latency impact after adding everything to the ambient mesh. Most of the time the results are within a 0-5% increase for the average or P90. I have consistently noticed that the details service in Istio ambient mode is slightly faster, just like Quentin reported in his blog.
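Enrolling the application in the ambient mesh is a single namespace label; a sketch, assuming Bookinfo runs in the default namespace:

# Sketch: enroll all workloads in the namespace in ambient mode (no restarts needed)
kubectl label namespace default istio.io/dataplane-mode=ambient

# On recent Istio releases, you can confirm the workloads are captured by ztunnel
istioctl ztunnel-config workload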

Load Testing the Details Service

I did the same test as Quentin, sending 100 RPS via 10 connections to the details service, and collected results for no mesh and ambient.

No Mesh: 100 RPS to the details service.

Ambient: 100 RPS to the details service.

Just like Quentin, I had to run multiple tests to validate that ambient mode is slightly more performant than no mesh — which is very hard to believe! In the case of the Bookinfo details service, adding ambient mode improved latencies by 6-11% on average — as well as adding mTLS and L4 observability!

Table 1: Fortio to the details service 100 RPS 10 connections.

Why Are Apps Sometimes Faster in the Ambient Mesh?

We’ve been taught that service meshes add latency. Quentin’s results, replicated here, show a case where a workload is faster when running through a service mesh. What is happening?

First Theory

When your applications are in the ambient mesh, load requests travel first through a lightweight local node proxy called ztunnel, then to the destination ztunnel, and onward to the service. The details service uses HTTP/1.1 with the WEBrick library in Ruby, and we have seen poor connection management and keep-alive behaviors in older or poorly configured HTTP libraries. My first hypothesis was that when the client and server are on different nodes, proxying through the client and server ztunnels could actually be faster if the applications are not using efficient HTTP/2 connections. Ztunnel uses connection pooling and HTTP CONNECT to establish secure tunnels between nodes, leveraging parallelism and HTTP/2 stream multiplexing under load.

However, this theory has some challenges. Why have I only observed this consistently with the details service but not any other Bookinfo services?

Researching further, I discovered that our Fortio load tool has connection keep-alive enabled by default. With 10 connections from Fortio to the details service, and the details service (using the WEBrick Ruby library) respecting the keep-alive settings, connections are reused effectively even without ambient.

Load Testing With Connection Close

Next, I ran the same load test with the Connection: close header set. This forcibly disables any HTTP connection pooling, which makes it a good way to test this hypothesis.

curl -v -d '{"metadata": {"url":"http://details:9080/details/0", "c":"10", "qps": "100", "n": "2000", "async":"on", "save":"on"}}' "localhost:8081/fortio/rest/run?jsonPath=.metadata" -H "Connection: close"
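Equivalently, if you run the Fortio CLI directly instead of its REST API, the same test might look like this (a sketch):

# Sketch: same load, with connection reuse disabled by the Connection: close header
fortio load -qps 100 -c 10 -n 2000 -H "Connection: close" http://details:9080/details/0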

No Mesh: Fortio to the details service 100 RPS 10 connections with connection close.

Ambient: Fortio to the details service 100 RPS 10 connections with connection close.

Table 2: Fortio to the details service 100 RPS 10 connections with connection close.

Compared with the Table 1 results, the Table 2 numbers have much higher response times, which is expected, as each connection is closed immediately after each response from the details service. Given that P50, P75, P90 and P99 are all slower in the ambient run with connection close, it seems safe to rule out the first theory, that connection pooling in ztunnel could make requests faster.

Second Theory

I noticed there is a performance-related PR from John Howard in the details and productpage services of the Bookinfo application in our new Istio v1.23 release. For the details service, the PR enabled the TCP_NODELAY flag for the details WEBrick server, which would reduce the unnecessary delay (up to 40ms) from the response time of the details service. For the productpage service, the PR enabled keep-alive on incoming requests, which will reuse existing incoming connections and thus improve performance.
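To illustrate the idea (a sketch, not the actual Bookinfo patch): enabling TCP_NODELAY on a server socket in Ruby disables Nagle's algorithm, so small header and body writes are sent immediately instead of potentially waiting up to ~40ms on a delayed ACK:

# Sketch only, not the actual PR: disable Nagle's algorithm on an accepted socket
ruby -e '
require "socket"
server = TCPServer.new(9080)
sock = server.accept
sock.setsockopt(Socket::IPPROTO_TCP, Socket::TCP_NODELAY, 1) # flush small writes immediately
sock.write("HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n")
sock.close
'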

With the newly updated details deployment that includes the fix, I repeated the same tests, sending 100 RPS via 10 connections to the details service. The results for no mesh and ambient are really close, so I ran each test three times to ensure the results were consistent. Below are screenshots of the first run for each scenario:

No Mesh: Fortio to the new details service 100 RPS 10 connections.

Ambient: Fortio to the new details service 100 RPS 10 connections.

I built a table for the three runs for each scenario:

Table 3: Fortio to the new details service 100 RPS 10 connections.

Compared with the previous results from Table 1, the no mesh numbers in Table 3 have improved quite a bit (more substantially at the higher percentiles than the ambient numbers) and are now closer to the ambient numbers. Ztunnel has TCP_NODELAY enabled by default, which contributed to ambient's performance advantage over no mesh in Table 1, when the old details service didn't have TCP_NODELAY enabled. With TCP_NODELAY enabled in the new details service, the ambient response times have also improved slightly.

Table 3 also shows there is not much difference in average, P50, P75 and P90 between the no mesh and ambient runs for this type of load testing against the new details service with TCP_NODELAY enabled. The differences between these runs are likely noise, with the exception of P99, where no mesh is consistently 8% or more slower.

Third Theory

Continuing to review the test results from Table 3: why would latency be similar between no mesh and ambient, when ambient adds extra hops through the ztunnel pods along with significant benefits such as mTLS and L4 observability between the Fortio and details services? And in the P99 case, why would the details service in ambient mode be consistently faster?

Ztunnel provides great read/write buffer management with HTTP/2 multiplexing, which can effectively minimize, or sometimes even eliminate, the overhead added by the extra hops through the client and server ztunnel pods. I decided to measure this at the syscall level using strace on both the Fortio and details services, by getting onto their Kubernetes worker nodes and attaching to the PIDs with strace while filtering out irrelevant traces:

strace -fp {pid} -e trace=write,writev,read,recvfrom,sendto,readv
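For example, on the worker node running Fortio, one way to find the process and attach (a sketch; the pgrep pattern is an assumption):

# Sketch: locate the Fortio process on the node and attach strace to it
PID=$(pgrep -f fortio | head -1)
strace -fp "$PID" -e trace=write,writev,read,recvfrom,sendto,readv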

The strace output from the details service is similar for the no-mesh and ambient cases:

read(9, "GET /details/0 HTTP/1.1\r\nHost: d"..., 8192) = 118

write(9, "HTTP/1.1 200 OK\r\nContent-Type: a"..., 180) = 180

write(9, "{\"id\":0,\"author\":\"William Shakes"..., 178) = 178

write(2, "192.168.239.19 - - [13/Aug/2024:"..., 80) = 80

Output 1: No mesh or ambient — attach strace to the details service’s PID.

The strace outputs from the Fortio service for no-mesh vs ambient are different. In the no-mesh case, we see Fortio executed two reads, one for the HTTP headers and another for the body.

read(13, "HTTP/1.1 200 OK\r\nContent-Type: a"..., 4096) = 180

read(13, "{\"id\":0,\"author\":\"William Shakes"..., 4096) = 178

write(19, "GET /details/0 HTTP/1.1\r\nHost: d"..., 118) = 118

Output 2: No mesh — attach strace to Fortio’s PID.

In the ambient case we consistently see just one read for both the headers and the body.

read(19, "HTTP/1.1 200 OK\r\nContent-Type: a"..., 4096) = 358

write(19, "GET /details/0 HTTP/1.1\r\nHost: d"..., 118) = 118

Output 3: Ambient mesh — attach strace to Fortio’s PID.

Why would this happen? It makes sense that the write calls are unchanged, since they depend entirely on application behavior, which has not changed in this case. Ambient coalesces the multiple application writes into a single network write, and by implication a single read on the peer.

In the test scenario above, I observed a 60% reduction in total syscalls by the Fortio service with ambient enabled. This is very substantial and explains the majority of the latency improvement, as well as the ~25% CPU reduction of the Fortio pod at peak time with ambient. The reduction in syscalls more than offsets the cost of mTLS and the other features of ztunnel. I expect this pattern to be quite common in enterprises, with some HTTP libraries and applications doing a better job of buffering and flushing than others. Often this correlates with the age of the applications and the SDKs they were built on.
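If you want to quantify such a reduction yourself, strace can also tally syscall counts (a sketch; attach during steady-state load and compare the totals between the no-mesh and ambient runs):

# Sketch: count syscalls during steady-state load; Ctrl-C prints the per-syscall summary
strace -c -f -p "$PID"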

No mesh and ambient runs: Fortio to the details service 100 RPS 10 connections.

What About the Entire Bookinfo Application?

With the newly updated details and productpage deployments, I started by sending 1000 RPS via 100 connections to the Bookinfo application and observed great results for both no mesh and ambient.
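A sketch of this heavier run, assuming the standard Bookinfo productpage entry point and a fixed duration:

# Sketch: full-app load through the Bookinfo entry point
fortio load -qps 1000 -c 100 -t 60s http://productpage:9080/productpage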

No Mesh: Fortio to the new Bookinfo app 1000 RPS 100 connections.

Ambient: Fortio to the new Bookinfo app 1000 RPS 100 connections.

Table 4: Fortio to the new Bookinfo app 1000 RPS 100 connections.

For comparison, I also ran the same test against the old Bookinfo sample shipped in v1.22.3, and you can see that the new Bookinfo delivered a 5-10X improvement in response times, with either no mesh or ambient!

Table 5: Fortio to the old Bookinfo app 1000 RPS 100 connections.

I then increased the load to 4000 RPS with 400 connections against the new Bookinfo deployments:

No Mesh: Fortio to the new Bookinfo app 4000 RPS 400 connections.

Ambient: Fortio to the new Bookinfo app 4000 RPS 400 connections.

The response times are still very good, way better than the old Bookinfo app with only 1000 RPS and 100 connections (Table 5):

Table 6: Fortio to the new Bookinfo app 4000 RPS 400 connections.

It is really nice to see that Bookinfo handles 4000 RPS without any errors, and that ambient mode is only about 3-4% slower than no mesh, with all the benefits of encryption in transit via mTLS and L4 observability. I recall I could only reach about 1200 RPS with the old Bookinfo app, which already resulted in a small percentage of errors. Now I can push loads to 4000 RPS or higher without errors.

Wrapping Up

Ambient mode at L4 introduces only a very tiny impact — and occasionally even an automatic improvement! — to users' application latencies. Combined with the simple UX of enrolling your application in ambient by labeling its namespace, without restarting any workloads, it provides the delightful experience for users that we intended when we initially named it ambient.

I would like to thank all of our Istio maintainers, who built such a delightful project, and the CNCF for providing the Istio project access to the infrastructure lab where I performed the tests. I would also like to thank Quentin Joly and the many users who gave me the "ambient is slightly faster than no mesh sometimes" feedback, which prompted me to run the benchmark tests above and experience the improvement, or the tiny latency impact, under load for myself.
