The Transmission Control Protocol (TCP) congestion control algorithm (CCA) governs how much data is sent between clients and servers to maximize the utilization of available bandwidth and avoid congestion. Since TCP's inception, other CCAs have been developed, such as Bottleneck Bandwidth and Round-trip propagation time (TCP BBR) and CUBIC. While both TCP BBR and CUBIC aim to avoid congestion, understanding their effectiveness has been an ongoing mission for our engineering and research teams.
TCP BBR aims to achieve higher throughput by using packet delay, rather than packet loss, as the congestion indicator. However, our previous research showed that BBR does not perform well in all cases. Specifically, our evaluation found little to no throughput benefit for small files (<100KB). Moreover, we observed that BBR performed worse than CUBIC for flows with low round-trip time (RTT) and few retransmits. Finally, the BBR benefits were only seen for client-facing traffic, while back-office connections (low RTT and negligible retransmits) performed better with CUBIC.
Edgecast, now Edgio, is a global multi-tenant CDN delivering web traffic for many large (VOD and live) video streaming customers. Given that congestion control tuning using BBR for one customer can adversely affect another customer’s performance, and that blanket enablement might degrade performance in some scenarios, we implemented a mechanism that detects cases where BBR provides improved performance and dynamically enables it for all CDN customers. The result is a new, dynamic congestion control tuning feature that we’ve made available to all our customers.
Insights into the methodology
Perhaps the most important input to such a dynamic system is the data that powers it. Our dynamic congestion control tuning mechanism sits on top of our large-scale socket data collection, xTCP, which exports TCP socket performance data from all the edge caches. Specifically, it extracts information from the Linux kernel’s tcp_info structure via NetLink and streams it via Kafka into a ClickHouse cluster. Having this socket performance data at scale allows us to monitor the performance of the connections to our cache servers at very high granularity. xTCP has proven to be a powerful tool for many CDN optimizations. For example, we recently tuned our IPv6 initial congestion window and monitored the performance gains using xTCP.
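For illustration, the sketch below reads a handful of tcp_info fields for a single socket in Python using getsockopt(TCP_INFO). xTCP itself harvests the same structure for every socket at once via the NetLink inet_diag interface, so treat this as a minimal stand-in rather than how xTCP actually collects data; the field offsets assume the struct tcp_info layout from recent linux/tcp.h headers.

```python
import socket
import struct


def read_tcp_info(sock: socket.socket) -> dict:
    """Fetch a few fields from the kernel's struct tcp_info for one socket.

    xTCP pulls this structure for *all* sockets via NetLink (inet_diag);
    getsockopt(TCP_INFO) shown here works per-socket only, but exposes the
    same fields (rtt, retransmits, snd_cwnd, ...) discussed in this post.
    """
    raw = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_INFO, 104)
    # Leading fields of struct tcp_info: eight u8s followed by u32s
    # (offsets per linux/tcp.h; they may differ on older kernels).
    fields = struct.unpack_from("=8B21I", raw)
    return {
        "state": fields[0],          # tcpi_state
        "ca_state": fields[1],       # tcpi_ca_state
        "retransmits": fields[2],    # tcpi_retransmits
        "snd_mss": fields[10],       # tcpi_snd_mss, bytes
        "lost": fields[14],          # tcpi_lost, packets
        "rtt_us": fields[23],        # tcpi_rtt, microseconds
        "snd_cwnd": fields[26],      # tcpi_snd_cwnd, packets
    }


if __name__ == "__main__":
    # Simple demonstration against a public host; any connected TCP socket works.
    with socket.create_connection(("example.com", 80)) as s:
        print(read_tcp_info(s))
```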
xTCP is similar to Google Measurement Lab’s (M-Lab) tcp-info tool, with significant differences stemming from the optimizations needed to handle the large number of sockets seen by our edge caches (compared to M-Lab servers) and the ability to export the data in protobuf format. Stay tuned: we plan to open source xTCP soon.
The following figure shows an overview of our system. xTCP data is collected at scale from all our edge caches and streamed into Kafka. The data then lands in a ClickHouse cluster, which powers our network data analytics, including the BBR controller that detects underperforming prefixes at each edge PoP.
Figure 1. Overview of the data collection using xTCP and the BBR configuration push.
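To give a flavor of what the BBR controller does with this data, here is a rough sketch of a bottom-20th-percentile query against ClickHouse using the clickhouse-driver Python client. The table and column names (xtcp_sockets, delivery_rate_bps, and so on) are hypothetical; the real xTCP schema is internal.

```python
from clickhouse_driver import Client  # pip install clickhouse-driver

# Hypothetical table and column names; the real xTCP schema is internal to Edgio.
QUERY = """
SELECT
    client_prefix,                                -- /24 (IPv4) or /56 (IPv6) aggregate
    quantile(0.5)(delivery_rate_bps) AS p50_rate  -- median delivery rate per prefix
FROM xtcp_sockets
WHERE pop = %(pop)s
  AND event_time >= now() - INTERVAL 5 MINUTE
GROUP BY client_prefix
"""


def bottom_20pct_prefixes(client: Client, pop: str) -> set:
    """Return prefixes whose median delivery rate sits in the bottom 20% at a PoP."""
    rows = client.execute(QUERY, {"pop": pop})
    if not rows:
        return set()
    rates = sorted(rate for _, rate in rows)
    cutoff = rates[int(0.2 * len(rates))]          # 20th-percentile cutoff
    return {prefix for prefix, rate in rows if rate <= cutoff}


if __name__ == "__main__":
    ch = Client(host="clickhouse.example.internal")  # hypothetical host
    print(bottom_20pct_prefixes(ch, "LAX"))
```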
While we want to maintain the dynamic nature of our workflow, we also need to select consistently underperforming prefixes at each edge point of presence (PoP) to avoid flip-flopping between CUBIC and BBR over short durations. And, as previously noted, we selectively activate BBR for requests where the file size is greater than 100KB. A fine-tuned CUBIC flow performs better for small files.
The BBR controller uses two metrics to assess the health of every observed client prefix:
- Duty Cycle: How long was a prefix (/24 or /56) in the bottom 20th percentile performance group?
- Flap Rate: How often does the prefix appear and disappear from the bottom 20th percentile performance group, i.e., change of state?
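As a rough illustration of how these two metrics can be computed, the sketch below works over a per-prefix history of bottom-20th-percentile membership, one boolean per 5-minute round. The thresholds are made up for the example and are not our production values.

```python
from typing import Sequence


def duty_cycle(history: Sequence[bool]) -> float:
    """Fraction of rounds the prefix spent in the bottom-20% performance group."""
    return sum(history) / len(history) if history else 0.0


def flap_rate(history: Sequence[bool]) -> float:
    """Fraction of consecutive rounds where the prefix changed state."""
    if len(history) < 2:
        return 0.0
    changes = sum(a != b for a, b in zip(history, history[1:]))
    return changes / (len(history) - 1)


def should_enable_bbr(history: Sequence[bool],
                      min_duty: float = 0.6,
                      max_flap: float = 0.3) -> bool:
    """Enable BBR only for consistently underperforming prefixes:
    high duty cycle, low flap rate (illustrative thresholds)."""
    return duty_cycle(history) >= min_duty and flap_rate(history) <= max_flap


# Example: one True/False entry per 5-minute round over the past few hours.
history = [True, True, True, False, True, True,
           True, True, True, True, True, True]
print(should_enable_bbr(history))   # True: mostly in the bottom group, few flaps
```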
The algorithm then selects prefixes that have consistently performed worse over the past few hours. This detection runs every 5 minutes. While the total number of prefixes selected per edge PoP can reach the hundreds, we observed that prefix performance remains relatively consistent: the same prefixes are regularly selected, and the new additions in each round (as shown in the following figure from the Chicago PoP) are very few.
Figure 2. The number of new prefixes selected per config generation round is low.
If any new prefixes are selected for BBR, a configuration is generated, passed through a validation step, and pushed out to our edge caches globally.
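The actual configuration format and push pipeline are internal, but conceptually the generated artifact is just a per-PoP list of prefixes for which the edge should switch to BBR, subject to the 100KB file-size threshold. A purely illustrative sketch:

```python
import ipaddress
import json
from datetime import datetime, timezone


def build_bbr_config(pop: str, prefixes: set, min_object_bytes: int = 100_000) -> str:
    """Build a per-PoP BBR enablement config (illustrative format, not Edgio's)."""
    # Validation step: reject anything that is not a well-formed prefix.
    validated = sorted(str(ipaddress.ip_network(p)) for p in prefixes)
    config = {
        "pop": pop,
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "congestion_control": "bbr",
        "min_object_bytes": min_object_bytes,   # BBR only for responses > 100KB
        "client_prefixes": validated,
    }
    return json.dumps(config, indent=2)


print(build_bbr_config("ORD", {"203.0.113.0/24", "2001:db8:1234:5600::/56"}))
```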
Performance gains
We are happy to report that enabling BBR across our edge worldwide has shown considerable performance improvements. A key metric we track from the xTCP socket data is the delivery rate reported in tcp_info. Since we dynamically enable BBR for the most underperforming prefixes, we expect our lower-percentile (worst case) delivery rate to improve.
The following figure shows the improvement in the 5th and 10th percentile delivery rate at our Los Angeles PoP as soon as the BBR change was enabled.
Figure 3. Improvements were observed in the 5th and 10th percentile delivery rates following BBR change.
Similarly, in the following figure, we show considerable improvement (~2x) in the lower percentile delivery rate for a large residential ISP in the U.S. as soon as we dynamically enabled BBR at all of our North American PoPs.
Figure 4. Improvements were observed for a large residential ISP after dynamically enabling BBR.
The delivery rate extracted from tcp_info provides a good estimate of the performance seen by the client. However, the most accurate performance indicator is the throughput seen in the HTTP access logs for the client connection, i.e., goodput.
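For clarity, goodput here is application-level bytes delivered divided by transfer time, computed from access-log fields rather than from the kernel. A minimal sketch, with hypothetical log field names:

```python
import statistics


def goodput_bps(bytes_sent: int, duration_ms: float) -> float:
    """Application-level throughput for one request: bytes delivered over wall time."""
    return bytes_sent * 8 / (duration_ms / 1000.0)   # bits per second


# Hypothetical access-log records: bytes served and total transfer time per request.
access_log = [
    {"bytes_sent": 2_500_000, "duration_ms": 1800},
    {"bytes_sent": 4_000_000, "duration_ms": 5200},
    {"bytes_sent": 150_000,   "duration_ms": 90},
]

rates = sorted(goodput_bps(r["bytes_sent"], r["duration_ms"]) for r in access_log)
# 10th-percentile goodput, the "worst case" tail tracked in Figure 5.
p10 = statistics.quantiles(rates, n=10)[0] if len(rates) >= 2 else rates[0]
print(f"p10 goodput: {p10 / 1e6:.1f} Mbit/s")
```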
We measure the goodput from an edge cache server. As shown in the following figure, the change resulted in increased goodput. Overall, the 10th percentile goodput increased by 12%.
Figure 5. The 10th percentile goodput increased by 12%.
Special thanks to the BBR development team at Google for their amazing work on BBRv1 and their continued effort on BBRv2. We look forward to BBRv2 and will continue to push relevant changes to our platform. Kudos to Sergio Ruiz, Joseph Korkames, Joe Lahoud, Juan Bran, Daniel Lockhart, Ben Lovett, Colin Rasor, Mohnish Lad, Muhammad Bashir, Zach Jones, and Dave Andrews at Edgecast for supporting this change during development, testing, and rollout. The Edgio engineering team would especially like to thank Dave Seddon for contributing to the development of the xTCP tool that powered much of this analysis.
With dynamic congestion control tuning, Edgio customers now automatically gain performance improvements for their underperforming clients, improving bottom-line performance and resulting in faster web delivery and fewer rebuffers for video streaming.