Video Quality of Experience and Rebuffers
In the early days of video streaming, viewers were willing to endure a frustrating playback experience to gain access to exclusive content. As the number of content providers sharing their content among multiple distributors has grown, Quality of Experience (QoE) has become vital to viewer retention.
Quality of Experience refers to the overall experience of a user watching a video stream. Unlike Quality of Service (QoS), QoE is subjective, which makes it difficult to measure and harder still to guarantee at a given level. QoE comprises many key performance indicators (KPIs) that video services track to understand their platform’s performance. These quality metrics can be broken down into specific areas of concern, such as rebuffering or excessive bitrate fluctuation.
Of the various metrics, rebuffering is the fault viewers notice most and find most annoying. That little spinning wheel is the symbol of a bad viewer experience. Video industry research consistently shows that viewers abandon a stream when they experience rebuffering. The blame for rebuffering and a degraded QoE can be difficult to pinpoint. The cause could lie with the viewer’s Internet Service Provider (ISP), the content delivery network (CDN), the client’s browser or player app, or the original publisher’s video infrastructure.
While problems with the ISP or the publisher are largely out of our control, we can now capture actionable data that enables us to identify and resolve QoE issues stemming from the CDN. To do this, we’ve developed an algorithm we call “Estimate Rebuffer” to identify video QoE issues using web server logs. This real-time monitoring system uses granular data to identify QoE issues and drill down to understand root causes and corresponding resolution actions. In this post, we’ll look at how this algorithm determines QoE problems and how we can use it to improve QoE.
The Estimate Rebuffer Algorithm Overview
One way to track QoE is for the player to send QoE data to the CDN. This requires players and clients to adopt a software development kit (SDK). Given the broad diversity of playback devices, client-side QoE metrics are nearly impossible to capture consistently. The Estimate Rebuffer algorithm removes the need for player/client changes or SDK adoption. It is an estimate because it works without information sent via beacons from the client side. However, because it covers the data centers and delivery networks, it provides much sharper insight into the root cause of QoE issues than client-side data alone.
The Estimate Rebuffer tool identifies QoE issues using server-side client access logs from the video services on our platform. To evaluate QoE, it uses three pieces of information:
A timestamp of when a client requested an asset/video-stream-chunk
The filename of the asset/video-stream-chunk
A session or client identifier
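As a minimal sketch of how these three fields might be organized before analysis, the snippet below groups requests by session and orders them by time. The `ChunkRequest` type and `group_by_session` helper are hypothetical names chosen for illustration; the actual log format is not described here.

```python
from collections import defaultdict
from typing import NamedTuple

class ChunkRequest(NamedTuple):
    """One access-log entry (hypothetical shape, for illustration only)."""
    timestamp: float   # request time in seconds
    filename: str      # name of the requested chunk, e.g. "A1.ts"
    session_id: str    # session or client identifier

def group_by_session(requests):
    """Return {session_id: [ChunkRequest, ...]} with each list sorted by time."""
    sessions = defaultdict(list)
    for req in requests:
        sessions[req.session_id].append(req)
    for reqs in sessions.values():
        reqs.sort(key=lambda r: r.timestamp)
    return dict(sessions)

log = [
    ChunkRequest(4.0, "A2.ts", "client-1"),
    ChunkRequest(0.0, "A1.ts", "client-1"),
    ChunkRequest(1.0, "B1.ts", "client-2"),
]
sessions = group_by_session(log)
print([r.filename for r in sessions["client-1"]])
```

Everything that follows operates on one session at a time, so this grouping step is the natural first pass over the logs.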
From this information, without the need for third-party tools, the Estimate Rebuffer algorithm can determine key elements that influence QoE, including the following:
Rebuffering — The algorithm details the number of rebuffers a client has seen, the duration of the rebuffer events, and the ratio of the rebuffering to the time spent watching the video stream.
Average bitrate — The video quality is a function of the video bitrate. A higher average bitrate means better video quality, clearer and crisper pictures, richer colors, and a better experience.
Rate of fluctuation — Viewers tend to respond negatively to fluctuations in bitrate, preferring a constant bitrate. This metric determines the number of times the video stream changes its quality.
Quality distribution — This allows us to determine what fraction of the video was served at what quality to a given client. For example, 80% was served at high quality, 10% medium, 10% low.
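To make the average bitrate and quality distribution metrics concrete, here is a small sketch. It assumes each requested chunk has already been labeled with a quality tier and that all chunks have the same duration. The tier names and the bitrate values in `BITRATE_KBPS` are illustrative assumptions, not figures from our platform.

```python
from collections import Counter

# Hypothetical bitrate ladder in kbps; real values depend on the encoding profile.
BITRATE_KBPS = {"low": 800, "medium": 1600, "high": 3200}

def quality_distribution(qualities):
    """Fraction of chunks served at each quality tier."""
    total = len(qualities)
    if total == 0:
        return {}
    return {tier: n / total for tier, n in Counter(qualities).items()}

def average_bitrate_kbps(qualities, ladder=BITRATE_KBPS):
    """Mean bitrate across chunks, assuming equal-duration chunks."""
    if not qualities:
        return 0.0
    return sum(ladder[tier] for tier in qualities) / len(qualities)

# Ten chunks: 80% high, 10% medium, 10% low, as in the example above.
served = ["high"] * 8 + ["medium"] + ["low"]
print(quality_distribution(served))
print(average_bitrate_kbps(served))
```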
How it works
How can the Estimate Rebuffer algorithm provide such a useful evaluation of QoE with just three pieces of information? Let’s take a look.
An adaptive bitrate (ABR) video stream comprises many individual video chunks or assets. Each chunk has a fixed duration, typically 4 seconds. For instance, a 40-second ABR video stream has 10 chunks (40/4 = 10 chunks).
Each chunk is named sequentially, for example, A1.ts, A2.ts, A3.ts…A10.ts, and so on. The first letter is the quality type. In our case: A is the lowest, B is higher than A, C is higher than B, and so on. With this knowledge, we look at requests from each client and check them in sequence. Each time the quality changes from one chunk to the next, for example, A1.ts, B2.ts, A3.ts, we count the change toward the rate of fluctuation metric.
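The fluctuation check can be sketched in a few lines. This assumes chunk names follow exactly the pattern described above (one quality letter, a sequence number, then ".ts"). The helper names `parse_chunk` and `count_fluctuations` are made up for this example.

```python
import re

# Assumed naming convention: quality letter + sequence number + ".ts"
CHUNK_RE = re.compile(r"^([A-Z])(\d+)\.ts$")

def parse_chunk(filename):
    """Split a chunk name such as "B2.ts" into (quality, sequence)."""
    match = CHUNK_RE.match(filename)
    if match is None:
        raise ValueError(f"unexpected chunk name: {filename!r}")
    return match.group(1), int(match.group(2))

def count_fluctuations(filenames):
    """Count quality changes between consecutive chunk requests."""
    qualities = [parse_chunk(name)[0] for name in filenames]
    return sum(1 for prev, cur in zip(qualities, qualities[1:]) if prev != cur)

# A -> B and B -> A are two separate quality changes.
print(count_fluctuations(["A1.ts", "B2.ts", "A3.ts"]))  # prints 2
```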
Since we know when a client requested each chunk and how long each chunk is (4 seconds), we can add up the duration of all the chunks requested to estimate how much video the player holds in its buffer. If the gap between two consecutive requests is longer than the video in the buffer, that is, the chunks the player has already requested but not yet played, we count it as a rebuffer. We also account for how much of the buffered video the client would have watched by the time it requests a new chunk from the CDN.
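One way to picture this logic is as a simple buffer simulation. The sketch below is a simplification, not our production implementation: it assumes chunks arrive instantly, that playback starts at the first request, and that watch time can be approximated as the number of chunks times the chunk duration. The function name `estimate_rebuffers` is hypothetical.

```python
CHUNK_SECONDS = 4.0  # chunk duration used in the text

def estimate_rebuffers(request_times, chunk_seconds=CHUNK_SECONDS):
    """Estimate stalls for one session from sorted request timestamps (seconds).

    Returns (rebuffer_count, rebuffer_seconds, rebuffer_ratio).
    """
    buffered = 0.0   # seconds of video requested but not yet played
    stalls = []      # duration of each estimated rebuffer event
    prev = None
    for t in request_times:
        if prev is not None:
            elapsed = t - prev
            if elapsed > buffered:
                # The player ran out of video before asking for more.
                stalls.append(elapsed - buffered)
                buffered = 0.0
            else:
                # Playback consumed part of the buffer during the gap.
                buffered -= elapsed
        buffered += chunk_seconds
        prev = t
    stall_seconds = sum(stalls)
    watch_seconds = len(request_times) * chunk_seconds
    ratio = stall_seconds / watch_seconds if watch_seconds else 0.0
    return len(stalls), stall_seconds, ratio

# Three quick requests build a 10-second buffer; the 13-second gap that
# follows outlasts it by 3 seconds, which counts as one rebuffer.
print(estimate_rebuffers([0.0, 1.0, 2.0, 15.0, 19.0]))
```

A real player also spends time downloading each chunk and may delay startup until several chunks are buffered, so the numbers here should be read as estimates.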
This algorithm is not exclusive to Verizon Media, now Edgio. It can be extended to other video services on the Edgio delivery network as long as they use a similar file naming convention.
With the QoE data in hand, we can improve QoE in several ways, including debugging specific issues and identifying underperforming networks. Once we identify QoE issues, we can easily dig deeper to understand why they happened.
When we see poor QoE, we can look at per-data-center QoE metrics to identify which data center observed it. Once we identify the data center, we can drill down to identify which network caused the problem, isolate causes, and recommend fixes. For example, one fix could be to avoid a specific network during the next live video stream if that network has been shown to be prone to failures. Usually, when we have rebuffering issues, we manually move traffic before the event begins to make room for new traffic. Based on the Estimate Rebuffer algorithm’s data, our traffic management team can create a pre-game buffer, moving traffic before the game starts to preempt capacity issues.
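The per-data-center drill-down can be illustrated with a small aggregation. This is only a sketch: the data center names, the input tuple shape, and the 2% threshold are all assumptions made for the example, not values we use in practice.

```python
from collections import defaultdict

def rebuffer_ratio_by_datacenter(sessions):
    """sessions: iterable of (datacenter, stall_seconds, watch_seconds)."""
    totals = defaultdict(lambda: [0.0, 0.0])  # datacenter -> [stall, watch]
    for datacenter, stall, watch in sessions:
        totals[datacenter][0] += stall
        totals[datacenter][1] += watch
    return {dc: (stall / watch if watch else 0.0)
            for dc, (stall, watch) in totals.items()}

def flag_unhealthy(ratios, threshold=0.02):
    """Data centers whose rebuffer ratio exceeds a hypothetical threshold."""
    return sorted(dc for dc, ratio in ratios.items() if ratio > threshold)

sessions = [
    ("dc-east", 0.0, 400.0),
    ("dc-east", 4.0, 400.0),
    ("dc-west", 30.0, 600.0),
]
ratios = rebuffer_ratio_by_datacenter(sessions)
print(flag_unhealthy(ratios))  # prints ['dc-west']
```

The same grouping could be keyed by network instead of data center to narrow down the cause one level further.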
And since the system can work in real-time, we can proactively take corrective actions during a live-streaming video. This could entail, for example, moving traffic from a data center experiencing poor QoE to a healthier data center. Real-time error detection and resolution is a highly effective tool for reducing the number of clients experiencing rebuffering or other issues.
Although eliminating the dreaded spinning wheel may never be possible, server-side analysis tools like the Estimate Rebuffer algorithm go a long way toward making its appearance much less frequent.