The challenges of mixing live video streams over IP networks

Doug WinterFounder and CTO

The Stack

Welcome to our second post on the work we’re doing with BBC Research & Development. If you’ve not read the first post, you should go read that first 😉

Introducing IP Studio

The first part of the infrastructure we’re working with here is something called IP Studio. In essence this is a platform for discovering, connecting and transforming video streams in a generic way, using IP networking – the standard on which pretty much all Internet, office and home networks are based.

Up until now video cameras have used very simple standards such as SDI to move video around. Even though SDI is digital, it’s just point-to-point – you connect the camera to something using a cable, and there it is. The reason for the remarkable success of IP networks, however, is their ability to connect things together over a generic set of components, routing between connecting devices. Your web browser can get messages to and from this blog over the Internet using a range of intervening machines, which is actually pretty clever.

Doing this with video is obviously in some senses well-understood – we’ve all watched videos online. There are some unique challenges with doing this for live television though!

Why live video is different

First, you can’t have any buffering: this is live. It’s unacceptable for everyone watching TV to see a buffering message because the production systems aren’t quick enough.

Second is quality. These are 4K streams, not typical internet video resolution. 4K streams have (roughly) 4000 horizontal pixels compared to the (roughly) 2000 for a 1080p stream (weirdly 1080p, 720p etc are named for their vertical pixels instead). this means they need about 4 times as much bandwidth – which even in 2017 is quite a lot. Specialist networking kit and a lot of processing power is required.

Third is the unique requirements of production – we’re not just transmitting a finished, pre-prepared video, but all the components from which to make one: multiple cameras, multiple audio feeds, still images, pre-recorded video. Everything you need to create the finished live product. This means that to deliver a final product you might need ten times as much source material – which is well beyond the capabilities of any existing systems.

IP Studio addresses this with a cluster of powerful servers sitting on a very high speed network. It allows engineers to connect together “nodes” to form processing “pipelines” that deliver video suitable for editing. This means capturing the video from existing cameras (using SDI) and transforming them into a format which will allow them to be mixed together later.

It’s about time

That sounds relatively straightforward, except for one thing: time. When you work with live signals on traditional analogue or point-to-point digital systems, then live means, well, live. There can be transmission delays in the equipment but they tend to be small and stable. A system based on relatively standard hardware and operating systems (IP Studio uses Linux, naturally) is going to have all sorts of variable delays in it, which need to be accommodated.

IP Studio is therefore based on “flows” comprising “grains”. Each grain has a quantum of payload (for example a video frame) and timing information. the timing information allows multiple flows to be combined into a final output where everything happens appropriately in synchronisation. This might sound easy but is fiendishly difficult – some flows will arrive later than others, so systems need to hold back some of them until everything is running to time.

To add to the complexity, we need two versions of the stream, one at 4k and one at a lower resolution.

Don’t forget the browser

Within the video mixer we’re building, we need the operator to be able to see their mixing decisions (cutting, fading etc.) happening in front of them in real time. We also need to control the final transmitted live output. There’s no way a browser in 2017 is going to show half-a-dozen 4k streams at once (and it would be a waste to do so). This means we are showing lower resolution 480p streams in the browser, while sending the edit decisions up to the output rendering systems which will process the 4k streams, before finally reducing them to 1080p for broadcast.

So we’ve got half-a-dozen 4k streams, and 480p equivalents, still images, pre-recorded video and audio, all being moved around in near-real-time on a cluster of commodity equipment from which we’ll be delivering live television!

Read part 3 of this project with BBC R&D where we delve into rapid user research on an Agile project.

[Update: Now the project is done you can read the full case study on our website]

Back to The Stack