Category Archives: React

Screenshot of SOMA vision mixer

Compositing and mixing video in the browser

This blog post is the 4th part of our ongoing series working with the BBC Research & Development team. If you’re new to this project, you should start at the beginning!

BBC R&D logoLike all vision mixers, SOMA (Single Operator Mixing Application) has a “preview” and “transmission” monitor. Preview is used to see how different inputs will appear when composed together – in our case, a video input, a “lower third” graphic such as a caption which fades in and out, and finally a “DOG” such as a channel or event identifier shown in the top corner throughout a broadcast.

When switching between video feeds SOMA offers a fast cut between inputs or a slower mix between the two. As and when edit decisions are made, the resulting output is shown in the transmission monitor.

The problem with software

However, one difference with SOMA is that all the composition and mixing is simulated. SOMA is used to build a set of edit decisions which can be replayed later by a broadcast quality renderer. The transmission monitor is not just a view of the output after the effects have been applied as the actual rendering of the edit decisions hasn’t happened yet. The app needs to provide an accurate simulation of what the edit decision will look like.

The task of building this required breaking down how output is composed – during a mix both the old and new input sources are visible, so six inputs are required.

VideoContext to the rescue

Enter VideoContext, a video scheduling and compositing library created by BBC R&D. This allowed us to represent each monitor as a graph of nodes, with video nodes playing each input into transition nodes allowing mix and opacity to be varied over time, and a compositing node to bring everything together, all implemented using WebGL to offload video processing to the GPU.

The flexible nature of this library allowed us to plug in our own WebGL scripts to cut the lower third and DOG graphics out using chroma-keying (where a particular colour is declared to be transparent – normally green), and with a small patch to allow VideoContext to use streaming video we were off and going.

Devils in the details

The fiddly details of how edits work were as fiddly as expected: tracking the mix between two video inputs versus the opacity of two overlays appeared to be similar problems but required different solutions. The nature of the VideoContext graph meant we also had to keep track of which node was current rather than always connecting the current input to the same node. We put a lot of unit tests around this to ensure it works as it should now and in future.

By comparison a seemingly tricky problem of what to do if a new edit decision was made while a mix was in progress was just a case of swapping out the new input, to avoid the old input reappearing unexpectedly.

QA testing revealed a subtler problem that when switching to a new input the video takes a few tens of milliseconds to start. Cutting immediately causes a distracting flicker as a couple of blank frames are rendered – waiting until the video is ready adds a slight delay but this is significantly less distracting.

Later in the project a new requirement emerged to re-frame videos within the application and the decision to use VideoContext paid off as we could add an effect node into the graph to crop and scale the video input before mixing.

And finally

VideoContext made the mixing and compositing operations a lot easier than they would have been otherwise. Towards the end we even added an image source (for paused VTs) using the new experimental Chrome feature captureStream, and that worked really well.

After making it all work the obvious point of possible concern is performance, and overall it works pretty well.  We needed to have half-a-dozen or so VideoContexts running at once and this was effective on a powerful machine.  Many more and the computer really starts struggling.

Even a few years ago attempting this in the browser would have been madness, so its great to see such a lot of progress in something so challenging, and opening up a whole new range of software to work in the browser!


Sign pointing to Usability Lab

Rapid user research on an Agile project

Our timeline to build an in-browser vision mixer for BBC R&D (previously, previously) is extremely tight – just 2 months. UX and development runs concurrently in Agile fashion (a subject for a future blog post), but design was largely done within the first month.

Too frequently for projects on such timescales there is pressure to omit user testing in the interest of expediency. One could say it’s just a prototype, and leave it until the first trials to see how it performs, and hopefully get a chance to work the learnings into a version 2. Or, since we have weekly show & tell sessions with project stakeholders, one could argue complacently that as long as they’re happy with what they’re seeing, the design’s on track.

Why test?

But the stakeholders represent our application’s target users only slightly better than ourselves, which is not very well – they won’t be the ones using it. Furthermore, this project aims to broaden the range of potential operators – from what used to be the domain of highly experienced technicians, to something that could be used by a relative novice within hours. So I wanted to feel confident that even people who aren’t familiar with the project would be able to use it – both experts and novices. I’m not experienced in this field at all, so I was making lots of guesses and assumptions, and I didn’t want to go too far before finding they’re wrong.

One of the best things about working at the BBC is the ingrained culture of user centred design, so there was no surprise at the assumption that I’d be testing paper prototypes by the 2nd week. Our hosts were very helpful in finding participants within days – and with 100s of BBC staff working at MediaCity there is no danger of using people with too much knowledge of the project, or re-using participants. Last but not least, BBC R&D has a fully equipped usability lab – complete with two-way mirror and recording equipment. Overkill for my purposes – I would’ve managed with an ordinary office – but having the separate viewing room helped ensure that I got the entire team observing the sessions without crowding my subject. I’m a great believer in getting everyone on the project team seeing other people interact with and talk about the application.

Paper prototypes

Annoted paper prototyping test scriptPaper prototypes are A3 printouts of the wireframes, each representing a state of the application. After giving a brief description of what the application is used for, I show the page representing the application’s initial state, and change the pages in response to user actions as if it were the screen. (Users point to what they would click.) At first, I ask task-based questions: “add a camera and an audio source”; “create a copy of Camera 2 that’s a close-up”; etc. As we linger on a screen, I’ll probe more about their understanding of the interface: “How would you change the keyboard shortcut for Camera 1?”; “What do you think Undo/Redo would do on this screen?”; “What would happen if you click that?”; and so on. It doesn’t matter that the wireframes are incomplete – when users try to go to parts of the application that haven’t been designed yet, I ask them to describe what they expect to see and be able to do there.

In all, I did paper prototype testing with 6 people on week 2, and with a further 3 people on week 3. (With qualitative testing even very few participants tend to find the major issues.) In keeping with the agile nature of the project, there was no expectation of me producing a report of findings that everyone would read, although I do type up my notes in a shared document to help fix them in my memory. Rather, my learnings go straight into the design – I’m usually champing at the bit to make the changes that seem so obvious after seeing a person struggle, feeling really happy to have caught them so early on. Fortunately, user testing showed that the broad screen layout worked well – the main changes were to button labels, icon designs, and generally improved affordances.

Interactive prototypes

By week 4 my role had transitioned into front-end development, in which I’m responsible for creating static HTML mockups with the final design and CSS, which the developers use as reference markup for the React components. While this isn’t mainstream practice in our industry, I find it has numerous advantages, especially for an Agile project, as it enables me to leave the static, inexact medium of wireframes behind and refine the design and interaction directly within the browser. (I add some some dynamic interactivity using jQuery, but this is throwaway code for demo purposes only.)

The other advantage of HTML mockups is that they afford us an opportunity to do interactive user testing using a web browser, well before the production application is stable enough to test. Paper prototyping is fine up to a point, but you have plenty of limitations – for example, you can’t scroll, there are no mousever events, you can’t resize the screen, etc.

So by week 5 I was able to test nearly all parts of the application, in the browser, with 11 users. (This included two groups of 4, which worked better than I expected – one person manning the mouse and keyboard, but everyone in the group thinking out loud.) It was really good being able to see the difference that interactivity made, such as hover states, and seeing people actually trying to click or drag things rather than just saying what they’d do gave me an added level of confidence in my findings. Again, immediately afterwards, I made several changes that I’m confident improves the application – removing a redundant button that never got clicked, adding labels to some icons, strengthening a primary action by adding an icon, among others. Not to mention fixing numerous technical bugs that came up during testing. (I use Github comments to ensure developers are aware of any HTML changes to components at this stage.)

Never stop testing

Hopefully we’ll have time for another round of testing with the production application. This should give a more faithful representation of the vision mixing workflow, since in the mockups the application is always in the same state, using dummy content. With every test we can feel more confident – and our stakeholders can feel more confident – that what we’re building will meet its goals, and make users productive rather than frustrated. And on a personal level, I’m just relieved that we won’t be launching with any of the embarrassing gotchas that cropped up and got fixed during testing.

Read part 4 of this project working with BBC R&D where we talk about compositing and mixing video in the browser.

The challenges of mixing live video streams over IP networks

Welcome to our second post on the work we’re doing with BBC Research & Development. If you’ve not read the first post, you should go read that first 😉

Introducing IP Studio

BBC R&D logo

The first part of the infrastructure we’re working with here is something called IP Studio. In essence this is a platform for discovering, connecting and transforming video streams in a generic way, using IP networking – the standard on which pretty much all Internet, office and home networks are based.

Up until now video cameras have used very simple standards such as SDI to move video around. Even though SDI is digital, it’s just point-to-point – you connect the camera to something using a cable, and there it is. The reason for the remarkable success of IP networks, however, is their ability to connect things together over a generic set of components, routing between connecting devices. Your web browser can get messages to and from this blog over the Internet using a range of intervening machines, which is actually pretty clever.

Doing this with video is obviously in some senses well-understood – we’ve all watched videos online. There are some unique challenges with doing this for live television though!

Why live video is different

First, you can’t have any buffering: this is live. It’s unacceptable for everyone watching TV to see a buffering message because the production systems aren’t quick enough.

Second is quality. These are 4K streams, not typical internet video resolution. 4K streams have (roughly) 4000 horizontal pixels compared to the (roughly) 2000 for a 1080p stream (weirdly 1080p, 720p etc are named for their vertical pixels instead). this means they need about 4 times as much bandwidth – which even in 2017 is quite a lot. Specialist networking kit and a lot of processing power is required.

Third is the unique requirements of production – we’re not just transmitting a finished, pre-prepared video, but all the components from which to make one: multiple cameras, multiple audio feeds, still images, pre-recorded video. Everything you need to create the finished live product. This means that to deliver a final product you might need ten times as much source material – which is well beyond the capabilities of any existing systems.

IP Studio addresses this with a cluster of powerful servers sitting on a very high speed network. It allows engineers to connect together “nodes” to form processing “pipelines” that deliver video suitable for editing. This means capturing the video from existing cameras (using SDI) and transforming them into a format which will allow them to be mixed together later.

It’s about time

That sounds relatively straightforward, except for one thing: time. When you work with live signals on traditional analogue or point-to-point digital systems, then live means, well, live. There can be transmission delays in the equipment but they tend to be small and stable. A system based on relatively standard hardware and operating systems (IP Studio uses Linux, naturally) is going to have all sorts of variable delays in it, which need to be accommodated.

IP Studio is therefore based on “flows” comprising “grains”. Each grain has a quantum of payload (for example a video frame) and timing information. the timing information allows multiple flows to be combined into a final output where everything happens appropriately in synchronisation. This might sound easy but is fiendishly difficult – some flows will arrive later than others, so systems need to hold back some of them until everything is running to time.

To add to the complexity, we need two versions of the stream, one at 4k and one at a lower resolution.

Don’t forget the browser

Within the video mixer we’re building, we need the operator to be able to see their mixing decisions (cutting, fading etc.) happening in front of them in real time. We also need to control the final transmitted live output. There’s no way a browser in 2017 is going to show half-a-dozen 4k streams at once (and it would be a waste to do so). This means we are showing lower resolution 480p streams in the browser, while sending the edit decisions up to the output rendering systems which will process the 4k streams, before finally reducing them to 1080p for broadcast.

So we’ve got half-a-dozen 4k streams, and 480p equivalents, still images, pre-recorded video and audio, all being moved around in near-real-time on a cluster of commodity equipment from which we’ll be delivering live television!

Read part 3 of this project with BBC R&D where we delve into rapid user research on an Agile project.

MediaCity UK offices

Building a live television video mixing application for the browser

BBC R&D logoThis is the first in a series of posts about some work we are doing with BBC Research & Development.

The BBC now has, in the lab, the capability to deliver live television using high-end commodity equipment direct to broadcast, over standard IP networks. What we’ve been tasked with is building one of the key front-end applications – the video mixer. This will enable someone to mix an entire live television programme, at high quality, from within a standard web-browser on a normal laptop.

In this series of posts we’ll be talking in great depth about the design decisions, implementation technologies and opportunities presented by these platforms.

What is video mixing?

Video editing used to be a specialist skill requiring very expensive, specialist equipment. Like most things this has changed because of commodity, high-powered computers and now anyone can edit video using modestly priced equipment and software such as the industry standard Adobe Premiere. This has fed the development of services such as YouTube where 300 hours of video are uploaded every minute.

“Video Mixing” is the activity of getting the various different videos and stills in your source material and mixing them together to produce a single, linear output. It can involve showing single sources, cutting and fading between them, compositing them together, showing still images and graphics and running effects over them. Sound can be similarly manipulated. Anyone who has used Premiere, even to edit their family videos, will have some idea of the options involved.

Live television is a very different problem

First you need to produce high-fidelity output in real time. If you’ve ever used something like Premiere you’ll know that when you finally render your output it can take quite a long time – it can easily spend an hour rendering 20 minutes of output. That would be no good if you were broadcasting live! This means the technology used is very different – you can’t just use commodity hardware, you need specialist equipment that can work with these streams in realtime.

Second the capacity for screw up is immensely higher. Any mistakes in a live broadcast are immediately apparent, and potentially tricky to correct. It is a high-stress environment, even for experienced operators.

Finally, the range of things you might choose to do is much more limited, because you can spend little time setting it up. This means live television tends to use a far smaller ‘palette’ of mixing operations.

Even then, a live broadcast might require half a dozen people even for a modest production. You need someone to set up the cameras and control them, a sound engineer to get the sound right, someone to mix the audio, a vision mixer, a VT Operator (to run any pre-recorded videos you insert – perhaps the titles and credits) and someone to set up the still image overlays (for example, names and logos).

If that sounds bad, imagine a live broadcast away from the studio – the Outside Broadcast. All the people and equipment needs to be on site, hence the legendary “OB Van”:


Inside one of those vans is the equipment and people needed to run a live broadcast for TV. They’d normally transmit the final output directly to air by satellite – which is why you generally see a van with a massive dish on it nearby. This equipment runs into millions and millions of pounds and can’t be deployed on a whim. When you only have a few channels of course you don’t need many vans…

The Internet Steamroller

The Internet is changing all of this. Services like YouTube Live and Facebook Live mean that anyone with a phone and decent coverage can run their own outside broadcast. Where once you needed a TV network and millions of pounds of equipment now anyone can do it. Quality is poor and there are few options for mixing, but it is an amazingly powerful tool for citizen journalism and live reporting.

Also, the constraints of “channels” are going. Where once there was no point owning more OB Vans than you have channels, now you could run dozens of live feeds simultaneously over the Internet. As the phone becomes the first screen and the TV in the corner turns into just another display many of the constraints that we have taken for granted look more and more anachronistic.

These new technologies provide an opportunity, but also some significant challenges. The major one is standards – there is a large ecosystem of manufacturers and suppliers whose equipment needs to interoperate. The standards used, such as SDI (Serial Digital Interface) have been around for decades and are widely supported. Moving to an Internet-based standard needs cooperation across the industry.

BBC R&D has been actively working towards this with their IP Studio  project, and the standards they are developing with industry for Networked Media.

Read part 2 of this project with BBC R&D where I’ll describe some of the technologies involved, and how we’re approaching the project.