Category Archives: React

SOMA in use during the 2017 Edinburgh Festival

Live video mixing with the BBC: Lessons learned

In this post I am going to reflect on some of the more interesting aspects of this project and the lessons they might provide for other projects.

This post is one of a series talking about our work on the SOMA video mixing application for the BBC. The previous posts in the series are:

  1. Building a live television video mixing application for the browser
  2. The challenges of mixing live video streams over IP networks
  3. Rapid user research on an Agile project
  4. Compositing and mixing video in the browser
  5. Taming the Async Beast with FRP and RxJS
  6. RxJS: An object lesson in terrible good software
  7. Video: The future of TV broadcasting
  8. Integrating UX with agile development

In my view there are three broad areas where this project has some interesting lessons.

Novel domains

First is the novel domain.

This isn’t unfamiliar – we often work in novel domains that we have little to no knowledge of. It is the nature of working as a technical agency, in fact – while we have some domains that we’ve worked in for many years, such as healthcare and education, there are always novel businesses with entirely new subjects to wrap our heads around. (To give you some idea, a few recent examples include store-and-forward television broadcasting, horse racing odds, medical curricula, epilepsy diagnosis, clustering automation and datacentre hardware provisioning.)

Over the years this has been the thing that I have most enjoyed out of every aspect of our work. Plunging into an entirely new subject with a short amount of time to understand it and make a useful contribution is exhilarating.

Although it might sound a bit insane to throw a team who know nothing about a domain at a problem, what we’re very good at is designing and building products. As long as our customers can provide the domain expertise, we can bring the product build. It is easier for us to learn the problem domain than it is for a domain expert to learn how to build great products.

The greatest challenge with a new domain is the assumptions. We all have these in our work – the things we think are so well understood that we don’t even mention them. These are a terrible trap for software developers, because we can spend weeks building completely the wrong thing with no idea that we’re doing so.

We were very lucky in this respect to be working with a technical organisation within the BBC: Research & Development. They were aware of this risk and did a very good job of arranging our briefing, which included a visit to a vision mixing gallery. This is the kind of exercise that delivers a huge amount in tacit understanding, and allows us to ask the really stupid questions in the right setting.

I think of the core problem as a “Rumsfeld”. Although he got a lot of criticism for these comments, I think they’re bizarrely insightful. There really are unknown unknowns, and what the hell do you do about them? You can often sense that they exist, but how do you turn them into known unknowns?

For many of these issues the challenge is not the answer, which is obvious once it has been found, but facilitating the conversation to produce the answer. It can be a long and frustrating process, but critical to success.

I’d encourage everyone to get the software team into the existing environment of the target stakeholder groups, to try and understand at a fundamental level what they need.

The Iron Triangle

The timescale for this project was extraordinarily difficult – nine weeks from a standing start. In addition, much of the scope was quite fixed – we were largely building core functionality that, if missing, would have rendered the application useless. On top of that, we wanted to achieve the level of UX finish that we generally deliver.

This was extremely ambitious, and in retrospect we bit off more than we could reasonably chew.

Time is the greatest enemy of software projects because of the challenges in estimation. For reasons covered in a different blog post, estimation for software projects is somewhere between an ineffable art reserved only for the angels, and completely impossible.

When estimates are impossible, time becomes an even greater challenge. One of the truisms of our industry is the “Iron Triangle” of time, scope and quality. Like a good Chinese buffet, you can only choose two. If you want a fixed time and scope, it is quality that will suffer.

Building good software takes thought and planning. Also, the first version of a component is rarely the best – it takes time to assemble it, consider it, and then perhaps shape it into something near its final form.

Quality is, itself, an aggregate property. Haste lowers the standard of each part and so, by a process of multiplication, lowers the overall quality of the product far more. The only way to achieve a very high quality finished product is for every single part to be of similarly high quality. This is generally our goal.

However. Whisper it. It is possible to “manage” quality, if you understand your process and know the goal. Different kinds of testing can provide different levels of certainty of code quality. Manual testing, when done exhaustively, can substitute in some cases for perfection in code.

We therefore managed our quality, and I think actually did well here.

Asynchronous integration components had to be absolutely perfect, because any bugs would result in a general lack of stability that would be impossible to trace. The only way to build these is carefully, with a design, and the only way to test them is exhaustively, with unit and integration tests.

On the other hand, there were a lot of aspects of the UI where it was crucial that they performed and looked excellent, but the code could be rougher around the edges, and could just be hacked out. This was my area of the application, and my goal was to deliver features as fast as possible at just-acceptable quality. Some of the code was quite embarrassing, but we got the project over the line on time, with the scope intact, and it all worked. That was sufficient for those areas.

Experimental technologies

I often talk about our approach using the concept of an innovation curve, and our position on it (I think I stole the idea from Ian Jindal – thanks Ian!).

Imagine a curve where the X axis is “how innovative your technologies are” and the Y axis is “pain”.

In practical terms this can be translated into “how likely I am to find the answer to my problems on Stack Overflow“.

At the very left, everything has been seen and done before, so there is no challenge from novelty – but you are almost certainly not making the most of available technologies.

At the far right, you are hand crafting your software from individual photons and you have to conduct high-energy physics experiments to debug your code. You are able to mould the entire universe to your whim – but it takes forever and costs a fortune.

There is no correct place to sit on this curve – where you sit is a strategic (and emotional) decision that depends on the forces at play in your particular situation.

Isotoma endeavours to be somewhere on the shoulder of the curve. The software we build generally needs to last 5+ years, so we can’t pick flash-in-the-pan technologies that will be gone in 18 months. But similarly we need to choose relatively recent technologies so our work doesn’t quickly become obsolete. This is sometimes called “leading edge”. Almost bleeding edge, but not so close you get cut. With careful choice of tools it is possible to maintain a position like this successfully.

This BBC project was off to the right of this curve, far closer to the bleeding edge than we’d normally choose, and we definitely suffered.

Some of the technologies we had to use had some serious issues:

  1. To use IP Studio, a properly cutting-edge product developed internally by BBC R&D, we routinely had to read its C++ source code to find answers to integration questions.
  2. We needed dozens of coordinated asynchronous streams running, for which we used RxJS. This was interesting enough to justify two posts on this blog on its own.
  3. WebRTC, which was the required delivery mechanism for the video, is absolutely not ready for this use case. The specification is unclear, browser implementation is incomplete and it is fundamentally unsuited at this time to synchronised video delivery.
  4. The video compositing technology in browsers actually works quite well, but it was entirely new to us and it took considerable time to gain sufficient expertise to do a good job. Also, browser implementations still have surprising sharp edges (only 16 WebGL contexts are allowed! Why 16? I dunno.)

Any one of these issues could have sunk our project, so I am very proud that we shipped good software in spite of all four.

Lessons learned? Task allocation is the key to this one I think.

One person, Alex, devoted his time to the IP Studio and WebRTC work for pretty much the entire project, and Ricey concentrated on video mixing.

Rather than try and skill up several people, concentrate the learning in a single brain. Although this is generally a terrible idea (because then you have a hard dependency on a single individual for a particular part of the codebase), in this case it was the only way through, and it worked.

Also, don’t believe any documentation, or in fact anything written in any human languages. When working on the bleeding edge you must “Use The Source, Luke”. Go to the source code and get your head around it. Everything else lies.

Summary

I am proud, justifiably I think, that we delivered this project successfully. It was used at the Edinburgh Festival and actual real live television was mixed using our product, given all the constraints above.

The lessons?

  1. Spend the time and effort to make sure your entire team understand the tacit requirements of the problem domain and the stakeholders.
  2. Have an approach to managing appropriate quality that delivers the scope and timescale, if these are heavily constrained.
  3. Understand your position on the innovation curve and choose a strategic approach to managing this.

The banner image at the top of the article, taken by Chris Northwood, shows SOMA in use during the 2017 Edinburgh Festival.

The future of TV broadcasting

Earlier this year we started working on an exciting project with BBC Research & Development on their long term programme developing IP Studio – the next generation IP-network based broadcast television platform. The BBC are developing new industry-wide standards working with manufacturers which will hopefully be adopted worldwide.

This new technology dramatically changes the equipment needed for live TV broadcasting. Instead of large vans stocked with pricey equipment and a team of people, shows can be recorded, edited and streamed live by a single person with a 4K camera, a laptop and internet access.

The freedom of being able to stream and edit live TV within the browser will open up endless possibilities for the entertainment and media sector.

You can see more Isotoma videos on Vimeo.

Video: Serverless – the real sharing economy

Serverless is a new application design paradigm, typified by services like AWS Lambda, Azure Functions and IBM OpenWhisk. It is particularly well suited to mobile software and Single-Page Application frameworks such as React.

In this video, Doug Winter talks at Digital North in Manchester about what Serverless is, where it comes from, why you would want to use it, how the economics function and how you can get started.

You can see more Isotoma videos on Vimeo.

RxJS: An object lesson in terrible good software

We recently used RxJS on a large, complex asynchronous project integrated with a big third-party distributed system. We now know more about it than, frankly, anyone would ever want to.

While we loved the approach, we hated the software itself. The reasons for this are a great lesson in how not to do software.

Our biggest issue by far with RxJS is that there are two actively developed, apparently stable versions at two different URLs. RxJS 4 is the top result on Google for RxJS and lives at https://github.com/Reactive-Extensions/RxJS, and it briefly mentions that there is an unstable version 5 at a different address. RxJS 5 lives at https://github.com/ReactiveX/rxjs and has a completely different API from version 4, completely different and far worse (“WIP”) documentation, doesn’t allude to its level of stability, and is written in TypeScript, so users will need to learn some TypeScript before they can understand the codebase.

Which version should new adopters use? I have absolutely no idea. Either way, when you google for advice and documentation, you can be fairly certain that the results you get will be for a version you’re not using.

RxJS goes to great lengths to swallow your errors. We’re pretty united here in thinking that it definitely should not. If an observable fires its “error” callback, it’s reasonable that the emitted error should be picked up by the nearest catch operator. Sadly, though, RxJS also wraps all of the functions that you pass to it in a try/catch block, and any exception raised by those functions will also be shunted to the nearest catch operator. Promises do this too, and many have complained bitterly about it already.
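
To make the swallowing behaviour concrete, here is a minimal sketch – assuming the RxJS 5 “kitchen sink” build, which patches the dot-chained operators onto Observable – of an exception thrown inside user code being re-routed to the error callback instead of surfacing where it was thrown:

const { Observable } = require('rxjs/Rx');

Observable.of(1, 2, 3)
  .map(x => x.missing.property)  // throws a TypeError on the first value
  .subscribe({
    next: val => console.log('Next:', val),
    error: err => console.error('Re-routed to the error callback:', err),
    complete: () => console.log('Completed!')
  });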

What this means in practice is that finding the source of an error is extremely difficult. RxJS tries to capture the original stack trace and make it available in the catch block, but it often fails, resulting in a failed observable and an “undefined” error. When my code breaks I’d like it to break where it broke, not in a completely different place. If I expect an error to occur I can catch it as I would anywhere else in the codebase and emit an observable error of the form that I’d expect in my catch block, so that my catch blocks don’t all have to accommodate both expected failure modes and arbitrary exceptions. Days and days of development were lost to bisecting a long pile of dot-chained functions in order to isolate the one that raised the (usually stupidly trivial) error.

At the very least, it’d be nice to have the choice to use an unsafe observable instead. For this reason alone we are unlikely to use RxJS again.

We picked RxJS 5 as it’s been around for a long time now and seems to be maintained by Netflix, which is reassuring.

The documentation could be way better. It is incomplete, with some methods not documented at all, and it is presented as a mystery-meat web application that can’t be searched like normal technical documentation. Code examples rarely use real-world use-cases, so it’s tough to see the utility of many of the Observable methods. Most of the gotchas that caught out all of our developers weren’t alluded to at any point in any of the documentation (in the end, a YouTube talk by the lead developer saved the day, containing the first decent explanation of the error handling mechanism that I’d seen or read). Worst of all, the only documentation that deals with solving actual problems with RxJS (higher-order observables) is in the form of videos on the paywalled egghead.io. I can’t imagine a more effective way to put off new adopters than requiring them to pay $200 just to appreciate how the library is commonly used (though, to be clear, I am a fan of egghead).

Summed up best by this thread, RxJS refuses to accept its heritage and admit that it’s a functional library. Within the javascript community there exists a huge functional programming subcommunity that has managed to put together a widely-adopted specification for writing functional javascript libraries that can interoperate with the rest of the available javascript functional libraries. RxJS chooses not to work to this specification, and a number of design decisions – such as introspecting certain contained values and swallowing them – drastically reduce the ease with which RxJS can be used in functional javascript codebases.

RxJS makes the same mistake lodash did a number of years ago, regularly opting for variadic arguments to its methods rather than taking arrays (the worst example is merge). Lodash did eventually learn its lesson; I hope RxJS does too.
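
To illustrate the point (the example streams are purely illustrative), Observable.merge takes its sources as separate arguments, so merging a dynamic list of streams means spreading an array into the call:

// Variadic: each source is a separate argument
const clicks$ = Observable.fromEvent(window, 'click');
const moves$ = Observable.fromEvent(window, 'mousemove');
const touches$ = Observable.fromEvent(window, 'touchstart');

Observable.merge(clicks$, moves$, touches$).subscribe(ev => console.log(ev.type));

// Merging an array of streams therefore needs a spread
const streams = [clicks$, moves$, touches$];
Observable.merge(...streams).subscribe(ev => console.log(ev.type));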

Taming the Async Beast with FRP and RxJS

The Problem

We’ve recently been working on an in-browser vision mixer for the BBC (previous blog posts here, here, here, and here). Live vision mixing involves keeping track of a large number of interdependent data streams. Our application receives timing data for video tapes and live video streams via WebRTC data channels and websocket connections, and we’re sending video and audio authoring decisions over other websockets to the live rendering backend.

Many of the data streams we’re handling are interdependent: we don’t want to send an authoring decision telling the renderer to cut to a video tape until that tape is loaded and ready to play, so we need to wait for it to be ready before sending the decision; and if the authoring websocket has closed we’ll need to reconnect to it and then retry sending that decision.

Orchestrating interdependent asynchronous data streams is a fundamentally complex problem.

Promises are one popular solution for composing asynchronous operations and safely transforming the results; however, they have a number of limitations. The primary issue is that they cannot be cancelled, so we need to handle teardown separately somehow. We could use the excellent fluture or Task Future libraries instead, both of which support cancellation (and are lazy and chainable and fantasy-land compliant), but futures and promises handle one single future value (or error value), not a stream of many values. The team working on this project are fans of futures (less so of promises) and were aiming to write the majority of the codebase in a functional style using folktale and ramda (and react-redux), so we wanted a functional, composable way to handle ongoing streams of data that could sit comfortably within the rest of the codebase.
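
As a rough illustration of the cancellation problem – the websocket and the message shape here are made up for the example – a Promise that resolves when a tape reports itself ready offers no way to tear down its listener if the operator changes their mind:

// authoringSocket and sendCutDecision are hypothetical stand-ins
const tapeReady = new Promise(resolve => {
  authoringSocket.addEventListener('message', msg => {
    if (JSON.parse(msg.data).state === 'ready') resolve();
  });
});

// There is no tapeReady.cancel() – the listener stays attached, and any teardown
// has to be wired up separately by hand.
tapeReady.then(() => sendCutDecision());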

A Solution

After some debate, we decided to use FRP (functional reactive programming) powered by the observable pattern. Having used RxJS (with redux-observable) for smaller projects in the past, we were confident that it could be an elegant solution to our problem. You can find out more about RxJS here and here but, in short, it’s a library that allows subscribers to listen to and transform the output of a data stream as per the observer pattern, and allows the observable (the thing subscribed to) to “complete” its stream when it runs out of data (or whatever), similar to an iterator from the iterator pattern. Observables also allow their subscribers to terminate them at any point, and typically observables will encapsulate teardown logic related to their data source – a websocket, long-poll, webrtc data channel, or similar.

RxJS implements the observer pattern in a functional way that allows developers to compose together observables, just as they’d compose functions or types. RxJS has its roots in functional reactive programming and leverages the power of monadic composition to chain together streams while also ensuring that teardown logic is preserved and handled as you’d expect.

Why FRP and Observables?

The elegance and power of observables are much more easily demonstrated than explained in a wordy paragraph. I’ll run through the basics and let your imagination think through the potential of it all.

A simple RxJS observable looks like this:

Observable.of(1, 2, 3)

It can be subscribed to as follows:

Observable.of(1, 2, 3).subscribe({
  next: val => console.log(`Next: ${val}`),
  error: err => console.error(err),
  complete: () => console.log('Completed!')
});

Which would emit the following to the console:

Next: 1
Next: 2
Next: 3
Completed!

We can also transform the data just as we’d transform values in an array:

Observable.of(1, 2, 3).map(x => x * 2).filter(x => x !== 4).subscribe(...)
2
6
Completed!

Observables can also be asynchronous:

Observable.interval(1000).subscribe(...)
0 [a second passes]
1 [a second passes]
2 [a second passes]
...

Observables can represent event streams:

Observable.fromEvent(window, 'mousemove').subscribe(...)
[Event Object]
[Event Object]
[Event Object]

Which can also be transformed:

Observable.fromEvent(window, 'mousemove')
  .map(ev => [ev.clientX, ev.clientY])
  .subscribe(...)
[211, 120]
[214, 128]
[218, 139]
...

We can cancel the subscriptions which will clean up the event listener:

const subscription = Observable.fromEvent(window, 'mousemove')
  .map(ev => [ev.clientX, ev.clientY])
  .subscribe(...)

subscription.unsubscribe();

Or we can unsubscribe in a dot-chained functional way:

Observable.of(1, 2, 3)
  .take(2)  // After receiving two values, complete the observable early
  .subscribe(...)
1
2
Completed!

Observable.fromEvent(window, 'mousemove')
  .map(ev => [ev.clientX, ev.clientY])
   // Stop emitting when the user clicks
  .takeUntil(Observable.fromEvent(window, 'click'))
  .subscribe(...)

Note that those last examples left no variables lying around. They are entirely self-contained bits of functionality that clean up after themselves.

Many common asynchronous stream use-cases are catered for natively, in such a way that the “operators” (the observable methods e.g. “throttle”, “map”, “delay”, “filter”) take care of all of the awkward state required to track emitted values over time.

Observable.fromEvent(window, 'mousemove')
  .map(...)
  .throttle(1000) // only allow one event through per second
  .subscribe(...);

… and that’s barely scratching the surface.

The Benefits

Many of the benefits of RxJS are the benefits of functional programming: the avoidance of state, and the readability and testability of short, pure functions. By encapsulating the side-effects associated with your application in a generic, composable way, developers can maximise the reusability of the asynchronous logic in their codebase.

By seeing the application as a series of data transformations between the external application interfaces, we can describe those transformations by composing short, pure functions and lazily applying data to them as it is emitted in real-time.

Messy, temporary, imperative variables are replaced by functional closures that give observables access to previously emitted values in a localised way, limiting the amount of application logic and state a developer must hold in their head at any given time.

Did It Work?

Sort of. We spent a lot of our time in a state of low-level fury at RxJS, so much so that we’ve written up a long list of complaints in another post.

There are some good bits though:

FRP and the observable pattern are both transformative approaches to writing complex asynchronous javascript code, producing fewer bugs and drastically improving the reusability of our codebase.

RxJS operators can encapsulate extremely complex asynchronous operations and elegantly describe dependencies in a terse, declarative way that leaves no state lying around.

In multiple standups throughout the project we’ve enthusiastically raved about how these operators have turned a fundamentally complex part of our implementation into a two-line solution. Sure, those two lines usually took a long time to craft and get right, but once they were working it’s difficult to write many bugs in just two lines of code (compared to the hundreds of lines of imperative code we’d otherwise have needed if we’d rolled our own).
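
As a purely illustrative sketch of the kind of thing we mean (the stream and helper names are hypothetical, not our actual implementation), resending authoring decisions over a flaky connection with a retry delay collapses into a couple of chained operators:

// authoringDecision$ and sendOverSocket() are hypothetical; sendOverSocket returns
// an Observable that errors if the socket is down
authoringDecision$
  .flatMap(decision => sendOverSocket(decision).retryWhen(errors => errors.delay(1000)))
  .subscribe();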

That said, RxJS is a functional approach to writing code so developers should expect to incur a penalty if they’re new to the paradigm as they go from an imperative, object-oriented approach to system design to a functional, data-flow-driven approach instead. There is also a very steep learning curve required to feel the benefits of RxJS as developers familiarise themselves with the toolbox and the idiosyncrasies.

Would We Use It Again?

Despite the truly epic list of shortcomings, I would still recommend an FRP approach to complex async javascript projects. In future we’ll be trying out most.js to see if it solves the myriad of problems we found with RxJS. If it doesn’t, I’d consider implementing an improved Observable that keeps its hands off my errors.

It’s also worth mentioning that we used RxJS with react-redux to handle all redux side-effects. We used redux-observable to achieve this and it was terrific. We’ll undoubtedly be using redux-observable again.
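
For the curious, the shape of a redux-observable “epic” is roughly this (the PING/PONG actions are the library’s canonical example, not from our codebase): it receives the stream of redux actions and returns a stream of new actions to dispatch.

const pingEpic = action$ =>
  action$.ofType('PING')
    .delay(1000)              // any async work can go here
    .mapTo({ type: 'PONG' });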

 

Screenshot of SOMA vision mixer

Compositing and mixing video in the browser

This blog post is the 4th part of our ongoing series working with the BBC Research & Development team. If you’re new to this project, you should start at the beginning!

Like all vision mixers, SOMA (Single Operator Mixing Application) has a “preview” and “transmission” monitor. Preview is used to see how different inputs will appear when composed together – in our case, a video input, a “lower third” graphic such as a caption which fades in and out, and finally a “DOG” such as a channel or event identifier shown in the top corner throughout a broadcast.

When switching between video feeds SOMA offers a fast cut between inputs or a slower mix between the two. As and when edit decisions are made, the resulting output is shown in the transmission monitor.

The problem with software

However, one difference with SOMA is that all the composition and mixing is simulated. SOMA is used to build a set of edit decisions which can be replayed later by a broadcast-quality renderer. The transmission monitor is not just a view of the output after the effects have been applied, as the actual rendering of the edit decisions hasn’t happened yet. The app needs to provide an accurate simulation of what the edit decisions will look like.

The task of building this required breaking down how output is composed – during a mix both the old and new input sources are visible, so six inputs are required.

VideoContext to the rescue

Enter VideoContext, a video scheduling and compositing library created by BBC R&D. This allowed us to represent each monitor as a graph of nodes, with video nodes playing each input into transition nodes allowing mix and opacity to be varied over time, and a compositing node to bring everything together, all implemented using WebGL to offload video processing to the GPU.
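
A rough sketch of what such a graph looks like with VideoContext (simplified, with placeholder file names; the exact definition constants may differ from what we used):

const ctx = new VideoContext(document.getElementById('preview-canvas'));

// Two video inputs feeding a crossfade transition node, which feeds the output
const inputA = ctx.video('input-a.mp4');
const inputB = ctx.video('input-b.mp4');
const crossFade = ctx.transition(VideoContext.DEFINITIONS.CROSSFADE);

inputA.connect(crossFade);
inputB.connect(crossFade);
crossFade.connect(ctx.destination);

// Ramp the "mix" property from 0 to 1 between t=2s and t=4s to mix from A to B
crossFade.transition(2, 4, 0.0, 1.0, 'mix');

inputA.start(0);
inputB.start(0);
ctx.play();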

The flexible nature of this library allowed us to plug in our own WebGL scripts to cut the lower third and DOG graphics out using chroma-keying (where a particular colour is declared to be transparent – normally green), and with a small patch to allow VideoContext to use streaming video we were off and going.

Devils in the details

The fiddly details of how edits work were as fiddly as expected: tracking the mix between two video inputs versus the opacity of two overlays appeared to be similar problems but required different solutions. The nature of the VideoContext graph meant we also had to keep track of which node was current rather than always connecting the current input to the same node. We put a lot of unit tests around this to ensure it works as it should now and in future.

By comparison, the seemingly tricky problem of what to do if a new edit decision was made while a mix was in progress was just a case of swapping out the new input, to avoid the old input reappearing unexpectedly.

QA testing revealed a subtler problem: when switching to a new input, the video takes a few tens of milliseconds to start. Cutting immediately causes a distracting flicker as a couple of blank frames are rendered; waiting until the video is ready adds a slight delay, but this is significantly less distracting.
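
The fix is essentially the standard HTMLMediaElement readiness check – a hedged sketch, where performCut stands in for whatever actually swaps the inputs:

function cutWhenReady(videoElement, performCut) {
  // HAVE_CURRENT_DATA means at least the current frame can be rendered
  if (videoElement.readyState >= HTMLMediaElement.HAVE_CURRENT_DATA) {
    performCut();
  } else {
    videoElement.addEventListener('canplay', () => performCut(), { once: true });
  }
}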

Later in the project a new requirement emerged to re-frame videos within the application, and the decision to use VideoContext paid off as we could add an effect node into the graph to crop and scale the video input before mixing.

And finally

VideoContext made the mixing and compositing operations a lot easier than they would have been otherwise. Towards the end we even added an image source (for paused VTs) using the new experimental Chrome feature captureStream, and that worked really well.
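
One plausible shape of that captureStream trick, purely as a sketch (pausedVideo is a hypothetical paused video element, and the API was experimental in Chrome at the time):

// pausedVideo: a hypothetical paused <video> element showing the VT frame we want
const canvas = document.createElement('canvas');
canvas.width = pausedVideo.videoWidth;
canvas.height = pausedVideo.videoHeight;
canvas.getContext('2d').drawImage(pausedVideo, 0, 0);

// Expose the canvas as a MediaStream that can be treated like any other video source
const stillStream = canvas.captureStream();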

After making it all work, the obvious point of possible concern is performance, and overall it works pretty well. We needed to have half-a-dozen or so VideoContexts running at once and this was effective on a powerful machine. Many more and the computer really starts struggling.

Even a few years ago attempting this in the browser would have been madness, so it’s great to see so much progress in something so challenging, opening up a whole new range of software that can work in the browser!

Read part 5 of this project with BBC R&D where Developer Alex Holmes talks about Taming async with FRP and RxJS.

Sign pointing to Usability Lab

Rapid user research on an Agile project

Our timeline to build an in-browser vision mixer for BBC R&D (previously, previously) is extremely tight – just 2 months. UX and development run concurrently in Agile fashion (a subject for a future blog post), but design was largely done within the first month.

Too frequently for projects on such timescales there is pressure to omit user testing in the interest of expediency. One could say it’s just a prototype, and leave it until the first trials to see how it performs, and hopefully get a chance to work the learnings into a version 2. Or, since we have weekly show & tell sessions with project stakeholders, one could argue complacently that as long as they’re happy with what they’re seeing, the design’s on track.

Why test?

But the stakeholders represent our application’s target users only slightly better than ourselves, which is not very well – they won’t be the ones using it. Furthermore, this project aims to broaden the range of potential operators – from what used to be the domain of highly experienced technicians, to something that could be used by a relative novice within hours. So I wanted to feel confident that even people who aren’t familiar with the project would be able to use it – both experts and novices. I’m not experienced in this field at all, so I was making lots of guesses and assumptions, and I didn’t want to go too far before finding out they were wrong.

One of the best things about working at the BBC is the ingrained culture of user centred design, so there was no surprise at the assumption that I’d be testing paper prototypes by the 2nd week. Our hosts were very helpful in finding participants within days – and with 100s of BBC staff working at MediaCity there is no danger of using people with too much knowledge of the project, or re-using participants. Last but not least, BBC R&D has a fully equipped usability lab – complete with two-way mirror and recording equipment. Overkill for my purposes – I would’ve managed with an ordinary office – but having the separate viewing room helped ensure that I got the entire team observing the sessions without crowding my subject. I’m a great believer in getting everyone on the project team seeing other people interact with and talk about the application.

Paper prototypes

Paper prototypes are A3 printouts of the wireframes, each representing a state of the application. After giving a brief description of what the application is used for, I show the page representing the application’s initial state, and change the pages in response to user actions as if it were the screen. (Users point to what they would click.) At first, I ask task-based questions: “add a camera and an audio source”; “create a copy of Camera 2 that’s a close-up”; etc. As we linger on a screen, I’ll probe more about their understanding of the interface: “How would you change the keyboard shortcut for Camera 1?”; “What do you think Undo/Redo would do on this screen?”; “What would happen if you click that?”; and so on. It doesn’t matter that the wireframes are incomplete – when users try to go to parts of the application that haven’t been designed yet, I ask them to describe what they expect to see and be able to do there.

In all, I did paper prototype testing with 6 people on week 2, and with a further 3 people on week 3. (With qualitative testing even very few participants tend to find the major issues.) In keeping with the agile nature of the project, there was no expectation of me producing a report of findings that everyone would read, although I do type up my notes in a shared document to help fix them in my memory. Rather, my learnings go straight into the design – I’m usually champing at the bit to make the changes that seem so obvious after seeing a person struggle, feeling really happy to have caught them so early on. Fortunately, user testing showed that the broad screen layout worked well – the main changes were to button labels, icon designs, and generally improved affordances.

Interactive prototypes

By week 4 my role had transitioned into front-end development, in which I’m responsible for creating static HTML mockups with the final design and CSS, which the developers use as reference markup for the React components. While this isn’t mainstream practice in our industry, I find it has numerous advantages, especially for an Agile project, as it enables me to leave the static, inexact medium of wireframes behind and refine the design and interaction directly within the browser. (I add some dynamic interactivity using jQuery, but this is throwaway code for demo purposes only.)

The other advantage of HTML mockups is that they afford us an opportunity to do interactive user testing using a web browser, well before the production application is stable enough to test. Paper prototyping is fine up to a point, but it has plenty of limitations – for example, you can’t scroll, there are no mouseover events, you can’t resize the screen, etc.

So by week 5 I was able to test nearly all parts of the application, in the browser, with 11 users. (This included two groups of 4, which worked better than I expected – one person manning the mouse and keyboard, but everyone in the group thinking out loud.) It was really good being able to see the difference that interactivity made, such as hover states, and seeing people actually trying to click or drag things rather than just saying what they’d do gave me an added level of confidence in my findings. Again, immediately afterwards, I made several changes that I’m confident improves the application – removing a redundant button that never got clicked, adding labels to some icons, strengthening a primary action by adding an icon, among others. Not to mention fixing numerous technical bugs that came up during testing. (I use Github comments to ensure developers are aware of any HTML changes to components at this stage.)

Never stop testing

Hopefully we’ll have time for another round of testing with the production application. This should give a more faithful representation of the vision mixing workflow, since in the mockups the application is always in the same state, using dummy content. With every test we can feel more confident – and our stakeholders can feel more confident – that what we’re building will meet its goals, and make users productive rather than frustrated. And on a personal level, I’m just relieved that we won’t be launching with any of the embarrassing gotchas that cropped up and got fixed during testing.

Read part 4 of this project working with BBC R&D where we talk about compositing and mixing video in the browser.

The challenges of mixing live video streams over IP networks

Welcome to our second post on the work we’re doing with BBC Research & Development. If you’ve not read the first post, you should go read that first 😉

Introducing IP Studio

The first part of the infrastructure we’re working with here is something called IP Studio. In essence this is a platform for discovering, connecting and transforming video streams in a generic way, using IP networking – the standard on which pretty much all Internet, office and home networks are based.

Up until now video cameras have used very simple standards such as SDI to move video around. Even though SDI is digital, it’s just point-to-point – you connect the camera to something using a cable, and there it is. The reason for the remarkable success of IP networks, however, is their ability to connect things together over a generic set of components, routing between connecting devices. Your web browser can get messages to and from this blog over the Internet using a range of intervening machines, which is actually pretty clever.

Doing this with video is obviously in some senses well-understood – we’ve all watched videos online. There are some unique challenges with doing this for live television though!

Why live video is different

First, you can’t have any buffering: this is live. It’s unacceptable for everyone watching TV to see a buffering message because the production systems aren’t quick enough.

Second is quality. These are 4K streams, not typical internet video resolution. 4K streams have (roughly) 4000 horizontal pixels compared to the (roughly) 2000 of a 1080p stream (weirdly, 1080p, 720p etc. are named for their vertical pixels instead). Since the resolution doubles both horizontally and vertically, that means roughly four times as many pixels, and so about four times as much bandwidth – which even in 2017 is quite a lot. Specialist networking kit and a lot of processing power is required.
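
The back-of-the-envelope arithmetic, for the avoidance of doubt:

const uhdPixels = 3840 * 2160;      // a "4K" UHD frame
const hdPixels = 1920 * 1080;       // a 1080p frame
console.log(uhdPixels / hdPixels);  // => 4, so roughly 4x the bandwidth at the same frame rate and encoding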

Third is the unique requirements of production – we’re not just transmitting a finished, pre-prepared video, but all the components from which to make one: multiple cameras, multiple audio feeds, still images, pre-recorded video. Everything you need to create the finished live product. This means that to deliver a final product you might need ten times as much source material – which is well beyond the capabilities of any existing systems.

IP Studio addresses this with a cluster of powerful servers sitting on a very high speed network. It allows engineers to connect together “nodes” to form processing “pipelines” that deliver video suitable for editing. This means capturing the video from existing cameras (using SDI) and transforming them into a format which will allow them to be mixed together later.

It’s about time

That sounds relatively straightforward, except for one thing: time. When you work with live signals on traditional analogue or point-to-point digital systems, then live means, well, live. There can be transmission delays in the equipment but they tend to be small and stable. A system based on relatively standard hardware and operating systems (IP Studio uses Linux, naturally) is going to have all sorts of variable delays in it, which need to be accommodated.

IP Studio is therefore based on “flows” comprising “grains”. Each grain has a quantum of payload (for example a video frame) and timing information. The timing information allows multiple flows to be combined into a final output where everything happens appropriately in synchronisation. This might sound easy but is fiendishly difficult – some flows will arrive later than others, so systems need to hold back some of them until everything is running to time.
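
Purely as an illustration of the idea (this is not the real IP Studio schema), a grain can be pictured as a payload plus the timing needed to line the flows up again downstream:

const grain = {
  flowId: 'camera-1-video',                          // hypothetical identifier
  originTimestamp: { seconds: 1502712000, nanoseconds: 0 },
  duration: { seconds: 0, nanoseconds: 40000000 },   // one frame at 25fps
  payload: new Uint8Array(0)                         // stand-in for an encoded video frame
};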

To add to the complexity, we need two versions of the stream, one at 4k and one at a lower resolution.

Don’t forget the browser

Within the video mixer we’re building, we need the operator to be able to see their mixing decisions (cutting, fading etc.) happening in front of them in real time. We also need to control the final transmitted live output. There’s no way a browser in 2017 is going to show half-a-dozen 4k streams at once (and it would be a waste to do so). This means we are showing lower resolution 480p streams in the browser, while sending the edit decisions up to the output rendering systems which will process the 4k streams, before finally reducing them to 1080p for broadcast.

So we’ve got half-a-dozen 4k streams, and 480p equivalents, still images, pre-recorded video and audio, all being moved around in near-real-time on a cluster of commodity equipment from which we’ll be delivering live television!

Read part 3 of this project with BBC R&D where we delve into rapid user research on an Agile project.

MediaCity UK offices

Building a live television video mixing application for the browser

This is the first in a series of posts about some work we are doing with BBC Research & Development.

The BBC now has, in the lab, the capability to deliver live television using high-end commodity equipment direct to broadcast, over standard IP networks. What we’ve been tasked with is building one of the key front-end applications – the video mixer. This will enable someone to mix an entire live television programme, at high quality, from within a standard web-browser on a normal laptop.

In this series of posts we’ll be talking in great depth about the design decisions, implementation technologies and opportunities presented by these platforms.

What is video mixing?

Video editing used to be a specialist skill requiring very expensive, specialist equipment. Like most things, this has changed because of commodity, high-powered computers, and now anyone can edit video using modestly priced equipment and software such as the industry-standard Adobe Premiere. This has fed the development of services such as YouTube, where 300 hours of video are uploaded every minute.

“Video Mixing” is the activity of taking the various videos and stills in your source material and mixing them together to produce a single, linear output. It can involve showing single sources, cutting and fading between them, compositing them together, showing still images and graphics and running effects over them. Sound can be similarly manipulated. Anyone who has used Premiere, even to edit their family videos, will have some idea of the options involved.

Live television is a very different problem

First you need to produce high-fidelity output in real time. If you’ve ever used something like Premiere you’ll know that when you finally render your output it can take quite a long time – it can easily spend an hour rendering 20 minutes of output. That would be no good if you were broadcasting live! This means the technology used is very different – you can’t just use commodity hardware, you need specialist equipment that can work with these streams in realtime.

Second, the capacity for screw-ups is immensely higher. Any mistakes in a live broadcast are immediately apparent, and potentially tricky to correct. It is a high-stress environment, even for experienced operators.

Finally, the range of things you might choose to do is much more limited, because you can spend only a little time setting things up. This means live television tends to use a far smaller ‘palette’ of mixing operations.

Even then, a live broadcast might require half a dozen people for even a modest production. You need someone to set up the cameras and control them, a sound engineer to get the sound right, someone to mix the audio, a vision mixer, a VT Operator (to run any pre-recorded videos you insert – perhaps the titles and credits) and someone to set up the still image overlays (for example, names and logos).

If that sounds bad, imagine a live broadcast away from the studio – the Outside Broadcast. All the people and equipment need to be on site, hence the legendary “OB Van”:

An outside broadcast (OB) van

Inside one of those vans is the equipment and people needed to run a live broadcast for TV. They’d normally transmit the final output directly to air by satellite – which is why you generally see a van with a massive dish on it nearby. This equipment runs into millions and millions of pounds and can’t be deployed on a whim. When you only have a few channels of course you don’t need many vans…

The Internet Steamroller

The Internet is changing all of this. Services like YouTube Live and Facebook Live mean that anyone with a phone and decent coverage can run their own outside broadcast. Where once you needed a TV network and millions of pounds of equipment now anyone can do it. Quality is poor and there are few options for mixing, but it is an amazingly powerful tool for citizen journalism and live reporting.

Also, the constraints of “channels” are going. Where once there was no point owning more OB Vans than you have channels, now you could run dozens of live feeds simultaneously over the Internet. As the phone becomes the first screen and the TV in the corner turns into just another display, many of the constraints that we have taken for granted look more and more anachronistic.

These new technologies provide an opportunity, but also some significant challenges. The major one is standards – there is a large ecosystem of manufacturers and suppliers whose equipment needs to interoperate. The standards used, such as SDI (Serial Digital Interface) have been around for decades and are widely supported. Moving to an Internet-based standard needs cooperation across the industry.

BBC R&D has been actively working towards this with their IP Studio project, and the standards they are developing with industry for Networked Media.

Read part 2 of this project with BBC R&D where I’ll describe some of the technologies involved, and how we’re approaching the project.