Category Archives: React

FP: a quiet revolution

Functional Programming (FP) is taking over the programming world, which is kind of weird since it has taken over the programming world at least once before. If you aren’t a developer then you may never even have heard of it. This post aims to explain what it is and why you might care about it even if you never program a computer – and how you might go about adopting it in your organisation.

Not too long ago, every graduate computer scientist would have spent some time doing FP, perhaps in a language called LISP. FP was considered a crucial grounding in CompSci and some FP texts gained a cult following. The legendary “wizard book”, Structure and Interpretation of Computer Programs, was the MIT Comp-101 textbook.

Famously a third of students dropped out in their first semester because they found this book too difficult.

I think this was as much down to how MIT taught the course as anything, but nevertheless functional programming (and the confusingly-brackety LISP) started getting a reputation for being too difficult for mere mortals.

Along with the reputation for impossibility, universities started getting a lot of pressure to turn out graduates with “useful skills”. This has always seemed a bit of a waste of universities’ time to me – they are very specifically not supposed to be useful in that sense. I’d much rather graduates got the most out of their limited time at university learning the things that only universities can provide, rather than programming which, bluntly, we can do a lot more effectively than academics.

Anyway, I digress.

The rise of Object Orientation

So it came to pass that universities decided to stop teaching academic languages and start teaching Java. Ten years ago I’d guess well over half of all university programming courses taught Java. Java is not a functional language and until recently had no functional features. It was unremittingly, unapologetically Object Oriented (OO). Contrary to Sun’s bombastic marketing when they released Java (and claimed it was a revolution in programming), Java as a language was about as mainstream and boring as it could be. The virtual machine (the JVM) was much more interesting, and I’ll come back to that later.

(OO is not in itself opposed to FP, and vice versa. Many languages – as we’ll see – are able to support both paradigms. However OO, particularly the way it was taught with Java, encourages a way of thinking about data flowing through a system, and this leads to data being copied and duplicated… which leads to all sorts of problems managing state. FP meanwhile tends to think in terms of transformation of data, and relies on the programming language to deal with the menial tasks of deciding when to copy data whilst doing so. When computers were slow this could cause significant bottlenecks, but computers these days are huge and fast and you can get more of them easily, so it doesn’t matter nearly as much – until it suddenly does of course. Anyway, I digress again.)
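
To make the contrast concrete, here is a toy JavaScript sketch (invented for illustration, not from any real codebase) of the two mindsets:

```javascript
// OO style: a method mutates shared state in place.
class Account {
  constructor(balance) { this.balance = balance; }
  deposit(amount) { this.balance += amount; } // the object changes under you
}

const acct = new Account(100);
acct.deposit(50);
console.log(acct.balance); // 150 – the original state is gone

// FP style: a pure function returns a fresh value and leaves its input
// untouched, letting the language handle the copying.
const deposit = (account, amount) =>
  ({ ...account, balance: account.balance + amount });

const before = { balance: 100 };
const after = deposit(before, 50);
console.log(before.balance, after.balance); // 100 150
```

In the second style there is no hidden state to manage: anyone holding `before` can rely on it never changing.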

In the workplace meanwhile FP had never really taken off. The vast majority of software is written using imperative languages like ‘C’ or Object Oriented languages like… well, pretty much any language you’ve heard of. Perl, Python, Java, C#, C++ – Object Orientation had taken over the world. FP’s steep learning curve, reputation for impossibility, academic flavour and occasional performance constraints made it seem something only a lunatic would select.

And so did some proclaim, Fukuyama-like, the “end of history”: Object Orientation was the one true way to build software. That is certainly how it seemed until a few years ago.

Then something interesting started happening, a change that has had far-reaching effects on many programming languages: existing OO languages started gaining FP features. Python was an early adopter here but a lot of OO languages started gaining a smattering of FP features.

This has provided an easy way for existing programmers to be exposed to how FP thinks about problem solving – and the way one approaches a large problem in FP can be dramatically different to traditional OO approaches.

Object Oriented software has been so dominant that its benefits and drawbacks are rarely discussed – in fact the idea that it might have drawbacks would have been thought madness by many until recently.

OO does have real benefits. It provides a process-driven approach for analysis, where your problem domain is analysed first for the data that exists in the business or whatever, and then behaviours are hooked onto these data. A large system is decomposed by responsibilities towards data.

There are some other things where OO helps too, although they maybe don’t sound so great. Mediocre can be good enough – and when you’ve got hundreds of programmers on a mammoth government project you need to be able to accommodate the mediocre. The reliance on process and good-enough code means your developers become more replaceable. Need one thousand identical carbon units? Let’s go!

Of course you don’t get that for free. The resulting code often has problems, sometimes severe ones. Non-localised errors are a major problem, with causes and effects separated by millions of lines of code and sometimes weeks of execution. State becomes a constant problem, with huge amounts of state being passed around inside transactions. Concurrency issues are common as well, with unnecessary locking and race conditions rife.

The outcome is also often very difficult to debug, with a single thread of execution sometimes involving hundreds of cooperating objects, each of which contributes only one or two lines of code.

The impact of this is difficult to quantify, but I don’t think it is unfair to attribute some of the epic failures of large-scale IT to the choice of these tools and languages.

Javascript

Strangely one of the places where FP is now being widely practised is in front-end applications, specifically Single-Page Applications (SPAs) written in frameworks like React.

The most recent JavaScript standards (officially called, confusingly, ECMAScript) have added oodles of functional syntax and behaviour, to the extent that it is possible to write JavaScript almost entirely functionally. Furthermore, these new standards can be transpiled into previous versions of JavaScript, meaning they will run pretty much anywhere.
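
As a small illustration (names and numbers invented), the functional additions – `const`, arrow functions, chained array methods and spread – make this style of JavaScript quite natural:

```javascript
// A few of the functional features in recent ECMAScript standards,
// with no mutation anywhere.
const orders = [
  { item: 'tea', price: 2 },
  { item: 'cake', price: 4 },
  { item: 'coffee', price: 3 },
];

const total = orders
  .filter(o => o.price > 2)         // keep the pricier items
  .map(o => o.price)                // project out the prices
  .reduce((sum, p) => sum + p, 0);  // fold them into a total

console.log(total); // 7

// Spread makes copying, rather than mutating, the path of least resistance:
const withVat = orders.map(o => ({ ...o, price: o.price * 1.2 }));
```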

Since pretty much every device in the world has a JavaScript virtual machine installed, we now have the world’s largest ever installed base of functional computers – and more and more developers are using it.

The FP frameworks that are emerging in Javascript to support functional development are bringing some of the more recent research and design from universities directly into practice in a way that hasn’t really happened previously.

The JVM

The other major movement has been the development of functional languages that run on the Java Virtual Machine (the JVM). Because these languages can call Java functions it means they come with a ready-built standard library that is well known and well documented. There’s a bunch of these with Clojure and Scala being particularly prominent.

These have allowed enterprise teams with a large existing commitment to Java to start developing in FP without throwing away their existing code. I suspect it has also allowed them to retain some senior staff who would otherwise have left through boredom.

Ironically Java itself has added loads of functional features over the last few years, in particular lambda functions and closures.

How to adopt FP

We’ve adopted FP for some projects with some real success and there is a lot of enthusiasm for it here (and admittedly the odd bit of resistance too). We’ve learned a few things about how to go about adopting it.

First, you need to do more design work. Particularly with developers who are new to the approach, spending more time in design is of great benefit – but I would argue this is generally the case in our industry. An abiding problem is the resistance to design and the need to just write some code. Even in the most agile processes design is critical and should not be sidelined. Accommodating this design work in your process is crucial. This doesn’t mean big fat documents, but it does mean providing the space to think and for teams to discuss design before implementation, perhaps with spikes for prototypes.

Second, get up to speed with supporting libraries that work in a functional manner, and avoid those that are brutally OO. Just using ramda encourages developers to work in a more functional manner and develop composable interfaces.
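
For illustration, here is the kind of composable, point-free pipeline that ramda encourages, sketched with hand-rolled helpers so it stands alone; with ramda itself you would reach for `R.pipe`, `R.filter`, `R.map` and `R.prop` to the same effect:

```javascript
// Minimal stand-ins for the ramda functions (so this runs without the library).
const pipe = (...fns) => x => fns.reduce((acc, fn) => fn(acc), x);
const filter = pred => xs => xs.filter(pred);
const map = fn => xs => xs.map(fn);
const prop = key => obj => obj[key];

// Small, curried functions compose into a readable data pipeline:
const activeEmails = pipe(
  filter(user => user.active),
  map(prop('email'))
);

const users = [
  { email: 'a@example.com', active: true },
  { email: 'b@example.com', active: false },
];
console.log(activeEmails(users)); // [ 'a@example.com' ]
```

Because each piece is a plain function, the pipeline itself is a value that can be passed around, reused and tested in isolation.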

Third, there is still a problem with impenetrable jargon, and it can be a turn off. Avoid talking about monads, even if you think you need one 😉

Finally, you really do not need to be smarter to work with FP. There is a learning curve and it is really quite steep in places, but once you’ve climbed it the kinds of solutions you develop feel just as natural as the OO ones did previously.


SOMA in use during the 2017 Edinburgh Festival

Live video mixing with the BBC: Lessons learned

In this post I am going to reflect on some of the more interesting aspects of this project and the lessons they might provide for other projects.

This post is one of a series talking about our work on the SOMA video mixing application for the BBC. The previous posts in the series are:

  1. Building a live television video mixing application for the browser
  2. The challenges of mixing live video streams over IP networks
  3. Rapid user research on an Agile project
  4. Compositing and mixing video in the browser
  5. Taming the Async Beast with FRP and RxJS
  6. RxJS: An object lesson in terrible good software
  7. Video: The future of TV broadcasting
  8. Integrating UX with agile development

In my view there are three broad areas where this project has some interesting lessons.

Novel domains

First is the novel domain.

This isn’t unfamiliar – we often work in novel domains that we have little to no knowledge of. It is the nature of a technical agency, in fact – while we have some domains that we’ve worked in for many years, such as healthcare and education, there are always novel businesses with entirely new subjects to wrap our heads around. (To give you some idea, a few recent examples include store-and-forward television broadcasting, horse racing odds, medical curricula, epilepsy diagnosis, clustering automation and datacentre hardware provisioning.)

Over the years this has been the thing that I have most enjoyed out of every aspect of our work. Plunging into an entirely new subject with a short amount of time to understand it and make a useful contribution is exhilarating.

Although it might sound a bit insane to throw a team who know nothing about a domain at a problem, what we’re very good at is designing and building products. As long as our customers can provide the domain expertise, we can bring the product build. It is easier for us to learn the problem domain than it is for a domain expert to learn how to build great products.

The greatest challenge with a new domain is the assumptions. We all have these in our work – the things we think are so well understood that we don’t even mention them. These are a terrible trap for software developers, because we can spend weeks building completely the wrong thing with no idea that we’re doing so.

We were very lucky in this respect to be working with a technical organisation within the BBC: Research & Development. They were aware of this risk and did a very good job of arranging our briefing, which included a visit to a vision mixing gallery. This is the kind of exercise that delivers a huge amount in tacit understanding, and allows us to ask the really stupid questions in the right setting.

I think of the core problem as a “Rumsfeld”. Although he got a lot of criticism for these comments, I think they’re bizarrely insightful. There really are unknown unknowns, and what the hell do you do about them? You can often sense that they exist, but how do you turn them into known unknowns?

For many of these issues the challenge is not the answer, which is obvious once it has been found, but facilitating the conversation to produce the answer. It can be a long and frustrating process, but critical to success.

I’d encourage everyone to try and get the software team into the existing environment of the target stakeholder groups to try and understand at a fundamental level what they need.

The Iron Triangle

The timescale for this project was extraordinarily difficult – nine weeks from a standing start. Much of the scope was also quite fixed: we were largely building core functionality that, if missing, would have rendered the application useless. On top of that, we wanted to achieve the level of finish for the UX that we generally deliver.

This was extremely ambitious, and in retrospect we bit off more than we could reasonably chew.

Time is the greatest enemy of software projects because of the challenges in estimation. For reasons covered in a different blog post, estimation for software projects is somewhere between an ineffable art reserved only for the angels, and completely impossible.

When estimates are impossible, time becomes an even greater challenge. One of the truisms of our industry is the “Iron Triangle” of time, scope and quality. Like a good Chinese buffet, you can only choose two. If you want a fixed time and scope, it is quality that will suffer.

Building good software takes thought and planning. Also, the first version of a component is rarely the best – it takes time to assemble, then consider it, and then perhaps shape it into something near its final form.

Quality is itself an aggregate. Haste lowers the standard of each part and so, by a process of multiplication, lowers the overall quality of the product far more. The only way to achieve a very high quality for the finished product is for every single part to be of similarly high quality. This is generally our goal.

However. Whisper it. It is possible to “manage” quality, if you understand your process and know the goal. Different kinds of testing can provide different levels of certainty of code quality. Manual testing, when done exhaustively, can substitute in some cases for perfection in code.

We therefore managed our quality, and I think actually did well here.

Asynchronous integration components had to be absolutely perfect, because any bugs would result in a general lack of stability that would be impossible to trace. The only way to build these is carefully, with a design; the only way to test them is exhaustively, with unit and integration tests.

On the other hand, there were a lot of aspects of the UI where it was crucial that they performed and looked excellent, but the code could be rougher around the edges, and could just be hacked out. This was my area of the application, and my goal was to deliver features as fast as possible with just acceptable quality. Some of the code was quite embarrassing but we got the project over the line in the time, with the scope, and it all worked. This was sufficient for those areas.

Experimental technologies

I often talk about our approach using the concept of an innovation curve, and our position on it (I think I stole the idea from Ian Jindal – thanks Ian!).

Imagine a curve where the X axis is “how innovative your technologies are” and the Y axis is “pain”.

In practical terms this can be translated into “how likely I am to find the answer to my problems on Stack Overflow“.

At the very left, everything has been seen and done before, so there is no challenge from novelty – but you are almost certainly not making the most of available technologies.

At the far right, you are hand crafting your software from individual photons and you have to conduct high-energy physics experiments to debug your code. You are able to mould the entire universe to your whim – but it takes forever and costs a fortune.

There is no correct place to sit on this curve – where you sit is a strategic (and emotional) decision that depends on the forces at play in your particular situation.

Isotoma endeavours to be somewhere on the shoulder of the curve. The software we build generally needs to last 5+ years, so we can’t pick flash-in-the-pan technologies that will be gone in 18 months. But similarly our choices need to be recent enough that they won’t become obsolete. This is sometimes called “leading edge”: almost bleeding edge, but not so close you get cut. With careful choice of tools it is possible to maintain a position like this successfully.

This BBC project was off to the right of this curve, far closer to the bleeding edge than we’d normally choose, and we definitely suffered.

Some of the technologies we had to use had some serious issues:

  1. To use IPStudio, a properly cutting edge product developed internally by BBC R&D, we routinely had to read the C++ source code of the product to find answers to integration questions.
  2. We needed dozens of coordinated asynchronous streams running, for which we used RxJS. This was interesting enough to justify two posts on this blog on its own.
  3. WebRTC, which was the required delivery mechanism for the video, is absolutely not ready for this use case. The specification is unclear, browser implementation is incomplete and it is fundamentally unsuited at this time to synchronised video delivery.
  4. The video compositing technology in browsers actually works quite well, but it was entirely new to us and it took considerable time to gain sufficient expertise to do a good job. Browser implementations also still have surprisingly sharp edges (only 16 WebGL contexts are allowed! Why 16? I dunno.)

Any one of these issues could have sunk our project, so I am very proud that we shipped good software in spite of all four.

Lessons learned? Task allocation is the key to this one I think.

One person, Alex, devoted his time to the IPStudio and WebRTC work for pretty much the entire project, and Ricey concentrated on video mixing.

Rather than try and skill up several people, concentrate the learning in a single brain. Although this is generally a terrible idea (because then you have a hard dependency on a single individual for a particular part of the codebase), in this case it was the only way through, and it worked.

Also, don’t believe any documentation, or in fact anything written in any human languages. When working on the bleeding edge you must “Use The Source, Luke”. Go to the source code and get your head around it. Everything else lies.

Summary

I am proud, justifiably I think, that we delivered this project successfully. It was used at the Edinburgh festival and actual real live television was mixed using our product, given all the constraints above.

The lessons?

  1. Spend the time and effort to make sure your entire team understand the tacit requirements of the problem domain and the stakeholders.
  2. Have an approach to managing appropriate quality that delivers the scope and timescale, if these are heavily constrained.
  3. Understand your position on the innovation curve and choose a strategic approach to managing this.

The banner image at the top of the article, taken by Chris Northwood, shows SOMA in use during the 2017 Edinburgh Festival.

The future of TV broadcasting

Earlier this year we started working on an exciting project with BBC Research & Development on their long term programme developing IP Studio – the next generation IP-network based broadcast television platform. The BBC are developing new industry-wide standards working with manufacturers which will hopefully be adopted worldwide.

This new technology dramatically changes the equipment needed for live TV broadcasting. Instead of large vans stocked with pricey equipment and a team of people, shows can be recorded, edited and streamed live by a single person with a 4K camera, a laptop and internet access.

The freedom of being able to stream and edit live TV within the browser will open up endless possibilities for the entertainment and media sector.

You can see more Isotoma videos on Vimeo.

Video: Serverless – the real sharing economy

Serverless is a new application design paradigm, typified by services like AWS Lambda, Azure Cloud Functions and IBM OpenWhisk. It is particularly well suited to mobile software and Single-Page Application frameworks such as React.

In this video, Doug Winter talks at Digital North in Manchester about what Serverless is, where it comes from, why you would want to use it, how the economics function and how you can get started.

You can see more Isotoma videos on Vimeo.

RxJS: An object lesson in terrible good software

We recently used RxJS on a large, complex asynchronous project integrated with a big third-party distributed system. We now know more about it than, frankly, anyone would ever want to know.

While we loved the approach, we hated the software itself. The reasons for this are a great lesson in how not to do software.

Our biggest issue by far with RxJS is that there are two actively developed, apparently stable versions at two different URLs. RxJS 4 is the top Google result for RxJS and lives at https://github.com/Reactive-Extensions/RxJS, briefly mentioning that there is an unstable version 5 at a different address. RxJS 5 lives at https://github.com/ReactiveX/rxjs and has a completely different API from version 4, completely different and far worse (“WIP”) documentation, doesn’t allude to its level of stability, and is written in TypeScript, so users need to learn some TypeScript before they can understand the codebase.

Which version should new adopters use? I have absolutely no idea. Either way, when you google for advice and documentation, you can be fairly certain that the results you get will be for a version you’re not using.

RxJS goes to great lengths to swallow your errors. We’re pretty united here in thinking that it definitely should not. If an observable fires its “error” callback, it’s reasonable that the emitted error should be picked up by the nearest catch operator. Sadly, though, RxJS also wraps all of the functions that you pass to it in a try/catch block, and any exception raised by those functions will also be shunted to the nearest catch operator. Promises do this too, and many have complained bitterly about it already.

What this means in practice is that finding the source of an error is extremely difficult. RxJS tries to capture the original stack trace and make it available in the catch block but often fails, resulting in a failed observable and an “undefined” error. When my code breaks I’d like it to break where it broke, not in a completely different place. If I expect an error to occur I can catch it as I would anywhere else in the codebase and emit an observable error of the form that I’d expect in my catch block so that my catch blocks don’t all have to accommodate expected failure modes and any arbitrary exception. Days and days of development were lost to bisecting a long pile of dot-chained functions in order to isolate the one that raised the (usually stupidly trivial) error.
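
The wrapping behaviour can be illustrated with a toy observable (a deliberately minimal sketch, not RxJS’s actual implementation):

```javascript
// A toy observable that, like RxJS, wraps the functions you pass to it in
// try/catch and reroutes any exception to the error callback – so the crash
// surfaces far from where the bug actually is.
const observableOf = (...values) => ({
  subscribe: ({ next, error, complete }) => {
    try {
      values.forEach(next); // an exception thrown inside `next`...
      complete();
    } catch (e) {
      error(e);             // ...is swallowed and rerouted here
    }
  },
});

let caught = null;
observableOf(1, 2, 3).subscribe({
  next: v => { if (v === 2) throw new Error('bug in my handler'); },
  error: e => { caught = e; },  // fires instead of crashing at the bug
  complete: () => {},
});
console.log(caught.message); // 'bug in my handler'
```

Instead of the program stopping at the faulty line, the exception emerges from the error callback, stripped of its useful context.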

At the very least, it’d be nice to have the choice to use an unsafe observable instead. For this reason alone we are unlikely to use RxJS again.

We picked RxJS 5 as it’s been around for a long time now and seems to be maintained by Netflix, which is reassuring.

The documentation could be way better. It is incomplete, with some methods not documented at all, and what is documented lives in a mystery-meat web application that can’t be searched like normal technical documentation. Code examples rarely use real-world use-cases, so it’s tough to see the utility of many of the Observable methods. Most of the gotchas that caught out all of our developers weren’t alluded to at any point in the documentation (in the end, a YouTube talk by the lead developer saved the day, containing the first decent explanation of the error handling mechanism that I’d seen or read). Worst of all, the only documentation that deals with solving actual problems with RxJS (higher-order observables) is in the form of videos on the paywalled egghead.io. I can’t imagine a more effective way to put off new adopters than requiring them to pay $200 just to appreciate how the library is commonly used (though, to be clear, I am a fan of egghead).

Summed up best by this thread, RxJS refuses to accept its heritage and admit that it’s a functional library. Within the JavaScript community there exists a huge functional programming subcommunity that has managed to put together a widely-adopted specification for writing functional JavaScript libraries that can interoperate with each other. RxJS chooses not to work to this specification, and a number of design decisions (such as introspecting certain contained values and swallowing them) drastically reduce the ease with which RxJS can be used in functional JavaScript codebases.

RxJS makes the same mistake lodash did a number of years ago, regularly opting for variadic arguments to its methods rather than taking arrays (the worst example is merge). Lodash did eventually learn its lesson; I hope RxJS does too.
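
The difference is easy to show with a toy merge (invented here for illustration; these are not the real RxJS or lodash signatures):

```javascript
// A variadic API takes each stream as a separate argument...
const mergeVariadic = (...streams) => streams.flat();

// ...so when your streams are already in an array – the common case when
// they are built programmatically – every call site needs a spread:
const streams = [['a1'], ['b1'], ['c1']];
console.log(mergeVariadic(...streams)); // [ 'a1', 'b1', 'c1' ]

// An array-taking equivalent composes directly, no spread needed:
const mergeArray = list => list.flat();
console.log(mergeArray(streams)); // [ 'a1', 'b1', 'c1' ]
```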

Taming the Async Beast with FRP and RxJS

The Problem

We’ve recently been working on an in-browser vision mixer for the BBC (previous blog posts here, here, here, and here). Live vision mixing involves keeping track of a large number of interdependent data streams. Our application receives timing data for video tapes and live video streams via WebRTC data channels and websocket connections, and we’re sending video and audio authoring decisions over other websockets to the live rendering backend.

Many of the data streams we’re handling are interdependent: we don’t want to send an authoring decision to the renderer to cut to a video tape until that tape is loaded and ready to play, so we have to wait for it; and if the authoring websocket has closed, we’ll need to reconnect to it and then retry sending that authoring decision.

Orchestrating interdependent asynchronous data streams is a fundamentally complex problem.

Promises are one popular solution for composing asynchronous operations and safely transforming the results, but they have a number of limitations. The primary issue is that they cannot be cancelled, so we would need to handle teardown separately somehow. We could use the excellent fluture or Task future libraries instead, both of which support cancellation (and are lazy, chainable and fantasy-land compliant), but futures and promises handle one single future value (or error value), not a stream of many values. The team working on this project are fans of futures (less so of promises) and were aiming to write the majority of the codebase in a functional style using folktale and ramda (and react-redux), so we wanted a functional, composable way to handle ongoing streams of data that could sit comfortably within the rest of the codebase.
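
To illustrate the eagerness and cancellation problem (a sketch of the idea behind lazy futures, not the actual API of fluture or Task):

```javascript
// Promises run eagerly: the executor fires the moment you construct one,
// and nothing can cancel the work it started.
let promiseStarted = false;
new Promise(resolve => { promiseStarted = true; resolve(1); });
console.log(promiseStarted); // true – work already began

// A lazy task does nothing until forked, and forking hands back
// a cancel function (this is the idea, not any library's real API).
const task = run => ({ fork: (onError, onSuccess) => run(onError, onSuccess) });

let taskStarted = false;
const lazy = task((_err, ok) => {
  taskStarted = true;
  ok(1);
  return () => { /* teardown would go here */ };
});

console.log(taskStarted); // false – nothing has run yet
const cancel = lazy.fork(console.error, () => {});
console.log(taskStarted); // true – work only happens on fork
cancel();
```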

A Solution

After some debate, we decided to use FRP (functional reactive programming) powered by the observable pattern. Having used RxJS (with redux-observable) for smaller projects in the past, we were confident that it could be an elegant solution to our problem. You can find out more about RxJS here and here but, in short, it’s a library that allows subscribers to listen to and transform the output of a data stream as per the observer pattern, and allows the observable (the thing subscribed to) to “complete” its stream when it runs out of data (or whatever), similar to an iterator from the iterator pattern. Observables also allow their subscribers to terminate them at any point, and typically observables will encapsulate teardown logic related to their data source – a websocket, long-poll, webrtc data channel, or similar.

RxJS implements the observer pattern in a functional way that allows developers to compose together observables, just as they’d compose functions or types. RxJS has its roots in functional reactive programming and leverages the power of monadic composition to chain together streams while also ensuring that teardown logic is preserved and handled as you’d expect.

Why FRP and Observables?

The elegance and power of observables is much more easily demonstrated than explained in a wordy paragraph. I’ll run through the basics and let your imagination think through the potential of it all.

A simple RxJS observable looks like this:

Observable.of(1, 2, 3)

It can be subscribed to as follows:

Observable.of(1, 2, 3).subscribe({
  next: val => console.log(`Next: ${val}`),
  error: err => console.error(err),
  complete: () => console.log('Completed!')
});

Which would emit the following to the console:

Next: 1
Next: 2
Next: 3
Completed!

We can also transform the data just as we’d transform values in an array:

Observable.of(1, 2, 3).map(x => x * 2).filter(x => x !== 4).subscribe(...)
2
6
Completed!

Observables can also be asynchronous:

Observable.interval(1000).subscribe(...)
0 [a second passes]
1 [a second passes]
2 [a second passes]
...

Observables can represent event streams:

Observable.fromEvent(window, 'mousemove').subscribe(...)
[Event Object]
[Event Object]
[Event Object]

Which can also be transformed:

Observable.fromEvent(window, 'mousemove')
  .map(ev => [ev.clientX, ev.clientY])
  .subscribe(...)
[211, 120]
[214, 128]
[218, 139]
...

We can cancel the subscriptions which will clean up the event listener:

const subscription = Observable.fromEvent(window, 'mousemove')
  .map(ev => [ev.clientX, ev.clientY])
  .subscribe(...)

subscription.unsubscribe();

Or we can unsubscribe in a dot-chained functional way:

Observable.of(1, 2, 3)
  .take(2)  // After receiving two values, complete the observable early
  .subscribe(...)
1
2
Completed!

Observable.fromEvent(window, 'mousemove')
  .map(ev => [ev.clientX, ev.clientY])
   // Stop emitting when the user clicks
  .takeUntil(Observable.fromEvent(window, 'click'))
  .subscribe(...)

Note that those last examples left no variables lying around. They are entirely self-contained bits of functionality that clean up after themselves.

Many common asynchronous stream use-cases are catered for natively, in such a way that the “operators” (the observable methods e.g. “throttle”, “map”, “delay”, “filter”) take care of all of the awkward state required to track emitted values over time.

Observable.fromEvent(window, 'mousemove')
  .map(...)
  .throttle(1000) // only allow one event through per second
  .subscribe(...);

… and that’s barely scratching the surface.

The Benefits

Many of the benefits of RxJS are the benefits of functional programming. The avoidance of state, the readability and testability of short, pure functions. By encapsulating the side-effects associated with your application in a generic, composable way, developers can maximise the reusability of the asynchronous logic in their codebase.

By seeing the application as a series of data transformations between the external application interfaces, we can describe those transformations by composing short, pure functions and lazily applying data to them as it is emitted in real-time.

Messy, temporary, imperative variables are replaced by functional closure to give observables access to previously emitted variables in a localised way that limits the amount of the application logic and state a developer must hold in their head at any given time.
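
For example, a running average over a stream needs no module-level state; the closure keeps it local (a generic sketch, not code from our application):

```javascript
// Each call to makeAverager gets its own running totals, captured in the
// closure rather than stored in some shared, mutable object.
const makeAverager = () => {
  let sum = 0, count = 0; // visible only to the function below
  return value => {
    sum += value;
    count += 1;
    return sum / count;
  };
};

const average = makeAverager();
console.log(average(10)); // 10
console.log(average(20)); // 15
console.log(average(30)); // 20
```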

Did It Work?

Sort of. We spent a lot of our time in a state of low-level fury at RxJS, so much so that we’ve written up a long list of complaints in another post.

There are some good bits though:

FRP and the observable pattern are both transformative approaches to writing complex asynchronous javascript code, producing fewer bugs and drastically improving the reusability of our codebase.

RxJS operators can encapsulate extremely complex asynchronous operations and elegantly describe dependencies in a terse, declarative way that leaves no state lying around.

In multiple standups throughout the project we’ve enthusiastically raved about how these operators have turned a fundamentally complex part of our implementation into a two-line solution. Sure, those two lines usually took a long time to craft and get right, but once working, it’s difficult to write many bugs in just two lines of code (compared to the hundreds of lines of imperative code we’d otherwise have had to write if we rolled our own).

That said, RxJS is a functional approach to writing code, so developers new to the paradigm should expect to incur a penalty as they move from an imperative, object-oriented approach to system design to a functional, data-flow-driven one. There is also a very steep learning curve to climb before feeling the benefits of RxJS, as developers familiarise themselves with the toolbox and its idiosyncrasies.

Would We Use It Again?

Despite the truly epic list of shortcomings, I would still recommend an FRP approach for complex async javascript projects. In future we’ll be trying out most.js to see if it solves the myriad problems we found with RxJS. If it doesn’t, I’d consider implementing an improved Observable that keeps its hands off my errors.

It’s also worth mentioning that we used RxJS with react-redux to handle all redux side-effects. We used redux-observable to achieve this and it was terrific. We’ll undoubtedly be using redux-observable again.

 

Screenshot of SOMA vision mixer

Compositing and mixing video in the browser

This blog post is the 4th part of our ongoing series working with the BBC Research & Development team. If you’re new to this project, you should start at the beginning!

Like all vision mixers, SOMA (Single Operator Mixing Application) has a “preview” and “transmission” monitor. Preview is used to see how different inputs will appear when composed together – in our case, a video input, a “lower third” graphic such as a caption which fades in and out, and finally a “DOG” such as a channel or event identifier shown in the top corner throughout a broadcast.

When switching between video feeds SOMA offers a fast cut between inputs or a slower mix between the two. As and when edit decisions are made, the resulting output is shown in the transmission monitor.

The problem with software

However, one difference with SOMA is that all the composition and mixing is simulated. SOMA is used to build a set of edit decisions which can be replayed later by a broadcast-quality renderer. The transmission monitor is not simply a view of the output with the effects applied, because the actual rendering of the edit decisions hasn’t happened yet. The app needs to provide an accurate simulation of what the edit decisions will look like.

The task of building this required breaking down how output is composed – during a mix both the old and new input sources are visible, so six inputs are required.

VideoContext to the rescue

Enter VideoContext, a video scheduling and compositing library created by BBC R&D. This allowed us to represent each monitor as a graph of nodes, with video nodes playing each input into transition nodes allowing mix and opacity to be varied over time, and a compositing node to bring everything together, all implemented using WebGL to offload video processing to the GPU.
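To make the graph idea concrete, here is a conceptual sketch of such a node graph — written for this post with invented class names, not the real VideoContext API — where two sources feed a transition node whose mix varies over time before reaching the destination:

```javascript
// Conceptual node-graph sketch in the style of VideoContext (not the
// real API): sources connect into a transition node, which connects on
// to a compositing destination.
class Node {
  constructor(name) { this.name = name; this.inputs = []; }
  connect(target) { target.inputs.push(this); return target; }
}

class TransitionNode extends Node {
  constructor() { super('transition'); }
  // Linearly ramp the mix between the two inputs over [start, end]:
  // 0 = all first input, 1 = all second input.
  mixAt(time, start, end) {
    if (time <= start) return 0;
    if (time >= end) return 1;
    return (time - start) / (end - start);
  }
}

const cameraA = new Node('cameraA');
const cameraB = new Node('cameraB');
const transition = new TransitionNode();
const destination = new Node('destination');

cameraA.connect(transition);
cameraB.connect(transition);
transition.connect(destination);

// Halfway through a 2-second mix starting at t=10:
console.log(transition.mixAt(11, 10, 12)); // → 0.5
```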

The flexible nature of this library allowed us to plug in our own WebGL scripts to cut the lower third and DOG graphics out using chroma-keying (where a particular colour is declared to be transparent – normally green), and with a small patch to allow VideoContext to use streaming video we were off and going.
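For illustration, a minimal chroma-key fragment shader of the sort that gets plugged in might look like the following — a sketch written for this post, not the production shader, with invented uniform names; it is held as a JavaScript string as it would be when registering a custom effect:

```javascript
// Illustrative GLSL fragment shader for chroma-keying: pixels close to
// the key colour (normally green) are made transparent.
const chromaKeyFragmentShader = `
  precision mediump float;
  uniform sampler2D u_image;
  uniform vec3 u_keyColour;   // e.g. vec3(0.0, 1.0, 0.0) for green
  uniform float u_threshold;  // how close to the key colour counts
  varying vec2 v_texCoord;

  void main() {
    vec4 colour = texture2D(u_image, v_texCoord);
    // 0.0 (transparent) when within the threshold of the key colour,
    // 1.0 (opaque) otherwise.
    float alpha = step(u_threshold, distance(colour.rgb, u_keyColour));
    gl_FragColor = vec4(colour.rgb, colour.a * alpha);
  }
`;
```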

Devils in the details

The fiddly details of how edits work were as fiddly as expected: tracking the mix between two video inputs versus the opacity of two overlays appeared to be similar problems but required different solutions. The nature of the VideoContext graph meant we also had to keep track of which node was current rather than always connecting the current input to the same node. We put a lot of unit tests around this to ensure it works as it should now and in future.

By comparison, a seemingly tricky problem – what to do if a new edit decision is made while a mix is still in progress – was just a case of swapping out the old input for the new one, to avoid the old input reappearing unexpectedly.

QA testing revealed a subtler problem: when switching to a new input, the video takes a few tens of milliseconds to start. Cutting immediately causes a distracting flicker as a couple of blank frames are rendered; waiting until the video is ready adds a slight delay, but this is significantly less distracting.

Later in the project a new requirement emerged to re-frame videos within the application and the decision to use VideoContext paid off as we could add an effect node into the graph to crop and scale the video input before mixing.

And finally

VideoContext made the mixing and compositing operations a lot easier than they would have been otherwise. Towards the end we even added an image source (for paused VTs) using the new experimental Chrome feature captureStream, and that worked really well.

After making it all work, the obvious concern is performance – and overall it holds up pretty well. We needed to have half-a-dozen or so VideoContexts running at once, which was fine on a powerful machine; many more and the computer really starts struggling.

Even a few years ago attempting this in the browser would have been madness, so it’s great to see so much progress in something so challenging, opening up a whole new range of software to the browser!

Read part 5 of this project with BBC R&D where Developer Alex Holmes talks about Taming async with FRP and RxJS.

Sign pointing to Usability Lab

Rapid user research on an Agile project

Our timeline to build an in-browser vision mixer for BBC R&D (previously, previously) is extremely tight – just 2 months. UX and development runs concurrently in Agile fashion (a subject for a future blog post), but design was largely done within the first month.

Too frequently for projects on such timescales there is pressure to omit user testing in the interest of expediency. One could say it’s just a prototype, and leave it until the first trials to see how it performs, and hopefully get a chance to work the learnings into a version 2. Or, since we have weekly show & tell sessions with project stakeholders, one could argue complacently that as long as they’re happy with what they’re seeing, the design’s on track.

Why test?

But the stakeholders represent our application’s target users only slightly better than we do ourselves, which is not very well – they won’t be the ones using it. Furthermore, this project aims to broaden the range of potential operators – from what used to be the domain of highly experienced technicians, to something that could be used by a relative novice within hours. So I wanted to feel confident that even people who aren’t familiar with the project would be able to use it – both experts and novices. I’m not experienced in this field at all, so I was making lots of guesses and assumptions, and I didn’t want to go too far before finding out they were wrong.

One of the best things about working at the BBC is the ingrained culture of user centred design, so there was no surprise at the assumption that I’d be testing paper prototypes by the 2nd week. Our hosts were very helpful in finding participants within days – and with 100s of BBC staff working at MediaCity there is no danger of using people with too much knowledge of the project, or re-using participants. Last but not least, BBC R&D has a fully equipped usability lab – complete with two-way mirror and recording equipment. Overkill for my purposes – I would’ve managed with an ordinary office – but having the separate viewing room helped ensure that I got the entire team observing the sessions without crowding my subject. I’m a great believer in getting everyone on the project team seeing other people interact with and talk about the application.

Paper prototypes

Paper prototypes are A3 printouts of the wireframes, each representing a state of the application. After giving a brief description of what the application is used for, I show the page representing the application’s initial state, and change the pages in response to user actions as if it were the screen. (Users point to what they would click.) At first, I ask task-based questions: “add a camera and an audio source”; “create a copy of Camera 2 that’s a close-up”; etc. As we linger on a screen, I’ll probe more about their understanding of the interface: “How would you change the keyboard shortcut for Camera 1?”; “What do you think Undo/Redo would do on this screen?”; “What would happen if you click that?”; and so on. It doesn’t matter that the wireframes are incomplete – when users try to go to parts of the application that haven’t been designed yet, I ask them to describe what they expect to see and be able to do there.

In all, I did paper prototype testing with 6 people on week 2, and with a further 3 people on week 3. (With qualitative testing even very few participants tend to find the major issues.) In keeping with the agile nature of the project, there was no expectation of me producing a report of findings that everyone would read, although I do type up my notes in a shared document to help fix them in my memory. Rather, my learnings go straight into the design – I’m usually champing at the bit to make the changes that seem so obvious after seeing a person struggle, feeling really happy to have caught them so early on. Fortunately, user testing showed that the broad screen layout worked well – the main changes were to button labels, icon designs, and generally improved affordances.

Interactive prototypes

By week 4 my role had transitioned into front-end development, in which I’m responsible for creating static HTML mockups with the final design and CSS, which the developers use as reference markup for the React components. While this isn’t mainstream practice in our industry, I find it has numerous advantages, especially for an Agile project, as it enables me to leave the static, inexact medium of wireframes behind and refine the design and interaction directly within the browser. (I add some dynamic interactivity using jQuery, but this is throwaway code for demo purposes only.)

The other advantage of HTML mockups is that they afford us an opportunity to do interactive user testing using a web browser, well before the production application is stable enough to test. Paper prototyping is fine up to a point, but it has plenty of limitations – for example, you can’t scroll, there are no mouseover events, you can’t resize the screen, etc.

So by week 5 I was able to test nearly all parts of the application, in the browser, with 11 users. (This included two groups of 4, which worked better than I expected – one person manning the mouse and keyboard, but everyone in the group thinking out loud.) It was really good being able to see the difference that interactivity made, such as hover states, and seeing people actually trying to click or drag things rather than just saying what they’d do gave me an added level of confidence in my findings. Again, immediately afterwards, I made several changes that I’m confident improve the application – removing a redundant button that never got clicked, adding labels to some icons, and strengthening a primary action by adding an icon, among others. Not to mention fixing numerous technical bugs that came up during testing. (I use Github comments to ensure developers are aware of any HTML changes to components at this stage.)

Never stop testing

Hopefully we’ll have time for another round of testing with the production application. This should give a more faithful representation of the vision mixing workflow, since in the mockups the application is always in the same state, using dummy content. With every test we can feel more confident – and our stakeholders can feel more confident – that what we’re building will meet its goals, and make users productive rather than frustrated. And on a personal level, I’m just relieved that we won’t be launching with any of the embarrassing gotchas that cropped up and got fixed during testing.

Read part 4 of this project working with BBC R&D where we talk about compositing and mixing video in the browser.

The challenges of mixing live video streams over IP networks

Welcome to our second post on the work we’re doing with BBC Research & Development. If you’ve not read the first post, you should go read that first 😉

Introducing IP Studio

The first part of the infrastructure we’re working with here is something called IP Studio. In essence this is a platform for discovering, connecting and transforming video streams in a generic way, using IP networking – the standard on which pretty much all Internet, office and home networks are based.

Up until now video cameras have used very simple standards such as SDI to move video around. Even though SDI is digital, it’s just point-to-point – you connect the camera to something using a cable, and there it is. The reason for the remarkable success of IP networks, however, is their ability to connect things together over a generic set of components, routing between connecting devices. Your web browser can get messages to and from this blog over the Internet using a range of intervening machines, which is actually pretty clever.

Doing this with video is obviously in some senses well-understood – we’ve all watched videos online. There are some unique challenges with doing this for live television though!

Why live video is different

First, you can’t have any buffering: this is live. It’s unacceptable for everyone watching TV to see a buffering message because the production systems aren’t quick enough.

Second is quality. These are 4K streams, not typical internet video resolution. 4K streams have (roughly) 4000 horizontal pixels compared to the (roughly) 2000 of a 1080p stream (weirdly, 1080p, 720p etc. are named for their vertical pixel counts instead). This means they need about 4 times as much bandwidth – which even in 2017 is quite a lot. Specialist networking kit and a lot of processing power are required.
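The arithmetic behind that factor of four is simple pixel counting (nominal resolutions; real bitrates also depend on frame rate, bit depth and codec):

```javascript
// Pixel counts for the two nominal resolutions.
const uhd = 3840 * 2160; // "4K": ~4000 horizontal pixels
const hd  = 1920 * 1080; // 1080p: ~2000 horizontal pixels

// Four times the pixels means roughly four times the bandwidth.
console.log(uhd / hd); // → 4
```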

Third is the unique requirements of production – we’re not just transmitting a finished, pre-prepared video, but all the components from which to make one: multiple cameras, multiple audio feeds, still images, pre-recorded video. Everything you need to create the finished live product. This means that to deliver a final product you might need ten times as much source material – which is well beyond the capabilities of any existing systems.

IP Studio addresses this with a cluster of powerful servers sitting on a very high speed network. It allows engineers to connect together “nodes” to form processing “pipelines” that deliver video suitable for editing. This means capturing the video from existing cameras (using SDI) and transforming them into a format which will allow them to be mixed together later.

It’s about time

That sounds relatively straightforward, except for one thing: time. When you work with live signals on traditional analogue or point-to-point digital systems, then live means, well, live. There can be transmission delays in the equipment but they tend to be small and stable. A system based on relatively standard hardware and operating systems (IP Studio uses Linux, naturally) is going to have all sorts of variable delays in it, which need to be accommodated.

IP Studio is therefore based on “flows” comprising “grains”. Each grain has a quantum of payload (for example a video frame) and timing information. The timing information allows multiple flows to be combined into a final output where everything happens appropriately in synchronisation. This might sound easy but is fiendishly difficult – some flows will arrive later than others, so the system needs to hold back some of them until everything is running to time.
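The hold-back behaviour can be sketched as a timestamp-keyed buffer — our illustration for this post, not IP Studio's implementation — that only releases a set of grains once every flow has delivered one for that instant:

```javascript
// Toy grain aligner: grains are buffered per timestamp and released
// only when every flow has delivered its grain for that instant.
function makeAligner(flowNames) {
  const pending = new Map(); // timestamp → { flowName: payload }
  return (flowName, timestamp, payload) => {
    const slot = pending.get(timestamp) || {};
    slot[flowName] = payload;
    pending.set(timestamp, slot);
    if (flowNames.every(name => name in slot)) {
      pending.delete(timestamp);
      return { timestamp, grains: slot }; // all flows in sync: emit
    }
    return null; // still waiting for slower flows
  };
}

const align = makeAligner(['video', 'audio']);
console.log(align('video', 0, 'frame0'));  // → null (audio grain not here yet)
console.log(align('audio', 0, 'sample0'));
// → { timestamp: 0, grains: { video: 'frame0', audio: 'sample0' } }
```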

To add to the complexity, we need two versions of the stream, one at 4k and one at a lower resolution.

Don’t forget the browser

Within the video mixer we’re building, we need the operator to be able to see their mixing decisions (cutting, fading etc.) happening in front of them in real time. We also need to control the final transmitted live output. There’s no way a browser in 2017 is going to show half-a-dozen 4k streams at once (and it would be a waste to do so). This means we are showing lower resolution 480p streams in the browser, while sending the edit decisions up to the output rendering systems which will process the 4k streams, before finally reducing them to 1080p for broadcast.

So we’ve got half-a-dozen 4k streams, and 480p equivalents, still images, pre-recorded video and audio, all being moved around in near-real-time on a cluster of commodity equipment from which we’ll be delivering live television!

Read part 3 of this project with BBC R&D where we delve into rapid user research on an Agile project.

MediaCity UK offices

Building a live television video mixing application for the browser

This is the first in a series of posts about some work we are doing with BBC Research & Development.

The BBC now has, in the lab, the capability to deliver live television using high-end commodity equipment direct to broadcast, over standard IP networks. What we’ve been tasked with is building one of the key front-end applications – the video mixer. This will enable someone to mix an entire live television programme, at high quality, from within a standard web-browser on a normal laptop.

In this series of posts we’ll be talking in great depth about the design decisions, implementation technologies and opportunities presented by these platforms.

What is video mixing?

Video editing used to be a specialist skill requiring very expensive, specialist equipment. Like most things this has changed because of commodity, high-powered computers and now anyone can edit video using modestly priced equipment and software such as the industry standard Adobe Premiere. This has fed the development of services such as YouTube where 300 hours of video are uploaded every minute.

“Video Mixing” is the activity of getting the various different videos and stills in your source material and mixing them together to produce a single, linear output. It can involve showing single sources, cutting and fading between them, compositing them together, showing still images and graphics and running effects over them. Sound can be similarly manipulated. Anyone who has used Premiere, even to edit their family videos, will have some idea of the options involved.

Live television is a very different problem

First you need to produce high-fidelity output in real time. If you’ve ever used something like Premiere you’ll know that when you finally render your output it can take quite a long time – it can easily spend an hour rendering 20 minutes of output. That would be no good if you were broadcasting live! This means the technology used is very different – you can’t just use commodity hardware, you need specialist equipment that can work with these streams in realtime.

Second, the capacity for screw-ups is immensely higher. Any mistakes in a live broadcast are immediately apparent, and potentially tricky to correct. It is a high-stress environment, even for experienced operators.

Finally, the range of things you might choose to do is much more limited, because you have little time to set anything up. This means live television tends to use a far smaller ‘palette’ of mixing operations.

Even then, a live broadcast might require half a dozen people even for a modest production. You need someone to set up the cameras and control them, a sound engineer to get the sound right, someone to mix the audio, a vision mixer, a VT Operator (to run any pre-recorded videos you insert – perhaps the titles and credits) and someone to set up the still image overlays (for example, names and logos).

If that sounds bad, imagine a live broadcast away from the studio – the Outside Broadcast. All the people and equipment need to be on site, hence the legendary “OB Van”:

Photo of an OB van

Inside one of those vans is the equipment and people needed to run a live broadcast for TV. They’d normally transmit the final output directly to air by satellite – which is why you generally see a van with a massive dish on it nearby. This equipment runs into millions and millions of pounds and can’t be deployed on a whim. When you only have a few channels of course you don’t need many vans…

The Internet Steamroller

The Internet is changing all of this. Services like YouTube Live and Facebook Live mean that anyone with a phone and decent coverage can run their own outside broadcast. Where once you needed a TV network and millions of pounds of equipment now anyone can do it. Quality is poor and there are few options for mixing, but it is an amazingly powerful tool for citizen journalism and live reporting.

Also, the constraints of “channels” are going. Where once there was no point owning more OB Vans than you have channels, now you could run dozens of live feeds simultaneously over the Internet. As the phone becomes the first screen and the TV in the corner turns into just another display many of the constraints that we have taken for granted look more and more anachronistic.

These new technologies provide an opportunity, but also some significant challenges. The major one is standards – there is a large ecosystem of manufacturers and suppliers whose equipment needs to interoperate. The standards used, such as SDI (Serial Digital Interface) have been around for decades and are widely supported. Moving to an Internet-based standard needs cooperation across the industry.

BBC R&D has been actively working towards this with their IP Studio project, and the standards they are developing with industry for Networked Media.

Read part 2 of this project with BBC R&D where I’ll describe some of the technologies involved, and how we’re approaching the project.