Author Archives: Doug Winter

Screenshot of SOMA vision mixer

Compositing and mixing video in the browser

This blog post is the 4th part of our ongoing series working with the BBC Research & Development team. If you’re new to this project, you should start at the beginning!

Like all vision mixers, SOMA (Single Operator Mixing Application) has a “preview” and a “transmission” monitor. Preview is used to see how different inputs will appear when composed together – in our case, a video input, a “lower third” graphic such as a caption which fades in and out, and finally a “DOG” such as a channel or event identifier shown in the top corner throughout a broadcast.

When switching between video feeds SOMA offers a fast cut between inputs or a slower mix between the two. As and when edit decisions are made, the resulting output is shown in the transmission monitor.

The problem with software

However, one difference with SOMA is that all the composition and mixing is simulated. SOMA is used to build a set of edit decisions which can be replayed later by a broadcast-quality renderer. The transmission monitor can’t simply show the output after the effects have been applied, because the actual rendering of the edit decisions hasn’t happened yet. Instead, the app needs to provide an accurate simulation of what each edit decision will look like.

The task of building this required breaking down how output is composed. There are three layers (the video input, the lower third and the DOG), and during a mix both the old and new source of a layer are visible, so six inputs are required.

VideoContext to the rescue

Enter VideoContext, a video scheduling and compositing library created by BBC R&D. This allowed us to represent each monitor as a graph of nodes, with video nodes playing each input into transition nodes allowing mix and opacity to be varied over time, and a compositing node to bring everything together, all implemented using WebGL to offload video processing to the GPU.
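To make the shape of that graph concrete, here is a minimal CPU-side sketch of what it computes for a single pixel. It does not use the VideoContext API. The names `mix`, `over`, `Layer` and `compose` are hypothetical, and it assumes premultiplied-alpha RGBA values in the range 0 to 1. Each of the three layers crossfades between an outgoing and an incoming source (six inputs in all), and the layers are then stacked with a Porter-Duff “over”.

```typescript
// Premultiplied-alpha RGBA, each component in [0, 1] (an assumption of this sketch).
type RGBA = [number, number, number, number];

const lerp = (a: number, b: number, t: number): number => a * (1 - t) + b * t;

// Transition node: crossfade from the outgoing to the incoming source
// (t = 0 shows only the outgoing source, t = 1 only the incoming one).
function mix(from: RGBA, to: RGBA, t: number): RGBA {
  return [lerp(from[0], to[0], t), lerp(from[1], to[1], t), lerp(from[2], to[2], t), lerp(from[3], to[3], t)];
}

// Porter-Duff "over" for premultiplied colours: top + bottom * (1 - top alpha).
function over(top: RGBA, bottom: RGBA): RGBA {
  const k = 1 - top[3];
  return [top[0] + bottom[0] * k, top[1] + bottom[1] * k, top[2] + bottom[2] * k, top[3] + bottom[3] * k];
}

// One layer of a monitor: two sources plus how far the transition has run.
interface Layer {
  from: RGBA;
  to: RGBA;
  progress: number; // 0..1, varied over time
}

// Compositing node: video at the bottom, then the lower third, then the DOG.
function compose(video: Layer, lowerThird: Layer, dog: Layer): RGBA {
  let out = mix(video.from, video.to, video.progress);
  for (const layer of [lowerThird, dog]) {
    out = over(mix(layer.from, layer.to, layer.progress), out);
  }
  return out;
}
```

Halfway through a mix from black to white with both overlays transparent, `compose` gives mid-grey; an opaque DOG at full progress covers the video entirely.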

The flexible nature of this library allowed us to plug in our own WebGL scripts to cut the lower third and DOG graphics out using chroma-keying (where a particular colour is declared to be transparent – normally green), and with a small patch to allow VideoContext to use streaming video we were off and going.
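In SOMA the keying runs as WebGL on the GPU, but the per-pixel rule is easy to show on the CPU. The sketch below is a simple distance-based key, not the shader SOMA actually uses. Pixels close to the key colour become transparent, pixels far from it stay opaque, and there is a soft ramp in between. The `threshold` and `softness` defaults and the function names are illustrative assumptions.

```typescript
type RGB = [number, number, number]; // components in [0, 1]

// Hermite smoothstep, as in GLSL: 0 below edge0, 1 above edge1, smooth in between.
function smoothstep(edge0: number, edge1: number, x: number): number {
  const t = Math.min(Math.max((x - edge0) / (edge1 - edge0), 0), 1);
  return t * t * (3 - 2 * t);
}

// Alpha for one pixel: 0 when the colour is within `threshold` of the key,
// 1 when it is further than threshold + softness, with a soft edge between.
function chromaKeyAlpha(pixel: RGB, key: RGB, threshold = 0.3, softness = 0.1): number {
  const distance = Math.hypot(pixel[0] - key[0], pixel[1] - key[1], pixel[2] - key[2]);
  return smoothstep(threshold, threshold + softness, distance);
}
```

The soft ramp is a design choice: a hard cut-off at the threshold tends to leave jagged, flickering edges around captions.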

Devils in the details

The fiddly details of how edits work were as fiddly as expected: tracking the mix between two video inputs versus the opacity of two overlays appeared to be similar problems but required different solutions. The nature of the VideoContext graph meant we also had to keep track of which node was current rather than always connecting the current input to the same node. We put a lot of unit tests around this to ensure it works as it should now and in future.
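One way to picture “keeping track of which node was current” is a pair of transition inputs that the mixer alternates between. Each new source is connected to whichever side is not on air, and the current marker then flips. The `MixerSlots` class below is a hypothetical illustration of that bookkeeping, not SOMA’s code. It is also the kind of small, pure logic that is easy to cover with unit tests.

```typescript
type Side = "A" | "B";

class MixerSlots {
  private current: Side = "A";
  private readonly sources: Record<Side, string | null> = { A: null, B: null };

  // Connect `source` to the side that is not on air and make that side current.
  // Returns the side used, so the caller knows which way to run the transition.
  take(source: string): Side {
    const next: Side = this.current === "A" ? "B" : "A";
    this.sources[next] = source;
    this.current = next;
    return next;
  }

  // The source currently on air (null before the first take).
  get onAir(): string | null {
    return this.sources[this.current];
  }

  // The source on the other side, e.g. the one a mix is fading out.
  get offAir(): string | null {
    return this.sources[this.current === "A" ? "B" : "A"];
  }
}
```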

By comparison, the seemingly tricky problem of what to do if a new edit decision was made while a mix was in progress turned out to be just a case of swapping out the new input, to avoid the old input reappearing unexpectedly.
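Here is one reading of that fix as a hypothetical sketch. The `Mix` type and `applyDecision` function are illustrative names, not SOMA’s code. If a mix is still running when a new decision arrives, only the incoming source is replaced, so the outgoing source keeps fading from where it was instead of jumping back to full strength.

```typescript
// A mix in progress; progress runs from 0 (all outgoing) to 1 (all incoming).
interface Mix {
  outgoing: string;
  incoming: string;
  progress: number;
}

// Apply a new edit decision. While a mix is running, keep the outgoing source
// and the current progress and swap only the incoming source. Otherwise start
// a fresh mix from whatever is on air.
function applyDecision(current: Mix | null, onAir: string, next: string): Mix {
  if (current !== null && current.progress < 1) {
    return { ...current, incoming: next };
  }
  return { outgoing: onAir, incoming: next, progress: 0 };
}
```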

QA testing revealed a subtler problem: when switching to a new input, the video takes a few tens of milliseconds to start. Cutting immediately causes a distracting flicker as a couple of blank frames are rendered; waiting until the video is ready adds a slight delay, but this is significantly less distracting.
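In a browser, one way to wait for the incoming video is to check `HTMLMediaElement.readyState` and, if there isn’t enough data yet, defer the cut until the `canplay` event. The sketch below assumes that approach. `VideoLike` is a minimal stand-in interface so the sketch runs outside a browser, and `whenPlayable` is a hypothetical helper name.

```typescript
// Value of HTMLMediaElement.HAVE_FUTURE_DATA: enough data to start playing.
const HAVE_FUTURE_DATA = 3;

// The small subset of HTMLVideoElement this sketch relies on.
interface VideoLike {
  readyState: number;
  addEventListener(type: "canplay", listener: () => void, options?: { once?: boolean }): void;
}

// Run `cut` now if the video can already play, otherwise on the first
// `canplay` event: a short delay in exchange for no blank frames.
function whenPlayable(video: VideoLike, cut: () => void): void {
  if (video.readyState >= HAVE_FUTURE_DATA) {
    cut();
    return;
  }
  video.addEventListener("canplay", cut, { once: true });
}
```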

Later in the project a new requirement emerged to re-frame videos within the application and the decision to use VideoContext paid off as we could add an effect node into the graph to crop and scale the video input before mixing.
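An effect node of this kind typically works by remapping texture coordinates. For each output pixel at normalised position (u, v), it samples the source at the matching point inside a crop rectangle, which crops and scales in one step. The helpers below show that mapping on the CPU. `Rect`, `zoomCrop` and `sampleCoord` are hypothetical names, and all coordinates are assumed to be normalised to [0, 1].

```typescript
// A crop window in normalised source coordinates.
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Crop window for zooming by `scale` (>= 1) around (cx, cy), kept inside the frame.
function zoomCrop(cx: number, cy: number, scale: number): Rect {
  if (scale < 1) throw new RangeError("scale must be >= 1");
  const size = 1 / scale;
  const clamp = (v: number): number => Math.min(Math.max(v, 0), 1 - size);
  return { x: clamp(cx - size / 2), y: clamp(cy - size / 2), width: size, height: size };
}

// Map an output coordinate to the source coordinate to sample:
// the per-pixel work a crop-and-scale fragment shader would do.
function sampleCoord(u: number, v: number, crop: Rect): [number, number] {
  return [crop.x + u * crop.width, crop.y + v * crop.height];
}
```

For example, a 2× zoom on the centre of the frame samples the middle half of the source in each direction.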

And finally

VideoContext made the mixing and compositing operations a lot easier than they would have been otherwise. Towards the end we even added an image source (for paused VTs) using the new experimental Chrome feature captureStream, and that worked really well.

Once it all worked, the obvious concern was performance, and overall it works pretty well. We needed to have half-a-dozen or so VideoContexts running at once, and this was effective on a powerful machine. Many more than that and the computer really starts struggling.

Even a few years ago attempting this in the browser would have been madness, so it’s great to see so much progress on something so challenging, opening up a whole new range of software to work in the browser!

Read part 5 of this project with BBC R&D where Developer Alex Holmes talks about Taming async with FRP and RxJS.

The challenges of mixing live video streams over IP networks

Welcome to our second post on the work we’re doing with BBC Research & Development. If you’ve not read the first post, you should go read that first 😉

Introducing IP Studio

The first part of the infrastructure we’re working with here is something called IP Studio. In essence this is a platform for discovering, connecting and transforming video streams in a generic way, using IP networking – the standard on which pretty much all Internet, office and home networks are based.

Up until now video cameras have used very simple standards such as SDI to move video around. Even though SDI is digital, it’s just point-to-point: you connect the camera to something using a cable, and there it is. The reason for the remarkable success of IP networks, however, is their ability to connect things together over a generic set of components, routing traffic between connected devices. Your web browser can get messages to and from this blog over the Internet using a range of intervening machines, which is actually pretty clever.

Doing this with video is obviously in some senses well-understood – we’ve all watched videos online. There are some unique challenges with doing this for live television though!

Why live video is different

First, you can’t have any buffering: this is live. It’s unacceptable for everyone watching TV to see a buffering message because the production systems aren’t quick enough.

Second is quality. These are 4K streams, not typical internet video resolution. A 4K stream has roughly 4000 horizontal pixels (3840 for UHD) compared to roughly 2000 (1920) for a 1080p stream (weirdly, 1080p, 720p etc. are named for their vertical pixels instead). The vertical resolution doubles too, from 1080 to 2160 lines, so a 4K frame has four times as many pixels and needs about four times as much bandwidth, which even in 2017 is quite a lot. Specialist networking kit and a lot of processing power are required.
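A quick back-of-envelope calculation shows where the factor of four comes from and why raw production video is so demanding. The sketch assumes UHD 4K at 3840×2160, 1080p at 1920×1080, 50 frames per second and 10-bit 4:2:2 sampling (20 bits per pixel on average). These are illustrative assumptions, not figures from the project, and real streams may be compressed.

```typescript
// Uncompressed video bitrate in bits per second.
function rawBitrate(width: number, height: number, bitsPerPixel: number, fps: number): number {
  return width * height * bitsPerPixel * fps;
}

const BITS_PER_PIXEL = 20; // assumption: 10-bit 4:2:2 (10 bits luma + 10 bits chroma per pixel on average)
const FPS = 50; // assumption: 50 frames per second

const uhd = rawBitrate(3840, 2160, BITS_PER_PIXEL, FPS);
const hd = rawBitrate(1920, 1080, BITS_PER_PIXEL, FPS);

console.log(`4K: ${(uhd / 1e9).toFixed(2)} Gbit/s`); // prints "4K: 8.29 Gbit/s"
console.log(`1080p: ${(hd / 1e9).toFixed(2)} Gbit/s`); // prints "1080p: 2.07 Gbit/s"
console.log(`ratio: ${uhd / hd}`); // prints "ratio: 4"
```

Under these assumptions a single uncompressed 4K feed needs over 8 Gbit/s, before counting the multiple sources a production uses.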

Third, there are the unique requirements of production – we’re not just transmitting a finished, pre-prepared video, but all the components from which to make one: multiple cameras, multiple audio feeds, still images, pre-recorded video. Everything you need to create the finished live product. This means that to deliver a final product you might need ten times as much source material – which is well beyond the capabilities of any existing systems.

IP Studio addresses this with a cluster of powerful servers sitting on a very high speed network. It allows engineers to connect together “nodes” to form processing “pipelines” that deliver video suitable for editing. This means capturing the video from existing cameras (using SDI) and transforming them into a format which will allow them to be mixed together later.

It’s about time

That sounds relatively straightforward, except for one thing: time. When you work with live signals on traditional analogue or point-to-point digital systems, then live means, well, live. There can be transmission delays in the equipment but they tend to be small and stable. A system based on relatively standard hardware and operating systems (IP Studio uses Linux, naturally) is going to have all sorts of variable delays in it, which need to be accommodated.

IP Studio is therefore based on “flows” comprising “grains”. Each grain has a quantum of payload (for example a video frame) and timing information. The timing information allows multiple flows to be combined into a final output where everything happens appropriately in synchronisation. This might sound easy but is fiendishly difficult – some flows will arrive later than others, so systems need to hold back some of them until everything is running to time.
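The holding-back behaviour can be sketched as a small synchroniser. Each flow gets a queue of grains, and a combined set is released only once every flow has a grain for the same timestamp. This is a simplified illustration, not IP Studio’s implementation. `Grain`, `Synchroniser` and integer timestamps are assumptions, and grains within a flow are assumed to arrive in order. Grains that another flow has already moved past are simply dropped here; a real system would need a deliberate policy for late data.

```typescript
// A quantum of payload (e.g. one video frame) plus its timing information.
interface Grain<T> {
  timestamp: number; // assumption: integer ticks, e.g. frame number
  payload: T;
}

class Synchroniser<T> {
  private readonly queues = new Map<string, Grain<T>[]>();

  constructor(flowIds: string[]) {
    for (const id of flowIds) this.queues.set(id, []);
  }

  // Queue a grain; grains within one flow are assumed to arrive in timestamp order.
  push(flowId: string, grain: Grain<T>): void {
    const queue = this.queues.get(flowId);
    if (queue === undefined) throw new Error(`unknown flow: ${flowId}`);
    queue.push(grain);
  }

  // Return one payload per flow for the next timestamp every flow has reached,
  // or null if some flow is still running late (everything else is held back).
  pull(): Map<string, T> | null {
    for (;;) {
      const queues = [...this.queues.values()];
      if (queues.some((q) => q.length === 0)) return null;
      // The latest head timestamp is the earliest one all flows could share.
      const target = Math.max(...queues.map((q) => q[0].timestamp));
      let aligned = true;
      for (const q of queues) {
        // Grains older than the target can never be matched, so drop them.
        while (q.length > 0 && q[0].timestamp < target) q.shift();
        if (q.length === 0 || q[0].timestamp !== target) aligned = false;
      }
      if (!aligned) continue; // some grains were dropped; try again
      const out = new Map<string, T>();
      for (const [id, q] of this.queues) out.set(id, (q.shift() as Grain<T>).payload);
      return out;
    }
  }
}
```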

To add to the complexity, we need two versions of each stream, one at 4K and one at a lower resolution.

Don’t forget the browser

Within the video mixer we’re building, we need the operator to be able to see their mixing decisions (cutting, fading etc.) happening in front of them in real time. We also need to control the final transmitted live output. There’s no way a browser in 2017 is going to show half-a-dozen 4K streams at once (and it would be a waste to do so). This means we are showing lower resolution 480p streams in the browser, while sending the edit decisions up to the output rendering systems which will process the 4K streams, before finally reducing them to 1080p for broadcast.

So we’ve got half-a-dozen 4K streams, and 480p equivalents, still images, pre-recorded video and audio, all being moved around in near-real-time on a cluster of commodity equipment from which we’ll be delivering live television!

Read part 3 of this project with BBC R&D where we delve into rapid user research on an Agile project.

MediaCity UK offices

Building a live television video mixing application for the browser

This is the first in a series of posts about some work we are doing with BBC Research & Development.

The BBC now has, in the lab, the capability to deliver live television using high-end commodity equipment direct to broadcast, over standard IP networks. What we’ve been tasked with is building one of the key front-end applications – the video mixer. This will enable someone to mix an entire live television programme, at high quality, from within a standard web-browser on a normal laptop.

In this series of posts we’ll be talking in great depth about the design decisions, implementation technologies and opportunities presented by these platforms.

What is video mixing?

Video editing used to be a specialist skill requiring very expensive, specialist equipment. Like most things this has changed because of commodity, high-powered computers and now anyone can edit video using modestly priced equipment and software such as the industry standard Adobe Premiere. This has fed the development of services such as YouTube where 300 hours of video are uploaded every minute.

“Video Mixing” is the activity of getting the various different videos and stills in your source material and mixing them together to produce a single, linear output. It can involve showing single sources, cutting and fading between them, compositing them together, showing still images and graphics and running effects over them. Sound can be similarly manipulated. Anyone who has used Premiere, even to edit their family videos, will have some idea of the options involved.

Live television is a very different problem

First you need to produce high-fidelity output in real time. If you’ve ever used something like Premiere you’ll know that when you finally render your output it can take quite a long time – it can easily spend an hour rendering 20 minutes of output. That would be no good if you were broadcasting live! This means the technology used is very different – you can’t just use commodity hardware, you need specialist equipment that can work with these streams in realtime.

Second, the capacity for screw-ups is immensely higher. Any mistakes in a live broadcast are immediately apparent, and potentially tricky to correct. It is a high-stress environment, even for experienced operators.

Finally, the range of things you might choose to do is much more limited, because you can spend little time setting it up. This means live television tends to use a far smaller ‘palette’ of mixing operations.

Even then, a live broadcast might require half a dozen people even for a modest production. You need someone to set up the cameras and control them, a sound engineer to get the sound right, someone to mix the audio, a vision mixer, a VT Operator (to run any pre-recorded videos you insert – perhaps the titles and credits) and someone to set up the still image overlays (for example, names and logos).

If that sounds bad, imagine a live broadcast away from the studio: the Outside Broadcast. All the people and equipment need to be on site, hence the legendary “OB Van”:


Inside one of those vans is the equipment and people needed to run a live broadcast for TV. They’d normally transmit the final output directly to air by satellite – which is why you generally see a van with a massive dish on it nearby. This equipment runs into millions and millions of pounds and can’t be deployed on a whim. When you only have a few channels of course you don’t need many vans…

The Internet Steamroller

The Internet is changing all of this. Services like YouTube Live and Facebook Live mean that anyone with a phone and decent coverage can run their own outside broadcast. Where once you needed a TV network and millions of pounds of equipment now anyone can do it. Quality is poor and there are few options for mixing, but it is an amazingly powerful tool for citizen journalism and live reporting.

Also, the constraints of “channels” are going. Where once there was no point owning more OB Vans than you have channels, now you could run dozens of live feeds simultaneously over the Internet. As the phone becomes the first screen and the TV in the corner turns into just another display many of the constraints that we have taken for granted look more and more anachronistic.

These new technologies provide an opportunity, but also some significant challenges. The major one is standards – there is a large ecosystem of manufacturers and suppliers whose equipment needs to interoperate. The standards used, such as SDI (Serial Digital Interface) have been around for decades and are widely supported. Moving to an Internet-based standard needs cooperation across the industry.

BBC R&D has been actively working towards this with their IP Studio project, and the standards they are developing with industry for Networked Media.

Read part 2 of this project with BBC R&D where I’ll describe some of the technologies involved, and how we’re approaching the project.

Our Plants Need Watering Part II

This is the second post in a series doing a deep dive into Internet of Things implementation.  If you didn’t read the first post, Our Plants Need Watering Part I, then you should read that first.

This post talks about one of the most important decisions you’ll make in an IoT project: which microcontroller to use. There are lots of factors and some of them are quite fractal – but that said I think I can make some concrete recommendations based on what I’ve learned so far on this project, that might help you in your next IoT project.

This post gets really technical, I’m afraid; there’s no way of comparing microcontrollers without getting into the weeds.

There are thousands of different microcontrollers on the market, and they are all different. How you choose the one you want depends on a whole range of factors; there is no one-size-fits-all answer.

Inside a microcontroller

A microcontroller is a single chip that provides all the parts you require to connect software and hardware together. You can think of it as a tiny, complete, computer with CPU, RAM, storage and IO. That is where the resemblance ends though, because each of these parts is quite different from the computers you might be used to.

CPU

The Central Processing Unit (CPU) takes software instructions and executes them. This is the bit that controls the rest of the microcontroller, and runs your software.

Microcontroller CPUs come in all shapes and sizes, and the CPU governs the performance and capabilities of the complete package. In practice, though, the impact of your CPU choice is smaller than you might think: toolchains and libraries protect you from most of the differences between CPU platforms.

Really it is price and performance that matter most, unless you need very specific capabilities. If you want to do floating point calculations or do high-speed video or image processing then you’re going to select a platform with those specific capabilities.

Flash Memory

The kinds of computers we are used to dealing with have hard disks for permanent storage. Microcontrollers generally do not have access to hard disks. Instead they have “flash” memory, which is non-volatile: it persists even if power is disconnected. The name “flash” comes from the way the memory is erased, “like a camera flash”.

You need enough flash to store your code. The amount of flash available varies tremendously. For example the Atmel ATtiny25 has a whole 2KB of flash whereas the Atmel ATSAM4SD32 has 2MB.

Determining how big your code will be is an important consideration, and often depends on the libraries you need to use. Some quotidian things we take for granted in the macro world are too big for many microcontrollers: C’s venerable printf function, for example, won’t fit onto many of them in its normal form.

Static RAM (SRAM)

Flash is not appropriate for storing data that changes, so your working data needs somewhere else to go. This is generally SRAM. You will need enough SRAM to hold all your changeable data.

The amount of SRAM available varies widely. The ATtiny25 has a whole 128 bytes (far less than the first computer I ever programmed, the ZX81, and that was 35 years ago!). At the other end of the scale the ATSAM4SD32 has 160KB, and can support separate RAM chips if you need them.
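Checking a build against those limits is simple arithmetic once you know your code size (flash) and your data plus stack (SRAM). The sketch below uses the figures quoted above for the two Atmel parts. The `Chip` and `Build` types and the example build sizes are hypothetical. In practice your toolchain reports these numbers, for example via the GNU binutils `size` tool.

```typescript
interface Chip {
  name: string;
  flashBytes: number;
  sramBytes: number;
}

// Figures from the text: ATtiny25 has 2KB flash and 128 bytes SRAM;
// ATSAM4SD32 has 2MB flash and 160KB SRAM.
const ATTINY25: Chip = { name: "ATtiny25", flashBytes: 2 * 1024, sramBytes: 128 };
const ATSAM4SD32: Chip = { name: "ATSAM4SD32", flashBytes: 2 * 1024 * 1024, sramBytes: 160 * 1024 };

interface Build {
  flashBytes: number; // code plus the initial values of initialised data
  sramBytes: number; // initialised and zeroed data plus an estimate of peak stack
}

// Bytes left over in each memory (negative means the build overflows it).
function headroom(build: Build, chip: Chip): { flash: number; sram: number } {
  return { flash: chip.flashBytes - build.flashBytes, sram: chip.sramBytes - build.sramBytes };
}

function fits(build: Build, chip: Chip): boolean {
  const h = headroom(build, chip);
  return h.flash >= 0 && h.sram >= 0;
}

// Hypothetical build sizes, for illustration only.
const example: Build = { flashBytes: 3500, sramBytes: 200 };
for (const chip of [ATTINY25, ATSAM4SD32]) {
  const h = headroom(example, chip);
  console.log(`${chip.name}: ${fits(example, chip) ? "fits" : "does not fit"} (flash ${h.flash} B, SRAM ${h.sram} B spare)`);
}
```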

I/O Pins

Microcontrollers need to talk to the outside world, and they do this via their I/O pins. You are going to need to count the pins you need, which will depend on the devices you plan to connect your microcontroller to.

Simple things like buttons, switches, LEDs and so forth can use I/O pins on an individual basis in software, and this is a common use case. Rarely do you build anything that doesn’t use a switch, button or an LED.

If you are going to talk digital protocols, however, you might well want hardware support for those protocols. This means you might consider things like I²C, RS232 or SPI.

A good example of this is plain old serial. Serial is a super-simple protocol that dates back to the dark ages of computing. One bit at a time is sent over a single pin, and these are assembled together into characters. Serial support needs a bit of buffering, some timing control and possibly some flow control, but that’s it.
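For the common “8N1” format (eight data bits, no parity, one stop bit), each character goes over the wire as a start bit, then the data bits least-significant first, then a stop bit, with the line idling high. The function below sketches that framing. The name `frame8N1` is hypothetical, and a real UART (hardware or software) also has to hold each bit on the pin for exactly one bit period.

```typescript
// Frame one byte as 8N1: start bit (0), eight data bits LSB first, stop bit (1).
function frame8N1(byte: number): number[] {
  if (!Number.isInteger(byte) || byte < 0 || byte > 0xff) {
    throw new RangeError(`not a byte: ${byte}`);
  }
  const bits: number[] = [0]; // start bit pulls the idle-high line low
  for (let i = 0; i < 8; i++) bits.push((byte >> i) & 1);
  bits.push(1); // stop bit returns the line to idle
  return bits;
}

console.log(frame8N1(0x41).join("")); // "A" prints "0100000101"
```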

Many of the ATtiny range of microcontrollers, such as the ATtiny25, have no hardware UART, so if you want to even print text out to your computer’s serial port you will need to do that in software on the microcontroller. This is slow, unreliable and takes up valuable flash. It does work, though, at slow speeds; timing gets unreliable pretty quickly when doing things in software.

At the other end you have things like the SAM3X8E based on the ARM Cortex M3 which have a UART and 3 USARTs – hardware support for high speed (well 115200 baud) connections to several devices simultaneously and reliably.
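The arithmetic shows why timing matters. At 115200 baud each bit lasts under 9 microseconds, and with 8N1 framing every byte costs ten bit periods. Here is a minimal sketch, assuming 8N1 framing and no gaps between frames:

```typescript
const BITS_PER_FRAME_8N1 = 10; // start bit + 8 data bits + stop bit

// Duration of one bit on the wire, in microseconds.
function bitTimeMicros(baud: number): number {
  return 1e6 / baud;
}

// Payload throughput in bytes per second, ignoring any idle time between frames.
function bytesPerSecond(baud: number, bitsPerFrame: number = BITS_PER_FRAME_8N1): number {
  return baud / bitsPerFrame;
}

for (const baud of [9600, 115200]) {
  console.log(`${baud} baud: ${bitTimeMicros(baud).toFixed(2)} us/bit, ${bytesPerSecond(baud)} bytes/s`);
}
// prints "9600 baud: 104.17 us/bit, 960 bytes/s"
// prints "115200 baud: 8.68 us/bit, 11520 bytes/s"
```

On a chip clocked at 8 MHz (an assumed figure), 8.68 microseconds is only about 69 CPU cycles per bit. That is why software serial struggles at higher speeds while a hardware UART does not.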

Packaging

There are loads of different packaging formats for integrated circuits. Just check out the list on Wikipedia. Note that when you are developing your product you are likely to use a “development board”, where the microcontroller is already mounted on something that makes it easy to work with.

Here is a dev board for the STM32 ARM microprocessor:

(screwdriver shown for scale).

You can see the actual microprocessor here on the board:

Everything else on the board is to make it easier to work with that CPU – for example adding protection (so you don’t accidentally fry it), making the pins easier to connect, adding debug headers and also a USB interface with a programmer unit, so it is easy to program the chip from a PC.

For small-scale production use, “through hole” packages like DIP can be worked with easily on a breadboard, or soldered by hand. For example, here is a complete microcontroller, the LPC1114FN28:

Others, like “chip carriers”, can fit into mounts that you can use relatively easily, and finally there are “flat packages”, which you would struggle to solder by hand:

Development support

It is all very well choosing a microcontroller that will work in production – but you need to get your software written first. This means you want a “dev board” that comes with the microcontroller helpfully wired up so you can use it easily.

There are dev boards available for every major platform, and mostly they are really quite cheap.

Here are some examples I’ve collected over the last few years:

The board at the bottom there is an Arduino Due, which I’ve found really useful.  The white box connected to it is an ATMEL debug device, which gives you complete IDE control of the code running on the CPU, including features like breakpoints, watchpoints, stepping and so forth.

Personally, I think you should find a dev board that fits your needs first, then choose a microcontroller that is sufficiently similar. A workable development environment is absolutely your number one goal!

Frameworks, toolchains and libraries

This is another important consideration – you want it to be as easy as possible to write your code, whilst getting full access to the capabilities of the equipment you’ve chosen.

Arduino

Arduino deserves a special mention here, as a spectacularly accessible way into programming microprocessors. There is a huge range of Arduino, and Arduino compatible, devices starting at only a few pounds and going right up to some pretty high powered equipment.

Most Arduino boards have a standard layout allowing “shields” to be easily attached to them, giving easy standardised access to additional equipment.

The great advantage of Arduino is that you can get started very easily. The disadvantage is that you aren’t using equipment you could go into production with directly. It is very much a hobbyist solution (although I would love to hear of production devices using Arduino code).

Other platforms

Other vendors have their own IDEs and toolchains – many of which are quite expensive.  Of the ones I have tried Atmel Studio is the best by far.  First it is free – which is pretty important.  Second it uses the gcc toolchain, which makes debugging a lot easier for the general programmer.  Finally the IDE itself is really quite good.

Next time I’ll walk through building some simple projects on a couple of platforms and talk about using the Wifi module in earnest.


Internet Security Threats – When DDoS Attacks

On Friday evening an unknown entity launched one of the largest Distributed Denial of Service (DDoS) attacks yet recorded, against Dyn, a DNS provider. Dyn provides DNS for some of the Internet’s most popular services, and they duly suffered problems. Twitter, GitHub and others were unavailable for hours, particularly in the US.

DDoS attacks happen a lot, and are generally uninteresting. What is interesting about this one is:

  1. the devices used to mount the attack
  2. the similarity with the “Krebs attack” last month
  3. the motive
  4. the potential identity of the attacker

Together these signal that we are entering a new phase in development of the Internet, one with some worrying ramifications.

The devices

Unlike most other kinds of “cyber” attack, DDoS attacks are brute force – they rely on sending more traffic than the recipient can handle. Moving packets around the Internet costs money so this is ultimately an economic contest – whoever spends more money wins. The way you do this cost-effectively, of course, is to steal the resources you use to mount the attack. A network of compromised devices like this is called a “botnet”.

Most computers these days are relatively well-protected – basic techniques like default-on firewalls and automated patching have hugely improved their security. There is a new class of device though, generally called the Internet of Things (IoT), which has none of these protections.

IoT devices demonstrate a “perfect storm” of security problems:

  1. Everything on them is written in the low-level ‘C’ programming language. ‘C’ is fast and small (important for these little computers) but it requires a lot of skill to write securely. Skill that is not always available
  2. Even if the vendors fix a security problem, how does the fix get onto the deployed devices in the wild? These devices rarely have the capability to patch themselves, so the vendors need to ship updates to householders, and provide a mechanism for upgrades – and the customer support this entails
  3. Nobody wants to patch these devices themselves anyway. Who wants to go round their house manually patching their fridge, toaster and smoke alarm?
  4. Because of their minimal user interfaces (making them difficult to operate if something goes wrong), they often have default-on [awful] debug software running. Telnet to a high port and you can get straight in to administer them
  5. They rarely have any kind of built in security software
  6. They have crap default passwords, that nobody ever changes

To see how shockingly bad these things are, follow Matthew Garrett on Twitter. He takes IoT devices to pieces to see how easy they are to compromise. Mostly he can get into them within a few minutes. Remarkably, one of the most secure IoT devices he’s found so far was a Barbie doll.

That most of these devices are far worse than a Barbie doll should give everyone pause for thought. Then imagine the dozens of them so many of us have scattered around our house.  Multiply that by the millions of people with connected devices and it should be clear this is a serious problem.

Matthew has written on this himself, and he’s identified this as an economic problem of incentives. There is nobody who has an incentive to make these devices secure, or to fix them afterwards. I think that is fair, as far as it goes, but I would note that ten years ago we had exactly the same problem with millions of unprotected Windows computers on the Internet that, it seemed, nobody cared about.

The Krebs attack

A few weeks ago, someone launched a remarkably similar attack on security researcher Brian Krebs. Again the attackers are unknown, and they launched the attack using a global network of IoT devices.

Given the similarities in the attack on Krebs and the attack on Dyn, it is probable that both of these attacks were undertaken by the same party. This doesn’t, by itself, tell us very much.

It is common for botnets to be owned by criminal organisations that hire them out by the hour. They often have online payment gateways, telephone customer support and operate basically like normal businesses.

So, if this botnet is available for hire then the parties who hired it might be different. However, there is one other similarity which makes this a lot spookier – the lack of an obvious commercial motive.

The motive

Mostly DDoS attacks are either (a) political or (b) extortion. In both cases the identity of the attackers is generally known, in some sense. For political DDOS attacks (“hacktivism”) the targets have often recently been in the news, and are generally quite aware of why they’re attacked.

Extortion using DDoS attacks is extremely common. Anyone who makes money on the Internet will have received threats and been attacked, and many will have paid out to prevent or stop a DDoS. Banks, online gaming, DNS providers, VPN providers and ecommerce sites are all common targets – many of them so common that they have experienced operations teams in place who know how to handle these things.

To my knowledge no threats were made to Dyn or Krebs before the attacks and nobody tried to get money out of them to stop them.

What they have in common is their state-of-the-art protection. Brian Krebs was hosted by Akamai, a very well-respected content delivery company who have huge resources – and for whom protecting against DDOS is a line of business. Dyn host the DNS for some of the world’s largest Internet firms, and similarly are able to deploy huge resources to combat DDOS.

This looks an awful lot like someone testing out their botnet on some very well protected targets, before using it in earnest.

The identity of the attacker

It looks likely therefore that there are two possibilities for the attacker. Either it is (a) a criminal organisation looking to hire out their new botnet or (b) a state actor.

If it is a criminal organisation then right now they have the best botnet in the world. Nobody is able to combat this effectively.  Anyone who owns this can hire it out to the highest bidder, who can threaten to take entire countries off the Internet – or entire financial institutions.

A state actor is potentially as disturbing. Given the targets were in the US it is unlikely to be a western government that controls this botnet – but it could be one of dozens from North Korea to Israel, China, Russia, India, Pakistan or others.

As with many weapons, a botnet is most effective if used as a threat, and we may never know if it is used as a threat – or who the victims might be.

What should you do?

As an individual, DDoS attacks aren’t the only risk from a compromised device. Anyone who can compromise one of these devices can get into your home network, which should give everyone pause – think about the private information you casually keep on your home computers.

So, take some care in the IoT devices you buy, and buy from reputable vendors who are likely to be taking care over their products. Unfortunately the devices most likely to be secure are also likely to be the most expensive.

One of the greatest things about the IoT is how cheap these devices are, and the capability they can provide at this low price. Many classes of device don’t necessarily even have reliable vendors working in that space. Being expensive and well made is no long-term protection – devices routinely go out of support after a few years and become liabilities.

Anything beyond this is going to require concerted effort on a number of fronts. Home router vendors need to build in capabilities for detecting compromised devices and disconnecting them. ISPs need to take more responsibility for the traffic coming from their networks. Until being compromised causes devices to malfunction for their owner there will be no incentives to improve them.

It is likely that the ultimate fix for this will be Moore’s Law – the safety net our entire industry has relied on for decades. Many of the reasons for IoT vulnerabilities are to do with their small amounts of memory and low computing power. When these devices can run more capable software they can also have the management interfaces and automated patching we’ve become used to on home computers.


The economics of innovation

One of the services we provide is innovation support. We help companies of all sizes when they need help with the concrete parts of developing new digital products or services for their business, or making significant changes to their existing products.

A few weeks ago the Royal Swedish Academy of Sciences awarded the Nobel Prize for Economics to Oliver Hart and Bengt Holmström for their work in contract theory. This prompted me to look at some of Holmström’s previous work (for my sins I find economics fascinating), and I came across his 1989 paper Agency Costs and Innovation. It is so relevant to some of my recent experiences that I wanted to share it.

Imagine you have a firm or a business unit and you have decided that you need to innovate.

This is a pretty common situation – you know strategically that your existing product is starting to lose traction. Maybe you can see commoditisation approaching in your sector. Or perhaps, as is often the case, you can see the Internet juggernaut bearing down on your traditional business and you know you need to change things up to survive.

What do you do about it?  If you’ve been in this situation the following will probably resonate:


This is the principal-agent problem, a classic in economics. It describes how a principal (who wants something) can incentivise an agent to do what they want. The “contracting” being discussed here could be any kind of arrangement, including full-time staff.

A good example of the principal-agent problem is how you pay a surgeon. You want to reward their work, but you can’t observe everything they do. The outcome of surgery depends on the efforts of a team, not just one individual. Surgeons also have responsibilities besides surgery itself, such as developing standards and mentoring junior staff. Finally, the activity is inherently high-risk: surgeons will make mistakes, no matter how competent they are. If their pay depended on outcomes, their salary would be at risk, so you would need to pay huge bonuses to persuade them to undertake the work at all.

In practice, firms commonly try to innovate using their existing teams, the ones delivering the existing product. These teams understand their market. They know the capabilities and constraints of existing systems. They have domain expertise and would seem to be the ideal place to go.

However, these teams have a whole range of tasks available to them (just as with our surgeon above), and choices about how they allocate their time. This is the “multitasking effect”, and it is particularly problematic for innovative tasks.

My personal experience of this is that, when people have choices between R&D type work and “normal work”, they will choose to do the normal work (all the while complaining that their work isn’t interesting enough, of course):


This leads large firms to have separate R&D divisions – this allows R&D investment decisions to take place between options that have some homogeneity of risk, which means incentives are more balanced.

However, large firms have a problem with bureaucratisation. This is a particular problem when you wish to innovate:


Together, these lead to a problem we’ve come across a number of times: large firms have strong market incentives to spend on innovation, but find that their own internal incentive systems make this extremely challenging.

If you are experiencing these sorts of problems please do give us a call and see how we can help.

I am indebted to Kevin Bryan’s excellent A Fine Theorem blog for introducing me to Holmström’s work.


A new Isotoma Whitepaper: Chatbots

Over the last six months we’ve had a lot of interest from customers in the emerging area of chatbots, particularly ones using Facebook Messenger as a platform.

While bots have been around, in some form or other, for a very long time, the Facebook Messenger platform has catapulted them into prominence. Access to one billion of the world’s consumers is a tempting prospect for many businesses.

In our new whitepaper we review the ecosystem that is emerging around chatbots and provide a guide to the factors you should consider if you are thinking about building and deploying a chatbot.


The contents include:

  • The history of chat interfaces
  • What conversational interfaces can do, and why
  • Natural Language Processing
  • Features provided by chatbot platforms
  • An in-depth review of eight of the top chatbot platforms
  • Recommendations for next steps, and a look to the future

Please download the whitepaper and let us know what you think.


Our plants need watering, part I

Here at Isotoma Towers we’ve recently started filling our otherwise spartan office with plants. Plants are lovely but they do require maintenance, and in particular they need timely watering.


Since we’re all about automation here, we decided to use this as a test case for building some Internet of Things (IoT) devices.  One of my colleagues pointed out this great moisture sensor from Catnip (right).

This forms the basis of our design.

Catnip I2C soil moisture sensor

There are lots and lots of choices for how to build something like this, and this blog post is going to talk about design decisions.  See below the fold for more.




There is a new version of gunicorn, 19.0, which has a couple of significant changes, including some interesting new workers (gthread and gaiohttp), and it now actually responds to signals properly, which will make it work with Heroku.

The HTTP RFC, 2616, is now officially obsolete. It has been replaced by six RFCs, 7230 to 7235, each covering a different part of the specification. The new RFCs look loads better, and it’s worth reading through them to get familiar with them.

Some kind person has produced a recommended set of SSL directives for common web servers, which score A+ on the SSL Labs test while still supporting older versions of IE. We’ve struggled to find a decent SSL config that provides broad browser support whilst also offering the best levels of encryption, so this is very useful.

A few people are still struggling with Git.  There are lots of git tutorials around the Internet, but this one from Git Tower looks like it might be the best for the complete beginner. You know it’s for noobs, of course, because they make a client for the Mac 🙂

I haven’t seen a lot of noise about this, but the EU has outlawed pre-ticked checkboxes.  We have always recommended that these are not used, since they are evil UX, but now there’s an argument that might persuade everyone.

Here is a really nice post about splitting user stories. I think we are pretty good at this anyhow, but this is a nice way of describing the approach.

@monkchips gave a talk at IBM Impact about the effect of Mobile First. I think we’re on the right track with most of these things, but it’s interesting to see mobile called out as one of the key drivers for these changes.

I’d not come across the REST Cookbook before, but here is a decent summary of how to treat PUT vs POST when designing RESTful APIs.
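The core distinction is idempotency: a PUT to a known URL stores the resource at that URL, so repeating the request leaves the same state, whereas a POST to a collection asks the server to create a new resource each time. A toy in-memory sketch with no web framework; the articles collection and the function names are hypothetical stand-ins for HTTP handlers:

```python
import itertools

# In-memory stand-in for a hypothetical /articles/ collection.
articles = {}
_next_id = itertools.count(1)


def post_article(body):
    """POST /articles/: the server assigns a fresh id on every call,
    so repeating the request creates another resource (not idempotent)."""
    article_id = next(_next_id)
    articles[article_id] = body
    return article_id


def put_article(article_id, body):
    """PUT /articles/<id>: store the body at a client-chosen id, replacing
    anything already there, so repeating it changes nothing (idempotent).
    Returns 201 (Created) the first time and 200 (OK) afterwards."""
    created = article_id not in articles
    articles[article_id] = body
    return 201 if created else 200
```

Sending the same POST twice leaves two articles; sending the same PUT twice leaves one. That is why a client can safely retry a PUT after a timeout but has to be more careful with a POST.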

Fastly have produced a spectacularly detailed article about how to get tracking cookies working with Varnish.  This is very relevant to consumer facing projects.

This post from ThoughtWorks, The Software Testing Cupcake, is absolutely spot on, and I think it accurately describes an important aspect of testing.

As an example of how to make unit tests less fragile, this is a decent description of how to isolate tests, which is a key technique. The examples are Ruby, but the principle is valid everywhere.

Still on unit testing, Facebook have open sourced a JavaScript unit testing framework called Jest. It looks really very good.

A nice implementation of “sudo mode” for Django. This ensures the user has recently entered their password, and is suitable for protecting particularly valuable assets in a web application like profile views or stored card payments.
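The idea behind sudo mode can be sketched without Django: record when the user last re-entered their password, and only allow sensitive actions within a window after that. This is not the linked package's API; the function names, the session key and the 15-minute window are all hypothetical, and a plain dict stands in for the session.

```python
import time

# Hypothetical window; the linked package will have its own setting.
SUDO_WINDOW_SECONDS = 15 * 60


class SudoRequired(Exception):
    """Raised when a sensitive action needs the password to be re-entered."""


def grant_sudo(session, now=None):
    """Call after the user has successfully re-entered their password.
    Stores a plain float timestamp so the session stays JSON-serialisable."""
    session["sudo_granted_at"] = time.time() if now is None else now


def require_sudo(session, now=None):
    """Guard for sensitive views: raise unless the password was re-entered recently."""
    now = time.time() if now is None else now
    granted_at = session.get("sudo_granted_at")
    if granted_at is None or now - granted_at > SUDO_WINDOW_SECONDS:
        raise SudoRequired("please re-enter your password to continue")
```

A view handling stored card details would call require_sudo first and, on SudoRequired, send the user to a password prompt that calls grant_sudo on success.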

If you are using Redis directly from Python, rather than through Django’s cache wrappers, then HOT Redis looks useful. This provides atomic operations for compound Python types stored within Redis.

The problem with Backing Stores, or what is NoSQL and why would you use it anyway

Durability is something you normally want somewhere in a system: a place where data will survive reboots, crashes and the other sorts of things that routinely happen to real-world systems.

Over the many years that I have worked in system design, there has been a recurring thorny problem of how to handle this durable data.  What this means in practice when building a new system is the question “what should we use as our backing store?”.  Backing stores are often called “databases”, but everyone has a different view of what database means, so I’ll try and avoid it for now.

In a perfect world a backing store would be:

  • Correct
  • Quick
  • Always available
  • Geographically distributed
  • Highly scalable

While we can do these things quite easily these days with the stateless parts of an application, doing them with durable data is non-trivial. In fact, in the general case, it’s impossible to do all of these things at once (the CAP theorem describes this quite well).

This has always been a challenge, but as applications move onto the Internet, and as businesses become more geographically distributed, the problem has become more acute.

Relational databases (RDBMSes) have been around a very long time, but they’re not the only kind of database you can use. There have always been other kinds of store around, but the so-called NoSQL Movement has had particular prominence recently. This champions the use of new backing stores not based on the relational design, and not using SQL as a language. Many of these have radically different designs from the sort of RDBMS system that has been widely used for the last 30 years.

When and how to use NoSQL systems is a fascinating question, and here I put forward our thinking on it. As always, it’s kind of complicated. It certainly isn’t the case that throwing out an RDBMS and sticking in Mongo will make your application awesome.

Although they are lumped together as “NoSQL”, this is not actually a useful definition, because there is very little that all of these stores have in common. Instead, I suggest that there are three types of NoSQL backing store available to us right now:

  • Document stores – MongoDB, XML databases, ZODB
  • Graph databases – Neo4j
  • Key/value stores – Dynamo, BigTable, Cassandra, Redis, Riak, Couch

These are so different from each other that lumping them into the same category is really quite unhelpful.

Graph databases

Graph databases have some very specific use cases, for which they are excellent, and probably a lot of utility elsewhere. However, for our purposes they’re not something we’d consider generally, and I’ll not say any more about them here.

Document stores

I am pretty firmly in the camp that document stores, such as MongoDB, should never be used for general purposes either (for which I will undoubtedly catch some flak). I have a lot of experience with document databases, particularly ZODB and dbxml, and I know whereof I speak.

These databases store “documents” as schema-less objects. What we mean by a “document” here is something that is:

  • self-contained
  • always required in its entirety
  • more valuable than the links between documents or its metadata.

My experience is that although you may often think you have documents in your system, in practice this is rarely the case, and it certainly won’t stay the case. Often you start with documents, but over time you gain more and more references between documents, and then you gain records and all sorts of other things.

Document stores are poor at handling references, and because of the requirement to retrieve things in their entirety you denormalise a lot. The end result of this is loss of consistency, and eventually doom with no way of recovering consistency.
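A toy illustration of how denormalisation erodes consistency, with plain Python dicts standing in for a document store (the collections and fields are hypothetical). Each article embeds a copy of its author's name so it can be fetched whole, and a rename that misses one copy leaves the data silently inconsistent.

```python
# Plain dicts stand in for a document store; each document is fetched whole,
# so every article embeds its own copy of the author's details.
authors = {"a1": {"name": "Alice Smith"}}
articles = {
    "p1": {"title": "Intro", "author": {"id": "a1", "name": "Alice Smith"}},
    "p2": {"title": "Follow-up", "author": {"id": "a1", "name": "Alice Smith"}},
}


def rename_author(author_id, new_name, article_ids):
    """Update the canonical record plus the embedded copies we know about.

    Nothing in a schema-less store checks that article_ids covers every
    copy, so a missed document keeps the old value indefinitely.
    """
    authors[author_id]["name"] = new_name
    for article_id in article_ids:
        articles[article_id]["author"]["name"] = new_name


def inconsistent_articles():
    """Return ids of articles whose embedded name disagrees with the author record."""
    return [
        article_id
        for article_id, doc in articles.items()
        if doc["author"]["name"] != authors[doc["author"]["id"]]["name"]
    ]


# A rename that forgets one embedded copy.
rename_author("a1", "Alice Jones", ["p1"])
stale = inconsistent_articles()  # ['p2']
```

Here a canonical authors record still exists, so the drift can at least be detected. Once the embedded copies are the only record, there is nothing left to check them against.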

We do not recommend document stores in the general case.

Key/value stores

These are the really interesting kind of NoSQL database, and I think these have a real general potential when held up against the RDBMS options.  However, there is no magic bullet and you need to choose when to use them carefully.

You have to be careful when deciding to build something without an RDBMS. An RDBMS delivers a huge amount of value in a number of areas, and for all sorts of reasons. Many of the reasons are not because the RDBMS architecture is necessarily better but because they are old, well-supported and well-understood.

For example, PostgreSQL (our RDBMS of choice):

  • has mature software libraries for all platforms
  • has well-understood semantics for backup and restore, which work reliably
  • has mature online backup options
  • has had decades of performance engineering
  • has well-understood load and performance characteristics
  • has good operational tooling
  • is well understood by many developers

These are significant advantages over newer stores, even if those stores might be technically better in specific use cases.

All that said, there are some definite reasons you might consider using a key/value store instead of an RDBMS.

Reason 1: Performance

Key/value stores often naively appear more performant than RDBMS products, and you can see some spectacular performance figures in direct comparisons. However, none of them really provides a magic performance increase over RDBMS systems; what they provide is a different set of tradeoffs. You need to decide where your performance tradeoffs lie for your particular system.

In practice what key/value stores mostly do is provide some form of precomputed cache of your data, by making it easy (or even mandatory) to denormalize your data, and by providing the performance characteristics to make pre-computation reasonable.

If you have a key/value store that has high write throughput characteristics, and you write denormalized data into it in a read-friendly manner then what you are actually doing is precomputing values. This is basically Just A Cache. Although it’s a pattern that is often facilitated by various NoSQL solutions, it doesn’t depend on them.
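A minimal sketch of precomputing on write, with a Python dict standing in for the key/value store; the order schema and key format are hypothetical. Normalised rows stay the source of truth, each write also updates a read-friendly total, and because the totals are derived they can be rebuilt at any time.

```python
# Normalised source of truth: one row per order (hypothetical schema).
orders = []
# A plain dict stands in for a key/value store of precomputed values.
kv_store = {}


def _total_key(user_id):
    return f"order_total:{user_id}"


def record_order(user_id, amount):
    """Write the normalised row, then precompute the value readers will want."""
    orders.append({"user_id": user_id, "amount": amount})
    key = _total_key(user_id)
    kv_store[key] = kv_store.get(key, 0) + amount


def order_total(user_id):
    """Read path: a single key lookup, with no aggregation at read time."""
    return kv_store.get(_total_key(user_id), 0)


def rebuild_totals():
    """The values are derived, so they can be regenerated from the source
    rows whenever needed; in other words, the store is Just A Cache."""
    kv_store.clear()
    for row in orders:
        key = _total_key(row["user_id"])
        kv_store[key] = kv_store.get(key, 0) + row["amount"]


record_order("u1", 10)
record_order("u1", 5)
record_order("u2", 7)
```

Nothing here depends on a particular NoSQL product: the same pattern works with a dict, Redis, or a summary table.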

RDBMS products are optimised for correctness and query performance, and write performance takes second place to these. This means they are often not a good place to implement a precomputed cache (where you often write values you never read).

It’s not insane to combine an RDBMS as your master source of data with something like Redis as an intermediate cache.  This can give you most of the advantages of a completely NoSQL solution, without throwing out all of the advantages of the RDBMS backing store, and it’s something we do a lot.
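A minimal cache-aside sketch, assuming Python's built-in sqlite3 as a stand-in for the master RDBMS (PostgreSQL in practice) and a plain dict as a stand-in for Redis; the table and key names are hypothetical. Reads try the cache first and fall back to the database; writes go to the database and then invalidate the cached entry, so the RDBMS remains the master copy.

```python
import sqlite3

# sqlite3 stands in for the master RDBMS; the dict stands in for Redis.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price INTEGER)")
db.execute("INSERT INTO products (id, name, price) VALUES (1, 'Widget', 250)")
db.commit()

cache = {}
db_reads = 0  # counts how often a read falls through to the database


def get_product(product_id):
    """Cache-aside read: serve from the cache, else load from the DB and populate."""
    global db_reads
    key = f"product:{product_id}"
    if key in cache:
        return cache[key]
    db_reads += 1
    row = db.execute(
        "SELECT id, name, price FROM products WHERE id = ?", (product_id,)
    ).fetchone()
    if row is None:
        return None
    product = {"id": row[0], "name": row[1], "price": row[2]}
    cache[key] = product
    return product


def update_price(product_id, price):
    """Write to the master copy first, then invalidate the cached entry."""
    db.execute("UPDATE products SET price = ? WHERE id = ?", (price, product_id))
    db.commit()
    cache.pop(f"product:{product_id}", None)


first = get_product(1)   # miss: loads from the database
second = get_product(1)  # hit: served from the cache
update_price(1, 300)     # invalidates the cached entry
third = get_product(1)   # miss again: reloads the new price
```

Invalidating rather than updating the cache on write keeps the logic simple: the next read always repopulates from the RDBMS.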

Reason 2: Distributed datastores

If you need your data to be highly available and distributed (particularly geographically) then an RDBMS is probably a poor choice. It’s just very difficult to do this reliably and you often have to make some very painful and hard-to-predict tradeoffs in application design, user interface and operational procedures.

Some of these key/value stores (particularly Riak) can really deliver in this environment, but there are a few things you need to consider before throwing out the RDBMS completely.

Availability is often a tradeoff one can sensibly make.  When you understand quite what this means in terms of cost, both in design and operational support (all of these vary depending on the choices you make), it is often the right tradeoff to tolerate some downtime occasionally.  In practice a system that works brilliantly almost all of the time, but goes down in exceptional circumstances, is generally better than one that is in some ways worse all of the time.

If you really do need high availability though, it is still worth considering a single RDBMS in one physical location with distributed caches (just as with the performance option above).  Distribute your caches geographically, offload work to them and use queue-based fanout on write. This gives you eventual consistency, whilst still having an RDBMS at the core.
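A single-process sketch of that shape, with hypothetical region names: writes go to one primary store and are queued, and a fan-out step later copies each change into every regional cache. Python's queue.Queue and plain dicts stand in for a real message queue, database and caches. Until the queue is drained, regional reads are stale, which is the eventual consistency described here.

```python
import queue

primary = {}  # stand-in for the single-location RDBMS
regional_caches = {"eu": {}, "us": {}, "apac": {}}  # hypothetical regions
changes = queue.Queue()  # stand-in for a real message queue


def write(key, value):
    """All writes go to the single primary, then are queued for fan-out."""
    primary[key] = value
    changes.put((key, value))


def read(region, key):
    """Reads are served from the region's local cache and may be stale."""
    return regional_caches[region].get(key)


def drain_fanout():
    """Fan-out worker: copy each queued change into every regional cache."""
    while True:
        try:
            key, value = changes.get_nowait()
        except queue.Empty:
            return
        for cache in regional_caches.values():
            cache[key] = value
        changes.task_done()


write("greeting", "hello")
before = read("eu", "greeting")  # None: the change has not fanned out yet
drain_fanout()
after = read("eu", "greeting")   # "hello": the region has caught up
```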

This can make sense if your application has relatively low write throughput, because all writes can be sent to the single location RDBMS, but be prepared for read-after-write race conditions. Solutions to this tend to be pretty crufty.
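One common, and admittedly crufty, workaround is to pin a user's reads to the primary for a short window after they write, so they at least see their own changes. A sketch, assuming a hypothetical five-second window and in-memory dicts standing in for the primary and a lagging replica or cache:

```python
import time

PIN_SECONDS = 5.0  # hypothetical window; would need tuning to real replication lag

primary = {}
replica = {}  # lags behind the primary until fan-out catches up
last_write_at = {}  # user id -> time of that user's most recent write


def write(user_id, key, value, now=None):
    """Write to the primary and remember when this user last wrote."""
    primary[key] = value
    last_write_at[user_id] = time.monotonic() if now is None else now


def read(user_id, key, now=None):
    """Read from the primary if this user wrote recently, else from the replica."""
    now = time.monotonic() if now is None else now
    wrote_at = last_write_at.get(user_id)
    if wrote_at is not None and now - wrote_at < PIN_SECONDS:
        return primary.get(key)
    return replica.get(key)


write("alice", "bio", "Hello!", now=100.0)
alice_sees = read("alice", "bio", now=101.0)  # "Hello!": pinned to the primary
bob_sees = read("bob", "bio", now=101.0)      # None: replica not caught up yet
```

The cruft shows as soon as the window expires before the replica catches up: the user's own change appears to vanish again, and other users still see stale data throughout.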

Reason 3: Application semantics vs SQL

NoSQL databases tend not to have an abstraction like SQL. SQL is decent in its core areas, but it is often really hard to encapsulate some important application semantics in SQL.

A good example of this is asynchronous access to data as part of a calculation. It’s not uncommon to need to query external services, but SQL really isn’t set up for this. Although there are some hacky workarounds, if you have a microservice architecture you may find SQL really doesn’t do what you need.
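To illustrate the kind of logic that is awkward to express in SQL, here is a sketch of a calculation that has to consult an external service for each item, done concurrently in application code with asyncio. The fetch_exchange_rate function and its rates are hypothetical; it simulates a network call with a short sleep, where a real system would call another microservice.

```python
import asyncio

# Hypothetical rates; a real system would get these from another service.
_FAKE_RATES = {"USD": 0.8, "EUR": 0.85, "GBP": 1.0}


async def fetch_exchange_rate(currency):
    """Simulated call to an external rates service (hypothetical)."""
    await asyncio.sleep(0.01)  # stand-in for network latency
    return _FAKE_RATES[currency]


async def total_in_gbp(line_items):
    """Convert each (amount, currency) item to GBP, querying the service concurrently."""
    rates = await asyncio.gather(
        *(fetch_exchange_rate(currency) for _, currency in line_items)
    )
    return sum(amount * rate for (amount, _), rate in zip(line_items, rates))


items = [(100, "USD"), (200, "EUR"), (50, "GBP")]
total = asyncio.run(total_in_gbp(items))
```

asyncio.gather returns results in the same order as the awaitables passed to it, which is what lets the rates be zipped back onto the line items.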

Another example is staleness policies.  These are particularly problematic when you have distributed systems with parts implemented in other languages such as Javascript, for example if your client is a browser or a mobile application and it encapsulates some business logic.

Endpoint caches in browsers and mobile apps need to represent the same staleness policies you might have in your backing store and you end up implementing the same staleness policies in Javascript and then again in SQL, and maintaining them. These are hard to maintain and test at the best of times. If you can implement them in fewer places, or fewer languages, that is a significant advantage.
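One way to reduce that duplication is to express the staleness policy once, as data, and share it with every consumer. A sketch, assuming hypothetical resource types and maximum ages: the policy is a plain mapping that can be serialised to JSON for a browser or mobile client, and is_stale applies the same rules server-side.

```python
import json
import time

# Single source of truth for staleness: maximum age in seconds per resource type.
# Resource names and values are hypothetical.
STALENESS_POLICY = {"price": 60, "stock_level": 10, "product_description": 3600}


def is_stale(resource_type, fetched_at, now=None):
    """True if a value fetched at `fetched_at` (epoch seconds) is older than its max age."""
    now = time.time() if now is None else now
    return now - fetched_at > STALENESS_POLICY[resource_type]


def policy_for_clients():
    """Serialise the same policy so client code can apply identical rules."""
    return json.dumps(STALENESS_POLICY, sort_keys=True)
```

A client then needs only a few lines of JavaScript to apply the shared mapping, rather than its own hand-maintained copy of the rules.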

In addition, in practice we’re not all SQL gurus. Using something that is suboptimal in some cases, but which we can practically exploit more cheaply, is a rational economic tradeoff. It may make sense to use a key/value store just because of the different semantics it provides – but be aware of how much you lose by leaving out an RDBMS, and don’t be surprised if you end up reintroducing one later as a platform for analysing your key/value data.

Reason 4: Load patterns

NoSQL systems can exhibit very different performance characteristics from SQL systems under real loads. Having some choice in where load falls in a system is sometimes useful.

For example, if your front-end web servers scale horizontally easily but you only have one datastore, it can be really useful to have the load fall on the application servers rather than the datastore, because then you can distribute it much more easily.

Although this is potentially less efficient, it is much easier, and often cheaper, to spin up more application servers at times of high load than to scale a database server on the fly.

Also, SQL databases tend to have far better read performance than write performance, so fan-out on write (where you might have 90% writes to 10% reads as a typical load pattern) is probably better implemented using a different backing store that has different read/write performance characteristics.
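To see why fan-out on write skews the load towards writes, here is a toy social-timeline sketch; the users and posts are hypothetical, and dicts stand in for the backing store. Publishing one post becomes one write per follower's timeline, while reading a timeline is a single lookup.

```python
from collections import defaultdict

followers = {"alice": ["bob", "carol", "dave"], "bob": ["alice"]}  # hypothetical
timelines = defaultdict(list)  # user -> precomputed list of posts to show them
write_count = 0
read_count = 0


def publish(author, text):
    """Fan-out on write: copy the post into every follower's timeline."""
    global write_count
    for follower in followers.get(author, []):
        timelines[follower].append((author, text))
        write_count += 1


def read_timeline(user):
    """Reading is one lookup, because the work was done at write time."""
    global read_count
    read_count += 1
    return list(timelines.get(user, []))


publish("alice", "Hello")
publish("alice", "Lunch?")
publish("bob", "Hi Alice")
bob_view = read_timeline("bob")
```

Three posts here cost seven timeline writes but only one read, and the write side grows with follower counts, which is why a store with strong write throughput suits this pattern better than a read-optimised RDBMS.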

Which backing store to use, and how to use it, is the kind of decision that can have huge ramifications for every part of a system.  This post has only had an opportunity to scratch the surface of this subject and I know I’ve not given some parts of it the justice they deserve – but hopefully it’s clear that every decision has tradeoffs and there is no right answer for every system.

About us: Isotoma is a bespoke software development company based in York and London specialising in web apps, mobile apps and product design. If you’d like to know more you can review our work or get in touch.