Sunday 27 April 2014

This program is captioned live.

Ever see those little words? Maybe you were watching TV at a pub or cafe, and the sound was off. Maybe it was late at night, and you didn't want to wake the baby. Maybe you have a hearing impairment, and wish to remain engaged with the intellectual life of your society. If so, you'll have seen my work, and that of my colleagues.

There are a number of different ways of captioning TV. Sometimes we are furnished with pre-captioned content. A film which already exists on DVD, for example, will likely have pre-existing caption files, which can then simply be adapted for the relevant market (re-timed to account for ad breaks, recoloured to adhere to Australian standards, spell-checked to fit Australian English, etc). In other cases, we have early access to the audiovisual content - anything from weeks in advance to just before it goes to air - but no script. In that case, a transcript is prepared and checked back against the program. Both of these are considered "offline" captioning. They should be word-perfect, and will usually appear as "block" captions. In other words, a two-line phrase appears onscreen all at once, usually arranged to cover a discrete utterance, sentence, or phrase.
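Re-timing is the most mechanical of those steps, so here's roughly what it amounts to, sketched in Python. The .srt file format and the three-minute offset are just stand-ins for illustration - the broadcast caption formats and tools actually used are more involved than this.

```python
# Illustrative only: shifting every timestamp in a simple .srt caption file
# later by a fixed offset, e.g. to make room for an inserted ad break.
import re
from datetime import timedelta

TIME = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")

def shift(match, offset):
    # Parse one HH:MM:SS,mmm timestamp, add the offset, and re-format it.
    h, m, s, ms = map(int, match.groups())
    t = timedelta(hours=h, minutes=m, seconds=s, milliseconds=ms) + offset
    total_ms = int(t.total_seconds() * 1000)
    h, rem = divmod(total_ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"

def retime(srt_text, offset):
    # Push every caption in the file later by `offset`.
    return TIME.sub(lambda m: shift(m, offset), srt_text)

block = "1\n00:12:00,000 --> 00:12:02,500\nThis program is captioned live.\n"
print(retime(block, timedelta(minutes=3)))  # captions now start three minutes later
```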

But where does that leave us when TV content is completely unscripted, as with live sport? Or partially scripted, as with news content? Well, that's where my work begins. I'm a live voice captioner.

There are two main types of live captioner - voice captioners and stenographers. Both produce "scrolling" captions (one word at a time, as distinct from block captions) from the streams of live sound being broadcast. Stenographers use shorthand machines, like the ones you may see in courtrooms in old-timey movies, together with software which translates, formats and broadcasts their output in real time. Each keystroke sends a whole cluster of letters at once, so they can rapidly type every sound they hear. If you ever see a word appear onscreen which looks like a cluster of unmatched letters, you might be seeing a stenographer making a typo. But go buy a lottery ticket - it doesn't happen much. They achieve very high levels of accuracy. I'm just over 98% accurate; they're 99+. But they take a lot of training, and there aren't that many around. They're more expensive, and their Jedi-like talents are best reserved for very important, difficult or popular live programming.

Voice captioners like me basically respeak everything we hear into a piece of voice recognition software (I use Dragon, which is like Siri's badass cousin) which is trained and tailored to my voice. I speak in a clear, expressionless tone, giving spoken commands for colour changes and punctuation. I then watch what is coming out and can type quick corrections. If you see an error which looks like a homophone, like "Netanyahu" coming out as "net and Yahoo", you're probably seeing a voice captioner at work. People tend to assume that such errors are the result of an entirely automated process which has gone haywire, but while there is this element of software misrecognition, there is generally a human at the wheel. This is in part because voice recognition software is not great at responding to changes in speaker, in part because the software is hopeless at sensing conversational punctuation, and in part because a human operator is far better at disambiguating similar words and phrases. Simply put, we know which "there/they're/their" the newsreader means, and can even adjust for their much-loved puns. Dragon makes educated guesses based on context, and is good with common phrases, but the work of captioning remains a human (or at best a little cyborg) affair.

We also employ what are called "house styles", which are like manually created autocorrect rules, and which can be grouped as required. Thus a "cycling" house style might include a rule that "pelican" is always changed to "peloton", and you could apply it for Tour de France coverage, then take it off for nature documentaries.
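To make the idea concrete, here's a toy sketch in Python of how grouped house-style rules behave. The group names and rules are invented for illustration - the real captioning software handles all of this internally - but the principle is the same: find-and-replace rules you can switch on and off per program.

```python
# Illustrative only: "house styles" modelled as grouped autocorrect rules.
HOUSE_STYLES = {
    "cycling": {
        "pelican": "peloton",          # Dragon tends to hear "peloton" as "pelican"
        "tour de france": "Tour de France",
    },
    "news": {
        "net and yahoo": "Netanyahu",  # the homophone-style error mentioned above
    },
}

def apply_house_styles(text, active_groups):
    """Apply every rule in the currently active style groups to respoken text."""
    for group in active_groups:
        for wrong, right in HOUSE_STYLES.get(group, {}).items():
            text = text.replace(wrong, right)
    return text

# Switched on for Tour de France coverage...
print(apply_house_styles("the pelican is all together on the final climb", ["cycling"]))
# ...and off again for the nature documentary that follows.
print(apply_house_styles("the pelican scoops fish into its pouch", []))
```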

We don't have to get everything - between me and the software the captions lag about two seconds behind the action, so in sport it would be unhelpful to caption every word when the action will have already moved on. Also, people can speak at about 140 words a minute, but many don't read that fast. It's thus quite common for a 15-minute live stint to yield a text output of over 2,000 words, which is an awful lot of moving text to read. But we get as much as we can of what we hear, and paraphrase when we have to.
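The arithmetic behind that is easy to sketch. Only the speaking rate and the stint length come from what I've said above; the comfortable reading rate below is an assumed figure, plugged in purely for illustration.

```python
# Back-of-the-envelope figures for a live stint. The reading rate is an
# assumption for illustration; the other numbers are from the post.
speaking_rate_wpm = 140   # roughly how fast people talk on air
stint_minutes = 15        # length of a typical live stint
reading_rate_wpm = 120    # assumed comfortable rate for scrolling captions

words_spoken = speaking_rate_wpm * stint_minutes    # 2100 words
words_readable = reading_rate_wpm * stint_minutes   # 1800 words
print(words_spoken, words_readable, words_spoken - words_readable)
```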

I hope in the course of this blog to share some thoughts, insights and anecdotes from this strange little corner of broadcast journalism and disability services. As a lover of the written word, I want to capture how superimposing the very old medium of spoken-word storytelling onto the very new medium of digital television creates interesting tensions. I want to share some of the funny mishaps, some of the unique perspectives and some of the quirky personalities that inhabit the world of the live captioner.

