Inside the SpinVox Brain

The power of human hardware rarely fails to impress...

COMMENT

How much human interaction powers SpinVox's voicemail-to-text conversion system? Natasha Lomas was invited to the company's HQ to see a demo of the system. Did it impress?

A trip to the HQ of SpinVox - the voicemail-to-text conversion company I wrote about last week - has given me a newfound respect for human hardware. By which I mean the ear, the brain and above all the brain's ability to grub and process a grain of meaning from the polluted and chaotic environments humans create.

Listening to a friend explain the implications of the subplot of Moon from across a Tube carriage tortured by the sound of screeching brakes and screaming children? No problem. Filtering out the omnipresent swoosh of lorries and vans on the walk to work to eavesdrop on the conversation of the man on his mobile behind you? It can be done.

Yep, the brain and its tools are impressive all right. But what about SpinVox's Brain and SpinVox's tools?

Along with several other journalists who have been following 'SpinGate' by publicly wondering how much human intervention is required in SpinVox's Voice Message Conversion System (aka The Brain), I was invited to the corporate headquarters in Marlow-on-Thames for a demo of the system - led by company CIO Rob Wheatley.

The reception desk at SpinVox HQ (Photo credit: Natasha Lomas/silicon.com)

It was also billed as a chance to ask some of the questions not cleared up by last week's flutter of press releases - for me the biggest lure. I was expecting the tech demo to be interesting and competent but, as it would obviously be operating in test conditions, a mere taster of a business that can surely only be understood in the daily grind and grit of real-world operation. After all, three journalists in a room can only make so much noise.

So what does SpinVox's technology look like? Although we were shown a diagram of the workflow process - with both its automated and human components - we were forbidden from taking photos or filming. Wheatley also gave us an impassioned plea to "please be sensitive" with what they were telling us - although we were not asked to sign an NDA. A somewhat contradictory message, that.

So, in the interests of a) brevity and b) sensitivity, here's my rough translation of how SpinVox's system works:

After cleaning up and rating the audio quality - and doing some fundamental checks such as 'what language is this?' and 'is there a message at all?' - the system uses the words it can pluck out of the mire to hazard a guess on the identity of the words it can't. Think 'Spears' coming after 'Britney'.
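For the curious, the 'Spears after Britney' trick can be sketched in a few lines of Python. This is emphatically not SpinVox's code, which we were not shown: it is a toy bigram model, and the corpus and the function names (train_bigrams, guess_next) are my own inventions for illustration.

```python
from collections import Counter, defaultdict

# Toy training text (an assumption for illustration); a real system
# would learn from vastly more data than four sentences.
CORPUS = [
    "britney spears called me back",
    "britney spears is on the radio",
    "call me back when you can",
    "give me a call when you can",
]


def train_bigrams(sentences):
    """Count how often each word follows each other word."""
    follows = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            follows[prev][nxt] += 1
    return follows


def guess_next(follows, prev_word):
    """Return the most frequent follower of prev_word, or None if unseen."""
    candidates = follows.get(prev_word)
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


MODEL = train_bigrams(CORPUS)
print(guess_next(MODEL, "britney"))
```

Feed it a word it has never heard and it returns nothing at all, which is roughly the point at which a human ear might have to step in.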

Wheatley talked about the system building "a lattice" of probabilities of what might be being said - and this is where the terminology starts to sound a tad over-engineered to my ear. A 'lattice of probabilities' is surely kith and kin to the predictive text you get on your phone - i.e. sometimes kind of useful but all too frequently annoyingly misguided as to what it is you're actually trying to say despite the fact you've stacked it with your favourite swearwords by adding them to the user dictionary.
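To make 'a lattice of probabilities' a little less mystical, here is a minimal sketch in Python of what such a thing might look like. Every number and name in it is made up by me (LATTICE, BIGRAM, FLOOR and best_path are hypothetical), and it should not be read as a description of SpinVox's actual design. Each slot holds candidate words with a score for how well they match the audio, a second table says how likely one word is to follow another, and the code picks the path through the slots with the best combined score.

```python
import math

# Hypothetical lattice: one dict per position in the message, mapping
# each candidate word to an assumed acoustic probability.
LATTICE = [
    {"britney": 0.9, "brittany": 0.1},
    {"spears": 0.4, "spheres": 0.6},
]

# Hypothetical language-model probabilities P(next word | previous word).
BIGRAM = {
    ("britney", "spears"): 0.9,
    ("britney", "spheres"): 0.01,
    ("brittany", "spears"): 0.05,
    ("brittany", "spheres"): 0.05,
}

# Small probability assumed for word pairs missing from the table.
FLOOR = 1e-4


def best_path(lattice, bigram):
    """Return the highest-scoring word sequence through the lattice."""
    # best[word] = (log score of the best path ending in word, that path)
    best = {w: (math.log(p), [w]) for w, p in lattice[0].items()}
    for slot in lattice[1:]:
        new_best = {}
        for word, acoustic in slot.items():
            # Pick the best predecessor for this candidate word.
            score, path = max(
                (prev_score + math.log(bigram.get((prev, word), FLOOR)), prev_path)
                for prev, (prev_score, prev_path) in best.items()
            )
            new_best[word] = (score + math.log(acoustic), path + [word])
        best = new_best
    return max(best.values())[1]


print(best_path(LATTICE, BIGRAM))
```

In this invented example the audio alone favours 'spheres' for the second word, but the word-pair table drags the answer back to 'spears'. Whether the real system's tables are good enough to do that reliably is, of course, the question.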

(Does predictive text get better over time? For what it's worth I actually find my phone gets worse at helping me write text messages as more and more once-favoured words accumulate in the dictionary and then plonk themselves into phrases where they're no longer wanted. But I digress.)

Wheatley talked up the 'statistical analysis, acoustic modelling and user learning' that the system apparently uses to get better at predicting the next word each user might have said. If humans had the vocabulary of sheep this might be an easy task, but there's surely no escaping the fact that the spoken word does anything but conform to type - even if CEO and co-founder Christina Domecq reckons many speakers can be described as 'average Joes'... (continued on page 2)


Comments

There are 3 comments.

  1. anonymous

    Reminds me of a comment about stylus writing recognition on a Palm, I think.
    The person basically ended up writing for the software so it was recognised.

    Maybe the same will happen over time with all the automated voice recognition.
    Not necessarily a bad thing if the software recognition is reasonable. As it might improve the standard of spoken language and normalise it a bit.

  2. Jonathan Present

    I should imagine that this company has Intellectual Property which they should be happy to reference, such as Patents, and Algorithms. Do they have employees with the requisite expertise to create a system of the sophistication that they claim?

  3. Nick Barnett

    Excellent article Natasha. Intelligent, dry style, with just the right amount of objectivity. Leaves me with the same conclusion you drew, about the 'real-world-readiness' of SpinVox. Trailing in the wake of it all is the fickleness - or apathy? - of the millions of average joes (and janes) who are now no longer getting their knickers in a twist about Ellie and others eavesdropping on their private conversations.
    Ho-hum.

    • 25 August 2009 10:15