SpinVox: Are you listening?

Customer trust is not a game of semantics...

COMMENT

Using a bit of mystique to protect company secrets is all very well but too many foggy messages risk driving a wedge between you and your customers, warns Natasha Lomas.

An interesting little tale unfolded last week courtesy of a report by the BBC's tech correspondent Rory Cellan-Jones, or 'Rory Katherine Jones' as voicemail-to-text conversion service SpinVox inevitably rebrands him.

And it's SpinVox's service that was worrying Cellan-Jones.

His report claimed the majority of voicemail messages the company processes are actually heard and transcribed by call centre staff in South Africa and the Philippines - rather than being fed into a fancy bit of speech conversion software.

If you've parsed the company's website or come across any of its PR you'll be familiar with the notion of its "Voice Message Conversion System (VMCS)" - also known as 'D2' and/or 'The Brain' - which eats words and spits them back out as text, as seen in this cutesy cartoon from the company website.

Describing how The Brain works, SpinVox says it's "a combination of artificial intelligence, voice recognition and natural linguistics".

Which is about as illuminating as saying 'this chicken tastes good because of the colonel's secret recipe of 11 herbs & spices'. But this is business and there are trade secrets, so that's just a bit of smoke and mirrors, right?

The website also notes The Brain "is able to call on human experts for assistance" - and for 'human experts' read 'call centre operatives'.

So while the public face of the company may be rather coy about calling a spade a spade, it's not guilty of a cover-up either.

This is only a 21st Century Mechanical Turk if you failed to read the small print, right?

Or is it?

SpinVox put out the following denial of Cellan-Jones' report:

"Claims have been made to the BBC, suggesting that the majority of messages have been heard and transcribed by call centre staff in South Africa and the Philippines. These are incorrect," it says in the statement.

Which is somewhat awkwardly worded. Does it mean the majority of messages are not heard and transcribed by any call centre staff, or just not by call centre staff in South Africa and the Philippines?

The statement continues: "Today, SpinVox now requires only a few hundred agents per market as its system is capable of automatically converting all standard messages without learning assistance."

What's a 'standard message' when it's at home? Slang is anything but, that's for sure. And that's even before you feed in distorting factors such as background noise, poor sound quality, broad accents, sweet nothings and the rest of it.

The company goes on to say that all speech technology has to be 'trained' by humans - by which it means people have to listen to a portion of the audio and then check the corresponding transcription to correct any discrepancies.

"We have always been absolutely clear in our communications that humans form an important component of our learning system - they are a key component by which the system learns," the statement adds.

Suddenly we find the role of the human ear in SpinVox's business has gone from that of back-up understudy ("is able to call on") to serious actor ("important component" - and even "key component"). Which is a pretty radical linguistic transformation whichever way you look at it.

I decided to try a direct approach and asked the company what proportion of voicemails are converted by its speech technology, and what proportion require the human ear to be transcribed, and also how many call centres it has, where they are located and how many staff it employs to parse voicemails.

SpinVox told me it would get back to me with an answer in due course. A doubtless oversubscribed company spokeswoman said: "We are in the middle of discussing many of the issues which are being discussed."

Which is a wonderful way of saying it's not sure what it can say at this point as it's still trying to decide what can be said. A disclosure of sorts, I grant.

Comments

  1. anonymous

    When Daniel Doulton (is he D2 incarnate?) claims that "our automation has taken over 98% of the task", how does he define "task"?

    Let's say that a complete SpinVox service involves 50 steps, from receiving a message, to transcoding the audio, to checking for blank (slamdown) messages, to logging it, to backing it up, prior to conversion, and then after the conversion, there might be steps of formatting the message into an SMS, and formatting for email, and adding any advertising or marketing messages, steps for delivering the message, and then reporting and billing, etc.

    So if only one of these hypothetical 50 steps was the conversion of the speech to text, and that conversion was done 90 per cent by humans, would that mean that the system is 98 per cent automated?

    Just askin.

  2. Jonathan Present

    WHO PUT THE SPIN IN SPINVOX???

    I believe that this company has received 100 Million in Venture Capital to date. Perhaps that gives them some latitude to get the job done, one way or another. However, they would require sophisticated speech algorithms to process the information in the manner they claim to. Do they have any Intellectual Property, Patents?
