Throw away your keyboards
Speech recognition is now a reliable and productive technology that can enhance client service, explains Martyn Wells
Away from the clamour that surrounds artificial intelligence and machine learning lies a quieter, more subtle revolution: a series of complementary technologies that have already started to change the way we consume, the way we interact, and the way we socialise with each other. Almost by stealth they have appeared in all our day-to-day activities, and now we'll never have to live without them.
The rise of the smartphone started this revolution when we were all introduced to the power of touch. Over the last ten years (yes, it really is that long since Steve Jobs announced the launch of the iPhone), we've learnt to use our digits to pinch, stretch, and swipe at our inanimate objects. User interfaces evolved as our smartphone dexterity became more sophisticated. Hardware grew cheaper and screens grew larger; smartphones paved the way for tablets to become affordable to the consumer market. Evolution branched as screens shrank and became wearable, or attached to a wall, or appeared in a supermarket in place of a cashier. Touchscreen technology is now a mature, acceptable, and affordable human interface.
Elsewhere, speech recognition has had a much longer burn to emerge as a mature technology. Back in the days when we were all worrying about the millennium bug, I was busy playing with Dragon NaturallySpeaking version 3 in Word 97. If I'm honest it wasn't a particularly immersive experience, and however much I wanted it to work, the boxed copy soon ended up in the bin. I wrote it off as an emerging technology and forgot all about it. That was until about four years ago, when I bought a shiny new iPhone 4s running iOS 5. It had a new Siri thing that was fun.
Ironically, Siri uses Nuance’s speech recognition services (Nuance developed Dragon NaturallySpeaking). Siri paved the way for speech recognition to become a generally consumable service, and growth in this area since has been phenomenal. We now refer to this technology as an intelligent personal assistant (IPA) and growing public adoption rates parallel the explosion of the touchscreen market after the introduction of the smartphone.
Driven once more by our personal consumption, IPAs now reside on your smartphone, in your car, and in rooms all around your home. Every conceivable platform now has its own resident IPA, whether it is Alexa, Cortana, Google Now, Siri, or other emerging technologies such as Viv, Baidu in China, or M on Facebook (not presently available in the UK). These IPAs are pretty useful too: ‘Dim the lights.’ ‘Tell me how long the journey will be.’ ‘Remind me when I get home to put the bin out every Wednesday night.’ Your use cases are catered for, so it’s not just about Cookie Monster baking cookies.
At this point I can sense some doubt creeping in, some healthy scepticism about the accuracy of these assistants, despite accuracy rates on all the main platforms now exceeding 95 per cent and new dialect packs being delivered all the time. If you’ve not tried an IPA, or if it’s been a while, then I urge you to try (again), because underneath this seemingly simple technology lies a world of advanced machine learning. Hidden Markov models and Bayesian networks score the candidate word sequences, and a Viterbi search picks out the most probable path through them. Fantastic stuff, but I digress. I intended to stay away from that murky underworld. I wanted you to understand that speech is now a reliable and advanced human interface too.
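For the curious, the Viterbi search mentioned above can be sketched in a few lines. This is a minimal illustration of the general dynamic-programming idea, not how any real speech recogniser is implemented; the states, observations, and probabilities below are invented for the example.

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return (probability, path) for the most likely hidden-state
    sequence of a discrete hidden Markov model given observations."""
    # V[t][s] = probability of the best path ending in state s at step t
    V = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    path = {s: [s] for s in states}

    for t in range(1, len(obs)):
        V.append({})
        new_path = {}
        for s in states:
            # Choose the predecessor state that maximises the path probability
            prob, prev = max(
                (V[t - 1][p] * trans_p[p][s] * emit_p[s][obs[t]], p)
                for p in states
            )
            V[t][s] = prob
            new_path[s] = path[prev] + [s]
        path = new_path

    prob, best = max((V[-1][s], s) for s in states)
    return prob, path[best]


# Toy example (hypothetical weather model, not speech data):
# infer hidden weather states from observed activities.
states = ("Rainy", "Sunny")
start_p = {"Rainy": 0.6, "Sunny": 0.4}
trans_p = {
    "Rainy": {"Rainy": 0.7, "Sunny": 0.3},
    "Sunny": {"Rainy": 0.4, "Sunny": 0.6},
}
emit_p = {
    "Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
    "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1},
}
prob, seq = viterbi(["walk", "shop", "clean"], states, start_p, trans_p, emit_p)
# seq is the most probable hidden sequence: ['Sunny', 'Rainy', 'Rainy']
```

In a speech recogniser the hidden states would be phonemes or words and the observations acoustic features, but the decoding principle is the same.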
Knowing that these technologies are developing so rapidly, I looked down at my keyboard as I prepared this article and wondered how this obsolete piece of plastic, whose roots go back to the 16th century, can continue to provide a reliable and productive human interface. Why would I want to think about a subject, dictate it for someone else to transcribe, and then check it over? We just won't need to, will we?
Picture this: you attend a client meeting at the location of their choice and run through a number of changes they need made to a document. Using your touchscreen device you download the document from the document management system, then use touch and speech recognition to dictate and drive the amends there and then, before emailing the client the marked-up document. How about that for client service?
For these reasons the days of the keyboard are well and truly numbered. How long before voice and touch are widely used across the workplace in our industry? Do you think you’ll still be using these antiques in five years’ time?
Martyn Wells is IT director at www.wrighthassall.co.uk