AI Voice Assistant Technology Still Sucks

“OK Dragon – post ‘I’m too lazy to write this status myself’ to Facebook.” No problem. OK to post “I’m too lazy to write this status myself”?

“OK Google – navigate me to Helvetica Nude’s Font of Women, a Strip Club for Typographers.” Turn left onto I-73 and take exit 33 towards I-24 South, then merge onto US 428 to Nasty Park Road.

“Siri – remind me in 1 day to call my therapist and get a refill on my prescriptions.” Reminder scheduled for tomorrow.

Voice assistants are all too commonplace today, if you bother to seek one out and use it. Siri may have made them mainstream, but there are decent competitors out there such as Dragon Assistant and Android’s Google Now. But the technology, and the paradigm of using it in everyday life, have far to go before we wonder how we ever did without them.

Talking to your phone or computer is ridiculous

No matter how often you’ve fantasized about being able to get information or perform tasks just by using your voice, you’ve never had the sort of AI experience that people on board the Enterprise had on TV’s Star Trek. To be honest: they looked stupid using it, too. But at least the computer gave them useful information and seemed to understand most natural language input (unless the writers felt like making a point about how computers aren’t human, and vice versa, with an unrecognized command). If you’ve ever spent more time trying to get voice commands to do what you want than it would have taken just to use your hands and do it manually, then you know how stupid you look doing it. Only the obstinate tech enthusiast will continue to try to get it to work.

You can’t do this naturally in public either, or you just look like an elitist jackass. “Hey Siri, find me some friends.” I’m sorry, but I’m unable to detect anyone nearby with a severe lack of social skills. Have you tried browsing Tinder?

When the Singularity happens and our computers try to kill us, er, I mean, our gracious robot overlords rule over us with wisdom and justice, we might have lengthy conversations when they will it. We might discuss philosophy, religion, politics, and the human slave destruction and feeding schedules. But for now, voice AI is a party trick. It’s a point-and-pray solution to a problem we didn’t have.

Until we fix the paradigm and how foolish it makes us look, I think we should put people who use voice commands into a dimly lit room with people who continuously wear their Bluetooth headsets when they’re not driving or even taking a call.

OK Dragon, find me a bug report form

My rage against voice commands came recently when I picked up my replacement laptop from Best Buy, an upgrade from my previous model that I was able to get under the Geek Squad protection plan (more on that in another post). The laptop I’m writing this on is the Toshiba Satellite Radius (model P55W-B5220) – it’s a nice convertible Windows 8.1 laptop that turns into a tablet and sometimes sits up like a teepee. It came with Dragon Voice Assistant and that kind of excited me.

Maybe some of that was excitement over Cortana being in Windows 10, and thinking somehow it might finally be cool to talk to my computer. But it’s not. It’s just ludicrous. In this particular case, I’m giving a bad review to Dragon because it’s pre-installed on a machine in a big box store and it’s just not up to snuff.

Dragon posted a status to Facebook for me without much fuss, once I connected it up to my Facebook account and took some basic steps to get it to recognize my voice saying “OK, Dragon,” its default activation phrase. It also responds to “Hey Dragon” and “Yo, Dragon,” but does not respond to “Dragon,” “Dammit, Dragon,” or “Dragon, you ho bag,” despite my repeated efforts.

Dragon got confused when I tried to get it to open specific programs on my computer that weren’t Windows 8.1 modern apps. Then it couldn’t do simple things I wanted it to, like “Close this tab,” while I was in a Firefox browsing session. It shined when I asked it in plain language to find something out for me, essentially doing a search on whatever type of site was appropriate, within reason.

More often than not, though, Dragon tells me that it’s “not sure what to do with that,” or “OK, cancelled.” And of course, with an activation phrase, it frequently misheard the television or people in the room, like your eager friend who hopes someone nearby is talking about them. You can tell Dragon to go back to sleep, but that gets sad after a while.

Google Now or um, you know, whenever you get around to it – if you have time, you know, no big deal.

My other voice assistant, Google Now on my LG G3 Android phone (now running Lollipop, thanks Verizon!), is thankfully a lot more cognizant of my needs and desires than Dragon. While Google Now taps into Google’s ever-expanding, far-reaching knowledge and metadata chasm, its primary annoyance is getting the activation phrase to work and getting it to do something besides search for exactly what I just said.

I have on a few occasions managed to get Google Now to recognize “OK Google, what’s this song?” and pull off some Shazam-like magic to identify a song for me. It’s almost the only thing I use it for anymore. Outside of that, I once got it to set a reminder somewhere on my phone that never went off. It acknowledged the time and date and content of the reminder, but when the time came and passed, I never received any notifications.

I’m still not 100% sure how to get Google Now to just jump into Navigate mode and get me where I need to go without pressing anything. And while driving, which is where I’d talk instead of type, I can’t take my eyes off the road to type or confirm the location I need.

Ya got no follow-through, kid.

My biggest pet peeve about AI voice assistants, above all the rest – they have no sense of context. They’re designed essentially to take one command at a time and translate that into a task to complete. They typically have no natural language follow-up capabilities to understand a contextual command, especially as a follow-up to an initial command. Google Now almost gets it right by letting me say “the first one” when navigating search results, but it has a hard time understanding that I need it to keep listening so it can, say, dial the number in the listing or do something useful like text the result to a contact.

Just the convenience of being able to use pronouns and have the AI understand the context and know what to do with follow-up commands would make significant progress towards useful voice controls. In essence, the more human we can make our AI, the more pleasant it will be to interact with it.
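For the programmers in the audience, here is roughly what I mean, as a toy Python sketch. To be clear, this is not how Google Now, Dragon, or Siri actually work; the Assistant class, the fake_search helper, the canned results, and the phone numbers are all made up for illustration. The only point is that the assistant keeps a little memory between commands, so “the first one” and “call it” have something to refer to.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

ORDINALS = {"first": 0, "second": 1, "third": 2}


def fake_search(query: str) -> list:
    """Hypothetical stand-in for a real search backend; returns canned results."""
    return [
        {"name": "Pizza Place A", "phone": "555-0100"},
        {"name": "Pizza Place B", "phone": "555-0101"},
    ]


@dataclass
class Assistant:
    """Toy assistant that remembers the last results and the last item picked."""
    results: list = field(default_factory=list)
    current: Optional[dict] = None

    def handle(self, utterance: str) -> str:
        text = utterance.lower().strip()

        # A fresh search replaces whatever context we had before.
        if text.startswith("find "):
            self.results = fake_search(text[5:])
            self.current = None
            return f"Found {len(self.results)} results."

        # "the first one" only makes sense against the remembered results.
        match = re.search(r"\bthe (first|second|third) one\b", text)
        if match:
            index = ORDINALS[match.group(1)]
            if index >= len(self.results):
                return "I don't have that many results."
            self.current = self.results[index]
            return f"Selected {self.current['name']}."

        # A pronoun resolves to the most recently selected item.
        if re.fullmatch(r"(call|dial) (it|them|that)", text):
            if self.current is None:
                return "Call what?"
            return f"Dialing {self.current['phone']}."

        return "Not sure what to do with that."
```

Say “find pizza,” then “the first one,” then “call it,” and the last command dials the place you picked instead of running a web search for the words “call it.” A real assistant would need far more than two regular expressions, but the remembered state is the part that is missing today.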

It won’t solve the problem of douchebags talking to their devices looking like douchebags, but at least we’ll have some sympathy for the poor device that has to listen to them.

Final thoughts

I can’t give any insight into Siri in earnest, because I don’t have any iDevices around the house or at work. From what I’ve seen, it seems to have better adoption and engagement from iPhone users because the developers gave it a bit of a personality. Functionally, people who have it seem to really appreciate it – but I still say the problems persist regardless of platform. We need something better – we need a real-life JARVIS to run all our gadgets, our homes, and our transportation. You can’t feel anything but pleasant talking to JARVIS – he’s realistic, friendly, well-tempered, and always knows what you need.

The person who designs a voice assistant like JARVIS, well that person has Vision.

[Editor’s note: I confess, I just saw Avengers: Age of Ultron yesterday evening and may be fanboying out still.]
