The Touch Gesture Motion conference (TGM) covers various technologies related to up-and-coming human-machine interface approaches. And its middle name is “Gesture.” How we doin’ there?
Well, first off, some of the consistent names in gesture – regular faces in past years – were not present this year. That caught my eye. And then there was an interesting presentation providing evidence that consumers aren’t delighted with gesture technology. Another red flag.
So let’s look at some evidence and then go over some of the challenges that gesture technology may need to overcome.
I personally only have one piece of evidence, which, scientifically, would be considered not evidence, but an anecdote. I wrote about it before: answering a phone call overlapped with a hang-up gesture. Yeah, you can see where that went.
But there’s another source: a company called Argus Insights monitors… um… well, online social discussion. And they intuit from that how people are feeling. Note that this doesn’t really provide information on why folks are reacting the way they are; it simply provides the reaction.
They get this by mining the social media buzz surrounding various products. They check not only the amount of discussion, but also whether it’s positive or negative. For instance, they found that the Samsung Galaxy S3 started with a 0.75 “delight” rating, but the S4 had a rather rocky debut, starting as low as 0.25 and eventually crawling up to about 0.70. Later, the S5 nailed it at around 0.85 out of the chute, declining to around 0.8.
Depending on how they mine this stuff, they extract information on different aspects of technology. I’m not privy to the details of how they do the extraction (if they were my algorithms, I certainly wouldn’t make them public), so I can’t swear as to the accuracy, but folks are listening.
And here’s what Argus says about gestures: consumers are not thrilled. The following chart shows consumer reaction to touchscreens, touchscreen responsiveness specifically, and gesture recognition – and the latter shows a pretty dramatic dropoff.
Graph courtesy Argus Insights
While this data doesn’t provide cause, other presentations and discussions from the conference can shed some light. In fact, it’s not hard to see why it might be a problem.
John Feland, cofounder and CEO of Argus Insights, related one incident where he was consulting with a system house, and they declared, “We should assemble a vocabulary of 35 gestures!” as a response to other systems having growing gesture vocabularies. As if the number of gestures defined success. As you might imagine, Mr. Feland advised against that.
Why? Because who wants to memorize 35 gestures? OK, perhaps it’s possible – if we, as a culture, standardize on gestures and start teaching kids at an early age, the way we do keyboarding today. It becomes ingrained and we carry it with us the rest of our lives.
But that’s not what’s happening. Each system maker has its own vocabulary. Those vocabularies are enabled, typically, by separate companies specializing in gesture technology. Those providers each have different vocabularies. And those vocabularies sometimes relate to the technology used to see the gestures. Is it ultrasound? A camera? What camera technology?
So it’s not a matter simply of learning 35 gestures. In fact, let’s drop the issue of too many gestures; let’s assume there are, oh, eight. That’s not many – especially with symmetries (up/down/left/right are probably – hopefully – analogous). But if you have two tablets in the house and three phones and an entertainment system, each of which has eight gestures, and they’re all a different set of eight gestures, then you have to remember for each system which gestures do what. Kids, with their annoying plastic minds, can probably do that. Adults? Not so much. (OK, we could. But we’re old enough to have other things to do with our time and gray matter.)
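To put a rough number on that, here’s a back-of-the-envelope sketch in Python. The household and the eight-gesture count come from the paragraph above; the function name and the worst-case assumption that no two devices share any gestures are mine, purely for illustration.

```python
# Hypothetical household from the text: device type -> how many of them.
devices = {"tablet": 2, "phone": 3, "entertainment system": 1}
GESTURES_PER_DEVICE = 8  # the "oh, eight" assumed above


def mappings_to_remember(devices, gestures_per_device, standardized):
    """Count the gesture-to-action associations a user has to memorize."""
    if standardized:
        # One shared vocabulary: learn it once, use it everywhere.
        return gestures_per_device
    # Assumed worst case: every device has its own unrelated vocabulary.
    return sum(devices.values()) * gestures_per_device


fragmented = mappings_to_remember(devices, GESTURES_PER_DEVICE, standardized=False)
shared = mappings_to_remember(devices, GESTURES_PER_DEVICE, standardized=True)
print(fragmented, shared)
```

Under those assumptions, six devices with eight gestures each means 48 associations to keep straight, versus eight with a shared vocabulary. Real systems presumably overlap somewhat, so the true number lands somewhere in between.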
Of course, the solution is to standardize on eight gestures to be implemented throughout the industry. Yeah, you can imagine how fun that discussion would be. In addition to picking the eight, you’d also want to be culturally sensitive, meaning a different eight for different cultures, meaning also defining which cultures get their own and where the boundaries will be. Great rollicking fun for the entire family to watch if UFC isn’t on at the moment.
And it’s not just the gestures themselves. There are also… what to call them… framing issues. How do you end one gesture and start another? One system might do gestures all in a single plane; in that case, pulling your hand back towards you could be interpreted as ending a gesture. But another system might use a pulling-towards-you gesture for zooming, with some other way of indicating that the gesture is complete.
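The framing problem is easier to see with a toy example. The sketch below feeds the same hand motions to two made-up systems: one treats pulling back as the end of a gesture, the other treats it as a zoom and waits for an open palm to finish. The motion names, the open-palm terminator, and both functions are hypothetical; neither models any real product.

```python
def segment(motions, terminator, rename=None):
    """Split a stream of motions into gestures, ending each at `terminator`."""
    rename = rename or {}
    events, current = [], []
    for motion in motions:
        if motion == terminator:
            if current:
                events.append(("gesture", tuple(current)))
            current = []
        else:
            # Some systems give a motion its own meaning instead of ending on it.
            current.append(rename.get(motion, motion))
    if current:
        # Motions seen after the last terminator never got closed off.
        events.append(("unfinished", tuple(current)))
    return events


def planar_system(motions):
    """Hypothetical system A: single plane, pulling back ends the gesture."""
    return segment(motions, terminator="pull_back")


def depth_system(motions):
    """Hypothetical system B: pulling back zooms, an open palm ends the gesture."""
    return segment(motions, terminator="open_palm", rename={"pull_back": "zoom"})


motions = ["swipe_left", "pull_back", "swipe_up", "open_palm"]
print(planar_system(motions))
print(depth_system(motions))
```

System A sees a finished swipe followed by something it never closes off; System B sees one long gesture with a zoom in the middle. Same hand, same motions, two different stories.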
My own observation is that gesture technology has largely been viewed as a cool thing to bolt onto systems. And let’s be clear on this: it is cool. At least I think it is. That simple cameras or other devices can watch our hands and sort out what we’re doing in complicated scenes and settings is really amazing.
But it also feels like we’ve added it to systems in an “Isn’t this cool??” manner instead of an “Isn’t this useful??” way. And consumers like cool for only so long, after which they get bored – unless it’s also useful. That would be consistent with higher satisfaction early and then a drop-off.
Probably the biggest question ends up being, is it useful enough to generate revenues that will fund the further development and refinement of the technology? That value question has also not been unambiguously decided one way or the other.
So there are lots of data points here; they all suggest that there’s more to be done. I’ll leave it to the participants in this battle to decide the best fixes… or you can add your own thoughts below.