Complex Gesture Recognition in iOS – Part 1: The Research

This Ain’t No 3-Finger Swipe

Given that touch-based input has become a staple of mobile devices, I was surprised to find a dearth of good, open source complex gesture recognition libraries in the wild.

Well, maybe I should explain what I’m referring to when I say, “complex gesture.” I’m not talking about the swipes and pinches that dominate our current interactions with touch devices. I’m more interested in what I like to call “glyphs” – a series of strokes that form a symbol.

A good example of what I consider a glyph is something like a letter in the alphabet. A ‘Q’ glyph is the circle stroke and the southeast stroke positioned in such a way that it’s recognizable. But, OCR is not quite what I’m looking for because it’s specific to text. The ideal solution should also recognize abstract glyphs — symbols that aren’t hard coded into a lookup dictionary.

Psh, I Can Write That

Like all foolhardy, naive developers, I initially tried to just wing it. Research shmeshmearch. This would be easy — just take user input, break it down into individual strokes and then compare the input to a template for a glyph.
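To give that a concrete shape, here's a quick Swift sketch of the input side of that approach. The class and property names are mine, purely for illustration, not from any library: a UIView subclass that records each stroke as the ordered touch locations between finger-down and finger-up.

```swift
import UIKit

// A minimal sketch (mine, for illustration only) of stroke capture:
// each "stroke" is just the touch locations between a touch-down
// and the matching touch-up.
class StrokeCaptureView: UIView {
    private(set) var strokes: [[CGPoint]] = []
    private var currentStroke: [CGPoint] = []

    override func touchesBegan(_ touches: Set<UITouch>, with event: UIEvent?) {
        guard let touch = touches.first else { return }
        currentStroke = [touch.location(in: self)]
    }

    override func touchesMoved(_ touches: Set<UITouch>, with event: UIEvent?) {
        guard let touch = touches.first else { return }
        currentStroke.append(touch.location(in: self))
    }

    override func touchesEnded(_ touches: Set<UITouch>, with event: UIEvent?) {
        if !currentStroke.isEmpty {
            strokes.append(currentStroke)
        }
        currentStroke = []
    }
}
```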

Here are the primary challenges I ran into while trying to implement this approach:

  • Users don’t draw well, especially with their fingers — lots of variance.
  • The touch events trigger at a pre-determined interval — fast strokes lead to fewer data points.
  • You can create single stroke gestures with multiple strokes and vice versa — i.e. a checkmark can be made with two strokes or one fluid motion.
  • There are multiple permutations for a given stroke as shown below, hand drawn by me for your enjoyment.


Just some of the stroke combinations to create a ‘Q’

I spent a few hours hacking away at different solutions to these problems, but ultimately decided this sub-stroke comparison approach wasn’t going to cut it.

Step 1 Should Always Be “Search Google For Some Context”

I found two projects that attempt to approach recognition the same way I had — sub-stroke combinations:

Ultimately, like my approach, they were too limiting and didn’t address the key issues. Both allowed for abstract gestures, but in testing, neither handled the variance you’d find in real-world user input.

Then, I stumbled on the 1-Dollar Recognizer. Its unique way of comparing input to a pre-defined template (more on this later) made it very good at error correcting, and it’s order/direction agnostic. But, unfortunately it’s also uni-stroke only. The user would have to make one continuous stroke to ensure recognition. Close, but not quite there.

N Dollar Baby

Fortunately, the same group that developed 1-Dollar extended it to include multi-stroke detection and called it the N-Dollar Recognizer [1].

Multistroke? Check. Order and direction agnostic? Check. Error tolerant? Check. Fast and easy to implement? Check. High pun potential? Check. Well, N Dollar certainly has the biggest bang for the buck, but can it put its money where its mouth is?

How The N Dollar Recognizer Works

If you’re interested in the specifics, I recommend reading the N Dollar research paper. I’ll keep it high-level here.

Error Correction

Both N Dollar and its predecessor, 1 Dollar, use a nearest-neighbor approach to recognition: the user’s input is compared against each template in a pre-defined set and given a score, and the highest score wins as long as it falls within a certain margin of error. This allows the sub-strokes in a glyph to be sloppy, as long as the overall glyph is closest to the intended template.
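To make the nearest-neighbor idea concrete, here’s a toy Swift sketch. The distance function and the `maxDistance` threshold are simplifications of my own; N Dollar’s real scoring involves resampling, rotation, and scaling steps that the paper covers in detail.

```swift
import CoreGraphics

// Toy sketch of nearest-neighbor matching (not N Dollar's actual scoring):
// score the candidate against every template and keep the best match,
// but only accept it if it's within an error threshold.
struct GlyphTemplate {
    let name: String
    let points: [CGPoint]   // assumed already resampled and normalized
}

// Average point-to-point distance between two equal-length point paths.
func pathDistance(_ a: [CGPoint], _ b: [CGPoint]) -> CGFloat {
    precondition(a.count == b.count && !a.isEmpty)
    let total = zip(a, b).reduce(CGFloat(0)) { sum, pair in
        let dx = pair.0.x - pair.1.x
        let dy = pair.0.y - pair.1.y
        return sum + (dx * dx + dy * dy).squareRoot()
    }
    return total / CGFloat(a.count)
}

// Returns the best-matching template name, or nil if nothing is close enough.
func recognize(_ candidate: [CGPoint],
               templates: [GlyphTemplate],
               maxDistance: CGFloat) -> String? {
    let scored = templates.map { ($0.name, pathDistance(candidate, $0.points)) }
    guard let best = scored.min(by: { $0.1 < $1.1 }), best.1 <= maxDistance else {
        return nil
    }
    return best.0
}
```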

Data Points

N Dollar creates a vector from all of the input points and interpolates additional points where necessary. Since individual sub-strokes aren’t as important in its algorithm, the final glyph just has to end up closest to the intended template, regardless of how many raw data points the user produced.
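Here’s a rough Swift sketch of that kind of resampling: walk the path and emit points at equal arc-length intervals, so a fast, sparse stroke and a slow, dense one end up with the same number of data points. It follows the general shape of the resampling step described in the paper, but the code itself is my own illustration.

```swift
import CoreGraphics

func pointDistance(_ a: CGPoint, _ b: CGPoint) -> CGFloat {
    let dx = b.x - a.x
    let dy = b.y - a.y
    return (dx * dx + dy * dy).squareRoot()
}

// Resample a path to n points spaced at equal intervals along its length,
// interpolating new points where the original ones are too far apart.
func resample(_ points: [CGPoint], to n: Int) -> [CGPoint] {
    guard points.count > 1, n > 1 else { return points }
    let pathLength = zip(points, points.dropFirst())
        .reduce(CGFloat(0)) { $0 + pointDistance($1.0, $1.1) }
    let interval = pathLength / CGFloat(n - 1)

    var pts = points
    var resampled = [pts[0]]
    var accumulated: CGFloat = 0
    var i = 1
    while i < pts.count {
        let d = pointDistance(pts[i - 1], pts[i])
        if d > 0, accumulated + d >= interval {
            // Interpolate a new point exactly one interval along the path.
            let t = (interval - accumulated) / d
            let q = CGPoint(x: pts[i - 1].x + t * (pts[i].x - pts[i - 1].x),
                            y: pts[i - 1].y + t * (pts[i].y - pts[i - 1].y))
            resampled.append(q)
            pts.insert(q, at: i)   // measure the next interval from q
            accumulated = 0
        } else {
            accumulated += d
        }
        i += 1
    }
    // Floating-point rounding can leave us one point short of n.
    if resampled.count < n, let last = pts.last {
        resampled.append(last)
    }
    return resampled
}
```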

Multi-stroke vs Uni-stroke

N Dollar doesn’t care how many strokes the user makes to create a glyph. Before comparing against templates, all sub-strokes are connected into a single uni-stroke. An illustration on the N Dollar limitations page shows that although the user drew two separate horizontal lines, N Dollar connected them before running its matching algorithm. This turns out to be one of N Dollar’s biggest limitations, but it’s also the key to its speed and multi-stroke support. In my testing, the drawback was outweighed by N Dollar’s other strengths.
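Conceptually, the connecting step is about as simple as it sounds: for a given stroke order, it’s just concatenation, with the jump between strokes becoming an implicit connecting segment (which is exactly where the limitation above comes from). A one-line sketch of the idea, not N Dollar’s actual code:

```swift
import CoreGraphics

// Sketch: connect sub-strokes into a single uni-stroke by concatenating
// them in order. The jump from the end of one stroke to the start of
// the next becomes part of the path being matched.
func combineStrokes(_ strokes: [[CGPoint]]) -> [CGPoint] {
    return strokes.flatMap { $0 }
}
```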

Order and Direction Dependency

N Dollar calculates all the permutations of the generated uni-stroke before matching. An illustration in the research paper shows how a user-drawn X gets represented 8 different ways, regardless of the order in which it was originally drawn. The comparison between the input and a template is actually direction dependent, but N Dollar overcomes this by including every possible direction in its calculation.
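Here’s a naive Swift sketch of that combinatorial step, just to show where the 8 comes from: every ordering of the strokes crossed with every per-stroke direction gives k! × 2^k candidate uni-strokes, which is 8 for a two-stroke X. The real recognizer generates these more cleverly, so treat this purely as illustration.

```swift
import CoreGraphics

// Naive sketch: build every candidate uni-stroke from a set of strokes by
// trying every stroke order and every per-stroke direction.
// For k strokes that's k! * 2^k uni-strokes (2 strokes -> 8).

func orderings<T>(_ items: [T]) -> [[T]] {
    guard items.count > 1 else { return [items] }
    var result: [[T]] = []
    for (index, item) in items.enumerated() {
        var rest = items
        rest.remove(at: index)
        for tail in orderings(rest) {
            result.append([item] + tail)
        }
    }
    return result
}

func allUnistrokes(from strokes: [[CGPoint]]) -> [[CGPoint]] {
    var unistrokes: [[CGPoint]] = []
    for ordering in orderings(strokes) {
        // Each stroke in this ordering can be traversed forwards or backwards.
        for mask in 0..<(1 << ordering.count) {
            var unistroke: [CGPoint] = []
            for (i, stroke) in ordering.enumerated() {
                let reversedStroke = (mask >> i) & 1 == 1
                unistroke.append(contentsOf: reversedStroke ? Array(stroke.reversed()) : stroke)
            }
            unistrokes.append(unistroke)
        }
    }
    return unistrokes
}
```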

There’s a lot more to N Dollar than what I describe above, and its authors are nothing short of brilliant. Playing with the JavaScript implementation, I’m amazed at how quickly and accurately it calculates and compares even the most complex glyphs I throw at it.

Knowledge is Half the Battle

Implementation is the other half. That’s how that goes, right? I’ve started a basic iOS implementation here (the original is in JavaScript), and once I finish, I’ll put together a summary in part 2 of this series. Stay tuned!

Update: I’ve finished the implementation here. Read about it here.

———-

[1] Anthony, L. and Wobbrock, J.O. (2010). A lightweight multistroke recognizer for user interface prototypes. Proceedings of Graphics Interface (GI ’10), Ottawa, Ontario, May 31–June 2, 2010. Toronto, Ontario: Canadian Information Processing Society, pp. 245–252.