This Ain’t No 3-Finger Swipe
Given that touch-based input has become a staple of mobile devices, I was surprised to find a dearth of good, open source complex gesture recognition libraries in the wild.
Well, maybe I should explain what I’m referring to when I say, “complex gesture.” I’m not talking about the swipes and pinches that dominate our current interactions with touch devices. I’m more interested in what I like to call “glyphs” – a series of strokes that form a symbol.
A good example of what I consider a glyph is something like a letter in the alphabet. A ‘Q’ glyph is the circle stroke and the southeast stroke positioned in such a way that it’s recognizable. But, OCR is not quite what I’m looking for because it’s specific to text. The ideal solution should also recognize abstract glyphs — symbols that aren’t hard coded into a lookup dictionary.
Psh, I Can Write That
Like all foolhardy, naive developers, I initially tried to just wing it. Research shmeshmearch. This would be easy — just take user input, break it down into individual strokes and then compare the input to a template for a glyph.
Here are the primary challenges I ran in to while trying to implement this approach:
- Users don’t draw well, especially with their fingers — lots of variance.
- The touch events trigger at a pre-determined interval — fast strokes lead to fewer data points.
- You can create single stroke gestures with multiple strokes and vice versa — i.e. a checkmark can be made with two strokes or one fluid motion.
- There are multiple permutations for a given stroke as shown below, hand drawn by me for your enjoyment.
Just some of the stroke combinations to create a ‘Q’
I spent a few hours hacking away at different solutions to these problems, but ultimately decided this sub-stroke comparison approach wasn’t going to cut it.
Step 1 Should Always Be “Search Google For Some Context”
I found two projects that attempt to approach recognition the same way I had — sub-stroke combinations:
Ultimately, like my approach they were too limiting and didn’t address key issues. Both allowed for abstract gestures, but in testing they didn’t allow for the variance you’d find in real-world user input scenarios.
Then, I stumbled on the 1-Dollar Recognizer. Its unique way of comparing input to a pre-defined template (more on this later) made it very good at error correcting, and it’s order/direction agnostic. But, unfortunately it’s also uni-stroke only. The user would have to make one continuous stroke to ensure recognition. Close, but not quite there.
N Dollar Baby
Fortunately, the same group who developed 1-Dollar extended it to include multi-stroke detection and called it the N-Dollar Recognizer1.
Multistroke? Check. Order and direction agnostic? Check. Error tolerant? Check. Fast and easy to implement? Check. High pun potential? Check. Well, N Dollar certainly has the biggest bang for the buck, but can it put its money where its mouth is?
How The N Dollar Recognizer Works
If you’re interested in the specifics, I recommend reading the N Dollar research paper. I’ll keep it high-level here.
N Dollar, and its predecessor 1 Dollar, both use the nearest neighbor approach to recognition. It compares user input to each of a set of pre-defined templates and gives each a score. The highest score wins if it is within a certain margin of error. This allows sub-strokes in a glyph to be sloppy, as long as the overall glyph is closest to the intended template.
N Dollar creates a vector from all of the input points and interpolates additional points if necessary. Since sub-strokes are not as important in its algorithm, the final glyph just has to be closest to the intended template regardless of the number of data points.
Multi-stroke vs Uni-stroke
N Dollar doesn’t care how many strokes the user inputs to create a glyph. Before comparing against templates, all sub-strokes are connected into a uni-stroke. The illustration to the right (from the N Dollar limitations page) shows that although the user input two horizontal lines, N Dollar connected them before it ran its matching algorithm. This turns out to be one of N Dollar’s biggest limitations, but also the key to its speed and multi-stroke support. In my testing, this drawback was outweighed by N Dollar’s other factors.
Order and Direction Dependency
N Dollar calculates all the permutations of the generated uni-stroke before matching. The illustration below (taken from the research paper) shows how a user-drawn X gets defined 8 different ways regardless of the original input order. The comparison between the input and template is actually direction dependent, but N Dollar overcomes this by including all the possible directions in its calculation.
Knowledge is Half the Battle
Update: I’ve finished the implementation here. Read about it here
1 Anthony, L. and Wobbrock, J.O. (2010). A lightweight multistroke recognizer for user interface prototypes. Proceedings of Graphics Interface (GI ’10). Ottawa, Ontario (May 31-June 2, 2010). Toronto, Ontario: Canadian Information Processing Society, pp. 245-252.