
Sunday, September 5, 2010

Reading #6: Protractor: A Fast and Accurate Gesture Recognizer (Li)


Comment :
Chris Aikens
Summary:

This paper introduces another lightweight single-stroke gesture recognizer, Protractor, which is easy to implement and to incorporate into any application, including on mobile devices, thanks to its simplicity and efficiency. Li, a co-author of the $1 recognizer, developed Protractor based on that earlier work. The two recognizers share the same emphasis on simplicity and are similar in many respects; however, Protractor is much more efficient than the $1 recognizer because it uses a closed-form cosine similarity instead of the distance measure used in the $1 recognizer.

For implementation details: the two recognizers use similar preprocessing steps, except that they compute the rotation angle differently and use different scaling mechanisms (the $1 recognizer rescales each gesture to a fixed square, while Protractor does not rescale).
Next, the biggest difference is how the optimal angular distance is calculated. Protractor stores the resampled stroke points as a single vector and computes the cosine similarity between the template vector and the new stroke vector. Because the indicative angle is only an approximate measure of a gesture's orientation, Protractor further rotates the template gesture by an extra angle so that the two vectors reach their largest similarity. To find this optimal angle, Li writes the similarity between the two vectors as a function of the extra rotation angle θ. To maximize it (equivalently, to minimize the angular distance), we only need the angle θ at which the derivative equals zero, and Li gives the resulting closed-form solution. Computed this way, the cost drops significantly: there is no longer any need for a time-consuming iterative search for the optimal angle.
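The closed-form step is the heart of Protractor, so here is a minimal Python sketch of it as I understand it from the paper. It assumes the gesture and each template have already been preprocessed as described above (resampled to 16 points, translated so the centroid is at (0,0), flattened into an (x1, y1, x2, y2, ...) vector, and normalized to unit length); the function names are mine, not the paper's.

```python
import math

def optimal_cosine_similarity(vt, vg):
    """Maximum cosine similarity between template vector vt and gesture vector vg
    over all extra rotations theta, computed in closed form (no iterative search).
    Both vectors are flat lists [x1, y1, x2, y2, ...] of equal length."""
    a = b = 0.0
    for i in range(0, len(vt), 2):
        a += vt[i] * vg[i] + vt[i + 1] * vg[i + 1]   # dot-product term
        b += vt[i] * vg[i + 1] - vt[i + 1] * vg[i]   # cross-product term
    theta = math.atan2(b, a)                          # optimal extra rotation
    return a * math.cos(theta) + b * math.sin(theta)  # equals sqrt(a*a + b*b)

def recognize(gesture_vec, templates):
    """Return the label of the template with the smallest angular distance."""
    best_label, best_score = None, -1.0
    for label, template_vec in templates:
        sim = optimal_cosine_similarity(template_vec, gesture_vec)
        angular_distance = math.acos(max(-1.0, min(1.0, sim)))
        score = 1.0 / max(angular_distance, 1e-6)     # guard against an exact match
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score
```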

In the evaluation, he compared Protractor with the $1 recognizer in terms of accuracy and efficiency, and found that Protractor outperformed the $1 recognizer in both respects.

The following comparison summarizes the differences between the $1 recognizer and Protractor.

  • Preprocessing
      $1 recognizer: resample (N = 64); rotate by the indicative angle; rescale to a square; translate to (0,0)
      Protractor: resample (N = 16); rotate by the indicative angle, or align to the closest base orientation (least rotation); translate to (0,0); no rescaling
  • Classification
      $1 recognizer: distance measure; find the minimum distance
      Protractor: vector (cosine) similarity measure; find the maximum similarity
  • Accuracy
      $1 recognizer: high on the original test set; lower than Protractor on the larger gesture set (both drop as the set grows)
      Protractor: high on the original test set; better than the $1 recognizer on the larger gesture set (both drop as the set grows)
  • Time cost
      $1 recognizer: MUCH slower than Protractor; recognition time grows quickly as the number of training samples grows
      Protractor: MUCH faster than the $1 recognizer; recognition time grows far more slowly

    Discussion :


    This paper is an improvement on the $1 recognizer work that Li co-authored. Here he uses a different similarity metric that is much faster than the $1 recognizer's. He noticed that the $1 recognizer relies on an iterative computation to find the optimal angle, which is time consuming, and he improves this step by using a vector representation of strokes instead of the pure 2D point representation used in the $1 recognizer. Thanks to the fast computation, it becomes feasible to apply the recognizer to much larger gesture sets. I like the clever idea of finding the optimal angle with a closed-form mathematical solution.

    Reading #5: Gestures without Libraries, Toolkits or Training: A $1 Recognizer for User Interface Prototypes (Wobbrock)

    Comment :
    Sam


    Summary :

    This paper introduces the $1 recognizer developed by Dr. Wobbrock. The $1 recognizer is so simple that it can be integrated into any system without trouble and requires only about a hundred lines of code, which is its major selling point. It handles only single strokes, and new stroke classes can easily be added to the existing template set. It does not rely on complicated algorithms: it uses template matching rather than a trained classifier, so no training is required, and it uses a distance measure to calculate the similarity of two strokes. Because it is so easy to understand and implement, the $1 recognizer can be used even without any prior knowledge of AI or gesture recognition.

    There are four steps to recognition: resample; rotate; scale and translate; and find the optimal angle and the best score. For resampling, he uses a fixed number of points per stroke, spaced so that the distance between neighboring points is equal. For rotation, he uses the "indicative angle", the angle formed between the gesture's centroid and its first point; rotating by this angle removes most of the rotational variance. For the scale-and-translate step, he rescales the stroke to a square of fixed size and then translates the gesture so that its centroid lies at (0,0). Even after these three steps there is no guarantee that two strokes are being compared at the best relative angle, so in the final step he searches for the angle that minimizes the distance between the two strokes. Finally, the minimum distance over all template comparisons is selected and the corresponding class label is returned.
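    Below is a minimal, self-contained Python sketch of these four steps, written from my reading of the paper. The constants (64 resampled points, a 250-unit square, a golden-section search over ±45° with 2° precision) are the defaults I remember from the paper, and the function names are my own, so treat this as an illustration rather than the reference implementation.

```python
import math

N, SQUARE = 64, 250.0                        # resample count, bounding-square size
ANGLE_RANGE, PRECISION = math.radians(45), math.radians(2)
PHI = 0.5 * (-1 + math.sqrt(5))              # golden ratio used by the search

def path_length(pts):
    return sum(math.dist(pts[i - 1], pts[i]) for i in range(1, len(pts)))

def centroid(pts):
    return (sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts))

def rotate_by(pts, angle):
    """Rotate all points around the centroid by the given angle."""
    (cx, cy), c, s = centroid(pts), math.cos(angle), math.sin(angle)
    return [((x - cx) * c - (y - cy) * s + cx, (x - cx) * s + (y - cy) * c + cy)
            for x, y in pts]

def resample(pts, n=N):
    """Step 1: resample the stroke into n equidistantly spaced points."""
    pts, out = list(pts), [pts[0]]
    interval, d, i = path_length(pts) / (n - 1), 0.0, 1
    while i < len(pts):
        seg = math.dist(pts[i - 1], pts[i])
        if seg > 0 and d + seg >= interval:
            t = (interval - d) / seg
            q = (pts[i - 1][0] + t * (pts[i][0] - pts[i - 1][0]),
                 pts[i - 1][1] + t * (pts[i][1] - pts[i - 1][1]))
            out.append(q)
            pts.insert(i, q)                 # q becomes the next segment start
            d = 0.0
        else:
            d += seg
        i += 1
    while len(out) < n:                      # rounding may leave us one short
        out.append(pts[-1])
    return out

def preprocess(pts):
    """Steps 1-3: resample, rotate to the indicative angle, scale, translate."""
    pts = resample(pts)
    cx, cy = centroid(pts)
    pts = rotate_by(pts, -math.atan2(cy - pts[0][1], cx - pts[0][0]))
    xs, ys = [p[0] for p in pts], [p[1] for p in pts]
    w, h = (max(xs) - min(xs)) or 1.0, (max(ys) - min(ys)) or 1.0
    pts = [(x * SQUARE / w, y * SQUARE / h) for x, y in pts]   # scale to square
    cx, cy = centroid(pts)
    return [(x - cx, y - cy) for x, y in pts]                  # centroid -> (0,0)

def path_distance(a, b):
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)

def distance_at_best_angle(gesture, template):
    """Step 4: golden-section search over rotations for the minimum distance."""
    a, b = -ANGLE_RANGE, ANGLE_RANGE
    x1, x2 = PHI * a + (1 - PHI) * b, (1 - PHI) * a + PHI * b
    f1 = path_distance(rotate_by(gesture, x1), template)
    f2 = path_distance(rotate_by(gesture, x2), template)
    while abs(b - a) > PRECISION:
        if f1 < f2:
            b, x2, f2 = x2, x1, f1
            x1 = PHI * a + (1 - PHI) * b
            f1 = path_distance(rotate_by(gesture, x1), template)
        else:
            a, x1, f1 = x1, x2, f2
            x2 = (1 - PHI) * a + PHI * b
            f2 = path_distance(rotate_by(gesture, x2), template)
    return min(f1, f2)
```

    The best match is the template with the smallest distance_at_best_angle; the paper also converts that distance into a 0-to-1 score, which I have left out here.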

    $1 recognizer: http://depts.washington.edu/aimgroup/proj/dollar/
    $N recognizer: http://faculty.washington.edu/wobbrock/pubs/gi-10.2.pdf

    Discussion :

    The $1 recognizer is so simple that it can be integrated into any system, and despite its simplicity its accuracy is very high. However, it does have some drawbacks.
    1. Low efficiency: template matching is time consuming, especially when there are many gesture classes and many sample gestures per class; in that situation the $1 recognizer becomes impractical. No training is required, but the recognition step itself costs too much time.
    2. It handles only single strokes. There are ways to extend the $1 recognizer to multi-stroke gestures, but they require some tricks, because with template matching the number of sample templates grows exponentially with the number of strokes per gesture, as the note below illustrates.
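    To make that growth concrete: if a symbol is drawn with n strokes, each stroke can be drawn in either direction and the strokes can come in any order, so one multi-stroke class corresponds to

$$ n! \cdot 2^{n} $$

    unistroke permutations (the combinatorial argument later used by the $N recognizer); for n = 4 that is already 384 templates per class.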

    Reading #4: Sketchpad: A Man-Machine Graphical Communication System (Sutherland)

    Comment :
    Hong-Hoe (Ayden) Kim
    Summary :
    This paper introduces the first pen-based input system, Sketchpad, which dates from the early 1960s and was the seminal work of Ivan Sutherland, who later received the Turing Award (1988) in part for this system. He starts by illustrating how to draw a hexagonal lattice (Figure 1) using Sketchpad.


    He first created a six-sided figure and a circle, then applied operations such as "move" and "delete" to get a hexagon, which he saved as a subpicture; the hexagonal lattice consists of a large number of copies of that same hexagon. Thus, once the hexagon is created, the system can easily generate the whole lattice. The author also says that handling this kind of repetitive work was the most important feature of the Sketchpad system. Along with this example, several capabilities of the system are introduced: subpictures, constraints, and definition copying.
    In the remaining part of the paper, the author presents the design details of the system, more specifically:
    1. The ring structure and its basic operations, including insert, delete, copy, etc.
    2. The structure of the Sketchpad system, which resembles a typical object-oriented design.
    3. The light pen.
    4. The display system: how to magnify a picture and how to display lines, circles, digits, text, etc.
    5. How recursive functions are used to manipulate drawings (recursive deleting, recursive merging, and recursive display), as well as how to copy drawings.
    6. Constraint satisfaction, which can be switched on or off and which, most importantly, distinguishes a Sketchpad drawing from a traditional drawing. With this feature the computer can understand the user's intention, so even if the user makes mistakes while drawing, the computer can correct and beautify the drawing once the constraints are set.

    (The following demo videos of the Sketchpad system show how it worked.)

    1. Brief overview of SketchPad System

    2. SketchPad Demo Part(1/2)

    3. SketchPad Demo Part(2/2)


    Discussion :

    What an excellent system to have been built almost 50 years ago! There is no doubt that this work greatly influenced the later HCI field and thousands of researchers. Today, pen-based devices are becoming more and more popular and easier for people to use. Compared with today's systems, Sketchpad is little more than a drawing pad: people had to use buttons and knobs to control the drawing, which seems inefficient and too complicated to use. Even so, it is a milestone on the way to today's HCI, and in this work we can see how revolutionary new technology develops.

    Reading #3: “Those Look Similar!” Issues in Automating Gesture Design Advice (Long)

    Comment:

    Chris Aikens

    Summary :

    This paper introduces quill, an interface design tool that uses unsolicited advice to help designers create gestures. The authors note that it is difficult to design gestures that the computer will recognize well, because different gestures can be too similar to one another. To handle this problem, they developed quill to analyze the gestures, find similar ones, warn designers with messages, and tell them how to fix the gestures. Rubine's training algorithm was used to train quill, with ten to fifteen examples drawn for each type of gesture.

    In the remaining parts of the paper, the authors discuss the challenges of designing such a system. When giving advice to designers there are considerations such as when to show advice, how much advice to display, and what the advice should say. They analyze designers' common habits and give detailed guidance on how to design the system so that it is friendlier for designers to use. They also examine long-running operations and offer several options for executing them without confusing designers.

    Discussion:

    The idea presented in this paper is quite fresh to me. Despite the lack of implementation details, it clearly shows what the system does, how it works, and how to make it friendlier for designers. What impressed me most is the overview of designing a feedback system and of how to present feedback to users, which seems very challenging; these concepts could be incorporated into our own systems in the future.

    Saturday, September 4, 2010

    Reading #2: Specifying Gestures by Example (Rubine)

    Comment :

    Francisco

    Summary :

    In this paper, Dean Rubine introduces his GRANDMA toolkit and describes his single-stroke gesture recognizer, which uses a linear classifier to recognize a new single stroke.

    He describes the 13 features used to train the recognizer, some of which are still widely used today. For the machine learning algorithm he uses a linear classifier, which is simple and easy to implement; training amounts to calculating a weight for each feature for each class label. In the evaluation he shows that the recognizer achieves high accuracy, and he also makes the important point that about 15 training examples per class are enough. Another contribution of the paper is the rejection method he describes for ambiguous cases.
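    To make the classification step concrete, here is a minimal Python sketch of how a Rubine-style recognizer uses those trained weights at recognition time. The training itself (Rubine derives the weights from per-class feature means and a shared covariance matrix) and the rejection test are omitted, and the function name is mine.

```python
from typing import Dict, List, Tuple

def classify(features: List[float],
             weights: Dict[str, List[float]]) -> Tuple[str, float]:
    """Evaluate per-class linear discriminants and return the best class.

    weights[c] is [w_c0, w_c1, ..., w_cF]: a bias term followed by one weight
    per feature. The score for class c is w_c0 + sum_i(w_ci * f_i), and the
    class with the largest score wins.
    """
    best_label, best_score = None, float("-inf")
    for label, w in weights.items():
        score = w[0] + sum(wi * fi for wi, fi in zip(w[1:], features))
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score
```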

    Discussion :

    This paper plays a very important role in the sketch recognition field, and it conveys the basic idea of building a real-time recognizer. Even though the recognizer handles only single strokes, the work can be extended to multi-stroke recognition, which is a very active field today. In the real world most sketches are multi-stroke; we cannot force users to draw diagrams or write characters with a single stroke. As extensions, Rubine briefly introduces two important ideas: eager recognition and multi-finger recognition. For eager recognition, he says the system can recognize a gesture while the user is still making it. This idea is very important for building real-time systems, but such a system must be able to give feedback and continually check and revise its recognition result, using context information to maximize the overall confidence. I am quite interested in multi-finger recognition, which is now a signature feature of Apple products, and I am surprised that the idea actually dates back decades. Perhaps in the near future we will not need a keyboard or a mouse; our fingers and voices will be enough to fully control computers and other electronic devices.

    by Li 09/04/2010

    Wednesday, September 1, 2010

    Reading #1: Gesture Recognition (Hammond)

    Commented on :
    Sam

    Summary :


    This survey chapter gives a brief overview of what gesture recognition is and how to write a simple gesture recognizer, introducing the Rubine, Long, and $1 recognizers. Although these recognizers are simple and easy to implement, the chapter clearly conveys the basic ideas of sketch recognition, which remain the foundation for implementing much more complicated systems.


    Of these three recognizers, the first two use a trained classifier to recognize gestures, while the $1 recognizer uses template matching. Rubine proposed 13 features that are still widely used today; extending that feature set, Long proposed 22 features that slightly improve the accuracy. After calculating all the features for a given stroke, both use a linear classifier to recognize it: the major part of training is to calculate a weight w_ci for each feature i of each class label c, and classification takes the linear combination of those weights with the feature values and picks the label c that maximizes the score.
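    For reference, the per-class score described above can be written as follows (this is the discriminant as given in Rubine's paper, with F the number of features, f_i the feature values of the stroke, and w_ci the trained weights):

$$ v_c \;=\; w_{c0} + \sum_{i=1}^{F} w_{ci}\, f_i, \qquad \hat{c} \;=\; \arg\max_{c} v_c $$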

    The $1 recognizer, by contrast, uses template matching: it calculates the distance between the new gesture and every sample gesture stored in its template set, and the closest pair determines the result. It handles rotation by using the indicative angle. The $1 recognizer is so simple that it can be embedded easily into any application, which is its main advantage.


    Discussion :

    1. The Rubine recognizer has a good recognition rate, and Long extends the feature set to 22 features. However, since most of those features are derived from the Rubine features, I still wonder whether they really improve the accuracy. More features also mean more computation, and accuracy does not necessarily increase with the number of features; what we should do is find the features that best distinguish the different symbol classes.

    2. It is true that template matching has very high accuracy, but its most important problem is computation cost. For the $1 recognizer, therefore, the number of symbol classes should be small and the number of sample gestures per class limited. Obviously the $1 recognizer cannot scale to a large system, but the idea is excellent. I would prefer to first use a linear or non-linear classifier to get a top-N list and then run template matching only on those candidates, which might be fast and achieve higher accuracy (see the sketch after this list). Also, the $1 recognizer preprocesses each stroke by rescaling it to a fixed width and height; we might instead fix only one dimension and keep the aspect ratio unchanged.

    3. These three recognizers can only be applied to unistrokes. If we write a symbol with multiple strokes, it becomes impossible to recognize. In particular, if we want to use template matching for multi-strokes, the time complexity becomes exponential in the number of strokes.

    4. What do you think about keeping the same feature set but trying different machine learning algorithms?
    For example, using the Rubine features, we could choose any algorithm among a linear classifier, ANN, HMM, decision trees, SVM, boosting, etc. We might try to guess what results we would get. Is the choice of algorithm strongly related to the recognition accuracy?
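    Here is a rough Python sketch of the two-stage idea from point 2 above. It is only my own illustration of that suggestion, not something from the readings: the linear scoring mimics a Rubine-style classifier, and the caller supplies a $1-style matcher (for example, a function built from preprocess and distance_at_best_angle in the earlier sketch) as the template_distance argument.

```python
def two_stage_recognize(stroke, features, weights, templates,
                        template_distance, top_n=3):
    """Stage 1: shortlist classes with a cheap linear classifier.
    Stage 2: run expensive template matching only on the shortlisted classes.

    weights[c]  = [w_c0, w_c1, ...] per-class linear weights (bias first)
    templates   = list of (label, template) pairs for the matcher
    template_distance(stroke, template) -> smaller is better
    """
    scores = {c: w[0] + sum(wi * fi for wi, fi in zip(w[1:], features))
              for c, w in weights.items()}
    shortlist = set(sorted(scores, key=scores.get, reverse=True)[:top_n])

    best_label, best_dist = None, float("inf")
    for label, template in templates:
        if label not in shortlist:
            continue                      # skip classes the classifier ruled out
        d = template_distance(stroke, template)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label
```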

    My First Post



    1. Photo


    2. Email Address : nadalwz1115@hotmail.com or liwenzhe@tamu.edu
    3. Graduate standing: 2nd year MS
    4. Why are you taking this class?
    Handwriting is a very exciting area for me, and it is related to my thesis.
    5. What experience do you bring to this class?
    Interest in, and some knowledge of, sketch recognition and AI.
    6. What do you expect to be doing in 10 years?
    Being an expert in computer technology. I am also quite interested in economics, so I will try to figure out what kinds of computer skills I can bring into that field.
    7. What do you think will be the next biggest technological advancement in computer science?
    AI will make computers more intelligent, though I still believe it cannot replace the human brain.
    HCI will make computers much easier to use; handwriting and speech are the main areas that will make this come true. Future computers will be smart enough to talk with people and play with children, and portable devices will have the full functionality of today's computers.
    8. What was your favorite course when you were an undergraduate (computer science or otherwise)?
    OS, Data structure and Algorithms, etc.
    9. What is your favorite movie and why?
    Favorite movie? Hmm... "Avatar", the first 3D movie I watched.
    10. If you could travel back in time, who would you like to meet and why?
    Robin Li; I would choose to be a classmate of his.
    11. Give some interesting fact about yourself.
    I really like sports: football, basketball, tennis, swimming, ping-pong, skating, running...