liwenzhe: November 2010

Tuesday, November 23, 2010

Reading#18 Spatial Recognition and Grouping of Text and Graphics

Comment:
Chris

Summary:

In this paper, the author presents a framework for simultaneous grouping and recognition of shapes and symbols in free-form ink diagram. Their approach is completely spatial, and does not require any ordering on the strokes. Their implementation can be summarized as following.

1) Building building neighborhood graph. For graph (V,E), vertices are the individual stroks, and if two stroke are connected, then there will be edge between them. In order to determine if two strokes are connected, they use proximity distance measurement, they use threshold to control it. If distance between two stroks is less than the threshold, two vertices are connected.

2) The whole graph is divided into seperate subgraphs and each subgraph is recognized by recognizer. They will search through all these groupings and find optimal one. This is combinatorial problem which is computationaly very expensive. In this paper, they use some optimization techniques to decrease the complexity very much. There are two techniques used, the first one is they construct a neighborhood graph in which vertices that are colse to each other, the second one is to restrict the sizeof each subset to be less than a constant K. For searching, they used two approahces, dynamic programming and A* search. In this paper they mentioned the A* search approach.

3) The remaining thing is to build a recognizer, which is built using boosted decision trees. For avoiding overfitting, they used depth 3 decision trees.

Sunday, November 21, 2010

Reading#15 An image-based, trainable symbol recognizer for hand-drawn sketches

Comment
Chris

Summary:

This papers deals with recognition problem using purely off-line approach, which consider each sketch or symbol as sets of pixels. The authors combined four different classifiers which are based on template matching. Most importantly, the author proposed one beautiful way of tacking orient invariant by transforming screen coordinate into polar coordinate. All the example templates are stored in the database, and each input symbol is matched with templates and return the top N best lists.
The four template matching metrics are Hausdorff distance, Modified Hausdoff distance, Tanimoto coefficient and Yule coefficient. The first two measure the dissimilarity which based on distance between two symbols, and the remaining two measure the similarity which based on the number of black pixels, number of white pixels and number of overlapping black or white pixels. After getting these four classifiers, they are normalized into [0,1] range and then combined for recognition.
The most important contribution of this paper is they proposed very smart way to tackle with orientation invariant. Because calculating the rotation angle in screen coordinate is computationally very expensive, they choose to first transform the symbol image into polar coordinates and mapped into [-pi, pi] range. In polar coordinate they can easily calculate the rotation angle and again transformed into screen coordinate to continue their recognition. Besides calculating rotation angle using polar coordinate, another one important usage of this coordinate is to prune template examples before getting into recognition step to decrease the time complexity. They reported that this method can eliminate about 90% template examples before applying classifiers.
The overall result seems very promising, in most of cases, they get more than 90% accuracy for top 1 returned result. And the users studies are based on graph symbols and digit symbols.

Discussion:

Another nice paper, purely off-line approach paper. There are several contribution of this paper. 1) Multi-classifier combination. 2) different distance matching approaches 3) Handling rotation using polar coordinate. 4) Decrease the impact of nearest points of centroid when transforming into polar coordinate. Paper very clearly shows the idea, and nicely organized. However, vision based recognition has difficulty of recognizing similar shapes, as mentioned by author. the future research might be combined vision-based approach with on-line stroke information.

Friday, November 19, 2010

Reading#17 Distinguishing Text From graphics in On-line Handwritten Ink

Comment:
Chris
Summary:

Another paper for shape vs text. But different from previous text vs shape papers we've read. The most important difference is that in this paper it actually utilize the context information. And finally they build HMM model for the sequence of all strokes. Besides individual stroke features, they use temporal information and gap information for sequence of strokes.

1. Independant stroke model. For each stroke, eleven-valued features are extracted. The process needs training samples. And for the testing phase, for given stroke s, they actually calcualte the probability of being text, can be described as P(TextStroke s). So, given input stroke, we can simply calcualte the probability by using trained model. Independant stroke model is only for each stroke, and in some case, it results in much error, so author proposed another improvement for this.

2. Hidden Markov model. The author also observe that probability of states transformation provides valuable information, which served as context information for given stroke sequence. In this case, there are two states, text and shape. And there are four transformation for these two states. Text to text, text to shape, shape to shape and shape to text. For the shape vs text problem, the problems is actually to assign state to each stroke. Thus, the problem can be easily modeld as Hidden markov process. The hidden layers obviously are the two states. and observation layers are the features we acutally calculate. So the problem changes to make the following formula have the maximum value.

However, P(XT) cannot be directly calculated from the models. So they used baysian rule to calculate it.

3. Bi-partite HMM In this step, they used the gap information between two strokes. In order to characterize the gap, they choose 5 features, and the training and mode process is similar as the first step for individual stroke model. After that, we simply incorporated this information into the HMM we got from step 2. The incorporation is very straighforward.
The result shows that their approach gains very good accuracy for shape vs text, especially the step 2 promote the accuracy very much although the step 3 does not impact the accuracy rate very much.

Disccussion:

What a great paper!!! i really like it and gives me so much valuable information!! First, it consider the context information for shape vs text task. this is greate improvement over the traditional method that only based on individual stroke features. Second, they build HMM model for the whole stroke sequence, which is purely probability approach and gives pretty good result. Third, it provides an easy way to add other context information into the HMM that already built. Besides, gap context information,we can futhur incoporate other context information very easily. The beautiful idea of this paper can be very useful for other research. Only one disadvantage that might have is people will not draw the text in natural order. Because HMM greatly depends on the temporal order, once people violate this order, the method might tends to get lower accuracy.

Sunday, November 14, 2010

Reading#16 An Efficient Graph-Based Symbol Recognizer

Comment:
Chris

Summary:
This paper is completely different from what we read before. The paper deals with graph-based recognzier instead of using features to create recognizer. They represent each symbol as ARG(Attributed relational graph), each node is for primitive, each edge is for relationship between two primitive. Each node has two properties, primitive type and raletive length. Each edge has three properties, number of intersections, angle, position information, respectively. After convering each symbol into ARG, the remaining task is maching this ARG with best sample symbol. There are two steps. 1) Graph maching. Here, we find the each corresponding node between two graphs. This is well-known NP -complete problem. In this paper, author provides 4 different ways to approximate the best match. 2) Measureing maching score. author defines six different maching score metric in this paper as well as its weight value, which get from empirical study. System go through each sample and simply returns the best N-list. Result shows that about 93% of chance that the top 1 result is correctly returned.

Comments:
This paper shows another important area of pattern recognition. For graph-based approach there is one important issue we have to consider, which requires high computational cost, that is , graph matching or graph isomorphism, which remains NP-complete problem. But graph based approach has very good advantages while comparing to statistical or syntatic approach. The system does not depend on the drawn orientation and order, or scale(if use relative value), and can represent components' relationship and topology very accurately. The structural method is very appealing for me, and personaly perfer this structural approach to statistical method. Some papers also use structural way to recognize handwritting characters which seems harder than the work presented in this paper.
In all, this is pretty nice paper, and gives me very much information about modeling problem using structural way and use graph theory to solve it.

Reading #14Using Entropy to Distinguish Shape Versus Text in Hand-Drawn Diagrams (Bhat)

Comment:
Chris

Summary:
This is an another paper dealing with shape versus text. But it uses completely different method comparing to Reading#13. This paper only use entropy information to classify shape or text. In reading#13, the author use decision tree to find the most disdinguishable features and use these features to classify shape or text. However, in this paper, author use entropy measure, which represents uncertainty measurement, to classify shape or text. Paper shows that text is more randomly structured, which means has larger uncertainty rate than shape. Using this intution they create a code mapping model. They use angle between two consecutive points to represents the whole stroke, and these angles are put into seven different angle beams. Thus, after this processing, each stroke can be represented as sequence of characters -string. The next is to compute entropy value for this string, and they use this value to classify stroke as shape or text. For grouping strokes step, they simply use temporal information to group strokes. The result system shows good accuracy rate for shape vs text .

Discussion :
Smart intution & idea!! Idea is great, and very benificial for other applications. However.... I am wondering if it gets high accuracy in most of cases. This algorithms greatly depends on how the shape drawn, if the shape itself is very complex, and drawn cursively, the algorithms cannot classify it. And the grouping approach is not good.
The most ideal way to accomplish this task is , in my option, utilizing context information. Without the aid of context information, any recognition system cannot gain acceptable recognition rate.
Anyway, I have to say the idea is good, it is definitely a nice paper.

Total Pageviews