This tutorial requires decent math skills and a good understanding of programming concepts. You need to know about vectors, linear interpolation, and how to work with data. This is not recommended for beginners!
I created a number recognition system in Petit Computer for my Sudoku program, and although I promised I would release a stand-alone version of the recognition system, I never did. Here's the next best thing: a (hopefully) descriptive rundown of the system itself. Here we go!
Basic Idea[]
In a number recognition system, you want the computer to convert a hand-drawn number or symbol into something the computer can understand, like the literal number "4". The computer isn't able to do this unless it has some previous data to compare against. The computer can't magically know that a bunch of lines that somebody drew just happens to be a 4; it'll have to look at previously drawn numbers and say "Oh, this bunch of lines looks like another bunch of lines which you previously told me was a 4, so I think this new bunch of lines is also a 4". In a sense, this is a standard process of learning: using previous knowledge to identify similar patterns.
In the broadest sense, there's only three steps to the recognition system:
- Save original drawings as data and pair them with the actual number it represents. This is "training" the computer.
- Compare a new drawing against original drawings and assign a "closeness" score for each one
- The best fit number is the one with the lowest score (i.e. the closest drawing)
The first and last points are pretty straightforward, but the second point is the complicated one (and the most important). Let's break down how to assign a closeness score:
- Break drawing down into a set number of line segments
- Compare line segments between a saved drawing and the new drawing
- Use vector math to determine the "difference" between each line segment and add them all up. This is the closeness score
What kind of vector math, you ask?
- Calculate the vector difference between the two line segments (one from new drawing, one from old)
- Calculate the length of this vector
- Square this length
That's all there is to it! The core of the whole recognition system is just subtracting lines and adding lengths. All the fluff around it is what makes it complicated; the main part is easy.
Exact Steps[]
Now that we have a basic idea of what's going on, let's see the exact steps:
Training:
- Save reference drawings as sequence of points.
- Normalize amount of points in reference drawings to a standard amount (say 10 points)
- Scale reference drawing points to fill a standard size (say 100 X 100).
- The computer is now "trained"
Recognition:
- Save sequence of points in new drawing
- Normalize amount of points in new drawing to the standard amount
- Scale points in new drawing to the standard size
- Compare new drawing against each reference drawing and score the closeness. For each reference drawing:
- Use the vectors which are created by travelling from one point to the next
- Compute the vector difference between each vector from the new drawing and the given reference drawing
- Compute the length of each of these vector differences
- Individually square each vector length
- Add all squared vector lengths together. This is the closeness score
- Recognized number is the one with the LOWEST score (the one that is the closest).
How To Save A Drawing As Data[]
First, we need to store a drawing as data. Since Petit Computer (and most other touchscreen systems) allow you to query the X and Y position that is currently being touched, we can simply save a drawn number as the series of points which make up the number. This means we can't use an image; we'll have to take the data directly from the touchscreen while the user is drawing it. For every frame the touchscreen is being used, we're going to store the X and Y position. If we took a whole second to draw a number, we'd have 60 points of data (because 60 frames per second). It's up to you to handle the minutia; all we really need are the points which make up the number. Note that this will be the same for storing the data for the "original" drawings as well as storing the (temporary) data for the new drawings we want to recognize. What you'll want to do is ask the user to draw the numbers 0-9 and save the points for each drawing along with the actual number it represents. It doesn't matter how you do this, as long as you have something like:
1 - (X1,Y1) (X2,Y2) (X3,Y3) etc. 2 - (X1,Y1) (X2,Y2) (X3,Y3) etc. 3 - etc.
We're going to call these original "training" drawings the "reference" points.
How To Turn Drawing Into N Line Segments[]
We could technically attempt to recognize numbers with only the points. We could say "How close are the points in a new drawing to the points in all the old drawings" and leave it at that. The problem is that this is a terrible recognizer. What if the user drew a number in a different spot on the screen? What if they drew it a different size? What if they drew it fast this time and it doesn't have the same number of points? There's a whole bunch of things which make comparing just the points a bad thing. Instead, what we're going to do is convert a sequence of points into a set number of line segments. I think for Sudoku I only use 5 line segments, so even if your number has 60 lines (61 points), I convert it to 5 (6 points). It's important to realize that even though our drawing data is a sequence of points, we can use these points to determine the lines which connect the points. But how do we get from more lines to less? Let's start with an easy example: I want to convert 10 points down to 5. This should be as easy as taking every other point, which would be point 1, 3, 5, 7, 9. I'm still working with points instead of lines because this is how the data is stored. But what if we want to convert 10 points down to 9? We don't want to just throw away some random point, because that would deform our number. Instead, we're going to use linear interpolation to determine a new set of points based on the midway points on the "contiguous" line which represents the whole drawing. This sounds complicated, but it's a lot like organizing chairs. If we originally had 10 chairs in a row all equally spaced from each other but then we remove one, we have to shift the remaining chairs over a bit in order for them to be equally spaced again. How does this work? We're basically going to move along the drawing a certain amount and pick our new points. The amount we move along the drawing's contiguous line is determined by how many points we want to end up with. If we want to end up with 5 points, we'll want to move through the contiguous line 1/5 at a time. If the line is originally 10 points, we can move along the line at a pace of 10/5 of a point. Oh hey, this is 2. If we move along this line of 10 points at a pace of 2 points at a time, it's like selecting every other point, and we still end up with 5 points. If we want to take 3 points from 10, we'd move 10/3 of a point. This is where linear interpolation comes into play: the second extracted point ends up being between the 3rd and 4th point (3.3333), so we'll use linear interpolation to get the point which is 0.3333 between the 3rd and 4th point.
HOWEVER! These are numbers we're talking about, and the starting and ending points are REALLY important. As a result, we don't want to shift the starting point nor the ending point, which changes how we determine where to place new points. Remember that we keep the same starting point though, so we really only need to worry about the ending point being the same. This is as easy as subtracting one from both the original number of points and the desired number of points so it's like we're not including the last point. So instead of saying we're going from 10 to 9 points, we say we're going from 9 to 8 points and just tack on the original last point.
Here's some pseudo code (lerp is linear interpolation):
double position = 0 //This is used to travel along the line int currentPoint = 0 //This is the current point we're assigning while currentPoint < newPointCount { point1 = floor(position) //The point "behind" the position point2 = ceiling(position) //The point "in front of" the position portion = decimal(position) //The decimal portion of the position, used in lerp //The most important part is right here: this assigns points based on the linear //interpolation of the closest points while travelling along the continuous line //of the drawing newPointsX[currentPoint] = Lerp(oldPointsX[point1], oldPointsX[point2], portion) newPointsY[currentPoint] = Lerp(oldPointsY[point1], oldPointsY[point2], portion) //Move along the line (notice the minus 1 described in the previous paragraph) position += (originalPointCount - 1) / (newPointCount - 1) currentPoint++; } //Assign last point directly since it is unchanged newPointsX[newPointCount - 1] = oldPointsX[oldPointCount - 1] newPointsY[newPointCount - 1] = oldPointsY[oldPointCount - 1]
Remember, we're doing this so that there is an equal amount of data to compare between the original drawings and a new drawing we want to recognize. Thus we should apply this "point reduction" to all drawings we encounter so that they're all "normalized".
Scaling[]
Converting to a set number of lines helps with a lot of recognition problems, but not all. One of the major things we still need to deal with is recognizing a number even if it's smaller or larger than the original stored drawings. We'll want to scale the points so that the new drawing takes up the same dimensions as the reference drawings. The steps are:
- Choose the desired dimensions. I scaled everything to fill a 100X100 square in Sudoku
- Determine the min and max values for the X and Y points in the drawing. The dimensions are simply Max - Min for X and Y.
- Determine the scaling factor for X and Y. This is simply the desired dimensions (100 in X, 100 in Y) divided by the real dimensions. For instance, let's say we had a drawing that was 20 X 25. The scaling factor in X would be 5 (100 / 20), and the scaling factor in Y would be 4 (100 / 25)
- Multiply each point's dimension by that dimension's scaling factor.
There we go: now we have a drawing that takes up the same amount of space as all the other drawings (assuming we apply the scaling to all drawings, including reference drawings and new ones). This means that even if someone drew something small, it'll scale it up to the appropriate size. This makes the vector math (the next part) much more accurate.
Comparing Line Segments (Vector Difference)[]
Now that we've gotten all the gross stuff out of the way, it's time to just compare numbers based on the vectors which they are composed of. If you standardized the points to just 6, then you'll have 5 vectors which you can compare (1-2, 2-3, 3-4, 4-5, 5-6). All we're going to do is go through each vector in the new drawing and subtract it from the vector in a reference drawing, then add the squared length of the difference vector to the score. We square the lengths so that vectors which are substantially different make even more of an impact on the score. Here's some more pseudo code:
int differenceScore = 0 for int i = 0 to numVectors - 1 { vectorDiff = newVectors[i] - referenceVectors[i] //Get the vector difference length = Length(vectorDiff) //Compute the vector length differenceScore += length ^ 2 //Add the length squared to the score } return differenceScore
At the end of this segment of code, differenceScore should contain a number which accurately represents how close a newly drawn number is to a given reference number. You'll want to do this for all the reference numbers to produce a score for each, then chose the score with the lowest value. Remember, if a new number is drawn EXACTLY like a reference number, the vector differences will be the 0 vector, which means the lengths will be 0 and thus the score 0. If a number is very close, the difference vectors will be small and the score will be small. If a number doesn't match the reference at all, the vector differences will be large and thus the squared length will be even larger.
Oh hey, now that we have the reference drawing with the lowest score, that's it for recognition! You can also only accept scores within a certain threshold, so if a newly drawn number looks too much like 2 numbers, you can reject it.
I don't know if I'm going to add any more to this tutorial, but if you have specific questions, please let me know. I can't really teach you all the math and programming concepts required to fully understand and implement this, but if you're just confused about some particular points, I can totally try to help!