This morning I added the images gathered during the night and the previous day, and tested the results. A typical score was a 100% recognition rate with zero edit distance. But because k-fold cross-validation is not very reliable with a small number of samples, I ran the test a few times until I got this result (table edited for readability):
Fold   Avg. edit distance   Top-1 recog. (%)   Top-5 recog. (%)
1      0.000                100.0              100.0
2      0.000                100.0              100.0
3      0.000                100.0              100.0
4      0.250                75.0               75.0
5      0.000                100.0              100.0
6      0.000                100.0              100.0
7      0.000                100.0              100.0
8      0.000                100.0              100.0
9      0.000                100.0              100.0
10     0.000                100.0              100.0
---------------------------------------------------------------
Mean   0.025                97.5               97.5
* Avg. edit distance = average edit distance between the recognized and the true label
* Top-X recog. = percentage of samples for which the correct label was among the first X estimates
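The two metrics in the table can be sketched in Python as follows. This is a minimal illustration, not the recognizer's actual evaluation code: it assumes each sample is a single digit image with a one-character string label, and that the recognizer returns a ranked list of candidate labels, best first. The edit distance is the standard Levenshtein distance; `edit_distance` and `fold_scores` are hypothetical names.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions and substitutions that turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def fold_scores(truths, ranked_guesses, k=5):
    """Return (average edit distance, top-1 %, top-k %) for one fold.

    truths: true label of each sample, e.g. '5'
    ranked_guesses: per sample, the recognizer's candidate labels, best first
    """
    n = len(truths)
    pairs = list(zip(truths, ranked_guesses))
    avg_dist = sum(edit_distance(t, g[0]) for t, g in pairs) / n
    top1 = 100.0 * sum(t == g[0] for t, g in pairs) / n
    topk = 100.0 * sum(t in g[:k] for t, g in pairs) / n
    return avg_dist, top1, topk
```

For a fold of four samples in which one '5' is read as '3' (with '5' not among the top five candidates), `fold_scores` returns `(0.25, 75.0, 75.0)`, the same numbers as fold 4; averaging the ten per-fold rows gives the summary line.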
Only one fold scored 75%, and that is the result of a single mismatch (each fold holds four samples). This time the digit could in principle have been recognized, because examples of it were in the training set. The digit on the right shows the one that was recognized incorrectly. Because this five is in the middle of rotating, it touches the upper edge of the dial window. With no black space between that edge and the digit, the edge is selected as part of the digit, so the digit is cropped with the edge attached. To the system, this apparently looks more like a three than a five.
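The cropping problem can be illustrated with a connected-component crop. The post doesn't describe how the digit is segmented, so this is only a sketch under the assumption that the digit is taken as the 4-connected region of bright pixels around a seed point in a binary image and cropped to that region's bounding box. `component_bbox` and the tiny test images are hypothetical.

```python
from collections import deque


def component_bbox(img, seed):
    """Bounding box (top, left, bottom, right) of the 4-connected
    foreground component containing `seed` in a binary image
    (list of lists of 0/1). Assumes the seed pixel is foreground."""
    rows, cols = len(img), len(img[0])
    sr, sc = seed
    seen = {seed}
    queue = deque([seed])
    top, left, bottom, right = sr, sc, sr, sc
    while queue:  # breadth-first flood fill
        r, c = queue.popleft()
        top, bottom = min(top, r), max(bottom, r)
        left, right = min(left, c), max(right, c)
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols and img[nr][nc] and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append((nr, nc))
    return top, left, bottom, right


# Hypothetical image: row 0 is the bright upper edge of the dial window,
# the digit occupies rows 2-4.
separated = [
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0],  # black gap between edge and digit
    [0, 0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
]
touching = [row[:] for row in separated]
touching[1][2] = 1  # the rotating digit now touches the edge
```

With the black gap, `component_bbox(separated, (3, 2))` covers only the digit, rows 2 to 4. Once a single pixel bridges the digit and the edge, the flood fill spreads along the whole edge and the box grows to the full width of the window, which is the kind of crop described above.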
After running more tests, it seems that the digit '1' is sometimes recognized as a '9'. Both errors suggest that the recognizer doesn't have enough data yet (most of the time the tests do give a 100% recognition score). This might be improved by examining the features more closely and tuning the parameters, but more data should fix the problem too.
The annoying thing is that the mismatches always occur on the least significant digit, which is the digit we know the least about, so we cannot estimate it any other way. The likely cause is that there are more examples of the more significant digits (for example, the leftmost three digits are the same in every sample). A digit should look the same regardless of its position, but lens effects and different lighting introduce small differences. This is another reason I think more data will solve the problem: there are simply not enough examples of the least significant digit to recognize it confidently.
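The imbalance shows up as soon as you count how many training examples each digit value receives when every position of every reading becomes one sample. The readings below are made-up five-digit values whose first three digits never change, mirroring the situation described above; `digit_counts` is a hypothetical helper.

```python
from collections import Counter


def digit_counts(readings):
    """Count the training examples per digit value, treating every
    position of every meter reading as one digit image."""
    return Counter(ch for reading in readings for ch in reading)


# Hypothetical readings: the three leading digits never change.
readings = ["01237", "01238", "01239", "01240"]
counts = digit_counts(readings)
# Least significant position only: each value shows up once here.
last_digit_counts = Counter(reading[-1] for reading in readings)
```

Here '0', '1' and '2' get four or five examples each from the fixed leading digits, while '7', '8' and '9' appear only once, in the least significant position.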
So, gathering more data it is! Meanwhile, I can start looking into using the recognizer to automate the database insertions, so I don't have to do them by hand every day anymore :)
edit: I just remembered that I can work around the ‘not enough data’ problem during testing by using leave-one-out cross-validation, which trains on all samples but one in each run. This gives a recognition score of 100% on 40 samples. Looks like we’re good to go ^_^
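Leave-one-out cross-validation holds out each sample once and trains on all the others, so with 40 samples every run trains on 39 images instead of the 36 used by each of the ten folds in the table. A minimal sketch follows; the 1-nearest-neighbour classifier and the two-number feature vectors are stand-ins assumed for illustration, not the recognizer or features used here.

```python
def nearest_neighbour(train, x):
    """Label of the training example closest to x (squared Euclidean distance).
    Stand-in classifier for illustration only."""
    best_label, best_dist = None, float("inf")
    for features, label in train:
        d = sum((a - b) ** 2 for a, b in zip(features, x))
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label


def leave_one_out_accuracy(samples, classify=nearest_neighbour):
    """Hold out each sample once, train on the remaining n - 1,
    and return the percentage of held-out samples classified correctly."""
    correct = 0
    for i, (features, label) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]  # everything except sample i
        if classify(train, features) == label:
            correct += 1
    return 100.0 * correct / len(samples)


# Hypothetical (features, label) pairs: two tight clusters.
samples = [((0.0, 0.0), '1'), ((0.1, 0.0), '1'),
           ((5.0, 5.0), '9'), ((5.1, 5.0), '9')]
```

Every held-out point here is closest to its same-label partner, so `leave_one_out_accuracy(samples)` returns `100.0`. The cost is one training run per sample, which is cheap at 40 samples but grows linearly with the data set.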
