Has anyone here ever written one? I'm working on one now but it uses quite a lot of memory. I'm not sure whether to use the same weights for every neuron in a given layer, or to use different weights for each neuron: currently I'm doing the latter but I'm wondering if it's necessary, because it means using a ton of memory (something like 48 MB for OCR on a 124x24px TGA image).
> I'm not sure whether to use the same weights for every neuron in a given layer
What would be the point of that? All your neurons would give the same result (unless you're training the function, o_O).
Consider that you need to train each weight, and for that you'll need a lot of training patterns. So you may want to reduce the number of inputs.
Also, consider a different structure. What's your structure?
It's a 3-layer feed-forward network, but I'm writing it in a general way rather than creating a new network every time I want to solve a new problem. It has two constructors: one takes a number of input, hidden and output neurons and initialises the weights for each one to random values between -1 and 1; the other takes a path to a CSV file and loads everything from that. That way, I can train the network (using backprop) and have it keep its weight values between sessions, and also use a different CSV file for each problem the network needs to solve (so long as it's solvable with exactly three layers, which most problems are).
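A minimal sketch of that two-constructor design might look like this (the class and member names are hypothetical, the weights are left public for brevity, and the CSV format is just one plausible layout: a header line of layer sizes, then one row of weights per neuron):

```cpp
#include <cstdlib>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

class Network {
public:
    std::vector<std::vector<double>> hiddenWeights;  // hidden x input
    std::vector<std::vector<double>> outputWeights;  // output x hidden

    // Fresh network: random weights between -1 and 1.
    Network(int inputs, int hidden, int outputs)
        : hiddenWeights(hidden, std::vector<double>(inputs)),
          outputWeights(outputs, std::vector<double>(hidden)) {
        for (auto& row : hiddenWeights)
            for (double& w : row) w = randomWeight();
        for (auto& row : outputWeights)
            for (double& w : row) w = randomWeight();
    }

    // Restore a trained network from CSV: an "inputs,hidden,outputs"
    // header line, then one comma-separated row of weights per neuron.
    explicit Network(const std::string& path) {
        std::ifstream in(path);
        int ni, nh, no;
        char comma;
        in >> ni >> comma >> nh >> comma >> no;
        in.ignore();  // skip to the end of the header line
        hiddenWeights.assign(nh, std::vector<double>(ni));
        outputWeights.assign(no, std::vector<double>(nh));
        for (auto& row : hiddenWeights) readRow(in, row);
        for (auto& row : outputWeights) readRow(in, row);
    }

    // Persist the weights so training survives between sessions.
    void save(const std::string& path) const {
        std::ofstream out(path);
        out.precision(17);  // enough digits to round-trip a double exactly
        out << hiddenWeights[0].size() << ',' << hiddenWeights.size() << ','
            << outputWeights.size() << '\n';
        for (const auto& row : hiddenWeights) writeRow(out, row);
        for (const auto& row : outputWeights) writeRow(out, row);
    }

private:
    static double randomWeight() {
        return 2.0 * std::rand() / RAND_MAX - 1.0;
    }
    static void readRow(std::istream& in, std::vector<double>& row) {
        std::string line;
        std::getline(in, line);
        std::istringstream ss(line);
        for (double& w : row) {
            ss >> w;
            if (ss.peek() == ',') ss.ignore();
        }
    }
    static void writeRow(std::ostream& out, const std::vector<double>& row) {
        for (std::size_t i = 0; i < row.size(); ++i)
            out << (i ? "," : "") << row[i];
        out << '\n';
    }
};
```

Training with backprop would then only mutate the two weight matrices, and `save` plus the CSV constructor give you the between-sessions persistence.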
> I'm not sure whether to use the same weights for every neuron in a given layer, or to use different weights for each neuron: currently I'm doing the latter but I'm wondering if it's necessary, because it means using a ton of memory (something like 48 MB for OCR on a 124x24px TGA image).
As ne555 noticed, the first approach doesn't make much sense. Also, if you want to do OCR on an image using neural networks, you'll probably want to first somehow separate the characters and then use each character (e.g. a 16x16px image) as input in the neural network.
Do you by any chance want this for captcha breaking? If yes, consider simpler methods first; you may be lucky. For example, Stanford's Decaptcha has a 10% - 24% success rate on breaking the CNN captcha[1], but I was able to achieve a 40% success rate (on breaking the same captcha) with simple pixel matching. I first made a character 'database' consisting of small images of every possible captcha character and then, whenever I got a captcha, I did the following:
- Pixel match the first captcha character against every database character. Keep the best match.
- For each remaining character (the second through the fifth):
  - Assume it starts roughly where the previous one ended.
  - Pixel match it against every database character. Keep the best match.
I believe that my relatively high success rate is due to the fact that the separation and recognition parts are interleaved in my approach, whereas in Stanford's Decaptcha they are two distinct phases of the program. This can make a huge difference in captchas where the character separation is difficult (e.g. the CNN captcha).
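The interleaved approach described above can be sketched roughly like this, assuming binarised images of equal height stored row-major as 0/1 bytes (all names here are illustrative, not from any real library):

```cpp
#include <string>
#include <vector>

struct Glyph {
    char label;
    int width, height;
    std::vector<unsigned char> pixels;  // width * height entries, 0 or 1
};

// Fraction of glyph pixels that agree with the captcha at start column x0.
double matchScore(const std::vector<unsigned char>& captcha, int capWidth,
                  const Glyph& g, int x0) {
    int agree = 0;
    for (int y = 0; y < g.height; ++y)
        for (int x = 0; x < g.width; ++x)
            if (captcha[y * capWidth + x0 + x] == g.pixels[y * g.width + x])
                ++agree;
    return static_cast<double>(agree) / (g.width * g.height);
}

// Recognise a fixed-length captcha. Segmentation and recognition are
// interleaved: each character is assumed to start roughly where the
// previous best match ended, give or take a little horizontal slack.
std::string breakCaptcha(const std::vector<unsigned char>& captcha,
                         int capWidth, const std::vector<Glyph>& database,
                         int numChars) {
    std::string result;
    int start = 0;
    for (int i = 0; i < numChars; ++i) {
        double best = -1.0;
        char bestLabel = '?';
        int bestEnd = start;
        for (int slack = 0; slack <= 2; ++slack) {
            for (const Glyph& g : database) {
                if (start + slack + g.width > capWidth) continue;
                double s = matchScore(captcha, capWidth, g, start + slack);
                if (s > best) {
                    best = s;
                    bestLabel = g.label;
                    bestEnd = start + slack + g.width;
                }
            }
        }
        if (best < 0.0) break;  // ran out of image
        result += bestLabel;
        start = bestEnd;  // the next character starts where this one ended
    }
    return result;
}
```

The key design point is that a confident match for character *i* fixes the search window for character *i* + 1, so a hard segmentation problem never has to be solved on its own.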
My point is that if you have a very specific goal in mind, you should first try implementing a very specific solution. If this fails, then (and only then) you can try more general approaches.
> if you want to do OCR on an image using neural networks, you'll probably want to first somehow separate the characters and then use each character (e.g. a 16x16px image) as input in the neural network.
That makes more sense; that would cut the network's memory usage by a vast amount (a factor of nearly 12 for a 124x24 px image). I'll look into separating the characters into tiles when I've finished implementing my network. Thanks.
> Do you by any chance want this for captcha breaking?
No, although that's an interesting idea. I'm just learning about neural networks and trying to create a general one that I can use in lots of projects, just with different weight values (stored in CSV files, one per network). I've been trying for a few days, but I keep changing my mind about the architecture. I think I've settled on one I like; although it's not the most elegant, it is quite simple.

I use three basic classes, one for each kind of neuron (input, hidden and output), which derive from a single base class. Each neuron object stores a reference to its inputs (a vector of doubles for input neurons, input neurons for hidden neurons, or hidden neurons for output neurons) and a vector of its weights, and each has an activation function that returns the sigmoid of the weighted sum of its inputs. So, when someone reads the activation value of an output neuron, it returns the sigmoid of the weighted sum of the hidden neurons' activations, which are in turn the sigmoids of the weighted sums of the input neurons' activations, which are the sigmoids of the weighted sums of the raw input values.

There's also some basic caching: a vector of doubles that stores the output values, and a vector of bools that records whether each output has been invalidated, so the cache can be updated. That way, if someone reads the same output neuron more than once, the entire network doesn't have to be recomputed. It's kind of a backwards feed-forward network.
Sounds nice :) One more thing you could do is add more activation functions, e.g. linear, hyperbolic tangent, etc. Another thing you could do is provide easy ways to build complex architectures out of simple ones. Actually, you should take a look at PyBrain[1] if you haven't already. You'll probably find many good ideas there, and you could end up contributing to the project too.
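One lightweight way to support the alternative activation functions suggested above is to make them pluggable, e.g. as `std::function` objects that a neuron could hold instead of a hard-coded sigmoid (names here are illustrative):

```cpp
#include <cmath>
#include <functional>

using Activation = std::function<double(double)>;

const Activation sigmoidAct = [](double x) { return 1.0 / (1.0 + std::exp(-x)); };
const Activation tanhAct    = [](double x) { return std::tanh(x); };
const Activation linearAct  = [](double x) { return x; };
```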
I hadn't heard of it, although I assumed there would be neural network libraries floating around the 'net. Thanks, it looks interesting; I'll remember to look into it.