Learning to count: a neural network model of the successor function
What does it mean for a neural network to become a “cardinal principal knower”? We trained a multilayer perceptron to compute the successor of the numbers 0-99. N and N+1 were one-hot encoded on the input and output layers, respectively; the hidden layer had 8 units. 80% of the (N, N+1) pairs constituted the training data, the remaining 20% the test data. The mean cosine similarities of the hidden layer representations of the (N, N+1) pairs was 0.77 (0.79) when N was in the training (test) set. The network learned a continuous notion of number: the hidden-layer representations of N and N+1 were comparable whether they did (0.74) or did not (0.78) cross a tens boundary. Thus, the network did not “discover” place-value. Ongoing research is exploring place-value encoding of inputs and outputs, and also structuring of the training data to better reflect the numerical environment of the child.