Simple recurrent networks have been used extensively in modelling the learning of various aspects of linguistic structure. We discuss how such networks can be trained, and empirically compare two training algorithms, Elman's "copyback" regime and back-propagation through time, on simple tasks. Although these studies reveal that the copyback architecture has only a limited ability to pay attention to past input, other work has shown that this scheme can learn interesting linguistic structure in small grammars. In particular, the hidden unit activations cluster together to reveal linguistically interesting categories. We explore various ways in which this clustering of hidden units can be performed, and find that a wide variety of different measures produce similar results and appear to be implicit in the statistics of the sequences learnt. This perspective suggests a number of avenues for further research.
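For readers unfamiliar with the copyback regime mentioned above, the following is a minimal sketch of an Elman-style simple recurrent network in which the hidden state is copied to context units after each step and error is back-propagated through the current time step only (in contrast to back-propagation through time). The layer sizes, learning rate, and toy next-symbol prediction task are illustrative assumptions, not details taken from the studies described here.

```python
import numpy as np

rng = np.random.default_rng(0)

n_in, n_hid, n_out = 4, 8, 4                  # one-hot symbols in and out
W_ih = rng.normal(0, 0.1, (n_hid, n_in))      # input  -> hidden
W_ch = rng.normal(0, 0.1, (n_hid, n_hid))     # context -> hidden
W_ho = rng.normal(0, 0.1, (n_out, n_hid))     # hidden -> output
lr = 0.1

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_sequence(seq, epochs=200):
    """seq: list of symbol indices; the network predicts the next symbol."""
    global W_ih, W_ch, W_ho
    for _ in range(epochs):
        context = np.zeros(n_hid)             # context units start at rest
        for t in range(len(seq) - 1):
            x = np.eye(n_in)[seq[t]]
            target = np.eye(n_out)[seq[t + 1]]

            # Forward pass: hidden depends on current input and copied context.
            hidden = sigmoid(W_ih @ x + W_ch @ context)
            output = sigmoid(W_ho @ hidden)

            # Copyback regime: the context is treated as a fixed input, so the
            # gradient is propagated through the current step only.
            d_out = (output - target) * output * (1 - output)
            d_hid = (W_ho.T @ d_out) * hidden * (1 - hidden)
            W_ho -= lr * np.outer(d_out, hidden)
            W_ih -= lr * np.outer(d_hid, x)
            W_ch -= lr * np.outer(d_hid, context)

            context = hidden.copy()           # "copy back" the hidden state

# Toy usage: learn a short repeating sequence 0, 1, 2, 3, 0, 1, 2, 3, ...
train_sequence([0, 1, 2, 3] * 10)
```

Because gradients never flow past the copied context, this scheme is cheaper than back-propagation through time but, as noted above, gives the network only a limited window onto past input.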