Two types of simple recurrent networks (Jordan, 1986; Elman, 1988) were trained and compared on the task of adding two multi-digit numbers. Results showed that: (1) a manipulation of the training environment, called Combined Subset Training (CST), was necessary for learning the large set of patterns used; (2) if the networks are viewed as learning simple programming constructs such as conditional branches, while-loops, and sequences, there is a clear way to demonstrate a capacity difference between the two types of networks studied. In particular, we found that there are programs that one type of network can perform that the other cannot. Finally, an analysis of the dynamics of one of the networks is described.
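For readers unfamiliar with the architectures compared here, the sketch below shows a minimal Elman-style simple recurrent network applied to digit-serial addition: the hidden layer receives the current input together with a copy of its own previous activations (the context units), which is what allows a carry to propagate across time steps. (A Jordan network differs in feeding back the previous *output* rather than the hidden state.) The layer sizes, digit encoding, and presentation order are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

class ElmanSRN:
    """Minimal Elman-style simple recurrent network (forward pass only).

    Each step the hidden layer sees the current input plus its own
    previous activations (the "context" units). All sizes here are
    illustrative, not the paper's.
    """

    def __init__(self, n_in, n_hidden, n_out):
        self.W_xh = rng.normal(0, 0.1, (n_hidden, n_in))
        self.W_hh = rng.normal(0, 0.1, (n_hidden, n_hidden))  # context -> hidden
        self.W_hy = rng.normal(0, 0.1, (n_out, n_hidden))
        self.b_h = np.zeros(n_hidden)
        self.b_y = np.zeros(n_out)

    def forward(self, xs):
        h = np.zeros(self.W_hh.shape[0])  # context starts at zero
        ys = []
        for x in xs:
            h = np.tanh(self.W_xh @ x + self.W_hh @ h + self.b_h)
            ys.append(self.W_hy @ h + self.b_y)
        return np.array(ys)

def one_hot(d, n=10):
    v = np.zeros(n)
    v[d] = 1.0
    return v

# Assumed encoding: one digit pair per step, least-significant digit
# first, each digit one-hot; the network would be trained to emit the
# sum digit at each step, with the carry held in the recurrent state.
# Input stream for 43 + 59: step 1 is (3, 9), step 2 is (4, 5).
steps = [np.concatenate([one_hot(3), one_hot(9)]),
         np.concatenate([one_hot(4), one_hot(5)])]
net = ElmanSRN(n_in=20, n_hidden=16, n_out=10)
print(net.forward(steps).shape)  # (2, 10): one output vector per step
```

Presenting digits least-significant first matters in a setup like this, since the carry from one column is only available to later steps through the recurrent state.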