We introduce a memory-augmented neural network, called Differentiable Working Memory (DWM), that captures some key aspects of attention in working memory. We tested DWM on a suite of psychology-inspired tasks, where the model had to develop a strategy only by processing sequences of inputs and desired outputs. Thanks to novel attention control mechanisms called bookmarks, the model was able to rapidly learn a good strategy (generalizing to sequence lengths even two orders of magnitude larger than those used for training), allowing it to retain, ignore, or forget information based on its relevance. The behavior of DWM is interpretable and allowed us to analyze its performance on different tasks. Surprisingly, as the training progressed, we observed that in some cases the model was able to discover more than one successful strategy, possibly involving sophisticated use of memory and attention.