Parsing actions entails that relations between objects are discovered. A pervasively neural account of this process requires that fundamental problems are solved: the neural pointer problem, the binding problem, and the problem of generating discrete processing steps from time-continuous neural processes. We present a prototypical solution to these problems in a neural dynamic model that comprises dynamic neural fields holding representations close to sensorimotor surfaces as well as dynamic nodes holding discrete, language-like representations. Making the connection between these two types of representations enables the model to parse actions as well as ground movement phrases, all based on real visual input. We demonstrate how the dynamic neural processes autonomously generate the processing steps required to parse or ground object-oriented action.