False-belief tasks have mainly been associated with the explanatory notion of the theory of mind and the theory-theory. However, it has often been pointed out that this kind of high-level reasoning is computationally expensive and time-consuming. During the last decades, the idea of embodied intelligence, i.e. complex behavior caused by sensorimotor contingencies, has emerged in the fields of neuroscience, psychology, and artificial intelligence. Viewed from this perspective, failing a false-belief test can result from an impaired ability to recognize and track others’ sensorimotor contingencies and affordances. Thus, social cognition is explained in terms of low-level signals instead of high-level reasoning. In this work, we present a generative model for optimal action selection that can simultaneously be employed to predict others’ actions. As we base the decision making on a hidden state representation of sensorimotor signals, this model is in line with the ideas of embodied intelligence. We demonstrate how tracking others’ hidden states can give rise to correct false-belief inferences, while a lack thereof leads to failure. With this work, we want to emphasize the importance of sensorimotor contingencies in social cognition, which might be a key to artificial, socially intelligent systems.