Adaptive systems -- such as a biological organism gaining survival advantage,
an autonomous robot executing a functional task, or a motor protein
transporting intracellular nutrients -- must model the regularities and
stochasticity in their environments to take full advantage of thermodynamic
resources. Analogously, but in a purely computational realm, machine learning
algorithms estimate models to capture predictable structure and identify
irrelevant noise in training data. They do so by optimizing performance
metrics, such as model likelihood. When such computational models are
physically implemented, is there a sense in which those estimated through
machine learning are physically preferred? We introduce the thermodynamic
principle that work production is the most relevant performance metric for an
adaptive physical agent and compare its consequences with the
maximum-likelihood principle that guides machine learning. Within the class of
physical agents that most efficiently
harvest energy from their environment, we demonstrate that an efficient agent's
model explicitly determines its architecture and how much useful work it
harvests from the environment. We then show that selecting the maximum-work
agent for given environmental data corresponds to finding the
maximum-likelihood model. This establishes an equivalence between
nonequilibrium thermodynamics and dynamic learning. In this way, work
maximization emerges as an organizing principle that underlies learning in
adaptive thermodynamic systems.
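
The correspondence between maximum work and maximum likelihood can be
illustrated with a minimal numerical sketch. It rests on an assumption not
stated in the text above: that an efficient agent's work on a data string of
length L is k_B T (ln P_model(data) + L ln |alphabet|), that is, an increasing
affine function of the model's log-likelihood. The memoryless Bernoulli model
class, the sample data, the temperature, and the function names are likewise
hypothetical choices made only for illustration.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K


def log_likelihood(p_one, data):
    """Log-likelihood of binary data under an IID Bernoulli(p_one) model."""
    return sum(math.log(p_one if y == 1 else 1.0 - p_one) for y in data)


def assumed_work(p_one, data, temperature=300.0, alphabet_size=2):
    """Work harvested by an efficient agent whose model is Bernoulli(p_one).

    ASSUMPTION (not given in the text): work is
    k_B * T * (ln P_model(data) + L * ln|alphabet|).
    """
    length = len(data)
    return K_B * temperature * (
        log_likelihood(p_one, data) + length * math.log(alphabet_size)
    )


# Hypothetical environmental data: six 1s and two 0s.
data = [1, 1, 0, 1, 1, 1, 0, 1]

# Candidate models: p_one on a grid from 0.05 to 0.95.
candidates = [i / 20 for i in range(1, 20)]

best_by_likelihood = max(candidates, key=lambda p: log_likelihood(p, data))
best_by_work = max(candidates, key=lambda p: assumed_work(p, data))

print("maximum-likelihood model:", best_by_likelihood)
print("maximum-work model:      ", best_by_work)
```

Under this assumed form, the two selection rules pick the same model because
work differs from log-likelihood only by a positive scale factor and an
additive constant. The uniform model (p_one = 0.5) yields zero work for any
binary string, consistent with an agent that models its input as pure noise.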