Learning is a physical process and, at a fundamental level, an irreversible one that consumes energy. Our goal in this work is to construct a minimal physical model of learning that makes this relation between energy and learning easy to study. We ask:
We try to answer these questions in this work.
The springs and sticks (SS) model is defined by attaching springs from data points to a grid of sticks. By dissipating the elastic potential energy, the system relaxes to a minimum-energy configuration that equivalently minimizes a mean-squared error loss. The dynamics of the SS system are governed by a Langevin equation, which describes noisy, damped motion toward configurations that minimize the energy:
$$ \begin{aligned} \frac{d}{dt} \mathbf{x} &= \mathbf{\dot{x}}, \\ \frac{d}{dt} \mathbf{\dot{x}} &= {\mathbf{M}}^{-1} \mathbf{f}(\mathbf{x}, \mathbf{\dot{x}},t) - \gamma \mathbf{\dot{x}} + \sigma \dot{\mathbf{\xi}} (t). \end{aligned} $$
Here, $\mathbf{x}$ are the node positions, $\mathbf{M}$ is the mass matrix, $\mathbf{f}$ are the forces from the springs, $\gamma$ is a damping coefficient, and $\sigma \dot{\mathbf{\xi}} (t)$ is a Gaussian noise term with strength $\sigma$ related to an effective temperature $T$ via the fluctuation-dissipation theorem: $\sigma^2 = 2 k_b T \gamma / M$.
The figure below shows the evolution of node positions under different choices of damping ($\gamma$) and noise temperature ($T$).
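To make the Langevin dynamics concrete, here is a minimal sketch that integrates them with a simple Euler-Maruyama step for a one-dimensional toy problem. It is an illustration under stated assumptions, not the implementation used in this work: the uniform node grid, the hat-function coupling between data points and node heights, the scalar mass, and the helper names `hat_basis`, `spring_force`, and `simulate_langevin` are all hypothetical choices made for this example.

```python
import numpy as np

def hat_basis(x, nodes):
    """Piecewise-linear (hat) weight of every data point on a uniform node grid."""
    h = nodes[1] - nodes[0]  # assumes equally spaced nodes
    return np.clip(1.0 - np.abs(x[:, None] - nodes[None, :]) / h, 0.0, None)

def spring_force(heights, phi, targets, k=1.0):
    """Force on the node heights: minus the gradient of E = (k/2) * sum(residual**2)."""
    residual = phi @ heights - targets
    return -k * phi.T @ residual

def simulate_langevin(phi, targets, k=1.0, mass=1.0, gamma=2.0,
                      temperature=0.0, k_b=1.0, dt=0.01, n_steps=5000, seed=0):
    """Euler-Maruyama integration of the damped, noisy dynamics (scalar mass assumed)."""
    rng = np.random.default_rng(seed)
    # Fluctuation-dissipation relation: sigma^2 = 2 k_b T gamma / M.
    sigma = np.sqrt(2.0 * k_b * temperature * gamma / mass)
    heights = np.zeros(phi.shape[1])
    velocity = np.zeros_like(heights)
    for _ in range(n_steps):
        noise = rng.standard_normal(heights.shape)
        accel = spring_force(heights, phi, targets, k) / mass - gamma * velocity
        velocity = velocity + dt * accel + sigma * np.sqrt(dt) * noise
        heights = heights + dt * velocity
    return heights

# Toy data: 50 samples of a sine wave, approximated by 5 node heights.
x = np.linspace(0.0, 1.0, 50)
targets = np.sin(2.0 * np.pi * x)
nodes = np.linspace(0.0, 1.0, 5)
phi = hat_basis(x, nodes)

cold = simulate_langevin(phi, targets, temperature=0.0)
hot = simulate_langevin(phi, targets, temperature=1.0)
best = np.linalg.lstsq(phi, targets, rcond=None)[0]  # minimizer of the squared error
print("T=0 distance to least-squares heights:", np.linalg.norm(cold - best))
print("T=1 distance to least-squares heights:", np.linalg.norm(hot - best))
```

With the noise switched off, the damped system settles at the node heights that minimize the squared error; at a nonzero temperature the heights keep fluctuating around that minimum.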
We compare our SS model to one-layer MLPs. Despite its simplicity, the model achieves regression accuracy comparable to that of the MLPs. This is because the grid of sticks performs a piecewise-linear (trapezoidal) approximation, and piecewise-linear functions are universal function approximators.
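The piecewise-linear picture can be checked numerically. The sketch below fits node heights on a uniform grid by least squares and reports how the training error shrinks as the grid is refined. It is only an illustration: the sine target, the grid sizes, and the helper names `stick_design_matrix`, `fit_sticks`, and `rmse` are assumptions made for this example, and the MLP comparison itself is not reproduced here.

```python
import numpy as np

def stick_design_matrix(x, nodes):
    """Column j is the piecewise-linear response at x to a unit height at node j."""
    identity = np.eye(len(nodes))
    return np.column_stack(
        [np.interp(x, nodes, identity[j]) for j in range(len(nodes))]
    )

def fit_sticks(x, y, n_nodes):
    """Least-squares node heights on a uniform grid spanning the data range."""
    nodes = np.linspace(x.min(), x.max(), n_nodes)
    phi = stick_design_matrix(x, nodes)
    heights, *_ = np.linalg.lstsq(phi, y, rcond=None)
    return nodes, heights

def rmse(x, y, nodes, heights):
    """Root-mean-squared error of the piecewise-linear fit evaluated at x."""
    prediction = np.interp(x, nodes, heights)
    return float(np.sqrt(np.mean((prediction - y) ** 2)))

# Hypothetical target: one period of a sine wave sampled at 200 points.
x = np.linspace(0.0, 1.0, 200)
y = np.sin(2.0 * np.pi * x)

for n_nodes in (5, 9, 17, 33):
    nodes, heights = fit_sticks(x, y, n_nodes)
    print(f"{n_nodes:3d} nodes  RMSE = {rmse(x, y, nodes, heights):.5f}")
```

Because each coarser grid here is a subset of the finer ones, the fitted error cannot increase as nodes are added, which is the practical content of the universal-approximation statement for a smooth target.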
Finally, we analyze the thermodynamics of learning. We find that when the system's free-energy change saturates due to environmental fluctuations, calculated via the Jarzynski equality $$ \Delta F = -k_b T \ln \langle e^{-W/k_b T} \rangle, $$ learning fails, a phenomenon we call the thermodynamic learning barrier. One way to think about this barrier is that learning requires a decrease in entropy, but when the system is too noisy (at small scales), this entropy decrease is no longer possible.
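As a rough illustration of how a free-energy change can be estimated from work samples with the Jarzynski equality, the sketch below uses a log-sum-exp shift for numerical stability. The function name `jarzynski_free_energy` and the Gaussian work distribution are assumptions made for this example; they are not taken from the SS simulations.

```python
import numpy as np

def jarzynski_free_energy(work, temperature, k_b=1.0):
    """Estimate Delta F = -k_b T ln < exp(-W / (k_b T)) > from work samples."""
    work = np.asarray(work, dtype=float)
    beta = 1.0 / (k_b * temperature)
    exponent = -beta * work
    shift = exponent.max()  # subtract the maximum to avoid overflow in exp
    log_mean_exp = shift + np.log(np.mean(np.exp(exponent - shift)))
    return -log_mean_exp / beta

# Hypothetical work samples drawn from a Gaussian. For Gaussian work the
# equality gives Delta F = mean - variance / (2 k_b T) in closed form.
rng = np.random.default_rng(0)
mean_work, std_work, temperature = 1.0, 0.5, 1.0
work = rng.normal(mean_work, std_work, size=200_000)

estimate = jarzynski_free_energy(work, temperature)
closed_form = mean_work - std_work**2 / (2.0 * temperature)
print("Jarzynski estimate:", estimate)
print("Gaussian closed form:", closed_form)
print("Average work:", work.mean())
```

The estimate never exceeds the average work, consistent with the second law; the gap between the two is the dissipated work.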
@misc{mantilla2025springssticks,
title={Learning with springs and sticks},
author={Luis Mantilla Calder{\'o}n and Al{\'a}n Aspuru-Guzik},
year={2025},
eprint={2508.19015},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2508.19015},
}