Learning with springs and sticks

Luis Mantilla Calderón1,2 and Alán Aspuru-Guzik1,2,3,4
1 University of Toronto, 2 Vector Institute, 3 Acceleration Consortium, 4 NVIDIA
(2025)

Motivation

Learning is a physical process: at a fundamental level, it is irreversible and consumes energy. Our goal in this work is to construct a minimal physical model of learning that makes this energy-learning relation easy to study. We ask:

  • Can a simple mechanical system, built from springs and sticks, approximate arbitrary functions as neural networks do?
  • What does learning look like in such a mechanical system?
  • Is there a fundamental, thermodynamic limit that constrains how well this system can learn?

We try to answer these questions in this work.

The springs and sticks model

The springs and sticks (SS) model is defined by attaching springs from the data points to a grid of sticks. As the elastic potential energy dissipates, the system relaxes to a minimum-energy configuration, which is equivalent to minimizing a mean-squared-error loss. The dynamics of the SS system are governed by a Langevin equation, which describes noisy, damped motion toward configurations that minimize the energy:

$$ \begin{aligned} \frac{d}{dt} \mathbf{x} &= \dot{\mathbf{x}}, \\ \frac{d}{dt} \dot{\mathbf{x}} &= \mathbf{M}^{-1} \mathbf{f}(\mathbf{x}, \dot{\mathbf{x}}, t) - \gamma \dot{\mathbf{x}} + \sigma \dot{\boldsymbol{\xi}}(t). \end{aligned} $$ Here, $\mathbf{x}$ are the node positions, $\mathbf{M}$ is the mass matrix, $\mathbf{f}$ are the forces from the springs, $\gamma$ is a damping coefficient, and $\sigma \dot{\boldsymbol{\xi}}(t)$ is a Gaussian noise term whose strength $\sigma$ is related to an effective temperature $T$ via the fluctuation-dissipation theorem: $\sigma^2 = 2 k_b T \gamma / M$, with $M$ the node mass.
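To make these dynamics concrete, below is a minimal sketch of Euler-Maruyama integration of this Langevin equation for a one-dimensional toy system, assuming a uniform node mass and one spring per node pulling it toward a data value; the setup and parameter values are illustrative simplifications, not the paper's implementation.

```python
import numpy as np

# Toy 1D version: each node is pulled by one spring toward a target data value,
# so the elastic potential energy is proportional to the mean-squared error.
rng = np.random.default_rng(0)

k, m, gamma, kb_T = 1.0, 1.0, 0.5, 0.01    # spring constant, mass, damping, k_b * T
sigma = np.sqrt(2.0 * kb_T * gamma / m)    # fluctuation-dissipation relation
dt, n_steps = 1e-3, 50_000

y = np.array([0.2, -0.4, 0.9])             # target data values (one per node)
x = np.zeros_like(y)                       # node positions
v = np.zeros_like(y)                       # node velocities

# Euler-Maruyama integration of the underdamped Langevin equation above.
for _ in range(n_steps):
    f = -k * (x - y)                       # spring forces
    x = x + v * dt
    v = v + (f / m - gamma * v) * dt + sigma * np.sqrt(dt) * rng.standard_normal(y.shape)

print(x)  # relaxed near y, with thermal fluctuations of scale ~ sqrt(kb_T / k)
```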

The figure below shows the evolution of node positions under different choices of damping ($\gamma$) and noise temperature ($T$):

[Figure: Node trajectories under different parameters]

Approximating any smooth function

We compare our SS model to one-layer MLPs. Despite its simplicity, the SS model achieves regression accuracy comparable to these networks. This is because the grid of sticks implements a piecewise-linear (trapezoidal) approximation, and piecewise-linear functions are universal approximators of continuous functions.

[Figure: Comparison with neural networks]
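To see why piecewise-linear approximation is enough, here is a small illustration, with NumPy's `np.interp` standing in for a relaxed stick grid: the worst-case error of a linear interpolant of a smooth target shrinks rapidly as the grid is refined. The target function and grid sizes are arbitrary choices for this demonstration.

```python
import numpy as np

# A piecewise-linear interpolant on a grid of "sticks": refining the grid
# drives the approximation error of a smooth target toward zero.
def target(x):
    return np.sin(2 * np.pi * x) + 0.3 * x**2

x_eval = np.linspace(0.0, 1.0, 1000)
for n_sticks in (4, 8, 16, 32):
    knots = np.linspace(0.0, 1.0, n_sticks + 1)       # stick endpoints (nodes)
    approx = np.interp(x_eval, knots, target(knots))  # linear interpolation between nodes
    err = np.max(np.abs(approx - target(x_eval)))
    print(f"{n_sticks:3d} sticks -> max error {err:.4f}")
```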

Thermodynamic limits

Finally, we analyze the thermodynamics of learning. We find that learning fails when the system's free-energy change, computed via the Jarzynski equality $$ \Delta F = -k_b T \ln \langle e^{-W/k_b T} \rangle, $$ saturates due to environmental fluctuations; we call this phenomenon the thermodynamic learning barrier. One way to think about this barrier is that learning requires a decrease in the system's entropy, but when thermal noise dominates at small scales, this entropy decrease is no longer possible.

[Figure: Thermodynamic learning barrier]
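As a sketch of how $\Delta F$ can be estimated from repeated experiments, the snippet below applies the Jarzynski equality to a batch of work samples; the Gaussian work distribution is a made-up example, chosen because its exact free-energy change is known in closed form.

```python
import numpy as np

# Estimate the free-energy change from work samples via the Jarzynski equality,
#   Delta F = -k_b T * ln < exp(-W / k_b T) >,
# using a log-sum-exp trick since the average is dominated by rare low-W samples.
def jarzynski_delta_f(work_samples, kb_T):
    w = np.asarray(work_samples, dtype=float)
    a = (-w / kb_T).max()
    log_avg = a + np.log(np.mean(np.exp(-w / kb_T - a)))
    return -kb_T * log_avg

rng = np.random.default_rng(0)
kb_T = 1.0
work = rng.normal(loc=2.0, scale=1.0, size=10_000)  # hypothetical work measurements
# For Gaussian work, Delta F = <W> - Var(W) / (2 kb_T) = 2.0 - 0.5 = 1.5 exactly.
print(jarzynski_delta_f(work, kb_T))
```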

BibTeX

@misc{mantilla2025springssticks,
      title={Learning with springs and sticks}, 
      author={Luis Mantilla Calder{\'o}n and Al{\'a}n Aspuru-Guzik},
      year={2025},
      eprint={2508.19015},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2508.19015}, 
}