Are deep neural networks the new spin glasses?

Published on 26 February 2026 at 09:13

How Statistical Physics and Giorgio Parisi’s Nobel Prize Help Us Understand (and Not Understand) Artificial Intelligence

“However incredible it may seem, what precisely happens inside deep neural networks trained with backpropagation still escapes us.” This statement, written by Giorgio Parisi, sounds almost like a paradox. How is it possible that an omnipresent technology – from search engines to self-driving cars – works so well, and at the same time is not fully understood from a theoretical point of view? And yet, that is exactly the case. And perhaps the very fact that these words come from a Nobel laureate in Physics suggests a direction: to understand artificial intelligence, we may need the eyes of a physicist.

From the Perceptron to Complex Systems

In the 1980s and 1990s, physicists studied the Perceptron, the simplest model of a neural network, in depth. With a single layer, the theory is relatively well developed: learning capacity, separability limits, and generalization properties can all be analyzed. Its decision boundary is linear, and the model is therefore mathematically tractable. When moving to deep multilayer networks with millions or billions of parameters, however, the nature of the problem changes. The network is no longer just a classifier: it becomes a high-dimensional complex system. And for complex systems there is a leading discipline: statistical physics.
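To make the contrast concrete, here is a minimal sketch of Rosenblatt's classic perceptron learning rule on a toy linearly separable problem (the logical AND). The dataset, labels, and learning-rate choice are illustrative, not from the article; on separable data the convergence theorem guarantees the loop terminates.

```python
import numpy as np

# Toy dataset: the logical AND function, which is linearly separable.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([-1, -1, -1, 1])  # labels in {-1, +1}

w = np.zeros(2)  # weights
b = 0.0          # bias

# Rosenblatt's rule: on each mistake, nudge the separating
# hyperplane toward the misclassified point.
for epoch in range(100):
    errors = 0
    for xi, yi in zip(X, y):
        if yi * (xi @ w + b) <= 0:  # misclassified (or on the boundary)
            w += yi * xi
            b += yi
            errors += 1
    if errors == 0:  # converged: guaranteed here, since the data are separable
        break

print(np.sign(X @ w + b))  # matches y once converged
```

This is precisely the kind of single-layer system whose capacity and limits the statistical physics of the 1980s could characterize analytically.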

Loss as an Energy Landscape

From a mathematical perspective, training a neural network consists of minimizing a cost function, the so-called loss. A physicist would interpret this function as an energy function in a space with millions of degrees of freedom. Let us imagine a mountainous landscape in a multidimensional space:
- the coordinates are the weights of the network
- the height represents the value of the loss
- training (gradient descent) is a walk in search of the deepest valleys in this landscape
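The walk described above can be sketched in a few lines. The two-parameter "landscape" below (a quadratic bowl with sinusoidal ripples) is a hypothetical toy function chosen only to make the picture concrete; real losses live in millions of dimensions.

```python
import numpy as np

# A toy two-parameter "landscape": a bowl with ripples, so it has
# rugged local structure around its deepest valleys.
def loss(w):
    return 0.1 * np.sum(w**2) + np.sin(w[0]) * np.sin(w[1])

def grad(w):
    # Analytic gradient of the loss above.
    return np.array([
        0.2 * w[0] + np.cos(w[0]) * np.sin(w[1]),
        0.2 * w[1] + np.sin(w[0]) * np.cos(w[1]),
    ])

w = np.array([4.0, -3.0])   # an arbitrary starting point on the landscape
lr = 0.1                    # step size
history = [loss(w)]
for _ in range(1000):
    w = w - lr * grad(w)    # step downhill along the steepest descent
    history.append(loss(w))

print(f"start loss = {history[0]:.3f}, final loss = {history[-1]:.3f}")
```

The walker ends in *a* valley, with a near-zero gradient, but nothing in the procedure says it is the deepest one. That is exactly the puzzle the next paragraph raises.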

The problem? This landscape is extremely irregular, a tangle of mountain ranges, with an enormous number of valleys, that is, local minima. Classical intuition would suggest that falling into the “wrong” valley (a bad minimum) would lead to poor performance. And yet, surprisingly, optimization works — and it works well.

The Parallel with Spin Glasses

Parisi’s Nobel Prize is linked to the study of spin glasses: “disordered” magnetic systems where interactions between atoms compete with one another, creating an extremely complex energy landscape — a true labyrinth of deep valleys and metastable states. The analogies with deep neural networks are striking:
- both have an enormous number of degrees of freedom
- both exhibit complex and rugged energy landscapes
- both admit a vast number of nearly equivalent solutions (states)

But there is a crucial discovery coming from spin glass physics that helps resolve the initial paradox: in high dimensions, the global structure of the landscape matters more than any single local minimum. “Bad” local minima (those with very high loss) are extremely rare. Most saddle points and minima encountered have loss values surprisingly close to the global minimum. In practice, if the landscape is sufficiently complex, almost all valleys are deep enough. This helps explain why deep networks, despite their apparent chaos, not only converge, but do so toward solutions with excellent generalization properties.

Transformers, Scaling Laws and Emergent Phenomena

In recent years, theoretical attention has shifted toward Transformers, the architecture underlying modern large language models (such as GPT). Here even more fascinating phenomena emerge. By systematically increasing the number of parameters and the amount of data, we observe:
- regular scaling laws: performance improves predictably, often following power-law behavior
- emergent abilities: beyond certain dimensional thresholds, new capabilities suddenly appear that were absent in smaller models (such as multi-step reasoning or translation in previously unseen contexts)

The parallel with phase transitions in physics is almost unavoidable. In thermodynamics, small variations in a parameter (such as temperature) can radically change the state of matter: from liquid to gas, from conductor to superconductor. Similarly, in language models, an increase in scale can produce a qualitative and sudden change in system behavior.

Why Don’t We Yet Have a Complete Theory?

It would be incorrect to say that “we know nothing.” We understand very well:
- the optimization algorithm (gradient descent)
- the crucial role of overparameterization (having more parameters than data, which instead of causing overfitting can aid convergence, in a phenomenon known as double descent)
- several mathematical results on generalization capacity

However, we still lack a unified theory capable of predicting in advance which abilities will emerge as the model scales, or explaining why certain architectures work better than others. This situation closely resembles turbulence: the fundamental equations (the Navier–Stokes equations) have been known for over a century, yet the collective behavior of a turbulent fluid remains one of the unresolved challenges of classical physics.

A New Frontier for Physics?

Perhaps the most interesting point is this: deep neural networks are not merely computational tools. They are, in every respect, new physical objects. They are highly high-dimensional complex systems, created by humans, in which collective structures and non-trivial behaviors emerge, just as in a spin glass or a magnetic system. For this reason, artificial intelligence is also becoming a theoretical laboratory for statistical physics. Concepts originally developed to study disordered materials (energy landscapes, phase transitions, symmetry breaking) are now proving invaluable for deciphering the functioning of the artificial brain we are building. If Parisi is right, the dialogue between physics and AI is not a nostalgic return to the 1980s, but the beginning of a new and fertile season of research. Truly understanding what happens inside deep neural networks is not merely a computer science issue: it is one of the great theoretical challenges of the coming decades — and physics has much to say about it.