Training modern machine learning models requires optimizing highly non-convex risk functions, and yet simple gradient-based methods are able to find global minima of very high-dimensional problems. Randomness and overparametrization are likely to have something to do with this magic.
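Schematically, the setting is empirical risk minimization by gradient descent (the loss \(\ell\) and parametrization \(f(\,\cdot\,;\theta)\) below are generic placeholders, not the specific models of the talk):
\[
\hat R_n(\theta) \;=\; \frac{1}{n}\sum_{i=1}^{n} \ell\big(f(x_i;\theta),\,y_i\big),
\qquad
\theta^{(k+1)} \;=\; \theta^{(k)} - \eta\,\nabla \hat R_n\big(\theta^{(k)}\big).
\]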
In the 90s, Stephen Smale conjectured that a system of n random polynomial equations in n complex unknowns can be solved in average-case polynomial time.
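Concretely, one considers a random system \(F=(f_1,\dots,f_n)\) of the form (the Gaussian coefficient model is a schematic stand-in for the precise random ensemble)
\[
f_i(z) \;=\; \sum_{|\alpha|\le d_i} c_{i,\alpha}\, z^{\alpha},
\qquad i=1,\dots,n,\quad z\in\mathbb{C}^n,
\]
with random coefficients \(c_{i,\alpha}\), and asks for an algorithm that finds an (approximate) zero of \(F\) in time polynomial in the input size, on average over the coefficients.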
I will present recent progress on the real-variables version of this problem, and characterize the amount of overparametrization required in that case to make the problem efficiently solvable.
I will then show that closely related ideas can be used to analyze two-layer neural networks. In that case, reaching a global minimum (in the absence of regularization) can be problematic from a statistical viewpoint, leading to overfitting.
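For concreteness, the empirical risk of a two-layer network with \(m\) hidden units reads, schematically (square loss and a generic activation \(\sigma\) are placeholder choices),
\[
\hat R_n(\boldsymbol a,\boldsymbol W) \;=\; \frac{1}{n}\sum_{i=1}^{n}\Big(y_i - \sum_{j=1}^{m} a_j\,\sigma(\langle \boldsymbol w_j, x_i\rangle)\Big)^{2},
\]
with overparametrization corresponding to the number of hidden units \(m\) being large relative to the sample size \(n\).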
[Based on joint papers with Eliran Subag and with Pierfrancesco Urbani]