Overparametrized systems: from Smale's 17th problem to two-layer neural networks

Tuesday, March 11, 2025 - 4:15pm to 5:15pm
Refreshments: 4:00 PM
Location: 32-G449 (Kiva/Patil)
Speaker: Andrea Montanari (Stanford)
Biography: http://web.stanford.edu/~montanar/

Training modern machine learning models requires optimizing highly non-convex risk functions, and yet simple gradient-based methods are able to find global minima even for very high-dimensional problems. Randomness and overparametrization are likely to have something to do with this magic.

In the 1990s, Stephen Smale conjectured that a system of n random polynomial equations in n complex unknowns can be solved in average-case polynomial time (the 17th problem on his list of mathematical problems for the next century).
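For concreteness, here is a standard schematic formalization of the problem; the specific Gaussian random model named below is a common choice in the literature and my addition, not part of the abstract:

```latex
% Smale's 17th problem, schematically. Given random polynomials
% p_1, ..., p_n in n complex variables (e.g., with independent Gaussian
% coefficients, as in the Kostlan / Bombieri--Weyl ensemble), find an
% approximate solution of
\[
  p_1(z) = p_2(z) = \cdots = p_n(z) = 0, \qquad z \in \mathbb{C}^n ,
\]
% in time that is polynomial in the input size, on average over the
% draw of the coefficients.
```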

I will present recent progress on the real-variables version of this problem and characterize how much overparametrization is required in that case for the problem to be efficiently solvable.
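One hedged way to make the overparametrized real setting concrete (the notation and the ratio below are my sketch, not a statement from the talk): take m random polynomial equations in n >= m real unknowns, and ask how large n must be relative to m for gradient-based methods to find a solution efficiently.

```latex
% Overparametrized real version (sketch, notation mine): solve
\[
  f_1(x) = f_2(x) = \cdots = f_m(x) = 0, \qquad x \in \mathbb{R}^n, \quad n \ge m,
\]
% where the f_i are random polynomials, with overparametrization
% measured, e.g., by the ratio n/m.
```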

I will then show that closely related ideas can be used to analyze two-layer neural networks. In that case, reaching a global minimum (in the absence of regularization) can be problematic from a statistical viewpoint, leading to overfitting.
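To make the overfitting point concrete, here is a minimal self-contained sketch of my own, not the construction from the papers: a heavily overparametrized two-layer ReLU network, trained by plain gradient descent on an unregularized squared loss, fits noisy training labels essentially exactly. The data model, width, and step-size rule are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Scarce, noisy data from a simple 1-d target.
n_train, n_test = 20, 200
x_tr = rng.uniform(-3, 3, n_train)
y_tr = np.sin(x_tr) + 0.3 * rng.standard_normal(n_train)
x_te = rng.uniform(-3, 3, n_test)
y_te = np.sin(x_te)                          # noiseless test targets

# Overparametrized two-layer ReLU network, width m >> n_train:
#   f(x) = (1/sqrt(m)) * sum_j a_j * relu(w_j * x + b_j)
m = 1000
w = rng.standard_normal(m)
b = rng.uniform(-3.0, 3.0, m)                # biases spread the ReLU kinks
a = rng.standard_normal(m)

def forward(x, w, b, a):
    return np.maximum(np.outer(x, w) + b, 0.0) @ a / np.sqrt(m)

# Conservative step size from the initial feature Gram matrix.
H0 = np.maximum(np.outer(x_tr, w) + b, 0.0)
lr = 0.2 * n_train * m / np.linalg.norm(H0, 2) ** 2

for step in range(20000):
    H = np.maximum(np.outer(x_tr, w) + b, 0.0)
    resid = H @ a / np.sqrt(m) - y_tr        # prediction residuals
    mask = (H > 0.0)
    # Gradients of the unregularized squared loss w.r.t. all parameters.
    grad_a = H.T @ resid / (n_train * np.sqrt(m))
    grad_w = (mask * resid[:, None]).T @ x_tr * a / (n_train * np.sqrt(m))
    grad_b = (mask * resid[:, None]).sum(axis=0) * a / (n_train * np.sqrt(m))
    a -= lr * grad_a
    w -= lr * grad_w
    b -= lr * grad_b

print("train MSE:", np.mean((forward(x_tr, w, b, a) - y_tr) ** 2))
print("test  MSE:", np.mean((forward(x_te, w, b, a) - y_te) ** 2))
```

In runs of this kind the training error typically falls well below the label-noise level (here 0.3^2 = 0.09), i.e. the network starts to interpolate the noise, and the gap between the two printed numbers is the overfitting the abstract alludes to; regularization or early stopping would trade training fit for generalization.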

[Based on joint papers with Eliran Subag and with Pierfrancesco Urbani]