Training modern machine learning models requires optimizing highly non-convex risk functions, and yet simple gradient-based methods are able to find global minima of very high-dimensional problems. Randomness and overparametrization are likely to have something to do with this magic.
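Schematically, the setting is empirical risk minimization by gradient descent (the loss \(\ell\) and parametrization \(f(\,\cdot\,;\theta)\) below are generic placeholders, not the specific models of the talk):
\[
\hat R_n(\theta) \;=\; \frac{1}{n}\sum_{i=1}^{n} \ell\big(f(x_i;\theta),\,y_i\big),
\qquad
\theta^{(k+1)} \;=\; \theta^{(k)} - \eta\,\nabla \hat R_n\big(\theta^{(k)}\big).
\]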
In the 90s, Stephen Smale conjectured that a system of n random polynomial equations in n complex unknowns can be solved in average-case polynomial time.
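Concretely, one considers a random system \(F=(f_1,\dots,f_n)\) of the form (the Gaussian coefficient model is a schematic stand-in for the precise random ensemble)
\[
f_i(z) \;=\; \sum_{|\alpha|\le d_i} c_{i,\alpha}\, z^{\alpha},
\qquad i=1,\dots,n,\quad z\in\mathbb{C}^n,
\]
with random coefficients \(c_{i,\alpha}\), and asks for an algorithm that finds an (approximate) zero of \(F\) in time polynomial in the input size, on average over the coefficients.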
I will present recent progress on the real-variables version of this problem, and characterize the amount of overparametrization required in that case to make the problem efficiently solvable.
I will then show that closely related ideas can be used to analyze two-layer neural networks. In that case, reaching a global minimum (in the absence of regularization) can be problematic from a statistical viewpoint, leading to overfitting.
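For concreteness, the empirical risk of a two-layer network with \(m\) hidden units reads, schematically (square loss and a generic activation \(\sigma\) are placeholder choices),
\[
\hat R_n(\boldsymbol a,\boldsymbol W) \;=\; \frac{1}{n}\sum_{i=1}^{n}\Big(y_i - \sum_{j=1}^{m} a_j\,\sigma(\langle \boldsymbol w_j, x_i\rangle)\Big)^{2},
\]
with overparametrization corresponding to the number of hidden units \(m\) being large relative to the sample size \(n\).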
[Based on joint papers with Eliran Subag and with Pierfrancesco Urbani]