Matrix-Free Least Squares Solvers: Values, Gradients, and What to Do With Them
Venue: ICLR 2026 (Reject)
Authors:
OpenReview: https://openreview.net/forum?id=2El3N64oAH
Relevance
LLM score: 2/3. The paper treats sparsity (enforcing weight sparsity on a 50M-parameter model) as a main contribution, which aligns directly with Sutro Group's sparsity priority.
Keyword hits: sparsity
TLDR
(none provided)
Abstract
This paper argues that the method of least squares has significant unfulfilled potential in modern machine learning, far beyond being a tool for fitting linear models. To unlock this potential, we derive custom gradients that turn the solver into a differentiable operator, like a neural network layer, enabling many diverse applications. Empirically, we demonstrate: (i) scalability, by enforcing weight sparsity on a 50-million-parameter model; (ii) conservativeness constraints in score-based generative models; and (iii) hyperparameter tuning of Gaussian processes based on predictive performance. With this, our work represents the next iteration in developing differentiable linear-algebra tools and making them widely accessible to machine learning practitioners.
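To make the idea of a least squares solver as a differentiable operator concrete, here is a minimal NumPy sketch. It is not the paper's derivation or code. It assumes a full-column-rank operator A that is available only through matrix-vector products, and it differentiates only with respect to the right-hand side b. Under those assumptions the solution is x* = (AᵀA)⁻¹Aᵀb, so a cotangent g = ∂L/∂x* maps to ∂L/∂b = A(AᵀA)⁻¹g, which costs one more solve with the same matrix-free solver. The function names (`cg_normal_equations`, `lstsq_solve`, `lstsq_vjp_b`) are hypothetical, and conjugate gradients on the normal equations is a stand-in chosen for brevity rather than the solver the paper uses.

```python
import numpy as np

def cg_normal_equations(matvec, rmatvec, rhs, max_iters=500, rtol=1e-12):
    """Solve (A^T A) x = rhs by conjugate gradients.

    A is never formed: only matvec(v) = A v and rmatvec(u) = A^T u are used.
    Assumes A has full column rank, so A^T A is positive definite.
    """
    x = np.zeros_like(rhs)
    r = rhs.copy()              # residual rhs - A^T A x for x = 0
    p = r.copy()                # first search direction
    rs = r @ r
    threshold = (rtol * np.linalg.norm(rhs)) ** 2
    for _ in range(max_iters):
        if rs <= threshold:
            break
        AtAp = rmatvec(matvec(p))
        alpha = rs / (p @ AtAp)
        x = x + alpha * p
        r = r - alpha * AtAp
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def lstsq_solve(matvec, rmatvec, b):
    """Forward pass: x* = argmin_x ||A x - b||^2 via the normal equations."""
    return cg_normal_equations(matvec, rmatvec, rmatvec(b))

def lstsq_vjp_b(matvec, rmatvec, g):
    """Backward pass w.r.t. b: maps g = dL/dx* to dL/db = A (A^T A)^{-1} g."""
    return matvec(cg_normal_equations(matvec, rmatvec, g))

# Small illustrative problem (random data, for demonstration only).
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)
matvec = lambda v: A @ v
rmatvec = lambda u: A.T @ u

x_star = lstsq_solve(matvec, rmatvec, b)

# Example loss L(x) = 0.5 * ||x||^2, whose gradient w.r.t. x is x itself.
g = x_star
grad_b = lstsq_vjp_b(matvec, rmatvec, g)

# Sanity check against central finite differences of L(x*(b)).
def loss_of_b(b_vec):
    x = lstsq_solve(matvec, rmatvec, b_vec)
    return 0.5 * (x @ x)

eps = 1e-6
fd = np.array([
    (loss_of_b(b + eps * e) - loss_of_b(b - eps * e)) / (2 * eps)
    for e in np.eye(b.size)
])
print("max |adjoint - finite difference|:", np.max(np.abs(grad_b - fd)))
```

The point of the sketch is that the backward pass never differentiates through the solver's iterations: it calls the same solver once more on a different right-hand side. Gradients with respect to the parameters of A, and the rank-deficient or regularized cases, need additional terms that are not shown here.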
Keywords
Automatic differentiation, Numerical Linear Algebra, Constrained Optimization, Implicit Differentiation, Gaussian Process