research-article | Free Access
- Authors:
- Peter Nickl, RIKEN Center for AI Project, Tokyo, Japan
- Lu Xu, RIKEN Center for AI Project, Tokyo, Japan
- Dharmesh Tailor, University of Amsterdam, Amsterdam, Netherlands
- Thomas Möllenhoff, RIKEN Center for AI Project, Tokyo, Japan
- Mohammad Emtiyaz Khan, RIKEN Center for AI Project, Tokyo, Japan
NIPS '23: Proceedings of the 37th International Conference on Neural Information Processing Systems, December 2023, Article No. 1170, Pages 26923–26949
Published: 30 May 2024
The Memory-Perturbation Equation: Understanding Model's Sensitivity to Data
ABSTRACT
Understanding a model's sensitivity to its training data is crucial but can also be challenging and costly, especially during training. To simplify such issues, we present the Memory-Perturbation Equation (MPE), which relates a model's sensitivity to perturbations in its training data. Derived using Bayesian principles, the MPE unifies existing sensitivity measures, generalizes them to a wide variety of models and algorithms, and unravels useful properties regarding sensitivities. Our empirical results show that sensitivity estimates obtained during training can be used to faithfully predict generalization on unseen test data. The proposed equation is expected to be useful for future research on robust and adaptive learning.
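To make the abstract's claim concrete, here is a minimal, hypothetical sketch, not the authors' released code, of the kind of sensitivity estimate the MPE unifies: for an L2-regularized logistic-regression model with a Laplace posterior approximation, the effect of removing training example i is approximated by its prediction error times its (linearized) prediction variance, in the spirit of classical leave-one-out and influence measures. All variable names and constants below are illustrative assumptions.

```python
# Hypothetical toy sketch (not the paper's implementation): MPE-style per-example
# sensitivity for L2-regularized logistic regression under a Laplace approximation,
# assuming sensitivity_i ~ prediction_error_i * prediction_variance_i.

import numpy as np

rng = np.random.default_rng(0)

# Toy binary-classification data (all sizes and names are illustrative).
n, d = 200, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = (X @ w_true + 0.5 * rng.normal(size=n) > 0).astype(float)

delta = 1.0  # L2 regularizer (prior precision)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Train with plain gradient descent on the regularized logistic loss.
w = np.zeros(d)
for _ in range(2000):
    p = sigmoid(X @ w)
    grad = X.T @ (p - y) + delta * w   # gradient of sum-loss + L2 term
    w -= 0.1 * grad / n

# Laplace approximation: posterior covariance = inverse Hessian at the optimum.
p = sigmoid(X @ w)
lam = p * (1.0 - p)                                  # per-example Hessian weights
H = (X * lam[:, None]).T @ X + delta * np.eye(d)
Sigma = np.linalg.inv(H)

# Sensitivity estimate: prediction error e_i times prediction variance v_i.
pred_error = p - y                                   # e_i
pred_var = np.einsum("nd,df,nf->n", X, Sigma, X)     # v_i = x_i^T Sigma x_i
sensitivity = pred_error * pred_var

# Examples with large |sensitivity| are those whose removal would perturb
# the fitted model the most; no retraining is needed to rank them.
top = np.argsort(-np.abs(sensitivity))[:5]
print("most sensitive training examples:", top)
```

Under this view, an example is influential only when it is both poorly predicted (large error) and poorly pinned down by the rest of the data (large variance); ranking by the product recovers a leave-one-out ordering without refitting the model n times.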
Published in
NIPS '23: Proceedings of the 37th International Conference on Neural Information Processing Systems
December 2023
80772 pages
Editors:
- A. Oh
- T. Naumann
- A. Globerson
- K. Saenko
- M. Hardt
- S. Levine
Copyright © 2023 Neural Information Processing Systems Foundation, Inc.
Publisher
Curran Associates Inc., Red Hook, NY, United States
Publication History
- Published: 30 May 2024
Qualifiers
- research-article
- Research
- Refereed limited