research-article | Free Access
- Authors:
- Peter Nickl, RIKEN Center for AI Project, Tokyo, Japan
- Lu Xu, RIKEN Center for AI Project, Tokyo, Japan
- Dharmesh Tailor, University of Amsterdam, Amsterdam, Netherlands
- Thomas Möllenhoff, RIKEN Center for AI Project, Tokyo, Japan
- Mohammad Emtiyaz Khan, RIKEN Center for AI Project, Tokyo, Japan
NIPS '23: Proceedings of the 37th International Conference on Neural Information Processing Systems, December 2023, Article No. 1170, Pages 26923–26949
Published: 30 May 2024
The Memory-Perturbation Equation: Understanding Model's Sensitivity to Data
ABSTRACT
Understanding a model's sensitivity to its training data is crucial but can also be challenging and costly, especially during training. To simplify such issues, we present the Memory-Perturbation Equation (MPE), which relates a model's sensitivity to perturbations in its training data. Derived using Bayesian principles, the MPE unifies existing sensitivity measures, generalizes them to a wide variety of models and algorithms, and unravels useful properties regarding sensitivities. Our empirical results show that sensitivity estimates obtained during training can be used to faithfully predict generalization on unseen test data. The proposed equation is expected to be useful for future research on robust and adaptive learning.
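To make the abstract's claim concrete, here is a minimal, hypothetical sketch, not the authors' released code, of the kind of sensitivity estimate the MPE unifies: for an L2-regularized logistic-regression model with a Laplace posterior approximation, the effect of removing training example i is approximated by its prediction error times its (linearized) prediction variance, in the spirit of classical leave-one-out and influence measures. All variable names and constants below are illustrative assumptions.

```python
# Hypothetical toy sketch (not the paper's implementation): MPE-style per-example
# sensitivity for L2-regularized logistic regression under a Laplace approximation,
# assuming sensitivity_i ~ prediction_error_i * prediction_variance_i.

import numpy as np

rng = np.random.default_rng(0)

# Toy binary-classification data (all sizes and names are illustrative).
n, d = 200, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = (X @ w_true + 0.5 * rng.normal(size=n) > 0).astype(float)

delta = 1.0  # L2 regularizer (prior precision)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Train with plain gradient descent on the regularized logistic loss.
w = np.zeros(d)
for _ in range(2000):
    p = sigmoid(X @ w)
    grad = X.T @ (p - y) + delta * w   # gradient of sum-loss + L2 term
    w -= 0.1 * grad / n

# Laplace approximation: posterior covariance = inverse Hessian at the optimum.
p = sigmoid(X @ w)
lam = p * (1.0 - p)                                  # per-example Hessian weights
H = (X * lam[:, None]).T @ X + delta * np.eye(d)
Sigma = np.linalg.inv(H)

# Sensitivity estimate: prediction error e_i times prediction variance v_i.
pred_error = p - y                                   # e_i
pred_var = np.einsum("nd,df,nf->n", X, Sigma, X)     # v_i = x_i^T Sigma x_i
sensitivity = pred_error * pred_var

# Examples with large |sensitivity| are those whose removal would perturb
# the fitted model the most; no retraining is needed to rank them.
top = np.argsort(-np.abs(sensitivity))[:5]
print("most sensitive training examples:", top)
```

Under this view, an example is influential only when it is both poorly predicted (large error) and poorly pinned down by the rest of the data (large variance); ranking by the product recovers a leave-one-out ordering without refitting the model n times.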
Published in
NIPS '23: Proceedings of the 37th International Conference on Neural Information Processing Systems
December 2023
80772 pages
Editors:
- A. Oh
- T. Naumann
- A. Globerson
- K. Saenko
- M. Hardt
- S. Levine
Copyright © 2023 Neural Information Processing Systems Foundation, Inc.
Publisher
Curran Associates Inc., Red Hook, NY, United States
Publication History
- Published: 30 May 2024
Qualifiers
- research-article
- Research
- Refereed limited