In the contemporary digital landscape, industry relies heavily on artificial intelligence, which in turn depends fundamentally on machine learning. Machine learning feeds immense amounts of data into structures called models; this data "trains" the model, and abundant data is used so that the resulting model becomes as accurate as possible. However, this reliance on abundant data poses a significant risk to user privacy and directly challenges the "right to be forgotten". There is an intricate relationship between a model and the data on which it was trained: traditional data management systems can easily erase user information from databases, but the situation becomes considerably more complex with machine learning models. This gives rise to an entirely new concept called machine unlearning. This project addresses the challenge by developing a standalone tool and API specifically designed to make machine learning models forget data. Our objective is to pioneer a practical approach to enhancing user privacy in the context of machine learning technologies. By creating an efficient and reliable solution, we aim to bridge the gap between data privacy rights and the intricate workings of machine learning models, and thereby contribute to the evolving discourse on privacy, data security, and ethical AI practices in the digital age.
Introduction
Overview
In the era of massive user data generation, privacy regulations like GDPR and CCPA have introduced the "Right to be Forgotten", which mandates that personal data must be removed from systems, including machine learning (ML) models, upon user request. While deleting data from databases is straightforward, removing it from trained ML models is complex and resource-intensive. This has led to the emergence of a new concept called Machine Unlearning, which allows selective removal of user data from trained models without full retraining.
1. Existing Systems
Current unlearning approaches face several limitations:
Require access to original or proxy data.
Incur high computational/storage costs.
May not guarantee full data forgetting.
Can negatively affect model performance or introduce bias.
2. Proposed System
The proposed framework offers a practical, Python-based solution for machine unlearning:
Removes user data from ML models without full retraining (contrast with the retraining baseline sketched after this list).
Compatible with TensorFlow, PyTorch, and Scikit-learn.
Preserves model performance and generalization.
Supports privacy and fairness regulations.
Designed for ease of use with a user-friendly interface and documentation.
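To ground what "without full retraining" buys us, the snippet below shows the naive baseline the framework aims to avoid: exact unlearning by retraining on the retained data only. It is a minimal scikit-learn sketch; the dataset, model, and forget indices are illustrative assumptions, not part of the project's actual API.

```python
# Baseline "exact" unlearning: retrain on the retained data only.
# This is the gold standard that approximate unlearning tries to match cheaply.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

forget_idx = np.array([3, 17, 42])       # points a user asked us to forget
retain_mask = np.ones(len(X), dtype=bool)
retain_mask[forget_idx] = False

# Exact unlearning: the retrained model provably contains no trace of the
# forgotten points, but the cost is a full retraining pass over the data.
unlearned = LogisticRegression(max_iter=1000).fit(X[retain_mask], y[retain_mask])
```

The retraining cost is what motivates the approximate methods discussed in the following sections.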
3. Scope
The project targets:
Selective forgetting of specific data points.
Upholding privacy, ethics, and security.
Enabling user control over their data in AI systems.
4. System Design
Typical ML pipeline: Data → Preprocessing → Training → Evaluation → Deployment.
In unlearning: the system identifies and modifies the model parameters influenced by the unwanted data (a minimal sketch follows this list).
Emphasizes transparency, compliance, and iterative refinement.
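As a concrete illustration of the "identify and modify" step, here is a minimal PyTorch sketch: the gradient of the loss on a forget batch identifies which parameters that data influenced, and one gradient-ascent step moves the model away from it. The model, batch, and step size are assumptions made for illustration; this is a first-order approximation, not a certified deletion procedure.

```python
# Minimal sketch: one first-order "forgetting" step on a forget batch.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
loss_fn = nn.CrossEntropyLoss()

x_forget = torch.randn(8, 20)            # batch the user wants removed
y_forget = torch.randint(0, 2, (8,))

# Identify: gradients of the loss on the forget batch indicate which
# parameters were influenced by this data, and by how much.
loss = loss_fn(model(x_forget), y_forget)
grads = torch.autograd.grad(loss, model.parameters())

# Modify: move parameters *up* the loss surface for the forget batch
# (gradient ascent), scaled by a small step size.
lr = 1e-2
with torch.no_grad():
    for p, g in zip(model.parameters(), grads):
        p.add_(lr * g)
```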
5. Federated Unlearning
Federated Learning trains models across decentralized data on client devices. Unlearning in this context means:
Removing a client's data contribution from the global model.
Uses the unlearn_global_model parameter to coordinate data removal system-wide.
Uses Federated Averaging (FedAvg) to update the global model using only remaining clients' data.
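A minimal sketch of that re-aggregation step, assuming each client's contribution is a flat parameter vector and using plain NumPy; the fedavg helper and the client dictionaries are hypothetical names for illustration:

```python
# Federated unlearning via re-aggregation: after removing a client,
# FedAvg recomputes the global model from the remaining clients' updates,
# weighted by their local data sizes.
import numpy as np

def fedavg(client_weights, client_sizes):
    """Weighted average of client parameter vectors (FedAvg)."""
    sizes = np.asarray(client_sizes, dtype=float)
    stacked = np.stack(client_weights)               # (num_clients, dim)
    return (stacked * (sizes / sizes.sum())[:, None]).sum(axis=0)

# Five clients, each holding a local parameter vector.
clients = {i: np.random.randn(10) for i in range(5)}
sizes = {i: 100 + 20 * i for i in range(5)}

# Client 2 requests removal: rebuild the global model without it.
remaining = [i for i in clients if i != 2]
global_model = fedavg([clients[i] for i in remaining],
                      [sizes[i] for i in remaining])
```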
6. Implementation: Types of Unlearning Algorithms
A. Model-Agnostic Unlearning
Independent of the model’s architecture.
Uses statistical/data mining techniques to remove unwanted data.
Offers flexibility and wide compatibility across ML models.
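A widely used model-agnostic strategy from the unlearning literature is sharded training in the spirit of SISA: partition the training set, train one sub-model per shard, and on a deletion request retrain only the shard that held the point. The sketch below assumes scikit-learn classifiers and majority-vote prediction; the shard count and helper functions are illustrative.

```python
# Model-agnostic unlearning via sharding (SISA-style sketch).
# Only the shard containing a forgotten point is retrained.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=900, n_features=20, random_state=0)
n_shards = 3
shard_idx = np.array_split(np.arange(len(X)), n_shards)

def train_shard(idx):
    return LogisticRegression(max_iter=1000).fit(X[idx], y[idx])

models = [train_shard(idx) for idx in shard_idx]

def forget(point):
    """Remove one training point and retrain only its shard."""
    for s, idx in enumerate(shard_idx):
        if point in idx:
            shard_idx[s] = idx[idx != point]
            models[s] = train_shard(shard_idx[s])
            return

forget(42)  # deletion request for training point 42

# Predict by majority vote across the shard models.
votes = np.stack([m.predict(X[:5]) for m in models])
pred = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
```

Because each sub-model only ever saw its own shard, the retrained shard model carries no trace of the forgotten point, regardless of the underlying model class.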
B. Model-Intrinsic Unlearning
Tailored to specific ML architectures.
Offers fine-grained control over which data/features to remove.
Uses optimization techniques like gradient-based methods to minimize performance loss.
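One common gradient-based formulation optimizes a joint objective: ascend the loss on the forget set while descending it on the retained set, so that forgetting does not destroy overall accuracy. The loop below is an illustrative PyTorch sketch; the model, data, weighting factor, and step count are all assumptions rather than the framework's actual procedure.

```python
# Sketch: gradient-based model-intrinsic unlearning balancing forgetting
# (ascent on the forget set) against utility (descent on the retain set).
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
loss_fn = nn.CrossEntropyLoss()
opt = torch.optim.SGD(model.parameters(), lr=1e-2)

x_forget, y_forget = torch.randn(8, 20), torch.randint(0, 2, (8,))
x_retain, y_retain = torch.randn(64, 20), torch.randint(0, 2, (64,))

for _ in range(50):
    opt.zero_grad()
    # The negative sign turns minimization into ascent on the forget loss;
    # the retain term keeps overall performance from collapsing.
    loss = (loss_fn(model(x_retain), y_retain)
            - 0.5 * loss_fn(model(x_forget), y_forget))
    loss.backward()
    opt.step()
```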
Conclusion
A. Framework Conclusion
In summary, the machine unlearning framework represents a significant advance in machine learning, offering efficient solutions for addressing data privacy concerns and enabling the selective removal of data from models. Its ability to provide a range of tools and to combine algorithms according to specific requirements makes it a versatile and powerful means of managing and controlling data while maintaining model integrity and performance.