Master Thesis: Semantically Aware Attacks on Text-Based Models
AI Sweden is now looking for master thesis student(s) to further strengthen the LeakPro team.
Artificial intelligence is transforming society. AI Sweden is the national centre for applied artificial intelligence, jointly funded by the Swedish government and our partners, both public and private. Our mission is to accelerate the use of AI for the benefit of our society, our competitiveness, and for everyone living in Sweden.
Introduction
Training deep-learning models requires large amounts of data. When this data is sensitive, e.g., contains personal information, it is important to ensure that no information can be extracted from the trained models. Lately, interest in adversarial attempts to extract training data has grown. Two prominent attacks are membership inference attacks, which attempt to guess whether a given data point was present in the training data, and reconstruction attacks, also called model-inversion attacks, which attempt to recreate training data by interacting with the trained model.
Although such attacks are relevant for any data modality, perhaps the most pressing issue pertains to text data, where the question of copyright has recently attracted media attention due to the lawsuit against OpenAI [8]. In light of this, there is a pressing need for content creators to confidently test whether or not their outputs have been included and leveraged in the training of commercial models. Membership inference attacks offer a promising avenue for such an assessment.
Project Background
AI Sweden is currently leading a project on adversarial information extraction from trained machine learning models. The project is called LeakPro and is a collaboration including RISE, Sahlgrenska, Region Halland, AstraZeneca, Syndata, and Scaleout. The main goal of LeakPro is to create an open-source tool for stress-testing trained machine learning models to understand the risk of leaking sensitive information from the training data. Currently, LeakPro supports image, tabular, and graph data; however, the platform is designed to be data-modality agnostic.
In a membership inference attack, an adversary is assumed to have access to a trained model θ and a data sample d, drawn from the same distribution as the unknown training data. The goal of the adversary is to construct an algorithm A(θ, d) → {0, 1}, whose binary output is a guess of whether d was part of the unknown training data.
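To make this concrete, below is a minimal sketch of a classic loss-thresholding attack on a causal language model. It is purely illustrative: the model name and threshold value are placeholder assumptions, not part of the project description.

```python
# Illustrative loss-thresholding membership inference attack A(theta, d) -> {0, 1}.
# The target model and threshold are placeholders, not project specifics.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # stand-in for the target model theta
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME).eval()

def attack(text: str, threshold: float = 3.5) -> int:
    """Guess 1 (member) if the model's average token loss on the sample is low."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # Passing labels=input_ids makes the model return the mean cross-entropy.
        loss = model(**inputs, labels=inputs["input_ids"]).loss.item()
    return int(loss < threshold)  # unusually low loss hints at memorization
```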
For text, the definition of a data point being part of a dataset is not clear-cut. For example, in the most common approach to membership inference attacks, the adversary attempts to deduce whether a specific sentence was part of the dataset [5, 6, 3, 9]. However, this definition does not account for different text snippets having equivalent semantics, something that was recently pointed out in [7]. Moreover, in many situations one is interested in inferring more than just a single sentence, in which case it may become even more important to consider the semantics. An attack along this direction has been proposed for RAG-based LLMs [4].
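As one concrete way to measure semantic similarity between text snippets (an assumption here, not prescribed by the project), cosine similarity of sentence embeddings is a common choice:

```python
# Semantic similarity via sentence embeddings; the embedding model is a
# placeholder choice, and the example sentences are hypothetical.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")

a = "The patient was discharged on Monday."
b = "On Monday, the patient left the hospital."

embeddings = encoder.encode([a, b], convert_to_tensor=True)
score = util.cos_sim(embeddings[0], embeddings[1]).item()
print(f"semantic similarity: {score:.3f}")  # near 1.0 for paraphrases
```

Under such a metric, a semantically aware attack could declare membership when a candidate is sufficiently similar to training text, rather than requiring an exact, verbatim match.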
Outline
The goal of this project is to investigate how semantics may be incorporated into membership inference attacks on generative text-based models. Although the literature is nascent, there are already attacks being proposed [5, 6, 3, 9] with some also considering the importance of semantics [7, 4]. The objectives of the project are outlined below.
1. Literature study of text-based membership inference attacks: The goal of this part is to summarize i) different ways to think about membership inference for text, e.g., sentence level vs. corpus level, ii) how to think about and measure semantic similarity, iii) which membership inference attacks and settings are relevant, and iv) relevant benchmark datasets.
2. Implementation and evaluation of benchmark methods: From the literature study, formulate a benchmark suite involving a threat model, dataset(s), and relevant models. Examples include black-box access to a fine-tuned LLM where the adversary tries to infer fine-tuning data, or black-box access to a RAG-based LLM where the adversary attempts to infer entries in the RAG database. The attacks should be evaluated in a meaningful way, following the TPR@FPR approach in [2]; a small sketch of this metric is given after this list.
3. Enhanced membership inference attacks: Based on the literature survey and the benchmark suite, we shall next attempt to improve on the current state of the art by incorporating knowledge from attacks on other modalities (7+ experts are actively working on this within LeakPro). There are already several ideas for this step, stemming from a 2-month project conducted during the summer of 2024.
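For step 2, the TPR@FPR evaluation referenced above can be computed from membership scores and ground-truth labels. A minimal sketch, with hypothetical data, might look as follows:

```python
# True-positive rate at a fixed low false-positive rate, in the spirit of [2].
# The scores and labels below are hypothetical placeholders.
import numpy as np
from sklearn.metrics import roc_curve

def tpr_at_fpr(labels, scores, target_fpr=0.01):
    """Read TPR off the ROC curve at the target FPR (linear interpolation)."""
    fpr, tpr, _ = roc_curve(labels, scores)
    return float(np.interp(target_fpr, fpr, tpr))

labels = np.array([1, 0, 1, 0, 1, 0])              # 1 = member, 0 = non-member
scores = np.array([0.9, 0.2, 0.7, 0.4, 0.8, 0.1])  # higher = more likely member
print(f"TPR@1%FPR: {tpr_at_fpr(labels, scores):.3f}")
```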
If time permits and the student is interested, there is also an opportunity to contribute to the open-source platform LeakPro, which is currently under development [1]. Several contributions would be of interest, e.g., a taxonomy of text-based membership inference attacks and their key components, an analysis of similarities between text-based attacks and attacks on other modalities, and/or implementation of the benchmarks/novel attacks in LeakPro.
Contact
Johan Östman: johan.ostman@ai.se
Fazeleh Hoseini: fazeleh.hoseini@ai.se
References
- Sayanton V. Dibbo. SoK: Model inversion attack landscape: Taxonomy, challenges, and future roadmap. In 2023 IEEE 36th Computer Security Foundations Symposium (CSF), pages 439–456. IEEE, 2023.
- AI Sweden et al. LeakPro: Leakage profiling and risk oversight of machine learning models. LeakPro, 2024. Accessed: 2024-09-17.
- Rongke Liu, Dong Wang, Yizhi Ren, Zhen Wang, Kaitian Guo, Qianqian Qin, and Xiaolei Liu. Unstoppable attack: Label-only model inversion via conditional diffusion model. IEEE Transactions on Information Forensics and Security, 19, 2024.
- Bao-Ngoc Nguyen, Keshigeyan Chandrasegaran, Milad Abdollahzadeh, and Ngai-Man Cheung. Label-only model inversion attacks via knowledge transfer. Advances in Neural Information Processing Systems, 36, 2024.
- Ngoc-Bao Nguyen, Keshigeyan Chandrasegaran, Milad Abdollahzadeh, and Ngai-Man Cheung. Rethinking model inversion attacks against deep neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023.
- Ahmed Salem, Giovanni Cherubin, David Evans, Boris Köpf, Andrew Paverd, Anshuman Suri, Shruti Tople, and Santiago Zanella-Béguelin. SoK: Let the privacy games begin! A unified treatment of data inference privacy in machine learning. In IEEE Symposium on Security and Privacy (SP), 2023.
- Xiaoxiao Sun, Nidham Gazagnadou, Vivek Sharma, Lingjuan Lyu, Hongdong Li, and Liang Zheng. Privacy assessment on reconstructed images: Are existing evaluation metrics faithful to human perception? Advances in Neural Information Processing Systems, 36, 2024.
- Yuheng Zhang, Ruoxi Jia, Hengzhi Pei, Wenxiao Wang, Bo Li, and Dawn Song. The secret revealer: Generative model-inversion attacks against deep neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 253–261, 2020.
Application closes November 10th. You can apply for this thesis alone or as a pair of students. The LeakPro team is mostly located in Gothenburg, but remote work is possible.
AI Sweden does not accept unsolicited support and kindly asks not to be contacted by advertisement agents, recruitment agencies, or manning companies.
Locations
- Flexible location, Sweden