Master Thesis: Similarity-based Inference Attacks
AI Sweden is now looking for master thesis student(s) to further strengthen the LeakPro team.
Artificial intelligence is transforming society. AI Sweden is the national centre for applied artificial intelligence, jointly funded by the Swedish government and our partners, both public and private. Our mission is to accelerate the use of AI for the benefit of our society, our competitiveness, and for everyone living in Sweden.
Introduction
Machine learning models are now indispensable across numerous sectors, from healthcare to finance, where they routinely handle sensitive personal data. While these models offer significant benefits, they also raise critical privacy concerns. One of the most pressing issues is the potential for adversaries to deduce whether specific data points were part of a model’s training set, a vulnerability exploited through Membership Inference Attacks (MIAs). These attacks pose serious privacy risks, allowing malicious actors to infer sensitive information about individuals.
Much of the existing research focuses on MIAs that target exact data points in the training set [4, 8]. However, an important and often overlooked threat lies in range membership inference attacks [7, 6]. These attacks exploit the similarities between new data and training data, allowing adversaries to infer information about data points that are close—but not identical—to those used in training. This gap in the literature represents a significant privacy risk, as these near-identical data points can contain similarly sensitive information.
This thesis will investigate range membership inference attacks and evaluate their impact on the privacy of machine learning models. By extending the scope of traditional MIAs, the goal is to establish a broader understanding of actual risks when adversaries are not interested in inferring exact data points but only approximations.
Project Background
AI Sweden is leading a project to develop an open-source privacy auditing tool called LeakPro, designed to assess information leakage risks in machine learning models. This initiative, undertaken in collaboration with RISE, Sahlgrenska, Region Halland, AstraZeneca, Syndata, and Scaleout, aims to evaluate the risk of sensitive information disclosure when models trained on confidential data are made publicly available.
LeakPro supports a variety of data types, including images, tabular data, and graph structures. At the core of this tool are MIAs, which represent the most basic privacy risk, i.e., determining whether a specific data point was included in the training set. These attacks are foundational as they constitute building blocks for more sophisticated attacks.
In MIAs, the adversary’s goal is to determine if a particular data point, denoted as d, was part of the model’s training set. By interacting with a trained model, θ, the adversary constructs an algorithm, A(θ, d) → {0, 1}, where the binary output indicates whether d is believed to be a member of the training set. However, in many real-world scenarios, direct access to the actual training data d is impractical, especially when the data involves sensitive or personal information.
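The decision rule A(θ, d) → {0, 1} can be sketched as follows. This is a minimal toy illustration, not a real attack: the logistic confidence score and the fixed threshold τ are stand-ins for the calibrated statistics used in practice (e.g., the shadow-model calibration of [4]).

```python
import numpy as np

def mia(theta, d, tau=0.8):
    """Toy A(θ, d) -> {0, 1}: flag d as a training-set member when the
    model's confidence on d exceeds a threshold tau. The 'model' here is
    a stub: a logistic score of the inner product theta @ d, chosen purely
    for illustration (real attacks query a trained model)."""
    confidence = 1.0 / (1.0 + np.exp(-theta @ d))
    return int(confidence > tau)
```

The intuition the sketch captures is that models tend to be more confident on points they were trained on, so an unusually high confidence is (weak) evidence of membership.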
To address this challenge, the goal of this thesis is to explore range membership inference attacks, which take a more realistic approach. Instead of identifying specific training data points, the adversary aims to determine if any data within a given range, R, overlaps with the model’s training set. The adversary’s goal here is to construct an algorithm, A(θ,R) → {0, 1}, that outputs whether R contains any training points. This extension of MIAs offers a critical advancement towards understanding privacy risks in more complex and realistic settings.
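Under the same toy model, a naive baseline for A(θ, R) might sample candidate points from R and aggregate the point-wise decisions. The ball-shaped range, the uniform sampling, and the OR-aggregation below are illustrative assumptions, not the method of [7].

```python
import numpy as np

def point_mia(theta, d, tau=0.8):
    """Point-wise A(θ, d): a toy logistic-confidence threshold (see above)."""
    return int(1.0 / (1.0 + np.exp(-theta @ d)) > tau)

def range_mia(theta, center, radius, n_samples=200, tau=0.8, seed=0):
    """Naive A(θ, R) -> {0, 1} for the ball R = {x : ||x - center|| <= radius}:
    draw points uniformly from R and report 1 if any sample (or the center
    itself) triggers the point-wise attack. Both the sampling strategy and
    the OR-aggregation are illustrative design choices."""
    rng = np.random.default_rng(seed)
    for _ in range(n_samples):
        # Uniform sample in the ball: random direction, radius scaled by U^(1/dim).
        v = rng.normal(size=center.shape)
        v *= radius * rng.uniform() ** (1.0 / center.size) / np.linalg.norm(v)
        if point_mia(theta, center + v, tau):
            return 1
    return point_mia(theta, center, tau)
```

Note that this brute-force sampling scales poorly with the dimension of the data; part of the research challenge is designing range attacks that exploit the model's behavior over R more cleverly than exhaustive point-wise querying.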
Outline
The primary goal of this thesis is to explore the privacy risks associated with range membership inference attacks, where an adversary attempts to infer data points that are sufficiently similar to the training data, extending beyond traditional membership inference. Range MIAs are nascent with only few papers being available [7, 6], hence, this thesis is at the very forefront of research. Moreover, there are interesting connections to adjacent fields including range searching within computational geometry [1], range querying in databases [3], and reconstruction attacks [5]. The specific objectives are outlined as follows:
1. Literature study of membership inference attacks: Conduct an extensive review of existing membership inference attacks, focusing on those that can be adapted to range membership inference.
2. Define and formalize range membership inference: Develop a formal definition of range membership inference, including a notion of point similarity and a procedure for performing membership inference over multiple points within a given range.
3. Implementation and evaluation of benchmark methods: From the literature study, formulate a benchmark suite involving a threat model, dataset(s), and relevant models.
4. Enhanced membership inference attacks: Based on the literature survey and the benchmark suite, attempt to improve the current state of the art by incorporating knowledge from other attacks (7+ experts are actively working on this within LeakPro).
If time permits and the student is interested, there is also an opportunity to contribute to the open-source platform LeakPro, which is currently under development [2]. Several contributions would be of interest here, e.g., a taxonomy of range membership inference attacks and/or integration of the benchmarks and novel attacks into LeakPro.
Contact
Johan Östman: johan.ostman@ai.se
Fazeleh Hoseini: fazeleh.hoseini@ai.se
References
[1] Pankaj K Agarwal. Range searching. In Handbook of discrete and computational geometry, pages 1057–1092. Chapman and Hall/CRC, 2017.
[2] AI Sweden et al. Leakpro: Leakage profiling and risk oversight of machine learning models. https://github.com/aidotse/LeakPro.
[3] Dmytro Bogatov, George Kollios, and Leonid Reyzin. A comparative evaluation of order-revealing encryption schemes and secure range-query protocols. Proceedings of the VLDB Endowment, 12(8), 2019.
[4] Nicholas Carlini, Steve Chien, Milad Nasr, Shuang Song, Andreas Terzis, and Florian Tramer. Membership inference attacks from first principles. In IEEE Symposium on Security and Privacy (SP), 2022.
[5] Marie-Sarah Lacharité, Brice Minaud, and Kenneth G Paterson. Improved reconstruction attacks on encrypted data using range query leakage. In IEEE Symposium on Security and Privacy (SP), 2018.
[6] Hamid Mozaffari and Virendra J. Marathe. Semantic membership inference attack against large language models. arXiv:2406.10218 [cs.LG], 2024.
[7] Jiashu Tao and Reza Shokri. Range membership inference attacks. arXiv:2408.05131 [cs.LG], 2024.
[8] Sajjad Zarifzadeh, Philippe Liu, and Reza Shokri. Low-cost high-power membership inference attacks. In International Conference on Machine Learning, 2024.
Applications close November 10th. You can apply for this thesis alone or as a pair of students. The LeakPro team is mostly located in Gothenburg, but remote work is possible.
AI Sweden does not accept unsolicited support and kindly asks not to be contacted by advertisement agents, recruitment agencies, or staffing companies.
Locations
- Flexible location, Sweden