AISec '23
Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security
Last Updated: 26 November 2023
SESSION: Session 1: Privacy-Preserving Machine Learning
Differentially Private Logistic Regression with Sparse Solutions
- Amol Khanna
- Fred Lu
- Edward Raff
- Brian Testa
LASSO regularized logistic regression is particularly useful for its built-in feature selection, allowing coefficients to be removed from deployment and producing sparse solutions. Differentially private versions of LASSO logistic regression have been developed, but generally produce dense solutions, reducing the intrinsic utility of the LASSO penalty. In this paper, we present a differentially private method for sparse logistic regression that maintains hard zeros. Our key insight is to first train a non-private LASSO logistic regression model to determine an appropriate privatized number of non-zero coefficients to use in final model selection. To demonstrate our method's performance, we run experiments on synthetic and real-world datasets.
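A minimal sketch of the two-stage idea above, assuming the non-zero count is privatized with a Laplace mechanism and a sensitivity of 1 (the mechanism, the scikit-learn model, and the final selection step are illustrative assumptions, not the authors' exact algorithm):

```python
# Hypothetical sketch: privatize the support size of a non-private LASSO fit.
# The Laplace mechanism and the sensitivity of 1 are assumptions for
# illustration, not the paper's exact method.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, n_features=50, n_informative=5,
                           random_state=0)

# Step 1: non-private LASSO logistic regression to estimate the support size.
lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
k = int(np.count_nonzero(lasso.coef_))

# Step 2: privatize the count k with Laplace noise at privacy budget epsilon.
epsilon = 1.0
k_private = max(1, int(round(k + rng.laplace(scale=1.0 / epsilon))))
print(f"non-private support size: {k}, privatized target sparsity: {k_private}")

# Step 3 (not shown): a differentially private training procedure would then
# produce the final model restricted to k_private non-zero coefficients.
```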
Equivariant Differentially Private Deep Learning: Why DP-SGD Needs Sparser Models
- Florian A. Hölzl
- Daniel Rueckert
- Georgios Kaissis
Differentially Private Stochastic Gradient Descent (DP-SGD) limits the amount of private information deep learning models can memorize during training. This is achieved by clipping and adding noise to the model's gradients, and thus networks with more parameters require proportionally stronger perturbation. As a result, large models have difficulties learning useful information, rendering training with DP-SGD exceedingly difficult on more challenging training tasks. Recent research has focused on combating this challenge through training adaptations such as heavy data augmentation and large batch sizes. However, these techniques further increase the computational overhead of DP-SGD and reduce its practical applicability. In this work, we propose using the principle of sparse model design to solve precisely such complex tasks with fewer parameters, higher accuracy, and in less time, thus serving as a promising direction for DP-SGD. We achieve such sparsity by design by introducing equivariant convolutional networks for model training with Differential Privacy. Using equivariant networks, we show that small and efficient architecture design can outperform current state-of-the-art with substantially lower computational requirements. On CIFAR-10, we achieve an increase of up to 9% in accuracy while reducing the computation time by more than 85%. Our results are a step towards efficient model architectures that make optimal use of their parameters and bridge the privacy-utility gap between private and non-private deep learning for computer vision.
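The clip-and-noise step referred to above can be illustrated with a minimal per-example DP-SGD update; the NumPy sketch below is generic and does not reflect the authors' equivariant architectures or training setup:

```python
# Minimal per-example DP-SGD step: clip each example's gradient, average,
# then add Gaussian noise scaled to the clipping norm. Parameter values are
# illustrative only.
import numpy as np

def dp_sgd_step(params, per_example_grads, lr=0.1, clip_norm=1.0,
                noise_multiplier=1.1, rng=np.random.default_rng(0)):
    # Clip each per-example gradient to an L2 norm of at most clip_norm.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    clipped = per_example_grads * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    # Average and perturb; every parameter coordinate is noised, which is why
    # larger models need proportionally stronger perturbation.
    batch_size = per_example_grads.shape[0]
    noisy_grad = clipped.mean(axis=0) + rng.normal(
        scale=noise_multiplier * clip_norm / batch_size, size=params.shape)
    return params - lr * noisy_grad

params = np.zeros(10)
grads = np.random.default_rng(1).normal(size=(32, 10))  # one gradient per example
params = dp_sgd_step(params, grads)
```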
Probing the Transition to Dataset-Level Privacy in ML Models Using an Output-Specific and Data-Resolved Privacy Profile
- Tyler LeBlond
- Joseph Munoz
- Fred Lu
- Maya Fuchs
- Elliot Zaresky-Williams
- Edward Raff
- Brian Testa
Differential privacy (DP) is the prevailing technique for protecting user data in machine learning models. However, deficits of this framework include a lack of clarity in selecting the privacy budget ε and a lack of quantification of the privacy leakage for a particular data row by a particular trained model. We make progress toward addressing these limitations, and offer a new perspective from which to visualize DP results, by studying a privacy metric that quantifies the extent to which a model trained on a dataset using a DP mechanism is "covered" by each of the distributions resulting from training on neighboring datasets. We connect this coverage metric to what has been established in the literature and use it to rank the privacy of individual samples from the training set in what we call a privacy profile. We additionally show that the privacy profile can be used to probe an observed transition to indistinguishability that takes place in the neighboring distributions as ε decreases, which we suggest is a tool that can enable the selection of ε by the ML practitioner wishing to make use of DP.
Information Leakage from Data Updates in Machine Learning Models
- Tian Hui
- Farhad Farokhi
- Olga Ohrimenko
In this paper, we consider the setting where machine learning models are retrained on updated datasets in order to incorporate the most up-to-date information or reflect distribution shifts. We investigate whether one can infer information about these updates in the training data (e.g., changes to attribute values of records). Here, the adversary has access to snapshots of the machine learning model before and after the change in the dataset occurs. Contrary to the existing literature, we assume that an attribute of one or more training data points is changed, rather than entire data records being removed or added. We propose attacks based on the difference in the prediction confidence of the original model and the updated model. We evaluate our attack methods on two public datasets along with multi-layer perceptron and logistic regression models. We validate that two snapshots of the model can result in higher information leakage in comparison to having access to only the updated model. Moreover, we observe that data records with rare values are more vulnerable to attacks, which points to the disparate vulnerability of privacy attacks in the update setting. When multiple records with the same original attribute value are updated to the same new value (i.e., repeated changes), the attacker is more likely to correctly guess the updated values since repeated changes leave a larger footprint on the trained model. These observations point to the vulnerability of machine learning models to attribute inference attacks in the update setting.
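The signal exploited by the attack, the change in prediction confidence between the two model snapshots, can be sketched as follows; the toy data, scikit-learn models, and the argmax scoring rule are assumptions for illustration rather than the paper's exact attack:

```python
# Illustrative confidence-difference attack on a model update. An adversary
# holding snapshots before and after the update probes candidate attribute
# values and guesses the one whose confidence increased the most.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_old = rng.normal(size=(60, 5))
y = (X_old[:, 0] + X_old[:, 1] > 0).astype(int)

X_new = X_old.copy()
X_new[0, 0] = 3.0                      # record 0 has attribute 0 updated

model_old = LogisticRegression().fit(X_old, y)
model_new = LogisticRegression().fit(X_new, y)

record = X_old[0].copy()
candidates = np.linspace(-3, 3, 13)    # candidate new values for attribute 0
scores = []
for v in candidates:
    probe = record.copy()
    probe[0] = v
    p_old = model_old.predict_proba(probe.reshape(1, -1))[0, y[0]]
    p_new = model_new.predict_proba(probe.reshape(1, -1))[0, y[0]]
    scores.append(p_new - p_old)       # confidence gained after the update

print("guessed updated value:", candidates[int(np.argmax(scores))])
```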
Membership Inference Attacks Against Semantic Segmentation Models
- Tomas Chobola
- Dmitrii Usynin
- Georgios Kaissis
Membership inference attacks aim to infer whether a data record has been used to train a target model by observing its predictions. In sensitive domains such as healthcare, this can constitute a severe privacy violation. In this work we attempt to address an existing knowledge gap by conducting an exhaustive study of membership inference attacks and defences in the domain of semantic image segmentation. Our findings indicate that for certain threat models, these learning settings can be considerably more vulnerable than the previously considered classification settings. We quantitatively evaluate the attacks on a number of popular model architectures across a variety of semantic segmentation tasks, demonstrating that membership inference attacks in this domain can achieve a high success rate and defending against them may result in unfavourable privacy-utility trade-offs or increased computational costs.
Utility-preserving Federated Learning
- Reza Nasirigerdeh
- Daniel Rueckert
- Georgios Kaissis
We investigate the concept of utility-preserving federated learning (UPFL) in the context of deep neural networks. We theoretically prove and experimentally validate that UPFL achieves the same accuracy as centralized training independent of the data distribution across the clients. We demonstrate that UPFL can take full advantage of the momentum and weight decay techniques, as centralized training does, but it incurs substantial communication overhead. Ordinary federated learning, on the other hand, provides much higher communication efficiency, but it can only partially benefit from the aforementioned techniques to improve utility. Given this, we propose a method called weighted gradient accumulation that gains more of the benefit of momentum and weight decay, akin to UPFL, while providing practical communication efficiency similar to ordinary federated learning.
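One plausible reading of weighted gradient accumulation, sketched below purely as an assumption (the paper's exact weighting and update schedule may differ): clients accumulate gradients over local batches, the server averages them weighted by local sample counts, and momentum plus weight decay are applied once to the aggregated gradient:

```python
# Hypothetical sketch of weighted gradient accumulation in a federated setting:
# clients send gradients accumulated over local batches, the server averages
# them weighted by local sample counts and applies momentum and weight decay
# once to the aggregate. The weighting and update schedule are assumptions.
import numpy as np

def server_update(w, client_grads, client_sizes, velocity,
                  lr=0.1, momentum=0.9, weight_decay=1e-4):
    weights = np.asarray(client_sizes) / sum(client_sizes)
    g = sum(wt * grad for wt, grad in zip(weights, client_grads))
    g = g + weight_decay * w                    # weight decay on the global model
    velocity = momentum * velocity + g          # momentum on the aggregated gradient
    return w - lr * velocity, velocity

rng = np.random.default_rng(0)
w, velocity = np.zeros(4), np.zeros(4)
sizes = [120, 80, 200, 50, 150]                 # local sample counts per client
for _ in range(3):                              # three communication rounds
    grads = [rng.normal(size=4) for _ in sizes] # accumulated gradient per client
    w, velocity = server_update(w, grads, sizes, velocity)
print(w)
```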
SESSION: Session 2: Machine Learning Security
Certifiers Make Neural Networks Vulnerable to Availability Attacks
- Tobias Lorenz
- Marta Kwiatkowska
- Mario Fritz
To achieve reliable, robust, and safe AI systems, it is vital to implement fallback strategies when AI predictions cannot be trusted. Certifiers for neural networks are a reliable way to check the robustness of these predictions. They guarantee for some predictions that a certain class of manipulations or attacks could not have changed the outcome. For the remaining predictions without guarantees, the method abstains from making a prediction, and a fallback strategy needs to be invoked, which typically incurs additional costs, can require a human operator, or even fail to provide any prediction. While this is a key concept towards safe and secure AI, we show for the first time that this approach comes with its own security risks, as such fallback strategies can be deliberately triggered by an adversary. In addition to naturally occurring abstains for some inputs and perturbations, the adversary can use training-time attacks to deliberately trigger the fallback with high probability. This transfers the main system load onto the fallback, reducing the overall system's integrity and/or availability. We design two novel availability attacks, which show the practical relevance of these threats. For example, adding 1% poisoned data during training is sufficient to trigger the fallback and hence make the model unavailable for up to 100% of all inputs by inserting the trigger. Our extensive experiments across multiple datasets, model architectures, and certifiers demonstrate the broad applicability of these attacks. An initial investigation into potential defenses shows that current approaches are insufficient to mitigate the issue, highlighting the need for new, specific solutions.
Not What You've Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection
- Sahar Abdelnabi
- Kai Greshake
- Shailesh Mishra
- Christoph Endres
- Thorsten Holz
- Mario Fritz
Large Language Models (LLMs) are increasingly being integrated into applications, with versatile functionalities that can be easily modulated via natural language prompts. So far, it was assumed that the user is directly prompting the LLM. But, what if it is not the user prompting? We show that LLM-Integrated Applications blur the line between data and instructions and reveal several new attack vectors, using Indirect Prompt Injection, that enable adversaries to remotely (i.e., without a direct interface) exploit LLM-integrated applications by strategically injecting prompts into data likely to be retrieved at inference time. We derive a comprehensive taxonomy from a computer security perspective to broadly investigate impacts and vulnerabilities, including data theft, worming, information ecosystem contamination, and other novel security risks. We then demonstrate the practical viability of our attacks against both real-world systems, such as Bing Chat and code-completion engines, and GPT-4 synthetic applications. We show how processing retrieved prompts can act as arbitrary code execution, manipulate the application's functionality, and control how and if other APIs are called. Despite the increasing reliance on LLMs, effective mitigations of these emerging threats are lacking. By raising awareness of these vulnerabilities, we aim to promote the safe and responsible deployment of these powerful models and the development of robust defenses that protect users from potential attacks.
Canaries and Whistles: Resilient Drone Communication Networks with (or without) Deep Reinforcement Learning
- Chris Hicks
- Vasilios Mavroudis
- Myles Foley
- Thomas Davies
- Kate Highnam
- Tim Watson
Communication networks able to withstand hostile environments are critically important for disaster relief operations. In this paper, we consider a challenging scenario where drones have been compromised in the supply chain, during their manufacture, and harbour malicious software capable of wide-ranging and infectious disruption. We investigate multi-agent deep reinforcement learning as a tool for learning defensive strategies that maximise communications bandwidth despite continual adversarial interference. Using a public challenge for learning network resilience strategies, we propose a state-of-the-art symbolic technique and study its superiority over deep reinforcement learning agents. Correspondingly, we identify three specific methods for improving the performance of our neural agents: (1) ensuring each observation contains the necessary information, (2) using symbolic agents to provide a curriculum for learning, and (3) paying close attention to reward. We apply our methods and present a new mixed strategy enabling symbolic and neural agents to work together and improve on all prior results.
The Adversarial Implications of Variable-Time Inference
- Dudi Biton
- Aditi Misra
- Efrat Levy
- Jaidip Kotak
- Ron Bitton
- Roei Schuster
- Nicolas Papernot
- Yuval Elovici
- Ben Nassi
Machine learning (ML) models are known to be vulnerable to a number of attacks that target the integrity of their predictions or the privacy of their training data. To carry out these attacks, a black-box adversary must typically possess the ability to query the model and observe its outputs (e.g., labels). In this work, we demonstrate, for the first time, the ability to enhance such decision-based attacks. To accomplish this, we present an approach that exploits a novel side channel in which the adversary simply measures the execution time of the algorithm used to post-process the predictions of the ML model under attack. The leakage of inference-state elements into algorithmic timing side channels has never been studied before, and we have found that it can contain rich information that facilitates superior timing attacks that significantly outperform attacks based solely on label outputs. In a case study, we investigate leakage from the non-maximum suppression (NMS) algorithm, which plays a crucial role in the operation of object detectors. In our examination of the timing side-channel vulnerabilities associated with this algorithm, we identified the potential to enhance decision-based attacks. We demonstrate attacks against the YOLOv3 detector, leveraging the timing leakage to successfully evade object detection using adversarial examples, and perform dataset inference. Our experiments show that our adversarial examples exhibit superior perturbation quality compared to a decision-based attack. In addition, we present a new threat model in which dataset inference based solely on timing leakage is performed. To address the timing leakage vulnerability inherent in the NMS algorithm, we explore the potential and limitations of implementing constant-time inference passes as a mitigation strategy.
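The timing side channel at the heart of the attack can be made concrete with a toy non-maximum suppression routine whose running time grows with the number of candidate boxes; the greedy NMS below is a simplification for illustration, not the YOLOv3 post-processing studied in the paper:

```python
# Toy timing measurement of greedy NMS: the post-processing latency depends on
# the number (and overlap) of candidate detections, leaking information about
# the detector's internal state to an adversary who can only time inference.
import time
import numpy as np

def nms(boxes, scores, thresh=0.5):
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size:
        i, rest = order[0], order[1:]
        keep.append(i)
        # Vectorized IoU of the kept box against all remaining candidates.
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        iou = inter / (areas[i] + areas[rest] - inter + 1e-12)
        order = rest[iou < thresh]
    return keep

rng = np.random.default_rng(0)
for n in (100, 1000, 5000):                    # number of candidate detections
    xy = rng.uniform(0, 100, size=(n, 2))
    boxes = np.hstack([xy, xy + rng.uniform(1, 10, size=(n, 2))])
    scores = rng.uniform(size=n)
    t0 = time.perf_counter()
    nms(boxes, scores)
    print(f"{n:5d} candidate boxes -> NMS time {(time.perf_counter() - t0) * 1e3:.2f} ms")
```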
Dictionary Attack on IMU-based Gait Authentication
- Rajesh Kumar
- Can Isik
- Chilukuri Krishna Mohan
We present a novel adversarial model for authentication systems that use gait patterns recorded by the inertial measurement unit (IMU) built into smartphones. The attack idea is inspired by and named after the concept of a dictionary attack on knowledge-based (PIN or password) authentication systems. In particular, this work investigates whether it is possible to build a dictionary of IMUGait patterns and use it to launch an attack or find an imitator who can actively reproduce IMUGait patterns that match the target's IMUGait pattern. Nine physically and demographically diverse individuals walked at various levels of four predefined controllable and adaptable gait factors (speed, step length, step width, and thigh-lift), producing 178 unique IMUGait patterns. Each pattern was used to attack a wide variety of user authentication models. A deeper analysis of error rates (before and after the attack) challenges the belief that authentication systems based on IMUGait patterns are the most difficult to spoof; further research is needed on adversarial models and associated countermeasures.
When Side-Channel Attacks Break the Black-Box Property of Embedded Artificial Intelligence
- Benoît Coqueret
- Mathieu Carbone
- Olivier Sentieys
- Gabriel Zaid
Artificial intelligence, and specifically deep neural networks (DNNs), has rapidly emerged in the past decade as the standard for several tasks, from specific advertising to object detection. The performance offered has led DNN algorithms to become part of critical embedded systems, requiring both efficiency and reliability. In particular, DNNs are subject to malicious inputs designed to fool the network while remaining undetectable to a human observer: adversarial examples. While previous studies propose frameworks to implement such attacks in black-box settings, they often rely on the hypothesis that the attacker has access to the logits of the neural network, breaking the traditional black-box assumption. In this paper, we investigate a true black-box scenario in which the attacker has no access to the logits. In particular, we propose an architecture-agnostic attack that overcomes this constraint by extracting the logits. Our method combines hardware and software attacks: it performs a side-channel attack that exploits electromagnetic leakage to extract the logits for a given input, allowing an attacker to estimate the gradients and produce state-of-the-art adversarial examples that fool the targeted neural network. Through this example of an adversarial attack, we demonstrate the effectiveness of side-channel logit extraction as a first step for more general attack frameworks requiring either the logits or the confidence scores.
Task-Agnostic Safety for Reinforcement Learning
- Md Asifur Rahman
- Sarra Alqahtani
Reinforcement learning (RL) has shown attractive potential for designing autonomous systems due to its learning-by-exploration approach. However, this learning process makes RL inherently vulnerable and thus unsuitable for applications where safety is a top priority. To address this issue, researchers have either jointly optimized task and safety or imposed constraints to restrict exploration. This paper takes a different approach, utilizing exploration as an adaptive means to learn a robust and safe behavior. To this end, we propose the Task-Agnostic Safety for Reinforcement Learning (TAS-RL) framework to ensure safety in RL by learning a representation of unsafe behaviors and excluding them from the agent's policy. TAS-RL is task-agnostic and can be integrated with any RL task policy in the same environment, providing a self-protection layer for the system. To evaluate the robustness of TAS-RL, we present a novel study where TAS-RL and 7 safe RL baselines are tested in constrained Markov decision process (CMDP) environments under white-box action space perturbations and changes in the environment dynamics. The results show that TAS-RL outperforms all baselines, achieving consistently near-zero safety constraint violations in continuous action spaces even under 10 times more variation in the testing environment dynamics.
Broken Promises: Measuring Confounding Effects in Learning-based Vulnerability Discovery
- Erik Imgrund
- Tom Ganz
- Martin Härterich
- Lukas Pirch
- Niklas Risse
- Konrad Rieck
Several learning-based vulnerability detection methods have been proposed to assist developers during the secure software development life-cycle. In particular, recent learning-based large transformer networks have shown remarkably high performance in various vulnerability detection and localization benchmarks. However, these models have also been shown to have difficulties accurately locating the root cause of flaws and generalizing to out-of-distribution samples. In this work, we investigate this problem and identify spurious correlations as the main obstacle to transferability and generalization, resulting in performance losses of up to 30% for current models. We propose a method to measure the impact of these spurious correlations on learning models and estimate their true, unbiased performance. We present several strategies to counteract the underlying confounding bias, but ultimately our work highlights the limitations of evaluations in the laboratory for complex learning tasks such as vulnerability discovery.
Measuring Equality in Machine Learning Security Defenses: A Case Study in Speech Recognition
- Luke E. Richards
- Edward Raff
- Cynthia Matuszek
Over the past decade, the machine learning security community has developed a myriad of defenses for evasion attacks. An understudied question in that community is: for whom do these defenses defend? This work considers common approaches to defending learned systems and how security defenses result in performance inequities across different sub-populations. We outline appropriate parity metrics for analysis and begin to answer this question through empirical results of the fairness implications of machine learning security methods. We find that many methods that have been proposed can cause direct harm, like false rejection and unequal benefits from robustness training. The framework we propose for measuring defense equality can be applied to robustly trained models, preprocessing-based defenses, and rejection methods. We identify a set of datasets with a user-centered application and a reasonable computational cost suitable for case studies in measuring the equality of defenses. In our case study of speech command recognition, we show how such adversarial training and augmentation have non-equal but complex protections for social subgroups across gender, accent, and age in relation to user coverage. We present a comparison of equality between two rejection-based defenses: randomized smoothing and neural rejection, finding randomized smoothing more equitable due to the sampling mechanism for minority groups. This represents the first work examining the disparity in the adversarial robustness in the speech domain and the fairness evaluation of rejection-based defenses.
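One concrete parity metric in the spirit of the framework above is the gap in false-rejection rates across sub-populations under a rejection-based defense; the metric choice, group labels, and synthetic data below are illustrative assumptions:

```python
# Sketch of a defense-equality check: compare false-rejection rates (inputs the
# model would have classified correctly but the defense rejected) across groups.
# The synthetic rejection rates below are assumptions for illustration.
import numpy as np

def false_rejection_rate(rejected: np.ndarray, correct: np.ndarray) -> float:
    return float(np.mean(rejected & correct))

rng = np.random.default_rng(0)
groups = rng.choice(["group_a", "group_b"], size=1000, p=[0.8, 0.2])
correct = rng.random(1000) < 0.9
rejected = rng.random(1000) < np.where(groups == "group_a", 0.05, 0.15)

rates = {g: false_rejection_rate(rejected[groups == g], correct[groups == g])
         for g in ("group_a", "group_b")}
print(rates, "parity gap:", round(abs(rates["group_a"] - rates["group_b"]), 3))
```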
SESSION: Session 3: Machine Learning for Cybersecurity
Certified Robustness of Static Deep Learning-based Malware Detectors against Patch and Append Attacks
- Daniel Gibert
- Giulio Zizzo
- Quan Le
Machine learning-based (ML) malware detectors have been shown to be susceptible to adversarial malware examples. Given the vulnerability of deep learning detectors to small changes in the input file, we propose a practical and certifiable defense against patch and append attacks on malware detection. Our defense is inspired by the concept of (de)randomized smoothing, a certifiable defense against patch attacks on image classifiers, which we adapt by: (1) presenting a novel chunk-based smoothing scheme that operates on subsequences of bytes within an executable; (2) deriving a certificate that measures the robustness against patch attacks and append attacks. Our approach works as follows: (i) during the training phase, a base classifier is trained to make classifications on a subset of contiguous bytes, or chunk of bytes, from an executable; (ii) at test time, an executable is divided into non-overlapping chunks of fixed size and our detection system classifies the original executable as the majority vote over the predicted classes of the chunks. Leveraging the fact that patch and append attacks can only influence a certain number of chunks, we derive meaningfully large robustness certificates against both attacks. To demonstrate the suitability of our approach, we have trained a classifier with our chunk-based scheme on the BODMAS dataset. We show that the proposed chunk-based smoothed classifier is more robust against the benign injection attack and state-of-the-art evasion attacks than a non-smoothed classifier.
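A minimal sketch of the chunk-and-vote scheme described above; the chunk size and the placeholder base classifier are assumptions (the paper trains a neural base classifier on the BODMAS dataset):

```python
# Chunk-based smoothing sketch: split an executable's bytes into non-overlapping
# fixed-size chunks, classify each chunk, and take the majority vote. The byte-
# marker "classifier" is a placeholder standing in for a trained model.
import numpy as np

CHUNK_SIZE = 512

def base_classifier(chunk: bytes) -> int:
    # Hypothetical stand-in: flag a chunk as malicious (1) if it contains a
    # fixed byte marker.
    return int(b"\xde\xad\xbe\xef" in chunk)

def smoothed_classify(exe_bytes: bytes) -> int:
    chunks = [exe_bytes[i:i + CHUNK_SIZE]
              for i in range(0, len(exe_bytes), CHUNK_SIZE)]
    votes = np.array([base_classifier(c) for c in chunks])
    # A contiguous patch of length L can touch at most ceil(L / CHUNK_SIZE) + 1
    # chunks, so a large enough vote margin certifies the decision against it.
    return int(votes.sum() > len(votes) / 2)

payload = bytes(1024) + b"\xde\xad\xbe\xef" * 600   # toy "executable"
print("smoothed prediction:", smoothed_classify(payload))  # 1 (malicious)
```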
AVScan2Vec: Feature Learning on Antivirus Scan Data for Production-Scale Malware Corpora
- Robert J. Joyce
- Tirth Patel
- Charles Nicholas
- Edward Raff
When investigating a malicious file, searching for related files is a common task that malware analysts must perform. Given that production malware corpora may contain over a billion files and consume petabytes of storage, many feature extraction and similarity search approaches are computationally infeasible. Our work explores the potential of antivirus (AV) scan data as a scalable source of features for malware. This is possible because AV scan reports are widely available through services such as VirusTotal and are ~100x smaller than the average malware sample. An AV scan report is rich in information and can indicate a malicious file's family, behavior, target operating system, and many other characteristics. We introduce AVScan2Vec, a neural model trained to comprehend the semantics of AV scan data. AVScan2Vec ingests AV scan data for a malicious file and outputs a meaningful vector representation. AVScan2Vec vectors are ~3 to 85x smaller than popular alternatives in use today, enabling faster vector comparisons and lower memory usage. By incorporating Dynamic Continuous Indexing, we show that nearest-neighbor queries on AVScan2Vec vectors can scale to even the largest production malware datasets. We also demonstrate that AVScan2Vec vectors are superior to other leading malware feature vector representations across nearly all classification, clustering, and nearest-neighbor lookup algorithms that we evaluated.
Drift Forensics of Malware Classifiers
- Theo Chow
- Zeliang Kan
- Lorenz Linhardt
- Lorenzo Cavallaro
- Daniel Arp
- Fabio Pierazzi
The widespread occurrence of mobile malware still poses a significant security threat to billions of smartphone users. To counter this threat, several machine learning-based detection systems have been proposed within the last decade. These methods have achieved impressive detection results in many settings, without requiring the manual crafting of signatures. Unfortunately, recent research has demonstrated that these systems often suffer from significant performance drops over time if the underlying distribution changes, a phenomenon referred to as concept drift. So far, however, it remains an open question which main factors cause the drift in the data and, in turn, the drop in performance of current detection systems. To address this question, we present a framework for the in-depth analysis of datasets affected by concept drift. The framework enables a better understanding of the root causes of concept drift, a fundamental stepping stone for building robust detection methods. To examine the effectiveness of our framework, we use it to analyze a commonly used dataset for Android malware detection as a first case study. Our analysis yields two key insights into the drift that affects several state-of-the-art methods. First, we find that most of the performance drop can be explained by the rise of two malware families in the dataset. Second, we can determine how the evolution of certain malware families and even goodware samples affects the classifier's performance. Our findings provide a novel perspective on previous evaluations conducted using this dataset and, at the same time, show the potential of the proposed framework for obtaining a better understanding of concept drift in mobile malware and related settings.
Lookin' Out My Backdoor! Investigating Backdooring Attacks Against DL-driven Malware Detectors
- Mario D'Onghia
- Federico Di Cesare
- Luigi Gallo
- Michele Carminati
- Mario Polino
- Stefano Zanero
Given their generalization capabilities, deep learning algorithms may represent a powerful weapon in the arsenal of antivirus developers. Nevertheless, recent works in different domains (e.g., computer vision) have shown that such algorithms are susceptible to backdooring attacks, namely training-time attacks that aim to teach a deep neural network to misclassify inputs containing a specific trigger. This work investigates the resilience of deep learning models for malware detection against backdooring attacks. In particular, we devise two classes of attacks for backdooring a malware detector that target the update process of the underlying deep learning classifier. While the first and most straightforward approach relies on superficial triggers made of static byte sequences, the second attack we propose employs latent triggers, namely specific feature configurations in the latent space of the model. The latent triggers may be produced by different byte sequences in the binary inputs, rendering the trigger dynamic in the input space and thus more challenging to detect.
We evaluate the resilience of two state-of-the-art convolutional neural networks for malware detection against both strategies and under different threat models. Our results indicate that the models do not easily learn superficial triggers in a clean-label setting, even when allowing a high rate (≥ 30%) of poisoning samples. Conversely, an attacker manipulating the training labels (dirty-label attack) can implant into both models an effective backdoor that activates with a superficial, static trigger. The results of the experimental evaluation of the latent trigger attack instead show that the adversary's knowledge of the target classifier may influence the success of the attack. Assuming perfect knowledge, an attacker can implant a backdoor that activates in 100% of cases with a poisoning rate as low as 0.1% of the whole updating dataset (namely, 32 poisoning samples in a dataset of 32,000 elements).
Lastly, we experiment with two known defensive techniques that were shown effective against other backdooring attacks in the malware domain. However, none proved reliable in detecting the backdoor or triggered samples created by our latent space attack. We then discuss some modifications to those techniques that may render them effective against latent backdooring attacks.
Reward Shaping for Happier Autonomous Cyber Security Agents
- Elizabeth Bates
- Vasilios Mavroudis
- Chris Hicks
As machine learning models become more capable, they have exhibited increased potential in solving complex tasks. One of the most promising directions uses deep reinforcement learning to train autonomous agents in computer network defense tasks. This work studies the impact of the reward signal that is provided to the agents when training for this task. Due to the nature of cybersecurity tasks, the reward signal is typically 1) in the form of penalties (e.g., when a compromise occurs), and 2) distributed sparsely across each defense episode. Such reward characteristics are atypical of classic reinforcement learning tasks, where the agent is regularly rewarded for progress (as opposed to being only occasionally penalized for failures). We investigate reward shaping techniques that could bridge this gap so as to enable agents to train more sample-efficiently and potentially converge to better performance. We first show that deep reinforcement learning algorithms are sensitive to the magnitude of the penalties and their relative size. Then, we combine penalties with positive external rewards and study their effect compared to penalty-only training. Finally, we evaluate intrinsic curiosity as an internal positive reward mechanism and discuss why it might not be as advantageous for high-level network monitoring tasks.
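A toy shaped-reward wrapper that combines scaled penalties with a dense positive signal; the constants and the "safe step" bonus are assumptions for illustration, and the paper's specific shaping terms and intrinsic-curiosity reward are not reproduced here:

```python
# Illustrative shaped reward for a network-defense step: scale the sparse
# penalty and add a small dense bonus for safe steps. The constants and the
# notion of a "safe" step are assumptions; the paper's shaping terms and
# intrinsic-curiosity reward are not reproduced here.
def shaped_reward(raw_penalty: float, step_was_safe: bool,
                  penalty_scale: float = 0.1, safe_bonus: float = 0.05) -> float:
    reward = penalty_scale * raw_penalty        # raw_penalty is zero or negative
    if step_was_safe:
        reward += safe_bonus                    # dense positive signal for progress
    return reward

print(shaped_reward(0.0, step_was_safe=True))       # 0.05: quiet, uncompromised step
print(shaped_reward(-10.0, step_was_safe=False))    # -1.0: a host was compromised
```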
Raze to the Ground: Query-Efficient Adversarial HTML Attacks on Machine-Learning Phishing Webpage Detectors
- Biagio Montaruli
- Luca Demetrio
- Maura Pintor
- Luca Compagna
- Davide Balzarotti
- Battista Biggio
Machine-learning phishing webpage detectors (ML-PWD) have been shown to suffer from adversarial manipulations of the HTML code of the input webpage. Nevertheless, recently proposed attacks have demonstrated limited effectiveness because they do not optimize the usage of the adopted manipulations and focus solely on specific elements of the HTML code. In this work, we overcome these limitations by first designing a novel set of fine-grained manipulations which allow modifying the HTML code of the input phishing webpage without compromising its maliciousness or visual appearance, i.e., the manipulations are functionality- and rendering-preserving by design. We then use a query-efficient black-box optimization algorithm to select which manipulations should be applied to bypass the target detector. Our experiments show that our attacks are able to raze to the ground the performance of current state-of-the-art ML-PWD using just 30 queries, thus overcoming the weaker attacks developed in previous work and enabling a much fairer robustness evaluation of ML-PWD.
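A generic sketch of a greedy, query-budgeted black-box loop in the spirit described above; the manipulation set and the keyword-density scoring function are placeholders standing in for the paper's fine-grained manipulations and the target ML-PWD:

```python
# Greedy black-box loop: within a fixed query budget, keep only HTML
# manipulations that lower the detector's phishing score. The scoring function
# and manipulations below are placeholders for illustration.
import random

def detector_score(html: str) -> float:
    # Placeholder phishing score (keyword density); a real attack would query
    # the target ML-PWD instead.
    return 100.0 * html.lower().count("password") / max(len(html), 1)

MANIPULATIONS = [
    lambda h: h + "\n<!-- benign-looking comment -->",   # inert, rendering-preserving
    lambda h: h.replace("<br>", "<br/>"),                # equivalent markup rewrite
]

def black_box_attack(html: str, budget: int = 30, seed: int = 0) -> str:
    rng = random.Random(seed)
    best, best_score = html, detector_score(html)
    for _ in range(budget):                              # one detector query per step
        candidate = rng.choice(MANIPULATIONS)(best)
        score = detector_score(candidate)
        if score <= best_score:                          # keep only non-worsening edits
            best, best_score = candidate, score
    return best

page = "<html><body>Enter your password: <br></body></html>"
print(detector_score(page), "->", detector_score(black_box_attack(page)))
```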