AISec ’18- Proceedings of the 11th ACM Workshop on Artificial Intelligence and Security
SESSION: Keynote Address
A Marauder’s Map of Security and Privacy in Machine Learning: An overview of current and future research directions for making machine learning secure and private
There is growing recognition that machine learning (ML) exposes new security and privacy vulnerabilities in software systems, yet the technical community’s understanding of the nature and extent of these vulnerabilities remains limited but expanding. In this talk, we explore the threat model space of ML algorithms through the lens of Saltzer and Schroeder’s principles for the design of secure computer systems. This characterization of the threat space prompts an investigation of current and future research directions. We structure our discussion around three of these directions, which we believe are likely to lead to significant progress. The first seeks to design mechanisms for assembling reliable records of compromise that would help understand the degree to which vulnerabilities are exploited by adversaries, as well as favor psychological acceptability of machine learning applications. The second encompasses a spectrum of approaches to input verification and mediation, which is a prerequisite to enable fail-safe defaults in machine learning systems. The third pursues formal frameworks for security and privacy in machine learning, which we argue should strive to align machine learning goals such as generalization with security and privacy desirata like robustness or privacy. Key insights resulting from these three directions pursued both in the ML and security communities are identified and the effectiveness of approaches are related to structural elements of ML algorithms and the data used to train them. We conclude by systematizing best practices in our growing community.
SESSION: AI Security / Adversarial Machine Learning
With the spread of social networks and their unfortunate use for hate speech, automatic detection of the latter has become a pressing problem. In this paper, we reproduce seven state-of-the-art hate speech detection models from prior work, and show that they perform well only when tested on the same type of data they were trained on. Based on these results, we argue that for successful hate speech detection, model architecture is less important than the type of data and labeling criteria. We further show that all proposed detection techniques are brittle against adversaries who can (automatically) insert typos, change word boundaries or add innocuous words to the original hate speech. A combination of these methods is also effective against Google Perspective – a cutting-edge solution from industry. Our experiments demonstrate that adversarial training does not completely mitigate the attacks, and using character-level features makes the models systematically more attack-resistant than using word-level features.
Recent studies have highlighted that deep neural networks (DNNs) are vulnerable to adversarial attacks, even in a black-box scenario. However, most of the existing black-box attack algorithms need to make a huge amount of queries to perform attacks, which is not practical in the real world. We note one of the main reasons for the massive queries is that the adversarial example is required to be visually similar to the original image, but in many cases, how adversarial examples look like does not matter much. It inspires us to introduce a new attack called input-free attack, under which an adversary can choose an arbitrary image to start with and is allowed to add perceptible perturbations on it. Following this approach, we propose two techniques to significantly reduce the query complexity. First, we initialize an adversarial example with a gray color image on which every pixel has roughly the same importance for the target model. Then we shrink the dimension of the attack space by perturbing a small region and tiling it to cover the input image. To make our algorithm more effective, we stabilize a projected gradient ascent algorithm with momentum, and also propose a heuristic approach for region size selection. Recent studies have highlighted that deep neural networks (DNNs) are vulnerable to adversarial attacks, even in a black-box scenario. However, most of the existing black-box attack algorithms need to make a huge amount of queries to perform attacks, which is not practical in the real world. We note one of the main reasons for the massive queries is that the adversarial example is required to be visually similar to the original image, but in many cases, how adversarial examples look like does not matter much. It inspires us to introduce a new attack called input-free attack, under which an adversary can choose an arbitrary image to start with and is allowed to add perceptible perturbations on it. Following this approach, we propose two techniques to significantly reduce the query complexity. First, we initialize an adversarial example with a gray color image on which every pixel has roughly the same importance for the target model. Then we shrink the dimension of the attack space by perturbing a small region and tiling it to cover the input image. To make our algorithm more effective, we stabilize a projected gradient ascent algorithm with momentum, and also propose a heuristic approach for region size selection. Through extensive experiments, we show that with only 1,701 queries on average, we can perturb a gray image to any target class of ImageNet with a 100% success rate on InceptionV3. Besides, our algorithm has successfully defeated two real-world systems, the Clarifai food detection API and the Baidu Animal Identification API.
Stochastic Substitute Training: A Gray-box Approach to Craft Adversarial Examples Against Gradient Obfuscation Defenses
It has been shown that adversaries can craft example inputs to neural networks which are similar to legitimate inputs but have been created to purposely cause the neural network to misclassify the input. These adversarial examples are crafted, for example, by calculating gradients of a carefully defined loss function with respect to the input. As a countermeasure, some researchers have tried to design robust models by blocking or obfuscating gradients, even in white-box settings. Another line of research proposes introducing a separate detector to attempt to detect adversarial examples. This approach also makes use of gradient obfuscation techniques, for example, to prevent the adversary from trying to fool the detector. In this paper, we introduce stochastic substitute training, a gray-box approach that can craft adversarial examples for defenses which obfuscate gradients. For those defenses that have tried to make models more robust, with our technique, an adversary can craft adversarial examples with no knowledge of the defense. For defenses that attempt to detect the adversarial examples, with our technique, an adversary only needs very limited information about the defense to craft adversarial examples. We demonstrate our technique by applying it against two defenses which make models more robust and two defenses which detect adversarial examples.
SESSION: AI for Detecting Software Vulnerabilities
Fuzz testing, or “fuzzing,” refers to a widely deployed class of techniques for testing programs by generating a set of inputs for the express purpose of finding bugs and identifying security flaws. Grey-box fuzzing, the most popular fuzzing strategy, combines light program instrumentation with a data driven process to generate new program inputs. In this work, we present a machine learning approach that builds on AFL, the preeminent grey-box fuzzer, by adaptively learning a probability distribution over its mutation operators on a program-specific basis. These operators, which are selected uniformly at random in AFL and mutational fuzzers in general, dictate how new inputs are generated, a core part of the fuzzer’s efficacy. Our main contributions are two-fold: First, we show that a sampling distribution over mutation operators estimated from training programs can significantly improve performance of AFL. Second, we introduce a Thompson Sampling, bandit-based optimization approach that fine-tunes the mutator distribution adaptively, during the course of fuzzing an individual program and outperforms offline training. A set of experiments across complex programs demonstrates that tuning the mutational operator distribution generates sets of inputs that yield significantly higher code coverage and finds more crashes faster and more reliably than both baseline versions of AFL as well as other AFL-based learning approaches.
A Cyber Reasoning System (CRS) is designed to automatically find and exploit software vulnerabilities in complex software. To be effective, CRSs integrate multiple vulnerability detection tools (VDTs), such as symbolic executors and fuzzers. Determining which VDTs can best find bugs in a large set of target programs, and how to optimally configure those VDTs, remains an open and challenging problem. Current solutions are based on heuristics created by security analysts that rely on experience, intuition and luck. In this paper, we present Central Exploit Organizer (CEO), a proof-of-concept tool to optimize VDT selection. CEO uses machine learning to optimize the selection and configuration of the most suitable vulnerability detection tool. We show that CEO can predict the relative effectiveness of a given vulnerability detection tool, configuration, and initial input. The estimation accuracy presents an improvement between $11%$ and $21%$ over random selection. We are releasing CEO and our dataset as open source to encourage further research.
SESSION: AI for Detecting Attacks
Online fraud such as search engine poisoning, groups of fake accounts and opinion fraud is conducted by fraudsters controlling a large number of mobile devices. The key to detect such fraudulent activities is to identify devices controlled by fraudsters. Traditional approaches that fingerprint devices based on device metadata only consider single device information. However, these techniques do not utilize the relationship among different devices, which is crucial to detect fraudulent activities. In this paper, we propose an effective device fraud detection framework called FeatNet, which incorporates device features and device relationships in network representation learning. Specifically, we partition the device network into bipartite graphs and generate the neighborhoods of vertices by revised truncated random walk. Then, we generate the feature signature according to device features to learn the representation of devices. Finally, the embedding vectors of all bipartite graphs are used for fraud detection. We conduct experiments on a large-scale data set and the result shows that our approach can achieve better accuracy than existing algorithms and can be deployed in the real production environment with high performance.
Encryption is widely used across the internet to secure communications and ensure that information cannot be intercepted and read by a third party. However, encryption also allows cybercriminals to hide their messages and carry out successful malware attacks while avoiding detection. Further aiding criminals is the fact that web browsers display a green lock symbol in the URL bar when a connection to a website is encrypted. This symbol gives a false sense of security to users, who are in turn more likely to fall victim to phishing attacks. The risk of encrypted traffic means that information security researchers must explore new techniques to detect, classify, and take countermeasures against malicious traffic. So far there exists no approach for TLS detection in the wild. In this paper, we propose a method for identifying malicious use of web certificates using deep neural networks. Our system uses the content of TLS certificates to successfully identify legitimate certificates as well as malicious patterns used by attackers. The results show that our system is capable of identifying malware certificates with an accuracy of 94.87% and phishing certificates with an accuracy of 88.64%.
SESSION: AI for Forensics
De-anonymizing the authors of anonymous code (i.e., code stylometry) entails significant privacy and security implications. Most existing code stylometry methods solely rely on static (e.g., lexical, layout, and syntactic) features extracted from source code, while neglecting its key difference from regular text — it is executable! In this paper, we present Sundae, a novel code de-anonymization framework that integrates both static and dynamic stylometry analysis. Compared with the existing solutions, Sundae departs in significant ways: (i) it requires much less number of static, hand-crafted features; (ii) it requires much less labeled data for training; and (iii) it can be readily extended to new programmers once their stylometry information becomes available Through extensive evaluation on benchmark datasets, we demonstrate that Sundae delivers strong empirical performance. For example, under the setting of 229 programmers and 9 problems, it outperforms the state-of-art method by a margin of 45.65% on Python code de-anonymization. The empirical results highlight the integration of static and dynamic analysis as a promising direction for code stylometry research.
Nowadays, image captchas are being widely used across the Internet to defend against abusive programs. However, the ever-advancing capabilities of computer vision techniques are gradually diminishing the security of image captchas; yet, little is known thus far about the vulnerability of image captchas deployed in real-world settings. In this paper, we conduct the first systematic study on the security of image captchas in the wild. We classify the currently popular image captchas into three categories: selection-, slide- and click-based captchas. We propose three effective and generic attacks, each against one of these categories. We evaluate our attacks against 10 real-world popular image captchas, including those from tencent.com, google.com, and 12306.cn. Furthermore, we compare our attacks with 9 online image recognition services and human labors from 8 underground captcha-solving services. Our studies show that: (1) all of those popular image captchas are vulnerable to our attacks; (2) our attacks significantly outperform the state-of-the-arts in almost all the scenarios; and (3) our attacks achieve effectiveness comparable to human labors but with much higher efficiency. Based on our evaluation, we identify the design flaws of those popular schemes, the best practices, and the design principles towards more secure captchas.