ACM CCS 2020 - November 9-13, 2020
PPMLP'20: Proceedings of the 2020 Workshop on Privacy-Preserving Machine Learning in Practice
SESSION: Keynote Talks
With the rapid development of technology, user privacy and data security have drawn much attention in recent years. On one hand, protecting user privacy while making use of customers' data is a challenging task. On the other hand, data silos are becoming one of the most prominent issues for society. Bridging these isolated data islands to build better AI and BI systems while meeting data privacy and regulatory compliance requirements poses great challenges. The Secure Collaborative Intelligence (SCI) lab at Ant Group is dedicated to leveraging multiple privacy-preserving technologies for AI and BI to solve these challenges. The goal of the SCI lab is to build enterprise-level solutions that allow multiple data owners to achieve joint risk control, joint marketing, joint data analysis, and other cross-organization collaboration scenarios without compromising information privacy or violating any related security policy. Unlike other solution providers, the SCI lab has been working with top universities and research organizations to build the first privacy-preserving open platform for collaborative intelligence computation in the world. It is the first platform that combines all three cutting-edge privacy-preserving technologies, secure multi-party computation (MPC), differential privacy (DP), and trusted execution environments (TEE), which are based on cryptography, information theory, and computer hardware respectively, in multi-party AI and BI collaboration scenarios. During multi-party collaboration, all inputs, computations, and results are protected under security policies specifically designed for each data owner. To date, the platform has been applied to various business scenarios in Ant Group and Alibaba Group, including joint lending, collaborative data analysis, and joint payment fraud detection. More than 20 financial organizations have benefited from the secure data collaboration and computing services provided by the SCI lab.
Privacy-preserving machine learning (PPML) protocols make it possible to privately evaluate or even train machine learning (ML) models on sensitive data while simultaneously protecting both the data and the model. So far, most of these protocols were built and optimized by hand, which requires expert knowledge in cryptography as well as a thorough understanding of the ML models. Moreover, the design space is very large, as there are many technologies that can even be combined, with several trade-offs. Examples of the underlying cryptographic building blocks include homomorphic encryption (HE), where computation is typically the bottleneck, and secure multi-party computation (MPC) protocols, which rely mostly on symmetric-key cryptography and where communication is often the bottleneck.
In this keynote, I will describe our research towards engineering practical PPML protocols that protect models and data. First of all, there is no point in designing PPML protocols for overly simple models such as Support Vector Machines (SVMs) or Support Vector Regression Machines (SVRs), because they can be stolen easily and hence do not benefit from protection. Complex models can be protected and evaluated in real time using Trusted Execution Environments (TEEs), which we demonstrated for speech recognition using Intel SGX and for keyword recognition using ARM TrustZone as the respective commercial TEE technologies. Our goal is to build tools for non-experts in cryptography to automatically generate highly optimized mixed PPML protocols from a high-level specification in an ML framework like TensorFlow. Towards this, we have built tools to automatically generate optimized mixed protocols that combine HE and different MPC protocols [6-8]. Such mixed protocols can, for example, be used for the efficient privacy-preserving evaluation of decision trees [1, 2, 9, 13] and neural networks [2, 11, 12]. The first PPML protocols for these ML classifiers were proposed long before the current hype around PPML started [1, 2, 12]. We already have first results for compiling high-level ML specifications via our tools into mixed protocols for neural networks (from TensorFlow) and sum-product networks (from SPFlow), and I will conclude with major open challenges.
Multiple organizations often wish to aggregate their sensitive data and learn from it, but they cannot do so because they cannot share their data. For example, banks wish to train models jointly over their aggregate transaction data to detect money launderers because criminals hide their traces across different banks. To address such problems, my students and I developed MC2, a framework for secure collaborative computation. My talk will overview our MC2 platform, from the technical approach to results and adoption.
Biography: Raluca Ada Popa is a computer security professor at UC Berkeley. She is a co-founder and co-director of the RISELab at UC Berkeley, where her research is on systems security and applied cryptography. She is also a co-founder and CTO of a cybersecurity startup called PreVeil. Raluca received her PhD in computer science, as well as her Masters and two BS degrees, in computer science and in mathematics, from MIT. She is the recipient of a Sloan Foundation Fellowship, an NSF CAREER award, Technology Review's 35 Innovators Under 35, and a George M. Sprowls Award for the best MIT CS doctoral thesis.
Machine learning has become increasingly prominent and is widely used in various applications in practice. Despite its great success, the integrity and accuracy of machine learning predictions are rising concerns. The reproducibility of machine learning models that are claimed to achieve high accuracy remains challenging, and the correctness and consistency of machine learning predictions in real products lack any security guarantees. We introduce some of our recent results on applying the cryptographic primitive of zero-knowledge proofs to the domain of machine learning to address these issues. The protocols allow the owner of a machine learning model to convince others that the model computes a particular prediction on a data sample, or achieves high accuracy on public datasets, without leaking any information about the model itself. We have developed efficient zero-knowledge proof protocols for decision trees, random forests, and neural networks.
SESSION: Session 1: Full Paper Presentations
The ubiquitous deployment of machine learning (ML) technologies has certainly improved many applications but also raised challenging privacy concerns, as sensitive client data is usually processed remotely at the discretion of a service provider. Therefore, privacy-preserving machine learning (PPML) aims at providing privacy using techniques such as secure multi-party computation (SMPC).
Recent years have seen a rapid influx of cryptographic frameworks that steadily improve performance as well as usability, pushing PPML towards practice. However, as the field is mainly driven by the crypto community, the PPML toolkit so far is mostly restricted to well-known neural networks (NNs). Unfortunately, deep probabilistic models, which are on the rise in the ML community and can handle a wide range of probabilistic queries while offering tractability guarantees, are severely underrepresented. Due to a lack of interdisciplinary collaboration, PPML is missing such important trends, ultimately hindering the adoption of privacy technology.
In this work, we introduce CryptoSPN, a framework for privacy-preserving inference of sum-product networks (SPNs) to significantly expand the PPML toolkit beyond NNs. SPNs are deep probabilistic models at the sweet-spot between expressivity and tractability, allowing for a range of exact queries in linear time. In an interdisciplinary effort, we combine techniques from both ML and crypto to allow for efficient, privacy-preserving SPN inference via SMPC.
We provide CryptoSPN as open source and seamlessly integrate it into the SPFlow library (Molina et al., arXiv 2019) for practical use by ML experts. Our evaluation on a broad range of SPNs demonstrates that CryptoSPN achieves highly efficient and accurate inference within seconds for medium-sized SPNs.
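To illustrate why SPN inference runs in linear time, a single bottom-up pass suffices: sum nodes take weighted mixtures of their children and product nodes multiply independent factors. The following is a minimal, hypothetical example of plaintext SPN evaluation (not CryptoSPN's actual code; node structure and names are illustrative):

```python
def eval_spn(node, x):
    """Evaluate an SPN bottom-up on an assignment x (variable index -> value).
    Each node is visited once, hence the linear-time guarantee."""
    kind = node["type"]
    if kind == "leaf":
        # Leaf: a univariate distribution; here a Bernoulli table
        # indexed by the observed value of the leaf's variable.
        return node["probs"][x[node["var"]]]
    children = [eval_spn(c, x) for c in node["children"]]
    if kind == "sum":
        # Sum node: mixture of children with normalized weights.
        return sum(w * v for w, v in zip(node["weights"], children))
    # Product node: independent factors multiply.
    result = 1.0
    for v in children:
        result *= v
    return result

# A toy 2-variable SPN: P(x0, x1) as a mixture of two independent products.
spn = {
    "type": "sum", "weights": [0.6, 0.4],
    "children": [
        {"type": "product", "children": [
            {"type": "leaf", "var": 0, "probs": [0.9, 0.1]},
            {"type": "leaf", "var": 1, "probs": [0.2, 0.8]},
        ]},
        {"type": "product", "children": [
            {"type": "leaf", "var": 0, "probs": [0.3, 0.7]},
            {"type": "leaf", "var": 1, "probs": [0.5, 0.5]},
        ]},
    ],
}

p = eval_spn(spn, {0: 1, 1: 1})  # exact P(x0=1, x1=1) = 0.6*0.08 + 0.4*0.35
```

In a privacy-preserving setting, the same fixed circuit of additions and multiplications is what gets evaluated under SMPC.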
Deployment of deep learning in different fields and industries is growing day by day due to its performance, which relies on the availability of data and compute. Data is often crowd-sourced and contains sensitive information about its contributors, which leaks into models that are trained on it. To achieve rigorous privacy guarantees, differentially private training mechanisms are used. However, it has recently been shown that differential privacy can exacerbate existing biases in the data and have disparate impacts on the accuracy of different subgroups of data. In this paper, we aim to study these effects within differentially private deep learning. Specifically, we aim to study how different levels of imbalance in the data affect the accuracy and the fairness of the decisions made by the model, given different levels of privacy. We demonstrate that even small imbalances and loose privacy guarantees can cause disparate impacts.
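The differentially private training mechanisms referenced above are typically DP-SGD style: clip each example's gradient, then add Gaussian noise calibrated to the clipping bound. A minimal plain-Python sketch of one such step (parameter names and values are illustrative, not those used in the paper):

```python
import math
import random

def dp_sgd_step(per_example_grads, clip_norm=1.0, noise_multiplier=1.1, seed=0):
    """One differentially private gradient step (sketch of the DP-SGD
    mechanism): clip each per-example gradient to L2 norm <= clip_norm,
    sum, add Gaussian noise scaled by the clipping bound, and average."""
    rng = random.Random(seed)
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for g in per_example_grads:
        norm = math.sqrt(sum(v * v for v in g))
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for i in range(dim):
            total[i] += g[i] * scale  # clipped contribution
    sigma = noise_multiplier * clip_norm
    noised = [t + rng.gauss(0.0, sigma) for t in total]
    n = len(per_example_grads)
    return [v / n for v in noised]

grads = [[3.0, 4.0], [0.1, 0.2]]  # first gradient has norm 5 -> clipped
step = dp_sgd_step(grads)
```

The clipping bound caps any single example's influence, which is exactly the mechanism whose interaction with underrepresented subgroups the paper studies.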
In recent years, gradient boosted decision tree learning has proven to be an effective method of training robust models. Moreover, collaborative learning among multiple parties has the potential to greatly benefit all parties involved, but organizations have also encountered obstacles in sharing sensitive data due to business, regulatory, and liability concerns.
We propose Secure XGBoost, a privacy-preserving system that enables multiparty training and inference of XGBoost models. Secure XGBoost protects the privacy of each party's data as well as the integrity of the computation with the help of hardware enclaves. Crucially, Secure XGBoost augments the security of the enclaves using novel data-oblivious algorithms that prevent side-channel attacks on enclaves induced via access-pattern leakage.
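Data-oblivious algorithms replace secret-dependent branches and memory accesses with branch-free arithmetic, so an observer of the access pattern learns nothing about the data. A toy sketch of the style (illustrative only, not Secure XGBoost's actual code):

```python
def oblivious_select(cond_bit, a, b):
    """Branch-free selection: returns a when cond_bit == 1, b when 0.
    Both inputs are always touched, so the memory access pattern is
    independent of the secret condition."""
    return cond_bit * a + (1 - cond_bit) * b

def oblivious_max(a, b):
    """Data-oblivious max built from the branch-free selection above.
    (In a real enclave the comparison itself would also be compiled to
    constant-time code; Python's `>` merely stands in for it here.)"""
    gt = int(a > b)
    return oblivious_select(gt, a, b)
```

Tree learning built from such primitives always performs the same sequence of accesses regardless of which branch a sample actually takes.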
Many companies provide neural network prediction services to users for a wide range of applications. However, current prediction systems compromise one party's privacy: either the user has to send sensitive inputs to the service provider for classification, or the service provider must store its proprietary neural networks on the user's device. The former harms the personal privacy of the user, while the latter reveals the service provider's proprietary model.
We design, implement, and evaluate Delphi, a secure prediction system that allows two parties to execute neural network inference without revealing either party's data. Delphi approaches the problem by simultaneously co-designing cryptography and machine learning. We first design a hybrid cryptographic protocol that improves upon the communication and computation costs over prior work. Second, we develop a planner that automatically generates neural network architecture configurations that navigate the performance-accuracy trade-offs of our hybrid protocol. Together, these techniques allow us to achieve a 22x improvement in online prediction latency compared to the state-of-the-art prior work.
Federated learning aggregates data from multiple sources while protecting privacy, making it possible to train efficient models in real-world settings. However, although federated learning uses encrypted secure aggregation, its decentralized nature makes it vulnerable to malicious attackers. A deliberate attacker can subtly control one or more participants and upload malicious model parameter updates, which the aggregation server cannot detect due to the encrypted privacy protection. Based on these observations, we identify a practical and novel security risk in the design of federated learning. We propose an attack in which colluding malicious participants strategically adjust their training data so that the weight of a certain dimension in the aggregated model rises or falls in a pattern. The trend of weights or parameters in the aggregated model forms meaningful signals, which constitutes a risk of information leakage. The leakage is exposed to all other participants in the federation but is only usable by participants who have reached an agreement with the malicious participant, i.e., the receiver must be able to understand the patterns of changes in the weights. The attack's effect is evaluated and verified on open-source code and datasets.
Graph Neural Networks (GNNs) have achieved tremendous progress on perceptual tasks in recent years, such as node classification, graph classification, and link prediction. However, recent studies show that GNN models are highly vulnerable to adversarial attacks, so enhancing the robustness of such models remains a significant challenge. In this paper, we propose a subgraph-based detection method for adversarial examples crafted via adversarial perturbations. To the best of our knowledge, this is the first work on adversarial detection for deep graph classification models that uses Subgraph Networks (SGN) to restructure the graph's features. Moreover, we develop a joint adversarial detector to cope with more complicated and unknown attacks. Specifically, we first explain how adversarial attacks can easily fool the models, and then show that the SGN can facilitate the distinction of adversarial examples generated by state-of-the-art attacks. We experiment on five real-world graph datasets using three different kinds of attack strategies on graph classification. Our empirical results show the effectiveness of our detection method and further explain the SGN's capacity to tell apart malicious graphs.
SESSION: Session 3: Spotlight Presentations
We present an extended abstract of MP2ML, a machine learning framework which integrates Intel nGraph-HE, a homomorphic encryption (HE) framework, and the secure two-party computation framework ABY, to enable data scientists to perform private inference of deep learning (DL) models trained using popular frameworks such as TensorFlow at the push of a button. We benchmark MP2ML on the CryptoNets network with ReLU activations, on which it achieves a throughput of 33.3 images/s and an accuracy of 98.6%. This throughput matches the previous state-of-the-art frameworks.
Most secure multi-party computation (MPC) machine learning methods can only afford the simple gradient descent (sGD) optimizer, and are unable to benefit from the recent progress on adaptive GD optimizers (e.g., Adagrad, Adam, and their variants), which involve square-root and reciprocal operations that are hard to compute in MPC. To mitigate this issue, we introduce InvertSqrt, an efficient MPC protocol for computing 1/√x. We then implement the Adam adaptive GD optimizer based on InvertSqrt and use it for training on different datasets. The training costs compare favorably to those of sGD, indicating that adaptive GD optimizers in MPC have become practical.
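A standard way to approximate 1/√x using only MPC-friendly operations (multiplications and public constants, no division or square root) is a Newton iteration. The sketch below illustrates the numerical idea only; it is not necessarily the InvertSqrt protocol itself, and the initial-guess step is hypothetical:

```python
def inv_sqrt(x, y0, iters=20):
    """Newton iteration for 1/sqrt(x): y <- y * (3 - x*y*y) / 2.
    Each step uses only multiplications and the public constant 0.5,
    which is why iterations of this shape are MPC-friendly. y0 is an
    initial guess; in a real protocol it would come from a secure
    approximation step (not shown here)."""
    y = y0
    for _ in range(iters):
        y = y * (3.0 - x * y * y) * 0.5
    return y

approx = inv_sqrt(4.0, y0=0.3)  # converges quadratically towards 0.5
```

With a reasonable initial guess the iteration converges quadratically, so a small fixed number of multiplication rounds suffices, which keeps the MPC round and communication costs low.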
Currently, financial institutions make extensive use of sensitive personal information in machine learning, which results in significant privacy risks for customers. As an essential privacy standard, differential privacy has often been applied to machine learning in recent years. To establish a prediction model of credit card default while protecting personal privacy, we consider the problems of differing customer data contributions and imbalanced sample distributions, and propose a weighted SVM algorithm under differential privacy. Through theoretical analysis, we show that the algorithm guarantees differential privacy. The algorithm solves the problem of prediction bias caused by sample distribution imbalance and effectively reduces data sensitivity and noise error. The experimental results show that the proposed algorithm can accurately predict whether a customer will default while protecting personal privacy.
This work provides a comprehensive review of existing frameworks based on secure computing techniques in the context of private image classification. The in-depth analysis of these approaches is followed by careful examination of their performance costs, in particular runtime and communication overhead.
To further illustrate the practical considerations when using different privacy-preserving technologies, experiments were conducted using four state-of-the-art libraries implementing secure computing at the heart of the data science stack: PySyft and CrypTen supporting private inference via secure multi-party computation, TF-Trusted utilising trusted execution environments, and HE-Transformer relying on homomorphic encryption.
Our work aims to evaluate the suitability of these frameworks from a usability, runtime-requirement, and accuracy point of view. In order to better understand the gap between state-of-the-art protocols and what is currently available in practice for a data scientist, we designed three neural network architectures to obtain secure predictions via each of the four aforementioned frameworks. Two networks were evaluated on the MNIST dataset and one on the Malaria Cell image dataset. We observed satisfactory performance for TF-Trusted and CrypTen, and noted that all frameworks perfectly preserved the accuracy of the corresponding plaintext model.
A membership inference attack aims to infer whether a given data sample was in the target classifier's training dataset. The ability of an adversary to ascertain the presence of an individual constitutes an obvious privacy threat if it relates to a group of users who share a sensitive characteristic. Many defense methods have been proposed against membership inference attacks, but they have not achieved the expected privacy effect. In this paper, we quantify the privacy impact of different defense choices in experiments using logistic regression and neural network models. Using both formal and empirical analyses, we illustrate that differential privacy and L2 regularization can effectively prevent membership inference attacks.
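The simplest membership inference baseline exploits overfitting: models tend to assign higher confidence to samples they were trained on. A toy sketch of this confidence-thresholding attack (scores and threshold are illustrative, not the attack evaluated in the paper):

```python
def confidence_attack(model_confidence, threshold=0.9):
    """Baseline membership inference: guess 'member' when the model's
    confidence on the sample's true label exceeds a threshold. Defenses
    like L2 regularization and differential privacy shrink the
    confidence gap between members and non-members, blunting this test."""
    return model_confidence >= threshold

# Hypothetical confidences: training members tend to score higher.
member_scores = [0.99, 0.95, 0.97]
non_member_scores = [0.70, 0.88, 0.60]
guesses = [confidence_attack(s) for s in member_scores + non_member_scores]
```

When the two score distributions overlap heavily, no threshold separates members from non-members, which is the intuition behind the defenses discussed above.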
We present TinyGarble2, a C++ framework for privacy-preserving computation through Yao's Garbled Circuit (GC) protocol in both the honest-but-curious and the malicious security models. TinyGarble2 provides a rich library with arithmetic and logic building blocks for developing GC-based secure applications. The framework offers abstractions among three layers: the C++ program, the GC back-end, and the Boolean logic representation of the function being computed, thus allowing the most optimized version of each pertinent component to be used. These abstractions, coupled with secure share transfer among the functions, make TinyGarble2 the fastest and most memory-efficient GC framework. In addition, the framework provides a library for Convolutional Neural Networks (CNNs). Our evaluations show that TinyGarble2 is the fastest among current end-to-end GC frameworks while also being scalable in terms of memory footprint. Moreover, it performs 18x faster on the LeNet-5 CNN compared to existing scalable frameworks.