CCS '23

Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security
Last Update : [26 November, 2023]

SESSION: Session 1: Cryptography for Anonymity

ASMesh: Anonymous and Secure Messaging in Mesh Networks Using Stronger, Anonymous Double Ratchet
  • Alexander Bienstock
  • Paul Rösler
  • Yi Tang

The majority of secure messengers have single, centralized service providers that relay ciphertexts between users to enable asynchronous communication. However, in some scenarios such as mass protests in censored networks, relying on a centralized provider is fatal. Mesh messengers attempt to solve this problem by building ad hoc networks in which user clients perform the ciphertext-relaying task. Yet, recent analyses of widely deployed mesh messengers discover severe security weaknesses (Albrecht et al. CT-RSA'21 & USENIX Security'22).

To support the design of secure mesh messengers, we provide a new, more complete security model for mesh messaging. Our model captures forward and post-compromise security, as well as forward and post-compromise anonymity, both of which are especially important in this setting. We also identify novel, stronger confidentiality goals that can be achieved due to the special characteristics of mesh networks (e.g., delayed communication, distributed network and adversary).

Finally, we develop a new protocol, called ASMesh, that provably satisfies these security goals. For this, we revisit Signal's Double Ratchet and propose non-trivial enhancements. On top of that, we add a mechanism that provides forward and post-compromise anonymity. Thus, our protocol efficiently provides strong confidentiality and anonymity under past and future user corruptions. Most of our results are also applicable to traditional messaging.

We prove security of our protocols and evaluate their performance in simulated mesh networks. Finally, we develop a proof of concept implementation.
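
As a rough illustration of the symmetric-key ratchet idea in Signal's Double Ratchet, which the abstract takes as its starting point, the sketch below derives a fresh message key per message from a chain key using HMAC-SHA256. It is not ASMesh's enhanced construction. The function names `kdf_chain_step` and `derive_message_keys` and the one-byte constants are illustrative assumptions.

```python
import hashlib
import hmac


def kdf_chain_step(chain_key: bytes) -> tuple[bytes, bytes]:
    """Advance a symmetric ratchet by one step.

    Returns (next_chain_key, message_key). Both values are derived one-way
    from the current chain key, so learning a later chain key does not
    reveal earlier message keys.
    """
    # Distinct constant inputs separate the two derived values.
    message_key = hmac.new(chain_key, b"\x01", hashlib.sha256).digest()
    next_chain_key = hmac.new(chain_key, b"\x02", hashlib.sha256).digest()
    return next_chain_key, message_key


def derive_message_keys(initial_chain_key: bytes, count: int) -> list[bytes]:
    """Derive `count` successive message keys from an initial chain key."""
    keys = []
    chain_key = initial_chain_key
    for _ in range(count):
        chain_key, message_key = kdf_chain_step(chain_key)
        keys.append(message_key)
    return keys
```

A sender and receiver that start from the same chain key derive the same sequence of message keys. Discarding old chain keys after each step is what gives this kind of chain its forward-security flavor.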

Lattice-Based Blind Signatures: Short, Efficient, and Round-Optimal
  • Ward Beullens
  • Vadim Lyubashevsky
  • Ngoc Khanh Nguyen
  • Gregor Seiler

We propose a 2-round blind signature protocol based on the random oracle heuristic and the hardness of standard lattice problems (Ring/Module-SIS/LWE and NTRU) with a signature size of 20 KB. The protocol is round-optimal and has a transcript size that can be as small as 60 KB. This blind signature is around 4 times shorter than the most compact lattice-based scheme based on standard assumptions of del Pino and Katsumata (Crypto 2022) and around 2 times shorter than the scheme of Agrawal et al. (CCS 2022) based on their newly-proposed one-more-ISIS assumption. We also propose a "keyed-verification" blind signature scheme in which the verifier and the signer need to share a secret key. This scheme has a smaller signature size of only 48 bytes, but further work is needed to explore the efficiency of its signature generation protocol.

Aggregate Signatures with Versatile Randomization and Issuer-Hiding Multi-Authority Anonymous Credentials
  • Omid Mir
  • Balthazar Bauer
  • Scott Griffy
  • Anna Lysyanskaya
  • Daniel Slamanig

Anonymous credentials (AC) offer privacy in user-centric identity management. They enable users to authenticate anonymously, revealing only necessary attributes. With the rise of decentralized systems like self-sovereign identity, the demand for efficient AC systems in a decentralized setting has grown. Relying on conventional AC systems, however, requires users to present independent credentials when obtaining them from different issuers, leading to increased complexity. AC systems should ideally be multi-authority, supporting efficient presentation of multiple credentials from various issuers. Another vital property is issuer hiding, ensuring that the issuer's identity remains concealed, revealing only compliance with the verifier's policy. This prevents unique identification based on the sole combination of credential issuers. To date, there exists no AC scheme satisfying both properties simultaneously.

This paper introduces Issuer-Hiding Multi-Authority Anonymous Credentials (IhMA), utilizing two novel signature primitives: Aggregate Signatures with Randomizable Tags and Public Keys and Aggregate Mercurial Signatures. We provide two constructions of IhMA with different trade-offs based on these primitives and believe that they will have applications beyond IhMA. Besides defining the notations and rigorous security definitions for our primitives, we provide provably secure and efficient constructions, and present benchmarks to showcase practical efficiency.

Concurrent Security of Anonymous Credentials Light, Revisited
  • Julia Kastner
  • Julian Loss
  • Omar Renawi

We revisit the concurrent security guarantees of the well-known Anonymous Credentials Light (ACL) scheme (Baldimtsi and Lysyanskaya, CCS'13). This scheme was originally proven secure when executed sequentially, and its concurrent security was left as an open problem. A later work of Benhamouda et al. (EUROCRYPT'21) gave an efficient attack on ACL when executed concurrently, seemingly resolving this question once and for all.

In this work, we point out a subtle flaw in the attack of Benhamouda et al. on ACL and show, in spite of popular opinion, that it can be proven concurrently secure. Our modular proof in the algebraic group model uses an ID scheme as an intermediate step and leads to a major simplification of the complex security argument for Abe's Blind Signature scheme by Kastner et al. (PKC'22).

SESSION: Session 2: Machine Learning Applications I

Decoding the Secrets of Machine Learning in Malware Classification: A Deep Dive into Datasets, Feature Extraction, and Model Performance
  • Savino Dambra
  • Yufei Han
  • Simone Aonzo
  • Platon Kotzias
  • Antonino Vitale
  • Juan Caballero
  • Davide Balzarotti
  • Leyla Bilge

Many studies have proposed machine-learning (ML) models for malware detection and classification, reporting an almost-perfect performance. However, they assemble ground-truth in different ways, use diverse static- and dynamic-analysis techniques for feature extraction, and even differ on what they consider a malware family. As a consequence, our community still lacks an understanding of malware classification results: whether they are tied to the nature and distribution of the collected dataset, to what extent the number of families and samples in the training dataset influence performance, and how well static and dynamic features complement each other.

This work sheds light on those open questions by investigating the impact of datasets, features, and classifiers on ML-based malware detection and classification. For this, we collect the largest balanced malware dataset so far with 67k samples from 670 families (100 samples each), and train state-of-the-art models for malware detection and family classification using our dataset. Our results reveal that static features perform better than dynamic features, and that combining both only provides marginal improvement over static features. We find no correlation between packing and classification accuracy, and we find that missing behaviors in dynamically extracted features heavily penalise their performance. We also demonstrate how a larger number of families to classify makes the classification harder, while a higher number of samples per family increases accuracy. Finally, we find that models trained on a uniform distribution of samples per family generalize better on unseen data.

Privacy Leakage via Speech-induced Vibrations on Room Objects through Remote Sensing based on Phased-MIMO
  • Cong Shi
  • Tianfang Zhang
  • Zhaoyi Xu
  • Shuping Li
  • Donglin Gao
  • Changming Li
  • Athina Petropulu
  • Chung-Tse Michael Wu
  • Yingying Chen

Speech eavesdropping has long been an important threat to the privacy of individuals and enterprises. Recent research has shown the possibility of deriving private speech information from sound-induced vibrations. Acoustic signals transmitted through a solid medium or air may induce vibrations upon solid surfaces, which can be picked up by various sensors (e.g., motion sensors, high-speed cameras and lasers), without using a microphone. To date, these threats are limited to scenarios where the sensor is in contact with the vibration surface or at least in the visual line-of-sight.

In this paper, we revisit this important line of research and show that a remote, long-distance, and even thru-the-wall speech eavesdropping attack is possible. We discover a new form of speech eavesdropping attack that remotely elicits speech from minute surface vibrations upon common room objects (e.g., paper bags, plastic storage bin) via mmWave sensing, signal processing, and advanced deep learning techniques. While mmWave signals have high sensitivity for vibrations, they have limited sensing distance and normally do not penetrate through walls. We overcome this key challenge through designing and implementing a high-resolution software-defined phased-MIMO radar that integrates transmit beamforming, virtual array, and receive beamforming. The proposed system enhances sensing directivity by focusing all the mmWave beams toward a target room object, allowing mmWave signals to pick up minute speech-induced vibrations from a long distance and even through walls. To realize the attack, we design an object identification technique that scans objects in a room and identifies a prominent object that is most sensitive to speech vibrations for vibration feature extraction. We successfully demonstrate speech privacy leakage using speech-induced vibrations via the development of a deep learning framework. Our framework can leverage domain adaptation techniques to infer speech content based only on the unlabeled vibration data of a victim. We validate the proof-of-concept attack on digit recognition through extensive experiments, involving 40 speakers, five common room objects, and attack scenarios with mmWave devices inside and outside the room. Our phased-MIMO-based attack can achieve success rates of 88% ~ 98% and 64% ~ 86% with and without using speech labels for training. The success rates are 81% ~ 94% and 58% ~ 74% for thru-the-wall attacks. Furthermore, we discuss possible defense methods to mitigate this unprecedented security threat.

Efficient Query-Based Attack against ML-Based Android Malware Detection under Zero Knowledge Setting
  • Ping He
  • Yifan Xia
  • Xuhong Zhang
  • Shouling Ji

The widespread adoption of the Android operating system has made malicious Android applications an appealing target for attackers. Machine learning-based (ML-based) Android malware detection (AMD) methods are crucial in addressing this problem; however, their vulnerability to adversarial examples raises concerns. Current attacks against ML-based AMD methods demonstrate remarkable performance but rely on strong assumptions that may not be realistic in real-world scenarios, e.g., the knowledge requirements about feature space, model parameters, and training dataset. To address this limitation, we introduce AdvDroidZero, an efficient query-based attack framework against ML-based AMD methods that operates under the zero knowledge setting. Our extensive evaluation shows that AdvDroidZero is effective against various mainstream ML-based AMD methods, in particular state-of-the-art methods and real-world antivirus solutions.

Your Battery Is a Blast! Safeguarding Against Counterfeit Batteries with Authentication
  • Francesco Marchiori
  • Mauro Conti

Lithium-ion (Li-ion) batteries are the primary power source in various applications due to their high energy and power density. Their market was estimated to be up to 48 billion U.S. dollars in 2022. However, the widespread adoption of Li-ion batteries has resulted in counterfeit cell production, which can pose safety hazards to users. Counterfeit cells can cause explosions or fires, and their prevalence in the market makes it difficult for users to detect fake cells. Indeed, current battery authentication methods can be susceptible to advanced counterfeiting techniques and are often not adaptable to various cells and systems.

In this paper, we improve the state of the art on battery authentication by proposing two novel methodologies, DCAuth and EISthentication, which leverage the internal characteristics of each cell through Machine Learning models. Our methods automatically authenticate lithium-ion battery models and architectures using data from their regular usage without the need for any external device. They are also resilient to the most common and critical counterfeit practices and can scale to several batteries and devices. To evaluate the effectiveness of our proposed methodologies, we analyze time-series data from a total of 20 datasets that we have processed to extract meaningful features for our analysis. Our methods achieve high accuracy in battery authentication for both architectures (up to 0.99) and models (up to 0.96). Moreover, our methods offer comparable identification performances. By using our proposed methodologies, manufacturers can ensure that devices only use legitimate batteries, guaranteeing the operational state of any system and safety measures for the users.

SESSION: Session 3: Attacks & Threats

TxPhishScope: Towards Detecting and Understanding Transaction-based Phishing on Ethereum
  • Bowen He
  • Yuan Chen
  • Zhuo Chen
  • Xiaohui Hu
  • Yufeng Hu
  • Lei Wu
  • Rui Chang
  • Haoyu Wang
  • Yajin Zhou

The prosperity of Ethereum attracts many users to send transactions and trade crypto assets. However, this has also given rise to a new form of transaction-based phishing scam, named TxPhish. Specifically, tempted by high profits, users are tricked into visiting fake websites and signing transactions that enable scammers to steal their crypto assets. The past year has witnessed 11 large-scale TxPhish incidents causing a total loss of more than 70 million.

In this paper, we conduct the first empirical study of TxPhish on Ethereum, encompassing the process of a TxPhish campaign and details of phishing transactions. To detect TxPhish websites and extract phishing accounts automatically, we present TxPhishScope, which dynamically visits the suspicious websites, triggers transactions, and simulates results. Between November 25, 2022, and July 31, 2023, we successfully detected and reported 26,333 TxPhish websites and 3,486 phishing accounts. Among all documented TxPhish websites, 78.9% were first reported by us, making TxPhishScope the largest TxPhish website detection system. Moreover, we provided criminal evidence of four phishing accounts and their fund flow totaling 1.5 million to aid in the recovery of funds for the victims. In addition, we identified bugs in six Ethereum projects and received appreciation.

Uncle Maker: (Time)Stamping Out The Competition in Ethereum
  • Aviv Yaish
  • Gilad Stern
  • Aviv Zohar

We present and analyze an attack on Ethereum 1's consensus mechanism, which allows miners to obtain higher mining rewards compared to their honest peers. This attack is novel in that it relies on manipulating block timestamps and the difficulty-adjustment algorithm (DAA) to give the miner an advantage whenever block races ensue. We call our attack Uncle Maker, as it induces a higher rate of uncle blocks. We describe several variants of the attack, including one that is risk-free for miners.

Our attack differs from past attacks such as Selfish Mining, which have been shown to be profitable but were never observed in practice: we analyze data from Ethereum's blockchain and show that some of Ethereum's miners have been actively running a variant of this attack for several years without being detected, making this the first evidence of miner manipulation of a major consensus mechanism. We present our evidence, as well as estimates of the profits gained by attackers, at the expense of honest miners.

Since several blockchains are still running Ethereum 1's protocol, we suggest concrete fixes and implement them as a patch for geth.
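
To make the timestamp dependence of the difficulty-adjustment algorithm concrete, here is a simplified sketch of the post-Byzantium difficulty rule as recalled from EIP-100, with the difficulty bomb omitted. The function name `child_difficulty` is an assumption, and whether this matches the exact rule analyzed in the paper is also an assumption.

```python
MIN_DIFFICULTY = 131072          # lower bound on block difficulty
DIFFICULTY_BOUND_DIVISOR = 2048  # step size is parent_difficulty // 2048


def child_difficulty(parent_difficulty: int, parent_timestamp: int,
                     parent_has_uncles: bool, timestamp: int) -> int:
    """Difficulty of a child block (simplified, difficulty bomb ignored).

    The adjustment shrinks by one step for every full 9 seconds that
    separate the child's timestamp from its parent's timestamp.
    """
    base = 2 if parent_has_uncles else 1
    adjustment = max(base - (timestamp - parent_timestamp) // 9, -99)
    step = parent_difficulty // DIFFICULTY_BOUND_DIVISOR
    return max(parent_difficulty + step * adjustment, MIN_DIFFICULTY)


if __name__ == "__main__":
    parent = 2_048_000
    for delta in (8, 9, 18):
        print(delta, child_difficulty(parent, 0, False, delta))
```

In this sketch, a block stamped 8 seconds after its parent receives a higher difficulty than one stamped 9 seconds after it. Under a fork-choice rule that prefers greater total difficulty, such a difference could tip a block race. How the paper's variants exploit this is not spelled out in the abstract.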

How Hard is Takeover in DPoS Blockchains? Understanding the Security of Coin-based Voting Governance
  • Chao Li
  • Balaji Palanisamy
  • Runhua Xu
  • Li Duan
  • Jiqiang Liu
  • Wei Wang

Delegated-Proof-of-Stake (DPoS) blockchains, such as EOSIO, Steem and TRON, are governed by a committee of block producers elected via a coin-based voting system. We recently witnessed the first de facto blockchain takeover that happened between Steem and TRON. Within one hour of this incident, TRON's founder took over the entire Steem committee, forcing the original Steem community to leave the blockchain that they had maintained for years. This is a historical event in the evolution of blockchains and Web 3.0. Despite its significant disruptive impact, little is known about how vulnerable DPoS blockchains are in general to takeovers and the ways in which we can improve their resistance to takeovers.

In this paper, we demonstrate that the resistance of a DPoS blockchain to takeovers is governed by both the theoretical design and the actual use of its underlying coin-based voting governance system. When voters actively cooperate to resist potential takeovers, our theoretical analysis reveals that the current active resistance of DPoS blockchains is far below the theoretical upper bound. However, in practice, voter preferences could be significantly different. This paper presents the first large-scale empirical study of the passive takeover resistance of EOSIO, Steem and TRON. Our study identifies the diversity in voter preferences and characterizes the impact of this diversity on takeover resistance. Through both theoretical and empirical analyses, our study provides novel insights into the security of coin-based voting governance and suggests potential ways to improve the takeover resistance of any blockchain that implements this governance model.
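
The notion of takeover under coin-based voting can be illustrated with a deliberately simplified model. The sketch below assumes stake-weighted approval voting in which a voter's full stake counts toward every candidate it approves and the committee is the top-k candidates by total stake. The real rules of EOSIO, Steem and TRON differ in their details. All names (`tally`, `elect`, `full_takeover_stake`) are hypothetical, and this is not the paper's resistance metric.

```python
from collections import defaultdict


def tally(ballots: list[tuple[int, list[str]]]) -> dict[str, int]:
    """Sum the stake behind each candidate; a ballot is (stake, approved)."""
    totals: dict[str, int] = defaultdict(int)
    for stake, approved in ballots:
        for candidate in set(approved):  # count each approval once
            totals[candidate] += stake
    return dict(totals)


def elect(totals: dict[str, int], committee_size: int) -> list[str]:
    """Top-k candidates by stake; ties are broken by name for determinism."""
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:committee_size]]


def full_takeover_stake(totals: dict[str, int]) -> int:
    """Smallest stake that strictly outranks every existing candidate."""
    return max(totals.values(), default=0) + 1


if __name__ == "__main__":
    ballots = [(50, ["a", "b"]), (30, ["b", "c"]), (20, ["c", "d"])]
    totals = tally(ballots)
    print(elect(totals, 2))             # incumbent committee
    attack = full_takeover_stake(totals)
    ballots.append((attack, ["x", "y"]))
    print(elect(tally(ballots), 2))     # committee after the attacker votes
```

In this toy model a single voter who approves its own candidates with slightly more stake than the strongest incumbent replaces the whole committee, which is why the distribution of existing votes matters for resistance.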

Demystifying DeFi MEV Activities in Flashbots Bundle
  • Zihao Li
  • Jianfeng Li
  • Zheyuan He
  • Xiapu Luo
  • Ting Wang
  • Xiaoze Ni
  • Wenwu Yang
  • Xi Chen
  • Ting Chen

Decentralized Finance, mushrooming in permissionless blockchains, has attracted a recent surge in popularity. Due to the transparency of permissionless blockchains, opportunistic traders can compete to earn revenue by extracting Miner Extractable Value (MEV), which undermines both the consensus security and efficiency of blockchain systems. The Flashbots bundle mechanism further aggravates the MEV competition because it empowers opportunistic traders with the capability of designing more sophisticated MEV extraction. In this paper, we conduct the first systematic study on DeFi MEV activities in Flashbots bundle by developing ActLifter, a novel automated tool for accurately identifying DeFi actions in transactions of each bundle, and ActCluster, a new approach that leverages iterative clustering to help us discover known and unknown DeFi MEV activities. Extensive experimental results show that ActLifter can achieve nearly 100% precision and recall in DeFi action identification, significantly outperforming state-of-the-art techniques. Moreover, with the help of ActCluster, we obtain many new observations and discover 17 new kinds of DeFi MEV activities, which occur in 53.12% of bundles but have not been reported in existing studies.

SESSION: Session 4: Usable Privacy

Marketing to Children Through Online Targeted Advertising: Targeting Mechanisms and Legal Aspects
  • Tinhinane Medjkoune
  • Oana Goga
  • Juliette Senechal

Many researchers and organizations, such as WHO and UNICEF, have raised awareness of the dangers of advertisements targeted at children. While most existing laws only regulate ads on television that may reach children, lawmakers have been working on extending regulations to online advertising and, for example, forbid (e.g., the DSA) or restrict (e.g., the COPPA) advertising based on profiling to children.

At first sight, ad platforms such as Google seem to protect children by not allowing advertisers to target their ads to users that are less than 18 years old. However, this paper shows that other targeting features can be exploited to reach children. For example, on YouTube, advertisers can target their ads to users watching a particular video through placement-based targeting, a form of contextual targeting. Hence, advertisers can target children by simply placing their ads in children-focused videos. Through a series of ad experiments, we show that placement-based targeting is possible on children-focused videos and, hence, enables marketing to children. In addition, our ad experiments show that advertisers can use targeting based on profiling (e.g., interest, location, behavior) in combination with placement-based advertising on children-focused videos. We discuss the lawfulness of these two practices with respect to DSA and COPPA.

Finally, we investigate to what extent real-world advertisers are employing placement-based targeting to reach children with ads on YouTube. We propose a measurement methodology consisting of building a Chrome extension able to capture ads and instrumenting six browser profiles to watch children-focused videos. Our results show that 7% of ads that appear in the children-focused videos we test use placement-based targeting. Hence, targeting children with ads on YouTube is not only hypothetically possible but also occurs in practice. We believe that the current legal and technical solutions are not enough to protect children from harm due to online advertising. A straightforward solution would be to forbid placement-based advertising on children-focused content.

Pakistani Teens and Privacy - How Gender Disparities, Religion and Family Values Impact the Privacy Design Space
  • Maryam Mustafa
  • Abdul Moeed Asad
  • Shehrbano Hassan
  • Urooj Haider
  • Zainab Durrani
  • Katharina Krombholz

How teenagers perceive, manage and perform privacy is less well understood in spaces outside of Western, educated, industrialised, rich and democratic countries.

To fill this gap we interviewed 30 teens to investigate the privacy perceptions, practices, and experienced digital harms of young people in Pakistan, a particularly interesting context as privacy in this context is not seen as an individual right or performed within an individualistic framework but instead is influenced by a combination of factors including social norms, family dynamics and religious beliefs.

Based on our findings, we developed four personas to systematize the needs and values of this specific population and then conducted focus groups with co-design activities to further explore privacy conflicts. Among other things that confirm and extend existing theories on teens' privacy practices and perceptions, our findings suggest that young women are disproportionately impacted by privacy violations and the harms extend beyond themselves to include their families.

Comprehension from Chaos: Towards Informed Consent for Private Computation
  • Bailey Kacsmar
  • Vasisht Duddu
  • Kyle Tilbury
  • Blase Ur
  • Florian Kerschbaum

Private computation, which includes techniques like multi-party computation and private query execution, holds great promise for enabling organizations to analyze data they and their partners hold while maintaining data subjects' privacy. Despite recent interest in communicating about differential privacy, end users' perspectives on private computation have not previously been studied. To fill this gap, we conducted 22 semi-structured interviews investigating users' understanding of, and expectations for, private computation over data about them. Interviews centered on four concrete data-analysis scenarios (e.g., ad conversion analysis), each with a variant that did not use private computation and another that did. While participants struggled with abstract definitions of private computation, they found the concrete scenarios enlightening and plausible even though we did not explain the complex cryptographic underpinnings. Private computation increased participants' acceptance of data sharing, but not unconditionally; the purpose of data sharing and analysis was the primary driver of their attitudes. Through collective activities, participants emphasized the importance of detailing the purpose of a computation and clarifying that inputs to private computation are not shared across organizations when describing private computation to end users.

Privacy in the Age of Neurotechnology: Investigating Public Attitudes towards Brain Data Collection and Use
  • Emiram Kablo
  • Patricia Arias-Cabarcos

Brain Computer Interfaces (BCIs) are expanding beyond the medical realm into entertainment, wellness, and marketing. However, as consumer neurotechnology becomes more popular, privacy concerns arise due to the sensitive nature of brainwave data and its potential commodification. Attacks on privacy have been demonstrated and AI advancements in brain-to-speech and brain-to-image decoding pose a new unique set of risks. In this space, we contribute with the first user study (n=287) to understand people's neuroprivacy expectations and awareness of neurotechnology implications. Our analysis shows that, while users are interested in the technology, privacy is a critical issue for acceptability. The results underscore the importance of consent and the need for implementing effective transparency about neurodata sharing. Our insights provide a ground to analyse the gap in current privacy protection mechanisms, adding to the debate on how to design privacy-respecting neurotechnology.

SESSION: Session 5: Side-Channels

Password-Stealing without Hacking: Wi-Fi Enabled Practical Keystroke Eavesdropping
  • Jingyang Hu
  • Hongbo Wang
  • Tianyue Zheng
  • Jingzhi Hu
  • Zhe Chen
  • Hongbo Jiang
  • Jun Luo

The contact-free sensing nature of Wi-Fi has been leveraged to achieve privacy breaches, yet existing attacks relying on Wi-Fi CSI (channel state information) demand hacking Wi-Fi hardware to obtain desired CSIs. Since such hacking has proven prohibitively hard due to compact hardware, its feasibility in keeping up with fast-developing Wi-Fi technology becomes very questionable. To this end, we propose WiKI-Eve to eavesdrop on keystrokes on smartphones without the need for hacking. WiKI-Eve exploits a new feature, BFI (beamforming feedback information), offered by the latest Wi-Fi hardware: since BFI is transmitted from a smartphone to an AP in clear-text, it can be overheard (hence eavesdropped) by any other Wi-Fi device switching to monitor mode. As existing keystroke inference methods offer very limited generalizability, WiKI-Eve further introduces an adversarial learning scheme to make its inference generalizable to unseen scenarios. We implement WiKI-Eve and conduct extensive evaluation on it; the results demonstrate that WiKI-Eve achieves 88.9% inference accuracy for individual keystrokes and up to 65.8% top-10 accuracy for stealing passwords of mobile applications (e.g., WeChat).

Recovering Fingerprints from In-Display Fingerprint Sensors via Electromagnetic Side Channel
  • Tao Ni
  • Xiaokuan Zhang
  • Qingchuan Zhao

Recently, in-display fingerprint sensors have been widely adopted in newly-released smartphones. However, we find this new technique can leak information about the user's fingerprints during a screen-unlocking process via the electromagnetic (EM) side channel that can be exploited for fingerprint recovery. We propose FPLogger to demonstrate the feasibility of this novel side-channel attack. Specifically, it leverages the EM emanations emitted when the user presses the in-display fingerprint sensor to extract fingerprint information, then maps the captured EM signals to fingerprint images and develops 3D fingerprint pieces to spoof and unlock the smartphones. We have extensively evaluated the effectiveness of FPLogger on five commodity smartphones equipped with both optical and ultrasonic in-display fingerprint sensors, and the results show it achieves promising similarities in recovering fingerprint images. In addition, results from 50 end-to-end spoofing attacks show that FPLogger achieves 24% (top-1) and 54% (top-3) success rates in spoofing five different smartphones.

Optical Cryptanalysis: Recovering Cryptographic Keys from Power LED Light Fluctuations
  • Ben Nassi
  • Ofek Vayner
  • Etay Iluz
  • Dudi Nassi
  • Jan Jancar
  • Daniel Genkin
  • Eran Tromer
  • Boris Zadov
  • Yuval Elovici

Although power LEDs have been integrated in various devices that perform cryptographic operations for decades, the cryptanalysis risk they pose has not yet been investigated. In this paper, we present optical cryptanalysis, a new form of cryptanalytic side-channel attack, in which secret keys are extracted by using a photodiode to measure the light emitted by a device's power LED and analyzing subtle fluctuations in the light intensity during cryptographic operations. We analyze the optical leakage of power LEDs of various consumer devices and the factors that affect the optical SNR. We then demonstrate end-to-end optical cryptanalytic attacks against a range of consumer devices (smartphone, smartcard, and Raspberry Pi, along with their USB peripherals) and recover secret keys (RSA, ECDSA, SIKE) from prior and recent versions of popular cryptographic libraries (GnuPG, Libgcrypt, PQCrypto-SIDH) from a maximum distance of 25 meters.

The Danger of Minimum Exposures: Understanding Cross-App Information Leaks on iOS through Multi-Side-Channel Learning
  • Zihao Wang
  • Jiale Guan
  • XiaoFeng Wang
  • Wenhao Wang
  • Luyi Xing
  • Fares Alharbi

Research on side-channel leaks has long focused on the information exposure from a single channel (memory, network traffic, power, etc.). Less studied is the risk of learning from multiple side channels related to a target activity (e.g., website visits) even when individual channels are not informative enough for an effective attack. Although prior research made the first step in this direction, inferring the operations of foreground apps on iOS from a set of global statistics, it remains unclear how to determine the maximum information leaks from all target-related side channels on a system, what can be learnt about the target from such leaks and, most importantly, how to control information leaks from the whole system, not just from an individual channel.

To answer these fundamental questions, we performed the first systematic study on multi-channel inference, focusing on iOS as the first step. Our research is based upon a novel attack technique, called Mischief, which given a set of potential side channels related to a target activity (e.g., foreground apps), utilizes probabilistic search to approximate an optimal subset of the channels exposing most information, as measured by Merit Score, a metric for correlation-based feature selection. On such an optimal subset, an inference attack is modeled as a multivariate time series classification problem, so the state-of-the-art deep-learning based solution, InceptionTime in particular, can be applied to achieve the best possible outcome. Mischief is found to work effectively on today's iOS (16.2), identifying foreground apps, website visits, sensitive IoT operations (e.g., opening the door) with a high confidence, even in an open-world scenario, which demonstrates that the protection Apple puts in place against the known attack is inadequate. Also importantly, this new understanding enables us to develop more comprehensive protection, which could elevate today's side-channel research from suppressing leaks from individual channels to controlling information exposure across the whole system.
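
For readers unfamiliar with the Merit Score used in correlation-based feature selection, the sketch below computes the standard heuristic k * r_cf / sqrt(k + k(k-1) * r_ff), where r_cf is the mean feature-target correlation and r_ff the mean feature-feature correlation. It uses absolute Pearson correlation on numeric toy data for simplicity. Whether Mischief computes the score in exactly this way, and the function names used here, are assumptions.

```python
import math
from itertools import combinations


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def merit_score(features: list[list[float]], target: list[float]) -> float:
    """CFS merit of a feature subset with respect to a target series."""
    k = len(features)
    if k == 0:
        return 0.0
    # Mean absolute correlation between each feature and the target.
    r_cf = sum(abs(pearson(f, target)) for f in features) / k
    # Mean absolute correlation between all pairs of features.
    pairs = list(combinations(features, 2))
    r_ff = sum(abs(pearson(a, b)) for a, b in pairs) / len(pairs) if pairs else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)
```

The score rewards subsets whose members correlate with the target and penalizes redundancy among them, so adding an uninformative or duplicate channel does not raise it. A search procedure, such as the probabilistic search mentioned in the abstract, would then look for a subset that maximizes this value.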

SESSION: Session 6: Cryptography & DNS

Silence is not Golden: Disrupting the Load Balancing of Authoritative DNS Servers
  • Fenglu Zhang
  • Baojun Liu
  • Eihal Alowaisheq
  • Jianjun Chen
  • Chaoyi Lu
  • Linjian Song
  • Yong Ma
  • Ying Liu
  • Haixin Duan
  • Min Yang

Authoritative nameservers are delegated to provide the final resource record. Since the security and robustness of DNS are critical to the general operation of the Internet, domain name owners are required to deploy multiple candidate nameservers for traffic load balancing. Once the load balancing mechanism is compromised, an adversary can manipulate a large number of legitimate DNS requests to a specified candidate nameserver. As a result, it may not only bypass the defense mechanisms used to filter malicious traffic that can overload the victim nameserver, but also lower the bar for DNS traffic hijacking and cache poisoning attacks.

In this study, we report a class of DNS vulnerabilities and present a novel attack named Disablance. Our proposed attack allows adversaries to stealthily sabotage the DNS load balancing for authoritative nameservers at a low cost. By just performing a handful of crafted requests, an adversary can manipulate a given DNS resolver to overload a specific authoritative server for a period of time. Therefore, Disablance can redirect benign DNS requests for all hosted domains to the specific nameserver and disrupt the load balancing mechanism. The above attack undermines the robustness of DNS resolution and increases the security threat of a single point of failure. Our extensive study shows that the security threat of Disablance is realistic and prevalent. First, we demonstrated that mainstream DNS implementations, including BIND9, PowerDNS and Microsoft DNS, are vulnerable to Disablance. Second, we developed a measurement framework to measure vulnerable authoritative servers in the wild. 22.24% of top 1M FQDNs and 3.94% of top 1M SLDs were proven to be potential victims of Disablance. Our measurement results also show that 37.88% of stable open resolvers and 10 of 14 popular public DNS services can be exploited to conduct Disablance, including Cloudflare and Quad9. Furthermore, the critical security threats of Disablance were observed and acknowledged through in-depth discussion with a world-leading DNS service provider. We have reported discovered vulnerabilities and provided recommendations to the affected vendors. So far, Tencent Cloud (DNSPod) and Amazon have taken action to fix this issue according to our suggestions.

TsuKing: Coordinating DNS Resolvers and Queries into Potent DoS Amplifiers
  • Wei Xu
  • Xiang Li
  • Chaoyi Lu
  • Baojun Liu
  • Haixin Duan
  • Jia Zhang
  • Jianjun Chen
  • Tao Wan

In this paper, we present a new DNS amplification attack, named TsuKing. Instead of exploiting individual DNS resolvers independently to achieve an amplification effect, TsuKing deftly coordinates numerous vulnerable DNS resolvers and crafted queries together to form potent DoS amplifiers. We demonstrate that with TsuKing, an initial small amplification factor can increase exponentially through the internal layers of coordinated amplifiers, resulting in an extremely powerful amplification attack. TsuKing has three variants, including DNSRetry, DNSChain, and DNSLoop, all of which exploit a suite of inconsistent DNS implementations to achieve enormous amplification effect. With comprehensive measurements, we found that about 14.5% of 1.3M open DNS resolvers are potentially vulnerable to TsuKing. Real-world controlled evaluations indicated that attackers can achieve a packet amplification factor of at least 3,700X (DNSChain). We have reported vulnerabilities to affected vendors and provided them with mitigation recommendations. We have received positive responses from 6 vendors, including Unbound, MikroTik, and AliDNS, and 3 CVEs were assigned. Some of them are implementing our recommendations.
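The exponential growth mentioned above can be illustrated with simple arithmetic. The sketch below is a deliberately simplified model, assuming every layer multiplies the packet count by the same factor; the function name and the numbers are hypothetical and are not measurements from the paper.

```python
def chained_amplification(per_layer_factor, layers):
    """Total amplification if each of `layers` layers multiplies traffic
    by the same `per_layer_factor` (a simplifying assumption)."""
    return per_layer_factor ** layers


# Hypothetical per-layer factor of 4: growth is exponential in the layer count.
for layers in range(1, 7):
    print(layers, chained_amplification(4, layers))
```

Under this model a modest per-layer factor of 4 already exceeds a thousandfold amplification after five layers (4^5 = 1024).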

Under the Dark: A Systematical Study of Stealthy Mining Pools (Ab)use in the Wild
  • Zhenrui Zhang
  • Geng Hong
  • Xiang Li
  • Zhuoqun Fu
  • Jia Zhang
  • Mingxuan Liu
  • Chuhan Wang
  • Jianjun Chen
  • Baojun Liu
  • Haixin Duan
  • Chao Zhang
  • Min Yang

Cryptocurrency mining is a crucial operation in blockchains, and miners often join mining pools to increase their chances of earning rewards. However, the energy-intensive nature of PoW cryptocurrency mining has led to its ban in New York State of the United States, China, and India. As a result, mining pools, serving as a central hub for mining activities, have become prime targets for regulatory enforcement. Furthermore, cryptojacking malware refers to self-owned stealthy mining pools to evade detection techniques and conceal profit wallet addresses. However, no systematic research has been conducted to analyze it, largely due to a lack of full understanding of the protocol implementation, usage, and port distribution of the stealth mining pool.

To the best of our knowledge, we carry out the first large-scale and longitudinal measurement research of stealthy mining pools to fill this gap. We report 7,629 stealthy mining pools among 59 countries. Further, we study the inner mechanisms of stealthy mining pools. By examining the 19,601 stealthy mining pool domains and IPs, our analysis reveals that stealthy mining pools carefully craft their domain semantics, protocol support, and lifespan to provide underground, user-friendly, and robust mining services. What's worse, we uncover a strong correlation between stealthy mining pools and malware, with 23.3% of them being labeled as malicious. Besides, we evaluate the tricks used to evade state-of-the-art mining detection, including migrating domain name resolution methods, leveraging the botnet, and enabling TLS encryption. Finally, we conduct a qualitative study to evaluate the profit gains of malicious cryptomining activities through the stealthy pool from an insider perspective. Our results show that criminals have the potential to earn more than 1 million USD per year, boasting an average ROI of 2,750%. We have informed the relevant ISPs about uncovered stealthy mining pools and have received their acknowledgments.

Travelling the Hypervisor and SSD: A Tag-Based Approach Against Crypto Ransomware with Fine-Grained Data Recovery
  • Boyang Ma
  • Yilin Yang
  • Jinku Li
  • Fengwei Zhang
  • Wenbo Shen
  • Yajin Zhou
  • Jianfeng Ma

Ransomware has evolved from an economic nuisance to a national security threat, posing a significant risk to users. To address this problem, we propose RansomTag, a tag-based approach against crypto ransomware with fine-grained data recovery. Compared to state-of-the-art SSD-based solutions, RansomTag makes progress in three aspects. First, it decouples the ransomware detection functionality from the firmware of the SSD and integrates it into a lightweight Type I hypervisor. Thus, it can leverage the powerful computing capability of the host system and the rich context information, which is introspected from the operating system, to achieve accurate detection of ransomware attacks and defense against potential targeted attacks on SSD characteristics. Further, RansomTag is readily deployed onto desktop personal computers due to its parapass-through architecture. Second, RansomTag bridges the semantic gap between the hypervisor and the SSD through our proposed tag-based approach. Third, RansomTag is able to keep 100% of the user data overwritten or deleted by ransomware, and restore any single or multiple user files to any versions based on timestamps. To validate our approach, we implement a prototype of RansomTag and collect 3,123 recent ransomware samples to evaluate it. The evaluation results show that our prototype effectively protects user data with minimal scale data backup and acceptable performance overhead. In addition, all the attacked files can be completely restored in a fine-grained manner.

SESSION: Session 7: Digital Signatures

Threshold Signatures from Inner Product Argument: Succinct, Weighted, and Multi-threshold
  • Sourav Das
  • Philippe Camacho
  • Zhuolun Xiang
  • Javier Nieto
  • Benedikt Bünz
  • Ling Ren

Threshold signatures protect the signing key by sharing it among a group of signers so that an adversary must corrupt a threshold number of signers to be able to forge signatures. Existing threshold signatures with succinct signatures and constant verification times do not work if signers have different weights. Such weighted settings are seeing increasing importance in decentralized systems, especially in the Proof-of-Stake blockchains. This paper presents a new paradigm for threshold signatures for pairing and discrete logarithm-based cryptosystems. Our scheme has a compact verification key consisting of only 7 group elements, and a signature consisting of 8 group elements. Verifying the signature requires 8 exponentiations and 8 bilinear pairings. Our scheme supports arbitrary weight distributions among signers and arbitrary thresholds. It requires non-interactive preprocessing after a universal powers-of-tau setup. We prove the security of our scheme in the Algebraic Group Model and implement it using Golang. Our evaluation shows that our scheme achieves a comparable signature size and verification time to a standard (unweighted) threshold signature. Compared to existing multisignature schemes, our scheme has a much smaller public verification key.
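The weighted setting can be pictured through its acceptance condition alone: what matters is the combined weight of the contributing signers, not how many there are. The following sketch shows only that weight-accounting predicate and contains no cryptography. It assumes the threshold is met when the combined weight is at least the threshold; the function name and stake distribution are hypothetical.

```python
def meets_weighted_threshold(weights, signers, threshold):
    """Return True if the combined weight of `signers` is at least `threshold`."""
    signers = set(signers)
    unknown = signers - set(weights)
    if unknown:
        raise ValueError(f"unknown signers: {sorted(unknown)}")
    return sum(weights[s] for s in signers) >= threshold


# Hypothetical stake distribution summing to 100.
weights = {"alice": 50, "bob": 30, "carol": 15, "dave": 5}

print(meets_weighted_threshold(weights, {"alice", "bob"}, 67))          # True  (80)
print(meets_weighted_threshold(weights, {"bob", "carol", "dave"}, 67))  # False (50)
```

Two heavy signers pass while three light ones do not, which is exactly the case an unweighted scheme that counts signers cannot express.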

Post Quantum Fuzzy Stealth Signatures and Applications
  • Sihang Pu
  • Sri AravindaKrishnan Thyagarajan
  • Nico Döttling
  • Lucjan Hanzlik

Private payments in blockchain-based cryptocurrencies have been a topic of research, both academic and industrial, ever since the advent of Bitcoin. Stealth address payments were proposed as a solution to improve payment privacy for users and are, in fact, deployed in several major cryptocurrencies today. The mechanism lets users receive payments so that none of these payments are linkable to each other or the recipient. Currently known stealth address mechanisms either (1) are insecure in certain reasonable adversarial models, (2) are inefficient in practice or (3) are incompatible with many existing currencies.

In this work, we formalize the underlying cryptographic abstraction of this mechanism, namely, stealth signatures with formal game-based definitions. We show a surprising application of our notions to passwordless authentication defined in the Fast IDentity Online (FIDO) standard. We then present SPIRIT, the first efficient post-quantum secure stealth signature construction based on the NIST standardized signature and key-encapsulation schemes, Dilithium and Kyber. The basic form of SPIRIT is only secure in a weak security model, but we provide an efficiency-preserving and generic transform, which boosts the security of SPIRIT to guarantee the strongest security notion defined in this work. Compared to state-of-the-art, there is an approximately 3.37x improvement in the signature size while keeping signing and verification as efficient as 0.2 ms.

We extend SPIRIT with a fuzzy tracking functionality where recipients can outsource the tracking of incoming transactions to a tracking server, satisfying an anonymity notion similar to that of fuzzy message detection (FMD) recently introduced in [CCS 2021]. We also extend SPIRIT with a new fuzzy tracking framework called scalable fuzzy tracking that we introduce in this work. This new framework can be considered as a dual of FMD, in that it reduces the tracking server's computational workload to sublinear in the number of users, as opposed to linear in FMD. Experimental results show that, for millions of users, the server only needs 3.4 ms to filter each incoming message which is a significant improvement upon the state-of-the-art.

Chipmunk: Better Synchronized Multi-Signatures from Lattices
  • Nils Fleischhacker
  • Gottfried Herold
  • Mark Simkin
  • Zhenfei Zhang

Multi-signatures allow for compressing many signatures for the same message that were generated under independent keys into one small aggregated signature. This primitive is particularly useful for proof-of-stake blockchains, like Ethereum, where the same block is signed by many signers, who vouch for the block's validity. Being able to compress all signatures for the same block into a short string significantly reduces the on-chain storage costs, which is an important efficiency metric for blockchains.

In this work, we consider multi-signatures in the synchronized setting, where the signing algorithm takes an additional time parameter as input and it is only required that signatures for the same time step are aggregatable. The synchronized setting is simpler than the general multi-signature setting, but is sufficient for most blockchain related applications, as signers are naturally synchronized by the length of the chain.

We present Chipmunk, a concretely efficient lattice-based multi-signature scheme in the synchronized setting that allows for signing an a-priori bounded number of messages. Chipmunk allows for non-interactive aggregation of signatures and is secure against rogue-key attacks. The construction is plausibly secure against quantum adversaries as our security relies on the assumed hardness of the short integer solution problem.

We significantly improve upon the previously best known construction in this setting by Fleischhacker, Simkin, and Zhang (CCS 2022). Our aggregate signature size is 5× smaller and for 112 bits of security our construction allows for compressing 8192 individual signatures into a multi-signature of size less than 200 KB. We provide a full implementation of Chipmunk and provide extensive benchmarks studying our construction's efficiency.

AIM: Symmetric Primitive for Shorter Signatures with Stronger Security
  • Seongkwang Kim
  • Jincheol Ha
  • Mincheol Son
  • Byeonghak Lee
  • Dukjae Moon
  • Joohee Lee
  • Sangyub Lee
  • Jihoon Kwon
  • Jihoon Cho
  • Hyojin Yoon
  • Jooyoung Lee

Post-quantum signature schemes based on the MPC-in-the-Head (MPCitH) paradigm are recently attracting significant attention as their security solely depends on the one-wayness of the underlying primitive, providing diversity for the hardness assumption in post-quantum cryptography. Recent MPCitH-friendly ciphers have been designed using simple algebraic S-boxes operating on a large field in order to improve the performance of the resulting signature schemes. Due to their simple algebraic structures, their security against algebraic attacks should be comprehensively studied.

In this paper, we refine algebraic cryptanalysis of power mapping based S-boxes over binary extension fields, and cryptographic primitives based on such S-boxes. In particular, for the Gröbner basis attack over 𝔽₂, we experimentally show that the exact number of Boolean quadratic equations obtained from the underlying S-boxes is critical to correctly estimate the theoretic complexity based on the degree of regularity. Similarly, it turns out that the XL attack might be faster when all possible quadratic equations are found and used from the S-boxes. This refined cryptanalysis leads to more precise algebraic analysis of cryptographic primitives based on algebraic S-boxes.

Considering the refined algebraic cryptanalysis, we propose a new one-way function, dubbed AIM, as an MPCitH-friendly symmetric primitive with high resistance to algebraic attacks. The security of AIM is comprehensively analyzed with respect to algebraic, statistical, quantum, and generic attacks. AIM is combined with the BN++ proof system, yielding a new signature scheme, dubbed AIMer. Our implementation shows that AIMer outperforms existing signature schemes based on symmetric primitives in terms of signature size and signing time.

SESSION: Session 8: Machine Learning Applications II

FINER: Enhancing State-of-the-art Classifiers with Feature Attribution to Facilitate Security Analysis
  • Yiling He
  • Jian Lou
  • Zhan Qin
  • Kui Ren

Deep learning classifiers achieve state-of-the-art performance in various risk detection applications. They explore rich semantic representations and are supposed to automatically discover risk behaviors. However, due to the lack of transparency, the behavioral semantics cannot be conveyed to downstream security experts to reduce their heavy workload in security analysis. Although feature attribution (FA) methods can be used to explain deep learning, the underlying classifier is still blind to what behavior is suspicious, and the generated explanation cannot adapt to downstream tasks, incurring poor explanation fidelity and intelligibility.

In this paper, we propose FINER, the first framework for risk detection classifiers to generate high-fidelity and high-intelligibility explanations. The high-level idea is to gather explanation efforts from model developer, FA designer, and security experts. To improve fidelity, we fine-tune the classifier with an explanation-guided multi-task learning strategy. To improve intelligibility, we engage task knowledge to adjust and ensemble FA methods. Extensive evaluations show that FINER improves explanation quality for risk detection. Moreover, we demonstrate that FINER outperforms a state-of-the-art tool in facilitating malware analysis.

Good-looking but Lacking Faithfulness: Understanding Local Explanation Methods through Trend-based Testing
  • Jinwen He
  • Kai Chen
  • Guozhu Meng
  • Jiangshan Zhang
  • Congyi Li

While enjoying the great achievements brought by deep learning (DL), people are also worried about the decision made by DL models, since the high degree of non-linearity of DL models makes the decision extremely difficult to understand. Consequently, attacks such as adversarial attacks are easy to carry out, but difficult to detect and explain, which has led to a boom in the research on local explanation methods for explaining model decisions. In this paper, we evaluate the faithfulness of explanation methods and find that traditional tests on faithfulness encounter the random dominance problem, i.e., the random selection performs the best, especially for complex data. To further solve this problem, we propose three trend-based faithfulness tests and empirically demonstrate that the new trend tests can better assess faithfulness than traditional tests on image, natural language and security tasks. We implement the assessment system and evaluate ten popular explanation methods. Benefiting from the trend tests, we successfully assess the explanation methods on complex data for the first time, bringing unprecedented discoveries and inspiring future research. Downstream tasks also greatly benefit from the tests. For example, model debugging equipped with faithful explanation methods performs much better for detecting and correcting accuracy and security problems.

FaceReader: Unobtrusively Mining Vital Signs and Vital Sign Embedded Sensitive Info via AR/VR Motion Sensors
  • Tianfang Zhang
  • Zhengkun Ye
  • Ahmed Tanvir Mahdad
  • Md Mojibur Rahman Redoy Akanda
  • Cong Shi
  • Yan Wang
  • Nitesh Saxena
  • Yingying Chen

The market size of augmented reality and virtual reality (AR/VR) has been expanding rapidly in recent years, with the use of face-mounted headsets extending beyond gaming to various application sectors, such as education, healthcare, and the military. Despite the rapid growth, the understanding of information leakage through sensor-rich headsets remains in its infancy. Some of the headset's built-in sensors do not require users' permission to access, and any apps and websites can acquire their readings. While these unrestricted sensors are generally considered free of privacy risks, we find that an adversary could uncover private information by scrutinizing sensor readings, making existing AR/VR apps and websites potential eavesdroppers. In this work, we investigate a novel, unobtrusive privacy attack called FaceReader, which reconstructs high-quality vital sign signals (breathing and heartbeat patterns) based on unrestricted AR/VR motion sensors. FaceReader is built on the key insight that the headset is closely mounted on the user's face, allowing the motion sensors to detect subtle facial vibrations produced by users' breathing and heartbeats. Based on the reconstructed vital signs, we further investigate three more advanced attacks, including gender recognition, user re-identification, and body fat ratio estimation. Such attacks pose severe privacy concerns, as an adversary may obtain users' sensitive demographic/physiological traits and potentially uncover their real-world identities. Compared to prior privacy attacks relying on speech and activities, FaceReader targets spontaneous breathing and heartbeat activities that are naturally produced by the human body and are unobtrusive to victims. In particular, we design an adaptive filter to dynamically mitigate the impacts of body motions.
We further employ advanced deep-learning techniques to reconstruct vital sign signals, achieving signal qualities comparable to those of dedicated medical instruments, as well as deriving sensitive gender, identity, and body fat information. We conduct extensive experiments involving 35 users on three types of mainstream AR/VR headsets across 3 months. The results reveal that FaceReader can reconstruct vital signs with low mean errors and accurately detect gender (over 93.33%). The attack can also link/re-identify users across different apps, websites, and longitudinal sessions with over 97.83% accuracy. Furthermore, we present the first successful attempt at revealing body fat information from motion sensor data, achieving a remarkably low estimation error of 4.43%.

AntiFake: Using Adversarial Audio to Prevent Unauthorized Speech Synthesis
  • Zhiyuan Yu
  • Shixuan Zhai
  • Ning Zhang

The rapid development of deep neural networks and generative AI has catalyzed growth in realistic speech synthesis. While this technology has great potential to improve lives, it also leads to the emergence of "DeepFake" where synthesized speech can be misused to deceive humans and machines for nefarious purposes. In response to this evolving threat, there has been a significant amount of interest in mitigating this threat by DeepFake detection.

Complementary to the existing work, we propose to take the preventative approach and introduce AntiFake, a defense mechanism that relies on adversarial examples to prevent unauthorized speech synthesis. To ensure the transferability to attackers' unknown synthesis models, an ensemble learning approach is adopted to improve the generalizability of the optimization process. To validate the efficacy of the proposed system, we evaluated AntiFake against five state-of-the-art synthesizers using real-world DeepFake speech samples. The experiments indicated that AntiFake achieved over 95% protection rate even to unknown black-box models. We have also conducted usability tests involving 24 human participants to ensure the solution is accessible to diverse populations.

SESSION: Session 9: Consensus Protocols

Themis: Fast, Strong Order-Fairness in Byzantine Consensus
  • Mahimna Kelkar
  • Soubhik Deb
  • Sishan Long
  • Ari Juels
  • Sreeram Kannan

We introduce Themis, a scheme for introducing fair ordering of transactions into (permissioned) Byzantine consensus protocols with at most ƒ faulty nodes among n ≥ 4ƒ + 1. Themis enforces the strongest notion of fair ordering proposed to date. It also achieves standard liveness, rather than the weaker notion of previous work with the same fair ordering property.

We show experimentally that Themis can be integrated into state-of-the-art consensus protocols with minimal modification or performance overhead. Additionally, we introduce a suite of experiments of general interest for evaluating the practical strength of various notions of fair ordering and the resilience of fair-ordering protocols to adversarial manipulation. We use this suite of experiments to show that the notion of fair ordering enforced by Themis is stronger in practice than those of competing systems.

We believe Themis offers strong practical protection against many types of transaction-ordering attacks, such as front-running and back-running, that are currently impacting commonly used smart contract systems.
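The resilience bound n ≥ 4ƒ + 1 stated at the start of this abstract translates directly into the number of faulty nodes a deployment can tolerate. The sketch below is plain arithmetic on that bound; the function name is hypothetical, and the comparison with n ≥ 3ƒ + 1 refers to the classic bound for Byzantine consensus without fair ordering, not to a claim from this abstract.

```python
def max_faulty(n, k=4):
    """Largest f satisfying n >= k*f + 1 (k=4 for the bound in the abstract)."""
    if n < 1:
        raise ValueError("need at least one node")
    return (n - 1) // k


for n in (4, 5, 9, 13, 21):
    # Second column: bound from the abstract; third: classic n >= 3f + 1 bound.
    print(n, max_faulty(n), max_faulty(n, k=3))
```

For example, 21 nodes tolerate 5 faulty nodes under n ≥ 4ƒ + 1, compared with 6 under n ≥ 3ƒ + 1.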

Towards Practical Sleepy BFT
  • Dahlia Malkhi
  • Atsuki Momose
  • Ling Ren

Bitcoin's longest-chain protocol pioneered consensus under dynamic participation, also known as sleepy consensus, where nodes do not need to be permanently active. However, existing solutions for sleepy consensus still face two major issues, which we address in this work. First, existing sleepy consensus protocols have high latency (either asymptotically or concretely). We tackle this problem and achieve 4Δ latency (Δ is the bound on network delay) in the best case, which is comparable to classic BFT protocols without dynamic participation support. Second, existing protocols have to assume that the set of corrupt participants remains fixed throughout the lifetime of the protocol due to a problem we call costless simulation. We resolve this problem and support growing participation of corrupt nodes. Our new protocol also offers several other important advantages, including support for arbitrary fluctuation of honest participation as well as an efficient recovery mechanism for new active nodes.

ParBFT: Faster Asynchronous BFT Consensus with a Parallel Optimistic Path
  • Xiaohai Dai
  • Bolin Zhang
  • Hai Jin
  • Ling Ren

To reduce latency and communication overhead of asynchronous Byzantine Fault Tolerance (BFT) consensus, an optimistic path is often added, with Ditto and BDT as state-of-the-art representatives. These protocols first attempt to run an optimistic path that is typically adapted from partially-synchronous BFT and promises good performance in good situations. If the optimistic path fails to make progress, these protocols switch to a pessimistic path after a timeout, to guarantee liveness in an asynchronous network. This design crucially relies on an accurate estimation of the network delay Δ to set the timeout parameter correctly. A wrong estimation of Δ can lead to either premature or delayed switching to the pessimistic path, hurting the protocol's efficiency in both cases.

To address the above issue, we propose ParBFT, which employs a parallel optimistic path. As long as the leader of the optimistic path is non-faulty, ParBFT ensures low latency without requiring an accurate estimation of the network delay. We propose two variants of ParBFT, namely ParBFT1 and ParBFT2, with a trade-off between latency and communication. ParBFT1 simultaneously launches the two paths, achieves lower latency under a faulty leader, but has a quadratic message complexity even in good situations. ParBFT2 reduces the message complexity in good situations by delaying the pessimistic path, at the cost of a higher latency under a faulty leader. Experimental results demonstrate that ParBFT outperforms Ditto or BDT. In particular, when the network condition is bad, ParBFT can reach consensus through the optimistic path, while Ditto and BDT suffer from path switching and have to make progress using the pessimistic path.

Abraxas: Throughput-Efficient Hybrid Asynchronous Consensus
  • Erica Blum
  • Jonathan Katz
  • Julian Loss
  • Kartik Nayak
  • Simon Ochsenreither

Protocols for state-machine replication (SMR) often trade off performance for resilience to network delay. In particular, protocols for asynchronous SMR tolerate arbitrary network delay but sacrifice throughput/latency when the network is fast, while partially synchronous protocols have good performance in a fast network but fail to make progress if the network experiences high delay. Existing hybrid protocols are resilient to arbitrary network delay and have good performance when the network is fast, but suffer from high overhead ("thrashing") if the network repeatedly switches between being fast and slow, e.g., in a network that is typically fast but has intermittent message delays.

We propose Abraxas, a generic approach for constructing a hybrid protocol from any "fast" protocol Πfast and asynchronous protocol Πslow to achieve (1) security and performance equivalent to Πslow under arbitrary network behavior, and (2) performance equivalent to Πfast when conditions are favorable. We instantiate Abraxas with the best existing protocols for Πfast (Jolteon) and Πslow (2-chain VABA), and show experimentally that the resulting protocol significantly outperforms Ditto, the previous state-of-the-art hybrid protocol.

SESSION: Session 10: Language-Based Security

Ou: Automating the Parallelization of Zero-Knowledge Protocols
  • Yuyang Sang
  • Ning Luo
  • Samuel Judson
  • Ben Chaimberg
  • Timos Antonopoulos
  • Xiao Wang
  • Ruzica Piskac
  • Zhong Shao

A zero-knowledge proof (ZKP) is a powerful cryptographic primitive used in many decentralized or privacy-focused applications. However, the high overhead of ZKPs can restrict their practical applicability. We design a programming language, Ou, aimed at easing the programmer's burden when writing efficient ZKPs, and a compiler framework, Lian, that automates the analysis and distribution of statements to a computing cluster. Ou uses programming language semantics, formal methods, and combinatorial optimization to automatically partition an Ou program into efficiently sized chunks for parallel ZK-proving and/or verification. We contribute: (1) A front-end language where users can write proof statements as imperative programs in a familiar syntax; (2) A compiler architecture and implementation that automatically analyzes the program and compiles it into an optimized IR that can be lifted to a variety of ZKP constructions; and (3) A cutting algorithm, based on Pseudo-Boolean optimization and Integer Linear Programming, that reorders instructions and then partitions the program into efficiently sized chunks for parallel evaluation and efficient state reconciliation.

Black Ostrich: Web Application Scanning with String Solvers
  • Benjamin Eriksson
  • Amanda Stjerna
  • Riccardo De Masellis
  • Philipp Rüemmer
  • Andrei Sabelfeld

Securing web applications remains a pressing challenge. Unfortunately, the state of the art in web crawling and security scanning still falls short of deep crawling. A major roadblock is the crawlers' limited ability to pass input validation checks when web applications require data of a certain format, such as email, phone number, or zip code. This paper develops Black Ostrich, a principled approach to deep web crawling and scanning. The key idea is to equip web crawling with string constraint solving capabilities to dynamically infer suitable inputs from regular expression patterns in web applications and thereby pass input validation checks. To enable this use of constraint solvers, we develop new automata-based techniques to process JavaScript regular expressions. We implement our approach extending and combining the Ostrich constraint solver with the Black Widow web crawler. We evaluate Black Ostrich on a set of 8,820 unique validation patterns gathered from over 21,667,978 forms from a combination of the July 2021 Common Crawl and Tranco top 100K. For these forms and reconstructions of input elements corresponding to the patterns, we demonstrate that Black Ostrich achieves a 99% coverage of the form validations compared to an average of 36% for the state-of-the-art scanners. Moreover, out of the 66,377 domains using these patterns, we solve all patterns on 66,309 (99%) while the combined efforts of the other scanners cover 52,632 (79%). We further show that our approach can boost coverage by evaluating it on three open-source applications. Our empirical studies include a study of email validation patterns, where we find that 213 (26%) out of the 825 found email validation patterns liberally admit XSS injection payloads.
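The last finding, that some email validation patterns admit XSS payloads, can be illustrated with a small check. This is a sketch under stated assumptions: both patterns are hypothetical and not taken from the paper's dataset, and it uses Python's `re` module rather than JavaScript regular expressions or the Ostrich solver.

```python
import re

# Hypothetical permissive pattern: anything@anything.anything, no whitespace.
LIBERAL_EMAIL = re.compile(r"^\S+@\S+\.\S+$")
# Hypothetical stricter pattern that restricts the allowed character set.
STRICT_EMAIL = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")

payload = "<script>alert(1)</script>@example.com"

print(bool(LIBERAL_EMAIL.match(payload)))  # True: the payload passes validation
print(bool(STRICT_EMAIL.match(payload)))   # False: '<' is not an allowed character
```

A string solver generalizes this check: instead of testing one hand-written payload, it searches for any string that satisfies the validation pattern and also contains the injection fragment.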

Comparse: Provably Secure Formats for Cryptographic Protocols
  • Théophile Wallez
  • Jonathan Protzenko
  • Karthikeyan Bhargavan

Data formats used for cryptographic inputs have historically been the source of many attacks on cryptographic protocols, but their security guarantees remain poorly studied. One reason is that, due to their low-level nature, formats often fall outside of the security model. Another reason is that studying all of the uses of all of the formats within one protocol is too difficult to do by hand, and requires a comprehensive, automated framework.

We propose a new framework, "Comparse", that specifically tackles the security analysis of data formats in cryptographic protocols. Comparse forces the protocol analyst to systematically think about data formats, formalize them precisely, and show that they enjoy strong enough properties to guarantee the security of the protocol.

Our methodology is developed in three steps. First, we introduce a high-level cryptographic API that lifts the traditional game-based cryptographic assumptions over bitstrings to work over high-level messages, using formats. This allows us to derive the conditions that secure formats must obey in order for their usage to be secure. Second, equipped with these security criteria, we implement a framework for specifying and verifying secure formats in the F* proof assistant. Our approach is based on format combinators, which enable compositional and modular proofs. In many cases, we relieve the user of having to write those combinators by hand, using compile-time term synthesis via Meta-F*. Finally, we show that our F* implementation can replace the symbolic notion of message formats previously implemented in the DY* protocol analysis framework. Our newer, bit-level precise accounting of formats closes the modeling gap, and allows DY* to reason about concrete messages and identify protocol flaws that it was previously oblivious to.

We evaluate Comparse over several classic and real-world protocols. Our largest case studies use Comparse to formalize and provide security proofs for the formats used in TLS 1.3, as well as upcoming protocols like MLS and Compact TLS 1.3 (cTLS), providing confidence and feedback in the design of these protocols.

SESSION: Session 11: Quantum & Space

Exploration of Power Side-Channel Vulnerabilities in Quantum Computer Controllers
  • Chuanqi Xu
  • Ferhat Erata
  • Jakub Szefer

The rapidly growing interest in quantum computing also increases the importance of securing these computers from various physical attacks. Constantly increasing qubit counts and improvements to the fidelity of the quantum computers hold great promise for the ability of these computers to run novel algorithms with highly sensitive intellectual property. However, in today's cloud-based quantum computer setting, users lack physical control over the computers. Physical attacks, such as those perpetrated by malicious insiders in data centers, could be used to extract sensitive information about the circuits being executed on these computers. This work shows the first exploration and study of power-based side-channel attacks in quantum computers. The explored attacks could be used to recover information about the control pulses sent to these computers. By analyzing these control pulses, attackers can reverse-engineer the equivalent gate-level description of the circuits, and the algorithms being run, or data hard-coded into the circuits. This work introduces five new types of attacks, and evaluates them using control pulse information available from cloud-based quantum computers. This work demonstrates how and what circuits could be recovered, and then in turn how to defend from the newly demonstrated side-channel attacks on quantum computing systems.

Securing NISQ Quantum Computer Reset Operations Against Higher Energy State Attacks
  • Chuanqi Xu
  • Jessie Chen
  • Allen Mi
  • Jakub Szefer

Enabling the sharing of quantum computers among different users requires a secure reset operation that can reset the state of a qubit to ground state |0> and prevent leakage of the state to a post-reset circuit. This work highlights that the existing reset operations available in superconducting qubit NISQ quantum computers are not fully secure. In particular, this work demonstrates for the first time a new type of higher-energy state attack. Although NISQ quantum computers are typically abstracted as working with only energy states |0> and |1>, this work shows that it is possible for unprivileged users to set the qubit state to |2> or |3>. By breaking the abstraction of a two-level system, the new higher-energy state attack can be deployed to affect the operation of circuits or for covert communication between circuits. This work shows that common reset protocols are ineffective in resetting a qubit from a higher-energy state. To provide a defense, this work proposes a new Cascading Secure Reset (CSR) operation. CSR, without hardware modifications, is able to efficiently and reliably reset higher-energy states back to |0>. CSR achieves a reduction in |3>-initialized state leakage channel capacity by between 1 and 2 orders of magnitude, and does so with a 25x speedup compared with the default decoherence reset.

Watch This Space: Securing Satellite Communication through Resilient Transmitter Fingerprinting
  • Joshua Smailes
  • Sebastian Köhler
  • Simon Birnbach
  • Martin Strohmeier
  • Ivan Martinovic

Due to an increase in the availability of cheap off-the-shelf radio hardware, signal spoofing and replay attacks on satellite ground systems have become more accessible than ever. This is particularly a problem for legacy systems, many of which do not offer cryptographic security and cannot be patched to support novel security measures.

Therefore, in this paper we explore radio transmitter fingerprinting in the context of satellite systems. We introduce the SatIQ system, proposing novel techniques for authenticating transmissions using characteristics of the transmitter hardware expressed as impairments on the downlinked radio signal. We look in particular at high sample rate fingerprinting, making device fingerprints difficult to forge without similarly high sample rate transmitting hardware, thus raising the required budget for spoofing and replay attacks. We also examine the difficulty of this approach with high levels of atmospheric noise and multipath scattering, and analyze potential solutions to this problem.

We focus on the Iridium satellite constellation, for which we collected 1,705,202 messages at a sample rate of 25 MS/s. We use this data to train a fingerprinting model consisting of an autoencoder combined with a Siamese neural network, enabling the model to learn an efficient encoding of the message headers that preserves identifying information.

We demonstrate the fingerprinting system's robustness under attack by replaying messages using a Software-Defined Radio, achieving an Equal Error Rate of 0.120, and ROC AUC of 0.946. Finally, we analyze its stability over time by introducing a time gap between training and testing data, and its extensibility by introducing new transmitters which have not been seen before. We conclude that our techniques are useful for building fingerprinting systems that are stable over time, can be used immediately with new transmitters without retraining, and provide robustness against spoofing and replay attacks by raising the required budget for attacks.

Protecting HRP UWB Ranging System Against Distance Reduction Attacks
  • Kyungho Joo
  • Dong Hoon Lee
  • Yeonseon Jeong
  • Wonsuk Choi

Ultra-wideband (UWB) communication is an emerging technology that enables secure ranging and localization. Since UWB communication enables measuring an exact distance, enhanced security would be expected based on it. Recently, however, it has been demonstrated that a distance measured by the IEEE 802.15.4z high-rate pulse repetition frequency (HRP) UWB ranging system can be maliciously reduced. The HRP UWB ranging system is widely adopted by smartphone manufacturers such as Samsung and Apple.

In this paper, we present UWB with sub-template verification (UWB-SV), which is the first method that prevents a practical distance reduction attack on the current HRP UWB ranging system. UWB-SV is designed to act as a verification method, in which the scrambled timestamp sequence (STS) field of a UWB frame is divided into multiple sub-fields. By analyzing the consistency of the cross-correlation results between the sub-fields and their corresponding local templates, UWB-SV is able to detect a distance reduction attack on the HRP UWB ranging system. Since a long bit sequence is not required for a consistency analysis, UWB-SV can be applied to most commercial off-the-shelf devices that are designed with a 4,096-bit length of the STS field. We comprehensively evaluate UWB-SV under 16 different channel conditions between the victim and attacker, through which we show that UWB-SV has an attack detection rate of 96.24% in outdoor environment conditions with a false positive rate of 0.32%.
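
The sub-field idea can be sketched as a toy consistency check in Python. Everything concrete here is an assumption made for illustration: the ±1 pulse model, the eight sub-fields, the 0.5 threshold, the 5% noise rate, and the attacker model (an attacker who cannot predict the STS and therefore injects random pulses into part of the field). It is not the UWB-SV detection rule.

```python
import random
from typing import List


def correlate(received: List[int], template: List[int]) -> float:
    """Normalized correlation of two equal-length +/-1 sequences, in [-1, 1]."""
    return sum(r * t for r, t in zip(received, template)) / len(template)


def sub_field_correlations(
    received: List[int], template: List[int], num_sub_fields: int
) -> List[float]:
    """Correlate each sub-field with the matching slice of the local template."""
    size = len(template) // num_sub_fields
    return [
        correlate(received[i * size:(i + 1) * size], template[i * size:(i + 1) * size])
        for i in range(num_sub_fields)
    ]


def looks_attacked(correlations: List[float], threshold: float = 0.5) -> bool:
    """Flag the frame when any sub-field correlates poorly with its template."""
    return min(correlations) < threshold


rng = random.Random(0)  # fixed seed so the sketch is reproducible
template = [rng.choice((-1, 1)) for _ in range(4096)]

# Genuine frame: the template with 5% of the pulses flipped by channel noise.
genuine = [-p if rng.random() < 0.05 else p for p in template]

# Toy attack: the first quarter of the field is replaced by random pulses.
attacked = [rng.choice((-1, 1)) for _ in range(1024)] + genuine[1024:]

genuine_flagged = looks_attacked(sub_field_correlations(genuine, template, 8))
attack_flagged = looks_attacked(sub_field_correlations(attacked, template, 8))
```

Checking sub-fields separately means a tampered portion of the field stands out instead of being averaged away over all 4,096 pulses.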

SESSION: Session 12: IoT: Attacks, Vulnerabilities, & Everything

BLUFFS: Bluetooth Forward and Future Secrecy Attacks and Defenses
  • Daniele Antonioli

Bluetooth is a pervasive technology for wireless communication. Billions of devices use it in sensitive applications and to exchange private data. The security of Bluetooth depends on the Bluetooth standard and its two security mechanisms: pairing and session establishment. No prior work, including the standard itself, analyzed the future and forward secrecy guarantees of these mechanisms, e.g., if Bluetooth pairing and session establishment defend past and future sessions when the adversary compromises the current one. To address this gap, we present six novel attacks, defined as the BLUFFS attacks, breaking Bluetooth sessions' forward and future secrecy. Our attacks enable device impersonation and machine-in-the-middle across sessions by only compromising one session key. The attacks exploit two novel vulnerabilities that we uncover in the Bluetooth standard related to unilateral and repeatable session key derivation. As the attacks affect Bluetooth at the architectural level, they are effective regardless of the victim's hardware and software details (e.g., chip, stack, version, and security mode).

We also release BLUFFS, a low-cost toolkit to perform and automatically check the effectiveness of our attacks. The toolkit employs seven original patches to manipulate and monitor Bluetooth session key derivation by dynamically patching a closed-source Bluetooth firmware that we reverse-engineered. We show that our attacks have a critical and large-scale impact on the Bluetooth ecosystem, by evaluating them on seventeen diverse Bluetooth chips (eighteen devices) from popular hardware and software vendors and supporting the most popular Bluetooth versions. Motivated by our empirical findings, we develop and successfully test an enhanced key derivation function for Bluetooth that stops our six attacks and their four root causes by design. We show how to effectively integrate our fix into the Bluetooth standard and discuss alternative implementation-level mitigations. We responsibly disclosed our contributions to the Bluetooth SIG.

When Free Tier Becomes Free to Enter: A Non-Intrusive Way to Identify Security Cameras with no Cloud Subscription
  • Yan He
  • Qiuye He
  • Song Fang
  • Yao Liu

Wireless security cameras may deter intruders. Accompanying the hardware, consumers may pay recurring monthly fees for recording videos to the cloud, or use the free tier offering motion alerts and sometimes live streams via the camera app. Many users may purchase the hardware without buying the subscription to save money, which inherently reduces their efficacy. We discover that the wireless traffic generated by a camera responding to stimulating motion may disclose whether or not video is being streamed. A malicious user such as a burglar may use such knowledge to target homes with a "weak camera" that does not upload video or turn on live view mode. In such cases, criminal activities would not be recorded though they are performed within the monitoring area of the camera. Accordingly, we describe a novel technique called WeakCamID that creates motion stimuli and sniffs resultant wireless traffic to infer the camera state. We perform a survey involving a total of 220 users, finding that all users think cameras have a consistent security guarantee regardless of the subscription status. Our discovery breaks such "common sense". We implement WeakCamID in a mobile app and experiment with 11 popular wireless cameras to show that WeakCamID can identify weak cameras with a mean accuracy of around 95% and within less than 19 seconds.

Formal Analysis of Access Control Mechanism of 5G Core Network
  • Mujtahid Akon
  • Tianchang Yang
  • Yilu Dong
  • Syed Rafiul Hussain

We present 5GCVerif, a model-based testing framework designed to formally analyze the access control framework of the 5G Core. With its modular design, 5GCVerif employs various abstraction techniques to craft an abstract model that captures the intricate details of the 5G Core's access control mechanism. This approach offers customizability and extensibility in constructing the abstract model and addresses the state explosion problem in model checking. 5GCVerif also sidesteps the challenge of exhaustively generating models for all possible core network configurations by restricting the model checker to explore policy violations only within the valid network configurations. Using 5GCVerif, we evaluated 55 security properties, leading to the discovery of five new vulnerabilities in 5G Core's access control mechanism. The uncovered vulnerabilities can result in multiple attacks including unauthorized access to sensitive information, illegitimate access to services, and denial of service.

IoTFlow: Inferring IoT Device Behavior at Scale through Static Mobile Companion App Analysis
  • David Schmidt
  • Carlotta Tagliaro
  • Kevin Borgolte
  • Martina Lindorfer

The number of "smart" devices, that is, devices making up the Internet of Things (IoT), is steadily growing. They suffer from vulnerabilities just as other software and hardware. Automated analysis techniques can detect and address weaknesses before attackers can misuse them. Applying existing techniques or developing new approaches that are sufficiently general is challenging though. Contrary to other platforms, the IoT ecosystem features various software and hardware architectures.

We introduce IoTFlow, a new static analysis approach for IoT devices that leverages their mobile companion apps to address the diversity and scalability challenges. IoTFlow combines Value Set Analysis (VSA) with more general data-flow analysis to automatically reconstruct and derive how companion apps communicate with IoT devices and remote cloud-based backends, what data they receive or send, and with whom they share it. To foster future work and reproducibility, our IoTFlow implementation is open source.

We analyze 9,889 manually verified companion apps with IoTFlow to understand and characterize the current state of security and privacy in the IoT ecosystem, which also demonstrates the utility of IoTFlow. We compare how these IoT apps differ from 947 popular general-purpose apps in their local network communication, the protocols they use, and who they communicate with. Moreover, we investigate how the results of IoTFlow compare to dynamic analysis, with manual and automated interaction, of 13 IoT devices when paired and used with their companion apps. Overall, utilizing IoTFlow, we discover various IoT security and privacy issues, such as abandoned domains, hard-coded credentials, expired certificates, and sensitive personal information being shared.

SESSION: Session 13: Homomorphic Encryption I

Homomorphic Multiple Precision Multiplication for CKKS and Reduced Modulus Consumption
  • Jung Hee Cheon
  • Wonhee Cho
  • Jaehyung Kim
  • Damien Stehlé

Homomorphic Encryption (HE) schemes such as BGV, BFV, and CKKS consume some ciphertext modulus for each multiplication. Bootstrapping (BTS) restores the modulus and allows homomorphic computation to continue, but it is time-consuming and requires a significant amount of modulus. For these reasons, decreasing modulus consumption is a crucial topic for BGV, BFV and CKKS, on which numerous studies have been conducted.

We propose a novel method, called Mult2, to perform ciphertext multiplication in the CKKS scheme with lower modulus consumption. Mult2 relies on a new decomposition of a ciphertext into a pair of ciphertexts that homomorphically performs a weak form of Euclidean division. It multiplies two ciphertexts in decomposed formats with homomorphic double precision multiplication, and its result approximately decrypts to the same value as does the ordinary CKKS multiplication. Mult2 can perform homomorphic multiplication by consuming almost half of the modulus.

We extend it to Multt for any t ≥ 2, which relies on the decomposition of a ciphertext into t components. All other CKKS operations can be equally performed on pair/tuple formats, leading to the double-CKKS (resp. tuple-CKKS) scheme enabling homomorphic double (resp. multiple) precision arithmetic.

As a result, when the ciphertext modulus and dimension are fixed, the proposed algorithms enable the evaluation of deeper circuits without bootstrapping, or reduce the number of bootstrappings required for the evaluation of the same circuits. Furthermore, they can be used to increase the precision without increasing the parameters. For example, Mult2 enables 8 sequential multiplications with 100-bit scaling factors with a ciphertext modulus of only 680 bits, which is impossible with the ordinary CKKS multiplication algorithm.
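
The arithmetic behind this example can be checked with a back-of-the-envelope Python sketch. It assumes that an ordinary CKKS multiplication followed by rescaling consumes about one scaling factor's worth of modulus, and it idealizes "almost half" as exactly half for Mult2. The variable names are illustrative, and the estimate ignores the base modulus that must remain for decryption.

```python
SCALING_BITS = 100     # scaling factor size from the example above
MULTIPLICATIONS = 8    # sequential multiplications
MODULUS_BITS = 680     # available ciphertext modulus

# Ordinary CKKS: roughly one scaling factor of modulus per multiplication.
ordinary_consumption = MULTIPLICATIONS * SCALING_BITS

# Mult2: about half of that per multiplication (idealized assumption).
mult2_consumption = MULTIPLICATIONS * SCALING_BITS // 2

ordinary_fits = ordinary_consumption <= MODULUS_BITS
mult2_fits = mult2_consumption <= MODULUS_BITS
mult2_headroom = MODULUS_BITS - mult2_consumption

print(f"ordinary CKKS: {ordinary_consumption} bits, fits: {ordinary_fits}")
print(f"Mult2 (idealized): {mult2_consumption} bits, headroom: {mult2_headroom}")
```

Under these assumptions the ordinary multiplication chain would need about 800 bits, which exceeds the 680-bit budget, while the halved consumption leaves room for the remaining base modulus.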

PELTA - Shielding Multiparty-FHE against Malicious Adversaries
  • Sylvain Chatel
  • Christian Mouchet
  • Ali Utkan Sahin
  • Apostolos Pyrgelis
  • Carmela Troncoso
  • Jean-Pierre Hubaux

Multiparty fully homomorphic encryption (MFHE) schemes enable multiple parties to efficiently compute functions on their sensitive data while retaining confidentiality. However, existing MFHE schemes guarantee data confidentiality and the correctness of the computation result only against honest-but-curious adversaries. In this work, we provide the first practical construction that enables the verification of MFHE operations in zero-knowledge, protecting MFHE from malicious adversaries. Our solution relies on a combination of lattice-based commitment schemes and proof systems which we adapt to support both modern FHE schemes and their implementation optimizations. We implement our construction in PELTA. Our experimental evaluation shows that PELTA is one to two orders of magnitude faster than existing techniques in the literature.

Asymptotically Faster Multi-Key Homomorphic Encryption from Homomorphic Gadget Decomposition
  • Taechan Kim
  • Hyesun Kwak
  • Dongwon Lee
  • Jinyeong Seo
  • Yongsoo Song

Homomorphic Encryption (HE) is a cryptosystem that allows us to perform an arbitrary computation on encrypted data. The standard HE, however, has a disadvantage in that the authority is concentrated in the secret key owner since computations can only be performed on ciphertexts encrypted under the same secret key. To resolve this issue, research is underway on Multi-Key Homomorphic Encryption (MKHE), which is a variant of HE supporting computations on ciphertexts possibly encrypted under different keys. Despite its ability to provide privacy for multiple parties, existing MKHE schemes suffer from poor performance due to the cost of multiplication which grows at least quadratically with the number of keys involved.

In this paper, we revisit the work of Chen et al. (ACM CCS 2019) on MKHE schemes from CKKS and BFV and significantly improve their performance. Specifically, we redesign the multi-key multiplication algorithm and achieve an asymptotically optimal complexity that grows linearly with the number of keys. Our construction relies on a new notion of gadget decomposition, which we call homomorphic gadget decomposition, where arithmetic operations can be performed over the decomposed vectors with guarantee of its functionality. Finally, we implement our MKHE schemes and demonstrate their benchmarks. For example, our multi-key CKKS multiplication takes only 0.5, 1.0, and 1.9 seconds compared to 1.6, 5.9, and 23.0 seconds of the previous work when 8, 16, and 32 keys are involved, respectively.
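
The reported timings can be used to check the claimed growth rates. The short Python sketch below uses only the numbers quoted above; the helper names are illustrative. Each time the number of keys doubles, linear cost should roughly double and quadratic cost should roughly quadruple.

```python
from typing import List

KEYS = [8, 16, 32]
NEW_SECONDS = [0.5, 1.0, 1.9]        # multi-key CKKS multiplication, this work
PREVIOUS_SECONDS = [1.6, 5.9, 23.0]  # previous work


def growth_ratios(times: List[float]) -> List[float]:
    """Ratio between consecutive timings, i.e. the cost of doubling the keys."""
    return [round(later / earlier, 1) for earlier, later in zip(times, times[1:])]


def speedups(old: List[float], new: List[float]) -> List[float]:
    """Speedup of the new timings over the old ones at each key count."""
    return [round(o / n, 1) for o, n in zip(old, new)]


new_growth = growth_ratios(NEW_SECONDS)
previous_growth = growth_ratios(PREVIOUS_SECONDS)
speedup_per_key_count = speedups(PREVIOUS_SECONDS, NEW_SECONDS)

print("growth per doubling (this work):", new_growth)
print("growth per doubling (previous):", previous_growth)
print("speedup at 8, 16, 32 keys:", speedup_per_key_count)
```

The growth ratios come out close to 2 for the new scheme and close to 4 for the previous one, which matches linear versus quadratic scaling, and the speedup widens as more keys are involved.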

FPT: A Fixed-Point Accelerator for Torus Fully Homomorphic Encryption
  • Michiel Van Beirendonck
  • Jan-Pieter D'Anvers
  • Furkan Turan
  • Ingrid Verbauwhede

Fully Homomorphic Encryption (FHE) is a technique that allows computation on encrypted data. It has the potential to drastically change privacy considerations in the cloud, but high computational and memory overheads are preventing its broad adoption. TFHE is a promising Torus-based FHE scheme that heavily relies on bootstrapping, the noise-removal tool invoked after each encrypted logical/arithmetical operation.

We present FPT, a Fixed-Point FPGA accelerator for TFHE bootstrapping. FPT is the first hardware accelerator to heavily exploit the inherent noise present in FHE calculations. Instead of double or single-precision floating-point arithmetic, it implements TFHE bootstrapping entirely with approximate fixed-point arithmetic. Using an in-depth analysis of noise propagation in bootstrapping FFT computations, FPT is able to use noise-trimmed fixed-point representations that are up to 50% smaller than prior implementations that prefer floating-point or integer FFTs.

FPT is built as a streaming processor inspired by traditional streaming DSPs: it instantiates directly cascaded high-throughput computational stages, with minimal control logic and routing networks. We explore different throughput-balanced compositions of streaming kernels with a user-configurable streaming width in order to construct a full bootstrapping pipeline. Our proposed approach allows 100% utilization of arithmetic units and requires only a small bootstrapping key cache, enabling an entirely compute-bound bootstrapping throughput of 1 BS / 35us. This is in stark contrast to the established classical CPU approach to FHE bootstrapping acceleration, which is typically constrained by memory and bandwidth. FPT is fully implemented and evaluated as a bootstrapping FPGA kernel for an Alveo U280 datacenter accelerator card. FPT achieves two to three orders of magnitude higher bootstrapping throughput than existing CPU-based implementations, and 2.5x higher throughput compared to recent ASIC emulation experiments.

SESSION: Session 14: Machine Learning Attacks I

Stolen Risks of Models with Security Properties
  • Yue Qin
  • Zhuoqun Fu
  • Chuyun Deng
  • Xiaojing Liao
  • Jia Zhang
  • Haixin Duan

Verifiable robust machine learning, as a new trend of ML security defense, enforces security properties (e.g., Lipschitzness, Monotonicity) on machine learning models and achieves a satisfying accuracy-security trade-off. Such security properties identify a series of evasion strategies of ML security attackers and specify logical constraints on their effects on a classifier (e.g., the classifier is monotonically increasing along some feature dimensions). However, little has been done so far to understand the side effect of those security properties on the model privacy.

In this paper, we aim at better understanding the privacy impacts on security properties of robust ML models. Particularly, we report the first measurement study to investigate the model stolen risks of robust models satisfying four security properties (i.e., LocalInvariance, Lipschitzness, SmallNeighborhood, and Monotonicity). Our findings bring to light the factors that influence model stealing attacks and defense performance on models trained with security properties. In addition, to train an ML model satisfying goals in accuracy, security, and privacy, we propose a novel technique, called BoundaryFuzz, which introduces a privacy property into verifiable robust training frameworks to defend against model stealing attacks on robust models. Experimental results demonstrate the defense effectiveness of BoundaryFuzz.

Narcissus: A Practical Clean-Label Backdoor Attack with Limited Information
  • Yi Zeng
  • Minzhou Pan
  • Hoang Anh Just
  • Lingjuan Lyu
  • Meikang Qiu
  • Ruoxi Jia

Backdoor attacks introduce manipulated data into a machine learning model's training set, causing the model to misclassify inputs with a trigger during testing to achieve a desired outcome by the attacker. For backdoor attacks to bypass human inspection, it is essential that the injected data appear to be correctly labeled. The attacks with such property are often referred to as "clean-label attacks." The success of current clean-label backdoor methods largely depends on access to the complete training set. Yet, accessing the complete dataset is often challenging or unfeasible since it frequently comes from varied, independent sources, like images from distinct users. It remains a question of whether backdoor attacks still present real threats.

In this paper, we provide an affirmative answer to this question by designing an algorithm to launch clean-label backdoor attacks using only samples from the target class and public out-of-distribution data. By inserting carefully crafted malicious examples totaling less than 0.5% of the target class size and 0.05% of the full training set size, we can manipulate the model to misclassify arbitrary inputs into the target class when they contain the backdoor trigger. Importantly, the trained poisoned model retains high accuracy for regular test samples without the trigger, as if the model is trained on untainted data. Our technique is consistently effective across various datasets, models, and even when the trigger is injected into the physical world.

We explore the space of defenses and find that Narcissus can evade the latest state-of-the-art defenses in their vanilla form or after a simple adaptation. We analyze the effectiveness of our attack: the synthesized Narcissus trigger contains durable features as persistent as the original target class features. Attempts to remove the trigger inevitably hurt model accuracy first.

Stateful Defenses for Machine Learning Models Are Not Yet Secure Against Black-box Attacks
  • Ryan Feng
  • Ashish Hooda
  • Neal Mangaokar
  • Kassem Fawaz
  • Somesh Jha
  • Atul Prakash

Recent work has proposed stateful defense models (SDMs) as a compelling strategy to defend against a black-box attacker who only has query access to the model, as is common for online machine learning platforms. Such stateful defenses aim to defend against black-box attacks by tracking the query history and detecting and rejecting queries that are "similar" and thus preventing black-box attacks from finding useful gradients and making progress towards finding adversarial attacks within a reasonable query budget. Recent SDMs (e.g., Blacklight and PIHA) have shown remarkable success in defending against state-of-the-art black-box attacks. In this paper, we show that SDMs are highly vulnerable to a new class of adaptive black-box attacks. We propose a novel adaptive black-box attack strategy called Oracle-guided Adaptive Rejection Sampling (OARS) that involves two stages: (1) use initial query patterns to infer key properties about an SDM's defense; and, (2) leverage those extracted properties to design subsequent query patterns to evade the SDM's defense while making progress towards finding adversarial inputs. OARS is broadly applicable as an enhancement to existing black-box attacks; we show how to apply the strategy to enhance six common black-box attacks to be more effective against the current class of SDMs. For example, OARS-enhanced versions of black-box attacks improved attack success rate against recent stateful defenses from almost 0% to almost 100% for multiple datasets within reasonable query budgets.
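
The query-tracking idea behind stateful defenses can be sketched in a few lines of Python. This is a toy detector under stated assumptions: it compares raw inputs by Euclidean distance against a fixed threshold, and the class and method names are hypothetical. It does not reproduce the similarity procedures of Blacklight or PIHA.

```python
import math
from typing import List, Sequence, Tuple


class StatefulDetector:
    """Toy stateful defense: reject a query that is too close to an earlier one."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold
        self.history: List[Tuple[float, ...]] = []

    def query_allowed(self, x: Sequence[float]) -> bool:
        """Record the query and report whether it would be served."""
        too_similar = any(
            math.dist(x, past) < self.threshold for past in self.history
        )
        self.history.append(tuple(x))
        return not too_similar


detector = StatefulDetector(threshold=0.1)

first = detector.query_allowed([0.5, 0.5])        # nothing in the history yet
near_copy = detector.query_allowed([0.5, 0.51])   # small step from the first query
far_query = detector.query_allowed([0.9, 0.1])    # far from every earlier query
```

Black-box attacks that estimate gradients rely on many small steps like `near_copy`, which is why such a detector blocks them. The threshold is an example of a defense property that an adaptive attacker might try to infer from which queries get rejected.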

Attack Some while Protecting Others: Selective Attack Strategies for Attacking and Protecting Multiple Concepts
  • Vibha Belavadi
  • Yan Zhou
  • Murat Kantarcioglu
  • Bhavani Thuraisingham

Machine learning models are vulnerable to adversarial attacks. Existing research focuses on attack-only scenarios. In practice, one dataset may be used for learning different concepts, and the attacker may be incentivized to attack some concepts but protect the others. For example, the attacker might tamper with a profile image for the "age" model to predict "young", while the "attractiveness" model still predicts "pretty". In this work, we empirically demonstrate that attacking the classifier for one learning task may negatively impact classifiers learning other tasks on the same data. This raises an interesting research question: is it possible to attack one set of classifiers while protecting the others trained on the same data?

Answers to the above question have interesting implications for the complexity of test-time attacks against learning models, such as avoiding the violation of logical constraints. For example, attacks on images of high school students should not cause these images to be classified as a group of 30-year-olds. Such misclassification of age may raise alarms and may easily expose the attacks. In this paper, we address the research question by developing novel attack techniques that can simultaneously attack one set of learning models while protecting the other. In the case of linear classifiers, we provide a theoretical framework for finding an optimal solution to generating such adversarial examples. Using this theoretical framework, we develop a "multi-concept" attack strategy in the context of deep learning tasks. Our results demonstrate that our techniques can successfully attack the target classes while protecting the "protected" classes in many different settings, which is not possible with the existing test-time attack-only strategies.
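
For two linear classifiers, the attack-while-protecting goal has a simple geometric illustration, sketched below in Python. The sketch moves the input along the part of the attacked weight vector that is orthogonal to the protected weight vector, so the protected score does not change. This is only an illustration, not the paper's optimal solution: it ignores bias terms and any perturbation budget, and the weights and input are made-up values.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def selective_perturbation(
    w_attack: Vector, w_protect: Vector, strength: float
) -> Vector:
    """Perturbation that lowers the attacked score and keeps the protected one."""
    # Remove from w_attack its component along w_protect.
    scale = dot(w_attack, w_protect) / dot(w_protect, w_protect)
    orthogonal = [a - scale * p for a, p in zip(w_attack, w_protect)]
    return [-strength * o for o in orthogonal]


# Made-up linear models: a positive score means the positive class.
w_attack = [1.0, 1.0]
w_protect = [1.0, 0.0]
x = [1.0, 1.0]

delta = selective_perturbation(w_attack, w_protect, strength=3.0)
x_adv = [xi + di for xi, di in zip(x, delta)]

attack_before, attack_after = dot(w_attack, x), dot(w_attack, x_adv)
protect_before, protect_after = dot(w_protect, x), dot(w_protect, x_adv)
```

In this example the attacked score changes sign while the protected score stays the same. When the two weight vectors are parallel, the orthogonal component vanishes and this construction cannot separate the two classifiers.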

SESSION: Session 15: Cryptographic Constructs & Models

FIN: Practical Signature-Free Asynchronous Common Subset in Constant Time
  • Sisi Duan
  • Xin Wang
  • Haibin Zhang

Asynchronous common subset (ACS) is a powerful paradigm enabling applications such as Byzantine fault-tolerance (BFT) and multi-party computation (MPC). The most efficient ACS framework in the information-theoretic setting is due to Ben-Or, Kelmer, and Rabin (BKR, 1994). The BKR ACS protocol has been both theoretically and practically impactful. However, the BKR protocol has an O(log n) running time (where n is the number of replicas) due to the usage of n parallel asynchronous binary agreement (ABA) instances, impacting both performance and scalability. Indeed, for a network of 16 ~ 64 replicas, the parallel ABA phase occupies about 95% ~ 97% of the total runtime in BKR. A long-standing open problem is whether we can build an ACS framework with O(1) time while not increasing the message or communication complexity of the BKR protocol.

In this paper, we resolve the open problem, presenting the first constant-time ACS protocol with O(n^3) messages in the information-theoretic and signature-free settings. Moreover, as a key ingredient of our new ACS framework and an interesting primitive in its own right, we provide the first information-theoretic multivalued validated Byzantine agreement (MVBA) protocol with O(1) time and O(n^3) messages. Both results can improve, asymptotically and concretely, various applications using ACS and MVBA in the information-theoretic, quantum-safe, or signature-free settings. As an example, we implement FIN, a BFT protocol instantiated using our framework. Via a 121-server deployment on Amazon EC2, we show FIN is significantly more efficient than PACE (CCS 2022), the state-of-the-art asynchronous BFT protocol of the same type. In particular, FIN reduces the overhead of the ABA phase to as low as 1.23% of the total runtime, and FIN achieves up to 3.41x the throughput of PACE. We also show that FIN outperforms other BFT protocols with the standard liveness property such as Dumbo and Speeding Dumbo.

Analyzing the Real-World Security of the Algorand Blockchain
  • Erica Blum
  • Derek Leung
  • Julian Loss
  • Jonathan Katz
  • Tal Rabin

The Algorand consensus protocol is interesting both in theory and in practice. On the theoretical side, to achieve adaptive security, it introduces the novel idea of player replaceability, where each step of the protocol is executed by a different randomly selected committee whose members remain secret until they send their first and only message. The protocol provides consistency under arbitrary network conditions and liveness under intermittent network partitions. On the practical side, the protocol is used to secure the Algorand cryptocurrency, whose total value is approximately $850M at the time of writing.

The Algorand protocol in use differs substantially from the protocols described in the published literature on Algorand. Despite its significance, it lacks a formal analysis. In this work, we describe and analyze the Algorand consensus protocol as deployed today in Algorand's ecosystem. We show that the overall protocol framework is sound by characterizing network conditions and parameter settings under which the protocol can be proven secure.

Fait Accompli Committee Selection: Improving the Size-Security Tradeoff of Stake-Based Committees
  • Peter Gaži
  • Aggelos Kiayias
  • Alexander Russell

We study the problem of committee selection in the context of proof-of-stake consensus mechanisms or distributed ledgers. These settings determine a family of participating parties, each of which has been assigned a non-negative "stake", and are subject to an adversary that may corrupt a subset of the parties. The challenge is to select a committee of participants that accurately reflects the proportion of corrupt and honest parties, as measured by stake, in the full population. The trade-off between committee size and the probability of selecting a committee that over-represents the corrupt parties is a fundamental factor in both security and efficiency of proof-of-stake consensus, as well as committee-run layer-two protocols.

We propose and analyze several new committee selection schemes that improve upon existing techniques by adopting low-variance assignment of certain committee members that hold significant stake. These schemes provide notable improvements to the size-security trade-off arising from the stake distributions of many deployed ledgers.

LedgerLocks: A Security Framework for Blockchain Protocols Based on Adaptor Signatures
  • Erkan Tairi
  • Pedro Moreno-Sanchez
  • Clara Schneidewind

The scalability and interoperability challenges in current cryptocurrencies have motivated the design of cryptographic protocols that enable efficient applications on top and across widely used cryptocurrencies such as Bitcoin or Ethereum. Examples of such protocols include (virtual) payment channels, atomic swaps, oracle-based contracts, deterministic wallets, and coin mixing services. Many of these protocols are built upon minimal core functionalities supported by a wide range of cryptocurrencies. Most prominently, adaptor signatures (AS) have emerged as a powerful tool for constructing blockchain protocols that are (mostly) agnostic to the specific logic of the underlying cryptocurrency. Even though AS-based protocols are built upon the same cryptographic principles, there exists no modular and faithful way for reasoning about their security. Instead, all the works analyzing such protocols focus on reproving how adaptor signatures are used to cryptographically link transactions while considering highly simplified blockchain models that do not capture security-relevant aspects of transaction execution in blockchain-based consensus.

To remedy this, we present LedgerLocks, a framework for the secure design of AS-based blockchain applications in the presence of a realistic blockchain. LedgerLocks defines the concept of AS-locked transactions, transactions whose publication is bound to the knowledge of a cryptographic secret. We argue that AS-locked transactions are the common building block of AS-based blockchain protocols, and we define G_LedgerLocks, a realistic ledger model in the Universal Composability framework with built-in support for AS-locked transactions. As LedgerLocks abstracts from the cryptographic realization of AS-locked transactions, it allows protocol designers to focus on the blockchain-specific security considerations instead.

SESSION: Session 16: Defenses

Capacity: Cryptographically-Enforced In-Process Capabilities for Modern ARM Architectures
  • Kha Dinh Duy
  • Kyuwon Cho
  • Taehyun Noh
  • Hojoon Lee

In-process compartmentalization and access control have been actively explored to provide in-place and efficient isolation of in-process security domains. Many works have proposed compartmentalization schemes that leverage hardware features, most notably using the new page-based memory isolation feature called Protection Keys for Userspace (PKU) on x86. Unfortunately, the modern ARM architecture does not have an equivalent feature. Instead, newer ARM architectures introduced Pointer Authentication (PA) and Memory Tagging Extension (MTE), adapting the reference validation model for memory safety and runtime exploit mitigation. We argue that those features have been underexplored in the context of compartmentalization and that they can be retrofitted to implement a capability-based in-process access control scheme.

This paper presents Capacity, a novel hardware-assisted intra-process access control design that embraces capability-based security principles. Capacity coherently incorporates the new hardware security features on ARM that already exhibit inherent characteristics of capability. It supports the life-cycle protection of the domain's sensitive objects, starting from their import from the file system to their place in memory. With intra-process domains authenticated with unique PA keys, Capacity transforms file descriptors and memory pointers into cryptographically-authenticated references and completely mediates reference usage with its program instrumentation framework and an efficient system call monitor. We evaluate our Capacity-enabled NGINX web server prototype and other common applications in which sensitive resources are isolated into different domains. Our evaluation shows that Capacity incurs a low performance overhead of approximately 17% for the single-threaded and 13.54% for the multi-threaded web server.

Cryptographically Enforced Memory Safety
  • Martin Unterguggenberger
  • David Schrammel
  • Lukas Lamster
  • Pascal Nasahl
  • Stefan Mangard

C/C++ memory safety issues, such as out-of-bounds errors, are still prevalent in today's applications. The presence of a single exploitable software bug allows an adversary to gain unauthorized memory access and ultimately compromise the entire system. Typically, memory safety schemes only achieve widespread adoption if they provide lightweight and practical security. Thus, hardware support is indispensable. However, countermeasures often restrict unauthorized access to data using heavy-weight protection mechanisms that extensively reshape the processor's microarchitecture and break legacy compatibility.

This paper presents cryptographically sealed pointers, a novel approach for memory safety based on message authentication codes (MACs) and object-granular metadata that is efficiently scaled and stored in tagged memory. The MAC cryptographically binds the object's bounds and liveness information, represented by the corresponding address range and memory tag, to the pointer. Through recent low-latency block cipher designs, we are able to authenticate sealed pointers on every memory access, cryptographically enforcing temporal and spatial memory safety. Our lightweight ISA extension only requires minimal hardware changes while maintaining binary compatibility. We systematically analyze the security and efficacy of our design using the NIST Juliet C/C++ test suite. The simulated performance overhead of our prototype implementation showcases competitive results for the SPEC CPU2017 benchmark suite with an average overhead of just 1.3% and 9.5% for the performance and efficiency modes, respectively.
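
The binding described above can be illustrated with a minimal software sketch. It is not the paper's ISA extension or pointer encoding: the `SealedPointer` layout, the HMAC-SHA256 construction, the key, and the truncated MAC length are all assumptions chosen for readability, whereas the abstract describes hardware checks built on a low-latency block cipher. The sketch only shows why a MAC over the address range and memory tag can reject both out-of-bounds and stale accesses.

```python
import hashlib
import hmac
from dataclasses import dataclass, replace

KEY = b"illustrative-key"  # assumption: stands in for a hardware-held secret
MAC_BYTES = 4              # assumption: arbitrary truncated MAC length


@dataclass(frozen=True)
class SealedPointer:
    address: int  # current pointer value
    base: int     # start of the object's address range
    size: int     # length of the object's address range in bytes
    mac: bytes    # MAC binding (base, size, memory tag) to this pointer


def compute_mac(base: int, size: int, tag: int) -> bytes:
    """MAC over the object's address range and its memory tag."""
    message = b"".join(v.to_bytes(8, "little") for v in (base, size, tag))
    return hmac.new(KEY, message, hashlib.sha256).digest()[:MAC_BYTES]


def seal(base: int, size: int, tag: int) -> SealedPointer:
    """Create a pointer to the start of a freshly allocated object."""
    return SealedPointer(base, base, size, compute_mac(base, size, tag))


def offset(ptr: SealedPointer, delta: int) -> SealedPointer:
    """Pointer arithmetic: the address moves, the sealed metadata does not."""
    return replace(ptr, address=ptr.address + delta)


def check_access(ptr: SealedPointer, memory_tag: int, length: int = 1) -> bool:
    """Allow an access only if it is in bounds and the MAC still verifies."""
    in_bounds = ptr.base <= ptr.address and ptr.address + length <= ptr.base + ptr.size
    expected = compute_mac(ptr.base, ptr.size, memory_tag)
    return in_bounds and hmac.compare_digest(ptr.mac, expected)
```

In this sketch, re-tagging memory when an object is freed makes every old pointer fail verification (temporal safety), while the bounds covered by the MAC cannot be widened without invalidating it (spatial safety).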

Put Your Memory in Order: Efficient Domain-based Memory Isolation for WASM Applications
  • Hanwen Lei
  • Ziqi Zhang
  • Shaokun Zhang
  • Peng Jiang
  • Zhineng Zhong
  • Ningyu He
  • Ding Li
  • Yao Guo
  • Xiangqun Chen

Memory corruption vulnerabilities can have more serious consequences in WebAssembly than in native applications. Therefore, we present PKUWA, the first WebAssembly runtime with memory isolation. Our insight is to use MPK hardware for efficient memory protection in WebAssembly. However, MPK and WebAssembly have different memory models: MPK protects virtual memory pages, while WebAssembly uses linear memory that has no pages. Mapping MPK APIs to WebAssembly causes memory bloating and low running efficiency. To solve this, we propose DILM, which protects linear memory at function-level granularity. We implemented DILM into the official WebAssembly runtime to build PKUWA. Our evaluation shows that PKUWA can prevent memory corruption in real projects with a 1.77% average overhead and negligible memory cost.

PANIC: PAN-assisted Intra-process Memory Isolation on ARM
  • Jiali Xu
  • Mengyao Xie
  • Chenggang Wu
  • Yinqian Zhang
  • Qijing Li
  • Xuan Huang
  • Yuanming Lai
  • Yan Kang
  • Wei Wang
  • Qiang Wei
  • Zhe Wang

Intra-process memory isolation is a well-known technique to enforce least privilege within a process. In this paper, we propose a generic and efficient intra-process memory isolation technique named PANIC, by leveraging Privileged Access Never (PAN) and load/store unprivileged (LSU) instructions on AArch64. PANIC executes process code in kernel mode and compartmentalizes code into trusted and untrusted components. The untrusted code is restricted from accessing the isolated memory region, which is located on user pages, and the trusted code is allowed to access the isolated memory region by using LSU instructions. To mitigate threats induced by running user code in kernel mode, PANIC provides two novel security mechanisms: shim-based memory isolation and sensitive instruction emulation. PANIC provides a generic and efficient isolation primitive that can be applied in three different isolation scenarios: protecting sensitive data in CFI, creating isolated execution environments, and hardening JIT code cache. We have implemented a prototype of PANIC, and our experimental evaluation shows that PANIC incurs very low performance overhead and performs better than existing methods.

SESSION: Session 17: Secure Hardware

Security Verification of Low-Trust Architectures
  • Qinhan Tan
  • Yonathan Fisseha
  • Shibo Chen
  • Lauren Biernacki
  • Jean-Baptiste Jeannin
  • Sharad Malik
  • Todd Austin

Low-trust architectures work on, from the viewpoint of software, always-encrypted data, and significantly reduce the amount of hardware trust to a small software-free enclave component. In this paper, we perform a complete formal verification of a specific low-trust architecture, the Sequestered Encryption (SE) architecture, to show that the design is secure against direct data disclosures and digital side channels for all possible programs. We first define the security requirements of the ISA of the SE low-trust architecture. Looking upwards, this ISA serves as an abstraction of the hardware for the software, and is used to show how any program comprising these instructions cannot leak information, including through digital side channels. Looking downwards, this ISA is a specification for the hardware, and is used to define the proof obligations for any RTL implementation arising from the ISA-level security requirements. These cover both functional and digital side-channel leakage. Next, we show how these proof obligations can be successfully discharged using commercial formal verification tools. We demonstrate the efficacy of our RTL security verification technique for seven different correct and buggy implementations of the SE architecture.

TunneLs for Bootlegging: Fully Reverse-Engineering GPU TLBs for Challenging Isolation Guarantees of NVIDIA MIG
  • Zhenkai Zhang
  • Tyler Allen
  • Fan Yao
  • Xing Gao
  • Rong Ge

Recent studies have revealed much detailed information about the translation lookaside buffers (TLBs) of modern CPUs, but we find that many properties of such components in modern GPUs still remain unknown or unclear. To fill this knowledge gap, we develop a new GPU TLB reverse-engineering method and apply it to a variety of consumer- and server-grade GPUs of the Turing and Ampere generations. Aside from learning significantly more comprehensive and accurate GPU TLB properties, we discover a design flaw in NVIDIA's Multi-Instance GPU (MIG) feature. MIG claims full partitioning of the entire GPU memory system for secure GPU sharing in cloud computing. However, we surprisingly find that MIG does not partition the last-level TLB, which is shared by all the compute units in a GPU. Exploiting this design flaw and the learned TLB properties, we are able to construct a covert channel for data exfiltration across MIG-enforced isolation. To the best of our knowledge, this is the first attack on MIG. We evaluate the proposed attack on a commercial cloud platform, and we successfully achieve reliable data exfiltration from a victim tenant at a speed of up to 31 kbps with a very high accuracy of around 99.8%. Even when the victim is using the GPU for deep neural network training, the transmission can still reach more than 25 kbps with more than 99.5% accuracy. We propose and implement a mitigation approach that can effectively thwart data exfiltration through this covert channel. Additionally, we present a preliminary study on exploiting the access patterns of the last-level TLB to infer the identity of applications running in other MIG-created GPU instances.

FetchBench: Systematic Identification and Characterization of Proprietary Prefetchers
  • Till Schlüter
  • Amit Choudhari
  • Lorenz Hetterich
  • Leon Trampert
  • Hamed Nemati
  • Ahmad Ibrahim
  • Michael Schwarz
  • Christian Rossow
  • Nils Ole Tippenhauer

Prefetchers speculatively fetch memory using predictions on future memory use by applications. Different CPUs may use different prefetcher types, and two implementations of the same prefetcher can differ in details of their characteristics, leading to distinct runtime behavior. For a few implementations, security researchers showed through manual analysis how to exploit specific prefetchers to leak data. Identifying such vulnerabilities required tedious reverse-engineering, as prefetcher implementations are proprietary and undocumented. So far, no systematic study of prefetchers in common CPUs is available, preventing further security assessment.

In this work, we address the following question: How can we systematically identify and characterize under-specified prefetchers in proprietary processors? To answer this question, we systematically analyze approaches to prefetching, design cross-platform tests to identify and characterize prefetchers on a given CPU, and demonstrate that our implementation FetchBench can characterize prefetchers on 19 different ARM and x86-64 CPUs. For example, FetchBench uncovers and characterizes a previously unknown replay-based prefetcher on the ARM Cortex-A72 CPU. Based on these findings, we demonstrate two novel attacks that exploit this undocumented prefetcher as a side channel to leak secret information, even from the secure TrustZone into the normal world.

Combined Private Circuits - Combined Security Refurbished
  • Jakob Feldtkeller
  • Tim Güneysu
  • Thorben Moos
  • Jan Richter-Brockmann
  • Sayandeep Saha
  • Pascal Sasdrich
  • Francois-Xavier Standaert

Physical attacks are well-known threats to cryptographic implementations. While countermeasures against passive Side-Channel Analysis (SCA) and active Fault Injection Analysis (FIA) exist individually, protecting against their combination remains a significant challenge. A recent attempt at achieving joint security has been published at CCS 2022 under the name CINI-MINIS. The authors introduce relevant security notions and aim to construct arbitrary-order gadgets that remain trivially composable in the presence of a combined adversary. Yet, we show that all CINI-MINIS gadgets at any order are susceptible to a devastating attack with only a single fault and probe due to a lack of error correction modules in the compression. We explain the details of the attack, pinpoint the underlying problem in the constructions, propose an additional design principle, and provide new (fixed) provably secure and composable gadgets for arbitrary order. Luckily, the changes in the compression stage help us to save correction modules and registers elsewhere, making the resulting Combined Private Circuits (CPC) more secure and more efficient than the original ones. We also explain why the discovered flaws have been missed by the associated formal verification tool VERICA (TCHES 2022) and propose fixes to remove its blind spot. Finally, we explore alternative avenues to repair the compression stage without additional corrections based on non-completeness, i.e. constructing a compression that never recombines any secret. Yet, while this approach could have merit for low-order gadgets, it is, for now, hard to generalize and scales poorly to higher orders. We conclude that our refurbished arbitrary order CINI gadgets provide a solid foundation for further research.

SESSION: Session 18: Traffic Analysis

Point Cloud Analysis for ML-Based Malicious Traffic Detection: Reducing Majorities of False Positive Alarms
  • Chuanpu Fu
  • Qi Li
  • Ke Xu
  • Jianping Wu

As an emerging security paradigm, machine learning (ML) based malicious traffic detection is an essential part of automatic defense against network attacks. Powered by dedicated traffic features, the ML based methods can detect various sophisticated attacks, in particular capturing zero-day attacks, which cannot be achieved by the traditional non-ML methods. However, false positive alarms raised by these advanced ML methods become the major obstacle to real-world deployment. These methods require experts to manually analyze false positives, which incurs significant labor costs. Thus, it is vital that we can reduce such false positives without heavyweight manual investigations.

In this paper, we propose pVoxel, an unsupervised method that identifies false positives for existing ML based traffic detection systems without requiring any prior knowledge on the alarms. To effectively process each alarm, pVoxel treats the traffic feature vector associated with the alarm as a point in the traffic feature space, and utilizes point cloud analysis to capture the topological features among the points for classifying the alarms. In particular, we aggregate the points into voxels, i.e., high-dimensional cubes, which allows us to develop an unsupervised method to identify the voxels indicating false positives according to their density features. Our experiments with 75 real-world datasets demonstrate that pVoxel can effectively reduce false positives by 95.55% for 11 state-of-the-art traffic detection methods under various settings. Meanwhile, pVoxel can handle 201.10 thousand alarms per second, which demonstrates that it can achieve efficient alarm processing.
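
The voxel aggregation step can be sketched as follows. This is a simplified illustration, not pVoxel's algorithm: the function names, the fixed voxel edge length, and the use of a plain per-voxel point count as the density feature are assumptions. How densities are turned into a false-positive decision is left to the caller, since the abstract does not specify that rule.

```python
import math
from collections import Counter


def voxel_index(point, edge):
    """Map a feature vector to the integer coordinates of its enclosing voxel."""
    return tuple(math.floor(x / edge) for x in point)


def voxel_densities(points, edge):
    """Return, for each alarm, how many alarms share its voxel."""
    indices = [voxel_index(p, edge) for p in points]
    counts = Counter(indices)
    return [counts[i] for i in indices]
```

For example, with an edge length of 1.0, the alarms at (0.1, 0.2) and (0.3, 0.4) share a voxel and each get density 2, while an isolated alarm at (5.0, 5.0) gets density 1.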

Learning from Limited Heterogeneous Training Data: Meta-Learning for Unsupervised Zero-Day Web Attack Detection across Web Domains
  • Peiyang Li
  • Ye Wang
  • Qi Li
  • Zhuotao Liu
  • Ke Xu
  • Ju Ren
  • Zhiying Liu
  • Ruilin Lin

Recently, unsupervised machine learning based systems have been developed to detect zero-day Web attacks, which can effectively enhance existing Web Application Firewalls (WAFs). However, prior art only considers detecting attacks on specific domains by training particular detection models for the domains. These systems require a large amount of training data, which leads to long model training and deployment times. In this paper, we propose RETSINA, a novel meta-learning based framework that enables zero-day Web attack detection across different domains in an organization with limited training data. Specifically, it utilizes meta-learning to share knowledge across these domains, e.g., the relationship between HTTP requests in heterogeneous domains, to efficiently train detection models. Moreover, we develop an adaptive preprocessing module to facilitate semantic analysis of Web requests across different domains and design a multi-domain representation method to capture semantic correlations between different domains for cross-domain model training. We conduct experiments using four real-world datasets on different domains with a total of 293M Web requests. The experimental results demonstrate that RETSINA outperforms the existing unsupervised Web attack detection methods with limited training data, e.g., RETSINA needs only 5 minutes of training data to achieve detection performance comparable to that of existing methods that train separate models for different domains using 1 day of training data. We also conduct a real-world deployment in an Internet company. RETSINA captures on average 126 and 218 zero-day attack requests per day in two domains, respectively, in one month.

Realistic Website Fingerprinting By Augmenting Network Traces
  • Alireza Bahramali
  • Ardavan Bozorgi
  • Amir Houmansadr

Website Fingerprinting (WF) is considered a major threat to the anonymity of Tor users (and other anonymity systems). While state-of-the-art WF techniques have claimed high attack accuracies, e.g., by leveraging Deep Neural Networks (DNN), several recent works have questioned the practicality of such WF attacks in the real world due to the assumptions made in the design and evaluation of these attacks. In this work, we argue that such impracticality issues are mainly due to the attacker's inability to collect training data in comprehensive network conditions, e.g., a WF classifier may be trained only on high-bandwidth samples collected on specific high-bandwidth network links but deployed on connections with different network conditions. We show that augmenting network traces can enhance the performance of WF classifiers in unobserved network conditions. Specifically, we introduce NetAugment, an augmentation technique tailored to the specifications of Tor traces. We instantiate NetAugment through semi-supervised and self-supervised learning techniques. Our extensive open-world and closed-world experiments demonstrate that under practical evaluation settings, our WF attacks provide superior performance compared to the state-of-the-art; this is due to their use of augmented network traces for training, which allows them to learn the features of target traffic in unobserved settings (e.g., unknown bandwidth, Tor circuits, etc.). For instance, with 5-shot learning in a closed-world scenario, our self-supervised WF attack (named NetCLR) reaches up to 80% accuracy when the traces for evaluation are collected in a setting unobserved by the WF adversary. This is compared to an accuracy of 64.4% achieved by the state-of-the-art Triplet Fingerprinting [34]. We believe that the promising results of our work can encourage the use of network trace augmentation in other types of network traffic analysis.

Transformer-based Model for Multi-tab Website Fingerprinting Attack
  • Zhaoxin Jin
  • Tianbo Lu
  • Shuang Luo
  • Jiaze Shang

While the anonymous communication system Tor can protect user privacy, website fingerprinting (WF) attackers can still identify the websites that users access over encrypted network connections by analyzing the metadata generated during network communication. Despite the emergence of new WF attack techniques in recent years, most research in this area has focused on pure traffic traces generated from single-tab browsing behavior. However, multi-tab browsing behavior significantly degrades the performance of WF classification models based on the single-tab assumption. As a result, some research has shifted its focus to multi-tab WF attacks, although most of these works make limited use of the mixed information contained in multi-tab traces. In this paper, we propose an end-to-end multi-tab WF attack model, called Transformer-based model for Multi-tab Website Fingerprinting attack (TMWF). Inspired by object detection algorithms in computer vision, we treat multi-tab WF recognition as a problem of predicting ordered sets with a maximum length. By adding enough single-tab queries to the detection model and letting each query extract WF features from different positions in the multi-tab traces, our model's Transformer architecture capitalizes more fully on trace features. Paired with our newly proposed model training approach, we accomplish adaptive recognition of multi-tab traces with varying numbers of web pages. This approach successfully eliminates a strong and unrealistic assumption in the field of multi-tab WF attacks, namely that the number of tabs contained in a sample belongs to the attacker's prior knowledge. Experimental results in various scenarios demonstrate that the performance of TMWF is significantly better than existing multi-tab WF attack models. To evaluate model performance in more authentic scenarios, we present a dataset of multi-tab trace data collected from real open-world environments.

SESSION: Session 19: Advanced Public Key Encryption

Efficient Registration-Based Encryption
  • Noemi Glaeser
  • Dimitris Kolonelos
  • Giulio Malavolta
  • Ahmadreza Rahimi

Registration-based encryption (RBE) was recently introduced as an alternative to identity-based encryption (IBE), to resolve the key-escrow problem: In RBE, the trusted authority is substituted with a weaker entity, called the key curator, who has no knowledge of any secret key. Users generate keys on their own and then publicly register their identities and their corresponding public keys to the key curator. RBE is a promising alternative to IBE, retaining many of its advantages while removing the key-escrow problem, the major drawback of IBE. Unfortunately, all existing constructions of RBE use cryptographic schemes in a non-black-box way, which makes them prohibitively expensive. It has been estimated that the size of an RBE ciphertext would be in the order of terabytes (though no RBE has ever been implemented).

In this work, we propose a new approach to construct RBE, from standard assumptions in bilinear groups. Our scheme is black-box and it is concretely highly efficient: a ciphertext is 914 bytes. To substantiate this claim, we implemented a prototype of our scheme and we show that it scales to millions of users. The public parameters of the scheme are on the order of kilobytes. The most expensive operation (registration) takes at most a handful of seconds, whereas the encryption and decryption runtimes are on the order of milliseconds. This is the first-ever implementation of an RBE scheme and demonstrates that the practical deployment of RBE is already possible with today's hardware.

Efficient Set Membership Encryption and Applications
  • Matthew Green
  • Abhishek Jain
  • Gijs Van Laer

The emerging area of laconic cryptography [Cho et al., CRYPTO'17] involves the design of two-party protocols involving a sender and a receiver, where the receiver's input is large. The key efficiency requirement is that the protocol communication complexity must be independent of the receiver's input size. In recent years, many tasks have been studied under this umbrella, including laconic oblivious transfer (ℓOT).

In this work, we introduce the notion of Set Membership Encryption (SME), a new member in the area of laconic cryptography. SME allows a sender to encrypt to one recipient from a universe of receivers, while using a small digest from a large subset of receivers. A recipient is able to decrypt the message if and only if it is part of the large subset. We show that ℓOT can be derived from SME.

We provide efficient constructions of SME using bilinear groups. Our solutions achieve orders of magnitude improvements in decryption times over the state-of-the-art (on ℓOT) and significant improvements overall in concrete efficiency over initial works in the area of laconic cryptography, albeit at the cost of worse asymptotics.

Realizing Flexible Broadcast Encryption: How to Broadcast to a Public-Key Directory
  • Rachit Garg
  • George Lu
  • Brent Waters
  • David J. Wu

Suppose a user wants to broadcast an encrypted message to K recipients. With public-key encryption, the sender would construct K different ciphertexts, one for each recipient. The size of the broadcast message then scales linearly with K. A natural question is whether the sender can encrypt the message with a ciphertext whose size scales sublinearly with the number of recipients.

Broadcast encryption offers one solution to this problem, but at the cost of introducing a central trusted party who issues keys to different users (and correspondingly, has the ability to decrypt all ciphertexts). Recently, several works have introduced notions like distributed broadcast encryption and flexible broadcast encryption, which combine the decentralized, trustless model of traditional public-key encryption with the efficiency guarantees of broadcast encryption. In the specific case of a flexible broadcast encryption scheme, users generate their own public/private keys and can then post their public key in any public-key directory. Subsequently, a user can encrypt to an arbitrary set of user public keys with a ciphertext whose size scales polylogarithmically with the number of public keys in the broadcast set. A distributed broadcast encryption scheme is a more restrictive primitive where each public key is also associated with an index, and one can only encrypt to a set of public keys corresponding to different indices.

In this work, we introduce a generic compiler that takes any distributed broadcast encryption scheme and produces a flexible broadcast encryption scheme. Moreover, whereas existing concretely-efficient constructions of distributed broadcast encryption have public keys whose size scales with the maximum number of users in the system, our resulting flexible broadcast encryption scheme has the appealing property that the size of each public key scales with the size of the maximum broadcast set. We provide an implementation of the flexible broadcast encryption scheme obtained by applying our compiler to the distributed broadcast encryption scheme of Kolonelos, Malavolta, and Wee (ASIACRYPT 2023). With our scheme, a sender can encrypt a 128-bit symmetric key to a set of over 1000 recipients (from a directory with a million users) with a 2 KB ciphertext. This is 16× smaller than separately encrypting to each user using standard ElGamal encryption. The cost is that the user public keys in flexible broadcast encryption are much larger (50 KB) compared to standard ElGamal public keys (32 bytes). Compared to the similarly-instantiated distributed broadcast encryption scheme, we achieve a 32× reduction in the user's public key size (50 KB vs. 1.6 MB) without changing the ciphertext size. Thus, flexible broadcast encryption provides an efficient way to encrypt messages to large groups of users at the cost of larger individual public keys (relative to vanilla public-key encryption).
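
The size figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated in the abstract and assumes decimal units (1 KB = 1000 bytes, 1 MB = 1000 KB); the variable and function names are illustrative, and the baseline ElGamal ciphertext size is derived from the stated 16× factor rather than taken from the paper.

```python
KB = 1000  # assumption: decimal kilobytes
MB = 1000 * KB


def implied_baseline_bytes(ciphertext_bytes, reduction_factor):
    """Size of the per-recipient ElGamal baseline implied by the stated factor."""
    return ciphertext_bytes * reduction_factor


flexible_ciphertext = 2 * KB
baseline_ciphertext = implied_baseline_bytes(flexible_ciphertext, 16)

flexible_public_key = 50 * KB
distributed_public_key = 1.6 * MB
elgamal_public_key = 32  # bytes, as stated in the abstract

# Ratio between the distributed and flexible public-key sizes.
public_key_reduction = distributed_public_key / flexible_public_key
# How much larger a flexible public key is than an ElGamal public key.
public_key_blowup = flexible_public_key / elgamal_public_key
```

Under these assumptions the implied baseline is 32 KB of ciphertext, the public-key reduction relative to distributed broadcast encryption works out to the stated 32×, and a flexible public key is roughly 1,560 times larger than an ElGamal public key.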

Post-Quantum Multi-Recipient Public Key Encryption
  • Joël Alwen
  • Dominik Hartmann
  • Eike Kiltz
  • Marta Mularczyk
  • Peter Schwabe

A multi-message multi-recipient PKE (mmPKE) encrypts a batch of messages, in one go, to a corresponding set of independently chosen receiver public keys. The resulting "multi-recipient ciphertext" can then be reduced (by any third party) to a shorter, receiver-specific "individual ciphertext." Finally, to recover the i-th message in the batch from their individual ciphertext, the i-th receiver only needs their own decryption key. A special case of mmPKE is multi-recipient PKE (mPKE), where all receivers are sent the same message. By treating (m)mPKE and their KEM counterparts as stand-alone primitives, we allow for more efficient constructions than trivially composing individual PKE/KEM instances. This is especially valuable in the post-quantum setting, where PKE/KEM ciphertexts and public keys tend to be far larger than their classic counterparts.

In this work we describe a collection of new results around mKEMs and (m)mPKEs. We provide both classic and post-quantum proofs for all results. Our results are geared towards practical constructions and applications (for example in the domain of PQ-secure group messaging).

Concretely, our results include a new non-adaptive to adaptive compiler for CPA-secure mKEMs resulting in public keys roughly half the size of the previous state-of-the-art [Hashimoto et al., CCS'21]. We also prove their FO transform for mKEMs to be secure in the presence of adaptive corruptions in the quantum random oracle model. Further, we provide the first mKEM combiner. In addition, we give two mmPKE constructions. The first is an arbitrary message-length black-box construction from an mKEM (e.g. one produced by combining a PQ with a classic mKEM). The second is optimized for short messages (which is suited for several recent mmPKE applications) and achieves hybrid PQ/classic security more directly. When encrypting n short messages at 256 bits of security, the mmPKE ciphertexts are 144n bytes shorter than those of the generic construction. Finally, we provide an optimized implementation of the (CCA secure) mKEM construction based on the NIST PQC winner Kyber and report benchmarks showing a significant speedup for encapsulation and up to 79% savings in ciphertext size compared to a naive solution.

SESSION: Session 20: Machine Learning Attacks II

Prediction Privacy in Distributed Multi-Exit Neural Networks: Vulnerabilities and Solutions
  • Tejas Kannan
  • Nick Feamster
  • Henry Hoffmann

Distributed Multi-exit Neural Networks (MeNNs) use partitioning and early exits to reduce the cost of neural network inference on low-power sensing systems. Existing MeNNs exhibit high inference accuracy using policies that select when to exit based on data-dependent prediction confidence. This paper presents a side-channel attack against distributed MeNNs employing data-dependent early exit policies. We find that an adversary can observe when a distributed MeNN exits early using encrypted communication patterns. An adversary can then use these observations to discover the MeNN's predictions with over 1.85× the accuracy of random guessing. In some cases, the side-channel leaks over 80% of the model's predictions. This leakage occurs because prior policies make decisions using a single threshold on varying prediction confidence distributions. We address this problem through two new exit policies. The first method, Per-Class Exiting (PCE), uses multiple thresholds to balance exit rates across predicted classes. This policy retains high accuracy and lowers prediction leakage, but we prove it has no privacy guarantees. We obtain these guarantees with a second policy, Confidence-Guided Randomness (CGR), which randomly selects when to exit using probabilities biased toward PCE's decisions. CGR provides statistically equivalent privacy with consistently higher inference accuracy than exiting early uniformly at random. Both PCE and CGR have low overhead, making them viable security solutions in resource-constrained settings.
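
As a rough illustration of why a single confidence threshold can leak predictions while per-class thresholds can balance exit rates, the following Python sketch computes per-class early-exit rates on made-up data. The function name, the confidence values, and the thresholds are all hypothetical; this is not the authors' PCE or CGR implementation.

```python
from collections import defaultdict

def exit_rates(samples, thresholds):
    """Fraction of samples per predicted class that exit early.

    samples: list of (predicted_class, confidence) pairs.
    thresholds: dict mapping class -> confidence needed to exit early.
    """
    exits = defaultdict(int)
    totals = defaultdict(int)
    for cls, conf in samples:
        totals[cls] += 1
        if conf >= thresholds[cls]:
            exits[cls] += 1
    return {cls: exits[cls] / totals[cls] for cls in totals}

# Hypothetical data: class 0 tends to be predicted with higher confidence.
samples = [(0, 0.95), (0, 0.90), (0, 0.85), (0, 0.60),
           (1, 0.80), (1, 0.65), (1, 0.55), (1, 0.50)]

# One threshold for every class: exit rates differ sharply by class.
single = exit_rates(samples, {0: 0.75, 1: 0.75})

# Hypothetical per-class thresholds chosen so both classes exit equally often.
per_class = exit_rates(samples, {0: 0.90, 1: 0.65})
```

With the single threshold, class 0 exits early far more often than class 1, so an observer who sees an early exit learns something about the prediction. With the per-class thresholds in this toy example, both classes exit at the same rate.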

Unforgeability in Stochastic Gradient Descent
  • Teodora Baluta
  • Ivica Nikolic
  • Racchit Jain
  • Divesh Aggarwal
  • Prateek Saxena

Stochastic Gradient Descent (SGD) is a popular training algorithm, a cornerstone of modern machine learning systems. Several security applications benefit from determining if SGD executions are forgeable, i.e., whether the model parameters seen at a given step are obtainable by more than one distinct set of data samples. In this paper, we present the first attempt at proving impossibility of such forgery. We furnish a set of conditions, which are efficiently checkable on concrete checkpoints seen during training runs, under which checkpoints are provably unforgeable at that step. Our experiments show that the conditions are somewhat mild and hence always satisfied at checkpoints sampled in our experiments. Our results sharply contrast prior findings at a high level: We show that checkpoints we find to be provably unforgeable have been deemed to be forgeable using the same methodology and experimental setup suggested in prior work. This discrepancy arises because of unspecified subtleties in definitions. We experimentally confirm that the distinction matters, i.e., small errors amplify during training to produce significant observable differences in the final trained models. We hope our results serve as a cautionary note on the role of algebraic precision in forgery definitions and related security arguments.

Devil in Disguise: Breaching Graph Neural Networks Privacy through Infiltration
  • Lingshuo Meng
  • Yijie Bai
  • Yanjiao Chen
  • Yutong Hu
  • Wenyuan Xu
  • Haiqin Weng

Graph neural networks (GNNs) have been developed to mine useful information from graph data of various applications, e.g., healthcare, fraud detection, and social recommendation. However, GNNs open up new attack surfaces for privacy attacks on graph data. In this paper, we propose Infiltrator, a privacy attack that is able to pry node-level private information based on black-box access to GNNs. Different from existing works that require prior information of the victim node, we explore the possibility of conducting the attack without any information of the victim node. Our idea is to infiltrate the graph with attacker-created nodes to befriend the victim node. More specifically, we design infiltration schemes that enable the adversary to infer the label, neighboring links, and sensitive attributes of a victim node. We evaluate Infiltrator with extensive experiments on three representative GNN models and six real-world datasets. The results demonstrate that Infiltrator can achieve an attack performance of more than 98% in all three attacks, outperforming baseline approaches. We further evaluate the defense resistance of Infiltrator against the graph homophily defender and the differentially private model.

Evading Watermark based Detection of AI-Generated Content
  • Zhengyuan Jiang
  • Jinghuai Zhang
  • Neil Zhenqiang Gong

A generative AI model can generate extremely realistic-looking content, posing growing challenges to the authenticity of information. To address these challenges, watermarks have been leveraged to detect AI-generated content. Specifically, a watermark is embedded into AI-generated content before it is released. Content is detected as AI-generated if a similar watermark can be decoded from it. In this work, we perform a systematic study on the robustness of such watermark-based AI-generated content detection. We focus on AI-generated images. Our work shows that an attacker can post-process a watermarked image by adding a small, human-imperceptible perturbation to it, such that the post-processed image evades detection while maintaining its visual quality. We show the effectiveness of our attack both theoretically and empirically. Moreover, to evade detection, our adversarial post-processing method adds much smaller perturbations to AI-generated images and thus better maintains their visual quality than existing popular post-processing methods such as JPEG compression, Gaussian blur, and Brightness/Contrast. Our work shows the insufficiency of existing watermark-based detection of AI-generated content, highlighting the urgent need for new methods. Our code is publicly available: https://github.com/zhengyuan-jiang/WEvade.

SESSION: Session 21: Defenses & Smart Contract Security

Phoenix: Detect and Locate Resilience Issues in Blockchain via Context-Sensitive Chaos
  • Fuchen Ma
  • Yuanliang Chen
  • Yuanhang Zhou
  • Jingxuan Sun
  • Zhuo Su
  • Yu Jiang
  • Jiaguang Sun
  • Huizhong Li

Resilience is vital to blockchain systems and helps them automatically adapt and continue providing their service when adverse situations occur, e.g., node crashing and data discarding. However, due to the vulnerabilities in their implementation, blockchain systems may fail to recover from the error situations, resulting in permanent service disruptions. Such vulnerabilities are called resilience issues.

In this paper, we propose Phoenix, a system that helps detect and locate blockchain systems' resilience issues by context-sensitive chaos. First, we identify two typical types of resilience issues in blockchain systems: node unrecoverable and data unrecoverable. Then, we design three context-sensitive chaos strategies tailored to the blockchain feature. Additionally, we create a coordinator to effectively trigger resilience issues by scheduling these strategies. To better analyze them, we collect and sort all strategies into a pool and generate a reproducing sequence to locate and reproduce those issues. We evaluated Phoenix on 5 widely used commercial blockchain systems and detected 13 previously unknown resilience issues. Besides, Phoenix successfully reproduces all of them, with 5.15 steps on average. The corresponding developers have fixed these issues. After the fixes, the chaos resistance time of the blockchains improved by 143.9% on average. This indicates that Phoenix can significantly improve the resilience of these blockchains.

Fuzz on the Beach: Fuzzing Solana Smart Contracts
  • Sven Smolka
  • Jens-Rene Giesen
  • Pascal Winkler
  • Oussama Draissi
  • Lucas Davi
  • Ghassan Karame
  • Klaus Pohl

Solana has quickly emerged as a popular platform for building decentralized applications (DApps), such as marketplaces for non-fungible tokens (NFTs). Key reasons for its success are Solana's low transaction fees and high performance, which are achieved in part due to its stateless programming model. Although the literature features extensive tooling support for smart contract security, current solutions are largely tailored for the Ethereum Virtual Machine. Unfortunately, the very stateless nature of Solana's execution environment introduces novel attack patterns specific to Solana, requiring a rethinking of how vulnerability analysis methods are built.

In this paper, we address this gap and propose FuzzDelSol, the first binary-only coverage-guided fuzzing architecture for Solana smart contracts. FuzzDelSol faithfully models runtime specifics such as smart contract interactions. Moreover, since source code is not available for the large majority of Solana contracts, FuzzDelSol operates on the contract's binary code. Hence, due to the lack of semantic information, we carefully extracted low-level program and state information to develop a diverse set of bug oracles covering all major bug classes in Solana. Our extensive evaluation on 6049 smart contracts shows that FuzzDelSol's bug oracles find impactful vulnerabilities with high precision and recall. To the best of our knowledge, this is the largest evaluation of the security landscape on the Solana mainnet.

Lanturn: Measuring Economic Security of Smart Contracts Through Adaptive Learning
  • Kushal Babel
  • Mojan Javaheripi
  • Yan Ji
  • Mahimna Kelkar
  • Farinaz Koushanfar
  • Ari Juels

We introduce Lanturn: a general-purpose adaptive learning-based framework for measuring the cryptoeconomic security of composed decentralized-finance (DeFi) smart contracts. Lanturn discovers strategies comprising concrete transactions for extracting economic value from smart contracts interacting with a particular transaction environment. We formulate the strategy discovery as a black-box optimization problem and leverage a novel adaptive learning-based algorithm to address it.

Lanturn features three key properties. First, it needs no contract-specific heuristics or reasoning, due to our black-box formulation of cryptoeconomic security. Second, it utilizes a simulation framework that operates natively on blockchain state and smart contract machine code, such that transactions returned by Lanturn's learning-based optimization engine can be executed on-chain without modification. Finally, Lanturn is scalable in that it can explore strategies comprising a large number of transactions that can be reordered or subject to insertion of new transactions.

We evaluate Lanturn on the historical data of the biggest and most active DeFi applications: Sushiswap, UniswapV2, UniswapV3, and AaveV2. Our results show that Lanturn not only rediscovers existing, well-known strategies for extracting value from smart contracts, but also discovers new strategies that were previously undocumented. Lanturn also consistently discovers higher value than evidenced in the wild, surpassing a natural baseline computed using value extracted by bots and other strategic agents.

Riggs: Decentralized Sealed-Bid Auctions
  • Nirvan Tyagi
  • Arasu Arun
  • Cody Freitag
  • Riad Wahby
  • Joseph Bonneau
  • David Mazières

We introduce the first practical protocols for fully decentralized sealed-bid auctions using timed commitments. Timed commitments ensure that the auction is finalized fairly even if all participants drop out after posting bids or if n bidders collude to try to learn the (n+1)st bidder's bid value. Our protocols rely on a novel non-malleable timed commitment scheme which efficiently supports range proofs to establish that bidders have sufficient funds to cover a hidden bid value. This allows us to penalize users who abandon bids for exactly the bid value, while supporting simultaneous bidding in multiple auctions with a shared collateral pool. Our protocols are concretely efficient and we have implemented them in an Ethereum-compatible smart contract which automatically enforces payment and delivery of an auctioned digital asset.

SESSION: Session 22: Fuzzing I

DSFuzz: Detecting Deep State Bugs with Dependent State Exploration
  • Yinxi Liu
  • Wei Meng

Traditional random mutation-based fuzzers are ineffective at reaching deep program states that require specific input values. Consequently, a large number of deep bugs remain undiscovered. To enhance the effectiveness of input mutation, previous research has utilized taint analysis to identify control-dependent critical bytes and only mutates those bytes. However, existing works do not consider indirect control dependencies, in which the critical bytes for taking one branch can only be set in a basic block that is control dependent on a series of other basic blocks. These critical bytes cannot be identified unless that series of basic blocks are visited in the execution path. Existing approaches would take an unacceptably long time and computation resources to attempt multiple paths before setting these critical bytes. In other words, the search space for identifying the critical bytes cannot be effectively explored by the current mutation strategies.
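
To make the notion of nested control dependencies concrete, here is a toy Python target (entirely hypothetical, not taken from the paper or its benchmarks). The byte that decides the inner branch is only read inside a block guarded by the outer branch, so an execution that fails the outer check never shows that byte influencing control flow.

```python
def target(data: bytes) -> str:
    """Toy parser whose deepest state is guarded by nested checks."""
    if len(data) < 4:
        return "too short"
    if data[0] == 0x7F:          # outer branch
        mode = data[1]           # data[1] is only read inside this block
        if mode == 0x02:         # inner branch, control dependent on the outer one
            if data[2:4] == b"OK":
                return "deep state reached"
            return "mode 2"
        return "other mode"
    return "rejected"

# An input that fails the outer check never exercises data[1] at all.
shallow = target(b"\x00\x02OK")
# Only an input satisfying every check in order reaches the deep state.
deep = target(b"\x7f\x02OK")
```

A dynamic taint analysis run on the shallow input would have no reason to flag `data[1]` as critical, which loosely mirrors the limitation described above.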

In this paper, we aim to explore a new input generation strategy for satisfying a series of indirect control dependencies that can lead to deep program states. We present DSFuzz, a directed fuzzing scheme that effectively constructs inputs for exploring particular deep states. DSFuzz focuses on the deep targets reachable by only satisfying a set of indirect control dependencies. By analyzing the conditions that a deep state indirectly depends on, it can generate dependent critical bytes for taking the corresponding branches. It also rules out the control flows that are unlikely to lead to the target state. As a result, it only needs to mutate under a limited search space. DSFuzz significantly outperformed state-of-the-art directed greybox fuzzers in detecting bugs in deep program states: it detected eight new bugs that other tools failed to find.

Profile-guided System Optimizations for Accelerated Greybox Fuzzing
  • Yunhang Zhang
  • Chengbin Pang
  • Stefan Nagy
  • Xun Chen
  • Jun Xu

Greybox fuzzing is a highly popular option for security testing, incentivizing tremendous efforts to improve its performance. Prior research has brought many algorithmic advancements, leading to substantial performance growth. However, less attention has been paid to the system-level designs of greybox fuzzing tools, despite the high impacts of such designs on fuzzing throughput.

In this paper, we explore system-level optimizations for greybox fuzzing. Through an empirical study, we unveil two system-level optimization opportunities. First, the common fuzzing mode with a fork server visibly slows down the target execution, which can be optimized by coupling persistent mode with efficient state recovery. Second, greybox fuzzing tools rely on the native Operating System (OS) to support interactions issued by the target program, involving complex but fuzzing-irrelevant operations. Simplification of OS interactions represents another optimization opportunity.

We develop two techniques, informed by a short profiling phase of the fuzzing tool, to achieve the optimizations above. The first technique enables reliable and efficient persistent mode by learning critical execution states from the profiling and patching the target program to reset them. The second technique introduces user-space abstractions to simulate OS functionality, reducing expensive OS interactions. Evaluated with 20 programs and the MAGMA benchmark, we demonstrate that our optimizations can accelerate AFL and AFL++ for higher code coverage and faster bug finding.

NestFuzz: Enhancing Fuzzing with Comprehensive Understanding of Input Processing Logic
  • Peng Deng
  • Zhemin Yang
  • Lei Zhang
  • Guangliang Yang
  • Wenzheng Hong
  • Yuan Zhang
  • Min Yang

Fuzzing is one of the most popular and practical techniques for security analysis. In this work, we aim to address the critical problem of high-quality input generation with a novel input-aware fuzzing approach called NestFuzz. NestFuzz can universally and automatically model input format specifications and generate valid input.

The key observation behind NestFuzz is that the code semantics of the target program always highly imply the required input formats. Hence, NestFuzz applies fine-grained program analysis to understand the input processing logic, especially the dependencies across different input fields and substructures. To this end, we design a novel data structure, namely Input Processing Tree, and a new cascading dependency-aware mutation strategy to drive the fuzzing.

Our evaluation of 20 intensively-tested popular programs shows that NestFuzz is effective and practical. In comparison with the state-of-the-art fuzzers (AFL, AFLFast, AFL++, MOpt, AFLSmart, WEIZZ, ProFuzzer, and TIFF), NestFuzz performs better in terms of both code coverage and security vulnerability detection. NestFuzz finds 46 vulnerabilities that are both unique and serious. At the time of writing, 39 have been confirmed and 37 have been assigned CVE IDs.

Lifting Network Protocol Implementation to Precise Format Specification with Security Applications
  • Qingkai Shi
  • Junyang Shao
  • Yapeng Ye
  • Mingwei Zheng
  • Xiangyu Zhang

While inferring protocol formats is critical for many security applications, existing techniques often fall short of coverage, inasmuch as almost all of them are in a fashion of dynamic analysis and driven by a limited number of network packets. If a feature is not present in the input packets, the feature will be missed in the resulting formats. To tackle this problem, we develop a novel static program analysis that infers protocol message formats from the implementation of common top-down protocol parsers. However, to achieve the trifecta of coverage, precision, and efficiency, we have to address two challenges, namely path explosion and disordered path constraints. To this end, our approach uses abstract interpretation to produce a novel data structure called the abstract format graph. The graph structure delimits precise but costly operations to only small regions, thus ensuring precision and efficiency at the same time. Our inferred formats are of high coverage and precisely specify both field boundaries and semantic constraints among packet fields. Our evaluation shows that we can infer formats for a protocol in one minute with over 95% precision and recall, much better than four baselines. Our inferred formats can substantially enhance existing protocol fuzzers, improving the coverage by 20% to 260% and discovering 53 zero-days with 47 assigned CVEs. We also provide case studies of adopting our inferred formats in network traffic auditing and network intrusion detection.

SESSION: Session 23: IoT & Embedded Security

MicPro: Microphone-based Voice Privacy Protection
  • Shilin Xiao
  • Xiaoyu Ji
  • Chen Yan
  • Zhicong Zheng
  • Wenyuan Xu

Hundreds of hours of audio are recorded and transmitted over the Internet for voice interactions such as virtual calls or speech recognition. As these recordings are uploaded, embedded biometric information, i.e., voiceprints, is unnecessarily exposed. This paper proposes the first privacy-enhanced microphone module (i.e., MicPro) that can produce anonymous audio recordings with biometric information suppressed while preserving speech quality for human perception or linguistic content for speech recognition. Limited by the hardware capabilities of microphone modules, previous works that modify recordings at the software level are inapplicable. To achieve anonymity in this scenario, MicPro transforms formants, which are distinct for each person due to the unique physiological structure of the vocal organs. Formant transformations are done by modifying the linear spectrum frequencies (LSFs) provided by a popular codec (i.e., CELP) in low-latency communications.

To strike a balance between anonymity and usability, we use a multi-objective genetic algorithm (NSGA-II) to optimize the transformation coefficients. We implement MicPro on an off-the-shelf microphone module and evaluate the performance of MicPro on several ASV systems, ASR systems, corpora, and in a real-world setup. Our experiments show that for the state-of-the-art ASV systems, MicPro outperforms existing software-based strategies that utilize signal processing (SP) techniques, achieving an EER that is 5 to 10% higher and MMR that is 20% higher than existing works while maintaining a comparable level of usability.

TileMask: A Passive-Reflection-based Attack against mmWave Radar Object Detection in Autonomous Driving
  • Yi Zhu
  • Chenglin Miao
  • Hongfei Xue
  • Zhengxiong Li
  • Yunnan Yu
  • Wenyao Xu
  • Lu Su
  • Chunming Qiao

In autonomous driving, millimeter wave (mmWave) radar has been widely adopted for object detection because of its robustness and reliability under various weather and lighting conditions. For radar object detection, deep neural networks (DNNs) are becoming increasingly important because they are more robust and accurate, and can provide rich semantic information about the detected objects, which is critical for autonomous vehicles (AVs) to make decisions. However, recent studies have shown that DNNs are vulnerable to adversarial attacks. Despite the rapid development of DNN-based radar object detection models, there have been no studies on their vulnerability to adversarial attacks. Although some spoofing attack methods have been proposed to attack the radar sensor by actively transmitting specific signals using special devices, these attacks require sub-nanosecond-level synchronization between the devices and the radar and are very costly, which limits their practicability in the real world. In addition, these attack methods cannot effectively attack DNN-based radar object detection. To address the above problems, in this paper, we investigate the possibility of using a few adversarial objects to attack the DNN-based radar object detection models through passive reflection. These objects can be easily fabricated using 3D printing and metal foils at low cost. By placing these adversarial objects at some specific locations on a target vehicle, we can easily fool the victim AV's radar object detection model. The experimental results demonstrate that the attacker can achieve the attack goal by using only two adversarial objects and conceal them as car signs, which have good stealthiness and flexibility. To the best of our knowledge, this is the first study on passive-reflection-based attacks against DNN-based radar object detection models using low-cost, readily available, and easily concealable geometric shaped objects.

SHERLOC: Secure and Holistic Control-Flow Violation Detection on Embedded Systems
  • Xi Tan
  • Ziming Zhao

Microcontroller-based embedded systems are often programmed in low-level languages and are vulnerable to control-flow hijacking attacks. One approach to prevent such attacks is to enforce control-flow integrity (CFI), but inlined CFI enforcement can pose challenges in embedded systems. For example, it increases binary size and changes memory layout. Trace-based control-flow violation detection (CFVD) offers an alternative that doesn't require instrumentation of the protected software or changes to its memory layout. However, existing CFVD methods used in desktop systems require kernel modifications to store and analyze the trace, which limits their use to monitoring unprivileged applications. Embedded systems, in contrast, are interrupt-driven, with the majority of processing taking place in the privileged mode. Therefore, it is critical to provide a holistic and system-oriented CFVD solution that can monitor control-flow transfers both within and among privileged and unprivileged components.
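
The basic idea of trace-based violation detection can be sketched in a few lines of Python: compare each recorded branch against a set of permitted control-flow edges. The addresses, the edge set, and the function name below are hypothetical, and the sketch deliberately ignores the interrupts, context switches, and privilege transitions that a holistic detector has to handle.

```python
def find_violations(trace, allowed_edges):
    """Return the trace records that are not in the allowed edge set.

    trace: iterable of (source_address, destination_address) pairs, as a
           hardware tracing unit might report for taken branches.
    allowed_edges: set of (source, destination) pairs taken from a
           precomputed control-flow graph.
    """
    return [edge for edge in trace if edge not in allowed_edges]

# Hypothetical control-flow graph edges and a recorded trace.
allowed_edges = {(0x1000, 0x2000), (0x2010, 0x1004), (0x1008, 0x3000)}
trace = [(0x1000, 0x2000), (0x2010, 0x1004), (0x1008, 0x4000)]

# The last transfer targets an address the graph does not permit.
violations = find_violations(trace, allowed_edges)
```

Because the check runs over a trace collected outside the monitored program, the program itself needs no inlined instrumentation, which is the property the abstract highlights.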

In this paper, we present SHERLOC, a Secure and Holistic Control-Flow Violation Detection mechanism designed for microcontroller-based embedded systems. SHERLOC ensures security by configuring the hardware tracing unit, storing trace records, and executing the violation detection algorithm in a trusted execution environment, which prevents privileged programs from bypassing monitoring or tampering with the trace. We address the challenges of achieving holistic and system-oriented CFVD by formalizing the problem and monitoring forward and backward edges of unprivileged and privileged programs, as well as control-flow transfers among unprivileged and privileged components. Specifically, SHERLOC overcomes the challenges of identifying legitimate asynchronous interrupts and context switches at run-time by using an interrupt- and scheduling-aware violation detection algorithm. Our evaluations on the ARMv8-M architecture demonstrate the effectiveness and efficiency of SHERLOC.

Caveat (IoT) Emptor: Towards Transparency of IoT Device Presence
  • Sashidhar Jakkamsetti
  • Youngil Kim
  • Gene Tsudik

As many types of IoT devices worm their way into numerous settings and many aspects of our daily lives, awareness of their presence and functionality becomes a source of major concern. Hidden IoT devices can snoop (via sensing) on nearby unsuspecting users, and impact the environment where unaware users are present, via actuation. This prompts, respectively, privacy and security/safety issues. The dangers of hidden IoT devices have been recognized and prior research suggested some means of mitigation, mostly based on traffic analysis or using specialized hardware to uncover devices. While such approaches are partially effective, there is currently no comprehensive approach to IoT device transparency.

Prompted in part by recent privacy regulations (GDPR and CCPA), this paper motivates and constructs a privacy-agile Root-of-Trust architecture for IoT devices, called PAISA: Privacy-Agile IoT Sensing and Actuation. It guarantees timely and secure announcements of nearby IoT devices' presence and their capabilities. PAISA has two components: one on the IoT device that guarantees periodic announcements of its presence even if all device software is compromised, and the other on the user device, which captures and processes announcements. PAISA requires no hardware modifications; it uses a popular off-the-shelf Trusted Execution Environment (TEE), ARM TrustZone. To demonstrate its viability, PAISA is instantiated as an open-source prototype which includes an IoT device that makes announcements via IEEE 802.11 WiFi beacons and an Android smartphone-based app that captures and processes announcements. Security and performance of the PAISA design and its prototype are also discussed.

SESSION: Session 24: Formal Analysis of Cryptographic Protocols

CryptoBap: A Binary Analysis Platform for Cryptographic Protocols
  • Faezeh Nasrabadi
  • Robert Künnemann
  • Hamed Nemati

We introduce CryptoBap, a platform to verify weak secrecy and authentication for the (ARMv8 and RISC-V) machine code of cryptographic protocols. We achieve this by first transpiling the binary of protocols into an intermediate representation and then performing a crypto-aware symbolic execution to automatically extract a model of the protocol that represents all its execution paths. Our symbolic execution resolves indirect jumps and supports bounded loops using the loop-summarization technique, which we fully automate. The extracted model is then translated into models amenable to automated verification via ProVerif and CryptoVerif using a third-party toolchain. We prove the soundness of the proposed approach and use CryptoBap to verify multiple case studies ranging from toy examples to real-world protocols: TinySSH, an implementation of SSH, and WireGuard, a modern VPN protocol.

A Generic Methodology for the Modular Verification of Security Protocol Implementations
  • Linard Arquint
  • Malte Schwerhoff
  • Vaibhav Mehta
  • Peter Müller

Security protocols are essential building blocks of modern IT systems. Subtle flaws in their design or implementation may compromise the security of entire systems. It is, thus, important to prove the absence of such flaws through formal verification. Much existing work focuses on the verification of protocol models, which is not sufficient to show that their implementations are actually secure. Verification techniques for protocol implementations (e.g., via code generation or model extraction) typically impose severe restrictions on the used programming language and code design, which may lead to sub-optimal implementations. In this paper, we present a methodology for the modular verification of strong security properties directly on the level of the protocol implementations. Our methodology leverages state-of-the-art verification logics and tools to support a wide range of implementations and programming languages. We demonstrate its effectiveness by verifying memory safety and security of Go implementations of the Needham-Schroeder-Lowe, Diffie-Hellman key exchange, and WireGuard protocols, including forward secrecy and injective agreement for WireGuard. We also show that our methodology is agnostic to a particular language or program verifier with a prototype implementation for C.

Provably Unlinkable Smart Card-based Payments
  • Sergiu Bursuc
  • Ross Horne
  • Sjouke Mauw
  • Semen Yurkov

The most prevalent smart card-based payment method, EMV, currently offers no privacy to its users. Transaction details and the card number are sent in cleartext, enabling the profiling and tracking of cardholders. Since public awareness of privacy issues is growing and legislation, such as GDPR, is emerging, we believe it is necessary to investigate the possibility of making payments anonymous and unlinkable without compromising essential security guarantees and functional properties of EMV. This paper draws attention to trade-offs between functional and privacy requirements in the design of such a protocol. We present the UTX protocol, an enhanced payment protocol satisfying such requirements, and we formally certify key security and privacy properties using techniques based on the applied π-calculus.

CheckMate: Automated Game-Theoretic Security Reasoning
  • Lea Salome Brugger
  • Laura Kovács
  • Anja Petkovic Komel
  • Sophie Rain
  • Michael Rawson

We present the CheckMate framework for full automation of game-theoretic security analysis, with particular focus on blockchain technologies. CheckMate analyzes protocols modeled as games for their game-theoretic security - that is, for incentive compatibility and Byzantine fault-tolerance. The framework either proves the protocols secure by providing defense strategies or yields all possible attack vectors. For protocols that are not secure, CheckMate can also provide weakest preconditions under which the protocol becomes secure, if they exist. CheckMate implements a sound and complete encoding of game-theoretic security in first-order linear real arithmetic, thereby reducing security analysis to satisfiability solving. CheckMate further automates efficient handling of case splitting on arithmetic terms. Experiments show CheckMate scales, analyzing games with trillions of strategies that model phases of Bitcoin's Lightning Network.

SESSION: Session 25: Zero Knowledge Proofs

Recursion over Public-Coin Interactive Proof Systems; Faster Hash Verification
  • Alexandre Belling
  • Azam Soleimanian
  • Olivier Bégassat

SNARK is a well-known family of cryptographic tools that is increasingly used in the field of computation integrity at scale. In this area, multiple works have introduced SNARK-friendly cryptographic primitives: hashing, but also encryption and signature verification. Despite all the efforts to create cryptographic primitives that can be proved faster, proving them remains a major performance bottleneck in practice. In this paper, we present a recursive technique that can improve the efficiency of the prover by an order of magnitude compared to proving MiMC hashes (a SNARK-friendly hash function, Albrecht et al. 2016) with a Groth16 (Eurocrypt 2016) proof. We use GKR (a well-known public-coin argument system by Goldwasser et al., STOC 2008) to prove the integrity of hash computations and embed the GKR verifier inside a SNARK circuit. The challenge comes from the fact that GKR is a public-coin interactive protocol, and applying Fiat-Shamir naively may result in worse performance than applying existing techniques directly. This is because Fiat-Shamir itself involves hashing a large string. We take advantage of a property that SNARK schemes commonly have, to build a protocol in which the Fiat-Shamir hashes have very short inputs. The technique we present is generic and can be applied over any SNARK-friendly hash, most known SNARK schemes, and any (one-round) public-coin argument system in place of GKR. We emphasize that while our general compiler is secure in the random oracle model, our concrete instantiation (i.e., GKR plus outer SNARK) is only proved to be heuristically secure. This is due to the fact that we first need to convert the GKR protocol to a one-round protocol. Thus, the random oracle of GKR, starting from the second round, is replaced with a concrete hash inside the outer layer SNARK, which makes the security proof heuristic.
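
As a rough illustration of why a naive Fiat-Shamir transform hashes a large string, the sketch below derives each verifier challenge from the whole transcript so far. It is a generic textbook-style sketch, not the construction from the paper: the modulus `P`, the use of SHA-256, the length-prefixed encoding, and the function names `fs_challenge` and `run_rounds` are all assumptions made for illustration.

```python
import hashlib

# Hypothetical field modulus for challenges (2**61 - 1 is prime); not taken from the paper.
P = 2**61 - 1


def fs_challenge(transcript: bytes) -> int:
    """Derive a challenge by hashing the entire transcript seen so far."""
    digest = hashlib.sha256(transcript).digest()
    return int.from_bytes(digest, "big") % P


def run_rounds(prover_messages):
    """Replace each public-coin verifier message with a hash of the transcript."""
    transcript = b""
    challenges = []
    for msg in prover_messages:
        # Length-prefix each message so the transcript encoding is unambiguous.
        transcript += len(msg).to_bytes(8, "big") + msg
        challenges.append(fs_challenge(transcript))
    return challenges
```

In this naive form the hash input grows with every prover message, which becomes costly when the hash must be evaluated inside a circuit.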

Modular Sumcheck Proofs with Applications to Machine Learning and Image Processing
  • David Balbás
  • Dario Fiore
  • Maria Isabel González Vasco
  • Damien Robissout
  • Claudio Soriente

Cryptographic proof systems provide integrity, fairness, and privacy in applications that outsource data processing tasks. However, general-purpose proof systems do not scale well to large inputs. At the same time, ad-hoc solutions for concrete applications - e.g., machine learning or image processing - are more efficient but lack modularity, hence they are hard to extend or to compose with other tools of a data-processing pipeline.

In this paper, we combine the performance of tailored solutions with the versatility of general-purpose proof systems. We do so by introducing a modular framework for verifiable computation of sequential operations. The main tool of our framework is a new information-theoretic primitive called Verifiable Evaluation Scheme on Fingerprinted Data (VE) that captures the properties of diverse sumcheck-based interactive proofs, including the well-established GKR protocol. Thus, we show how to compose VEs for specific functions to obtain verifiability of a data-processing pipeline.
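
For readers unfamiliar with sumcheck, the following is a minimal sketch of the classic sumcheck protocol for a multilinear function given as a table of evaluations on the Boolean hypercube. It is not the VE primitive from the paper. The modulus `P`, the function names, and the choice to run the prover and verifier in one loop are assumptions for illustration; a real verifier would obtain the final evaluation from an oracle or commitment rather than from the table itself.

```python
import random

P = 2**61 - 1  # hypothetical prime field modulus (an assumption, not from the paper)


def fold(table, r):
    """Fix the first variable of a multilinear function to the field element r."""
    half = len(table) // 2
    return [((1 - r) * table[i] + r * table[half + i]) % P for i in range(half)]


def sumcheck(table, claimed_sum, rng):
    """Simulate prover and verifier; return True if the verifier accepts.

    `table` must have a power-of-two length (evaluations on {0,1}^n).
    """
    original = list(table)
    claim = claimed_sum % P
    point = []
    while len(table) > 1:
        half = len(table) // 2
        g0 = sum(table[:half]) % P  # prover: round polynomial evaluated at 0
        g1 = sum(table[half:]) % P  # prover: round polynomial evaluated at 1
        if (g0 + g1) % P != claim:  # verifier: consistency with the running claim
            return False
        r = rng.randrange(P)  # verifier: public random coin
        claim = ((1 - r) * g0 + r * g1) % P  # round polynomial is linear in r
        table = fold(table, r)
        point.append(r)
    # Final check: evaluate the multilinear extension at the random point.
    value = original
    for r in point:
        value = fold(value, r)
    return value[0] % P == claim
```

For example, `sumcheck([3, 1, 4, 1, 5, 9, 2, 6], 31, random.Random(1))` accepts because the table sums to 31, while a claimed sum of 32 is rejected in the first round.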

We propose a novel VE for convolution operations that can handle multiple input-output channels and batching, and we use it in our framework to build proofs for (convolutional) neural networks and image processing. We realize a prototype implementation of our proof systems, and show that we achieve up to 5x faster proving time and 10x shorter proofs compared to the state-of-the-art, in addition to asymptotic improvements.

Batchman and Robin: Batched and Non-batched Branching for Interactive ZK
  • Yibin Yang
  • David Heath
  • Carmit Hazay
  • Vladimir Kolesnikov
  • Muthuramakrishnan Venkitasubramaniam

Vector Oblivious Linear Evaluation (VOLE) supports fast and scalable interactive Zero-Knowledge (ZK) proofs. Despite recent improvements to VOLE-based ZK, compiling proof statements to a control-flow oblivious form (e.g., a circuit) continues to lead to expensive proofs. One useful setting where this inefficiency stands out is when the statement is a disjunction of clauses $\mathcal{L}_1 \lor \cdots \lor \mathcal{L}_B$. Typically, ZK requires paying the price to handle all B branches. Prior works have shown how to avoid this price in communication, but not in computation.

Our main result, Batchman, is asymptotically and concretely efficient VOLE-based ZK for batched disjunctions, i.e. statements containing R repetitions of the same disjunction. This is crucial for, e.g., emulating CPU steps in ZK. Our prover and verifier complexity is only $O(RB + R|\mathcal{C}| + B|\mathcal{C}|)$, where $|\mathcal{C}|$ is the maximum circuit size of the B branches. Prior works' computation scales in $RB|\mathcal{C}|$. For non-batched disjunctions, we also construct a VOLE-based ZK protocol, Robin, which is (only) communication efficient. For small fields and for statistical security parameter $\lambda$, this protocol's communication improves over the previous state of the art (Mac'n'Cheese, Baum et al., CRYPTO'21) by up to factor $\lambda$.
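
To make the gap between the two cost expressions concrete, here is a worked example with hypothetical parameter values (R repetitions, B branches, maximum branch size $|\mathcal{C}|$). The numbers are chosen for illustration only, and constant factors hidden by the asymptotic notation are ignored.

```latex
\begin{align*}
R &= 10^{3}, \quad B = 10^{2}, \quad |\mathcal{C}| = 10^{4} \\
RB|\mathcal{C}| &= 10^{3} \cdot 10^{2} \cdot 10^{4} = 10^{9} \\
RB + R|\mathcal{C}| + B|\mathcal{C}| &= 10^{5} + 10^{7} + 10^{6} = 1.11 \times 10^{7} \\
\frac{RB|\mathcal{C}|}{RB + R|\mathcal{C}| + B|\mathcal{C}|} &\approx 90
\end{align*}
```

Under these assumed values the additive expression is about 90 times smaller than the multiplicative one.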

Our implementation outperforms prior state of the art. E.g., we achieve up to 6× improvement over Mac'n'Cheese (Boolean, single disjunction), and for arithmetic batched disjunctions our experiments show we improve over QuickSilver (Yang et al., CCS'21) by up to 70× and over AntMan (Weng et al., CCS'22) by up to 36×.

Verifiable Mix-Nets and Distributed Decryption for Voting from Lattice-Based Assumptions
  • Diego F. Aranha
  • Carsten Baum
  • Kristian Gjøsteen
  • Tjerand Silde

Cryptographic voting protocols have recently seen much interest from practitioners due to their (planned) use in countries such as Estonia, Switzerland, France, and Australia. Practical protocols usually rely on tested designs such as the mixing-and-decryption paradigm. There, multiple servers verifiably shuffle encrypted ballots, which are then decrypted in a distributed manner. While several efficient protocols implementing this paradigm exist from discrete log-type assumptions, the situation is less clear for post-quantum alternatives such as lattices. This is because the design ideas of the discrete log-based voting protocols do not carry over easily to the lattice setting, due to specific problems such as noise growth and approximate relations.

This work proposes a new verifiable secret shuffle for BGV ciphertexts and a compatible verifiable distributed decryption protocol. The shuffle is based on an extension of a shuffle of commitments to known values which is combined with an amortized proof of correct re-randomization. The verifiable distributed decryption protocol uses noise drowning, proving the correctness of decryption steps in zero-knowledge. Both primitives are then used to instantiate the mixing-and-decryption electronic voting paradigm from lattice-based assumptions.

We give concrete parameters for our system, estimate the size of each component and provide implementations of all important sub-protocols. Our experiments show that the shuffle and decryption protocol is suitable for use in real-world e-voting schemes.

SESSION: Session 26: Federated Learning

Turning Privacy-preserving Mechanisms against Federated Learning
  • Marco Arazzi
  • Mauro Conti
  • Antonino Nocera
  • Stjepan Picek

Recently, researchers have successfully employed Graph Neural Networks (GNNs) to build enhanced recommender systems due to their capability to learn patterns from the interaction between involved entities. In addition, previous studies have investigated federated learning as the main solution to enable a native privacy-preserving mechanism for the construction of global GNN models without collecting sensitive data into a single computation unit. Still, privacy issues may arise as the analysis of local model updates produced by the federated clients can return information related to sensitive local data. For this reason, researchers proposed solutions that combine federated learning with Differential Privacy strategies and community-driven approaches, which involve combining data from neighbor clients to make the individual local updates less dependent on local sensitive data.

In this paper, we identify a crucial security flaw in such a configuration and design an attack capable of deceiving state-of-the-art defenses for federated learning. The proposed attack includes two operating modes, the first one focusing on convergence inhibition (Adversarial Mode), and the second one aiming at building a deceptive rating injection on the global federated model (Backdoor Mode). The experimental results show the effectiveness of our attack in both its modes, returning on average 60% performance detriment in all the tests on Adversarial Mode and fully effective backdoors in 93% of cases for the tests performed on Backdoor Mode.

martFL: Enabling Utility-Driven Data Marketplace with a Robust and Verifiable Federated Learning Architecture
  • Qi Li
  • Zhuotao Liu
  • Qi Li
  • Ke Xu

The development of machine learning models requires a large amount of training data. A data marketplace is a critical platform to trade high-quality and private-domain data that is not publicly available on the Internet. However, as data privacy becomes increasingly important, directly exchanging raw data becomes inappropriate. Federated Learning (FL) is a distributed machine learning paradigm that exchanges data utilities (in the form of local models or gradients) among multiple parties without directly sharing the original data. However, we recognize several key challenges in applying existing FL architectures to construct a data marketplace. (i) In existing FL architectures, the Data Acquirer (DA) cannot privately assess the quality of local models submitted by different Data Providers (DPs) prior to trading; (ii) The model aggregation protocols in existing FL designs cannot effectively exclude malicious DPs without "overfitting" to the DA's (possibly biased) root dataset; (iii) Prior FL designs lack a proper billing mechanism to enforce the DA to fairly allocate the reward according to contributions made by different DPs. To address the above challenges, we propose martFL, the first federated learning architecture that is specifically designed to enable a secure utility-driven data marketplace. At a high level, martFL is empowered by two innovative designs: (i) a quality-aware model aggregation protocol that allows the DA to properly exclude low-quality or even poisonous local models from the aggregation, even if the DA's root dataset is biased; (ii) a verifiable data transaction protocol that enables the DA to prove, both succinctly and in zero-knowledge, that it has faithfully aggregated these local models according to the weights that the DA has committed to. This enables the DPs to unambiguously claim the rewards proportional to their weights/contributions. We implement a prototype of martFL and evaluate it extensively over various tasks. The results show that martFL can improve the model accuracy by up to 25% while saving up to 64% data acquisition cost.

Unraveling the Connections between Privacy and Certified Robustness in Federated Learning Against Poisoning Attacks
  • Chulin Xie
  • Yunhui Long
  • Pin-Yu Chen
  • Qinbin Li
  • Sanmi Koyejo
  • Bo Li

Federated learning (FL) provides an efficient paradigm to jointly train a global model leveraging data from distributed users. As local training data comes from different users who may not be trustworthy, several studies have shown that FL is vulnerable to poisoning attacks. Meanwhile, to protect the privacy of local users, FL is usually trained in a differentially private way (DPFL). Thus, in this paper, we ask: What are the underlying connections between differential privacy and certified robustness in FL against poisoning attacks? Can we leverage the innate privacy property of DPFL to provide certified robustness for FL? Can we further improve the privacy of FL to improve such robustness certification? We first investigate both user-level and instance-level privacy of FL and provide formal privacy analysis to achieve improved instance-level privacy. We then provide two robustness certification criteria: certified prediction and certified attack inefficacy for DPFL on both user and instance levels. Theoretically, we provide the certified robustness of DPFL based on both criteria given a bounded number of adversarial users or instances. Empirically, we conduct extensive experiments to verify our theories under a range of poisoning attacks on different datasets. We find that increasing the level of privacy protection in DPFL results in stronger certified attack inefficacy; however, it does not necessarily lead to a stronger certified prediction. Thus, achieving the optimal certified prediction requires a proper balance between privacy and utility loss.

MESAS: Poisoning Defense for Federated Learning Resilient against Adaptive Attackers
  • Torsten Krauß
  • Alexandra Dmitrienko

Federated Learning (FL) enhances decentralized machine learning by safeguarding data privacy, reducing communication costs, and improving model performance with diverse data sources. However, FL faces vulnerabilities such as untargeted poisoning attacks and targeted backdoor attacks, posing challenges to model integrity and security. Preventing backdoors proves especially challenging due to their stealthy nature. Existing mitigation techniques have shown efficacy but often overlook realistic adversaries and diverse data distributions.

This work introduces the concept of strong adaptive adversaries, capable of adapting to multiple objectives simultaneously. Extensive empirical testing reveals existing defenses' vulnerability in this adversary model. We present Metric-Cascades (MESAS), a novel defense method tailored to more realistic scenarios and adversary models. MESAS employs multiple detection metrics simultaneously to combat poisoned model updates, posing a complex multi-objective problem for adaptive attackers. In a comprehensive evaluation across nine backdoors and three datasets, MESAS outperforms existing defenses in distinguishing backdoors from data distribution-related distortions within and across clients. MESAS offers robust defense against strong adaptive adversaries in real-world data settings, with a modest average overhead of just 24.37 seconds.

SESSION: Session 27: Interoperability & 2nd Layer Solutions

Accio: Variable-Amount, Optimized-Unlinkable and NIZK-Free Off-Chain Payments via Hubs
  • Zhonghui Ge
  • Jiayuan Gu
  • Chenke Wang
  • Yu Long
  • Xian Xu
  • Dawu Gu

Payment channel hubs (PCHs) serve as a promising solution to achieving quick off-chain payments between pairs of users. They work by using an untrusted tumbler to relay the payments between the payer and payee and enjoy the advantages of low cost and high scalability. However, the most recent privacy-preserving payment channel hub solution that supports variable payment amounts suffers from limited unlinkability, e.g., being vulnerable to the abort attack. Moreover, this solution utilizes zero-knowledge proofs, which bring huge costs on both computation time and communication overhead. Therefore, how to design PCHs that support variable amount payments and unlinkability, but reduce the use of huge-cost cryptographic tools as much as possible, is significant for the large-scale practical applications of off-chain payments.

In this paper, we propose Accio, a variable amount payment channel hub solution with optimized unlinkability, by deepening research on unlinkability and constructing a new cryptographic tool. We provide the detailed Accio protocol and formally prove its security and privacy under the Universally Composable framework. Our prototype demonstrates its feasibility and the evaluation shows that Accio outperforms the other state-of-the-art works in both communication and computation costs.

CryptoConcurrency: (Almost) Consensusless Asset Transfer with Shared Accounts
  • Andrei Tonkikh
  • Pavel Ponomarev
  • Petr Kuznetsov
  • Yvonne-Anne Pignolet

A typical blockchain protocol uses consensus to make sure that mutually mistrusting users agree on the order in which their operations on shared data are executed. However, it is known that asset transfer systems, by far the most popular application of blockchains, can be implemented without consensus. Assuming that no account can be accessed concurrently and every account belongs to a single owner, one can efficiently implement an asset transfer system in a purely asynchronous, consensus-free manner. It has also been shown that implementing asset transfer with shared accounts is impossible without consensus.

In this paper, we propose CryptoConcurrency, an asset transfer protocol that allows concurrent accesses to be processed in parallel, without involving consensus, whenever possible. More precisely, if concurrent transfer operations on a given account do not lead to overspending, i.e. can all be applied without the account balance going below zero, they proceed in parallel. Otherwise, the account's owners may have to access an external consensus object. Notably, we avoid relying on a central, universally-trusted, consensus mechanism and allow each account to use its own consensus implementation, which only the owners of this account trust. This provides greater decentralization and flexibility.
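
The overspending condition described above can be sketched as a simple predicate. This only illustrates the rule "all concurrent transfers fit within the balance" and is not the distributed protocol itself; the function names `can_proceed_in_parallel` and `route` and the integer balance model are assumptions.

```python
from typing import List


def can_proceed_in_parallel(balance: int, debits: List[int]) -> bool:
    """Return True if all concurrent debits fit without the balance going below zero."""
    if any(d < 0 for d in debits):
        raise ValueError("debits must be non-negative")
    return sum(debits) <= balance


def route(balance: int, debits: List[int]) -> str:
    """Decide (hypothetically) whether the account's consensus object is needed."""
    return "parallel" if can_proceed_in_parallel(balance, debits) else "consensus"
```

With a balance of 100, concurrent debits of 30 and 40 can all be applied, whereas debits of 60 and 50 would overspend and fall back to the account's own consensus object.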

TrustBoost: Boosting Trust among Interoperable Blockchains
  • Peiyao Sheng
  • Xuechao Wang
  • Sreeram Kannan
  • Kartik Nayak
  • Pramod Viswanath

Currently there exist many blockchains with weak trust guarantees, limiting applications and participation. Existing solutions to boost the trust using a stronger blockchain, e.g., via checkpointing, require the weaker blockchain to give up sovereignty. In this paper, we propose a family of protocols in which multiple blockchains interact to create a combined ledger with boosted trust. We show that even if several of the interacting blockchains cease to provide security guarantees, the combined ledger continues to be secure - our Trustboost protocols achieve the optimal threshold of tolerating the insecure blockchains. This optimality, along with the necessity of blockchain interactions, is formally shown within the classic shared memory model, tackling the long-standing open challenge of solving consensus in the presence of both Byzantine objects and processes. Furthermore, our proposed construction of Trustboost simply operates via smart contracts and requires no change to the underlying consensus protocols of the participating blockchains, a form of "consensus on top of consensus". The protocols are lightweight and can be used on specific (e.g., high value) transactions; we demonstrate the practicality by implementing and deploying Trustboost as cross-chain smart contracts in the Cosmos ecosystem using approximately 3,000 lines of Rust code, made available as open source [52]. Our evaluation shows that using 10 Cosmos chains in a local testnet, Trustboost has a gas cost of roughly $2 with a latency of 2 minutes per request, which is in line with the cost on a high security chain such as Bitcoin or Ethereum.

Interchain Timestamping for Mesh Security
  • Ertem Nusret Tas
  • Runchao Han
  • David Tse
  • Mingchao Yu

Fourteen years after the invention of Bitcoin, there has been a proliferation of many permissionless blockchains. Each such chain provides a public ledger that can be written to and read from by anyone. In this multi-chain world, a natural question arises: what is the optimal security an existing blockchain, a consumer chain, can extract by only reading and writing to k other existing blockchains, the provider chains? We design a protocol, called interchain timestamping, and show that it extracts the maximum economic security from the provider chains, as quantified by the slashable safety resilience. We observe that interchain timestamps are already provided by light-client based bridges, so interchain timestamping can be readily implemented for Cosmos chains connected by the Inter-Blockchain Communication (IBC) protocol. We compare interchain timestamping with cross-staking, the original solution to mesh security, as well as with Trustboost, another recent security sharing protocol.

SESSION: Session 28: Fuzzing II

Hopper: Interpretative Fuzzing for Libraries
  • Peng Chen
  • Yuxuan Xie
  • Yunlong Lyu
  • Yuxiao Wang
  • Hao Chen

Despite the fact that the state-of-the-art fuzzers can generate inputs efficiently, existing fuzz drivers still cannot adequately cover entries in libraries. Most of these fuzz drivers are crafted manually by developers, and their quality depends on the developers' understanding of the code. Existing works have attempted to automate the generation of fuzz drivers by learning API usage from code and execution traces. However, the generated fuzz drivers are limited to a few specific call sequences by the code being learned. To address these challenges, we present HOPPER, which can fuzz libraries without requiring any domain knowledge to craft fuzz drivers. It transforms the problem of library fuzzing into the problem of interpreter fuzzing. The interpreters linked against libraries under test can interpret the inputs that describe arbitrary API usage. To generate semantically correct inputs for the interpreter, HOPPER learns the intra- and inter-API constraints in the libraries and mutates the program with grammar awareness. We implemented HOPPER and evaluated its effectiveness on 11 real-world libraries against manually crafted fuzzers and other automatic solutions. Our results show that HOPPER greatly outperformed the other fuzzers in both code coverage and bug finding, having uncovered 25 previously unknown bugs that other fuzzers couldn't. Moreover, we have demonstrated that the proposed intra- and inter-API constraint learning methods can correctly learn constraints implied by the library and, therefore, significantly improve the fuzzing efficiency. The experiment results indicate that HOPPER is able to explore a vast range of API usages for library fuzzing out of the box.

Greybox Fuzzing of Distributed Systems
  • Ruijie Meng
  • George Pîrlea
  • Abhik Roychoudhury
  • Ilya Sergey

Grey-box fuzzing is the lightweight approach of choice for finding bugs in sequential programs. It provides a balance between efficiency and effectiveness by conducting a biased random search over the domain of program inputs using a feedback function from observed test executions. For distributed system testing, however, the state-of-practice is represented today by only black-box tools that do not attempt to infer and exploit any knowledge of the system's past behaviours to guide the search for bugs.

In this work, we present MALLORY: the first framework for grey-box fuzz-testing of distributed systems. Unlike popular black-box distributed system fuzzers, such as JEPSEN, that search for bugs by randomly injecting network partitions and node faults or by following human-defined schedules, MALLORY is adaptive. It exercises a novel metric to learn how to maximize the number of observed system behaviors by choosing different sequences of faults, thus increasing the likelihood of finding new bugs. Our approach relies on timeline-driven testing. MALLORY dynamically constructs Lamport timelines of the system behaviour and further abstracts these timelines into happens-before summaries, which serve as a feedback function guiding the fuzz campaign. Subsequently, MALLORY reactively learns a policy using Q-learning, enabling it to introduce faults guided by its real-time observation of the summaries.
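
The Q-learning step mentioned above can be illustrated with a generic tabular update. This is a textbook sketch, not MALLORY's implementation: the class name `FaultPolicy`, the action names, the state labels, the reward signal, and the hyperparameters are all hypothetical.

```python
import random
from collections import defaultdict


class FaultPolicy:
    """Tabular Q-learning over (state, fault action) pairs; all names are illustrative."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount factor
        self.epsilon = epsilon  # exploration probability
        self.q = defaultdict(float)
        self.rng = random.Random(seed)

    def choose(self, state):
        """Epsilon-greedy choice of the next fault to inject."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        """Standard Q-learning update toward reward plus discounted best next value."""
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])
```

One plausible way to use such a policy, under the assumptions above, is to treat an abstracted timeline summary as the state and to give a positive reward when an injected fault leads to a summary not seen before.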

We have evaluated MALLORY on a diverse set of widely-used industrial distributed systems. Compared to the state-of-the-art black-box fuzzer JEPSEN, MALLORY explores 54.27% more distinct states within 24 hours while achieving a speed-up of 2.24X. At the same time, MALLORY finds bugs 1.87X faster, thereby finding more bugs within the given time budget. MALLORY discovered 22 zero-day bugs (of which 18 were confirmed by developers), including 10 new vulnerabilities, in rigorously tested distributed systems such as Braft, Dqlite and Redis. 6 new CVEs have been assigned.

SyzDirect: Directed Greybox Fuzzing for Linux Kernel
  • Xin Tan
  • Yuan Zhang
  • Jiadong Lu
  • Xin Xiong
  • Zhuang Liu
  • Min Yang

Bug reports and patch commits are dramatically increasing for OS kernels, incentivizing a critical need for kernel-level bug reproduction and patch testing. Directed greybox fuzzing (DGF), aiming to stress-test a specific part of code, is a promising approach for bug reproduction and patch testing. However, the existing DGF methods exclusively target user-space applications, presenting intrinsic limitations in handling OS kernels. In particular, these methods cannot pinpoint the appropriate system calls and the needed syscall parameter values to reach the target location, resulting in low efficiency and waste of resources.

In this paper, we present SyzDirect, a DGF solution for the Linux kernel. With a novel, scalable static analysis of the Linux kernel, SyzDirect identifies valuable information such as correct system calls and conditions on their arguments to reach the target location. During fuzzing, SyzDirect utilizes the static analysis results to guide the generation and mutation of test cases, followed by leveraging distance-based feedback for seed prioritization and power scheduling. We evaluated SyzDirect on upstream Linux kernels for bug reproduction and patch testing. The results show that SyzDirect can reproduce 320% more bugs and reach 25.6% more target patches than generic kernel fuzzers. It also improves the speed of bug reproduction and patch reaching by a factor of 154.3 and 680.9, respectively.

PyRTFuzz: Detecting Bugs in Python Runtimes via Two-Level Collaborative Fuzzing
  • Wen Li
  • Haoran Yang
  • Xiapu Luo
  • Long Cheng
  • Haipeng Cai

Given the widespread use of Python and its sustaining impact, the security and reliability of the Python runtime system is highly and broadly critical. Yet with real-world bugs in Python runtimes being continuously and increasingly reported, technique/tool support for automated detection of such bugs is still largely lacking. In this paper, we present PyRTFuzz, a novel fuzzing technique/tool for holistically testing Python runtimes including the language interpreter and its runtime libraries. PyRTFuzz combines generation- and mutation-based fuzzing at the compiler- and application-testing level, respectively, as enabled by static/dynamic analysis for extracting runtime API descriptions, a declarative specification language for valid and diverse Python code generation, and a custom type-guided mutation strategy for format/structure-aware application input generation. We implemented PyRTFuzz for the primary Python implementation (CPython) and applied it to three versions of the runtime. Our experiments revealed 61 new, demonstrably exploitable bugs including those in the interpreter and most in the runtime libraries. Our results also demonstrated the promising scalability and cost-effectiveness of PyRTFuzz and its great potential for further bug discovery. The two-level collaborative fuzzing methodology instantiated in PyRTFuzz may also apply to other language runtimes, especially those of interpreted languages.

SESSION: Session 29: Cryptography & Side-Channels

FITS: Matching Camera Fingerprints Subject to Software Noise Pollution
  • Liu Liu
  • Xinwen Fu
  • Xiaodong Chen
  • Jianpeng Wang
  • Zhongjie Ba
  • Feng Lin
  • Li Lu
  • Kui Ren

Physically unclonable hardware fingerprints can be used for device authentication. The photo-response non-uniformity (PRNU) is the most reliable hardware fingerprint of digital cameras and can be conveniently extracted from images. However, we find image post-processing software may introduce extra noise into images. Part of this noise remains in the extracted PRNU fingerprints and is hard to eliminate by traditional approaches, such as denoising filters. We define this noise as software noise, which pollutes PRNU fingerprints and interferes with authenticating a camera-armed device. In this paper, we propose novel approaches for fingerprint matching, a critical step in device authentication, in the presence of software noise. We calculate the cross correlation between PRNU fingerprints of different cameras using a test statistic such as the Peak to Correlation Energy (PCE) so as to estimate software noise correlation. During fingerprint matching, we derive the ratio of the test statistic on two PRNU fingerprints of interest over the estimated software noise correlation. We denote this ratio as the fingerprint to software noise ratio (FITS), which allows us to detect the PRNU hardware noise correlation component in the test statistic for fingerprint matching. Extensive experiments over 10,000 images taken by more than 90 smartphones are conducted to validate our approaches, which outperform the state-of-the-art approaches significantly for polluted fingerprints. We are the first to study fingerprint matching in the presence of software noise.

LeakyOhm: Secret Bits Extraction using Impedance Analysis
  • Saleh Khalaj Monfared
  • Tahoura Mosavirik
  • Shahin Tajik

The threats of physical side-channel attacks and their countermeasures have been widely researched. Most physical side-channel attacks rely on the unavoidable influence of computation or storage on current consumption or voltage drop on a chip. Such data-dependent influence can be exploited by, for instance, power or electromagnetic analysis. In this work, we introduce a novel non-invasive physical side-channel attack, which exploits the data-dependent changes in the impedance of the chip. Our attack relies on the fact that the temporarily stored contents in registers alter the physical characteristics of the circuit, which results in changes in the die's impedance. To sense such impedance variations, we deploy a well-known RF/microwave method called scattering parameter analysis, in which we inject sine wave signals with high frequencies into the system's power distribution network (PDN) and measure the echo of the signals. We demonstrate that according to the content bits and physical location of a register, the reflected signal is modulated differently at various frequency points enabling the simultaneous and independent probing of individual registers. Such side-channel leakage challenges the t-probing security model assumption used in masking, which is a prominent side-channel countermeasure. To validate our claims, we mount non-profiled and profiled impedance analysis attacks on hardware implementations of unprotected and high-order masked AES. We show that in the case of the profiled attack, only a single trace is required to recover the secret key. Finally, we discuss how a specific class of hiding countermeasures might be effective against impedance leakage.

A Systematic Evaluation of Automated Tools for Side-Channel Vulnerabilities Detection in Cryptographic Libraries
  • Antoine Geimer
  • Mathéo Vergnolle
  • Frédéric Recoules
  • Lesly-Ann Daniel
  • Sébastien Bardin
  • Clémentine Maurice

To protect cryptographic implementations from side-channel vulnerabilities, developers must adopt constant-time programming practices. As these can be error-prone, many side-channel detection tools have been proposed. Despite this, such vulnerabilities are still manually found in cryptographic libraries. While a recent paper by Jancar et al. shows that developers rarely perform side-channel detection, it is unclear if existing detection tools could have found these vulnerabilities in the first place.

To answer this question we surveyed the literature to build a classification of 34 side-channel detection frameworks. The classification we offer compares multiple criteria, including the methods used, the scalability of the analysis or the threat model considered. We then built a unified common benchmark of representative cryptographic operations on a selection of 5 promising detection tools. This benchmark allows us to better compare the capabilities of each tool, and the scalability of their analysis. Additionally, we offer a classification of recently published side-channel vulnerabilities. We then test each of the selected tools on benchmarks reproducing a subset of these vulnerabilities as well as the context in which they appear. We find that existing tools can struggle to find vulnerabilities for a variety of reasons, mainly the lack of support for SIMD instructions, implicit flows, and internal secret generation. Based on our findings, we develop a set of recommendations for the research community and cryptographic library developers, with the goal to improve the effectiveness of side-channel detection tools.
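As a minimal illustration of the constant-time practice these tools check for (not taken from the paper), the Python sketch below contrasts an early-exit comparison, whose running time depends on how many leading bytes match, with the standard library's `hmac.compare_digest`. The function names `leaky_equal` and `constant_time_equal` and the sample tags are hypothetical.

```python
import hmac

def leaky_equal(a: bytes, b: bytes) -> bool:
    """Early-exit comparison: the loop stops at the first mismatch,
    so timing can reveal the length of the matching prefix."""
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False  # secret-dependent branch
    return True

def constant_time_equal(a: bytes, b: bytes) -> bool:
    """Delegates to hmac.compare_digest, which is designed to avoid
    content-dependent timing."""
    return hmac.compare_digest(a, b)

expected_tag = bytes.fromhex("a1b2c3d4")  # hypothetical secret MAC tag
good = bytes.fromhex("a1b2c3d4")
bad = bytes.fromhex("a1b2c3d5")
print(leaky_equal(expected_tag, good), constant_time_equal(expected_tag, good))
print(leaky_equal(expected_tag, bad), constant_time_equal(expected_tag, bad))
```

Both functions return the same results; they differ only in whether control flow depends on secret data, which is the property the detection tools surveyed above are meant to check.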

A Thorough Evaluation of RAMBAM
  • Daniel Lammers
  • Amir Moradi
  • Nicolai Müller
  • Aein Rezaei Shahmirzadi

The application of masking, widely regarded as the most robust and reliable countermeasure against Side-Channel Analysis (SCA) attacks, has been the subject of extensive research across a range of cryptographic algorithms, especially AES. However, the implementation cost associated with applying such a countermeasure can be significant and in some scenarios even infeasible due to considerations such as area and latency overheads, as well as the need for fresh randomness to ensure the security properties of the resulting design. Most of these overheads originate from the ability to maintain security in the presence of physical defaults such as glitches and transitions. Among several schemes with a trade-off between such overheads, RAMBAM, presented at CHES 2022, offers an ultra-low latency in terms of the number of clock cycles. It is dedicated to the AES and utilizes redundant representations of the finite field elements to enhance protection against both passive and active physical attacks.

In this paper, we take a deeper look at this technique and provide a comprehensive analysis. The original authors reported that the number of required traces to mount a successful attack increases exponentially with the size of the redundant representation. We, however, examine their scheme from a theoretical point of view. More specifically, we investigate the relationship between RAMBAM and the well-established Boolean masking and, based on this, prove the insecurity of RAMBAM. Through examples and use cases, we assess the leakage of the scheme in practice and use verification tools to demonstrate that RAMBAM does not necessarily offer adequate protection against SCA attacks, either in theory or in practice. Confirmed by real-world experiments, we additionally highlight that - if no dedicated facility is incorporated - the RAMBAM designs are susceptible to fault-injection attacks despite providing some degree of protection against a sophisticated attack vector, i.e., SIFA.

SESSION: Session 30: Information Flow & Differential Privacy

A Novel Analysis of Utility in Privacy Pipelines, Using Kronecker Products and Quantitative Information Flow
  • Mário S. Alvim
  • Natasha Fernandes
  • Annabelle McIver
  • Carroll Morgan
  • Gabriel H. Nunes

We combine Kronecker products and quantitative information flow to give a novel formal analysis for the fine-grained verification of utility in complex privacy pipelines. The combination explains a surprising anomaly in the behaviour of utility of privacy-preserving pipelines - that sometimes a reduction in privacy also results in a decrease in utility. We use the standard measure of utility for Bayesian analysis, introduced by Ghosh et al. [1], to produce tractable and rigorous proofs of the fine-grained statistical behaviour leading to the anomaly. More generally, we offer the prospect of formal-analysis tools for utility that complement extant formal analyses of privacy. We demonstrate our results on a number of common privacy-preserving designs.
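To make the central operation concrete, here is a small sketch under our own assumptions (it does not reproduce the paper's analysis). It models two mechanisms as row-stochastic channel matrices, with rows as inputs and columns as outputs, and forms their Kronecker product, which describes applying the two mechanisms independently to a pair of inputs. The matrices `c1` and `c2` are made-up examples.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [
        [a_ij * b_kl for a_ij in row_a for b_kl in row_b]
        for row_a in a
        for row_b in b
    ]

# Hypothetical channels: c1 is a randomized-response style mechanism,
# c2 is a mechanism that sometimes reveals its second input exactly.
c1 = [[0.75, 0.25],
      [0.25, 0.75]]
c2 = [[0.5, 0.5],
      [0.0, 1.0]]

joint = kron(c1, c2)
for row in joint:
    print(row, sum(row))  # each row of the joint channel still sums to 1
```

Row `2*i + k` of `joint` gives the output distribution for the input pair `(i, k)`, so the product of two channels is again a channel.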

Tainted Secure Multi-Execution to Restrict Attacker Influence
  • McKenna McCall
  • Abhishek Bichhawat
  • Limin Jia

Attackers can steal sensitive user information from web pages via third-party scripts. Prior work shows that secure multi-execution (SME) with declassification is useful for mitigating such attacks, but that attackers can leverage dynamic web features to declassify more than intended. The proposed solution of disallowing events from dynamic web elements to be declassified is too restrictive to be practical; websites that declassify events from dynamic elements cannot function correctly.

In this paper, we present SMT(T), a new information flow monitor based on SME which uses taint tracking within each execution to remember what has been influenced by an attacker. The resulting monitor is more permissive than what was proposed by prior work and satisfies both knowledge- and influence-based definitions of security for confidentiality and integrity policies (respectively). We also show that robust declassification follows from our influence-based security condition, for free. Finally, we examine the performance impact of monitoring attacker influence with SME by implementing SMT(T) on top of Featherweight Firefox.

Assume but Verify: Deductive Verification of Leaked Information in Concurrent Applications
  • Toby Murray
  • Mukesh Tiwari
  • Gidon Ernst
  • David A. Naumann

We consider the problem of specifying and proving the security of non-trivial, concurrent programs that intentionally leak information. We present a method that decomposes the problem into (a) proving that the program only leaks information it has declassified via assume annotations already widely used in deductive program verification; and (b) auditing the declassifications against a declarative security policy. We show how condition (a) can be enforced by an extension of the existing program logic SecCSL, and how (b) can be checked by proving a set of simple entailments. Part of the challenge is to define respective semantic soundness criteria and to formally connect these to the logic rules and policy audit. We support our methodology in an auto-active program verifier, which we apply to verify the implementations of various case study programs against a range of declassification policies.

Deciding Differential Privacy of Online Algorithms with Multiple Variables
  • Rohit Chadha
  • A. Prasad Sistla
  • Mahesh Viswanathan
  • Bishnu Bhusal

We consider the problem of checking the differential privacy of online randomized algorithms that process a stream of inputs and produce outputs corresponding to each input. This paper generalizes an automaton model called DiP automata [10] to describe such algorithms by allowing multiple real-valued storage variables. A DiP automaton is a parametric automaton whose behavior depends on the privacy budget ε. An automaton A will be said to be differentially private if, for some D, the automaton is Dε-differentially private for all values of ε > 0. We identify a precise characterization of the class of all differentially private DiP automata. We show that the problem of determining if a given DiP automaton belongs to this class is PSPACE-complete. Our PSPACE algorithm also computes a value for D when the given automaton is differentially private. The algorithm has been implemented, and experiments demonstrating its effectiveness are presented.

SESSION: Session 31: Cryptography for Blockchains

FlexiRand: Output Private (Distributed) VRFs and Application to Blockchains
  • Aniket Kate
  • Easwar Vivek Mangipudi
  • Siva Maradana
  • Pratyay Mukherjee

Web3 applications based on blockchains regularly need access to randomness that is unbiased, unpredictable, and publicly verifiable. For Web3 gaming applications, this becomes a crucial selling point to attract more users by providing credibility to the "random reward" distribution feature. A verifiable random function (VRF) protocol satisfies these requirements naturally, and there is a tremendous rise in the use of VRF services. As most blockchains cannot maintain the secret keys required for VRFs, Web3 applications interact with external VRF services via a smart contract where a VRF output is exchanged for a fee. While this smart contract-based plain-text exchange offers the much-needed public verifiability immediately, it severely limits the way the requester can employ the VRF service: the requests cannot be made in advance, and the output cannot be reused. This introduces significant latency and monetary overhead.

This work overcomes this crucial limitation of the VRF service by introducing a novel privacy primitive, Output Private VRF (Pri-VRF), and thereby adds significantly more flexibility to Web3-based VRF services. We call our framework FlexiRand. While maintaining the pseudo-randomness and public verifiability properties of VRFs, FlexiRand ensures that the requester alone can observe the VRF output. The smart contract and anybody else can only observe a blinded-yet-verifiable version of the output. We formally define Pri-VRF, put forward a practically efficient design, and provide provable security analysis in the universal composability (UC) framework (in the random oracle model) using a variant of the one-more Diffie-Hellman assumption over bilinear groups.

As the VRF service, with its ownership of the secret key, becomes a single point of failure, it is realized as a distributed VRF with the key secret-shared across distinct nodes in our framework. We develop our distributed Pri-VRF construction by combining approaches from Distributed VRF and Distributed Oblivious PRF literature. We provide provable security analysis (in UC), implement it and compare its performance with existing distributed VRF schemes. Our distributed Pri-VRF only introduces a minimal computation and communication overhead for the VRF service, the requester, and the contract.

Adaptively Secure (Aggregatable) PVSS and Application to Distributed Randomness Beacons
  • Renas Bacho
  • Julian Loss

Publicly Verifiable Secret Sharing (PVSS) is a fundamental primitive that allows a dealer to share a secret S among n parties via a publicly verifiable transcript T. Existing (efficient) PVSS schemes are only proven secure against static adversaries, who must choose whom to corrupt ahead of a protocol execution. As a result, any protocol (e.g., a distributed randomness beacon) that builds on top of such a PVSS scheme inherits this limitation. To overcome this barrier, we revisit the security of PVSS under adaptive corruptions and show that, surprisingly, many protocols from the literature already achieve it in a meaningful way:

  • We propose a new security definition for aggregatable PVSS, i.e., schemes that allow to homomorphically combine multiple transcripts into one compact aggregate transcript AT that shares the sum of their individual secrets. Our notion captures that if the secret shared by AT contains at least one contribution from an honestly generated transcript, it should not be predictable. We then prove that several existing schemes satisfy this notion against adaptive corruptions in the algebraic group model.
  • To motivate our new notion, we show that it implies the adaptive security of two recent random beacon protocols, SPURT (S&P '22) and OptRand (NDSS '23), which build on top of aggregatable PVSS schemes satisfying our notion of unpredictability. For a security parameter λ, our result improves the communication complexity of the best known adaptively secure random beacon protocols to O(λn^2) for synchronous networks with t < n/2 corruptions and partially synchronous networks with t < n/3 corruptions.
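The aggregation property described in the first item can be illustrated with plain Shamir secret sharing, as in the hedged sketch below. It shows only the additive homomorphism (adding sharings party by party yields a sharing of the sum of the secrets). It has no public verifiability and is not one of the schemes analysed in the paper; the modulus, the parameters n = 7 and t = 3, and all function names are illustrative assumptions.

```python
import random

P = 2**127 - 1  # prime field modulus (a Mersenne prime)

def share(secret, t, n, rng):
    """Split `secret` into n Shamir shares; any t + 1 of them reconstruct it."""
    coeffs = [secret % P] + [rng.randrange(P) for _ in range(t)]

    def poly(x):
        acc = 0
        for c in reversed(coeffs):  # Horner evaluation
            acc = (acc * x + c) % P
        return acc

    return {i: poly(i) for i in range(1, n + 1)}

def reconstruct(shares):
    """Lagrange interpolation at x = 0 over the given {index: value} shares."""
    secret = 0
    for i, y_i in shares.items():
        num, den = 1, 1
        for j in shares:
            if j != i:
                num = (num * (-j)) % P
                den = (den * (i - j)) % P
        secret = (secret + y_i * num * pow(den, -1, P)) % P
    return secret

def aggregate(sharings):
    """Add sharings party by party; the result shares the sum of the secrets."""
    return {i: sum(s[i] for s in sharings) % P for i in sharings[0]}

rng = random.Random(0)  # fixed seed for reproducibility; NOT cryptographically secure
n, t = 7, 3
secrets = [11, 22, 33]
sharings = [share(s, t, n, rng) for s in secrets]
agg = aggregate(sharings)
subset = {i: agg[i] for i in (2, 4, 5, 7)}  # any t + 1 = 4 shares suffice
print(reconstruct(subset))  # 66 = 11 + 22 + 33
```

If at least one of the summed secrets is uniformly random and unknown, the aggregate secret is too, which is the intuition behind the unpredictability notion stated above.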

Short Privacy-Preserving Proofs of Liabilities
  • Francesca Falzon
  • Kaoutar Elkhiyaoui
  • Yacov Manevich
  • Angelo De Caro

In the wake of fraud scandals involving decentralized exchanges and the significant financial loss suffered by individuals, regulators are pressed to put mechanisms in place that enforce customer protections and capital requirements in decentralized ecosystems. Proof of liabilities (PoL) is such a mechanism: it allows a prover (e.g., an exchange) to prove its liability to a verifier (i.e., a customer).

This paper introduces a fully privacy-preserving PoL scheme with short proofs. We store the prover's liabilities in a novel data structure, the sparse summation Verkle tree (SSVT), in which each internal node is a hiding vector commitment of its children and whose root commits to the sum of all the leaves in the tree. We leverage inner product arguments to prove that a user's liability is included in the total liabilities of the prover without leaking any information beyond the liability's inclusion. Our construction yields proofs of size O(log_n N), where n is the arity of the SSVT and N is an upper bound on the number of users. Additionally, we show how to further optimize the proof size using aggregation. We benchmark our scheme using an SSVT of size 2^256 and one of size 10^9 that covers the universe of all US social security numbers.
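The idea of a tree whose root commits to the sum of its leaves can be sketched with a much simpler structure than the SSVT: a binary, hash-based summation tree. The sketch below is an assumption-laden simplification, not the paper's construction. It uses SHA-256 instead of hiding vector commitments, it is dense rather than sparse, and its proofs reveal sibling sums, which is exactly the leakage a privacy-preserving scheme must avoid. All names and balances are hypothetical.

```python
import hashlib

def leaf(user_id: str, amount: int):
    """A leaf is a pair (amount, digest binding the user to the amount)."""
    digest = hashlib.sha256(f"leaf|{user_id}|{amount}".encode()).digest()
    return (amount, digest)

def parent(left, right):
    """An internal node carries the sum of its children and a hash over both."""
    data = (left[0].to_bytes(32, "big") + left[1]
            + right[0].to_bytes(32, "big") + right[1])
    return (left[0] + right[0], hashlib.sha256(data).digest())

def build(leaves):
    """Return all levels, leaves first; the leaf count must be a power of two."""
    levels = [leaves]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        levels.append([parent(cur[i], cur[i + 1]) for i in range(0, len(cur), 2)])
    return levels

def prove(levels, index):
    """Collect the sibling at each level on the path from leaf `index` to the root."""
    path = []
    for level in levels[:-1]:
        sibling = index ^ 1
        path.append((level[sibling], sibling < index))
        index //= 2
    return path

def verify(root, node, path):
    """Recompute the root from a leaf and its sibling path."""
    for sibling, sibling_is_left in path:
        node = parent(sibling, node) if sibling_is_left else parent(node, sibling)
    return node == root

balances = [("alice", 5), ("bob", 7), ("carol", 1), ("dave", 12)]
leaves = [leaf(u, a) for u, a in balances]
levels = build(leaves)
root = levels[-1][0]
proof = prove(levels, 2)  # inclusion proof for carol
print(root[0], verify(root, leaves[2], proof))  # 25 True
```

The proof has one entry per tree level, which mirrors the logarithmic proof size quoted above; a higher arity shortens the path at the cost of wider nodes.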

The Locality of Memory Checking
  • Weijie Wang
  • Yujie Lu
  • Charalampos Papamanthou
  • Fan Zhang

Motivated by the extended deployment of authenticated data structures (e.g., Merkle Patricia Tries) for verifying massive amounts of data in blockchain systems, we begin a systematic study of the I/O efficiency of such systems. We first explore the fundamental limitations of memory checking, a previously proposed abstraction for verifiable storage, in terms of its locality, a complexity measure that we introduce for the first time and that is defined as the number of non-contiguous memory regions a checker must query to verifiably answer a read or a write query. Our central result is an Ω(log n/log log n) lower bound for the locality of any memory checker. Then we turn our attention to (dense and sparse) Merkle trees, one of the most celebrated memory checkers, and provide stronger lower bounds for their locality. For example, we show that any dense Merkle tree layout will have average locality at least (1/3)log n. Furthermore, if we allow node duplication, we show that if any write operation has at most polylog complexity, then the read locality cannot be less than log n/log log n. Our lower bounds help us construct two new locality-optimized authenticated data structures (DupTree and PrefixTree) which we implement and evaluate on random operations and real workloads, and which are shown to outperform traditional Merkle trees, especially as the number of leaves increases.
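The locality measure can be made concrete with a small counting sketch. It assumes one particular layout, the usual level-order ("heap") array in which the root sits at index 1 and node i has children 2i and 2i + 1, and it assumes the checker already holds the root, so a read touches the leaf plus the sibling of every node on the leaf-to-root path. Both assumptions and the function names are ours; the paper's formal definition and layouts may differ.

```python
def touched_nodes(n_leaves: int, leaf: int) -> set:
    """Array indices read to verify `leaf` in a heap-layout dense Merkle tree."""
    idx = n_leaves + leaf          # leaves occupy indices n .. 2n - 1
    touched = {idx}
    while idx > 1:
        touched.add(idx ^ 1)       # sibling of the current node
        idx //= 2                  # move to the parent
    return touched

def count_regions(indices) -> int:
    """Number of maximal runs of consecutive indices (non-contiguous regions)."""
    s = sorted(indices)
    return 1 + sum(1 for a, b in zip(s, s[1:]) if b != a + 1)

n = 8  # number of leaves; must be a power of two in this sketch
per_leaf = [count_regions(touched_nodes(n, leaf)) for leaf in range(n)]
print(per_leaf, sum(per_leaf) / n)
```

For n = 8 this layout needs 2 or 3 separate regions per read, on the order of log n, which matches the flavour of the lower bounds stated above.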

SESSION: Session 32: Language Models & Verification

Stealing the Decoding Algorithms of Language Models
  • Ali Naseh
  • Kalpesh Krishna
  • Mohit Iyyer
  • Amir Houmansadr

A key component of generating text from modern language models (LM) is the selection and tuning of decoding algorithms. These algorithms determine how to generate text from the internal probability distribution generated by the LM. The process of choosing a decoding algorithm and tuning its hyperparameters takes significant time, manual effort, and computation, and it also requires extensive human evaluation. Therefore, the identity and hyperparameters of such decoding algorithms are considered to be extremely valuable to their owners. In this work, we show, for the first time, that an adversary with typical API access to an LM can steal the type and hyperparameters of its decoding algorithms at very low monetary costs. Our attack is effective against popular LMs used in text generation APIs, including GPT-2, GPT-3 and GPT-Neo. We demonstrate the feasibility of stealing such information with only a few dollars, e.g., $0.8, $1, $4, and $40 for the four versions of GPT-3.
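For readers unfamiliar with decoding algorithms, the sketch below shows three common ones in plain Python: temperature scaling, top-k filtering, and top-p (nucleus) filtering. It is a generic illustration under our own naming, not the attack from the paper and not any provider's implementation; the logits are made up.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Turn logits into probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    z = sum(exps)
    return [e / z for e in exps]

def renormalize(probs, keep):
    """Zero out every token not in `keep` and rescale the rest to sum to 1."""
    keep = set(keep)
    z = sum(probs[i] for i in keep)
    return [probs[i] / z if i in keep else 0.0 for i in range(len(probs))]

def top_k_filter(probs, k):
    """Keep only the k most probable tokens."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return renormalize(probs, order[:k])

def top_p_filter(probs, p):
    """Keep the smallest set of most probable tokens whose mass reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, mass = [], 0.0
    for i in order:
        keep.append(i)
        mass += probs[i]
        if mass >= p:
            break
    return renormalize(probs, keep)

def sample(probs, rng):
    """Draw one token index according to `probs`."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

logits = [2.0, 1.0, 0.5, -1.0]           # hypothetical next-token logits
probs = softmax(logits, temperature=0.7)
tk = top_k_filter(probs, k=2)
tp = top_p_filter(probs, p=0.9)
rng = random.Random(0)
print(tk, tp, [sample(tk, rng) for _ in range(5)])
```

Each choice of algorithm and hyperparameter (temperature, k, p) reshapes the distribution that samples are drawn from, which is why these settings can in principle leave a statistical footprint in API outputs.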

Verifiable Learning for Robust Tree Ensembles
  • Stefano Calzavara
  • Lorenzo Cazzaro
  • Giulio Ermanno Pibiri
  • Nicola Prezza

Verifying the robustness of machine learning models against evasion attacks at test time is an important research problem. Unfortunately, prior work established that this problem is NP-hard for decision tree ensembles, hence bound to be intractable for specific inputs. In this paper, we identify a restricted class of decision tree ensembles, called large-spread ensembles, which admit a security verification algorithm running in polynomial time. We then propose a new approach called verifiable learning, which advocates the training of such restricted model classes which are amenable for efficient verification. We show the benefits of this idea by designing a new training algorithm that automatically learns a large-spread decision tree ensemble from labelled data, thus enabling its security verification in polynomial time. Experimental results on public datasets confirm that large-spread ensembles trained using our algorithm can be verified in a matter of seconds, using standard commercial hardware. Moreover, large-spread ensembles are more robust than traditional ensembles against evasion attacks, at the cost of an acceptable loss of accuracy in the non-adversarial setting.

Large Language Models for Code: Security Hardening and Adversarial Testing
  • Jingxuan He
  • Martin Vechev

Large language models (large LMs) are increasingly trained on massive codebases and used to generate code. However, LMs lack awareness of security and are found to frequently produce unsafe code. This work studies the security of LMs along two important axes: (i) security hardening, which aims to enhance LMs' reliability in generating secure code, and (ii) adversarial testing, which seeks to evaluate LMs' security at an adversarial standpoint. We address both of these by formulating a new security task called controlled code generation. The task is parametric and takes as input a binary property to guide the LM to generate secure or unsafe code, while preserving the LM's capability of generating functionally correct code. We propose a novel learning-based approach called SVEN to solve this task. SVEN leverages property-specific continuous vectors to guide program generation towards the given property, without modifying the LM's weights. Our training procedure optimizes these continuous vectors by enforcing specialized loss terms on different regions of code, using a high-quality dataset carefully curated by us. Our extensive evaluation shows that SVEN is highly effective in achieving strong security control. For instance, a state-of-the-art CodeGen LM with 2.7B parameters generates secure code for 59.1% of the time. When we employ SVEN to perform security hardening (or adversarial testing) on this LM, the ratio is significantly boosted to 92.3% (or degraded to 36.8%). Importantly, SVEN closely matches the original LMs in functional correctness.

Experimenting with Zero-Knowledge Proofs of Training
  • Sanjam Garg
  • Aarushi Goel
  • Somesh Jha
  • Saeed Mahloujifar
  • Mohammad Mahmoody
  • Guru-Vamsi Policharla
  • Mingyuan Wang

How can a model owner prove they trained their model according to the correct specification? More importantly, how can they do so while preserving the privacy of the underlying dataset and the final model? We study this problem and formulate the notion of zero-knowledge proof of training (zkPoT), which formalizes rigorous security guarantees that should be achieved by a privacy-preserving proof of training. While it is theoretically possible to design zkPoT for any model using generic zero-knowledge proof systems, this approach results in extremely impractical proof generation times. Towards designing a practical solution, we propose the idea of combining techniques from the MPC-in-the-head and zkSNARKs literature to strike an appropriate trade-off between proof size and proof computation time. We instantiate this idea and propose a concretely efficient, novel zkPoT protocol for logistic regression.

Crucially, our protocol is streaming-friendly and does not require RAM proportional to the size of the training circuit, hence, can be done without special hardware. We expect the techniques developed in this paper to also generally be useful for designing efficient zkPoT protocols for other, more sophisticated, ML models.

We implemented and benchmarked prover/verifier running times and proof sizes for training a logistic regression model using mini-batch gradient descent on a 4 GB dataset of 262,144 records with 1024 features. We divide our protocol into three phases: (1) a data-independent offline phase, (2) a data-dependent phase that is independent of the model, and (3) an online phase that depends on both the data and the model. The total proof size (across all three phases) is less than 10% of the data set size (<350 MB). In the online phase, the prover and verifier times are under 10 minutes and half a minute respectively, whereas in the data-dependent phase, they are close to one hour and a few seconds respectively.

SESSION: Session 33: Differential Privacy

Group and Attack: Auditing Differential Privacy
  • Johan Lokna
  • Anouk Paradis
  • Dimitar I. Dimitrov
  • Martin Vechev

(ε, δ) differential privacy has seen increased adoption recently, especially in private machine learning applications. While this privacy definition allows provably limiting the amount of information leaked by an algorithm, practical implementations of differentially private algorithms often contain subtle vulnerabilities. This motivates the need for effective tools that can audit (ε, δ) differential privacy algorithms before deploying them in the real world. However, existing state-of-the-art tools for auditing (ε, δ) differential privacy directly extend the tools for ε-differential privacy by fixing either ε or δ in the violation search, inherently restricting their ability to efficiently discover violations of (ε, δ) differential privacy.

We present a novel method to efficiently discover (ε, δ) differential privacy violations based on the key insight that many (ε, δ) pairs can be grouped as they result in the same algorithm. Crucially, our method is orthogonal to existing approaches and, when combined, results in a faster and more precise violation search.

We implemented our approach in a tool called Delta-Siege and demonstrated its effectiveness by discovering vulnerabilities in most of the evaluated frameworks, several of which were previously unknown. Further, in 84% of cases, Delta-Siege outperforms existing state-of-the-art auditing tools. Finally, we show how Delta-Siege outputs can be used to find the precise root cause of vulnerabilities, an option no other differential privacy testing tool currently offers.
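To make the notion of an (ε, δ) violation concrete, the sketch below computes, for a mechanism with finitely many outputs, the smallest δ that is consistent with a given ε on one pair of neighbouring inputs. It relies on the standard fact that the worst-case event is the set of outputs where p exceeds e^ε q. This is a textbook calculation under our own naming, not Delta-Siege's search procedure; the randomized-response mechanism is a made-up example.

```python
import math

def delta_for_epsilon(p, q, eps):
    """Smallest delta with p(S) <= e^eps * q(S) + delta for every event S."""
    return sum(max(0.0, pi - math.exp(eps) * qi) for pi, qi in zip(p, q))

def min_delta(p, q, eps):
    """Symmetric version covering both orderings of the neighbouring inputs."""
    return max(delta_for_epsilon(p, q, eps), delta_for_epsilon(q, p, eps))

def violates(p, q, eps, delta, tol=1e-12):
    """True if the output distributions p, q contradict a claimed (eps, delta)."""
    return min_delta(p, q, eps) > delta + tol

# Randomized response that reports the true bit with probability 0.75.
out_on_0 = [0.75, 0.25]
out_on_1 = [0.25, 0.75]

print(min_delta(out_on_0, out_on_1, math.log(3)))      # about 0: ln(3)-DP holds
print(min_delta(out_on_0, out_on_1, math.log(2)))      # about 0.25
print(violates(out_on_0, out_on_1, math.log(2), 0.1))  # True: claim is too strong
```

A real audit must estimate these probabilities from samples of a black-box implementation, which is where the difficulty the abstract describes comes from.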

Interactive Proofs For Differentially Private Counting
  • Ari Biswas
  • Graham Cormode

Differential Privacy (DP) is often presented as a strong privacy-enhancing technology with broad applicability and advocated as a de facto standard for releasing aggregate statistics on sensitive data. However, in many embodiments, DP introduces a new attack surface: a malicious entity entrusted with releasing statistics could manipulate the results and use the randomness of DP as a convenient smokescreen to mask its nefariousness. Since revealing the random noise would obviate the purpose of introducing it, the miscreant may have a perfect alibi. To close this loophole, we introduce the idea of Interactive Proofs For Differential Privacy, which requires the publishing entity to output a zero knowledge proof that convinces an efficient verifier that the output is both DP and reliable. Such a definition might seem unachievable, as a verifier must validate that DP randomness was generated faithfully without learning anything about the randomness itself. We resolve this paradox by carefully mixing private and public randomness to compute verifiable DP counting queries with theoretical guarantees and show that it is also practical for real-world deployment. We also demonstrate that computational assumptions are necessary by showing a separation between information-theoretic DP and computational DP under our definition of verifiability.

Concentrated Geo-Privacy
  • Yuting Liang
  • Ke Yi

This paper proposes concentrated geo-privacy (CGP), a privacy notion that can be considered as the counterpart of concentrated differential privacy (CDP) for geometric data. Compared with the previous notion of geo-privacy [1,5], which is the counterpart of standard differential privacy, CGP offers many benefits including simplicity of the mechanism, lower noise scale in high dimensions, and better composability known as advanced composition. The last one is the most important, as it allows us to design complex mechanisms using smaller building blocks while achieving better utilities. To complement this result, we show that the previous notion of geo-privacy inherently does not admit advanced composition even using its approximate version. Next, we study three problems on private geometric data: the identity query, k nearest neighbors, and convex hulls. While the first problem has been previously studied, we give the first mechanisms for the latter two under geo-privacy. For all three problems, composability is essential in obtaining good utility guarantees on the privatized query answer.

Concurrent Composition for Interactive Differential Privacy with Adaptive Privacy-Loss Parameters
  • Samuel Haney
  • Michael Shoemate
  • Grace Tian
  • Salil Vadhan
  • Andrew Vyrros
  • Vicki Xu
  • Wanrong Zhang

In this paper, we study the concurrent composition of interactive mechanisms with adaptively chosen privacy-loss parameters. In this setting, the adversary can interleave queries to existing interactive mechanisms, as well as create new ones. We prove that every valid privacy filter and odometer for noninteractive mechanisms extends to the concurrent composition of interactive mechanisms if privacy loss is measured using (ε, δ)-DP, f-DP, or Rényi DP of fixed order. Our results offer strong theoretical foundations for enabling full adaptivity in composing differentially private interactive mechanisms, showing that concurrency does not affect the privacy guarantees. We also provide an implementation for users to deploy in practice.
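A privacy filter can be pictured as a gatekeeper that decides, before each adaptively chosen query, whether running it would exceed a fixed overall budget. The sketch below implements the simplest such filter, based on basic (additive) composition of (ε, δ) parameters. It is an illustrative assumption on our part, not the implementation that accompanies the paper, and the class and method names are hypothetical.

```python
class BasicCompositionFilter:
    """Tracks cumulative (eps, delta) and refuses queries that would exceed the budget."""

    def __init__(self, eps_budget: float, delta_budget: float = 0.0):
        self.eps_budget = eps_budget
        self.delta_budget = delta_budget
        self.eps_spent = 0.0
        self.delta_spent = 0.0

    def request(self, eps: float, delta: float = 0.0) -> bool:
        """Return True and charge the budget if the query fits, else return False."""
        if eps < 0 or delta < 0:
            raise ValueError("privacy-loss parameters must be non-negative")
        if (self.eps_spent + eps > self.eps_budget
                or self.delta_spent + delta > self.delta_budget):
            return False
        self.eps_spent += eps
        self.delta_spent += delta
        return True

f = BasicCompositionFilter(eps_budget=1.0, delta_budget=1e-6)
# The analyst picks each epsilon adaptively; the third request would overshoot.
decisions = [f.request(0.5), f.request(0.25), f.request(0.5), f.request(0.25)]
print(decisions, f.eps_spent)  # [True, True, False, True] 1.0
```

An odometer is the companion object that reports the running total (here `eps_spent` and `delta_spent`) instead of enforcing a preset limit.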

SESSION: Session 34: Kernel & System Calls

SysXCHG: Refining Privilege with Adaptive System Call Filters
  • Alexander J. Gaidis
  • Vaggelis Atlidakis
  • Vasileios P. Kemerlis

We present the design, implementation, and evaluation of SysXCHG: a system call (syscall) filtering enforcement mechanism that enables programs to run in accordance with the principle of least privilege. In contrast to the current, hierarchical design of seccomp-BPF, which does not allow a program to run with a different set of allowed syscalls than its descendants, SysXCHG enables applications to run with "tight" syscall filters, uninfluenced by any future-executed (sub-)programs, by allowing filters to be dynamically exchanged at runtime during execve[at]. As a part of SysXCHG, we also present xfilter: a mechanism for fast filtering using a process-specific view of the kernel's syscall table where filtering is performed. In our evaluation of SysXCHG, we found that our filter exchanging design is performant, incurring ≤ 1.71% slowdown on real-world programs in the PaSH benchmark suite, as well as effective, blocking vast amounts of extraneous functionality, including security-critical syscalls, which the current design of seccomp-BPF is unable to block.

SysPart: Automated Temporal System Call Filtering for Binaries
  • Vidya Lakshmi Rajagopalan
  • Konstantinos Kleftogiorgos
  • Enes Göktas
  • Jun Xu
  • Georgios Portokalidis

Restricting the system calls available to applications reduces the attack surface of the kernel and limits the functionality available to compromised applications. Recent approaches automatically identify the system calls required by programs to block unneeded ones. For servers, they even consider different phases of execution to tighten restrictions after initialization completes. However, they require access to the source code for applications and libraries, depend on users identifying when the server transitions from initialization to serving clients, or do not account for dynamically-loaded libraries. This paper introduces SYSPART, an automatic system-call filtering system designed for binary-only server programs that addresses the above limitations. Using a novel algorithm that combines static and dynamic analysis, SYSPART identifies the serving phases of all working threads of a server. Static analysis is used to compute the system calls required during the various serving phases in a sound manner, and dynamic observations are only used to complement static resolution of dynamically-loaded libraries when necessary. We evaluated SYSPART using six popular servers on x86-64 Linux to demonstrate its effectiveness in automatically identifying serving phases, generating accurate system-call filters, and mitigating attacks. Our results show that SYSPART outperforms prior binary-only approaches and performs comparably to source-code approaches.

Hacksaw: Hardware-Centric Kernel Debloating via Device Inventory and Dependency Analysis
  • Zhenghao Hu
  • Sangho Lee
  • Marcus Peinado

Kernel debloating is a practical mechanism to mitigate the security problems of the operating system kernel by reducing its attack surface. Existing kernel debloating mechanisms focus on specializing a kernel to run a target application based on its dynamic traces collected in the past - they remove functions from the kernel which are not used by the application according to the traces. However, since the dynamic traces do not ensure full coverage, false removals of required functions are unavoidable. This paper proposes Hacksaw, a novel mechanism to debloat a kernel for a target machine based on its hardware device inventory. Hacksaw accurately debloats a kernel without false removals because figuring out which hardware components are attached to the machine as well as which device drivers manage them is comprehensive and deterministic. Hacksaw removes not only inoperative device drivers that do not control any attached hardware components but also other kernel modules and functions which are associated with the inoperative drivers according to three dependency analysis approaches: call-graph, driver-model, and compilation-unit analyses. Our evaluation shows that Hacksaw effectively removes inoperative kernel modules and functions (i.e., their respective reduction ratios are 45% and 30% on average) while ensuring validity and compatibility.

KRover: A Symbolic Execution Engine for Dynamic Kernel Analysis
  • Pansilu Pitigalaarachchi
  • Xuhua Ding
  • Haiqing Qiu
  • Haoxin Tu
  • Jiaqi Hong
  • Lingxiao Jiang

We present KRover, a novel kernel symbolic execution engine catered for dynamic kernel analysis such as vulnerability analysis and exploit generation. Different from existing symbolic execution engines, KRover operates directly upon a live kernel thread's virtual memory and weaves symbolic execution into the target's native executions. KRover is compact as it neither lifts the target binary to an intermediary representation nor uses QEMU or dynamic binary translation. Benchmarked against S2E, our performance experiments show that KRover is up to 50 times faster but with one tenth to one quarter of S2E memory cost. As shown in our four case studies, KRover is noise free, has the best-possible binary intimacy and does not require prior kernel instrumentation. Moreover, a user can develop her kernel analyzer that not only uses KRover as a symbolic execution library but also preserves its independent capabilities of reading/writing/controlling the target runtime. Namely, the resulting analyzer on top of KRover integrates symbolic reasoning and conventional dynamic analysis and reaps the benefits of their reinforcement to each other.

SESSION: Session 35: Speculative Execution & Information Flow

Gotcha! I Know What You Are Doing on the FPGA Cloud: Fingerprinting Co-Located Cloud FPGA Accelerators via Measuring Communication Links
  • Chongzhou Fang
  • Ning Miao
  • Han Wang
  • Jiacheng Zhou
  • Tyler Sheaves
  • John M. Emmert
  • Avesta Sasan
  • Houman Homayoun

In recent decades, due to the emerging requirements of computation acceleration, cloud FPGAs have become popular in public clouds. Major cloud service providers, e.g., AWS and Microsoft Azure, have provided FPGA computing resources in their infrastructure and have enabled users to design and deploy their own accelerators on these FPGAs. Multi-tenancy FPGAs, where multiple users can share the same FPGA fabric with certain types of isolation to improve resource efficiency, have already been proven feasible. However, this also raises security concerns. Various types of side-channel attacks targeting multi-tenancy FPGAs have been proposed and validated. The awareness of security vulnerabilities in the cloud has motivated cloud providers to take action to enhance the security of their cloud environments.

In FPGA security research papers, researchers always perform attacks under the assumption that attackers successfully co-locate with victims and are aware of the existence of victims on the same FPGA board. However, the way to reach this point, i.e., how attackers secretly obtain information regarding accelerators on the same fabric, is consistently ignored despite the fact that it is non-trivial and important for attackers. In this paper, we present a novel fingerprinting attack to infer the types of co-located FPGA accelerators. We utilize a seemingly non-malicious benchmark accelerator to sniff the communication link and collect performance traces of the FPGA-host communication link. By analyzing these traces, we are able to achieve high classification accuracy for fingerprinting co-located accelerators, which proves that attackers can use our method to perform cloud FPGA accelerator fingerprinting with a high success rate. As far as we know, this is the first paper targeting multi-tenant FPGA accelerator fingerprinting with the communication side-channel.

iLeakage: Browser-based Timerless Speculative Execution Attacks on Apple Devices
  • Jason Kim
  • Stephan van Schaik
  • Daniel Genkin
  • Yuval Yarom

Over the past few years, the high-end CPU market has been undergoing a transformational change. Moving away from using x86 as the sole architecture for high-performance devices, we have witnessed the introduction of computing devices with heavy-weight Arm CPUs. Among these, perhaps the most influential was the introduction of Apple's M-series architecture, aimed at completely replacing Intel CPUs in the Apple ecosystem. However, while significant effort has been invested in analyzing x86 CPUs, the Apple ecosystem remains largely unexplored.

In this paper, we set out to investigate the resilience of the Apple ecosystem to speculative side-channel attacks. We first establish the basic toolkit needed for mounting side-channel attacks, such as the structure of caches and CPU speculation depth. We then tackle Apple's degradation of the timer resolution in both native and browser-based code. Remarkably, we show that distinguishing cache misses from cache hits can be done without time measurements, replacing timing based primitives with timerless counterparts based on race conditions. Finally, we use our distinguishing primitive to construct eviction sets and mount Spectre attacks, all while avoiding the use of timers.

We then evaluate Safari's side-channel resilience. We bypass the compressed 35-bit addressing and the value poisoning countermeasures, creating a primitive that can speculatively read and leak any 64-bit address within Safari's rendering process. Combining this with a new method for consolidating websites from different domains into the same renderer process, we demonstrate end-to-end attacks leaking sensitive information, such as passwords, inbox content, and locations from popular services such as Google.

Declassiflow: A Static Analysis for Modeling Non-Speculative Knowledge to Relax Speculative Execution Security Measures
  • Rutvik Choudhary
  • Alan Wang
  • Zirui Neil Zhao
  • Adam Morrison
  • Christopher W. Fletcher

Speculative execution attacks undermine the security of constant-time programming, the standard technique used to prevent microarchitectural side channels in security-sensitive software such as cryptographic code. Constant-time code must therefore also deploy a defense against speculative execution attacks to prevent leakage of secret data stored in memory or the processor registers. Unfortunately, contemporary defenses, such as speculative load hardening (SLH), can only satisfy this strong security guarantee at a very high performance cost.

This paper proposes Declassiflow, a static program analysis and protection framework to efficiently protect constant-time code from speculative leakage. Declassiflow models "attacker knowledge"-data which is inherently transmitted (or, implicitly declassified) by the code's non-speculative execution-and statically removes protection on such data from points in the program where it is already guaranteed to leak non-speculatively. Overall, Declassiflow ensures that data which never leaks during the non-speculative execution does not leak during speculative execution, but with lower overhead than conservative protections like SLH.

SpecVerilog: Adapting Information Flow Control for Secure Speculation
  • Drew Zagieboylo
  • Charles Sherk
  • Andrew C. Myers
  • G. Edward Suh

To address transient execution vulnerabilities, processor architects have proposed both defensive designs and formal descriptions of the security they provide. However, these designs are not typically formally proven to enforce the claimed guarantees; more importantly, there are few tools to automatically ensure that Register Transfer Level (RTL) descriptions are faithful to high-level designs.

In this paper, we demonstrate how to extend an existing security-typed hardware description language to express speculative security conditions and to verify the security of synthesizable implementations. Our tool can statically verify that an RTL hardware design is free of transient execution vulnerabilities without manual proof effort. Our key insight is that erasure labels can be adapted both to be statically checkable and to represent transiently accessed or modified data and its mandatory erasure under misspeculation. Further, we show how to use erasure labels to defend a strong formal definition of speculative security. To validate our approach, we implement several components that are critical to speculative, out-of-order processors and are also common vectors for transient execution vulnerabilities. We show that the security of existing defenses can be correctly validated and that the absence of necessary defenses is detected as a potential vulnerability.

SESSION: Session 36: Verified Cryptographic Implementations

Formalizing, Verifying and Applying ISA Security Guarantees as Universal Contracts
  • Sander Huyghebaert
  • Steven Keuchel
  • Coen De Roover
  • Dominique Devriese

Progress has recently been made on specifying instruction set architectures (ISAs) in executable formalisms rather than through prose. However, to date, those formal specifications are limited to the functional aspects of the ISA and do not cover its security guarantees. We present a novel, general method for formally specifying an ISA's security guarantees in a way that (1) balances the needs of ISA implementations (hardware) and clients (software), (2) can be semi-automatically verified to hold for the ISA operational semantics, producing a high-assurance mechanically-verifiable proof, and (3) supports informal and formal reasoning about security-critical software in the presence of adversarial code. Our method leverages universal contracts: software contracts that express bounds on the authority of arbitrary untrusted code. Universal contracts can be kept agnostic of software abstractions, and strike the right balance between requiring sufficient detail for reasoning about software and preserving implementation freedom of ISA designers and CPU implementers. We semi-automatically verify universal contracts against Sail implementations of ISA semantics using our Katamaran tool, a semi-automatic separation logic verifier for Sail which produces machine-checked proofs for successfully verified contracts. We demonstrate the generality of our method by applying it to two ISAs that offer very different security primitives: (1) MinimalCaps, a custom-built capability machine ISA, and (2) a (somewhat simplified) version of RISC-V with PMP. We verify a femtokernel using the security guarantee we have formalized for RISC-V with PMP.

Boosting the Performance of High-Assurance Cryptography: Parallel Execution and Optimizing Memory Access in Formally-Verified Line-Point Zero-Knowledge
  • Samuel Dittmer
  • Karim Eldefrawy
  • Stéphane Graham-Lengrand
  • Steve Lu
  • Rafail Ostrovsky
  • Vitor Pereira

Despite the notable advances in the development of high-assurance, verified implementations of cryptographic protocols, such implementations typically face significant performance overheads, particularly due to the penalties induced by formal verification and automated extraction of executable code. In this paper, we address some core performance challenges facing computer-aided cryptography by presenting a formal treatment for accelerating such verified implementations based on multiple generic optimizations covering parallelism and memory access. We illustrate our techniques for addressing such performance bottlenecks using the Line-Point Zero-Knowledge (LPZK) protocol as a case study. Our starting point is a new verified implementation of LPZK that we formalize and synthesize using EasyCrypt; our first implementation is developed to reduce the proof effort and without considering the performance of the extracted executable code. We then show how such (automatically) extracted code can be optimized in three different ways to obtain a 3000x speedup, thus matching the performance of the manual implementation of LPZK of lpzkv2 [13]. We obtain such performance gains by first modifying the algorithmic specifications, then by adopting a provably secure parallel execution model, and finally by optimizing the memory access structures. All optimizations are first formally verified inside EasyCrypt, and then executable code is automatically synthesized from each step of the formalization. For each optimization, we analyze performance gains resulting from it and also address challenges facing the computer-aided security proofs thereof, and challenges facing automated synthesis of executable code with such an optimization.

Galápagos: Developing Verified Low Level Cryptography on Heterogeneous Hardwares
  • Yi Zhou
  • Sydney Gibson
  • Sarah Cai
  • Menucha Winchell
  • Bryan Parno

The proliferation of new hardware designs makes it difficult to produce high-performance cryptographic implementations tailored at the assembly level to each platform, let alone to prove such implementations correct. Hence we introduce Galápagos, an extensible framework designed to reduce the effort of verifying cryptographic implementations across different ISAs.

In Galápagos, a developer proves their high-level implementation strategy correct once and then bundles both strategy and proof into an abstract module. The module can then be instantiated and connected to each platform-specific implementation. Galápagos facilitates this connection by generically raising the abstraction of the targeted platforms, and via a collection of new verified libraries and tool improvements to help automate the proof process.

We validate Galápagos via multiple verified cryptographic implementations across three starkly different platforms: a 256-bit special-purpose accelerator, a 16-bit minimal ISA (the MSP430), and a standard 32-bit RISC-V CPU. Our case studies are derived from a real-world use case, the OpenTitan security chip, which is deploying our verified cryptographic code at scale.

Specification and Verification of Side-channel Security for Open-source Processors via Leakage Contracts
  • Zilong Wang
  • Gideon Mohr
  • Klaus von Gleissenthall
  • Jan Reineke
  • Marco Guarnieri

Leakage contracts have recently been proposed as a new security abstraction at the Instruction Set Architecture (ISA) level. Leakage contracts aim to capture the information that processors leak through their microarchitectural implementations. However, so far, we lack a methodology to verify that a processor actually satisfies a given leakage contract.

In this paper, we address this challenge by developing LeaVe, the first tool for verifying register-transfer-level (RTL) processor designs against ISA-level leakage contracts. To this end, we show how to decouple security and functional correctness concerns. LeaVe leverages this decoupling to make verification of contract satisfaction practical. To scale to realistic processor designs, LeaVe further employs inductive reasoning on relational abstractions. Using LeaVe, we precisely characterize the side-channel security guarantees of three open-source RISC-V processors, thereby obtaining the first proofs of contract satisfaction for RTL processor designs.

SESSION: Session 37: Multiparty Computation I

Grotto: Screaming fast (2+1)-PC for ℤ_{2^n} via (2,2)-DPFs
  • Kyle Storrier
  • Adithya Vadapalli
  • Allan Lyons
  • Ryan Henry

We introduce Grotto, a framework and C++ library for space- and time-efficient (2+1)-party piecewise polynomial (i.e., spline) evaluation on secrets additively shared over ℤ_{2^n}. Grotto improves on the state-of-the-art approaches based on distributed comparison functions (DCFs) in almost every metric, offering asymptotically superior communication and computation costs with the same or lower round complexity. At the heart of Grotto is a novel observation about the structure of the "tree" representation underlying the most efficient distributed point functions (DPFs) from the literature, alongside an efficient algorithm that leverages this structure to do with a lightweight DPF what state-of-the-art approaches require comparatively heavyweight DCFs to do. Our open-source Grotto implementation supports dozens of useful functions out of the box, including trigonometric and hyperbolic functions with their inverses; various logarithms; roots, reciprocals, and reciprocal roots; sign testing and bit counting; and over two dozen of the most common univariate activation functions from the deep-learning literature.

Scalable Multiparty Garbling
  • Gabrielle Beck
  • Aarushi Goel
  • Aditya Hegde
  • Abhishek Jain
  • Zhengzhong Jin
  • Gabriel Kaptchuk

Multiparty garbling is the most popular approach for constant-round secure multiparty computation (MPC). Despite being the focus of significant research effort, instantiating prior approaches to multiparty garbling results in constant-round MPC that cannot realistically accommodate large numbers of parties. In this work we present the first global-scale multiparty garbling protocol. The per-party communication complexity of our protocol decreases as the number of parties participating in the protocol increases, for the first time matching the asymptotic communication complexity of non-constant-round MPC protocols. Our protocol achieves malicious security in the honest-majority setting and relies on the hardness of the Learning Parity with Noise assumption.

Linear Communication in Malicious Majority MPC
  • S. Dov Gordon
  • Phi Hung Le
  • Daniel McVicker

The SPDZ multiparty computation protocol [DPSZ12] allows n parties to securely compute arithmetic circuits over a finite field, while tolerating up to n-1 active corruptions. A line of work building upon SPDZ has made considerable improvement to the protocol's performance, typically focusing on concrete efficiency. However, the communication complexity of each of these protocols is Ω(n^2|C|).

In this paper, we present a protocol that achieves O(n|C|) communication. Our construction is very similar to those in the SPDZ family of protocols, but for one modular sub-routine for computing a verified sum. There are a handful of times in the SPDZ protocols in which the n parties wish to sum n public values. Rather than requiring each party to broadcast their input to all other parties, clearly it is cheaper to use some designated "dealer" to compute and broadcast the sum. In prior work, it was assumed that the cost of verifying the correctness of these sums is O(n^2), erasing the benefit of using a dealer. We show how to amortize this cost over the computation of multiple sums, resulting in linear communication complexity whenever the circuit size is |C| > n.
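The gap between broadcasting inputs and routing them through a dealer can be seen with a simple message count. The sketch below is only an illustration under simplifying assumptions: it counts point-to-point messages of one value each, the function names are hypothetical, and it omits the verification step whose amortization is the paper's actual contribution.

```python
# Toy message count for summing n public values; NOT the paper's protocol.
# All names here are illustrative assumptions.

def all_to_all_messages(n: int) -> int:
    """Every party sends its value to each of the other n - 1 parties."""
    return n * (n - 1)

def dealer_messages(n: int) -> int:
    """n - 1 parties send to the dealer; the dealer returns the sum to n - 1 parties."""
    return 2 * (n - 1)

def dealer_sum(values: list[int], modulus: int) -> int:
    """The dealer's (unverified) computation: add the values in a prime field."""
    return sum(values) % modulus

for n in (4, 16, 64):
    print(n, all_to_all_messages(n), dealer_messages(n))
```

The first count grows quadratically in n and the second linearly, which is why an O(n^2) verification cost per sum would cancel the saving unless it is spread over many sums.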

Efficient Multiparty Probabilistic Threshold Private Set Intersection
  • Feng-Hao Liu
  • En Zhang
  • Leiyong Qin

Threshold private set intersection (TPSI) allows multiple parties to learn the intersection of their input sets only if the size of the intersection is greater than a certain threshold. This task has been shown to be useful in practical applications, and thus much active research has been conducted. However, current solutions for TPSI are still slow for large input sets, e.g., a set size of n=2^20, and the potentially practical candidates are only secure against semi-honest adversaries. For the basic PSI, there have been efficient and scalable solutions, even in the malicious setting. It is interesting to determine whether adding a threshold feature would inherently incur a large overhead to PSI.

To bridge the gap, we introduce a new notion called probabilistic TPSI, where the parties learn the intersection I with probability proportional to the size of I. This functionality trades a small probability of bad events (i.e., the parties can learn insufficiently large I) for new directions of efficient and scalable designs. As a novel technical contribution, we design an efficient multiparty probabilistic set size test protocol, which together with any standard PSI realizes the probabilistic variant in the semi-honest setting. Even though this generic approach cannot be generalized to the malicious setting, we identify nice properties from the prior OPRF-based PSI, which can be further blended with our probabilistic set size test to achieve a probabilistic TPSI protocol against malicious adversaries, with competitive efficiency and scalability.

In summary, we show (1) in the semi-honest setting, there exists a probabilistic TPSI that is essentially as efficient as a generic PSI, and (2) in the malicious setting, there is a two-party probabilistic TPSI that is essentially 2x the OPRF-based PSI of the work (Raghuraman and Rindal, CCS 2022), implying a small gap between PSI and the probabilistic TPSI. We conduct comprehensive experiments to evaluate concrete efficiency of our individual building blocks and the overall protocol. Particularly, our overall two-party protocol for the probabilistic TPSI runs within 7 or 10 seconds (online) for n=2^20 in the semi-honest and malicious settings, respectively. This confirms the practical advantages of our approach.

SESSION: Session 38: Network Security

Vulnerability Intelligence Alignment via Masked Graph Attention Networks
  • Yue Qin
  • Yue Xiao
  • Xiaojing Liao

Cybersecurity vulnerability information is often sourced from multiple channels, such as government vulnerability repositories, individually maintained vulnerability-gathering platforms, or vulnerability-disclosure email lists and forums. Integrating vulnerability information from different channels enables comprehensive threat assessment and quick deployment to various security mechanisms. However, automatic integration of vulnerability information, especially those lacking decisive information (e.g., CVE-ID), is hindered by the limitations of today's entity alignment techniques.

In our study, we annotate and release the first cybersecurity-domain vulnerability alignment dataset, and highlight the unique characteristics of security entities, including the inconsistent vulnerability artifacts of identical vulnerabilities (e.g., impact and affected version) in different vulnerability repositories. Based on these characteristics, we propose an entity alignment model, CEAM, for integrating vulnerability information from multiple sources. CEAM equips graph neural network-based entity alignment techniques with two application-driven mechanisms: asymmetric masked aggregation and partitioned attention. These techniques selectively aggregate vulnerability artifacts to learn the semantic embeddings for vulnerabilities by an asymmetric mask, while ensuring that the artifacts critical to vulnerability identification are always given more consideration. Experimental results on vulnerability alignment datasets demonstrate that CEAM significantly outperforms state-of-the-art entity alignment methods.

In Search of netUnicorn: A Data-Collection Platform to Develop Generalizable ML Models for Network Security Problems
  • Roman Beltiukov
  • Wenbo Guo
  • Arpit Gupta
  • Walter Willinger

The remarkable success of the use of machine learning-based solutions for network security problems has been impeded by the developed ML models' inability to maintain efficacy when used in different network environments exhibiting different network behaviors. This issue is commonly referred to as the generalizability problem of ML models. The community has recognized the critical role that training datasets play in this context and has developed various techniques to improve dataset curation to overcome this problem. Unfortunately, these methods are generally ill-suited or even counterproductive in the network security domain, where they often result in unrealistic or poor-quality datasets.

To address this issue, we propose a new closed-loop ML pipeline that leverages explainable ML tools to guide the network data collection in an iterative fashion. To ensure the data's realism and quality, we require that the new datasets should be endogenously collected in this iterative process, thus advocating for a gradual removal of data-related problems to improve model generalizability. To realize this capability, we develop a data-collection platform, netUnicorn, that takes inspiration from the classic "hourglass" model and is implemented as its "thin waist" to simplify data collection for different learning problems from diverse network environments. The proposed system decouples data-collection intents from the deployment mechanisms and disaggregates these high-level intents into smaller reusable, self-contained tasks. We demonstrate how netUnicorn simplifies collecting data for different learning problems from multiple network environments and how the proposed iterative data collection improves a model's generalizability.

MDTD: A Multi-Domain Trojan Detector for Deep Neural Networks
  • Arezoo Rajabi
  • Surudhi Asokraj
  • Fengqing Jiang
  • Luyao Niu
  • Bhaskar Ramasubramanian
  • James Ritcey
  • Radha Poovendran

Machine learning models that use deep neural networks (DNNs) are vulnerable to backdoor attacks. An adversary carrying out a backdoor attack embeds a predefined perturbation called a trigger into a small subset of input samples and trains the DNN such that the presence of the trigger in the input results in an adversary-desired output class. Such adversarial retraining however needs to ensure that outputs for inputs without the trigger remain unaffected and provide high classification accuracy on clean samples. Existing defenses against backdoor attacks are computationally expensive, and their success has been demonstrated primarily on image-based inputs. The increasing popularity of deploying pretrained DNNs to reduce costs of re/training large models makes defense mechanisms that aim to detect 'suspicious' input samples preferable.

In this paper, we propose MDTD, a Multi-Domain Trojan Detector for DNNs, which detects inputs containing a Trojan trigger at testing time. MDTD does not require knowledge of trigger-embedding strategy of the attacker and can be applied to a pretrained DNN model with image, audio, or graph-based inputs. MDTD leverages an insight that input samples containing a Trojan trigger are located relatively farther away from a decision boundary than clean samples. MDTD estimates the distance to a decision boundary using adversarial learning methods and uses this distance to infer whether a test-time input sample is Trojaned or not.
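The distance-based test described above can be illustrated with a toy model. The sketch below is an assumption-laden stand-in, not MDTD itself: it uses a linear classifier, for which the distance to the decision boundary has a closed form, whereas the paper estimates the distance with adversarial learning methods on a DNN. The names, the `slack` parameter, and the calibration rule are all hypothetical.

```python
import math

# Toy stand-in for a distance-to-boundary detector; NOT the MDTD implementation.
# A linear classifier sign(w.x + b) is assumed so the distance is exact.

def distance_to_boundary(x, w, b):
    """Euclidean distance from x to the hyperplane w.x + b = 0."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return abs(score) / math.sqrt(sum(wi * wi for wi in w))

def calibrate_threshold(clean_samples, w, b, slack=1.5):
    """Hypothetical rule: allow some slack above the largest clean distance."""
    return slack * max(distance_to_boundary(x, w, b) for x in clean_samples)

def is_suspicious(x, w, b, threshold):
    """Flag inputs that lie unusually far from the decision boundary."""
    return distance_to_boundary(x, w, b) > threshold

w, b = [3.0, 4.0], 0.0
clean = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
threshold = calibrate_threshold(clean, w, b)
print(is_suspicious([10.0, 10.0], w, b, threshold))  # far from the boundary
print(is_suspicious([0.5, 0.5], w, b, threshold))    # close to the boundary
```

Calibrating on clean samples only mirrors the stated property that the detector needs no knowledge of the attacker's trigger-embedding strategy.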

We evaluate MDTD against state-of-the-art Trojan detection methods across five widely used image-based datasets (CIFAR100, CIFAR10, GTSRB, SVHN, and Flowers102), four graph-based datasets (AIDS, WinMal, Toxicant, and COLLAB), and the SpeechCommand audio dataset. Our results show that MDTD effectively identifies samples that contain different types of Trojan triggers. We further evaluate MDTD against adaptive attacks where an adversary trains a robust DNN to increase (decrease) distance of benign (Trojan) inputs from a decision boundary. Although such training by the adversary reduces the detection rate of MDTD, this is accomplished at the expense of reducing classification accuracy or adversary success rate, thus rendering the resulting model unfit for use.

ProvG-Searcher: A Graph Representation Learning Approach for Efficient Provenance Graph Search
  • Enes Altinisik
  • Fatih Deniz
  • Hüsrev Taha Sencar

We present ProvG-Searcher, a novel approach for detecting known APT behaviors within system security logs. Our approach leverages provenance graphs, a comprehensive graph representation of event logs, to capture and depict data provenance relations by mapping system entities as nodes and their interactions as edges. We formulate the task of searching provenance graphs as a subgraph matching problem and employ a graph representation learning method. The central component of our search methodology involves embedding of subgraphs in a vector space where subgraph relationships can be directly evaluated. We achieve this through the use of order embeddings that simplify subgraph matching to straightforward comparisons between a query and precomputed subgraph representations. To address challenges posed by the size and complexity of provenance graphs, we propose a graph partitioning scheme and a behavior-preserving graph reduction method. Overall, our technique offers significant computational efficiency, allowing most of the search computation to be performed offline while incorporating a lightweight comparison step during query execution. Experimental results on standard datasets demonstrate that ProvG-Searcher achieves superior performance, with an accuracy exceeding 99% in detecting query behaviors and a false positive rate of approximately 0.02%, outperforming other approaches.
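To make the "straightforward comparisons" concrete, the sketch below shows the usual order-embedding test, assuming the common convention that a query is a subgraph of a target when its embedding is coordinate-wise no larger. The embedding values, the index layout, and the function names are hypothetical; the abstract does not specify the trained model or the decision threshold.

```python
# Minimal sketch of an order-embedding subgraph test; the vectors below are
# made-up numbers standing in for learned, precomputed subgraph embeddings.

def order_violation(query_emb, target_emb):
    """Squared amount by which the query exceeds the target, per coordinate."""
    return sum(max(0.0, q - t) ** 2 for q, t in zip(query_emb, target_emb))

def is_probable_subgraph(query_emb, target_emb, threshold=0.0):
    """Zero (or small) violation suggests the query is contained in the target."""
    return order_violation(query_emb, target_emb) <= threshold

def search(query_emb, index, threshold=0.0):
    """Compare one query against precomputed embeddings of graph partitions."""
    return [name for name, emb in index.items()
            if is_probable_subgraph(query_emb, emb, threshold)]

index = {
    "partition_a": [2.0, 3.0, 1.5],
    "partition_b": [0.5, 1.0, 0.2],
}
print(search([1.0, 2.0, 1.0], index))
```

Because the index is computed offline, the query-time cost is one cheap vector comparison per stored partition, which matches the efficiency argument made in the abstract.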

SESSION: Session 39: Privacy in Computation

Securely Sampling Discrete Gaussian Noise for Multi-Party Differential Privacy
  • Chengkun Wei
  • Ruijing Yu
  • Yuan Fan
  • Wenzhi Chen
  • Tianhao Wang

Differential Privacy (DP) is a widely used technique for protecting individuals' privacy by limiting what can be inferred about them from aggregate data. Recently, there have been efforts to implement DP using Secure Multi-Party Computation (MPC) to achieve high utility without the need for a trusted third party. One of the key components of implementing DP in MPC is noise sampling. Our work presents the first MPC solution for sampling discrete Gaussian, a common type of noise used for constructing DP mechanisms, which plays nicely with malicious secure MPC protocols.

Our solution is both generic, supporting various MPC protocols and any number of parties, and efficient, relying primarily on bit operations and avoiding computation with transcendental functions or non-integer arithmetic. Our experiments show that our method can generate 2^15 discrete Gaussian samples with a standard deviation of 20 and a security parameter of 128 in 1.5 minutes.
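For reference, the standard definition of the centered discrete Gaussian over the integers is given below. The symbol σ is notation introduced here; it is the scale parameter, which for values as large as 20 is essentially the standard deviation quoted above.

```latex
\Pr[X = x] \;=\; \frac{e^{-x^{2}/(2\sigma^{2})}}{\sum_{y \in \mathbb{Z}} e^{-y^{2}/(2\sigma^{2})}},
\qquad x \in \mathbb{Z}.
```

The exponentials in this formula are what make a direct evaluation awkward inside MPC, which is the motivation for a sampler built mainly from bit operations.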

Detecting Violations of Differential Privacy for Quantum Algorithms
  • Ji Guan
  • Wang Fang
  • Mingyu Huang
  • Mingsheng Ying

Quantum algorithms for solving a wide range of practical problems have been proposed in the last ten years, such as data search and analysis, product recommendation, and credit scoring. Concern about privacy and other ethical issues in quantum computing naturally arises. In this paper, we define a formal framework for detecting violations of differential privacy for quantum algorithms. A detection algorithm is developed to verify whether a (noisy) quantum algorithm is differentially private and to automatically generate bugging information when a violation of differential privacy is reported. The information consists of a pair of quantum states that violate the privacy, to illustrate the cause of the violation. Our algorithm is equipped with Tensor Networks, a highly efficient data structure, and executed both on TensorFlow Quantum and TorchQuantum, which are the quantum extensions of the well-known machine learning platforms TensorFlow and PyTorch, respectively. The effectiveness and efficiency of our algorithm are confirmed by the experimental results of almost all types of quantum algorithms already implemented on realistic quantum computers, including quantum supremacy algorithms (beyond the capability of classical algorithms), quantum machine learning models, quantum approximate optimization algorithms, and variational quantum eigensolvers with up to 21 quantum bits.

Amplification by Shuffling without Shuffling
  • Borja Balle
  • James Bell
  • Adrià Gascón

Motivated by recent developments in the shuffle model of differential privacy, we propose a new approximate shuffling functionality called Alternating Shuffle, and provide a protocol implementing alternating shuffling in a single-server threat model where the adversary observes all communication. Unlike previous shuffling protocols in this threat model, the per-client communication of our protocol only grows sub-linearly in the number of clients. Moreover, we study the concrete efficiency of our protocol and show it can improve per-client communication by one or more orders of magnitude with respect to previous (approximate) shuffling protocols. We also show a differential privacy amplification result for alternating shuffling analogous to the one for uniform shuffling, and demonstrate that shuffling-based protocols for secure summation based on a construction of Ishai et al. remain secure under the Alternating Shuffle. In the process we also develop a protocol for exact shuffling in the single-server threat model with amortized logarithmic communication per client, which might be of independent interest.

HELiKs: HE Linear Algebra Kernels for Secure Inference
  • Shashank Balla
  • Farinaz Koushanfar

We introduce HELiKs, a groundbreaking framework for fast and secure matrix multiplication and 3D convolutions, tailored for privacy-preserving machine learning. Leveraging Homomorphic Encryption (HE) and Additive Secret Sharing, HELiKs enables secure matrix and vector computations while ensuring end-to-end data privacy for all parties. Key innovations of the proposed framework include an efficient multiply-accumulate (MAC) design that significantly reduces HE error growth, a partial sum accumulation strategy that cuts the number of HE rotations by a logarithmic factor, and a novel matrix encoding that facilitates faster online HE multiplications with one-time pre-computation. Furthermore, HELiKs substantially reduces the number of keys used for HE computation, leading to lower bandwidth usage during the setup phase. In our evaluation, HELiKs shows considerable performance improvements in terms of runtime and communication overheads when compared to existing secure computation methods. With our proof-of-work implementation (available on GitHub: https://github.com/shashankballa/HELiKs), we demonstrate state-of-the-art performance with up to 32x speedup for matrix multiplication and 27x speedup for 3D convolution when compared to prior art. HELiKs also reduces communication overheads by 1.5x for matrix multiplication and 29x for 3D convolution over prior works, thereby improving the efficiency of data transfer.

SESSION: Session 40: Medley

SkillScanner: Detecting Policy-Violating Voice Applications Through Static Analysis at the Development Phase
  • Song Liao
  • Long Cheng
  • Haipeng Cai
  • Linke Guo
  • Hongxin Hu

The Amazon Alexa marketplace is the largest Voice Personal Assistant (VPA) platform with over 100,000 voice applications (i.e., skills) published to the skills store. In an effort to maintain the quality and trustworthiness of voice-apps, Amazon Alexa has implemented a set of policy requirements to be adhered to by third-party skill developers. However, recent works reveal the prevalence of policy-violating skills in the current skills store. To understand the causes of policy violations in skills, we first conduct a user study with 34 third-party skill developers focusing on whether they are aware of the various policy requirements defined by the Amazon Alexa platform. Our user study results show that there is a notable gap between VPA's policy requirements and skill developers' practices. As a result, it is inevitable that policy-violating skills will be published.

To prevent the inflow of new policy-breaking skills to the skills store from the source, it is critical to identify potential policy violations at the development phase. In this work, we design and develop SkillScanner, an efficient static code analysis tool that helps third-party developers detect policy violations early in the skill development lifecycle. To evaluate the performance of SkillScanner, we conducted an empirical study on 2,451 open source skills collected from GitHub. SkillScanner effectively identified 1,328 different policy violations from 786 skills. Our results suggest that 32% of these policy violations are introduced through code duplication (i.e., code copy and paste). In particular, we found that 42 skill code examples from potentially official Alexa accounts (e.g., "alexa" and "alexa-samples" on GitHub) contain policy violations, which lead to 81 policy violations in other skills due to code snippets copy-pasted from these Alexa code examples.

Protecting Intellectual Property of Large Language Model-Based Code Generation APIs via Watermarks
  • Zongjie Li
  • Chaozheng Wang
  • Shuai Wang
  • Cuiyun Gao

The rise of large language model-based code generation (LLCG) has enabled various commercial services and APIs. Training LLCG models is often expensive and time-consuming, and the training data are often large-scale and even inaccessible to the public. As a result, the risk of intellectual property (IP) theft over the LLCG models (e.g., via imitation attacks) has been a serious concern. In this paper, we propose the first watermark (WM) technique to protect LLCG APIs from remote imitation attacks. Our proposed technique is based on replacing tokens in an LLCG output with their "synonyms" available in the programming language. A WM is thus defined as the stealthily tweaked distribution among token synonyms in LLCG outputs. We design six WM schemes (instantiated into over 30 WM passes) which rely on conceptually distinct token synonyms available in programming languages. Moreover, to check the IP of a suspicious model (decide if it is stolen from our protected LLCG API), we propose a statistical tests-based procedure that can directly check a remote, suspicious LLCG API.

We evaluate our WM technique on LLCG models fine-tuned from two popular large language models, CodeT5 and CodeBERT. The evaluation shows that our approach is effective in both WM injection and IP check. The inserted WMs do not undermine the usage of normal users (i.e., high fidelity) and incur negligible extra cost. Moreover, our injected WMs exhibit high stealthiness and robustness against powerful attackers; even if they know all WM schemes, they can hardly remove WMs without largely undermining the accuracy of their stolen models.

Simplifying Mixed Boolean-Arithmetic Obfuscation by Program Synthesis and Term Rewriting
  • Jaehyung Lee
  • Woosuk Lee

Mixed Boolean Arithmetic (MBA) obfuscation transforms a program expression into an equivalent but complex expression that is hard to understand. MBA obfuscation has been popular to protect programs from reverse engineering thanks to its simplicity and effectiveness. However, it is also used for evading malware detection, necessitating the development of effective MBA deobfuscation techniques. Existing deobfuscation methods suffer from at least one of four limitations: (1) lack of general applicability, (2) lack of flexibility, (3) lack of scalability, and (4) lack of correctness. In this paper, we propose a versatile MBA deobfuscation method that synergistically combines program synthesis, term rewriting, and an algebraic simplification method. The key novelty of our approach is that we perform on-the-fly learning of transformation rules for deobfuscation, and apply them to rewrite the input MBA expression. We implement our method in a tool called ProMBA and evaluate it on over 4000 MBA expressions obfuscated by the state-of-the-art obfuscation tools. Experimental results show that our method outperforms the state-of-the-art MBA deobfuscation tools by a large margin, successfully simplifying a vast majority of the obfuscated expressions into their original forms.
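To make the notion of an MBA expression concrete, the following minimal Python sketch checks one well-known MBA identity, x + y = (x ^ y) + 2 * (x & y), on 64-bit words. It is only an illustration of what an obfuscated-but-equivalent expression looks like; it is not ProMBA's algorithm, and the function names and the 64-bit word size are assumptions made for this example.

```python
import random

MASK = (1 << 64) - 1  # assumption: model 64-bit machine words


def plain(x: int, y: int) -> int:
    """The original, readable expression: x + y."""
    return (x + y) & MASK


def obfuscated(x: int, y: int) -> int:
    """An MBA rewrite of x + y that mixes XOR, AND, and arithmetic."""
    return ((x ^ y) + 2 * (x & y)) & MASK


def agree_on_random_inputs(trials: int = 1000, seed: int = 0) -> bool:
    """Compare both expressions on random 64-bit inputs (hypothetical helper)."""
    rng = random.Random(seed)
    for _ in range(trials):
        x, y = rng.getrandbits(64), rng.getrandbits(64)
        if plain(x, y) != obfuscated(x, y):
            return False
    return True
```

Random testing of this kind only suggests equivalence; a deobfuscator has to go the other way and recover the simple form from the complex one.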

Enhancing OSS Patch Backporting with Semantics
  • Su Yang
  • Yang Xiao
  • Zhengzi Xu
  • Chengyi Sun
  • Chen Ji
  • Yuqing Zhang

Keeping open-source software (OSS) up to date is one potential solution to prevent known vulnerabilities. However, it requires frequent and costly testing and may introduce compatibility issues. Consequently, developers often choose to backport security patches to the vulnerable versions instead. Manual backporting is time-consuming, especially for large OSS such as the Linux kernel. Therefore, automating this process is urgently needed to save considerable time. Existing automated approaches for backporting patches involve either automatic patch generation or automatic patch migration. However, these methods are often ineffective and error-prone since they operate only on the syntactic level and fail to locate the precise patch locations or generate the correct patch.

In this paper, we propose a patch type-sensitive approach to automatically backport OSS security patches, guided by the patch type and patch semantics. Specifically, our approach identifies patch locations with the aid of program dependency graph-based matching at the semantic level. It further applies fine-grained patch migration and fine-tuning based on patch types. We have implemented our approach in a tool named TSBPORT and evaluated it on a large-scale dataset consisting of 1,815 pairs of real-world security patches for the Linux kernel. The evaluation results show that TSBPORT successfully backported 1,589 (87.59%) patches, out of which 587 (32.34%) could not be backported by any state-of-the-art approaches, significantly outperforming state-of-the-art approaches. In addition, experiments also show that TSBPORT can be generalized to backport patches in other OSS projects with a success rate of 88.18%.

SESSION: Session 41: Measuring Security Deployments

Evaluating the Security Posture of Real-World FIDO2 Deployments
  • Dhruv Kuchhal
  • Muhammad Saad
  • Adam Oest
  • Frank Li

FIDO2 is a suite of protocols that combines the usability of local authentication (e.g., biometrics) with the security of public-key cryptography to deliver passwordless authentication. It eliminates shared authentication secrets (i.e., passwords, which could be leaked or phished) and provides strong security guarantees assuming the benign behavior of the client-side protocol components.

However, when this assumption does not hold true, such as in the presence of malware, client authentications pose a risk that FIDO2 deployments must account for. FIDO2 provides recommendations for deployments to mitigate such situations. Yet, to date, there has been limited empirical investigation into whether deployments adopt these mitigations and what risks compromised clients present to real-world FIDO2 deployments, such as unauthorized account access or registration.

In this work, we aim to fill in the gap by: 1) systematizing the threats to FIDO2 deployments when assumptions about the client-side protocol components do not hold, 2) empirically evaluating the security posture of real-world FIDO2 deployments across the Tranco Top 1K websites, considering both the server-side and client-side perspectives, and 3) synthesizing the mitigations that the ecosystem can adopt to further strengthen the practical security provided by FIDO2. Through our investigation, we identify that compromised clients pose a practical threat to FIDO2 deployments due to weak configurations, and known mitigations exhibit critical shortcomings and/or minimal adoption. Based on our findings, we propose directions for the ecosystem to develop additional defenses into their FIDO2 deployments. Ultimately, our work aims to drive improvements to FIDO2's practical security.

Are we there yet? An Industrial Viewpoint on Provenance-based Endpoint Detection and Response Tools
  • Feng Dong
  • Shaofei Li
  • Peng Jiang
  • Ding Li
  • Haoyu Wang
  • Liangyi Huang
  • Xusheng Xiao
  • Jiedong Chen
  • Xiapu Luo
  • Yao Guo
  • Xiangqun Chen

Provenance-Based Endpoint Detection and Response (P-EDR) systems are deemed crucial for future Advanced Persistent Threats (APT) defenses. Despite the fact that numerous new techniques to improve P-EDR systems have been proposed in academia, it is still unclear whether the industry will adopt P-EDR systems and what improvements the industry desires for P-EDR systems. To this end, we conduct the first set of systematic studies on the effectiveness and the limitations of P-EDR systems. Our study consists of four components: a one-to-one interview, an online questionnaire study, a survey of the relevant literature, and a systematic measurement study. Our research indicates that all industry experts consider P-EDR systems to be more effective than conventional Endpoint Detection and Response (EDR) systems. However, industry experts are concerned about the operating cost of P-EDR systems. In addition, our research reveals three significant gaps between academia and industry: (1) overlooking client-side overhead; (2) imbalanced alarm triage cost and interpretation cost; and (3) excessive server-side memory consumption. This paper's findings provide objective data on the effectiveness of P-EDR systems and how much improvement is needed to adopt P-EDR systems in industry.

Don't Leak Your Keys: Understanding, Measuring, and Exploiting the AppSecret Leaks in Mini-Programs
  • Yue Zhang
  • Yuqing Yang
  • Zhiqiang Lin

Mobile mini-programs in WeChat have gained significant popularity since their debut in 2017, reaching a scale similar to that of Android apps in the Play Store. Like Google, Tencent, the provider of WeChat, offers APIs to support the development of mini-programs and also maintains a mini-program market within the WeChat app. However, mini-program APIs often manage sensitive user data within the social network platform, both on the WeChat client app and in the cloud. As a result, cryptographic protocols have been implemented to secure data access. In this paper, we demonstrate that WeChat should have required that the "appsecret" master key, which is used to authenticate a mini-program, be used only in the mini-program back-end. If this key is leaked in the front-end of the mini-programs, it can lead to catastrophic attacks on both mini-program developers and users. Using a mini-program crawler and a master key leakage inspector, we measured 3,450,586 crawled mini-programs and found that 40,880 of them had leaked their master keys, allowing attackers to carry out various attacks such as account hijacking, promotion abuse, and service theft. Similar issues were also confirmed by testing and measuring Baidu mini-programs. We have reported these vulnerabilities and the list of vulnerable mini-programs to Tencent and Baidu, which awarded us bug bounties, and Tencent recently released a new API to defend against these attacks based on our findings.

The Effectiveness of Security Interventions on GitHub
  • Felix Fischer
  • Jonas Höbenreich
  • Jens Grossklags

In 2017, GitHub was the first online open source platform to show security alerts to its users. It has since introduced further security interventions to help developers improve the security of their open source software. In this study, we investigate and compare the effects of these interventions. This offers a valuable empirical perspective on security interventions in the context of software development, enriching the predominantly qualitative and survey-based literature landscape with substantial data-driven insights. We conduct a time series analysis on security-altering commits covering the entire history of a large-scale sample of over 50,000 GitHub repositories to infer the causal effects of the security alert, security update, and code scanning interventions. Our analysis shows that while all of GitHub's security interventions have a significant positive effect on security, they differ greatly in their effect size. By comparing the design of each intervention, we identify the building blocks that worked well and those that did not. We also provide recommendations on how practitioners can improve the design of their interventions to enhance their effectiveness.

SESSION: Session 42: Attacking the Web

CoCo: Efficient Browser Extension Vulnerability Detection via Coverage-guided, Concurrent Abstract Interpretation
  • Jianjia Yu
  • Song Li
  • Junmin Zhu
  • Yinzhi Cao

Extensions complement web browsers with additional functionalities and also bring new vulnerability avenues, allowing privilege escalations from adversarial web pages to use extension APIs. Prior works on extension vulnerability detection adopt classic static analysis, which is unable to handle dynamic JavaScript features such as function calls made through array lookups. At the same time, prior abstract interpretation focuses on lightweight server-side JavaScript, which often cannot scale to client-side extension code due to object explosions in the abstract domain.

In this paper, we design, implement and evaluate a novel, coverage-driven, concurrent abstract interpretation framework, called CoCo, to efficiently detect vulnerabilities in browser extensions. On one hand, CoCo parallelizes abstract interpretation with concurrent taint propagation for each branching statement, message passing and content/background scripts to detect vulnerabilities with improved scalability. On the other hand, CoCo prioritizes analysis that increases code coverage, thus further detecting more vulnerabilities. Our evaluation shows that CoCo detects at least 43 zero-day, exploitable, manually-verified extension vulnerabilities that cannot be detected by state-of-the-art works. We responsibly disclosed all the zero-day vulnerabilities to extension developers.

Finding All Cross-Site Needles in the DOM Stack: A Comprehensive Methodology for the Automatic XS-Leak Detection in Web Browsers
  • Dominik Trevor Noß
  • Lukas Knittel
  • Christian Mainka
  • Marcus Niemietz
  • Jörg Schwenk

Cross-Site Leaks (XS-Leaks) are a class of vulnerabilities that allow a web attacker to infer user state from a target web application cross-origin. Fixing XS-Leaks is a cat-and-mouse game: once a published vulnerability is fixed, a variant is discovered. To end this game, we propose a methodology to find all leak techniques for a given state-dependent resource and a set of inclusion methods. We translate a website's DOM at runtime into a directed graph. We execute this translation twice, once for each state. The outputs are two slightly different graphs. We then get the set of all leak techniques by computing these two graphs' differences. The remaining nodes and edges differ between the two states, and the corresponding DOM properties and objects can be observed cross-origin.
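The graph-difference step described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not AutoLeak's implementation: the DOM snapshots are modeled as nested dictionaries, and the helper names `to_edges` and `graph_difference` as well as the sample property names are hypothetical.

```python
def to_edges(node, path="root"):
    """Flatten a nested dict/list snapshot into a set of (source, target, value) edges."""
    if isinstance(node, dict):
        items = list(node.items())
    elif isinstance(node, list):
        items = list(enumerate(node))
    else:
        # Leaf: record the observed primitive value at this path.
        return {(path, "value", repr(node))}
    edges = set()
    for key, child in items:
        child_path = f"{path}.{key}"
        edges.add((path, child_path, ""))  # structural edge parent -> child
        edges |= to_edges(child, child_path)
    return edges


def graph_difference(state_a, state_b):
    """Edges present in exactly one of the two state graphs."""
    return to_edges(state_a) ^ to_edges(state_b)


# Two hypothetical snapshots of the same resource in different user states.
logged_in = {"frames": {"length": 1}, "history": {"length": 2}}
logged_out = {"frames": {"length": 0}, "history": {"length": 2}}
diff = graph_difference(logged_in, logged_out)
```

Here `diff` contains only the two edges under `root.frames.length`, which is the property an attacker could use to distinguish the two states in this toy model.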

We implemented AutoLeak, our open-source solution for automatically detecting known and yet unknown XS-Leaks in web browsers and websites. For our systematic study, we focus on XS-Leak test cases for web browsers with detectable differences induced by HTTP headers. We created and evaluated a total of 151,776 test cases in Chrome, Firefox, and Safari. AutoLeak executed them automatically without human interaction and identified up to 8,403 leak techniques per test case. In addition, AutoLeak's systematic evaluation uncovers 5 novel classes of XS-Leaks based on leak techniques that allow detecting novel HTTP headers cross-origin. We show the applicability of our methodology on 24 websites in the Tranco Top 50 and uncovered XS-Leaks in 20 of them.

Uncovering and Exploiting Hidden APIs in Mobile Super Apps
  • Chao Wang
  • Yue Zhang
  • Zhiqiang Lin

Mobile applications, particularly those from social media platforms such as WeChat and TikTok, are evolving into "super apps" that offer a wide range of services such as instant messaging and media sharing, e-commerce, e-learning, and e-government. These super apps often provide APIs for developers to create "miniapps" that run within the super app. These APIs should have been thoroughly scrutinized for security. Unfortunately, we find that many of them are undocumented and unsecured, potentially allowing miniapps to bypass restrictions and gain higher privileged access. To systematically identify these hidden APIs before they are exploited by attackers, we have developed a tool, APIScope, that combines static analysis and dynamic analysis: static analysis is used to recognize hidden undocumented APIs, and dynamic analysis is used to confirm whether the identified APIs can be invoked by unprivileged third-party miniapps. We have applied APIScope to five popular super apps (i.e., WeChat, WeCom, Baidu, QQ, and TikTok) and found that all of them contain hidden APIs, many of which can be exploited due to missing security checks. We have also quantified the hidden APIs that may have security implications by verifying if they have access to resources protected by Android permissions. Furthermore, we demonstrate the potential security hazards by presenting various attack scenarios, including unauthorized access to any web pages, downloading and installing malicious software, and stealing sensitive information. We have reported our findings to the relevant vendors, some of whom have patched the vulnerabilities and rewarded us with bug bounties.

A Good Fishman Knows All the Angles: A Critical Evaluation of Google's Phishing Page Classifier
  • Changqing Miao
  • Jianan Feng
  • Wei You
  • Wenchang Shi
  • Jianjun Huang
  • Bin Liang

Phishing is one of the most popular cyberspace attacks. Phishing detection has been integrated into mainstream browsers to provide online protection. The phishing detector of Google Chrome reports millions of phishing attacks per week. However, it has been proven to be vulnerable to evasion attacks. Recently, Google upgraded Chrome/Chromium's phishing detector, introducing a CNN-based image classifier. The robustness of the new-generation detector is unclear. If it can be bypassed, its billions of users will be exposed to sophisticated attackers. This paper presents a critical evaluation of Google's phishing detector by targeted evasion testing, and investigates corresponding defensive techniques. First, we propose a three-stage evasion method against the phishing image classifier. The experiments show that it can be completely bypassed with adversarial phishing pages generated using the proposed method. Meanwhile, the phishing pages still preserve their visual utility. Second, we introduce two defense techniques to enhance the phishing detection model. The results show that even lightweight defense methods can significantly improve the model robustness. Our research reveals that Google's new-generation phishing classifier is very vulnerable to targeted evasion attacks. A sophisticated phisher can learn how to fool the classifier. Billions of Chrome users are being exposed to potential phishing attacks. To improve its robustness, necessary security enhancements should be introduced.

SESSION: Session 43: Multiparty Computation II

Improved Distributed RSA Key Generation Using the Miller-Rabin Test
  • Jakob Burkhardt
  • Ivan Damgård
  • Tore Kasper Frederiksen
  • Satrajit Ghosh
  • Claudio Orlandi

Secure distributed generation of RSA moduli (e.g., generating N=pq where none of the parties learns anything about p or q) is an important cryptographic task that is needed both in threshold implementations of RSA-based cryptosystems and in other, advanced cryptographic protocols that assume that all the parties have access to a trusted RSA modulus. In this paper, we provide a novel protocol for secure distributed RSA key generation based on the Miller-Rabin test. Compared with the more commonly used Boneh-Franklin test (which requires many iterations), the Miller-Rabin test has the advantage of providing negligible error after even a single iteration of the test for large enough moduli (e.g., 4096 bits).

From a technical point of view, our main contribution is a novel divisibility test which allows the primality test to be performed efficiently while keeping p and q secret.

Our semi-honest RSA generation protocol uses any underlying secure multiplication protocol in a black-box way, and our protocol can therefore be instantiated in either the honest or dishonest majority setting based on the chosen multiplication protocol. Our semi-honest protocol can be upgraded to protect against active adversaries at low cost using existing compilers. Finally, we provide an experimental evaluation showing that for the honest majority case, our protocol is much faster than Boneh-Franklin.

Towards Generic MPC Compilers via Variable Instruction Set Architectures (VISAs)
  • Yibin Yang
  • Stanislav Peceny
  • David Heath
  • Vladimir Kolesnikov

In MPC, we usually represent programs as circuits. This is a poor fit for programs that use complex control flow, as it is costly to compile control flow to circuits. This motivated prior work to emulate CPUs inside MPC. Emulated CPUs can run complex programs, but they introduce high overhead due to the need to evaluate not just the program, but also the machinery of the CPU, including fetching, decoding, and executing instructions, accessing RAM, etc.

Thus, both circuits and CPU emulation seem a poor fit for general MPC. The former cannot scale to arbitrary programs; the latter incurs high per-operation overhead.

We propose variable instruction set architectures (VISAs), an approach that inherits the best features of both circuits and CPU emulation. Unlike a CPU, a VISA machine repeatedly executes entire program fragments, not individual instructions. By considering larger building blocks, we avoid most of the machinery associated with CPU emulation: we directly handle each fragment as a circuit.

We instantiated a VISA machine via garbled circuits (GC), yielding constant-round 2PC for arbitrary assembly programs. We use improved branching (Stacked Garbling, Heath and Kolesnikov, Crypto 2020) and recent Garbled RAM (GRAM) (Heath et al., Eurocrypt 2022). Composing these securely and efficiently is intricate, and is one of our main contributions.

We implemented our approach and ran it on common programs, including Dijkstra's and Knuth-Morris-Pratt. Our 2PC VISA machine executes assembly instructions at 300Hz to 4000Hz, depending on the target program. We significantly outperform the state-of-the-art CPU-based approach (Wang et al., ESORICS 2016, whose tool we re-benchmarked on our setup). We run in constant rounds, use 6x less bandwidth, and run more than 40x faster on a low-latency network. With 50ms (resp. 100ms) latency, we are 898x (resp. 1585x) faster on the same setup.

While our focus is MPC, the VISA model also benefits CPU-emulation-based Zero-Knowledge proof compilers, such as ZEE and EZEE (Heath et al., Oakland'21 and Yang et al., EuroS&P'22).

COMBINE: COMpilation and Backend-INdependent vEctorization for Multi-Party Computation
  • Benjamin Levy
  • Muhammad Ishaq
  • Benjamin Sherman
  • Lindsey Kennard
  • Ana Milanova
  • Vassilis Zikas

Recent years have witnessed significant advances in programming technology for multi-party computation (MPC), bringing MPC closer to practice and wider applicability. Typical MPC programming frameworks focus on either front-end language design (e.g., Wysteria, Viaduct, SPDZ), or back-end protocol design and implementation (e.g., ABY, MOTION, MP-SPDZ).

We propose a methodology for an MPC compilation toolchain, which by mimicking the compilation methodology of classical compilers enables middle-end (i.e., machine-independent) optimizations, yielding significant improvements. We advance an intermediate language, which we call MPC-IR that can be viewed as the analogue of (enriched) Static Single Assignment (SSA) form. MPC-IR enables backend-independent optimizations in a close analogy to machine-independent optimizations in classical compilers. To demonstrate our approach, we focus on a specific backend-independent optimization, SIMD-vectorization: We devise a novel classical-compiler-inspired automatic SIMD-vectorization on MPC-IR. To demonstrate backend independence and quality of our optimization, we evaluate our approach with two mainstream backend frameworks that support multiple types of MPC protocols, namely MOTION and MP-SPDZ, and show significant improvements across the board.

Let's Go Eevee! A Friendly and Suitable Family of AEAD Modes for IoT-to-Cloud Secure Computation
  • Amit Singh Bhati
  • Erik Pohle
  • Aysajan Abidin
  • Elena Andreeva
  • Bart Preneel

IoT devices collect privacy-sensitive data, e.g., in smart grids or in medical devices, and send this data to cloud servers for further processing. In order to ensure confidentiality as well as authenticity of the sensor data in the untrusted cloud environment, we consider a transciphering scenario between embedded IoT devices and multiple cloud servers that perform secure multi-party computation (MPC). Concretely, the IoT devices encrypt their data with a lightweight symmetric cipher and send the ciphertext to the cloud servers. To obtain the secret shares of the cleartext message for further processing, the cloud servers engage in an MPC protocol to decrypt the ciphertext in a distributed manner. This way, the plaintext is never exposed to the individual servers.

As an important building block in this scenario, we propose a new, provably secure family of lightweight modes for authenticated encryption with associated data (AEAD), called Eevee. The Eevee family has fully parallel decryption, making it suitable for MPC protocols for which the round complexity depends on the complexity of the function they compute. Further, our modes use the lightweight forkcipher primitive that offers fixed-length output expansion and a compact yet parallelizable internal structure.

All Eevee members improve substantially over the few available state-of-the-art (SotA) MPC-friendly modes and other standard solutions. We benchmark the Eevee family on a microcontroller and in MPC. Our proposed mode Jolteon (when instantiated with ForkSkinny) provides 1.85x to 3.64x speedup in IoT-encryption time and 3x to 4.5x speedup in both MPC-decryption time and data for very short queries of 8 bytes, and 1.55x to 3.04x and 1.23x to 2.43x speedup, respectively, in MPC-decryption time and data for queries up to 500 bytes when compared against SotA MPC-friendly modes instantiated with SKINNY. We also provide two advanced modes, Umbreon and Espeon, that show a favorable performance-security trade-off with stronger security guarantees such as nonce-misuse security. Additionally, all Eevee members have full n-bit security (where n is the block size of the underlying primitive), use a single primitive and require smaller state and HW area when compared with the SotA modes under their original security settings.

SESSION: Session 44: Machine Learning, Cryptography, & Cyber-Physical Systems

On the Security of KZG Commitment for VSS
  • Atsuki Momose
  • Sourav Das
  • Ling Ren

The constant-sized polynomial commitment scheme by Kate, Zaverucha, and Goldberg (Asiacrypt 2010), also known as the KZG commitment, is an essential component in designing bandwidth-efficient verifiable secret-sharing (VSS) protocols. We point out, however, that the KZG commitment is missing two important properties that are crucial for VSS protocols.

First, the KZG commitment has not been proven to be degree binding in the standard adversary model without idealized group assumptions. In other words, the committed polynomial is not guaranteed to have the claimed degree, which is supposed to be the reconstruction threshold of VSS. Without this property, shareholders in VSS may end up reconstructing different secrets depending on which shares are used.

Second, the KZG commitment does not support polynomials with different degrees at once with a single setup. If the reconstruction threshold of the underlying VSS protocol changes, the protocol must redo the setup, which involves an expensive multi-party computation known as the powers of tau setup.

In this work, we augment the KZG commitment to address both of these limitations. Our scheme is degree-binding in the standard model under the strong Diffie-Hellman (SDH) assumption. It supports any degree 0 < d ≤ m under a powers-of-tau common reference string with m+1 group elements generated by a one-time setup.
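The consequence of a missing degree-binding guarantee can be seen in a small Python sketch of Shamir-style reconstruction. It involves no commitments or pairings and is not the scheme proposed above; the field modulus, the coefficients, and the helper names are assumptions chosen for illustration. When the dealer's polynomial has a higher degree than the claimed threshold allows, different share subsets interpolate to different secrets.

```python
P = 2**61 - 1  # assumption: a prime modulus chosen only for this example


def evaluate(coeffs, x):
    """Evaluate a polynomial (lowest-degree coefficient first) at x mod P."""
    result = 0
    for c in reversed(coeffs):
        result = (result * x + c) % P
    return result


def reconstruct(shares):
    """Lagrange-interpolate the shares [(x, y), ...] at x = 0 mod P."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, -1, P)) % P
    return secret


# Claimed degree 1, so any two shares should reconstruct the secret.
honest = [42, 7]       # f(x) = 42 + 7x
cheating = [42, 7, 1]  # g(x) = 42 + 7x + x^2, degree too high

honest_shares = [(x, evaluate(honest, x)) for x in (1, 2, 3)]
bad_shares = [(x, evaluate(cheating, x)) for x in (1, 2, 3)]

honest_a, honest_b = reconstruct(honest_shares[:2]), reconstruct(honest_shares[1:])
bad_a, bad_b = reconstruct(bad_shares[:2]), reconstruct(bad_shares[1:])
```

With the honest polynomial both subsets yield 42, while the over-degree polynomial yields 40 from shares 1 and 2 and 36 from shares 2 and 3, which is exactly the inconsistency degree binding is meant to rule out.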

Targeted Attack Synthesis for Smart Grid Vulnerability Analysis
  • Suman Maiti
  • Anjana Balabhaskara
  • Sunandan Adhikary
  • Ipsita Koley
  • Soumyajit Dey

Modern smart grids utilize advanced sensors and digital communication to manage the flow of electricity from generation source to consumption points. They also employ anomaly detection units and phasor measurement units (PMUs) for security and monitoring of grid behavior. However, as smart grids are distributed, vulnerability analysis is necessary to identify and mitigate potential security threats targeting the sensors and communication links. We propose a novel algorithm that uses measurement parameters, such as power flow or load flow, to identify the smart grid's most vulnerable operating intervals. Our methodology incorporates a Monte Carlo simulation approach to identify these intervals and deploys a deep reinforcement learning agent to generate attack vectors during the identified intervals that can compromise the grid's safety and stability in the minimum possible time, while remaining undetected by local anomaly detection units and PMUs. Our approach provides a structured methodology for effective smart grid vulnerability analysis, enabling system operators to analyze the impact of attack parameters on grid safety and stability and facilitating suitable design changes in grid topology and operational parameters.

Secure and Timely GPU Execution in Cyber-physical Systems
  • Jinwen Wang
  • Yujie Wang
  • Ning Zhang

Graphics Processing Units (GPUs) are increasingly deployed on Cyber-physical Systems (CPSs), frequently used to perform real-time safety-critical functions, such as object detection on autonomous vehicles. As a result, availability is important for GPU tasks in CPS platforms. However, existing Trusted Execution Environment (TEE) solutions with availability guarantees focus only on CPU computing.

To bridge this gap, we propose AvaGPU, a TEE that guarantees real-time availability for CPU tasks involving GPU execution under a compromised OS. There are three technical challenges. First, to prevent malicious resource contention due to separate scheduling of CPU and GPU tasks, we propose a CPU-GPU co-scheduling framework that couples the priority of CPU and GPU tasks. Second, we propose software-based secure preemption on GPU tasks to bound the degree of priority inversion on GPU. Third, we propose a new split design of GPU driver with minimized Trusted Computing Base (TCB) to achieve secure and efficient GPU management for CPS. We implement a prototype of AvaGPU on the Jetson AGX Orin platform. The system is evaluated on benchmarks, synthetic tasks, and real-world applications with 15.87% runtime overhead on average.

SalsaPicante: A Machine Learning Attack on LWE with Binary Secrets
  • Cathy Yuanchen Li
  • Jana Sotáková
  • Emily Wenger
  • Mohamed Malhou
  • Evrard Garcelon
  • François Charton
  • Kristin Lauter

Learning with Errors (LWE) is a hard math problem underpinning many proposed post-quantum cryptographic (PQC) systems. The only PQC Key Exchange Mechanism (KEM) standardized by NIST [13] is based on module LWE [2], and current publicly available PQ Homomorphic Encryption (HE) libraries are based on ring LWE. The security of LWE-based PQ cryptosystems is critical, but certain implementation choices could weaken them. One such choice is sparse binary secrets, desirable for PQ HE schemes for efficiency reasons. Prior work SALSA [51] demonstrated a machine learning-based attack on LWE with sparse binary secrets in small dimensions (n ≤ 128) and low Hamming weights (h ≤ 4). However, this attack assumes access to millions of eavesdropped LWE samples and fails at higher Hamming weights or dimensions.

We present PICANTE, an enhanced machine learning attack on LWE with sparse binary secrets, which recovers secrets in much larger dimensions (up to n=350) and with larger Hamming weights (roughly n/10, and up to h=60 for n=350). We achieve this dramatic improvement via a novel preprocessing step, which allows us to generate training data from a linear number of eavesdropped LWE samples (4n) and changes the distribution of the data to improve transformer training. We also improve the secret recovery methods of SALSA and introduce a novel cross-attention recovery mechanism allowing us to read off the secret directly from the trained models. While PICANTE does not threaten NIST's proposed LWE standards, it demonstrates significant improvement over SALSA and could scale further, highlighting the need for future investigation into machine learning attacks on LWE with sparse binary secrets.
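
To make the setting concrete, the following minimal sketch generates LWE samples with a sparse binary secret, i.e., pairs (a, b) with b = <a, s> + e mod q where s has exactly h ones. The function names, the modulus q = 251, the error width, and the rounded-Gaussian error are illustrative assumptions, not the parameters or code used by SALSA or PICANTE; only the sample count 4n follows the abstract.

```python
import random

def lwe_samples(n, q, h, m, sigma=3.0, seed=0):
    """Return (secret, samples); each sample is (a, b) with b = <a, s> + e mod q.

    All parameter choices here are toy assumptions for illustration.
    """
    rng = random.Random(seed)
    # Sparse binary secret: exactly h coordinates are 1, the rest are 0.
    secret = [0] * n
    for i in rng.sample(range(n), h):
        secret[i] = 1
    samples = []
    for _ in range(m):
        a = [rng.randrange(q) for _ in range(n)]
        e = round(rng.gauss(0.0, sigma))  # small error term
        b = (sum(ai * si for ai, si in zip(a, secret)) + e) % q
        samples.append((a, b))
    return secret, samples

def centered(x, q):
    """Map x mod q to its representative closest to zero."""
    x %= q
    return x - q if x > q // 2 else x

# Toy instance: dimension n, Hamming weight h, and 4n samples.
n, q, h = 16, 251, 3
secret, samples = lwe_samples(n, q, h, m=4 * n)
```

An attacker sees only the samples; recovering `secret` from them is the problem the attack targets.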

SESSION: Session 45: Privacy in Machine Learning

DPMLBench: Holistic Evaluation of Differentially Private Machine Learning
  • Chengkun Wei
  • Minghu Zhao
  • Zhikun Zhang
  • Min Chen
  • Wenlong Meng
  • Bo Liu
  • Yuan Fan
  • Wenzhi Chen

Differential privacy (DP), as a rigorous mathematical definition quantifying privacy leakage, has become a well-accepted standard for privacy protection. Combined with powerful machine learning (ML) techniques, differentially private machine learning (DPML) is increasingly important. As the most classic DPML algorithm, DP-SGD incurs a significant loss of utility, which hinders DPML's deployment in practice. Many studies have recently proposed improved algorithms based on DP-SGD to mitigate utility loss. However, these studies are isolated and cannot comprehensively measure the performance of improvements proposed in algorithms. More importantly, there is a lack of comprehensive research to compare improvements in these DPML algorithms across utility, defensive capabilities, and generalizability.

We fill this gap by performing a holistic measurement of improved DPML algorithms on utility and defense capability against membership inference attacks (MIAs) on image classification tasks. We first present a taxonomy of where improvements are located in the ML life cycle. Based on our taxonomy, we jointly perform an extensive measurement study of the improved DPML algorithms, over twelve algorithms, four model architectures, four datasets, two attacks, and various privacy budget configurations. We also cover state-of-the-art label differential privacy (Label DP) algorithms in the evaluation. According to our empirical results, DP can effectively defend against MIAs, and sensitivity-bounding techniques such as per-sample gradient clipping play an important role in defense. We also explore some improvements that can maintain model utility and defend against MIAs more effectively. Experiments show that Label DP algorithms achieve less utility loss but are fragile to MIAs. ML practitioners may benefit from these evaluations to select appropriate algorithms. To support our evaluation, we implement modular, reusable software, DPMLBench. We open-source the tool at https://github.com/DmsKinson/DPMLBench, which enables sensitive data owners to deploy DPML algorithms and serves as a benchmark tool for researchers and practitioners.
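
As a rough illustration of the per-sample gradient clipping mentioned above, the sketch below shows one DP-SGD-style gradient aggregation step: each per-sample gradient is clipped to a fixed L2 norm, the clipped gradients are summed, Gaussian noise is added, and the result is averaged. The function name and parameters (`clip_norm`, `noise_multiplier`) are hypothetical and the code is not taken from DPMLBench; privacy accounting is omitted.

```python
import math
import random

def dp_sgd_gradient(per_sample_grads, clip_norm, noise_multiplier, rng):
    """Clip each gradient to L2 norm <= clip_norm, sum, add noise, average."""
    dim = len(per_sample_grads[0])
    total = [0.0] * dim
    for g in per_sample_grads:
        norm = math.sqrt(sum(x * x for x in g))
        # Scale down only gradients whose norm exceeds the clipping bound.
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for j in range(dim):
            total[j] += g[j] * scale
    # The noise standard deviation is proportional to the clipping bound,
    # which is the sensitivity of the clipped sum to any single sample.
    sigma = noise_multiplier * clip_norm
    batch = len(per_sample_grads)
    return [(t + rng.gauss(0.0, sigma)) / batch for t in total]

# Example call with toy gradients.
rng = random.Random(0)
noisy = dp_sgd_gradient([[3.0, 4.0], [0.3, 0.4]], clip_norm=1.0,
                        noise_multiplier=1.1, rng=rng)
```

Clipping bounds how much any one training sample can move the update, which is what lets the added noise translate into a DP guarantee.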

Geometry of Sensitivity: Twice Sampling and Hybrid Clipping in Differential Privacy with Optimal Gaussian Noise and Application to Deep Learning
  • Hanshen Xiao
  • Jun Wan
  • Srinivas Devadas

We study the fundamental problem of the construction of optimal randomization in Differential Privacy (DP). Depending on the clipping strategy or additional properties of the processing function, the corresponding sensitivity set theoretically determines the necessary randomization to produce the required security parameters. Towards the optimal utility-privacy tradeoff, finding the minimal perturbation for properly-selected sensitivity sets stands as a central problem in DP research. In practice, l2/l1-norm clippings with Gaussian/Laplace noise mechanisms are among the most common setups. However, they also suffer from the curse of dimensionality. For more generic clipping strategies, the understanding of the optimal noise for a high-dimensional sensitivity set remains limited. This raises challenges in mitigating the worst-case dimension dependence in privacy-preserving randomization, especially for deep learning applications.

In this paper, we revisit the geometry of high-dimensional sensitivity sets and present a series of results to characterize the non-asymptotically optimal Gaussian noise for Rényi DP (RDP). Our results are both negative and positive: on one hand, we show the curse of dimensionality is tight for a broad class of sensitivity sets satisfying certain symmetry properties; but if, fortunately, the representation of the sensitivity set is asymmetric on some group of orthogonal bases, we show the optimal noise bounds need not be explicitly dependent on either dimension or rank. We also revisit sampling in the high-dimensional scenario, which is the key for both privacy amplification and computation efficiency in large-scale data processing. We propose a novel method, termed twice sampling, which implements both sample-wise and coordinate-wise sampling, to enable Gaussian noise to fit the sensitivity geometry more closely. With closed-form RDP analysis, we prove twice sampling produces asymptotic improvement of the privacy amplification given an additional l∞-norm restriction, especially for small sampling rates. We also provide concrete applications of our results on practical tasks. Through tighter privacy analysis combined with twice sampling, we efficiently train ResNet22 at a low sampling rate on CIFAR10, and achieve 69.7% and 81.6% test accuracy with (ε=2, δ=10^-5) and (ε=8, δ=10^-5) DP guarantees, respectively.

Blink: Link Local Differential Privacy in Graph Neural Networks via Bayesian Estimation
  • Xiaochen Zhu
  • Vincent Y. F. Tan
  • Xiaokui Xiao

Graph neural networks (GNNs) have gained an increasing amount of popularity due to their superior capability in learning node embeddings for various graph inference tasks, but training them can raise privacy concerns. To address this, we propose using link local differential privacy over decentralized nodes, enabling collaboration with an untrusted server to train GNNs without revealing the existence of any link. Our approach spends the privacy budget separately on links and degrees of the graph for the server to better denoise the graph topology using Bayesian estimation, alleviating the negative impact of LDP on the accuracy of the trained GNNs. We bound the mean absolute error of the inferred link probabilities against the ground truth graph topology. We then propose two variants of our LDP mechanism complementing each other in different privacy settings, one of which estimates fewer links under lower privacy budgets to avoid false positive link estimates when the uncertainty is high, while the other utilizes more information and performs better given relatively higher privacy budgets. Furthermore, we propose a hybrid variant that combines both strategies and is able to perform better across different privacy budgets. Extensive experiments show that our approach outperforms existing methods in terms of accuracy under varying privacy budgets.

DP-Forward: Fine-tuning and Inference on Language Models with Differential Privacy in Forward Pass
  • Minxin Du
  • Xiang Yue
  • Sherman S. M. Chow
  • Tianhao Wang
  • Chenyu Huang
  • Huan Sun

Differentially private stochastic gradient descent (DP-SGD) adds noise to gradients in back-propagation, safeguarding training data from privacy leakage, particularly membership inference. It fails to cover (inference-time) threats like embedding inversion and sensitive attribute inference. It is also costly in storage and computation when used to fine-tune large pre-trained language models (LMs).

We propose DP-Forward, which directly perturbs embedding matrices in the forward pass of LMs. It satisfies stringent local DP requirements for training and inference data. To instantiate it using the smallest matrix-valued noise, we devise an analytic matrix Gaussian mechanism (aMGM) by drawing possibly non-i.i.d. noise from a matrix Gaussian distribution. We then investigate perturbing outputs from different hidden (sub-)layers of LMs with aMGM noises. Its utility on three typical tasks almost hits the non-private baseline and outperforms DP-SGD by up to 7.7pp at a moderate privacy level. It saves 3x time and memory costs compared to DP-SGD with the latest high-speed library. It also reduces the average success rates of embedding inversion and sensitive attribute inference by up to 88pp and 41pp, respectively, whereas DP-SGD fails.

SESSION: Session 46: Program Analysis & Instrumentation

Whole-Program Control-Flow Path Attestation
  • Nikita Yadav
  • Vinod Ganapathy

Path attestation is an approach to remotely attest the execution of a program ℘. In path attestation, a prover platform, which executes ℘, convinces a remote verifier V of the integrity of ℘ by recording the path that ℘ takes as it executes a particular input. While a number of prior techniques have been developed for path attestation, they have generally been applied to record paths only for parts of the execution of ℘. In this paper, we consider the problem of whole program control-flow path attestation, i.e., to attest the execution of the entire program path in ℘. We show that prior approaches for path attestation use sub-optimal techniques that fundamentally fail to scale to whole program paths, and impose a large runtime overhead on the execution of ℘. We then develop Blast, an approach that reduces these overheads using a number of novel approaches inspired by prior work from the program profiling literature. Our experiments show that Blast makes path attestation more practical for use on a wide variety of embedded programs.
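
One simple way to record an execution path is a running hash over the identifiers of the basic blocks visited, which the verifier recomputes for the path it expects. The sketch below illustrates only this general idea; the block identifiers, the choice of SHA-256, and the function names are assumptions, and it is not the Blast design.

```python
import hashlib

def extend(measurement: bytes, block_id: int) -> bytes:
    """Fold one basic-block identifier into the running path measurement."""
    return hashlib.sha256(measurement + block_id.to_bytes(8, "big")).digest()

def attest_path(block_ids) -> bytes:
    """Prover side: hash the sequence of executed basic blocks in order."""
    measurement = b"\x00" * 32  # fixed initial value known to both parties
    for block_id in block_ids:
        measurement = extend(measurement, block_id)
    return measurement

def verify_path(reported: bytes, expected_block_ids) -> bool:
    """Verifier side: recompute the measurement for the expected path."""
    return reported == attest_path(expected_block_ids)
```

Because the hash is extended at every control-flow transfer, the per-block cost adds up over a whole program run, which hints at why scaling to whole-program paths is a concern.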

Improving Security Tasks Using Compiler Provenance Information Recovered At the Binary-Level
  • Yufei Du
  • Omar Alrawi
  • Kevin Snow
  • Manos Antonakakis
  • Fabian Monrose

The complex optimizations supported by modern compilers allow for compiler provenance recovery at many levels. For instance, it is possible to identify the compiler family and optimization level used when building a binary, as well as the individual compiler passes applied to functions within the binary. Yet, many downstream applications of compiler provenance remain unexplored. To bridge that gap, we train and evaluate a multi-label compiler provenance model on data collected from over 27,000 programs built using LLVM 14, and apply the model to a number of security-related tasks. Our approach considers 68 distinct compiler passes and achieves an average F-1 score of 84.4%. We first use the model to examine the magnitude of compiler-induced vulnerabilities, identifying 53 information leak bugs in 10 popular projects. We also show that several compiler optimization passes introduce a substantial amount of functional code reuse gadgets that negatively impact security. Beyond vulnerability detection, we evaluate other security applications, including using recovered provenance information to verify the correctness of Rich header data in Windows binaries (e.g., forensic analysis), as well as for binary decomposition tasks (e.g., third party library detection).

SymGX: Detecting Cross-boundary Pointer Vulnerabilities of SGX Applications via Static Symbolic Execution
  • Yuanpeng Wang
  • Ziqi Zhang
  • Ningyu He
  • Zhineng Zhong
  • Shengjian Guo
  • Qinkun Bao
  • Ding Li
  • Yao Guo
  • Xiangqun Chen

Intel Software Guard Extensions (SGX) have shown effectiveness in critical data protection. Recent symbolic execution-based techniques reveal that SGX applications are susceptible to memory corruption vulnerabilities. While existing approaches focus on conventional memory corruption in ECalls of SGX applications, they overlook an important type of SGX-specific vulnerability: cross-boundary pointer vulnerabilities. This vulnerability is critical for SGX applications since they heavily utilize pointers to exchange data between secure enclaves and untrusted environments. Unfortunately, none of the existing symbolic execution approaches can effectively detect cross-boundary pointer vulnerabilities due to the lack of an SGX-specific analysis model that properly handles three unique features of SGX applications: Multi-entry Arbitrary-order Execution, Stateful Execution, and Context-aware Pointers. To address such problems, we propose a new analysis model named Global State Transition Graph with Context Aware Pointers (GSTG-CAP) that simulates properties-preserving execution behaviors for SGX applications and drives symbolic execution for vulnerability detection. Based on GSTG-CAP, we build a novel symbolic execution-based vulnerability detector named SYMGX to detect cross-boundary pointer vulnerabilities. According to our evaluation, SYMGX can find 30 zero-day vulnerabilities in 14 open-source projects, three of which have been confirmed by developers. SYMGX also outperforms two state-of-the-art tools, COIN and TeeRex, in terms of effectiveness, efficiency, and accuracy.

TypeSqueezer: When Static Recovery of Function Signatures for Binary Executables Meets Dynamic Analysis
  • Ziyi Lin
  • Jinku Li
  • Bowen Li
  • Haoyu Ma
  • Debin Gao
  • Jianfeng Ma

Control-Flow Integrity (CFI) is considered a promising solution in thwarting advanced code-reuse attacks. While the problem of backward-edge protection in CFI is nearly closed, effective forward-edge protection is still a major challenge. The keystone of protecting the forward edge is resolving indirect call targets. Although this can be done quite accurately using type-based solutions given the program source code, it is difficult at the binary level. Since the actual type information is unavailable in COTS binaries, type-based indirect call target matching typically resorts to approximate function signatures inferred using the arity and argument width of indirect callsites and calltargets. Doing so with static analysis alone, therefore, forces existing solutions to assume arity/width boundaries that are too permissive to defeat sophisticated attacks.

In this paper, we propose a novel hybrid approach to recover fine-grained function signatures at the binary level, called TypeSqueezer. By observing program behaviors dynamically, TypeSqueezer combines the static analysis results on indirect callsites and calltargets, so that both the lower and the upper bounds of their arity/width can be computed according to a philosophy similar to the squeeze theorem. Moreover, the introduction of dynamic analysis also enables TypeSqueezer to approximate the actual type of function arguments instead of only representing them using their widths. Together, these allow TypeSqueezer to significantly refine indirect call target resolution and generate approximate CFGs with better accuracy. We have evaluated TypeSqueezer on the SPEC CPU2006 benchmarks as well as several real-world applications. The experimental results suggest that TypeSqueezer achieves higher type-matching precision compared to existing binary-level type-based solutions. Moreover, we also discuss the intrinsic limitations of static analysis and show that it is not enough to defeat certain types of practical attacks, whereas the same attacks can be successfully thwarted with the hybrid analysis result of our approach.

SESSION: Session 47: Security Professionals

"Make Them Change it Every Week!": A Qualitative Exploration of Online Developer Advice on Usable and Secure Authentication
  • Jan H. Klemmer
  • Marco Gutfleisch
  • Christian Stransky
  • Yasemin Acar
  • M. Angela Sasse
  • Sascha Fahl

Usable and secure authentication on the web and beyond is mission-critical. While password-based authentication is still widespread, users have trouble dealing with potentially hundreds of online accounts and their passwords. Alternatives or extensions such as multi-factor authentication have their own challenges and find only limited adoption. Finding the right balance between security and usability is challenging for developers. Previous work found that developers use online resources to inform security decisions when writing code. Similar to other areas, lots of authentication advice for developers is available online, including blog posts, discussions on Stack Overflow, research papers, or guidelines by institutions like OWASP or NIST.

We are the first to explore developer advice on authentication that affects usable security for end-users. Based on a survey with 18 professional web developers, we obtained 406 documents and qualitatively analyzed 272 contained pieces of advice in depth. We aim to understand the accessibility and quality of online advice and provide insights into how online advice might contribute to (in)secure and (un)usable authentication. We find that advice is scattered and that finding recommendable, consistent advice is a challenge for developers, among others. Most advice concerns password-based authentication, while little addresses more modern alternatives. Unfortunately, many pieces of advice are debatable (e.g., complex password policies), outdated (e.g., enforcing regular password changes), or contradictory and might lead to unusable or insecure authentication. Based on our findings, we make recommendations for developers, advice providers, official institutions, and academia on how to improve online advice for developers.

Sharing Communities: The Good, the Bad, and the Ugly
  • Thomas Geras
  • Thomas Schreck

There are many mysteries surrounding sharing communities, mainly due to their hidden workings and the complexity of joining. Nevertheless, these communities are critical to the security ecosystem, so a more profound understanding is necessary. In addition, they face challenges such as building trust, communicating effectively, and addressing social problems.

This work aims to understand better the working methods, organizational structures, goals, benefits, and challenges of sharing communities to help improve their effectiveness and efficiency. To achieve this goal, we conducted video interviews with 25 experts from different countries worldwide who participate in various types of sharing communities. In addition, we applied socio-technical systems (STS) theory in our analysis process to elaborate on our findings from the interviews, identify correlations between them, and explore the interrelationships between social and technical elements of sharing communities.

Our findings underscore the need for a holistic view of how sharing communities work. Instead of looking at individual aspects in isolation, considering the interrelationships between the different elements, especially the social, is crucial. This holistic perspective allows us to understand better the complexity and dynamics of sharing communities and how they can function effectively and efficiently. The findings of this study provide valuable impetus for the further development of sharing communities and can serve as a basis for future research.

Alert Alchemy: SOC Workflows and Decisions in the Management of NIDS Rules
  • Mathew Vermeer
  • Natalia Kadenko
  • Michel van Eeten
  • Carlos Gañán
  • Simon Parkin

Signature-based network intrusion detection systems (NIDSs) and network intrusion prevention systems (NIPSs) remain at the heart of network defense, along with the rules that enable them to detect threats. These rules allow Security Operation Centers (SOCs) to properly defend a network, yet we know almost nothing about how rules are created, evaluated and managed from an organizational standpoint. In this work, we analyze the processes surrounding the creation, management, and acquisition of rules for network intrusion detection. To understand these processes, we conducted interviews with 17 professionals who work at Managed Security Service Providers (MSSPs) or other organizations that provide network monitoring as a service or conduct their own network monitoring internally. We discovered numerous critical factors, such as rule specificity and total number of alerts and false positives, that guide SOCs in their rule management processes. These lower-level aspects of network monitoring processes have generally been regarded as immutable by prior work, which has mainly focused on designing systems that handle the resulting alert flows by dynamically reducing the number of noisy alerts SOC analysts need to sift through. Instead, we present several recommendations that address these lower-level aspects to help improve alert quality and allow SOCs to better optimize workflows and use of available resources. These recommendations include increasing the specificity of rules, explicitly defining feedback loops from detection to rule development, and setting up organizational processes to improve the transfer of tacit knowledge.

Do Users Write More Insecure Code with AI Assistants?
  • Neil Perry
  • Megha Srivastava
  • Deepak Kumar
  • Dan Boneh

AI code assistants have emerged as powerful tools that can aid in the software development life-cycle and can improve developer productivity. Unfortunately, such assistants have also been found to produce insecure code in lab environments, raising significant concerns about their usage in practice. In this paper, we conduct a user study to examine how users interact with AI code assistants to solve a variety of security related tasks. Overall, we find that participants who had access to an AI assistant wrote significantly less secure code than those without access to an assistant. Participants with access to an AI assistant were also more likely to believe they wrote secure code, suggesting that such tools may lead users to be overconfident about security flaws in their code. To better inform the design of future AI-based code assistants, we release our user-study apparatus to researchers seeking to build on our work.

SESSION: Session 48: Defending the Web

HODOR: Shrinking Attack Surface on Node.js via System Call Limitation
  • Wenya Wang
  • Xingwei Lin
  • Jingyi Wang
  • Wang Gao
  • Dawu Gu
  • Wei Lv
  • Jiashui Wang

Node.js applications are becoming more and more widely adopted on the server side, partly due to the convenience of building these applications on top of the runtime provided by popular Node.js engines and the large number of third-party packages provided by the Node Package Management (npm) registry. Node.js provides Node.js applications with system interaction capabilities using system calls. However, such convenience comes with a price, i.e., the attack surface of JavaScript arbitrary code execution (ACE) vulnerabilities is expanded to the system call level.

There lies a noticeable gap between existing protection techniques at the JavaScript code level (either by code debloating or read-write-execute permission restriction) and a targeted defense against emerging critical system call level exploitation. To fill the gap, we design and implement HODOR, a lightweight runtime protection system based on enforcing precise system call restrictions when running a Node.js application. HODOR achieves this by addressing several nontrivial technical challenges. First, HODOR needs to construct high-quality call graphs for both the Node.js application (in JavaScript) and its underlying Node.js framework (in JavaScript and C/C++). Specifically, HODOR incorporates several important optimizations at both the JavaScript and C/C++ levels to improve the state-of-the-art tools for building more precise call graphs. Then, HODOR creates the main-thread whitelist and the thread-pool whitelist, each containing the necessary system calls identified from the call graph mappings. Finally, with the whitelists, HODOR implements lightweight system call restriction using the Linux kernel feature Secure Computing Mode (seccomp) to shrink the attack surface. We utilize HODOR to protect 168 real-world Node.js applications compromised by arbitrary code/command execution attacks. HODOR could reduce the attack surface to 19.42% on average with negligible runtime overhead (i.e., <3%).
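
The step from call graphs to a system call whitelist can be pictured as a reachability computation: collect every function reachable from the entry points, then take the union of the system calls those functions may issue. The sketch below shows that idea only; the graph, the function names, and the function-to-syscall mapping are hypothetical inputs, and it does not reproduce HODOR's call graph construction or its seccomp filter installation.

```python
from collections import deque

def reachable(call_graph, entry_points):
    """Breadth-first search over a caller -> callees mapping."""
    seen = set(entry_points)
    queue = deque(entry_points)
    while queue:
        fn = queue.popleft()
        for callee in call_graph.get(fn, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen

def syscall_whitelist(call_graph, syscalls_of, entry_points):
    """Union of system calls issued by any function reachable from the entries."""
    allowed = set()
    for fn in reachable(call_graph, entry_points):
        allowed |= syscalls_of.get(fn, set())
    return allowed

# Hypothetical toy inputs; real graphs span JavaScript and C/C++ code.
call_graph = {
    "app.main": ["fs.readFile"],
    "fs.readFile": ["uv_fs_read"],
    "child_process.exec": ["uv_spawn"],
}
syscalls_of = {
    "uv_fs_read": {"openat", "read", "close"},
    "uv_spawn": {"fork", "execve"},
}
main_thread = syscall_whitelist(call_graph, syscalls_of, ["app.main"])
```

In this toy input `child_process.exec` is never reached from `app.main`, so `fork` and `execve` stay off the list; a more precise call graph therefore yields a tighter whitelist.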

ADEM: An Authentic Digital EMblem
  • Felix Linker
  • David Basin

In times of armed conflict, the emblems of the red cross, red crescent, and red crystal are used to mark physical infrastructure. This enables military units to identify assets as protected under international humanitarian law to avoid attacking them. In this paper, we tackle the novel security problem of how to extend such protection to digital, network-connected infrastructure through a digital emblem. A digital emblem has a unique combination of security requirements, namely, authentication, accountability, and a property that we call covert inspection. Covert inspection states that those wishing to authenticate assets as protected must be able to do so without revealing that they may attack unprotected entities.

In this paper, we (i) define the requirements of a digital emblem, emphasizing security requirements, (ii) present ADEM, a decentralized design that implements a digital emblem analogous to the physical emblems of the red cross, crescent, and crystal, and (iii) provide a comprehensive threat model and an analysis showing that ADEM achieves strong security guarantees against an active network adversary.

In addition to our security analysis, ADEM was also evaluated in a series of domain expert meetings at the invitation of the International Committee of the Red Cross. We report on the feedback we received, which supports our thesis that ADEM is not just theoretically interesting but practically relevant to limit attacks on protected parties in cyberspace.

Is Modeling Access Control Worth It?
  • David Basin
  • Juan Guarnizo
  • Srđan Krstic
  • Hoang Nguyen
  • Martín Ochoa

Implementing access control policies is an error-prone task that can have severe consequences for the security of software applications. Model-driven approaches have been proposed in the literature and associated tools have been developed with the goal of reducing the complexity of this task and helping developers to produce secure software efficiently. Nevertheless, there is a lack of empirical data supporting the advantages of model-driven security approaches over code-centric approaches, which are the de-facto industry standard for software development.

In this work, we compare the result of implementing the same functional and security requirements by multiple developer groups in the context of a security engineering graduate course. We thereby obtain evidence on the security and efficiency of a tool-based model-driven approach to security from the literature compared to a direct implementation in a well-known, modern web-development framework. For example, the projects using model-driven development pass up to 50% more security tests on average with less development effort. Also, we observe that models are twice as concise as manual implementations, which improves system maintainability.

Fine-Grained Data-Centric Content Protection Policy for Web Applications
  • Zilun Wang
  • Wei Meng
  • Michael R. Lyu

The vast amount of sensitive data in modern web applications has become a prime target for cyberattacks. Existing browser security policies disallow the execution of unknown scripts, but do not restrict access to sensitive web content by 'trusted' third-party scripts. Therefore, the over-privileged third-party scripts can compromise the confidentiality and integrity of sensitive user data in the applications.

This paper proposes Content Protection Policy (CPP), a new web security mechanism for providing fine-grained confidentiality and integrity protection for sensitive client-side user data. It enables object-level protection instead of page-level protection by taking a data-centric design approach. A policy specifies the access permission of each script on individual sensitive elements. Any unauthorized access is denied by default to achieve the least privilege in the browser.

We implemented a prototype system - DOMinator - to enforce the content protection policies in the browser, and an extension - policy generator - to help web developers write basic policy rules. We thoroughly evaluated it with popular websites and show that it can effectively protect sensitive web content with a low performance overhead and great usability. CPP complements existing security mechanisms and provides web developers with a more flexible way to protect sensitive data, which can further mitigate the impact of content injection attacks and significantly improve the security of web applications.
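
The default-deny, object-level idea can be illustrated with a small lookup: a policy lists, per sensitive element, which script origins may perform which actions, and anything not listed is refused. The policy layout, element identifiers, origins, and action names below are hypothetical and do not reflect the actual CPP syntax or the DOMinator implementation.

```python
def is_allowed(policy, element_id, script_origin, action):
    """Default deny: permit only actions the policy grants explicitly.

    `policy` maps a sensitive element id to {script origin: set of actions}.
    """
    grants = policy.get(element_id, {})
    return action in grants.get(script_origin, set())

# Hypothetical policy for two sensitive elements on a page.
policy = {
    "#credit-card": {"https://pay.example": {"read"}},
    "#email": {"https://pay.example": {"read", "write"},
               "https://analytics.example": set()},
}
```

Under this sketch a third-party script that the page loads and trusts to run still gets no access to `#credit-card` unless a rule names it, which is the least-privilege behavior the abstract describes.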

SESSION: Session 49: Cryptographic Protocols

On the Security of Rate-limited Privacy Pass
  • Hien Chu
  • Khue Do
  • Lucjan Hanzlik

The privacy pass protocol allows users to redeem anonymously issued cryptographic tokens instead of solving annoying CAPTCHAs. The issuing authority verifies the credibility of the user, who can later use the pass while browsing the web using an anonymous or virtual private network. Hendrickson et al. proposed an IETF draft (privacypass-rate-limit-tokens-00) for a rate-limiting version of the privacy pass protocol, also called rate-limited Privacy Pass (RlP). Introducing a new actor called a mediator makes both versions inherently different. The mediator applies access policies to rate-limit users' access to the service while, at the same time, it should be oblivious to the website/origin the user is trying to access. In this paper, we formally define the rate-limited Privacy Pass protocol and propose a game-based security model to capture the informal security notions introduced by Hendrickson et al. We show a construction from simple building blocks that fulfills our security definitions and even allows for a post-quantum secure instantiation. Interestingly, the instantiation proposed in the IETF draft is a specific case of our construction. Thus, we can reuse the security arguments for the generic construction and show that the version used in practice is secure.

Passive SSH Key Compromise via Lattices
  • Keegan Ryan
  • Kaiwen He
  • George Arnold Sullivan
  • Nadia Heninger

We demonstrate that a passive network attacker can opportunistically obtain private RSA host keys from an SSH server that experiences a naturally arising fault during signature computation. In prior work, this was not believed to be possible for the SSH protocol because the signature included information like the shared Diffie-Hellman secret that would not be available to a passive network observer. We show that for the signature parameters commonly in use for SSH, there is an efficient lattice attack to recover the private key in case of a signature fault. We provide a security analysis of the SSH, IKEv1, and IKEv2 protocols in this scenario, and use our attack to discover hundreds of compromised keys in the wild from several independently vulnerable implementations.
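
For background, the sketch below shows the classic fault attack on RSA signatures computed with the Chinese Remainder Theorem, in the simpler case where the attacker knows the full signed value m: if the signature is correct modulo p but wrong modulo q, then gcd(s^e - m, N) reveals p. This is not the lattice attack of the paper, which handles the case where part of the signed value is unknown to a passive observer. The toy primes, the simulated fault, and all function names are assumptions for illustration.

```python
import math

def crt_sign(m, p, q, d, fault=False):
    """RSA-CRT signing; `fault=True` corrupts the mod-q half (simulated error)."""
    sp = pow(m, d % (p - 1), p)
    sq = pow(m, d % (q - 1), q)
    if fault:
        sq = (sq + 1) % q
    # Garner recombination: result is sp mod p and sq mod q.
    q_inv = pow(q, -1, p)
    h = (q_inv * (sp - sq)) % p
    return sq + h * q

def recover_factor(m, s_faulty, e, n):
    """Known-message fault attack: gcd(s^e - m mod n, n)."""
    return math.gcd((pow(s_faulty, e, n) - m) % n, n)

# Toy key (far too small for real use).
p, q, e = 1009, 1013, 65537
n = p * q
d = pow(e, -1, (p - 1) * (q - 1))
m = 424242
s_bad = crt_sign(m, p, q, d, fault=True)
factor = recover_factor(m, s_bad, e, n)
```

The faulty signature still satisfies s^e = m modulo p but not modulo q, so the difference s^e - m shares exactly the factor p with N.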

Stealth Key Exchange and Confined Access to the Record Protocol Data in TLS 1.3
  • Marc Fischlin

We show how to embed a covert key exchange subprotocol within a regular TLS 1.3 execution, generating a stealth key in addition to the regular session keys. The idea, which has appeared in the literature before, is to use the exchanged nonces to transport another key value. Our contribution is to give a rigorous model and analysis of the security of such embedded key exchanges, requiring that the stealth key remains secure even if the regular key is under adversarial control. Specifically, for our stealth version of the TLS 1.3 protocol, we show that this extra key is secure in this setting under the common assumptions about the TLS protocol.

As an application of stealth key exchange we discuss sanitizable channel protocols, where a designated party can partly access and modify payload data in a channel protocol. This may be, for instance, an intrusion detection system monitoring the incoming traffic for malicious content and putting suspicious parts in quarantine. The noteworthy feature, inherited from the stealth key exchange part, is that the sender and receiver can use the extra key to still communicate securely and covertly within the sanitizable channel, e.g., by pre-encrypting confidential parts and making only dedicated parts available to the sanitizer. We discuss how such sanitizable channels can be implemented with authenticated encryption schemes like GCM or ChaChaPoly. In combination with our stealth key exchange protocol, we thus derive a full-fledged sanitizable connection protocol, including key establishment, which perfectly complies with regular TLS 1.3 traffic on the network level. We also assess the potential effectiveness of the approach for the intrusion detection system Snort.

ELEKTRA: Efficient Lightweight multi-dEvice Key TRAnsparency
  • Julia Len
  • Melissa Chase
  • Esha Ghosh
  • Daniel Jost
  • Balachandar Kesavan
  • Antonio Marcedone

Key Transparency (KT) systems enable service providers of end-to-end encrypted communication (E2EE) platforms to maintain a Verifiable Key Directory (VKD) that maps each user's identifier, such as a username or email address, to their identity public key(s). Users periodically monitor the directory to ensure their own identifier maps to the correct keys, thus detecting any attempt to register a fake key on their behalf to Meddler-in-the-Middle (MitM) their communications.

We introduce and formalize a new primitive called Multi-Device Verifiable Key Directory (MVKD), which strengthens the security, privacy, and usability guarantees of VKD by leveraging the multi-device setting. We formalize three properties for an MVKD (completeness, extraction-based soundness, and privacy), striking a non-trivial balance between strong guarantees and the limitations imposed by a truly practical system. We then present a new MVKD system called ELEKTRA. This system combines the core of the Keybase KT system (running in production since 2014) with ideas from SEEMless (Chase et al., 2019) and RZKS (Chen et al., 2022). Our construction is the first to achieve the above multi-device guarantees while having formal security and privacy proofs. Finally, we implement ELEKTRA and present benchmarks demonstrating its practicality.

SESSION: Session 50: Homomorphic Encryption II

HE3DB: An Efficient and Elastic Encrypted Database Via Arithmetic-And-Logic Fully Homomorphic Encryption
  • Song Bian
  • Zhou Zhang
  • Haowen Pan
  • Ran Mao
  • Zian Zhao
  • Yier Jin
  • Zhenyu Guan

As concerns are increasingly raised about data privacy, encrypted database management systems (DBMS) based on fully homomorphic encryption (FHE) attract increasing research attention, as FHE permits DBMS to be directly outsourced to cloud servers without revealing any plaintext data. However, the real-world deployment of FHE-based DBMS faces two main challenges: i) high computational latency, and ii) lack of elastic query processing capability, both of which stem from the inherent limitations of the underlying FHE operators. Here, we introduce HE3DB, a fully homomorphically encrypted, efficient and elastic DBMS framework based on a new FHE infrastructure. By proposing and integrating new arithmetic and logic homomorphic operators, we devise fast and high-precision homomorphic comparison and aggregation algorithms that enable a variety of SQL queries to be applied over FHE ciphertexts, e.g., compound filter-aggregation, sorting, grouping, and joining. In addition, in contrast to existing encrypted DBMS that only support aggregated information retrieval, our framework permits further server-side elastic analytical processing over the queried FHE ciphertexts, such as private decision tree evaluation. In our experiments, we rigorously study the efficiency and flexibility of HE3DB. We show that HE3DB can homomorphically evaluate end-to-end SQL queries 41X-299X faster than the state-of-the-art solution, completing a TPC-H query over a 16-bit 10K-row database within 241 seconds.

Level Up: Private Non-Interactive Decision Tree Evaluation using Levelled Homomorphic Encryption
  • Rasoul Akhavan Mahdavi
  • Haoyan Ni
  • Dimitry Linkov
  • Florian Kerschbaum

As machine learning as a service continues gaining popularity, concerns about privacy and intellectual property arise. Users often hesitate to disclose their private information to obtain a service, while service providers aim to protect their proprietary models. Decision trees, a widely used machine learning model, are favoured for their simplicity, interpretability, and ease of training. In this context, Private Decision Tree Evaluation (PDTE) enables a server holding a private decision tree to provide predictions based on a client's private attributes. The protocol is such that the server learns nothing about the client's private attributes. Similarly, the client learns nothing about the server's model besides the prediction and some hyperparameters.

In this paper, we propose two novel non-interactive PDTE protocols, XXCMP-PDTE and RCC-PDTE, based on two new non-interactive comparison protocols, XXCMP and RCC. Our evaluation of these comparison operators demonstrates that our proposed constructions can efficiently evaluate high-precision numbers. Specifically, RCC can compare 32-bit numbers in under 10 milliseconds.

We assess our proposed PDTE protocols on decision trees trained over UCI datasets and compare our results with existing work in the field. Moreover, we evaluate synthetic decision trees to showcase scalability, revealing that RCC-PDTE can evaluate a decision tree with over 1000 nodes and 16 bits of precision in under 2 seconds. In contrast, the current state-of-the-art requires over 10 seconds to evaluate such a tree with only 11 bits of precision.

Fast Unbalanced Private Set Union from Fully Homomorphic Encryption
  • Binbin Tu
  • Yu Chen
  • Qi Liu
  • Cong Zhang

Private set union (PSU) allows two parties to compute the union of their sets without revealing anything else. It has been widely used in various applications. While several computationally efficient PSU protocols have been developed for the balanced case, they have a potential limitation in their communication complexity, which grows (super)-linearly with the size of the larger set. This poses a challenge when performing PSU in the unbalanced setting, where one party is a constrained device holding a small set, and another is a service provider holding a large set.

In this work, we propose a generic construction of unbalanced PSU from leveled fully homomorphic encryption and a newly introduced protocol called permuted matrix private equality test. By instantiating the generic construction, we obtain two unbalanced PSU protocols whose communication complexity is linear in the size of the smaller set, and logarithmic in the larger set.

We implement our protocols. Experiments demonstrate that our protocols outperform all previous protocols in the unbalanced setting. The larger the difference between the sizes of the two sets, the better our protocols perform. For input sets of sizes 2^10 and 2^20 with items of length 128 bits, our PSU requires only 2.767 MB of communication. Compared with the state-of-the-art PSU proposed by Zhang et al. (USENIX Security 2023), this is a 37X reduction in communication and a roughly 10-35X speedup in running time, depending on the network environment.

Efficient Multiplicative-to-Additive Function from Joye-Libert Cryptosystem and Its Application to Threshold ECDSA
  • Haiyang Xue
  • Man Ho Au
  • Mengling Liu
  • Kwan Yin Chan
  • Handong Cui
  • Xiang Xie
  • Tsz Hon Yuen
  • Chengru Zhang

Threshold ECDSA has received considerable interest lately due to its widespread adoption in blockchain applications. A common building block of all leading constructions involves a secure conversion of multiplicative shares into additive ones, which is called the multiplicative-to-additive (MtA) function. MtA dominates the overall complexity of all existing threshold ECDSA constructions. Specifically, O(n^2) invocations of MtA are required in the case of n active signers. Hence, improvement of MtA leads directly to significant improvements for all state-of-the-art threshold ECDSA schemes.

In this paper, we design a novel MtA by revisiting the Joye-Libert (JL) cryptosystem. Specifically, we revisit JL encryption and propose a JL-based commitment, then give efficient zero-knowledge proofs for the JL cryptosystem, which are the first to have standard soundness. Our new MtA offers the best time-space complexity trade-off among all existing MtA constructions. It outperforms state-of-the-art constructions from Paillier by a factor of 1.85 to 2 in bandwidth and 1.2 to 1.7 in computation. It is 7X faster than those based on Castagnos-Laguillaumie encryption at the cost of only 2X more bandwidth. While our MtA is slower than OT-based constructions, it saves 18.7X in bandwidth. In addition, we design a batch version of MtA to further reduce the amortised time and space cost by another 25%.
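
To make the MtA functionality concrete, the following is a minimal sketch of the Paillier-style baseline that the abstract compares against, not the Joye-Libert construction of the paper. It omits all zero-knowledge proofs and range checks, so it only illustrates correctness against honest parties. The toy primes, the share modulus `ORDER` (a stand-in for the ECDSA group order), and the function names are assumptions made for illustration.

```python
import secrets
from math import gcd

# Toy Paillier key (assumption: far too small for real use).
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // gcd(P - 1, Q - 1)   # lcm(P-1, Q-1)
MU = pow(LAM, -1, N)
ORDER = 65537   # hypothetical share modulus


def enc(m):
    """Paillier encryption with generator N + 1."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if gcd(r, N) == 1:
            break
    return (1 + m * N) % N2 * pow(r, N, N2) % N2


def dec(c):
    """Paillier decryption: L(c^LAM mod N^2) * MU mod N."""
    return (pow(c, LAM, N2) - 1) // N * MU % N


def mta(a, b):
    """Turn multiplicative shares (a, b) into additive shares (alpha, beta)
    with alpha + beta = a * b (mod ORDER)."""
    c_a = enc(a)                                   # Alice -> Bob: Enc(a)
    # The mask is small enough that a*b + mask never wraps around mod N.
    mask = secrets.randbelow(N - ORDER * ORDER)
    c = pow(c_a, b, N2) * enc(mask) % N2           # Bob -> Alice: Enc(a*b + mask)
    beta = (-mask) % ORDER                         # Bob's additive share
    alpha = dec(c) % ORDER                         # Alice's additive share
    return alpha, beta


a, b = 12345, 54321
alpha, beta = mta(a, b)
```

The homomorphic step `pow(c_a, b, N2)` multiplies the hidden plaintext by `b`, and multiplying by `enc(mask)` adds the mask, so Alice learns only a masked product while Bob keeps the negated mask as his share.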

SESSION: Session 51: Privacy in Systems

Splice: Efficiently Removing a User's Data from In-memory Application State
  • Xueyuan Han
  • James Mickens
  • Siddhartha Sen

Splice is a new programming framework that allows security-conscious applications to efficiently locate and delete a user's in-memory state. The core technical challenge is determining how to delete a user's memory values without breaking application-specific semantic invariants involving the memory state of remaining users. Splice solves this problem using three techniques: taint tracking (which traces how a user's data flows through memory), deletion by synthesis (which overwrites each user-owned memory value in place, replacing it with a value that preserves the symbolic constraints of enclosing data structures), and a novel type system (which forces applications to employ defensive programming to avoid computing over synthesize-deleted values in unsafe ways). Using four realistic applications that we ported to Splice, we show that Splice's type system and defensive programming requirements are not onerous for developers. We also demonstrate that Splice's run-time overheads are similar to those of prior taint tracking systems, while enabling strong deletion semantics.

Leakage-Abuse Attacks Against Forward and Backward Private Searchable Symmetric Encryption
  • Lei Xu
  • Leqian Zheng
  • Chengzhi Xu
  • Xingliang Yuan
  • Cong Wang

Dynamic searchable symmetric encryption (DSSE) enables a server to efficiently search and update over encrypted files. To minimize the leakage during updates, a security notion named forward and backward privacy is expected for newly proposed DSSE schemes. Those schemes are generally constructed in a way to break the linkability across search and update queries to a given keyword. However, it remains underexplored whether forward and backward private DSSE is resilient against practical leakage-abuse attacks (LAAs), where an attacker attempts to recover query keywords from the leakage passively collected during queries.

In this paper, we aim to be the first to answer this question firmly through two non-trivial efforts. First, we revisit the spectrum of forward and backward private DSSE schemes over the past few years, and unveil some inherent constructional limitations in most schemes. Those limitations allow attackers to exploit query equality and establish a guaranteed linkage among different (refreshed) query tokens surjective to a candidate keyword. Second, we refine volumetric leakage profiles of updates and queries by associating each with a specific operation. By further exploiting update volume and query response volume, we demonstrate that all forward and backward private DSSE schemes can leak the same volumetric information (e.g., insertion volume, deletion volume) as those without such security guarantees. To testify our findings, we realize two generic LAAs, i.e., frequency matching attack and volumetric inference attack, and we evaluate them over various experimental settings in the dynamic context. Finally, we call for new efficient schemes to protect query equality and volumetric information across search and update queries.
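
As a rough illustration of why linkable query tokens matter, the sketch below shows a generic rank-based frequency matching heuristic. It is an assumption-laden simplification, not the attack evaluated in the paper: it presumes the attacker has already linked refreshed tokens belonging to the same keyword and holds auxiliary keyword frequencies. All names and numbers are hypothetical.

```python
def frequency_match(token_counts, keyword_freqs):
    """Pair query tokens with candidate keywords by frequency rank.

    token_counts: observed number of queries per (linked) token.
    keyword_freqs: attacker's auxiliary estimate of keyword popularity.
    Ties are broken alphabetically to keep the result deterministic.
    """
    tokens = sorted(token_counts, key=lambda t: (-token_counts[t], t))
    keywords = sorted(keyword_freqs, key=lambda k: (-keyword_freqs[k], k))
    return dict(zip(tokens, keywords))


# Hypothetical observations and auxiliary knowledge.
observed = {"t1": 50, "t2": 5, "t3": 20}
auxiliary = {"invoice": 0.6, "salary": 0.1, "meeting": 0.3}
guess = frequency_match(observed, auxiliary)
```

If tokens could not be linked across refreshes, the per-token counts in `observed` would be unavailable, which is the linkage the abstract says most schemes fail to prevent.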

Using Range-Revocable Pseudonyms to Provide Backward Unlinkability in the Edge
  • Cláudio Correia
  • Miguel Correia
  • Luís Rodrigues

In this paper we propose a novel abstraction that we have named Range-Revocable Pseudonyms (RRPs). RRPs are a new class of pseudonyms whose validity can be revoked for any time-range within their original validity period. The key feature of RRPs is that the information provided to revoke a pseudonym for a given time-range cannot be linked with the information provided when using the pseudonym outside the revoked range. We provide an algorithm to implement RRPs using efficient cryptographic primitives where the space complexity of the pseudonym is constant, regardless of the granularity of the revocation range, and the space complexity of the revocation information only grows logarithmically with the granularity; this makes the use of RRPs far more efficient than the use of many short-lived pseudonyms. We have used RRPs to design EDGAR, an access control system for VANET scenarios that offers backward unlinkability. The experimental evaluation of EDGAR shows that, when using RRPs, the revocation can be performed efficiently (even when using time slots as small as 1 second) and that users can authenticate with low latency (0.5-3.5 ms).

Shufflecake: Plausible Deniability for Multiple Hidden Filesystems on Linux
  • Elia Anzuoni
  • Tommaso Gagliardoni

We present Shufflecake, a new plausible deniability design to hide the existence of encrypted data on a storage medium, making it very difficult for an adversary to prove the existence of such data. Shufflecake can be considered a "spiritual successor" of tools such as TrueCrypt and VeraCrypt, but vastly improved: it works natively on Linux, it supports any filesystem of choice, and it can manage multiple volumes per device, so as to make deniability of the existence of hidden partitions really plausible. Compared to ORAM-based solutions, Shufflecake is extremely fast and simpler but does not offer native protection against multi-snapshot adversaries. However, we discuss security extensions that are made possible by its architecture, and we show evidence why these extensions might be enough to thwart more powerful adversaries. We implemented Shufflecake as an in-kernel tool for Linux, adding useful features, and we benchmarked its performance showing only a minor slowdown compared to a base encrypted system. We believe Shufflecake represents a useful tool for people whose freedom of expression is threatened by repressive authorities or dangerous criminal organizations, in particular: whistleblowers, investigative journalists, and activists for human rights in oppressive regimes.

SESSION: Session 52: Attacks & Malware

Take Over the Whole Cluster: Attacking Kubernetes via Excessive Permissions of Third-party Applications
  • Nanzi Yang
  • Wenbo Shen
  • Jinku Li
  • Xunqi Liu
  • Xin Guo
  • Jianfeng Ma

As the dominant container orchestration system, Kubernetes is widely used by many companies and cloud vendors. It runs third-party add-ons and applications (termed third-party apps) on its control plane to manage the whole cluster. The security of these third-party apps is critical to the whole cluster but has not been systematically studied so far.

Therefore, this paper analyzes the security of third-party apps and reveals that third-party apps are granted excessive critical permissions, which can be exploited by an attacker to escape from the worker node and take over the whole Kubernetes cluster. Even worse, excessive permissions of different third-party apps can be chained together to turn non-critical issues into severe attack vectors. To systematically analyze the exploitability of excessive permissions, we design three strategies based on different attacking paths. These three strategies can steal the cluster admin permission with the DaemonSet of a third-party app directly, or via the same app's or another app's critical component indirectly.

We investigate the security impact of excessive permission attacks in real production environments. We analyze all third-party apps in CNCF and show that 51 of 153 (33.3%) have potential security risks. We further scan Kubernetes services provided by the top four cloud vendors. The results show that all of them are vulnerable to excessive permission attacks. We report all our findings to the corresponding teams and get eight new CVEs from communities and a security bounty from Google.

Lost along the Way: Understanding and Mitigating Path-Misresolution Threats to Container Isolation
  • Zhi Li
  • Weijie Liu
  • XiaoFeng Wang
  • Bin Yuan
  • Hongliang Tian
  • Hai Jin
  • Shoumeng Yan

Filesystem isolation enforced by today's container technology has been found to be less effective in the presence of host-container interactions increasingly utilized by container tools. This weakened isolation has led to a class of path misresolution (Pamir) vulnerabilities, which have been considered to be highly risky and continuously reported over the years. In this paper, we present the first systematic study on the Pamir risk and the existing fixes to related vulnerabilities. Our research reveals that in spite of significant efforts being made to patch vulnerable container tools and address the risk, Pamir vulnerabilities continue to be discovered, including a new vulnerability (CVE-2023-0778) we rediscovered from patched software. A key insight of our study is that the Pamir risk is inherently hard to prevent at the level of container tools, due to their heavy reliance on third-party components. While security inspections should be applied to all components to mediate host-container interactions, third-party component developers tend to believe that container tools should perform security checks before invoking their components, and are therefore reluctant to patch their code with container-specific protection. Moreover, due to the large number of components today's container tools depend on, re-implementing all of them is impractical.

Our study shows that kernel-based filesystem isolation is the only way to ensure isolation is always in place during host-container interactions. In our research, we design and implement the first such approach that extends the filesystem isolation to dentry objects, by enforcing access control on host-container interactions through the filesystem. Our design addresses the fundamental limitation of one-way isolation characterizing today's containers, uses carefully designed policies to ensure accurate and comprehensive interaction control, and implants the protection into the right kernel location to minimize the performance impact. We verify our approach using model checking, which demonstrates its effectiveness in eliminating the Pamir risk. Our evaluation further shows that our approach incurs negligible overheads, vastly outperforming all existing Pamir patches, and maintains compatibility with all mainstream container tools. We have released our code and filed a request to incorporate our technique into the Linux kernel.

PackGenome: Automatically Generating Robust YARA Rules for Accurate Malware Packer Detection
  • Shijia Li
  • Jiang Ming
  • Pengda Qiu
  • Qiyuan Chen
  • Lanqing Liu
  • Huaifeng Bao
  • Qiang Wang
  • Chunfu Jia

Binary packing, a widely used program obfuscation style, compresses or encrypts the original program and then recovers it at runtime. Packed malware samples are pervasive: they conceal arresting code features as unintelligible data to evade detection. To rapidly respond to large-scale packed malware, security analysts search specific binary patterns to identify corresponding packers. The quality of such packer patterns or signatures is vital to malware dissection. However, existing packer signature rules rely heavily on human analysts' experience. In addition to expensive manual efforts, these human-written rules (e.g., YARA) also suffer from high false positives: as they are designed to search the pattern of bytes rather than instructions, they are very likely to mismatch with unexpected instructions.

In this paper, we look into the weaknesses of existing packer detection signatures and propose a novel automatic YARA rule generation technique, called PackGenome. Inspired by the biological concept of species-specific genes, we observe that packer-specific genes can help determine whether a program is packed. Our framework generates new YARA rules from packer-specific genes, which are extracted from the unpacking routines reused in programs protected by the same packer. To reduce false positives, we propose a byte selection strategy to systematically evaluate the mismatch possibility of bytes. We compare PackGenome with publicly available packer signature collections and a state-of-the-art automatic rule generation tool. Our large-scale experiments with more than 640K samples demonstrate that PackGenome can deliver robust YARA rules to detect Windows and Linux packers, including emerging low-entropy packers. PackGenome outperforms existing work in all cases with zero false negatives, low false positives, and a negligible scanning overhead increase.

RetSpill: Igniting User-Controlled Data to Burn Away Linux Kernel Protections
  • Kyle Zeng
  • Zhenpeng Lin
  • Kangjie Lu
  • Xinyu Xing
  • Ruoyu Wang
  • Adam Doupé
  • Yan Shoshitaishvili
  • Tiffany Bao

Leveraging a control flow hijacking primitive (CFHP) to gain root privileges is critical to attackers striving to exploit Linux kernel vulnerabilities. Such attacks have become increasingly elusive as security researchers propose capable kernel security mitigations, leading to the development of complex (and, as a trade-off, brittle and unreliable) attack techniques to regain it. In this paper, we obviate the need for complexity by proposing RetSpill, a powerful yet elegant exploitation technique that employs user space data already present on the kernel stack for privilege escalation.

RetSpill exploits the common practice of temporarily storing data on the kernel stack, such as when preserving user space register values during a switch from user space to kernel space. We perform a systematic study and identify four common practices that spill user space data to the kernel stack. Although this practice is perfectly within the kernel's security specification, it introduces a new exploitation path when paired with a control flow hijacking (CFH) vulnerability, enabling RetSpill to turn such vulnerabilities directly into privilege escalation reliably. Moreover, RetSpill can bypass many defenses currently deployed in the Linux kernel. To demonstrate the severity of this problem, we collected 22 real-world kernel vulnerabilities and built a semi-automated tool that abuses intentionally stored, on-stack user space data for kernel exploitation. Our tool generated end-to-end privilege escalation exploits for 20 out of 22 CFH vulnerabilities. Finally, we propose a new mechanism to defend against the attack.

SESSION: Session 53: Usable Authentication

Measuring Website Password Creation Policies At Scale
  • Suood Alroomi
  • Frank Li

Researchers have extensively explored how password creation policies influence the security and usability of user-chosen passwords, producing evidence-based policy guidelines. However, for web authentication to improve in practice, websites must actually implement these recommendations. To date, there has been limited investigation into what password creation policies are actually deployed by sites. Existing works are mostly dated and all studies relied on manual evaluations, assessing a small set of sites (at most 150, skewed towards top sites). Thus, we lack a broad understanding of the password policies used today. In this paper, we develop an automated technique for inferring a website's password creation policy, and apply it at scale to measure the policies of over 20K sites, over two orders of magnitude (~135x) more sites than prior work. Our findings identify the common policies deployed, potential causes of weak policies, and directions for improving authentication in practice. Ultimately, our study provides the first large-scale understanding of password creation policies on the web.

"I just stopped using one and started using the other": Motivations, Techniques, and Challenges When Switching Password Managers
  • Collins W. Munyendo
  • Peter Mayer
  • Adam J. Aviv

This paper explores what motivates password manager (PM) users in the US to switch from one PM to another, the techniques they employ when switching, and challenges they encounter throughout. Through a screener (n = 412) followed by a main survey (n = 54), we find that browser-based PMs are the most widely used, with most of these users motivated to use the PM due to convenience. Unfortunately, password reuse remains high. Most participants that switch PMs do so for usability reasons, but are also motivated by cost, as third-party PMs' full suite of features often require a subscription fee. Some PM-switchers are also motivated by recent security breaches, such as what was reported at LastPass in the Fall of 2022, with some participants losing trust in LastPass and PMs generally as a result. Those that switch mostly employ manual techniques of moving their passwords, e.g., copying and pasting their credentials from their previous to their new PM, despite most PMs offering ways to automatically transfer credentials in bulk across PMs. Assistance during the switching process is limited, with less than half of participants that switched receiving guidance during the switching process. From these findings, we make recommendations to PMs that can improve their overall user experience and use, including eliciting and acting on regular feedback from users as well as making PM settings more easily reachable and customizable by end-users.

"We've Disabled MFA for You": An Evaluation of the Security and Usability of Multi-Factor Authentication Recovery Deployments
  • Sabrina Amft
  • Sandra Höltervennhoff
  • Nicolas Huaman
  • Alexander Krause
  • Lucy Simko
  • Yasemin Acar
  • Sascha Fahl

Multi-Factor Authentication is intended to strengthen the security of password-based authentication by adding another factor, such as hardware tokens or one-time passwords using mobile apps.

However, this increased authentication security comes with potential drawbacks that can lead to account and asset loss. If users lose access to their additional authentication factors for any reason, they will be locked out of their accounts. Consequently, services that provide Multi-Factor Authentication should deploy procedures to allow their users to recover from losing access to their additional factor that are both secure and easy-to-use.

In this work, we investigate the security and user experience of Multi-Factor Authentication recovery procedures, and compare their deployment to descriptions on help and support pages.

We first evaluate the official help and support pages of 1,303 websites that provide Multi-Factor Authentication and collect documented information about their recovery procedures. Second, we select a subset of 71 websites, create accounts, set up Multi-Factor Authentication, and perform an in-depth investigation of their recovery procedure security and user experience.

We find that many websites deploy insecure Multi-Factor Authentication recovery procedures that allowed us to circumvent and disable Multi-Factor Authentication when having access to the accounts' associated email addresses. Furthermore, we commonly observed discrepancies between our in-depth analysis and the official help and support pages, implying that information meant to aid users is often either incorrect or outdated.

Based on our findings, we provide recommendations for best practices regarding Multi-Factor Authentication recovery.

Uncovering Impact of Mental Models towards Adoption of Multi-device Crypto-Wallets
  • Easwar Vivek Mangipudi
  • Udit Desai
  • Mohsen Minaei
  • Mainack Mondal
  • Aniket Kate

Cryptocurrency users saw a sharp increase in different types of crypto wallets in the past decade. However, the emerging multi-device wallets, even with improved security guarantees over their single-device counterparts, are yet to receive proportionate adoption. This work presents a data-driven investigation into the perceptions of users towards multi-device wallets, using a survey of 357 crypto-wallet users. Our results revealed two significant groups among our participants: Newbies and Non-newbies. Our follow-up qualitative analysis, after educating, revealed a gap between the mental model for these participants and actual security guarantees. Furthermore, we investigated preferred default settings for crypto-wallets across our participants over different key-share distribution settings of multi-device wallets; the threat model considerations affected user preferences, signifying a need for contextualizing default settings. We identified concrete, actionable design avenues for future multi-device wallet developers to improve adoption.

SESSION: Session 54: Measuring the Web

You Call This Archaeology? Evaluating Web Archives for Reproducible Web Security Measurements
  • Florian Hantke
  • Stefano Calzavara
  • Moritz Wilhelm
  • Alvise Rabitti
  • Ben Stock

Given the dynamic nature of the Web, security measurements on it suffer from reproducibility issues. In this paper we take a systematic look into the potential of using web archives for web security measurements. We first evaluate an extensive set of web archives as potential sources of archival data, showing the superiority of the Internet Archive with respect to its competitors. We then assess the appropriateness of the Internet Archive for historical web security measurements, detecting subtleties and possible pitfalls in its adoption. Finally, we investigate the feasibility of using the Internet Archive to simulate live security measurements, using recent archival data in place of live data. Our analysis shows that archive-based security measurements, which are reproducible by design, are a promising alternative to traditional live security measurements; nevertheless, it also shows potential pitfalls and shortcomings of archive-based measurements. As an important contribution, we use the collected knowledge to identify insights and best practices for future archive-based security measurements.

Cybercrime Bitcoin Revenue Estimations: Quantifying the Impact of Methodology and Coverage
  • Gibran Gomez
  • Kevin van Liebergen
  • Juan Caballero

Multiple works have leveraged the public Bitcoin ledger to estimate the revenue cybercriminals obtain from their victims. Estimations focusing on the same target often do not agree, due to the use of different methodologies, seed addresses, and time periods. These factors make it challenging to understand the impact of their methodological differences. Furthermore, they underestimate the revenue due to the (lack of) coverage on the target's payment addresses, but how large this impact is remains unknown.

In this work, we perform the first systematic analysis on the estimation of cybercrime bitcoin revenue. We implement a tool that can replicate the different estimation methodologies. Using our tool we can quantify, in a controlled setting, the impact of the different methodology steps. In contrast to what is widely believed, we show that the revenue is not always underestimated. There exist methodologies that can introduce huge overestimation. We collect 30,424 payment addresses and use them to compare the financial impact of 6 cybercrimes (ransomware, clippers, sextortion, Ponzi schemes, giveaway scams, exchange scams) and of 141 cybercriminal groups. We observe that the popular multi-input clustering fails to discover addresses for 40% of groups. We quantify, for the first time, the impact of the (lack of) coverage on the estimation. For this, we propose two techniques to achieve high coverage, possibly nearly complete, on the DeadBolt server ransomware. Our expanded coverage enables estimating DeadBolt's revenue at $2.47M, 39 times higher than the estimation using two popular Internet scan engines.
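
The multi-input clustering heuristic mentioned above groups addresses that appear together as inputs of one transaction, on the assumption that a single entity controls them. A minimal sketch using union-find follows; the transaction format and the address names are hypothetical, and this is an illustration of the general heuristic rather than the authors' tool.

```python
from collections import defaultdict


def multi_input_clusters(transactions):
    """Group addresses that co-occur as inputs of the same transaction.

    `transactions` is an iterable of input-address lists (a hypothetical,
    simplified format; real ledger parsing is out of scope here).
    """
    parent = {}

    def find(a):
        parent.setdefault(a, a)
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for inputs in transactions:
        if not inputs:
            continue
        first = inputs[0]
        find(first)  # register single-input transactions as well
        for addr in inputs[1:]:
            union(first, addr)

    clusters = defaultdict(set)
    for addr in list(parent):
        clusters[find(addr)].add(addr)
    return sorted(sorted(c) for c in clusters.values())
```

An address that never co-spends with another one stays in a singleton cluster, which is one way such a heuristic can fail to expand a seed address into further payment addresses.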

Jack-in-the-box: An Empirical Study of JavaScript Bundling on the Web and its Security Implications
  • Jeremy Rack
  • Cristian-Alexandru Staicu

In recent years, we have seen an increased interest in studying the software supply chain of user-facing applications to uncover problematic third-party dependencies. Prior work shows that web applications often rely on outdated or vulnerable third-party code. Moreover, real-world supply chain attacks show that dependencies can also be used to deliver malicious code, e.g., for carrying out cryptomining operations. Nonetheless, existing measurement studies in this domain neglect an important software engineering practice: developers often merge together third-party code into a single file called a bundle, which they then deliver from their own servers, making it appear as first-party code. Bundlers like Webpack or Rollup are popular open-source projects with tens of thousands of GitHub stars, suggesting that this technology is widely used by developers. Ignoring bundling may result in underestimating the complexity of modern software supply chains.

In this work, we aim to address these methodological shortcomings of prior work. To this end, we propose a novel methodology for automatically detecting bundles and partially reverse engineering them. Using this methodology, we conduct the first large-scale empirical study of bundled code on the web and examine its security implications. We provide evidence about the high prevalence of bundles, which are contained in 40% of all websites, and the average website includes more than one bundle. Following our methodology, we reidentify 1,051 vulnerabilities originating from 33 vulnerable npm packages, included in bundled code. Among the vulnerabilities, we find 17 critical and 59 high severity ones, which might enable malicious actors to execute attacks such as arbitrary code execution. Analyzing the low-rated libraries included in bundles, we discover 10 security holding packages, which suggests that supply-chain attacks affecting bundles are not only possible, but are already happening.

Understanding and Detecting Abused Image Hosting Modules as Malicious Services
  • Geng Hong
  • Mengying Wu
  • Pei Chen
  • Xiaojing Liao
  • Guoyi Ye
  • Min Yang

As a new type of underground ecosystem, the exploitation of Abused IHMs as MalIcious sErvices (AIMIEs) is becoming increasingly prevalent among miscreants to host illegal images and propagate harmful content. However, there has been little effort to understand this new menace, in terms of its magnitude, impact, and techniques, not to mention any serious effort to detect vulnerable image hosting modules on a large scale. To fill this gap, this paper presents the first measurement study of AIMIEs. By collecting and analyzing 89 open-sourced AIMIEs, we reveal the landscape of AIMIEs, report the evolution and evasiveness of abused image hosting APIs from reputable companies such as Alibaba, Tencent, and Bytedance, and identify real-world abused images uploaded through those AIMIEs. In addition, we propose a tool, called Viola, to detect vulnerable image hosting modules (IHMs) in the wild. We find 477 vulnerable IHM upload APIs associated with 338 web services, which integrated vulnerable IHMs, and 207 victim FQDNs. The highest-ranked domain with a vulnerable web service is baidu.com, followed by bilibili.com and 163.com. We have reported abused and vulnerable IHM upload APIs and received acknowledgments from 69 of them by the time of paper submission.

SESSION: Session 55: Security of Cryptographic Protocols & Implementations

Faster Constant-time Evaluation of the Kronecker Symbol with Application to Elliptic Curve Hashing
  • Diego F. Aranha
  • Benjamin Salling Hvass
  • Bas Spitters
  • Mehdi Tibouchi

We generalize the Bernstein-Yang (BY) algorithm [11] for constant-time modular inversion to compute the Kronecker symbol, of which the Jacobi and Legendre symbols are special cases. We first develop a basic and easy-to-implement algorithm, defined with full-precision division steps. We then describe an optimized version due to Hamburg [21] over word-sized inputs, and formally verify its correctness. Along the way, we introduce a number of optimizations for implementing both versions in constant time. The resulting algorithms are particularly suitable for computing the Legendre symbol with dense prime p, where no efficient addition chain is known for exponentiating to (p-1)/2, as is often the case in pairing-friendly elliptic curves. Our high-speed implementation for a range of parameters shows that the new algorithm is up to 40 times faster than exponentiation, and up to 25.7% faster than the previous state of the art. We illustrate our techniques with hashing to elliptic curves using the SwiftEC algorithm [17], with savings of 14.7%-48.1%, and with accelerating the CTIDH isogeny-based key exchange [7], with savings of 3.5%-13.5%.
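
For context, the exponentiation baseline referred to above is Euler's criterion, which obtains the Legendre symbol of a modulo an odd prime p as a^((p-1)/2) mod p. The sketch below contrasts it with the textbook Jacobi symbol algorithm based on quadratic reciprocity. Both functions are variable-time illustrations with hypothetical names; neither is the constant-time Bernstein-Yang-style algorithm the paper develops.

```python
def legendre_by_exponentiation(a, p):
    """Legendre symbol (a/p) for an odd prime p via Euler's criterion."""
    r = pow(a, (p - 1) // 2, p)
    return -1 if r == p - 1 else r  # r is 0, 1, or p - 1


def jacobi(a, n):
    """Jacobi symbol (a/n) for odd n > 0 (textbook, variable-time version)."""
    if n <= 0 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")
    a %= n
    result = 1
    while a != 0:
        # Pull out factors of two, using (2/n) = -1 iff n = 3 or 5 (mod 8).
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        # Quadratic reciprocity: swapping flips the sign iff both are 3 (mod 4).
        a, n = n, a
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0
```

For a prime modulus the two functions agree, but the second needs no modular exponentiation, which is the general reason gcd-style symbol computations are attractive when p admits no short addition chain.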

Verifiable Verification in Cryptographic Protocols
  • Marc Fischlin
  • Felix Günther

Common verification steps in cryptographic protocols, such as signature or message authentication code checks or the validation of elliptic curve points, are crucial for the overall security of the protocol. Yet implementation errors omitting these steps easily remain unnoticed, as often the protocol will function perfectly anyways. One of the most prominent examples is Apple's goto fail bug where the erroneous certificate verification skipped over several of the required steps, marking invalid certificates as correctly verified. This vulnerability went undetected for at least 17 months.

We propose here a mechanism which supports the detection of such errors on a cryptographic level. Instead of merely returning the binary acceptance decision, we let the verification return more fine-grained information in form of what we call a confirmation code. The reader may think of the confirmation code as disposable information produced as part of the relevant verification steps. In case of an implementation error like the goto fail bug, the confirmation code would then miss essential elements.

The question arises now how to verify the confirmation code itself. We show how to use confirmation codes to tie security to basic functionality at the overall protocol level, making erroneous implementations be detected through the protocol not functioning properly. More concretely, we discuss the usage of confirmation codes in secure connections, established via a key exchange protocol and secured through the derived keys. If some verification steps in a key exchange protocol execution are faulty, then so will be the confirmation codes, and because we can let the confirmation codes enter key derivation, the connection of the two parties will eventually fail. In consequence, an implementation error like goto fail would now be detectable through a simple connection test.
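
The idea of letting a confirmation code enter key derivation can be illustrated with a small sketch. It assumes an HMAC-based check as a stand-in for the verification step and uses HMAC-SHA256 as a simple key derivation function; all function names and labels are hypothetical, and this is not the construction from the paper.

```python
import hashlib
import hmac


def mac_verify_with_code(key, message, tag):
    """Verify an HMAC tag and return (accepted, confirmation_code).

    The code is derived from the value recomputed during verification, so an
    implementation that skips the recomputation does not obtain it.
    """
    expected = hmac.new(key, message, hashlib.sha256).digest()
    accepted = hmac.compare_digest(expected, tag)
    code = hashlib.sha256(b"confirm" + expected).digest() if accepted else b""
    return accepted, code


def buggy_verify(key, message, tag):
    """A faulty verifier in the spirit of goto fail: accepts without checking."""
    return True, b""


def derive_session_key(shared_secret, confirmation_code):
    """Mix the confirmation code into key derivation (HMAC as a simple KDF)."""
    return hmac.new(shared_secret, b"session" + confirmation_code,
                    hashlib.sha256).digest()
```

In this sketch the sending party can compute the same code from the tag it produced, so both sides derive the same session key only if the receiver actually carried out the check; the faulty verifier ends up with a different key and the connection fails.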

Compact Frequency Estimators in Adversarial Environments
  • Sam A. Markelon
  • Mia Filic
  • Thomas Shrimpton

Count-Min Sketch (CMS) and HeavyKeeper (HK) are two realizations of a compact frequency estimator (CFE). These are a class of probabilistic data structures that maintain a compact summary of (typically) high-volume streaming data, and provide approximately correct estimates of the number of times any particular element has appeared. CFEs are often the base structure in systems looking for the highest-frequency elements (i.e., top-K elements, heavy hitters, elephant flows). Traditionally, probabilistic guarantees on the accuracy of frequency estimates are proved under the implicit assumption that stream elements do not depend upon the internal randomness of the structure. Said another way, they are proved in the presence of data streams that are created by non-adaptive adversaries. Yet in many practical use-cases, this assumption is not well-matched with reality, especially in applications where malicious actors are incentivized to manipulate the data stream. We show that the CMS and HK structures can be forced to make significant estimation errors, by concrete attacks that exploit adaptivity. We analyze these attacks analytically and experimentally, with tight agreement between the two. Sadly, these negative results seem unavoidable for (at least) sketch-based CFEs with parameters that are reasonable in practice. On the positive side, we give a new CFE (Count-Keeper) that can be seen as a composition of the CMS and HK structures. Count-Keeper estimates are typically more accurate (by at least a factor of two) than CMS for "honest" streams; our attacks against CMS and HK are less effective (and more resource intensive) when used against Count-Keeper; and Count-Keeper has a native ability to flag estimates that are suspicious, which neither CMS nor HK (nor any other CFE, to our knowledge) admits.
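
As background for the structures discussed above, a Count-Min Sketch can be sketched in a few lines. The version below derives its row hashes from SHA-256 with an optional seed, which is an assumption made for self-containment; the class and parameter names are hypothetical, and this is the standard CMS, not the Count-Keeper structure proposed in the paper.

```python
import hashlib


class CountMinSketch:
    """Standard Count-Min Sketch over byte-string items."""

    def __init__(self, width, depth, seed=b""):
        self.width = width
        self.depth = depth
        self.seed = seed
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row, item):
        # One hash function per row, derived from the seed and the row number.
        digest = hashlib.sha256(self.seed + row.to_bytes(2, "big") + item).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item, count=1):
        for r in range(self.depth):
            self.rows[r][self._index(r, item)] += count

    def estimate(self, item):
        # Each counter can only overcount, so the minimum is the best guess.
        return min(self.rows[r][self._index(r, item)] for r in range(self.depth))
```

With non-negative updates the estimate never falls below the true count, and it overshoots only when other items collide with the queried item in every row. A party that can find such colliding items can therefore inflate a target's estimate, which is the kind of error adaptive inputs can induce.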

ACABELLA: Automated (Crypt)analysis of Attribute-Based Encryption Leveraging Linear Algebra
  • Antonio de la Piedra
  • Marloes Venema
  • Greg Alpár

Attribute-based encryption (ABE) is a popular type of public-key encryption that enforces access control cryptographically, and has spurred the proposal of many use cases. To satisfy the requirements of the setting, tailor-made schemes are often introduced. However, designing secure schemes, as well as verifying that they are secure, is notoriously hard. Several of these schemes have turned out to be broken, making them dangerous to deploy in practice.

To overcome these shortcomings, we introduce ACABELLA. ACABELLA simplifies generating and verifying security proofs for pairing-based ABE schemes. It consists of a framework for security proofs that are easy to verify manually and an automated tool that efficiently generates these security proofs. Creating such security proofs generally takes no more than a few seconds. The output is easy to understand, and the proofs can be verified manually. In particular, the verification of a security proof generated by ACABELLA boils down to performing simple linear algebra.

The ACABELLA tool is open source and also available via a web interface. With its help, experts can simplify their proof process by verifying or refuting the security claims of their schemes and practitioners can get an assurance that the ABE scheme of their choice is secure.

SESSION: Session 56: Oblivious Algorithms & Data Structures

Ramen: Souper Fast Three-Party Computation for RAM Programs
  • Lennart Braun
  • Mahak Pancholi
  • Rahul Rachuri
  • Mark Simkin

Secure RAM computation allows a number of parties to evaluate a function represented as a random-access machine (RAM) program in a way that reveals nothing about the private inputs of the parties except for what is already revealed by the function output itself. In this work we present Ramen, which is a new protocol for computing RAM programs securely among three parties, tolerating up to one passive corruption. Ramen provides reasonable asymptotic guarantees and is concretely efficient at the same time. We have implemented our protocol and provide extensive benchmarks for various settings.

Asymptotically, our protocol requires a constant number of rounds and an amortized sublinear amount of communication and computation per memory access. In terms of concrete efficiency, our protocol outperforms previous solutions. For a memory of size 2^26 our memory accesses are 25x faster in the LAN and 8x faster in the WAN setting, when compared to the previously fastest, and concurrent, solution by Vadapalli, Henry, and Goldberg (USENIX Security 2023). Due to our superior asymptotic guarantees, the efficiency gap is only widening as the memory gets larger and for this reason Ramen provides the currently most scalable concretely efficient solution for securely computing RAM programs.

Secure Statistical Analysis on Multiple Datasets: Join and Group-By
  • Gilad Asharov
  • Koki Hamada
  • Ryo Kikuchi
  • Ariel Nof
  • Benny Pinkas
  • Junichi Tomida

We implement a secure platform for statistical analysis over multiple organizations and multiple datasets. We provide a suite of protocols for different variants of JOIN and GROUP-BY operations. JOIN allows combining data from multiple datasets based on a common column. GROUP-BY allows aggregating rows that have the same values in a column or a set of columns, and then applying some aggregation summary on the rows (such as sum, count, median, etc.). Both operations are fundamental tools for relational databases. One example use case of our platform is in data marketing in which an analyst would join purchase histories and membership information, and then obtain statistics, such as "Which products were bought by people earning this much per annum?"

Both JOIN and GROUP-BY involve many variants, and we design protocols for several common procedures. In particular, we propose a novel group-by-median protocol that has not been known so far. Our protocols rely on sorting protocols, and work in the honest majority setting and against malicious adversaries. To the best of our knowledge, this is the first implementation of JOIN and GROUP-BY protocols secure against a malicious adversary.
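
To make the two operations concrete, the following plaintext sketch shows what an inner JOIN on a common column and a group-by-median compute. It provides no security whatsoever and involves no secret sharing or sorting protocol; the function names, column names, and sample rows are hypothetical and serve only as reference semantics for the operations the protocols realize securely.

```python
from collections import defaultdict
from statistics import median


def inner_join(left, right, key):
    """Combine rows (dicts) from two tables that agree on column `key`."""
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def group_by_median(rows, group_col, value_col):
    """Median of `value_col` for each distinct value of `group_col`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_col]].append(row[value_col])
    return {g: median(vals) for g, vals in groups.items()}


purchases = [
    {"id": 1, "price": 10},
    {"id": 1, "price": 30},
    {"id": 2, "price": 5},
    {"id": 3, "price": 7},
]
members = [
    {"id": 1, "tier": "gold"},
    {"id": 2, "tier": "gold"},
    {"id": 4, "tier": "basic"},
]
joined = inner_join(purchases, members, "id")
medians = group_by_median(joined, "tier", "price")
```

Here the purchase by id 3 and the member with id 4 have no partner and drop out of the join, so only the gold tier appears in the grouped result.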

FutORAMa: A Concretely Efficient Hierarchical Oblivious RAM
  • Gilad Asharov
  • Ilan Komargodski
  • Yehuda Michelson

Oblivious RAM (ORAM) is a general-purpose technique for hiding memory access patterns. This is a fundamental task underlying many secure computation applications. While known ORAM schemes provide optimal asymptotic complexity, despite extensive efforts, their concrete costs remain prohibitively expensive for many interesting applications. The current state-of-the-art practical ORAM schemes are suitable only for somewhat small memories (Square-Root ORAM or Path ORAM).

This work presents a novel concretely efficient ORAM construction based on recent breakthroughs in asymptotic complexity of ORAM schemes (PanORAMa and OptORAMa). We bring these constructions to the realm of practically useful schemes by relaxing the restriction on constant local memory size. Our design provides a factor of at least 6 to 8 improvement over an optimized variant of Path ORAM for a set of reasonable memory sizes (e.g., 1GB, 1TB) and with the same local memory size. To our knowledge, this is the first practical implementation of an ORAM based on the full hierarchical ORAM framework. Prior to our work, the belief was that hierarchical ORAM-based constructions were inherently too expensive in practice. We implement our design and provide extensive evaluation and experimental results.

Waks-On/Waks-Off: Fast Oblivious Offline/Online Shuffling and Sorting with Waksman Networks
  • Sajin Sasy
  • Aaron Johnson
  • Ian Goldberg

As more privacy-preserving solutions leverage trusted execution environments (TEEs) like Intel SGX, it becomes pertinent that these solutions can by design thwart TEE side-channel attacks that research has brought to light. In particular, such solutions need to be fully oblivious to circumvent leaking private information through memory or timing side channels.

In this work, we present fast fully oblivious algorithms for shuffling and sorting of data. Oblivious shuffling and sorting of data are two fundamental primitives that are frequently used for permuting data in privacy-preserving solutions. We present novel oblivious shuffling and sorting algorithms that work in the offline/online model such that the bulk of the computation can be done in an offline phase independent of the data to be permuted, resulting in an online phase that is asymptotically (O(βn log n)) and concretely (>5× and >3×) more efficient than the state-of-the-art solutions for oblivious shuffling and sorting (O(β n log² n)) when permuting n items, each of size β.

Our work revisits Waksman networks, and leverages the key observation that setting the control bits of a Waksman network for a uniformly random shuffle is independent of the data to be shuffled. However, setting the control bits of a Waksman network efficiently and fully obliviously poses a challenge, and we provide a novel control bit setting algorithm to this end. The total cost (inclusive of offline computation) of our algorithms WaksShuffle and WaksSort is lower than that of all other fully oblivious shuffling and sorting algorithms for moderately sized problems (β > 1400 B), and the performance gap only widens with increase in item sizes. Furthermore, our shuffling algorithm WaksShuffle improves the online cost of oblivious shuffling by >5× for shuffling 2^20 items of any size; similarly WaksShuffle+QS provides >2.7× speedups in the online cost of oblivious sorting.

SESSION: Session 57: Privacy in the Digital World

General Data Protection Runtime: Enforcing Transparent GDPR Compliance for Existing Applications
  • David Klein
  • Benny Rolle
  • Thomas Barber
  • Manuel Karl
  • Martin Johns

Recent advances in data protection regulations bring privacy benefits for website users, but also come at a cost for operators. Retrofitting the privacy requirements of laws such as the General Data Protection Regulation (GDPR) onto legacy software requires significant auditing and development effort. In this work we demonstrate that this effort can be minimized by viewing data protection requirements through the lens of information flow tracking. Instead of manual inspections of applications, we propose a lightweight enforcement engine which can reliably prevent unlawful data processing even in the presence of bugs or misconfigured software. Taking GDPR regulations as a starting point, we define twelve software requirements which, if implemented properly, ensure adequate handling of personal data. We go on to show how these requirements can be fulfilled by proposing a metadata structure and enforcement policies for dynamic information flow tracking frameworks. To put this idea into practice, we present Fontus, a Java Virtual Machine (JVM) information flow tracking framework, which can transparently label personal data in existing Java applications in order to aid compliance with data protection regulations. Finally, we demonstrate the applicability of our approach by enforcing data protection policies across 7 large, open source web applications, with no changes required to the applications themselves.

Control, Confidentiality, and the Right to be Forgotten
  • Aloni Cohen
  • Adam Smith
  • Marika Swanberg
  • Prashant Nalini Vasudevan

Recent digital rights frameworks give users the right to delete their data from systems that store and process their personal information (e.g., the "right to be forgotten" in the GDPR).

How should deletion be formalized in complex systems that interact with many users and store derivative information? We argue that prior approaches fall short. Definitions of machine unlearning [6] are too narrowly scoped and do not apply to general interactive settings. The natural approach of deletion-as-confidentiality [15] is too restrictive: by requiring secrecy of deleted data, it rules out social functionalities.

We propose a new formalism: deletion-as-control. It allows users' data to be freely used before deletion, while also imposing a meaningful requirement after deletion, thereby giving users more control.

Deletion-as-control provides new ways of achieving deletion in diverse settings. We apply it to social functionalities, and give a new unified view of various machine unlearning definitions from the literature. This is done by way of a new adaptive generalization of history independence.

Deletion-as-control also provides a new approach to the goal of machine unlearning, that is, to maintaining a model while honoring users' deletion requests. We show that publishing a sequence of updated models that are differentially private under continual release satisfies deletion-as-control. The accuracy of such an algorithm does not depend on the number of deleted points, in contrast to the machine unlearning literature.

PolicyChecker: Analyzing the GDPR Completeness of Mobile Apps' Privacy Policies
  • Anhao Xiang
  • Weiping Pei
  • Chuan Yue

The European General Data Protection Regulation (GDPR) mandates a data controller (e.g., an app developer) to provide all information specified in Articles (Arts.) 13 and 14 to data subjects (e.g., app users) regarding how their data are being processed and what their rights are. While some studies have started to detect the fulfillment of GDPR requirements in a privacy policy, their exploration only focused on a subset of mandatory GDPR requirements. In this paper, our goal is to explore the state of GDPR-completeness violations in mobile apps' privacy policies. To achieve our goal, we design the PolicyChecker framework by taking a rule and semantic role based approach. PolicyChecker automatically detects completeness violations in privacy policies based not only on all mandatory GDPR requirements but also on all if-applicable GDPR requirements that will become mandatory under specific conditions. Using PolicyChecker, we conduct the first large-scale GDPR-completeness violation study on 205,973 privacy policies of Android apps in the UK Google Play store. PolicyChecker identified 163,068 (79.2%) privacy policies containing data collection statements; therefore, such policies are regulated by GDPR requirements. However, the majority (99.3%) of them failed to achieve GDPR-completeness, with at least one unsatisfied requirement; 98.1% of them had at least one unsatisfied mandatory requirement, while 73.0% of them had at least one unsatisfied if-applicable requirement logic chain. We conjecture that controllers' lack of understanding of some GDPR requirements and their poor practices in composing a privacy policy can be the potential major causes behind the GDPR-completeness violations. We further discuss recommendations for app developers to improve the completeness of their apps' privacy policies to provide a more transparent personal data processing environment to users.

Speranza: Usable, Privacy-friendly Software Signing
  • Kelsey Merrill
  • Zachary Newman
  • Santiago Torres-Arias
  • Karen R. Sollins

Software repositories, used for wide-scale open software distribution, are a significant vector for security attacks. Software signing provides authenticity, mitigating many such attacks. Developer-managed signing keys pose usability challenges, but certificate-based systems introduce privacy problems. This work, Speranza, uses certificates to verify software authenticity but still provides anonymity to signers using zero-knowledge identity co-commitments.

In Speranza, a signer uses an automated certificate authority (CA) to create a private identity-bound signature and proof of authorization. Verifiers check that a signer was authorized to publish a package without learning the signer's identity. The package repository privately records each package's authorized signers, but publishes only commitments to identities in a public map. Then, when issuing certificates, the CA issues the certificate to a distinct commitment to the same identity. The signer then creates a zero-knowledge proof that these are co-commitments.

We implemented a proof-of-concept for Speranza. We find that costs to maintainers (signing) and end users (verifying) are small (sub-millisecond), even for a repository with millions of packages. Techniques inspired by recent key transparency systems reduce the bandwidth for serving authorization policies to 2 KiB. Server costs in this system are negligible. Our evaluation finds that Speranza is practical on the scale of the largest software repositories.

We also emphasize practicality and deployability in this project. By building on existing technology and employing relatively simple and well-established cryptographic techniques, Speranza can be deployed for wide-scale use with only a few hundred lines of code and minimal changes to existing infrastructure. Speranza is a practical way to bring privacy and authenticity together for more trustworthy open-source software.

SESSION: Session 58: Measuring Machine Learning & Software Security

Unsafe Diffusion: On the Generation of Unsafe Images and Hateful Memes From Text-To-Image Models
  • Yiting Qu
  • Xinyue Shen
  • Xinlei He
  • Michael Backes
  • Savvas Zannettou
  • Yang Zhang

State-of-the-art Text-to-Image models like Stable Diffusion and DALL·E 2 are revolutionizing how people generate visual content. At the same time, society has serious concerns about how adversaries can exploit such models to generate problematic or unsafe images. In this work, we focus on demystifying the generation of unsafe images and hateful memes from Text-to-Image models. We first construct a typology of unsafe images consisting of five categories (sexually explicit, violent, disturbing, hateful, and political). Then, we assess the proportion of unsafe images generated by four advanced Text-to-Image models using four prompt datasets. We find that Text-to-Image models can generate a substantial percentage of unsafe images; across four models and four prompt datasets, 14.56% of all generated images are unsafe. When comparing the four Text-to-Image models, we find different risk levels, with Stable Diffusion being the most prone to generating unsafe content (18.92% of all generated images are unsafe). Given Stable Diffusion's tendency to generate more unsafe content, we evaluate its potential to generate hateful meme variants if exploited by an adversary to attack a specific individual or community. We employ three image editing methods, DreamBooth, Textual Inversion, and SDEdit, which are supported by Stable Diffusion to generate variants. Our evaluation result shows that 24% of the generated images using DreamBooth are hateful meme variants that present the features of the original hateful meme and the target individual/community; these generated images are comparable to hateful meme variants collected from the real world. Overall, our results demonstrate that the danger of large-scale generation of unsafe images is imminent. We discuss several mitigating measures, such as curating training data, regulating prompts, and implementing safety filters, and encourage better safeguard tools to be developed to prevent unsafe generation. Our code is available at https://github.com/YitingQu/unsafe-diffusion.

DE-FAKE: Detection and Attribution of Fake Images Generated by Text-to-Image Generation Models
  • Zeyang Sha
  • Zheng Li
  • Ning Yu
  • Yang Zhang

Text-to-image generation models that generate images based on prompt descriptions have attracted an increasing amount of attention during the past few months. Despite their encouraging performance, these models raise concerns about the misuse of their generated fake images. To tackle this problem, we pioneer a systematic study on the detection and attribution of fake images generated by text-to-image generation models. Concretely, we first build a machine learning classifier to detect the fake images generated by various text-to-image generation models. We then attribute these fake images to their source models, such that model owners can be held responsible for their models' misuse. We further investigate how prompts that generate fake images affect detection and attribution. We conduct extensive experiments on four popular text-to-image generation models, including DALL·E 2, Stable Diffusion, GLIDE, and Latent Diffusion, and two benchmark prompt-image datasets. Empirical results show that (1) fake images generated by various models can be distinguished from real ones, as there exists a common artifact shared by fake images from different models; (2) fake images can be effectively attributed to their source models, as different models leave unique fingerprints in their generated images; (3) prompts with the "person" topic or a length between 25 and 75 enable models to generate fake images with higher authenticity. All findings contribute to the community's insight into the threats caused by text-to-image generation models. We call on the community to consider countermeasures, such as ours, against rapidly evolving fake image generation.

"Get in Researchers; We're Measuring Reproducibility": A Reproducibility Study of Machine Learning Papers in Tier 1 Security Conferences
  • Daniel Olszewski
  • Allison Lu
  • Carson Stillman
  • Kevin Warren
  • Cole Kitroser
  • Alejandro Pascual
  • Divyajyoti Ukirde
  • Kevin Butler
  • Patrick Traynor

Reproducibility is crucial to the advancement of science; it strengthens confidence in seemingly contradictory results and expands the boundaries of known discoveries. Computer Security has the natural benefit of creating artifacts that should facilitate computational reproducibility, the ability for others to use someone else's code and data to independently recreate results, in a relatively straightforward fashion. While the Security community has recently increased its attention on reproducibility, an independent and comprehensive measurement of the current state of reproducibility has not been conducted. In this paper, we perform the first such study, targeting reproducible artifacts generated specifically by papers on machine learning security (one of the most popular areas in academic research) published in Tier 1 security conferences over the past ten years (2013-2022). We perform our measurement study of indirect and direct reproducibility over nearly 750 papers, their codebases, and datasets. Our analysis shows that there is no statistically significant difference between the availability of artifacts before and after the introduction of Artifact Evaluation Committees in Tier 1 conferences. However, based on three years of results, artifacts that pass through this process work at a higher rate than those that do not. From our collected findings, we offer data-driven suggestions for improving reproducibility in our community, including five common problems observed in our study. In so doing, we demonstrate that significant progress still needs to be made in computational reproducibility in Computer Security research.

Unhelpful Assumptions in Software Security Research
  • Ita Ryan
  • Utz Roedig
  • Klaas-Jan Stol

In the study of software security many factors must be considered. Once venturing beyond the simplest of laboratory experiments, the researcher is obliged to contend with exponentially complex conditions. Software security has been shown to be affected by priming, tool usability, library documentation, organisational security culture, the content and format of internet resources, IT team and developer interaction, Internet search engine ordering, developer personality, security warning placement, mentoring, developer experience and more. In a systematic review of software security papers published since 2016, we have identified a number of unhelpful assumptions that are commonly made by software security researchers. In this paper we list these assumptions, describe why they sometimes do not reflect reality, and suggest implications for researchers.

SESSION: Session 59: Tracking the Web

Read Between the Lines: Detecting Tracking JavaScript with Bytecode Classification
  • Mohammad Ghasemisharif
  • Jason Polakis

Browsers and extensions that aim to block online ads and tracking scripts predominantly rely on rules from filter lists for determining which resource requests must be blocked. These filter lists are often manually curated by a community of online users. However, due to the arms race between blockers and ad-supported websites, these rules must continuously get updated so as to adapt to novel bypassing techniques and modified requests, thus rendering the detection and rule-generation process cumbersome and reactive (which can result in major delays between propagation and detection). In this paper, we address the detection problem by proposing an automated pipeline that detects tracking and advertisement JavaScript resources with high accuracy, designed to incur minimal false positives and overhead. Our method models script detection as a text classification problem, where JavaScript resources are documents containing bytecode sequences. Since bytecode is directly obtained from the JavaScript interpreter, our technique is resilient against commonly used bypassing methods, such as URL randomization or code obfuscation. We experiment with both deep learning and traditional ML-based approaches for bytecode classification and show that our approach identifies ad/tracking scripts with 97.08% accuracy, significantly outperforming cutting-edge systems in terms of both precision and the level of required features. Our experimental analysis further highlights our system's capabilities, by demonstrating how it can augment filter lists by uncovering ad/tracking scripts that are currently unknown, as well as proactively detecting scripts that have been erroneously added by list curators.

CookieGraph: Understanding and Detecting First-Party Tracking Cookies
  • Shaoor Munir
  • Sandra Siby
  • Umar Iqbal
  • Steven Englehardt
  • Zubair Shafiq
  • Carmela Troncoso

As third-party cookie blocking is becoming the norm in mainstream web browsers, advertisers and trackers have started to use first-party cookies for tracking. To understand this phenomenon, we conduct a differential measurement study with versus without third-party cookies. We find that first-party cookies are used to store and exfiltrate identifiers to known trackers even when third-party cookies are blocked.

As opposed to third-party cookie blocking, first-party cookie blocking is not practical because it would result in major breakage of website functionality. We propose CookieGraph, a machine learning-based approach that can accurately and robustly detect and block first-party tracking cookies. CookieGraph detects first-party tracking cookies with 90.18% accuracy, outperforming the state-of-the-art CookieBlock by 17.31%. We show that CookieGraph is robust against cookie name manipulation, while CookieBlock's accuracy drops by 15.87%. While blocking all first-party cookies results in major breakage on 32% of the sites with SSO logins, and CookieBlock reduces it to 10%, we show that CookieGraph does not cause any major breakage on these sites.

Our deployment of CookieGraph shows that first-party tracking cookies are used on 89.86% of the top-million websites. We find that 96.61% of these first-party tracking cookies are in fact ghostwritten by third-party scripts embedded in the first-party context. We also find evidence of first-party tracking cookies being set by fingerprinting scripts. The most prevalent first-party tracking cookies are set by major advertising entities such as Google, Facebook, and TikTok.

AdCPG: Classifying JavaScript Code Property Graphs with Explanations for Ad and Tracker Blocking
  • Changmin Lee
  • Sooel Son

Advertising and tracking service (ATS) blocking has been safeguarding the privacy of millions of Internet users from privacy-invasive tracking behaviors. Previous research has proposed using a graph representation that models the structural relationships in loading web resources and then conducting ATS node classification based on this graph representation. However, these context-based ATS classification methods suffer from (1) inconsistent classification due to the varying context in which ATS resources are loaded and (2) a lack of explainability of the classification results, making it difficult to identify the code-level causes for ATS classification.

We propose AdCPG, a graph neural network (GNN) framework tailored for ATS classification. Our approach focuses on classifying JavaScript (JS) content rather than considering the loading context of web resources. Given JS files, AdCPG leverages their code property graphs (CPGs) and conducts graph classification on these CPGs that model the semantic and structural information of these JS files. To provide the explanations for ATS classification, AdCPG highlights the JS code that contributes the most to classifying the JS files into ATS using a GNN explainer. AdCPG achieved an accuracy of 98.75% on the Tranco top-10K websites, demonstrating high performance using only JS content. Upon deployment, AdCPG identified 650 JS files from 500 domains that were not detected by any ATS filter lists and previous ATS classification tools. AdCPG plays a complementary role in identifying ATS resources while providing code-level explanations, which minimizes the engineering effort required to validate ATS classification results.

SESSION: Session 60: Poster Session

Poster: Using CodeQL to Detect Malware in npm
  • Matías F. Gobbi
  • Johannes Kinder

Malicious packages are a problem on npm, but like other malware, they are rarely completely novel and share large semantic similarities. We propose to leverage the existing static analysis framework CodeQL to find malware on npm; but instead of detecting variants of vulnerabilities, we use it to detect variants of malware. We present a methodology for writing queries from recently reported packages, as a way of defining a semantic signature for specific malicious behavior, where a single query can then be used to match entire families of malware. An iteration of our approach resulted in the discovery of 125 malicious packages from the registry, without producing a single false alarm.

Poster: Data Minimization by Construction for Trigger-Action Applications
  • Mohammad M. Ahmadpanah
  • Daniel Hedin
  • Andrei Sabelfeld

Trigger-Action Platforms (TAPs) enable applications to integrate various devices and services otherwise unconnected. Recent features of TAPs introduce additional sources of data such as queries in IFTTT. The current TAPs, like IFTTT, demand that trigger and query services transmit excessive amounts of user data to the TAP. To limit the data to what is actually necessary for the execution to comply with the principle of data minimization, input services should send no more than the necessary data. LazyTAP proposes a new paradigm of data minimization by construction in TAPs, introducing a novel perspective for data collection from input services. While the existing push-all approach of TAPs entails coarse-grained data over-approximation, LazyTAP pulls input data on-demand at the level of attributes, once accessed by the app execution. Thanks to the fine granularity provided by LazyTAP, multiple trigger and query services can be naturally minimized while the behavior of app executions is preserved. In addition, a great benefit of LazyTAP is being seamless for third-party app developers. By leveraging laziness, LazyTAP defers computation and proxies objects to load necessary remote data behind the scenes. Our evaluation study on app benchmarks shows that on average LazyTAP improves minimization by 95% over IFTTT and by 38% over minTAP, with a tolerable performance overhead. This poster goes into further details about LazyTAP and elaborates on its prototype implementation.
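
The attribute-level, on-demand pulling described above can be illustrated with a small lazy proxy. This is only a sketch of the general idea in Python, not LazyTAP's implementation: the class name `LazyRecord` and the `fetch_attribute` callback are hypothetical stand-ins for whatever mechanism contacts the input service.

```python
class LazyRecord:
    """Proxy for remote trigger data: an attribute is fetched only when first read.

    Hypothetical illustration; `fetch_attribute` stands in for a call to the
    input service and is not part of any real TAP API.
    """

    def __init__(self, fetch_attribute):
        self._fetch = fetch_attribute  # callable: attribute name -> value
        self._cache = {}               # attributes already pulled

    def __getattr__(self, name):
        # __getattr__ runs only when normal lookup fails, so the private
        # fields set in __init__ never reach this method.
        if name.startswith("_"):
            raise AttributeError(name)
        if name not in self._cache:
            self._cache[name] = self._fetch(name)  # pull on demand
        return self._cache[name]


# Simulated input service that records which attributes were requested.
remote_email = {"subject": "Invoice", "body": "long text", "sender": "a@example.org"}
requested = []


def fetch_from_service(name):
    requested.append(name)
    return remote_email[name]


email = LazyRecord(fetch_from_service)
if "Invoice" in email.subject:   # the app only touches `subject`
    label = email.subject.upper()
```

In this sketch the simulated service only ever sees a request for `subject`; `body` and `sender` are never transmitted, which is the minimization effect that a push-all design cannot provide.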

Poster: Verifiable Encodings for Maliciously-Secure Homomorphic Encryption Evaluation
  • Sylvain Chatel
  • Christian Knabenhans
  • Apostolos Pyrgelis
  • Carmela Troncoso
  • Jean-Pierre Hubaux

Homomorphic encryption has become a promising solution for protecting the privacy of computations on sensitive data. However, existing homomorphic encryption pipelines do not guarantee the correctness of the computation result in the presence of a malicious adversary. In this poster, we present two encodings compatible with state-of-the-art fully homomorphic encryption schemes that enable practical client-verification of homomorphic computations, while enabling all the operations required for modern privacy-preserving analytics. Based on these encodings, we introduce a ready-to-use library for the verification of any homomorphic operation executed over encrypted data. We demonstrate its practicality for various applications and, in particular, we show that it enables verifiability of some homomorphic analytics with less than 3 times overhead compared to the homomorphic encryption baseline.

Poster: Circumventing the GFW with TLS Record Fragmentation
  • Niklas Niere
  • Sven Hebrok
  • Juraj Somorovsky
  • Robert Merget

State actors around the world censor the HTTPS protocol to block access to certain websites. While many circumvention strategies utilize the TCP layer, only little emphasis has been placed on the analysis of TLS, a complex protocol and integral building block of HTTPS. In contrast to the TCP layer, circumvention methods on the TLS layer do not require root privileges since TLS operates on the application layer. With this proposal, we want to motivate a deeper analysis of TLS in regard to censorship circumvention techniques. To prove the existence of such techniques, we present TLS record fragmentation as a novel circumvention technique and circumvent the Great Firewall of China (GFW) using this technique. We hope that our research fosters collaboration between censorship and TLS researchers.
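
As a rough illustration of what fragmentation at the record layer means, the sketch below splits one plaintext TLS record (5-byte header: content type, protocol version, length) into several smaller records that together carry the same payload. The function name `fragment_tls_record` and the chunk size are assumptions made for this example; the abstract does not specify how the authors fragment records or which sizes they use.

```python
import struct


def fragment_tls_record(record: bytes, chunk_size: int) -> list[bytes]:
    """Split one plaintext TLS record into smaller records with the same payload.

    Hypothetical helper for illustration. Each output record reuses the
    original content type and version and gets its own length field.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if len(record) < 5:
        raise ValueError("record shorter than the 5-byte TLS header")

    # Header layout: 1 byte content type, 2 bytes version, 2 bytes length.
    content_type, version, length = struct.unpack("!BHH", record[:5])
    payload = record[5:5 + length]
    if len(payload) != length:
        raise ValueError("truncated record")

    fragments = []
    for offset in range(0, length, chunk_size):
        chunk = payload[offset:offset + chunk_size]
        header = struct.pack("!BHH", content_type, version, len(chunk))
        fragments.append(header + chunk)
    return fragments


# A toy handshake record (type 0x16, version 0x0301) with a 6-byte payload.
toy_record = b"\x16\x03\x01\x00\x06" + b"abcdef"
pieces = fragment_tls_record(toy_record, 4)
```

A receiving TLS stack reassembles the handshake message from consecutive records, whereas a middlebox that inspects only the first record would see an incomplete message.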

Poster: Generating Experiences for Autonomous Network Defense
  • Andres Molina-Markham
  • Luis F. Robaina
  • Akash H. Trivedi
  • Derek G. Tsui
  • Ahmad Ridley

Reinforcement Learning (RL) offers a promising path toward developing defenses for the next generation of computer networks. The hope is that RL not only helps to automate network defenses, but in addition, RL finds novel solutions to defend networks that adapt to deal with the increasing complexity of networks and threats. Despite the promise, existing work applying RL to cybersecurity trains cyber defenders on rigid and narrow problem definitions with small computer networks.

Inspired by research that demonstrates that open-ended learning helps agents to adapt rapidly and to generalize to tasks never seen before, we hypothesize that similar approaches can offer a path toward practical RL for network defense. We provide evidence to support this hypothesis. A key aspect to enable generalizable learning is our approach for generating experiences for the learning agent, based on a universe of tasks, in a manner that allows the agent to learn to defend increasingly more complex networks. We show that RL agents can learn to master a reasonably complex network defense task by learning to solve tasks with varying degrees of difficulty. Our preliminary results show that in addition to contributing to the feasibility of mastering complex tasks, this type of experience generation may result in more robust policies.

Overall, our research demonstrates that the collection of experiences that we present to the learning agent is a critical aspect for achieving high performance. We share with the research community our approaches for (i) defining distributions over network defense tasks; (ii) updating distributions as the agent learns; and (iii) maintaining key aspects of tasks invariant to preserve knowledge as tasks vary.

Our experiments are enabled by the second version of our Framework for Advanced Reinforcement Learning for Autonomous Network Defense (FARLAND), which integrates support for action representations, dynamic task selection, and validation of policies in simulation and emulation. Our hope is that by sharing our ideas and results, we foster collaborations and innovation toward the creation of increasingly sophisticated gyms to train network defenders.

Poster: From Hashes to Ashes - A Comparison of Transcription Services
  • Rudolf Siegel
  • Rafael Mrowczynski
  • Maria Hellenthal
  • Michael Schilling

In recent years, semi-structured interviews gained more and more importance in cyber security research. Transcribing audio recordings of such interviews is a crucial step in qualitative data analysis, but it is also a work-intensive and time-consuming task. While outsourcing presents a common option, maintaining research quality requires precise transcriptions -- a task further compounded by technical jargon and established expressions in the research field. In this study, we compare different transcription services and evaluate their outcome quality within the context of cyber security. Our findings provide insights for researchers navigating the complex landscape of transcription services, offering informed choices to enhance the accuracy and validity of qualitative data analysis.

Poster: Mujaz: A Summarization-based Approach for Normalized Vulnerability Description
  • Hattan Althebeiti
  • Brett Fazio
  • William Chen
  • David Mohaisen

This work proposes a multi-task Natural Language Processing (NLP) system to normalize and summarize vulnerability descriptions into a uniform structure. A dataset was curated from an official public database and broken into several constituent entities, each representing a particular aspect of the description. A model is trained on the annotated features independently and jointly to generate a simple and uniform summary. We also introduce our human metrics to judge the quality of the generated summary with respect to human comprehension and content accuracy.

Poster: Boosting Adversarial Robustness by Adversarial Pre-training
  • Xiaoyun Xu
  • Stjepan Picek

Vision Transformer (ViT) shows superior performance on various tasks, but, similar to other deep learning techniques, it is vulnerable to adversarial attacks. Due to the differences between ViT and traditional CNNs, previous works designed new adversarial training methods as defenses according to the design of ViT, such as blocking attention to individual patches or dropping embeddings with low attention. However, these methods usually focus on the fine-tuning stage or the training of the model itself. Improving robustness at the pre-training stage, especially with lower overhead, has yet to be thoroughly investigated. This paper proposes a novel method, Adv-MAE, which increases adversarial robustness by masked adversarial pre-training without a penalty to performance on clean data. We design a simple method to generate adversarial perturbation for the autoencoder, as the autoencoder does not provide classification results. Then, we use masked inputs with perturbation to conduct adversarial training for the autoencoder. The pre-trained autoencoder can be used to build a ViT with better robustness. Our experimental results show that, when using adversarial fine-tuning, Adv-MAE offers better accuracy under adversarial attack than the non-adversarial pre-training method (3.46% higher on CIFAR-10, 1.12% higher on Tiny ImageNet). It also shows better accuracy on clean data (4.94% higher on CIFAR-10, 1.74% higher on Tiny ImageNet), meaning Adv-MAE does not deteriorate performance on clean inputs. In addition, masked pre-training also shows much lower time consumption at each training epoch.

Poster: Vulcan -- Repurposing Accessibility Features for Behavior-based Intrusion Detection Dataset Generation
  • Christian van Sloun
  • Klaus Wehrle

The generation of datasets is one of the most promising approaches to collecting the necessary behavior data to train machine learning models for host-based intrusion detection. While various dataset generation methods have been proposed, they are often limited and either only generate network traffic or are restricted to a narrow subset of applications. We present Vulcan, a preliminary framework that uses accessibility features to generate datasets by simulating user interactions for an extendable set of applications. It uses behavior profiles that define realistic user behavior and facilitate dataset updates upon changes in software versions, thus reducing the effort required to keep a dataset relevant. Preliminary results show that using accessibility features presents a promising approach to improving the quality of datasets in the HIDS domain.

Poster: Computing the Persistent Homology of Encrypted Data
  • Dominic Gold
  • Koray Karabina
  • Francis Motta

Topological Data Analysis (TDA) offers a suite of computational tools that provide quantified shape features of high dimensional data that can be used by modern statistical and predictive machine learning (ML) models. Persistent homology (PH) transforms data (e.g., point clouds, images, time series) into persistence diagrams (PDs), compact representations of its latent topological structures. Because PDs enjoy inherent noise tolerance, are interpretable, provide a solid basis for data analysis, and can be made compatible with the expansive set of well-established ML model architectures, PH has been widely adopted for model development including on sensitive data. Thus, TDA should be incorporated into secure end-to-end data analysis pipelines. This paper introduces a version of the fundamental algorithm to compute PH on encrypted data using homomorphic encryption (HE).

Poster: Attestor -- Simple Proof-of-Storage-Time
  • Arup Mondal

Proof of Storage-Time (PoST) is a cryptographic primitive that enables a server to demonstrate non-interactive continuous availability of outsourced data in a publicly verifiable way.

In this work, we propose Attestor, a stateless transparent proof-of-storage-time scheme with simple proofs and efficient public verification of outputs, without using trapdoors or incurring any extra overhead in the setup phase. We design our PoST protocol, Attestor, using a standard VDF scheme with proof aggregation.

Poster: Query-efficient Black-box Attack for Image Forgery Localization via Reinforcement Learning
  • Xianbo Mo
  • Shunquan Tan
  • Bin Li
  • Jiwu Huang

Recently, deep learning has been widely used in forensics tools to detect and localize forgery images. However, its susceptibility to adversarial attacks highlights the need for the exploration of anti-forensics research. To achieve this, we introduce an innovative and query-efficient black-box anti-forensics framework tailored for the generation of adversarial forgery images. This framework is designed to simulate the query dynamics of online forensic services, utilizing a Markov Decision Process formulation within the paradigm of reinforcement learning. We further introduce a novel reward function, which evaluates the efficacy of attacks based on the disjunction between query results and attack targets. To improve the query efficiency of these attacks, an actor-critic algorithm is employed to maximize cumulative rewards. Empirical findings substantiate the efficacy of our proposed methodology. Specifically, it demonstrates pronounced adversarial effects on a range of prevailing image forgery detectors, while ensuring negligible visually perceptible distortions in the resultant anti-forensics images.

Poster: Membership Inference Attacks via Contrastive Learning
  • Depeng Chen
  • Xiao Liu
  • Jie Cui
  • Hong Zhong

Since a machine learning model is often trained on a limited data set, the model is trained multiple times on the same data sample, which causes the model to memorize most of the training set data. Membership Inference Attacks (MIAs) exploit this feature to determine whether a data sample is used for training a machine learning model. However, in realistic scenarios, it is difficult for the adversary to obtain enough qualified samples that mark accurate identity information, especially since most samples are non-members in real world applications. To address this limitation, in this paper, we propose a new attack method called CLMIA, which uses unsupervised contrastive learning to train an attack model. Meanwhile, in CLMIA, we require only a small amount of data with known membership status to fine-tune the attack model. We evaluated the performance of the attack using ROC curves showing a higher TPR at low FPR compared to other schemes.
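
The "TPR at low FPR" measure mentioned above can be computed from attack scores as in the following sketch. It is a generic evaluation helper, not CLMIA's code; the function name `tpr_at_fpr`, the convention that a higher score means "more likely a member", and the toy scores are all assumptions for illustration.

```python
import math


def tpr_at_fpr(member_scores, nonmember_scores, max_fpr):
    """True-positive rate at the most permissive threshold whose FPR <= max_fpr.

    Assumes higher scores indicate membership; a sample is flagged as a
    member when its score is strictly greater than the threshold.
    """
    ranked = sorted(nonmember_scores, reverse=True)
    allowed_false_positives = math.floor(max_fpr * len(ranked))

    if allowed_false_positives >= len(ranked):
        threshold = -math.inf  # every non-member may be misclassified
    else:
        # At most `allowed_false_positives` non-members score above this value.
        threshold = ranked[allowed_false_positives]

    hits = sum(1 for score in member_scores if score > threshold)
    return hits / len(member_scores)


# Toy attack scores, invented for this example.
members = [0.99, 0.97, 0.6, 0.3]
nonmembers = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95]

strict = tpr_at_fpr(members, nonmembers, 0.0)   # no false positives allowed
loose = tpr_at_fpr(members, nonmembers, 0.5)    # half the non-members may be flagged
```

Reporting the value at a small `max_fpr` emphasizes how many members an attack identifies confidently, which a single average accuracy figure can hide.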

Poster: Ethics of Computer Security and Privacy Research - Trends and Standards from a Data Perspective
  • Kevin Li
  • Zhaohui Wang
  • Ye Wang
  • Bo Luo
  • Fengjun Li

Ethics is an important criterion for security research. This work presents the current status and trends that security researchers have taken to address ethical concerns in their studies from a data perspective. In particular, we created a dataset of 3,756 papers published in three top-tier conferences between 2010 and 2022, among which 963 papers were identified with ethical concerns. With this dataset, we provided answers to three questions regarding the current practices and trends: (1) What is the landscape of ethical considerations in security research? For example, how many security research projects have raised ethical concerns in their studies, and which research areas are likely to cause ethical risks and concerns? (2) What are the current practices to address these ethical risks? And (3) What are the important factors impacting the ethical awareness of researchers?

Poster: RPAL-Recovering Malware Classifiers from Data Poisoning using Active Learning
  • Shae McFadden
  • Zeliang Kan
  • Lorenzo Cavallaro
  • Fabio Pierazzi

Intuitively, poisoned machine learning (ML) models may forget their adversarial manipulation via retraining. However, can we quantify the time required for model recovery? From an adversarial perspective, is a small amount of poisoning sufficient to force the defender to retrain significantly more over time?

This poster paper proposes RPAL, a new framework to answer these questions in the context of malware detection. To quantify recovery, we propose two new metrics: intercept, i.e., the first time in which the poisoned model's and vanilla model's performance intercept; recovery rate, i.e., the percentage of time after intercept that the poisoned model's performance is within a tolerance margin which approximates the vanilla model's performance. We conduct experiments on an Android malware dataset (2014-2016), with two feature abstractions based on Drebin and MaMaDroid, with uncertainty-sampling active learning (retraining), and label flipping (poisoning). We utilize the introduced parameter and metrics to demonstrate (i) how the active learning and poisoning rates impact recovery and (ii) that feature representation impacts recovery.
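
The two metrics can be made concrete with a short sketch over per-period performance scores (for example, F1 per test month). The abstract leaves several details open, so the following choices are assumptions: "intercept" is taken as the first period in which the poisoned model's score reaches or exceeds the vanilla model's, "within a tolerance margin" as an absolute difference of at most `tolerance`, and the intercept period itself is counted. The function name and the numbers are hypothetical.

```python
def intercept_and_recovery_rate(vanilla, poisoned, tolerance):
    """Return (intercept index, recovery rate in percent) for two score series.

    Assumptions for illustration: intercept = first period where the poisoned
    score >= vanilla score; a period counts as recovered when the absolute
    difference between both scores is at most `tolerance`.
    """
    if len(vanilla) != len(poisoned):
        raise ValueError("both series must cover the same periods")

    intercept = next(
        (t for t, (v, p) in enumerate(zip(vanilla, poisoned)) if p >= v),
        None,
    )
    if intercept is None:
        return None, None  # the poisoned model never catches up

    # Periods from the intercept onward (intercept included).
    tail = list(zip(vanilla[intercept:], poisoned[intercept:]))
    recovered = sum(1 for v, p in tail if abs(p - v) <= tolerance)
    return intercept, 100.0 * recovered / len(tail)


# Invented scores for five retraining periods.
vanilla_scores = [0.90, 0.90, 0.90, 0.90, 0.90]
poisoned_scores = [0.70, 0.80, 0.90, 0.85, 0.89]

first_meet, rate = intercept_and_recovery_rate(vanilla_scores, poisoned_scores, 0.02)
```

With these toy values the curves first meet in period 2, and two of the three periods from that point on stay within the margin.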

Poster: Combining Fuzzing with Concolic Execution for IoT Firmware Testing
  • Jihyeon Yu
  • Juhwan Kim
  • Yeohoon Yun
  • Joobeom Yun

The supply of IoT devices is increasing year by year. Even in industries that demand sophistication, such as unmanned driving, construction, and robotics, IoT devices are being utilized. However, the security of IoT devices is lagging behind this development due to their diverse types and challenging firmware execution environments. Existing methods, such as direct device connectivity or partial emulation, are used to address this. However, full system emulation is better suited for large-scale analysis, because it can test many firmware images without requiring devices. Therefore, recent studies have integrated emulation and software testing techniques such as fuzzing, but they are still inefficient and unsuitable for testing various firmware. In this poster, we propose FirmColic, which combines fuzzing with concolic execution to mitigate these limitations. FirmColic is a type of augmented process emulation, which improves the effectiveness of fuzzing using keyword extraction based on concolic execution. Also, we apply five arbitration techniques in an augmented process emulation environment for high emulation success rates. We prove that FirmColic has faster detection, more crash detection, and a higher code coverage than the previous studies.

Poster: Efficient AES-GCM Decryption Under Homomorphic Encryption
  • Ehud Aharoni
  • Nir Drucker
  • Gilad Ezov
  • Eyal Kushnir
  • Hayim Shaul
  • Omri Soceanu

Computation delegation to an untrusted third party while maintaining data confidentiality is possible with homomorphic encryption (HE). However, in many cases, the data was encrypted using another cryptographic scheme such as AES-GCM. Hybrid encryption (a.k.a. transciphering) is a technique that allows moving between cryptosystems, which currently has two main drawbacks: 1) lack of standardization or bad performance of symmetric decryption under FHE; 2) lack of input data integrity.

We report the first implementations of AES-GCM decryption under CKKS, which is the fastest implementation of standardized and commonly used symmetric encryption under homomorphic encryption that also provides integrity. Our solution opens the door to end-to-end implementations such as encrypted deep neural networks while relying on AES-GCM encrypted input.

Poster: Multi-target & Multi-trigger Backdoor Attacks on Graph Neural Networks
  • Jing Xu
  • Stjepan Picek

Recent research has indicated that Graph Neural Networks (GNNs) are vulnerable to backdoor attacks, and existing studies focus on the One-to-One attack where there is a single target triggered by a single backdoor. In this work, we explore two advanced backdoor attacks, i.e., the multi-target and multi-trigger backdoor attacks, on GNNs: 1) One-to-N attack, where there are multiple backdoor targets triggered by controlling different values of the trigger; 2) N-to-One attack, where the attack is only triggered when all the N triggers are present. The initial experimental results illustrate that both attacks can achieve a high attack success rate (up to 99.72%) on GNNs for the node classification task.

Poster: Longitudinal Analysis of DoS Attacks
  • Fabian Kaiser
  • Haya Shulman
  • Michael Waidner

Denial-of-Service (DoS) attacks have become a regular occurrence in the digital world of today. Easy-to-use attack software via download and botnet services that can be rented cheaply in the darknet enable adversaries to conduct such attacks without requiring a comprehensive knowledge of the techniques.

To investigate this threat, we conduct a study on DoS attacks that occurred between 1 January 2015 and 31 December 2022. We gather statistics regarding the victims and on how the attacks were conducted. Furthermore, we show possible side effects of such attacks on critical Internet infrastructure.

This study provides interesting insights as well as observations and is useful for researchers and experts for developing defenses to mitigate DoS attacks. We therefore make our dataset publicly available.

Poster: The Risk of Insufficient Isolation of Database Transactions in Web Applications
  • Simon Koch
  • Malte Wessels
  • David Klein
  • Martin Johns

Web applications utilizing databases for persistence frequently expose security flaws due to race conditions. The commonly accepted remedy to this problem is to envelope related database operations in transactions. Unfortunately, sole trust in transactions to isolate competing sets of database interactions is often misplaced. While the precise isolation properties of transactions depend on the configuration of the database management system (DBMS), the default configuration of common DBMS exposes transactions to anomalies that render their protection worthless.

We give a comprehensive overview on the behavior of common DBMSes with respect to transactions and show that their default settings are insufficient to provide comprehensive protection. Furthermore, we conduct a preliminary study on how commonly transactions and isolation configuration adjustments are deployed across 4,222 open source PHP applications that use SQL, finding 2,789 transactions and only 418 isolation adjustment indicators.

Our findings indicate that race conditions are an underappreciated vulnerability class and adjustments are too rare for transactions to reliably provide sufficient protection.

Poster: Privacy Risks from Misconfigured Android Content Providers
  • Christopher Lenk
  • Johannes Kinder

Android applications record and process personal user data, and they can share it among each other through content providers. While the access is protected through multiple mechanisms, unintentional misconfigurations can allow an attacker to access or modify private application data. In this work, we study how content providers protect private data in a systematic study on 14.4 million Android apps. We identify potentially vulnerable apps by using static analysis to successively reduce the set of target apps. Using a custom attack app, we can confirm data leakage in practice and successfully access privacy-sensitive information. We conclude that this points to an inherent problem in designing secure Android applications and discuss possible mitigations.

Poster: Bridging Trust Gaps: Data Usage Transparency in Federated Data Ecosystems
  • Johannes Lohmöller
  • Eduard Vlad
  • Markus Dahlmanns
  • Klaus Wehrle

The evolving landscape of data ecosystems (DEs) increasingly demands integrated and collaborative data-sharing mechanisms that simultaneously ensure data sovereignty. However, recently proposed federated platforms, e.g., Gaia-X, only offer a promising solution to share data among already trusted participants; they still lack features to establish and maintain trust. To address this issue, we propose transparency logs for data usage that retrospectively build trust among participants. Inspired by certificate transparency logs that successfully bridge trust gaps in PKIs, we equip data owners with credible evidence of data usage. We show that our transparency logs for data usage are well scalable to sizable DEs. Thus, they are a promising approach to bridge trust gaps in federated DEs with cryptographic guarantees, fostering more robust data sharing.

Poster: Panacea --- Stateless and Non-Interactive Oblivious RAM
  • Kelong Cong
  • Debajyoti Das
  • Georgio Nicolas
  • Jeongeun Park

Oblivious RAM (ORAM) allows a client to outsource database storage to a remote server while hiding the data access pattern. Existing designs use non-linear data structures (e.g., trees or hierarchical structures) and follow an online-offline paradigm. Clients submit their queries in the online phase and then the queries are "flushed" in the offline (eviction) phase. Such designs are interactive, requiring more than one round of client-server communication, be it during the online, offline, or both phases. Moreover, the client has to maintain an internal state which depends on the database state.

We present Panacea: a novel ORAM design based on FHE techniques that uses a linear data structure while being stateless and non-interactive, achieving O(1) bandwidth blowup, and requiring no offline phase. In our design, servers are assumed to be much more resourceful than clients, which is typically the case in the cloud computing landscape. In that sense, we offload all the computational overhead to the server, and only ask our clients to perform encryption and decryption. In terms of interaction, this new paradigm is almost identical to that of any plaintext cloud storage solution. This allows ORAM to be a privacy-enhancing drop-in solution for remote storage services such as storage buckets, password managers, etc. Building on top of our simple design, we show how to boost the server performance by three orders of magnitude in the amortized setting using probabilistic batch codes.

Poster: Backdoor Attack on Extreme Learning Machines
  • Behrad Tajalli
  • Gorka Abad
  • Stjepan Picek

Deep neural networks (DNNs) achieve top performance through costly training on large datasets. Such resources may not be available in some scenarios, like IoT or healthcare. Extreme learning machines (ELMs) aim to alleviate this problem using single-layered networks, requiring fewer training resources. Current investigations have found that DNNs are prone to security and privacy threats, where malfunction of the network or training data extraction can be performed.

Due to the increasing attention to ELMs and their lack of security investigations, we research the security implications of this type of network. Precisely, we investigate backdoor attacks in ELMs. We created a comprehensive experimental setup to evaluate their security in various datasets and scenarios. We conclude that ELMs are vulnerable to backdoor attacks with up to 97% attack success rate. Additionally, we adapt and evaluate the usage of fine-pruning to ELMs.

Poster: Accountable Processing of Reported Street Problems
  • Roman Matzutt
  • Jan Pennekamp
  • Klaus Wehrle

Municipalities increasingly depend on citizens to file digital reports about issues such as potholes or illegal trash dumps to improve their response time. However, the responsible authorities may be incentivized to ignore certain reports, e.g., when addressing them inflicts high costs. In this work, we explore the applicability of blockchain technology to hold authorities accountable regarding filed reports. Our initial assessment indicates that our approach can be extended to benefit citizens and authorities in the future.

Poster: WIP: Account ZK-Rollups from Sumcheck Arguments
  • Rex Fernando
  • Arnab Roy

Traditional blockchains execute transactions through the use of consensus mechanisms that guarantee reasonable expectations of integrity and finality. However, these often incur costs that make transaction throughput orders of magnitude lower than that of their web2 counterparts. To address this drawback, so-called L2 layers offload transactions from the main chain, called L1, execute them fast, and anchor the results of these transactions back through succinct checkpoints. ZK-Rollups are emerging as compelling methods to establish the integrity of such checkpoints through the use of compressed cryptographic proofs.

However, the use of general-purpose SNARKs still suffers from efficiency issues. We define a simple but ubiquitous account integrity constraint over a series of transactions and design a ZK-Rollup for transaction integrity using sumcheck arguments that exploits its simple structure and provides attractive efficiency benefits.

Poster: Signer Discretion is Advised: On the Insecurity of Vitalik's Threshold Hash-based Signatures
  • Mario Yaksetig
  • Alexander Havlin

We show that the Lamport threshold signature scheme proposed by Vitalik Buterin is not existentially unforgeable under chosen message attacks (EU-CMA). In this work, we formalize the proposed threshold hash-based signature scheme, and show an attack that results in a 60-bit security reduction. Our attack completes in seconds in a setting with a single malicious adversary (the leader of a consensus round), thus contradicting the claim that even with 96 malicious colluding participants (out of a total of 256), an adversary can only make a signature for approximately 1 in 2^80 possible values. In summary, the original estimated security analysis of the proposed threshold signature scheme claimed security against an adversary in control of approximately a year of continuous work from the entire bitcoin network. Our attack, however, runs in seconds using a commodity laptop.
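
For readers unfamiliar with hash-based signatures, the following is a minimal sketch of the classical one-time Lamport scheme on which such constructions build. It is not the threshold variant analyzed in this poster, and it does not reproduce the attack; the function names (`keygen`, `sign`, `verify`) and the choice of SHA-256 are assumptions made for illustration only.

```python
import hashlib
import secrets


def H(data: bytes) -> bytes:
    """Hash function used throughout (SHA-256 is an illustrative choice)."""
    return hashlib.sha256(data).digest()


def keygen(bits: int = 256):
    """Generate a one-time key pair: two random preimages per message bit."""
    sk = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(bits)]
    pk = [(H(a), H(b)) for a, b in sk]
    return sk, pk


def message_bits(msg: bytes, bits: int = 256):
    """Hash the message and return its digest as a list of 0/1 values."""
    digest = int.from_bytes(H(msg), "big")
    return [(digest >> i) & 1 for i in range(bits)]


def sign(sk, msg: bytes):
    """Reveal, for each digest bit, the preimage selected by that bit."""
    return [sk[i][b] for i, b in enumerate(message_bits(msg, len(sk)))]


def verify(pk, msg: bytes, sig) -> bool:
    """Check that every revealed preimage hashes to the matching public value."""
    bits = message_bits(msg, len(pk))
    return len(sig) == len(pk) and all(
        H(s) == pk[i][b] for i, (s, b) in enumerate(zip(sig, bits))
    )
```

Each key pair must sign at most one message: every signature reveals half of the secret preimages, so signing twice with the same key lets a forger mix revealed values.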

Poster: Longitudinal Measurement of the Adoption Dynamics in Apple's Privacy Label Ecosystem
  • David G. Balash
  • Mir Masood Ali
  • Monica Kodwani
  • Xiaoyuan Wu
  • Chris Kanich
  • Adam J. Aviv

This work reports on a large-scale, longitudinal analysis of the adoption dynamics of privacy labels in the iOS App Store, measuring this first-of-its-kind ecosystem as it reaches maturity over two and a half years after launching in December 2020. The motivation is to shed light on the factors affecting the shifts in privacy labels and provide insights into how and when an app's label changes. By collecting nearly weekly snapshots of over 1.6 million apps for over a year, we analyze the dynamics of privacy label adoption and the accuracy of reported labels. Our analysis of 74.5% of apps having labels after two years provides important context into this mature ecosystem where labels are becoming the standard. However, we find compelling evidence that labels may not fully capture behavior, as 28.9% of apps indicate no data collection and distributions differ between voluntary versus mandatory adoptions. Once set, labels rarely change but additions reflect more data collection. In addition to our measurement, we also plan to release a new (and growing) data set that can be used by future researchers.

Poster: Towards a Dataset for the Discrimination between Warranted and Unwarranted Emails
  • Eric Burton Samuel Martin
  • Hossein Shirazi
  • Indrakshi Ray

In this research, the prevailing issue we address is the over-generalized perspective of spam/ham (non-spam) classification. Despite the intricacies of spam classification, reliance on user feedback may inadvertently skew filters to misclassify legitimate and malicious emails, as users are prone to flag innocuous commercial mail as spam rather than unsubscribing. Current spam datasets have a propensity to include such user-flagged spam, which can lead to further misclassification and to filters biased against warranted commercial correspondence. Motivated to address this concern, we introduce two new classification categories that delve deeper into the nuances of spam. 'Warranted spam' refers to consensual communications from a credible source with transparent and safe opt-out mechanisms, and 'unwarranted spam' describes unsolicited messages, often of a malicious nature. Utilizing these classifications, we propose an innovative and dynamic 'warranted spam' dataset that seeks to pave the way for researchers to develop more sophisticated spam filtering techniques. Furthermore, our study delves into pioneering machine learning and natural language processing approaches, harnessing our dataset's potential. The overarching aspiration of our work is to augment online safety, preserve brand integrity, and optimize both the user experience and the efficacy of email marketing campaigns.

Poster: Cybersecurity Usage in the Wild: A look at Deployment Challenges in Intrusion Detection and Alert Handling
  • Wyatt Sweat
  • Danfeng (Daphne) Yao

We examine the challenges cybersecurity practitioners face during their daily activities, employing a survey and semi-directed interview for data gathering. Practitioners report on the frequency and level of threats as well as other factors like burnout. These factors are observed to vary with organization size and field (e.g. Medical, E-commerce).

Poster: Towards Lightweight TEE-Assisted MPC
  • Wentao Dong
  • Cong Wang

This work presents HPCG (short for hardware-assisted pseudorandom correlation generator), a work-in-progress lightweight TEE (LTEE)-assisted MPC solution for both high performance and strong security. HPCG relies on a succinct codebase and small LTEE chips that work only in the MPC offline phase, with the aim of addressing efficiency bottlenecks in traditional MPC, while minimizing the use and trust in secure hardware, which makes a rational compromise between pure cryptography and TEE techniques. We design HPCG to work for diverse MPC settings in the preprocessing model and conform to the mainstream secret-sharing semantics, making it easy to deploy and integrate into existing MPC practices.

Poster: Fooling XAI with Explanation-Aware Backdoors
  • Maximilian Noppel
  • Christian Wressnegger

The overabundance of learnable parameters in recent machine-learning models renders them inscrutable. Even their developers cannot explain their exact inner workings anymore. For this reason, researchers have developed explanation algorithms to shed light on a model's decision-making process. Explanations identify the deciding factors for a model's decision. Therefore, much hope is set in explanations to solve problems like biases, spurious correlations, and more prominently attacks like neural backdoors.

In this paper, we present explanation-aware backdoors, which fool both the model's decisions and the explanation algorithm in the presence of a trigger. Explanation-aware backdoors therefore can bypass explanation-based detection techniques and "throw a red herring" at the human analyst. While we have presented successful explanation-aware backdoors in our original work, "Disguising Attacks with Explanation-Aware Backdoors," in this paper, we provide a brief overview and a focus on the dataset "German Traffic Sign Recognition Benchmark" (GTSRB). We evaluate a different trigger and target explanation compared to the original paper and present results for GradCAM explanations. Supplemental material is publicly available at https://intellisec.de/research/xai-backdoor.

Poster: Metadata-private Messaging without Coordination
  • Peipei Jiang
  • Qian Wang
  • Yihao Wu
  • Cong Wang

Metadata-private messaging (MPM) refers to an end-to-end encrypted messaging system that protects not just the payload messages but also the privacy-revealing communication metadata, such as user identities, conversation frequencies, traffic volumes, etc. Protecting the communication metadata is challenging due to the existence of global adversaries that can monitor and even actively interfere with the traffic. Established systems like Tor are not adequate under such adversarial models. Thus, many academic systems have been proposed to push this frontier with different trade-offs among security, performance, and trust assumptions. Despite progress, one major limitation prevalent in almost all prior art is the requirement for messaging buddies to coordinate the time (also known as "dialing") to start the conversation. Compared to traditional messaging systems, such coordination protocols, which must also be metadata private, are expensive for both user adoption and service operations. In this ongoing study, we propose to develop a new MPM system without coordination. Unlike prior art, we plan to model the MPM system into two separate modules: metadata-private notifications and metadata-private message retrieval, which is intuitively inspired by traditional messaging systems. We will instantiate these ideas by drawing insights from recent work about private signaling, oblivious message retrieval, and MPM under hardware trust.

Poster: Control-Flow Integrity in Low-end Embedded Devices
  • Sashidhar Jakkamsetti
  • Youngil Kim
  • Andrew Searles
  • Gene Tsudik

Embedded, smart, and IoT devices are increasingly popular in numerous everyday settings. Since lower-end devices have the most strict cost constraints, they tend to have few, if any, security features. This makes them attractive targets for exploits and malware.

Prior research proposed various security architectures for enforcing security properties for resource-constrained devices, e.g., via Remote Attestation (ℜA). Such techniques can (statically) verify software integrity of a remote device and detect compromise. However, run-time (dynamic) security, e.g., via Control-Flow Integrity (CFI), is hard to achieve.

This work constructs an architecture that ensures integrity of software execution against run-time attacks, such as Return-Oriented Programming (ROP). It is built atop the recently proposed CASU, a low-cost active Root-of-Trust (RoT) that guarantees software immutability. We extend CASU to support a shadow stack and a CFI monitor to mitigate run-time attacks. This gives some confidence that CFI can indeed be attained even on low-end devices, with minimal hardware overhead.
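
The shadow-stack idea mentioned above can be illustrated with a small software model. This is only a toy sketch under stated assumptions: the poster describes a hardware architecture, whereas the class and method names here (`ShadowStackMonitor`, `on_call`, `on_return`) are hypothetical and exist purely to show the check that a return address must match the one recorded at call time.

```python
class ControlFlowViolation(Exception):
    """Raised when a return does not match the recorded call site."""


class ShadowStackMonitor:
    """Toy model: keeps a protected copy of every return address."""

    def __init__(self) -> None:
        self._shadow: list[int] = []

    def on_call(self, return_address: int) -> None:
        # Record the legitimate return address when a call is made.
        self._shadow.append(return_address)

    def on_return(self, return_address: int) -> None:
        # A return is valid only if it targets the most recently recorded
        # address; a mismatch suggests the main stack was overwritten.
        if not self._shadow or self._shadow.pop() != return_address:
            raise ControlFlowViolation(
                f"unexpected return target {return_address:#x}"
            )
```

A ROP-style attack overwrites return addresses on the regular stack; because the monitor compares against its own copy, the first corrupted return raises `ControlFlowViolation` instead of transferring control.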

Poster: Generic Multidimensional Linear Cryptanalysis of Feistel Ciphers
  • Betül Askin Özdemir
  • Tim Beyne

This poster presents new generic attacks on Feistel ciphers that incorporate the key addition at the input of the round function only. This feature leads to a specific vulnerability that can be exploited using multidimensional linear cryptanalysis. More specifically, our approach involves using key-independent linear trails so that the distribution of a combination of the plaintext and ciphertext can be computed, making it possible to use the likelihood-ratio test as a distinguisher. We provide theoretical estimates of the cost of our generic attacks, and verify these experimentally by applying the attacks to CAST-128 and LOKI91. The theoretical and experimental findings demonstrate that the proposed attacks lead to significant reductions in data or time complexity in several interesting cases.

Poster: Secure and Differentially Private kth Ranked Element
  • Gowri R Chandran
  • Philipp-Florens Lehwalder
  • Leandro Rometsch
  • Thomas Schneider

The problem of finding the kth Ranked Element (KRE) is of particular interest in collaborative studies for financial and medical agencies alike. Many of the applications of KRE deal with sensitive information that needs to be protected. The protocol by Chandran et al. (SECRYPT'22) considers a model where multiple parties hold datasets with many elements and wish to compute the kth element of their joint dataset. In their model, all participating parties interact with a central party in a star network topology. However, they leak some intermediate information to the central party.

In this work we use differential privacy techniques to hide this leakage. We use the Laplace mechanism for introducing differentially private noise and use sigmoid scaling to improve the accuracy of the protocol. We show that our modifications have only a small impact on the accuracy. We also give experimental performance results and compare our work to the previous works on KRE.
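
As a rough illustration of the Laplace mechanism named above, the sketch below adds Laplace noise with scale sensitivity/epsilon to a numeric value. The function name `laplace_mechanism` and its parameters are assumptions for this example; the poster's actual query, sensitivity analysis, and sigmoid scaling are not reproduced here, and a production implementation would need additional care with floating-point sampling.

```python
import random


def laplace_mechanism(true_value: float, sensitivity: float,
                      epsilon: float, rng: random.Random) -> float:
    """Return true_value plus Laplace(0, sensitivity / epsilon) noise."""
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    scale = sensitivity / epsilon
    # The difference of two i.i.d. exponential samples with mean `scale`
    # follows a Laplace distribution with location 0 and that scale.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_value + noise
```

A smaller epsilon gives a larger noise scale and therefore stronger privacy at the cost of accuracy, which is the trade-off the accuracy evaluation in the abstract refers to.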

Poster: Towards Practical Brainwave-based User Authentication
  • Matin Fallahi
  • Patricia Arias-Cabarcos
  • Thorsten Strufe

Brainwave measuring devices have transitioned from specialized medical tools to user-friendly and economically accessible consumer products. This shift has opened new avenues for pervasive services, with applications spanning brain-computer interfaces (BCIs), disease detection, criminal trials, and, notably, authentication in computer security. Electroencephalography (EEG) signals, being difficult to steal and revocable, present an attractive biometric option. However, the practical deployment of these signals is hindered by security threats, usability issues, and privacy concerns. To this end, we expect to improve the overall performance of authentication systems using consumer-grade devices, gain a better understanding of user attitudes toward this type of authentication, and protect the user's privacy against unauthorized use of samples collected during enrollment and verification.

Poster: A Privacy-Preserving Smart Contract Vulnerability Detection Framework for Permissioned Blockchain
  • Wensheng Tian
  • Lei Zhang
  • Shuangxi Chen
  • Hu Wang
  • Xiao Luo

The two main types of blockchains that are currently widely deployed are public blockchains and permissioned blockchains. The research that has been conducted for blockchain vulnerability detection is mainly oriented to public blockchains. Less consideration is given to the unique requirements of the permissioned blockchains, which cannot be directly migrated to the application scenarios of the permissioned blockchains. The permissioned blockchain is deployed between verified organizations, and its smart contracts may contain sensitive information such as the transaction flow of the contracts, transaction algorithms, etc. The sensitive information can be considered as the private information of the smart contracts themselves, which should be kept confidential to users outside the blockchain. In this paper, a privacy-preserving smart contract vulnerability detection framework is proposed. The framework leverages blockchain and confidential computing technologies to enable vulnerability detection in permissioned blockchain smart contracts while protecting the privacy of smart contracts. The framework is also able to protect the interests of vulnerability detection model owners. We experimentally validate the detection performance of our framework in a confidential computing environment.

Poster: The Unknown Unknown: Cybersecurity Threats of Shadow IT in Higher Education
  • Jan-Philip van Acken
  • Joost F. Gadellaa
  • Slinger Jansen
  • Katsiaryna Labunets

The growing number of employee-introduced IT solutions creates new attack vectors and challenges for cybersecurity management and IT administrators. These unauthorised hardware, software, or services are called shadow IT. In higher education, the diversity of the shadow IT landscape is even more prominent due to the flexible needs of researchers, educators, and students.

We studied shadow IT and related cyber threats in higher education via interviews with 11 IT and security experts. Our results provide a comprehensive overview of observed shadow IT types and related cyber threats. The findings revealed prevalent cloud and self-acquired software use as common shadow IT, with cybersecurity risks resulting from outdated software and visibility gaps. Our findings led to advice for practitioners: manage shadow IT responsibly with cybersecurity best practices, consider stakeholder needs, support educators and researchers, and offer usable IT solutions.

Poster: Detecting Adversarial Examples Hidden under Watermark Perturbation via Usable Information Theory
  • Ziming Zhao
  • Zhaoxuan Li
  • Tingting Li
  • Zhuoxue Song
  • Fan Zhang
  • Rui Zhang

Image watermarking is a technique widely used for copyright protection. Recent studies show that an image watermark can be added to a clean image as a kind of noise to fool deep learning models. However, previous adversarial example (AE) detection schemes tend to be ineffective since the watermark logo differs from typical noise perturbations. In this poster, we propose Themis, a novel AE detection method against watermark perturbation. Different from prior methods, Themis neither modifies the protected classifier nor requires knowledge of the process for generating AEs. Specifically, Themis leverages usable information theory to calculate the pointwise score, thereby discovering those instances that may be watermark AEs. The empirical evaluations involving 5 different logo watermark perturbations demonstrate that the proposed scheme can efficiently detect AEs, and significantly (over 15% accuracy) outperforms five state-of-the-art (SOTA) detection methods. The visualization results show that our detection metric distinguishes more clearly between AEs and non-AEs. Meanwhile, Themis realizes a larger Area Under Curve (AUC) in a threshold-resilient manner, while only introducing ∼0.04s overhead.

Poster: Unveiling the Impact of Patch Placement: Adversarial Patch Attacks on Monocular Depth Estimation
  • Gyungeun Yun
  • Kyungho Joo
  • Wonsuk Choi
  • Dong Hoon Lee

For autonomous driving systems, cameras and LiDAR sensors are necessary devices that provide precise depth information by which positions and sizes of objects can be identified. Moreover, recent advances in deep learning have extended their capabilities to include monocular camera setups for depth estimation. Compared with conventional devices for depth estimation such as LiDAR or stereo cameras, the monocular camera enables depth estimation at low cost. It is known that depth estimation models for the monocular camera are vulnerable to adversarial examples. However, most adversarial attacks against monocular depth estimation have been conducted with targeted patches that are placed on a target object. It is known that the targeted patch outperforms the adjacent and remote patch, which is placed beyond the target object, in terms of attack success rate. However, the adjacent and remote patch would provide high flexibility in patch placement, as it can be placed beyond the target object's scope. In this paper, we experimentally confirm that the patch placement significantly affects the attack success rates, particularly in specific regions.

Poster: Verifiable Data Valuation with Strong Fairness in Horizontal Federated Learning
  • Ruei-Hau Hsu
  • Hsuan-Cheng Su
  • Yi-An Yu

Federated learning (FL) represents an innovative decentralized paradigm in the field of machine learning, which differs from traditional centralized approaches. It facilitates collaborative model training among multiple participants and transfers only model parameters without directly exchanging raw data to maintain confidentiality. Data valuation for each data provider becomes a critical issue to guarantee the fairness of federated learning by estimating the dataset quality of each data provider based on the contribution to the global model prediction performance. To value datasets in FL, the concept of Shapley value is introduced to estimate the contribution of each dataset to a trained global model by measuring the effects of including and excluding a local model parameter in various combinations of global model parameters. However, the contribution measurement for each dataset performed by an aggregator or certain central component as a verifier becomes irrational as the verifier is under the control of an organization. Thus, this work presents a contribution measurement framework for data valuation with strong fairness, where forged results from the contribution measurement procedure are impossible. The new framework allows every participant (data provider) to verify the results of contribution measurement.
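
To make the Shapley-value idea concrete, here is a minimal sketch that computes exact Shapley values by enumerating all coalitions. It is a generic textbook computation, not the poster's verifiable framework; the function name `shapley_values` and the `utility` callback (which would stand in for, e.g., the accuracy of a model aggregated from a subset of providers) are assumptions for illustration. The enumeration is exponential in the number of players, so it only suits small examples.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, utility):
    """Exact Shapley value of each player for a coalition utility function.

    `utility` maps a frozenset of players to a number.
    """
    n = len(players)
    values = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of p when joining coalition s.
                total += weight * (utility(s | {p}) - utility(s))
        values[p] = total
    return values
```

The values always sum to the utility of the full coalition minus that of the empty one, which is why the Shapley value is a natural basis for splitting credit among data providers.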

SESSION: Session 61: Workshops

WPES '23: 22nd Workshop on Privacy in the Electronic Society
  • Bart P. Knijnenburg
  • Panagiotis Papadimitratos

These proceedings contain the papers selected for inclusion in the technical program for the 22nd ACM Workshop on Privacy in the Electronic Society (ACM WPES 2023), held in conjunction with the 30th ACM Conference on Computer and Communications Security (ACM CCS 2023) at the Tivoli Congress Center in Copenhagen, Denmark, on November 26, 2023. In response to the workshop's call for papers, 31 valid submissions were received, including 21 full paper submissions and 10 short paper submissions. They were evaluated by a technical program committee consisting of 54 researchers whose backgrounds include a diverse set of topics related to privacy. Each paper was reviewed by at least 3 members of the program committee. Papers were evaluated based on their importance, novelty, and technical quality. After the rigorous review process, 9 submissions were accepted as full papers (acceptance rate: 29.0%) and an additional 8 submissions were accepted as short papers.

CPSIoTSec'23: Fifth Workshop on CPS & IoT Security and Privacy
  • Magnus Almgren
  • Earlence Fernandes

The fifth Workshop on CPS & IoT Security and Privacy is set to take place in Copenhagen, Denmark, on November 26, 2023, in conjunction with the ACM Conference on Computer and Communications Security (CCS'23). This workshop marks the amalgamation of two workshops held in 2019: one focused on the security and privacy of cyber-physical systems, while the other one centered on the security and privacy of IoT. The primary objective of this workshop is to create a collaborative forum that brings together academia, industry experts, and governmental entities, encouraging them to contribute cutting-edge research, share demonstrations or hands-on experiences, and engage in discussions.

This year, our call for contributions encompassed a broad spectrum, including mature research papers, work-in-progress submissions, and Systematization of Knowledge papers. The workshop program includes five full-length papers on the security and privacy of CPS/IoT, alongside five shorter papers that present original work-in-progress. Furthermore, the workshop will feature two distinguished keynote presentations, offering insights into the field, and a demonstration to provide a practical dimension to the discussions. The complete CPSIoTSec'23 workshop proceedings are available at: https://dl.acm.org/doi/proceedings/10.1145/3605758

WAHC '23: 11th Workshop on Encrypted Computing & Applied Homomorphic Cryptography
  • Michael Brenner
  • Anamaria Costache
  • Kurt Rohloff

The 11th Workshop on Encrypted Computing and Applied Homomorphic Cryptography is held in Copenhagen, Denmark on November 26, 2023, co-located with the ACM Conference on Computer and Communications Security (CCS). The workshop aims to bring together professionals, researchers and practitioners from academia, industry and government in the area of computer security and applied cryptography with an interest in practical applications of homomorphic encryption, encrypted computing, functional encryption and secure function evaluation, private information retrieval and searchable encryption. The workshop will feature 9 exciting accepted talks on different aspects of secure computation and a forum to discuss current and future challenges. Additionally, the workshop will feature one keynote presentation, as well as one invited talk.

MTD '23: 10th ACM Workshop on Moving Target Defense
  • Ning Zhang
  • Qi Li

The tenth ACM Workshop on Moving Target Defense (MTD) is held on November 26, 2023, in conjunction with the ACM Conference on Computer and Communications Security (CCS). The main objective of the workshop is to discuss novel randomization, diversification, and dynamism techniques for computer systems and networks, new metrics and analysis frameworks to assess and quantify the effectiveness of MTD, and the challenges and opportunities that such defenses provide. We have constructed an exciting and diverse program of six refereed papers and two invited keynote talks that will provide the participants with a vibrant and thought-provoking set of ideas and insights.

SaTS'23: The 1st ACM Workshop on Secure and Trustworthy Superapps
  • Zhiqiang Lin
  • Xiaojing Liao

The paradigm of mobile computing has shifted with the rise of mobile super apps, encompassing diverse services within single applications. These apps, featuring "miniapps," have gained popularity for their native app-like features and comprehensive ecosystems. However, this popularity has led to significant concerns about user data security and privacy. The Workshop on Secure and Trustworthy Superapps (SaTS 2023), co-hosted with ACM CCS 2023, addresses these challenges. As super apps become essential for communication, entertainment, and commerce, the workshop fosters collaboration among researchers and practitioners. By tackling these concerns, the event aims to provide insights and solutions benefiting the security community, industry, and society. SaTS 2023 aims to illuminate these issues while promoting knowledge exchange and innovative problem-solving.

CCSW '23: Cloud Computing Security Workshop
  • Francesco Regazzoni
  • Apostolos Fournaris

Clouds and massive-scale computing infrastructures are starting to dominate computing and will likely continue to do so for the foreseeable future. Major cloud operators are now comprising millions of cores hosting substantial fractions of corporate and government IT infrastructure. CCSW is the world's premier forum bringing together researchers and practitioners in all security aspects of cloud-centric and outsourced computing, including:

  • Side channel attacks
  • Cryptographic protocols for cloud security
  • Secure cloud resource virtualization mechanisms
  • Secure data management outsourcing (e.g., database as a service)
  • Privacy and integrity mechanisms for outsourcing
  • Foundations of cloud-centric threat models
  • Secure computation outsourcing
  • Remote attestation mechanisms in clouds
  • Sandboxing and VM-based enforcements
  • Trust and policy management in clouds
  • Secure identity management mechanisms
  • Cloud-aware web service security paradigms and mechanisms
  • Cloud-centric regulatory compliance issues and mechanisms
  • Business and security risk models and clouds
  • Cost and usability models and their interaction with security in clouds
  • Scalability of security in global-size clouds
  • Binary analysis of software for remote attestation and cloud protection
  • Network security (DOS, IDS etc.) mechanisms for cloud contexts
  • Security for emerging cloud programming models
  • Energy/cost/efficiency of security in clouds
  • Open hardware for cloud
  • Machine learning for cloud protection

CCSW especially encourages novel paradigms and controversial ideas that are not on the above list. The workshop has historically acted as a fertile ground for creative debate and interaction in security-sensitive areas of computing impacted by clouds. This year marked the 14th anniversary of CCSW. In the past decade, CCSW has had a significant impact in our research community.

PLAS: The 18th Workshop on Programming Languages and Analysis for Security
  • Fraser Brown
  • Klaus v. Gleissenthall

PLAS provides a forum for exploring and evaluating the use of programming language and program analysis techniques for promoting security in the complete range of software systems, from compilers to machine-learned models and smart contracts. The workshop encourages proposals of new, speculative ideas, evaluations of new or known techniques in practical settings, and discussions of emerging threats and problems. We also host position papers that are radical, forward-looking, and lead to lively and insightful discussions influential to future research at the intersection of programming languages and security. This year will mark the 18th iteration of PLAS, which was first held in 2007 in San Diego. We expect an exciting program and many interesting discussions.

DeFi '23: Workshop on Decentralized Finance and Security
  • Kaihua Qin
  • Fan Zhang

Decentralized Finance (DeFi) heralds a transformative moment in the realm of finance, challenging traditional intermediaries with a blockchain-centric blueprint. As DeFi burgeons, the intricate dance between its evolution and security emerges as an area of pivotal significance. This workshop navigates the multifaceted landscape of DeFi, where inherent challenges intertwine with new vulnerabilities, emphasizing the necessity for vigilant evaluations and adaptive measures to ensure the integrity of the ecosystem. It further delves into the ripple effects of regulatory scrutiny and its subsequent influence on DeFi's security matrix. As we stand on the cusp of uncharted territories, the workshop aims to provide a comprehensive discourse on DeFi's security challenges, fortified by interdisciplinary expertise, inviting participants to explore, ideate, and collaboratively forge a path towards a robust and secure DeFi paradigm.

ARTMAN '23: First Workshop on Recent Advances in Resilient and Trustworthy ML Systems in Autonomous Networks
  • Gregory Blanc
  • Takeshi Takahashi
  • Zonghua Zhang

The increasing integration of machine learning (ML) approaches into the operation and management (O&M) of modern networks has led researchers to address various problems such as performance optimization, anomaly detection, traffic prediction, root-cause analysis, and incident troubleshooting. Autonomous networks leverage the wealth of both business and operations data to achieve fully intelligent and automated O&M for various telecommunications applications. However, their high level of service requires the closest scrutiny, as such applications depend on their resilience and trustworthiness, especially in the face of motivated attackers who aim to abuse their underlying ML models. This workshop fosters close collaboration between researchers and practitioners from the security, networking, and ML communities to improve the security of ML applications in autonomous networks.

ASHES '23: Workshop on Attacks and Solutions in Hardware Security
  • Lejla Batina
  • Chip Hong Chang
  • Domenic Forte
  • Ulrich Rührmair

The workshop on "Attacks and Solutions in HardwarE Security (ASHES)" welcomes any theoretical and practical works on hardware security, including attacks, solutions, countermeasures, proofs, classification, formalization, and implementations. Besides mainstream research, ASHES puts some focus on new and emerging scenarios: This includes the Internet of Things (IoT), nuclear weapons inspections, arms control, consumer and infrastructure security, or supply chain security, among others. ASHES also welcomes works on special purpose hardware, such as lightweight, low-cost, and energy-efficient devices, or non-electronic security systems.

The workshop hosts four different paper categories: Apart from regular and short papers, this includes works that systematize and structure a certain (sub-)area (so-called "Systematization of Knowledge" (SoK) papers), and so-termed "Wild-and-Crazy" (WaC) papers, which distribute seminal ideas at an early conceptual stage. This summary gives a brief overview of the seventh edition of the workshop, which took place on November 30, 2023 in Copenhagen, Denmark, as a post-conference satellite workshop of ACM CCS.

AISec '23: 16th ACM Workshop on Artificial Intelligence and Security
  • Maura Pintor
  • Florian Simon Tramèr
  • Xinyun Chen

The use of Artificial Intelligence (AI) and Machine Learning (ML) has been at the center of the most outstanding advancements in recent years. The ability to analyze considerable streams of data in real time makes these technologies the most promising tool in many domains, including cybersecurity. As an outstanding example, ML can be used for identifying malware because of its ability to detect patterns that are otherwise difficult for humans and hard-coded rules to see. As malware continues to evolve, ML will become increasingly important for keeping up with the latest threats. However, the use of AI and ML in security-relevant domains has raised legitimate concerns about their trustworthiness and robustness, especially in the face of adaptive attackers. Additionally, privacy threats are now emerging as a crucial aspect and need proper testing and possibly mitigation to prevent data theft and leakage of sensitive information. The AISec workshop provides a venue for presenting and discussing new developments at the intersection of security and privacy with AI and ML. The complete AISec'23 workshop proceedings are available at: https://dl.acm.org/doi/proceedings/10.1145/3576915.3624029.

Tutorial-HEPack4ML '23: Advanced HE Packing Methods with Applications to ML
  • Ehud Aharoni
  • Nir Drucker
  • Hayim Shaul

Outsourcing computations over sensitive data to a third-party cloud environment should often rely on dedicated privacy-preserving solutions in order to adhere to privacy regulations such as the GDPR [7]. One solution that has gained great attention is fully homomorphic encryption (FHE), a cryptographic method that allows performing different types of computation on encrypted data. Still, writing non-interactive FHE code that evaluates complex functions is a task that is mostly left to experts. Otherwise, the resulting code may become very slow and even impractical.

The tile tensor is a recent data structure that comes together with a dedicated language that aims to simplify the process of writing complex FHE programs. This tutorial introduces developers of security solutions without previous FHE background to the world of FHE programming using tile tensors. It provides step-by-step guidelines for implementing complex operators such as matrix multiplication and convolutions, and eventually guides the audience toward writing their own privacy-preserving convolutional neural network solution. The demonstrations in this tutorial use Python and the IBM HElayers [1] library that implements tile tensors.

SCORED '23: Workshop on Software Supply Chain Offensive Research and Ecosystem Defenses
  • Marcela Melara
  • Santiago Torres-Arias
  • Laurent Simon

Recent attacks on the software supply chain have highlighted the fragility of this vital ecosystem and the importance of ensuring its security and integrity. Addressing the technical and social challenges of building trustworthy software for deployment in sensitive and/or large-scale enterprise or governmental settings requires innovative solutions and an interdisciplinary approach. The Workshop on Software Supply Chain Offensive Research and Ecosystem Defenses (SCORED) is a venue that brings together industry practitioners, academics, and policymakers to present and discuss security vulnerabilities, novel defenses against attacks, project demos, adoption requirements, and best practices in the software supply chain. The complete SCORED'23 workshop proceedings are available at: https://dl.acm.org/doi/proceedings/10.1145/3576915

SESSION: Session 62: Demos

Demo: Certified Robustness on Toolformer
  • Yue Xu
  • Wenjie Wang

Tool-augmented language models (TALMs) overcome the limitations of current language models (LMs), allowing them to leverage external tools to enhance performance. One state-of-the-art example is Toolformer, introduced by Meta AI Research, which achieves a broader integration of tool utilization. However, Toolformer raises particular concerns about the robustness of its predictions of the optimal position for API calls. Adversarial perturbations can alter the position of API calls chosen by Toolformer, resulting in responses that are not only incorrect but potentially even less accurate than those generated by standard language models. To improve the robustness of Toolformer and realize the full capability of its toolbox, we focus on the potential vulnerabilities that arise from small perturbations in the input or prompt space. To achieve this goal, we plan to study adversarial attacks from both attackers' and defenders' perspectives, first studying adversarial attack algorithms on the input and prompt space, then proposing certified robustness for Toolformer's API-call scheduling that is not only empirically effective but also theoretically grounded.

Demo: Data Minimization and Informed Consent in Administrative Forms
  • Nicolas Anciaux
  • Sabine Frittella
  • Baptiste Joffroy
  • Benjamin Nguyen

This article proposes a demonstration implementing the data minimization privacy principle, focusing on reducing data collected by government administrations through forms. Data minimization is defined in many privacy regulations worldwide, but has not seen extensive real-world application. We propose a model based on logic and game theory and show that it is possible to create a practical and efficient solution for a real French welfare benefit case.

Demo: Image Disguising for Scalable GPU-accelerated Confidential Deep Learning
  • Yuechun Gu
  • Sagar Sharma
  • Keke Chen

Deep learning training involves large training data and expensive model tweaking, for which cloud GPU resources can be a popular option. However, outsourcing data often raises privacy concerns. The challenge is to preserve data and model confidentiality without sacrificing GPU-based scalable training and low-cost client-side preprocessing, which is difficult for conventional cryptographic solutions to achieve. This demonstration shows a new approach, image disguising, represented by recent work: DisguisedNets, NeuraCrypt, and InstaHide, which aim to securely transform training images while still enabling the desired scalability and efficiency. We present an interactive system for visually and comparatively exploring these methods. Users can view disguised images, note low client-side processing costs, and observe the maintained efficiency and model quality during server-side GPU-accelerated training. This demo aids researchers and practitioners in swiftly grasping the advantages and limitations of image-disguising methods.