ASHES'20 PROCEEDINGS

ACM CCS 2020 - November 9-13, 2020

ASHES'20: Proceedings of the 4th ACM Workshop on Attacks and Solutions in Hardware Security

SESSION: Keynote Talks

The Pursuit of Happiness: Establishing Hardware Root-of-Trust for Cyber Security

Mark M. Tehranipoor

Design, fabrication, assembly, test, and debug of integrated circuits and systems have become distributed across the globe, raising major concerns about their security and trustworthiness. Such systems are prevalent in many mission-critical infrastructures, in which they require a long and secure operational life. In this talk, we will provide a detailed overview of the challenges in today's global electronics supply chain, then discuss the need for establishing a hardware root of trust (HROT) within the cyber domain. Notably, we will present innovative methods to demonstrate authentication and trust verification for integrated circuits and systems.

Formidable Challenges in Hardware Implementations of Fully Homomorphic Encryption Functions for Applications in Machine Learning

Çetin Kaya Koç

The concept of homomorphic encryption was introduced at almost exactly the same time as the first public-key cryptographic algorithm, RSA, which is multiplicatively homomorphic. Encryption functions with both additive and multiplicative homomorphisms allow us (at least in principle) to compute any function homomorphically, and thus are highly desired. Such encryption functions have applications in healthcare, machine learning, and national security. Since the work of Craig Gentry [1], there have been several fully homomorphic encryption proposals; however, their time and space requirements do not give way to acceptably efficient implementations in real-world scenarios. The challenge comes from the fact that, while the encryption, decryption, and homomorphic operations are simple arithmetic operations (such as polynomial addition and multiplication), the sizes of the operands are beyond the operand sizes we are used to in standard public-key cryptography. For example, the polynomial operands (representing ciphertexts) used in the BGV algorithm [2] are supposed to have up to 16k terms, with each term up to 1k bits. A message of about 1024 bits is encrypted into one ciphertext that requires several million bits. In this talk, I will present some of the formidable algorithmic and architectural challenges facing FHE implementors.
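As a small illustration of the multiplicative homomorphism of RSA mentioned above, the following sketch uses unpadded textbook RSA. The toy primes (61 and 53), the public exponent 17, and all function names are assumptions chosen for readability and do not come from the talk. The last lines compute the upper bound on operand size implied by the quoted figures, reading "16k" and "1k" as 16 x 1024 and 1024.

```python
# Toy textbook RSA (no padding). Parameters are illustrative assumptions
# and far too small for any real use.
P, Q = 61, 53
N = P * Q                       # public modulus
PHI = (P - 1) * (Q - 1)
E = 17                          # public exponent
D = pow(E, -1, PHI)             # private exponent (modular inverse, Python 3.8+)


def encrypt(message: int) -> int:
    """Textbook RSA encryption: m^E mod N."""
    return pow(message, E, N)


def decrypt(ciphertext: int) -> int:
    """Textbook RSA decryption: c^D mod N."""
    return pow(ciphertext, D, N)


# Multiplying two ciphertexts yields a ciphertext of the product of the
# plaintexts (mod N), without ever decrypting the inputs.
a, b = 7, 11
product_ciphertext = (encrypt(a) * encrypt(b)) % N
homomorphic_product = decrypt(product_ciphertext)

# Upper bound on one polynomial operand, from the figures in the abstract:
# up to 16k terms, each up to 1k bits.
MAX_TERMS = 16 * 1024
MAX_BITS_PER_TERM = 1024
max_operand_bits = MAX_TERMS * MAX_BITS_PER_TERM
```

Here `homomorphic_product` equals `a * b` because the product is smaller than the modulus. Textbook RSA supports only this one operation, whereas a fully homomorphic scheme must support both addition and multiplication on ciphertexts.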

SESSION: Session 1: PUFs and Beyond

SoK: Towards Secret-Free Security

Ulrich Rührmair

Digital secret keys are indispensable in modern cryptography and computer security, but at the same time constitute a routinely exploited attack target in every hardware system that stores them. This discrepancy has created a perpetual battle between key extractors and protectors over the decades. Some recent approaches attempt to overcome this issue by simply avoiding keys and secrets in vulnerable devices: Physical Unclonable Functions (PUFs), for example, are capable of evading 'classical keys', i.e., permanently stored digital secrets, in electronic hardware. Nevertheless, many PUFs still contain physical or analog secrets deep in their structure, whose disclosure to adversaries also breaks security: This includes the manufacturing variations in SRAM PUFs that determine their power-up states, or the signal delays of Arbiter PUFs that determine their responses. A second generation of physical primitives shows promise to resolve this remaining problem: So-called Complex PUFs, SIMPLs/PPUFs, and related techniques enable completely 'secret-free' systems, where adversaries could inspect every bit and every atom, and learn any information present in any form in the hardware, without being able to break security. This Systematization of Knowledge (SoK) paper takes this situation as a starting point, and categorizes, formalizes, and overviews the recently evolving area of secret-free security. It tries to lay the foundations for future generations of secret-free hardware, which could be innately and provably immune against any invasive, side-channel, or key extraction attacks.

Erasable PUFs: Formal Treatment and Generic Design

Chenglu Jin
Wayne Burleson
Marten van Dijk
Ulrich Rührmair

Physical Unclonable Functions (PUFs) have not only been suggested as a new key storage mechanism, but, in the form of so-called "Strong PUFs", also as cryptographic primitives in advanced schemes, including key exchange, oblivious transfer, or secure multi-party computation. This notably extends their application spectrum, and has led to a sequence of publications at leading venues such as IEEE S&P, CRYPTO, and EUROCRYPT in the past [3,6,10,11,29,41]. However, one important unresolved problem is that adversaries can break the security of all these advanced protocols if they gain physical access to the employed Strong PUFs after protocol completion [41]. It has been formally proven [49] that this issue cannot be overcome by techniques on the protocol side alone, but requires resolution on the hardware level, the only fully effective known countermeasure being so-called Erasable PUFs. Building on this work, this paper is the first to describe a generic method by which any given silicon Strong PUF with a digital CRP interface can be turned into an Erasable PUF [36]. We describe how the Strong PUF can be surrounded with a trusted control logic that allows the blocking (or "erasure") of single CRPs. We implement our approach, which we call "GeniePUF", on FPGA, reporting detailed performance data and practicality figures. Furthermore, we develop the first comprehensive definitional framework for Erasable PUFs. Our work thus re-establishes the effective usability of Strong PUFs in advanced cryptographic applications, even in the realistic case in which adversaries get access to the Strong PUF after protocol completion.
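The idea of wrapping a Strong PUF in control logic that blocks single CRPs can be pictured with the following minimal software sketch. It is not the GeniePUF design: the class names, the hash-based stand-in for a physical PUF, and the in-memory set of erased challenges are all assumptions made purely for illustration. A real design would also have to protect the erasure record itself against tampering.

```python
import hashlib


class SimulatedStrongPUF:
    """Hypothetical software stand-in for a Strong PUF with a digital CRP interface."""

    def __init__(self, device_seed: bytes):
        # The seed models device-unique manufacturing variation (assumption).
        self._seed = device_seed

    def respond(self, challenge: bytes) -> bytes:
        return hashlib.sha256(self._seed + challenge).digest()


class ErasableWrapper:
    """Illustrative control logic: refuses to answer challenges that were erased."""

    def __init__(self, puf: SimulatedStrongPUF):
        self._puf = puf
        self._erased = set()  # challenges whose responses must never be released again

    def evaluate(self, challenge: bytes):
        if challenge in self._erased:
            return None  # blocked: this CRP has been erased
        return self._puf.respond(challenge)

    def erase(self, challenge: bytes) -> None:
        self._erased.add(challenge)


wrapper = ErasableWrapper(SimulatedStrongPUF(b"device-0"))
response_before = wrapper.evaluate(b"challenge-1")
wrapper.erase(b"challenge-1")               # e.g. after protocol completion
response_after = wrapper.evaluate(b"challenge-1")
other_response = wrapper.evaluate(b"challenge-2")
```

After the call to `erase`, only the erased challenge is blocked; all other CRPs remain usable, which is the property that distinguishes erasure of single CRPs from destroying the whole PUF.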

SESSION: Session 2: Side Channels: Attacks & Defences

Far Field EM Side-Channel Attack on AES Using Deep Learning

Ruize Wang
Huanyu Wang
Elena Dubrova

We present the first deep learning-based side-channel attack on AES-128 using far field electromagnetic emissions as a side channel. Our neural networks are trained on traces captured from five different Bluetooth devices at five different distances to target and tested on four other Bluetooth devices. We can recover the key from fewer than 10K traces captured in an office environment at 15 m distance to target even if the measurement for each encryption is taken only once. Previous template attacks required multiple repetitions of the same encryption. For the case of 1K repetitions, we need fewer than 400 traces on average at 15 m distance to target. This improves on the template attack presented at CHES'2020, which requires 5K traces and key enumeration up to 2^23.

Lightweight Implementation of the LowMC Block Cipher Protected Against Side-Channel Attacks

Javad Bahrami
Viet B. Dang
Abubakr Abdulgadir
Khaled N. Khasawneh
Jens-Peter Kaps
Kris Gaj

LowMC is a parameterizable block cipher developed for use in Multi-Party Computation (MPC) and Fully Homomorphic Encryption (FHE). In these applications, linear operations are much less expensive in terms of resource utilization than non-linear operations due to their low multiplicative complexity. In this work, we implemented two versions of LowMC: unrolled and lightweight. Both implementations are realized using RTL VHDL. To the best of our knowledge, we report the first lightweight implementation of LowMC and the first implementation protected against side-channel analysis (SCA). For the SCA protection, we used a hybrid 2/3 shares Threshold Implementation (TI) approach, and for the evaluation, the Test Vector Leakage Assessment (TVLA) method, also known as the T-test. Our unprotected implementations show information leakage at 10K traces, and after protection, they successfully pass the T-test for 1 million traces. Xilinx Vivado is used for the synthesis, implementation, functional verification, timing analysis, and programming of the FPGA. The target FPGA family is Artix-7, selected due to its widespread use in multiple applications. Based on our results, the numbers of LUTs are 867 and 3,328 for the lightweight and the unrolled architecture with unrolling factor U = 16, respectively. It takes 14.21 μs for the lightweight architecture and 1.29 μs for the unrolled design with U = 16 to generate one 128-bit block of the ciphertext. The fully unrolled architecture beats the best previous implementation by Kales et al. in terms of the number of LUTs by a factor of 4.5. However, this advantage comes at the cost of 2.9 times higher latency.
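The T-test at the core of TVLA can be sketched as follows: Welch's t-statistic is computed at each sample point between a set of traces for a fixed input and a set for random inputs. This is a minimal sketch, not the authors' evaluation setup. The function names and the tiny hand-made traces are assumptions, and the threshold of 4.5 is the value conventionally used in TVLA practice rather than a figure from the abstract.

```python
from math import sqrt
from statistics import mean, variance

# Conventional TVLA pass/fail threshold on |t| (assumption: not stated in the abstract).
TVLA_THRESHOLD = 4.5


def welch_t(a, b):
    """Welch's t-statistic for two samples with possibly unequal variances.

    Assumes each sample has at least two values and that the two
    variances are not both zero.
    """
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def max_abs_t(fixed_traces, random_traces):
    """Largest |t| over all sample points; each trace is a list of samples."""
    points = zip(zip(*fixed_traces), zip(*random_traces))
    return max(abs(welch_t(f, r)) for f, r in points)


# Hand-made example: sample point 0 differs between the sets, point 1 does not.
fixed_set = [[10, 5], [11, 6], [12, 7], [13, 8]]
random_set = [[0, 5], [1, 6], [2, 7], [3, 8]]
leak_detected = max_abs_t(fixed_set, random_set) > TVLA_THRESHOLD
```

In a real assessment the two sets contain thousands to millions of measured power traces, and a design is considered to pass when |t| stays below the threshold at every sample point.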

Exploring Effect of Residual Electric Charges on Cryptographic Circuits

Mitsuru Shiozaki
Takeshi Sugawara
Takeshi Fujino

Building leakage models is important in designing countermeasures against side-channel attacks (SCAs), and Hamming-weight/distance (HW/HD) models are traditional leakage models. Electromagnetic analysis (EMA) attacks using a tiny EM probe are the most powerful SCAs. Recent studies have reported that EMA attacks can measure SCA leaks not included in the HW/HD models [16,19]. A current-path leak is one such leak, and a mirror circuit was introduced as a countermeasure against it. We experimentally found that a mirror circuit insufficiently hides (decreases) EMA leaks, resulting in residual electric charges (RECs) between stacked transistors leaking secret information. REC leaks are included in neither the current-path leakage model nor the HW/HD leakage models. RECs can carry the history of the gate's state over multiple clock cycles. Therefore, we propose a countermeasure against REC leaks and designed advanced encryption standard-128 (AES-128) circuits using IO-masked dual-rail read-only memory (MDR-ROM) with a 180-nm complementary metal-oxide-semiconductor (CMOS) process. We compared the resilience of our AES-128 circuits against EMA attacks with and without our countermeasure. We also discuss the effect of RECs on physically unclonable functions (PUFs). RECs do not make PUFs vulnerable but affect PUF performance. We demonstrate that RECs affect the performance of arbiter PUFs (APUFs) we fabricated with 180- and 40-nm CMOS processes.
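For readers unfamiliar with the traditional HW/HD leakage models that the abstract contrasts with REC leaks, the following minimal sketch shows what each model predicts for a register update. The function names and the example byte values are assumptions chosen for illustration.

```python
def hamming_weight(value: int) -> int:
    """HW model: leakage is proportional to the number of 1 bits in a value."""
    return bin(value).count("1")


def hamming_distance(previous: int, current: int) -> int:
    """HD model: leakage is proportional to the number of bits that flip."""
    return hamming_weight(previous ^ current)


# Example: an 8-bit register changes from 0x3C to 0xC3.
old_state, new_state = 0x3C, 0xC3
hw_prediction = hamming_weight(new_state)               # bits set in the new value
hd_prediction = hamming_distance(old_state, new_state)  # bits that toggled
```

Both models depend only on the current value or on a single transition. According to the abstract, RECs can carry the history of a gate's state over multiple clock cycles, which is why neither model captures REC leaks.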

SESSION: Session 3: Fault Attacks & Cryptographic Hardware Design

Differential Fault Analysis of NORX

Amit Jana
Dhiman Saha
Goutam Paul

In recent literature, there has been a particular interest in studying nonce-based Authenticated Encryption (AE) schemes in the light of fault-based attacks, as they seem to present automatic protection against Differential Fault Attacks (DFA). In this work, we present the first DFA on the nonce-based CAESAR scheme NORX (applicable to all the versions v1, v2.0, v3.0). We demonstrate a scenario in which faults introduced in NORX in parallel mode can be used to collide the internal branches to produce an all-zero state. We later show how this can be used to replay NORX despite its being instantiated with different nonces and messages. Once replayed, we show how the key of NORX can be recovered using secondary faults and using the faulty tags. We use different fault models to showcase the versatility of the attack strategy. A detailed theoretical analysis of the expected number of faults required under various models is also furnished. Under the random bit-flip model, around 1384 faults need to be induced to reduce the key space from 2^128 to 2^32, while the random byte-flip model requires 332 faults to uniquely identify the key. To the best of our knowledge, this is the first fault attack that uses both internal and classical differentials to mount a DFA on a nonce-based authenticated cipher which is otherwise believed to be immune to DFA.

PRINCE under Differential Fault Attack: Now in 3D

Aikata
Banashri Karmakar
Dhiman Saha

Fault analysis is one of the most studied physical attacks, primarily due to the inherent ease of implementation. This work investigates integral and differential fault analysis attacks on the well-known lightweight block cipher PRINCE. The work begins by identifying new integral properties of PRINCE which are not restricted to be symmetric around the middle rounds. The work also identifies new slow diffusion trails on the cipher. Both properties are exploited to mount practical integral and differential fault attacks on PRINCE that uniquely recover the key. The integral fault attack has a time complexity of 2^36 and 2^20 with 15 nibble faults in rounds 8.5 and 9.5, respectively, while the slow diffusion differential fault attack works with 4 bit-faults in the 10th round with a complexity of 2^22. Finally, the fact that the faults can be injected very close to the middle rounds forms one of the interesting aspects of this work and adds to the state of the art of contemporary results on PRINCE available in the literature. Moreover, a 3-D visualization model of the PRINCE state has also been proposed in this work, which can be used to extend or improve existing attacks on PRINCE.

Building a Modern TRNG: An Entropy Source Interface for RISC-V

Markku-Juhani O. Saarinen
G. Richard Newell
Ben Marshall

The currently proposed RISC-V True Random Number Generator (TRNG) architecture breaks with previous ISA TRNG practice by splitting the Entropy Source (ES) component away from cryptographic PRNGs into a separate interface, and in its use of polling. We describe the interface, its use in cryptography, and offer additional discussion, background, and rationale for various aspects of it. This design is informed by lessons learned from earlier mainstream ISAs, recently introduced SP 800-90B and FIPS 140-3 entropy audit requirements, AIS 31 and Common Criteria, current and emerging cryptographic needs such as post-quantum cryptography, and the goal of supporting a wide variety of RISC-V implementations and applications. Many of the architectural choices are a result of quantitative observations about random number generators in secure microcontrollers, the Linux kernel, and cryptographic libraries. We further compare the architecture to some contemporary random number generators and describe a minimalistic TRNG reference implementation that uses the Entropy Source together with RISC-V AES instructions.

SESSION: Session 4: Hardware & System Security

SoK: Physical and Logic Testing Techniques for Hardware Trojan Detection

Sree Ranjani Rajendran
Rijoy Mukherjee
Rajat Subhra Chakraborty

Hardware Trojans have emerged as a great threat to the trustability of modern electronic systems. A deployed electronic system with one or more undetected Hardware Trojan-infected components can cause grave harm, ranging from personal information loss to destruction of national infrastructure. The inherently surreptitious nature and bewildering variety of Hardware Trojans make their detection an extremely challenging exercise. In this paper, we explore the state of the art of non-destructive testing for Hardware Trojan detection, with our coverage including both physical measurement-based testing and logic testing. We present a systematic classification of Hardware Trojans and their detection techniques, and describe these techniques in detail, including their stand-out features, strengths, and weaknesses. We conclude the paper with an evaluation of the current status of progress and major directions of future research.

SpectreRewind: Leaking Secrets to Past Instructions

Jacob Fustos
Michael Bechtel
Heechul Yun

Transient execution attacks use microarchitectural covert channels to leak secrets that should not have been accessible during logical program execution. Commonly used microarchitectural covert channels are those that leave lasting footprints in the microarchitectural state, for example, a cache state change, from which the secret is recovered after the transient execution is completed.

In this paper, we present SpectreRewind, a new approach to create and exploit contention-based covert channels for transient execution attacks. In our approach, a covert channel is established by issuing the necessary instructions logically before the transiently executed victim code. Unlike prior contention-based covert channels, which require simultaneous multi-threading (SMT), SpectreRewind supports covert channels based on a single hardware thread, making it viable on systems where the attacker cannot utilize SMT. We show that contention on the floating-point division unit on commodity processors can be used to create a high-performance (~100 KB/s), low-noise covert channel for transient execution attacks instead of the commonly used flush+reload based cache covert channels. We also show that the proposed covert channel works in the JavaScript sandbox environment of a Chrome browser.

WaC: A New Doctrine for Hardware Security

Adam Hastings
Simha Sethumadhavan

In this paper, we promote the idea that recent woes in hardware security are not because of a lack of technical solutions but rather because market forces and incentives prevent those with the ability to fix problems from doing so. At the root of the problem is the fact that hardware security comes at a cost; present issues in hardware security can be seen as the result of the players in the game of hardware security finding ways of avoiding paying this cost. We formulate this idea into a doctrine of security, namely the Doctrine of Shared Burdens. Three case studies (Rowhammer, Spectre, and Meltdown) are interpreted through the lens of this doctrine. Our doctrine illuminates why these problems exist and what can be done about them.