WAHC '23

Proceedings of the 11th Workshop on Encrypted Computing & Applied Homomorphic Cryptography
Last Update : [26 November, 2023]

SESSION: Session I

GPU Acceleration of High-Precision Homomorphic Computation Utilizing Redundant Representation
  • Shintaro Narisada
  • Hiroki Okada
  • Kazuhide Fukushima
  • Shinsaku Kiyomoto
  • Takashi Nishide

Fully homomorphic encryption (FHE) can perform computations on encrypted data, allowing us to analyze sensitive data without losing its security. The main issue for FHE is its lower performance, especially for high-precision computations, compared to calculations on plaintext data. Making FHE viable for practical use requires both algorithmic improvements and hardware acceleration. Recently, Klemsa and Ö nen (CODASPY'22) presented fast homomorphic algorithms for high-precision integers, including addition, multiplication and some fundamental functions, by utilizing a technique called redundant representation. Their algorithms were applied on TFHE, which was proposed by Chillotti et al. (Asiacrypt'16). In this paper, we further accelerate this method by extending their algorithms to multithreaded environments. The experimental results show that our approach performs 128-bit addition in 0.41 seconds, 32-bit multiplication in 4.3 seconds, and 128-bit Max and ReLU functions in 1.4 seconds using a Tesla V100S server.

Falkor: Federated Learning Secure Aggregation Powered by AESCTR GPU Implementation
  • Mariya Georgieva Belorgey
  • Sofia Dandjee
  • Nicolas Gama
  • Dimitar Jetchev
  • Dmitry Mikushin

We propose a novel protocol, Falkor, for secure aggregation for Federated Learning in the multi-server scenario based on masking of local models via a stream cipher based on AES in counter mode and accelerated by GPUs running on the aggregating servers. The protocol is resilient to client dropout and has reduced clients/servers communication cost by a factor equal to the number of aggregating servers (compared to the naïve baseline method). It scales simultaneously in the two major complexity aspects: 1) large number of clients; 2) highly complex machine learning models such as CNNs, RNNs, Transformers, etc. The AES-CTR-based masking function in our aggregation protocol is built on the concept of counterbased cryptographically-secure pseudorandom number generators (csPRNGs) as described in [32] and subsequently used by Facebook for their torchcsprng csPRNG. We improve upon torchcsprng by careful use of shared memory on the GPU device, a recent idea of Cihangir Tezcan [38] and obtain 100x speedup in the masking function compared to a single CPU core.

Finally, we demonstrate scalability of our protocol in two realworld Federated Learning scenarios: 1) efficient training of large logistic regression models with 50 features and 50M data points distributed across 1000 clients that can dropout and securely aggregated via three servers (running secure multi-party computation (SMPC)); 2) training a recurrent neural network (RNN) model for sentiment analysis of Twitter feeds coming from a large number of Twitter users (more than 250,000 users). In case 1), our secure aggregation algorithm runs in less than a minute compared to a pure MPC computation (on 3 parties) that takes 27 hours and uses 400GB RAM machines as well as 1 gigabit-per-second network. In case 2), the total training is around 10 minutes using our GPU powered secure aggregation versus 10 hours using a single CPU core.

High-precision RNS-CKKS on fixed but smaller word-size architectures: theory and application
  • Rashmi Agrawal
  • Jung Ho Ahn
  • Flavio Bergamaschi
  • Ro Cammarota
  • Jung Hee Cheon
  • Fillipe D. M. de Souza
  • Huijing Gong
  • Minsik Kang
  • Duhyeong Kim
  • Jongmin Kim
  • Hubert de Lassus
  • Jai Hyun Park
  • Michael Steiner
  • Wen Wang

A prevalent issue in the residue number system (RNS) variant of the Cheon-Kim-Kim-Song (CKKS) homomorphic encryption (HE) scheme is the challenge of efficiently achieving high precision on hardware architectures with a fixed, yet smaller, word-size of bit length W , especially when the scaling factor satisfies log Δ > W

In this work, we introduce an efficient solution termed composite scaling. In this approach, we group multiple RNS primes as ql :=Π t-1 j=0 ql,j such that logql,j< W for 0 ≤ j < t, and use each composite ql in the rescaling procedure as ct →→ [ct/ql]. This strategy contrasts the traditional rescaling method in RNS-CKKS, where each ql is chosen as a single log Δ-bit prime, a method we designate as single scaling

To achieve higher precision in single scaling, where log Δ > W , one would either need a novel hardware architecture with word size W' > log Δ or would have to resort to relatively inefficient solutions rooted in multi-precision arithmetic. This problem, however, doesn't arise in composite scaling. In the composite scaling approach, the larger the composition degree t, the greater the precision attainable with RNS-CKKS across an extensive range of secure parameters tailored for workload deployment.

We have integrated composite scaling RNS-CKKS into both OpenFHE and Lattigo libraries. This integration was achieved via a concrete implementation of the method and its application to the most up-to-date workloads, specifically, logistic regression training and convolutional neural network inference. Our experiments demonstrate that single and composite scaling approaches are functionally equivalent, both theoretically and practically


Noah's Ark: Efficient Threshold-FHE Using Noise Flooding
  • Morten Dahl
  • Daniel Demmler
  • Sarah El Kazdadi
  • Arthur Meyre
  • Jean-Baptiste Orfila
  • Dragos Rotaru
  • Nigel P. Smart
  • Samuel Tap
  • Michael Walter

We outline a secure and efficient methodology to do threshold distributed decryption for LWE based Fully Homomorphic Encryption schemes. Due to the smaller parameters used in some FHE schemes, such as Torus-FHE (TFHE), the standard technique of "noise flooding'' seems not to apply. We show that noise flooding can also be used with schemes with such small parameters, by utilizing a switch to a scheme with slightly higher parameters and then utilizing the efficient bootstrapping operations which TFHE offers. Our protocol is proved secure via a simulation argument, making its integration in bigger protocols easier to manage.

A Probabilistic Design for Practical Homomorphic Majority Voting with Intrinsic Differential Privacy
  • Arnaud Grivet Sébert
  • Martin Zuber
  • Oana Stan
  • Renaud Sirdey
  • Cédric Gouy-Pailler

As machine learning (ML) has become pervasive throughout various fields (industry, healthcare, social networks), privacy concerns regarding the data used for its training have gained a critical importance. In settings where several parties wish to collaboratively train a common model without jeopardizing their sensitive data, the need for a private training protocol is particularly stringent and implies to protect the data against both the model's end-users and the other actors of the training phase. In this context of secure collaborative learning, Differential Privacy (DP) and Fully Homomorphic Encryption (FHE) are two complementary countermeasures of growing interest to thwart privacy attacks in ML systems. Central to many collaborative training protocols, in the line of PATE, is majority voting aggregation. Thus, in this paper, we design SHIELD, a probabilistic approximate majority voting operator which is faster when homomorphically executed than existing approaches based on exact argmax computation over an histogram of votes. As an additional benefit, the inaccuracy of SHIELD is used as a feature to provably enable DP guarantees. Although SHIELD may have other applications, we focus here on one setting and seamlessly integrate it in the SPEED collaborative training framework from [20] to improve its computational efficiency. After thoroughly describing the FHE implementation of our algorithm and its DP analysis, we present experimental results. To the best of our knowledge, it is the first work in which relaxing the accuracy of an algorithm is constructively usable as a degree of freedom to achieve better FHE performances.

Optimizing HE operations via Level-aware Key-switching Framework
  • Intak Hwang
  • Jinyeong Seo
  • Yongsoo Song

In the state-of-the-art Homomorphic Encryption (HE) schemes, the key-switching procedure is commonly used as a building block of non-linear operations, but also a major performance bottleneck. Its complexity is primarily determined by the corresponding gadget decomposition, which transforms a ciphertext entry into a vector of small elements to reduce the noise growth from the multiplication with an evaluation key. Prior works such as Cheon et al. (SAC 2018) and Halevi et al. (CT-RSA 2019) fixed a decomposition function in the setup phase which is applied across all ciphertext levels, thus yielding suboptimal performance.

In this paper, we introduce a novel key-switching framework for leveled HEs. We aim to allow the use of different decomposition functions during the evaluation phase so that the optimal decomposition method can be utilized at each level to achieve the best performance. A naive solution might generate multiple key-switching keys corresponding to all possible decomposition functions, and sends them to an evaluator. However, our solution can achieve the goal without such communication overhead since it allows an evaluator to dynamically derive other key-switching keys from a single key-switching key depending on the choice of gadget decomposition.

We implement our framework at a proof-of-concept level to provide concrete benchmark results. Our experiments show that we achieve the optimal performance at every level while maintaining the same computational capability and communication costs.


Trivial Transciphering With Trivium and TFHE
  • Thibault Balenbois
  • Jean-Baptiste Orfila
  • Nigel Smart

We examine the use of Trivium and Kreyvium as transciphering mechanisms for use with the TFHE FHE scheme. Trivium was introduced in the eSTREAM project as a general purpose stream cipher, whilst Kreyvium was introduced to strengthen Trivium (in the context of transciphering BGV/BFV ciphertext). Previously both ciphers were investigated for FHE transciphering only in the context of the BGV/BFV FHE schemes; this is despite Trivium and Kreyvium being particularly suited to TFHE. Recent work by Dobraunig et al. gave some initial experimental results using TFHE. We show that these two symmetric ciphers have excellent performance when homomorphically evaluated using TFHE. Indeed we improve upon the results of Dobraunig et al. by at least two orders of magnitude in terms of latency. This shows that, for TFHE at least, one can transcipher using a standardized symmetric cipher (Trivium), without the need for special FHE-friendly ciphers being employed. For applications wanting extra security, but without the benefit of relying on a standardized cipher, our work shows that Kreyvium is a good candidate.

A Homomorphic AES Evaluation in Less than 30 Seconds by Means of TFHE
  • Daphné Trama
  • Pierre-Emmanuel Clet
  • Aymen Boudguiga
  • Renaud Sirdey

Since the pioneering work of Gentry, Halevi, and Smart in 2012 \citegentry_BGV, the state of the art on transciphering has moved away from work on AES to focus on new symmetric algorithms that are better suited for a homomorphic execution. Yet, with recent advances in homomorphic cryptosystems, the question arises as to where we stand today. Especially since AES execution is the application that may be chosen by NIST in the FHE part of its future call for threshold encryption. In this paper, we propose an AES implementation using TFHE programmable bootstrapping which runs in less than a minute on an average laptop. We detail the transformations carried out on the original AES code as well as the optimized FHE operators we developed to lead to a more efficient homomorphic evaluation. We also duly give several execution times on different machines, depending on the type of execution (sequential or parallelized). These times vary from 4.5 minutes (resp. 54 secs) for sequential (resp. parallel) execution on a standard laptop down to 28 seconds for a parallelized execution over 16 threads on a multi-core workstation.

Fully Homomorphic Privacy-Preserving Naive Bayes Machine Learning and Classification
  • Boyoung Han
  • Yeonghyeon Kim
  • Jina Choi
  • Hojune Shin
  • Younho Lee

Despite the revolutionary advancement of homomorphic encryption (HE) technology, no efficient fully homomorphic Naive Bayes (NB) classifier that can perform training with HE-encrypted data has been developed without using a decryption function. In this study, we propose an approximate homomorphic logarithm calculation method with a relative error of less than 0.01% on average. Using the SIMD function of the underlying HE scheme, the logarithm values for thousands of encrypted probability values can be calculated in approximately 2.5s with the help of a GPU. Based on this, we propose an efficient fully homomorphic NB method. The proposed NB classifier could complete the training on the breast cancer dataset considered within approximately 14.3s, and perform inference for a query in 0.84s. This is estimated around 28 times faster compared to the recent privacy-preserving NB classifier supporting an analogous level of security by Liu et al. in 2017 on the same computational environment and the same CKKS HE operations performance.