Privacy-preserving deep learning

Paper by Ryffel,


We detail a new framework for privacy preserving deep learning and discuss its assets. The framework puts a premium on ownership and secure processing of data and introduces a valuable representation based on chains of commands and tensors. This abstraction allows one to implement complex privacy preserving constructs such as Federated Learning, Secure Multiparty Computation, and Differential Privacy while still exposing a familiar deep learning API to the end-user. We report early results on the Boston Housing and Pima Indian Diabetes datasets. While the privacy features apart from Differential Privacy do not impact the prediction accuracy, the current implementation of the framework introduces a significant overhead in performance, which will be addressed at a later stage of the development. We believe this work is an important milestone introducing the first reliable, general framework for privacy preserving deep learning.

Training data for natural language processing

Paper by Gerlach & Font-Clos


The use of Project Gutenberg (PG) as a text corpus has been extremely popular in statistical analysis of language for more than 25 years. However, in contrast to other major linguistic datasets of similar importance, no consensual full version of PG exists to date. In fact, most PG studies so far either consider only a small number of manually selected books, leading to potential biased subsets, or employ vastly different pre-processing strategies (often specified in insufficient details), raising concerns regarding the reproducibility of published results. In order to address these shortcomings, here we present the Standardized Project Gutenberg Corpus (SPGC), an open science approach to a curated version of the complete PG data containing more than 50,000 books and more than 3×109 word-tokens. Using different sources of annotated metadata, we not only provide a broad characterization of the content of PG, but also show different examples highlighting the potential of SPGC for investigating language variability across time, subjects, and authors. We publish our methodology in detail, the code to download and process the data, as well as the obtained corpus itself on 3 different levels of granularity (raw text, timeseries of word tokens, and counts of words). In this way, we provide a reproducible, pre-processed, full-size version of Project Gutenberg as a new scientific resource for corpus linguistics, natural language processing, and information retrieval.

More efficiency in dealing with noise in homomorphic encryption

Paper by Wang,

Dealing with noise is what slows down homomorphic encryption because the encryption has continuously to be refreshed.


Achieving both simplicity and efficiency in fully homomorphic encryption (FHE) schemes is important for practical applications. In the simple FHE scheme proposed by Ducas and Micciancio (DM), ciphertexts are refreshed after each homomorphic operation. And ciphertext refreshing has become a major bottleneck for the overall efficiency of the scheme. In this paper, we propose a more efficient FHE scheme with fewer ciphertext refreshings. Based on the DM scheme and another simple FHE scheme proposed by Gentry, Sahai, and Waters (GSW), ciphertext matrix operations and ciphertext vector additions are both applied in our scheme. Compared with the DM scheme, one more homomorphic NOT AND (NAND) operation can be performed on ciphertexts before ciphertext refreshing. Results show that, under the same security parameters, the computational cost of our scheme is obviously lower than that of GSW and DM schemes for a depth-2 binary circuit with NAND gates. And the error rate of our scheme is kept at a sufficiently low level.

Optimizing homomorphic encryption for multiparty data sharing

Paper by Alexandru et. al.


The development of large-scale distributed control systems has led to the outsourcing of costly computations to cloud-computing platforms, as well as to concerns about privacy of the collected sensitive data. This paper develops a cloud-based protocol for a quadratic optimization problem involving multiple parties, each holding information it seeks to maintain private. The protocol is based on the projected gradient ascent on the Lagrange dual problem and exploits partially homomorphic encryption and secure multi-party computation techniques. Using formal cryptographic definitions of indistinguishability, the protocol is shown to achieve computational privacy, i.e., there is no computationally efficient algorithm that any involved party can employ to obtain private information beyond what can be inferred from the party’s inputs and outputs only. In order to reduce the communication complexity of the proposed protocol, we introduced a variant that achieves this objective at the expense of weaker privacy guarantees. We discuss in detail the computational and communication complexity properties of both algorithms theoretically and also through implementations. We conclude the paper with a discussion on computational privacy and other notions of privacy such as the non-unique retrieval of the private information from the protocol outputs.

More improvements on key length in homomorphic encryption

Paper by Chaoju Hu and Jianwei Zhao


The public key of the integer homomorphic encryption scheme which was proposed by Van Dijk et al. is long, so the scheme is almost impossible to use in practice. By studying the scheme and Coron’s public key compression technique, a scheme which is able to encrypt n bits plaintext once was obtained. The scheme improved the efficiency of the decrypting party and increased the number of encrypting parties, so it meets the needs of cloud computing better. The security of the scheme is based on the approximate GCD problem and the sparse-subset sum problem.