A New Take on Big-Key Encryption and Incompressible Encodings: Using Useful Data
Senior Scientist, NTT Research; Associate Professor of Computer Science, Northeastern University
In his talk at the recent Upgrade 2020, the NTT Research Summit, Dr. Daniel Wichs posed an interesting question: can we come up with an incompressible representation of a large public data set, such as the entire Wikipedia site?
Wichs is a senior scientist with NTT Research and an associate professor of computer science at Northeastern University. His talk was based on a paper he wrote with Dr. Tal Moran, a faculty member at the Interdisciplinary Center (IDC) Herzliya, and presented at Crypto 2020.
In his talk, Wichs noted that Wikipedia is some 50 gigabytes in size. But there is no need to store all that data: as long as you have an Internet connection, all you need is the URL, www.wikipedia.org. The notion of incompressible encodings is useful in cryptography, however, so the challenge he addressed was representing Wikipedia (or any large trove of data) in a way that requires the full 50GB of storage, even for someone who has the URL and can retrieve the underlying data for free at any time.
An incompressible encoding consists of an encoding algorithm and a decoding algorithm; both are public, so there is no secret key, Wichs said. The encoding algorithm takes some data, say the Wikipedia data, and encodes it in a randomized way to derive a codeword. You can think of the codeword as an alternate representation of the Wikipedia data; call it c. Anybody can decode the codeword to recover the underlying data.
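As a toy illustration of the syntax only (the names here are hypothetical, and this particular encoding is trivially compressible, so it is emphatically not Wichs and Moran's construction), the primitive exposes a randomized public encode and a keyless public decode:

```python
import hashlib

def _stream(r: bytes, n: int) -> bytes:
    # Expand the randomness r into an n-byte mask via repeated hashing.
    out, i = b"", 0
    while len(out) < n:
        out += hashlib.sha256(r + i.to_bytes(8, "big")).digest()
        i += 1
    return out[:n]

def encode(data: bytes, r: bytes) -> bytes:
    # Randomized, public encoding: mask the data with a stream derived
    # from the 32-byte randomness r, and prepend r so that decoding
    # needs no secret key.
    masked = bytes(d ^ s for d, s in zip(data, _stream(r, len(data))))
    return r + masked

def decode(codeword: bytes) -> bytes:
    # Public decoding: anybody can recover the underlying data.
    r, masked = codeword[:32], codeword[32:]
    return bytes(m ^ s for m, s in zip(masked, _stream(r, len(masked))))
```

For any 32-byte r, `decode(encode(data, r))` returns the original data; the interesting part, which this sketch does not attempt, is making the codeword itself incompressible.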
Now consider an adversary that has access to the same underlying Wikipedia data. The adversary's goal is to compress the codeword and output a smaller compressed value – call it w. The adversary's decompression procedure then takes w and must recover the original codeword c.
Such an attack can be thwarted if we take the randomness of the encoding procedure “and spread it around in some clever way throughout the codeword in such a way that it’s impossible for the adversary to separate out the randomness and the data, and only store the randomness,” not the entire data set, Wichs said.
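The attack Wichs describes can be made concrete. In the naive encoding sketched below (hypothetical names, for illustration only), the randomness sits in one recognizable block, so an adversary who also holds the public data can throw the data away, keep only the 32-byte randomness as w, and later recompute the full codeword:

```python
import hashlib

def _stream(r: bytes, n: int) -> bytes:
    # Expand r into an n-byte mask via repeated hashing.
    out, i = b"", 0
    while len(out) < n:
        out += hashlib.sha256(r + i.to_bytes(8, "big")).digest()
        i += 1
    return out[:n]

def naive_encode(data: bytes, r: bytes) -> bytes:
    # Naive: the randomness r is stored as one contiguous block.
    return r + bytes(d ^ s for d, s in zip(data, _stream(r, len(data))))

def compress(codeword: bytes, public_data: bytes) -> bytes:
    # The adversary keeps only the randomness: 32 bytes, not 50 GB.
    return codeword[:32]

def decompress(w: bytes, public_data: bytes) -> bytes:
    # ...and re-derives the entire codeword from w plus the free data.
    return naive_encode(public_data, w)

data = b"imagine 50 GB of Wikipedia here"
c = naive_encode(data, b"\x01" * 32)
w = compress(c, data)
assert decompress(w, data) == c and len(w) < len(c)  # adversary wins
```

A secure incompressible encoding must instead entangle the randomness with the data throughout the codeword, so that no such short w exists.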
This notion of incompressible encodings was defined in prior work by Damgård, Ganesh and Orlandi (Crypto 2019), who called it Proofs of Replicated Storage.
Wichs's work gives a new construction of these incompressible encodings in the Common Random String model, with linear encoding time.
In cryptography there is always concern over system compromise, after which it is relatively easy for an attacker to extract any cryptographic keys stored on the compromised system. The idea of big-key cryptography is to mitigate such a threat by making the keys large – many gigabytes or even terabytes – so that exfiltrating a key requires too much bandwidth, or storing it costs the intruder too much.
But that creates another problem: it is also not cost-effective for the user to store all that data – unless the data is useful for some purpose other than providing security.
“The new idea in our work is let’s make the secret key useful,” Wichs said. “Instead of just having a secret key be some useless, random data that the cryptographic scheme picks, let’s have a secret key that stores let’s say … the user’s movie collection or music collection.”
So, the secret key is an incompressible encoding of useful data on the user’s computer. An adversary would have to exfiltrate almost the entire key to break the security of the system, he said.
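One common way to make a huge key usable is to touch only a few randomly chosen positions of it per operation, so that an attacker who has exfiltrated anything less than almost the whole key fails most challenges. The sketch below is a generic random-probe pattern with hypothetical names, not the specific scheme from Wichs and Moran's paper:

```python
import hashlib

def probe_tag(big_key: bytes, nonce: bytes, probes: int = 64) -> bytes:
    # Derive probe positions pseudorandomly from the nonce, read one
    # byte of the big key at each position, and hash those bytes into
    # a short tag. Answering arbitrary nonces correctly requires
    # holding essentially the whole key.
    h = hashlib.sha256(nonce)
    for i in range(probes):
        pos = int.from_bytes(
            hashlib.sha256(nonce + i.to_bytes(4, "big")).digest()[:8], "big"
        ) % len(big_key)
        h.update(big_key[pos:pos + 1])
    return h.digest()

big_key = bytes(range(256)) * 1024        # stand-in for a multi-GB key
tag = probe_tag(big_key, b"challenge-1")  # short response to a challenge
```

In the big-key setting described above, big_key would itself be the incompressible encoding of the user's own files, so storing it costs the user nothing beyond what they already keep.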
In his talk, Wichs laid out the thinking behind his work, including the use of a cryptographic object that uses a relaxed notion of Lossy Trapdoor Permutations.