PAPAYA: a European project for a confidential data analysis platform

EURECOM is coordinating the three-year European project PAPAYA, launched on May 1st. Its mission: enable cloud services to process encrypted or anonymized data without having access to the unencrypted data. Melek Önen, a researcher specializing in applied cryptography, is leading the project. In this interview, she provides more details on the objectives of this H2020 project.


What is the objective of the H2020 PAPAYA project?

Melek Önen: Small and medium-sized companies do not always have the means to internally process large amounts of data, which is often personal or confidential. They therefore use cloud services to simplify the task, but in so doing they lose control over their data. Our mission with the PAPAYA project (which stands for PlAtform for PrivAcY-preserving data Analytics) is to make it possible to use data processing and classification methods while keeping the data encrypted and/or anonymized. This would offer companies greater security and confidentiality when they use third-party cloud services, since these services could no longer access the unencrypted data. This has become a major issue since the European General Data Protection Regulation (GDPR) came into effect.

What is your main challenge in this project?

MÖ: Today, when we encrypt data the traditional way, it is protected in a randomized manner. In other words, the encrypted data is opaque, and it is impossible to carry out operations on data in this state. In 2009, cryptography researcher Craig Gentry proposed a new method called fully homomorphic encryption. With this method, operations such as additions and multiplications can be carried out directly on encrypted data. The problem is that processing data this way is not very efficient in terms of memory usage and the amount of computation required. The majority of our work will involve designing variants of the data processing algorithms that will be compatible with data protected by homomorphic encryption.
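
To make the idea of computing on encrypted data concrete, here is a minimal sketch. It does not use Gentry's fully homomorphic scheme: as an assumption made for brevity, it uses the simpler Paillier cryptosystem, which supports only addition of ciphertexts and multiplication by a plaintext constant. The primes, function names and values are illustrative choices, not part of PAPAYA, and the key size is far too small for real use.

```python
import math
import random


def keygen(p=1789, q=1861):
    """Toy Paillier key generation (illustrative primes, not secure)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because the generator is fixed to g = n + 1
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt an integer 0 <= m < n with fresh randomness r."""
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, priv, c):
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n


def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % (n * n)


def scale_encrypted(n, c, k):
    """Raising a ciphertext to the power k multiplies the plaintext by k."""
    return pow(c, k, n * n)


n, priv = keygen()
c_a, c_b = encrypt(n, 20), encrypt(n, 22)

# The party doing these two computations never sees 20 or 22.
total = decrypt(n, priv, add_encrypted(n, c_a, c_b))
scaled = decrypt(n, priv, scale_encrypted(n, c_a, 3))
print(total, scaled)  # prints "42 60"
```

Only the holder of the private key can read the results; a cloud service would hold just `n` and the ciphertexts. Fully homomorphic schemes extend this to multiplication between ciphertexts, at the memory and computation cost mentioned above.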

Can you explain how you design variants of data processing algorithms?

MÖ: For example, a neural network contains both linear operations, which are easily managed with appropriate encryption methods, and non-linear operations. We do not know how to apply non-linear operations to encrypted data. Yet the network’s accuracy depends on these non-linear operations, so we cannot do without them. What we must do in this situation is approximate these operations, which are actually functions, by using other linear functions with similar behavior. The better this approximation, the more accurate the neural network remains while still being able to process the encrypted data.
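
The approximation step can be sketched as follows. This is an illustration under stated assumptions, not the project's method: it takes the sigmoid as the non-linear operation and replaces it with its Taylor expansion around zero, first truncated to a linear function and then to a degree-3 polynomial. Both use only additions and multiplications, the kind of arithmetic homomorphic schemes can evaluate. The interval [-1, 1] and all names are hypothetical choices.

```python
import math


def sigmoid(x):
    """The non-linear activation we want to avoid evaluating directly."""
    return 1.0 / (1.0 + math.exp(-x))


def poly_eval(coeffs, x):
    """Horner evaluation; coeffs[i] is the coefficient of x**i."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


# Taylor expansion of the sigmoid around 0: 1/2 + x/4 - x^3/48 + ...
SIGMOID_DEG1 = [0.5, 0.25]                    # linear approximation
SIGMOID_DEG3 = [0.5, 0.25, 0.0, -1.0 / 48.0]  # adds the cubic term


def max_error(coeffs, lo=-1.0, hi=1.0, steps=200):
    """Largest absolute gap between sigmoid and the polynomial on a grid."""
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return max(abs(sigmoid(x) - poly_eval(coeffs, x)) for x in xs)


err1 = max_error(SIGMOID_DEG1)
err3 = max_error(SIGMOID_DEG3)
print(f"max error on [-1, 1]: degree 1 = {err1:.4f}, degree 3 = {err3:.4f}")
```

On this interval the degree-3 version tracks the sigmoid more closely than the linear one, at the price of extra multiplications. That is the trade-off described above: a closer approximation preserves more of the network's accuracy but costs more to evaluate on encrypted data.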

What use cases do you plan to work on?

MÖ: We have two different use cases. The first is medical data encryption. This situation affects many hospitals that have patients’ data but are not large enough to have their own internal data processing services. They therefore use cloud services. The second case involves web analytics and could be useful for the tourism sector. Data collected from smartphone users could be very useful in this sector, which analyzes the way tourists move from one place of interest to another. For both cases, we imagine several progressive scenarios. First, a single data owner who holds all the users’ unencrypted data, encrypts it with one key and transfers it to the cloud. Next, several owners with several keys. Finally, data that comes directly from users.

Who else is working on this project with you?

MÖ: PAPAYA brings together six partners including EURECOM, which is coordinating the action. The companies involved in assisting us with the use cases and in designing this new platform are Atos, IBM Haifa Research Lab, Orange Labs, and MediaClinics, an SME that makes sensors for monitoring patients in hospitals. In terms of academic partners, we are working with Karlstad University in Sweden. We will work together for the entire three-year project.
