This repository contains the implementation of a Secure Data Lake for medical data, developed as a semester project for the Security and Applications of Trusted Hardware (CC4077) course at the Faculty of Sciences of the University of Porto (FCUP).
The project addresses the security challenges of collaborative medical data storage. It utilizes Intel Software Guard Extensions (SGX) to create a Trusted Execution Environment (TEE). This allows healthcare providers (e.g., hospitals) to upload sensitive Electronic Health Records (EHRs) to a remote enclave where data is encrypted, stored, and processed without ever being exposed to the infrastructure provider or unauthorized parties.
- Trusted Execution Environment (TEE): Uses Intel SGX enclaves to isolate code and data.
- Remote Attestation: Implements the Intel SGX Remote Attestation flow (using Intel Attestation Service - IAS) to verify the identity and integrity of the enclave before data transfer.
- Secure Communication: Establishes a secure channel via a modified Sigma protocol and Diffie-Hellman Key Exchange (DHKE).
- Encrypted Storage: All patient data is encrypted with AES-GCM using a 128-bit key before it is stored.
- Privacy-Preserving Analytics: Supports aggregate queries (e.g., calculating mean blood pressure) inside the enclave, ensuring individual records remain confidential.
The solution implements a simplified Data Lake architecture consisting of:
- Data Source (Hospital/Client): Provides structured patient EHR data (CSV format).
- Ingestion & Encryption: The client attests the enclave, exchanges keys, encrypts the data, and uploads it.
- Trusted Storage: The enclave stores the raw data in memory, protected by hardware-level isolation.
- Processing: The enclave performs computations (e.g., `secure_compute_mean`) on the decrypted data within the trusted boundary and returns only the result.
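The store-then-aggregate flow described in this list can be sketched in plain Python. This is only an illustration of the data flow, not the enclave code: the `EnclaveSimulator` class, the `systolic_bp` column, and the sample rows are hypothetical, and a Python object provides none of the isolation that SGX does.

```python
import csv
import io
from statistics import fmean


class EnclaveSimulator:
    """Plain-Python stand-in for the trusted side (hypothetical, no real isolation)."""

    def __init__(self):
        self._records = []  # rows stay inside the object and are never returned

    def secure_store_data(self, csv_text):
        """Parse CSV rows and keep them inside the simulated boundary."""
        rows = list(csv.DictReader(io.StringIO(csv_text)))
        self._records.extend(rows)
        return len(rows)  # only a row count leaves the boundary

    def secure_compute_mean(self, column):
        """Return the mean of one numeric column, never the rows themselves."""
        values = [float(row[column]) for row in self._records]
        if not values:
            raise ValueError("no records stored")
        return fmean(values)


# Made-up sample data; the column names are assumptions for this sketch.
sample = "patient_id,systolic_bp\n1,120\n2,130\n3,110\n"
enclave = EnclaveSimulator()
stored = enclave.secure_store_data(sample)
mean_bp = enclave.secure_compute_mean("systolic_bp")
print(stored, mean_bp)
```

The caller only ever sees the row count and the aggregate, which mirrors the "returns only the result" property of the processing step.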
- Enclave: The trusted component containing private code for data management and the `secure_store_data` and `secure_compute_mean` pipelines.
- Service Provider (SP): Interacts with the Intel Attestation Service (IAS) using Enhanced Privacy ID (EPID) to validate the Enclave's Quote.
- Client Application: Conducts the handshake, encrypts the symmetric key ($k$) with the enclave's public key ($pk$), and handles data transmission.
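The ordering implied by these components (attest first, release the key afterwards) can be sketched as follows. The report here is a simplified, hypothetical dictionary with `quote_status` and `mr_enclave` fields. A real IAS verification report is a signed document whose signature must also be checked, and that step is left out of this sketch.

```python
import hmac
import secrets


def enclave_is_trusted(report, expected_mr_enclave):
    """Accept only an OK quote status and an exact measurement match."""
    if report.get("quote_status") != "OK":
        return False
    # Constant-time comparison of the enclave measurement bytes.
    return hmac.compare_digest(report.get("mr_enclave", b""), expected_mr_enclave)


def release_key_if_trusted(report, expected_mr_enclave):
    """Generate the 128-bit symmetric key k only after attestation passes."""
    if not enclave_is_trusted(report, expected_mr_enclave):
        raise PermissionError("attestation failed; key not released")
    return secrets.token_bytes(16)


# Hypothetical values for illustration only.
expected = bytes(range(32))
good_report = {"quote_status": "OK", "mr_enclave": expected}
bad_report = {"quote_status": "OK", "mr_enclave": bytes(32)}

k = release_key_if_trusted(good_report, expected)
print(len(k))
```

Passing `bad_report` raises `PermissionError`, so no key material is created for an enclave whose measurement does not match the expected one.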
- Protocol: Modified Sigma protocol for key exchange.
- Cryptography: OpenSSL for the Service Provider and Intel SGX SDK crypto libraries for the enclave.
- Key Wrapping: $c \leftarrow E(pk, k)$. The shared symmetric key $k$ is wrapped (encrypted) under the enclave's public key $pk$.
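The Diffie-Hellman step behind the key exchange listed above can be illustrated with a toy example. The group parameters (`P`, `G`), the helper names, and the SHA-256 truncation used to get a 128-bit key are assumptions chosen for brevity. The sketch is unauthenticated and uses a far too small group, so it shows only the shared-secret idea and not the modified Sigma protocol used in the project.

```python
import hashlib
import secrets

# Toy parameters (assumption): a 127-bit Mersenne prime, NOT secure for real use.
P = 2**127 - 1
G = 3


def dh_keypair():
    """Return (private, public) for the toy group."""
    private = secrets.randbelow(P - 3) + 2  # uniform in [2, P - 2]
    public = pow(G, private, P)
    return private, public


def derive_key(private, peer_public):
    """Derive a 128-bit key from the shared secret g^(ab) mod P."""
    shared = pow(peer_public, private, P)
    digest = hashlib.sha256(shared.to_bytes(16, "big")).digest()
    return digest[:16]  # truncate to 128 bits to match the AES-GCM key size


client_priv, client_pub = dh_keypair()
enclave_priv, enclave_pub = dh_keypair()

client_key = derive_key(client_priv, enclave_pub)
enclave_key = derive_key(enclave_priv, client_pub)
print(client_key == enclave_key)
```

Both sides reach the same 16-byte key without sending it over the wire. A Sigma-style protocol additionally signs and MACs the exchanged public values so that neither side can be impersonated.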