antoniopedropi/Secure-Data-Lakes-Medical-Data

Secure Data Lakes in Medical Data

About The Project

This repository contains the implementation of a Secure Data Lake for medical data, developed as a semester project for the Security and Applications of Trusted Hardware (CC4077) course at the Faculty of Sciences of the University of Porto (FCUP).

The project addresses the security challenges of collaborative medical data storage. It uses Intel Software Guard Extensions (SGX) to create a Trusted Execution Environment (TEE). Healthcare providers (e.g., hospitals) can upload sensitive Electronic Health Records (EHRs) to a remote enclave, where the data is encrypted, stored, and processed without being exposed to the infrastructure provider or to unauthorized parties.

Key Features

  • Trusted Execution Environment (TEE): Uses Intel SGX enclaves to isolate code and data.
  • Remote Attestation: Implements the Intel SGX Remote Attestation flow (using Intel Attestation Service - IAS) to verify the identity and integrity of the enclave before data transfer.
  • Secure Communication: Establishes a secure channel via a modified Sigma protocol built on Diffie-Hellman Key Exchange (DHKE).
  • Encrypted Storage: All patient data is stored encrypted with AES-GCM using 128-bit keys.
  • Privacy-Preserving Analytics: Supports aggregate queries (e.g., calculating mean blood pressure) inside the enclave, ensuring individual records remain confidential.

Architecture

The solution implements a simplified Data Lake architecture consisting of:

  1. Data Source (Hospital/Client): Provides structured patient EHR data (CSV format).
  2. Ingestion & Encryption: The client attests the enclave, exchanges keys, encrypts the data, and uploads it.
  3. Trusted Storage: The enclave stores the raw data in memory, protected by hardware-level isolation.
  4. Processing: The enclave performs computations (e.g., secure_compute_mean) on the decrypted data within the trusted boundary and returns only the result.
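
The storage and processing steps above can be sketched in plain Python. This is only an illustration of the data flow, not the project's enclave code: the class `EnclaveSketch`, the column names, and the sample rows are hypothetical, and attestation, decryption, and SGX isolation are omitted. The two method names mirror the `secure_store_data` and `secure_compute_mean` pipelines named in this README, and the point is that callers receive a row count or an aggregate, never the records themselves.

```python
import csv
import io
from statistics import fmean


class EnclaveSketch:
    """Stand-in for the trusted boundary (hypothetical, no real SGX isolation)."""

    def __init__(self):
        # In the real system this state would live in enclave memory.
        self._records = []

    def secure_store_data(self, csv_text):
        """Parse CSV rows, keep them inside the object, return only a row count."""
        rows = list(csv.DictReader(io.StringIO(csv_text)))
        self._records.extend(rows)
        return len(rows)

    def secure_compute_mean(self, column):
        """Return the mean of one numeric column, never individual values."""
        values = [float(r[column]) for r in self._records if r.get(column)]
        if not values:
            raise ValueError(f"no numeric data for column {column!r}")
        return fmean(values)


# Hypothetical EHR extract; the column names are assumptions for illustration.
sample = "patient_id,systolic_bp\np1,120\np2,130\np3,140\n"

enclave = EnclaveSketch()
stored = enclave.secure_store_data(sample)
mean_bp = enclave.secure_compute_mean("systolic_bp")
print(stored, mean_bp)
```

A real enclave would also need to decide how small a group an aggregate may cover, since a mean over a single record reveals that record.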

Technical Implementation

Components

  • Enclave: The trusted component containing private code for data management and the secure_store_data and secure_compute_mean pipelines.
  • Service Provider (SP): Interacts with the Intel Attestation Service (IAS) using Enhanced Privacy ID (EPID) to validate the Enclave's Quote.
  • Client Application: Conducts the handshake, encrypts the symmetric key ($k$) with the enclave's public key ($pk$), and handles data transmission.
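
As a rough illustration of the decision the Service Provider makes after IAS has checked the Quote, the sketch below accepts an enclave only when the reported status is acceptable and the enclave measurement matches an expected value. It assumes the attestation report has already been signature-verified and parsed into a dictionary. The `mrenclave` key, the `accept_enclave` helper, and the strict "OK only" policy are assumptions made for this example, not the project's code.

```python
import hmac

# Assumption: a strict policy that accepts only a fully up-to-date platform.
ACCEPTED_STATUSES = {"OK"}


def accept_enclave(report, expected_mrenclave_hex):
    """Return True only if the status is accepted and the measurement matches."""
    status_ok = report.get("isvEnclaveQuoteStatus") in ACCEPTED_STATUSES
    reported = str(report.get("mrenclave", "")).lower()
    # Constant-time comparison of the hex-encoded measurements.
    measurement_ok = hmac.compare_digest(reported, expected_mrenclave_hex.lower())
    return status_ok and measurement_ok


expected = "ab" * 32  # hypothetical 32-byte measurement, hex-encoded

good = {"isvEnclaveQuoteStatus": "OK", "mrenclave": "AB" * 32}
stale = {"isvEnclaveQuoteStatus": "GROUP_OUT_OF_DATE", "mrenclave": "ab" * 32}
wrong = {"isvEnclaveQuoteStatus": "OK", "mrenclave": "cd" * 32}

results = [accept_enclave(r, expected) for r in (good, stale, wrong)]
print(results)
```

Only after this check succeeds should the client go on to exchange keys and upload data.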

Attestation & Security

  • Protocol: Modified Sigma protocol for key exchange.
  • Cryptography: OpenSSL for the Service Provider and Intel SGX SDK crypto libraries for the enclave.
  • Key Wrapping: $c \leftarrow E(pk, k)$, where the shared symmetric key $k$ is wrapped under the enclave's public key $pk$ to produce the ciphertext $c$.
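
The Diffie-Hellman step underlying the key exchange can be illustrated with a toy finite-field example using only the Python standard library. This is a sketch under loud assumptions: the modulus `P`, the generator `G`, and the SHA-256 truncation used as a key-derivation step are chosen for readability and are not secure parameters, and they are not the ones used by the SGX SDK. The signing and MAC steps that a Sigma-style protocol adds for authentication are left out entirely.

```python
import hashlib
import secrets

# Toy parameters (assumption): a 127-bit Mersenne prime, far too small for real use.
P = 2**127 - 1
G = 3


def dh_keypair():
    """Generate a private exponent in [2, P-2] and the matching public value."""
    private = secrets.randbelow(P - 3) + 2
    public = pow(G, private, P)
    return private, public


def derive_key(private, peer_public):
    """Compute the shared secret and hash it down to a 128-bit symmetric key."""
    shared = pow(peer_public, private, P)
    digest = hashlib.sha256(shared.to_bytes(16, "big")).digest()
    return digest[:16]


# Each side generates a key pair and sends only its public value to the other.
client_priv, client_pub = dh_keypair()
enclave_priv, enclave_pub = dh_keypair()

client_key = derive_key(client_priv, enclave_pub)
enclave_key = derive_key(enclave_priv, client_pub)
print(client_key == enclave_key)
```

Both sides arrive at the same 128-bit key without it ever crossing the channel, which is the property the secure channel relies on. Without the authentication that attestation provides, this bare exchange would be open to a man-in-the-middle.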

About

A Proof-of-Concept Secure Data Lake for medical records utilizing Intel SGX for trusted execution, remote attestation, and privacy-preserving analytics.
