The NEARDATA project, a research project funded by the European Union’s Horizon Europe programme, has successfully completed its mission to build a world-class platform for extreme near-data processing. By bringing computation directly to where data is stored, the project has removed critical bottlenecks in managing massive datasets across the "compute continuum", from local hospital devices to large-scale cloud infrastructures.
The project has developed innovative technologies for Extreme Data Processing like:
The project’s most tangible impacts are found in its five specialized medical and scientific use cases:
NEARDATA has firmly positioned Europe as a leader in "extreme data" and digital sovereignty:
The technologies developed are designed for wide-scale industrial and commercial adoption:
| Focus Area | Key Achievement | Future Development Potential and Positioning |
|---|---|---|
| Public Health and Medicine | | Creation of a pan-European surgical AI and precision medicine infrastructure. All these advances enable a shift from reactive medicine to proactive prevention, improving health security and the response to future pandemics. |
| Scientific Leadership | Production of 42 scientific publications, with 38% in elite venues such as Nature Communications. Process optimization that reduced the cost of RNA sequence analysis by two orders of magnitude. | Well positioned in Open Science through the publication of massive datasets: Appendix300 (surgery), Transcriptomics Atlas (genetics), and METASPACE. Influence on future global standards for genomic data compression (MPEG-G). |
| Data Privacy and Sovereignty | Implementation of Confidential Computing and Trusted Execution Environments (TEEs) to protect ultra-sensitive genomic and clinical data without slowing down computation. | Guarantee of digital sovereignty, allowing European hospitals to collaborate on AI training by sharing knowledge (models) without the need to move or expose private patient data. |
| Economic and Industrial Growth | Launch of commercial services such as PyRun and services from Scontain/SCONE for a secure cloud (Zero Trust). Integration of these technologies into the global portfolios of partners like Dell Technologies. | Democratization of access to complex data analysis for SMEs through a bioinformatics "app-store" model. Scontain's architecture allows companies to adopt advanced AI while complying with the strictest cybersecurity regulations. |

| KPI | Use Case | Summary of results |
|---|---|---|
| KPI-1 | Epistasis (GWD) | The Lithops-HPC connector improves data ingestion in the GWD pipeline by 36x. |
| KPI-1 | Epistasis (GWD) | Hyperparameter selection improves performance by 5x. |
| KPI-5 | Epistasis (GWD/MDR) | Cyclomatic complexity reveals 1.5x fewer execution paths. |
| KPI-5 | Epistasis (GWD/MDR) | Yaqin’s metrics reveal 1.6x fewer branches and loops and a 1.6x lower nesting depth. |
| KPI-3 | Epistasis (GWD/MDR) | The auto-scaler improves the execution time of the MDR use case by 1.5x. |
| KPI-2 | Surgery | StreamSense achieves low video frame indexing latency, between 63 ms and 360 ms. |
| KPI-2 | Surgery | Semantic video search latency is around 30 ms for large collections of surgical video. |
| KPI-1 | Surgery | Data transfer savings in AI data loading: when integrated with PyTorch, data transfers are reduced by between 83.79% and 99.83%. |
| KPI-4 | Surgery | Enhanced file encryption with TEEs and access control mechanisms. The security of the system was rigorously validated through adversarial testing and TEE attestation, confirming that both the confidentiality and integrity of model updates and training data were consistently enforced. |
| KPI-1 | Transcriptomics | The early stopping technique increased alignment throughput by 19.5%. |
| KPI-1 | Transcriptomics | Using a newer release of the human genome index yielded execution time improvements of up to 12x and a smaller STAR index file (from 85 GB to 30 GB). |
| KPI-1 | Transcriptomics | Using spot instances reduced the execution cost by 50% on average. |
| KPI-1 | Genomics | The Nexus FASTQGzip streamlet provides a good trade-off between compression and speed. It achieves a 3.8x better compression ratio than plain FASTQ and 13.2x faster data processing than FASTQ Gzip files. |
| KPI-1 | Genomics | FaaStream is up to 65.14% cheaper than Flink running the genomics Kpop job. |
| KPI-1 | Metabolomics | Depending on the size, processing is between 1.13x and 1.22x faster. |
| KPI-4 | Metabolomics | Achieved full confidential computing support (data at rest, in transit and in use) on the cloud storage service (MinIO) and FaaS Lithops Singularity, aided by SCONE mechanisms and using the TEE. Support for the ported system is partial (data at rest and in transit), because METASPACE is not fully compliant with Lithops Singularity + SCONE. |
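
The KPI table reports some results as multiplicative factors (for example 36x) and others as percentage reductions (for example 83.79%). The following is a minimal sketch, not part of the project's tooling, of how the two notations relate. The function names `reduction_to_factor` and `factor_to_reduction` are hypothetical helpers introduced only for this illustration; the input figures are taken from the table above.

```python
def reduction_to_factor(percent_reduction: float) -> float:
    """Convert 'reduced by p%' into 'k times smaller' (k = 100 / (100 - p))."""
    if not 0 <= percent_reduction < 100:
        raise ValueError("percent_reduction must be in [0, 100)")
    return 100.0 / (100.0 - percent_reduction)


def factor_to_reduction(factor: float) -> float:
    """Convert 'k times smaller' into 'reduced by p%' (p = 100 * (1 - 1/k))."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    return 100.0 * (1.0 - 1.0 / factor)


# Percentage reductions reported in the KPI table (illustrative labels).
reported_reductions = {
    "PyTorch data transfer (lower bound)": 83.79,
    "PyTorch data transfer (upper bound)": 99.83,
    "FaaStream cost vs. Flink": 65.14,
    "Spot instance cost": 50.0,
}

for label, percent in reported_reductions.items():
    print(f"{label}: {percent}% reduction = {reduction_to_factor(percent):.1f}x smaller")

# The STAR index shrank from 85 GB to 30 GB; express that as a percentage.
star_factor = 85 / 30
print(f"STAR index: {star_factor:.2f}x smaller = {factor_to_reduction(star_factor):.1f}% reduction")
```

Read this way, the reported 83.79% to 99.83% reduction in data transfers corresponds to moving roughly 6x to 588x less data, which puts it on the same scale as the factor-based rows.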
Project Coordinator
Dr. Pedro García López
pedro.garcia@urv.cat
NEARDATA has received funding from the European Union’s Horizon Europe research and innovation programme under grant agreement No 101092644.