NearData

Extreme Near-Data Processing Platform


Objectives


The goal of NEARDATA is to create an extreme data infrastructure that mediates data flows between Object Storage and Data Analytics platforms across the Compute Continuum. Our novel XtremeDataHub platform is an intermediary data service that intercepts and optimises data flows (S3 API, stream APIs) with high-performance near-data connectors (Cloud/Edge). In addition, our Data Broker service will provide secure data access and orchestration of dispersed data sources by means of Trusted Execution Environments (TEEs) and federated learning architectures. The NEARDATA platform is a novel technology for mining large, dispersed, unstructured data sets. It can be deployed in the Cloud and at the Edge (HPC, IoT devices), leverages advanced AI technologies, and offers a confidential cybersecurity layer for trusted data computation.

Figure: NEARDATA architecture

The goals of NearData are the following:

  • Provide high-performance near-data processing for Extreme Data Types: The first objective is to create a novel intermediary data service (XtremeDataHub) providing serverless data connectors that optimize data management operations (partitioning, filtering, transformation, aggregation) and interactive queries (search, discovery, matching, multi-object queries) to efficiently present data to analytics platforms. Our data connectors facilitate an elastic, data-driven process-then-compute paradigm that significantly reduces data communication on the data interconnect, ultimately resulting in higher overall data throughput.
  • Support real-time video streams as well as event streams that must be ingested into Object Storage and processed very quickly: The second objective is to seamlessly combine streaming and batch data processing for analytics. To this end, we will develop stream data connectors deployed as stream operators offering very fast stateful computations over low-latency event and video streams.
  • The third objective is to create a Data Broker service enabling trustworthy data sharing and confidential orchestration of data pipelines across the Compute Continuum. We will provide secure data orchestration, transfer, processing and access thanks to Trusted Execution Environments (TEEs) and federated learning architectures.
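
To illustrate the process-then-compute idea behind the first objective, here is a minimal sketch of why filtering and aggregating next to the storage can reduce traffic on the interconnect. It is not the XtremeDataHub API: the in-memory OBJECT_STORE, the object key, and the functions fetch_whole_object and near_data_connector are hypothetical names chosen only for this illustration.

```python
import json

# Hypothetical in-memory stand-in for an object store: key -> bytes.
OBJECT_STORE = {
    "readings/part-0.jsonl": b"\n".join(
        json.dumps({"sensor": i % 4, "value": i}).encode() for i in range(1000)
    ),
}


def fetch_whole_object(key):
    """Baseline: the analytics side pulls every byte, then filters locally."""
    payload = OBJECT_STORE[key]
    records = [json.loads(line) for line in payload.splitlines()]
    total = sum(r["value"] for r in records if r["sensor"] == 2)
    return total, len(payload)  # result and bytes moved over the interconnect


def near_data_connector(key, predicate, aggregate):
    """Near-data: filter and aggregate beside the storage, ship only the result."""
    records = (json.loads(line) for line in OBJECT_STORE[key].splitlines())
    result = aggregate(r["value"] for r in records if predicate(r))
    shipped = json.dumps(result).encode()
    return result, len(shipped)  # result and bytes moved over the interconnect


baseline_total, baseline_bytes = fetch_whole_object("readings/part-0.jsonl")
near_total, near_bytes = near_data_connector(
    "readings/part-0.jsonl", lambda r: r["sensor"] == 2, sum
)
print(baseline_total == near_total, baseline_bytes, near_bytes)
```

Both paths compute the same sum, but the near-data path ships only the serialized result instead of the full object. A real connector would run as a serverless function close to the object store rather than in the same process.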


Use Cases


Genomics

Creation of methods, fast storage, and communication infrastructures that connect distributed computing power with scalable storage systems, allowing efficient distribution of datasets across the system.

Metabolomics

Expand the analysis of metabolomics raw data and boost external access to, and efficient re-use of, open data. Creation of a federated, hybrid distributed architecture that ensures data privacy while still supporting shared global computations.

Surgery

Create generalised machine-learning models that can aid surgeons during surgery and allow video data to be analysed in real time with low latency.

Deliverables


  • Deliverable 1.1: Public Project Website (PDF)
  • Deliverable 1.3: Data Management Plan (PDF)
  • Deliverable 2.1: Initial Architecture Specifications (PDF)
  • Deliverable 2.2: NEARDATA Architecture Specs and Early Prototypes (PDF)
  • Deliverable 3.1: XtremeHub first release and documentation (PDF)
  • Deliverable 4.1: Data Broker release and documentation (PDF)
  • Deliverable 5.1: First release of KPI benchmarks in all use cases and data connector libraries (PDF)
  • Deliverable 6.1: Communication plan (PDF)
  • Deliverable 6.2: Communication and standardization report (PDF)

Publications


  • On Data Processing through the Lenses of S3 Object Lambda
  • Challenges and Opportunities for RISC-V Architectures Towards Genomics-Based Workloads
  • A Seer knows best: Auto-tuned object storage shuffling for serverless analytics
  • MLLess: Achieving cost efficiency in serverless machine learning training
  • Exhaustive Variant Interaction Analysis using Multifactor Dimensionality Reduction
  • Novel Approaches Toward Scalable Composable Workflows in Hyper-Heterogeneous Computing Environments
  • SinClave: Hardware-assisted Singletons for TEEs
  • A Last-Level Defense for Application Integrity and Confidentiality
  • Trustworthy confidential virtual machines for the masses
  • GLIDER: A Scalable and Elastic Data Lake for the Cloud in Serverless Environments
  • Scaling a Variant Calling Genomics Pipeline with FaaS
  • Practical Storage-Compute Elasticity for Stream Data Processing
  • Pravega: A Tiered Storage System for Data Streams
  • The Nanoservices Framework: Co-locating Microservices in the Cloud-Edge Continuum
  • METASPACE-ML: Metabolite annotation for imaging mass spectrometry using machine learning
  • One model to use them all: Training a segmentation model with complementary datasets
  • Exploiting inherent elasticity of serverless in algorithms with unbalanced and irregular workloads

News


  • NEARDATA Kick-off meeting (Consortium meeting)
  • Cloud-Edge Continuum (CEC’23) (Workshop)
  • European Big Data Value Forum (EBDVF 2023) (Forum)
  • ACM/IFIP Middleware 2023 (Conference)
  • M12 NEARDATA Meeting (Consortium meeting)
  • Mobile World Congress 2024 (MWC24) (Congress)
  • International Symposium on Cluster, Cloud and Internet Computing (Symposium)
  • Cloud-Edge Continuum (CEC’24) (Workshop)

Partners


The NEARDATA consortium is a well-balanced team of industrial and academic partners.

About


Project title: Neardata: Extreme Near-Data Processing Platform
Grant agreement ID: 101092644
Coordinator: Dr. Pedro García López
Partners:
  • Universitat Rovira i Virgili (Spain)
  • Barcelona Supercomputing Center (Spain)
  • Technische Universität Dresden (Germany)
  • Deutsches Krebsforschungszentrum Heidelberg (Germany)
  • European Molecular Biology Laboratory (Germany)
  • EMC Information Systems International Unlimited Company (Ireland)
  • KIO Networks España SA (Spain)
  • Sano - Centrum Zindywidualizowanej Medycyny Obliczeniowej (Poland)
  • Scontain GMBH (Germany)
  • UK Health Security Agency (United Kingdom)
Duration: 01 Jan 2023 - 31 Dec 2025
Overall budget: 3,913,585.00€
Programme: Horizon > WORLD LEADING DATA AND COMPUTING TECHNOLOGIES 2022 (HORIZON-CL4-2022-DATA-01)
Topic: HORIZON-CL4-2022-DATA-01-05
Funding scheme: HORIZON-RIA HORIZON Research and Innovation Actions
Dissemination materials: Brochure - Video

Contact us

Project Coordinator

Dr. Pedro García López

pedro.garcia@urv.cat


NEARDATA has received funding from the European Union’s Horizon research and innovation programme under grant agreement No 101092644.