NearData

Extreme Near-Data Processing Platform


Objectives


The goal of NEARDATA is to create an extreme data infrastructure that mediates data flows between Object Storage and Data Analytics platforms across the Compute Continuum. Our novel XtremeDataHub platform is an intermediary data service that intercepts and optimises data flows (S3 API, stream APIs) with high-performance near-data connectors (Cloud/Edge). In addition, our Data Broker service will provide secure data access and orchestration of dispersed data sources thanks to Trusted Execution Environments (TEEs) and federated learning architectures. The NEARDATA platform is a novel technology for mining large, dispersed, unstructured data sets. It can be deployed in the Cloud and at the Edge (HPC, IoT devices), leverages advanced AI technologies, and offers a confidential cybersecurity layer for trusted data computation.

Neardata architecture

The goals of NEARDATA are the following:

  • Provide high-performance near-data processing for Extreme Data Types: The first objective is to create a novel intermediary data service (XtremeDataHub) providing serverless data connectors that optimise data management operations (partitioning, filtering, transformation, aggregation) and interactive queries (search, discovery, matching, multi-object queries) so that data is presented efficiently to analytics platforms. These data connectors enable an elastic, data-driven process-then-compute paradigm that significantly reduces data communication on the data interconnect, resulting in higher overall data throughput.
  • Support real-time video and event streams that must be rapidly ingested into Object Storage and processed: The second objective is to seamlessly combine streaming and batch data processing for analytics. To this end, we will develop stream data connectors, deployed as stream operators, that offer very fast stateful computations over low-latency event and video streams.
  • Enable trustworthy data sharing and confidential orchestration of data pipelines across the Compute Continuum: The third objective is to create a Data Broker service that provides secure data orchestration, transfer, processing and access thanks to Trusted Execution Environments (TEEs) and federated learning architectures.
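
To make the process-then-compute idea from the first objective concrete, here is a minimal sketch in Python. It is not the XtremeDataHub API: the in-memory OBJECT_STORE, the function names, and the record fields (sample, quality, payload) are all hypothetical, chosen only to show how filtering and aggregating next to the data shrinks what has to cross the interconnect.

```python
import json
from collections import defaultdict

# Hypothetical stand-in for an object store: key -> newline-delimited JSON bytes.
OBJECT_STORE = {
    "reads/batch-0.jsonl": "\n".join(
        json.dumps({"sample": f"s{i % 3}", "quality": i % 50, "payload": "x" * 200})
        for i in range(1000)
    ).encode("utf-8")
}


def count_by_group(lines, predicate, group_field):
    """Filter records, then count the surviving records per group."""
    counts = defaultdict(int)
    for line in lines:
        record = json.loads(line)
        if predicate(record):
            counts[record[group_field]] += 1
    return dict(counts)


def transfer_then_compute(key, predicate, group_field):
    """Baseline: ship the whole object, then filter and aggregate on the analytics side."""
    raw = OBJECT_STORE[key]  # every byte of the object is transferred
    return count_by_group(raw.splitlines(), predicate, group_field), len(raw)


def process_then_compute(key, predicate, group_field):
    """Near-data variant: filter and aggregate next to the object, ship only the summary."""
    summary = count_by_group(OBJECT_STORE[key].splitlines(), predicate, group_field)
    wire = json.dumps(summary, sort_keys=True).encode("utf-8")  # only this is transferred
    return json.loads(wire), len(wire)


def high_quality(record):
    """Hypothetical filter: keep records whose quality score is at least 40."""
    return record["quality"] >= 40


if __name__ == "__main__":
    key = "reads/batch-0.jsonl"
    _, baseline_bytes = transfer_then_compute(key, high_quality, "sample")
    result, near_bytes = process_then_compute(key, high_quality, "sample")
    print("per-sample counts:", result)
    print("bytes transferred, baseline vs near-data:", baseline_bytes, near_bytes)
```

Both paths compute the same per-sample counts; the difference is only where the filtering and aggregation run, and therefore how many bytes are moved. A real connector would sit behind a storage API rather than a Python dictionary.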


Use Cases


Genomics

Creation of methods, fast storage, and communications infrastructures to connect distributed computing power with scalable storage systems, allowing efficient distribution of datasets across the system.

Metabolomics

Expand the analysis of metabolomics raw data and boost external access to, and efficient re-use of, open data. Creation of a federated, hybrid distributed architecture that ensures data privacy while still supporting shared global computations.

Surgery

Create generalised machine-learning models that can aid surgeons during surgery and allow video data to be analysed in real time with low latency.

Deliverables


Deliverable 1.1: Public Project Website
Deliverable 1.3: Data Management Plan
Deliverable 2.1: Initial Architecture Specifications
Deliverable 2.2: NEARDATA Architecture Specs and Early Prototypes
Deliverable 3.1: XtremeHub first release and documentation
Deliverable 4.1: Data Broker release and documentation
Deliverable 5.1: First release of KPI benchmarks in all use cases and data connector libraries
Deliverable 6.1: Communication plan
Deliverable 6.2: Communication and standardization report
* Deliverables pending approval

Publications

Software results

News


Participation in the European Big Data Value Forum (EBDVF 2024)

Forum

Cloud-Edge Continuum (CEC’24)

Workshop

International Symposium on Cluster, Cloud and Internet Computing

Symposium

Mobile World Congress 2024 (MWC24)

Congress

M12 NEARDATA Meeting

Consortium meeting

ACM/IFIP Middleware 2023

Conference

Partners


The NEARDATA consortium is a well-balanced team of industrial and academic partners.

About


Project title Neardata: Extreme Near-Data Processing Platform
Grant agreement ID 101092644
Coordinator Dr. Pedro García López
Partners Universitat Rovira i Virgili (Spain)
Barcelona Supercomputing Center (Spain)
Technische Universität Dresden (Germany)
Deutsches Krebsforschungszentrum Heidelberg (Germany)
European Molecular Biology Laboratory (Germany)
EMC Information Systems International Unlimited Company (Ireland)
KIO Networks España SA (Spain)
Sano - Centrum Zindywidualizowanej Medycyny Obliczeniowej (Poland)
Scontain GMBH (Germany)
UK Health Security Agency (United Kingdom)
Duration 01 Jan 2023 - 31 Dec 2025
Overall budget 3,913,585.00€
Programme Horizon > WORLD LEADING DATA AND COMPUTING TECHNOLOGIES 2022 (HORIZON-CL4-2022-DATA-01)
Topic HORIZON-CL4-2022-DATA-01-05
Funding scheme HORIZON-RIA HORIZON Research and Innovation Actions

The NEARDATA project is part of:
DataNexus Cluster

Contact us

Project Coordinator

Dr. Pedro García López

pedro.garcia@urv.cat


NEARDATA has received funding from the European Union’s Horizon research and innovation programme under grant agreement No 101092644.