Neardata project review

Final Review Agenda

Meeting Subject: Neardata final review

Venue: Online - Microsoft Teams

Date: September 25, 2024

Chair: Pedro García López (Coordinator)

Time | Subject | Duration | Lead partner
9:00 | Private meeting, reviewers and PO | 30 mins | PO
9:30 | Welcome: Presentation of Participants | 15 mins | Coordinator
9:45 | Overall Project Presentation: Project Overview and Big Picture; Achievements and project outcomes | 15 mins | Coordinator
10:00 | WP2. Architecture: Lithops & Data plug (5 mins); Metaspace (5 mins); WP2 (15 mins); Q&A (15 mins) | 40 mins | Coordinator, EMBL
10:40 | Coffee break | 10 mins |
10:50 | WP3. Data plane: Pravega (5 mins); WP3 (15 mins); Q&A (15 mins) | 35 mins | DELL
11:25 | WP4. Control plane: SCONE (5 mins); WP4 (15 mins); Q&A (15 mins) | 35 mins | TUD
12:00 | Lunch | 60 mins |
13:00 | WP5. Extreme Health Use Cases: Genomics (30 mins); Metabolomics (10 mins); Surgery (10 mins); Q&A (15 mins) | 65 mins | BSC, SANO, UKHS, EMBL, NCT
14:05 | WP6. Promoting Impact: Promotional video (5 mins); WP6 (15 mins); Q&A (15 mins) | 35 mins | SCO
14:40 | Final recap and summary | 10 mins | Coordinator
14:50 | WP1. Project Management: Use of resources, financial information; Management procedures; Amendment; Q&A (15 mins) | 35 mins | Coordinator
15:25 | Final remarks | 10 mins | PO
15:35 | Private meeting, reviewers and PO | 30 mins | PO
16:05 | Oral feedback from reviewers and comment from project | 25 mins | PO
16:30 | End of Review | |

Review Slides (tentative)

Document PDF KPI
Project Overview
WP1
WP2
WP3
WP4
WP5 - Transcriptomics Use Case
WP5 - Genomics 1
WP5 - Genomics 2
WP5 - Surgomics Use Case
WP6

Deliverables (pending approval)

Title PDF
Deliverable 1.1
Deliverable 1.3
Deliverable 2.1
Deliverable 2.2
Deliverable 3.1
Deliverable 4.1
Deliverable 5.1
Deliverable 6.1
Deliverable 6.2

Publications

Name Link
On Data Processing through the Lenses of S3 Object Lambda
Challenges and Opportunities for RISC-V Architectures Towards Genomics-Based Workloads
A Seer knows best: Auto-tuned object storage shuffling for serverless analytics
MLLess: Achieving cost efficiency in serverless machine learning training
Exhaustive Variant Interaction Analysis using Multifactor Dimensionality Reduction
Novel Approaches Toward Scalable Composable Workflows in Hyper-Heterogeneous Computing Environments
SinClave: Hardware-assisted Singletons for TEEs
A Last-Level Defense for Application Integrity and Confidentiality
Trustworthy confidential virtual machines for the masses
GLIDER: A Scalable and Elastic Data Lake for the Cloud in Serverless Environments
Scaling a Variant Calling Genomics Pipeline with FaaS
Practical Storage-Compute Elasticity for Stream Data Processing
Pravega: A Tiered Storage System for Data Streams
The Nanoservices Framework: Co-locating Microservices in the Cloud-Edge Continuum
METASPACE-ML: Metabolite annotation for imaging mass spectrometry using machine learning
One model to use them all: Training a segmentation model with complementary datasets
Exploiting inherent elasticity of serverless in algorithms with unbalanced and irregular workloads
Serverless End Game: Disaggregation enabling Transparency
The many faces of locality in Big Data Analytics. Springer Handbook of Data Engineering (pending link)
Dataplug: Cloud-aware Unstructured Data Management for Scientific Cloud Computing (pending link)
CRISP: Confidentiality, Rollback, and Integrity Storage Protection for Confidential Stateful Computing (pending link)

Software results

Title Description Repository
Lithops A multi-cloud framework for big data analytics and embarrassingly parallel jobs.
METASPACE Cloud engine and platform for metabolite annotation for imaging mass spectrometry.
Pravega Pravega - Streaming as a new software defined storage primitive.
SCONE Everything related to SCONE confidentiality support.
Dataplug Dataplug is a Python framework for efficiently accessing partitions of unstructured data stored in object storage for elastic workloads in the Cloud.
Glider Glider ephemeral storage system with in-storage computation.
METASPACE & Lithops Lithops-based Serverless implementation of the METASPACE spatial metabolomics annotation pipeline.
Metabolomics Data Space International Metabolomics Data Space
Serverless benchmarks Serverless benchmarks
Genomics use-case Variant calling source code.
Metabolomics use-case ML models from Experiment 1.
Surgery use-case 1 Federated Learning Source Code.
Surgery use-case 2 Surgical Pravega GStreamer Demo.
Transcriptomic Atlas use-case 1 Federated Learning for Human Genome Variation Analysis.
Transcriptomic Atlas use-case 2 Transcriptomic Atlas Pipeline.
Variant - Interactions use-case 1 MDR use-case source-code integrated with HPC Data Connector.
Variant - Interactions use-case 2 MDR use-case with Apache Spark.
Video-streaming benchmarks Video-streaming benchmarks with Gstreamer and Pravega connectors.

Videos

Title Video
NEARDATA Project - Extreme Near-Data Processing Platform
METASPACE demo
Pravega streams flink coordinated autoscaling
Secure Federated Learning using SCONE
Pravega Cluster PoC
KeyCloak for NEARDATA
Lithops Vanilla execution
Metabolomics pipeline running in run.lithops.cloud online Python notebook

Contact us

Project Coordinator

Dr. Pedro García López

pedro.garcia@urv.cat

NEARDATA has received funding from the European Union’s Horizon research and innovation programme under grant agreement No 101092644.