Neardata project review
Final Review Agenda
Meeting Subject: Neardata final review
Venue: Online - Microsoft Teams
Date: September 25, 2024
Chair: Pedro García López (Coordinator)
Time | Subject | Duration | Lead partner |
9:00 | Private meeting, reviewers and PO | 30 mins | PO |
9:30 | Welcome - Presentation of Participants | 15 mins | Coordinator |
9:45 | Overall Project Presentation: Project Overview and Big Picture; Achievements and project outcomes | 15 mins | Coordinator |
10:00 | WP2. Architecture: Lithops & Data plug (5 mins); Metaspace (5 mins); WP2 (15 mins); Q&A (15 mins) | 40 mins | Coordinator, EMBL |
10:40 | Coffee break | 10 mins | |
10:50 | WP3. Data plane: Pravega (5 mins); WP3 (15 mins); Q&A (15 mins) | 35 mins | DELL |
11:25 | WP4. Control plane: SCONE (5 mins); WP4 (15 mins); Q&A (15 mins) | 35 mins | TUD |
12:00 | Lunch | 60 mins | |
13:00 | WP5. Extreme Health Use Cases: Genomics (30 mins); Metabolomics (10 mins); Surgery (10 mins); Q&A (15 mins) | 65 mins | BSC, SANO, UKHS, EMBL, NCT |
14:05 | WP6. Promoting Impact: Promotional video (5 mins); WP6 (15 mins); Q&A (15 mins) | 35 mins | SCO |
14:40 | Final recap and summary | 10 mins | Coordinator |
14:50 | WP1. Project Management: Use of resources, financial information; Management procedures; Amendment; Q&A (15 mins) | 35 mins | Coordinator |
15:25 | Final remarks | 10 mins | PO |
15:35 | Private meeting - reviewers and PO | 30 mins | PO |
16:05 | Oral feedback from reviewers and comment from project | 25 mins | PO |
16:30 | End of Review | | |
Review Slides (tentative)
Document | PDF | KPI |
Project Overview | | |
WP1 | | |
WP2 | | |
WP3 | | |
WP4 | | |
WP5 - Transcriptomics Use Case | | |
WP5 - Genomics 1 | | |
WP5 - Genomics 2 | | |
WP5 - Surgomics Use Case | | |
WP6 | | |
Deliverables (pending approval)
Title | PDF |
Deliverable 1.1 | |
Deliverable 1.3 | |
Deliverable 2.1 | |
Deliverable 2.2 | |
Deliverable 3.1 | |
Deliverable 4.1 | |
Deliverable 5.1 | |
Deliverable 6.1 | |
Deliverable 6.2 | |
Publications
Name | Link |
On Data Processing through the Lenses of S3 Object Lambda | |
Challenges and Opportunities for RISC-V Architectures Towards Genomics-Based Workloads | |
A Seer knows best: Auto-tuned object storage shuffling for serverless analytics | |
MLLess: Achieving cost efficiency in serverless machine learning training | |
Exhaustive Variant Interaction Analysis using Multifactor Dimensionality Reduction | |
Novel Approaches Toward Scalable Composable Workflows in Hyper-Heterogeneous Computing Environments | |
SinClave: Hardware-assisted Singletons for TEEs | |
A Last-Level Defense for Application Integrity and Confidentiality | |
Trustworthy confidential virtual machines for the masses | |
GLIDER: A Scalable and Elastic Data Lake for the Cloud in Serverless Environments | |
Scaling a Variant Calling Genomics Pipeline with FaaS | |
Practical Storage-Compute Elasticity for Stream Data Processing | |
Pravega: A Tiered Storage System for Data Streams | |
The Nanoservices Framework: Co-locating Microservices in the Cloud-Edge Continuum | |
METASPACE-ML: Metabolite annotation for imaging mass spectrometry using machine learning | |
One model to use them all: Training a segmentation model with complementary datasets | |
Exploiting inherent elasticity of serverless in algorithms with unbalanced and irregular workloads | |
Serverless End Game: Disaggregation enabling Transparency | |
The many faces of locality in Big Data Analytics. Springer Handbook of Data Engineering (pending link) | |
Dataplug: Cloud-aware Unstructured Data Management for Scientific Cloud Computing (pending link) | |
CRISP: Confidentiality, Rollback, and Integrity Storage Protection for Confidential Stateful Computing (pending link) | |
Software results
Title | Description | Repository |
Lithops | A multi-cloud framework for big data analytics and embarrassingly parallel jobs. | |
METASPACE | Cloud engine and platform for metabolite annotation for imaging mass spectrometry. | |
Pravega | Pravega - Streaming as a new software defined storage primitive. | |
SCONE | All related to SCONE confidentiality support. | |
Dataplug | Dataplug is a Python framework for efficiently accessing partitions of unstructured data stored in object storage for elastic workloads in the Cloud. | |
Glider | Glider ephemeral storage system with in-storage computation. | |
METASPACE & Lithops | Lithops-based Serverless implementation of the METASPACE spatial metabolomics annotation pipeline. | |
Metabolomics Data Space | International Metabolomics Data Space | |
Serverless benchmarks | Serverless benchmarks | |
Genomics use-case | Variant calling source code. | |
Metabolomics use-case | ML models from Experiment 1. | |
Surgery use-case 1 | Federated Learning Source Code. | |
Surgery use-case 2 | Surgical Pravega GStreamer Demo. | |
Transcriptomic Atlas use-case 1 | Federated Learning for Human Genome Variation Analysis. | |
Transcriptomic Atlas use-case 2 | Transcriptomic Atlas Pipeline. | |
Variant - Interactions use-case 1 | MDR use-case source-code integrated with HPC Data Connector. | |
Variant - Interactions use-case 2 | MDR use-case with Apache Spark. | |
Video-streaming benchmarks | Video-streaming benchmarks with GStreamer and Pravega connectors. | |
Videos
Title | Video |
NEARDATA Project - Extreme Near-Data Processing Platform | |
METASPACE demo | |
Pravega streams flink coordinated autoscaling | |
Secure Federated Learning using SCONE | |
Pravega Cluster PoC | |
KeyCloak for NEARDATA | |
Lithops Vanilla execution | |
Metabolomics pipeline running in run.lithops.cloud online Python notebook | |