Archives and Data Management Systems in the Big Data Era

Europe/Rome
216 (CNR Bologna)

Description

The meeting “Archives and Data Management Systems” will take place from 26 to 28 February at the Area della Ricerca di Bologna. Organized within the Unità Scientifica Computing (USCVIII), it is the third in a series dedicated to data management and curation in the Big Data era.

The topics covered will mainly focus on:

  • the evolving role of astronomical archives in the landscape of large international projects;

  • aspects of data management in HPC centers and in PNRR activities, and connections with industrial partners;

  • Data Management Systems, including innovative technologies for storing astronomical data and their metadata;

  • authentication and authorization systems: from operating-system user accounts to web identities and credentials;

  • technical and semantic interoperability and the approach to the FAIR principles: what exists and what is missing among data resources and services in astrophysics.

 

Location: Bologna - CNR - Main Building - 1st floor, room 216

Date: 26-28/02/2025

Hours: 9:00 - 18:00

End of event: 28/02/2025 at 16:00

Participation: hybrid (in person and remote), but with limited opportunity for remote participants to ask questions. In-person attendance is strongly recommended.

Remote connection link:

https://meet.google.com/vde-gbqv-bao

Logistics:

ACCESS TO THE CNR AREA
To enter the CNR Area you will need to identify yourself at the external
gatehouse, which will check that you are on the list of participants.
After registering you will receive the workshop badge. On the following
days, you can enter the CNR area by showing the badge to the guard at the
external gatehouse.

BADGE HOLDERS
To limit our environmental impact as much as possible, we kindly ask you
to return the badge holders at the end of the workshop so that they can
be reused for other events.

Registration
Participants
  • Adriano Tullo
  • Alejandro Mus
  • Alessandra Zanichelli
  • Alessandro Altenburger
  • Alexia Cociancich
  • Andrea Bignamini
  • Andrea Mattana
  • Antonietta Fara
  • Carlo Giocoli
  • Carlotta Pittori
  • Ciriaco Goddi
  • Claudio Gheller
  • Cristian Magro
  • Cristiano Urban
  • Cristina Re
  • Daniele Tavagnacco
  • Davide Miceli
  • Dawoon Kim
  • Diego Paris
  • Elena Fedorova
  • Elia Gasperini
  • Emanuele Bascape
  • Emanuele Scalise
  • Enrico Licata
  • Ezequiel J. Marchesini
  • Fabrizio Bocchino
  • Federico Abbate
  • Federico Fiordoliva
  • Flavia Calderone
  • Flavio Licciulli
  • Francesca Martines
  • Francesco Bedosti
  • Francesco Fiori
  • Francesco Verrecchia
  • Francisco Miguel Montenegro Montes
  • Franco Tinarelli
  • Fulvio Gianotti
  • Gabriele Cremonese
  • Georgios Zacharis
  • Giacomo Coran
  • Gianluca Molgora
  • Gianni Bernardi
  • Giorgio Bergamin
  • Giorgio Calderone
  • Giovanni Carraro
  • Giovanni Naldi
  • Giulia Despali
  • Giuseppe Di Persio
  • Giuseppe Riccio
  • Hendrik Heinl
  • Ismam Abu
  • Letizia Caito
  • Loris Bortignon
  • Luciano Nicastro
  • Marcella Massardi
  • Marcello Lodi
  • Marco Molinaro
  • Mariano Muscas
  • Mario Marino
  • Martina Vicinanza
  • Massimo Costantini
  • Massimo Sponza
  • Matteo Gandolfi
  • Matteo Perri
  • Matteo Stagni
  • Maura Pilia
  • Michele Doro
  • Natalia Amanda Vergara
  • Nicola Ragno
  • Nicolò Antonietti
  • Raffaele Persico
  • Renato Frattin
  • Riccardo Smareglia
  • Robert Butora
  • Robert Janusz
  • Romolo Politi
  • Rosita Paladino
  • Sandro De Santis
  • Sara Bertocco
  • Sara Gelsumini
  • Sergio Poppi
  • Stefano Bianco
  • Stefano Cavuoti
  • Stefano Chiappella
  • Stefano Dal Pra
  • Stefano Gallozzi
  • Tomas Azevedo Silva
  • Tommaso Nicosia
  • Umberto Galtarossa
  • Valerio Pastore
  • Vincenzo Galluzzi
  • Vito Conforti
    • Registration and Welcome 216

    • Session 1 - International Project Archives status 216

      Updates or status reports on the newly developed archive infrastructures of the Flagship Projects.

      • 1
        Towards a data platform providing a holistic support to AtLAST operations

        AtLAST (Atacama Large Aperture Sub-millimeter Telescope) is a project that aims at building and operating the next large single-dish facility observing at sub-mm wavelengths in Chile. In addition to pursuing transformational science and expanding technological limits, we put a strong emphasis on sustainability aspects. Among the many outcomes of the recently completed EU-funded design study was a number of transformational science cases covering several different areas of astrophysics. In addition, we have created a comprehensive operations plan that envisions remote distributed operations, a modern user support model and the implementation of infrastructures enabling easy and transparent access to the data. Here, we want to explore strategies for describing, storing and sharing the different types of data that AtLAST will produce, including engineering, weather and science data at different stages of processing. We envision a platform where the different AtLAST stakeholders (science users, proposal reviewers, telescope astronomers and operators, engineers, etc.) will be able to access the technical and scientific data they need as well as the tools necessary to analyse these data.

        Speaker: Francisco Miguel Montenegro Montes (Universidad Complutense de Madrid)
      • 2
        ALMA Archive towards the Wideband Sensitivity Upgrade Era

        ALMA is undergoing the Wideband Sensitivity Upgrade (WSU), which will result in an increase of the instantaneous spectral bandwidth, spectral scan speed, and sensitivity for all observations. However, this will also bring some technical and data management challenges. I will present the major consequences of the upgrade for the data flow and the ALMA Science Archive development. I will also present the approaches that we are considering in the European ARC, and in particular in the Italian node, to face these new challenges.

        Speaker: Marcella Massardi (Istituto Nazionale di Astrofisica (INAF))
      • 3
        The Gaia mission: towards a whole-sky Legacy Data management and archive infrastructure at OPS4 INAF-ASI facility
        Speaker: Deborah Busonero (Istituto Nazionale di Astrofisica (INAF))
    • 10:30
      Coffee Break 216

    • Session 2 - International and National Projects Archives 216

      • 4
        SSDC
        Speaker: Matteo Perri (Istituto Nazionale di Astrofisica (INAF))
      • 5
        IA2 INAF Data Center
        Speaker: Cristina Knapic (Istituto Nazionale di Astrofisica (INAF))
      • 6
        The AGILE data archive and data management system c/o the ASI Space Science Data Center

        AGILE (Astrorivelatore Gamma ad Immagini LEggero) has been a unique and successful space mission of the scientific program of the Italian Space Agency (ASI) focused on high-energy astrophysics, built and operated with the programmatic and technical support of INAF and INFN. During almost 17 years of observations in orbit (from April 23, 2007, to January 18, 2024), AGILE contributed in fundamental ways to high-energy astrophysics, cosmic-ray physics, solar physics and to the study of terrestrial gamma-ray flashes. Its archives and catalogs, compliant with FAIR principles, are available to the community through the AGILE Data Center, which is part of the ASI multi-mission Space Science Data Center (SSDC, previously known as ASDC). In this presentation, we give an overview of the AGILE data archive and data management system.
        Even if AGILE was a small space mission that could not be classified within the context of so-called "big data", it also had a Guest Observer Programme and addressed many issues that can be a useful example for the future.

        Speaker: Dr Carlotta Pittori (Istituto Nazionale di Astrofisica (INAF))
      • 7
        Toward a Public MAGIC Gamma-Ray Telescope Legacy Data Portal

        The MAGIC telescopes are one of the three major IACTs (Imaging Atmospheric Cherenkov Telescopes) currently in operation for the observation of gamma rays in the TeV regime. MAGIC has operated since 2003 and has published data on more than 80 sources, mostly blazars, in several emission states. MAGIC already distributes astronomical .fits files with basic final scientific products such as spectral energy distributions, light curves and skymaps from published results.

        We are working on an updated format of high-level data files that contains more information (for a complete legacy of results), is in ASCII format (for human readability) and is compliant with VO requirements. The final goal is a legacy of all high-level products in MAGIC papers, possibly including multi-wavelength data specifically analyzed for the publications. This activity is also aimed at the new generation of IACTs, the Cherenkov Telescope Array Observatory.

        In this contribution we describe this project.

        Speaker: Michele Doro (University of Padova)
      • 8
        ASTRI Horn, ASTRI Mini-Array and CTA Observatory a new Archival Perspective Design
        Speaker: Dr Stefano Gallozzi (Istituto Nazionale di Astrofisica (INAF))
      • 9
        SKA SRC - an international effort
        Speaker: Claudio Gheller (Istituto Nazionale di Astrofisica (INAF))
    • 13:00
      Lunch 216

    • Session 3 - Archival activities in HPC centers 216

      The national HPC centers in Italy also host archival and long-term preservation facilities. This session covers activities in this area.

      • 10
        The Sardinia Radio Telescope data management, from observing to archiving
        Speaker: Antonietta Angela Rita Fara (Istituto Nazionale di Astrofisica (INAF))
      • 11
        Overview Data Lakes Spoke 3 and IDL
        Speakers: Giacomo Coran (Istituto Nazionale di Astrofisica (INAF)), Massimo Costantini (Istituto Nazionale di Astrofisica (INAF))
      • 12
        The GAIA use case in Spoke 3 and Innovation Grants
        Speakers: Enrico Licata (Istituto Nazionale di Astrofisica (INAF)), Sara Gelsumini (Istituto Nazionale di Astrofisica (INAF))
      • 13
        Round table
    • 16:00
      Coffee break 216

    • Session 4 - Industrial contribution 216

      Overviews of commercial off-the-shelf archival and storage tools are expected.

      • 14
        Advanced archival solutions

        IBM will present storage and archiving solutions for complex, multi-platform environments: on-premises, distributed and cloud-oriented.
        Starting from long-established solutions (such as robotic libraries), we will move on to software-defined solutions (such as Storage Scale/GPFS, Ceph and Cloud Object Storage), which can use either IBM appliances or standard hardware.

        * Big International Projects;
        * Data management in HPC centers, PNRR and industry;
        * Data Management Systems;

        Speaker: Mr Sandro De Santis (IBM)
      • 15
        QStar Tape as NAS: security and reliability for the long-term archiving of satellite images and scientific data

        In the data management landscape, the exponential growth of information calls for archiving solutions that are not only scalable but also secure and sustainable.
        QStar Tape as NAS meets these needs by offering a strategic alternative to traditional disk-based storage systems, particularly suited to preserving satellite images, data from scientific research and High-Performance Computing (HPC) processes.

        Data protection and security
        Archiving scientific data and satellite images requires high standards of security and reliability over time. Tape systems provide advanced protection against cyber threats: unlike disks, LTO cartridges can be stored offline, eliminating the risk of ransomware attacks or accidental data corruption. In addition, hardware encryption protects sensitive information, which is essential for space agencies, research centers and government institutions.

        Long-term archiving and sustainability
        The enormous amount of data generated by satellites and scientific projects requires a system able to guarantee preservation for decades without degradation. Tape technologies offer a longer lifespan than magnetic disks, with proven stability of over 30 years. Energy consumption is also drastically lower: unlike hard disks that are active 24/7, LTO cartridges do not need continuous power, which reduces operating costs and environmental impact.

        Scalability and efficiency for Big Data
        Earth observation projects, scientific simulations and HPC applications generate petabytes of data, often with the need for fast and continuous access. QStar Tape as NAS allows tapes to be used as a traditional NAS filesystem, offering a scalable and cost-effective solution for managing Big Data. Overall storage costs are considerably lower than with disks, without compromising accessibility or performance.

        Conclusion
        QStar Tape as NAS is the ideal choice for space agencies, research institutes and companies that need a secure, scalable and sustainable archive. By combining reliability, protection and cost optimization, this technology makes it possible to manage and preserve enormous volumes of data, ensuring the accessibility and integrity of information over the long term.

        Speaker: Dr Raffaele Persico (QStar)
      • 16
        Quantum Solutions for Long-Term Archiving

        Organizations increasingly need modern, easy-to-use solutions that help them store, manage, protect, archive and analyze enormous amounts of data.
        Quantum ActiveScale Cold Storage is a hybrid cloud archiving solution designed to offer highly scalable, secure and cost-effective long-term storage for both frequently and infrequently accessed data. Built on a distributed object storage architecture and supporting the S3 and S3 Glacier APIs, it combines erasure coding and data durability technologies to ensure reliability and efficiency. Ideal for backup, scientific data archiving, media content and regulatory compliance, ActiveScale Cold Storage reduces operating costs through an access model optimized for on-demand retrieval. Integrating with cloud and on-premises services, it offers a cost-effective alternative to traditional tape archives, with faster data access times and simplified management.

        Speaker: Mr Stefano Chiappella
      • 17
        How to handle Multi PB Environments for years, easily - The Pure Storage Way

        Pure Storage offers robust archiving solutions, prominently featuring the Evergreen architecture, high density, and lower costs. The Evergreen architecture allows Pure Storage to deliver continuous, non-disruptive upgrades, ensuring that the storage infrastructure remains modern and efficient over a 10+ year lifespan without the need for disruptive migrations or re-buys. This architecture guarantees that customers can minimize downtime and future-proof their storage needs.

        Our platforms provide efficient, high-throughput, scale-out storage, optimized for large-scale environments with both file and object storage capabilities. The storage solutions are designed to be energy efficient, consuming significantly less power and generating less heat compared to traditional systems. This focus on efficiency not only reduces operational costs but also aligns with modern sustainability goals by minimizing the environmental impact.

        Additionally, Pure Storage's solutions are cost-effective. Consolidating storage silos into a unified platform reduces the complexity and costs associated with managing multiple systems. The predictable pricing model and the option for as-a-service offerings through Evergreen//One provide financial flexibility and help organizations better manage their storage expenses while ensuring robust data protection and rapid recovery capabilities via features like SafeMode snapshots.

        These combined advantages make Pure Storage's archiving solutions an excellent choice for organizations looking to achieve high performance, low costs, and reliable, future-proof storage infrastructure.

        Speaker: Umberto Galtarossa (Pure Storage)
      • 18
        VSP ONE Object Storage
        Speaker: Cristian Magro (Hitachi Vantara)
    • Session 5 - Workflow management systems and Data Mining 216

      Focused on the use of archival data, this user-oriented session aims to collect requirements and discuss possible future implementations.

      • 19
        Workflow Management Systems
        Speaker: Andrea Bignamini (Istituto Nazionale di Astrofisica (INAF))
      • 20
        The QUBRICS database for machine learning: architecture and performance.

        The QUBRICS project aims to identify new, bright and high-redshift QSOs in the Southern Hemisphere using photometric data from several astronomical surveys, and machine learning methods for classification and redshift regression.

        I will briefly describe the architecture and performance of our internal database, and its role within the QUBRICS ecosystem.

        Speaker: Giorgio Calderone (Istituto Nazionale di Astrofisica (INAF))
      • 21
        INAF Open Science
        Speaker: Riccardo Smareglia (Istituto Nazionale di Astrofisica (INAF))
      • 22
        Mining Archives: needs for Machine Learning
        Speakers: Giuseppe Riccio (Istituto Nazionale di Astrofisica (INAF)), Stefano Cavuoti (INAF - Astronomical Observatory of Capodimonte Napoli)
      • 23
        Round table
    • 10:30
      Coffee Break 216

    • Session 6 - Data Management Systems - part 1 216

      Starting from the needs of astrophysics in archival methods, this session describes possible approaches to the subject and reports some implementation examples.

      • 24
        Data Management: best practices

        How to build an on-demand system to meet the challenges of large projects and big data

        Speaker: Dr Stefano Gallozzi (Istituto Nazionale di Astrofisica (INAF))
      • 25
        Archive as a Service: A Microservices-Based Hyperconverged Infrastructure
        Speaker: Federico Fiordoliva (Istituto Nazionale di Astrofisica (INAF))
      • 26
        Web-based approach to Data Management
        Speaker: Luciano Nicastro (Istituto Nazionale di Astrofisica (INAF))
      • 27
        Round table
    • 12:30
      Lunch 216

    • Session 7 - Data Management Systems - part 2 216

      • 28
        The Central Role of Database Technology in Astronomical Archives

        The common modus operandi in astrophysical research is to recycle knowledge already acquired and shared within a single team. Too little attention is given to the discovery of new technologies to better fulfill project specifications and requirements unless a technological limit is reached during any testing use-case.
        Although the Database Management System (DBMS) is the backbone on which astronomical archives are based, the choice of the DB best suited to scientists' needs often comes up against the need to put a test prototype into operation, so the search for an improved technology is simply bypassed in favor of a solution already known and mastered.
        In this talk we will give a general overview of different kinds of DBMS and related tools, and show possible approaches to consciously making the best technology choice for a scientific project or, where appropriate, adopting a new database paradigm called “Polyglot Persistence”, where different DB solutions are joined together within a common archive ecosystem. Polyglot Persistence focuses on atomic services identified by data providers and data consumers, and builds each service adaptively to match the data management and access requirements.

        Speaker: Dr Stefano Gallozzi (Istituto Nazionale di Astrofisica (INAF))
      • 29
        RUCIO Data Management System: a Simple Archive or just a Distributed Storage System?

        In this talk we will present the features of RUCIO DM.
        This software, developed and maintained by CERN, underpins the storage of several national and international research projects. It is a technological solution capable of federating different storage pools distributed across various locations and, through a centralized catalog, it provides the most common access interfaces to the data, as well as a series of management services that promote data redundancy and allow direct connection to Workload Management Systems for the automation of data processing.
        We will show how it is possible to use RUCIO for a simple astronomical archive, but also how to use RUCIO directly as distributed storage, interfacing it to the capabilities of an external database to ensure greater performance in terms of reliability and availability.

        Speaker: Georgios Zacharis (INAF/OAR)
      • 30
        Round table
    • 16:00
      Coffee Break 216

    • Session 8 - A&A 216

      Discussion of how users approach data, with considerations on authentication and authorization for direct or web-mediated access to resources.

      • 31
        The origin of RAP
        Speaker: Franco Tinarelli (Istituto Nazionale di Astrofisica (INAF))
      • 32
        Effortless Identity Management

        In this presentation, we will explore how to quickly and efficiently implement authentication, authorization, and Single Sign-On (SSO) using Keycloak, an open-source identity and access management solution. Through a video demo, we will show how easy it is to set up a secure authentication system, manage user permissions, and enable login across multiple applications.

        Speaker: Massimo Costantini (Istituto Nazionale di Astrofisica (INAF))
      • 33
        Round table
    • 20:00
      Social Aperitivo, Bologna city centre

      via Clavature 12

      We will meet at the Mercato di Mezzo for an aperitivo together!
      Participation is optional

    • Session 9 - Open Science, FAIR and Interoperability 216

      • 34
        Open Data Long Term Data Preservation for HEP: the CNAF Experience

        In this talk, we briefly recall the FAIR data principles and the importance of, and need for, Long Term Data Preservation (LTDP) solutions in High Energy Physics (HEP). We present a case study of the CDF experiment. We showcase how CNAF addressed the preservation challenges, highlighting both technical solutions and broader implications for open science and data FAIRness. We finally share the insights gained through this experience.

        Speaker: Stefano Dal Pra (INFN-CNAF)
      • 35
        Updates on the IVOA standards ecosystem

        Big data and the dynamic software infrastructure landscape are challenges also for the interoperability ecosystem defined by the IVOA standards.
        This contribution reports on the current status of the IVOA activities and on how the VO faces the change in software paradigms and data analysis challenges while keeping an eye continuously focused on open standards, preservation and interoperability.

        Speaker: Marco Molinaro (Istituto Nazionale di Astrofisica (INAF))
      • 36
        Towards FAIRness of radio data in the SKA era

        The increasing importance of Science Archives and archive mining in defining the ultimate productivity of an observing facility motivated the Italian Centre for Astronomical Archives (IA2) to develop and maintain the INAF Radio Data Archive. This geographically distributed archival facility flexibly handles different data models and formats, with the aim of enabling data discovery and access through the Virtual Observatory (VO).
        The activity related to the INAF Radio Data Archive led us to join the IVOA Radio Interest Group, whose goal is the interoperability, and more generally the FAIRness, of radio data. In this talk we will present our contribution, mainly focused on the definition of requirements and use cases for the representation of radio astronomy data in the VO. This implied the identification of metadata concepts needed by the radio domain that are not currently supported by the VO, hence the definition of a radio-specific data model that has been published as an IVOA Proposed Recommendation.
        Complementary to data discovery, archival models for data ingestion and operations are required by large projects like the SKA and its SKA Regional Centre Network. In this respect we will present our activity within the Orange and Azure SRCNet Teams, mainly focused on the definition of observatory and advanced data products, their relation with science use cases and the definition of requirements for visualisation tools. Also, a fundamental contribution is being given on the mapping of SKA metadata onto existing and widely used data models, like the CADC Common Archive Observation Model and the IVOA ObsCore itself.

        Speaker: Vincenzo Galluzzi (Istituto Nazionale di Astrofisica (INAF))
    • 10:30
      Coffee Break 216

    • Session 10 - Interoperability and Virtual Observatory 216

    • 13:00
      Lunch 216

    • Session 11 - Closing remarks 216

      Sum-up of the meeting and final discussion

    • 15:30
      Coffee Break 216

    • Private session Sector 2 216
