INAF Science Archives & the Big Data Challenge

Name: INAF Science Archives & the Big Data Challenge
Start: 2019-06-17T13:00:00+00:00
End: 2019-06-19T19:55:00+00:00
Location: INAF

17 Jun 2019, 13:00 → 19 Jun 2019, 19:55 UTC

INAF

viale del Parco Mellini - Rome

Description

The workshop aims to bring together, for discussion, all the main Italian actors involved in the use and management of astrophysical data, including within the interdisciplinary perspective of multimessenger astronomy. The different functionalities of archives will be presented, from an overview of the existing archives and their development to a discussion of the Archive 2.0 concept for Big Data. The use of modern archives is no longer limited to searching for scientific information; it extends to providing the framework for the search, manipulation and analysis of data from the ground-based and space telescopes of the new 2020 era.

SOC:

A. Antonelli,
A. Grado,
C. Knapic (chair),
E. Molinari,
R. Morbidelli,
M. Nanni,
G. Polenta,
R. Smareglia,
A. Zanichelli.

Remote connection details:

Google Meet - https://meet.google.com/rao-unut-gss

Book of Abstracts available here

Preliminary minutes available here

Participants

80

Monday 17 June
- 13:00 → 14:00
  
  Registration
- 14:00 → 14:15
  Welcome & Opening
  - 14:00
    
    Welcome 5m
    
    Welcome to the participants by the Chair of the Workshop, and logistical details
    
    Speaker: Cristina Knapic (Istituto Nazionale di Astrofisica (INAF))
  - 14:05
    
    Workshop Opening 10m
    
    Workshop scope and main goals
    
    Speaker: Nicolo' D'Amico (Istituto Nazionale di Astrofisica (INAF))
- 14:15 → 15:40
  Session 1a: INAF Archives: the current landscape in science and technology - Chair: C. Knapic
  
  Overview of the state of the art of archives at INAF / in Italy
  - 14:15
    
    Workshop aim 10m
    
    Speaker: Riccardo Smareglia (Istituto Nazionale di Astrofisica (INAF))
    
    01 - Worshop archivi start.pptx
  - 14:25
    
    UTG2 25m
    
    Speaker: Alessandra Zanichelli (Istituto Nazionale di Astrofisica (INAF))
    
    02 -archives_and_big_data_UTG2_INAF_2019_v1.pdf
  - 14:50
    
    UTG 3 25m
    
    Speaker: Matteo Perri (Istituto Nazionale di Astrofisica (INAF))
    
    INAF_Archives_UTG3_key.pdf
  - 15:15
    
    UTG4 - Storing and processing planetological data 25m
    
    In this presentation I will introduce how and where planetological data are stored and processed. I will describe the main repositories available from NASA, ESA and ASI and the main processing pipelines with particular emphasis on Martian data.
    
    Speaker: Simone Silvestro (INAF - Osservatorio Astronomico Capodimonte)
    
    04 - SSilvestro_BigData_Roma_2019_2.pptx
- 15:40 → 16:00
  
  Coffee Break
- 16:00 → 17:55
  Session 1b: INAF archives: the current landscape in science and technology - Chair: M. Nanni
  
  State of the art of archives at INAF / in Italy
  - 16:00
    
    INAF Italian Astronomical Archives facility 25m
    
    Description of the IA2 tools and services offered to support the astronomical community, from data preservation to data accessibility, findability and interoperability.
    
    Speaker: Cristina Knapic (Istituto Nazionale di Astrofisica (INAF))
    
    IA2_CK_20190617.pdf
  - 16:25
    
    Online scientific analysis tools for astronomy: the SSDC web services 25m
    
    The ASI-SSDC offers a multitude of online and interactive web services for the astronomical community, which go from archiving and distribution of space mission data to online interactive analysis tools; from multi-mission and multi-wavelength cross-search catalogs to high-level data product services like the broadband spectral energy distribution (SED) builder and the SSDC Sky Explorer.
    
    In this talk, I will present the main SSDC web services, their applications and their practical use in the context of multi-wavelength and multi-messenger astronomy.
    
    Speaker: Fabrizio Lucarelli (Istituto Nazionale di Astrofisica (INAF))
    
    06 - Lucarelli_SSDC-WebService_final.pdf
  - 16:50
    
    Gaia DPCT experience in data processing and data management 15m
    
    The DPCT is one of the six Gaia Data Processing Centers belonging to the Gaia Ground Segment. It is the result of a joint effort between the science team of INAF-OATo and the ALTEC industry team, started in July 2008 to support the Italian participation in the Gaia mission data processing tasks (AVU data reduction software systems). DPCT operations started about one month after the Gaia satellite launch, and since then data have been received and processed without interruption, with hundreds of thousands of executed workflows and jobs and database sizes of hundreds of terabytes. This talk focuses on all phases of the data processing pipeline (management): data receiving, data processing, data extraction, data archiving and data retrieval.
    
    Speaker: Deborah Busonero (Istituto Nazionale di Astrofisica (INAF))
    
    DPCT_Busonero_20190617.pdf
  - 17:05
    
    The WEAVE archive: a User view 15m
    
    I will show how a typical search session should be performed in the WEAVE archive. Various types of searches will be shown, along with all the tools implemented in both the search and data analysis interfaces.
    
    Speaker: Dr Daniela Bettoni (INAF - Osservatorio Astronomico di Padova)
    
    WEAVE_archivi_Roma.pdf
  - 17:20
    
    LBT, from data to users: the archive of the scientific products @ OAR 15m
    
    Since its setup in 2007, the LBT Survey Data Center (LSC) has supported the Italian community in the exploitation of LBT data, from the call for proposals, through the reduction of imaging data and spectra, to the dissemination of the science-ready processed data and metadata to the PI owners. This has been possible thanks to the work packages that make up LSC: the Proposal Handling System, which schedules observations according to the TAC ranking and to specific observation constraints; the Data Reduction Units, which process the raw data; and the Science-Ready Data Unit, which delivers the processed data and metadata to the PIs. The main tool supporting these operating units is the LSC portal, which provides the observing team with facilities for handling the observations and allows the PIs to interact with the observing team to monitor the status of their programs, and with the LSC team to retrieve their science-ready processed data. The very core of LSC, however, is the archive, where all the data (raw and processed), the metadata and all the information concerning the proposals are stored. This talk describes the general workflow that connects all these aspects of LSC.
    
    Speaker: Diego Paris (Istituto Nazionale di Astrofisica (INAF))
    
    09 - LSC_June2019.pdf
  - 17:35
    
    AGILE Science Operation Center at SSDC: from raw data to online analysis tools. 15m
    
    AGILE is a Scientific Mission of the Italian Space Agency (ASI) built and operated in cooperation with INAF and INFN, devoted to gamma-ray astrophysics.
    
    The satellite has been in orbit since April 23rd, 2007. The AGILE Data Center, part of the multi-mission ASI Space Science Data Center (ASI-SSDC, previously known as ASDC), is in charge of all the science-oriented activities related to the processing, archiving, and distribution of AGILE data. Quick-look data analysis and fast communication of new transients are implemented as an essential part of the AGILE science program. One important service to the scientific community, common to all the SSDC-supported missions, is the provision of web tools that enable users to perform quick-look and preliminary online analysis of scientific data and cross-correlations at different wavelengths. I will present the main activities of the AGILE data center, with particular focus on Scientific Operations, multimessenger follow-up of gravitational waves and neutrinos, and User Support features.
    
    Speaker: Carlotta Pittori (Istituto Nazionale di Astrofisica (INAF))
    
    10 - AGILE_at_SSDC_Pittori_INAF_2019_FL.pptx
Tuesday 18 June
- 09:00 → 10:40
  Session 2a: From data to science and back: current experiences and future perspectives - Chair: A. Antonelli
  - 09:00
    
    CADC - Canadian Astronomy DC facility features - S. Gaudet (Invited) 25m
    
    01-2019-06_INAF_CADC_CANFAR.pptx
  - 09:25
    
    The GaiaPortal and the cross-match algorithms and software developed at SSDC: unique solutions to complex scientific and technological challenges. (Paola Maria Marrese - Silvia Marinoni) 25m
    
    Although each astronomical catalogue on its own can be a very powerful scientific tool, it is the combination of archives with each other that truly opens up new possibilities for modern astronomical research and more closely meets its requirements. The advanced interoperation of archives should leave the user with the feeling of working with one single data archive. The first step towards seamless data access is the computation of the cross-match between surveys. The complexity and scientific issues related to cross-matching have attracted growing attention now that the combined use of large data sets from different surveys and/or wavelength domains is more and more common. The cross-matching of astronomical catalogues is a complex and challenging problem both scientifically and technologically, especially when matching large surveys which include several millions or billions of sources. There are different approaches to the combination of astronomical catalogues, and cross-match algorithms can be very different. It is important to correctly define both the scientific problem one is faced with and the objectives of the cross-match. The Gaia catalogue, with its high-accuracy astrometry and high angular resolution, is used in the cross-match algorithms developed at SSDC as the link between other publicly available surveys obtained either from the ground or from space, large or small, old, recent and future.
    Gaia data releases, with almost two billion sources, a largely heterogeneous dataset included in the catalogue and thus complex metadata, represent an excellent and challenging example of how to implement the techniques necessary for the management, access and scientific exploitation of big data. SSDC is one of the four ESA partner data centres for the distribution of Gaia data. Through the GaiaPortal at SSDC, users can access a huge distributed archive of complex high-level data including Gaia, large optical/NIR public surveys and the results of their cross-match.
    The GaiaPortal web interface is almost unique, especially if compared to the solutions adopted by ESA and other partner data centres. GaiaPortal is built to be multi-wavelength, and its distinctive characteristic and strength lie in allowing users to interrogate highly composite data without needing either in-depth knowledge of the structure and organization of the data in the database or the ability to correctly write intricate SQL-based queries.
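    
    As a rough illustration of the positional cross-match idea described in this abstract, the sketch below pairs each source of a reference catalogue with its nearest neighbour in a second catalogue within a fixed search radius. It is a minimal brute-force example and not the SSDC algorithm: the function names, the 1 arcsec radius and the coordinates are assumptions made up for illustration, and a catalogue with billions of sources would need spatial indexing rather than a double loop.
    
    ```python
    import math
    
    
    def angular_separation_deg(ra1, dec1, ra2, dec2):
        """Great-circle separation in degrees (haversine formula); inputs in degrees."""
        ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
        a = (math.sin((dec2 - dec1) / 2) ** 2
             + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
        return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))
    
    
    def cross_match(reference, other, radius_arcsec=1.0):
        """Nearest-neighbour match of `reference` against `other` (brute force, O(N*M)).
    
        Both catalogues are dicts mapping a source id to (ra_deg, dec_deg).
        Returns {reference_id: (other_id, separation_arcsec)} for matched sources only.
        """
        radius_deg = radius_arcsec / 3600.0
        matches = {}
        for ref_id, (ra, dec) in reference.items():
            best_id, best_sep = None, radius_deg
            for oth_id, (ra2, dec2) in other.items():
                sep = angular_separation_deg(ra, dec, ra2, dec2)
                if sep <= best_sep:
                    best_id, best_sep = oth_id, sep
            if best_id is not None:
                matches[ref_id] = (best_id, best_sep * 3600.0)
        return matches
    
    
    # Made-up coordinates, for illustration only (not real catalogue entries).
    reference = {"ref-1": (10.00000, -5.00000), "ref-2": (150.25000, 2.10000)}
    other = {"s-a": (10.00010, -5.00005), "s-b": (200.00000, 45.00000)}
    
    matches = cross_match(reference, other, radius_arcsec=1.0)
    for ref_id, (oth_id, sep) in matches.items():
        print(f"{ref_id} <-> {oth_id}: {sep:.2f} arcsec")
    ```
    
    A purely positional nearest-neighbour rule is only one of the possible approaches the abstract alludes to; it ignores proper motions, epoch differences and positional uncertainties, all of which matter for a real survey-to-survey match.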
    
    02-BigDATA_SSDC_Gaia.pdf
  - 09:50
    
    ASTRI Data Handling and Archiving 15m
    
    In the context of the Cherenkov Telescope Array (CTA), INAF is developing an end-to-end prototype of the CTA Small-Size Telescopes in dual-mirror (SST-2M) configuration. The prototype, named ASTRI-Horn, is located at the INAF "M.C. Fracastoro" observing station in Serra La Nave (Mt. Etna, Sicily), and is currently undergoing the performance verification phase. A mini-array of nine ASTRI telescopes has since been proposed for deployment and operation as a pathfinder sub-array at the CTA Observatory southern site.
    The INAF-OAR CTA/ASTRI team, in collaboration with SSDC researchers, has developed a full end-to-end software package for the reduction, up to the final scientific products, of raw data acquired with both the ASTRI-Horn prototype and the mini-array. The group is also undertaking a massive production of Monte Carlo simulation data using the CTA Monte Carlo software. Simulated data are being used to validate the simulation chain and evaluate the performance of the ASTRI-Horn prototype and mini-array. Recently, real data of the Crab Nebula taken by the ASTRI-Horn telescope have been successfully reduced and analysed, leading to the first detection of an astrophysical source at very high energies by a Cherenkov telescope in dual-mirror configuration.
    The INAF-OAR team has also developed the ASTRI data Archiving System (AAS). AAS has been in production since 2016, when the first ASTRI light was taken, and takes care of long-term data preservation and distribution to the scientific community. Within the AAS, we developed a Proposal Handling System, an Observation Scheduler, and a simple interface for PI data retrieval, the ASTRI Gateway, which gives access to the whole chain of ASTRI data products (from raw to higher-level science-ready data products).
    In this contribution, we present the architecture and the main components of the ASTRI data handling systems and report on the status of their development and application.
    
    Speaker: Saverio Lombardi (INAF-OAR and ASI-SSDC)
    
    ASTRI-DH_INAF_ARCHIVES_BIG_DATA_CHALLENGE_Roma062019_Lombardi.pdf
  - 10:05
    
    Italian contribution to CTA: Archives and a framework for Big Data management. (Stefano Gallozzi - Nicolò Parmiggiani) 25m
    
    04b-Parmiggiani Development of Big Data framework for Cherenkov Telescope Array.pdf
    
    ASTRICTA_ARCHIVES_INAF_CONTRIBUTIONS.pdf
- 10:40 → 10:55
  
  Coffee Break
- 10:55 → 12:50
  Session 2b: From data to science and back: current experiences and future perspectives - Chair: A. Zanichelli
  - 10:55
    
    ESO experience in data handling - M. Arnaboldi (Invited) 25m
    
    Speaker: Arnaboldi
    
    ESO Archive 062019.pptx
  - 11:20
    
    Designing tools to reduce complexity: the spectroscopic surveys example 20m
    
    The VIMOS spectrograph at ESO/VLT has been used to carry out many spectroscopic surveys, targeting from a few thousand up to one hundred thousand objects. Each one of these projects presented a high degree of complexity, from the collection of the multi-wavelength data from which to extract the survey parent sample, to the definition of the observations, the data reduction and its validation by team astronomers, the collection of the project data products, and the final scientific analysis that required putting together these data products and the original parent sample data. Over the years the Astronomical Software group at IASF Milano has designed and produced a series of software tools aimed at hiding as much of the project complexity as possible from the astronomers participating in the project. These tools include database and project management and browsing tools, data reduction pipelines specifically tailored to the projects, and data analysis tools that can work via a direct interface to the project database and data repository. Here we propose to briefly summarize this experience, with emphasis on the successes and failures and on the lessons learned, which could be applied to future large projects, like the surveys to be carried out with the MOONS or MOSAIC spectrographs, or the Euclid ESA mission.
    
    Speaker: Marco Scodeggio (Istituto Nazionale di Astrofisica (INAF))
    
    06-Scodeggio_lessons_from_surveys.pdf
  - 11:40
    
    LOFAR-IT: the user experience 20m
    
    LOFAR is a powerful and revolutionary low-frequency interferometer that is providing breakthrough results in different astrophysics research topics. Because of its unprecedentedly large field of view and frequency coverage/resolution, a typical LOFAR dataset remains so large after data correlation and compression that its calibration requires dedicated computers. For this reason, an Italian e-infrastructure is under development by the Italian LOFAR Data Working Group. I will report on the user experience of the Italian LOFAR computing nodes. The pipelines installed on these computers use state-of-the-art calibration and imaging software, allowing for top-level science. By using these facilities, Italian astronomers have already been able to publish a number of scientific results based on LOFAR observations.
    
    Speaker: Andrea Botteon (IRA-INAF)
    
    rome_inaf_data-expanded.pdf
    
    rome_inaf_data-expanded_movie.avi
  - 12:00
    
    The AENEAS survey of radio archives: use cases and user interface recommendations 15m
    
    Speaker: Vincenzo Galluzzi (Istituto Nazionale di Astrofisica (INAF))
    
    The AENEAS survey of radio archives .pdf
  - 12:15
    
    Discussion: Archival new frontiers 35m
- 12:50 → 14:00
  
  Lunch
- 14:00 → 15:25
  Session 3a: From data to science and back: data processing and pipeline management - Chair: G. Polenta
  - 14:00
    
    CTA: from the Bulk Data Archive to the Science Gateway - S. Schlenstedt (Invited) 25m
    
    09-Schlenstedt-INAF-archive.pdf
  - 14:25
    
    Euclid OU-NIR Processing Function 15m
    
    Euclid is an ESA M-class mission devoted to studying the dark Universe. The mission will investigate the distance-redshift relationship and the evolution of cosmic structures by measuring shapes and redshifts of galaxies and clusters of galaxies out to redshifts ~2, or equivalently to a look-back time of 10 billion years. The Euclid Science Ground Segment (SGS) is made up of transnational Organization Units (OUs), each corresponding to a subset of the overall Euclid Data Processing. The development of the Euclid SGS is scheduled in a series of Scientific Challenges (SCs), in which the Data Processing pipelines are tested on realistic simulations with an incremental involvement of the various OUs. In this talk, I will focus on OU-NIR, the OU in charge of the reduction of the Near Infrared imaging data collected by the NISP instrument, and of the pre-processing of the spectrometer observations in collaboration with OU-SIR.
    The NIR Processing Function is composed of a main scientific reduction pipeline and a number of calibration pipelines that characterize the instrumental effects (e.g. bad pixels, dark current, flat fielding, persistence, etc.). The Science Pipeline is made up of different Processing Elements dealing with the individual tasks, from acquiring raw data up to the production of fully characterized, astrometrically and photometrically calibrated images. After a description of the NIR Processing Function, I will present the current status of the development that is being tested in SC#4, 5 and 6 of the Euclid SGS.
    
    Speaker: Fabiana Faustini (Istituto Nazionale di Astrofisica (INAF))
    
    OU-NIR_pipeline.pdf
  - 14:40
    
    The Additional Representative Images for Legacy: a development project for the ALMA Science Archive 15m
    
    The Additional Representative Images for Legacy (ARI-L, PI: Massardi) project aims to increase the legacy value of the ALMA Science Archive (ASA) by bringing the reduction level of ALMA data from Cycles 2-4 close to the level of the more recent Cycles processed with the ALMA Imaging Pipeline.
    The project has recently been approved by ESO and JAO and will soon start producing and ingesting into the ASA a uniform set of full data cubes and continuum images covering at least 70% of the data from Cycles 2-4. These cubes will complement the much smaller QA2-generated image products, which cover only a small fraction (< 10%) of the observed data for those cycles. I will present the project rationale, the feasibility study and its operational plan, which also involves INAF resources and facilities.
    
    Speaker: Marcella Massardi (Istituto Nazionale di Astrofisica (INAF))
    
    ARI_L_IA2archiveWS.pdf
  - 14:55
    
    Radio Survey Data Analysis in the Visibility Domain 15m
    
    I will talk about a use case where new scalable Bayesian methods may be used for detecting and characterizing galaxies directly from the visibilities of large-scale radio continuum surveys (Rivi & Miller 2018, Rivi et al. 2019, Malyali et al. 2019). The analysis of radio surveys has traditionally relied on a set of image reconstruction techniques. However, the imaging process may introduce artifacts and correlated noise distributions, with subsequent estimates of scientific parameters suffering from systematic errors that are difficult to estimate accurately. Until recently this has not been a major issue, but the increased sensitivity and size of the forthcoming generation of radio interferometers, such as SKA, will allow new scientific measurements, such as weak lensing, that require more reliable and complete source catalogues, meaning higher accuracy in galaxy detection and characterization. An alternative approach to image reconstruction is to work directly in the visibility domain, where the data originate and are not yet affected by the systematics introduced by the imaging process. Modelling of Direction Dependent Effects, which arise in observations of large fields of view or from radio telescopes with non-coplanar baselines, may also be easily introduced in model fitting techniques for galaxy parameter estimation and data calibration. This novel approach is very promising but also computationally very challenging because of the large size of the datasets that must be processed and the source number density expected to be reached. For example, a nominal weak lensing survey using the first 30% of Band 2 will require 30 kHz frequency channel bandwidth and 0.5 seconds sampling time to make smearing effects tolerable, meaning about 20 PB of raw visibilities per pointing for 1 hour integration time with the current design of SKA-MID. Given that the expected number of sources for such surveys is of the order of 10^4 per field of view, it is clear that this analysis will require tools exploiting High Performance Computing (HPC) infrastructures and will have to be performed where the data are stored.
    
    Speaker: Marzia Rivi (Istituto Nazionale di Astrofisica (INAF))
    
    MRivi.pdf
  - 15:10
    
    Italian Astrophysical Research with LSST 15m
    
    The Istituto Nazionale di Astrofisica (INAF) is participating in LSST (Large Synoptic Survey Telescope). The key idea is to collect, handle and share with the Italian community the time series data and images provided by LSST. The Italian astronomical community is involved in different projects, on topics ranging from stellar to extragalactic to cosmological. This is a unique opportunity to fully exploit, but also to further improve, the current experience in the field of efficient e-infrastructures, data archiving, modeling, processing, mining and fusion, across different wavelengths and temporal and spatial scales. The very recent new LSST policy provides a partnership model focussed on in-kind contributions (hardware and/or software) instead of monetary contributions. On this basis, and to give the national astrophysical community an opportunity to compete worldwide, it will be important to develop the infrastructure (hardware, software) required to deal with the massive amount of data produced by LSST in the near future.
    
    Speaker: Dr Ilaria Musella (INAF)
    
    LSST_BIGDATA2.pptx
- 15:25 → 15:45
  
  Coffee Break 20m
- 15:45 → 18:00
  Session 3b: From data to science and back: data processing and pipeline management - Chair: E. Molinari
  - 15:45
    
    Data exploiting, mining, preserving and visualization in Cosmological Numerical Simulations 20m
    
    With the ever-growing power of supercomputers, numerical simulations are becoming more detailed, more sophisticated and... bigger, year after year. One problem of the field is the storage and utilization of the data produced. I will present example production datasets in the field of the formation and evolution of galaxies and galaxy clusters, and describe the needs of our community as far as analysis, visualization and long-term storage are concerned. I will discuss the possible access to these data, sketching an idea of the various types of users.
    
    Speaker: Giuseppe Murante (INAF - OATs)
    
    RomaArchivi2019-Murante.pptx
  - 16:05
    
    Integrated data analysis for precision spectroscopy 15m
    
    To achieve its purpose, the analysis of astronomical data must be (1) reproducible, (2) versatile in handling both multi-messenger observations and simulations, and (3) easily automated and scalable. In the field of optical spectroscopy, the demand for high-level procedures to process the data has been met by a growing number of dedicated software packages; what is still lacking is a shared environment to seamlessly combine these resources and fully exploit their potential. An effort in this direction is provided by Astrocook, a Python package developed at INAF-OATs to analyze quasar spectra and equipped with its own graphical user interface to launch procedures and create workflows. In this talk I will describe the current status of Astrocook and raise the issue of its possible connection to an archive 2.0 framework.
    
    Speaker: Guido Cupani (Istituto Nazionale di Astrofisica (INAF))
    
    2019-06-18 INAF Science Archives & the Big Data Challenge.pdf
  - 16:20
    
    GAPS Time Series resource and service implementation 15m
    
    The publicly available Time Series from GAPS exoplanet observations have been proposed as a use case to provide requirements to the IVOA in the effort to standardize the modelling, discovery and access of time series datasets.
    The interoperability effort has continued and, following the current modelling draft and scenario, a prototype TAP service has been built at the Italian center for Astronomical Archives to deploy the GAPS Time Series datasets. The service embeds the time series modelling solution for time axis representation and, in addition, provides a view for the recently started activity of homogenizing the representation and discovery of exoplanetary datasets. This contribution reports the current status of the resource and service and the foreseen developments, which will focus on two main points: VO-compliant serialization of the output, and modelling of the exoplanetary metadata and attributes of the datasets themselves.
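    
    For readers unfamiliar with TAP, the sketch below shows how a client might assemble a synchronous query URL following the IVOA TAP convention (a `sync` endpoint that takes `REQUEST`, `LANG` and `QUERY` parameters, with the query written in ADQL). The base URL and the table name are hypothetical placeholders, not the address or schema of the prototype service described in this abstract, and the code only builds the URL without contacting any server.
    
    ```python
    from urllib.parse import parse_qs, urlencode, urlsplit
    
    # Hypothetical placeholders: NOT the real service address or table name.
    TAP_BASE_URL = "https://example.org/tap"
    TABLE = "gaps.timeseries"
    
    
    def build_sync_query_url(base_url, adql):
        """Build the URL of a synchronous TAP query (TAP 1.0-style parameters)."""
        params = {"REQUEST": "doQuery", "LANG": "ADQL", "QUERY": adql}
        return f"{base_url.rstrip('/')}/sync?{urlencode(params)}"
    
    
    # ADQL is SQL-like; TOP limits the number of returned rows.
    adql = f"SELECT TOP 10 * FROM {TABLE}"
    url = build_sync_query_url(TAP_BASE_URL, adql)
    
    # Decode the query string again to check that the ADQL survives URL encoding.
    decoded = parse_qs(urlsplit(url).query)
    print(url)
    print(decoded["QUERY"][0])
    ```
    
    A real client would send this request over HTTP and parse the returned table, typically a VOTable; that step is left out here because it depends on the actual service.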
    
    Speaker: Andrea Bignamini (Istituto Nazionale di Astrofisica (INAF))
    
    Bignamini_BigDataMeeting_Roma_GAPSTimeSeries.pdf
  - 16:35
    
    The REM images archive: 15 years of a RDBMS and Web oriented system 15m
    
    REM produces an average of 1500 images per night through its Optical (ROS2) and IR (REMIR) cameras. The observation log, images and other data are managed via DB-oriented (MySQL) software. All the collected images are transferred in real time from La Silla to OAS-Bologna by a three-node client/server communication system. The REM DB system stores not only the information available from the FITS header, but also computed parameters, like the image footprint on the sky. It also automatically manages proprietary/public data access and keeps track of data usage. A PHP-JavaScript web interface allows users not only to retrieve images based on search criteria, but also to inspect them interactively and perform a number of actions like object detection and overplotting of reference catalogs. Custom and public tools are used for this purpose, e.g. AladinLite and JS9. Further capabilities are being implemented. I will describe the various components of the REM archive system and show how its architecture is well suited to managing multi-telescope/camera images.
    
    Speaker: Luciano Nicastro (Istituto Nazionale di Astrofisica (INAF))
    
    17-nicastro_archives_ws19.pptx
  - 16:50
    
    Discussion: Workflow systems 1h 10m
Wednesday 19 June
- 09:00 → 10:40
  Session 4a: After science: Overview - Chair: A. Grado
  - 09:00
    
    INAF Scientific Director vision 15m
    
    Speaker: Filippo Maria Zerbi (Istituto Nazionale di Astrofisica (INAF))
  - 09:15
    
    Challenges in data management and distribution within the terrestrial network of gravitational wave detectors - P. Astone (Invited) 25m
    
    The recent discoveries of Gravitational Waves (GWs) from merging black holes and from inspiraling neutron stars, the latter an event seen as GWs, a short gamma-ray burst and a subsequent kilonova by space and ground-based observatories, have marked the beginning of multimessenger astronomy.
    LIGO and Virgo collaborations have organized prompt low-latency analyses for transient (limited time duration) signals, offline analyses for longer duration signals or for refined analysis on transient signals, and efficient methods to distribute the relevant information to interested parties. GW raw data from these discoveries, together with instructions, examples and analysis codes, is publicly available. In this talk I will describe the data management infrastructure, from production to distribution, for the different classes of GW signals we are presently looking for.
    
    Speaker: Dr Pia Astone (INFN, sezione di Roma)
    
    01-PiaAstone_INAF2019.pptx
  - 09:40
    
    Machine learning implementation in the multi-messenger search of gravitational wave sources 15m
    
    Currently, the main difficulty in the optical multi-messenger search for gravitational wave (GW) counterparts is the large localization area of GW events, which results in a huge number of candidates. Meanwhile, the crucial factor in the search for the optical counterpart (an r-process kilonova) is cadence, since the brightness of a kilonova fades very quickly. Tools that assist in the automatic evaluation of transient candidates are therefore needed for such real-time astronomical searches. In my talk, I will describe a machine learning tool for rapid and efficient transient candidate selection that exploits difference images.
    
    Speaker: Sheng Yang (Istituto Nazionale di Astrofisica (INAF))
    
    02-machine learning.pdf
  - 09:55
    
    The European Open Science Cloud: an opportunity for the Italian astro community 15m
    
    Given the level of funding being invested by the EU under the Horizon 2020 programme, the European Open Science Cloud (EOSC) is becoming an important asset for research in our continent. Among EU-funded projects, the EOSCpilot project, which defined EOSC policies and implemented pilot infrastructure and applications, has just been completed; EOSChub is underway, implementing the initial EOSC operational infrastructure; ESCAPE has recently been kicked off with the goal of tailoring EOSC to the needs of the astro and particle physics communities. This situation offers the astro community a real opportunity to exploit EOSC for building and optimising an integrated environment capable of managing, interoperating, processing and analysing data, involving all the main archives and the VO in this process.
    
    Speaker: Fabio Pasian (INAF - OATs)
    
    03-Pasian - EOSC - 190619.pptx
  - 10:10
    
    INAF Long Term Preservation and Curation 10m
    
    The LTP & C of astronomical data is not only the preservation of raw data archives; it also aims to preserve and enhance the knowledge applied in reducing them by creating a bridge between scientific articles and reduced data. Part of these data, especially tables, are already preserved through services like CDS, but reduced data do not yet have a place and a means of identification. For their identification in particular, the possibility of linking them through a DOI (Digital Object Identifier) is under development. The state of the art of this activity will be illustrated.
    
    Speaker: Riccardo Smareglia (Istituto Nazionale di Astrofisica (INAF))
    
    04-Smareglia - Long term preservation.pptx
  - 10:20
    
    Discussion: Future solutions 20m
- 10:40 → 11:00
  
  Coffee Break 20m
- 11:00 → 13:00
  Session 4b: After Science: Interoperability - Chair: R. Smareglia
  - 11:00
    
    Astrophysical Resource Interoperability - M. Molinaro (invited) 25m
    
    Interoperability has been a core part of astrophysical research for decades: from agreed data exchange formats, even before the existence of network exchange, to higher levels of interoperation anticipating the current Open and FAIR paradigms. Efforts and solutions proposed by the IVOA community, and taken up by world-class data centers, closely follow this attitude and provide a flexible platform to match scientific user requirements. In the framework of the ESCAPE project, integration of the same infrastructure has started at the level of the EOSC, to provide input to and gain from a larger community. Near-future goals combining astrophysical data resource interoperability with added-value services and computational resources should start from and take advantage of what the Virtual Observatory already provides.
    
    Speaker: Marco Molinaro (Istituto Nazionale di Astrofisica (INAF))
    
    Molinaro_AstroResInterop_ArchivesWS2019.pdf
  - 11:25
    
    Towards developing new Italian IVOA simulation data services: I. Exoplanet case - ARTECS data service-archive of terrestrial-type climate simulations. 15m
    
    Speaker: Stavro Lambrov Ivanovski (Istituto Nazionale di Astrofisica (INAF))
    
    06-IA2_Roma_IVANOVSKI.pptx
  - 11:40
    
    Exo-MerCat: a merged exoplanet catalog with Virtual Observatory connection 15m
    
    The heterogeneity of observational papers makes every attempt to write a uniform catalog almost impossible. Our aim is to build a new catalog selecting the best targets whose datasets were included in one or more of the four major online exoplanet databases: NASA Exoplanet Archive [1,A], Exoplanet Orbit Database [2,B], Exoplanet Encyclopaedia [3,C] and Open Exoplanet Catalogue [4,D].
    We wrote a Python code that collects and selects the most precise measurement for all interesting planetary and orbital parameters, taking into account the presence of multiple aliases for the same target. For each parameter, the code stores the corresponding reference paper link. As a result, once the merging process is complete, the final dataset for a given target is not necessarily composed of mutually consistent measurements. This is, however, not essential for our statistical purposes.
    The code is able to download the source files from the three catalogs using VO ConeSearch connections to the major stellar catalogs such as SIMBAD [5] and those available in VizieR [6]. It also retrieves the compulsory user preferences through a Graphical User Interface, which allows any kind of parameter range selection. It can automatically generate plots that are commonly used in the exoplanetary community, but the user can also retrieve and manipulate data at will. Exo-MerCat is ingested into a proper database with a TAP service and can be queried by all VO-aware, TAP-enabled applications.
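    The merging step described in this abstract can be sketched roughly as follows. This is an illustrative sketch only, not the Exo-MerCat code: the `Measurement` record, the alias table, the catalog labels and the choice of "smallest relative error" as the definition of "most precise" are all assumptions made for the example, and the values are made up.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Measurement:
    """One parameter value as reported by one source catalog (hypothetical record)."""
    catalog: str            # label of the source catalog
    target: str             # planet name as spelled by that catalog
    parameter: str          # e.g. "mass" or "period"
    value: float
    error: Optional[float]  # 1-sigma uncertainty, None if not reported
    reference: str          # link to the reference paper


def canonical_name(name: str, aliases: Dict[str, str]) -> str:
    """Map a catalog-specific spelling onto one canonical target name."""
    key = name.strip().lower()
    return aliases.get(key, key)


def relative_error(m: Measurement) -> float:
    """Assumed precision metric: |error / value|; missing errors rank last."""
    if m.error is None or m.value == 0:
        return float("inf")
    return abs(m.error / m.value)


def merge_measurements(
    measurements: Iterable[Measurement], aliases: Dict[str, str]
) -> Dict[Tuple[str, str], Measurement]:
    """Keep, per (target, parameter), the measurement with the smallest relative error."""
    best: Dict[Tuple[str, str], Measurement] = {}
    for m in measurements:
        key = (canonical_name(m.target, aliases), m.parameter)
        current = best.get(key)
        if current is None or relative_error(m) < relative_error(current):
            best[key] = m
    return best


# Made-up example: two spellings of the same hypothetical planet.
aliases = {"toy-1 b": "toy 1 b"}
rows = [
    Measurement("cat_A", "Toy 1 b", "mass", 1.0, 0.2, "ref_A"),
    Measurement("cat_B", "TOY-1 b", "mass", 1.1, 0.05, "ref_B"),
    Measurement("cat_A", "Toy 1 b", "period", 3.5, None, "ref_A"),
]
merged = merge_measurements(rows, aliases)
```

    Because each parameter is selected independently and keeps its own reference link, the merged record for one target can mix values from different papers, which is the inconsistency the abstract mentions.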
    
    Speaker: Eleonora Alei (Istituto Nazionale di Astrofisica (INAF))
    
    07-ExoMerCat.pdf
  - 11:55
    
    Advanced archival data analysis tools 15m
    
    As we are approaching the era of big data in astronomy, tools to address the challenges of database cross-correlation and automated advanced image product generation are needed. Here we present two such tools: KAFE and TOAST. KAFE, the Key-analysis Automated FITS-images Explorer, is a web-based FITS image post-processing analysis interface designed to be applicable in the radio to sub-mm wavelength domain. KAFE was developed to complement selected FITS files with metadata based on a uniform image analysis approach, as well as to provide advanced image analysis diagnostics in a fully automated way. The Telescope Observational Astronomical Sample Tool (TOAST) is a data visualisation platform whose purpose is the taxonomy of multi-wavelength telescope archive contents, together with the exploration of the physical parameter space probed by existing observations through catalogue cross-matching. TOAST addresses both galactic and extra-galactic databases and can be used in two ways: for astronomical archive/database content visualisation and for user-specific data sample generation.
    
    Speaker: Sandra Burkutean (Istituto Nazionale di Astrofisica (INAF))
    
    08-rome_2019_burkutean.pdf
  - 12:10
    
    Discussion: DOI 50m
- 13:00 → 14:00
  
  Lunch 1h
- 14:00 → 15:55
  Session 5a: Challenges in science data management: science gateways - Chair: C. Knapic
  - 14:00
    
    Riding space-borne data with the TLS prototype 15m
    
    The authors present an update on the status of the implementation of the TLS project, part of the MITIC "premiale" initiative, at the Gaia DPCT. We will address the experiments, both scientific and technological, selected as best representing the motivations behind the TLS idea. These include some of the fundamental elements for proving the paradigm of long-term preservation of data and calibrating pipelines.
    
    Speaker: Mario Gilberto Lattanzi (Istituto Nazionale di Astrofisica (INAF))
    
    LattanziICT2019RomaINAF-Hq.pdf
  - 14:15
    
    SKA RCs 15m
    
    The SKA Regional Centres infrastructure will be presented.
    
    Speaker: Riccardo Smareglia (Istituto Nazionale di Astrofisica (INAF))
    
    10-Smareglia - SKA - RC.pptx
  - 14:30
    
    Vialactea Visual Analytics Tool for Star Formation Studies of the Galactic Plane 15m
    
    The Vialactea Visual Analytics Tool, based on the VisIVO suite, is an innovative environment for the study of star-forming regions in our Galaxy. It allows an integrated analysis and exploitation of the combination of all new-generation surveys (from infrared to radio) of the Galactic Plane from space missions and ground-based facilities, using a novel data and science analysis paradigm based on 3D visual analytics and data mining frameworks.
    The implementation philosophy behind the tool is to make access to all information transparent to the scientist, without requiring technical skills to access the data stored in the ViaLactea Knowledge Base (VLKB). The VLKB contains files in FITS format (from 2D images in the radio continuum to 3D FITS cubes containing radio velocity spectra at specific molecular lines, as well as a collection of 3D extinction maps) and a relational database that completes the VLKB resource content with knowledge derived from the data and contains information related to 2D maps, filaments and bubbles, compact sources, and radio cubes.
    
    Speaker: Fabio Roberto Vitello
    
    11-INAF_archivi - Vialactea Visual Analytic Tool for star formation studies.pptx
  - 14:45
    
    Data exchange and computation between data centers using common A&A 15m
    
    The new Authentication and Authorization system in INAF, called RAP, can handle multiple accounts for web application usage. It was developed jointly by the Radio Astronomical Institute (IRA) and the Italian Astronomical Archives (IA2) in the scope of the SKA Pre-construction phase and is in use at IA2. Both working groups shared skills and experience in the field of Authentication and Authorization to allow users and client applications to access remote resources, data and services. A more advanced prototype pilot is under development; it is composed of RAP, an Authorization module (Grouper) and a service that allows account linking and authorization sharing using Virtual Observatory recommendations, so that users can retrieve and analyse data from their own experiments. The aim was to implement a multi-protocol authentication mechanism (SAML 2.0, OAuth2, X.509 and self-registration), to permit account linking (the joining of digital identities) and to manage groups of users. This talk will describe the current harmonization activities between existing systems and the recommendations of the IVOA. This activity will also be applied in the AENEAS-ESRC scope to validate requirements and provide an effective test bed.
    
    Speaker: Franco Tinarelli (Istituto Nazionale di Astrofisica (INAF))
    
    12-RC Data Exchange.pdf
  - 15:00
    
    Discussion: Science gateways 55m
- 15:55 → 16:15
  
  Coffee Break 20m
- 16:15 → 17:25
  Session 5b: Challenge Data Management (computing) - Chair: R. Morbidelli
  - 16:15
    
    LOFAR-IT: the computing infrastructure 20m
    
    In March 2018, INAF approved its participation in the International LOFAR Telescope (ILT) and the organization of a LOFAR-IT Consortium led by INAF with the participation of the University of Turin (Department of Physics: DP-UniTO). The consortium aims to coordinate Italian participation in the ILT.
    A Board of the LOFAR-IT project has been established to manage INAF's overall participation in the ILT. INAF will contribute to the development of LOFAR2.0 through an agreement with AstroTec to provide a LOFAR2.0 station in 2021-22.
    INAF and the LOFAR-IT Consortium have also decided to create an e-infrastructure for the reduction and analysis of LOFAR data, mainly for Italian astronomers.
    In this talk we will present the main characteristics of the LOFAR-IT e-infrastructure, its harmonization with other INAF national infrastructures, including the archive infrastructure, and the future development of the facilities. Finally, we will discuss the activities and studies we are carrying out on the LOFAR pipelines to assess the computational and storage needs of the Italian community, which are driving the implementation of the e-infrastructure.
    
    Speaker: Ugo Becciani
    
    14-Becciani_LOFAR_IT.pdf
  - 16:35
    
    High Performance Computing and Artificial Intelligence for Solar System activities at SSDC 20m
    
    The Solar System exploration activities at SSDC are constantly growing and the advent of the brand new version of the MATISSE tool is going to represent the real turning point for the exploitation of powerful techniques, such as High Performance Computing (HPC) and Artificial Intelligence (AI).
    Both of these two fields are currently under development using Juno-JIRAM data as test cases, with the parallelization of the atmospheric retrieval algorithm by Grassi et al. (2010, 2017) and the usage of a Computer Vision (CV – a subset of Artificial General Intelligence, AGI) system to automatically identify White Ovals in the Jupiter atmosphere.
    In the first case, the SSDC team is developing a set of optimization techniques to parallelize existing non-parallel retrieval algorithms. A generalized library for HPC is the final objective of this activity, to be considered as an API (Application Programming Interface) for similar science cases with SSDC involvement. The generalization is aimed both at the platform abstraction level (distributed memory model rather than shared memory model) and at the programming language level (by means of language binding into the Python environment and wrapping of native code).
    The reference platform for development is currently the Microsoft Azure cloud, used as a suitable workbench to define, test and profile different scenarios, while the reference HPC target platform is a cluster system based on Dell technology.
    In the second case, by means of a feature engineering phase constrained by a priori physical knowledge of the Jupiter atmosphere, nine wavelength channels were selected from the Juno-JIRAM multi-spectral imagery acquired during Juno’s first perijove as those most sensitive to changes in physical variables expected to characterize White Ovals, including cloud densities, cloud altitudes, ammonia concentration, atmospheric temperature, etc. For visualization purposes, the selected nine channels were depicted as three false-color monitor-typical RGB images showing sensory data correlated with, respectively, cloud densities, cloud altitudes and temperature-driven phenomena.
    To mimic the capacity of human vision in detecting White Ovals across each of the three selected RGB images, a computational model of human vision was designed and implemented as an automated CV algorithm, requiring no human-machine interaction to run and subsequently instantiated as follows. (A) (Numeric) RGB data were submitted to color constancy and mapped by RGBIAM (RGB image automatic mapper) onto (categorical) color names. (B) Image-object shape indexes were estimated and discretized into fuzzy sets (low, medium and high). (C) Discrete and finite sets of target-specific color names and geometric fuzzy sets were logically combined by a target-specific CV decision-tree classifier.
    This system, although applied preliminarily to a single data set, scored “high” in terms of outcome and process quantitative quality indexes, such as degree of automation, accuracy, efficiency and scalability.
    As a future development, the robustness (variance) of the CV system to changes in input data will be assessed on multiple Juno-JIRAM data acquisitions, structured according to the proposed RGB image triplets. The system could also be applied to other types of data typically handled by SSDC, such as astronomical images or even Earth-observation ones.
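    Steps (A) to (C) of the CV pipeline described in this abstract can be illustrated with a toy sketch. It is not the SSDC system and it does not reproduce RGBIAM: the nearest-prototype color naming, the five-color palette, the triangular fuzzy memberships, the shape indexes (`roundness`, `elongation`) and the final rule are all assumptions chosen for illustration, and the color constancy step is omitted.

```python
from typing import Dict, Tuple

RGB = Tuple[int, int, int]

# Hypothetical palette of color prototypes (a stand-in for a real color-naming tool).
PALETTE: Dict[str, RGB] = {
    "white": (255, 255, 255),
    "red": (255, 0, 0),
    "green": (0, 255, 0),
    "blue": (0, 0, 255),
    "black": (0, 0, 0),
}


def color_name(rgb: RGB) -> str:
    """(A) Map numeric RGB onto a categorical color name (nearest prototype)."""
    return min(
        PALETTE,
        key=lambda name: sum((a - b) ** 2 for a, b in zip(rgb, PALETTE[name])),
    )


def fuzzy_memberships(x: float) -> Dict[str, float]:
    """(B) Triangular memberships of a shape index in [0, 1]; they sum to 1."""
    x = min(max(x, 0.0), 1.0)
    low = max(0.0, 1.0 - 2.0 * x)
    high = max(0.0, 2.0 * x - 1.0)
    return {"low": low, "medium": 1.0 - low - high, "high": high}


def fuzzy_label(x: float) -> str:
    """Discretize a shape index into the fuzzy set with the largest membership."""
    memberships = fuzzy_memberships(x)
    return max(memberships, key=memberships.get)


def is_candidate_oval(mean_rgb: RGB, roundness: float, elongation: float) -> bool:
    """(C) Toy decision rule: combine color name and geometric fuzzy sets logically."""
    return (
        color_name(mean_rgb) == "white"
        and fuzzy_label(roundness) == "high"
        and fuzzy_label(elongation) == "low"
    )


# Made-up image objects: (mean RGB, roundness, elongation).
bright_round = is_candidate_oval((240, 235, 250), 0.9, 0.1)
bright_irregular = is_candidate_oval((240, 235, 250), 0.5, 0.1)
reddish_round = is_candidate_oval((200, 30, 20), 0.9, 0.1)
```

    In this sketch only the first object passes the rule; a real classifier would use target-specific color names and shape indexes for each of the three RGB images.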
    
    Speaker: Angelo Zinzi (SSDC - ASI)
    
    13-Zinzi-HPC+AI_SSDC_v2.pptx
  - 16:55
    
    Discussion: Archives and computation 30m
- 17:25 → 18:00
  
  Closing remarks
  
  wrap-up.pdf
