Sant Antoni Mª Claret, 171
08041 Barcelona, Spain
This workshop is organized in the framework of OBELICS (Observatory E-environments LINked by common ChallengeS) work package of ASTERICS. OBELICS activities aim at encouraging common developments and adoption of common solutions for data processing, archive, analysis and access among ESFRI and world class projects in Astronomy and Astroparticle Physics, such as CTA, SKA, KM3NeT, EUCLID, LSST, EGO-Virgo, E-ELT.
The ASTERICS – OBELICS workshops aim at building bridges between ESFRI projects, concerned scientific communities, e-infrastructures, industries and further consortia.
The 2nd ASTERICS – OBELICS Workshop will address potential connections between the ESFRI projects and the implementation of EOSC for data interoperability.
“Astronomy & Astroparticle Physics assets in building the European Open Science Cloud”
OBELICS- Data Generation and information extraction overview
Prof. Jose Luis Contreras
(Universidad Complutense de Madrid)
SW and Simulation in ASTRI
Fast convolutional resampling on parallel architectures
In radio astronomical interferometry and other applications, the
measurements and the image are related by an (approximate) Fourier transform.
In these cases, it is often necessary to resample the measurements onto a
regular grid to be able to use the Fast Fourier Transform (FFT).
Resampling includes a convolution to suppress aliasing. The convolution
function can also include a correction for deviations of the measurement
equation from a Fourier transform, for example instrumental or atmospheric effects.
Especially for high update rates of the correction, this can become
For LOFAR (and future radio observatories) the data volumes are too large to
be sent to the end user for further processing.
The data needs to be processed at LOFAR central processing.
The processing pipeline needs to run near real time, otherwise an ever
growing backlog will arise.
This requirement could not be met when quickly varying corrections for
atmospheric effects were included using the conventional approach.
Image Domain Gridding (IDG) is a convolutional resampling algorithm designed
from the start to maximize parallelism.
The result is an algorithm that is not the most computationally efficient in
pure operation count, but maps very well onto massively parallel
architectures. It outperforms other approaches that do fewer compute
operations, but are not optimized for parallelism.
Within the DOME project this algorithm has been implemented, optimized and
benchmarked for various parallel architectures.
Within the OBELICS project we have analyzed the accuracy of the algorithm,
embedded it into an imager for the LOFAR pipeline, and benchmarked the
overall performance, demonstrating that the LOFAR requirements can be met
using the GPUs that are part of the LOFAR cluster.
Dr Sebastiaan van der Tol
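The resampling step described in the abstract above can be illustrated with a minimal Python sketch of conventional convolutional gridding followed by an FFT. This is not the IDG algorithm and not the LOFAR imager: the function names, the Gaussian kernel and its parameters are assumptions chosen for illustration, and the kernel is centred on the nearest grid cell rather than on the exact sample position.

```python
import numpy as np


def gaussian_kernel(support, width):
    """Separable anti-aliasing kernel (a Gaussian, chosen only for illustration)."""
    offsets = np.arange(-support, support + 1)
    taps = np.exp(-0.5 * (offsets / width) ** 2)
    return offsets, taps / taps.sum()  # normalised so each sample keeps its weight


def grid_visibilities(u, v, vis, grid_size, support=3, width=1.0):
    """Spread irregular samples (u, v given in grid-cell units) onto a regular grid."""
    grid = np.zeros((grid_size, grid_size), dtype=complex)
    offsets, taps = gaussian_kernel(support, width)
    centre = grid_size // 2
    for uk, vk, sample in zip(u, v, vis):
        # Nearest grid cell to the sample (simplification: no sub-cell offset).
        iu = int(np.round(uk)) + centre
        iv = int(np.round(vk)) + centre
        for dv, wv in zip(offsets, taps):
            for du, wu in zip(offsets, taps):
                # Wrap around at the grid edges to keep the sketch short.
                grid[(iv + dv) % grid_size, (iu + du) % grid_size] += sample * wu * wv
    return grid


def dirty_image(grid):
    """Inverse FFT of the gridded data, with the origin moved to the image centre."""
    return np.fft.fftshift(np.fft.ifft2(np.fft.ifftshift(grid))).real


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    u = rng.uniform(-20, 20, size=200)
    v = rng.uniform(-20, 20, size=200)
    vis = np.ones(200, dtype=complex)  # a point source at the phase centre
    image = dirty_image(grid_visibilities(u, v, vis, grid_size=64))
    print(image.shape)
```

The nested loops over samples and kernel taps are where almost all of the work is done, which is why the cost grows quickly when the kernel has to be recomputed often, for instance to follow time-varying corrections.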
DL3: An open high-level data format for gamma-ray astronomy (D-GEX)
VHE gamma-ray astronomy is evolving with CTA away from the old model of collaboration-led experiments towards that of a public observatory, where guest observers will submit observation proposals and have access to the corresponding data, software for scientific analysis and support services. We believe the open high-level data format (DL3) currently being developed for CTA (see [open-dl3]) could be extended to be used by all Imaging Atmospheric Cherenkov Telescopes (IACTs) and possibly other high-energy observatories (e.g. water Cherenkov detectors or even neutrino telescopes). Following similar initiatives within other IACTs, we developed a test pipeline to convert MAGIC data products into the DL3 format as a testbed. These tools are currently being used to test and extend the proposed open DL3 format and future CTA science tools, such as ctools or Gammapy.
Tests of parallel filesystems on the Intel Xeon-D
Parallel filesystems are a key component of every HTC and HPC cluster. Using such filesystems on low-power SoCs, such as ARM or x86 ones, makes it possible to deploy cost-effective storage volumes compared with the usual storage appliances. A promising low-power CPU that provides server-grade performance with a low TDP is the Intel Xeon-D, with which I set up a small cluster using the BeeGFS filesystem developed by the Fraunhofer institute. A report on the benchmarks performed will be presented.
D-INT intro and overview
STOA - Large scale interferometry pipeline (D-INT)
Extracting meaningful data products from large, heterogeneous sets of multiple radio observations can be labour intensive and difficult. We present STOA (Script Tracking for Observational Astronomy), a web application that provides a fast run-test-rerun work cycle for these situations. We demonstrate a use case on the ALMA archive and show how STOA can integrate with existing software and work patterns.
(University of Cambridge)
LSST: Qserv integration into science pipelines (D-INT)
Python is the most widely used high-level programming language in astrophysics and astroparticle physics, but its performance remains a challenge for researchers. In this talk, I will present performance benchmark results that we have obtained by comparing NumPy with a mathematical Python library wrapped around our own optimized C/C++ code. Along with the benchmark results, I will also present some best practices for Python developers.
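As an illustration of the kind of comparison the abstract describes, here is a minimal timing sketch in Python. It compares an interpreted loop with a NumPy call, not NumPy with the speakers' wrapped C/C++ library, which is not available here; the function names, the workload (a sum of squares) and the array size are assumptions made for the example.

```python
import timeit

import numpy as np


def sum_of_squares_loop(values):
    """Reference implementation: an explicit Python loop."""
    total = 0.0
    for x in values:
        total += x * x
    return total


def sum_of_squares_numpy(values):
    """Same computation delegated to NumPy's compiled code."""
    return float(np.dot(values, values))


def benchmark(func, data, repeats=3, number=5):
    """Best observed wall-clock time, in seconds, for a single call of func(data)."""
    timer = timeit.Timer(lambda: func(data))
    return min(timer.repeat(repeat=repeats, number=number)) / number


if __name__ == "__main__":
    data = np.random.default_rng(0).random(20_000)
    for func in (sum_of_squares_loop, sum_of_squares_numpy):
        print(f"{func.__name__}: {benchmark(func, data):.6f} s per call")
```

Taking the minimum over several repeats reduces the influence of other activity on the machine; the absolute numbers depend on the hardware and on the NumPy build.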
D-ANA intro and overview
A&A activities under OBELICS with links to AARC2 and other EU projects (D-ANA)
The first part of the presentation (<10') will be given by Fabio and will summarise OBELICS A&A activities as delivered in D3.10, with links towards the EGI-Engage EU project. This will be followed by a presentation (10') given by Sonia Zorba of INAF-OATs on SKA and links to AENEAS. The final part (10' given by Alessandro) will discuss CTA and its links towards AARC2 as an opportunity to get the most benefit from AARC2-OBELICS cooperation.
CORELib: A COsmic Ray Event LIBrary for Open Access (D-ANA)
Cosmic rays are a common background source for experiments in astroparticle physics and neutrino astronomy. The computing power needed to simulate air showers depends heavily on the energy window of interest, the simulated processes, the minimum energy of the products and the inclination of the primaries. CORELib is a cosmic ray event library that is meant to be openly accessible in order to satisfy a broad range of needs. Although models are always changing and improving, there is a need for a reference dataset that is also suitable for developing and comparing the performance of reconstruction and classification algorithms. The status of production is reviewed and the challenges in data sharing are discussed.
(University of Salerno and INFN)
Improved characterisation of sources in interferometry images using MCMC (D-ANA)
Source-finding methods currently in use typically focus on processing CLEANed images on a pixel-by-pixel basis. We present a Bayesian method that is designed to work better with interferometry products and is extensible to multi-wavelength astronomy.
(University of Cambridge)
High Performance Computing software for astronomy and astrophysics
Package for Likelihood-Based Fitting Methods
Integration of CASA with Jupyter for efficient remote processing (D-ANA)
Task-based distributed processing for radio-interferometric imaging with CASA
Task-based structuring of data processing has a long tradition in radio astronomy. Originally this architecture was driven by the very high ratio of input data volume to working memory size, and the tasks almost always had sequential dependencies, hence forming a pipeline. With the recent rapid increase in the number of baselines, bandwidths (and in some cases beams) with which interferometric data are recorded, there is increasing interest in parallelising and distributing radio astronomy processing using tasks. The SKA Science Data Processor architecture, for example, is acutely dependent on such task-based distribution schemes. I will explain the motivation for these approaches and some of the challenges.
In the second half of the talk I will present the architecture of a task-based parallelisation system for CASA built on top of the SWIFT/T framework. This parallelisation system is now in use in Cambridge for imaging-based processing and calibration of data from the Hydrogen Epoch of Reionization Array (HERA) telescope, an official SKA precursor telescope.
(University of Cambridge)
The Role of the EOSC HLEG & expectations 2017-2018 - Silvana Muscella, Chair, High Level Expert Group on European Open Science Cloud
The talk will provide an overview of the group's goal: setting up a data-driven infrastructure that builds on what exists, caters for the whole scientific community and provides the governance and services that are missing today.
EOSC-hub: Project overview and contribution to the EOSC initiative - Tiziana Ferrari
The presentation will provide an overview of the H2020 project EOSC-hub and of how it will contribute to the implementation of the European Open Science Cloud initiative of the EC.
The EOSC-hub project creates the integration and management system (the Hub) of the future European Open Science Cloud that delivers a catalogue of services, software and data from the EGI Federation, EUDAT CDI, INDIGO-DataCloud and major research e-Infrastructures. The Hub builds on mature processes, policies and tools from the leading European federated e-Infrastructures to cover the whole life-cycle of services, from planning to delivery. The Hub aggregates services from local, regional and national e-Infrastructures in Europe and worldwide.
The Hub acts as a single contact point for researchers and innovators to discover, access, use and reuse a broad spectrum of resources for advanced data-driven research. Through the virtual access mechanism, more scientific communities and users have access to services supporting their scientific discovery and collaboration across disciplinary and geographical boundaries.
The project also improves skills and knowledge among researchers and service operators by delivering specialised trainings and by establishing competence centres to co-create solutions. The project creates a Joint Digital Innovation Hub that stimulates an ecosystem of industry/SMEs, service providers and researchers to support business pilots, market take-up and commercial boost strategies.
EOSC-hub builds on existing technology already at TRL 8 and addresses the need for interoperability by promoting the adoption of open standards and protocols. By mobilising e-Infrastructures comprising more than 300 data centres worldwide and 18 pan-European infrastructures, this project is a ground-breaking milestone for the implementation of the European Open Science Cloud.
EOSCpilot Project - Architecture and the services - Brian Matthews, STFC
The presentation will provide an overview of the EOSC-Pilot project. This is a pathfinder project
for the European Open Science Cloud that brings together research institutes, research infrastructures and
e-infrastructure providers to explore how to set the direction for the EOSC, making
recommendations on what the EOSC should provide, how it should operate and how it might change science.
· Recommend a governance framework for the EOSC and contribute to the development of European open science policy and best practice;
· Recommend an architecture to support the provision of common services and allow the interoperability of data and services;
· Explore how to develop people and organisations so that best practice in open science can be propagated;
· Develop a number of demonstrators functioning as high-profile pilots that integrate services and infrastructures to show interoperability and its benefits in a number of scientific domains.
A key feature of the project is a wider engagement with a broad range of stakeholders, crossing borders and communities, to build the trust and skills required for adoption of an open approach to scientific research.
(Leader, Data Science and Technology Group, Scientific Computing Department, Science and Technology Facilities Council)
EOSCpilot Science Demonstrator 1 (LOFAR) - Rob van der Meer
ASTRON has developed and is operating the LOFAR telescope facility both in the Netherlands and as part of the ILT. The facility collects a large amount of data of which, after correlation and first reduction, about 7 PB per year of user data products are stored in the Long Term Archive (LTA).
ASTRON has focused on providing software programmes and pipelines to create user-ready data products in the archive. The archive is distributed over three sites. Users have access to the data through one portal, but user-side data reduction still involves local clusters and therefore transporting high volumes of data. The focus for optimisation has mostly been on the facility side. With the EOSC Pilot Science Demonstrator for LOFAR we aim to improve the user experience both for power users and non-power users. We will build on existing knowledge and combine existing tools to show the complete path from facility to user, to demonstrate that it can be done and to demonstrate the capacity and shortcomings, or challenges for the future.
I will also present a pilot from the JIVE institute to make the European VLBI Network (EVN) data accessible through the cloud. The EVN archive has everything that a large archive has, except the size. Providing a cloud version would be a good test case for storage and data reduction in the cloud.
The third point on my list will be the AENEAS goal to design a European Science Data Center (ESDC) for the SKA. Building on the existing infrastructure and using knowledge and requirements of current large archives and compute facilities, and mapping a scale increase of one to two orders of magnitude, we will stretch the capacity of any cloud or existing infrastructure to the limit. It is therefore very important that the design of the ESDC runs in parallel with the emergence of the EOSC, with each learning from the other along the way.
(JIVE), Dr Rob van der Meer
Long-Term Data Preservation in the EOSC according to FAIR principles - Jamie Shiers
Research Infrastructures, such as the ones on the ESFRI roadmap and others, are characterised by the very significant data volumes they generate and handle. These data are of interest to thousands of researchers across scientific disciplines and to other potential users via Open Access policies. Effective data preservation and open access for immediate and future sharing and re-use are a fundamental component of today’s research infrastructures.
An important question is how the European Open Science Cloud (EOSC) can help address these needs.
This talk will cover not only our experience with a Science Demonstrator in the on-going EOSC Pilot but also address how the needs of ESFRI and ESFRI-like projects could be handled.
It will also cover issues related to the FAIR principles that have recently been expanded to cover not only data / meta-data but also other “products” such as software.
HNSciCloud-EOSC - Bob Jones
Helix Nebula Science Cloud (HNSciCloud, www.hnscicloud.eu) is an H2020 Pre-Commercial Procurement project that has contracted commercial cloud providers to develop services that can become part of a hybrid cloud model tailored to the needs of the research community in Europe. The HNSciCloud services and hybrid cloud model are being tested by multiple scientific communities with a range of use cases, including CTA, and will be deployed for pilot usage at the start of 2018. This approach has been endorsed by the EIROforum members as a basis for a federated scientific data hub
(https://www.eiroforum.org/science-policy/eiroforum-directors-meet-european-commissioner-carlos-moedas/) to support the needs of their research communities in the future.
In June 2017 the EC organised a summit on the European Open Science Cloud (EOSC) in which the role of commercial service providers and users was acknowledged. The summit resulted in a draft declaration by stakeholders that will be used as a basis for discussions with member states.
This presentation will give an overview of the HNSciCloud services, the plans to open-up their pilot usage to more research communities and how such services are foreseen to be part of the implementation of the EOSC.
HL-LHC, WLCG and the EOSC - Ian Bird
The upgrades of the LHC and its large detectors, planned for the middle of the next decade (the HL-LHC project), will pose significant new challenges for the computing and data infrastructure. HL-LHC will produce several Exabytes of scientific data per year, and will require some 20 M cores of processing power. The WLCG community is investigating potential changes to the computing models and distributed computing infrastructure that will be needed to meet those challenges over the coming years. In particular, it will address how to create a data infrastructure at the Exabyte scale, that is able to manage and process data in an effective way. This will be done in collaboration with other very large data science projects, and the results could form the basis of a data infrastructure for EOSC. The talk will review some of the lessons learned from the experience in WLCG, and discuss some of the ideas being suggested for the future.
The CERN-SKA Collaboration
The planned upgrade of CERN’s LHC facility to higher luminosity is due to happen by the mid-2020s, the same timescale as SKA-1 operations. Both facilities will produce hundreds of petabytes of data (and exabytes over time), and will need a global network of e-Infrastructures to turn this data into science. This talk will present common challenges the two projects have identified and outline plans to collaborate in order to tackle them.
Virtual Observatory Talk
Opening of Panel Discussion on behalf of ESFRI projects – Giovanni Lamanna
ESFRI-EOSC Panel Discussions Part 1
Chair: Dr Giovanni Lamanna, CTA, LAPP.
• Paul Alexander, SKA
• Simon Berry, SKA
• Silvana Muscella, EOSC-HLEG
• Ian Bird, WLCG
• Maurice Poncet, EUCLID
• Darko Jevremovic, LSST
• How can the experience gained on ESFRI projects (e.g. SKA, CTA) and cluster actions (e.g. ASTERICS-OBELICS) help EOSC to engage with existing communities and infrastructures?
• How could the experience gained by ESFRI projects in conceiving and preparing computing models for data management help EOSC to develop the most common possible scheme for data preservation?
• In the context of the interoperability of data and the re-use of software, what added value can the ESFRI experience of connecting different research infrastructures (from different domains) provide for the realization of the EOSC?
• Despite the success of ESFRI, fragmentation across domains still produces silos and isolated solutions. What incentives could the EOSC provide to encourage improved interconnection in ESFRI projects?
• The ESFRI initiatives have a stronger connection with those European Member States that provide both financial and political support to the scientific communities behind them. How could this connection contribute to the realization of the EOSC Roadmap for governance and funding?
Participants can find useful web links with more information on the European Open Science Cloud on the workshop webpage: https://indico.astron.nl/internalPage.py?pageId=10&confId=87
GammaLearn, building a deep learning foundation for the Cherenkov Telescope Array - Luca Antiga, CEO, Orobix
The advances in artificial intelligence and deep learning over the last few years have opened new avenues for building next-generation data analysis pipelines. The GammaLearn project, kicked off in September 2017, aims to leverage these methodologies to address event discrimination as well as energy and direction estimation for the Cherenkov Telescope Array. In this talk I will provide an overview of the approaches that will be put into action, along with examples of lessons learnt from prior experience with manufacturing and medical imaging applications. The talk will describe the ongoing activities in the GammaLearn project, the unique challenges posed by the analysis of IACT data and the new opportunities created by the possibility of pushing data analysis to the edge.
Machine Learning for Gravitational Wave: how to classify transient signals in LIGO and Virgo detectors - Elena Cuoco, EGO
Noise of non-astrophysical origin contaminates science data taken by the Advanced Laser Interferometer Gravitational-wave Observatory and Advanced Virgo gravitational-wave detectors. Characterization of instrumental and environmental noise transients has proven critical in identifying false positives in the first aLIGO observing runs. We investigated new ways to identify and classify signals using Machine Learning techniques. We used unsupervised algorithms for unlabelled transient signals, and we have started using more efficient supervised methods, such as Deep Learning, on labelled training data sets.
We investigated image classification techniques based on GPU technology, which can be used for pattern recognition. After an introduction to the problem, I’ll go through the main algorithms and technical solutions that we have used efficiently and plan to use.
Applications of deep learning in wide-field cosmological surveys - Francois Lanusse, McWilliams Center for Cosmology at Carnegie Mellon University
The next generation of cosmological surveys such as the ones conducted by LSST, Euclid and SKA will bring unprecedented constraints on the nature of dark matter and dark energy. They also entail new challenges, in particular from the sheer volume of data they will produce. In this talk, I will mention some exciting applications of Deep Learning to address these challenges at different levels, from image processing to modelling galaxy physics. I will focus in particular on the problem of automated strong lens finding (see https://goo.gl/TnnTLE), a typical image classification problem, to illustrate how Deep Learning can have a profound impact on a science analysis pipeline, in this case by dramatically reducing (and maybe even eliminating) the need for human visual inspection. As a point of reference, it was estimated that previous methods would have required around one million volunteers participating in a citizen science initiative to classify the whole LSST survey in a matter of weeks.
Shallow and deep machine learning applications in KM3NeT - Stefan Geißelsöder, ECAP - University of Erlangen
Portable workflows using the Common Workflow Language standards
This talk will introduce the Common Workflow Language project. In July 2016 the project released standards that enable the portable, interoperable, and executable description of command line data analysis tools and of workflows made from those tools. These descriptions are enhanced by CWL's first-class (but optional) support for Docker containers. CWL originated in the world of bioinformatics but is not discipline specific and is gaining interest and use in other fields. Attendees who want to play with CWL prior to attending the presentation are invited to go through the "Gentle Introduction to the Common Workflow Language" tutorial on any OS X or Linux machine in their own time: http://www.commonwl.org/user_guide/