Open data workshop: sharing scientific research - Arkivum

Blog / 04 Dec, 2020

At Arkivum, a major focus for our team is to help our customers better share their data. The requirement to share comes from many places: circulating information throughout an organisation, sharing valuable information with partners, or helping organisations make their research data Findable, Accessible, Interoperable and Reusable (FAIR).

Open Data Workshop

This is why last October we were delighted to support the first CMS Open Data Workshop.

CMS, or Compact Muon Solenoid, is one of the detectors at the Large Hadron Collider, CERN’s latest addition to their accelerator complex and the largest and most powerful particle accelerator in the world.

The online event sought to build on several years’ work to improve access to, and use of, the data released through the CERN Open Data Portal, particularly by non-CMS analysts. Participants took part in several hands-on exercises with the data and brainstormed how the process of accessing and analysing the data could be made more useful for the broader research community.

It is important to stress the broad range of attendees at the event, many of whom do not work directly with CMS. The event both tested and showcased the ease of access to the data for scientists from different institutes.

The workshop did not expect participants to download the data and software themselves, which is not practical given the large data volumes involved, nor did it need CERN to provide remote access to its infrastructure. Instead, scientists were able to run CERN scientific software on Google Cloud Platform (GCP) against the archived CMS data.

Accessible data

The workshop, and more importantly the wider project it is part of, demonstrates the value of making research data openly accessible and of providing easy ways for scientists who are not part of an established collaboration to use it. In this instance, that meant making it easy for scientists to run scientific software applications in the public cloud.

You could even go as far as to say it is a paradigm shift to ‘bring applications to the data’ so it can be processed ‘in-situ’ rather than relying on individuals to download vast volumes of data to their local compute environments. This more traditional approach is both time-consuming and inefficient.

The CERN CMS dataset is, as you would expect, huge, with plenty of potential opportunities for new scientific discoveries to be made from it. Key to this is ensuring that it is safeguarded, made openly accessible, and that people have the means to analyse and use it.

Arkivum was invited to support the event through our recent work with CERN as part of the ARCHIVER project. ARCHIVER is a EUR 4.8 million project, co-funded by the European Union’s Horizon 2020 research and innovation programme, to support the IT requirements of European scientists and to provide end-to-end archival and preservation services for the vast and ever-growing datasets generated by world-leading research. Arkivum was selected to take part in the Design Phase of the project, which took place this year. The Prototyping Phase begins at the end of December.

Whitney Armstrong
