Workflows for Reproducible Computational Science and Data Science

Supervisors: Prof. Hans Fangohr (XFEL), Prof. Sandor Brockhauser (XFEL), Dr. Adrian Mancuso (XFEL), Prof. Volker Gülzow (XFEL, DESY), Prof. Thomas Ludwig (UHH)

Analysing the scientific data obtained during experiments is a core activity in photon science, and is essential for converting that data into understanding and, eventually, publications. A topic receiving growing attention is reproducibility and re-usability: given a publication, readers should be able to reproduce the results it reports, particularly when those results are based on computational processes. Reproducibility forms the basis for re-use of the work, for example extending the analysis software to carry out a related but new study. In practice, reproducing published results is often impossible. In this project, we will investigate the process of data analysis towards publication and then work to improve this workflow. Typically, data analysis involves processing large volumes of data (gigabytes to petabytes) using a range of specialist software tools. Challenges include preserving all of these processing steps, the specialist software, and its computational environment so that the computation can be reproduced and re-used in the future. The objectives are to make the process reproducible, convenient and effective.
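
As an illustration of what preserving a processing step and its computational environment might involve, the following is a minimal Python sketch using only the standard library. It is not part of the project description: the function names (sha256_of_file, provenance_record) and the record layout are hypothetical, chosen only to show the kind of information (input checksums, parameters, interpreter and package versions) that one could store alongside each analysis step.

```python
import hashlib
import json
import platform
import sys
from importlib import metadata


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so that large data files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def provenance_record(step_name, input_paths, parameters):
    """Describe one processing step (hypothetical layout, for illustration).

    The record captures the inputs by checksum, the parameters used, and
    the software environment in which the step ran.
    """
    return {
        "step": step_name,
        "parameters": parameters,
        "inputs": {str(p): sha256_of_file(p) for p in input_paths},
        "python": sys.version,
        "platform": platform.platform(),
        # Installed distributions as "name==version" strings.
        "packages": sorted(
            f"{dist.metadata['Name']}=={dist.version}"
            for dist in metadata.distributions()
        ),
    }


def save_record(record, path):
    """Write the record as JSON next to the analysis output."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(record, handle, indent=2, sort_keys=True)
```

A record like this only documents the environment; actually re-creating it later would need further measures (for example containers or pinned environment files), which this sketch does not attempt.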