Workflows for Reproducible Computational Science and Data Science

Supervisors: Prof. Hans Fangohr (MPSD), Prof. Thomas Ludwig (UHH)

Analysing scientific data obtained from simulations or experiments is a central activity in many research disciplines, and is essential for converting that data into understanding, publications and impact. Reproducibility and re-usability are receiving growing attention: given a publication, readers should be able to reproduce the published results, particularly if those results are based on computational processes. This forms the basis for re-use of the work, for example extending the software to carry out a related but new study. In practice, reproducing published results is often impossible.
In this project, we will investigate the process of computational science research, including computer simulation and data analysis leading to publication, and will then work to improve this workflow. Challenges include preserving all computation and processing steps, the specialist software, and its computational environment so that the computation can be reproduced and re-used in the future. The objective is to make the process reproducible, convenient and effective.
Important tools for the technical part of this work are likely to include the Jupyter Notebook and its ecosystem of tools, Python, package managers such as Spack, and containers.
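
To illustrate the challenge of preserving the computation environment described above, the following is a minimal sketch in Python of recording provenance alongside a result. It is not a method prescribed by the project: the function names file_sha256 and capture_provenance and the layout of the record are assumptions chosen for illustration, and it uses only the Python standard library. A package manager such as Spack or a container image would capture far more of the software stack than this does.

```python
import hashlib
import json
import platform
import sys
from importlib import metadata


def file_sha256(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def capture_provenance(input_paths=()):
    """Collect a minimal record of the environment and input data.

    Hypothetical helper: the keys of the returned dictionary are an
    illustrative choice, not an established standard.
    """
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip installations with missing metadata
            packages[name] = dist.version
    return {
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "executable": sys.executable,
        "packages": dict(sorted(packages.items())),
        "inputs": {str(p): file_sha256(p) for p in input_paths},
    }


if __name__ == "__main__":
    # Print the record for the current environment with no input files.
    print(json.dumps(capture_provenance(), indent=2))
```

Storing such a record next to each figure or table makes it possible to check later whether a re-run used the same input data and package versions. It records only what was used; rebuilding the environment itself is the harder problem that package managers and containers address.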