User documentation for the Reproducible Research Platform (RRP)
October 2024
Scientists in many disciplines are working with large and complex datasets that are challenging to store, manage, analyse, preserve, and share. At the same time, the analytical workflows used to interpret these data and generate publishable results are becoming increasingly complex. As a result, the reusability of data and the (computational) reproducibility of results become major challenges if the full analytical pathway from data to final result is not documented in detail or is only partially available. This includes not only the analysis code but also information on the computational environment in which analyses are executed, because a single missing library or an updated package can break a published workflow in the future.
The Reproducible Research Platform (RRP) has been developed to address this challenge. RRP builds on established open-source tools and aims to provide a seamless and user-friendly connection between data and metadata management (with openBIS) and tools for code management (git), management of computational environments (repo2docker) and interactive computational notebooks (JupyterLab). Furthermore, members of a research group can use RRP to easily share computational projects with each other, for example, to build upon previous work or to share work between student and supervisor. In summary, RRP enables researchers to achieve full reproducibility of their computational work with minimal additional effort.
To use RRP efficiently, the following basic requirements must be fulfilled:
.binder in the Git repository.
RRP projects can access datasets stored in the openBIS server. Selected datasets are mounted directly on the RRP server for read-only access. To specify which datasets should be mounted, add a folder called .rrp to your Git repository. In this folder, add a file called datasets.yaml with the following content:
It is also possible to add datasets to be mounted in the RRP UI once a project has been created (see below).
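Before creating a project, it can help to check that a local clone follows the conventions above. The following Python sketch is not part of RRP, and the function check_rrp_layout is hypothetical.

It makes three assumptions:
- .binder is a folder holding the repo2docker configuration files. repo2docker also accepts a binder folder or files in the repository root, which this sketch does not check.
- .rrp/datasets.yaml is optional, since datasets can also be added in the RRP UI.
- Only the existence of these paths matters here. The sketch does not validate their contents.

```python
from pathlib import Path


def check_rrp_layout(repo: Path) -> list[str]:
    """Return human-readable problems found in a local clone (hypothetical helper).

    Only checks that the expected paths exist; the contents of .binder
    and .rrp/datasets.yaml are not validated.
    """
    problems: list[str] = []

    # Assumed: repo2docker configuration folder describing the computational environment
    if not (repo / ".binder").is_dir():
        problems.append("missing .binder folder")

    # Optional: datasets.yaml lists the openBIS datasets to mount read-only
    rrp_dir = repo / ".rrp"
    if rrp_dir.exists() and not (rrp_dir / "datasets.yaml").is_file():
        problems.append(".rrp folder exists but has no datasets.yaml")

    return problems
```

Run from the repository root, check_rrp_layout(Path(".")) returns an empty list when both conventions are met.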
project
.openbis
.results
(see below).
ANALYZED_DATA) and specify some dataset properties. Then click Upload results.
image.tar), a copy of the Git repository (repository.tar) as well as metadata. Expect this dataset to be between several hundred MBs and a few GBs in size!
To create a new RRP project from openBIS, select the + button next to Your projects and click “Create new project from openBIS”. Enter the openBIS dataset permID and then click Create.
RRP offers several options for sharing a project (code, data, computational environment) with others. These options are available under the Share tab of the project.
Important: before a project can be shared with RRP, the repository must be in a clean state. This means that all changes have to be committed and pushed to the remote repository. RRP indicates whether this is the case in the Repository status section of the Share tab.
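The clean-state condition can also be checked locally before opening the Share tab. The following Python sketch calls standard git commands. The function names are hypothetical, and the Repository status shown by RRP remains the authoritative check.

Note the following about the sketch:
- It treats untracked files as unclean, which may be stricter than RRP.
- It assumes the current branch has an upstream branch configured. Otherwise git rev-list fails and a CalledProcessError is raised.
- It compares HEAD with the remote-tracking branch as it was last fetched or pushed.

```python
import subprocess


def parse_porcelain(status_output: str) -> list[str]:
    """Return the paths reported by `git status --porcelain` (format v1)."""
    paths: list[str] = []
    for line in status_output.splitlines():
        if line.strip():
            # Each entry is two status characters, a space, then the path
            paths.append(line[3:])
    return paths


def repo_is_clean(repo_dir: str = ".") -> bool:
    """True if there are no uncommitted changes and no unpushed commits."""
    status = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    ).stdout
    if parse_porcelain(status):
        return False  # modified, staged or untracked files exist
    # Number of commits on HEAD that are not yet on the upstream branch
    ahead = subprocess.run(
        ["git", "rev-list", "--count", "@{u}..HEAD"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    ).stdout
    return int(ahead.strip()) == 0
```

If repo_is_clean() returns False, commit and push the remaining changes before sharing.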
RRP projects can be shared with other users of the same server (e.g. fellow lab members, PI etc.):
RRP projects can also be shared with other researchers (colleagues, reviewers etc.) who are not registered users on the server. The only requirement is a computer with the Docker container engine installed and running. For small to medium-sized projects (up to a few GBs), RRP Player bundles can be created that contain the whole project (code, data, computational environment) in a single file. To create a Player bundle:
play script (on macOS / Linux) or the play.bat / play.ps1 scripts on Windows. This will launch the JupyterLab interface in the default browser.
Some caveats concerning running Player bundles:
play script on macOS from the Finder, you may receive an error stating “play” cannot be opened because it is from an unidentified developer. To execute the script, right-click on it and then choose Open. In the security warning that pops up, also select Open.
Player bundles are likely to be inefficient for sharing very large RRP projects, i.e. projects exceeding several GBs. In these cases, it is recommended to upload the datasets to a research data repository supported by openBIS. Furthermore, the computational environment (Docker image) should be exported to a container registry, such as Docker Hub. RRP Player scripts can then be used to download the Docker image and research data directly to the end user’s computer.
To create a Player script, first export the required openBIS datasets to one of the research data repositories supported by openBIS (currently Zenodo and ETH Research Collection). Please see the openBIS documentation for details on how to export data to repositories. Exported dataset permIDs are shown on the Share tab of the RRP project in the Player scripts section. Next, export the Docker image with the computational environment by clicking Export image. Once the export has completed and the status of Image exported has changed to Yes, click Generate player script and download the zip-archive, which will only be a few KBs in size. The downloaded zip-file can be shared with other researchers, e.g. via standard file-sharing services or e-mail.
To run the Player script on the recipient’s computer, ensure that Docker is installed and running.
Extract the zip-archive and execute either the play script (on macOS / Linux) or the play.bat / play.ps1 scripts on Windows. This will launch the JupyterLab interface in the default browser.
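On the recipient’s side, the preparation steps can be sketched in Python: check that Docker is running, extract the archive and locate the right script. The function names are hypothetical. The sketch makes these assumptions:
- The archive contains a file named play (macOS / Linux) or play.bat (Windows), possibly in a subfolder.
- The script is only prepared, not started. Launching it is left to the user.
- Python’s zipfile module does not restore Unix permission bits, so the sketch marks the play script as executable after extraction.

```python
import platform
import shutil
import subprocess
import zipfile
from pathlib import Path


def docker_is_running() -> bool:
    """True if the docker CLI is on PATH and the Docker engine responds."""
    if shutil.which("docker") is None:
        return False
    try:
        # `docker info` exits with a non-zero code if the engine is unreachable
        result = subprocess.run(["docker", "info"], capture_output=True, timeout=30)
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


def extract_player(archive: Path, target: Path) -> Path:
    """Extract a Player zip-archive and return the play script for this OS."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)
    # play.bat (or play.ps1) on Windows, the play shell script on macOS / Linux
    windows = platform.system() == "Windows"
    name = "play.bat" if windows else "play"
    matches = sorted(target.rglob(name))
    if not matches:
        raise FileNotFoundError(f"{name} not found in {archive}")
    script = matches[0]
    if not windows:
        # zipfile does not restore Unix permission bits, so mark the script executable
        script.chmod(script.stat().st_mode | 0o111)
    return script
```

For example, extract_player(Path("player.zip"), Path("player")) returns the path of the script to start. Both archive and folder names are hypothetical. Calling docker_is_running() first avoids a confusing failure when the Docker engine is not running.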
Some caveats concerning running Player scripts:
play script on macOS from the Finder, you may receive an error stating “play” cannot be opened because it is from an unidentified developer. To execute the script, right-click on it and then choose Open. In the security warning that pops up, also select Open.