RRP User Documentation

User documentation for the Reproducible Research Platform (RRP)

Scientific IT Services, ETH Zurich

October 2024

1 Introduction

Scientists in many disciplines work with large and complex datasets that are challenging to store, manage, analyse, preserve, and share. The analytical workflows used to interpret data and generate publishable results are also growing in complexity. As a result, the reusability of data and the (computational) reproducibility of results become a major challenge if the full analytical pathway from data to final result is not documented in detail or is only partially available. This pathway includes not only the analysis code but also information on the computational environment in which analyses are executed, because a single missing library or an updated package can break a published workflow at a later date.

The Reproducible Research Platform (RRP) has been developed to address this challenge. RRP builds on established open-source tools and aims for a seamless and user-friendly connection between data and metadata management (with openBIS) and tools for code management (git), management of computational environments (repo2docker) and interactive computational notebooks (JupyterLab). Furthermore, members of a research group can use RRP to easily share computational projects with each other, for example to build upon previous work or to exchange work between student and supervisor. In summary, RRP enables researchers to achieve full reproducibility of their computational work with minimal additional effort.

2 Prerequisites

To use RRP efficiently, the following basic requirements must be fulfilled:

  1. Accessible public or private Git repository (e.g. GitLab or GitHub)
  2. Specification of the execution environment for the Git repository. RRP uses repo2docker to build a Docker image from the source repository. The execution environment is defined by configuration files in the repository, which must be stored in a folder called .binder. For details on the supported configuration files, see the repo2docker documentation.
  3. Access to an openBIS server. Currently, each RRP server is linked to one openBIS server.
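Before creating a project, it can help to confirm locally that the repository contains a .binder folder with at least one environment file. The following is a minimal sketch under stated assumptions, not part of RRP itself: the function name find_environment_files is hypothetical, and the set of file names is an illustrative subset of the configuration files that repo2docker recognises.

```python
from pathlib import Path

# File names that repo2docker recognises as environment specifications.
# Illustrative subset only, not an exhaustive list.
KNOWN_CONFIG_FILES = {
    "environment.yml",
    "requirements.txt",
    "apt.txt",
    "postBuild",
    "runtime.txt",
    "Dockerfile",
}


def find_environment_files(repo_root):
    """Return the sorted names of recognised config files in <repo_root>/.binder.

    Raises FileNotFoundError if the .binder folder is missing.
    (Hypothetical helper for a local check; RRP does not provide it.)
    """
    binder_dir = Path(repo_root) / ".binder"
    if not binder_dir.is_dir():
        raise FileNotFoundError(f"no .binder folder found in {repo_root}")
    return sorted(
        entry.name
        for entry in binder_dir.iterdir()
        if entry.is_file() and entry.name in KNOWN_CONFIG_FILES
    )
```

An empty list means that the .binder folder exists but contains none of the file names listed above, so the build may not produce the intended environment.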

3 Mounting openBIS datasets

RRP projects can access datasets stored on the openBIS server. Selected datasets are mounted directly on the RRP server for read-only access. To specify which datasets should be mounted, add a folder called .rrp to your Git repository. In this folder, add a file called datasets.yaml with the following content:

server-url: "https://openbis-xyz.ethz.ch/"
destinations:
  "input":
      - "20220425184271099-1234"

It is also possible to add datasets to be mounted in the RRP UI once a project has been created (see below).
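For repositories that mount many datasets, the file can be generated rather than written by hand. The sketch below reproduces the layout shown above using only the Python standard library. It rests on several assumptions: the function name render_datasets_yaml is hypothetical, the permId check (17 digits, a hyphen, then further digits) is inferred from the single example above, and support for more than one destination folder is suggested by the plural key but not confirmed here.

```python
import re

# Assumed permId shape, inferred from the example "20220425184271099-1234":
# a 17-digit timestamp, a hyphen, then a numeric counter.
PERM_ID_PATTERN = re.compile(r"\d{17}-\d+")


def render_datasets_yaml(server_url, destinations):
    """Build the text of .rrp/datasets.yaml.

    destinations maps a folder name (e.g. "input") to a list of permIds.
    (Hypothetical helper; the layout follows the example in this section.)
    """
    lines = [f'server-url: "{server_url}"', "destinations:"]
    for folder, perm_ids in destinations.items():
        lines.append(f'  "{folder}":')
        for perm_id in perm_ids:
            if not PERM_ID_PATTERN.fullmatch(perm_id):
                raise ValueError(f"unexpected permId format: {perm_id!r}")
            lines.append(f'    - "{perm_id}"')
    return "\n".join(lines) + "\n"
```

The returned text can then be written to .rrp/datasets.yaml and committed together with the rest of the repository.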

4 Creating a project

  1. Log in to the RRP server with your openBIS user credentials.
  2. Select the + button next to Your projects and click Create new project from Git repository.
  3. Specify the repository URL and user credentials (for private repos), then select Clone.
  4. Enter a name for the project and select the Git branch, tag or commit. Next, select Build. This builds the Docker image from the repository and takes several minutes to complete, depending on the complexity of the environment.

5 Starting a project and working with JupyterLab

  1. Once the build of the project is complete, it will be listed under Your projects, with Status: not running.
  2. Select the Play button to start the project. Status will switch to processing and then to running.
  3. Select the Launch button to start the JupyterLab UI in a new browser tab / window.
  4. You can now work as usual with your project in the JupyterLab UI.
  5. The repository content is available in the folder project.
  6. Mounted openBIS datasets are available in the folder openbis.
  7. Results to be uploaded back to openBIS can be stored in the folder results (see below).
  8. A project can be deleted by clicking the Trashcan icon in the RRP UI.
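The folder roles described above (openbis for read-only inputs, results for outputs) can be illustrated with a short notebook-style sketch. It assumes that the openbis and results folders sit side by side under one base directory, which is passed in explicitly because their exact location is not stated here. The function name write_manifest and the output file name input_manifest.csv are hypothetical.

```python
import csv
from pathlib import Path


def write_manifest(base_dir="."):
    """List all files under <base_dir>/openbis and save the list to results/.

    Returns the path of the written CSV file.
    (Hypothetical helper; folder locations are an assumption.)
    """
    base = Path(base_dir)
    data_dir = base / "openbis"      # mounted datasets, read-only
    results_dir = base / "results"   # files stored here can be uploaded later
    results_dir.mkdir(exist_ok=True)

    manifest = results_dir / "input_manifest.csv"
    with manifest.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["relative_path", "size_bytes"])
        for path in sorted(data_dir.rglob("*")):
            if path.is_file():
                relative = path.relative_to(data_dir).as_posix()
                writer.writerow([relative, path.stat().st_size])
    return manifest
```

Writing outputs only to the results folder keeps the mounted datasets untouched and makes every generated file visible for upload.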

6 Saving results and uploading to openBIS

  1. A list of results files for the project is shown in the Results tab of the RRP project.
  2. To upload results to openBIS, select the Upload tab of the RRP project.
  3. Enter the openBIS destination identifier under which the results dataset should be created. You can specify either a permId or the path to the experiment / project.
  4. Select the dataset type (e.g. ANALYZED_DATA) and specify some dataset properties. Then click on Upload results.
  5. It is also possible to save the entire RRP project in openBIS (including code and computational environment) by clicking Upload project. This creates a new dataset in openBIS containing the Docker image (image.tar), a copy of the Git repository (repository.tar) and metadata. Expect this dataset to be between several hundred MB and a few GB in size. To create a new RRP project from openBIS, select the + button next to Your projects and click “Create new project from openBIS”. Enter the openBIS dataset permId and then click Create.

7 Sharing a project

RRP offers several options for sharing a project (code, data, computational environment) with others. These options are available under the Share tab of the project.

Important: before a project can be shared with RRP, the repository must be in a clean state. This means that all changes have to be committed and pushed to the remote. RRP indicates whether this is the case in the Repository status section of the Share tab.
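The same condition can be checked from a terminal or notebook before opening the Share tab. The sketch below interprets the output of git status --porcelain --branch. It is an approximation under stated assumptions: the function names are hypothetical, untracked files are counted as uncommitted changes (RRP may judge this differently), and the comparison with the remote reflects the last fetch rather than the live state of the remote.

```python
import subprocess


def parse_repo_state(porcelain_output):
    """Interpret the output of `git status --porcelain --branch`.

    Returns a dict with three booleans: has_upstream, uncommitted, unpushed.
    """
    lines = porcelain_output.splitlines()
    # The first line looks like "## main...origin/main [ahead 1]".
    header = lines[0] if lines and lines[0].startswith("## ") else ""
    changes = [line for line in lines if line and not line.startswith("## ")]
    return {
        "has_upstream": "..." in header,   # branch tracks a remote branch
        "uncommitted": bool(changes),      # modified, staged or untracked files
        "unpushed": "[ahead " in header,   # local commits not on the remote
    }


def is_clean(state):
    """True if everything is committed and pushed to a tracked remote branch."""
    return (
        state["has_upstream"]
        and not state["uncommitted"]
        and not state["unpushed"]
    )


def read_repo_state(repo_dir="."):
    """Run git in repo_dir and return the parsed state (requires git on PATH)."""
    completed = subprocess.run(
        ["git", "status", "--porcelain", "--branch"],
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_repo_state(completed.stdout)
```

For example, is_clean(read_repo_state("project")) returns False as long as there are local edits or commits that have not been pushed.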

7.1 Sharing projects with users of the same server with share IDs

RRP projects can be shared with other users of the same server (e.g. fellow lab members, PI etc.):

  1. Switch to the Share tab of the RRP project. Any existing share identifiers are listed under Share identifiers.
  2. To create a new share identifier, enter a short description and then select Create share identifier.
  3. If successful, the new share identifier will be listed above. Share it with colleagues, for example by e-mail or chat.
  4. To create a new project from a share identifier, select the + button next to Your projects and click “Create new project from Share identifier”. Enter the identifier and user credentials, then click Clone.

7.2 Sharing projects with Player bundles

RRP projects can also be shared with other researchers (colleagues, reviewers etc.) who are not registered users on the server. The only requirement is a computer with the Docker container engine installed and running. For small to medium-sized projects (up to a few GB), RRP Player bundles can be created that contain the whole project (code, data, computational environment) in a single file. To create a Player bundle:

  1. Switch to the Share tab of the RRP project and ensure the repository is clean and up-to-date with the remote.
  2. In the Create new player bundle section, enter a short description and click Create player bundle. The creation of new player bundles may take a few minutes (depending on project size) and status is indicated above in the list of Player bundles. Once the status has switched from in process to complete, the Player bundle file can be downloaded by selecting the download icon. The downloaded zip-file can be shared with other researchers, e.g. by standard file-sharing services.
  3. To run the Player bundle on the recipient’s computer, ensure that Docker is installed and running. Extract the zip-archive and execute either the play script (on macOS / Linux) or the play.bat / play.ps1 scripts on Windows. This will launch the JupyterLab interface in the default browser.

Some caveats concerning running Player bundles:

  1. Player bundles have been tested successfully on different combinations of OS and architecture, but they may fail on setups that have not been tested.
  2. If you execute the play script on macOS from the Finder, you may receive an error stating “play” cannot be opened because it is from an unidentified developer. To execute the script, right-click on it and then choose Open. In the security warning that pops up, also select Open.
  3. On Windows, the script may be blocked due to restrictive execution policy. Use the Unblock-File cmdlet to remove the block.
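Recipients who prefer to check their setup before running a bundle could use a small preflight script such as the sketch below. It is not part of the Player bundle: the function names are hypothetical, and the only facts taken from this documentation are the launcher file names (play, play.bat, play.ps1) and the requirement that Docker is installed and running. The check relies on docker info returning a non-zero exit code when the Docker daemon is not reachable.

```python
import platform
import shutil
import subprocess


def pick_play_script(system_name):
    """Return the launcher file name for a platform.system() value."""
    if system_name == "Windows":
        return "play.bat"  # play.ps1 is the PowerShell alternative
    if system_name in ("Darwin", "Linux"):
        return "play"
    raise ValueError(f"untested platform: {system_name}")


def docker_is_running():
    """True if the docker CLI is on PATH and can reach the Docker daemon."""
    if shutil.which("docker") is None:
        return False
    completed = subprocess.run(
        ["docker", "info"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return completed.returncode == 0


def preflight():
    """Return (launcher name, Docker status) for the current machine."""
    return pick_play_script(platform.system()), docker_is_running()
```

If the second value returned by preflight() is False, start Docker first and then run the launcher from the extracted archive.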

7.3 Sharing projects with Player scripts

Player bundles will likely be inefficient for sharing very large RRP projects, i.e. those exceeding several GB. In these cases, it is recommended to upload the datasets to a research data repository supported by openBIS and to export the computational environment (Docker image) to a container registry, such as DockerHub. RRP Player scripts can then be run on the end user's computer to download the Docker image and the research data directly.

To create a Player script, first export the required openBIS datasets to one of the research data repositories supported by openBIS (currently Zenodo and ETH Research Collection). Please see the openBIS documentation for details on how to export data to repositories. Exported dataset permIds are shown on the Share tab of the RRP project in the Player scripts section. Next, export the Docker image with the computational environment by clicking Export image. Once the export has completed and the status of Image exported has changed to Yes, click Generate player script and download the zip-archive, which will only be a few KB in size. The downloaded zip-file can be shared with other researchers, e.g. by standard file-sharing services or e-mail.

To run the Player script on the recipient’s computer, ensure that Docker is installed and running. Extract the zip-archive and execute either the play script (on macOS / Linux) or the play.bat / play.ps1 scripts on Windows. This will launch the JupyterLab interface in the default browser.

Some caveats concerning running Player scripts:

  1. Player scripts have been tested successfully on different combinations of OS and architecture, but they may fail on setups that have not been tested.
  2. If you execute the play script on macOS from the Finder, you may receive an error stating “play” cannot be opened because it is from an unidentified developer. To execute the script, right-click on it and then choose Open. In the security warning that pops up, also select Open.
  3. On Windows, the script may be blocked due to restrictive execution policy. Use the Unblock-File cmdlet to remove the block.
  4. The code repository is NOT included in the Player script but is cloned directly from the Git remote when the script is executed. For private repositories, user credentials must therefore be provided.