Git/GitHub

The Databook's main code repository is hosted and developed on GitHub, and the project is version-controlled with Git. Separate features are developed on their own branches, and commits are pushed from local machines to the remote repository. A dev branch serves as the basis for continuous integration and as the merge target for feature branches. During deployments, if the dev branch passes the build test, it may be merged into the main branch, which is intended to host the latest working release of the Databook. Finally, the Databook is deployed from the main branch to the public website, and a release is created manually using the GitHub release feature.

We also use GitHub together with ReviewNB for a formal review process, in which other team members review changes before they are merged into dev or main. Because GitHub does not provide a useful way to evaluate changes made to Jupyter Notebooks, ReviewNB is integrated into our repository. Whenever a Pull Request is opened, ReviewNB generates a link that displays all the notebook changes contained in the Pull Request, and a team member can view these changes and comment on specific changes or whole sections. This way, team members can evaluate the scientific validity of a notebook, critique the methodology of the analysis or the usability of the plots, and suggest ways to improve the educational value of the content. The author can keep revising the notebook in response to ReviewNB feedback until it is ready to be merged.

Since the Databook is intended to be a community-oriented project, GitHub issues from the public are encouraged. Users may submit issues reporting bugs, suggesting enhancements, or asking questions about any part of the Databook. Furthermore, the Databook may accept changes from community members. For instance, the Databook repository was forked and a new chapter called replication was added. After the Pull Request was reviewed and changes were made, this enhancement was merged and integrated into the Databook. Users who contribute at least one commit to the Databook are automatically added to the dynamic authors list, discussed below.

Testing, deployment, and generation of the dynamic authors list are managed by GitHub Workflows. Two workflows are used for testing: test.yml and build.yml. test.yml runs before any changes are pushed or merged to dev. It sets up the environment on the provided Ubuntu GitHub runner using our repo's requirements.txt file, uses the GitHub Action Ana06/get-changed-files to retrieve a list of the Jupyter notebooks that were changed, and then runs pytest with nbmake to execute every changed notebook. If all the changed notebooks run without error, the test passes; otherwise, it fails. build.yml runs before any changes are pushed or merged to main. Like test.yml, it sets up the environment and uses pytest and nbmake. However, build.yml runs all notebooks in the repository and also builds the Jupyter Book to validate the configuration and table of contents.
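The notebook selection step in test.yml can be pictured with a short Python sketch. This is an illustration only, not the workflow's actual code: the file paths and the helper names `changed_notebooks` and `pytest_command` are hypothetical, and the only detail taken from nbmake itself is its `--nbmake` pytest flag.

```python
import shlex


def changed_notebooks(changed_files):
    """Keep only Jupyter notebooks, skipping autosave checkpoint copies."""
    return [
        path
        for path in changed_files
        if path.endswith(".ipynb") and ".ipynb_checkpoints" not in path
    ]


def pytest_command(notebooks):
    """Build the pytest call; --nbmake is the flag added by the nbmake plugin."""
    if not notebooks:
        return None  # nothing to test, so there is no command to run
    return ["pytest", "--nbmake", *notebooks]


# Hypothetical list of changed files, standing in for the action's output.
changed = [
    "docs/basics/read_nwb.ipynb",
    "README.md",
    "docs/visualization/plots.ipynb",
]
command = pytest_command(changed_notebooks(changed))
print(shlex.join(command))
```

Returning `None` when no notebooks changed lets the caller skip the test run rather than invoke pytest with no targets, which is one possible design choice among several.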

Because some notebooks require a secure API key to access embargoed data on DANDI, we use GitHub secrets to store such a key. Both test.yml and build.yml assign the GitHub secret to an environment variable called DANDI_API_KEY before running the notebooks. Within the relevant notebooks, Python's os module is used to read DANDI_API_KEY from the runtime environment, which makes the embargoed files accessible.
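Reading the key inside a notebook might look like the following minimal sketch. The helper name `get_dandi_api_key` and the error message are assumptions made for illustration; only the variable name DANDI_API_KEY and the use of the os module come from the description above.

```python
import os


def get_dandi_api_key(environ=os.environ):
    """Return the DANDI API key from the environment, or fail with a clear message."""
    key = environ.get("DANDI_API_KEY")
    if not key:
        # Hypothetical message; raised when the variable is unset or empty.
        raise RuntimeError(
            "DANDI_API_KEY is not set; embargoed files cannot be accessed."
        )
    return key


# Demonstration with a stand-in mapping instead of the real environment.
demo_key = get_dandi_api_key({"DANDI_API_KEY": "example-key"})
print(len(demo_key))
```

Accepting the environment as a parameter keeps the function easy to exercise without a real secret; in a notebook it would simply be called as `get_dandi_api_key()`.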

Deployment is performed by the deploy.yml workflow, which is dispatched manually. This workflow first runs another workflow, insert_authors.yml. On GitHub's Ubuntu runner, insert_authors.yml runs a Python file that checks out the latest version of the Databook repository's main branch, parses a list of contributors from git shortlog, inserts them under a markdown header in the Intro.md file, and makes a commit. After insert_authors.yml finishes, the Jupyter Book is built, and the deployment then uses peaceiris/actions-gh-pages to publish the built book to GitHub Pages. GitHub Pages hosts the Jupyter Book website reliably and for free.
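The author insertion step can be sketched in Python as follows. This is not the repository's script: the function names, the `## Authors` header text, and the sample contributor names are all hypothetical, and the input is assumed to be in the `count<TAB>name` form that `git shortlog -sn` prints.

```python
def parse_shortlog(shortlog_output):
    """Extract contributor names from `git shortlog -sn` style output."""
    authors = []
    for line in shortlog_output.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each line is a commit count followed by the contributor's name.
        _count, name = line.split(None, 1)
        authors.append(name)
    return authors


def insert_authors(markdown, authors, header="## Authors"):
    """Insert a comma-separated author line directly below the header line."""
    lines = markdown.splitlines()
    index = lines.index(header)  # raises ValueError if the header is missing
    lines.insert(index + 1, ", ".join(authors))
    return "\n".join(lines) + "\n"


# Made-up sample data standing in for real shortlog output and Intro.md.
sample_shortlog = "    12\tAda Example\n     3\tGrace Sample\n"
sample_intro = "# Intro\n\n## Authors\n\nWelcome to the book.\n"

updated = insert_authors(sample_intro, parse_shortlog(sample_shortlog))
print(updated)
```

In a real workflow the shortlog text would come from running Git on the checked-out main branch, and the updated markdown would be written back to Intro.md before the commit; this sketch inserts a new line on every run, so a production script would also need to replace any previously inserted list.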