DSCI 522 Slides
Reproducible and Trustworthy Data Science Workflow
Welcome
This site contains the lecture and lab slides for DSCI 522 - Reproducible and Trustworthy Data Science Workflow, section 001, 2025W1.
Instructor
Sky (Kehan) Sheng
University of British Columbia
www.skysheng.io
Available Slides
Lectures
- Lecture 1 - Introduction to Reproducible and Trustworthy Data Science Workflow
- Lecture 2 - Conda-lock, Containerization, and Docker
- Lecture 3 - Customizing and Building Containers
- Lecture 4 - Containerizing Python Applications
- Lecture 5 - Non-interactive scripts
- Lecture 6 - Reproducible reports
- Lecture 7 - Data Analysis Pipeline and GNU Make
- Lecture 8 - Testing Code & Conclusion
Labs
- Lab 1 Introduction - Introduction to reproducible workflows, lab policies, and getting started
- Lab 4 End - Peer review, teamwork reflection, and Santa Otter’s gift bag
Cheatsheets
- Command Line Cheatsheet - Quick reference for command line operations
- Conda & Conda-lock Cheatsheet - Quick reference for conda environment management and conda-lock
- Docker & Docker Compose Cheatsheet - Quick reference for Docker containerization and Docker Compose
- Quarto Cheatsheet - Quick reference for Quarto report generation and GitHub Pages hosting
- Makefile Cheatsheet - Quick reference for writing Makefiles
License
This work is licensed under the MIT License.