DSCI 522 Slides

Reproducible and Trustworthy Data Science Workflow

Author

Sky (Kehan) Sheng

Welcome

This site hosts the lecture and lab slides for DSCI 522 - Reproducible and Trustworthy Data Science Workflow, section 001, 2025W1.

Instructor

Sky (Kehan) Sheng
University of British Columbia
www.skysheng.io

Available Slides

Lectures

  • Lecture 1 - Introduction to Reproducible and Trustworthy Data Science Workflow
  • Lecture 2 - Conda-lock, Containerization, and Docker
  • Lecture 3 - Customizing and Building Containers
  • Lecture 4 - Containerizing Python Applications
  • Lecture 5 - Non-interactive scripts
  • Lecture 6 - Reproducible reports
  • Lecture 7 - Data Analysis Pipeline and GNU Make
  • Lecture 8 - Testing Code & Conclusion

Labs

  • Lab 1 Introduction - Introduction to reproducible workflows, lab policies, and getting started
  • Lab 4 End - Peer review, teamwork reflection, and Santa Otter’s gift bag

Cheatsheets

License

This work is licensed under the MIT License.