DSCI 522 Lecture 3

Customizing and Building Containers

Sky Sheng

😺 iClicker: How is your group collaboration going?

(A)

(B)

(C)

Source

Conda Recap: 🧐 How to…

  • Create a new conda environment
  • List all conda environments
  • Activate a conda environment
  • Deactivate a conda environment
  • Remove a conda environment
  • List all packages in a conda environment
  • Remove a package from a conda environment
  • Update a package in a conda environment
  • Update all packages in a conda environment
  • Share a conda environment with someone else
  • Create a conda environment from environment.yml file
  • Duplicate a conda environment

📜 Conda Cheat Sheet Part 1️⃣

Task | Command
Create a new conda environment | conda create -n <env_name>
Create with a specific Python version | conda create --name <env_name> python=3.11
Create with packages | conda create --name <env_name> numpy pandas
List all conda environments | conda env list or conda info --envs
Activate a conda environment | conda activate <env_name>
Deactivate a conda environment | conda deactivate
Remove a conda environment | conda env remove -n <env_name>
List all packages in the current environment | conda list
Remove a package from the current environment | conda remove <package_name>

📜 Conda Cheat Sheet Part 2️⃣

Task | Command
Update a package in the current environment | conda update <package_name>
Remove a package from another environment | conda remove -n <env_name> <package_name>
Update a package in another environment | conda update -n <env_name> <package_name>
Update all packages in the current environment | conda update --all
Share an environment | conda env export --from-history > environment.yml
Create a conda environment from environment.yml | conda env create --file environment.yml
Duplicate a conda environment | conda create --name <new_env> --clone <old_env>

📕 Command Line Notes

  • --name and -n are equivalent.
  • --file and -f are equivalent.
  • In conda-lock, -p and --platform are equivalent and specify the platform (e.g., linux-64, osx-arm64, win-64). In conda itself, -p is short for --prefix.
  • A more complete command-line cheat sheet is available here.

Conda-lock Recap

🔒 What are the commands for the following tasks?

  • Generate a general conda-lock file for all platforms
  • Generate a conda-lock file for a specific platform

📜 Conda-lock Cheat Sheet

Task | Command | Output
General lock file for all platforms | conda-lock lock --file environment.yml | conda-lock.yml
General lock file for one platform (e.g., Linux) | conda-lock lock --file environment.yml -p linux-64 | conda-lock.yml
Explicit lock file for one platform (e.g., Linux) | conda-lock -k explicit --file environment.yml -p linux-64 | conda-linux-64.lock
Explicit lock file from conda-lock.yml | conda-lock render -p linux-64 | conda-linux-64.lock

🙀 conda-lock.yml vs. conda-linux-64.lock?

Feature | conda-lock.yml | conda-linux-64.lock
Format | Unified YAML (multi-platform) | Explicit (single-platform)
Content | Structured metadata + dependencies for all platforms | Simple list of package URLs
File size | Larger (contains all platforms) | Smaller (one platform only)
Installation | conda-lock install --name <env_name> conda-lock.yml | conda create --name <env_name> --file conda-linux-64.lock
Use case | Development across multiple platforms | Production deployment, Docker, single platform
Speed | Slightly slower (conda-lock processes it) | Fastest

Some data science conventions

  • 🐍 Use snake_case for all folder & file names (lowercase + underscore)
  • Use Markdown for documentation.
  • Use YAML (.yml or .yaml) files for data storage and system configuration (NOT for documentation)
    • YAML originally stood for "Yet Another Markup Language"; the official expansion is now "YAML Ain't Markup Language"
    • Structure comes from indentation, so the usual block style needs no braces or square brackets
    • Use # for comments
    • Python-style indentation using spaces (NOT tabs)
    • YAML 1.2 is a superset of JSON, so a JSON file can be parsed by a YAML parser.
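
As a small illustration of these conventions, the sketch below writes a tiny YAML file and then checks it for tab characters. The file name `example_config.yml` and its keys are made up for this example and are not part of the course material.

```shell
# Write a small YAML file; the file name and keys are hypothetical.
cat > example_config.yml <<'EOF'
# Comments start with a hash sign
project_name: demo_analysis
random_seed: 522
packages:
  - pandas
  - numpy
EOF

# YAML does not allow tabs for indentation, so look for any tab character.
tab="$(printf '\t')"
if grep -q "$tab" example_config.yml; then
    echo "tab characters found: replace them with spaces"
else
    echo "no tab characters found"
fi
```

Nesting is expressed only through the two-space indentation under `packages:`, with no braces or brackets.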

🚢 Docker Recap

  1. ⚠️ Command lines are very sensitive to whitespace and quotation marks!
docker run \
    --rm \
    -p 8788:8787 \
    -e PASSWORD="apassword" \
    rocker/rstudio:4.4.2
  2. 🏷️ Don’t use the latest tag for the image; use a specific version tag instead.
  3. A Docker cheat sheet is available in the textbook.
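
To make the tagging advice concrete, here is a rough shell sketch that inspects an image reference and reports whether it pins a version. It is a simplification: it only looks at the text after the last colon, so a reference that includes a registry port (for example `localhost:5000/image`) would be misread. The variable names are chosen for illustration.

```shell
# Image reference taken from the docker run command above.
image="rocker/rstudio:4.4.2"

case "$image" in
    *:latest) verdict="uses latest: pin a specific version instead" ;;
    *:*)      verdict="pinned to tag ${image##*:}" ;;
    *)        verdict="no tag given: Docker will fall back to latest" ;;
esac

echo "$image -> $verdict"
```

The last branch reflects Docker's default behaviour: an image name without a tag is treated as `latest`, which is why writing the tag explicitly matters for reproducibility.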

We don’t like manual work!

Source: Minions movie

🤩 The docker-compose.yml file comes to the rescue!

services:
  analysis-env:
    image: rocker/rstudio:4.4.2
    ports:
      - "8789:8787"
    volumes:
      - .:/home/rstudio/project
    environment:
      PASSWORD: password
    deploy:
      resources:
        limits:
          memory: 5G

😳 But how do we use docker-compose.yml?

  • Launch the container: docker compose up
  • Stop the container: press Ctrl + C in the terminal where you launched the container, and then run docker compose rm to remove the stopped container
  • Read more in the textbook here

Today’s topic: Let’s build our own container!

Docker images for RStudio, VSCode, and Cursor!

  • Daniel has kindly created this tutorial on how to build a Docker image for an R environment managed by renv: docker-renv
  • Sky has created this tutorial on how to build a Docker image for VS Code & Cursor: docker-vscode-cursor

Command line using docker-compose.yml file

Code we run in class: docker-compose.yml practice

Step-by-step instructions:

# 1. Please first `cd` to a local folder of your choice, don't put all files in your home directory!
# Example `cd` command: 
cd /Users/skysheng/Desktop/github/dsci522

# 2. make a new folder called `demo_docker` (or any other name you like)
mkdir demo_docker

# 3. move into that folder we just created
cd demo_docker

# 4. create a docker-compose.yml file using nano
nano docker-compose.yml


  5. Copy and paste the following code into the docker-compose.yml file. The local port is set to 8789, the password is set to password, and the username is rstudio.
services:
  analysis-env:
    image: rocker/rstudio:4.4.2
    ports:
      - "8789:8787"
    volumes:
      - .:/home/rstudio/project
    environment:
      PASSWORD: password
    deploy:
      resources:
        limits:
          memory: 5G
  • You can also create this file using a graphical user interface (GUI) editor such as VS Code.
  • If you used nano to create this file, press Ctrl + X to exit; nano will ask whether you want to save the file. Press Y and then press Enter to save it.
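
If you prefer not to use an interactive editor, a shell here-document can create the same file in one step. This is a sketch of an alternative, not the method used in class; the contents match the docker-compose.yml shown above.

```shell
# Create docker-compose.yml without opening an editor.
# Quoting 'EOF' stops the shell from expanding anything inside the block.
cat > docker-compose.yml <<'EOF'
services:
  analysis-env:
    image: rocker/rstudio:4.4.2
    ports:
      - "8789:8787"
    volumes:
      - .:/home/rstudio/project
    environment:
      PASSWORD: password
    deploy:
      resources:
        limits:
          memory: 5G
EOF
```

Because the here-document preserves the text exactly as typed, the indentation must already be correct (spaces only) when you paste it.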


# 6. Print out the content of the file to make sure it is correct.
cat docker-compose.yml

# 7. Launch the container using docker compose files.
docker compose up
  8. After the container is launched, your terminal will appear to hang because it stays attached to the running container. Open your browser and go to http://localhost:8789 to access RStudio.

  9. To stop the container, press Ctrl + C in the terminal where you launched the container, and then type:

# 10. Remove the container.
docker compose rm

Command line creating Dockerfile

Code we run in class: Dockerfile practice

Step-by-step instructions:

# 1. Please first `cd` to a local folder of your choice, don't put all files in your home directory!
# Example `cd` command: 
cd /Users/skysheng/Desktop/github/dsci522

# 2. make a new folder called `demo_docker` (or any other name you like)
mkdir demo_docker

# 3. move into that folder we just created
cd demo_docker

# 4. create an environment.yml file using nano
nano environment.yml


  5. Copy and paste the following code into the environment.yml file.
name: my_env
channels:
- conda-forge
dependencies:
- conda-lock=3.0.4
- pandas=2.3.3
- pandera=0.26.1
- pip=25.3
- python=3.11.14
- pip:
  - deepchecks==0.19.1
  • You can also create this file using a graphical user interface (GUI) editor such as VS Code.
  • If you used nano to create this file, press Ctrl + X to exit; nano will ask whether you want to save the file. Press Y and then press Enter to save it.


# 6. Print out the content of the file to make sure it is correct.
cat environment.yml
  7. Create a conda-lock file using one of the following commands:
  • MacBook users with Apple Silicon chips need the following command to create an explicit lock file for Linux:
conda-lock -k explicit --file environment.yml -p linux-aarch64
  • Everyone else can use the following command:
conda-lock -k explicit --file environment.yml -p linux-64
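
The choice between the two commands can also be scripted. The sketch below is a convenience built on assumptions, not part of the class instructions: it reads the machine architecture with `uname -m`, picks the matching Linux platform string, and prints the conda-lock command rather than running it. The variable names `arch` and `platform` are chosen for illustration.

```shell
# Pick the Linux platform that matches this machine's CPU architecture.
# arm64 (macOS on Apple Silicon) and aarch64 (Linux on ARM) both map to linux-aarch64.
arch="$(uname -m)"
case "$arch" in
    arm64|aarch64) platform="linux-aarch64" ;;
    *)             platform="linux-64" ;;
esac

# Print the command so it can be reviewed before running it.
echo "conda-lock -k explicit --file environment.yml -p $platform"
```

Every architecture other than arm64 or aarch64 falls through to linux-64, which mirrors the "everyone else" case in the instructions.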


  8. Create a Dockerfile using nano, or use a GUI editor such as VS Code.
# 8. Create the Dockerfile using nano.
nano Dockerfile 


  9. Copy and paste the following code into the Dockerfile. We use the Jupyter minimal-notebook image as an example and copy the conda-lock file into the container.
  • MacBook users with Apple Silicon chips need the following code:
FROM quay.io/jupyter/minimal-notebook:afe30f0c9ad8

COPY conda-linux-aarch64.lock /tmp/conda-linux-aarch64.lock
  • Everyone else can use the following code:
FROM quay.io/jupyter/minimal-notebook:afe30f0c9ad8

COPY conda-linux-64.lock /tmp/conda-linux-64.lock
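
Because the two Dockerfiles differ only in the lock file name, one template can cover both cases. The following sketch writes the Dockerfile from a `platform` shell variable; that variable and `lock_file` are names chosen here for illustration, and `platform` should be set to `linux-aarch64` on Apple Silicon.

```shell
# Hypothetical variable: set to linux-aarch64 on Apple Silicon Macs.
platform="linux-64"
lock_file="conda-${platform}.lock"

# An unquoted EOF lets the shell substitute ${lock_file} into the file.
cat > Dockerfile <<EOF
FROM quay.io/jupyter/minimal-notebook:afe30f0c9ad8

COPY ${lock_file} /tmp/${lock_file}
EOF

# Show the generated file.
cat Dockerfile
```

The lock file named in the COPY line must exist in the same folder as the Dockerfile when the image is built.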


  10. Build the Docker image locally using the following command:
# 10. Build the Docker image locally and tag it `testing_cmds`.
# Pay attention to the dot at the end of the command: it sets the build context to the current folder.
docker build --tag testing_cmds .


  11. Run the Docker image you just built using one of the following commands:
  • Launch a terminal only (on a Mac with an M4 chip you may run into bugs here):
docker run --rm -it testing_cmds ../../bin/bash
  • Launch Jupyter in a web browser (open the URL printed in the terminal):
docker run --rm -it -p 8888:8888 testing_cmds