1 Introduction

This document contains basic information required to use “Speed”, along with tips, tricks, examples, and references to projects and papers that have used Speed. User contributions of sample jobs and/or references are welcome.

Note: On October 20, 2023, we completed the migration to SLURM from Grid Engine (UGE/AGE) as our job scheduler. This manual has been updated to use SLURM’s syntax and commands. If you are a long-time GE user, refer to Appendix A.2 for key highlights needed to translate your GE jobs to SLURM as well as environment changes. These changes are also elaborated throughout this document and our examples.

1.1 Citing Us

If you wish to cite this work in your acknowledgements, you can use our general DOI found on our GitHub page https://dx.doi.org/10.5281/zenodo.5683642 or a specific version of the manual and scripts from that link individually. You can also use the “cite this repository” feature of GitHub.

1.2 Resources

1.3 Team

Speed is supported by:

We receive support from the rest of AITS teams, such as NAG, SAG, FIS, and DOG.

1.4 What Speed Consists of


Figure 1: Speed


Figure 2: Speed Cluster Hardware Architecture


Figure 3: Speed SLURM Architecture

1.5 What Speed Is Ideal For

1.6 What Speed Is Not

1.7 Available Software

A wide range of open-source and commercial software is available and installed on “Speed”. This includes Abaqus [1], AllenNLP, Anaconda, ANSYS, Bazel, COMSOL, CPLEX, CUDA, Eclipse, Fluent [2], Gurobi, MATLAB [15, 30], OMNeT++, OpenCV, OpenFOAM, OpenMPI, OpenPMIx, ParaView, PyTorch, QEMU, R, Rust, and Singularity, among others. Programming environments include various versions of Python, C++/Java compilers, TensorFlow, OpenGL, OpenISS, and MARF [31].

In particular, there are over 2200 programs available in /encs/bin and /encs/pkg under Scientific Linux 7 (EL7). We are building an equivalent array of programs for the EL9 SPEED2 nodes. To see the packages available, run ls -al /encs/pkg/ on speed.encs. See a complete list in Appendix D.

Note: We do our best to accommodate custom software requests. Python environments can use user-custom installs from within the scratch directory.

1.8 Requesting Access

After reviewing the “What Speed is” (Section 1.5) and “What Speed is Not” (Section 1.6), request access to the “Speed” cluster by emailing: rt-ex-hpc AT encs.concordia.ca.

2 Job Management

We use SLURM as the workload manager. It supports primarily two types of jobs: batch and interactive. Batch jobs are used to run unattended tasks, whereas interactive jobs are ideal for setting up virtual environments, compilation, and debugging.

Note: In the following instructions, anything bracketed like, <>, indicates a label/value to be replaced (the entire bracketed term needs replacement).

Job instructions in a script start with the #SBATCH prefix, for example:

    #SBATCH --mem=100M -t 600 -J <job-name> -A <slurm account>
    #SBATCH -p pg --gpus=2 --mail-type=ALL

For complex compute steps within a script, use srun. We recommend using salloc for interactive jobs as it supports multiple steps. However, srun can also be used to start interactive jobs (see Section 2.8). Common and required job parameters include:

2.1 Getting Started

Before getting started, please review the “What Speed is” (Section 1.5) and “What Speed is Not” (Section 1.6). Once your GCS ENCS account has been granted access to “Speed”, use your GCS ENCS account credentials to create an SSH connection to speed (an alias for speed-submit.encs.concordia.ca).

All users are expected to have a basic understanding of Linux and its commonly used commands (see Appendix B for resources).

2.1.1 SSH Connections

Requirements to create connections to “Speed”:

  1. Active GCS ENCS user account: Ensure you have an active GCS ENCS user account with permission to connect to Speed (see Section 1.8).
  2. VPN Connection (for off-campus access): If you are off-campus, you will need to establish an active connection to Concordia’s VPN, which requires a Concordia netname.
  3. Terminal Emulator for Windows: Windows systems use a terminal emulator such as PuTTY, Cygwin, or MobaXterm.
  4. Terminal for macOS: macOS systems have a built-in Terminal app or xterm that comes with XQuartz.

To create an SSH connection to Speed, open a terminal window and type the following command, replacing <ENCSusername> with your ENCS account’s username:

    ssh <ENCSusername>@speed.encs.concordia.ca

For detailed instructions on securely connecting to a GCS server, refer to the AITS FAQ: How do I securely connect to a GCS server?

2.1.2 Environment Set Up

After creating an SSH connection to Speed, you will need to make sure the srun, sbatch, and salloc commands are available to you. To check this, type each command at the prompt and press Enter. If “command not found” is returned, you need to make sure your $PATH includes /local/bin. You can check your $PATH by typing:

    echo $PATH
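The same check can be scripted. The following is a minimal sketch in POSIX shell syntax (usable from bash), not part of the Speed templates; the function name path_has_dir is illustrative only.

```shell
# Return success if directory $2 appears as an entry of the
# colon-separated path list $1.
path_has_dir() {
    case ":$1:" in
        *":$2:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

if path_has_dir "$PATH" /local/bin; then
    echo "/local/bin is in PATH"
else
    echo "/local/bin is missing from PATH"
fi
```

Wrapping the list in colons before matching avoids false positives such as /usr/local/bin being mistaken for /local/bin.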

The next step is to set up your cluster-specific storage, “speed-scratch”. To do so, execute the following command from within your home directory:

    mkdir -p /speed-scratch/$USER && cd /speed-scratch/$USER

Next, copy a job template to your cluster-specific storage

Tip: the default shell for GCS ENCS users is tcsh. If you would like to use bash, please contact rt-ex-hpc AT encs.concordia.ca.

Note: If you encounter a “command not found” error after logging in to Speed, your user account may have defunct Grid Engine environment commands. See Appendix A.2 for instructions on how to resolve this issue.

2.2 Job Submission Basics

Preparing your job for submission is fairly straightforward. Start by basing your job script on one of the examples available in the src/ directory of our GitHub repository. You can clone the repository to get the examples to start with via the command line:

    git clone --depth=1 https://github.com/NAG-DevOps/speed-hpc.git
    cd speed-hpc/src

The job script is a shell script that contains directives, module loads, and user scripting. To quickly run some sample jobs, use the following commands:

    sbatch -p ps -t 10 env.sh
    sbatch -p ps -t 10 bash.sh
    sbatch -p ps -t 10 manual.sh
    sbatch -p pg -t 10 lambdal-singularity.sh

2.2.1 Directives

Directives are comments included at the beginning of a job script that set the shell and the options for the job scheduler. The shebang directive is always the first line of a script. In your job script, this directive sets which shell your script’s commands will run in. On “Speed”, we recommend that your script use a shell from the /encs/bin directory.

To use the tcsh shell, start your script with #!/encs/bin/tcsh. For bash, start with #!/encs/bin/bash.

Directives that start with #SBATCH set the options for the cluster’s SLURM job scheduler. The following provides an example of some essential directives:

    #SBATCH --job-name=<jobname>        ## or -J. Give the job a name
    #SBATCH --mail-type=<type>          ## set type of email notifications
    #SBATCH --chdir=<directory>         ## or -D, set working directory for the job
    #SBATCH --nodes=1                   ## or -N, node count required for the job
    #SBATCH --ntasks=1                  ## or -n, number of tasks to be launched
    #SBATCH --cpus-per-task=<corecount> ## or -c, core count requested, e.g. 8 cores
    #SBATCH --mem=<memory>              ## assign memory for this job,
                                        ## e.g., 32G memory per node

Replace the following to adjust the job script for your project(s)

Example with short option equivalents:

    #SBATCH -J myjob              ## Job’s name set to ’myjob’
    #SBATCH --mail-type=ALL       ## Receive all email type notifications
    #SBATCH -D ./                 ## Use current directory as working directory
    #SBATCH -N 1                  ## Node count required for the job
    #SBATCH -n 1                  ## Number of tasks to be launched
    #SBATCH -c 8                  ## Request 8 cores
    #SBATCH --mem=32G             ## Allocate 32G memory per node

Tip: If you are unsure about memory footprints, err on the side of assigning a generous amount of memory to your job, so that it does not get prematurely terminated. You can refine --mem values for future jobs by monitoring the size of a job’s active memory space on speed-submit with:

    sacct -j <jobID>
    sstat -j <jobID>

This can be customized to show specific columns:

    sacct -o jobid,maxvmsize,ntasks%7,tresusageouttot%25 -j <jobID>
    sstat -o jobid,maxvmsize,ntasks%7,tresusageouttot%25 -j <jobID>

Memory-footprint efficiency values (seff) are also provided for completed jobs in the final email notification as “maxvmsize”. Jobs that request a low-memory footprint are more likely to load on a busy cluster.

Other essential options are --time, or -t, and --account, or -A.

2.2.2 Working with Modules

After setting the directives in your job script, the next section typically involves loading the necessary software modules. The module command is used to manage the user environment; make sure to load all the modules your job depends on. You can check available modules with the module avail command. Loading the correct modules ensures that your environment is properly set up for execution.

To list all available modules, or only those for a particular program (matlab, for example):

    module avail
    module -t avail matlab  ## show the list for a particular program (e.g., matlab)
    module -t avail m       ## show the list for all programs starting with m

For example, insert the following in your script to load the matlab/R2023a module:

    module load matlab/R2023a/default

Note: you can remove a module from active use by replacing load with unload.

To list loaded modules:

    module list

To purge all software in your working environment:

    module purge

2.2.3 User Scripting

The final part of the job script involves the commands that will be executed by the job. This section should include all necessary commands to set up and run the tasks your script is designed to perform. You can use any Linux command in this section, ranging from a simple executable call to a complex loop iterating through multiple commands.

Best Practice: prefix any compute-heavy step with srun. This ensures you gain proper insights on the execution of your job.

Each software program may have its own execution framework. It is the responsibility of the script’s author (i.e., you) to review the software’s documentation and understand its requirements. Your script should clearly specify the location of input and output files and the degree of parallelism needed.

Jobs that involve multiple interactions with data input and output files should make use of TMPDIR, a scheduler-provided workspace nearly 1 TB in size. TMPDIR is created on the local disk of the compute node at the start of a job, offering faster I/O operations compared to shared storage (provided over NFS).

A sample job script using TMPDIR is available at /home/n/nul-uge/templateTMPDIR.sh. The job is instructed to:

  1. Change to $TMPDIR.
  2. Make the new directory input.
  3. Copy data from $SLURM_SUBMIT_DIR/references/ to input/ ($SLURM_SUBMIT_DIR represents the current working directory).
  4. Make the new directory results.
  5. Execute the program (which takes input from $TMPDIR/input/ and writes output to $TMPDIR/results/).
  6. Copy the end results to an existing directory, processed, located in the current working directory.

TMPDIR only exists for the duration of the job, so it is very important to copy relevant results from it at the job’s end.
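The staging pattern described above can be sketched as a small shell function. This is an illustration under stated assumptions rather than the contents of templateTMPDIR.sh: the function name stage_and_run is hypothetical, cp is used in place of rsync to keep the sketch dependency-free, and the directory names follow the description above.

```shell
# stage_and_run SUBMIT_DIR WORK_DIR COMMAND [ARGS...]
# SUBMIT_DIR is normally $SLURM_SUBMIT_DIR and WORK_DIR is normally $TMPDIR.
stage_and_run() {
    sar_submit=$1
    sar_work=$2
    shift 2

    # Create the input and results directories in the workspace.
    mkdir -p "$sar_work/input" "$sar_work/results" || return 1
    # Stage the reference data into the node-local workspace.
    cp -R "$sar_submit/references/." "$sar_work/input/" || return 1

    # Run the command from inside the workspace and remember its status.
    ( cd "$sar_work" && "$@" ) && sar_status=0 || sar_status=$?

    # Copy results back even if the command failed, because the
    # workspace disappears when the job ends.
    cp -R "$sar_work/results/." "$sar_submit/processed/" || return 1
    return "$sar_status"
}

# Example call inside a job script (my_program is a hypothetical name):
# stage_and_run "$SLURM_SUBMIT_DIR" "$TMPDIR" srun my_program input results
```

Copying results back before returning the command's exit status means partial output is preserved for inspection when a run fails.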

2.3 Sample Job Script

Here is a basic job script, tcsh.sh, shown in Figure 4. You can copy it from our GitHub repository.

    #!/encs/bin/tcsh

    #SBATCH --job-name=tcsh-test
    #SBATCH --mem=1G

    sleep 30
    module load gurobi/8.1.0
    module list

Figure 4: Source code for tcsh.sh

The first line is the shell declaration (also known as a shebang) and sets the shell to tcsh. The lines that begin with #SBATCH are directives for the scheduler.

The script then:

  1. Sleeps on a node for 30 seconds.
  2. Uses the module command to load the gurobi/8.1.0 environment.
  3. Prints the list of loaded modules into a file.

The scheduler command, sbatch, is used to submit (non-interactive) jobs. From an ssh session on “speed-submit”, submit this job with

    sbatch ./tcsh.sh

You will see Submitted batch job 2654, where 2654 is the job ID assigned. The commands squeue and sinfo can be used to look at the status of the cluster:

[serguei@speed-submit src] % squeue -l
Thu Oct 19 11:38:54 2023
 2641        ps interact   b_user  RUNNING   19:16:09 1-00:00:00      1 speed-07
 2652        ps interact   a_user  RUNNING      41:40 1-00:00:00      1 speed-07
 2654        ps tcsh-tes  serguei  RUNNING       0:01 7-00:00:00      1 speed-07
[serguei@speed-submit src] % sinfo
ps*          up 7-00:00:00     14  drain speed-[08-10,12,15-16,20-22,30-32,35-36]
ps*          up 7-00:00:00      1    mix speed-07
ps*          up 7-00:00:00      7   idle speed-[11,19,23-24,29,33-34]
pg           up 1-00:00:00      1  drain speed-17
pg           up 1-00:00:00      3   idle speed-[05,25,27]
pt           up 7-00:00:00      7   idle speed-[37-43]
pa           up 7-00:00:00      4   idle speed-[01,03,25,27]

Remember that you only have 30 seconds before the job is essentially over, so if you do not see a similar output, either adjust the sleep time in the script, or execute the squeue statement more quickly. The squeue output listed above shows that your job 2654 is running on node speed-07, and its time limit is 7 days, etc.

Once the job finishes, there will be a new file in the directory that the job was started from, named slurm-<job id>.out; in this example, the file is slurm-2654.out. This file represents the standard output (and error, if there is any) of the job in question. If you look at the contents of your newly created file, you will see that it contains the output of the module list command. Important information is often written to this file.
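When scripting around sbatch, the job ID can be taken from the confirmation message to predict the default output file name. The sketch below assumes the message format shown in this section (“Submitted batch job <job id>”); the function name job_id_from_message is illustrative, and the message is hard-coded so the example runs without a scheduler.

```shell
# Extract the job ID (the last word) from sbatch's confirmation message.
job_id_from_message() {
    printf '%s\n' "$1" | awk '{ print $NF }'
}

# In practice: message=$(sbatch ./tcsh.sh)
message="Submitted batch job 2654"
job_id=$(job_id_from_message "$message")
echo "slurm-${job_id}.out"
```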

2.4 Common Job Management Commands Summary

Here is a summary of useful job management commands for handling various aspects of job submission and monitoring on the Speed cluster:

2.5 Advanced sbatch Options

In addition to the basic sbatch options presented earlier, there are several advanced options that are generally useful:

Note: sbatch options can be specified during the job-submission command, and these override existing script options (if present). The syntax is

sbatch [options] PATHTOSCRIPT

but unlike in the script, the options are specified without the leading #SBATCH e.g.:

sbatch -J sub-test --chdir=./ --mem=1G ./tcsh.sh

2.6 Array Jobs

Array jobs are those that start a batch job or a parallel job multiple times. Each iteration of the job array is called a task and receives a unique job ID. Array jobs are particularly useful for running a large number of similar tasks with slight variations.

To submit an array job (only supported for batch jobs), use the --array option of the sbatch command as follows:

sbatch --array=n-m[:s] <batch_script>



Output files for Array Jobs:
The default output and error files are named slurm-<job_id>_<task_id>.out. This means that Speed creates an output and an error file for each task generated by the array job, as well as one for the super-ordinate array job. To alter this behavior, use the -o and -e options of sbatch.

For more details about array job options, please review the manual pages for sbatch by running man sbatch at the command line on speed-submit.
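Inside an array job, each task typically uses its task index to select its own input. The following is a minimal sketch: SLURM_ARRAY_TASK_ID is set by SLURM for array tasks, while the data/input_<N>.txt naming scheme and the function name input_for_task are hypothetical. The fallback value of 1 only exists so the sketch can be tried outside a job.

```shell
# Map an array task index to the input file that task should process.
# The file naming scheme here is an assumption for illustration.
input_for_task() {
    echo "data/input_$1.txt"
}

# Use the scheduler-provided index, or 1 when run outside an array job.
task_id=${SLURM_ARRAY_TASK_ID:-1}
echo "Task ${task_id} would process $(input_for_task "$task_id")"
```

Submitted with, for example, sbatch --array=1-10 <batch_script>, each of the ten tasks would report a different input file.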

2.7 Requesting Multiple Cores (i.e., Multithreading Jobs)

For jobs that can take advantage of multiple machine cores, you can request up to 32 cores (per job) in your script using the following options:

#SBATCH -n <#cores for processes>

or

#SBATCH -n 1
#SBATCH -c <#cores for threads of a single process>

Both sbatch and salloc support -n on the command line. It should always be set, either in the script or on the command line, because the default is n=1.

Important Considerations:

Note: --ntasks or --ntasks-per-node (-n) refers to processes (usually the ones run with srun). --cpus-per-task (-c) corresponds to threads per process.

Some programs consider them equivalent, while others do not. For example, Fluent uses --ntasks-per-node=8 and --cpus-per-task=1, whereas others may set --cpus-per-task=8 and --ntasks-per-node=1. If one of these is not 1, some applications need to be configured to use n * c total cores.
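For applications that must be told the total core count, the n * c product can be computed from the scheduler's environment. This is a sketch under assumptions: SLURM_NTASKS and SLURM_CPUS_PER_TASK are set by SLURM when the corresponding options are given, and both are defaulted to 1 here so the example also runs outside a job. The function name total_cores is illustrative.

```shell
# Total cores = number of tasks (n) multiplied by CPUs per task (c).
total_cores() {
    echo $(( $1 * $2 ))
}

n=${SLURM_NTASKS:-1}
c=${SLURM_CPUS_PER_TASK:-1}
echo "n=$n c=$c total=$(total_cores "$n" "$c")"
```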

The core count associated with a job appears under “AllocCPUS” in the sacct -j <job-id> output.

[serguei@speed-submit src] % squeue -l
Thu Oct 19 20:32:32 2023
2652        ps interact   a_user  RUNNING   9:35:18 1-00:00:00      1 speed-07
[serguei@speed-submit src] % sacct -j 2652
JobID           JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
2652         interacti+         ps     speed1         20    RUNNING      0:0
2652.intera+ interacti+                speed1         20    RUNNING      0:0
2652.extern      extern                speed1         20    RUNNING      0:0
2652.0       gydra_pmi+                speed1         20  COMPLETED      0:0
2652.1       gydra_pmi+                speed1         20  COMPLETED      0:0
2652.2       gydra_pmi+                speed1         20     FAILED      7:0
2652.3       gydra_pmi+                speed1         20     FAILED      7:0
2652.4       gydra_pmi+                speed1         20  COMPLETED      0:0
2652.5       gydra_pmi+                speed1         20  COMPLETED      0:0
2652.6       gydra_pmi+                speed1         20  COMPLETED      0:0
2652.7       gydra_pmi+                speed1         20  COMPLETED      0:0

2.8 Interactive Jobs

Interactive job sessions allow you to interact with the system in real-time. These sessions are particularly useful for tasks such as testing, debugging, optimizing code, setting up environments, and other preparatory work before submitting batch jobs.

2.8.1 Command Line

To request an interactive job session, use the salloc command with appropriate options. This is similar to submitting a batch job but allows you to run shell commands interactively within the allocated resources. For example:

salloc -J interactive-test --mem=1G -p ps -n 8

Within the allocated salloc session, you can run shell commands as usual. It is recommended to use srun for compute-intensive steps within salloc. If you need a quick, short job just to compile something on a GPU node, you can use an interactive srun directly. For example, a 1-hour allocation:

For tcsh:

srun --pty -n 8 -p pg --gpus=1 --mem=1G -t 60 /encs/bin/tcsh

For bash:

srun --pty -n 8 -p pg --gpus=1 --mem=1G -t 60 /encs/bin/bash

2.8.2 Graphical Applications

To run graphical UI applications (e.g., MATLAB, Abaqus CAE, IDEs like PyCharm, VSCode, Eclipse, etc.) on Speed, you need to enable X11 forwarding from your client machine to Speed and then to the compute node. To do so, follow these steps:

  1. Run an X server on your client machine:

    For more details, see How do I remotely launch X(Graphical) applications?

  2. Verify that X11 forwarding is enabled by printing the DISPLAY variable:

         echo $DISPLAY

  3. Start an interactive session with X11 forwarding enabled (use the --x11 option with salloc or srun), for example:

         salloc -p ps --x11=first --mem=4G -t 0-06:00

  4. Once landed on a compute node, verify DISPLAY again.
  5. Set the XDG_RUNTIME_DIR variable to a directory in your speed-scratch space:

         mkdir -p /speed-scratch/$USER/run-dir
         setenv XDG_RUNTIME_DIR /speed-scratch/$USER/run-dir

  6. Launch your graphical application:

         module load matlab/R2023a/default

Note: with X11 forwarding, the graphical rendering happens on your client machine. That is, you are not using GPUs on Speed to render graphics; instead, all graphical information is forwarded from Speed to your desktop or laptop over X11, which in turn renders it using its own graphics card. Thus, for GPU rendering jobs, either keep them non-interactive or use VirtualGL.

Here’s an example of starting PyCharm (see Figure 5). Note: If using VSCode, it’s currently only supported with the --no-sandbox option.

TCSH version:

ssh -X speed (XQuartz xterm, PuTTY or MobaXterm have X11 forwarding too)
[speed-submit] [/home/c/carlos] > echo $DISPLAY
[speed-submit] [/home/c/carlos] > cd /speed-scratch/$USER
[speed-submit] [/speed-scratch/carlos] > echo $DISPLAY
[speed-submit] [/speed-scratch/carlos] > salloc -pps --x11=first --mem=4Gb -t 0-06:00
[speed-07] [/speed-scratch/carlos] > echo $DISPLAY
[speed-07] [/speed-scratch/carlos] > hostname
[speed-07] [/speed-scratch/carlos] > setenv XDG_RUNTIME_DIR /speed-scratch/$USER/run-dir
[speed-07] [/speed-scratch/carlos] > /speed-scratch/nag-public/bin/pycharm.sh

BASH version:

bash-3.2$ ssh -X speed (XQuartz xterm, PuTTY or MobaXterm have X11 forwarding too)
serguei@speed’s password:
[serguei@speed-submit ~] % echo $DISPLAY
[serguei@speed-submit ~] % salloc -p ps --x11=first --mem=4Gb -t 0-06:00
bash-4.4$ echo $DISPLAY
bash-4.4$ hostname
bash-4.4$ export XDG_RUNTIME_DIR=/speed-scratch/$USER/run-dir
bash-4.4$ /speed-scratch/nag-public/bin/pycharm.sh


Figure 5: Launching PyCharm on a Speed Node
2.8.3 Jupyter Notebooks

Jupyter Notebook in Singularity

To run Jupyter Notebooks using Singularity (for more on Singularity, see Section 2.16), follow these steps:

  1. Connect to Speed, e.g. interactively, using salloc.
  2. Load the Singularity module:

         module load singularity/3.10.4/default

  3. Execute this Singularity command on a single line, or save it in a shell script (see our GitHub) so that you can invoke it easily:

         srun singularity exec -B $PWD\:/speed-pwd,/speed-scratch/$USER\:/my-speed-scratch,/nettemp \
         --env SHELL=/bin/bash --nv /speed-scratch/nag-public/openiss-cuda-conda-jupyter.sif \
         /bin/bash -c '/opt/conda/bin/jupyter notebook --no-browser --notebook-dir=/speed-pwd \
         --ip="*" --port=8888 --allow-root'

  4. In a new terminal window, create an SSH tunnel between your computer and the node (speed-XX) where Jupyter is running, using speed-submit as a “jump server” (for PuTTY, see Figure 6 and Figure 7):

         ssh -L 8888:speed-XX:8888 <ENCS-username>@speed-submit.encs.concordia.ca

    Do not close the tunnel once it is established.

  5. Open a browser, and copy your Jupyter’s token (it’s printed to you in the terminal) and paste it in the browser’s URL field. In our case, the URL is:


  6. Access the Jupyter Notebook interface in your browser.

Figure 6: SSH tunnel configuration 1

Figure 7: SSH tunnel configuration 2

Figure 8: Jupyter running on a Speed node
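Because the node name changes from session to session, it can help to generate the tunnel command from step 4 rather than type it by hand. The sketch below only prints the command; the function name tunnel_command is illustrative, and the arguments shown are the same placeholders used in step 4.

```shell
# tunnel_command USER NODE PORT
# Print the ssh port-forwarding command that tunnels PORT on the local
# machine to PORT on the compute node, via speed-submit.
tunnel_command() {
    echo "ssh -L $3:$2:$3 $1@speed-submit.encs.concordia.ca"
}

# Replace the placeholders with your username and the node from salloc.
tunnel_command "<ENCS-username>" "speed-XX" 8888
```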

Another sample is the OpenISS-derived containers with Conda and Jupyter; see Section 2.15.4 for details.

JupyterLab in Conda and PyTorch

For setting up JupyterLab with Conda and PyTorch, follow these steps:

JupyterLab + PyTorch in Python venv

This is an example of JupyterLab running in a Python virtual environment (venv), with PyTorch on Speed.

Note: Use of Python virtual environments is preferred over Conda at Alliance Canada clusters. If you prefer to make jobs that are more compatible between Speed and Alliance clusters, use Python venvs. See https://docs.alliancecan.ca/wiki/Anaconda/en and https://docs.alliancecan.ca/wiki/JupyterNotebook.

2.8.4 Visual Studio Code

This is an example of running VScode. It is similar to Jupyter notebooks, but it does not use containers. Note: this is a web-based version. A local (workstation) to remote (speed-node) client-server version also exists, but it is for advanced users and is out of scope here (so no support; use it at your own risk).

Figure 11: VScode running on a Speed node

2.9 Scheduler Environment Variables

The scheduler provides several environment variables that can be useful in your job scripts. These variables can be accessed within the job using commands like env or printenv. Many of these variables start with the prefix SLURM.

Here are some of the most useful environment variables:

For a more comprehensive list of environment variables, refer to the SLURM documentation for Input Environment Variables and Output Environment Variables.

An example script that utilizes some of these environment variables is in Figure 12.

    #!/encs/bin/tcsh

    #SBATCH --job-name=tmpdir      ## Give the job a name
    #SBATCH --mail-type=ALL        ## Receive all email type notifications
    #SBATCH --chdir=./             ## Use current directory as working directory
    #SBATCH --nodes=1
    #SBATCH --ntasks=1
    #SBATCH --cpus-per-task=8      ## Request 8 cores
    #SBATCH --mem=32G              ## Assign 32G memory per node

    cd $TMPDIR
    mkdir input
    rsync -av $SLURM_SUBMIT_DIR/references/ input/
    mkdir results
    srun STAR --inFiles $TMPDIR/input --parallel $SRUN_CPUS_PER_TASK --outFiles $TMPDIR/results
    rsync -av $TMPDIR/results/ $SLURM_SUBMIT_DIR/processed/

Figure 12: Source code for tmpdir.sh
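To see which scheduler variables are actually set in a given job, they can be listed at the start of the job script. This is a minimal sketch; the function name slurm_vars is illustrative, and SLURM_JOB_ID is used only as a marker for “running inside a job”.

```shell
# List all environment variables whose names start with SLURM, sorted.
slurm_vars() {
    env | grep '^SLURM' | sort
}

if [ -n "${SLURM_JOB_ID:-}" ]; then
    slurm_vars
else
    echo "Not inside a SLURM job: no SLURM variables to show"
fi
```

The output ends up in the job's slurm-<job id>.out file, which makes it a convenient record of how the job was configured.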

2.10 SSH Keys for MPI

Some programs, such as Fluent, utilize MPI (Message Passing Interface) for parallel processing. MPI requires ‘passwordless login’, which is achieved through SSH keys. Here are the steps to set up SSH keys for MPI:

2.11 Creating Virtual Environments

The following documentation is specific to Speed. Other clusters may have their own requirements. Virtual environments are typically created using Conda or Python. Another option is Singularity (detailed in Section 2.16). These environments are usually created once during an interactive session before submitting a batch job to the scheduler. The job script submitted to the scheduler should:

  1. Activate the virtual environment.
  2. Use the virtual environment.
  3. Deactivate the virtual environment at the end of the job.
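For a Python venv, the three steps above can be wrapped in one helper used from a bash job script. This is a sketch under assumptions: it relies on the standard bin/activate script that python -m venv creates (which defines a deactivate function), the name run_in_venv is hypothetical, and Conda environments would use conda activate and conda deactivate instead.

```shell
# run_in_venv VENV_DIR COMMAND [ARGS...]
run_in_venv() {
    if [ ! -f "$1/bin/activate" ]; then
        echo "No virtual environment found in $1" >&2
        return 1
    fi
    . "$1/bin/activate"                      # 1. activate the environment
    shift
    "$@" && riv_status=0 || riv_status=$?    # 2. use it to run the command
    deactivate                               # 3. deactivate at the end
    return "$riv_status"
}

# Example call inside a job script (the path and train.py are hypothetical):
# run_in_venv /speed-scratch/$USER/tmp/testenv srun python train.py
```

The helper returns the command's own exit status, so a failing step is still visible to the scheduler after the environment has been deactivated.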

2.11.1 Anaconda

To create an Anaconda environment, follow these steps:

  1. Request an interactive session

         salloc -p pg --gpus=1

  2. Load the Anaconda module and create your Anaconda environment in your speed-scratch directory by using the --prefix option (without this option, the environment will be created in your home directory by default).

         module load anaconda3/2023.03/default
         conda create --prefix /speed-scratch/$USER/myconda

  3. List environments (to view your conda environment)

         conda info --envs
         # conda environments:
         base                  *  /encs/pkg/anaconda3-2023.03/root

  4. Activate the environment

         conda activate /speed-scratch/$USER/myconda

  5. Add pip to your environment (this will install pip and pip’s dependencies, including python, into the environment.)

         conda install pip

A consolidated example using Conda:

salloc -p pg --gpus=1 --mem=10G -A <slurm account name>
cd /speed-scratch/$USER
module load python/3.11.0/default
conda create -p /speed-scratch/$USER/pytorch-env
conda activate /speed-scratch/$USER/pytorch-env
conda install python=3.11.0
pip3 install torch torchvision torchaudio --index-url <PyTorch wheel index URL>
conda deactivate
exit # end the salloc session

If you encounter a “no space left” error while creating Conda environments, please refer to Appendix B.3. Most likely, you omitted --prefix or did not set the environment variables described below.

Important Note: pip (and pip3) are package installers for Python. When you use pip install, it installs packages from the Python Package Index (PyPI), whereas conda install installs packages from Anaconda’s repository.

Conda Env without --prefix

If you don’t want to use the --prefix option every time you create a new environment and do not want to use the default home directory, you can create a new directory and set the following variables to point to the newly created directory, e.g.:

mkdir -p /speed-scratch/$USER/conda
setenv CONDA_ENVS_PATH /speed-scratch/$USER/conda
setenv CONDA_PKGS_DIRS /speed-scratch/$USER/conda/pkg

If you want to make these changes permanent, add the variables to your .tcshrc or .bashrc (depending on the default shell you are using).
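The setenv lines above are tcsh syntax. For users whose shell is bash, an equivalent can be sketched as follows; the function name setup_conda_dirs is hypothetical, and the variables are the same two shown in the tcsh example.

```shell
# setup_conda_dirs BASE_DIR
# Create BASE_DIR and point Conda's environment and package
# directories at it.
setup_conda_dirs() {
    mkdir -p "$1" || return 1
    export CONDA_ENVS_PATH="$1"
    export CONDA_PKGS_DIRS="$1/pkg"
}

# On Speed this would be called as:
# setup_conda_dirs /speed-scratch/$USER/conda
```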

2.11.2 Python

Setting up a Python virtual environment is straightforward. Here’s an example that uses a Python virtual environment:

salloc -p pg --gpus=1 --mem=10G -A <slurm account name>
cd /speed-scratch/$USER
module load python/3.9.1/default
mkdir -p /speed-scratch/$USER/tmp
setenv TMPDIR /speed-scratch/$USER/tmp
setenv TMP /speed-scratch/$USER/tmp
python -m venv $TMPDIR/testenv   # testenv is the name of the virtual environment
source /speed-scratch/$USER/tmp/testenv/bin/activate.csh
pip install modules...

See, e.g., gurobi-with-python.sh

Important Note: our partition ps is used for CPU jobs, while pg, pt, and cl are used for GPU jobs. You do not need to use --gpus when preparing environments for CPU jobs.

Note: Python environments are also preferred over Conda in some clusters; see a note in Section

2.12 Example Job Script: Fluent

    #!/encs/bin/tcsh

    #SBATCH --job-name=flu10000    ## Give the job a name
    #SBATCH --mail-type=ALL        ## Receive all email type notifications
    #SBATCH --chdir=./             ## Use current directory as working directory
    #SBATCH --nodes=1              ## Number of nodes to run on
    #SBATCH --ntasks-per-node=32   ## Number of cores
    #SBATCH --cpus-per-task=1      ## Number of MPI threads
    #SBATCH --mem=160G             ## Assign 160G memory per node

    module avail ansys
    module load ansys/19.2/default

    set FLUENTNODES = "`scontrol show hostnames`"
    set FLUENTNODES = `echo $FLUENTNODES | tr ' ' ','`

    srun fluent 3ddp \
            -g -t$SLURM_NTASKS \
            -g-cnf=$FLUENTNODES \
            -i $SLURM_SUBMIT_DIR/fluentdata/info.jou > call.txt

    srun rsync -av $TMPDIR/ $SLURM_SUBMIT_DIR/fluentparallel/

Figure 13: Source code for fluent.sh

The job script in Figure 13 runs Fluent in parallel over 32 cores. Notable aspects of this script include requesting e-mail notifications (--mail-type), defining the parallel environment for Fluent with -t$SLURM_NTASKS and -g-cnf=$FLUENTNODES, and setting $TMPDIR as the in-job location for the “moment” rfile.out file. The script also copies everything from $TMPDIR to a directory in the user’s NFS-mounted home after the job completes. Job progress can be monitored by examining the standard-out file (e.g., slurm-249.out), and/or by examining the “moment” file in TMPDIR on the node running the job (usually /disk/nobackup/<yourjob>, which starts with your job ID). Be cautious with journal-file paths.
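The two FLUENTNODES lines in Figure 13 use tcsh syntax to turn the job's host list into a comma-separated string. For a bash job script, the same transformation can be sketched as below; the function name join_hosts is illustrative, and the host names are sample values so the example runs without a scheduler.

```shell
# Join newline-separated host names (as printed by
# "scontrol show hostnames") into one comma-separated string.
join_hosts() {
    paste -s -d, -
}

# Inside a job: FLUENTNODES=$(scontrol show hostnames | join_hosts)
printf 'speed-07\nspeed-08\n' | join_hosts
```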

2.13 Example Job: EfficientDet

The following steps describe how to create an EfficientDet environment on Speed, as submitted by a member of Dr. Amer’s research group:

2.14 Java Jobs

Jobs that call Java have a memory overhead, which needs to be taken into account when assigning a value to --mem. Even the most basic Java call, such as java -Xmx1G -version, needs --mem=5G, with the 4 GB difference representing the memory overhead. Note that this memory overhead grows proportionally with the value of -Xmx.

2.15 Scheduling on the GPU Nodes

Speed has various GPU types in various subclusters of its nodes.

Job scripts for the GPU queues differ in that they need these statements, which attach either a single GPU or more GPUs to the job with the appropriate partition:

  #SBATCH --gpus=[1|x]
  #SBATCH -p [pg|pt|cl|pa]

The default maximum quota for x is 4.

Once your job script is ready, submit it to the GPU partition (queue) with:

  sbatch --mem=<MEMORY> -p pg ./<myscript>.sh

--mem and -p can reside in the script.
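A wrapper script can check a GPU request before submitting it, so that an out-of-range request is caught early. This is a sketch under assumptions: the partition list and the limit of 4 are taken from the text above and may differ for your account, and the function name check_gpu_request is hypothetical.

```shell
# check_gpu_request PARTITION GPU_COUNT
# Succeeds when PARTITION is one of the GPU partitions listed above and
# GPU_COUNT is a whole number between 1 and the default quota of 4.
check_gpu_request() {
    case "$1" in
        pg|pt|cl|pa) ;;
        *) echo "unknown GPU partition: $1" >&2; return 1 ;;
    esac
    case "$2" in
        ''|*[!0-9]*) echo "GPU count must be a number: $2" >&2; return 1 ;;
    esac
    if [ "$2" -lt 1 ] || [ "$2" -gt 4 ]; then
        echo "GPU count out of range (1-4): $2" >&2
        return 1
    fi
}

check_gpu_request pg 2 && echo "request looks valid"
```

The check is purely local and does not replace the scheduler's own validation; it only avoids submitting a job that is known to be out of range.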

You can query nvidia-smi on the node running your job with:

  ssh <ENCSusername>@speed-[01|03|05|17|25|27|37-43]|nebulae nvidia-smi

The status of the GPU queues can be queried e.g. with:

  sinfo -p pg --long --Node
  sinfo -p pt --long --Node
  sinfo -p cl --long --Node
  sinfo -p pa --long --Node
  sinfo -p pn --long --Node

You can query rocm-smi on the AMD GPU node running your job with:

  ssh <ENCSusername>@speed-19 rocm-smi

Important note for TensorFlow and PyTorch users: if you are planning to run TensorFlow and/or PyTorch multi-GPU jobs, please do not use the tf.distribute and/or torch.nn.DataParallel functions on speed-01, speed-05, or speed-17, as they will crash the compute node (100% certainty). This appears to be a defect in the current hardware architecture. The workaround is to either manually effect GPU parallelisation (see Section 2.15.1) (TensorFlow provides an example on how to do this), or to run on a single GPU, which is now the default for those nodes.

Important: Users without permission to use the GPU nodes can submit jobs to the various GPU partitions, but those jobs will hang and never run. The availability of the GPU nodes can be seen with:

[serguei@speed-submit src] % sinfo -p pg --long --Node
Thu Oct 19 22:31:04 2023
speed-05       1        pg        idle 32     2:16:1 515490        0      1    gpu16 none
speed-17       1        pg     drained 32     2:16:1 515490        0      1    gpu16 UGE
speed-25       1        pg        idle 32     2:16:1 257458        0      1    gpu32 none
speed-27       1        pg        idle 32     2:16:1 257458        0      1    gpu32 none
[serguei@speed-submit src] % sinfo -p pt --long --Node
Thu Oct 19 22:32:39 2023
speed-37       1        pt        idle 256    2:64:2 980275        0      1 gpu20,mi none
speed-38       1        pt        idle 256    2:64:2 980275        0      1 gpu20,mi none
speed-39       1        pt        idle 256    2:64:2 980275        0      1 gpu20,mi none
speed-40       1        pt        idle 256    2:64:2 980275        0      1 gpu20,mi none
speed-41       1        pt        idle 256    2:64:2 980275        0      1 gpu20,mi none
speed-42       1        pt        idle 256    2:64:2 980275        0      1 gpu20,mi none
speed-43       1        pt        idle 256    2:64:2 980275        0      1 gpu20,mi none

To specifically request a GPU node, add --gpus=[#GPUs] to your sbatch statement/script or salloc request. For example:

  sbatch -t 10 --mem=1G --gpus=1 -p pg ./tcsh.sh

The request can be narrowed further to a specific node using -w, or to a GPU type or feature.

[serguei@speed-submit src] % squeue -p pg -o "%15N %.6D %7P %.11T %.4c %.8z %.6m %.8d %.6w %.8f %20G %20E"
speed-05             1 pg          RUNNING    1    *:*:*     1G        0 (null)   (null) 11929     (null)
[serguei@speed-submit src] % sinfo -p pg -o "%15N %.6D %7P %.11T %.4c %.8z %.6m %.8d %.6w %.8f %20G %20E"
speed-17             1 pg          drained   32   2:16:1 515490        0      1    gpu16 gpu:6        UGE
speed-05             1 pg            mixed   32   2:16:1 515490        0      1    gpu16 gpu:6       none
speed-[25,27]        2 pg             idle   32   2:16:1 257458        0      1    gpu32 gpu:2       none
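To illustrate narrowing a request to a node or a feature, the sketch below prints the resulting sbatch commands rather than submitting them. It assumes that the gpu16 and gpu32 strings in the sinfo output are node features usable with --constraint. The gpu_request helper is hypothetical and exists only for this dry run.

```shell
# Dry-run helper (hypothetical): prints the sbatch command instead of running it.
gpu_request() {
  # $1: extra selection option, e.g. "-w speed-05" or "--constraint=gpu32"
  echo "sbatch -t 10 --mem=1G --gpus=1 -p pg $1 ./tcsh.sh"
}

# Pin the job to one named node.
gpu_request "-w speed-05"

# Accept any pg node that advertises the gpu32 feature (assumed feature name).
gpu_request "--constraint=gpu32"
```

Copy the printed command to the prompt once the selection looks right.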

2.15.1 P6 on Multi-GPU, Multi-Node

As described earlier, P6 cards are not compatible with the Distribute and DataParallel functions (TensorFlow, PyTorch) when running on multiple GPUs. One workaround is to run the job in multi-node mode with a single GPU per node (this applies to the P6 nodes: speed-01, speed-05, speed-17):

  #SBATCH --nodes=2
  #SBATCH --gpus-per-node=1

An example script for training on multiple nodes with multiple GPUs is provided in pytorch-multinode-multigpu.sh.

2.15.2 CUDA

When calling CUDA within job scripts, it is important to link to the desired CUDA libraries and set the runtime link path to the same libraries. For example, to use the cuda-11.5 libraries, specify the following in your Makefile:

  -L/encs/pkg/cuda-11.5/root/lib64 -Wl,-rpath,/encs/pkg/cuda-11.5/root/lib64
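The two flags point at the same directory, so it can help to derive both from a single variable. The sketch below builds the flag string and prints it in Makefile syntax. CUDA_LIB and CUDA_LDFLAGS are names introduced here, and LDFLAGS is assumed to be the variable your Makefile uses for linker options.

```shell
# Versioned CUDA library directory named in the text above.
CUDA_LIB="/encs/pkg/cuda-11.5/root/lib64"

# -L tells the linker where to find the libraries at build time;
# -Wl,-rpath records the same directory in the binary for run time.
CUDA_LDFLAGS="-L${CUDA_LIB} -Wl,-rpath,${CUDA_LIB}"

# Print the line in Makefile syntax (LDFLAGS is an assumed variable name).
echo "LDFLAGS += ${CUDA_LDFLAGS}"
```

Changing the CUDA version then means editing one path instead of two.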

In your job script, specify the version of GCC to use prior to calling CUDA:

  module load gcc/9.3

2.15.3 Special Notes for Sending CUDA Jobs to the GPU Queues

Interactive jobs (Section 2.8) must be submitted to the GPU partition to compile and link. Several versions of CUDA are installed in:


For CUDA to compile properly for the GPU partition, edit your Makefile, replacing /usr/local/cuda with one of the above.

2.15.4 OpenISS Examples

These examples represent more comprehensive research-like jobs for computer vision and other tasks with longer runtime (subject to the number of epochs and other parameters). They derive from the actual research works of students and their theses and require the use of CUDA and GPUs. These examples are available as “native” jobs on Speed and as Singularity containers.

Examples include:

OpenISS and REID: A computer-vision-based person re-identification example (e.g., motion-capture-based tracking for stage performance), part of the OpenISS project by Haotao Lai [12], using TensorFlow and Keras. The script is available here: openiss-reid-speed.sh. The fork of the original repo [14], adjusted to run on Speed, is available here: openiss-reid-tfk. Detailed instructions on how to run it on Speed are in the README: https://github.com/NAG-DevOps/speed-hpc/tree/master/src#openiss-reid-tfk

OpenISS and YOLOv3: The related code using the YOLOv3 framework is in the fork of the original repo [13], adjusted to run on Speed, available here: openiss-yolov3.

Example job scripts can run on both CPUs and GPUs, as well as interactively using TensorFlow:

Detailed instructions on how to run these on Speed are in the README: https://github.com/NAG-DevOps/speed-hpc/tree/master/src#openiss-yolov3

2.16 Singularity Containers

Singularity is a container platform designed to execute applications in a portable, reproducible, and secure manner. Unlike Docker, Singularity does not require root privileges, making it more suitable for HPC environments. If the /encs software tree does not have the required software available, another option is to run Singularity containers. We run EL7 and EL9 flavors of Linux, and if some projects require Ubuntu or other distributions, it is possible to run that software as a container, including those converted from Docker. The currently recommended version of Singularity is singularity/3.10.4/default.

The example lambdal-singularity.sh showcases an immediate use of a container built for the Ubuntu-based LambdaLabs software stack, originally built as a Docker image and then pulled in as a Singularity container. The source material used for the Docker image was our fork of their official repository: https://github.com/NAG-DevOps/lambda-stack-dockerfiles.

Note: If you make your own containers or pull from DockerHub, use your /speed-scratch/$USER directory, as these images may easily consume gigabytes of space in your home directory, quickly exhausting your quota.
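One way to keep Singularity's own working files out of your home directory is to point its cache and temporary directories at scratch before pulling or building. This is a sketch under assumptions: SINGULARITY_CACHEDIR and SINGULARITY_TMPDIR are Singularity's environment variables, while the subdirectory names and the SCRATCH_DIR variable are arbitrary choices made here.

```shell
# Assumption: your scratch directory already exists at this path.
SCRATCH_DIR="/speed-scratch/${USER}"

# Redirect Singularity's layer cache and build temp space to scratch.
export SINGULARITY_CACHEDIR="${SCRATCH_DIR}/singularity/cache"
export SINGULARITY_TMPDIR="${SCRATCH_DIR}/singularity/tmp"

# Create both directories; report rather than abort if scratch is unavailable.
mkdir -p "${SINGULARITY_CACHEDIR}" "${SINGULARITY_TMPDIR}" 2>/dev/null \
  || echo "could not create directories under ${SCRATCH_DIR}" >&2
```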

Tip: To check your quota and find big files, see Section B.3 and ENCS Data Storage.

We have also built equivalent OpenISS (Section 2.15.4) containers from their Docker counterparts for teaching and research purposes [16]. The images from https://github.com/NAG-DevOps/openiss-dockerfiles and their DockerHub equivalents https://hub.docker.com/u/openiss can be found in /speed-scratch/nag-public with a ‘.sif’ extension. Some can be run in both batch and interactive modes, covering basics with CUDA, OpenGL rendering, and computer vision tasks. Examples include Jupyter notebooks with Conda support.


This section introduces working with Singularity, its containers, and what can and cannot be done with Singularity on the ENCS infrastructure. For comprehensive documentation, refer to the authors’ guide: https://www.sylabs.io/docs/.

Singularity containers are either built from an existing container or built from scratch. Building from scratch requires a recipe file (think of it like a Dockerfile) and must be done with root permissions, which are not available on the ENCS infrastructure. Therefore, built-from-scratch containers must be created on a user-managed/personal system. There are three types of Singularity containers:

“A common workflow is to use the ‘sandbox’ mode for container development and then build it as a default (squashfs) Singularity image when done,” say Singularity’s authors about builds. File-system containers are considered legacy and are not commonly used.

For many workflows, a Docker container might already exist. In this case, you can use Singularity’s docker pull function as part of your virtual environment setup in an interactive job allocation:

  salloc --gpus=1 -n8 --mem=4Gb -t60
  cd /speed-scratch/$USER/
  singularity pull openiss-cuda-devicequery.sif docker://openiss/openiss-cuda-devicequery
  INFO:    Converting OCI blobs to SIF format
  INFO:    Starting build...

This method can be used for converting Docker containers directly on Speed. On GPU nodes, make sure to pass the --nv flag to Singularity so that its containers can access the GPUs. See the linked example for more details.
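As a sketch of the --nv usage, the lines below run the image pulled above from inside the same interactive allocation. The SIF variable is a name introduced here, the path assumes the image was pulled into /speed-scratch/$USER as shown, and the guard keeps the snippet from failing where Singularity or the image is missing.

```shell
# Image produced by the singularity pull step above (assumed location).
SIF="/speed-scratch/${USER}/openiss-cuda-devicequery.sif"

if command -v singularity >/dev/null 2>&1 && [ -f "${SIF}" ]; then
  # --nv makes the host NVIDIA driver and devices visible inside the container.
  singularity run --nv "${SIF}"
else
  echo "singularity or ${SIF} not available; run this inside the salloc session" >&2
fi
```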

3 Conclusion

The cluster operates on a “first-come, first-served” basis until it reaches full capacity. After that, job positions in the queue are determined based on past usage. The scheduler does attempt to fill gaps, so occasionally, a single-core job with lower priority may be scheduled before a multi-core job with higher priority.

3.1 Important Limitations

While Speed is a powerful tool, it is essential to recognize its limitations to use it effectively:

3.2 Tips/Tricks

3.3 Use Cases

A History

A.1 Acknowledgments

A.2 Migration from UGE to SLURM

For long-term users who started with Grid Engine, here are some resources to help with the transition and with mapping GE usage to the SLURM job submission process.

A.3 Phases

Brief summary of Speed evolution phases.

A.3.1 Phase 5

Phase 5 saw incorporation of the Salus, Magic, and Nebular subclusters (see Figure 2).

A.3.2 Phase 4

Phase 4 had 7 SuperMicro servers with 4x A100 80GB GPUs each added, dubbed as “SPEED2”. We also moved from Grid Engine to SLURM.

A.3.3 Phase 3

Phase 3 had 4 vidpro nodes added from Dr. Amer, totalling 6x P6 and 6x V100 GPUs.

A.3.4 Phase 2

Phase 2 saw 6x NVIDIA Tesla P6 and 8x more compute nodes added. The P6s replaced 4x FirePro S7150.

A.3.5 Phase 1

Phase 1 of Speed was of the following configuration:

B Frequently Asked Questions

B.1 Where do I learn about Linux?

All Speed users are expected to have a basic understanding of Linux and its commonly used commands. Here are some recommended resources:

Software Carpentry: Software Carpentry provides free resources to learn software, including a workshop on the Unix shell. Visit Software Carpentry Lessons to learn more.

Udemy: There are numerous Udemy courses, including free ones, that will help you learn Linux. Active Concordia faculty, staff, and students have access to Udemy courses. A recommended starting point for beginners is the course “Linux Mastery: Master the Linux Command Line in 11.5 Hours”. Visit Concordia’s Udemy page to learn how Concordians can access Udemy.

B.2 How to use bash shell on Speed?

This section provides comprehensive instructions on how to utilize the bash shell on the Speed cluster.

B.2.1 How do I set bash as my login shell?

To set your default login shell to bash on Speed, your login shell on all GCS servers must be changed to bash. To make this change, create a ticket with the Service Desk (or email help at concordia.ca) to request that bash become your default login shell for your ENCS user account on all GCS servers.

B.2.2 How do I move into a bash shell on Speed?

To move to the bash shell, type bash at the command prompt:

[speed-submit] [/home/a/a_user] > bash
bash-4.4$ echo $0

Note how the command prompt changes from “[speed-submit] [/home/a/a_user] >” to “bash-4.4$” after entering the bash shell.

B.2.3 How do I use the bash shell in an interactive session on Speed?

Below are examples of how to use bash as a shell in your interactive job sessions with both the salloc and srun commands.

Note: Make sure the interactive job requests memory, cores, etc.

B.2.4 How do I run scripts written in bash on Speed?

To execute bash scripts on Speed:

  1. Ensure that the shebang of your bash job script is #!/encs/bin/bash
  2. Use the sbatch command to submit your job script to the scheduler.

Check Speed GitHub for a sample bash job script.

B.3 How to resolve “Disk quota exceeded” errors?

B.3.1 Probable Cause

The “Disk quota exceeded” error occurs when your application has run out of disk space to write to. On Speed, this error can be returned when:

  1. The NFS-provided home is full and cannot be written to. You can verify this using the quota and bigfiles commands.
  2. The “/tmp” directory on the Speed node where your application is running is full and cannot be written to.

B.3.2 Possible Solutions

  1. Use the --chdir job script option to set the job working directory. This is the directory where the job will write output files.
  2. Although local disk space is recommended for IO-intensive operations, the ‘/tmp’ directory on Speed nodes is limited to 1TB, so it may be necessary to store temporary data elsewhere. Review the documentation for each module used in your script to determine how to set working directories. The basic steps are:
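A minimal sketch of such a setup follows. The directory names and the WORK_DIR and TMP_BASE variables are assumptions made here for illustration, not a prescribed layout. TMPDIR is the conventional variable that many tools consult for temporary files, but check each module's documentation for its own setting.

```shell
# Hypothetical layout under scratch; adjust the names to your project.
WORK_DIR="/speed-scratch/${USER}/output"
TMP_BASE="/speed-scratch/${USER}/tmp"

# Create both directories; report rather than abort if scratch is unavailable.
mkdir -p "${WORK_DIR}" "${TMP_BASE}" 2>/dev/null \
  || echo "could not create ${WORK_DIR} or ${TMP_BASE}" >&2

# Redirect temporary files only once the directory really exists.
if [ -d "${TMP_BASE}" ]; then
  export TMPDIR="${TMP_BASE}"
fi

# Dry run: print the submission that sets the job working directory.
echo "sbatch --chdir=${WORK_DIR} ./<myscript>.sh"
```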

In the above example, $USER is an environment variable containing your ENCS username.

B.3.3 Example of setting working directories for COMSOL

In the above example, $USER is an environment variable containing your ENCS username.

B.3.4 Example of setting working directories for Python Modules

By default, when adding a Python module, the /tmp directory is set as the temporary repository for file downloads. The size of the /tmp directory on speed-submit is too small for PyTorch. To add a Python module:
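One possible approach, sketched here under assumptions, is to set TMPDIR for the pip call so that downloads are staged in scratch instead of /tmp. The PIP_TMP variable and the tmp directory name are choices made here, and torch is used only as an example package. The command is printed rather than executed, so it can be reviewed before being run inside your own Python environment.

```shell
# Hypothetical staging directory in scratch for pip's temporary files.
PIP_TMP="/speed-scratch/${USER}/tmp"

# Create it; report rather than abort if scratch is unavailable.
mkdir -p "${PIP_TMP}" 2>/dev/null \
  || echo "could not create ${PIP_TMP}" >&2

# Dry run: TMPDIR applies only to this one pip invocation.
PIP_CMD="TMPDIR=${PIP_TMP} pip install torch"
echo "${PIP_CMD}"
```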

In the above example, $USER is an environment variable containing your ENCS username.

B.4 How do I check my job’s status?

When a job with a job ID of 1234 is running or terminated, you can track its status using the following commands:
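The sketch below collects standard SLURM status commands for that job ID. It is an assumption about which commands are useful here rather than a definitive list, and JOB_ID is a variable introduced for illustration. The guard lets the snippet run without error on a machine that has no SLURM client tools.

```shell
# Job ID used in the text above.
JOB_ID=1234

if command -v squeue >/dev/null 2>&1; then
  # Pending or running job: state in the queue.
  squeue -j "${JOB_ID}"
  # Running job: live resource usage of its steps.
  sstat -j "${JOB_ID}"
  # Running or terminated job: accounting record.
  sacct -j "${JOB_ID}"
  # Full job description, while the controller still knows the job.
  scontrol show job "${JOB_ID}"
else
  echo "SLURM client tools not found; run this on speed-submit" >&2
fi
```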

B.5 Why is my job pending when nodes are empty?

B.5.1 Disabled nodes

It is possible that one or more of the Speed nodes are disabled for maintenance. To verify if Speed nodes are disabled, check if they are in a draining or drained state:

[serguei@speed-submit src] % sinfo --long --Node
Thu Oct 19 21:25:12 2023
speed-01       1        pa        idle 32     2:16:1 257458        0      1    gpu16 none
speed-03       1        pa        idle 32     2:16:1 257458        0      1    gpu32 none
speed-05       1        pg        idle 32     2:16:1 515490        0      1    gpu16 none
speed-07       1       ps*       mixed 32     2:16:1 515490        0      1    cpu32 none
speed-08       1       ps*     drained 32     2:16:1 515490        0      1    cpu32 UGE
speed-09       1       ps*     drained 32     2:16:1 515490        0      1    cpu32 UGE
speed-10       1       ps*     drained 32     2:16:1 515490        0      1    cpu32 UGE
speed-11       1       ps*        idle 32     2:16:1 515490        0      1    cpu32 none
speed-12       1       ps*     drained 32     2:16:1 515490        0      1    cpu32 UGE
speed-15       1       ps*     drained 32     2:16:1 515490        0      1    cpu32 UGE
speed-16       1       ps*     drained 32     2:16:1 515490        0      1    cpu32 UGE
speed-17       1        pg     drained 32     2:16:1 515490        0      1    gpu16 UGE
speed-19       1       ps*        idle 32     2:16:1 515490        0      1    cpu32 none
speed-20       1       ps*     drained 32     2:16:1 515490        0      1    cpu32 UGE
speed-21       1       ps*     drained 32     2:16:1 515490        0      1    cpu32 UGE
speed-22       1       ps*     drained 32     2:16:1 515490        0      1    cpu32 UGE
speed-23       1       ps*        idle 32     2:16:1 515490        0      1    cpu32 none
speed-24       1       ps*        idle 32     2:16:1 515490        0      1    cpu32 none
speed-25       1        pg        idle 32     2:16:1 257458        0      1    gpu32 none
speed-25       1        pa        idle 32     2:16:1 257458        0      1    gpu32 none
speed-27       1        pg        idle 32     2:16:1 257458        0      1    gpu32 none
speed-27       1        pa        idle 32     2:16:1 257458        0      1    gpu32 none
speed-29       1       ps*        idle 32     2:16:1 515490        0      1    cpu32 none
speed-30       1       ps*     drained 32     2:16:1 515490        0      1    cpu32 UGE
speed-31       1       ps*     drained 32     2:16:1 515490        0      1    cpu32 UGE
speed-32       1       ps*     drained 32     2:16:1 515490        0      1    cpu32 UGE
speed-33       1       ps*        idle 32     2:16:1 515490        0      1    cpu32 none
speed-34       1       ps*        idle 32     2:16:1 515490        0      1    cpu32 none
speed-35       1       ps*     drained 32     2:16:1 515490        0      1    cpu32 UGE
speed-36       1       ps*     drained 32     2:16:1 515490        0      1    cpu32 UGE
speed-37       1        pt        idle 256    2:64:2 980275        0      1 gpu20,mi none
speed-38       1        pt        idle 256    2:64:2 980275        0      1 gpu20,mi none
speed-39       1        pt        idle 256    2:64:2 980275        0      1 gpu20,mi none
speed-40       1        pt        idle 256    2:64:2 980275        0      1 gpu20,mi none
speed-41       1        pt        idle 256    2:64:2 980275        0      1 gpu20,mi none
speed-42       1        pt        idle 256    2:64:2 980275        0      1 gpu20,mi none
speed-43       1        pt        idle 256    2:64:2 980275        0      1 gpu20,mi none

Note which nodes are in the state of drained. The reason for the drained state can be found in the reason column.

Your job will run once an occupied node becomes available, or once the maintenance is completed and the disabled nodes have a state of idle.
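With a long node list, it can be easier to filter for the disabled nodes. The drained_nodes helper below is a hypothetical sketch that assumes the column layout shown above, with the state in the fourth column and a single-word reason in the last one.

```shell
# Hypothetical filter: reads `sinfo --long --Node` rows on stdin and prints
# the node name and reason for rows whose state starts with "drain".
drained_nodes() {
  awk '$4 ~ /^drain/ { print $1, $NF }'
}

# Use the live node list where SLURM is installed.
if command -v sinfo >/dev/null 2>&1; then
  sinfo --long --Node | drained_nodes
fi
```

The date and header lines that sinfo prints pass through the filter silently, because their fourth field is not a node state.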

B.5.2 Error in job submit request.

It is possible that your job is pending because it requested resources that are not available within Speed. To verify why job ID 1234 is not running, execute:

sacct -j 1234

A summary of the reasons can be obtained via the squeue command.
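As a sketch of such a query, the output format below asks squeue for the job ID, partition, state, and the scheduler's reason field (%r). The column widths and the JOB_ID and REASON_FORMAT variable names are arbitrary choices made here.

```shell
# Job ID used in the text above.
JOB_ID=1234

# %i job ID, %P partition, %T state, %r reason the job is pending.
REASON_FORMAT="%.10i %.9P %.10T %r"

if command -v squeue >/dev/null 2>&1; then
  squeue -j "${JOB_ID}" -o "${REASON_FORMAT}"
  # Expected start time, when the scheduler has been able to estimate one.
  squeue -j "${JOB_ID}" --start
fi
```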

C Sister Facilities

Below is a list of resources and facilities similar to Speed at various capacities. Depending on your research group and needs, they might be available to you. They are not managed by HPC/NAG of AITS, so contact their respective representatives.

D Software Installed On Speed

This section is generated by a script; last updated on Tue Jul 23 10:48:52 PM EDT 2024. We have two major software trees: Scientific Linux 7 (EL7), which is outgoing, and AlmaLinux 9 (EL9). After major synchronization of software packages is complete, we will stop maintaining the EL7 tree and will migrate the remaining nodes to EL9.

Use --constraint=el7 to select EL7-only installed nodes for their software packages. Conversely, use --constraint=el9 for the EL9-only software. These options would be used as a part of your job parameters in either #SBATCH or on the command line.
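The sketch below shows the command-line form as a dry run. The tree_constraint helper is hypothetical: it accepts only the two tree names mentioned above and prints the matching option. Inside a job script, the equivalent is a line such as #SBATCH --constraint=el9.

```shell
# Hypothetical helper: prints the constraint option for a known software tree.
tree_constraint() {
  case "$1" in
    el7|el9) echo "--constraint=$1" ;;
    *) echo "unknown software tree: $1" >&2; return 1 ;;
  esac
}

# Dry run: print the submission command for the EL9 tree.
echo "sbatch $(tree_constraint el9) ./<myscript>.sh"
```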

NOTE: this list does not include packages installed directly on the OS (yet).

D.1 EL7

Not all packages are intended for HPC, but the common tree is available on Speed as well as teaching labs’ desktops.

D.2 EL9


