User's Guide - Start Here

SSH to the Compute Nodes

SSH access to all compute nodes is currently enabled but will be disabled in the future. The HPC relies on the job scheduler to manage a large number of running jobs, maintain fair share usage, and improve resource utilization efficiency for the school. Please DO NOT SSH into the compute nodes (including using VSCode) unless a special arrangement has been made. For interactive workloads using x11 VNC or VSCode, please visit: https://cseunsw.atlassian.net/wiki/x/BgCiCQ .

Overview

The [b] suffix indicates a hardware variant within the same processor architecture, for example larger scratch space or a more powerful GPGPU model.

Compute Node      Hardware Summary                                        Memory        GPGPU Summary
----------------  ------------------------------------------------------  ------------  -----------------------
wp-delta-xxxx     Old generation (pre 2020) of Intel Xeon based servers   64 - 256GB    NVIDIA A2, NVIDIA T1000
wp-zeta-xxxx[b]   Intel Sapphire Rapids Xeon based servers (Dell R760XA)  1024GB (1TB)  NVIDIA H100NVL
wp-omega-xxxx[b]  AMD Zen 4 based EPYC servers (Dell R6625, R7625 etc)    1024GB (1TB)  NVIDIA L4, L40S

Access to HPC

Eligibility

Anyone authorised to access CSE’s public login servers has access to this service. Please ensure you can log in to login.cse.unsw.edu.au first. Make sure your home directory is set up properly; if not, please contact CSG support.

To SSH into the login nodes:

ssh zID@glab.cse.unsw.edu.au

Outside UNSW

When accessing from outside the UNSW campus, you are required to connect using login.cse.unsw.edu.au as a jumpbox.

# Login to one of the CSE VLAB servers first
ssh zID@login.cse.unsw.edu.au
# then connect to the HPC
ssh zID@glab.cse.unsw.edu.au
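The two hops can also be combined into one command with OpenSSH's ProxyJump option (-J, available in OpenSSH 7.3 and later). The sketch below is a hypothetical helper, not part of the HPC tooling: the function name hpc_ssh and the HPC_DRY_RUN variable are assumptions made for illustration, and z1234567 is a placeholder zID. The hostnames are the ones given in this guide.

```shell
# Hypothetical helper: connect to the HPC through the CSE jumpbox in one step.
# Usage: hpc_ssh zID
# With HPC_DRY_RUN=1 the ssh command is printed instead of executed.
hpc_ssh() {
    # Build the command line: -J names the jump host, the last argument is the target.
    set -- ssh -J "$1@login.cse.unsw.edu.au" "$1@glab.cse.unsw.edu.au"
    if [ "${HPC_DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Dry run with a placeholder zID; drop HPC_DRY_RUN=1 to actually connect.
HPC_DRY_RUN=1 hpc_ssh z1234567
# prints: ssh -J z1234567@login.cse.unsw.edu.au z1234567@glab.cse.unsw.edu.au
```

Whether this is more convenient than logging in twice depends on how your SSH keys or passwords are set up on both hosts.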

Help & Support

For any technical support requests or inquiries, please email ss@cse.unsw.edu.au. If the issue is related to a job failure, always include the JOB_ID and the job output file (either attach the file or provide the file path).

What’s Next?

The first thing a new user may want to do is look around, for example:

Check the HPC Overall Status

# Check the overall HPC queues status
$ qstat -f -u "*"
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
all.q@wp-delta-f01.cse.unsw.ed BIP   0/0/56         0.01     lx-amd64
---------------------------------------------------------------------------------
all.q@wp-delta-f02.cse.unsw.ed BIP   0/0/56         0.04     lx-amd64
---------------------------------------------------------------------------------
all.q@wp-delta-f03.cse.unsw.ed BIP   0/0/56         0.00     lx-amd64
---------------------------------------------------------------------------------
all.q@wp-omega-c01.cse.unsw.ed BIP   0/0/96         0.00     lx-amd64
---------------------------------------------------------------------------------
all.q@wp-omega-c02.cse.unsw.ed BIP   0/0/96         0.01     lx-amd64
---------------------------------------------------------------------------------
all.q@wp-omega-c03.cse.unsw.ed BIP   0/0/96         0.03     lx-amd64
---------------------------------------------------------------------------------
all.q@wp-omega-c04.cse.unsw.ed BIP   0/0/96         0.01     lx-amd64
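The resv/used/tot. column gives reserved, used, and total slots for each queue instance. As a rough sketch of how to get a cluster-wide picture from that listing, the hypothetical function below (summarise_slots is an illustrative name, not an HPC-provided tool) adds up the used and total slots with awk. It assumes queue-instance rows are the only rows whose first column contains an "@".

```shell
# Sum used and total slots from `qstat -f` style output read on stdin.
# Queue-instance rows are recognised by the '@' in the first column;
# the third column has the form reserved/used/total.
summarise_slots() {
    awk '$1 ~ /@/ && split($3, s, "/") == 3 { used += s[2]; total += s[3] }
         END { printf "used=%d total=%d free=%d\n", used, total, total - used }'
}

# Two sample rows taken from the listing in this guide. On the cluster,
# pipe in the live output instead:  qstat -f -u "*" | summarise_slots
summarise_slots <<'EOF'
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
all.q@wp-delta-f01.cse.unsw.ed BIP   0/0/56         0.01     lx-amd64
---------------------------------------------------------------------------------
all.q@wp-omega-c01.cse.unsw.ed BIP   0/0/96         0.00     lx-amd64
EOF
# prints: used=0 total=152 free=152
```

Free slots alone do not guarantee that a job will start, since resource quotas and other scheduler limits may still apply.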

Check RQS (Resource Quota Set)

sge_resource_quota
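As a minimal sketch, the quota sets described in the sge_resource_quota manual page can usually be inspected with the standard Grid Engine client commands qconf and qquota. The wrapper function show_quotas is a hypothetical name used for illustration, and the sketch assumes regular users are permitted to run these commands on the login nodes, which this guide does not confirm.

```shell
# Show the resource quota sets defined on the cluster and the quota usage
# that applies to your own account. Assumes the Grid Engine client tools
# (qconf, qquota) are on PATH, as they normally are on a login node.
show_quotas() {
    if ! command -v qconf >/dev/null 2>&1; then
        echo "Grid Engine client tools not found; run this on an HPC login node"
        return 0
    fi
    qconf -srqsl            # list the names of all resource quota sets
    qconf -srqs             # print the rules of every resource quota set
    qquota -u "$(id -un)"   # show quota usage that applies to your user
}

show_quotas
```

The rule syntax printed by qconf -srqs is explained in the sge_resource_quota manual page.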