Cashew Cluster

Clustered Architecture for Scientific and High-Performance Engineering Workloads

Cluster Machine List and Queue Overview

Click here for a list of names and specs for each node in the cluster, along with an overview of the queues you can submit jobs to.

OnDemand Cluster Portal

Open OnDemand is a web-based portal for accessing HPC resources. A VPN connection is required if you are not on campus.

Getting Access and Eligibility

All Engineering students are pre-approved for access to the cluster. Fill out a request here or contact us and we will add you to our cluster users group, which grants access to the cluster.

Faculty and staff members must own at least one node to gain long-term access to the Cashew cluster. Please contact us for more info.

General Guidelines

  • Use the job scheduler.
    • Do not run jobs on the login node. Processes that impact the performance of the login node will be killed without notice. If you need help running your job, please contact us and we can sit down and walk you through it.
  • Be good to each other.
    • This cluster is shared by all in the College of Engineering. Please do your best to be fair and kind to others.

Partitions

Partitions are separate queues for submitted jobs and can contain overlapping groups of nodes. When you submit your job, resources are allocated from the partition’s nodes and your job runs on one or more of the nodes in that group. You can use the sinfo or overview commands to see the list of partitions you can submit to. Below is a list of the basic partitions configured on the Cashew cluster.
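
For example, here is a minimal sketch of checking the available partitions and submitting to one of them (the partition name below is one of those listed in this guide; sinfo will show the names actually available to you):

  # List partitions, their nodes, and their current state
  sinfo

  # Show a compact, one-line-per-partition summary
  sinfo -s

  # Submit a script to a specific partition
  sbatch --partition=coe_cpu sample_submission.sh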

  • all
    • This partition is all nodes in the cluster
    • Note: Jobs submitted to an individual research group’s partition take priority over jobs in this partition. If that happens, your job is stopped completely and placed back into the queue.
      • Use this partition for multiple short-term jobs, jobs that are not high priority, or jobs with checkpointing built into your program.
      • Use the “--nodelist” and “--exclude” options to run your jobs on subsets of the all partition (see the example after this list).
  • coe_cpu
    • All College of Engineering general compute student CPU cores
    • Note: coe_cpu is the default partition.
  • coe_gpu
    • All College of Engineering general compute student GPU nodes
  • coe_all
    • All College of Engineering general compute student nodes (CPU and GPU).
  • Research group or department partitions, e.g. <group_short_names>_all
    • If your research group or department has purchased a node on the cluster, it will get a priority partition for its nodes. These partitions override the “all” queues. You can find the priority partition or partitions for each node here.
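
As a sketch of the “--nodelist” and “--exclude” options mentioned above, the node names below are hypothetical; substitute real names from the cluster machine list:

  # Run only on two specific nodes in the all partition
  sbatch --partition=all --nodelist=node01,node02 sample_submission.sh

  # Run anywhere in the all partition except one node
  sbatch --partition=all --exclude=node01 sample_submission.sh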

Storage on the Cluster

Each user gets 1 TB of space in the /home directory on our all-flash storage server. Research groups may add separate storage servers to the cluster; these will be available under the root (/) directory.


You can see your current storage usage with the get-quota command.
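
For example (the output format of get-quota is specific to this cluster; du is a standard alternative for checking the size of a single directory):

  # Check how much of your /home quota you are using
  get-quota

  # Check the size of one directory
  du -sh ~/my_project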

Please contact us to purchase more individual space or to find out about other storage options.

Running a Job On the Cluster

When working with a cluster, you won’t run programs as you would on a personal computer or server. Instead, you interact with the cluster by issuing commands to the job scheduler. Traditionally, this is done from the command line, with scripts submitted to the scheduler to run your job. A typical workflow involves the steps below.

  • Move your data and code to your folder on the cluster.
    • For Linux and macOS users, we recommend scp or sftp. You can also use git on the cluster or the OnDemand File Browser.
    • For Windows, we recommend WinSCP, cloning with git while on the cluster, or using the OnDemand File Browser.
    • For those with large datasets, there is an engineering data transfer node for working with Globus and processing large transfers without impacting the cluster head nodes. Please contact us for help getting started.
  • Write or edit your submission script to add your required scheduler options and the commands needed to launch your program. More info.
  • Load any modules, activate virtual environments, or set any other environment variables that are required. More info.
  • Submit your job to the cluster with the “sbatch” command.
    • e.g. sbatch sample_submission.sh
    • For info on how to submit, watch, cancel, or find info on your job please see the basic usage guide.
  • Instead of appearing in your terminal window, standard output and errors from your job are written to a text file in the directory you submitted the job from. You can then check the output or move it to another location for further processing.
    • Note: There’s no need to wait on the cluster for your job to end. You can use the scheduler options --mail-type and --mail-user to have the scheduler email you when your job is done (see the example below).
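
For instance, these directives at the top of a submission script ask the scheduler to send mail when the job ends or fails; the email address is a placeholder:

  #SBATCH --mail-type=END,FAIL
  #SBATCH --mail-user=your_username@colostate.edu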

If you would prefer a web-based option instead of using the command line, please see our Open OnDemand instance here. (VPN required when off-campus)

Example Workflow

From your client machine, upload your data and connect to the cluster.
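
For example, using scp and ssh from a Linux or macOS terminal (the username and <login_node> address below are placeholders; use the hostname you were given):

  # Copy a project folder from your machine to your cluster home directory
  scp -r my_project your_username@<login_node>:~/

  # Log in to the cluster
  ssh your_username@<login_node>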

Load any needed modules.
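
For example (the module name and version below are illustrative; run module avail to see what is actually installed on the cluster):

  # See which modules are available
  module avail

  # Load a module
  module load python/3.10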

Write your submission script. You can find this example here.
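
A minimal sketch of a submission script (the job name, partition, resource requests, module, and program below are examples, not requirements; the linked example is the authoritative starting point):

  #!/bin/bash
  #SBATCH --job-name=sample_job
  #SBATCH --partition=coe_cpu
  #SBATCH --nodes=1
  #SBATCH --ntasks=1
  #SBATCH --cpus-per-task=4
  #SBATCH --time=01:00:00

  # Load whatever your program needs (module name is an example)
  module load python/3.10

  # Run your program
  python my_script.py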

Submit your job. Make sure to take note of your job number. You can check the status of your job with the overview command.
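
For example (the job number shown is made up; the standard Slurm squeue command works alongside the cluster’s overview command):

  sbatch sample_submission.sh
  # Submitted batch job 12345

  # Check the status of your jobs
  overview
  squeue -u $USER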

When your job is done, check your data and output file for errors or standard output.
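
By default, Slurm writes your job’s standard output and errors to a file named slurm-<jobid>.out in the directory you submitted from, so you can inspect it like this (the job number is an example):

  # View the output file for job 12345
  cat slurm-12345.out

  # Or follow it while the job is still running
  tail -f slurm-12345.out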


Engineering Technology Services

Bookmark and use our online help desk form first.
 

Email help@engr.colostate.edu
Call (970) 491-2917
 

Stop by an ETS Help Desk:
Main Help Desk (Glover 100)
Foothills Campus Help Desk (Atmospheric Science 107)
Academic Village Help Desk (AVB C142, next to the Orion Design Studios)