If you are new to clusters, you may find this wiki contains terms you haven't seen before. Here is a list of terms you should be familiar with.
AI: Artificial intelligence is a blanket term for a computer's ability to perform or augment tasks that would normally require human intelligence.
Cluster: A cluster is a collection of servers, networked to execute large jobs that require extensive resources not available on a single physical server/computer. They are especially adept at handling many jobs in parallel.
CPU: This stands for Central Processing Unit, the part of a computer which executes software programs. Typically, we use the term to refer to an individual silicon chip. A CPU contains one or more cores, and our HPC system contains many CPUs.
Core: A core is an individual processor: the part of a computer which actually executes programs. CPUs used to have a single core, and the terms were interchangeable. In recent years, several cores (processors) have been manufactured on a single CPU chip, which may be referred to as a multiprocessor.
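To see how many cores a machine exposes, you can ask the operating system; a minimal Python sketch:

```python
import os

# os.cpu_count() reports all logical cores on the node.
# Note: on a cluster, a job scheduler may allocate only a subset
# of these cores to your job.
cores = os.cpu_count()
print(f"This machine exposes {cores} logical cores")
```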
GPU: Graphics cards contain Graphics Processing Units (GPUs) for processing visual data. These processors have a different architecture than that of standard CPUs. They handle large matrix multiplications extremely well, and are often used in machine learning.
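The kind of work a GPU handles well can be illustrated with a plain, serial matrix multiplication: each entry of the result is an independent dot product, which is why a GPU's many cores can compute them in parallel. A minimal pure-Python sketch (a real workload would use a GPU-accelerated library rather than loops like these):

```python
def matmul(a, b):
    """Multiply matrices a (m x n) and b (n x p) the naive, serial way.
    Each entry of the result is an independent dot product -- exactly
    the kind of work a GPU computes for many entries at once."""
    n = len(b)       # inner dimension
    p = len(b[0])    # columns of the result
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(p)]
            for i in range(len(a))]

# Small 2x2 example
print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```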
HPC: High Performance Computing is the term often used for large-scale computers, and the simulations and models which run on them.
Job: Any script, code, program, etc. that is run on a computer.
Job Scheduler: In order to ensure fair access to computational resources, users submit their scripts to a job scheduler, which allocates available resources to each job for its duration. We use the IBM job scheduler LSF (Load Sharing Facility). Other examples include Slurm and Sun Grid Engine.
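A job is typically submitted as a batch script whose #BSUB directives tell LSF what resources it needs. The fragment below is a sketch only: the job name, queue settings, resource limits, and program path are made-up examples, and the right values are site-specific.

```shell
#!/bin/bash
#BSUB -J my_analysis        # job name (example only)
#BSUB -n 4                  # number of cores requested
#BSUB -W 01:00              # wall-clock time limit (hh:mm)
#BSUB -o output.%J.log      # file for standard output; %J expands to the job ID
#BSUB -e error.%J.log       # file for standard error

# Commands below run on the compute node(s) LSF allocates.
./my_program
```

Submit the script with `bsub < myscript.sh` and check its status with `bjobs`. Slurm and Sun Grid Engine follow the same pattern with different directive syntax (`#SBATCH` and `#$` respectively).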
Machine Learning: A subset of AI in which a computer is trained on a dataset, with or without supervision, to perform a specific task, sometimes using neural networks.
Node: In traditional computing, a node is an object on a network. Supercomputers, like the DeepSense platform, are essentially a group of servers (computers) connected via a fast interconnect. There are different types of nodes, used in different ways. The headnodes (our login nodes) are where end users typically connect. The compute nodes are where large computations are run. Compute nodes also fall into different categories (some have larger memory; some have GPUs).