Altair Grid Engine (Job Management)

What is Altair Grid Engine?

Altair Grid Engine is the flagship product from Altair and is often called a “workload manager” or “job scheduler.” Software of this kind facilitates “cluster computing,” which means using multiple computers (servers) at the same time to process information.

Altair Grid Engine is the leading distributed resource management system. It optimizes resource use in thousands of data centers by transparently selecting the resources best suited to each segment of work.

Pacific Teck has licensed Altair Grid Engine to hundreds of sites across Asia Pacific and supports it there.

Why use Altair Grid Engine?

Altair Grid Engine is the most advanced and best-supported job scheduler on the market. It is unparalleled in resource utilization, tight integration with container solutions, and exascale-level scalability, both in the size of the environment and in the number of jobs supported.

Altair Grid Engine has advanced capabilities for mapping the resources inside a node, including GPU, CPU, memory, and interconnect. Once mapped, these resources can be allocated on a per-job basis. In practice, a user who needs only a portion of a node's resources for a job gets exactly that portion. If the user had to reserve the whole node, the remaining resources would sit idle, unavailable to others, and therefore wasted.
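The utilization argument can be made concrete with a little arithmetic. The Python sketch below is only an illustration: the node size (4 GPUs), the job mix, and both function names are assumptions made up for this example, and the packing rule is a simple first-fit heuristic, not Altair Grid Engine's scheduling algorithm.

```python
GPUS_PER_NODE = 4  # assumed node size, for illustration only


def nodes_if_whole_node(gpu_requests):
    """Each job reserves an entire node, however few GPUs it uses."""
    return len(gpu_requests)


def nodes_if_shared(gpu_requests):
    """Jobs share nodes: first-fit-decreasing packing of GPU requests."""
    free = []  # free GPUs left on each node that is already in use
    for need in sorted(gpu_requests, reverse=True):
        if need > GPUS_PER_NODE:
            raise ValueError("this sketch assumes each job fits on one node")
        for i, available in enumerate(free):
            if available >= need:
                free[i] -= need
                break
        else:
            # No node in use has room, so open a new one.
            free.append(GPUS_PER_NODE - need)
    return len(free)


def utilization(gpu_requests, nodes):
    """Fraction of GPUs on the occupied nodes that are doing work."""
    return sum(gpu_requests) / (nodes * GPUS_PER_NODE)


jobs = [1, 1, 2, 1, 3]  # hypothetical GPUs requested per job
whole = nodes_if_whole_node(jobs)
shared = nodes_if_shared(jobs)
print(f"whole-node: {whole} nodes, {utilization(jobs, whole):.0%} GPU utilization")
print(f"shared:     {shared} nodes, {utilization(jobs, shared):.0%} GPU utilization")
```

With this made-up job mix, whole-node reservation occupies five nodes at 40% GPU utilization, while sharing fits the same jobs onto two fully used nodes.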

TSUBAME3.0 Container-Based Fine-Grained Spatial Resource Allocations of Fat Nodes

Get the Most Utilization Out of Your Resources

Altair has developed a feature called Resource Maps, which records where resources such as GPUs are located within a node and uses Linux cgroups in more advanced ways than other schedulers do. The result is much finer control over resources than other solutions offer: interconnect, GPU, NVMe, memory, and Intel CPU cores can be mapped and bound together inside a node. Instead of giving an entire resource-rich node to one user, the node can be split so that each group, user, or job gets only the resources it needs at a given time. This functionality helps fully utilize the node and lets each job use the best combination of CPU, GPU, memory, and interconnect. Resources can also be bound together for multiple containers, users, or jobs per node.
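The idea of a per-node resource map can be sketched as a toy data structure. Everything in the block below is hypothetical: the class name, the GPU-to-core layout, and the allocation rule are assumptions for illustration and do not reflect how Resource Maps are implemented or configured in Altair Grid Engine.

```python
from dataclasses import dataclass, field


@dataclass
class NodeResourceMap:
    """Toy model of one node: each GPU id maps to the CPU cores closest to it."""

    gpu_cores: dict  # GPU id -> list of CPU core ids local to that GPU (assumed layout)
    in_use: dict = field(default_factory=dict)  # GPU id -> name of the job holding it

    def allocate(self, job, n_gpus):
        """Give `job` the lowest-numbered free GPUs plus their local cores."""
        free = [g for g in sorted(self.gpu_cores) if g not in self.in_use]
        if len(free) < n_gpus:
            return None  # not enough free GPUs on this node
        chosen = free[:n_gpus]
        for g in chosen:
            self.in_use[g] = job
        cores = [c for g in chosen for c in self.gpu_cores[g]]
        return {"gpus": chosen, "cores": cores}

    def release(self, job):
        """Return every GPU held by `job` to the free pool."""
        for g in [g for g, owner in self.in_use.items() if owner == job]:
            del self.in_use[g]


# A hypothetical node with 4 GPUs and 2 local CPU cores per GPU.
node = NodeResourceMap({0: [0, 1], 1: [2, 3], 2: [4, 5], 3: [6, 7]})
print(node.allocate("job_a", 1))
print(node.allocate("job_b", 2))
print(node.allocate("job_c", 2))  # only one GPU is still free, so this fails
```

Because each job receives specific device ids rather than a bare count, a real scheduler can pass those ids to cgroups or a container runtime so that jobs sharing a node cannot touch each other's devices.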

Manage Containers More Easily Than Ever Before

Many schedulers can only start and stop containers. What happens to the jobs inside the containers at runtime is a black box.

When using Docker, Altair Grid Engine runs a pair of shepherds, one outside the container and one inside, to keep full control over those jobs. As a result, container jobs can be run in exactly the same way as non-container jobs.

Singularity Pro and Docker with Altair Grid Engine both support MPI in containers, but Singularity Pro was designed with MPI/HPC workloads specifically in mind.

Pacific Teck is also working with Sylabs to provide commercial support for Singularity in APAC.

Maximum Data Performance with BeeGFS and BeeOND

Altair Grid Engine and BeeGFS/BeeOND are integrated at TiTech (540 nodes with 4 P100 and 4 OPA HFIs per node) and ABCI (1088 nodes with 4 V100 and 2 EDR ports per node). Altair Grid Engine launches BeeOND and tells it how many NVMe devices to use, when to use them, and what to do after the job finishes. In effect, this is an on-demand burst buffer built from the NVMe drives in the compute nodes.
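The start, run, and clean-up sequence described above can be sketched as follows. This is not the integration used at TiTech or ABCI: the function name and all paths are hypothetical, and the `beeond` flags shown (`-n`, `-d`, `-c`, `-L`) are written from the BeeOND documentation as an assumption that should be checked against the installed version. The block only builds the command lines; it does not execute them.

```python
def burst_buffer_lifecycle(nodefile, storage_dir, mount_point, job_cmd):
    """Return the ordered command lines for one job using an on-demand BeeOND file system.

    nodefile    -- file listing the hosts allocated to the job (hypothetical path)
    storage_dir -- directory on each node's NVMe that BeeOND uses for storage
    mount_point -- where the temporary file system is mounted on each node
    job_cmd     -- the user's job as an argument list
    """
    # Assumed flags: -n node file, -d storage directory, -c client mount point.
    start = ["beeond", "start", "-n", nodefile, "-d", storage_dir, "-c", mount_point]
    # Assumed flags: -L delete log files, -d delete the stored data on shutdown.
    stop = ["beeond", "stop", "-n", nodefile, "-L", "-d"]
    return [start, list(job_cmd), stop]


steps = burst_buffer_lifecycle(
    nodefile="/tmp/job_nodes.txt",   # hypothetical
    storage_dir="/nvme/beeond",      # hypothetical
    mount_point="/mnt/beeond",       # hypothetical
    job_cmd=["./train.sh", "--scratch", "/mnt/beeond"],
)
for step in steps:
    print(" ".join(step))
```

In a scheduler integration, the first and last steps would typically run from per-job setup and teardown hooks, so that the file system exists only for the lifetime of the job.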

Altair Grid Engine can also limit users to a certain number of BeeOND (NVMe) resources.
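A per-user limit of this kind amounts to simple bookkeeping, sketched below. The class, its method names, and the limit of four devices are assumptions for illustration only; they are not Altair Grid Engine's quota configuration or API.

```python
from collections import Counter


class PerUserLimit:
    """Toy quota: each user may hold at most `max_per_user` NVMe devices at once."""

    def __init__(self, max_per_user):
        self.max_per_user = max_per_user
        self.used = Counter()  # user -> devices currently held

    def try_acquire(self, user, n):
        """Grant `n` devices if that keeps `user` within the limit."""
        if self.used[user] + n > self.max_per_user:
            return False  # the request would exceed the user's limit
        self.used[user] += n
        return True

    def release(self, user, n):
        """Give back `n` devices when a job finishes."""
        self.used[user] = max(0, self.used[user] - n)


quota = PerUserLimit(max_per_user=4)  # hypothetical limit
print(quota.try_acquire("alice", 3))  # within the limit
print(quota.try_acquire("alice", 2))  # would bring alice to 5, so it is refused
print(quota.try_acquire("bob", 4))    # limits are tracked per user
```

A refused request would normally leave the job waiting in the queue until the user's earlier jobs release their devices.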

Use cases

Machine Learning

Altair Grid Engine is used in the world’s largest machine learning supercomputers. In Japan, machine learning leaders AIST, RIKEN AIP, and TiTech use Altair Grid Engine to maximize utilization of NVIDIA GPU-heavy nodes.

Life Sciences

Altair Grid Engine is used by most major pharmaceutical companies in the world and is a dominant force in medical and genetic research labs. In Japan, some of the largest institutions use Altair Grid Engine, including Tohoku University Medical Megabank, the National Institute of Genetics, and the Human Genome Center at the University of Tokyo.

Enterprise Level Support

Pacific Teck has been Altair's official partner in Asia for over 10 years. Pacific Teck and Altair work together to provide best-in-class support and the most robust roadmap of any job scheduler available.

Case studies

RIKEN Center for Advanced Intelligence Project (AIP) – Japan
ABCI / National Institute of Advanced Industrial Science and Technology (AIST) – Japan
Kyoto University – Japan
Hirosaki University – Japan
TSUBAME3.0 / TiTech – Japan
Tohoku Medical Megabank Organization (ToMMo) / Tohoku University – Japan
Institute of Fluid Science, Tohoku University – Japan
The Institute of Medical Science (HGC), The University of Tokyo – Japan
National Institute of Genetics (NIG) – Japan
The National Agriculture and Food Research Organization (NARO) – Japan
DENSO CORPORATION – Japan
Honda Engineering – Japan
Kazusa DNA Research Institute – Japan
National Cancer Center – Japan
National Center for High-Performance Computing – Taiwan
A*STAR Genome Institute of Singapore (GIS) – Singapore