Altair Grid Engine is the flagship product from Altair Corporation and is often called a “workload manager” or “job scheduler.” This class of software enables “cluster computing”: using multiple computers (servers) at the same time to process work.
Altair Grid Engine is the leading distributed resource management system that optimizes resources in thousands of data centers by transparently selecting the resources that are best suited for each segment of work.
Pacific Teck has licensed and supported Altair Grid Engine at hundreds of sites across Asia Pacific.
Altair Grid Engine is the most advanced and best-supported job scheduler on the market. It is unparalleled in resource utilization, tight integration with container solutions, and exascale-level scalability, both in the size of the environment and in the number of jobs supported.
Altair Grid Engine has advanced capabilities for mapping resources inside nodes, including GPU, CPU, memory, and interconnect. Once mapped, these resources can be allocated on a per-job basis. In practice, this means that when a user needs only a portion of a node's resources for a job, they get exactly what they need; if they had to reserve the whole node, the remaining resources would sit idle, unavailable to others, and therefore wasted.
Altair has developed a feature called Resource Maps, which maintains a map of the location of resources such as GPUs and uses Linux cgroups in more advanced ways than other schedulers. The result is control over resources at a much finer level than other solutions: interconnect, GPU, NVMe, memory, and Intel CPU cores can be mapped and bound together inside a node. Instead of handing an entire resource-rich node to one user, the node can be split into exactly the resources a group, user, or job needs at a given time. This keeps the node fully utilized and ensures each job runs on the best CPU-GPU-memory-interconnect combination. Resources can also be bound to multiple containers, users, or jobs per node.
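As an illustrative sketch of how Resource Maps are typically used (the complex name `gpu`, the device ids, and the script names here are assumptions, not any site's actual configuration), a map is published per host and then requested per job:

```shell
# Declare an RSMAP-type complex (via qconf -mc), e.g.:
#   gpu   gpu   RSMAP   <=   YES   HOST   0   0
# Publish the concrete device ids on an execution host (qconf -me <host>):
#   complex_values   gpu=4(0 1 2 3)

# A job requests two GPUs; the scheduler grants specific device ids
# and can bind the matching cores and memory via cgroups:
qsub -l gpu=2 train.sh

# Inside the job, the granted ids are exposed through the environment
# (e.g. SGE_HGR_gpu), which the job script can map to CUDA_VISIBLE_DEVICES.
```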
Many schedulers only possess the ability to start and stop containers. What happens to the jobs inside the containers during runtime is a black box.
When using Docker, Altair Grid Engine uses a pair of shepherd processes, one outside and one inside the container, giving it full control over those jobs. As a result, container jobs can be run exactly the same as non-container jobs.
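A hedged sketch of how a containerized job might be submitted (the image name is a placeholder, and the option syntax is an assumption based on typical Grid Engine Docker integration):

```shell
# Request a Docker-capable host and a matching image; the job script
# then runs inside the container, managed by the inner/outer shepherd
# pair just like an ordinary batch job.
qsub -l docker,docker_images="*ubuntu:22.04*" job.sh
```

Because the inner shepherd runs alongside the workload, accounting, signalling, and resource limits apply inside the container as well.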
Singularity Pro and Docker with Altair Grid Engine both support MPI in containers, but Singularity Pro was designed with MPI/HPC workloads specifically in mind.
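As a sketch of the common hybrid MPI launch model (the image name, rank count, and binary path are hypothetical), the host's MPI launcher starts one container instance per rank:

```shell
# Host-side mpirun spawns the ranks; each rank executes inside the
# container image, so the MPI stack on the host drives the job.
mpirun -np 64 singularity exec app.sif /opt/app/bin/mpi_solver input.dat
```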
Pacific Teck is also working with Sylabs to provide commercial support for Singularity in APAC.
Altair Grid Engine and BeeGFS/BeeOND are integrated at TiTech (540 nodes, each with 4 P100 GPUs and 4 OPA HFIs) and ABCI (1,088 nodes, each with 4 V100 GPUs and 2 EDR ports). Altair Grid Engine kicks off BeeOND, telling it how many NVMe drives to use, when to use them, and what to do after the job finishes. In effect, this is an on-demand burst buffer built from the NVMe drives in the compute nodes.
Altair Grid Engine can also limit users to a certain number of BeeOND (NVMe) resources.
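Under the hood, this kind of integration can be sketched with BeeOND's own start/stop commands, typically driven from the scheduler's prolog and epilog scripts (the node file and paths below are hypothetical):

```shell
# Prolog: build a per-job BeeGFS from the NVMe of the allocated nodes.
# -n: list of nodes in the job, -d: data directory on local NVMe,
# -c: mountpoint the job will use.
beeond start -n /tmp/job_nodes -d /mnt/nvme/beeond -c /mnt/beeond

# ... the job reads and writes /mnt/beeond ...

# Epilog: tear down the file system and delete the per-job data.
beeond stop -n /tmp/job_nodes -L -d
```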
Altair Grid Engine is used in the world’s largest machine learning supercomputers. In Japan, machine learning leaders AIST, RIKEN AIP, and TiTech are using Altair Grid Engine to maximize utilization of NVIDIA GPU heavy nodes.
Altair Grid Engine is used by most major pharmaceutical companies in the world and is a dominant force in medical and genetic research labs. In Japan, some of the largest institutions, such as Tohoku University Medical Megabank, the National Institute of Genetics, and the Human Genome Center at the University of Tokyo, use Altair Grid Engine.
Pacific Teck has been Altair's official partner in Asia for over five years. Together, Pacific Teck and Altair provide best-in-class support and the most robust roadmap of any job scheduler available.
Altair Grid Engine also manages the sharing and use of limited, often costly, application license features across users, groups, departments, or projects.
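One common way license sharing is handled in Grid Engine is with a consumable complex per license feature (the feature name `abaqus_lic`, the pool size, and the script name are hypothetical):

```shell
# Define a consumable complex for the license feature (qconf -mc):
#   abaqus_lic   aql   INT   <=   YES   YES   0   0
# Set the site-wide token pool on the global host (qconf -me global):
#   complex_values   abaqus_lic=10

# Each job that needs a token requests one; the scheduler queues jobs
# whenever the pool is exhausted, so licenses are never oversubscribed:
qsub -l abaqus_lic=1 solve.sh
```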
Altair Grid Engine also makes it possible to migrate HPC workloads to the cloud.