HPC and Machine Learning Experts

Tokyo Institute of Technology TSUBAME3.0

Tokyo Institute of Technology Background

Tokyo Institute of Technology (Tokyo Tech, Tokodai or TITech) is a national research university located in the Greater Tokyo Area, Japan. Tokyo Tech is the largest institution for higher education in Japan dedicated to science and technology, and is generally considered to be one of the most prestigious universities in Japan.

Tokyo Tech’s main campus is located at Ōokayama on the boundary of Meguro and Ota, with its main entrance facing Ōokayama Station. Other campuses are located in Suzukakedai and Tamachi. Tokyo Tech is organised into 6 schools, within which there are over 40 departments and research centres. Tokyo Tech enrolled 4,734 undergraduates and 1,464 graduate students for 2015-2016. It employs around 1,100 faculty members.

Project Background

Tokyo Institute of Technology's TSUBAME3.0 uses Altair Grid Engine and BeeOND in an environment with 540 nodes. Each node has four NVIDIA Tesla P100 GPUs (2,160 total), two 14-core Intel Xeon E5-2680 v4 processors (15,120 cores total), four Intel Omni-Path Architecture (Intel OPA) 100 Series host fabric adapters (2,160 ports total), and 2 TB of NVMe storage from the Intel SSD DC Product Family.
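The system-wide totals quoted above follow directly from the per-node counts. The short Python sketch below reproduces that arithmetic as a sanity check. The constant and function names are illustrative only and do not come from any TSUBAME3.0 or Altair tooling.

```python
# Per-node figures as quoted in the hardware description above.
NODES = 540
GPUS_PER_NODE = 4
CPUS_PER_NODE = 2
CORES_PER_CPU = 14
OPA_ADAPTERS_PER_NODE = 4


def system_totals(nodes: int = NODES) -> dict[str, int]:
    """Multiply each per-node count by the number of nodes."""
    return {
        "gpus": nodes * GPUS_PER_NODE,
        "cpu_cores": nodes * CPUS_PER_NODE * CORES_PER_CPU,
        "opa_ports": nodes * OPA_ADAPTERS_PER_NODE,
    }


if __name__ == "__main__":
    for name, total in system_totals().items():
        print(f"{name}: {total:,}")
```

With the default of 540 nodes, the three totals match the figures given in the text: 2,160 GPUs, 15,120 CPU cores, and 2,160 Omni-Path ports.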

Use case

Altair Grid Engine allows TSUBAME3.0 to create resource groups from GPUs, CPUs, memory, and the Omni-Path interconnect. No other job scheduler has this capability today. Altair Grid Engine (formerly Univa Grid Engine, UGE) also creates and manages Docker containers dynamically. BeeOND creates a temporary, high-speed, burst-buffer-like scratch file system on the NVMe devices in the compute nodes, with a maximum capacity of 1 PB. The size of this on-demand file system is determined when Altair Grid Engine starts a job.
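To make the sizing idea concrete, here is a rough Python sketch of how the raw capacity of a per-job scratch file system could scale with the number of nodes a job receives. It assumes each node contributes its full 2 TB of NVMe; the real usable fraction is not stated in the source, so `usable_fraction` is a hypothetical parameter. This is only an estimate and does not represent how BeeOND or Altair Grid Engine actually compute the size.

```python
NVME_TB_PER_NODE = 2   # per-node NVMe capacity from the hardware description
TOTAL_NODES = 540      # number of compute nodes in the system


def scratch_capacity_tb(job_nodes: int, usable_fraction: float = 1.0) -> float:
    """Estimate raw on-demand scratch capacity (TB) for a job.

    usable_fraction is an assumed knob for overhead or reserved space;
    it is not a documented BeeOND or Grid Engine setting.
    """
    if not 1 <= job_nodes <= TOTAL_NODES:
        raise ValueError(f"job_nodes must be between 1 and {TOTAL_NODES}")
    if not 0 < usable_fraction <= 1:
        raise ValueError("usable_fraction must be in (0, 1]")
    return job_nodes * NVME_TB_PER_NODE * usable_fraction


if __name__ == "__main__":
    for n in (4, 64, TOTAL_NODES):
        print(f"{n} nodes: about {scratch_capacity_tb(n):,.0f} TB of scratch")
```

Under these assumptions, a job spanning all 540 nodes would see about 1,080 TB, which is consistent with the roughly 1 PB maximum quoted above.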

Pacific Teck’s role

Pacific Teck worked closely with the staff at the Tokyo Institute of Technology and the system integrator HPE (previously SGI) to understand the requirements for this challenging project and to provide some of the most advanced job scheduling capabilities in the world. Pacific Teck communicated the requirements of Tokyo Institute of Technology to the Altair development team. A new version, Altair Grid Engine 8.6, was then released, making the feature set developed for Tokyo Institute of Technology available to the general market. We believe this project will enable higher utilization rates of supercomputers across the globe.