BeeOND stands for BeeGFS on Demand. It is a complementary product to BeeGFS, but it can be used alongside other file systems as well. It is typically used to aggregate the performance and capacity of internal SSDs or hard disks in compute nodes for the duration of a compute job. This provides additional performance and an elegant way of burst buffering. It works by creating one or more BeeGFS instances on these compute nodes, which can be created and destroyed “on demand.”
Modern compute nodes are often rich in SSDs and NVMe drives that are underutilized. Grouping these unused high-speed resources together creates a space where users can process some or all of their data much faster than on the standard hard-disk-based file system.
The main advantages of the typical BeeOND use-case on compute nodes are:
BeeOND uses the NVMe drives and SSDs already in the system, even space on SSDs shared with the OS. Many competing burst buffer solutions require purchasing a new layer of expensive hardware, but BeeOND uses resources that already exist.
Pacific Teck is the official ThinkParQ Platinum Partner in Asia. Installation and support are offered entirely by Pacific Teck and through our partner SIs. We have experience with the largest sites in Asia, and work back to back with ThinkParQ for source-code-level fixes. We recommend pairing BeeOND with enterprise-supported Altair Grid Engine.
BeeOND offers a very easy way to remove I/O load and possibly nasty I/O patterns from your persistent global file system. Temporary data created during the job runtime never needs to be moved to your global persistent file system anyway. Even data that should be preserved after the job ends may be better stored on a BeeOND instance initially and then copied to the persistent global storage at the end, completely sequentially and in large chunks for maximum bandwidth.
Applications can complete faster because, with BeeOND, they can run on SSDs (or maybe even a RAM disk), whereas on your normal persistent global file system they might only be running on spinning disks. Combining the SSDs of multiple compute nodes not only delivers high bandwidth easily, it also yields a system that can handle very high IOPS.
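The stage-out step described above, copying results from the BeeOND instance to the persistent global storage sequentially and in large chunks, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `stage_out`, the 64 MiB chunk size, and the example paths are hypothetical and are not part of BeeOND. In practice a site may prefer a parallel copy tool.

```python
import shutil
from pathlib import Path

# Assumed chunk size for illustration; tune it to the global file system.
CHUNK_BYTES = 64 * 1024 * 1024  # 64 MiB


def stage_out(src_dir, dst_dir, chunk_bytes=CHUNK_BYTES):
    """Copy every regular file under src_dir to dst_dir, one file at a time.

    Each file is read and written sequentially in chunks of up to
    chunk_bytes. Returns the total number of bytes copied.
    """
    src_root = Path(src_dir)
    dst_root = Path(dst_dir)
    total = 0
    for src in sorted(p for p in src_root.rglob("*") if p.is_file()):
        dst = dst_root / src.relative_to(src_root)
        dst.parent.mkdir(parents=True, exist_ok=True)
        with open(src, "rb") as fin, open(dst, "wb") as fout:
            # copyfileobj streams the file using buffers of chunk_bytes.
            shutil.copyfileobj(fin, fout, chunk_bytes)
        total += src.stat().st_size
    return total


if __name__ == "__main__":
    # Hypothetical paths: a BeeOND mount point and a results directory
    # on the persistent global file system.
    copied = stage_out("/mnt/beeond/results", "/global/project/results")
    print(f"staged out {copied} bytes")
```

Temporary files that do not need to be preserved are simply left out of the source directory; they disappear when the BeeOND instance is destroyed.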
Machine learning environments are often rich in NVMe resources for BeeOND to utilize. In Japan, Altair Grid Engine and BeeGFS/BeeOND are integrated at TiTech (540 nodes with 4 P100 and 4 OPA HFIs per node) and ABCI (1088 nodes with 4 V100 and 2 EDR ports per node). Altair Grid Engine kicks off BeeOND and tells it how many NVMe drives to use, when to use them, and what to do after the job finishes. This is truly an on-demand burst buffer using the NVMe drives contained in compute nodes.
BeeOND helps scientific clusters around the world achieve burst-buffer-level performance increases without breaking the budget. Together with Altair Grid Engine, this rich resource built on existing hardware can be shared fairly among groups and users.
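As a rough illustration of what a scheduler prolog and epilog might do, the sketch below builds the command lines that start a BeeOND instance before a job and stop it afterwards. It is not the integration used at the sites above. The flags (`-n` node file, `-d` storage directory, `-c` client mount point, `-L` and `-d` on stop to delete logs and data) follow the usage documented for the `beeond` script, but they should be checked against the installed version. The helper names and all paths are hypothetical.

```python
import shlex


def beeond_start_cmd(nodefile, data_dir, mount_point):
    """Command a job prolog could run to create a BeeOND instance.

    nodefile lists the job's compute nodes, data_dir is the local
    SSD/NVMe directory on each node, mount_point is where the
    instance is mounted on each node.
    """
    return ["beeond", "start", "-n", nodefile, "-d", data_dir, "-c", mount_point]


def beeond_stop_cmd(nodefile, delete_data=True):
    """Command a job epilog could run to tear the instance down."""
    cmd = ["beeond", "stop", "-n", nodefile, "-L"]
    if delete_data:
        # Also remove the data stored on the local drives (assumed default).
        cmd.append("-d")
    return cmd


if __name__ == "__main__":
    # Hypothetical paths; a real scheduler would supply the node file.
    start = beeond_start_cmd("/tmp/job123.nodes", "/data/beeond", "/mnt/beeond")
    stop = beeond_stop_cmd("/tmp/job123.nodes")
    # Print only; a real prolog/epilog would pass the lists to subprocess.run().
    print(shlex.join(start))
    print(shlex.join(stop))
```

Keeping the commands as argument lists rather than shell strings means they can be handed to `subprocess.run` without extra quoting.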
We can customize it to your needs. Please feel free to contact us regarding system configuration.