Intel’s DAOS tops the IO500 speed challenge for HPC storage

Intel’s Distributed Application Object Storage (DAOS) object storage system – now open-sourced, but developed to make use of its super-fast Optane storage – came out on top this summer in the IO500, which ranks combinations of high-performance computing (HPC) hardware, software and file systems against one another.

DAOS arrived in 2015. It is an object storage system designed around Intel’s 3D XPoint Optane technology, which offers access speeds approaching those of RAM while retaining data persistently.

Intel wants DAOS to be a successor to the Lustre file system, which it previously held rights to but sold to DDN in 2018. In the November 2019 IO500, Lustre came third with a score of 453.68 while DAOS was second with 933.64, just behind winner WekaIO with its Matrix system.

Two tiers of Optane plus SSD

The IO500 benchmark measures data throughput and the speed of file access between a randomly chosen compute node and a storage node. In every case, files are spread across several storage nodes and are accessible from many compute nodes.
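To make scores such as 453.68 and 933.64 more concrete: as we understand the published IO500 rules, a run produces bandwidth results (in GiB/s, from the IOR phases) and metadata results (in thousands of operations per second, from the mdtest and find phases). The bandwidth score and the metadata score are each the geometric mean of their phase results, and the overall score is the geometric mean of those two. The sketch below reproduces only that arithmetic; the function name and the phase values are hypothetical and do not come from any real submission.

```python
import math
import statistics
from typing import Sequence


def io500_score(bandwidth_gib_s: Sequence[float], metadata_kiops: Sequence[float]) -> float:
    """Hypothetical helper: combine IO500-style phase results into one score.

    Assumes the scoring rule is a geometric mean of geometric means:
    score = sqrt(geomean(bandwidth phases) * geomean(metadata phases)).
    """
    bw_score = statistics.geometric_mean(bandwidth_gib_s)  # GiB/s
    md_score = statistics.geometric_mean(metadata_kiops)   # kIOPS
    return math.sqrt(bw_score * md_score)


# Hypothetical phase results, NOT taken from any real IO500 list.
bandwidth = [100.0, 400.0]  # geometric mean = 200
metadata = [900.0, 100.0]   # geometric mean = 300
print(round(io500_score(bandwidth, metadata), 2))  # sqrt(200 * 300), about 244.95
```

Because every term is combined by a geometric mean, a system cannot win on raw bandwidth alone: a weak metadata result drags the overall score down multiplicatively.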

Intel’s DAOS-based configuration equipped its storage nodes with Optane DC persistent memory modules (DCPMM), M.2-connected Optane SSDs and 3D NAND SSDs. On the compute side, DAOS handled requests via POSIX, HDFS, Apache Spark and MPI-IO (the parallel I/O interface of the MPI standard, widely used in HPC).

Beyond these, DAOS also supports VeloC (an HPC checkpointing library), TensorFlow, NoSQL and S3, the latter interfaces bringing it closer to more conventional business and analytics applications.

Object storage and metadata store

DAOS is an open-source object storage system with an accompanying metadata store. It runs in Linux user space, so it bypasses the bottleneck of the kernel I/O stack and can be updated without restarting the servers it runs on.

Metadata is stored in Optane DCPMM, which also acts as a cache for the SSD tier. To maximise access speed, DAOS provides no replication or erasure coding for data once it has been written.
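The arrangement described here, with a small fast persistent tier holding all metadata and caching data in front of a larger SSD tier, can be illustrated with a toy model. This is a minimal sketch of a generic LRU write-back cache, not DAOS’s actual placement logic; the class name `TieredStore`, the capacity counted in objects and the eviction policy are all assumptions made for illustration.

```python
from collections import OrderedDict


class TieredStore:
    """Hypothetical two-tier store: a fast tier (standing in for Optane
    persistent memory) keeps all metadata and caches recently used objects
    in front of a slow tier (standing in for NAND SSD).
    """

    def __init__(self, fast_capacity: int) -> None:
        if fast_capacity < 1:
            raise ValueError("fast tier must hold at least one object")
        self.metadata = {}          # key -> object size; always kept in the fast tier
        self.fast = OrderedDict()   # LRU-ordered object cache (oldest first)
        self.slow = {}              # backing tier
        self.fast_capacity = fast_capacity

    def put(self, key: str, data: bytes) -> None:
        # Writes land in the fast tier first; metadata is updated immediately.
        self.metadata[key] = len(data)
        self.fast[key] = data
        self.fast.move_to_end(key)  # mark as most recently used
        self._evict()

    def get(self, key: str) -> bytes:
        if key in self.fast:        # cache hit
            self.fast.move_to_end(key)
            return self.fast[key]
        data = self.slow[key]       # cache miss: read from the slow tier
        self.fast[key] = data       # promote into the fast tier
        self._evict()
        return data

    def _evict(self) -> None:
        # Write back least recently used objects until the fast tier fits.
        while len(self.fast) > self.fast_capacity:
            old_key, old_data = self.fast.popitem(last=False)
            self.slow[old_key] = old_data


store = TieredStore(fast_capacity=2)
store.put("a", b"1")
store.put("b", b"2")
store.put("c", b"3")      # "a" is written back to the slow tier
print(sorted(store.fast), sorted(store.slow))  # ['b', 'c'] ['a']
```

Keeping metadata entirely in the fast tier means lookups never touch the SSDs; only object data moves between tiers.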

Instead, it aims to withstand incidents by writing the same data to several nodes at once. According to industry observers, DAOS is perhaps the heaviest consumer of hardware resources, and therefore the most expensive, of all cluster-based file storage solutions.

Note that, for now, Intel has not written drivers for the various types of storage access available on Linux. DAOS does not handle POSIX requests, which many HPC applications need, natively. Instead, it relies on the Lustre drivers through Lustre’s foreign layout feature, which allows an entry in the Lustre namespace to be backed by storage that Lustre does not manage.

HPC on the rise

HPC used to mean mostly large batch-processed computational workloads run on mainframes and supercomputers, often for scientific research, weather prediction, oil and gas prospecting and so on.

But the growing use of analytics across a wide variety of sectors has pushed HPC into the mainstream, where analytics are used alongside financial trading and commercial web activities.

The rise of artificial intelligence and machine learning in data analytics, which require large amounts of processing power, has further accelerated the growth of HPC.
