Compute Cluster

The CM department's demand for high-performance computing is met by its own compute cluster. Since 2014, the cluster has been located at and operated by the Max Planck Computing and Data Facility in Garching (near Munich), which offers not only a high level of expertise and excellent support, but also energy-efficient cooling via groundwater. The computational resources are reserved exclusively for the department and the Atomistic Modelling group of the GO department.

The cluster design is optimized for medium-scale plane-wave density-functional theory calculations (20-400 cores, compute bound), which account for the major part of its actual use. In particular, the compute nodes are connected in an imbalanced tree topology with a high blocking factor: up to 44 nodes are attached to a single leaf switch. Within such a network island, the nodes can communicate with each other at low latency, at the expense of bandwidth for communication between islands and with the fileserver. Consequently, every calculation is forced to run on a single island (max. 1760 cores).
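The single-island limit follows directly from the numbers above: 44 nodes per leaf switch times 40 cores per node. A minimal sketch of that arithmetic, assuming a job is always given whole nodes; the function names are hypothetical and are not part of any scheduler on the cluster:

```python
import math

CORES_PER_NODE = 40         # both cluster parts use 40-core nodes
MAX_NODES_PER_ISLAND = 44   # nodes attached to a single leaf switch

# Largest job that still fits on one island
ISLAND_CORES = CORES_PER_NODE * MAX_NODES_PER_ISLAND


def nodes_needed(cores: int) -> int:
    """Number of whole nodes required for the requested core count."""
    return math.ceil(cores / CORES_PER_NODE)


def fits_on_one_island(cores: int) -> bool:
    """True if the request can be placed within a single network island."""
    return nodes_needed(cores) <= MAX_NODES_PER_ISLAND


if __name__ == "__main__":
    for request in (20, 400, 1760, 1761):
        print(request, nodes_needed(request), fits_on_one_island(request))
```

A typical 400-core calculation needs 10 nodes and fits comfortably, while any request above 1760 cores would have to span two islands.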

The cluster is heavily used and typically loaded to more than 90% throughout the year. A unique advantage of using our own cluster is that we can adapt the usage rules to the scientific demands. For data, a 420 TB fileserver is available. The cluster has two parts, each of which is replaced after 6 years on an alternating schedule, so that half of the cluster is renewed every 3 years. At present, we have

  • cmfe (since 2016): 222 compute nodes, each with 40 CPU cores (namely 2x 20-core Intel "Broadwell" E5-2698 v4 @ 2.2 GHz) and 128 GB of RAM, connected via 56 Gbit/s InfiniBand (1:8 blocking); Linpack peak performance ~150 Tflop/s
  • cmti (since 2020): 358 compute nodes, each with 40 CPU cores (namely 2x 20-core Intel "Skylake" Xeon Gold 6230 @ 2.1 GHz) and 192 GB of RAM, connected via 100 Gbit/s Omni-Path (1:12 blocking); Linpack performance ~1.8 Tflop/s per node when run within islands (~640 Tflop/s in total)

The next cluster upgrade is planned for 2024. In total, 23200 CPU cores are available, delivering ~16 million core-hours per month. The total amount of RAM is 97 TB, but this number is about as meaningful as the total number of pages in a library or the total weight of fruit available in a grocery store.
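The totals quoted here can be reproduced from the per-partition figures in the list above. The following sketch does so, assuming a 30-day month and 100% availability for the core-hour estimate (both are simplifications, not stated on this page):

```python
# Per-partition figures as listed above
PARTITIONS = {
    "cmfe": {"nodes": 222, "cores_per_node": 40, "ram_gb_per_node": 128},
    "cmti": {"nodes": 358, "cores_per_node": 40, "ram_gb_per_node": 192},
}

HOURS_PER_MONTH = 24 * 30  # assumption: 30-day month, no downtime

total_cores = sum(p["nodes"] * p["cores_per_node"] for p in PARTITIONS.values())
total_ram_gb = sum(p["nodes"] * p["ram_gb_per_node"] for p in PARTITIONS.values())
core_hours_per_month = total_cores * HOURS_PER_MONTH

if __name__ == "__main__":
    print(f"cores:            {total_cores}")
    print(f"RAM (GB):         {total_ram_gb}")
    print(f"core-hours/month: {core_hours_per_month}")
```

Under these assumptions the sketch gives 23200 cores, 97152 GB of RAM (about 97 TB) and roughly 16.7 million core-hours per month, in line with the rounded figures in the text.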

In addition, the department has access to the specialized machine-learning cluster TALOS, as well as to the large-scale computing facility of the Max Planck Society.
