Published: Monday, 03 May 2021 09:04
New HPC cluster taken into full operation
The new HPC cluster at the Center for Information Services and High Performance Computing (ZIH) at TU Dresden has been taken into full operation. The cluster from NEC Deutschland GmbH will be used by our center for applications in the field of Artificial Intelligence (AI).
TU Dresden is one of eleven German Universities of Excellence, a title that confirms the potential of one of Germany’s largest technical universities. TU Dresden offers a broad spectrum of study programmes, uniting the natural and engineering sciences with the humanities and social sciences, as well as medicine.
The new HPC cluster is designed specifically for machine learning and was financed by the German Federal Ministry of Education and Research (BMBF) for the competence centre ScaDS.AI Dresden/Leipzig. After a Europe-wide tender, NEC Deutschland GmbH won the bid and installed the system in December 2020.
At the heart of the system, and essential for its computing power, are a total of 272 NVIDIA A100 GPUs, eight in each of the 34 compute nodes. Their combined theoretical peak floating-point performance is more than 2.6 PFlop/s at 64-bit (double precision), more than 5.3 PFlop/s at 32-bit (single precision), and more than 42 PFlop/s in TF32-to-FP32 Tensor operations. This is expected to make the system fast enough for an entry in the upcoming Top500 list in June 2021.
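The aggregate figures above follow from simple multiplication, which the short Python sketch below reproduces. The node and GPU counts come from this announcement. The per-GPU peak rates (9.7, 19.5 and 156 TFlop/s) are not stated here; they are assumptions taken from NVIDIA's public A100 datasheet, and the function and variable names are purely illustrative.

```python
# Sketch: reproduce the cluster's theoretical peak performance.
# Node and GPU counts are from the announcement; the per-GPU peak
# rates are ASSUMPTIONS taken from NVIDIA's public A100 datasheet.

NODES = 34
GPUS_PER_NODE = 8

# Assumed per-GPU theoretical peak in TFlop/s (dense, without sparsity).
PEAK_TFLOPS_PER_GPU = {
    "FP64 (double precision)": 9.7,
    "FP32 (single precision)": 19.5,
    "TF32 Tensor operations": 156.0,
}


def total_gpus(nodes=NODES, gpus_per_node=GPUS_PER_NODE):
    """Number of GPUs in the whole cluster."""
    return nodes * gpus_per_node


def cluster_peak_pflops(tflops_per_gpu, gpus=None):
    """Aggregate peak in PFlop/s for a given per-GPU rate in TFlop/s."""
    if gpus is None:
        gpus = total_gpus()
    return tflops_per_gpu * gpus / 1000.0


if __name__ == "__main__":
    print(f"GPUs in total: {total_gpus()}")
    for name, tflops in PEAK_TFLOPS_PER_GPU.items():
        print(f"{name}: {cluster_peak_pflops(tflops):.2f} PFlop/s")
```

These are theoretical upper bounds; the sustained performance measured by a benchmark such as the one used for the Top500 list is lower.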
Each node also features 1 TB of main memory and 3.2 TB of local NVMe cache to feed data to the GPUs quickly. Fast connectivity to the central HPC storage complex is provided by two HDR InfiniBand ports per node, with a combined network bandwidth of 400 Gbps at very low latency. The maximum power consumption of a node is 4.8 kW. Direct hot water cooling (DLC) ensures high energy efficiency while utilising the waste heat.
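To put the per-node figures in perspective, the following sketch scales them to all 34 compute nodes. The per-node values are from this announcement. The rate of 200 Gbps per HDR port is an assumption based on the standard HDR InfiniBand link rate, and the totals cover the compute nodes only, not storage, networking or cooling equipment.

```python
# Sketch: scale the per-node figures to the whole cluster.
# Per-node values are from the announcement; the 200 Gbps per HDR port
# is an ASSUMPTION based on the standard HDR InfiniBand link rate.

NODES = 34
MEMORY_TB_PER_NODE = 1.0
NVME_TB_PER_NODE = 3.2
MAX_POWER_KW_PER_NODE = 4.8
HDR_PORTS_PER_NODE = 2
HDR_GBPS_PER_PORT = 200  # assumed link rate per port


def aggregate(per_node_value, nodes=NODES):
    """Scale a per-node quantity to all compute nodes."""
    return per_node_value * nodes


def node_bandwidth_gbps(ports=HDR_PORTS_PER_NODE, gbps_per_port=HDR_GBPS_PER_PORT):
    """Combined network bandwidth of one node in Gbps."""
    return ports * gbps_per_port


if __name__ == "__main__":
    print(f"Main memory:        {aggregate(MEMORY_TB_PER_NODE):.1f} TB")
    print(f"Local NVMe cache:   {aggregate(NVME_TB_PER_NODE):.1f} TB")
    print(f"Max. node power:    {aggregate(MAX_POWER_KW_PER_NODE):.1f} kW")
    print(f"Bandwidth per node: {node_bandwidth_gbps()} Gbps")
```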
The new computing cluster will be integrated into the existing HPC infrastructure of the ZIH. As an HPC competence centre, ZIH offers specialized computing resources as well as individual support and consulting for its users. The system will primarily be available for the AI research of the competence centre ScaDS.AI Dresden/Leipzig. Highly parallel applications that use AI methods for fast data analysis will benefit from this efficient system, which will advance both model development and the expressiveness of analyses.
“The new Machine Learning solution from NEC provides us with a new level of High Performance Computing power for our AI research. The most important reasons for our decision have been the excellent computational capacity for the given budget, as well as a very convincing cooling concept. We are very impressed by the new system as well as by NEC’s excellence in providing service and support capabilities, and their expertise in AI,” explains Professor Dr. Wolfgang Nagel, Director of the Center for Information Services and High Performance Computing.
“TU Dresden has an excellent reputation in research and ZIH is a very important HPC data centre in Germany. Therefore, we feel very honoured that NEC was given the task to deliver a new Petaflop system for their A.I. research,” Yuichi Kojima, Managing Director of NEC Deutschland GmbH and Vice President HPC at NEC Europe, adds.
ZIH is the university IT centre of TU Dresden and has been the High Performance Computing (HPC) competence centre for TU Dresden and the state of Saxony for over 20 years. Since the beginning of 2021, ZIH has been one of eight NHR centres in the initiative “Nationales Hochleistungsrechnen” of the national Joint Science Conference (Gemeinsame Wissenschaftskonferenz, GWK) and the German Research Foundation (Deutsche Forschungsgemeinschaft, DFG), and thus offers its services to academic users from all over Germany.
About NEC Deutschland GmbH
NEC Deutschland GmbH is a wholly owned subsidiary of NEC Europe Ltd. and is a leading provider of HPC solutions, focusing on sustained performance for real-life scientific and engineering applications. To achieve this goal, NEC delivers technology and professional services to industry and academia. Linux-based HPC clusters as well as our high-end vector systems meet the different needs of different customers in the most flexible way. Energy efficiency is one of the key design objectives, addressed by advanced cooling technologies or by the high-bandwidth vector architecture, which delivers unprecedented efficiency on real-world code. Service capabilities, ranging from the operation of complex systems to the optimization of scientific codes, and NEC's storage appliances complete our solution offering.