PerficienCC: Performance and Efficiency in HPC with Custom Computing

Overview

To improve the energy efficiency of HPC systems, they are increasingly augmented with hardware accelerators. In practice, however, applications often fall short of the performance and efficiency that these accelerators could fundamentally deliver. In the PerficienCC project we work on closing this gap in cooperation with computational scientists who use the HPC services of the Paderborn Center for Parallel Computing. We focus on application-specific hardware accelerators based on FPGAs. Through close cooperation between the FPGA experts in our research group and the developers of scientific codes at Paderborn University, we will study the potential to accelerate important applications with FPGAs and port open-source scientific software to FPGAs. Additionally, we will provide generalized code with FPGA support to the community as libraries and create training material to educate computational scientists in this area. Our project aims to make FPGA technology more accessible and to provide an empirical evaluation of the benefit of FPGAs for HPC and data center applications.

Key Facts

Project type: Research
Project duration: 01/2016 - 05/2020
Contribution to sustainability: Industry, Innovation and Infrastructure; Responsible Consumption and Production
Funded by: DFG
Websites: Homepage; Current research projects of the High-Performance IT Systems group; PerficienCC - Performance and Efficiency in HPC with Custom Computing

More Information

Principal Investigators


Prof. Dr. Christian Plessl

High-Performance Computing


Project Team


Dr. Tobias Kenter

High-Performance Computing


Dr. Michael Laß

Paderborn Center for Parallel Computing (PC2)


Results

In view of the continuing increase in the computing time required by the computational sciences, sustainable improvements in the performance and energy efficiency of existing and newly developed simulation codes are necessary. Innovative accelerator architectures such as FPGAs can make an important contribution to this, as can new methods, but both have been difficult for users to access. Through close collaboration between code developers from the computational sciences and HPC experts focused on accelerator architectures, the project was able to bridge this gap in several areas. On the application side, two important simulation methods were improved, both of which account for large shares of the computing time on the HPC systems at the Paderborn Center for Parallel Computing (PC2): first, ab initio molecular dynamics (AIMD) with the open-source code CP2K, and second, the solution of Maxwell's equations in the time domain for applications in nanophotonics.

For the simulations based on Maxwell's equations, a special focus was placed on energy-efficient execution with FPGAs. For the stencil-based finite-difference time-domain (FDTD) method, it was shown that FPGAs from different manufacturers, Intel and Xilinx, can be used with a common code base. For a discontinuous Galerkin (DG) method on unstructured grids, what is arguably the first FPGA implementation was demonstrated, showing performance and efficiency gains over the CPU reference. These good results, for a method whose performance potential is difficult to exploit on conventional architectures, form the basis for further research on abstractions and on transfer to other application areas in which partial differential equations are solved with the finite element method.
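To illustrate what "stencil-based" means for the FDTD method mentioned above, here is a minimal one-dimensional sketch in Python. It is not the project's code (the published designs target FPGAs via OpenCL); the function name `fdtd_1d`, the normalized units, the Gaussian source, and all parameter values are assumptions chosen for illustration only.

```python
import math


def fdtd_1d(num_cells=201, num_steps=150, courant=0.5, source_cell=100):
    """Advance a 1D FDTD (Yee) grid in normalized units and return the E field.

    Hypothetical illustration: fields are scaled so that the wave impedance
    is 1, and the grid ends act as perfect electric conductors (E stays 0).
    """
    ez = [0.0] * num_cells        # electric field at integer grid points
    hy = [0.0] * (num_cells - 1)  # magnetic field at half-integer grid points

    for step in range(num_steps):
        # Stencil 1: each H value is updated from its two neighboring E values.
        for i in range(num_cells - 1):
            hy[i] += courant * (ez[i + 1] - ez[i])

        # Stencil 2: each interior E value is updated from its two neighboring
        # H values; the boundary cells are never touched and remain zero.
        for i in range(1, num_cells - 1):
            ez[i] += courant * (hy[i] - hy[i - 1])

        # Additive ("soft") source: a Gaussian pulse in time at one cell.
        ez[source_cell] += math.exp(-(((step - 30) / 10.0) ** 2))

    return ez


if __name__ == "__main__":
    field = fdtd_1d()
    print(max(abs(v) for v in field))
```

Each update reads only a fixed, small neighborhood of grid points, which is the regular access pattern that makes stencil codes a natural fit for deeply pipelined hardware.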
For AIMD with electronic structure calculations in CP2K, major advances in performance and efficiency have been achieved across multiple levels of abstraction, which together enable entirely new scales of simulated problems. The methodological core of the work is a new approximation method, named the submatrix method, for the operations on sparse matrices that account for the largest fraction of the computation time in AIMD simulations with CP2K. The numerical errors introduced by the approximation can be compensated in an innovative way. Further efficiency gains were achieved by offloading local matrix operations to FPGAs. These innovations provided essential foundations for a recently published record simulation of up to 100 million atoms with electronic-structure-based AIMD, in which the same approach was used to offload computations with reduced accuracy to GPU tensor cores.
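As a rough illustration of the idea behind the submatrix method, the following Python sketch applies it to matrix inversion. For each column of a sparse matrix, the rows with nonzero entries in that column select a small dense principal submatrix; the matrix function is evaluated on that submatrix only, and the matching column of the local result is copied into the approximate global result. The choice of inversion as the matrix function, the dense list-of-lists storage, and the helper names `solve_dense` and `submatrix_inverse` are assumptions made for this example; it is a simplified sketch, not the CP2K implementation.

```python
def solve_dense(a, b):
    """Solve the dense system a x = b by Gaussian elimination with pivoting."""
    n = len(a)
    # Work on an augmented copy so the inputs stay untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0:
            raise ValueError("singular submatrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def submatrix_inverse(a):
    """Approximate the inverse of a sparse square matrix column by column."""
    n = len(a)
    result = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Rows with a nonzero in column j (plus j itself) define the submatrix.
        idx = sorted({i for i in range(n) if a[i][j] != 0} | {j})
        sub = [[a[r][c] for c in idx] for r in idx]
        # Only the column of the local inverse that belongs to j is needed,
        # so solve sub * x = e_local instead of inverting the whole submatrix.
        local = idx.index(j)
        rhs = [1.0 if k == local else 0.0 for k in range(len(idx))]
        x = solve_dense(sub, rhs)
        # Scatter the local column back into column j of the global result.
        for k, i in enumerate(idx):
            result[i][j] = x[k]
    return result


if __name__ == "__main__":
    tridiagonal = [[4, 1, 0, 0], [1, 4, 1, 0], [0, 1, 4, 1], [0, 0, 1, 4]]
    for row in submatrix_inverse(tridiagonal):
        print(row)
```

For a block-diagonal input this sketch reproduces the exact inverse, because each submatrix coincides with a full diagonal block; for general sparse matrices the result is only an approximation. Every column is processed independently of the others, so the outer loop can be distributed across many workers.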


Much of the code developed in PerficienCC has already been integrated back into the public CP2K code. Further software releases support the sustainability of the achieved results. In particular, components such as matrix multiplications and 3D FFTs, which were initially developed to accelerate CP2K but are more broadly applicable, were extracted as FPGA libraries. In addition, tutorials on accelerating applications on FPGAs with vendor tools for high-level synthesis have been conducted at the Paderborn site and at international conferences. The project results contribute significantly to the strategic development of PC2 into a center of excellence for FPGA acceleration of scientific simulation and into a computing center in the NHR network, which has received coverage in the international trade press as well as in regional media.


Selected Publications:


"Flexible FPGA design for FDTD using OpenCL". In: Proc. Int. Conf. on Field Programmable Logic and Applications (FPL). IEEE, 2017, pp. 1–7. ISBN: 978-9-0903-0428-1

T. Kenter, J. Förstner and C. Plessl

(See online at https://doi.org/10.23919/FPL.2017.8056844)


"A Massively Parallel Algorithm for the Approximate Calculation of Inverse P-Th Roots of Large Sparse Matrices". In: Proc. Platform for Advanced Scientific Computing Conf. (PASC). PASC ’18. Basel, Switzerland: Association for Computing Machinery, 2018. ISBN: 9781450358910

M. Lass, S. Mohr, H. Wiebeler, T. D. Kühne and C. Plessl

(See online at https://doi.org/10.1145/3218176.3218231)


"OpenCL-based FPGA Design to Accelerate the Nodal Discontinuous Galerkin Method for Unstructured Meshes". In: Proc. IEEE Symp. on Field-Programmable Custom Computing Machines (FCCM). IEEE, 2018

T. Kenter, G. Mahale, S. Alhaddad, Y. Grynko, C. Schmitt, A. Afzal, F. Hannig, J. Förstner et al.

(See online at https://doi.org/10.1109/FCCM.2018.00037)


"OpenCL Implementation of Cannon’s Matrix Multiplication Algorithm on Intel Stratix 10 FPGAs". In: 2019 International Conference on Field-Programmable Technology (ICFPT). Dec. 2019, pp. 99–107

P. Gorlani, T. Kenter and C. Plessl

(See online at https://doi.org/10.1109/ICFPT47387.2019.00020)


"A Submatrix-Based Method for Approximate Matrix Function Evaluation in the Quantum Chemistry Code CP2K". In: Proc. Int. Conf. on High Performance Computing, Networking, Storage and Analysis (SC). SC ’20. Atlanta, Georgia: IEEE Press, 2020. ISBN: 9781728199986

M. Lass, R. Schade, T. D. Kühne and C. Plessl

(See online at https://doi.org/10.1109/SC41405.2020.00084)


"CP2K: An electronic structure and molecular dynamics software package - Quickstep: Efficient and accurate electronic structure calculations". In: The Journal of Chemical Physics 152.19, 194103 (2020)

T. Kühne, M. Iannuzzi, M. D. Ben, V. V. Rybkin, P. Seewald, F. Stein, T. Laino, R. Z. Khaliullin et al.

(See online at https://doi.org/10.1063/5.0007045)


Publications

CP2K: An electronic structure and molecular dynamics software package - Quickstep: Efficient and accurate electronic structure calculations
T. Kühne, M. Iannuzzi, M.D. Ben, V.V. Rybkin, P. Seewald, F. Stein, T. Laino, R.Z. Khaliullin, O. Schütt, F. Schiffmann, D. Golze, J. Wilhelm, S. Chulkov, M.H. Bani-Hashemian, V. Weber, U. Borstnik, M. Taillefumier, A.S. Jakobovits, A. Lazzaro, H. Pabst, T. Müller, R. Schade, M. Guidon, S. Andermatt, N. Holmberg, G.K. Schenter, A. Hehn, A. Bussy, F. Belleflamme, G. Tabacchi, A. Glöß, M. Lass, I. Bethune, C.J. Mundy, C. Plessl, M. Watkins, J. VandeVondele, M. Krack, J. Hutter, The Journal of Chemical Physics 152 (2020).
A Submatrix-Based Method for Approximate Matrix Function Evaluation in the Quantum Chemistry Code CP2K
M. Lass, R. Schade, T. Kühne, C. Plessl, in: Proc. International Conference for High Performance Computing, Networking, Storage and Analysis (SC), IEEE Computer Society, Los Alamitos, CA, USA, 2020, pp. 1127–1140.
Accurate Sampling with Noisy Forces from Approximate Computing
V. Rengaraj, M. Lass, C. Plessl, T. Kühne, Computation 8 (2020).
OpenCL Implementation of Cannon's Matrix Multiplication Algorithm on Intel Stratix 10 FPGAs
P. Gorlani, T. Kenter, C. Plessl, in: Proceedings of the International Conference on Field-Programmable Technology (FPT), IEEE, 2019.
A General Algorithm to Calculate the Inverse Principal p-th Root of Symmetric Positive Definite Matrices
D. Richters, M. Lass, A. Walther, C. Plessl, T. Kühne, Communications in Computational Physics 25 (2019) 564–585.