8–12 Sept 2025
ASTRON Netherlands Institute for Radio Astronomy
Europe/Amsterdam timezone

High Performance GPU building blocks for radio astronomy and beyond

9 Sept 2025, 08:40
1h 15m
Auditorium + ISZoomRoom2 (ASTRON)

Oude Hoogeveensedijk 4 7991 PD Dwingeloo

Speakers

Dr Bram Veenboer (ASTRON), John W. Romein (ASTRON (Netherlands Institute for Radio Astronomy))

Description

Modern GPUs with tensor cores deliver exceptional throughput and energy efficiency for digital signal processing. Thanks to hardware and software innovations, the computational performance of GPU correlators has improved by two orders of magnitude over the past decade. However, to take full advantage of this increasing computing power, GPU systems have to handle Ethernet packets at proportionally increasing data rates. This is a challenge, as I/O has traditionally been the GPU's Achilles heel.

In this session, we will present "RADIOBLOCKS": a collection of reusable GPU building blocks for radio telescopes that provide high performance and high I/O data rates. We will focus on the following radio blocks:
- the Tensor-Core Correlator (TCC): an efficient GPU library for computing correlations, using tensor cores for the complex matrix operations. We include a brief update on new features and improved performance.
- the Tensor-Core Beam Former (TCBF): a new GPU beam forming library for radio astronomy and medical ultrasound imaging, built on the ccglib library for fast complex matrix multiplications.
- a new GPU filter library that implements a polyphase filter bank with fine delay compensation, bandpass correction, and a transpose for seamless integration with the TCC and TCBF. It uses the new cuFFTDx library to embed FFTs in a (large) GPU kernel, significantly reducing memory bandwidth use and improving performance compared to a cuFFT-based filter.
- high-speed Ethernet packet handling: we demonstrate a method, based on the Data Plane Development Kit and its GPUdev extension, with which a Grace Hopper GPU receives, filters, and correlates 1.2 Tb/s of Ethernet packets.

For each building block we will present measured performance and energy efficiency, showing how modern GPUs can address both computational and I/O challenges for future high data rate instruments.
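
As a rough illustration of why correlation maps onto complex matrix operations, the sketch below computes a visibility matrix as the product of a receiver-by-time sample matrix with its own conjugate transpose. It is a plain-Python reference under simplifying assumptions (one frequency channel, one polarization, no integration over multiple blocks); the function name `correlate` and the data layout are hypothetical and are not the TCC's API.

```python
# Illustrative reference only: NOT the TCC API. The function name and the
# data layout are assumptions made for this sketch.

def correlate(samples):
    """Return V with V[i][j] = sum over t of x_i[t] * conj(x_j[t]).

    `samples` holds one list of complex voltages per receiver; all lists
    have the same length. Viewed as a matrix X (receivers x time), the
    result is the product X * X^H.
    """
    n = len(samples)
    vis = [[0j] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):  # lower triangle, including the diagonal
            acc = sum(a * b.conjugate() for a, b in zip(samples[i], samples[j]))
            vis[i][j] = acc
            vis[j][i] = acc.conjugate()  # V is Hermitian, so mirror it
    return vis


# Two receivers, two time samples each.
x = [
    [1 + 1j, 2 - 1j],
    [0 + 1j, 1 + 0j],
]
v = correlate(x)
```

Because the result is Hermitian, only one triangle of the matrix has to be computed, which is why the loop over `j` stops at `i`. A GPU implementation would perform the same multiply-accumulate pattern on tiles of the sample matrix rather than element by element.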

Author

John W. Romein (ASTRON (Netherlands Institute for Radio Astronomy))

Co-author

Dr Bram Veenboer (ASTRON)
