Sailfish: massively parallel LBM simulations with open source software on GPUs

Michal Januszewski

University of Silesia in Katowice, Poland; Google Switzerland

Sailfish is an advanced lattice Boltzmann method (LBM) code designed from the ground up for GPUs. Implemented predominantly in Python, it uses run-time code generation to automatically build optimized code for CUDA and OpenCL devices.
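Run-time code generation means that the GPU source is assembled as a string by the host program, so that loop bounds and other parameters become literal constants before the GPU compiler sees them. The sketch below illustrates the idea with the standard library's `string.Template` rather than whatever template engine Sailfish itself uses. The kernel, `KERNEL_TEMPLATE` and `generate_copy_kernel` are hypothetical names. The kernel only copies distributions and performs no collision or streaming.

```python
from string import Template

# Hypothetical CUDA kernel skeleton; $name, $num_nodes and $body are filled in at run time.
KERNEL_TEMPLATE = Template("""\
__global__ void ${name}(const float *dist_in, float *dist_out)
{
    const int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= ${num_nodes}) return;
${body}
}
""")


def generate_copy_kernel(name: str, q: int, num_nodes: int) -> str:
    """Return CUDA source for a plain copy kernel specialized at run time.

    The loop over the q distributions is unrolled in Python and the domain
    size is baked in as a literal, so the GPU compiler sees only constants.
    """
    body = "\n".join(
        f"    dist_out[{i} * {num_nodes} + idx] = dist_in[{i} * {num_nodes} + idx];"
        for i in range(q)
    )
    return KERNEL_TEMPLATE.substitute(name=name, num_nodes=num_nodes, body=body)


if __name__ == "__main__":
    print(generate_copy_kernel("copy_d3q19", q=19, num_nodes=128 * 128 * 128))
```

The generated string could then be handed to a run-time compiler such as PyCUDA's `SourceModule`. An OpenCL backend would instead emit OpenCL C from a similar template.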

Python was our language of choice because of its excellent support for both GPU programming and various system libraries. It also helped us meet our goal of shortening development time and encouraging experimentation without sacrificing performance. With a single-precision speed of 685 MLUPS (million lattice updates per second) for the D3Q19 lattice on a Tesla C2050 device and 1175 MLUPS on a Tesla K20X, as well as linear weak scaling in distributed multi-GPU simulations, Sailfish compares favorably with the best published results.
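MLUPS counts lattice nodes updated per second, in millions, so it can be computed directly from a timed run. It can also be inverted to estimate the cost of one time step. The helper names and the 128^3 timing below are hypothetical. The 685 and 1175 MLUPS rates are the figures quoted above.

```python
def mlups(nx: int, ny: int, nz: int, steps: int, seconds: float) -> float:
    """Million lattice-node updates per second for a run of `steps` steps."""
    return nx * ny * nz * steps / seconds / 1e6


def seconds_per_step(num_nodes: int, rate_mlups: float) -> float:
    """Wall-clock time of one step for a domain at a given MLUPS rate."""
    return num_nodes / (rate_mlups * 1e6)


if __name__ == "__main__":
    # Hypothetical timing: a 128^3 domain advanced 1000 steps in 3.0 s.
    print(f"{mlups(128, 128, 128, 1000, 3.0):.1f} MLUPS")
    # Estimated step time for a 256^3 domain at the quoted C2050 and K20X rates.
    for label, rate in [("C2050", 685.0), ("K20X", 1175.0)]:
        print(label, f"{seconds_per_step(256**3, rate) * 1e3:.1f} ms/step")
```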

Sailfish uses a computer algebra system (SymPy) to support a high-level model description in which many formulae are entered in symbolic form and then automatically translated into lower-level CUDA or OpenCL code. This makes the source code shorter and easier to read, and allows models to be verified at the level of mathematics. This verification complements automated tests at a lower level of abstraction, which check isolated functionality or the precision of simulations for specific geometries.
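As an illustration of the symbolic approach, the sketch below enters the standard second-order equilibrium distribution in SymPy and checks that it reproduces the density and momentum exactly. It then prints each expression as C code with `sympy.ccode`. The compact D2Q9 lattice and the names `equilibrium` and `check_moments` are illustrative choices. This is not Sailfish's actual model-description API.

```python
import sympy as sp

# D2Q9 lattice used purely as a compact illustration (not Sailfish's API).
VELOCITIES = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
              (1, 1), (-1, 1), (-1, -1), (1, -1)]
WEIGHTS = [sp.Rational(4, 9)] + [sp.Rational(1, 9)] * 4 + [sp.Rational(1, 36)] * 4

rho, ux, uy = sp.symbols("rho ux uy")


def equilibrium():
    """Second-order equilibrium entered symbolically (c_s^2 = 1/3)."""
    usq = ux**2 + uy**2
    feq = []
    for (ex, ey), w in zip(VELOCITIES, WEIGHTS):
        eu = ex * ux + ey * uy
        feq.append(w * rho * (1 + 3 * eu + sp.Rational(9, 2) * eu**2
                              - sp.Rational(3, 2) * usq))
    return feq


def check_moments(feq):
    """Verify at the level of mathematics that density and momentum are recovered."""
    density = sp.expand(sum(feq) - rho)
    mom_x = sp.expand(sum(f * ex for f, (ex, _) in zip(feq, VELOCITIES)) - rho * ux)
    mom_y = sp.expand(sum(f * ey for f, (_, ey) in zip(feq, VELOCITIES)) - rho * uy)
    return density == 0 and mom_x == 0 and mom_y == 0


if __name__ == "__main__":
    feq = equilibrium()
    assert check_moments(feq)
    # Translate each symbolic expression into a line of C code.
    for i, f in enumerate(feq):
        print(f"feq[{i}] = {sp.ccode(sp.expand(f))};")
```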

With an open-source, freely available code base, support for a large number of LB models (LBGK, MRT, regularized LB, entropic LB, as well as Shan-Chen and free-energy models for binary fluid simulations) and a wide range of boundary conditions, Sailfish is a powerful tool for both research and production-level engineering simulations.
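The simplest of the listed models, LBGK, relaxes every distribution towards its local equilibrium with a single relaxation time tau. The sketch below shows this collision for one node in pure Python. It uses a D2Q9 lattice for brevity, and the function names are hypothetical. Streaming, boundary conditions and the GPU memory layout are deliberately left out.

```python
# D2Q9 constants (an illustrative choice of lattice).
E = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
     (1, 1), (-1, 1), (-1, -1), (1, -1)]
W = [4 / 9] + [1 / 9] * 4 + [1 / 36] * 4


def macroscopic(f):
    """Density and velocity from the nine distributions of one node."""
    rho = sum(f)
    ux = sum(fi * ex for fi, (ex, _) in zip(f, E)) / rho
    uy = sum(fi * ey for fi, (_, ey) in zip(f, E)) / rho
    return rho, ux, uy


def feq(rho, ux, uy):
    """Second-order equilibrium distributions (c_s^2 = 1/3)."""
    usq = ux * ux + uy * uy
    out = []
    for (ex, ey), w in zip(E, W):
        eu = ex * ux + ey * uy
        out.append(w * rho * (1 + 3 * eu + 4.5 * eu * eu - 1.5 * usq))
    return out


def bgk_collide(f, tau):
    """LBGK collision: relax each distribution towards equilibrium with time scale tau."""
    rho, ux, uy = macroscopic(f)
    return [fi - (fi - fe) / tau for fi, fe in zip(f, feq(rho, ux, uy))]
```

Because the equilibrium has the same density and momentum as the pre-collision state, the collision conserves both. MRT and the other listed models differ mainly in how this relaxation step is performed.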

An overview of the code's architecture will be presented, together with illustrations of how the high-level model description is used in practice. This will be followed by a discussion of the performance/precision trade-offs between various LB models and optimization techniques on the last three generations of NVIDIA GPUs (Tesla, Fermi, Kepler). Since these trade-offs are specific to the GPU architectures rather than to a particular implementation, we hope the conclusions will also be useful to users of other GPU codes. Last but not least, examples of simulations related to turbulence and biofluid dynamics will be presented to illustrate how simple it is to apply Sailfish to real-life problems.
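LB kernels on GPUs are typically limited by memory bandwidth, which gives one simple way to reason about the precision side of these trade-offs: count the bytes moved per node update. The sketch below assumes that each of the q distributions is read once and written once per step and ignores auxiliary data such as node types. Under that assumption, it converts an MLUPS figure into an implied memory bandwidth. The helper names are hypothetical.

```python
def bytes_per_update(q: int, bytes_per_value: int) -> int:
    """Bytes moved per node update, assuming each of the q distributions is
    read once and written once (auxiliary data such as node types ignored)."""
    return 2 * q * bytes_per_value


def implied_bandwidth_gbs(rate_mlups: float, q: int, bytes_per_value: int) -> float:
    """Effective memory bandwidth (GB/s) implied by a given MLUPS rate."""
    return rate_mlups * 1e6 * bytes_per_update(q, bytes_per_value) / 1e9


if __name__ == "__main__":
    for precision, size in [("single", 4), ("double", 8)]:
        print(precision, bytes_per_update(19, size), "bytes/node")
    print(f"{implied_bandwidth_gbs(685.0, 19, 4):.1f} GB/s at 685 MLUPS (D3Q19, single)")
```

Under the same bandwidth-bound assumption, switching to double precision doubles the traffic per node (304 instead of 152 bytes for D3Q19). This would roughly halve the attainable MLUPS.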