German Research School for Simulation Sciences, Aachen, Germany
The lattice Boltzmann (LB) method is often said to parallelise easily, with quasi-linear speed-up.
Using application scenarios from medical physics, we show that this statement holds only for the simplest approaches.
Typical coupled multi-scale simulations (including the popular local mesh refinement) face a trade-off between memory and CPU load balancing on the one hand and data locality on the other: the two cannot be achieved simultaneously.
Since coming generations of supercomputers will provide more, rather than faster, processing units (CPUs, cores, …), load balancing and efficient parallel communication across several hundred thousand cores will become an increasingly important issue for any large-scale LB application.
In our presentation, we outline the problem using the example of a modern octree-based LB-HPC implementation that provides local mesh refinement.
We also sketch how the issue extends to general (coupled) multi-scale applications, which are becoming increasingly popular in HPC simulation for medical physics.
Currently, we cannot suggest an easy solution to the conflict between load balancing and data locality in LB implementations, and we have the impression that a major part of the community is not even aware of the problem.
If it remains unresolved, it may severely limit the problem sizes we can expect LB simulations to reach on the next generations of HPC systems.
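
The conflict between load balancing and data locality can be illustrated with a toy example. The following is a minimal sketch, not the implementation discussed in the presentation. It assumes that cells are listed in the order of a space-filling curve, that a cell on refinement level l costs 2**l units of work per coarse time step, and that each level is processed in a separate, synchronised phase. All function names are hypothetical, and the round-robin strategy is deliberately extreme.

```python
from collections import Counter


def contiguous_partition(levels, n_ranks):
    """Cut the curve-ordered cells into contiguous chunks of near-equal total work.

    Assumption: a cell on refinement level l costs 2**l work units.
    """
    weights = [2 ** l for l in levels]
    total = sum(weights)
    owner, acc = [], 0
    for w in weights:
        # Assign the cell to the rank that contains the midpoint of its work interval.
        owner.append(min(n_ranks - 1, int((acc + w / 2) * n_ranks / total)))
        acc += w
    return owner


def per_level_partition(levels, n_ranks):
    """Deal the cells of each level out round-robin, so every level is balanced."""
    seen = Counter()
    owner = []
    for l in levels:
        owner.append(seen[l] % n_ranks)
        seen[l] += 1
    return owner


def level_imbalance(levels, owner, n_ranks):
    """Worst ratio of maximum to mean cell count per rank over all levels (1.0 is ideal)."""
    worst = 1.0
    for l in set(levels):
        counts = Counter(o for lv, o in zip(levels, owner) if lv == l)
        mean = sum(counts.values()) / n_ranks
        worst = max(worst, max(counts.values()) / mean)
    return worst


def cut_links(owner):
    """Number of curve-adjacent cell pairs on different ranks (a crude locality proxy)."""
    return sum(a != b for a, b in zip(owner, owner[1:]))


# Eight coarse cells followed by eight refined cells, distributed over two ranks.
levels = [0] * 8 + [1] * 8
for name, partition in [("contiguous", contiguous_partition),
                        ("per-level", per_level_partition)]:
    owner = partition(levels, 2)
    print(name, level_imbalance(levels, owner, 2), cut_links(owner))
```

In this toy setting, the contiguous split equalises the total work and cuts only one link, but one rank idles during each level's phase (per-level imbalance 2.0). The per-level split balances every phase perfectly (1.0) at the price of cutting all 15 links, so nearly every neighbour access turns into communication.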