Articles | Volume 8, issue 9
https://doi.org/10.5194/gmd-8-2977-2015
Development and technical paper
30 Sep 2015

Development of efficient GPU parallelization of WRF Yonsei University planetary boundary layer scheme

M. Huang, J. Mielikainen, B. Huang, H. Chen, H.-L. A. Huang, and M. D. Goldberg

Abstract. The planetary boundary layer (PBL) is the lowest part of the atmosphere, and its character is directly affected by its contact with the underlying planetary surface. The PBL scheme accounts for the vertical sub-grid-scale fluxes caused by eddy transport throughout the atmospheric column. It determines the flux profiles within the well-mixed boundary layer and in the more stable layer above, and thereby models the evolution of atmospheric temperature, moisture (including clouds), and horizontal momentum in the entire atmospheric column. Several PBL schemes have been proposed and implemented in the Weather Research and Forecasting (WRF) model; the Yonsei University (YSU) scheme is one of them. To expedite weather research and prediction, we have put considerable effort into developing an accelerated implementation of the entire WRF model on the massively parallel architecture of graphics processing units (GPUs) while maintaining its accuracy relative to its central processing unit (CPU)-based implementation. This paper presents our efficient GPU-based design of the WRF YSU PBL scheme. On one NVIDIA Tesla K40 GPU, the GPU-based YSU PBL scheme achieves a speedup of 193× over its CPU counterpart running on one CPU core, whereas one CPU socket (4 cores) achieves a speedup of only 3.5× over a single core. Using two K40 GPUs raises the speedup to 360× relative to one CPU core.
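The abstract notes that the PBL scheme computes vertical fluxes within each atmospheric column. Because a column's update depends only on its own vertical profile, columns can be processed independently, which is what makes a PBL scheme a natural candidate for GPU parallelism. The Python sketch below illustrates that property under loudly stated assumptions: it applies one implicit (backward Euler) step of plain local vertical diffusion to every column, solved with the Thomas tridiagonal algorithm. The function names, the constant diffusivity `k`, the uniform layer spacing `dz`, and the zero-flux top and bottom boundaries are hypothetical simplifications; this is neither the YSU formulation (it omits the scheme's nonlocal terms) nor the authors' GPU kernel.

```python
def thomas_solve(a, b, c, d):
    """Solve a tridiagonal system: sub-diagonal a, diagonal b,
    super-diagonal c, right-hand side d (a[0] and c[-1] are ignored)."""
    n = len(d)
    cp = [0.0] * n
    dp = [0.0] * n
    cp[0] = c[0] / b[0] if n > 1 else 0.0
    dp[0] = d[0] / b[0]
    # Forward sweep: eliminate the sub-diagonal.
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    # Back substitution.
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def diffuse_column(profile, k, dt, dz):
    """One backward-Euler step of constant-coefficient vertical diffusion
    with zero-flux boundaries (hypothetical simplification, not YSU)."""
    n = len(profile)
    if n < 2:
        return list(profile)  # a single level has nothing to exchange with
    r = k * dt / dz ** 2
    a = [-r] * n
    c = [-r] * n
    b = [1.0 + 2.0 * r] * n
    # Zero-flux boundaries: only one neighbouring level at each end.
    b[0] = b[-1] = 1.0 + r
    a[0] = 0.0
    c[-1] = 0.0
    return thomas_solve(a, b, c, list(profile))


def step_all_columns(field, k, dt, dz):
    """field[j][i] is the vertical profile at horizontal point (i, j).
    Each column is updated independently of every other column, so on a
    GPU each (i, j) pair could be assigned to its own thread."""
    return [[diffuse_column(col, k, dt, dz) for col in row] for row in field]


# Usage: one row of two columns with three vertical levels each.
field = [[[300.0, 290.0, 280.0], [1.0, 0.0, 0.0]]]
out = step_all_columns(field, k=1.0, dt=1.0, dz=1.0)
print(out[0][1])  # approximately [0.625, 0.25, 0.125]
```

The zero-flux boundary rows make every column of the system matrix sum to one, so the column total is conserved; the only coupling is vertical, which is the property a column-per-thread GPU mapping relies on.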

Short summary
To expedite weather research and prediction, we have put considerable effort into developing an accelerated implementation of the entire WRF model on the massively parallel architecture of GPUs. This paper presents our efficient GPU-based design of the WRF YSU PBL scheme. On one NVIDIA Tesla K40 GPU, the GPU-based YSU PBL scheme achieves a speedup of 193× relative to its runtime on one CPU core. Using two K40 GPUs raises the speedup to 360× relative to one CPU core.
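The reported speedups can be related with simple arithmetic, since all of them share the same baseline of one CPU core. The sketch below takes "efficiency" to mean speedup divided by the number of processing units, which is a common convention assumed here for illustration rather than a metric defined in the summary. All input figures (193×, 360×, and the 3.5× for a 4-core socket) come from the abstract and summary above.

```python
def efficiency(speedup, units):
    """Speedup per processing unit; 1.0 means perfectly linear scaling."""
    return speedup / units


# Reported speedups, all relative to one CPU core.
one_gpu = 193.0     # one Tesla K40
two_gpus = 360.0    # two Tesla K40s
four_cores = 3.5    # one CPU socket with 4 cores

cpu_socket_eff = efficiency(four_cores, 4)            # 3.5 / 4
gpu_scaling_eff = efficiency(two_gpus / one_gpu, 2)   # (360 / 193) / 2
gpu_vs_socket = one_gpu / four_cores                  # 193 / 3.5

print(f"4-core socket efficiency: {cpu_socket_eff:.1%}")   # 87.5%
print(f"2-GPU scaling efficiency: {gpu_scaling_eff:.1%}")  # 93.3%
print(f"1 GPU vs 1 socket: {gpu_vs_socket:.1f}x")          # 55.1x
```

Because both speedups share the single-core baseline, dividing them gives a direct comparison: under these figures one K40 is roughly 55 times faster than the full 4-core socket, and the second GPU recovers about 93% of ideal doubling.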