Articles | Volume 8, issue 10
https://doi.org/10.5194/gmd-8-3321-2015
Development and technical paper | 22 Oct 2015

Par@Graph – a parallel toolbox for the construction and analysis of large complex climate networks

H. Ihshaish, A. Tantet, J. C. M. Dijkzeul, and H. A. Dijkstra

Abstract. In this paper, we present Par@Graph, a software toolbox to reconstruct and analyze complex climate networks having a large number of nodes (up to at least 10⁶) and edges (up to at least 10¹²). The key innovation is an efficient set of parallel software tools designed to leverage the inherent hybrid parallelism in distributed-memory clusters of multi-core machines. The performance of the toolbox is illustrated through networks derived from sea surface height (SSH) data of a global high-resolution ocean model. Less than 8 min are needed on 90 Intel Xeon E5-4650 processors to reconstruct a climate network, including the preprocessing and the correlation of 3 × 10⁵ SSH time series, resulting in a weighted graph with the same number of vertices and about 3.2 × 10⁸ edges. In less than 14 min on 30 processors, the resulting graph's degree centrality, strength, connected components, eigenvector centrality, entropy and clustering coefficient metrics were obtained. These results indicate that a complete cycle to construct and analyze a large-scale climate network can be completed in under 22 min. Par@Graph therefore facilitates the application of climate network analysis to high-resolution observations and model results by enabling fast network reconstruction from the calculation of statistical similarities between climate time series. It also enables network analysis at unprecedented scales for input data sets of widely varying sizes.

Short summary
Par@Graph is a software toolbox to reconstruct and analyze large-scale complex climate networks. It exposes parallelism on distributed-memory computing platforms to enable the construction of massive networks from a large number of time series, based on the calculation of common statistical similarity measures between them. It additionally provides parallel graph algorithms that enable fast calculation of important and common properties of the generated networks on SMP machines.
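
To make the reconstruct-then-analyze cycle concrete, the following is a minimal serial sketch in Python with NumPy. It is not the Par@Graph implementation or its interface: the function names are hypothetical, and the choice of Pearson correlation with a fixed absolute threshold is an assumption standing in for the "common statistical similarity measures" mentioned above. It builds a weighted graph from a small set of time series and then computes two of the listed metrics, degree and strength.

```python
import numpy as np


def build_correlation_network(series, threshold):
    """Return a weighted adjacency matrix from an (n_nodes, n_times) array.

    Assumption for illustration: two nodes are linked when the absolute
    Pearson correlation of their time series exceeds `threshold`, and the
    edge weight is that absolute correlation.
    """
    series = np.asarray(series, dtype=float)
    n_times = series.shape[1]
    # Preprocessing: standardize each series to zero mean and unit variance.
    z = series - series.mean(axis=1, keepdims=True)
    z /= z.std(axis=1, keepdims=True)
    # Pearson correlation of every pair of nodes in one matrix product.
    corr = (z @ z.T) / n_times
    np.fill_diagonal(corr, 0.0)  # no self-loops
    # Keep only sufficiently strong links.
    return np.where(np.abs(corr) > threshold, np.abs(corr), 0.0)


def degree_and_strength(weights):
    """Degree counts the edges at each node; strength sums their weights."""
    degree = np.count_nonzero(weights, axis=1)
    strength = weights.sum(axis=1)
    return degree, strength


# Toy input: three "grid points" with 50 time steps each.
t = np.linspace(0.0, 2.0 * np.pi, 50)
toy_series = np.vstack([np.sin(t), 2.0 * np.sin(t) + 1.0, np.cos(t)])

toy_weights = build_correlation_network(toy_series, threshold=0.5)
toy_degree, toy_strength = degree_and_strength(toy_weights)
print(toy_degree, toy_strength)
```

The dense n × n correlation matrix in this sketch is exactly what becomes infeasible on a single machine at the scale reported in the abstract (3 × 10⁵ time series), which is why the pairwise correlation step is the part that benefits from distribution across a cluster.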