June 15, 2014
June 15, 2014
June 18, 2014
Computing & Information Technology
24.680.1 - 24.680.18
Hybrid MPI-OpenMP versus MPI Implementations: A Case Study
We compared the performance of a hybrid MPI-OpenMP parallel implementation with that of a direct MPI implementation on a 64-processor cluster architecture featuring 16 nodes with 4 cores per node. We carried out a scalability study in which we varied the signal length for two different sets of parallel cores.
The algorithm being benchmarked is a parallel cyclic convolution algorithm with no interprocessor communication that tightly matches our particular cluster architecture. In this case study, a time-domain-based cyclic convolution algorithm was used for each parallel subsection. By using MPI to distribute the data to the nodes, and then using OpenMP to distribute the data among the cores inside each node, we can match the architecture of our algorithm to the architecture of the cluster. Each core runs an identical program on different data, following a single program, multiple data (SPMD) approach. All pre- and post-processing tasks were performed at the master node.
We first partitioned the cyclic convolution into 16 parallel subsections of one-fourth the original convolution length using a radix-4 approach. This implementation used MPI alone, with the data distributed from the master node to 16 parallel cores. We then repeated the execution using an MPI-OpenMP approach. With this hybrid technique, data of one-half the original length was distributed from the master node to four nodes using MPI, while the final execution was performed under OpenMP at each of the four nodes by halving each subsection in two and using all four processors in the multicore node. The final processing length is again one-fourth of the original signal length, and both methods use 16 processors. We repeated the process for eight different signal lengths.
We then used the algorithm to partition the cyclic convolution into 64 parallel subsections of one-eighth the original convolution length using a radix-8 approach. This implementation used MPI alone, with the data distributed from the master node to 64 parallel cores. We then repeated the execution using an MPI-OpenMP approach. With this hybrid technique, data of one-fourth the original length was distributed to 16 nodes using MPI, while the final execution was performed under OpenMP at each of the 16 nodes by halving each subsection in two and using the four processors in the multicore node. The final processing length is again one-eighth of the original signal length, and both methods use 64 processors. We repeated the process for the same eight signal lengths plus two additional lengths afforded by the greater memory available when using 64 cores.
We found that the MPI implementation performed slightly better than the hybrid MPI-OpenMP implementation. We established that the speedup increases very slowly, in favor of the MPI-only approach, as the signal size increases. This is consistent with what is reported in the literature.
As future work, we plan to further our code optimization efforts and to benchmark for memory efficiency, where the hybrid approach could have an advantage, as well as for increased performance.
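As an illustration of the kernel each core executes, the following is a minimal Python sketch of a direct time-domain cyclic convolution, y[n] = sum over k of x[k] * h[(n - k) mod N], together with a block-wise variant in which every block is computed independently from the same inputs. It is not the authors' code: the function names (cyclic_convolution_block, split_range, blockwise_cyclic_convolution) are hypothetical, and splitting the output indices into contiguous blocks is a simpler stand-in for the radix-4 and radix-8 partitioning described in the abstract, chosen only because it also needs no communication between workers. The serial loop over blocks stands in for the MPI ranks or OpenMP threads.

```python
def cyclic_convolution_block(x, h, start, stop):
    """Return y[start:stop], where y[n] = sum_k x[k] * h[(n - k) mod N]."""
    n = len(x)
    if len(h) != n:
        raise ValueError("x and h must have the same length")
    return [
        sum(x[k] * h[(i - k) % n] for k in range(n))
        for i in range(start, stop)
    ]


def cyclic_convolution(x, h):
    """Full-length direct (time-domain) cyclic convolution."""
    return cyclic_convolution_block(x, h, 0, len(x))


def split_range(n, workers):
    """Split range(n) into `workers` contiguous (start, stop) blocks."""
    base, extra = divmod(n, workers)
    bounds, start = [], 0
    for w in range(workers):
        stop = start + base + (1 if w < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def blockwise_cyclic_convolution(x, h, workers):
    """Compute each output block independently, then concatenate.

    Assumption: each block would be handled by one worker (an MPI rank
    or an OpenMP thread); here the blocks are simply computed in turn.
    """
    out = []
    for start, stop in split_range(len(x), workers):
        out.extend(cyclic_convolution_block(x, h, start, stop))
    return out
```

Under these assumptions, blockwise_cyclic_convolution(x, h, 16) models 16 independent workers, whether they are 16 MPI ranks or 4 ranks with 4 threads each; the two layouts differ in how the data reaches each worker, not in the arithmetic performed.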
Mangual, O., Teixeira, M., Lopez-Roig, R., & Nevarez-Ayala, F. J. (2014, June). Hybrid MPI-OpenMP versus MPI Implementations: A Case Study. Paper presented at 2014 ASEE Annual Conference & Exposition, Indianapolis, Indiana. doi:10.18260/1-2--20571
ASEE holds the copyright on this document. It may be read by the public free of charge. Authors may archive their work on personal websites or in institutional repositories with the following citation: © 2014 American Society for Engineering Education. Other scholars may excerpt or quote from these materials with the same citation. When excerpting or quoting from Conference Proceedings, authors should, in addition to noting the ASEE copyright, list all the original authors and their institutions and name the host city of the conference. - Last updated April 1, 2015