CPU vs. GPU: Performance comparison of OpenCL Applications on a Heterogeneous Architecture

Muhammad Nadeem Nadir; Muhammad Siraj Rathore; Asad Hayat; Junaid Abdullah Mansoor

Authors

Muhammad Nadeem Nadir, The University of Lahore, Lahore, 54590, Pakistan.
Muhammad Siraj Rathore, Capital University of Science and Technology, Islamabad, 46000, Pakistan.
Asad Hayat, Taiyuan University of Science and Technology, Taiyuan, 030000, China.
Junaid Abdullah Mansoor, University of Management and Technology, Lahore, 54770, Pakistan.

Keywords:

Heterogeneous Computing, CPU/GPU, Loop Unrolling, GP-GPU, OpenCL

Abstract

Researchers and developers have always aimed to attain superior performance for their computing applications. To this end, the Graphics Processing Unit (GPU) is widely used; it was initially employed to accelerate graphics applications. The success of the GPU has attracted researchers, who have shown keen interest in using GPU acceleration for regular applications. However, many recent studies claim that even when an application is well suited to parallelism, it is not guaranteed to run faster on the GPU. In this paper, we compare the performance of commonly used OpenCL applications on both CPU and GPU platforms. We measure the execution time of each application on both platforms and investigate why an application performs better on a particular platform. To do so, we analyze the source code of each application and identify the program features that contribute to better performance on a particular platform. The study identifies loop unrolling and data dimensionality as crucial program features that can be leveraged to utilize the parallel processing capabilities of a GPU platform. We find that when maximum loop unrolling is used with two-dimensional input data, the 2D Convolution application executes around 20 times faster on the GPU. As the level of loop unrolling decreases, the performance gain on the GPU also decreases. Ultimately, in the absence of loop unrolling and with single-dimensional input data, the CPU performs better: in this case, the ATAX application executes around 9 times faster on the CPU than on the GPU.
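To make the term "loop unrolling" concrete, the following is a minimal sketch in plain C of a 3x3 two-dimensional convolution written in rolled and fully unrolled form. It is an illustration under stated assumptions, not the benchmark code measured in the paper: the function names, the 3x3 stencil size, and the flat row-major layout are all hypothetical choices. In an OpenCL kernel, the two outer loops would typically be replaced by a two-dimensional index space with one work-item per output element; here they remain ordinary loops so that the sketch compiles as standard C.

```c
#include <stddef.h>

/* Hypothetical 3x3 stencil; coefficients are passed as 9 row-major floats. */

/* Rolled form: the stencil is traversed with two inner loops. */
void conv2d_rolled(const float *in, float *out, size_t rows, size_t cols,
                   const float *coef)
{
    for (size_t i = 1; i + 1 < rows; ++i) {
        for (size_t j = 1; j + 1 < cols; ++j) {
            float acc = 0.0f;
            for (size_t ki = 0; ki < 3; ++ki)
                for (size_t kj = 0; kj < 3; ++kj)
                    acc += coef[ki * 3 + kj]
                         * in[(i + ki - 1) * cols + (j + kj - 1)];
            out[i * cols + j] = acc;
        }
    }
}

/* Fully unrolled form: the nine multiply-adds are written out explicitly,
 * so the body of each (i, j) iteration contains no inner loop or branch. */
void conv2d_unrolled(const float *in, float *out, size_t rows, size_t cols,
                     const float *coef)
{
    for (size_t i = 1; i + 1 < rows; ++i) {
        for (size_t j = 1; j + 1 < cols; ++j) {
            const float *r0 = in + (i - 1) * cols + (j - 1); /* row above */
            const float *r1 = r0 + cols;                     /* current row */
            const float *r2 = r1 + cols;                     /* row below */
            out[i * cols + j] =
                coef[0] * r0[0] + coef[1] * r0[1] + coef[2] * r0[2] +
                coef[3] * r1[0] + coef[4] * r1[1] + coef[5] * r1[2] +
                coef[6] * r2[0] + coef[7] * r2[1] + coef[8] * r2[2];
        }
    }
}
```

Both functions compute the same interior output values and leave the border elements of `out` untouched. The unrolled body is straight-line code for each output element, which is the property the abstract associates with better GPU performance.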
