CPU vs. GPU: Performance comparison of OpenCL Applications on a Heterogeneous Architecture
Keywords:
Heterogeneous Computing, CPU/GPU, Loop Unrolling, GP-GPU, OpenCL
Abstract
Researchers and developers have always aimed to attain the best possible performance for their computing applications. Graphics Processing Units (GPUs) are widely used toward this goal. They were originally used to accelerate graphics applications, and their success has attracted keen interest in using GPU acceleration for general-purpose applications. However, many recent studies report that an application is not guaranteed to run faster on a GPU even when it is well suited to parallelism. In this paper, we compare the performance of commonly used OpenCL applications on both CPU and GPU platforms. We measure the execution time of each application on both platforms and investigate why an application performs better on a particular platform. To this end, we analyze the source code of each application and identify the program features that contribute to better performance on a particular platform. The study identifies loop unrolling and data dimensionality as crucial program features that can be leveraged to exploit the parallel processing capabilities of a GPU platform. We find that when maximum loop unrolling is used with two-dimensional input data, the 2D Convolution application executes around 20 times faster on the GPU. As the level of loop unrolling decreases, the performance gain on the GPU decreases as well. Ultimately, in the absence of loop unrolling and with one-dimensional input data, the CPU performs better: in this case, the ATAX application executes around 9 times faster on the CPU than on the GPU.
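To make the term concrete, the following is a minimal sketch of loop unrolling on a 3x3 two-dimensional convolution, written in plain C rather than as an OpenCL kernel. It is not taken from the benchmarks evaluated in the paper; the function names `conv3x3_rolled` and `conv3x3_unrolled`, the row-major layout, and the fixed 3x3 window are assumptions chosen for illustration only.

```c
#include <stddef.h>

/* Hypothetical illustration, not the paper's benchmark code.
 * Rolled form: the 3x3 window at (r, c) is walked by two inner loops.
 * `in` is a row-major image with `width` columns; `k` holds 9 weights. */
float conv3x3_rolled(const float *in, size_t width, size_t r, size_t c,
                     const float k[9])
{
    float acc = 0.0f;
    for (size_t i = 0; i < 3; ++i)
        for (size_t j = 0; j < 3; ++j)
            acc += k[i * 3 + j] * in[(r + i) * width + (c + j)];
    return acc;
}

/* Fully unrolled form: the same nine multiply-adds in the same order,
 * written out explicitly so no loop counters or branches remain. */
float conv3x3_unrolled(const float *in, size_t width, size_t r, size_t c,
                       const float k[9])
{
    const float *p0 = in + r * width + c; /* first row of the window  */
    const float *p1 = p0 + width;         /* second row of the window */
    const float *p2 = p1 + width;         /* third row of the window  */
    float acc = 0.0f;
    acc += k[0] * p0[0];
    acc += k[1] * p0[1];
    acc += k[2] * p0[2];
    acc += k[3] * p1[0];
    acc += k[4] * p1[1];
    acc += k[5] * p1[2];
    acc += k[6] * p2[0];
    acc += k[7] * p2[1];
    acc += k[8] * p2[2];
    return acc;
}
```

Both functions compute the same value for a given output position. The caller must ensure that `r + 2` and `c + 2` stay inside the image. In a data-parallel setting, each output position (r, c) of a two-dimensional input would typically be handled by a separate work-item.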
License
This is an open access article published by the Research Center of Computing & Biomedical Informatics (RCBI), Lahore, Pakistan, under the CC BY 4.0 International License.