Efficient Data Transfer Method for Image Filtering Implementation on FPGA Using OpenCL

Abstract

Heterogeneous platforms, which commonly consist of a central processing unit (CPU) and a graphics processing unit (GPU), have attracted considerable attention as a way to achieve both high performance and low power consumption. Furthermore, modern heterogeneous platforms often employ a field-programmable gate array (FPGA) device in addition to a CPU and a GPU. To fully utilize these heterogeneous hardware accelerators, the Open Computing Language (OpenCL) has been developed. This paper proposes an FPGA implementation of image filtering with effective data transfer using OpenCL. To exploit the configurable pipelined architecture of the target FPGA, an effective local memory allocation scheme is proposed for a convolution kernel, and a loop-unrolling method is applied to increase the local memory allocation efficiency. With the proposed method, the average local memory access latency improves significantly for various memory access patterns. Also, the proposed filtering kernel shows a better performance
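The abstract does not spell out the allocation scheme, so the following is only a minimal sketch of the general idea it names: stage a tile of the image (plus a one-pixel halo) in a small local buffer, then compute a 3x3 convolution from that buffer with the window written out in unrolled form. It is plain C standing in for an OpenCL kernel, not the paper's implementation; `filter3x3_tiled`, `TILE`, `LTILE`, and the replicate-border policy are all assumptions made for illustration.

```c
#include <assert.h>

#define TILE  8                  /* hypothetical output tile edge        */
#define K     3                  /* filter width                         */
#define HALO  (K / 2)            /* border pixels needed around a tile   */
#define LTILE (TILE + 2 * HALO)  /* edge of the staged local buffer      */

/* Clamp a coordinate to [0, n-1] (replicate-border policy, assumed). */
static int clampi(int v, int n) { return v < 0 ? 0 : (v >= n ? n - 1 : v); }

/* Apply a 3x3 integer filter to a w x h 8-bit image.
 * coef holds the 9 weights in row-major order; div normalizes the sum. */
void filter3x3_tiled(const unsigned char *src, unsigned char *dst,
                     int w, int h, const int coef[K * K], int div)
{
    assert(src && dst && coef && w > 0 && h > 0 && div != 0);

    /* Stands in for OpenCL local (on-chip) memory. */
    int local[LTILE][LTILE];

    for (int ty = 0; ty < h; ty += TILE) {
        for (int tx = 0; tx < w; tx += TILE) {
            /* Stage 1: copy tile + halo from "global" memory once. */
            for (int y = 0; y < LTILE; y++)
                for (int x = 0; x < LTILE; x++)
                    local[y][x] = src[clampi(ty + y - HALO, h) * w +
                                      clampi(tx + x - HALO, w)];

            /* Stage 2: every window read now hits the local buffer. */
            for (int y = 0; y < TILE && ty + y < h; y++) {
                for (int x = 0; x < TILE && tx + x < w; x++) {
                    /* 3x3 window written out by hand (fully unrolled). */
                    int acc =
                        coef[0] * local[y][x]     + coef[1] * local[y][x + 1]     + coef[2] * local[y][x + 2] +
                        coef[3] * local[y + 1][x] + coef[4] * local[y + 1][x + 1] + coef[5] * local[y + 1][x + 2] +
                        coef[6] * local[y + 2][x] + coef[7] * local[y + 2][x + 1] + coef[8] * local[y + 2][x + 2];
                    acc /= div;
                    if (acc < 0)   acc = 0;     /* saturate to 8 bits */
                    if (acc > 255) acc = 255;
                    dst[(ty + y) * w + (tx + x)] = (unsigned char)acc;
                }
            }
        }
    }
}
```

In an actual OpenCL kernel the staging buffer would typically be declared in the local address space and the window loops unrolled with a compiler directive; the hand-unrolled form here just makes the nine independent reads explicit, which is the access pattern a pipelined FPGA datapath can issue in parallel.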

Keywords

Computer science, Field-programmable gate array, Kernel (algebra), Central processing unit, Parallel computing, Computer hardware, Embedded system
