Browsing by Author "Levent, Vecdi Emre"
Now showing 1 - 9 of 9
Conference Object (metadata only)
An area efficient real time implementation of dual tree complex wavelet transform in field programmable gate arrays (IEEE, 2015)
Canbay, F.; Levent, Vecdi Emre; Serbes, G.; Uğurdağ, Hasan Fatih; Goren, S.; Aydin, N. (Electrical & Electronics Engineering)

Biomedical signals (BSs), which carry information about both the normal condition and the inherent irregularities of the body, are expected to have a non-stationary character due to the time-varying behavior of physiological systems. The Fourier transform and the short-time Fourier transform are widely used frequency and time-frequency analysis methods for extracting information from BSs, with fixed frequency and time-frequency resolution respectively. However, deriving relevant information from non-stationary BSs requires an analysis method with adjustable time-frequency resolution. The wavelet transform (WT) can be used as a mathematical microscope whose time-frequency resolution can be adjusted according to different parts of the signal. The discrete wavelet transform (DWT) is a fast, discretized implementation of the classical WT. However, due to aliasing, lack of directionality, and shift-variance, the DWT exhibits limited performance in processing BSs. In the literature, an improved version of the DWT, the Dual Tree Complex Wavelet Transform (DTCWT), has been employed in the analysis of BSs with great success. In this study, considering the improvements in embedded system technology and the need for wavelet-based real-time feature extraction or de-noising in portable medical devices, the DTCWT is implemented as a sub-system in field programmable gate arrays. In the proposed hardware architecture, the DTCWT is implemented for every data input channel using only one adder and one multiplier.
Additionally, considering the multi-channel outputs of biomedical data acquisition systems, this architecture is designed to run in parallel for N channels.

Conference Object (metadata only)
Field programmable gate arrays implementation of dual tree complex wavelet transform (IEEE, 2015)
Canbay, F.; Levent, Vecdi Emre; Serbes, G.; Goren, S.; Aydin, N.

Due to the inherent time-varying characteristics of physiological systems, most biomedical signals (BSs) are expected to have a non-stationary character. Therefore, any appropriate analysis method for BSs should exhibit adjustable time-frequency (TF) resolution. The wavelet transform (WT) provides a TF representation of signals with good frequency resolution at low frequencies and good time resolution at high frequencies, resulting in an optimized TF resolution. The discrete wavelet transform (DWT), used in medical signal processing applications such as denoising and feature extraction, is a fast, discretized algorithm for the classical WT. However, the DWT has some very important drawbacks: aliasing, lack of directionality, and shift-variance. To overcome these drawbacks, an improved discrete transform, the Dual Tree Complex Wavelet Transform (DTCWT), can be used. With the improvements in embedded system technology, portable real-time medical devices are now frequently used for rapid diagnosis. In this study, a novel hardware architecture is proposed to implement the DTCWT algorithm in FPGAs, where it can serve as a real-time feature extraction or denoising operator for biomedical signals. In the proposed architecture, the DTCWT is implemented with only one adder and one multiplier.
Additionally, considering the multi-channel outputs of biomedical data acquisition systems, this architecture is capable of running N channels in parallel.

Conference Object (metadata only)
FPGA implementation of a dense optical flow algorithm using Altera OpenCL SDK (Springer International Publishing, 2017)
Ulutaş, Umut; Tosun, Mustafa; Levent, Vecdi Emre; Büyükaydın, D.; Akgün, T.; Uğurdağ, Hasan Fatih (Electrical & Electronics Engineering)

FPGA acceleration of compute-intensive algorithms is usually not regarded as feasible because of the long Verilog or VHDL RTL design effort it requires. Data-parallel algorithms have an alternative acceleration platform, namely the GPU. Two languages are widely used for GPU programming: CUDA and OpenCL. OpenCL is the choice of many coders due to its portability to most multi-core CPUs and most GPUs. OpenCL SDKs for FPGAs, and High-Level Synthesis (HLS) in general, make FPGA acceleration truly feasible. In data-parallel applications, OpenCL-based synthesis is preferred over traditional HLS, as it can be seamlessly targeted to both GPUs and FPGAs. This paper shares our experience in targeting a demanding optical flow algorithm to a high-end FPGA as well as a high-end GPU using OpenCL. We offer throughput and power consumption results for both platforms.

Book Part
A multi-channel real time implementation of dual tree complex wavelet transform in field programmable gate arrays (Springer International Publishing, 2016)
Canbay, F.; Levent, Vecdi Emre; Serbes, G.; Uğurdağ, Hasan Fatih (Electrical & Electronics Engineering)

In medical applications, biomedical acquisition systems (BASs) are frequently used to diagnose and monitor critical conditions such as stroke, epilepsy, Alzheimer's disease, and arrhythmias.
Biomedical signals (BSs), which carry valuable information about the condition of various physiological subsystems in our body, can be obtained using multi-channel BASs. Due to the time-varying behavior of physiological sub-systems, most BSs are expected to have a non-stationary character. Deriving the desired clinical information from these non-stationary BSs requires an analysis method with adjustable time-frequency resolution. The wavelet transform (WT), whose time-frequency resolution can be adjusted according to different parts of the signal, is widely used in the analysis of BSs. The discrete wavelet transform (DWT) is a fast, discretized implementation of the classical WT and has been employed in the literature as a feature extractor and de-noising operator for BSs. However, due to aliasing, lack of directionality, and shift-variance, the DWT exhibits limited performance. A modified version of the DWT, the Dual Tree Complex Wavelet Transform (DTCWT), has been employed in the analysis of BSs with improved results. Therefore, in this study, considering the improvements in embedded system technology and the need for wavelet-based multi-channel real-time feature-extraction/de-noising in portable medical devices, the DTCWT is implemented as a multi-channel system-on-chip using field programmable gate arrays. In the proposed hardware architecture, the DTCWT is implemented for N input channels using only one adder and one multiplier.
The area efficiency and speed limits of the proposed system are presented in comparison with our previous approaches.

Conference Object
Output domain downscaler (ISCIS 2016: Computer and Information Sciences, 2016)
Büyükmıhçı, M.; Levent, Vecdi Emre; Guzel, Aydın Emre; Ates, Ozgur; Tosun, Mustafa; Akgün, T.; Erbas, C.; Gören, S.; Uğurdağ, Hasan Fatih (Electrical & Electronics Engineering)

This paper offers an area-efficient video downscaler hardware architecture, which we call the Output Domain Downscaler (ODD). ODD is demonstrated through an implementation of the bilinear interpolation method combined with edge detection and a sharpening spatial filter. We compare ODD to a straightforward implementation of the same combination of methods, which we call the Input Domain Downscaler (IDD). IDD tries to output a new pixel of the downscaled video frame every time a new pixel of the original video frame is received. However, every once in a while there is no downscaled pixel to produce, and hence IDD stalls; IDD sometimes also skips a complete row of input pixels. ODD, on the other hand, spreads the job of producing downscaled pixels almost uniformly over a frame. As a result, ODD can employ more resource sharing, i.e., it can do the same job with fewer arithmetic units, and thus offers a more area-efficient solution than IDD. In this paper, we explain how ODD and IDD work and share their FPGA synthesis results.

Conference Object (metadata only)
Rapid design of real-time image fusion on FPGA using HLS and other techniques (IEEE, 2018)
Aydın, Furkan; Uğurdağ, Hasan Fatih; Levent, Vecdi Emre; Güzel, Aydın Emre; Annafianto, Nur Fajar Rızqı; Özkan, M.
A.; Akgun, T.; Erbas, C. (Electrical & Electronics Engineering)

While implementing a parameterized hardware IP generator for an image fusion algorithm, we had a chance to test various tools and techniques, such as HLS, pipelining, and PCIe logic/software porting, that we had developed in a previous design project. Image fusion combines two or more images through a color transformation process. Depending on the application, different fps and/or resolution may be needed, yet the specifics of the image-processing algorithm may change frequently, causing redesign. If the target platform is an FPGA, a rapid yet optimized hardware implementation is usually required. These requirements cannot be met by HLS alone. Clever architectural techniques, such as unorthodox ways of pipelining, RTL coding, and creative porting of interface logic/software, allowed us to meet the requirements outlined above. With all of these in our arsenal, we were able to get 3 versions of the algorithm (with different fps and/or resolution) running on Cyclone IV and Arria 10 FPGAs in a fairly short amount of time. This paper explains the image fusion algorithm, our hardware architecture, and our specific flow for rapid implementation.

Article
Tools and techniques for implementation of real-time video processing algorithms (Springer Nature, 2019-01)
Levent, Vecdi Emre; Güzel, Aydın Emre; Tosun, M.; Büyükmıhcı, Mert; Aydın, Furkan; Goren, S.; Erbas, C.; Akgun, T.; Uğurdağ, Hasan Fatih (Electrical & Electronics Engineering)

This paper describes flexible tools and techniques that can be used to efficiently design/generate a wide variety of hardware IP blocks for highly parameterized real-time video processing algorithms.
The tools and techniques discussed in the paper include host software, FPGA interface IP (PCIe, USB 3.0, DRAM), high-level synthesis, RTL generation tools, synthesis automation, architectural concepts (e.g., nested pipelining), an architectural estimation tool, and a verification methodology. The paper also discusses a specific use case deploying these tools and techniques for the hardware design of an optical flow algorithm. It shows that in a fairly short amount of time we were able to implement 11 versions of the optical flow algorithm running on 3 different FPGAs (from 2 different vendors), while generating and synthesizing several thousand designs for architectural trade-offs.

PhD Dissertation (metadata only)
Tools and techniques for implementation of real-time video processing algorithms (2018-09)
Levent, Vecdi Emre; Uğurdağ, Hasan Fatih; Uysal, Murat; Kıraç, Furkan; Demir, O.; Aydın, N. (Department of Computer Science)

Hardware implementations of video processing algorithms, which are usually real-time by nature, need architectural exploration so that the required performance is achieved with minimal cost. In addition, the video algorithm to be implemented may need to be used with different frames-per-second and resolution in different applications. Hence, we usually need to design a parameterized IP block instead of a fixed design. Also, during the hardware design process, the requirements fed from the algorithms team may change, as may the algorithm itself. As a result, hardware implementation iterations need to be as fast as the algorithm development iterations. This is only possible with tools and techniques specifically geared towards hardware design generation for video processing.
The tools and techniques discussed in this dissertation include host software, FPGA interface IP, HLS, RTL generation tools, an architectural estimation tool, a flow-based verification approach, and logic synthesis automation, as well as architectural concepts (e.g., nested pipelining). The architectural estimation tool estimates many design metrics: area, throughput, latency, DRAM usage, interface bandwidth, temperature, and compilation time. While we explain the above tools and techniques within a specific use case, namely optical flow, we also present results from another use case, image fusion. Using our methodology and tools, we were able to design and bring up 11 versions of optical flow and 3 versions of image fusion on 3 different FPGAs from 2 different vendors. The first version of these designs (hence the generators) took several months; however, each subsequent design version took a few days with a few people. When only an architectural trade-off is needed, we were able to generate and synthesize around one thousand designs in a single day on a 48-core server.

Conference Object (metadata only)
Using high-level synthesis for rapid design of video processing pipes (IEEE, 2016)
Güzel, Aydin Emre; Levent, Vecdi Emre; Tosun, Mustafa; Özkan, M. Akif; Akgun, T.; Büyükaydın, D.; Erbas, C.; Uğurdağ, Hasan Fatih (Electrical & Electronics Engineering)

In this work, we share our experience in using High-Level Synthesis (HLS) for the rapid development of an optical flow design on an FPGA. We performed HLS using Vivado HLS as well as an HLS tool we developed for the optical flow design at hand and similar video processing problems. The paper first describes our design problem and then discusses our own HLS tool.
The tool we developed has turned out to be fairly general-purpose, except for its handling of cyclic inter-iteration dependencies. It also introduces some novel concepts to HLS, such as "pipelined multiplexers". The synthesis results show that we can achieve better timing or better area than Vivado HLS. Furthermore, the Verilog RTL our HLS tool outputs is much more readable than that from Vivado HLS, which makes it much easier for the designer to debug and modify the RTL.
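The Output Domain Downscaler entry above contrasts iterating over input pixels (which stalls and skips) with iterating over output pixels (which keeps the workload uniform and enables resource sharing). A minimal software sketch of that output-domain idea, using bilinear interpolation, is shown below; the scaling policy and boundary handling here are illustrative assumptions, not the architecture from the paper.

```python
def bilinear_downscale(src, out_w, out_h):
    """Downscale a 2-D grayscale image (list of rows) to out_w x out_h,
    iterating over OUTPUT pixels and gathering from the input."""
    in_h, in_w = len(src), len(src[0])
    sx = in_w / out_w  # horizontal scale factor (> 1 when downscaling)
    sy = in_h / out_h  # vertical scale factor
    dst = []
    for oy in range(out_h):
        fy = oy * sy                 # fractional source row
        y0 = min(int(fy), in_h - 2)  # clamp so y0 + 1 stays in bounds
        wy = fy - y0
        row = []
        for ox in range(out_w):
            fx = ox * sx                 # fractional source column
            x0 = min(int(fx), in_w - 2)  # clamp so x0 + 1 stays in bounds
            wx = fx - x0
            # Weighted average of the 2x2 neighborhood around (fx, fy):
            # exactly one output pixel is produced per loop iteration,
            # so the arithmetic units are busy at a uniform rate.
            p = ((1 - wy) * ((1 - wx) * src[y0][x0] + wx * src[y0][x0 + 1])
                 + wy * ((1 - wx) * src[y0 + 1][x0] + wx * src[y0 + 1][x0 + 1]))
            row.append(p)
        dst.append(row)
    return dst

# A 4x4 ramp image downscaled to 2x2.
src = [[x + 4 * y for x in range(4)] for y in range(4)]
small = bilinear_downscale(src, 2, 2)
```

An input-domain variant would instead loop over the 16 source pixels and emit an output only on certain iterations, which is the stalling behavior the ODD paper avoids.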