Browsing by Author "Tosun, Mustafa"

Fast one- and two-pick fixed-priority selection and muxing circuits
(IEEE, 2016) Tosun, Mustafa; Özkan, M. Akif; Güzel, Aydin Emre; Uğurdağ, Hasan Fatih; Electrical & Electronics Engineering
Priority encoders and arbiters usually drive multiplexers (muxes). Latency optimization of priority encoders and multiplexer trees has usually been handled separately in the literature. However, in some applications with circular data dependencies, the combined latency of the arbiter and muxing needs to be optimized. Moreover, there is an ever-growing need for throughput. This requires switches that pick and multiplex more than one request per cycle. In this paper, we propose a family of circuit topologies where priority encoding picks one or two requests and takes place in parallel with muxing. We first present a scalable logic circuit for the 1-pick fixed-priority muxing problem and then extend it to the 2-pick problem. We compare the proposed architecture to its counterpart that does only priority encoding, using Synopsys Design Compiler with the ARM-Artisan TSMC 180 nm worst-case standard-cell library. The results show that most of the priority encoding latency is hidden in the proposed circuit topology.
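As a reading aid, the function that a 1-pick or 2-pick fixed-priority select-and-mux block computes can be written as a short behavioral reference model. This is only a sketch of the input/output behavior, not the circuit topology proposed in the paper. The name `pick_and_mux`, the convention that index 0 has the highest priority, and the use of `None` for an unfilled output are all assumptions made for illustration.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def pick_and_mux(
    requests: Sequence[bool], data: Sequence[T], picks: int = 1
) -> List[Optional[T]]:
    """Return the data words of the `picks` highest-priority active requests.

    Assumption: index 0 is the highest priority. Output slots that cannot
    be filled (too few active requests) are left as None.
    """
    if len(requests) != len(data):
        raise ValueError("requests and data must have the same length")
    out: List[Optional[T]] = [None] * picks
    filled = 0
    # Scan in priority order; a hardware design would do this in parallel.
    for req, word in zip(requests, data):
        if req and filled < picks:
            out[filled] = word
            filled += 1
    return out


requests = [False, True, False, True, True]
data = ["a", "b", "c", "d", "e"]
one_pick = pick_and_mux(requests, data, picks=1)
two_pick = pick_and_mux(requests, data, picks=2)
```

A sequential scan like this is convenient as a golden model for checking a netlist in simulation; it says nothing about latency, which is the subject of the paper.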
FPGA implementation of a dense optical flow algorithm using Altera OpenCL SDK
(Springer International Publishing, 2017) Ulutaş, Umut; Tosun, Mustafa; Levent, Vecdi Emre; Büyükaydın, D.; Akgün, T.; Uğurdağ, Hasan Fatih; Electrical & Electronics Engineering
FPGA acceleration of compute-intensive algorithms is usually not regarded as feasible because of the long Verilog or VHDL RTL design effort it requires. Data-parallel algorithms have an alternative platform for acceleration, namely, the GPU. Two languages are widely used for GPU programming: CUDA and OpenCL. OpenCL is the choice of many coders due to its portability to most multi-core CPUs and most GPUs. The OpenCL SDK for FPGAs, and High-Level Synthesis (HLS) in general, make FPGA acceleration truly feasible. In data-parallel applications, OpenCL-based synthesis is preferred over traditional HLS as it can be seamlessly targeted to both GPUs and FPGAs. This paper shares our experiences in targeting a demanding optical flow algorithm to a high-end FPGA as well as a high-end GPU using OpenCL. We offer throughput and power consumption results on both platforms.
M-pick fixed-priority selection and muxing
(2017-01) Tosun, Mustafa; Uğurdağ, Hasan Fatih; Uğurdağ, Hasan Fatih; Uğurdağ, S.G.; Aktemur, Tankut Barış; Department of Electrical and Electronics Engineering; Tosun, Mustafa
In this thesis, we propose a class of logic architectures for multi-pick (m-pick) fixed-priority arbitration (FPA) and muxing. An m-pick FPA selects the m topmost requests out of n inputs in priority order. Arbiters usually drive multiplexers (muxes). Latency optimization of FPAs and mux trees has usually been handled separately in the literature. However, in some applications with circular data dependencies, it is the combined latency of the arbiter and muxing that needs to be optimized. Moreover, there is an ever-growing need for throughput. This requires, for example, network switches that pick and mux m requests per cycle, where m > 1. This thesis starts with 1-pick priority-based selection and muxing and then generalizes it to m-pick. A logic building block that we call “Saturated Adder” plays a key role in this generalization, which makes the 1-pick and 2-pick architectures simply special cases. We have implemented the proposed architectures through Perl programs generating Verilog netlists and synthesized them using Synopsys Design Compiler with the ARM-Artisan TSMC 180 nm worst-case standard-cell library. Through the results we have obtained, we demonstrate the trade-offs in the design of m-pick FPA and muxing.
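The abstract does not define the “Saturated Adder”, so the following is only one plausible reading, offered as a behavioral sketch: an adder whose result is clamped at m, used to count how many higher-priority requests precede each input. Under that assumption, input i is the (k+1)-th pick exactly when it is requesting and the saturated count of requests above it equals k. The names `saturated_add`, `m_pick_grants`, and `mux` are hypothetical, as is the convention that index 0 has the highest priority; the thesis architecture may differ.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def saturated_add(a: int, b: int, limit: int) -> int:
    """Add two counts and clamp the result at `limit` (assumed behavior)."""
    return min(a + b, limit)


def m_pick_grants(requests: Sequence[bool], m: int) -> List[List[bool]]:
    """Return m one-hot grant vectors; grants[k][i] marks input i as pick k+1.

    Assumption: index 0 is the highest priority.
    """
    n = len(requests)
    grants = [[False] * n for _ in range(m)]
    count = 0  # requests seen so far at higher priority, saturated at m
    for i, req in enumerate(requests):
        if req and count < m:
            grants[count][i] = True
        count = saturated_add(count, int(req), m)
    return grants


def mux(grant: Sequence[bool], data: Sequence[T]) -> Optional[T]:
    """Select the data word whose grant bit is set, or None if no bit is set."""
    for g, word in zip(grant, data):
        if g:
            return word
    return None


requests = [False, True, True, False, True, True]
data = [10, 11, 12, 13, 14, 15]
grants = m_pick_grants(requests, m=3)
picked = [mux(g, data) for g in grants]
```

With m = 1 or m = 2 the same model reduces to the 1-pick and 2-pick cases, which matches the abstract's remark that those are special cases of the general scheme.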
Output domain downscaler
(ISCIS 2016: Computer and Information Sciences, 2016) Büyükmıhçı, M.; Levent, Vecdi Emre; Guzel, Aydın Emre; Ates, Ozgur; Tosun, Mustafa; Akgün, T.; Erbas, C.; Gören, S.; Uğurdağ, Hasan Fatih; Electrical & Electronics Engineering
This paper offers an area-efficient video downscaler hardware architecture, which we call Output Domain Downscaler (ODD). ODD is demonstrated through an implementation of the bilinear interpolation method combined with an Edge Detection and Sharpening Spatial Filter. We compare ODD to a straightforward implementation of the same combination of methods, which we call Input Domain Downscaler (IDD). IDD tries to output a new pixel of the downscaled video frame every time a new pixel of the original video frame is received. However, every once in a while, there is no downscaled pixel to produce, and hence, IDD stalls. IDD sometimes also skips a complete row of input pixels. ODD, on the other hand, spreads out the job of producing downscaled pixels almost uniformly over a frame. As a result, ODD is able to employ more resource sharing, i.e., it can do the same job with fewer arithmetic units, and thus offers a more area-efficient solution than IDD. In this paper, we explain how ODD and IDD work and also share their FPGA synthesis results.
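To make the "output domain" idea concrete, the sketch below performs bilinear downscaling by iterating over output pixels and looking up the source pixels each one needs. It is a plain software illustration only: it does not model the hardware scheduling, resource sharing, or the edge detection and sharpening filter described in the paper. The function name `bilinear_downscale` and the pixel-centre coordinate mapping are assumptions.

```python
import math
from typing import List, Sequence


def bilinear_downscale(
    src: Sequence[Sequence[float]], out_h: int, out_w: int
) -> List[List[float]]:
    """Resize a grayscale image by looping over OUTPUT pixels.

    Assumption: pixel centres are aligned, i.e. output pixel o maps to
    source coordinate (o + 0.5) * in_size / out_size - 0.5.
    """
    in_h, in_w = len(src), len(src[0])
    out: List[List[float]] = []
    for oy in range(out_h):
        sy = (oy + 0.5) * in_h / out_h - 0.5
        y0 = max(0, min(in_h - 1, math.floor(sy)))
        y1 = min(in_h - 1, y0 + 1)
        fy = min(max(sy - y0, 0.0), 1.0)  # vertical blend weight
        row: List[float] = []
        for ox in range(out_w):
            sx = (ox + 0.5) * in_w / out_w - 0.5
            x0 = max(0, min(in_w - 1, math.floor(sx)))
            x1 = min(in_w - 1, x0 + 1)
            fx = min(max(sx - x0, 0.0), 1.0)  # horizontal blend weight
            top = src[y0][x0] * (1 - fx) + src[y0][x1] * fx
            bottom = src[y1][x0] * (1 - fx) + src[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


small = bilinear_downscale([[0, 10], [20, 30]], 1, 1)
flat = bilinear_downscale([[7] * 4 for _ in range(4)], 2, 2)
```

Because the loop is driven by the output grid, exactly one result is produced per iteration, which loosely mirrors the paper's point that output-driven work is spread evenly, whereas an input-driven loop would have iterations that produce nothing.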
Using high-level synthesis for rapid design of video processing pipes
(IEEE, 2016) Güzel, Aydin Emre; Levent, Vecdi Emre; Tosun, Mustafa; Özkan, M. Akif; Akgun, T.; Büyükaydın, D.; Erbas, C.; Uğurdağ, Hasan Fatih; Electrical & Electronics Engineering
In this work, we share our experience in using High-Level Synthesis (HLS) for rapid development of an optical flow design on FPGA. We have performed HLS using Vivado HLS as well as an HLS tool we have developed for the optical flow design at hand and similar video processing problems. The paper first describes the design problem we have and then discusses our own HLS tool. The tool we developed has turned out to be fairly general-purpose except for its inability to handle cyclic inter-iteration dependencies. It also introduces some novel concepts to HLS, such as “pipelined multiplexers”. The synthesis results show that we can achieve better timing or better area results compared to Vivado HLS. Furthermore, the Verilog RTL our HLS tool outputs is much more readable than that from Vivado HLS. This makes it much easier for the designer to debug and modify the RTL.