# Person: UĞURDAĞ, Hasan Fatih

## First Name

Hasan Fatih

## Last Name

UĞURDAĞ

## Publication Search Results

Now showing 1 - 10 of 60

### Defect-aware nanocrossbar logic mapping through matrix canonization using two-dimensional radix sort

Article, metadata only. ACM, 2011-08. Authors: Gören, S.; Uğurdağ, Hasan Fatih; Palaz, O. Department: Electrical & Electronics Engineering. Institutional authors: UĞURDAĞ, Hasan Fatih.

Nanocrossbars (i.e., nanowire crossbars) offer extreme logic densities but come with very high defect rates; stuck-open/closed, broken nanowires. Achieving reasonable yield and utilization requires logic mapping that is defect-aware even at the crosspoint level. Such logic mapping works with a defect map per each manufactured chip. The problem can be expressed as matching of two bipartite graphs; one for the logic to be implemented and other for the nanocrossbar. This article shows that the problem becomes a Bipartite SubGraph Isomorphism (BSGI) problem within sub-nanocrossbars free of stuck-closed faults. Our heuristic KNS-2DS is an iterative rough canonizer with approximately O(N^2) complexity followed by an O(N^3) matching algorithm. Canonization brings a partial or full order to graph nodes. It is normally used for solving the regular Graph Isomorphism (GI) problem, while we apply it to BSGI. KNS stands for K-Neighbor Sort and is used for initializing our main contribution 2-Dimensional-Sort (2DS). 2DS operates on the adjacency matrix of a bipartite graph. Radix-2 2DS solves the problem in the absence of stuck-closed faults. With the addition of Radix-3 and our novel Radix-2.5 sort, we solve problems that also have stuck-closed faults. We offer very short runtimes (due to canonization) compared to previous work and have success on all benchmarks.
KNS-2DS is also novel from the perspective of the BSGI problem, as it is based on canonization rather than on a search tree with backtracking.

### Fast two-pick n2n round-robin arbiter circuit

Article, metadata only. IEEE, 2012-06. Authors: Uğurdağ, Hasan Fatih; Temizkan, Fatih; Baskirt, O.; Yuce, B. Department: Electrical & Electronics Engineering. Institutional authors: UĞURDAĞ, Hasan Fatih; Temizkan, Fatih.

A regular (one-pick) round-robin arbiter circuit picks one active requester (if any) out of n requesters. A two-pick round-robin arbiter selects up to two requesters. An n2n two-pick round-robin arbiter indicates the picked requests with (at most) two-hot n-bit output. A round-robin arbiter is fair to its requesters and does this by repeatedly moving its highest priority pointer to the position immediately next to the second requester picked. Presented is the circuit architecture and VLSI implementation of a new scalable two-pick round-robin arbiter with low latency, which is compared with previous work based on logic synthesis results.

### A new method for no-reference image blockiness measurement (Referanssız görüntü bloklanma ölçümü için yeni bir yöntem)

Conference paper, metadata only. IEEE, 2014. Authors: Ozansoy, Koray; Özer, N.; Dönmez, F.; Uğurdağ, Hasan Fatih. Department: Electrical & Electronics Engineering. Institutional authors: UĞURDAĞ, Hasan Fatih; Ozansoy, Koray.

Today, with video traffic peaking on the Internet and in service-provider networks, the benefits of automatic image quality measurement are obvious. In many applications these measurements must be made in real time, which calls for "no-reference" measurement, i.e., measurement that does not use the uncompressed (raw) images. Most of the world's video streams are now digital. Digital video streams use compressed video transmission, and most of the compression methods in use are DCT-based. In such streams, image quality degradation usually results from the bit rate being throttled too far, and because the DCT algorithm is block-based, the quality loss shows up as "blockiness".
In this work, we present a blockiness measurement method (named RED) that gives results closer to human perception than the methods in the literature. One of RED's most important contributions is that it establishes an analytical relationship between the blockiness values it computes automatically and the scores given by human testers. RED optimizes the parameters of this relationship through regression. Also unlike the literature, RED uses edge detection to exclude some blocks from the computation before calculating blockiness.

### Hardware implementation of field oriented control for three phase machine drives

Conference paper, metadata only. IEEE, 2020-10-05. Authors: Tüfekçi, B.; Önal, B.; Önal, H.; Uğurdağ, Hasan Fatih. Department: Electrical & Electronics Engineering. Institutional authors: UĞURDAĞ, Hasan Fatih.

This paper presents a high switching frequency FPGA implementation of Maximum Torque Per Ampere (MTPA) and Flux Weakening, which are branches of the Field Oriented Control (FOC) method for 3-phase machine drives. A common architecture has been constructed for both BrushLess DC motors (BLDC) and Permanent Magnet Synchronous Motors (PMSM). For this purpose, the controller module was implemented using the Space Vector Modulation (SVM) technique. The user interface module was designed to provide real-time torque-time, speed-time, and current-time plots for the user. This interface runs on the PS part of the FPGA and interacts with the user through a UART. The entire system has been verified through simulation.

### Efficient combinational circuits for division by small integer constants

Article, metadata only. IEEE, 2016. Authors: Uğurdağ, Hasan Fatih; Bayram, A.; Levent, Vecdi Levent; Gören, S. Department: Electrical & Electronics Engineering. Institutional authors: UĞURDAĞ, Hasan Fatih; Levent, Vecdi Levent.

Division of an integer by an integer constant is a widely used operation and hence justifies a customized efficient implementation. There are various versions of this operation.
This paper attacks a particular version of this problem, where the divisor is small and the circuit outputs a quotient and remainder. We propose a fast (low-latency) yet area-efficient combinational circuit topology, which we call Binary Tree based Constant Division (BTCD). BTCD uses a collection of small LUTs wired to each other to form a binary tree. The circuit also has a bunch of adders, whose latencies are almost hidden as they operate in parallel with the binary tree. We wrote RTL code generators for BTCD and two previous works in the literature, then generated circuits for dividends of up to 128 bits and divisors of 3, 5, 11, and 23. We synthesized the generated RTL designs using a commercial ASIC synthesis tool. BTCD strikes a good balance between timing (latency) and area. It is up to 3.3 times better in Area-Timing Product (ATP) compared to the best alternative. ATP has a good correlation with energy consumption.

### FPGA-based minimal Latency HEFT scheduler for heterogeneous computing

Conference paper, metadata only. IEEE, 2021. Authors: Aliyev, Ilkin; Mack, J.; Kumbhare, N.; Akoglu, A.; Uğurdağ, Hasan Fatih. Department: Electrical & Electronics Engineering. Institutional authors: UĞURDAĞ, Hasan Fatih; Aliyev, Ilkin.

This paper proposes a new hardware scheduler. As heterogeneous computing becomes prevalent, mapping applications onto multiple processing elements (PEs) proves to be nontrivial. The Heterogeneous Earliest Finish Time (HEFT) algorithm is an existing scheduler that aims to minimize the total execution time of an application. The paradigm of HEFT is such that it accepts an acyclic task graph as input at run-time and assigns/schedules the precompiled atomic tasks to PEs. HEFT stands out among many such schedulers not only in terms of producing shorter schedules but also in terms of its own short execution time. However, in real-time applications, the lower the latency, the better it is.
To the best of our knowledge, this work is the only work that implements HEFT in hardware (on FPGA) further lowering its latency from milliseconds to as much as less than a microsecond. Porting HEFT to hardware has been challenging as data dependencies limit the amount of parallelism. Design of an efficient memory access pattern as well as an “incremental sorter” were key enablers in reducing the latency of the hardware implementation. We also integrated our FPGA-HEFT into an ARM-based SoC and validated its functionality using a realistic workload.

### Fast multiplier generator for FPGAs with LUT based partial product generation and column/row compression

Article, metadata only. Elsevier, 2017. Authors: Kakacak, Ahmet; Guzel, Aydın Emre; Cihangir, Ozan; Gören, S.; Uğurdağ, Hasan Fatih. Department: Electrical & Electronics Engineering. Institutional authors: UĞURDAĞ, Hasan Fatih; Kakacak, Ahmet; Guzel, Aydın Emre; Cihangir, Ozan.

We present a new parallel integer multiplier generator for FPGAs. It combines (i) a new Generalized Parallel Counter (GPC) grouping algorithm for column compression with (ii) a LUT based partial product generation, is (iii) unique as it automatically generates placement pragmas, (iv) uses a ternary adder as a final adder to exploit FPGA's internal carry-chains, and (v) employs a novel GPC based row compression, which aims to reduce the width of the final adder. We wrote Verilog generators for our method as well as one leading work in the literature. For synthesis, we wrote a script that can do “binary search” for the optimum latency. Our extensive implementation results on Xilinx Virtex-6 FPGAs show that we almost always produce circuits with smaller latency (i.e., timing) and Area-Timing Product (ATP) compared to the state-of-the-art in the literature, by 18% and 12% (on the average), respectively. We also offer smaller latency compared to the HDL * operator by 9% on the average at a cost of 12% larger ATP on the average.
We are worse in latency in 6 cases out of 33, in all of which synthesis maps * to DSP slices. We also include area and energy results on Virtex-6 as well as a limited amount of latency, area, and ATP results on Virtex-5 and Altera Stratix III.

### Welcome note from the general chairs

Editorial, metadata only. IEEE, 2017-12-13. Authors: Elfadel, I. A. M.; Uğurdağ, Hasan Fatih. Department: Electrical & Electronics Engineering. Institutional authors: UĞURDAĞ, Hasan Fatih.

The following topics are dealt with: low-power electronics; system-on-chip; integrated circuit design; CMOS integrated circuits; microprocessor chips; SRAM chips; logic design; flip-flops; power aware computing; MOSFET circuits.

### FPGA implementation of a low latency and high SFDR direct digital synthesizer for resource-efficient quantum-enhanced communication

Conference paper, metadata only. IEEE, 2020-09. Authors: Annafıanto, Nur Fajar Rızqı; Jabir, M. V.; Burenkov, I. A.; Uğurdağ, Hasan Fatih; Battou, A.; Polyakov, S. V. Department: Electrical & Electronics Engineering. Institutional authors: UĞURDAĞ, Hasan Fatih; Annafıanto, Nur Fajar Rızqı.

A Direct Digital Synthesizer (DDS) generates a sinusoidal signal, which is a significant component of many communication systems using modulation schemes. A CORDIC algorithm offers minimum memory requirements compared to look-up-based methods and low latency. The latency depends on the number of iterations, which is determined by the number of angles in the rotation set. However, it is necessary to maintain high spectral purity to optimize the overall system performance. To optimize the opportunity of quantum measurement, low latency and a high spectral purity sine wave generator is essential.
The implementation of this design generates output with 64% latency reduction compared to that of the conventional CORDIC design and 72.2 dB SFDR value.

### FPGA based particle identification in high energy physics experiments

Conference paper, metadata only. IEEE, 2012. Authors: Uğurdağ, Hasan Fatih; Başaran, A.; Akdogan, T.; Güney, V. U.; Gören, S. Department: Electrical & Electronics Engineering. Institutional authors: UĞURDAĞ, Hasan Fatih.

High energy physics experiments require on-the-fly processing of signals from many particle detectors. Such signals contain a high and fluctuating rate of pulses. Pulse shape hints particle type, and the amplitude relates to energy of the particle, while pulse occurrence times are needed for event reconstruction. Traditionally, these parameters have been extracted with the help of complete racks of dedicated electronics. Our FPGA design on a general-purpose DAQ card does real-time pulse detection and high-precision curve fitting. It greatly shrinks required equipment in terms of form factor, cost, power usage, and setup time. Unlike traditional systems, we can handle bursts of back-to-back pulses, pulses as narrow as 6 ns and at rates over 1M pulses per second. We have a novel scalable architecture that combines pipelining and parallelism. Moreover, the parallel part of the architecture uses loop pipelining in each of its interleaved identical parallel processors (IIPPs). An IIPP is a specialized CPU, which executes nested loops, with number of iterations that varies from pulse to pulse. IIPPs are fed data from a FIFO by a priority encoder based dispatcher. Number of IIPPs can be calculated to meet any pulse rate and average pulse width. The architecture is flexible enough to work with a variety of curve fitting algorithms.