
Please use this identifier to cite or link to this item: http://hdl.handle.net/2005/2557

Title: ASIC Implementation of A High Throughput, Low Latency, Memory Optimized FFT Processor
Authors: Kala, S
Advisors: Nandy, S K
Jamadagni, H S
Keywords: Wireless Communication Systems
Fast Fourier Transformation Processor
Fast Fourier Transform Architecture
Fast Fourier Transform - Algorithms
Application Specific Integrated Circuit
FFT Processor
FFT Architecture
Orthogonal Frequency Division Multiplexing (OFDM)
Submitted Date: Dec-2012
Series/Report no.: G25691
Abstract: Rapid advances in semiconductor technology have led to a steady shrinking of transistor sizes, in line with Moore's Law. Wireless communication is one field that has seen explosive growth, thanks to the cramming of more transistors onto a single chip. The design of these systems involves trade-offs between performance, area and power. The Fast Fourier Transform (FFT) is an important component in most wireless communication systems. FFTs are widely used in applications such as OFDM transceivers, spectrum sensing in cognitive radio, image processing and radar signal processing. The FFT is the most compute-intensive and time-consuming operation in most of these applications. It is always a challenge to develop an architecture that gives high throughput while reducing latency without much area overhead. Next-generation wireless systems demand high transmission efficiency, and hence the FFT processor should be capable of much faster computation. Architectures based on smaller radices are inefficient for computing longer FFTs. In this thesis, a fully parallel unrolled FFT architecture based on a novel radix-4 engine is proposed, which caters to a wide range of applications. The radix-4 butterfly unit takes all four inputs in parallel and can selectively produce one of the four outputs. The proposed architecture uses Radix-4^3 and Radix-4^4 algorithms for computing FFTs of various sizes. The Radix-4^4 block can take all 256 inputs in parallel and uses the select control signals to generate one of the 256 outputs. In existing Cooley-Tukey architectures, the output from each stage has to be reordered before the next stage can start computation, which requires intermediate storage after each stage. In our architecture, each stage can directly generate the reordered outputs, and hence these buffers are reduced. A solution to the output reordering problem in Radix-4^3 and Radix-4^4 FFT architectures is also discussed in this work.
Although the hardware complexity in terms of adders and multipliers is increased in our architecture, a significant reduction in intermediate memory requirement is achieved. FFTs of sizes ranging from 64 points to 64K points have been implemented in ASIC using UMC 130 nm CMOS technology. The data representation used in this work is fixed-point, and a word length of 16 bits was selected to get maximum Signal to Quantization Noise Ratio (SQNR). The architecture has been found to be more suitable for computing large FFTs. For 4096-point and 64K-point FFTs, this design gives comparable throughput with a considerable reduction in area and latency when compared to state-of-the-art implementations. The 64K-point FFT architecture achieved a throughput of 1332 mega samples per second with an area of 171.78 mm^2 and a total power of 10.7 W at 333 MHz.
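Illustrative note (not part of the thesis record): the abstract describes a radix-4 butterfly that takes all four inputs in parallel and selectively produces one of the four outputs. A minimal software sketch of that idea is given below, assuming the butterfly computes a plain 4-point DFT and that the select signal is simply the output index k. The function names radix4_butterfly_select and dft4 are hypothetical and do not come from the thesis, and the floating-point arithmetic here does not model the 16-bit fixed-point hardware.

```python
import cmath

# Assumption: the "select" control is modelled as the output index k (0..3).
# The 4-point DFT twiddle W_4^(n*k) = (-j)^(n*k) only takes the values
# 1, -j, -1, j, so no general complex multiplier is needed at this level.
_ROTATIONS = [1, -1j, -1, 1j]  # (-j)^m for m = 0, 1, 2, 3


def radix4_butterfly_select(x, k):
    """Return output k of a 4-point DFT of the four inputs in x."""
    if len(x) != 4:
        raise ValueError("a radix-4 butterfly takes exactly four inputs")
    if k not in (0, 1, 2, 3):
        raise ValueError("select index k must be 0, 1, 2 or 3")
    # All four inputs contribute to the single selected output.
    return sum(x[n] * _ROTATIONS[(n * k) % 4] for n in range(4))


def dft4(x):
    """Reference 4-point DFT computed directly from the definition."""
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / 4) for n in range(4))
        for k in range(4)
    ]


if __name__ == "__main__":
    samples = [1 + 2j, 3 - 1j, -2 + 0.5j, 0.25 - 4j]
    for k in range(4):
        print(k, radix4_butterfly_select(samples, k))
```

Calling the function once per value of k reproduces the full 4-point DFT, so a stage built this way can emit its outputs in whatever order the next stage needs. This is only a loose analogy to the buffer reduction the abstract describes.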
Abstract file URL: http://etd.ncsi.iisc.ernet.in/abstracts/3324/G25691-Abs.pdf
URI: http://etd.iisc.ernet.in/handle/2005/2557
Appears in Collections:Centre for Nano Science and Engineering (cense)

Files in This Item:

File | Description | Size | Format
G25691.pdf | | 2.83 MB | Adobe PDF

Items in etd@IISc are protected by copyright, with all rights reserved, unless otherwise indicated.

