Starting to build an FFT core generator for FPGAs

I have begun working on a new project and would like to take a moment to tell you about it, if you don’t mind. I have been working on it for about a month now, and I have become more excited about it as I have started to see some of the foundation pieces fall into place.


My senior project, to finish my Computer Engineering degree at Oregon Institute of Technology, was supposed to be an audio spectrum visualizer using an FFT processor in an FPGA. If you don’t know what an audio spectrum visualizer is, check out this video below.

Anyway, it turned out that an FFT is a major pain to build from scratch, especially for a person like me who has no clue what he’s doing when it comes to digital signal processing (DSP) techniques. I learned a great deal by reading many technical papers and articles on the subject, but could not get anything to work. I even tried using Altera’s (as was its name at the time) proprietary FFT IP core, but the user guide was not written for a beginner, and neither I nor my professors could get the core to function properly.

Long story short, I altered my senior project to simply display audio volume in a fancy way and proceeded to market the product as a very, uh, expandable platform that could easily implement features like frequency analysis and whatnot.

Ever since that time, this whole idea of an FFT processing core in an FPGA has been haunting me. Now that I have some more time this summer, I have set out with the goal of building an application which generates a completely customizable FFT core in the form of SystemVerilog-compliant HDL modules, for use in Altera or Xilinx FPGAs and CPLDs.

Design concept

Ultimately, this project would allow a developer to provide the width and length of the FFT they need for their system, and the program would then generate all the SystemVerilog modules required for that system. The developer would only need to interface their design with the input and output of the top-level FFT module, and they would be ready to roll in a matter of minutes.

I have been designing this project in the ever-so-popular object-oriented C++ and have been compiling with “mingw” in Code::Blocks.

I know that both the Altera and Xilinx FFT IP core configuration wizards presented the user with some nice extra features like being able to select presets which modify the design to consume more or less memory or have a higher or lower throughput, for example.

This design, for the time being, will not have such features. I think it’s better to focus on one FFT architecture and generate a nice pipelined FFT core that is simple to configure and very easy to integrate into a synthesizable design. Once I have one FFT architecture working well, I might consider expanding the idea to generate different kinds of FFT architectures.

Multipliers are a very important part of calculating an FFT. Most FPGAs I have encountered are manufactured with dedicated multiplier hardware on the silicon. This frees up the Logic Elements (LEs) or Logic Cells (LCs) to be used for other things in the design. It also means that (ideally) highly-optimized hardware multipliers are readily available to the developer. However, from the little I have seen of the FPGA market, it seems like access to these multipliers is not allowed unless your design uses a piece of the vendor’s proprietary code. In Altera’s world, this code is called a megafunction. I’m not sure what it’s called in the Xilinx world just yet.

With all of that in mind, this project will probably use custom multiplier modules for the time being, simply because that simplifies cross-vendor compatibility a great deal. While I haven’t gotten quite that far yet, I will probably implement it as a Booth multiplier, but I am still doing a little more research before I really settle on a multiplier architecture.
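To make the Booth idea concrete, here is a minimal C++ sketch of radix-2 Booth multiplication. It is only a software reference model that a generated hardware multiplier could later be checked against, not the SystemVerilog module itself, and the function name boothMultiply and its parameters are made up for illustration. It assumes the multiplier fits in `width` bits of two’s complement.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical reference model: radix-2 Booth multiplication.
// The low `width` bits of `multiplier` are read as a two's complement number.
// Each pair (current bit, previous bit) selects one action:
//   (1, 0) -> subtract the shifted multiplicand (start of a run of ones)
//   (0, 1) -> add the shifted multiplicand      (end of a run of ones)
//   (0, 0) or (1, 1) -> do nothing
std::int64_t boothMultiply(std::int32_t multiplicand,
                           std::int32_t multiplier,
                           unsigned width) {
    assert(width >= 2 && width <= 32);

    std::int64_t product = 0;
    std::int64_t shifted = multiplicand;  // multiplicand * 2^i
    int prevBit = 0;                      // implicit bit right of the LSB

    for (unsigned i = 0; i < width; ++i) {
        const int bit = static_cast<int>(
            (static_cast<std::uint32_t>(multiplier) >> i) & 1u);

        if (bit == 1 && prevBit == 0) {
            product -= shifted;
        } else if (bit == 0 && prevBit == 1) {
            product += shifted;
        }

        prevBit = bit;
        shifted += shifted;  // double instead of shifting a signed value
    }
    return product;
}
```

For example, boothMultiply(3, -4, 8) only performs a single subtraction (at bit 2, where the run of ones in 11111100 begins) and returns -12. Skipping over runs of identical bits is what makes Booth recoding attractive in hardware.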

Current progress

So far, I have implemented and tested code to generate a SystemVerilog-compatible ROM module, and then attached that code to new code which calculates twiddle factors based on the length and width of the desired FFT. In other words, right now I can generate both the real and imaginary twiddle factor ROMs required to process an FFT of (almost) any length and bit width. The output I’ve been getting matches the twiddle values which George Slade calculated and published in his paper/tutorial on this subject, so that has been promising.
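For anyone curious what that step looks like, here is a rough C++ sketch of the idea, not the actual code from my repository. The names makeTwiddles and emitRom, the ROM port names, and the scaling by 2^(width-1) - 1 are all assumptions made for this example; other scaling conventions exist. It computes W_N^k = cos(2*pi*k/N) - j*sin(2*pi*k/N) for k = 0 to N/2 - 1, rounds each part to signed fixed point, and prints a simple clocked case-statement ROM.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container for the quantized real and imaginary parts.
struct Twiddles {
    std::vector<std::int32_t> re;
    std::vector<std::int32_t> im;
};

// Quantize the N/2 twiddle factors of a length-N FFT to `width`-bit signed
// values. Assumption: full scale is 2^(width-1) - 1.
Twiddles makeTwiddles(unsigned length, unsigned width) {
    assert(length >= 2 && (length & (length - 1)) == 0);  // power of two
    assert(width >= 2 && width <= 32);

    const double pi = 3.14159265358979323846;
    const double scale = static_cast<double>((1LL << (width - 1)) - 1);

    Twiddles t;
    for (unsigned k = 0; k < length / 2; ++k) {
        const double angle = 2.0 * pi * k / length;
        t.re.push_back(static_cast<std::int32_t>(std::lround(std::cos(angle) * scale)));
        t.im.push_back(static_cast<std::int32_t>(std::lround(-std::sin(angle) * scale)));
    }
    return t;
}

// Emit a synchronous ROM as SystemVerilog text (assumed module layout).
std::string emitRom(const std::string& name,
                    const std::vector<std::int32_t>& data,
                    unsigned width) {
    assert(!data.empty() && width >= 2 && width <= 32);

    unsigned addrBits = 1;
    while ((1u << addrBits) < data.size()) ++addrBits;
    const std::uint32_t mask = (width == 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);

    std::ostringstream sv;
    sv << "module " << name << " (\n"
       << "    input  logic clk,\n"
       << "    input  logic [" << addrBits - 1 << ":0] addr,\n"
       << "    output logic signed [" << width - 1 << ":0] data\n"
       << ");\n"
       << "    always_ff @(posedge clk) begin\n"
       << "        case (addr)\n";
    for (std::size_t i = 0; i < data.size(); ++i) {
        // Mask to `width` bits so negative values print as two's complement.
        sv << "            " << addrBits << "'d" << i << ": data <= " << width << "'h"
           << std::hex << std::setw((width + 3) / 4) << std::setfill('0')
           << (static_cast<std::uint32_t>(data[i]) & mask) << std::dec << ";\n";
    }
    sv << "            default: data <= '0;\n"
       << "        endcase\n"
       << "    end\n"
       << "endmodule\n";
    return sv.str();
}
```

Calling makeTwiddles(8, 16) and passing the real parts to emitRom("twiddle_re_rom", t.re, 16) produces a four-entry ROM with a 2-bit address. Only N/2 entries are stored because a radix-2 FFT never needs the second half of the unit circle.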


The simple future plan, of course, is to finish the project. But what does that mean? Well, from what I know based on what I have read online, I will need to expand my C++ program to generate the following SystemVerilog modules:

  • 2-port RAM module
  • Booth multiplier module
  • Address generator module
  • Butterfly module
  • Top-level module (which wires most of the sub-modules together correctly)

So we’ll see how this goes! Again, I’m pretty excited about this project and really hope I can build a usable FFT core by the end of this. If not, it has already been a great review of some C++ programming skills I haven’t used for over a year, so that’s always a good thing.

I am keeping this project hosted up in my (currently private) Bitbucket repository. If I do make the repository public, please go check it out over here:

If you made it this far, thank you so much for reading this! I really appreciate it! Please ask any questions or leave any comments you might have below!
