FIFO

Introduction

The Free-RAM library includes many types of FIFO cores.  There is the typical synchronous FIFO, and there are also five different types of asynchronous FIFO.  All of the FIFOs are parameterized for data width and depth, as well as some other things.  Since these FIFOs use the Dual Port RAM cores, they are portable between many different platforms.

FIFO Types

These FIFOs can be put into two categories:  synchronous and asynchronous.  A synchronous FIFO has a single clock that governs both reads and writes, while an asynchronous FIFO has separate clocks for the read and write ports.

Synchronous FIFOs are nice and simple, and small.  Because there is only a single clock, it is fairly simple for these FIFOs to keep an accurate count of what's in the FIFO.  This is not so for an asynchronous FIFO.  Async FIFOs have a difficult time keeping track of how many elements are in the FIFO at any particular moment.  That's why other async FIFO implementations use a strange system of flags to indicate the fullness or emptiness of the FIFO.  There are many problems with these types of flags, which we'll get into later, but suffice it to say that the Free-RAM library includes async FIFOs that keep a mostly-true count of how full or how empty the FIFO is.

Read and Write Timing

The timing for reads and writes is basically the same for all types of FIFO, so we'll cover them together.

A read is initiated by a read enable signal.  An active high enable signal will cause the FIFO contents to be returned on the following clock.  The diagram below shows this:

It is important to note that the value of the read data bus is invalid during all times other than when a read is being performed.
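One way to respect this rule in user logic is to delay the read enable by one clock and use it to qualify the data bus.  The following is only a minimal sketch, not part of the Free-RAM library: the entity name rd_capture and its rd_valid and captured ports are hypothetical, and it assumes that "the following clock" means the data is valid in the clock cycle right after the one in which rd_en was sampled high.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical helper (not part of Free-RAM): qualifies FIFO read data.
entity rd_capture is
  generic (data_bits : integer := 16);
  port (reset    : in  std_logic;  -- active high, asynchronous
        clk      : in  std_logic;  -- FIFO read clock
        rd_en    : in  std_logic;  -- same signal that drives the FIFO rd_en
        rd_data  : in  std_logic_vector (data_bits-1 downto 0);
        rd_valid : out std_logic;  -- high in the cycle where rd_data is valid
        captured : out std_logic_vector (data_bits-1 downto 0)
       );
end rd_capture;

architecture rtl of rd_capture is
  signal valid : std_logic;
begin

  -- Delay rd_en by one clock.  Under the assumption stated above,
  -- rd_data is meaningful only while 'valid' is high.
  process (clk, reset)
  begin
    if reset = '1' then
      valid <= '0';
    elsif rising_edge(clk) then
      valid <= rd_en;
    end if;
  end process;

  -- Latch the data bus only while it is valid; ignore it otherwise.
  process (clk, reset)
  begin
    if reset = '1' then
      captured <= (others => '0');
    elsif rising_edge(clk) then
      if valid = '1' then
        captured <= rd_data;
      end if;
    end if;
  end process;

  rd_valid <= valid;

end rtl;
```

Downstream logic would then use captured (or rd_data gated by rd_valid) rather than sampling rd_data directly.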

A write is performed by providing valid write data on the same clock as an active high write enable signal.  

What is not shown in the diagrams is the timing of the empty and full flag outputs.  The empty flag is synchronized with the read clock, while the full flag is synchronized with the write clock.

Standard FIFOs

Because there are many types of FIFO, there are many types of component declarations.  All FIFOs can be found in the free_fifo package, so remember to put a "use work.free_fifo.all;" statement in any code that uses them.

The synchronous FIFO declaration looks like this:

component fifo_sync
    generic (data_bits  : integer;
             addr_bits  : integer;
             block_type : integer := 0);
    port (reset   : in  std_logic;
          clk     : in  std_logic;
          wr_en   : in  std_logic;
          wr_data : in  std_logic_vector (data_bits-1 downto 0);
          rd_en   : in  std_logic;
          rd_data : out std_logic_vector (data_bits-1 downto 0);
          count   : out std_logic_vector (addr_bits-1 downto 0);
          full    : out std_logic;
          empty   : out std_logic
         );
end component;

Here data_bits is the width of the data in bits, and addr_bits determines the depth of the FIFO, where depth = 2^addr_bits-1.  Block_type has the same meaning as for the dual port RAM core.

As with the rest of the Free-IP Project cores, reset is an active high asynchronous reset signal.

Count is the number of elements currently in the FIFO.  Full and empty are active high flags indicating whether the FIFO is, well, full or empty.
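To make the port usage concrete, here is a small testbench sketch that writes one word into fifo_sync and reads it back.  It is an illustration under stated assumptions rather than a verified test: the testbench name, the constants WIDTH and ABITS, the 10 ns clock, and the data value are all made up for the example, and the final check assumes the read timing described in the Read and Write Timing section.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use work.free_fifo.all;

entity fifo_sync_tb is
end fifo_sync_tb;

architecture sim of fifo_sync_tb is
  constant WIDTH : integer := 16;  -- illustrative data width
  constant ABITS : integer := 8;   -- illustrative address width

  signal reset   : std_logic := '1';
  signal clk     : std_logic := '0';
  signal wr_en   : std_logic := '0';
  signal rd_en   : std_logic := '0';
  signal wr_data : std_logic_vector (WIDTH-1 downto 0) := (others => '0');
  signal rd_data : std_logic_vector (WIDTH-1 downto 0);
  signal count   : std_logic_vector (ABITS-1 downto 0);
  signal full    : std_logic;
  signal empty   : std_logic;
  signal done    : boolean := false;
begin

  -- 100 MHz clock that stops once the stimulus has finished.
  clk <= not clk after 5 ns when not done else '0';

  dut : fifo_sync
    generic map (data_bits  => WIDTH,
                 addr_bits  => ABITS,
                 block_type => 0)
    port map (reset   => reset,
              clk     => clk,
              wr_en   => wr_en,
              wr_data => wr_data,
              rd_en   => rd_en,
              rd_data => rd_data,
              count   => count,
              full    => full,
              empty   => empty);

  stim : process
  begin
    -- Reset is active high and asynchronous.
    wait for 20 ns;
    reset <= '0';
    wait until rising_edge(clk);

    -- Write: data and wr_en are presented on the same clock.
    wr_data <= x"BEEF";
    wr_en   <= '1';
    wait until rising_edge(clk);
    wr_en   <= '0';

    -- Allow a couple of clocks for count and the flags to update.
    wait until rising_edge(clk);
    wait until rising_edge(clk);

    -- Read: assert rd_en for one clock, then sample rd_data one clock later.
    rd_en <= '1';
    wait until rising_edge(clk);
    rd_en <= '0';
    wait until rising_edge(clk);

    -- Assumes the data is returned on the clock following rd_en.
    assert rd_data = x"BEEF"
      report "unexpected read data" severity error;

    done <= true;
    wait;
  end process;

end sim;
```

Watching count, full, and empty in the simulator's waveform view during this sequence is a quick way to confirm how the flags behave in your tool flow.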

The asynchronous FIFO is only slightly more complicated:

component fifo_async
    generic (data_bits  : integer;
             addr_bits  : integer;
             block_type : integer := 0;
             fifo_arch  : integer := 0); -- 0=Generic architecture,
                                         -- 1=Xilinx XAPP131,
                                         -- 2=Xilinx XAPP131 w/carry mux
    port (reset   : in  std_logic;
          wr_clk  : in  std_logic;
          wr_en   : in  std_logic;
          wr_data : in  std_logic_vector (data_bits-1 downto 0);
          rd_clk  : in  std_logic;
          rd_en   : in  std_logic;
          rd_data : out std_logic_vector (data_bits-1 downto 0);
          full    : out std_logic;
          empty   : out std_logic
         );
end component;

Everything is the same as with the synchronous FIFO, except that there is no count output, there are separate read and write clocks, and there is the added fifo_arch parameter that selects the FIFO architecture.  Full and empty still signal the obvious, except that now the full flag is synchronized with the write clock and the empty flag is synchronized with the read clock.
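As an illustration, an instantiation of the async FIFO might look like the sketch below.  The wrapper entity cdc_fifo, its port list, and the 256x16 size are assumptions made for the example.  The generic architecture (fifo_arch => 0) is selected, and the generics are spelled out by name so the choice is visible in the source.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use work.free_fifo.all;

-- Hypothetical wrapper; the name and port list are illustrative only.
entity cdc_fifo is
  port (reset   : in  std_logic;
        -- write side, clocked by wr_clk
        wr_clk  : in  std_logic;
        wr_en   : in  std_logic;
        wr_data : in  std_logic_vector (15 downto 0);
        full    : out std_logic;  -- synchronized with wr_clk
        -- read side, clocked by rd_clk
        rd_clk  : in  std_logic;
        rd_en   : in  std_logic;
        rd_data : out std_logic_vector (15 downto 0);
        empty   : out std_logic   -- synchronized with rd_clk
       );
end cdc_fifo;

architecture rtl of cdc_fifo is
begin

  u_fifo : fifo_async
    generic map (data_bits  => 16,
                 addr_bits  => 8,
                 block_type => 0,
                 fifo_arch  => 0)  -- 0 = generic architecture
    port map (reset   => reset,
              wr_clk  => wr_clk,
              wr_en   => wr_en,
              wr_data => wr_data,
              rd_clk  => rd_clk,
              rd_en   => rd_en,
              rd_data => rd_data,
              full    => full,
              empty   => empty);

end rtl;
```

Because full belongs to the write clock domain and empty to the read clock domain, each flag should only be used by logic running on its own clock.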

The fifo_arch parameter is small but important, and you must understand it before using the async FIFO.  Failure to use this parameter wisely could result in legal problems.  There are currently three different async FIFO architectures.  The first is a simple generic FIFO that is portable across all FPGA and ASIC architectures and uses no patented design information.  The other two are adaptations of a FIFO described in the Xilinx app notes XAPP131 and XAPP175.  This normally would not be a problem, except that Xilinx has chosen to patent the FIFOs described in those app notes.  At the time of this writing (April 2000) the patent was still pending.  If you are using this FIFO in a Xilinx part, you may choose to use those architectures (fifo_arch=1 or fifo_arch=2).  If you are not using this FIFO in a Xilinx part, you must choose the generic architecture (fifo_arch=0).

The Free-IP Project recommends that anyone who uses the FIFOs based on the Xilinx app notes get permission in writing from Xilinx to do so.  We should point out that Xilinx XAPP051 covers a different FIFO that is also patented (although the Free-IP FIFO doesn't use this app note).  When asking for permission, write to your manufacturer's representative saying something like:

"It has come to our attention that XAPP131 and XAPP175 cover patented or patent-pending information.  We intend to use this information in an upcoming design.  Our company policy requires written permission before using any patented information.  Since we would be using these app notes in a Xilinx FPGA we don't see any problems doing this, but nevertheless we need permission in writing.  Can Xilinx please provide this for us?  Thanks in advance."

Currently, Xilinx has no stated policy concerning the licensing of these patents, even though they have published this information in their app notes!  So, technically speaking, Xilinx customers have no legal right to use Xilinx app notes in the design of their logic, even if that logic resides in a Xilinx chip!  Clearly it is the intent of Xilinx to help customers make successful designs with their FPGAs, but at the same time they have neglected to properly notify people of the patented nature of some of their app notes.  By requesting permission in writing you will help force the issue by making everyone in the distribution chain aware of it, and you will also be doing the proper legal thing.

On the technical side, use the fifo_arch parameter during optimization.  More information is given below, but in general:  the generic architecture results in smaller logic but slower operation.  The Xilinx version with the carry mux results in a much larger FIFO that is faster.  The Xilinx FIFO without the carry mux is somewhere in between.  The speed difference is not much with smaller FIFOs (<512 entries deep), but becomes greater at larger sizes.  Your mileage may vary.

New FIFOs

There are two new types of FIFO in the Free-RAM library, called fifo_rdcount and fifo_wrcount.  These are similar to an async FIFO, but they have a count output.  The rdcount FIFO has an output indicating the number of FIFO elements that are available for reading (and it is thus synchronized to the read clock).  The wrcount FIFO's count output indicates the number of empty FIFO elements that can be written to.  To say this again, rdcount indicates how FULL the FIFO is, while wrcount indicates how EMPTY the FIFO is.

Standard asynchronous FIFOs have difficulty giving an accurate count of the elements in the FIFO.  This is due to the complex nature of having two clocks.  These FIFOs get around the problem by not being a standard FIFO!  The rdcount and wrcount FIFOs are not for every application, but they do have some characteristics that can be a big help with some designs.

You can think of these FIFOs as a sync FIFO and an async FIFO placed in series.  The FIFO counts are taken from the synchronous side.  In the case of the wrcount FIFO, data written goes into a synchronous FIFO.  It is then transferred to the asynchronous FIFO, from which it is read out.  The rdcount FIFO works the same way, except that data is written into the async FIFO and read out of the sync FIFO.  The main benefit of using these FIFOs rather than manually chaining together two FIFOs is that we have taken the time to optimize the design to reduce latency through the FIFO, reduce the amount of logic required, and allow the same RAM blocks to be shared by both FIFOs.

How these FIFOs are used depends greatly on your application.  One possible application is as FIFOs for an Ethernet controller.  Let's say that on the transmit side, data gets written to the FIFO in bursts (from DMA) and is output to the MII interface at a steady rate.  Our system bus could run at anything from 33 to 100 MHz, but the data is output at 25 MHz.  To optimize our DMA bursts, we would like to DMA as much data as possible in each burst.  With a typical async FIFO we would only know how full the FIFO is in 1/4 or 1/8 increments, so our DMA burst would have to be conservatively small to avoid a FIFO overrun.  If our FIFO was 512 entries deep and only reported the available space with a resolution of 1/8 (512/8 = 64 entries), then each DMA burst could be up to 64 words shorter than it needs to be.  That would result in more DMA bursts and higher than necessary bus overhead due to the increased bus arbitration and turnaround cycles.  By allowing optimized DMA bursts, the rdcount and wrcount FIFOs allow a more optimized system.
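The burst sizing in this example can be sketched as a small piece of combinational logic that takes the count output of a fifo_wrcount (the number of free entries, on the write clock) and clamps it to the longest burst the DMA engine supports.  Everything here is an assumption made for illustration: the entity name burst_len_calc, the max_burst generic, and the idea that the DMA engine accepts a burst length in words.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical helper (not part of Free-RAM).
entity burst_len_calc is
  generic (addr_bits : integer := 9;     -- 512-entry FIFO, as in the example
           max_burst : integer := 256);  -- assumed DMA burst limit, in words;
                                         -- must be less than 2**addr_bits
  port (free_count : in  std_logic_vector (addr_bits-1 downto 0);  -- fifo_wrcount count
        burst_len  : out std_logic_vector (addr_bits-1 downto 0)   -- words to request
       );
end burst_len_calc;

architecture rtl of burst_len_calc is
begin

  -- Request as many words as the FIFO can accept, up to max_burst.
  process (free_count)
    variable free : integer range 0 to 2**addr_bits - 1;
  begin
    free := to_integer(unsigned(free_count));
    if free > max_burst then
      burst_len <= std_logic_vector(to_unsigned(max_burst, addr_bits));
    else
      burst_len <= free_count;
    end if;
  end process;

end rtl;
```

With flags that only resolve to 1/8 of a 512-entry FIFO, the same logic would have to round the free space down to a multiple of 64, which is where the up-to-64-word loss per burst comes from.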

These FIFOs are fully parameterized to allow the selection of the FIFO depth and of the portion of the FIFO dedicated to the asynchronous side.  The first parameter, addr_bits, works exactly as it did in the previous FIFOs.  There is a new parameter, async_size, which determines the size of the async portion of the FIFO.  While addr_bits gives the FIFO depth as a power of two (real size = 2^addr_bits), async_size is a simple integer (real size = async_size).  So if addr_bits=8 then our total FIFO depth is 256 words, and if async_size=16 then the async portion of the FIFO is 16 words out of the total of 256.  Async_size does not have to be a power of two.

Choose the value of async_size carefully.  You want to use the smallest value possible so that the count output is as accurate as possible (i.e., so that most of the data remains in the sync portion of the FIFO), but you don't want to have overflows and underflows either.  Generally speaking, if the clock for the sync side of the FIFO is faster than the clock for the async side, then an async_size of 8 to 16 is reasonable.  However, if the clock for the async side is faster, then async_size might have to be increased.  How much depends heavily on your application.

The component declarations for these two FIFOs are:

component fifo_wrcount
    generic (data_bits  : integer;
             addr_bits  : integer;
             block_type : integer := 0;
             async_size : integer := 16);
    port (reset   : in  std_logic;
          wr_clk  : in  std_logic;
          wr_en   : in  std_logic;
          wr_data : in  std_logic_vector (data_bits-1 downto 0);
          rd_clk  : in  std_logic;
          rd_en   : in  std_logic;
          rd_data : out std_logic_vector (data_bits-1 downto 0);
          count   : out std_logic_vector (addr_bits-1 downto 0);
          full    : out std_logic;
          empty   : out std_logic
         );
end component;

component fifo_rdcount
    generic (data_bits  : integer;
             addr_bits  : integer;
             block_type : integer := 0;
             async_size : integer := 16);
    port (reset   : in  std_logic;
          wr_clk  : in  std_logic;
          wr_en   : in  std_logic;
          wr_data : in  std_logic_vector (data_bits-1 downto 0);
          rd_clk  : in  std_logic;
          rd_en   : in  std_logic;
          rd_data : out std_logic_vector (data_bits-1 downto 0);
          count   : out std_logic_vector (addr_bits-1 downto 0);
          full    : out std_logic;
          empty   : out std_logic
         );
end component;

(Version 0.5 Note:  async_size is ignored in fifo_rdcount since it really is irrelevant.)
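For the Ethernet transmit example, an instantiation of fifo_wrcount with addr_bits=8 and async_size=16 might look like the following sketch.  The wrapper entity tx_fifo, the signal names bus_clk, mii_clk, and free_count, and the 16-bit data width are all assumptions made for the example.  The write (sync) side runs on the faster system bus clock, which is the case where an async_size of 8 to 16 was suggested above.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use work.free_fifo.all;

-- Hypothetical transmit FIFO wrapper; names and widths are illustrative.
entity tx_fifo is
  port (reset      : in  std_logic;
        -- DMA (write) side on the system bus clock
        bus_clk    : in  std_logic;
        wr_en      : in  std_logic;
        wr_data    : in  std_logic_vector (15 downto 0);
        free_count : out std_logic_vector (7 downto 0);  -- empty entries
        full       : out std_logic;
        -- MII (read) side on the 25 MHz transmit clock
        mii_clk    : in  std_logic;
        rd_en      : in  std_logic;
        rd_data    : out std_logic_vector (15 downto 0);
        empty      : out std_logic
       );
end tx_fifo;

architecture rtl of tx_fifo is
begin

  u_fifo : fifo_wrcount
    generic map (data_bits  => 16,
                 addr_bits  => 8,    -- 256 words in total
                 block_type => 0,
                 async_size => 16)   -- 16 of those words on the async side
    port map (reset   => reset,
              wr_clk  => bus_clk,
              wr_en   => wr_en,
              wr_data => wr_data,
              rd_clk  => mii_clk,
              rd_en   => rd_en,
              rd_data => rd_data,
              count   => free_count,
              full    => full,
              empty   => empty);

end rtl;
```

A receive path would use fifo_rdcount in the mirror-image arrangement, with count reporting the words available to read on the bus side.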

 

Implementations

Here are some sample implementations for the Free-RAM FIFOs.  These were all done in a Xilinx Virtex-E V100ECS144-8, using Foundation 2.1i +sp6 with no manual floorplanning.  The FIFO size was set to 256x16 (block_type=0), and only a simple clock period timing constraint was used.  As with anything, your mileage may vary, but these numbers give you a starting point.

FIFO Type      Variation      Clock Constraint   Speed       Slices
------------   ------------   ----------------   ---------   ------
fifo_sync      -              170 MHz            171.2 MHz   19
fifo_async     fifo_arch=0    170 MHz            170.2 MHz   32
fifo_async     fifo_arch=1    170 MHz            173.0 MHz   51
fifo_async     fifo_arch=2    170 MHz            170.2 MHz   54
fifo_rdcount   -              155 MHz            156.3 MHz   57
fifo_wrcount   -              155 MHz            158.2 MHz   71

There are several important notes about these numbers...

The async FIFO (fifo_async) has several different architectures that trade off size for speed.  The difference in speed isn't as apparent at the smaller FIFO depths.  It just happens that the FIFO depth that we chose for this trial (256 entries) is about the break-even point.

© 1999-2000, The Free-IP Project.  This page was last updated on June 20, 2000 12:32 PM.