CIC filter register pruning utility

18-Oct-2014

By request, here is the C-language utility from the KiwiSDR project. It is a translation of Rick Lyons' Matlab implementation[1] of Hogenauer's CIC filter register pruning algorithm.

When the output of a CIC filter keeps fewer bits than the full register width of the integrator/comb stages, a certain amount of noise is generated. This is the quantization error introduced when the low-order bits are truncated. Hogenauer's original CIC paper[2] showed how this error can be propagated back through the comb and integrator stages, successively shortening the register length of each stage just to the point of maintaining the output error level (but not adding to it).

As an example, here is the output of the program for an N=5 stage CIC with decimation R=16, Bin=24 bits in and Bout=16 bits out. The register length normally required is acc_max = Bin + ceil(N * log2(R)), or 44 bits in this case. Right away, the first integrator stage only has to be 36 bits instead of 44. From there the lengths decrease, with the last comb stage being only 19 bits, from which the 16-bit output is taken.


"cic_test.vh" INTEG_COMB N=5 R=16 M=1 Bin=24 Bout=16
growth 20 = ceil(N=5 * log2(R=16))
Bin 24 + growth 20 = acc_max 44
 stage     Fj    Bj   acc trunc
integ1  large     8    36     0
integ2  large    12    32     4
integ3  large    16    28     4
integ4  185.3    18    26     2
integ5   33.5    21    23     3
 comb1   15.9    22    22     1
 comb2    8.4    23    21     1
 comb3    4.5    24    20     1
 comb4    2.4    25    19     1
 comb5    1.4    25    19     0
  out0    1.0    28    16     3

A Verilog file is also generated containing unrolled instantiations of the integrator and comb modules at the pruned bit widths. Your Verilog wrapper code would look something like this:

wire signed [IN_WIDTH-1:0] in = ...;
wire signed [OUT_WIDTH-1:0] out;
`include "cic_test.vh"

And the generated file "cic_test.vh" for the above example looks like this:

// generated file

// CIC: INTEG_COMB N=5 R=16 M=1 Bin=24 Bout=16
// growth 20 = ceil(N=5 * log2(R=16))
// Bin 24 + growth 20 = acc_max 44

wire signed [35:0] integrator0_data;
wire signed [35:0] integrator1_data;
wire signed [31:0] integrator2_data;
wire signed [27:0] integrator3_data;
wire signed [25:0] integrator4_data;
wire signed [22:0] integrator5_data;
wire signed [22:0] comb0_data;
wire signed [21:0] comb1_data;
wire signed [20:0] comb2_data;
wire signed [19:0] comb3_data;
wire signed [18:0] comb4_data;
wire signed [18:0] comb5_data;

// important that "in" be declared signed by wrapper code
// so this assignment will sign-extend:
assign integrator0_data = in;

cic_integrator #(.WIDTH(36)) cic_integrator1_inst(
	.clock(clock),
	.strobe(in_strobe),
	.in_data(integrator0_data[35 -:36]),	// trunc 0 bits
	.out_data(integrator1_data)
);

cic_integrator #(.WIDTH(32)) cic_integrator2_inst(
	.clock(clock),
	.strobe(in_strobe),
	.in_data(integrator1_data[35 -:32]),	// trunc 4 bits
	.out_data(integrator2_data)
);

cic_integrator #(.WIDTH(28)) cic_integrator3_inst(
	.clock(clock),
	.strobe(in_strobe),
	.in_data(integrator2_data[31 -:28]),	// trunc 4 bits
	.out_data(integrator3_data)
);

cic_integrator #(.WIDTH(26)) cic_integrator4_inst(
	.clock(clock),
	.strobe(in_strobe),
	.in_data(integrator3_data[27 -:26]),	// trunc 2 bits
	.out_data(integrator4_data)
);

cic_integrator #(.WIDTH(23)) cic_integrator5_inst(
	.clock(clock),
	.strobe(in_strobe),
	.in_data(integrator4_data[25 -:23]),	// trunc 3 bits
	.out_data(integrator5_data)
);

assign comb0_data = integrator5_data;

cic_comb #(.WIDTH(22)) cic_comb1_inst(
	.clock(clock),
	.strobe(out_strobe),
	.in_data(comb0_data[22 -:22]),	// trunc 1 bits
	.out_data(comb1_data)
);

cic_comb #(.WIDTH(21)) cic_comb2_inst(
	.clock(clock),
	.strobe(out_strobe),
	.in_data(comb1_data[21 -:21]),	// trunc 1 bits
	.out_data(comb2_data)
);

cic_comb #(.WIDTH(20)) cic_comb3_inst(
	.clock(clock),
	.strobe(out_strobe),
	.in_data(comb2_data[20 -:20]),	// trunc 1 bits
	.out_data(comb3_data)
);

cic_comb #(.WIDTH(19)) cic_comb4_inst(
	.clock(clock),
	.strobe(out_strobe),
	.in_data(comb3_data[19 -:19]),	// trunc 1 bits
	.out_data(comb4_data)
);

cic_comb #(.WIDTH(19)) cic_comb5_inst(
	.clock(clock),
	.strobe(out_strobe),
	.in_data(comb4_data[18 -:19]),	// trunc 0 bits
	.out_data(comb5_data)
);

assign out = comb5_data[18 -:16];	// trunc 3 bits
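Each instance in the generated file follows the same pattern: the indexed part-select `[msb -:width]` keeps the top `width` bits of the previous stage's output, which discards the low-order bits. A generator could emit one such instance with a single formatted print. The following C sketch shows the idea; `emit_stage` and its parameters are hypothetical and are not the utility's actual code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format one pruned stage instance into buf.
 * kind is "integrator" or "comb", idx is the 1-based stage number,
 * prev_width is the width of the previous stage's output wire.
 * Returns the snprintf result (characters needed, excluding the NUL). */
static int emit_stage(char *buf, size_t size, const char *kind, int idx,
                      int prev_width, int width, const char *strobe)
{
    assert(strcmp(kind, "integrator") == 0 || strcmp(kind, "comb") == 0);
    assert(idx >= 1 && width >= 1 && width <= prev_width);
    return snprintf(buf, size,
        "cic_%s #(.WIDTH(%d)) cic_%s%d_inst(\n"
        "\t.clock(clock),\n"
        "\t.strobe(%s),\n"
        "\t.in_data(%s%d_data[%d -:%d]),\t// trunc %d bits\n"
        "\t.out_data(%s%d_data)\n"
        ");\n",
        kind, width, kind, idx,
        strobe,
        kind, idx - 1, prev_width - 1, width, prev_width - width,
        kind, idx);
}
```

For example, `emit_stage(buf, sizeof buf, "integrator", 2, 36, 32, "in_strobe")` produces text matching the `cic_integrator2_inst` instance above.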

"cic_integrator" and "cic_comb" are the modules that implement the internals of the CIC filter and are included in the GitHub package.
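For readers who want to experiment without a simulator, the behavior of one stage can be approximated in C. The sketch below is only a behavioral model under stated assumptions (two's-complement wraparound at the stage width, differential delay M=1, no modeling of clock or register latency); it is not a translation of the actual Verilog modules, and all names in it are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Interpret the low `width` bits as a signed two's-complement value. */
static int64_t sign_extend(uint64_t bits, int width)
{
    assert(width >= 1 && width <= 62);
    uint64_t mask = (UINT64_C(1) << width) - 1;
    uint64_t sign = UINT64_C(1) << (width - 1);
    return (int64_t)((bits & mask) ^ sign) - (int64_t)sign;
}

/* Keep the top (width - drop) bits of a width-bit value, like x[width-1 -:width-drop]. */
static int64_t keep_msbs(int64_t v, int width, int drop)
{
    assert(drop >= 0 && drop < width);
    return sign_extend((uint64_t)v >> drop, width - drop);
}

typedef struct {
    int width;      /* register width in bits */
    int64_t state;  /* accumulator (integrator) or previous input (comb) */
} cic_stage;

/* Integrator: running sum that wraps at the register width. */
static int64_t integrator_step(cic_stage *s, int64_t in)
{
    s->state = sign_extend((uint64_t)(s->state + in), s->width);
    return s->state;
}

/* Comb with M=1: difference between this input and the previous one. */
static int64_t comb_step(cic_stage *s, int64_t in)
{
    int64_t out = sign_extend((uint64_t)(in - s->state), s->width);
    s->state = in;
    return out;
}
```

A single 4-bit integrator fed a constant 1 and followed by a 4-bit comb run at every fourth sample settles at an output of 4, even though the integrator wraps around repeatedly. This wraparound tolerance is the property that lets CIC integrators run at a fixed register width.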

This technique is especially useful in FPGAs, where resources are limited. In some FPGA architectures, not all slices are capable of generating the carry term when used as an adder. Multiple CIC filters whose integrator/comb stages contain long adders can exhaust the carry-capable slices long before the FPGA's slices as a whole are used up.

[1] Computing CIC Filter Register Pruning Using Matlab (Rick Lyons) / CC BY 3.0

[2] Eugene Hogenauer, "An Economical Class of Digital Filters for Decimation and Interpolation," IEEE Trans. Acoust. Speech and Signal Proc., Vol. ASSP-29, No. 2, April 1981, pp. 155-162.