Design Algorithm of FIR Filter Based on Coefficient Compression
1. Introduction
The application of Finite Impulse Response (FIR) filters in various digital systems has long been a challenging problem [1] [2]. The Parks-McClellan algorithm and other equiripple design algorithms based on Remez exchange theory have made it possible to design high-order FIR filters efficiently by computer [2]-[5]. In practical implementations of FIR filters [6] [7], however, several problems are encountered [8] [9]. The most common FIR filter structure, which uses pre-stored coefficients, is shown in Figure 1 [1] [2].
Many efforts have been made to design and improve high-performance FIR filters: decomposing the transfer function and implementing it with multi-stage filters; addressing the problem of calculating coefficients with floating-point filters; using a sharpening filter to obtain a better passband and stopband response; using a low-sensitivity filter to reduce the impact of bit width on the passband; and implementing the filter with a completely multiplier-free approach.
Figure 1. Structure of direct FIR filter.
2. Methods and Filter Expression
Because of the linear-phase property of FIR filters, their coefficients are symmetric about the center. This symmetry can be exploited with a folded filter structure, in which the corresponding taps of the transversal delay line are added first and then multiplied by the shared coefficient. This halves the number of multiplications, as shown in Figure 2.
Figure 2. Folding structure of FIR filter.
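The folding idea can be illustrated with a minimal sketch. The function names `fir_direct` and `fir_folded` and the sample data are hypothetical, chosen only for illustration; the sketch assumes symmetric coefficients and zero initial conditions, and it is not the hardware structure of Figure 2 itself.

```python
def fir_direct(h, x):
    """Direct-form FIR: one multiplication per tap per output sample."""
    taps = len(h)
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k in range(taps):
            if n - k >= 0:
                acc += h[k] * x[n - k]
        y.append(acc)
    return y


def fir_folded(h, x):
    """Folded FIR for symmetric h: add mirrored delay taps, then multiply."""
    taps = len(h)
    half = taps // 2

    def sample(i):
        # Samples before the start of the input are taken as zero.
        return x[i] if 0 <= i < len(x) else 0.0

    y = []
    for n in range(len(x)):
        acc = 0.0
        for k in range(half):
            # Pre-add the two delay-line samples that share coefficient h[k].
            acc += h[k] * (sample(n - k) + sample(n - (taps - 1 - k)))
        if taps % 2 == 1:
            # Odd tap count: the centre tap has no mirror partner.
            acc += h[half] * sample(n - half)
        y.append(acc)
    return y


h = [0.1, 0.2, 0.4, 0.2, 0.1]              # hypothetical symmetric coefficients
x = [1.0, -2.0, 3.0, 0.5, 0.0, 4.0, -1.0]  # hypothetical input samples
y_direct = fir_direct(h, x)
y_folded = fir_folded(h, x)
```

With five taps, the folded version performs three multiplications per output sample instead of five, while producing the same output as the direct form.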
For the traditional quantization method, the folded structure can be expressed as:
(1)
is an odd number.
(2)
is an even number.
where
is the order of the filter,
is the ideal design coefficient value, the round[] operation takes the nearest integer, and BW is the quantization bit width of the design coefficients. When non-equal-width quantization is used, the output is represented as:
(3)
(4)
where
is the equivalent quantization bit width of each coefficient. For
:
(5)
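As a point of reference, traditional rounding with a single bit width BW, and the notion of a per-coefficient equivalent bit width, can be sketched as follows. The scale factor 2^(BW-1) for a signed BW-bit word, the helper names `quantize_uniform` and `dequantize`, and the coefficient values are assumptions made for this illustration, not taken from the equations above.

```python
import math


def quantize_uniform(h, bw):
    """Round each coefficient to a signed bw-bit integer (assumed scale 2**(bw-1))."""
    scale = 1 << (bw - 1)
    lo, hi = -scale, scale - 1
    # floor(x + 0.5) rounds to the nearest integer; the result is clipped
    # to the two's complement range of a bw-bit word.
    return [max(lo, min(hi, math.floor(c * scale + 0.5))) for c in h]


def dequantize(q, widths):
    """Map stored integers back to real values.

    widths[i] is the equivalent bit width of q[i]; for traditional
    quantization every entry equals BW.
    """
    return [qi / (1 << (w - 1)) for qi, w in zip(q, widths)]


h = [0.011, -0.052, 0.30, 0.61, 0.30, -0.052, 0.011]  # hypothetical coefficients
bw = 8
q = quantize_uniform(h, bw)
h_q = dequantize(q, [bw] * len(h))
max_err = max(abs(a - b) for a, b in zip(h, h_q))
```

Under these assumptions the rounding error of every coefficient is at most half a least significant bit, 2^(-BW), regardless of how small the coefficient is. This is why small outer coefficients benefit from a larger equivalent bit width.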
The accumulation method is shown in Figure 3. The serial implementation structure is shown in Figure 4.
Figure 3. Accumulator modification structure.
Figure 4. Serial implementation structure.
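One way to accumulate products whose coefficients have different equivalent bit widths is to re-align the accumulator by a left shift whenever the bit width grows. The sketch below is only an assumption about how such an accumulator could behave, not a description of Figure 3; the function name `accumulate_aligned` and all numeric values are hypothetical.

```python
def accumulate_aligned(products, widths):
    """Sum integer products whose LSB weights are 2**-(widths[i] - 1).

    widths must be non-decreasing (centre coefficient first). Returns the
    integer sum and the bit width that defines its LSB weight.
    """
    acc = 0
    cur = widths[0]
    for p, w in zip(products, widths):
        if w < cur:
            raise ValueError("widths must be non-decreasing")
        acc <<= (w - cur)  # re-align the accumulator to the finer LSB
        cur = w
        acc += p
    return acc, cur


q = [78, 77, -27, 23]    # hypothetical stored coefficient integers
widths = [8, 9, 10, 11]  # hypothetical equivalent bit widths
s = [3, -1, 2, 5]        # hypothetical pre-added integer input samples
products = [qi * si for qi, si in zip(q, s)]
acc, w = accumulate_aligned(products, widths)
y = acc / (1 << (w - 1))
```

Because only left shifts are applied, the integer sum is exact; a hardware accumulator would instead have to bound its word length, which this sketch does not model.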
In this algorithm, the coefficients of an FIR filter with coefficient compression have different equivalent quantization bit widths, so both the quantization method and the way the methods are compared differ from the traditional approach. First, all coefficients are shifted by a power of two:
(6)
This places the coefficient with the maximum absolute value within the normalized range of 0.5 to 1, i.e.:
(7)
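The normalization step can be sketched with the standard library function `math.frexp`, which decomposes a number into a mantissa in [0.5, 1) and a power-of-two exponent. The function name `normalize_coefficients` and the coefficient values are hypothetical.

```python
import math


def normalize_coefficients(h):
    """Scale h by a power of two so that 0.5 <= max|h| < 1."""
    peak = max(abs(c) for c in h)
    if peak == 0.0:
        raise ValueError("all-zero coefficient set")
    # frexp returns (m, e) with peak = m * 2**e and 0.5 <= m < 1.
    _, e = math.frexp(peak)
    shift = -e  # number of binary places to shift left (right if negative)
    return [math.ldexp(c, shift) for c in h], shift


h = [0.004, -0.02, 0.09, 0.18, 0.09, -0.02, 0.004]  # hypothetical coefficients
h_norm, shift = normalize_coefficients(h)
```

For this example the largest coefficient, 0.18, is shifted left by two bits to 0.72. The shift is a pure power of two, so it can be undone exactly at the filter output.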
Then the middle coefficient, which has the largest absolute value, is quantized with the same fixed-point quantization bit width as in the conventional method. The initial quantization bit width is set to BW, the actual bit width of the final memory, and the ideal value of the middle coefficient is
. The quantization operation that shifts
by BW bit width is defined as
and the quantized coefficient value is represented as
in two's complement. Therefore, the two's complement quantization range
is:
(8)
Its equivalent quantization bit width
:
(9)
Its quantization operation is as follows:
(10)
(11)
The quantization range Temp_Scale is:
(12)
The quantization bit width Temp_BW is:
(13)
The cumulative left shift
is:
(14)
Whether the quantization range Temp_Scale can be halved is determined by checking whether the following condition holds:
(15)
(16)
(17)
The coefficient is then quantized, and its equivalent quantization bit width is:
(18)
(19)
When all unquantized coefficients satisfy the constraint, we have:
(20)
(21)
Halving operation:
(22)
We can obtain:
(23)
(24)
3. Example of Filter Design
The coefficient quantization flow chart for the serial structure is shown in Figure 5.
The serial quantization process is listed in Table 1:
Figure 5. Serial structure coefficient quantization flow chart.
Table 1. Serial quantization process.
Coefficient to be quantized | All unquantized coefficients < 0.5 Temp_Scale? | Temp_BW after update | Temp_Scale after update | Equivalent quantization bit width | Shift flag (Flag) | Left shift and quantization
 | No | 8 | 1 | 8 | 0 | 
 | Yes | 9 | 0.5 | 9 | 1 | 
 | No | 9 | 0.5 | 9 | 0 | 
 | Yes | 10 | 0.25 | 10 | 1 | 
 | Yes | 11 | 0.125 | 11 | 1 | 
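The process in Table 1 can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: coefficients are assumed to be normalized and ordered from the center outwards, the range is halved at most once per coefficient (as in the table rows), and a BW-bit two's complement word is assumed to span the current range Temp_Scale. The function name `serial_quantize` and the coefficient values are hypothetical.

```python
import math


def serial_quantize(h_half, bw):
    """Quantize coefficients ordered from the centre outwards.

    h_half: normalized ideal coefficients, centre (largest) first.
    bw:     stored word length in bits (two's complement).
    Returns (stored integers, equivalent bit widths, shift flags).
    """
    temp_bw, temp_scale = bw, 1.0
    full = 1 << (bw - 1)
    q, widths, flags = [], [], []
    for i, c in enumerate(h_half):
        # Halve the range only if every coefficient still to be quantized fits.
        fits = all(abs(r) < 0.5 * temp_scale for r in h_half[i:])
        if fits:
            temp_bw += 1
            temp_scale *= 0.5
        # Round to the nearest integer within the current range, then clip.
        code = math.floor(c / temp_scale * full + 0.5)
        q.append(max(-full, min(full - 1, code)))
        widths.append(temp_bw)
        flags.append(1 if fits else 0)
    return q, widths, flags


h_half = [0.61, 0.30, 0.26, 0.10, 0.03]  # hypothetical normalized coefficients
q, widths, flags = serial_quantize(h_half, 8)
h_q = [qi / (1 << (w - 1)) for qi, w in zip(q, widths)]
```

For these hypothetical coefficients the sketch yields the Temp_BW sequence 8, 9, 9, 10, 11 and the flags 0, 1, 0, 1, 1, the same pattern as the rows of Table 1, while every stored integer still fits in 8 bits.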
As shown in Figure 6 and Figure 7, the new algorithm, which uses serial quantization, achieves better coefficient accuracy than the traditional fixed-point quantization method. Specifically, it has greater stopband attenuation, a narrower transition band, and smaller passband ripple. The minimum stopband attenuation is 48 dB for the traditional quantization method and 61.3 dB for the improved algorithm, an improvement of 13.3 dB. A comparison of the minimum stopband attenuation of the spectral response follows.
As shown in Figure 8, the new algorithm gives the coefficients on both sides of the center longer quantization word lengths, and its accuracy is significantly higher than that of the traditional algorithm at lower memory bit widths.
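The minimum stopband attenuation used in such comparisons can be estimated directly from a set of coefficients. The sketch below assumes a lowpass filter, measures attenuation relative to the DC gain, and takes the stopband edge `ws` (in radians per sample) as a hypothetical parameter; the function name and the three-tap example are illustrative only and are unrelated to the filter evaluated in this paper.

```python
import cmath
import math


def min_stopband_attenuation_db(h, ws, points=512):
    """Smallest attenuation (dB, relative to DC gain) over the band [ws, pi]."""

    def mag(w):
        # Magnitude of the frequency response at angular frequency w.
        return abs(sum(c * cmath.exp(-1j * w * k) for k, c in enumerate(h)))

    dc = mag(0.0)
    # Largest stopband magnitude on a uniform grid that includes both edges.
    worst = max(mag(ws + (math.pi - ws) * i / (points - 1)) for i in range(points))
    if worst == 0.0:
        return math.inf
    return -20.0 * math.log10(worst / dc)


h = [0.25, 0.5, 0.25]  # hypothetical three-tap lowpass filter
att = min_stopband_attenuation_db(h, math.pi / 2)
```

Applying the same function to the traditionally quantized and the serially quantized coefficient sets gives figures of merit that can be compared like the 48 dB and 61.3 dB values reported above.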
Figure 6. Comparison of amplitude frequency response of serial quantization coefficients.
Figure 7. Frequency response pass band details.
Table 2. Resource comparison of equal bit width serial implementation.
DSP implementation | ALUT | Memory | DSP | Memory (bit)
Traditional serial structure | 118 | 240 | 1 | 126
Serial structure of this article | 131 | 261 | 1 | 126
Resource increment | 11.0% | 8.8% | 0.0% | 0.0%
As shown in Table 2, the implementation of the new algorithm includes a two-way selector (multiplexer) for the shift, which accounts for the increased consumption of ALUT resources.
As Table 3 shows, the increase in quantization bit width raises ALUT and memory resources by about 15%. The original 9 × 9 multiplier could be implemented with a single DSP, but the wider multiplier requires an 18 × 18 multiplier, which is effectively equivalent to two DSPs. The multiplier resources therefore increase by 100%.
Figure 8. Comparison of stop band attenuation under different bit widths.
Table 3. Resource consumption of serial implementation under equal performance.
DSP implementation | ALUT | Memory | DSP | Memory (bit)
Traditional serial structure | 562 | 1105 | 1 | 594
Serial structure of this article | 611 | 1209 | 1 | 594
Resource increment | 8.7% | 9.4% | 0.0% | 0.0%
4. Conclusion
Because serial quantization is used, the accuracy of the quantized coefficient values is significantly improved over the traditional fixed-point quantization method, without increasing the bit width of the multiplier or the coefficient memory and without increasing the filter order. As future work, parallel structures can be used to further improve performance, or a serial-parallel hybrid approach can be adopted to further enhance the accuracy of the quantized coefficient values.