Research and Implementation of MP3 Audio Coding Algorithm Based on Fixed Point DSP

Abstract: By simplifying the psychoacoustic model and using a fast algorithm in the subband filter and quantization coding module, the computational complexity is greatly reduced, and real-time compression is realized on a 100MIPS fixed-point DSP.
Keywords: audio coding; masking threshold; psychoacoustic model; analysis subband filter

MP3 is the common name for Audio Layer III of the MPEG-1 international standard. The mono bit rate is typically 64 kbps; at a 44.1 kHz sampling rate the compression ratio can exceed 12:1. It is widely used on the Internet and in many other applications. Because decoding is much simpler than encoding, MP3 players are everywhere, but MP3 encoders implemented on a single fixed-point DSP chip with good sound quality are rarely reported. Since the psychoacoustic model accounts for a large share of the computation in the MP3 encoding algorithm, the author has greatly simplified the model, adopted a fast algorithm to reduce the computation and data storage of the subband filtering, and reduced the number of iterations of the quantization coding loop as far as possible. Real-time MP3 compression is realized on a single Texas Instruments TMS320C549 chip. When the output is played back with standard decoding software, subjective evaluation shows sound quality close to CD for ordinary audio.

1 MP3 encoding algorithm and principle

Figure 1 is the system block diagram of the MP3 encoder. Each channel is processed in frames of 1152 samples. First, the analysis subband filter uses a quadrature mirror filter bank to divide the signal, whose bandwidth is about 20 kHz, into 32 subbands of equal bandwidth. The subband samples are then transformed by the MDCT, which compensates for the shortcomings of the subband filtering, mainly by improving the frequency resolution and cancelling the inter-band aliasing that the subband filtering introduces.

At the same time, the psychoacoustic model analyzes the input samples to determine the masking threshold of each frequency band.

The distortion control loop and the non-uniform quantization rate control loop together form the quantization iteration, which reduces the precision of each MDCT coefficient by quantization and thereby reduces the number of coded bits. Different coefficients use different quantization steps: frequencies to which the human ear is sensitive are quantized with high precision and insensitive frequencies with low precision, so that the quantization error is not perceived by the ear. The quantization step is chosen on the basis of the masking threshold calculated by the psychoacoustic model.

Finally, the quantization step information and the Huffman codes are packed into a bit stream for the decoder.

So why does the masking threshold reflect the auditory characteristics of the human ear?

The auditory characteristics of the human ear involve both physiological acoustics and psychoacoustics. That the ear responds differently to sounds of different frequencies is a physiological matter: it is most sensitive to sounds between 2 kHz and 4 kHz and less sensitive to lower and higher frequencies. This sensitivity is expressed as the static masking threshold (the threshold in quiet), shown by the dashed line in Figure 2, which gives the level at which a sound of each frequency is just audible in quiet surroundings. Masking effects, in contrast, are related to psychological perception. Masking is the phenomenon in which the perception of one sound is affected by another sound; it is divided into temporal masking (forward and backward masking) and frequency masking (simultaneous masking). For example, after a loud sound stops, it takes a while before a weaker sound can be heard; this is temporal masking. Frequency masking is the effect of a sound on sounds at neighboring frequencies at the same moment, shown by the solid lines in Figure 2. The solid line labeled 1 gives the level in decibels at which sounds of other frequencies are just audible when the masker is a 1 kHz tone at 60 dB. The closer a frequency is to the masker, the more strongly it is masked, and low frequencies mask high frequencies more easily than the reverse.

Therefore, the psychoacoustic model first uses an FFT to analyze the frequency components of the signal and, at each frequency, sums the masking contributed by all other frequency components. The curve joining these values is the masking threshold, which is a function of frequency. A frequency component whose energy lies below the curve cannot be perceived by the human ear and can be encoded with zero bits. Likewise, when the quantization step is chosen so that the quantization noise stays below the masking curve, the noise is not perceived, so the higher the masking threshold at a frequency, the larger the quantization step can be. Using the masking threshold as the basis for quantization coding therefore preserves the quality of the compressed sound. Because the sound signal changes over time, the psychoacoustic model is computed twice for each frame and relies on a large amount of experimental data, so its computational load is very heavy.
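The summation of masking contributions can be illustrated with a toy sketch. It is not the model of the standard: the band layout, the 10 dB masking offset, and the two spreading slopes are illustrative assumptions, chosen only so that a masker spreads further toward higher bands than toward lower ones.

```python
import math

def masking_threshold_db(levels_db, quiet_db, offset_db=10.0,
                         slope_down_db=25.0, slope_up_db=10.0):
    """Toy masking curve over band indices (all parameters are illustrative)."""
    n = len(levels_db)
    threshold = []
    for b in range(n):
        # Start from the threshold in quiet, then add every other band's
        # masking contribution in the power domain.
        power = 10.0 ** (quiet_db[b] / 10.0)
        for m in range(n):
            if m == b:
                continue
            dist = b - m
            # Gentler slope above the masker: low frequencies mask high ones more easily.
            drop = slope_up_db * dist if dist > 0 else slope_down_db * (-dist)
            power += 10.0 ** ((levels_db[m] - offset_db - drop) / 10.0)
        threshold.append(10.0 * math.log10(power))
    return threshold

def inaudible_bands(levels_db, threshold_db):
    """Bands whose level lies below the masking curve (candidates for zero bits)."""
    return [b for b, (lv, th) in enumerate(zip(levels_db, threshold_db)) if lv < th]

levels = [0.0, 0.0, 60.0, 35.0, 20.0, 0.0]   # hypothetical band levels in dB
quiet = [5.0] * 6                             # hypothetical threshold in quiet
thr = masking_threshold_db(levels, quiet)
masked = inaudible_bands(levels, thr)
```

In this example the 60 dB band stays audible, while its weaker upper neighbors fall below the curve and could be coded with zero bits.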

2 Algorithm simplification and optimization

2.1 Fast algorithm for the analysis subband filter

The input of the analysis subband filter is 32 sample values, and the output is 32 subband samples, one for each of the 32 equally spaced frequency bands. The filter first shifts the 32 sample values into a first-in, first-out (FIFO) buffer of length 512, then windows the buffer, then sums the 512 windowed values in groups of 8 to obtain 64 intermediate values, and finally converts the 64 intermediate values into 32 subband samples by equation (1):
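The four steps can be sketched as follows. The sketch assumes that equation (1) is the matrixing step of ISO/IEC 11172-3, S[i] = sum over k = 0..63 of cos((2i+1)(k-16)π/64) · Y[k], and that each intermediate value is the sum of 8 windowed values spaced 64 apart, as in that standard. The `window` argument is a stand-in for the 512-entry analysis window table, which is not reproduced here, and the function name is hypothetical.

```python
import math

def analysis_filter_step(fifo, new_samples, window):
    """One pass of the analysis filter: 32 PCM samples in, 32 subband samples out."""
    assert len(fifo) == 512 and len(new_samples) == 32 and len(window) == 512
    # 1. Shift the FIFO by 32 positions; the newest sample ends up at index 0.
    fifo = list(reversed(new_samples)) + fifo[:480]
    # 2. Window the whole buffer.
    z = [c * x for c, x in zip(window, fifo)]
    # 3. Partial sums: 8 values spaced 64 apart give each of 64 intermediate values.
    y = [sum(z[i + 64 * j] for j in range(8)) for i in range(64)]
    # 4. Matrixing: 64 intermediate values -> 32 subband samples (64 x 32 multiplies).
    s = [sum(math.cos((2 * i + 1) * (k - 16) * math.pi / 64) * y[k] for k in range(64))
         for i in range(32)]
    return fifo, s

# Usage with a flat placeholder window (the real table comes from the standard).
flat_window = [1.0] * 512
impulse = [0.0] * 32
impulse[15] = 1.0
state, subbands = analysis_filter_step([0.0] * 512, impulse, flat_window)
```

Step 4 dominates the cost, which is why the fast algorithm concentrates on it.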

The key to a fast algorithm lies in this last step. Write the coefficients as an array:

The array can be found to have the following symmetry:

Therefore, by combining the terms whose coefficients are equal or opposite, equation (1) becomes:

Using (5) in place of (1) halves the number of multiplications. Moreover, (5) closely resembles the standard IDCT, so the fast IDCT algorithm proposed by Lee can be adapted with slight modification to derive a fast algorithm for (5). The 32-point transform is thus broken down into the following two 16-point transforms:

Direct calculation of (1) requires 64 × 32 multiplications and 63 × 32 additions, whereas the fast algorithm requires 16 × 16 × 2 + 16 × 2 multiplications and 15 × 16 × 2 + 16 × 2 + 31 + 15 additions. The amount of computation is about 1/4 of the original, and the storage occupied by the coefficient table is also reduced to about 1/8 of the original.
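A minimal sketch of the two reductions follows, assuming equation (1) is the ISO/IEC 11172-3 matrixing with coefficients cos((2i+1)(k-16)π/64). The folding step merges inputs whose coefficients are equal or opposite, and the split step applies one level of Lee's even/odd decomposition. The function names are hypothetical, and the two 16-point transforms are computed directly rather than decomposed further.

```python
import math

def matrixing_direct(y):
    """Direct 64-to-32 matrixing: 64 x 32 multiplications."""
    return [sum(math.cos((2 * i + 1) * (k - 16) * math.pi / 64) * y[k] for k in range(64))
            for i in range(32)]

def fold(y):
    """Merge inputs with equal or opposite coefficients: 64 values -> 32 values."""
    x = [0.0] * 32
    x[0] = y[16]
    for k in range(1, 17):
        x[k] = y[16 + k] + y[16 - k]   # equal coefficients (cosine is even)
    for k in range(17, 32):
        x[k] = y[16 + k] - y[80 - k]   # opposite coefficients
    return x                           # y[48] drops out: its coefficient is always 0

def cos_transform(x):
    """s[n] = sum_k x[k] cos((2n+1) k pi / (2N)), computed directly (N x N multiplies)."""
    size = len(x)
    return [sum(x[k] * math.cos((2 * n + 1) * k * math.pi / (2 * size)) for k in range(size))
            for n in range(size)]

def matrixing_fast(y):
    """Folding plus one level of Lee's split into two 16-point transforms."""
    x = fold(y)
    g_in = x[0::2]                                                    # even-indexed inputs
    h_in = [x[1]] + [x[2 * k + 1] + x[2 * k - 1] for k in range(1, 16)]
    g = cos_transform(g_in)
    h = cos_transform(h_in)
    s = [0.0] * 32
    for n in range(16):
        t = h[n] / (2.0 * math.cos((2 * n + 1) * math.pi / 64))
        s[n] = g[n] + t          # first half of the outputs
        s[31 - n] = g[n] - t     # mirrored second half
    return s

y_test = [math.sin(0.37 * k) + 0.01 * k for k in range(64)]
max_err = max(abs(a - b) for a, b in zip(matrixing_direct(y_test), matrixing_fast(y_test)))
```

In a fixed-point implementation the factors 1/(2cos(·)) would be precomputed; their magnitude grows as the cosine approaches zero, which limits how far the decomposition can be carried without losing precision.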

2.2 Simplification of psychoacoustic model

Experimental observation shows that the masking threshold curves of different frames are almost the same, so a static psychoacoustic model is considered. Specifically, the masking threshold curve is first calculated with the psychoacoustic model for one representative audio frame. When other audio sources are compressed, the psychoacoustic model is no longer computed for each frame; instead, every frame is assumed to have the same masking characteristics as the representative frame. Although this is not very accurate, the error is usually small and not easily perceived by the human ear, and the heavy computation and storage required by the psychoacoustic model are avoided. Practice shows that the coding quality is satisfactory. For applications with modest requirements, the masking threshold can even be treated as a constant function of frequency, with the same quantization step used for every frequency band, without obvious degradation of the sound quality.

2.3 Simplification of the quantization coding iterative loop

The quantization coding iteration is a two-loop process; Figure 3 is the flow chart of the outer iteration loop. The purpose of the iteration is to determine, within the available number of bits and according to the masking threshold of each frequency band, the global gain (which sets the global quantization step) and the scale factor of each frequency band (which sets the local quantization step). The inner loop gradually increases the quantizer step size, that is, the global gain, until the quantized MDCT coefficients can be Huffman coded within the available bits; in other words, it enlarges the global quantization step to reduce the number of coded bits. The outer loop checks the distortion of each scale factor band against the masking threshold and, if the distortion exceeds the allowed value, amplifies the MDCT coefficients of that band, that is, increases the band's scale factor to reduce the local distortion. The result of the last iteration is used for the final Huffman coding. Every pass quantizes with the current step and performs Huffman coding once, so the amount of computation is considerable. The outer loop shows that the masking threshold ultimately determines the scale factors. To eliminate the outer loop, the scale factors of the representative frame are stored in a table and applied to every frame.
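The resulting structure, with the outer loop replaced by a table and only the inner loop left, might look like the sketch below. The power-law quantizer follows the form given in ISO/IEC 11172-3; everything else is an assumption made for illustration: `estimate_bits` is a crude stand-in for the real Huffman tables, the band edges and scale factor values are hypothetical, and amplification by 2^(sf/2) is one possible scaling convention.

```python
def quantize(xr, global_gain):
    """Power-law quantizer; the step size grows by 2**(1/4) per gain unit."""
    step = 2.0 ** (global_gain / 4.0)
    return [int((abs(v) / step) ** 0.75 - 0.0946 + 0.5) for v in xr]

def estimate_bits(ix):
    """Stand-in bit cost (NOT the real Huffman tables): larger values cost more."""
    return sum(1 if q == 0 else 2 * q.bit_length() + 1 for q in ix)

def encode_granule(xr, band_edges, static_scalefac, available_bits):
    """Apply table scale factors (no outer loop), then run the inner loop."""
    if available_bits < len(xr):
        raise ValueError("budget too small for this stand-in bit cost")
    scaled = list(xr)
    for band, sf in enumerate(static_scalefac):
        for i in range(band_edges[band], band_edges[band + 1]):
            scaled[i] = xr[i] * 2.0 ** (0.5 * sf)   # amplify the band by its scale factor
    gain = 0
    while True:                                     # inner loop: one step at a time
        ix = quantize(scaled, gain)
        if estimate_bits(ix) <= available_bits:
            return gain, ix
        gain += 1

spectrum = [100.0, 50.0, 10.0, 1.0, 0.5, 0.1, 0.0, 0.0]   # hypothetical MDCT values
gain, ix = encode_granule(spectrum, [0, 4, 8], [0, 2], available_bits=30)
```

Each pass of the `while` loop costs one full quantization and one bit count, which is the cost the text describes for every iteration.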

Since the above three modules are the most important and most computationally intensive modules, the size and computation of the program can be greatly reduced by simplifying and optimizing them.

3 Implementing MP3 compression algorithm with fixed-point DSP

To encode MP3 in real time, a high-speed DSP chip is required. The TMS320C549, a mainstream fixed-point DSP chip from Texas Instruments (TI), is used; its computing speed is 100 MIPS. The development and debugging environment is an EVM evaluation board from Spectrum Digital, a TI third party. In addition to the 32K words of on-chip memory of the TMS320C549, the board provides 128K words of off-chip memory. Digital-to-analog conversion uses TI's TLC320AD55, and the PC loads and debugs data and programs through the JTAG port.

Because the interface between the evaluation board and the host is slow, the bit stream could not be transferred to the PC fast enough even if real-time compression were achieved. The author therefore proceeds as follows: a block of the original PCM audio data is loaded from a file on the PC's hard disk into the off-chip memory on the board and compressed, and the compressed data is transferred back to the PC and saved to a file; the next block is then loaded, compressed, and saved, until the entire audio file has been compressed. Finally, a C program assembles the data blocks into an MP3 file, which is played back with a software decoder. Whether the real-time requirement is met can therefore only be judged by measuring the number of instructions executed per frame.
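The per-frame instruction budget behind this test can be worked out from figures given in the article (1152 samples per frame, 44.1 kHz sampling, 100 MIPS). The sketch assumes a single channel; the helper name is hypothetical.

```python
SAMPLES_PER_FRAME = 1152
SAMPLE_RATE_HZ = 44_100
DSP_MIPS = 100

# Duration of one frame of audio: about 26.1 ms.
frame_seconds = SAMPLES_PER_FRAME / SAMPLE_RATE_HZ

# Instructions the DSP can execute in that time: about 2.6 million.
budget = frame_seconds * DSP_MIPS * 1e6

def meets_real_time(instructions_per_frame):
    """True if one frame is encoded before the next frame of samples has arrived."""
    return instructions_per_frame <= budget
```

An encoder that needs about 75 MIPS corresponds to roughly 1.96 million instructions per frame, which fits inside this budget.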

When the fast algorithm is used to compute the subband analysis filter, the characteristics of the DSP chip must be taken into account. Every further level of decomposition requires additions such as (10), which inevitably reduce accuracy, and the dynamic range of the coefficients in (11) and (12) is so large that accuracy also suffers. The transform is therefore decomposed only as far as the 16-point DCTs.

With a static psychoacoustic model, the psychoacoustic model and the outer quantization loop require no computation at run time. The masking threshold and scale factors of the representative frame are calculated offline in C or MATLAB, or the scale factor information in MP3 files downloaded from the Internet is decoded and reused. All MDCTs after the subband analysis filter use long blocks. Table 1 gives the settings of the static scale factor bit widths and scale factors.

In addition, the inner loop must first choose a global gain that keeps the largest quantized value below the largest value the Huffman code tables can encode. The standard recommends starting from a small global gain, comparing the largest quantized value after each quantization, and adjusting the global gain until the requirement is met. This program eliminates that loop: the global gain is computed in advance from the maximum spectral line value and stored in a data table, so the program only needs a table lookup keyed by the maximum spectral line value. Once the initial global gain is determined, the program partitions, quantizes, encodes, and counts the coded bits. If the bit count is too large or too small, the global gain is adjusted. This iteration uses a binary (halving) search: in the first pass the global gain is moved by half of the initial value; depending on whether the number of coded bits exceeds the budget, the next pass moves it up or down by half of the previous adjustment, and so on until the adjustment can no longer be halved. Binary search is much faster than stepping through the values one by one.
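A minimal sketch of such a binary search is given below, written as a conventional bisection over a gain range rather than as the author's exact update rule. The quantizer has the power-law form of ISO/IEC 11172-3; `estimate_bits` is a crude stand-in for the Huffman tables, and the range 0 to 255 and the test spectrum are assumptions.

```python
def quantize(xr, global_gain):
    """Power-law quantizer; the step size grows by 2**(1/4) per gain unit."""
    step = 2.0 ** (global_gain / 4.0)
    return [int((abs(v) / step) ** 0.75 - 0.0946 + 0.5) for v in xr]

def estimate_bits(ix):
    """Stand-in bit cost (NOT the real Huffman tables): larger values cost more."""
    return sum(1 if q == 0 else 2 * q.bit_length() + 1 for q in ix)

def search_global_gain(xr, available_bits, lo=0, hi=255):
    """Smallest gain in [lo, hi] whose bit estimate fits the budget.

    Relies on the bit estimate never increasing as the gain increases.
    If even `hi` does not fit, `hi` is returned and the caller must check.
    """
    passes = 0
    while lo < hi:
        mid = (lo + hi) // 2
        passes += 1
        if estimate_bits(quantize(xr, mid)) > available_bits:
            lo = mid + 1      # too many bits: a coarser step is needed
        else:
            hi = mid          # fits: try a finer step
    return lo, passes

spectrum = [100.0, 50.0, 10.0, 1.0, 1.0, 0.2, 0.0, 0.0]   # hypothetical MDCT values
best_gain, passes = search_global_gain(spectrum, available_bits=30)
```

Over a range of 256 gain values the search needs 8 quantize-and-count passes, whereas a one-by-one search may need as many passes as there are gain steps between the starting value and the answer.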

With these simplifications, optimizations, and programming techniques, the whole encoder needs only about 75 MIPS and occupies about 27K words of on-chip memory. When the output is decoded with standard MP3 playback software, subjective evaluation shows sound quality close to CD.

Because the system greatly simplifies the psychoacoustic model, some sound quality is lost, but for ordinary music the loss is not noticeable, especially in applications with modest requirements. For audio signals that are harder to encode, such as castanets, however, the sound quality drops noticeably. If a faster DSP is used, a complete or simplified dynamic psychoacoustic model can be added to the encoder to improve coding quality further. The simplified dynamic psychoacoustic model remains a topic for further study.

References

1 Draft International Standard ISO/IEC CD 11172-3, 1992

2 Wang Jianhong, Wu Haihua, Chen Jian. Fast algorithm for the subband synthesis filter in MPEG audio decoding and its fixed-point DSP implementation. Journal of Shanghai Jiaotong University, 2000; 34(6): 761-764

3 Chen Jian, Li Lili, Chen Yajun. MUSICAM algorithm simulation and DSP implementation. Journal of Shanghai Jiaotong University, 1997; 31(1): 74-78

4 Wu Haihua, Wang Jianhong, Chen Jian. MP3 decoding with low-cost DSP. Electroacoustic Technology, 1999; 10: 11-14

5 Byeong Gi Lee. FCT - A Fast Cosine Transform. ICASSP, San Diego, California, USA, 1984; 10(2)
