H.264 video encoder design based on the ADSP-BF561 processor


0 Introduction
H.264/AVC is the latest international video coding standard, developed jointly by ITU-T VCEG and ISO/IEC MPEG, and it is an active topic in image communication research. The H.264 Video Coding Layer (VCL) adopts a number of new techniques that significantly improve coding performance. Compared with earlier video coding standards, H.264 delivers better image quality at the same bit rate, which has led to its wide use in low-bit-rate applications such as wireless communication and network transmission. This gain comes at the cost of higher computational complexity, so real-time H.264 encoding and transmission remain very challenging. Implementing a real-time H.264 encoder on a high-performance digital signal processor (DSP) is a fast and efficient approach that helps promote adoption of the standard. The ADSP-BF561 processor offers a complete system-level, on-chip solution for multimedia and imaging applications, with high performance at 600 MHz and integrated peripheral interfaces for digital image processing. Driven by the needs of low-bit-rate video transmission, this paper studies and implements an H.264 video coding system and discusses how the software encoder is implemented and optimized on the DSP.



1 Introduction to the H.264 encoding algorithm and the ADSP-BF561
During development, considerable optimization work was done to match the algorithmic features of H.264 and the architecture of the ADSP-BF561 dual-core processor, greatly improving encoding speed while preserving coding accuracy. This section briefly introduces the H.264 video encoding algorithm and the ADSP-BF561 dual-core processor system.
1.1 H.264 encoding algorithm
H.264 is a new-generation video coding standard developed jointly by ISO and ITU. It offers a high compression ratio and good robustness. Its overall framework is shown in Figure 1.

Building on earlier video coding standards, H.264 improves on them in many respects. For intra coding, it introduces spatial intra prediction with nine modes for 4×4 blocks and four modes for 16×16 macroblocks; used together with transform coding, intra prediction removes spatial redundancy and greatly improves coding efficiency. For inter coding, H.264 supports motion estimation and compensation with variable block sizes: the block size is not fixed at 8×8 but ranges from 16×16 down to 4×4, including rectangular shapes, for seven sizes in total (16×16, 16×8, 8×16, 8×8, 8×4, 4×8 and 4×4). It also supports multiple reference frames, which can greatly improve prediction performance. In addition, H.264 uses a 4×4 integer transform that approximates the DCT to reduce computation, adaptive arithmetic coding (CABAC) to improve coding efficiency, and a deblocking filter to remove the blocking artifacts caused by coarse quantization at low bit rates. Overall, H.264 is reported to be about 50% more efficient than earlier coding techniques.
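To make the integer transform concrete, the sketch below implements the forward 4×4 core transform defined by H.264, out = Cf * in * Cf^T, using only additions, subtractions and doublings. It is a minimal illustration written for this article, not the paper's encoder code, and the function name `forward_transform4x4` is hypothetical. The post-scaling that completes the DCT approximation is folded into quantization in H.264 and is not performed here.

```c
/* Forward 4x4 core transform of H.264: out = Cf * in * Cf^T, with
 *   Cf = | 1  1  1  1 |
 *        | 2  1 -1 -2 |
 *        | 1 -1 -1  1 |
 *        | 1 -2  2 -1 |
 * Only adds, subtracts and doublings are needed. The scaling that turns
 * this into a true DCT approximation belongs to quantization and is
 * not performed here. The input array is read but not modified. */
static void forward_transform4x4(int in[4][4], int out[4][4])
{
    int tmp[4][4];

    /* Horizontal pass: tmp = in * Cf^T (transform each row). */
    for (int i = 0; i < 4; i++) {
        int s03 = in[i][0] + in[i][3];
        int d03 = in[i][0] - in[i][3];
        int s12 = in[i][1] + in[i][2];
        int d12 = in[i][1] - in[i][2];
        tmp[i][0] = s03 + s12;
        tmp[i][1] = 2 * d03 + d12;
        tmp[i][2] = s03 - s12;
        tmp[i][3] = d03 - 2 * d12;
    }

    /* Vertical pass: out = Cf * tmp (transform each column). */
    for (int j = 0; j < 4; j++) {
        int s03 = tmp[0][j] + tmp[3][j];
        int d03 = tmp[0][j] - tmp[3][j];
        int s12 = tmp[1][j] + tmp[2][j];
        int d12 = tmp[1][j] - tmp[2][j];
        out[0][j] = s03 + s12;
        out[1][j] = 2 * d03 + d12;
        out[2][j] = s03 - s12;
        out[3][j] = d03 - 2 * d12;
    }
}
```

A quick sanity check: for a block of all ones, only the DC coefficient out[0][0] is nonzero, and it equals 16.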
1.2 ADSP-BF561 chip structure
The ADSP-BF561 is a dual-core 750 MHz processor with a symmetric multiprocessing (SMP) architecture, which gives designers higher performance and more flexibility in integrating and partitioning signal-processing and control functions. Its system structure is shown in Figure 2. It consists of two cores, Core A and Core B, each running at 750 MHz. Each core has its own 32 KB of L1 instruction memory (16 KB configurable as cache or SRAM) and 64 KB of L1 data memory (32 KB configurable as cache or SRAM), and the two cores share 128 KB of L2 memory. Access speed differs considerably across this memory hierarchy: L1 is the fastest, L2 is next, and off-chip memory and devices are the slowest.
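The arithmetic below illustrates why the relatively small L1 data memory can still hold the data for one macroblock when it is staged carefully. It is a sketch under stated assumptions: the 32 KB budget (the L1 data memory that stays SRAM when the configurable 32 KB is used as cache), the 4:2:0 macroblock layout, and a square luma search window of ±range pixels are illustrative choices, and all names are hypothetical.

```c
#include <stddef.h>

/* Assumed per-core budget: the 32 KB of L1 data memory that remains SRAM
 * when the configurable 32 KB bank is used as cache (illustrative). */
#define L1_DATA_SRAM_BYTES (32u * 1024u)

/* Bytes in one 4:2:0 macroblock: 16x16 luma plus two 8x8 chroma blocks. */
static size_t mb_bytes_420(void)
{
    return 16 * 16 + 2 * 8 * 8;
}

/* Luma bytes of a square search window extending +/-range pixels
 * around a 16x16 block. */
static size_t search_window_bytes(size_t range)
{
    size_t side = 16 + 2 * range;
    return side * side;
}

/* Hypothetical working set for one macroblock: the source macroblock,
 * its reconstruction, and one luma search window in the reference frame. */
static size_t working_set_bytes(size_t range)
{
    return 2 * mb_bytes_420() + search_window_bytes(range);
}

/* Nonzero if the given number of bytes fits in the assumed L1 budget. */
static int fits_in_l1(size_t bytes)
{
    return bytes <= L1_DATA_SRAM_BYTES;
}
```

With a ±16-pixel window the working set is 3072 bytes, well inside the assumed budget, whereas a ±96-pixel window (43,264 bytes of luma alone) would not fit.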
Because of these differences in access speed, data exchange between the two cores is best done directly between their L1 memories, which requires the IMDMA controller. The main function of this DMA controller is to transfer data between the L1 memories of the two cores. When off-chip memory access is slow, or when data would otherwise be processed from L2, the IMDMA controller increases the data-processing rate and thereby improves encoding efficiency.
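The following sketch shows the ping-pong (double-buffering) pattern that makes DMA staging pay off: while one L1 buffer is being processed, the next macroblock is transferred into the other. The article does not describe the IMDMA programming interface, so `dma_start_copy` and `dma_wait` are hypothetical stand-ins implemented with `memcpy`, and `encode_mb` is a placeholder that merely sums bytes. On real hardware the copy would run asynchronously and overlap with encoding.

```c
#include <stdint.h>
#include <string.h>

#define MB_BYTES 384  /* one 4:2:0 macroblock (16x16 luma + two 8x8 chroma) */

/* Hypothetical stand-in for starting a DMA transfer. On the target this
 * would program a descriptor and return at once; here it copies synchronously. */
static void dma_start_copy(uint8_t *dst, const uint8_t *src, size_t n)
{
    memcpy(dst, src, n);
}

/* Hypothetical wait for the outstanding transfer; a no-op for the
 * synchronous stand-in above. */
static void dma_wait(void) {}

/* Placeholder for the per-macroblock encoding work: sums the bytes. */
static uint32_t encode_mb(const uint8_t *mb)
{
    uint32_t sum = 0;
    for (int i = 0; i < MB_BYTES; i++)
        sum += mb[i];
    return sum;
}

/* Ping-pong buffering: while macroblock k is processed from one buffer,
 * macroblock k+1 is transferred into the other. */
static uint32_t encode_all(const uint8_t *src, int num_mbs)
{
    static uint8_t buf[2][MB_BYTES];  /* on the target: placed in L1 data SRAM */
    uint32_t total = 0;
    int cur = 0;

    if (num_mbs <= 0)
        return 0;
    dma_start_copy(buf[cur], src, MB_BYTES);          /* prime the first buffer */
    for (int k = 0; k < num_mbs; k++) {
        dma_wait();                                   /* buf[cur] now holds MB k */
        if (k + 1 < num_mbs)                          /* prefetch MB k+1 */
            dma_start_copy(buf[1 - cur], src + (size_t)(k + 1) * MB_BYTES, MB_BYTES);
        total += encode_mb(buf[cur]);
        cur = 1 - cur;                                /* swap buffers */
    }
    return total;
}
```

The same loop structure applies whether the source is off-chip memory, L2, or the other core's L1.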


2 Optimization and implementation of the H.264 video coding algorithm
Encoder optimization focuses on two areas: the P-frame encoding flow and the ADSP-BF561 dual-core processing system. A well-organized flow keeps the individual modules independent and complete, so that any single module can later be optimized or upgraded on its own. Coordinated processing on the two ADSP-BF561 cores increases encoding speed further.
2.1 Optimization of the P-frame encoding flow
Because the H.264 encoding algorithm is computationally very heavy, optimizing program details alone cannot bring significant efficiency gains; the program flow itself must be restructured. In the JM86 version of the H.264 reference encoder, I frames and P frames are encoded by the same module, so a large number of redundant intra/inter macroblock checks are performed, which limits encoding speed. The micro_h264 encoding software model addresses this shortcoming by separating I-frame and P-frame encoding. However, micro_h264 encodes the macroblocks of a frame in raster-scan order with a single, uniform procedure, ignoring the fact that macroblocks at different positions in the frame have different characteristics. This again introduces many conditional checks, which interrupt the DSP pipeline and make module-level optimization harder. This paper optimizes the P-frame encoding flow of micro_h264 to remove this shortcoming.
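Separating the I-frame and P-frame paths amounts to hoisting the frame-type test out of the macroblock loop. The sketch contrasts the two styles; `encode_i_mb` and `encode_p_mb` are hypothetical placeholders that only count calls, and in a real encoder `encode_p_mb` would still decide between inter and intra coding for each P-frame macroblock.

```c
typedef enum { FRAME_I, FRAME_P } frame_type_t;

/* Hypothetical per-macroblock coders; here they only count calls.
 * A real encode_p_mb would still choose inter or intra per macroblock. */
static int i_calls, p_calls;
static void encode_i_mb(int mb) { (void)mb; i_calls++; }
static void encode_p_mb(int mb) { (void)mb; p_calls++; }

/* Unified style: the frame type is re-tested for every macroblock. */
static void encode_frame_unified(frame_type_t type, int num_mbs)
{
    for (int mb = 0; mb < num_mbs; mb++) {
        if (type == FRAME_I)
            encode_i_mb(mb);
        else
            encode_p_mb(mb);
    }
}

/* Split style: dedicated loops with no per-macroblock frame-type test. */
static void encode_i_frame(int num_mbs)
{
    for (int mb = 0; mb < num_mbs; mb++)
        encode_i_mb(mb);
}

static void encode_p_frame(int num_mbs)
{
    for (int mb = 0; mb < num_mbs; mb++)
        encode_p_mb(mb);
}

/* The frame type is tested once per frame instead of once per macroblock. */
static void encode_frame_split(frame_type_t type, int num_mbs)
{
    if (type == FRAME_I)
        encode_i_frame(num_mbs);
    else
        encode_p_frame(num_mbs);
}
```

Both styles produce the same calls; optimizing compilers can sometimes perform this loop unswitching automatically, but doing it explicitly also lets each path be tuned on its own.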
Macroblocks can be encoded independently according to their position in the frame; likewise, the sub-blocks within a macroblock can be encoded independently according to their position in the macroblock.
When a frame is divided into macroblocks, macroblocks at different positions have different characteristics. Macroblocks can therefore be classified by their position in the frame, with those that share the same encoding characteristics grouped into one class; in this way, the macroblocks of a frame fall into five categories. Figure 3 shows the macroblock classification.
Once macroblocks are classified, each class can be handled by its own dedicated function. This removes many unnecessary checks, keeps the DSP pipeline from being interrupted, and increases speed; it also makes later optimization more targeted.
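Figure 3 is not reproduced here, so the five classes below are an assumption: one plausible split based on which neighboring macroblocks (left A, top B, top-right C, top-left D) are available for prediction. The sketch visits each region with its own loop, in raster-scan order, so that no per-macroblock position test is needed, and each class gets a dedicated function. All names are hypothetical, the class functions only count calls, and `classify_mb` is included to document and check the class boundaries.

```c
/* Assumed five-way classification by neighbor availability
 * (A = left, B = top, C = top-right, D = top-left). Assumes mb_width >= 2. */
typedef enum {
    MB_TOP_LEFT,   /* no neighbors        */
    MB_TOP_ROW,    /* A only              */
    MB_LEFT_COL,   /* B and C             */
    MB_RIGHT_COL,  /* A, B and D (no C)   */
    MB_INTERIOR,   /* A, B, C and D       */
    MB_NUM_CLASSES
} mb_class_t;

static mb_class_t classify_mb(int mb_x, int mb_y, int mb_width)
{
    if (mb_y == 0)
        return mb_x == 0 ? MB_TOP_LEFT : MB_TOP_ROW;
    if (mb_x == 0)
        return MB_LEFT_COL;
    if (mb_x == mb_width - 1)
        return MB_RIGHT_COL;
    return MB_INTERIOR;
}

/* Hypothetical class-specific encoders: each can skip availability checks
 * because its neighbor set is fixed. Here they only count calls. */
static int calls[MB_NUM_CLASSES];
static void enc_top_left(int x, int y)  { (void)x; (void)y; calls[MB_TOP_LEFT]++; }
static void enc_top_row(int x, int y)   { (void)x; (void)y; calls[MB_TOP_ROW]++; }
static void enc_left_col(int x, int y)  { (void)x; (void)y; calls[MB_LEFT_COL]++; }
static void enc_right_col(int x, int y) { (void)x; (void)y; calls[MB_RIGHT_COL]++; }
static void enc_interior(int x, int y)  { (void)x; (void)y; calls[MB_INTERIOR]++; }

/* Visit each region with its own loop (still in raster-scan order),
 * so the hot loops contain no per-macroblock position test. */
static void encode_p_frame_by_region(int mb_width, int mb_height)
{
    enc_top_left(0, 0);
    for (int x = 1; x < mb_width; x++)
        enc_top_row(x, 0);
    for (int y = 1; y < mb_height; y++) {
        enc_left_col(0, y);
        for (int x = 1; x < mb_width - 1; x++)
            enc_interior(x, y);
        enc_right_col(mb_width - 1, y);
    }
}
```

For a CIF frame (22 × 18 macroblocks), this gives 1 top-left, 21 top-row, 17 left-column, 17 right-column and 340 interior macroblocks, 396 in total.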
In P-frame encoding, the encoder uses only one reference frame. It also improves on the micro_h264 software model: instead of traversing all macroblock coding modes one by one, it uses a fast mode-selection algorithm. The P-frame encoding flow is shown in Figure 4.
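The article does not specify its fast mode-selection algorithm. As one common illustration (not necessarily the paper's method), the sketch evaluates candidate P-frame modes in an assumed order of likelihood and stops as soon as the best cost so far falls below a threshold, a technique usually called early termination. The mode order, the threshold and the table-driven cost function `table_cost` are hypothetical; a real encoder would compute a SAD- or rate-distortion-based cost for each mode.

```c
#include <stdint.h>

/* Candidate P-frame macroblock modes, in an assumed order of likelihood. */
typedef enum {
    MODE_SKIP, MODE_16x16, MODE_16x8, MODE_8x16, MODE_8x8,
    MODE_INTRA16x16, MODE_INTRA4x4, MODE_COUNT
} mb_mode_t;

/* Cost callback: a real encoder would run motion search or intra prediction. */
typedef uint32_t (*mode_cost_fn)(mb_mode_t mode, const void *ctx);

/* Hypothetical cost model for demonstration: looks costs up in a table. */
static uint32_t table_cost(mb_mode_t mode, const void *ctx)
{
    return ((const uint32_t *)ctx)[mode];
}

/* Evaluate modes in order, keep the cheapest, and stop early once the best
 * cost so far is below good_enough. *evaluated reports how many modes ran. */
static mb_mode_t fast_mode_select(mode_cost_fn cost, const void *ctx,
                                  uint32_t good_enough, int *evaluated)
{
    mb_mode_t best = MODE_SKIP;
    uint32_t best_cost = UINT32_MAX;

    *evaluated = 0;
    for (int m = 0; m < MODE_COUNT; m++) {
        uint32_t c = cost((mb_mode_t)m, ctx);
        (*evaluated)++;
        if (c < best_cost) {
            best_cost = c;
            best = (mb_mode_t)m;
        }
        if (best_cost < good_enough)
            break;  /* early termination: skip the remaining modes */
    }
    return best;
}
```

With costs {500, 90, 80, 85, 70, 300, 400} and a threshold of 100, only SKIP and 16×16 are evaluated and 16×16 is chosen; with a threshold of 0 all seven modes are evaluated and 8×8 wins.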

The software structure should be adapted to the characteristics of each target platform. For a lower-complexity encoder, separating the different types of macroblocks and processing each type separately avoids many repeated intermediate checks, which both increases encoding speed and makes the program structure clearer. Because each module is relatively independent, the program is also easier to extend. Although this increases code size somewhat, it effectively improves encoding speed.
2.2 Optimization of the ADSP-BF561 dual-core processing system
To ensure stable operation of the encoder, the core clock is set to 600 MHz. If real-time 4CIF encoding can be achieved at 600 MHz, the core clock can later be raised to support higher-quality 4CIF encoding. Real-time encoding at 25 frames per second allows 600 MHz / 25 = 24 million clock cycles per frame; that is, each frame must be encoded within 24 million cycles. Because a 4CIF frame contains four times as many pixels as a CIF frame, this is equivalent to encoding one CIF frame within 6 million cycles. Clearly, real-time encoding is difficult on a single core. Unlike most dual-core systems, in which one core runs the operating system and the other runs application software, this design runs the encoder on both cores simultaneously.
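The same budget can be expressed per macroblock. The helper functions below are illustrative, with hypothetical names; they assume frame dimensions that are multiples of 16 and a perfectly even split of macroblocks across cores with no communication overhead.

```c
/* Clock cycles available per frame at a given clock rate and frame rate. */
static unsigned long cycles_per_frame(unsigned long clock_hz, unsigned long fps)
{
    return clock_hz / fps;
}

/* Macroblocks in a width x height frame (dimensions assumed multiples of 16). */
static unsigned long mbs_per_frame(unsigned long width, unsigned long height)
{
    return (width / 16) * (height / 16);
}

/* Cycles each core may spend per macroblock, assuming the frame's
 * macroblocks are split evenly over `cores` cores with no overhead. */
static unsigned long cycles_per_mb(unsigned long clock_hz, unsigned long fps,
                                   unsigned long width, unsigned long height,
                                   unsigned long cores)
{
    return cycles_per_frame(clock_hz, fps) * cores / mbs_per_frame(width, height);
}
```

At 600 MHz and 25 f/s, a 4CIF frame (704 × 576, 1584 macroblocks) leaves about 15,151 cycles per macroblock on a single core, or about 30,303 cycles per macroblock on each core when the work is split across two cores.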
The main difficulty in implementing this encoding algorithm on the ADSP-BF561 development board is communication and coordination between the two cores. When both cores run the video encoding program at the same time, they must share and exchange data. Exchanging macroblock data through off-chip memory or the shared L2 memory is relatively simple and requires no data copying, but heavy access to slower memory greatly reduces program execution speed and hence encoder efficiency; shared memory is therefore not used to exchange macroblock data. Instead, IMDMA transfers data directly between the L1 data memories of the two cores, overlapping these transfers with encoding. This avoids large numbers of accesses to slow memory and reduces program execution time. Messages, by contrast, involve only small amounts of data, so they can be exchanged through shared memory, using the relatively fast L2 memory. With optimized programming, the author implemented the above encoding algorithm on the BF561 development board. The main dual-core encoding flow is shown in Figure 5.
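For the small messages exchanged through L2, a sequence-numbered mailbox is one simple scheme. The sketch below is hypothetical, since the article does not describe its message format: the sender writes the payload and then increments `seq`, and the receiver treats a changed `seq` as a new message. `volatile` only stops the compiler from caching the fields; on real dual-core hardware a memory barrier, an uncached L2 mapping and possibly a hardware lock may also be needed, and these are omitted here.

```c
#include <stdint.h>

/* Hypothetical control message placed in shared L2 memory. Single writer. */
typedef struct {
    volatile uint32_t seq;     /* incremented by the sender after the payload */
    volatile uint32_t frame;   /* frame number */
    volatile uint32_t mb_row;  /* last macroblock row finished by the sender */
} mailbox_t;

/* Sender: write the payload first, then publish it by bumping seq.
 * (Real hardware would need a memory barrier before the seq update.) */
static void mailbox_post(mailbox_t *mb, uint32_t frame, uint32_t row)
{
    mb->frame = frame;
    mb->mb_row = row;
    mb->seq = mb->seq + 1;
}

/* Receiver: returns 1 and fills *frame and *row if a message newer than
 * *last_seq is available, otherwise returns 0 and leaves them unchanged. */
static int mailbox_poll(const mailbox_t *mb, uint32_t *last_seq,
                        uint32_t *frame, uint32_t *row)
{
    uint32_t s = mb->seq;
    if (s == *last_seq)
        return 0;
    *frame = mb->frame;
    *row = mb->mb_row;
    *last_seq = s;
    return 1;
}
```

Because each mailbox has exactly one writer, no read-modify-write race arises on `seq`; a bidirectional protocol would use one mailbox per direction.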

3 Test results and data analysis
After optimization, H.264 encoding performance improved greatly, and real-time encoding of 4CIF video was achieved on the BF561. The author also compared the encoding results of the original encoder and the dual-core encoder in the VisualDSP++ 5.0 environment; the results for different sequences (25 f/s, CIF format) are listed in Table 1. Encoding speed depends mainly on the amount of motion and on the color content of the images, and, as the data show, it differs from sequence to sequence. The Claire sequence encodes very quickly because its background is static and only the head and shoulders move, so less data has to be encoded and the encoding speed is higher. More generally, simpler images encode faster, saving encoding time.

The test results show that the proposed optimization substantially reduces H.264 encoding time and better meets the requirements of real-time 4CIF encoding. Even for very complex images, real-time 4CIF encoding can be achieved with suitable quantization parameters.


4 Conclusion
This paper focuses on the optimization and implementation of an H.264 video coding algorithm on the ADSP-BF561 dual-core processor. Guided by the ADSP-BF561 dual-core architecture, the algorithm flow was restructured in the key parts of the encoder, and real-time encoding of 4CIF video was achieved on the two cores through data exchange and coordination between them. In practice, using the VisualDSP++ 5.0 environment, a 25 f/s H.264 4CIF video coding system was realized on the ADSP-BF561 development board, meeting the demands of video transmission.
