AV1 provides multiple transform (Tx) type options. Moreover, the transform block size can be the same as or smaller than the prediction block size. Tx type search and Tx size search are discussed in detail below.
The AV1 specification defines four 1D transform options, namely DCT, Asymmetric Discrete Sine Transform (ADST), Flipped (reverse) ADST and the Identity transform (IDTX). Selecting the horizontal and vertical 1D transforms independently yields a total of 16 transform combinations, as shown in Table 1.
Transform Type | Vertical | Horizontal |
---|---|---|
DCT_DCT | DCT | DCT |
ADST_DCT | ADST | DCT |
DCT_ADST | DCT | ADST |
ADST_ADST | ADST | ADST |
FLIPADST_DCT | FLIPADST | DCT |
DCT_FLIPADST | DCT | FLIPADST |
FLIPADST_FLIPADST | FLIPADST | FLIPADST |
ADST_FLIPADST | ADST | FLIPADST |
FLIPADST_ADST | FLIPADST | ADST |
IDTX | IDTX | IDTX |
V_DCT | DCT | IDTX |
H_DCT | IDTX | DCT |
V_ADST | ADST | IDTX |
H_ADST | IDTX | ADST |
V_FLIPADST | FLIPADST | IDTX |
H_FLIPADST | IDTX | FLIPADST |
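
The mapping in Table 1 can be sketched as a lookup table. This is a minimal illustration only: the type names `Kernel1D`, `TxType2D`, `KernelPair` and the helper `kernels_for` are hypothetical and are not taken from the encoder source; the enum order simply follows the rows of Table 1.

```c
#include <assert.h>

/* 1D transform kernels named in Table 1 (identifiers are illustrative). */
typedef enum { K_DCT, K_ADST, K_FLIPADST, K_IDTX } Kernel1D;

/* 2D transform types, listed in the same order as the rows of Table 1. */
typedef enum {
    DCT_DCT, ADST_DCT, DCT_ADST, ADST_ADST,
    FLIPADST_DCT, DCT_FLIPADST, FLIPADST_FLIPADST,
    ADST_FLIPADST, FLIPADST_ADST,
    IDTX, V_DCT, H_DCT, V_ADST, H_ADST, V_FLIPADST, H_FLIPADST,
    TX_TYPE_COUNT
} TxType2D;

typedef struct {
    Kernel1D vertical;
    Kernel1D horizontal;
} KernelPair;

/* Vertical and horizontal kernel for each 2D type, copied from Table 1. */
static const KernelPair tx_kernels[TX_TYPE_COUNT] = {
    [DCT_DCT]           = { K_DCT,      K_DCT      },
    [ADST_DCT]          = { K_ADST,     K_DCT      },
    [DCT_ADST]          = { K_DCT,      K_ADST     },
    [ADST_ADST]         = { K_ADST,     K_ADST     },
    [FLIPADST_DCT]      = { K_FLIPADST, K_DCT      },
    [DCT_FLIPADST]      = { K_DCT,      K_FLIPADST },
    [FLIPADST_FLIPADST] = { K_FLIPADST, K_FLIPADST },
    [ADST_FLIPADST]     = { K_ADST,     K_FLIPADST },
    [FLIPADST_ADST]     = { K_FLIPADST, K_ADST     },
    [IDTX]              = { K_IDTX,     K_IDTX     },
    [V_DCT]             = { K_DCT,      K_IDTX     },
    [H_DCT]             = { K_IDTX,     K_DCT      },
    [V_ADST]            = { K_ADST,     K_IDTX     },
    [H_ADST]            = { K_IDTX,     K_ADST     },
    [V_FLIPADST]        = { K_FLIPADST, K_IDTX     },
    [H_FLIPADST]        = { K_IDTX,     K_FLIPADST },
};

/* Return the (vertical, horizontal) kernel pair for a 2D transform type. */
static KernelPair kernels_for(TxType2D type) {
    assert((int)type >= 0 && (int)type < TX_TYPE_COUNT);
    return tx_kernels[type];
}
```

For example, `kernels_for(V_ADST)` yields ADST in the vertical direction and the identity transform in the horizontal direction, matching the V_ADST row of the table.
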
For best performance, all the applicable transform options would be evaluated for a given candidate prediction and the transform option that results in the best cost would be selected. Given that the exhaustive search can be computationally expensive, the goal is to evaluate as few transform options as possible without incurring a significant loss in compression performance. The options considered in the following are listed below:
- Deciding whether to perform Tx search, and if so, where in the pipeline to perform the search.
- Using a subset of the transform options.
- Exploiting already computed cost information for different prediction candidates for the same block to decide whether to skip the Tx search for the current candidate, based on the cost difference between the current candidate and the best candidate.
Inputs: Prediction candidate.
Outputs: Transform type to use.
Control macros/flags:
Flag | Level (sequence/Picture) | Description |
---|---|---|
tx_search_level | Picture | Indicates whether Tx search is to be performed, and if so, whether it would be considered in MD or in the encode pass. |
The main function calls associated with Tx search in MD and in the encode pass are outlined in Figure 1 below.
Tx type search is performed using the functions tx_type_search and product_full_loop_tx_search in MD and the function encode_pass_tx_search in the encode pass.
A summary of the different optimization approaches considered in Tx search is presented in Figure 2 below.
The optimization of the Tx search is performed using different approaches as outlined in the following.
Tx search level: The Tx search level indicates where in the encoder pipeline the Tx search would be performed. The three candidate components of the encoder where Tx search could be performed are the full-loop in MD, the inter-depth decision in MD, and the encode pass. The flag tx_search_level is used to indicate where Tx search would be performed. Table 3 summarizes the values and associated descriptions of the flag. The settings for tx_search_level as a function of the encoder preset and other settings are given in Table 4.
tx_search_level | Value | Description |
---|---|---|
TX_SEARCH_OFF | 0 | Tx search OFF |
TX_SEARCH_ENC_DEC | 1 | Tx search performed only in the encode pass, the lowest complexity option. |
TX_SEARCH_INTER_DEPTH | 2 | Tx search performed only in inter-depth decision in MD, intermediate complexity level. |
TX_SEARCH_FULL_LOOP | 3 | Tx search performed only in the full-loop in MD, highest complexity level. |
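
As a rough sketch of how the values in Table 3 gate the search, the following maps each tx_search_level value to the single pipeline component where Tx search would run. The enum values come from Table 3; the `PipelineStage` enum and the helper `tx_search_runs_at` are hypothetical names introduced only for this illustration.

```c
#include <stdbool.h>

/* Values of tx_search_level as listed in Table 3. */
typedef enum {
    TX_SEARCH_OFF         = 0,
    TX_SEARCH_ENC_DEC     = 1,
    TX_SEARCH_INTER_DEPTH = 2,
    TX_SEARCH_FULL_LOOP   = 3
} TxSearchLevel;

/* Hypothetical labels for the three candidate pipeline components. */
typedef enum {
    STAGE_MD_FULL_LOOP,
    STAGE_MD_INTER_DEPTH,
    STAGE_ENCODE_PASS
} PipelineStage;

/* Return true if Tx search should run at the given stage for this level. */
static bool tx_search_runs_at(TxSearchLevel level, PipelineStage stage) {
    switch (level) {
    case TX_SEARCH_ENC_DEC:     return stage == STAGE_ENCODE_PASS;
    case TX_SEARCH_INTER_DEPTH: return stage == STAGE_MD_INTER_DEPTH;
    case TX_SEARCH_FULL_LOOP:   return stage == STAGE_MD_FULL_LOOP;
    case TX_SEARCH_OFF:
    default:                    return false;
    }
}
```

Each non-zero level enables exactly one stage, which reflects the word "only" in every row of the table.
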
Cost-dependent Tx search
If Tx search is to be performed in the full-loop in MD, the decision on whether to perform Tx search could be further refined based on the difference between the fast loop cost of the current candidate and the best fast loop cost for the block. If the difference is greater than a given threshold, Tx search is skipped. The threshold value is specified by the variable tx_weight. The values of tx_weight and corresponding descriptions are given in Table 5. The settings for tx_weight as a function of the encoder preset and other settings are given in Table 6.
tx_weight | Value | Description |
---|---|---|
| | 0 | Always skip. |
FC_SKIP_TX_SR_TH010 | 110 | Skip if difference in cost is 10% or more. |
FC_SKIP_TX_SR_TH025 | 125 | Skip if difference in cost is 25% or more. |
MAX_MODE_COST | 13616969489728 * 8 | No skipping |
Preset | PD_PASS_0 | PD_PASS_1 | PD_PASS_2 |
---|---|---|---|
M0 | MAX_MODE_COST | FC_SKIP_TX_SR_TH025 | if (tx_search_level == TX_SEARCH_ENC_DEC) then MAX_MODE_COST else FC_SKIP_TX_SR_TH025 |
M1 | MAX_MODE_COST | FC_SKIP_TX_SR_TH025 | if (tx_search_level == TX_SEARCH_ENC_DEC) then MAX_MODE_COST else FC_SKIP_TX_SR_TH025 |
M2 - M8 | MAX_MODE_COST | FC_SKIP_TX_SR_TH025 | if is_used_as_reference_flag then FC_SKIP_TX_SR_TH025 else FC_SKIP_TX_SR_TH010 |
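
The cost-dependent test can be illustrated as follows. This is a sketch under assumptions, not the encoder's actual code: the function name `skip_tx_search` and its parameters are hypothetical, tx_weight is interpreted as a percentage of the best fast loop cost (so 110 means "10% or more above the best"), and costs are assumed small enough that multiplying by 100 does not overflow 64 bits. The macro values are taken from Table 5.

```c
#include <stdbool.h>
#include <stdint.h>

/* Threshold values from Table 5. */
#define FC_SKIP_TX_SR_TH010 110ULL
#define FC_SKIP_TX_SR_TH025 125ULL
#define MAX_MODE_COST (13616969489728ULL * 8)

/*
 * Decide whether to skip Tx search for the current candidate.
 * fast_cost:      fast loop cost of the current candidate
 * best_fast_cost: best fast loop cost seen for the block
 * tx_weight:      threshold, expressed as a percentage of best_fast_cost
 */
static bool skip_tx_search(uint64_t fast_cost, uint64_t best_fast_cost,
                           uint64_t tx_weight) {
    /* MAX_MODE_COST means "no skipping"; handle it first to avoid overflow. */
    if (tx_weight == MAX_MODE_COST)
        return false;
    /* Skip when fast_cost >= best_fast_cost * tx_weight / 100.
     * A tx_weight of 0 makes the right-hand side 0, so the search is
     * always skipped, as in the first row of Table 5. */
    return fast_cost * 100 >= best_fast_cost * tx_weight;
}
```

With `FC_SKIP_TX_SR_TH010`, a candidate costing 1100 against a best cost of 1000 is skipped, while the best candidate itself is always searched.
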
Search subset: If Tx search is performed in either full-loop in MD or
in encode pass in enc/dec, a Tx search subset could be considered
instead of the full Tx search set. The use of a reduced search subset is
specified by the flag tx_search_reduced_set. The values of
tx_search_reduced_set and the corresponding descriptions are given in
Table 7. The settings for tx_search_reduced_set as a function of the
encoder preset and other settings are given in Table 8.
tx_search_reduced_set | Description |
---|---|
0 | Full Tx set |
1 | Reduced Tx set |
2 | Two Tx |
For a given block, Tx size search is used to determine the transform
block size that yields the best rate-distortion cost for the block
under consideration. In the current implementation of the Tx size
search feature, only one depth below the current block depth is
considered, as determined by the function get_end_tx_depth. This
is true for inter and intra blocks and for the following block
sizes: 8X8, 8X16, 16X8, 16X16, 16X32, 32X16, 32X32, 32X64, 64X32,
64X64, 4X16, 16X4, 8X32, 32X8, 16X64, 64X16.
Inputs: Prediction candidate.
Outputs: Transform block size to use.
Control macros/flags:
Flag | Level (sequence/Picture) | Description |
---|---|---|
md_atb_mode | | When set, it allows transform block size search. |
md_staging_skip_atb | | When set, transform block size search is skipped. |
use_intrabc | Block | When set it indicates that Intra Block Copy prediction could be used. |
Details of the implementation
The main function calls associated with Tx size search in MD are outlined in Figure 3 below.
Tx size search is currently enabled only in MD_stage_2 since in
MD_Stage_1 we have md_staging_skip_atb == EB_TRUE.
The function tx_partitioning_path performs the Tx size search in MD.
Currently, only the original transform block and the corresponding
transform blocks one depth below are evaluated. The flow of
the evaluation depends on whether the block is an inter coded block or
an intra coded block, as outlined below.
- In the case of an inter block (i.e. the candidate type is INTER or Intra Block Copy), the residual block can be computed for the whole block based on the already computed prediction. This is done in the function full_loop_core through the call to the function residual_kernel.
- Determine the setting for the flag tx_search_skip_flag, which indicates whether transform type search would be performed or not. The function get_skip_tx_search_flag is used to determine the setting for the flag.
- The function tx_reset_neighbor_arrays is used to reset the neighbor arrays.
- Loop over the depths to be evaluated (i.e. the current depth and the next depth).

  a. Initialize the neighbor arrays using tx_initialize_neighbor_arrays.

  b. Loop over the Tx blocks in the depth being evaluated.

    - If the block is not an inter block, then:
      - Perform luma intra prediction in av1_intra_luma_prediction.
      - Compute the resulting luma residuals in residual_kernel.
    - Perform Tx search for the current Tx block in tx_type_search.
    - Perform Tx, quantization, inverse quantization, and, if spatialSSE is enabled, the inverse transform. Compute the cost of the current transform type for the transform block size under consideration. All these operations are performed in product_full_loop.
    - If the block is not an inter block, update both the recon sample neighbor array and the transform-related neighbor array in tx_update_neighbor_array. Otherwise, update only the transform-related neighbor array in the same function.

  c. Estimate the rate associated with signaling the Tx size in get_tx_size_bits.

  d. Update best_cost_search and best_tx_depth based on the depths evaluated so far.

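
The outer depth loop in the steps above can be sketched as follows. This is a simplified illustration, not the body of tx_partitioning_path: the callback type `TxDepthCostFn`, the struct `TxSizeDecision`, and the functions `search_tx_depth` and `table_cost` are hypothetical. The callback stands in for everything done per depth (prediction, residual, Tx type search, quantization, and the Tx size signaling rate) and is assumed to return the total cost for that depth.

```c
#include <stdint.h>

/* Hypothetical callback: total cost of coding the block at a given Tx depth,
 * assumed to already include the rate for signaling the Tx size. */
typedef uint64_t (*TxDepthCostFn)(int tx_depth, void *user);

/* Outcome of the search; field names mirror the variables in the text. */
typedef struct {
    int      best_tx_depth;
    uint64_t best_cost_search;
} TxSizeDecision;

/* Evaluate every depth in [start_depth, end_depth] and keep the cheapest. */
static TxSizeDecision search_tx_depth(int start_depth, int end_depth,
                                      TxDepthCostFn cost_fn, void *user) {
    TxSizeDecision decision = { start_depth, UINT64_MAX };
    for (int depth = start_depth; depth <= end_depth; ++depth) {
        uint64_t cost = cost_fn(depth, user);
        if (cost < decision.best_cost_search) {
            decision.best_cost_search = cost;
            decision.best_tx_depth    = depth;
        }
    }
    return decision;
}

/* Toy cost model for demonstration: look the cost up in a caller's array. */
static uint64_t table_cost(int tx_depth, void *user) {
    const uint64_t *costs = (const uint64_t *)user;
    return costs[tx_depth];
}
```

Since only the current depth and one depth below are evaluated, a call such as `search_tx_depth(0, 1, table_cost, costs)` compares exactly two costs. On a tie the shallower depth is kept, which is a choice made for this sketch only.
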
The Tx size search optimization is based on checking whether the
parent transform block for the current transform block has all zero
coefficients. If the parent Tx block does not have any non-zero
coefficients, then no further Tx size search is considered. The feature
is controlled by the flag tx_size_early_exit. The flag is used in
tx_partitioning_path to exit the Tx size search if the flag is set. A
description of the flag settings is given in Table 10.
tx_size_early_exit | Description |
---|---|
0 | Feature OFF |
1 | Feature ON |
The flag tx_size_early_exit is set to 1.
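
The early-exit test can be sketched as below. The helper names `has_nonzero_coeff` and `continue_tx_size_search` are hypothetical, and representing the parent block's quantized coefficients as a flat `int32_t` array is an assumption made for this example; the encoder may track this information differently.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Return true if at least one quantized coefficient is non-zero. */
static bool has_nonzero_coeff(const int32_t *coeffs, size_t count) {
    for (size_t i = 0; i < count; ++i) {
        if (coeffs[i] != 0)
            return true;
    }
    return false;
}

/*
 * Decide whether to evaluate the next Tx depth.
 * When tx_size_early_exit is set and the parent Tx block has only zero
 * coefficients, the search stops at the parent depth.
 */
static bool continue_tx_size_search(bool tx_size_early_exit,
                                    const int32_t *parent_coeffs,
                                    size_t count) {
    if (tx_size_early_exit && !has_nonzero_coeff(parent_coeffs, count))
        return false;
    return true;
}
```

With the feature OFF the function always returns true, so the deeper Tx size is evaluated regardless of the parent's coefficients.
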