Use 4x4 selector dictionary for ETC1 compression

This change significantly improves the ETC1 compression ratio.

Explanation:

As shown in the previous commit, each element of the ETC1 endpoint dictionary should correspond to a single ETC1 base color. To achieve near-lossless compression with an unlimited dictionary, it was proposed to use 4x2 or 2x4 ETC1 subblocks as building elements, each defined by a single endpoint and a single selector. This scheme is equivalent to the original DXT compression scheme, except for the different size of the block defined by the dictionary elements.

Now consider the following observation. In the original DXT compression scheme, the dictionaries are defined so that endpoint and selector dictionary elements both correspond to the same decoded block size (4x4 in the case of DXT), but the Crunch algorithms do not require this. In fact, the selector dictionary and selector indices are computed after the endpoint optimization is complete. At that point each image pixel is already associated with a specific endpoint. The selector computation step uses only those per-pixel endpoint associations as input, so the size and shape of the blocks defined by selector dictionary elements do not depend in any way on the size or shape of the blocks defined by endpoint dictionary elements.

In other words, the endpoint space of the texture can be split into one set of blocks, defined by the endpoint dictionary and endpoint indices, while the selector space of the texture can be split into a completely different set of blocks, defined by the selector dictionary and selector indices. Endpoint blocks can differ in size from selector blocks, and the two sets of blocks can overlap in an arbitrary way. Such a setup is still fully compatible with the existing Crunch algorithms.
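A minimal sketch of this independence, not code taken from Crunch itself: it assumes the ETC1 layout suggested by the expression `b << 1 | p >> 3` in the diff of this commit, where a 4x4 selector block `b` spans two consecutive 8-pixel endpoint blocks. The helper names are hypothetical.

```cpp
#include <cstdint>

// Hypothetical helper, for illustration only.
// A 4x4 selector block holds pixels p = 0..15. Its first 8 pixels belong to
// one ETC1 subblock and the last 8 to the next one, so the endpoint block
// index is (selector_block * 2) + (p / 8).
constexpr uint32_t endpoint_block_for_pixel(uint32_t selector_block, uint32_t p) {
  return (selector_block << 1) | (p >> 3);
}

// Collects the endpoint block index seen by every pixel of one selector block.
inline void endpoint_blocks_of(uint32_t selector_block, uint32_t out[16]) {
  for (uint32_t p = 0; p < 16; p++)
    out[p] = endpoint_block_for_pixel(selector_block, p);
}
```

The selector block only needs a per-pixel lookup of the endpoint, so any other mapping from pixels to endpoint blocks could be substituted here without changing the selector side.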

In this commit, the size of the block defined by an ETC1 selector dictionary element has been set to 4x4, which significantly improves the compression ratio (the ETC1 quantization parameters have been adjusted to preserve the average Luma PSNR).
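As an illustration of what a 4x4 selector element looks like, the sketch below merges two 16-bit subblock selectors (2 bits per pixel, 8 pixels each) into one 32-bit value, mirroring the `| ... << 16` expression in the diff of this commit. The function names are hypothetical and the bit layout is an assumption based on that expression.

```cpp
#include <cstdint>

// Each pixel selector takes 2 bits, so an 8-pixel subblock fits in 16 bits
// and a 16-pixel (4x4) block fits in 32 bits.
constexpr uint32_t merge_subblock_selectors(uint16_t first, uint16_t second) {
  return static_cast<uint32_t>(first) | (static_cast<uint32_t>(second) << 16);
}

// Extracts the 2-bit selector of pixel p (0..15) from a merged 4x4 selector.
constexpr uint32_t selector_at(uint32_t block_selector, uint32_t p) {
  return (block_selector >> (2 * p)) & 3;
}

// Compile-time check of the assumed layout: the first subblock occupies the
// low 16 bits, the second subblock the high 16 bits.
static_assert(merge_subblock_selectors(0x1234, 0xABCD) == 0xABCD1234u,
              "first subblock must land in the low half");
```

Pixels 0..7 of the merged value come from the first subblock and pixels 8..15 from the second, so one dictionary entry now describes the selectors of both subblocks.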

Future research:
This property of the Crunch algorithms opens another dimension for optimizing the compression ratio. Specifically, the quality of the compressed selectors can now be adjusted in two ways: by changing the size of the selector dictionary and by changing the size of the selector block. Note that both DXT and ETC formats encode selectors as plain bits in the output format, so there is no technical limitation on the size or shape of the selector block (though, for performance reasons, non-power-of-two selector blocks might require some specific optimizations in the decoder).

DXT Testing:
The modified algorithm has been tested on the Kodak test set using a 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). All decompressed test images are identical to the images compressed and decompressed using the original version of Crunch.

[Compressing Kodak set without mipmaps using DXT1 encoding]
Original: 1582222 bytes / 28.859 sec
Modified: 1482780 bytes / 13.326 sec
Improvement: 6.28% (compression ratio) / 53.82% (compression time)

[Compressing Kodak set with mipmaps using DXT1 encoding]
Original: 2065243 bytes / 36.996 sec
Modified: 1931586 bytes / 18.121 sec
Improvement: 6.47% (compression ratio) / 51.02% (compression time)
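
The improvement figures above are consistent with a relative reduction of the modified value with respect to the original one. A small sketch of that arithmetic, using a hypothetical helper name:

```cpp
// Relative reduction of `modified` with respect to `original`, in percent.
// Works for both sizes (bytes) and times (seconds).
inline double improvement_percent(double original, double modified) {
  return 100.0 * (original - modified) / original;
}
```

For example, `improvement_percent(1582222, 1482780)` evaluates to roughly 6.28 and `improvement_percent(28.859, 13.326)` to roughly 53.82, matching the figures reported for the run without mipmaps.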

ETC Testing:
The modified algorithm has been tested on the Kodak test set using a 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). The ETC1 quantization parameters have been selected so that ETC1 compression gives approximately the same average Luma PSNR as the corresponding DXT1 compression (34.044 dB for the Kodak test set compressed without mipmaps using DXT1 encoding and default quality settings).

[Compressing Kodak set without mipmaps using ETC1 encoding]
Total size: 1692204 bytes
Total time: 17.528 sec
Average bitrate: 1.434 bpp
Average Luma PSNR: 34.057 dB
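
The reported bitrate can be reproduced from the total size, assuming the Kodak set consists of 24 images of 768x512 pixels each (an assumption not stated in this commit message). The names below are hypothetical:

```cpp
#include <cstdint>

// Average bitrate in bits per pixel for a compressed size in bytes.
constexpr double bits_per_pixel(uint64_t compressed_bytes, uint64_t total_pixels) {
  return 8.0 * static_cast<double>(compressed_bytes) / static_cast<double>(total_pixels);
}

// Assumption: 24 Kodak images, 768x512 pixels each, top mip level only.
constexpr uint64_t kKodakPixels = 24ull * 768 * 512;
```

Under that assumption, `bits_per_pixel(1692204, kKodakPixels)` evaluates to approximately 1.434, which agrees with the average bitrate listed above.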
Alexander Suvorov
2017-07-07 17:32:50 +02:00
parent f284523b15
commit 205f8a171d
4 changed files with 13 additions and 14 deletions
Binary file not shown.
+4 -4
@@ -191,7 +191,7 @@ bool crn_comp::pack_color_selectors(crnlib::vector<uint8>& packed_data, const cr
for (uint selector_index = 0; selector_index < m_color_selectors.size(); selector_index++) {
uint32 cur_selector = remapped_selectors[selector_index];
uint prev_sym = 0;
-for (uint32 selector = cur_selector, s = m_pParams->m_format == cCRNFmtETC1 ? 8 : 16, i = 0; i < s; i++, selector >>= 2, prev_selector >>= 2) {
+for (uint32 selector = cur_selector, i = 0; i < 16; i++, selector >>= 2, prev_selector >>= 2) {
int sym = 3 + (selector & 3) - (prev_selector & 3);
if (i & 1) {
uint paired_sym = 7 * sym + prev_sym;
@@ -337,7 +337,7 @@ bool crn_comp::pack_blocks(
}
}
for (uint c = 0; c < cNumComps; c++) {
-if (selector_remap[c]) {
+if (selector_remap[c] && (m_pParams->m_format != cCRNFmtETC1 || !(bx & 1))) {
uint index = (*selector_remap[c])[m_selector_indices[b].component[c]];
if (!pCodec)
m_selector_index_hist[c ? 1 : 0].inc_freq(index);
@@ -458,12 +458,12 @@ bool crn_comp::quantize_images() {
} else if (m_pParams->m_format == cCRNFmtDXT5) {
color_quality_power_mul = .75f;
} else if (m_pParams->m_format == cCRNFmtETC1) {
-color_quality_power_mul = 1.6f;
+color_quality_power_mul = 1.28f;
params.m_adaptive_tile_color_psnr_derating = 5.0f;
}
float color_endpoint_quality = powf(quality, 1.8f * color_quality_power_mul);
-float color_selector_quality = powf(quality, 1.65f * color_quality_power_mul * (m_pParams->m_format == cCRNFmtETC1 ? 2 : 1));
+float color_selector_quality = powf(quality, 1.65f * color_quality_power_mul);
params.m_color_endpoint_codebook_size = math::clamp<uint>(math::float_to_uint(.5f + math::lerp<float>(math::maximum<float>(64, cCRNMinPaletteSize), (float)max_codebook_entries, color_endpoint_quality)), cCRNMinPaletteSize, cCRNMaxPaletteSize);
params.m_color_selector_codebook_size = math::clamp<uint>(math::float_to_uint(.5f + math::lerp<float>(math::maximum<float>(96, cCRNMinPaletteSize), (float)max_codebook_entries, color_selector_quality)), cCRNMinPaletteSize, cCRNMaxPaletteSize);
+6 -6
@@ -839,12 +839,12 @@ void dxt_hc::create_color_selector_codebook_task(uint64 data, void* pData_ptr) {
uint E2[16][4];
uint E4[8][16];
uint E8[4][256];
-for (uint b = m_num_blocks * data / num_tasks, bEnd = m_num_blocks * (data + 1) / num_tasks; b < bEnd; b++) {
+for (uint n = m_params.m_format == cETC1 ? m_num_blocks >> 1 : m_num_blocks, b = n * data / num_tasks, bEnd = n * (data + 1) / num_tasks; b < bEnd; b++) {
color_cluster& cluster = m_color_clusters[m_endpoint_indices[b].color];
color_quad_u8* endpoint_colors = cluster.color_values;
for (uint p = 0; p < 16; p++) {
for (uint s = 0; s < 4; s++)
-E2[p][s] = m_params.m_format == cETC1 ? p & 8 ? 0 : color::color_distance(m_params.m_perceptual, ((color_quad_u8(*)[8])m_blocks)[b][p], endpoint_colors[s], false) :
+E2[p][s] = m_params.m_format == cETC1 ? color::color_distance(m_params.m_perceptual, m_blocks[b][p], m_color_clusters[m_endpoint_indices[b << 1 | p >> 3].color].color_values[s], false) :
color::color_distance(m_params.m_perceptual, m_blocks[b][p], endpoint_colors[s], false);
}
for (uint p = 0; p < 8; p++) {
@@ -870,18 +870,18 @@ void dxt_hc::create_color_selector_codebook_task(uint64 data, void* pData_ptr) {
total_errors[p][s] += E2[p][s];
}
selector_details[best_index].used = true;
-m_selector_indices[b].color = best_index;
+m_selector_indices[m_params.m_format == cETC1 ? b << 1 : b].color = best_index;
}
}
void dxt_hc::create_color_selector_codebook() {
tree_clusterizer<vec16F> selector_vq;
vec16F v;
-for (uint b = 0; b < m_num_blocks; b++) {
-uint64 selector = m_block_selectors[cColor][b];
+for (uint n = m_params.m_format == cETC1 ? m_num_blocks >> 1 : m_num_blocks, b = 0; b < n; b++) {
+uint64 selector = m_params.m_format == cETC1 ? m_block_selectors[cColor][b << 1] | m_block_selectors[cColor][b << 1 | 1] << 16 : m_block_selectors[cColor][b];
for (uint8 p = 0; p < 16; p++, selector >>= 2)
v[p] = ((selector & 3) + 0.5f) * 0.25f;
-selector_vq.add_training_vec(v, selector);
+selector_vq.add_training_vec(v, m_params.m_format == cETC1 ? (selector & 0xFFFF) + (selector >> 16) : selector);
}
selector_vq.generate_codebook(m_params.m_color_selector_codebook_size);
m_color_selectors.resize(selector_vq.get_codebook_size());
+3 -4
@@ -3211,8 +3211,8 @@ class crn_unpacker {
const uint8* pFrom_linear = m_pHeader->m_format == cCRNFmtETC1 ? g_etc1_from_linear : g_dxt1_from_linear;
-for (uint32 s = m_pHeader->m_format == cCRNFmtETC1 ? 4 : 8, i = 0; i < num_color_selectors; i++) {
-for (uint32 j = 0; j < s; j++) {
+for (uint32 i = 0; i < num_color_selectors; i++) {
+for (uint32 j = 0; j < 8; j++) {
int32 sym = m_codec.decode(dm);
cur[j * 2 + 0] = (delta0[sym] + cur[j * 2 + 0]) & 3;
cur[j * 2 + 1] = (delta1[sym] + cur[j * 2 + 1]) & 3;
@@ -3603,14 +3603,13 @@ class crn_unpacker {
if (color_endpoint_index >= num_color_endpoints)
color_endpoint_index -= num_color_endpoints;
*(uint32*)&e0 = m_color_endpoints[color_endpoint_index];
-uint32 selector = m_color_selectors[m_codec.decode(m_selector_delta_dm[0])] & 0xFFFF;
+uint32 selector = m_color_selectors[m_codec.decode(m_selector_delta_dm[0])];
if (endpoint_reference) {
color_endpoint_index += m_codec.decode(m_endpoint_delta_dm[0]);
if (color_endpoint_index >= num_color_endpoints)
color_endpoint_index -= num_color_endpoints;
}
*(uint32*)&e1 = m_color_endpoints[color_endpoint_index];
-selector |= m_color_selectors[m_codec.decode(m_selector_delta_dm[0])] << 16;
if (visible) {
uint32 block_selector = 0, flip = endpoint_reference >> 1 ^ 1, diff = 1;
for (uint32 t = 8, i = 0; i < 4; i++, t -= 15) {