Add compression support for ETC1 textures

Explanation:

Crunch algorithms are normally used to compress DXTn textures. However, Crunch algorithms are much more powerful than that, and with some minor adjustments they can be used directly to compress other texture formats. As an example, this commit demonstrates how to use the existing Crunch algorithms to compress ETC1 textures.

Basics:

In general, Crunch performs the following steps:
- tiling (determines block encodings)
- quantization of the tile endpoints (determines endpoint indices)
- optimization of the endpoints for each tile group (determines endpoint dictionary)
- quantization of the selectors (determines selector indices)
- selector refinement for each selector group (determines selector dictionary)
- compression of the previously determined block encodings, dictionaries and indices

Dictionary element:

When applying Crunch algorithms to a new texture format, it is first necessary to define the dictionary element. In the context of Crunch, this means that the whole image consists of smaller non-overlapping blocks, and the contents of each individual block are determined by an endpoint and a selector from the corresponding dictionaries. For example, in the case of the DXT format, each endpoint and selector codebook element corresponds to a 4x4 pixel block. In general, the size of the blocks that form the encoded image depends on the texture format and on quality considerations.

It is proposed to define the dictionaries according to the following limitations:
- The dictionary elements should be compatible with the existing Crunch algorithms, while the image blocks defined by those dictionary elements should be compatible with the texture encoding format.
- It should be possible to cover a wide range of image quality and bitrates by just changing the size of the endpoint and selector dictionaries.
  If there is no limitation on the dictionary size, the encoding should preferably become lossless or near-lossless (not considering the quality loss implied by the texture format itself).

In the case of ETC1, the texture format itself determines the minimal size of the image block defined by an endpoint and a selector: it can be either a 2x4 or a 4x2 rectangle, aligned to the borders of the 4x4 grid. It is not possible to use a finer granularity, because according to the ETC1 format each of those rectangles can have only one base color. For the same reason, any image block defined by an endpoint and a selector from the dictionary should be composed of those aligned 2x4 or 4x2 rectangles.

Let's investigate the possibilities for the endpoint dictionary. According to the ETC1 format, each 4x4 ETC1 block is split in half, and each ETC1 subblock has its own base color and its own modifier table index. In fact, the base color and the modifier table index simply define the high and the low colors for the subblock (with some limitations on the position of those high and low colors, implied by the ETC1 encoding).

If the endpoint dictionary element is defined in such a way that it contains information about more than one ETC1 base color, the dictionary becomes incompatible with the existing tile quantization algorithm, for the following reason. The Crunch tiling algorithm first quantizes all the tile pixel colors down to just 2 colors. Then it quantizes all those color pairs, coming from different tiles. This approach works quite well for 4x4 DXT blocks, as those 2 colors approximately represent the principal component of the tile pixel colors. In the case of ETC1, however, mixing together pixels that correspond to different base colors does not make much sense, as each group of those pixels has its own low and high color values, independent of the other groups.
When those pixels are mixed together, the information about the original principal components of each subblock gets lost.

For this reason, each endpoint dictionary element should correspond to a single ETC1 base color. In that case, the tile quantization algorithm will work almost the same way as for the DXT format. Each pair of colors generated by the tile palettizer will normally have the subblock base color value somewhere in the middle between those 2 colors, so quantizing those color pairs should also automatically quantize the corresponding base colors. Moreover, each color pair implicitly contains information about the modifier table index (which corresponds to the distance between the high and the low colors), and therefore the corresponding table index will also get automatically quantized.

Endpoint and selector dictionary elements that define a single 2x4 or 4x2 ETC1 subblock are fully compatible with the existing Crunch algorithms (because each ETC1 subblock is associated with a single base color and a single modifier table index). At the same time, those subblocks are the smallest possible blocks that can be defined by a dictionary element for the ETC1 format (as shown earlier). Of course, it is also possible to use blocks larger than 2x4 or 4x2 (assuming that all the ETC1 subblocks that form such a block have the same base color and the same modifier table index). However, with a larger block area it would not be possible to achieve near-lossless quality when the dictionary size is not limited.

As a result, it is proposed to define the dictionaries in the following way:
- Each element of the endpoint dictionary defines a single base color and a single modifier table index of a 2x4 or a 4x2 pixel block (which represents an ETC1 subblock).
- Each endpoint is encoded as 3555 (3 bits for the table index and 5 bits for each component of the base color).
- Each element of the selector dictionary defines selectors for a 2x4 or a 4x2 block.
- Each selector is encoded using 16 bits.

ETC1-specific adjustments:

In the case of DXT, the size of the encoded block is 4x4, while the tiling is performed in an 8x8 area (4 blocks). In the case of ETC1, the tiling can be performed either in a 4x4 area (2 blocks) or in an 8x8 area (8 blocks), while other possibilities are either not symmetrical or too complex. For simplicity, it is proposed to use a 4x4 area for tiling. There are therefore 3 possible encodings: the 4x4 block is not split (encoded with a single endpoint), the 4x4 block is split horizontally, or the 4x4 block is split vertically.

For simplicity, endpoint references are currently determined only within the tiling area, and the encoding of the endpoint references has been adjusted in the following way:
- The first ETC1 subblock will always have the reference value of 0
- The second ETC1 subblock can have the reference value of 0 if it has the same endpoint as the first subblock (note that in such a case the flip of the ETC1 block does not need to be defined), the value of 1 if the corresponding ETC1 block is split horizontally, and the value of 2 if the corresponding ETC1 block is split vertically

According to the ETC1 format, the base colors within an ETC1 block can be encoded either as 444 and 444, or differentially as 555 and 333. For simplicity, this aspect is currently not taken into account (all the endpoints are encoded as 3555 in the codebook). If it turns out that the base colors in the resulting ETC1 block cannot be encoded differentially, the decoder will convert both base colors from 555 to 444.

At first, it might look like the ETC1 block flipping can bring some complications for Crunch, as the subblock structure might not look like a grid.
This can be easily resolved by mirroring all the vertical ETC1 blocks across the main diagonal of the block after the tiling step (so that all the ETC1 subblocks become 4x2 and form a regular grid). The decoder can mirror the ETC1 selector back according to the decoded block flip.

The code adjustments for the ETC1 compression support are pretty straightforward and mostly trivial. Just note that when format-specific adjustments affect performance-critical code, it makes sense to duplicate the body of the affected function and perform format-specific optimizations in each copy of the function individually. For performance reasons, the following 4 functions now have both ETC-specific and DXT-specific versions:
- determine_tiles_task_etc() is an ETC-optimized version of determine_tiles_task(), where the dxt_fast class has been replaced with the etc1_optimizer class.
- determine_color_endpoint_codebook_task_etc() is an ETC-optimized version of determine_color_endpoint_codebook_task(), where the dxt1_endpoint_optimizer class has been replaced with the etc1_optimizer class.
- pack_color_endpoints_etc() is an ETC-optimized version of pack_color_endpoints(), where the 565565 DXT color endpoint encoding has been replaced with the 3555 ETC color endpoint encoding.
- unpack_etc1() is an ETC version of the unpack_dxt1() function.

The color_quality_power_mul and m_adaptive_tile_color_psnr_derating parameters for the ETC1 format have been selected so that ETC1 compression gives approximately the same average Luma PSNR as the equivalent DXT1 compression when compressing the Kodak test set without mipmaps using default quality.

In order to use ETC1 compression, use the -ETC1 command line option (e.g. "crunch_x64.exe -ETC1 input.png"). By default, compressed ETC1 textures will be decompressed into the KTX file format.

DXT Testing:

The modified algorithm has been tested on the Kodak test set using a 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz).
All the decompressed test images are identical to the images being compressed and decompressed using original version of Crunch.

[Compressing Kodak set without mipmaps using DXT1 encoding]
Original: 1582222 bytes / 28.876 sec
Modified: 1482780 bytes / 13.255 sec
Improvement: 6.28% (compression ratio) / 54.10% (compression time)

[Compressing Kodak set with mipmaps using DXT1 encoding]
Original: 2065243 bytes / 36.987 sec
Modified: 1931586 bytes / 18.068 sec
Improvement: 6.47% (compression ratio) / 51.15% (compression time)

ETC Testing:

The modified algorithm has been tested on the Kodak test set using 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). The ETC1 quantization parameters have been selected in such a way, so that ETC1 compression gives approximately the same average Luma PSNR as the corresponding DXT1 compression (which is equal to 34.044 dB for the Kodak test set compressed without mipmaps using DXT1 encoding and default quality settings).

[Compressing Kodak set without mipmaps using ETC1 encoding]
Total size: 1887265 bytes
Total time: 14.954 sec
Average bitrate: 1.600 bpp
Average Luma PSNR: 34.049 dB
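
Illustration (not part of the commit):

The 3555 endpoint layout and the mirroring across the main diagonal described in the message can be sketched in a few lines. This is only a minimal sketch under stated assumptions: the helper names (expand5, pack_3555, to_3555, mirror_index) are hypothetical and do not exist in Crunch, and the byte layout (5-bit color components in the low bits of bytes 0..2, table index in bits 24..26) is inferred from the mask expression in pack_color_endpoints_etc() and from the index expression "p << 2 & 12 | p >> 2" in the diff below. Which color channel sits in which byte is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: expand a 5-bit component to 8 bits by bit replication,
// mirroring the "c << 3 | c >> 2" expression used in the diff.
inline uint8_t expand5(uint8_t c5) {
  assert(c5 < 32);
  return static_cast<uint8_t>((c5 << 3) | (c5 >> 2));
}

// Hypothetical helper: build a 3555 endpoint from a 3-bit modifier table index
// and three 5-bit base color components (assumed order: r in byte 0).
inline uint32_t pack_3555(uint8_t table, uint8_t r5, uint8_t g5, uint8_t b5) {
  assert(table < 8 && r5 < 32 && g5 < 32 && b5 < 32);
  return uint32_t(table) << 24 | uint32_t(b5) << 16 | uint32_t(g5) << 8 | r5;
}

// Convert an endpoint stored with 8-bit expanded components to the 3555 form.
// Same masks as in pack_color_endpoints_etc(), with explicit parentheses.
inline uint32_t to_3555(uint32_t endpoint8) {
  return (endpoint8 & 0x07000000u) | ((endpoint8 >> 3) & 0x001F1F1Fu);
}

// Map a pixel index of a row-major 4x4 block to its position after mirroring
// across the main diagonal (a transpose): index 4*r + c becomes 4*c + r.
inline unsigned mirror_index(unsigned p) {
  assert(p < 16);
  return ((p << 2) & 12) | (p >> 2);
}
```

Under these assumptions, pack_3555(5, 31, 16, 3) yields 0x0503101F, and mirror_index(7) yields 13 (row 1, column 3 becomes row 3, column 1). Applying mirror_index twice returns the original index, which is why the decoder can undo the mirroring with the same mapping.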
2017-07-05 17:55:14 +02:00
parent 39b85b74c2
commit f284523b15
8 changed files with 313 additions and 36 deletions
@@ -90,6 +90,30 @@ bool crn_comp::pack_color_endpoints(crnlib::vector<uint8>& packed_data, const cr
  return true;
 }

+bool crn_comp::pack_color_endpoints_etc(crnlib::vector<uint8>& packed_data, const crnlib::vector<uint16>& remapping) {
+  crnlib::vector<uint32> remapped_endpoints(m_color_endpoints.size());
+  for (uint i = 0; i < m_color_endpoints.size(); i++)
+    remapped_endpoints[remapping[i]] = m_color_endpoints[i] & 0x07000000 | m_color_endpoints[i] >> 3 & 0x001F1F1F;
+
+  symbol_histogram hist(32);
+  for (uint32 prev_endpoint = 0, p = 0; p < remapped_endpoints.size(); p++) {
+    for (uint32 _e = prev_endpoint, e = prev_endpoint = remapped_endpoints[p], c = 0; c < 4; c++, _e >>= 8, e >>= 8)
+      hist.inc_freq(e - _e & 0x1F);
+  }
+  static_huffman_data_model dm;
+  dm.init(true, hist, 15);
+  symbol_codec codec;
+  codec.start_encoding(1024 * 1024);
+  codec.encode_transmit_static_huffman_data_model(dm, false);
+  for (uint32 prev_endpoint = 0, p = 0; p < remapped_endpoints.size(); p++) {
+    for (uint32 _e = prev_endpoint, e = prev_endpoint = remapped_endpoints[p], c = 0; c < 4; c++, _e >>= 8, e >>= 8)
+      codec.encode(e - _e & 0x1F, dm);
+  }
+  codec.stop_encoding(false);
+  packed_data.swap(codec.get_encoding_buf());
+  return true;
+}
+
 bool crn_comp::pack_alpha_endpoints(crnlib::vector<uint8>& packed_data, const crnlib::vector<uint16>& remapping) {
  crnlib::vector<uint> remapped_endpoints(m_alpha_endpoints.size());

@@ -167,7 +191,7 @@ bool crn_comp::pack_color_selectors(crnlib::vector<uint8>& packed_data, const cr
  for (uint selector_index = 0; selector_index < m_color_selectors.size(); selector_index++) {
    uint32 cur_selector = remapped_selectors[selector_index];
    uint prev_sym = 0;
-    for (uint32 selector = cur_selector, i = 0; i < 16; i++, selector >>= 2, prev_selector >>= 2) {
+    for (uint32 selector = cur_selector, s = m_pParams->m_format == cCRNFmtETC1 ? 8 : 16, i = 0; i < s; i++, selector >>= 2, prev_selector >>= 2) {
      int sym = 3 + (selector & 3) - (prev_selector & 3);
      if (i & 1) {
        uint paired_sym = 7 * sym + prev_sym;
@@ -300,7 +324,7 @@ bool crn_comp::pack_blocks(
      for (uint c = 0; c < cNumComps; c++) {
        if (endpoint_remap[c]) {
          uint index = (*endpoint_remap[c])[m_endpoint_indices[b].component[c]];
-          if (!m_endpoint_indices[b].reference) {
+          if (m_pParams->m_format == cCRNFmtETC1 ? !(bx & 1) || m_endpoint_indices[b].reference : !m_endpoint_indices[b].reference) {
            int sym = index - endpoint_index[c];
            if (sym < 0)
              sym += endpoint_remap[c]->size();
@@ -352,7 +376,7 @@ bool crn_comp::alias_images() {
  m_total_blocks = 0;
  for (uint level = 0; level < m_pParams->m_levels; level++) {
    uint blockHeight = (math::maximum(1U, m_pParams->m_height >> level) + 7 & ~7) >> 2;
-    m_levels[level].block_width = (math::maximum(1U, m_pParams->m_width >> level) + 7 & ~7) >> 2;
+    m_levels[level].block_width = (math::maximum(1U, m_pParams->m_width >> level) + 7 & ~7) >> (m_pParams->m_format == cCRNFmtETC1 ? 1 : 2);
    m_levels[level].first_block = m_total_blocks;
    m_levels[level].num_blocks = m_pParams->m_faces * m_levels[level].block_width * blockHeight;
    m_total_blocks += m_levels[level].num_blocks;
@@ -431,11 +455,15 @@ bool crn_comp::quantize_images() {
      color_quality_power_mul = 3.5f;
      alpha_quality_power_mul = .35f;
      params.m_adaptive_tile_color_psnr_derating = 5.0f;
-    } else if (m_pParams->m_format == cCRNFmtDXT5)
+    } else if (m_pParams->m_format == cCRNFmtDXT5) {
      color_quality_power_mul = .75f;
+    } else if (m_pParams->m_format == cCRNFmtETC1) {
+      color_quality_power_mul = 1.6f;
+      params.m_adaptive_tile_color_psnr_derating = 5.0f;
+    }

    float color_endpoint_quality = powf(quality, 1.8f * color_quality_power_mul);
-    float color_selector_quality = powf(quality, 1.65f * color_quality_power_mul);
+    float color_selector_quality = powf(quality, 1.65f * color_quality_power_mul * (m_pParams->m_format == cCRNFmtETC1 ? 2 : 1));
    params.m_color_endpoint_codebook_size = math::clamp<uint>(math::float_to_uint(.5f + math::lerp<float>(math::maximum<float>(64, cCRNMinPaletteSize), (float)max_codebook_entries, color_endpoint_quality)), cCRNMinPaletteSize, cCRNMaxPaletteSize);
    params.m_color_selector_codebook_size = math::clamp<uint>(math::float_to_uint(.5f + math::lerp<float>(math::maximum<float>(96, cCRNMinPaletteSize), (float)max_codebook_entries, color_selector_quality)), cCRNMinPaletteSize, cCRNMaxPaletteSize);

@@ -522,8 +550,9 @@ bool crn_comp::quantize_images() {
      break;
    }
    case cCRNFmtETC1: {
-      console::warning("crn_comp::quantize_images: This class does not support ETC1");
-      return false;
+      params.m_format = cETC1;
+      m_has_comp[cColor] = true;
+      break;
    }
    default: {
      return false;
@@ -674,7 +703,7 @@ void crn_comp::optimize_color_endpoints_task(uint64 data, void* pData_ptr) {
    optimize_color_selectors();
  }

-  pack_color_endpoints(pParams->pResult->packed_endpoints, remapping);
+  m_pParams->m_format == cCRNFmtETC1 ? pack_color_endpoints_etc(pParams->pResult->packed_endpoints, remapping) : pack_color_endpoints(pParams->pResult->packed_endpoints, remapping);
  uint total_bits = pParams->pResult->packed_endpoints.size() << 3;

  crnlib::vector<uint> hist(n);
@@ -760,7 +789,7 @@ void crn_comp::optimize_color() {
  crnlib::vector<uint> sum(n);
  for (uint i, i_prev = 0, b = 0; b < m_endpoint_indices.size(); b++, i_prev = i) {
    i = m_endpoint_indices[b].color;
-    if (!m_endpoint_indices[b].reference && i != i_prev) {
+    if ((!m_endpoint_indices[b].reference || m_pParams->m_format == cCRNFmtETC1) && i != i_prev) {
      hist[i * n + i_prev]++;
      hist[i_prev * n + i]++;
      sum[i]++;
@@ -777,8 +806,8 @@ void crn_comp::optimize_color() {
  }
  crnlib::vector<optimize_color_params::unpacked_endpoint> unpacked_endpoints(n);
  for (uint16 i = 0; i < n; i++) {
-    unpacked_endpoints[i].low = dxt1_block::unpack_color(m_color_endpoints[i] & 0xFFFF, true);
-    unpacked_endpoints[i].high = dxt1_block::unpack_color(m_color_endpoints[i] >> 16, true);
+    unpacked_endpoints[i].low.m_u32 = m_pParams->m_format == cCRNFmtETC1 ? m_color_endpoints[i] & 0xFFFFFF : dxt1_block::unpack_color(m_color_endpoints[i] & 0xFFFF, true).m_u32;
+    unpacked_endpoints[i].high.m_u32 = m_pParams->m_format == cCRNFmtETC1 ? m_color_endpoints[i] >> 24 : dxt1_block::unpack_color(m_color_endpoints[i] >> 16, true).m_u32;
  }

  optimize_color_params::result remapping_trial[4];
@@ -85,6 +85,7 @@ class crn_comp : public itexture_comp {
  crnlib::vector<uint8> m_packed_alpha_selectors;

  bool pack_color_endpoints(crnlib::vector<uint8>& packed_data, const crnlib::vector<uint16>& remapping);
+  bool pack_color_endpoints_etc(crnlib::vector<uint8>& packed_data, const crnlib::vector<uint16>& remapping);
  bool pack_color_selectors(crnlib::vector<uint8>& packed_data, const crnlib::vector<uint16>& remapping);
  bool pack_alpha_endpoints(crnlib::vector<uint8>& packed_data, const crnlib::vector<uint16>& remapping);
  bool pack_alpha_selectors(crnlib::vector<uint8>& packed_data, const crnlib::vector<uint16>& remapping);
@@ -5,6 +5,7 @@
 #include "crn_image_utils.h"
 #include "crn_console.h"
 #include "crn_dxt_fast.h"
+#include "crn_etc.h"

 namespace crnlib {

@@ -76,7 +77,7 @@ bool dxt_hc::compress(
    const params& p
  ) {
  clear();
-  m_has_color_blocks = p.m_format == cDXT1 || p.m_format == cDXT5;
+  m_has_color_blocks = p.m_format == cDXT1 || p.m_format == cDXT5 || p.m_format == cETC1;
  m_num_alpha_blocks = p.m_format == cDXT5 || p.m_format == cDXT5A ? 1 : p.m_format == cDXN_XY || p.m_format == cDXN_YX ? 2 : 0;
  if (!m_has_color_blocks && !m_num_alpha_blocks)
    return false;
@@ -115,7 +116,7 @@ bool dxt_hc::compress(
  }

  for (uint i = 0; i <= m_pTask_pool->get_num_threads(); i++)
-    m_pTask_pool->queue_object_task(this, &dxt_hc::determine_tiles_task, i);
+    m_pTask_pool->queue_object_task(this, m_params.m_format == cETC1 ? &dxt_hc::determine_tiles_task_etc : &dxt_hc::determine_tiles_task, i);
  m_pTask_pool->join();

  m_num_tiles = 0;
@@ -141,7 +142,8 @@ bool dxt_hc::compress(
  hash_map<uint32, uint> color_endpoints_map;
  for (uint i = 0; i < m_color_clusters.size(); i++) {
    if (m_color_clusters[i].pixels.size()) {
-      uint32 endpoint = dxt1_block::pack_endpoints(m_color_clusters[i].first_endpoint, m_color_clusters[i].second_endpoint);
+      uint32 endpoint = m_params.m_format == cETC1 ? m_color_clusters[i].first_endpoint :
+        dxt1_block::pack_endpoints(m_color_clusters[i].first_endpoint, m_color_clusters[i].second_endpoint);
      hash_map<uint32, uint>::insert_result insert_result = color_endpoints_map.insert(endpoint, color_endpoints.size());
      if (insert_result.second) {
        color_endpoints_remap[i] = color_endpoints.size();
@@ -216,7 +218,7 @@ bool dxt_hc::compress(
          uint16 selector_index = (c ? alpha_selectors_remap : color_selectors_remap)[m_selector_indices[b].component[c]];
          selector_indices[b].component[c] = selector_index;
        }
-        endpoint_indices[b].reference = left_match ? 1 : top_match ? 2 : 0;
+        endpoint_indices[b].reference = m_params.m_format == cETC1 ? m_endpoint_indices[b].reference : left_match ? 1 : top_match ? 2 : 0;
      }
    }
  }
@@ -370,6 +372,99 @@ void dxt_hc::determine_tiles_task(uint64 data, void* pData_ptr) {
  }
 }

+void dxt_hc::determine_tiles_task_etc(uint64 data, void* pData_ptr) {
+  uint num_tasks = m_pTask_pool->get_num_threads() + 1;
+  uint offsets[5] = {0, 8, 16, 24, 16};
+  uint8 tiles[3][2] = {{4}, {2, 3}, {0, 1}};
+  uint8 tile_map[3][2] = {{ 0, 0 }, { 0, 1 }, { 0, 1 }};
+  color_quad_u8 tilePixels[32];
+  uint8 selectors[32];
+  uint tile_error[5];
+  uint total_error[3];
+  tree_clusterizer<vec3F> color_palettizer;
+
+  etc1_optimizer optimizer;
+  etc1_optimizer::params params;
+  params.m_use_color4 = false;
+  params.m_constrain_against_base_color5 = false;
+  etc1_optimizer::results results;
+  results.m_pSelectors = selectors;
+  int scan[] = {-1, 0, 1};
+  int refine[] = {-3, -2, 2, 3};
+
+  for (uint level = 0; level < m_params.m_num_levels; level++) {
+    float weight = m_params.m_levels[level].m_weight;
+    uint b = m_params.m_levels[level].m_first_block + m_params.m_levels[level].m_num_blocks * data / num_tasks & ~1;
+    uint bEnd = m_params.m_levels[level].m_first_block + m_params.m_levels[level].m_num_blocks * (data + 1) / num_tasks & ~1;
+    for (; b < bEnd; b += 2) {
+      for (uint p = 0; p < 16; p++)
+        tilePixels[p] = m_blocks[b >> 1][p << 2 & 12 | p >> 2];
+      memcpy(tilePixels + 16, m_blocks[b >> 1], 64);
+      for (uint t = 0; t < 5; t++) {
+        params.m_pSrc_pixels = tilePixels + offsets[t];
+        params.m_num_src_pixels = results.m_n = 8 << (t >> 2);
+        optimizer.init(params, results);
+        params.m_pScan_deltas = scan;
+        params.m_scan_delta_size = sizeof(scan) / sizeof(*scan);
+        optimizer.compute();
+        if (results.m_error > 375 * params.m_num_src_pixels) {
+          params.m_pScan_deltas = refine;
+          params.m_scan_delta_size = sizeof(refine) / sizeof(*refine);
+          optimizer.compute();
+        }
+        tile_error[t] = results.m_error;
+      }
+
+      for (uint8 e = 0; e < 3; e++) {
+        total_error[e] = 0;
+        for (uint8 t = 0, s = e + 1; s; s >>= 1, t++)
+          total_error[e] += tile_error[tiles[e][t]];
+      }
+
+      float best_quality = 0.0f;
+      uint best_encoding = 0;
+      for (uint e = 0; e < 3; e++) {
+        float quality = 0;
+        double peakSNR = total_error[e] ? log10(255.0f / sqrt(total_error[e] / 48.0)) * 20.0f : 999999.0f;
+        quality = (float)math::maximum<double>(peakSNR - m_color_derating[level][e], 0.0f);
+        if (quality > best_quality) {
+          best_quality = quality;
+          best_encoding = e;
+        }
+      }
+
+      for (uint tile_index = 0, s = best_encoding + 1; s; s >>= 1, tile_index++) {
+        tile_details& tile = m_tiles[b | tile_index];
+        uint t = tiles[best_encoding][tile_index];
+        tile.pixels.append(tilePixels + offsets[t], 8 << (t >> 2));
+        tile.weight = weight;
+        color_palettizer.clear();
+        for (uint p = 0; p < tile.pixels.size(); p++) {
+          const color_quad_u8& pixel = tile.pixels[p];
+          vec3F v(m_uint8_to_float[pixel[0]], m_uint8_to_float[pixel[1]], m_uint8_to_float[pixel[2]]);
+          color_palettizer.add_training_vec(m_params.m_perceptual ? vec3F(v[0] * 0.5f, v[1], v[2] * 0.25f) : v, 1);
+        }
+        color_palettizer.generate_codebook(2);
+        bool single = color_palettizer.get_codebook_size() == 1;
+        bool reorder = !single && color_palettizer.get_codebook_entry(0).length() > color_palettizer.get_codebook_entry(1).length();
+        for (uint t = 0, i = 0; i < 2; i++) {
+          vec3F v = color_palettizer.get_codebook_entry(single ? 0 : reorder ? 1 - i : i);
+          for (uint c = 0; c < 3; c++, t++)
+            tile.color_endpoint[t] = v[c];
+        }
+      }
+
+      for (uint bx = 0; bx < 2; bx++) {
+        m_block_encodings[b | bx] = best_encoding;
+        m_tile_indices[b | bx] = b | tile_map[best_encoding][bx];
+        m_endpoint_indices[b | bx].reference = bx ? best_encoding : 0;
+      }
+      if (best_encoding >> 1)
+        memcpy(m_blocks[b >> 1], tilePixels, 64);
+    }
+  }
+}
+
 void dxt_hc::determine_color_endpoint_codebook_task(uint64 data, void* pData_ptr) {
  pData_ptr;
  const uint thread_index = static_cast<uint>(data);
@@ -467,6 +562,75 @@ void dxt_hc::determine_color_endpoint_codebook_task(uint64 data, void* pData_ptr
  }
 }

+void dxt_hc::determine_color_endpoint_codebook_task_etc(uint64 data, void* pData_ptr) {
+  uint num_tasks = m_pTask_pool->get_num_threads() + 1;
+  uint8 delta[8][2] = { {2, 8}, {5, 17}, {9, 29}, {13, 42}, {18, 60}, {24, 80}, {33, 106}, {47, 183} };
+  int scan[] = {-1, 0, 1};
+  int refine[] = {-3, -2, 2, 3};
+  for (uint iCluster = m_color_clusters.size() * data / num_tasks, iEnd = m_color_clusters.size() * (data + 1) / num_tasks; iCluster < iEnd; iCluster++) {
+    color_cluster& cluster = m_color_clusters[iCluster];
+    if (cluster.pixels.size()) {
+      etc1_optimizer optimizer;
+      etc1_optimizer::params params;
+      params.m_use_color4 = false;
+      params.m_constrain_against_base_color5 = false;
+      etc1_optimizer::results results;
+      crnlib::vector<uint8> selectors(cluster.pixels.size());
+      params.m_pSrc_pixels = cluster.pixels.get_ptr();
+      results.m_pSelectors = selectors.get_ptr();
+      results.m_n = params.m_num_src_pixels = cluster.pixels.size();
+      optimizer.init(params, results);
+      params.m_pScan_deltas = scan;
+      params.m_scan_delta_size = sizeof(scan) / sizeof(*scan);
+      optimizer.compute();
+      if (results.m_error > 375 * params.m_num_src_pixels) {
+        params.m_pScan_deltas = refine;
+        params.m_scan_delta_size = sizeof(refine) / sizeof(*refine);
+        optimizer.compute();
+      }
+      color_quad_u8 endpoint;
+      for (int c = 0; c < 3; c++)
+        endpoint.c[c] = results.m_block_color_unscaled.c[c] << 3 | results.m_block_color_unscaled.c[c] >> 2;
+      endpoint.c[3] = results.m_block_inten_table;
+      cluster.first_endpoint = endpoint.m_u32;
+      for (uint8 d0 = delta[endpoint.c[3]][0], d1 = delta[endpoint.c[3]][1], c = 0; c < 3; c++) {
+        uint8 q = endpoint.c[c];
+        cluster.color_values[0].c[c] = q <= d1 ? 0 : q - d1;
+        cluster.color_values[1].c[c] = q <= d0 ? 0 : q - d0;
+        cluster.color_values[2].c[c] = q >= 255 - d0 ? 255 : q + d0;
+        cluster.color_values[3].c[c] = q >= 255 - d1 ? 255 : q + d1;
+      }
+      for (int t = 0; t < 4; t++)
+        cluster.color_values[t].c[3] = 0xFF;
+      uint endpoint_weight = color::color_distance(m_params.m_perceptual, cluster.color_values[0], cluster.color_values[3], false) / 2000;
+
+      float encoding_weight[8];
+      for (uint i = 0; i < 8; i++)
+        encoding_weight[i] = math::lerp(1.15f, 1.0f, i / 7.0f);
+
+      crnlib::vector<uint>& blocks = cluster.blocks[cColor];
+      for (uint i = 0; i < blocks.size(); i++) {
+        uint b = blocks[i];
+        uint weight = (uint)(math::clamp<uint>(endpoint_weight * m_block_weights[b], 1, 2048) * encoding_weight[m_block_encodings[b]]);
+        uint32 selector = 0;
+        for (uint sh = 0, p = 0; p < 8; p++, sh += 2) {
+          uint error_best = cUINT32_MAX;
+          uint8 s_best = 0;
+          for (uint8 s = 0; s < 4; s++) {
+            uint error = color::color_distance(m_params.m_perceptual, ((color_quad_u8(*)[8])m_blocks)[b][p], cluster.color_values[s], false);
+            if (error < error_best) {
+              s_best = s;
+              error_best = error;
+            }
+          }
+          selector |= s_best << sh;
+        }
+        m_block_selectors[cColor][b] = selector | (uint64)weight << 32;
+      }
+    }
+  }
+}
+
 void dxt_hc::determine_color_endpoint_clusters_task(uint64 data, void* pData_ptr) {
  tree_clusterizer<vec6F>* vq = (tree_clusterizer<vec6F>*)pData_ptr;
  uint num_tasks = m_pTask_pool->get_num_threads() + 1;
@@ -499,10 +663,19 @@ void dxt_hc::determine_color_endpoints() {
    uint cluster_index = m_tiles[m_tile_indices[b]].cluster_indices[cColor];
    m_endpoint_indices[b].component[cColor] = cluster_index;
    m_color_clusters[cluster_index].blocks[cColor].push_back(b);
+    if (m_params.m_format == cETC1 && m_endpoint_indices[b].reference && cluster_index == m_endpoint_indices[b - 1].component[cColor]) {
+      if (m_endpoint_indices[b].reference >> 1) {
+        color_quad_u8 mirror[16];
+        for (uint p = 0; p < 16; p++)
+          mirror[p] = m_blocks[b >> 1][p << 2 & 12 | p >> 2];
+        memcpy(m_blocks[b >> 1], mirror, 64);
+      }
+      m_endpoint_indices[b].reference = 0;
+    }
  }

  for (uint i = 0; i <= m_pTask_pool->get_num_threads(); i++)
-    m_pTask_pool->queue_object_task(this, &dxt_hc::determine_color_endpoint_codebook_task, i, NULL);
+    m_pTask_pool->queue_object_task(this, m_params.m_format == cETC1 ? &dxt_hc::determine_color_endpoint_codebook_task_etc : &dxt_hc::determine_color_endpoint_codebook_task, i, NULL);
  m_pTask_pool->join();
 }

@@ -671,7 +844,8 @@ void dxt_hc::create_color_selector_codebook_task(uint64 data, void* pData_ptr) {
    color_quad_u8* endpoint_colors = cluster.color_values;
    for (uint p = 0; p < 16; p++) {
      for (uint s = 0; s < 4; s++)
-        E2[p][s] = color::color_distance(m_params.m_perceptual, m_blocks[b][p], endpoint_colors[s], false);
+        E2[p][s] = m_params.m_format == cETC1 ? p & 8 ? 0 : color::color_distance(m_params.m_perceptual, ((color_quad_u8(*)[8])m_blocks)[b][p], endpoint_colors[s], false) :
+          color::color_distance(m_params.m_perceptual, m_blocks[b][p], endpoint_colors[s], false);
    }
    for (uint p = 0; p < 8; p++) {
      for (uint s = 0; s < 16; s++)
@@ -184,8 +184,10 @@ class dxt_hc {
  int m_prev_percentage_complete;

  void determine_tiles_task(uint64 data, void* pData_ptr);
+  void determine_tiles_task_etc(uint64 data, void* pData_ptr);

  void determine_color_endpoint_codebook_task(uint64 data, void* pData_ptr);
+  void determine_color_endpoint_codebook_task_etc(uint64 data, void* pData_ptr);
  void determine_color_endpoint_clusters_task(uint64 data, void* pData_ptr);
  void determine_color_endpoints();

@@ -560,11 +560,6 @@ bool process(convert_params& params, convert_stats& stats) {
  else if (dst_format == PIXEL_FMT_DXT1A)
    comp_params.set_flag(cCRNCompFlagDXT1AForTransparency, true);

-  if ((dst_format == PIXEL_FMT_ETC1) && (params.m_dst_file_type == texture_file_types::cFormatCRN)) {
-    console::warning("CRN file format does not support ETC1 compressed textures - converting to DXT1 instead.");
-    dst_format = PIXEL_FMT_DXT1;
-  }
-
  if ((dst_format == PIXEL_FMT_DXT1A) && (params.m_dst_file_type == texture_file_types::cFormatCRN)) {
    console::warning("CRN file format does not support DXT1A compressed textures - converting to DXT5 instead.");
    dst_format = PIXEL_FMT_DXT5;
@@ -478,8 +478,11 @@ class crunch {
        } else {
          texture_file_types::format input_file_type = texture_file_types::determine_file_format(in_filename.get_ptr());
          if (input_file_type == texture_file_types::cFormatCRN) {
-            // Automatically transcode CRN->DXTc and write to DDS files, unless the user specifies either the /fileformat or /split options.
            out_file_type = texture_file_types::cFormatDDS;
+            cfile_stream in_stream;
+            crnd::crn_header in_header;
+            if (in_stream.open(in_filename.get_ptr()) && in_stream.read(&in_header, sizeof(in_header)) == sizeof(in_header) && in_header.m_format == cCRNFmtETC1)
+              out_file_type = texture_file_types::cFormatKTX;
          } else if (input_file_type == texture_file_types::cFormatKTX) {
            // Default to converting KTX files to PNG
            out_file_type = texture_file_types::cFormatPNG;
@@ -2272,7 +2272,7 @@ bool crnd_get_texture_info(const void* pData, uint32 data_size, crn_texture_info
  pInfo->m_levels = pHeader->m_levels;
  pInfo->m_faces = pHeader->m_faces;
  pInfo->m_format = static_cast<crn_format>((uint32)pHeader->m_format);
-  pInfo->m_bytes_per_block = ((pHeader->m_format == cCRNFmtDXT1) || (pHeader->m_format == cCRNFmtDXT5A)) ? 8 : 16;
+  pInfo->m_bytes_per_block = (pHeader->m_format == cCRNFmtDXT1 || pHeader->m_format == cCRNFmtDXT5A || pHeader->m_format == cCRNFmtETC1) ? 8 : 16;
  pInfo->m_userdata0 = pHeader->m_userdata0;
  pInfo->m_userdata1 = pHeader->m_userdata1;

@@ -2740,6 +2740,7 @@ uint64 symbol_codec::stop_decoding() {
 namespace crnd {
 const uint8 g_dxt1_to_linear[cDXT1SelectorValues] = {0U, 3U, 1U, 2U};
 const uint8 g_dxt1_from_linear[cDXT1SelectorValues] = {0U, 2U, 3U, 1U};
+const uint8 g_etc1_from_linear[cDXT1SelectorValues] = {3U, 2U, 0U, 1U};  // linear index (low to high) -> ETC1 pixel index

 const uint8 g_dxt5_to_linear[cDXT5SelectorValues] = {0U, 7U, 1U, 2U, 3U, 4U, 5U, 6U};
 const uint8 g_dxt5_from_linear[cDXT5SelectorValues] = {0U, 2U, 3U, 4U, 5U, 6U, 7U, 1U};
@@ -3010,7 +3011,7 @@ class crn_unpacker {
    const uint32 height = math::maximum(m_pHeader->m_height >> level_index, 1U);
    const uint32 blocks_x = (width + 3U) >> 2U;
    const uint32 blocks_y = (height + 3U) >> 2U;
-    const uint32 block_size = ((m_pHeader->m_format == cCRNFmtDXT1) || (m_pHeader->m_format == cCRNFmtDXT5A)) ? 8 : 16;
+    const uint32 block_size = (m_pHeader->m_format == cCRNFmtDXT1 || m_pHeader->m_format == cCRNFmtDXT5A || m_pHeader->m_format == cCRNFmtETC1) ? 8 : 16;

    uint32 minimal_row_pitch = block_size * blocks_x;
    if (!row_pitch_in_bytes)
@@ -3042,6 +3043,9 @@ class crn_unpacker {
      case cCRNFmtDXN_YX:
        status = unpack_dxn((uint8**)pDst, row_pitch_in_bytes, blocks_x, blocks_y);
        break;
+      case cCRNFmtETC1:
+        status = unpack_etc1((uint8**)pDst, row_pitch_in_bytes, blocks_x, blocks_y);
+        break;
      default:
        return false;
    }
@@ -3141,7 +3145,7 @@ class crn_unpacker {
      return false;

    static_huffman_data_model dm[2];
-    for (uint32 i = 0; i < 2; i++)
+    for (uint32 i = 0; i < (m_pHeader->m_format == cCRNFmtETC1 ? 1 : 2); i++)
      if (!m_codec.decode_receive_static_data_model(dm[i]))
        return false;

@@ -3151,13 +3155,19 @@ class crn_unpacker {
    uint32* CRND_RESTRICT pDst = &m_color_endpoints[0];

    for (uint32 i = 0; i < num_color_endpoints; i++) {
-      a = (a + m_codec.decode(dm[0])) & 31;
-      b = (b + m_codec.decode(dm[1])) & 63;
-      c = (c + m_codec.decode(dm[0])) & 31;
-      d = (d + m_codec.decode(dm[0])) & 31;
-      e = (e + m_codec.decode(dm[1])) & 63;
-      f = (f + m_codec.decode(dm[0])) & 31;
-      *pDst++ = c | (b << 5U) | (a << 11U) | (f << 16U) | (e << 21U) | (d << 27U);
+      if (m_pHeader->m_format == cCRNFmtETC1) {
+        for (b = 0; b < 32; b += 8)
+          a += m_codec.decode(dm[0]) << b;
+        *pDst++ = a &= 0x1F1F1F1F;
+      } else {
+        a = (a + m_codec.decode(dm[0])) & 31;
+        b = (b + m_codec.decode(dm[1])) & 63;
+        c = (c + m_codec.decode(dm[0])) & 31;
+        d = (d + m_codec.decode(dm[0])) & 31;
+        e = (e + m_codec.decode(dm[1])) & 63;
+        f = (f + m_codec.decode(dm[0])) & 31;
+        *pDst++ = c | (b << 5U) | (a << 11U) | (f << 16U) | (e << 21U) | (d << 27U);
+      }
    }

    m_codec.stop_decoding();
@@ -3199,10 +3209,10 @@ class crn_unpacker {

    uint32* CRND_RESTRICT pDst = &m_color_selectors[0];

-    const uint8* pFrom_linear = g_dxt1_from_linear;
+    const uint8* pFrom_linear = m_pHeader->m_format == cCRNFmtETC1 ? g_etc1_from_linear : g_dxt1_from_linear;

-    for (uint32 i = 0; i < num_color_selectors; i++) {
-      for (uint32 j = 0; j < 8; j++) {
+    for (uint32 s = m_pHeader->m_format == cCRNFmtETC1 ? 4 : 8, i = 0; i < num_color_selectors; i++) {
+      for (uint32 j = 0; j < s; j++) {
        int32 sym = m_codec.decode(dm);
        cur[j * 2 + 0] = (delta0[sym] + cur[j * 2 + 0]) & 3;
        cur[j * 2 + 1] = (delta1[sym] + cur[j * 2 + 1]) & 3;
@@ -3560,6 +3570,69 @@ class crn_unpacker {
    return true;
  }

+  bool unpack_etc1(uint8** pDst, uint32 output_pitch_in_bytes, uint32 output_width, uint32 output_height) {
+    const uint32 num_color_endpoints = m_color_endpoints.size();
+    const uint32 width = (output_width + 1) & ~1;
+    const uint32 height = (output_height + 1) & ~1;
+    const int32 delta_pitch_in_dwords = (output_pitch_in_bytes >> 2) - (width << 1);
+
+    if (m_block_buffer.size() < width)
+      m_block_buffer.resize(width);
+
+    uint32 color_endpoint_index = 0;
+    uint8 reference_group = 0;
+
+    for (uint32 f = 0; f < m_pHeader->m_faces; f++) {
+      uint32* pData = (uint32*)pDst[f];
+      for (uint32 y = 0; y < height; y++, pData += delta_pitch_in_dwords) {
+        bool visible = y < output_height;
+        for (uint32 x = 0; x < width; x++, pData += 2) {
+          visible = visible && x < output_width;
+          block_buffer_element &buffer = m_block_buffer[x];
+          uint8 endpoint_reference, block_endpoint[4], e0[4], e1[4];
+          if (y & 1) {
+            endpoint_reference = buffer.endpoint_reference;
+          } else {
+            reference_group = m_codec.decode(m_reference_encoding_dm) >> 4;
+            endpoint_reference = reference_group & 3;
+            reference_group >>= 2;
+            buffer.endpoint_reference = reference_group & 3;
+            reference_group >>= 2;
+          }
+          color_endpoint_index += m_codec.decode(m_endpoint_delta_dm[0]);
+          if (color_endpoint_index >= num_color_endpoints)
+            color_endpoint_index -= num_color_endpoints;
+          *(uint32*)&e0 = m_color_endpoints[color_endpoint_index];
+          uint32 selector = m_color_selectors[m_codec.decode(m_selector_delta_dm[0])] & 0xFFFF;
+          if (endpoint_reference) {
+            color_endpoint_index += m_codec.decode(m_endpoint_delta_dm[0]);
+            if (color_endpoint_index >= num_color_endpoints)
+              color_endpoint_index -= num_color_endpoints;
+          }
+          *(uint32*)&e1 = m_color_endpoints[color_endpoint_index];
+          selector |= m_color_selectors[m_codec.decode(m_selector_delta_dm[0])] << 16;
+          if (visible) {
+            uint32 block_selector = 0, flip = (endpoint_reference >> 1) ^ 1, diff = 1;
+            for (uint32 t = 8, i = 0; i < 4; i++, t -= 15) {
+              for (uint32 j = 0; j < 4; j++, t += 4) {
+                uint32 s = selector >> (flip ? (i << 3) | (j << 1) : (j << 3) | (i << 1));
+                block_selector |= (((s >> 1) & 1) | ((s & 1) << 16)) << (t & 15);
+              }
+            }
+            for (uint c = 0; diff && c < 3; c++)
+              diff = (e0[c] + 3 >= e1[c] && e1[c] + 4 >= e0[c]) ? diff : 0;
+            for (uint c = 0; c < 3; c++)
+              block_endpoint[c] = diff ? (e0[c] << 3) | ((e1[c] - e0[c]) & 7) : ((e0[c] << 3) & 0xF0) | (e1[c] >> 1);
+            block_endpoint[3] = e0[3] << 5 | e1[3] << 2 | diff << 1 | flip;
+            pData[0] = *(uint32*)&block_endpoint;
+            pData[1] = block_selector;
+          }
+        }
+      }
+    }
+    return true;
+  }
+
 };

 crnd_unpack_context crnd_unpack_begin(const void* pData, uint32 data_size) {