unity

T

Alexander Suvorov f284523b15 Add compression support for ETC1 textures

Explanation:

Crunch algorithms are normally used for compression of DXTn textures. However, Crunch algorithms are much more powerful, and with some minor adjustments, those algorithms can be directly used to compress other texture formats. For example, the current commit demonstrates how to use the existing Crunch algorithms to compress ETC1 textures.

Basics:

In general, Crunch is performing the following steps:
- tiling (determines block encodings)
- quantization of the tile endpoints (determines endpoint indices)
- optimization of the endpoints for each tile group (determines endpoint dictionary)
- quantization of the selectors (determines selector indices)
- selector refinement for each selector group (determines selector dictionary)
- compression of the previously determined block encodings, dictionaries and indices

Dictionary element:

When applying Crunch algorithms to a new texture format, it is necessary to first define the dictionary element. In context of Crunch, this means thats the whole image consists of smaller non-overlapping blocks, while the contents of each individual block is determined by an endpoint and a selector from the corresponding dictionaries. For example, in case of DXT format, each endpoint and selector codebook element corresponds to a 4x4 pixel block. In general, the size of the blocks, which form the encoded image, depends on the texture format and quality considerations.

It is proposed to define the dictionaries according to the following limitations:
- The dictionary elements should be compatible with the existing Crunch algorithms, while the image blocks defined by those dictionary elements should be compatible with the texture encoding format.
- It should be possible to cover a wide range of image quality and bitrates by just changing the size of the endpoint and selector dictionaries. If there is no limitation on the dictionary size, the encoding should preferably become lossless or near-lossless (not considering the quality loss implied by the texture format itself).

In case of ETC1, the texture format itself determines the minimal size of the image block, defined by endpoint and selector: it can be either 2x4 or 4x2 rectangle, aligned to the borders of the 4x4 grid. It is not possible to use higher granularity, because each of those rectangles can have only one base color, according to the ETC1 format. For the same reason, any image block, defined by an endpoint and a selector from the dictionary, should be combined from those aligned 2x4 or 4x2 rectangles.

Let's investigate the possibilities for the endpoint dictionary. According to the ETC1 format, each 4x4 ETC1 block is split in half, while each ETC1 subblock has it's own base color and a modifier table index. In fact, the base color and the modifier table index simply define the high and the low colors for the subblock (while there are some limitations on the position of those high and low colors, implied by the ETC1 encoding). If we define the endpoint dictionary element in such a way that it contains information about more than one ETC1 base color, then such a dictionary will become incompatible with the existing tile quantization algorithm, and the reason for this is the following. The Crunch tiling algorithm first performs quantization of all the tile pixel colors, down to just 2 colors. Then it quantizes all those color pairs, coming from different tiles. This approach works quite well for 4x4 DXT blocks, as those 2 colors approximately represent the principle component of the tile pixel colors. In case of ETC1 however, mixing together pixels, which correspond to different base colors, does not make much sense, as each group of those pixels has it's own low and high color values, independent from other groups. When those pixels are mixed together, the information about the original principle components of each subblock gets lost.

For the mentioned reason, each endpoint dictionary element should correspond to a single ETC1 base color. In such case, the tile quantization algorithm will work almost the same way as for DXT format. Each pair of colors, generated by the tile palletizer, will normally have the subblock base color value somewhere in the middle between those 2 colors, so quantizing those color pairs should also automatically quantize the corresponding base colors. Moreover, each color pair implicitly contains information about the modifier table index (which corresponds to the distance between the high and the low colors), and therefore the corresponding table index will also get automatically quantized.

Endpoint and selector dictionary elements, which define a single 2x4 or 4x2 ETC1 subblock, are fully compatible with the existing Crunch algorithms (because each ETC1 subblock is associated with a single base color and a single modifier table index). At the same time, those subblocks are minimal possible blocks, which can be defined by a dictionary element for ETC1 format (as has been shown earlier). Of course, it is also possible to use blocks larger than 2x4 or 4x2 (assuming that all the ETC1 subblocks, which form such a block, will have the same base color and the same modifier table index), however, with a larger block area it would be not possible to achieve near-lossless quality when the dictionary size is not limited.

As the result, it is proposed to define the dictionaries in the following way:
- Each element of the endpoint dictionary defines a single base color and a single modifier table index of a 2x4 or a 4x2 pixel block (which represents an ETC1 subblock).
- Each endpoint is encoded as 3555 (3 bits for the table index and 5 bits for each component of the base color).
- Each element of the selector dictionary defines selectors for a 2x4 or a 4x2 block.
- Each selector is encoded using 16 bits.

ETC1-specific adjustments:

In case of DXT, the size of the encoded block is 4x4, while the tiling is performed in a 8x8 area (4 blocks). In case of ETC1, the tiling can be performed either in a 4x4 area (2 blocks), or in a 8x8 area (8 blocks), while other possibilities are either not symmetrical or too complex. For simplicity it is proposed to use 4x4 area for tiling. There are therefore 3 possible encodings: the 4x4 block is not split (encoded with a single endpoint), the 4x4 block is split horizontally, the 4x4 block is split vertically.

For simplicity, endpoint references are currently determined only within the tiling area, while the encoding of the endpoint references has been adjusted in the following way:
- The first ETC1 subblock will always have the reference value of 0
- The second ETC1 subblock can have the reference value of 0 if it has the same endpoint as the first subblock (note that in such case the flip of the ETC1 block does not need to be defined), the value of 1 if the corresponding ETC1 block is split horizontally, and the value of 2 if the corresponding ETC1 block is split vertically

According to the ETC1 format, the base colors within an ETC1 block can be encoded either as 444 and 444, or differentially as 555 and 333. For simplicity, this aspect is currently not taken into account (all the endpoints are encoded as 3555 in the codebook). If it appears that the base colors in the resulting ETC1 block can not be encoded differentially, the decoder will convert both base colors from 555 to 444.

At first, it might look like the ETC1 block flipping can bring some complications for Crunch, as the subblock structure might not look like a grid. This can be easily resolved by mirroring all the vertical ETC1 blocks across the main diagonal of the block after the tiling step (so that all the ETC1 subblocks will become 4x2 and form a regular grid). The decoder can mirror the ETC1 selector back according to the decoded block flip.

The code adjustments for the ETC1 compression support are pretty straightforward and mostly trivial. Just note that when format-specific adjustments affect performance critical code, it makes sense to duplicate the body of the affected function and perform format-specific optimizations in each copy of the function individually. For performance reasons, the following 4 functions now got both ETC and DTX specific versions:
- determine_tiles_task_etc() is an ETC-optimized version of the determine_tiles_task(), where dxt_fast class has been replaced with the etc1_optimizer class.
- determine_color_endpoint_codebook_task_etc() is an ETC-optimized version of the determine_color_endpoint_codebook_task(), where dxt1_endpoint_optimizer class has been replaced with the etc1_optimizer class.
- pack_color_endpoints_etc() is an ETC-optimized version of the pack_color_endpoints(), where 565565 DXT color endpoint encoding has been replaced with 3555 ETC color endpoint encoding.
- unpack_etc1() is an ETC version of the unpack_dxt1() function.

The color_quality_power_mul and m_adaptive_tile_color_psnr_derating parameters for ETC1 format have been selected in such a way, so that ETC1 compression gives approximately the same average Luma PSNR as the equivalent DXT1 compression, when compressing the Kodak test set without mipmaps using default quality.

In order to use ETC1 compression, use the -ETC1 command line option (i.e. "crunch_x64.exe -ETC1 input.png"). By default, compressed ETC1 textures will be decompressed into KTX file format.

DXT Testing:
The modified algorithm has been tested on the Kodak test set using 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). All the decompressed test images are identical to the images being compressed and decompressed using original version of Crunch.

[Compressing Kodak set without mipmaps using DXT1 encoding]
Original: 1582222 bytes / 28.876 sec
Modified: 1482780 bytes / 13.255 sec
Improvement: 6.28% (compression ratio) / 54.10% (compression time)

[Compressing Kodak set with mipmaps using DXT1 encoding]
Original: 2065243 bytes / 36.987 sec
Modified: 1931586 bytes / 18.068 sec
Improvement: 6.47% (compression ratio) / 51.15% (compression time)

ETC Testing:
The modified algorithm has been tested on the Kodak test set using 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). The ETC1 quantization parameters have been selected in such a way, so that ETC1 compression gives approximately the same average Luma PSNR as the corresponding DXT1 compression (which is equal to 34.044 dB for the Kodak test set compressed without mipmaps using DXT1 encoding and default quality settings).

[Compressing Kodak set without mipmaps using ETC1 encoding]
Total size: 1887265 bytes
Total time: 14.954 sec
Average bitrate: 1.600 bpp
Average Luma PSNR: 34.049 dB

2017-07-05 18:19:23 +02:00

bin

Add compression support for ETC1 textures

2017-07-05 18:19:23 +02:00

crnlib

Add compression support for ETC1 textures

2017-07-05 18:19:23 +02:00

crunch

Add compression support for ETC1 textures

2017-07-05 18:19:23 +02:00

emscripten

Reformat the source files. The source files have been reformatted using: clang-format.exe -style="{BasedOnStyle: Google, AllowAllParametersOfDeclarationOnNextLine: false, AllowShortFunctionsOnASingleLine: Inline, AllowShortIfStatementsOnASingleLine: false, AllowShortLoopsOnASingleLine: false, ColumnLimit: 0, DerivePointerAlignment: false, SortIncludes: false}"

2017-04-26 11:41:07 +02:00

example1

2017-04-26 11:41:07 +02:00

example2

2017-04-26 11:41:07 +02:00

example3

2017-04-26 11:41:07 +02:00

inc

Add compression support for ETC1 textures

2017-07-05 18:19:23 +02:00

.gitignore

Update .gitignore

2017-04-26 15:54:16 +02:00

CHANGELOG.md

docs: Add changelog

2016-06-16 09:22:38 +02:00

crn_examples.2010.sln

Update solution to use Visual C++ 2010 compiler and libraries. When compiled with Visual Studio 2010, the code will produce the same results as the originally distributed Crunch binaries.

2017-04-26 10:59:07 +02:00

crn_linux.workspace

Adding new files

2012-04-26 07:59:22 +00:00

crn.2010.sln

Update solution to use Visual C++ 2010 compiler and libraries. When compiled with Visual Studio 2010, the code will produce the same results as the originally distributed Crunch binaries.

2017-04-26 10:59:07 +02:00

crn.workspace

2011-12-27 21:18:07 +00:00

license.txt

Fixing copyright

2016-06-16 20:08:40 -07:00

README.md

Update solution to use Visual C++ 2010 compiler and libraries. When compiled with Visual Studio 2010, the code will produce the same results as the originally distributed Crunch binaries.

2017-04-26 10:59:07 +02:00

README.md

For bugs or support contact Binomial info@binomial.info.

This software uses the ZLIB license, which is located in license.txt. http://opensource.org/licenses/Zlib

Portions of this software make use of public domain code originally written by Igor Pavlov (LZMA), RYG (crn_ryg_dxt*), and Sean Barrett (stb_image.c).

If you use this software in a product, an acknowledgment in the product documentation would be highly appreciated but is not required.

Note: crunch originally used to live on Google Code: https://code.google.com/p/crunch/

Overview

crnlib is a lossy texture compression library for developers that ship content using the DXT1/5/N or 3DC compressed color/normal map/cubemap mipmapped texture formats. It was written by the same author as the open source LZHAM compression library.

It can compress mipmapped 2D textures, normal maps, and cubemaps to approx. 1-1.25 bits/texel, and normal maps to 1.75-2 bits/texel. The actual bitrate depends on the complexity of the texture itself, the specified quality factor/target bitrate, and ultimately on the desired quality needed for a particular texture.

crnlib's differs significantly from other approaches because its compressed texture data format was carefully designed to be quickly transcodable directly to DXTn with no intermediate recompression step. The typical (single threaded) transcode to DXTn rate is generally between 100-250 megatexels/sec. The current library supports PC (Win32/x64) and Xbox 360. Fast random access to individual mipmap levels is supported.

crnlib can also generates standard .DDS files at specified quality setting, which results in files that are much more compressible by LZMA/Deflate/etc. compared to files generated by standard DXTn texture tools (see below). This feature allows easy integration into any engine or graphics library that already supports .DDS files.

The .CRN file format supports the following core DXTn texture formats: DXT1 (but not DXT1A), DXT5, DXT5A, and DXN/3DC

It also supports several popular swizzled variants (several are also supported by AMD's Compressonator): DXT5_XGBR, DXT5_xGxR, DXT5_AGBR, and DXT5_CCxY (experimental luma-chroma YCoCg).

Recommended Software

AMD's Compressonator tool is recommended to view the .DDS files created by the crunch tool and the included example projects.

Note: Some of the swizzled DXTn .DDS output formats (such as DXT5_xGBR) read/written by the crunch tool or examples deviate from the DX9 DDS standard, so DXSDK tools such as DXTEX.EXE won't load them at all or they won't be properly displayed.

Compression Algorithm Details

The compression process employed in creating both .CRN and clustered .DDS files utilizes a very high quality, scalable DXTn endpoint optimizer capable of processing any number of pixels (instead of the typical hard coded 16), optional adaptive switching between several macroblock sizes/configurations (currently any combination of 4x4, 8x4, 4x8, and 8x8 pixel blocks), endpoint clusterization using top-down cluster analysis, vector quantization (VQ) of the selector indices, and several custom algorithms for compressing the resulting endpoint/selector codebooks and macroblock indices. Multiple feedback passes are performed between the clusterization and VQ steps to optimize quality, and several steps use a brute force refinement approach to improve quality. The majority of compression steps are multithreaded.

The .CRN format currently utilizes canonical Huffman coding for speed (similar to Deflate but with much larger tables), but the next major version will also utilize adaptive binary arithmetic coding and higher order context modeling using already developed tech from the my LZHAM compression library.

Supported File Formats

crnlib supports two compressed texture file formats. The first format (clustered .DDS) is simple to integrate into an existing project (typically, no code changes are required), but it doesn't offer the highest quality/compression ratio that crnlib is capable of. Integrating the second, higher quality custom format (.CRN) requires a few typically straightforward engine modifications to integrate the .CRN->DXTn transcoder header file library into your tools/engine.

.DDS

crnlib can compress textures to standard DX9-style .DDS files using clustered DXTn compression, which is a subset of the approach used to create .CRN files.(For completeness, crnlib also supports vanilla, block by block DXTn compression too, but that's not very interesting.) Clustered DXTn compressed .DDS files are much more compressible than files created by other libraries/tools. Apart from increased compressibility, the .DDS files generated by this process are completely standard so they should be fairly easy to add to a project with little to no code changes.

To actually benefit from clustered DXTn .DDS files, your engine needs to further losslessly compress the .DDS data generated by crnlib using a lossless codec such as zlib, lzo, LZMA, LZHAM, etc. Most likely, your engine does this already. (If not, you definitely should because DXTn compressed textures generally contain a large amount of highly redundant data.)

Clustered .DDS files are intended to be the simplest/fastest way to integrate crnlib's tech into a project.

.CRN

The second, better, option is to compress your textures to .CRN files using crnlib. To read the resulting .CRN data, you must add the .CRN transcoder library (located in the included single file, stand-alone header file library inc/crn_decomp.h) into your application. .CRN files provide noticeably higher quality at the same effective bitrate compared to clustered DXTn compressed .DDS files. Also, .CRN files don't require further lossless compression because they're already highly compressed.

.CRN files are a bit more difficult/risky to integrate into a project, but the resulting compression ratio and quality is superior vs. clustered .DDS files.

.KTX

crnlib and crunch can read/write the .KTX file format in various pixel formats. Rate distortion optimization (clustered DXTc compression) is not yet supported when writing .KTX files.

The .KTX file format is just like .DDS, except it's a fairly well specified standard created by the Khronos Group. Unfortunately, almost all of the tools I've found that support .KTX are fairly (to very) buggy, or are limited to only a handful of pixel formats, so there's no guarantee that the .KTX files written by crnlib can be reliably read by other tools.

Building the Examples

This release contains the source code and projects for three simple example projects:

crn_examples.2010.sln is a Visual Studio 2010 (VC10) solution file containing projects for Win32 and x64. crnlib itself also builds with VS2005, VS2008, and gcc 4.5.0 (TDM GCC+MinGW). A codeblocks 10.05 workspace and project file is also included, but compiling crnlib this way hasn't been tested much.

example1

Demonstrates how to use crnlib's high-level C-helper compression/decompression/transcoding functions in inc/crnlib.h. It's a fairly complete example of crnlib's functionality.

example2

Shows how to transcodec .CRN files to .DDS using only the functionality in inc/crn_decomp.h. It does not link against against crnlib.lib or depend on it in any way. (Note: The complete source code, approx. 4800 lines, to the CRN transcoder is included in inc/crn_decomp.h.)

example2 is intended to show how simple it is to integrate CRN textures into your application.

example3

Shows how to use the regular, low-level DXTn block compressor functions in inc/crnlib.h. This functionality is included for completeness. (Your engine or toolchain most likely already has its own DXTn compressor. crnlib's compressor is typically very competitive or superior to most available closed and open source CPU-based compressors.)

Creating Compressed Textures from the Command Line (crunch.exe)

The simplest way to create compressed textures using crnlib is to integrate the bin\crunch.exe or bin\crunch_x64.exe) command line tool into your texture build toolchain or export process. It can write DXTn compressed 2D/cubemap textures to regular DXTn compressed .DDS, clustered (or reduced entropy) DXTn compressed .DDS, or .CRN files. It can also transcode or decompress files to several standard image formats, such as TGA or BMP. Run crunch.exe with no options for help.

The .CRN files created by crunch.exe can be efficiently transcoded to DXTn using the included CRN transcoding library, located in full source form under inc/crn_decomp.h.

Here are a few example crunch.exe command lines:

Compress blah.tga to blah.dds using normal DXT1 compression:

crunch -file blah.tga -fileformat dds -dxt1

Compress blah.tga to blah.dds using clustered DXT1 at an effective bitrate of 1.5 bits/texel, display image statistic:

crunch -file blah.tga -fileformat dds -dxt1 -bitrate 1.5 -imagestats

Compress blah.tga to blah.dds using clustered DXT1 at quality level 100 (from [0,255]), with no mipmaps, display LZMA statistics:

crunch -file blah.tga -fileformat dds -dxt1 -quality 100 -mipmode none -lzmastats

Compress blah.tga to blah.crn using clustered DXT1 at a bitrate of 1.2 bits/texel, no mipmaps:

crunch -file blah.tga -dxt1 -bitrate 1.2 -mipmode none

Decompress blah.dds to a .tga file:

crunch -file blah.dds -fileformat tga

Transcode blah.crn to a .dds file:

crunch -file blah.crn

Decompress blah.crn, writing each mipmap level to a separate .tga file:

crunch -split -file blah.crn -fileformat tga

crunch.exe can do a lot more, like rescale/crop images before compression, convert images from one file format to another, compare images, process multiple images, etc.

Note: I would have included the full source to crunch.exe, but it still has some low-level dependencies to crnlib internals which I didn't have time to address. This version of crunch.exe has some reduced functionality compared to an earlier eval release. For example, XML file support is not included in this version.

Using crnlib

The most flexible and powerful way of using crnlib is to integrate the library into your editor/toolchain/etc. and directly supply it your raw/source texture bits. See the C-style API's and comments in inc/crnlib.h.

To compress, you basically fill in a few structs in and call one function:

void *crn_compress( const crn_comp_params &comp_params,
                    crn_uint32 &compressed_size,
                    crn_uint32 *pActual_quality_level = NULL,
                    float *pActual_bitrate = NULL);

Or, if you want crnlib to also generate mipmaps, you call this function:

void *crn_compress( const crn_comp_params &comp_params,
                    const crn_mipmap_params &mip_params,
                    crn_uint32 &compressed_size,
                    crn_uint32 *pActual_quality_level = NULL,
                    float *pActual_bitrate = NULL);

You can also transcode/uncompress .DDS/.CRN files to raw 32bpp images using crn_decompress_crn_to_dds() and crn_decompress_dds_to_images().

Internally, crnlib just uses inc/crn_decomp.h to transcode textures to DXTn. If you only need to transcode .CRN format files to raw DXTn bits at runtime (and not compress), you don't actually need to compile or link against crnlib at all. Just include inc/crn_decomp.h, which contains a completely self-contained CRN transcoder in the "crnd" namespace. The crnd_get_texture_info(), crnd_unpack_begin(), crnd_unpack_level(), etc. functions are all you need to efficiently get at the raw DXTn bits, which can be directly supplied to whatever API or GPU you're using. (See example2.)

Important note: When compiling under native client, be sure to define the PLATFORM_NACL macro before including the inc/crn_decomp.h header file library.

Known Issues/Bugs

crnlib currently assumes you'll be further losslessly compressing its output .DDS files using LZMA. However, some engines use weaker codecs such as LZO, zlib, or custom codecs, so crnlib's bitrate measurements will be inaccurate. It should be easy to allow the caller to plug-in custom lossless compressors for bitrate measurement.
Compressing to a desired bitrate can be time consuming, especially when processing large (2k or 4k) images to the .CRN format. There are several high-level optimizations employed when compressing to clustered DXTn .DDS files using multiple trials, but not so for .CRN.
The .CRN compressor does not currently use 3 color (transparent) DXT1 blocks at all, only 4 color blocks. So it doesn't support DXT1A transparency, and its output quality suffers a little due to this limitation. (Note that the clustered DXTn compressor used when writing clustered .DDS files does not have this limitation.)
Clustered DXT5/DXT5A compressor is able to group DXT5A blocks into clusters only if they use absolute (black/white) selector indices. This hurts performance at very low bitrates, because too many bits are effectively given to alpha.
DXT3 is not supported when writing .CRN or clustered DXTn DDS files. (DXT3 is supported by crnlib's when compressing to regular DXTn DDS files.) You'll get DXT5 files if you request DXT3. However, DXT3 is supported by the regular DXTn block compressor. (DXT3's 4bpp fixed alpha sucks verses DXT5 alpha blocks, so I don't see this as a bug deal.)
The DXT5_CCXY format uses a simple YCoCg encoding that is workable but hasn't been tuned for max. quality yet.
Clustered (or rate distortion optimized) DXTc compression is only supported when writing to .DDS, not .KTX. Also, only plain block by block compression is supported when writing to ETC1, and .CRN does not support ETC1.

Compile to Javascript with Emscripten

Download and install Emscripten: http://kripken.github.io/emscripten-site/docs/getting_started/downloads.html

From the root directory, run:

    emcc -O3 emscripten/crn.cpp -I./inc -s EXPORTED_FUNCTIONS="['_malloc', '_free', '_crn_get_width', '_crn_get_height', '_crn_get_levels', '_crn_get_dxt_format', '_crn_get_bytes_per_block', '_crn_get_uncompressed_size', '_crn_decompress']" -s NO_EXIT_RUNTIME=1 -s NO_FILESYSTEM=1 -s ELIMINATE_DUPLICATE_FUNCTIONS=1 -s ALLOW_MEMORY_GROWTH=1 --memory-init-file 0 -o crunch.js