Implement ETC1S/ETC2AS image compression

Explanation:

ETC1S encoding is a subset of ETC1 that uses only one color endpoint per 4x4 block: the modifier indices are identical for both subblocks, the base color is encoded differentially as RGB555 with the differential RGB333 part always set to zero, and the flip bit is always set to zero.

Usage: crunch_x64.exe -ETC1S input.png -out output.ktx

ETC2AS encoding is a subset of ETC2A encoding that uses ETC1S encoding for color and ETC2A encoding for alpha.

Usage: crunch_x64.exe -ETC2AS input.png -out output.ktx
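
To make the ETC1S constraints above concrete, here is a minimal sketch of how such a block could be packed into the standard 8-byte ETC1 layout. It is not taken from crunch's source: `pack_etc1s_block` and its parameters are hypothetical names, "modifier indices" is assumed to mean the two 3-bit table codewords, and the selectors are assumed to be passed as raw 2-bit ETC1 pixel index values in the format's column-major pixel order (index = x * 4 + y).

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper (not crunch's actual code): packs one ETC1S block.
// r5/g5/b5  : 5-bit base color components (RGB555)
// table     : 3-bit modifier table index, shared by both subblocks
// selectors : raw 2-bit ETC1 pixel indices, pixel i = x * 4 + y
std::array<std::uint8_t, 8> pack_etc1s_block(
    std::uint8_t r5, std::uint8_t g5, std::uint8_t b5, std::uint8_t table,
    const std::array<std::uint8_t, 16>& selectors)
{
    assert(r5 < 32 && g5 < 32 && b5 < 32 && table < 8);

    std::array<std::uint8_t, 8> block{};

    // Differential mode: 5-bit base in the high bits, 3-bit delta in the
    // low bits. ETC1S always leaves the RGB333 delta at zero.
    block[0] = static_cast<std::uint8_t>(r5 << 3);
    block[1] = static_cast<std::uint8_t>(g5 << 3);
    block[2] = static_cast<std::uint8_t>(b5 << 3);

    // Same table codeword for both subblocks, diff bit = 1, flip bit = 0.
    block[3] = static_cast<std::uint8_t>((table << 5) | (table << 2) | 0x02);

    // Pixel indices are split into a 16-bit MSB plane and a 16-bit LSB plane.
    std::uint16_t msb = 0;
    std::uint16_t lsb = 0;
    for (int i = 0; i < 16; ++i) {
        assert(selectors[i] < 4);
        msb = static_cast<std::uint16_t>(msb | (((selectors[i] >> 1) & 1) << i));
        lsb = static_cast<std::uint16_t>(lsb | ((selectors[i] & 1) << i));
    }

    // The block is stored big-endian: MSB plane first, then LSB plane.
    block[4] = static_cast<std::uint8_t>(msb >> 8);
    block[5] = static_cast<std::uint8_t>(msb & 0xFF);
    block[6] = static_cast<std::uint8_t>(lsb >> 8);
    block[7] = static_cast<std::uint8_t>(lsb & 0xFF);
    return block;
}
```

Because the delta and flip bits are fixed and the two table codewords are equal, each block is fully described by one RGB555 color, one table index, and 16 selectors, which is presumably what makes the format convenient for crunch's endpoint/selector palettes.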
Make Crunch compression work correctly on CPU supporting 16 or more threads
2018-10-23 23:16:00 +02:00 · 2018-10-23 20:23:56 +02:00 · 2018-06-07 19:20:30 +02:00 · 2017-10-27 16:50:44 +02:00 · 2017-10-25 19:15:36 +02:00 · 2017-10-25 16:17:06 +02:00
235 changed files with 91498 additions and 1384 deletions
@@ -0,0 +1,17 @@
+*.o
+*.2010.vcxproj.user
+*.2010.suo
+/crnlib/crunch
+/crnlib/Win32
+/crnlib/x64
+/crunch/Win32
+/crunch/x64
+/example1/Win32
+/example1/x64
+/example2/Win32
+/example2/x64
+/example3/Win32
+/example3/x64
+/lib
+/bin/*
+!bin/crunch_x64.exe
@@ -1,489 +0,0 @@
-## Contents ##
-
-  * [Introduction](API_Docs#Introduction.md)
-  * [Public API Overview](API_Docs#Public_API_Overview.md)
-  * [Public Enums](API_Docs#Public_Enums.md)
-  * [Public Structs](API_Docs#Public_Structs.md)
-  * [Public Functions](API_Docs#Public_Functions.md)
-    * [Memory Allocation](API_Docs#Memory_Allocation.md)
-    * [Compression](API_Docs#Compression.md)
-    * [Transcoding](API_Docs#Transcoding.md)
-    * [Decompression](API_Docs#Decompression.md)
-    * [DXTn Block Compression](API_Docs#DXTn_Block_Compression.md)
-    * [Helper Functions](API_Docs#Helper_Functions.md)
-
-
---
-
-
-## Introduction ##
-
-crnlib is a C++ library designed to be statically linked into the calling application. It can compress to .CRN, regular .DDS, or clustered .DDS files. It can also transcode .CRN to .DDS, and unpack .DDS files to individual 24/32-bit images. For completeness, crnlib's high-quality DXTn block compressor is also accessible.
-
-The library does not use C++ exceptions, but it does use some C++ features such as templates, virtual functions, and inheritance. It also makes heavy use of heap allocation. Due to porting, exception, and inconsistent performance issues (especially in debug builds) crnlib  mostly uses custom containers instead of STL.
-
-The VC9 (Visual Studio 2008) .LIB files are built here:
-
-```
-  lib\VC9\release\win32\crnlib_vc9.lib
-  lib\VC9\release\win64\crnlib_x64_vc9.lib
-  lib\VC9\release_dll\win32\crnlib_DLL_vc9.lib
-  lib\VC9\release_dll\win64\crnlib_DLL_x64_vc9.lib
-```
-
-crnlib should also build with VC10 (Visual Studio 2010), and Codeblocks 10.05 using TDM-GCC, but the majority of my testing has been with VC9.
-
-Currently crnlib is Win32 only, but it already compiles with GCC so a Linux/BSD/Mac port shouldn't be too difficult. (The threading related code is the biggest blocker to porting.) crnlib itself has only been tested on PC's, but [crn\_decomp.h](http://code.google.com/p/crunch/source/browse/trunk/inc/crn_decomp.h) (the stand-alone transcoder header file library) should work fine on consoles.
-
-A Xbox 360 specific version of crn\_decomp.h is available that can transcode .CRN textures into X360 tiled textures located in cached or write combined memory at only a ~10% slowdown. Please email me if you're interested (it's bitrotted a bit since the public release).
-
-
---
-
-
-## Public API Overview ##
-
-There are two header files of interest, both under the [inc](http://code.google.com/p/crunch/source/browse/trunk/#trunk%2Finc) directory. crnlib exposes a simple high level, C-style function based API, which is defined in the single public header file [inc/crnlib.h](http://code.google.com/p/crunch/source/browse/trunk/inc/crnlib.h).
-
-The second public header file, [inc/crn\_decomp.h](http://code.google.com/p/crunch/source/browse/trunk/inc/crn_decomp.h), contains all the functionality needed to transcode .CRN files to raw DXTn bits. It does not depend on crnlib in any way, although crnlib internally uses `crn_decomp.h` itself to transcode, examine, and validate .CRN files.
-
-Each crnlib API falls into one of the following categories:
-
-  * **Memory management**:
-    * `crn_set_memory_callbacks()`
-    * `crn_free_block()`
-
-  * **Image or texture compression** from memory to .CRN or .DDS file in memory:
-    * crn\_compress()
-
-  * **Texture decompression** from a .CRN or .DDS file memory to memory:
-    * `crn_decompress_crn_to_dds()`
-    * `crn_decompress_dds_to_images()`
-    * `crn_free_all_images()`
-
-  * **Plain DXTn block compression** of 4x4 pixel blocks to DXTn compressed blocks:
-    * `crn_create_block_compressor()`
-    * `crn_compress_block()`
-    * `crn_free_block_compressor()`
-
-  * **Misc. helpers**:
-    * **crn\_format info**:
-      * `crn_get_format_fourcc()`
-      * `crn_get_format_bits_per_texel()`
-      * `crn_get_bytes_per_dxt_block()`
-      * `crn_get_fundamental_dxt_format()`
-    * **crn\_format to/from ANSI and UTF16 string**:
-      * `crn_get_file_type_exta()`
-      * `crn_get_file_type_ext()`
-      * `crn_get_format_stringa()`
-      * `crn_get_format_string()`
-      * `crn_get_dxt_quality_stringa()`
-      * `crn_get_dxt_quality_string()`
-
-Several custom types and parameter structs are also defined in [inc/crnlib.h](http://code.google.com/p/crunch/source/browse/trunk/inc/crnlib.h). The most important structs are:
-  * [crn\_comp\_params](API_Docs#enum_crn_comp_params.md), which contains all the parameters passed to the compression function `crn_compress()`
-  * `struct crn_mipmap_params`, which contains a bunch of parameters that control crnlib's optional mipmap generator. (This struct is not yet documented here.)
-
-
---
-
-
-## Public Enums ##
-
-### enum crn\_file\_type ###
-```
-enum crn_file_type
-{
-   cCRNFileTypeCRN = 0,
-   cCRNFileTypeDDS,
-};
-```
-
-`crn_file_type` contains the supported file types. crnlib only supports DX9-style .DDS files.
-
-`cCRNFileTypeCRN`: .CRN file format
-
-`cCRNFileTypeDDS`: .DDS file format
-
-
-### enum crn\_format ###
-```
-enum crn_format
-{
-   cCRNFmtInvalid = -1,
-
-   cCRNFmtDXT1 = 0,
-   
-   cCRNFmtFirstValid = cCRNFmtDXT1,
-
-   // cCRNFmtDXT3 is not currently supported when writing to CRN - only DDS.
-   cCRNFmtDXT3,
-
-   cCRNFmtDXT5,
-   
-   // Various DXT5 derivatives
-   cCRNFmtDXT5_CCxY,    // Luma-chroma
-   cCRNFmtDXT5_xGxR,    // Swizzled 2-component
-   cCRNFmtDXT5_xGBR,    // Swizzled 3-component
-   cCRNFmtDXT5_AGBR,    // Swizzled 4-component
-
-   // ATI 3DC and X360 DXN
-   cCRNFmtDXN_XY,       
-   cCRNFmtDXN_YX,
-
-   // DXT5 alpha blocks only
-   cCRNFmtDXT5A,
-
-   cCRNFmtTotal,
-};
-```
-
-The `crn_format` enum contains the supported compressed pixel formats. It lists all the standard DX9 compressed pixel formats (BC1-BC5), with some swizzled DXT5 formats (most of them supported by ATI's Compressonator).
-
-### enum crn\_limits ###
-```
-enum crn_limits
-{
-   cCRNMaxLevelResolution     = 4096,
-
-   cCRNMinPaletteSize         = 8,
-   cCRNMaxPaletteSize         = 8192,
-
-   cCRNMaxFaces               = 6,
-   cCRNMaxLevels              = 16,
-
-   cCRNMaxHelperThreads       = 16,
-
-   cCRNMinQualityLevel        = 0,
-   cCRNMaxQualityLevel        = 255
-};
-```
-
-The `crn_limits` enum lists various library limits. Notably, the max supported texture resolution is currently 4096x4096 (this can be easily increased in the x64 version).
-
-### enum crn\_comp\_flags ###
-```
-enum crn_comp_flags
-{
-   cCRNCompFlagPerceptual = 1,    
-   cCRNCompFlagHierarchical = 2,    
-   cCRNCompFlagQuick = 4,
-   cCRNCompFlagUseBothBlockTypes = 8,    
-   cCRNCompFlagUseTransparentIndicesForBlack = 16,
-   cCRNCompFlagDisableEndpointCaching = 32, 
-   cCRNCompFlagManualPaletteSizes = 64,
-   cCRNCompFlagDXT1AForTransparency = 128,
-   cCRNCompFlagGrayscaleSampling = 256,
-   cCRNCompFlagDebugging = 0x80000000,
-};
-```
-
-The `crn_comp_flags` enum contains a number of compression related flags:
-
-`cCRNCompFlagPerceptual`: Default: Enabled. If enabled, perceptual colorspace distance metrics are enabled. **Important**: Be sure to **disable** this flag when compressing non-sRGB colorspace images, like normal maps!
-
-`cCRNCompFlagHierarchical`: Default: Enabled. If enabled, 4x4, 4x8, 8x4, and 8x8 tiles may be used in each macroblock. If disabled, all macroblocks are forced to use four 4x4 pixel tiles. Compression ratio will be lower when disabled, and transcoding will be a bit slower, but this will reduce macroblock tiling artifacts.
-
-`cCRNCompFlagQuick`: Default: Disabled. If enabled, this flag disables several output file optimizations. Intended for things like quicker previews.
-
-`cCRNCompFlagUseBothBlockTypes`: Default: Enabled. This flag controls which block types are used when compressing to .DDS. (This flag is not relevant when compressing to .CRN, which only uses a subset of the possible DXTn block types.)
-
-> DXT1: OK to use DXT1A (3 color) alpha blocks if doing so results in lower RGB error, or for transparent pixels.
-
-> DXT5: OK to use both DXT5 block types.
-
-`cCRNCompFlagUseTransparentIndicesForBlack`: Default: Disabled. If enabled, it's OK to use DXT1A transparent indices to encode full black colors (assumes pixel shader ignores fetched alpha). (Not relevant when compressing to .CRN files, because it never uses alpha blocks.)
-
-`cCRNCompFlagDisableEndpointCaching`: Default: Disabled. When set, this flag disables endpoint caching, for deterministic output. Only relevant when compressing to .DDS.
-
-`cCRNCompFlagManualPaletteSizes`: Default: Disabled. If enabled, use the cCRNColorEndpointPaletteSize, etc. params to control the CRN palette sizes. Only relevant when compressing to .CRN.
-
-`cCRNCompFlagDXT1AForTransparency`: Default: Disabled. If enabled, DXT1A alpha blocks are used to encode single bit transparency. Only relevant when compressing to .DDS, .CRN does not support DXT1A alpha blocks.
-
-`cCRNCompFlagGrayscaleSampling`: Default: Disabled. If enabled, the DXT1 compressor's color distance metric assumes the pixel shader will be converting the fetched RGB results to luma (Y part of YCbCr).
-
-This increases quality when compressing grayscale images, because the compressor can spread the luma error among all three channels (i.e. it can generate blocks with some chroma present if doing so will ultimately lead to lower luma error). Of course, only enable on grayscale source images.
-
-`cCRNCompFlagDebugging`: Default: Disabled. If enabled, the frontend and backend gather and dump various statistics during the compression process. Only used for development/debugging purposes.
-
-### enum crn\_dxt\_quality ###
-```
-enum crn_dxt_quality
-{
-   cCRNDXTQualitySuperFast,
-   cCRNDXTQualityFast,
-   cCRNDXTQualityNormal,
-   cCRNDXTQualityBetter,
-   cCRNDXTQualityUber,
-};
-```
-
-The `crn_dxt_quality` enum lists the various quality modes supported by the endpoint optimizers. This enum is only relevant when compressing to .DDS. cCRNDXTQualityUber is slower, but it has the best PSNR.
-
-
-### enum crn\_dxt\_compressor\_type ###
-```
-enum crn_dxt_compressor_type
-{
-   cCRNDXTCompressorCRN,
-   cCRNDXTCompressorCRNF,
-   cCRNDXTCompressorRYG
-};
-```
-
-This enum lists the DXTn block compressors supported by the library. This enum is only relevant when compressing to non-clustered .DDS files.
-
-`cCRNDXTCompressorCRN`: crnlib's default endpoint optimizer.
-
-`cCRNDXTCompressorCRNF`: A faster version of the default optimizer.
-
-`cCRNDXTCompressorRYG`: RYG's public domain endpoint optimizer.
-
-
---
-
-
-## Public Structs ##
-
-### enum crn\_comp\_params ###
-```
-typedef crn_bool (*crn_progress_callback_func)(crn_uint32 phase_index, crn_uint32 total_phases, 
-  crn_uint32 subphase_index, crn_uint32 total_subphases, void* pUser_data_ptr);
-
-struct crn_comp_params
-{
-   inline crn_comp_params();
-
-   inline void clear();
-
-   inline bool check() const;
-
-   inline bool get_flag(crn_comp_flags flag) const;
-   inline void set_flag(crn_comp_flags flag, bool val);
-   
-   crn_uint32                 m_size_of_obj;
-      
-   crn_file_type              m_file_type;               
-
-   crn_uint32                 m_faces;                   
-   crn_uint32                 m_width;                   
-   crn_uint32                 m_height;                  
-   crn_uint32                 m_levels;                  
-   
-   crn_format                 m_format;                  
-
-   crn_uint32                 m_flags;                   
-
-   const crn_uint32*          m_pImages[cCRNMaxFaces][cCRNMaxLevels];
-
-   float                      m_target_bitrate;
-   
-   crn_uint32                 m_quality_level;           
-   
-   crn_uint32                 m_dxt1a_alpha_threshold;
-   crn_dxt_quality            m_dxt_quality;
-   crn_dxt_compressor_type    m_dxt_compressor_type;
-   
-   crn_uint32                 m_alpha_component;
-
-   float                      m_crn_adaptive_tile_color_psnr_derating;
-   float                      m_crn_adaptive_tile_alpha_psnr_derating;
-
-   crn_uint32                 m_crn_color_endpoint_palette_size;  
-   crn_uint32                 m_crn_color_selector_palette_size;  
-
-   crn_uint32                 m_crn_alpha_endpoint_palette_size;  
-   crn_uint32                 m_crn_alpha_selector_palette_size;  
-
-   crn_uint32                 m_num_helper_threads;
-
-   crn_uint32                 m_userdata0;
-   crn_uint32                 m_userdata1;
-
-   crn_progress_callback_func m_pProgress_func;
-   void*                      m_pProgress_func_data;
-};
-```
-
-The `crn_comp_params` struct contains all parameters passed to the compressor. The caller must fill in this struct before calling `crn_compress()`. Note that some parameters/flags are relevant only when compressing to .CRN, clustered .DDS, or regular .DDS (I've tried to document all dependencies).
-
-This struct contains several simple inline methods defined in this header. The constructor calls `clear()`, and the `clear()` method sets all parameters to their defaults. The `check()` method returns true if all parameters are within reasonable/supported ranges. The `get_flag()` and `set_flag()` helpers directly manipulate the `m_flags` member.
-
-**crn\_file\_type m\_file\_type**: Default: cCRNFileTypeCRN. Output file type. May be `cCRNFileTypeCRN` or `cCRNFileTypeDDS`.
-
-**crn\_uint32 m\_faces**: Default: 1. Set to 1 to compress 2D textures, or 6 to compress cubemaps.
-
-**crn\_uint32 m\_width** and **crn\_uint32 m\_height**: Default: (0,0). The source texture's topmost (largest) mipmap dimensions in pixels. Must be in the range [1, cCRNMaxLevelResolution], non-power of 2 is OK, non-square OK. Textures that don't have dimensions divisible by 4 will be padded to the next multiple of 4.
-
-**crn\_uint32 m\_levels**: Default: 1. The source texture's total mipmap chain size, where 1 is not mipmapped. Must be in the range [1, cCRNMaxLevels].
-
-**crn\_format m\_format**: Default: cCRNFmtDXT1. Sets the output file's compressed pixel format.
-
-**crn\_uint32 m\_flags**: Default: `cCRNCompFlagPerceptual` | `cCRNCompFlagHierarchical` | `cCRNCompFlagUseBothBlockTypes`. Compressor flags logically OR'd together, see the [crn\_comp\_flags enum](API_Docs#enum_crn_comp_flags.md).
-
-
-**const crn\_uint32`*` m\_pImages`[`cCRNMaxFaces`]``[`cCRNMaxLevels`]`**: Default: All NULL. 2D array of pointers to 32bpp RGBA input images. The red component is always first in memory, independent of platform endianness.
-
-**float m\_target\_bitrate**: Default: 0. Target bitrate. If non-zero, the compressor will use an interpolative search to find the highest quality level that results in a file length that is <= the target bitrate. If it fails to find a bitrate high enough, the compressor will disable adaptive block sizes (by disabling the cCRNCompFlagHierarchical flag) and try again. This process can be pretty slow.
-
-**crn\_uint32 m\_quality\_level**: Default: cCRNMaxQualityLevel (255). Sets the desired quality level (higher=better). Must range between [cCRNMinQualityLevel, cCRNMaxQualityLevel]. Note that .CRN and .DDS quality levels are not compatible with each other from an image quality standpoint.
-
-m\_quality\_level directly controls the endpoint/selector palette sizes used by the .CRN/clustered .DDS frontends.
-
-**crn\_uint32 m\_dxt1a\_alpha\_threshold**, **crn\_dxt\_quality m\_dxt\_quality**, **crn\_dxt\_compressor\_type m\_dxt\_compressor\_type**: These parameters are only relevant when compressing to .DDS files.
-
-**crn\_uint32 m\_alpha\_component**: Default: 3. Specifies which source image component contains the alpha channel.
-
-**crn\_uint32 m\_num\_helper\_threads**: Number of helper threads to create to assist the compressor. 0=no threading. Must be in the range [0,cCRNMaxHelperThreads].
-
-**crn\_uint32 m\_userdata0**, **crn\_uint32 m\_userdata1**: Default: 0. These two 32-bit values are written directly to the header of the output .CRN file. They can be retrieved from a .CRN file by using the `crnd::crnd_get_texture_info()` helper function in `inc/crn_decomp.h`.
-
-**crn\_progress\_callback\_func m\_pProgress\_func**, **void`*`                      m\_pProgress\_func\_data**: Pointer to a user-provided progress function and user data. This function is called periodically during compression, and can be used to terminate compression before it completes.
-
-Various low-level .CRN specific parameters:
-
-**float m\_crn\_adaptive\_tile\_color\_psnr\_derating** and **float m\_crn\_adaptive\_tile\_alpha\_psnr\_derating**: Default: 2.0f PSNR. Controls how aggressively the frontend uses large (non-4x4) tiles. Higher settings result in fewer tiles, resulting in lower quality/more blockiness, but smaller files. If this value is set too high the output may become too blocky.
-
-**crn\_uint32 m\_crn\_color\_endpoint\_palette\_size**, **crn\_uint32 m\_crn\_color\_selector\_palette\_size**, **crn\_uint32 m\_crn\_alpha\_endpoint\_palette\_size**, **crn\_uint32 m\_crn\_alpha\_selector\_palette\_size**: Default: 0. These parameters allow the caller to directly control the palette sizes used by the frontend. The `cCRNCompFlagManualPaletteSizes` flag must be set.
-
-
---
-
-
-## Public Functions ##
-
-### Memory Allocation ###
-```
-#define CRNLIB_MIN_ALLOC_ALIGNMENT sizeof(size_t) * 2
-
-typedef void*  (*crn_realloc_func)(void* p, size_t size, size_t* pActual_size, bool movable, void* pUser_data);
-typedef size_t (*crn_msize_func)(void* p, void* pUser_data);
-
-void crn_set_memory_callbacks(crn_realloc_func pRealloc, crn_msize_func pMSize, void* pUser_data);
-
-void crn_free_block(void *pBlock);
-```
-
-By default, crnlib calls the usual C-API's to manage memory (`malloc`, `realloc`, `free`, etc.). Call `crn_set_memory_callbacks` to globally override this behavior. The user must implement two callbacks, one to handle block allocation/reallocation/freeing, and another that returns the size of allocated blocks.
-
-This function is not thread safe, so don't call it while another thread is inside the library.
-
-The custom realloc and msize functions must be implemented in a thread safe manner. These functions can be called from multiple threads when threaded compression is enabled.
-
-All block pointers returned by the realloc callback must be aligned to at least `CRNLIB_MIN_ALLOC_ALIGNMENT` bytes.
-
-### realloc callback ###
-
-The custom reallocation function callback `crn_realloc_func` must examine its input parameters to determine the caller's actual intent. If the input pointer `p` is NULL, the caller wants to allocate a block which must be at least as large as `size`. NULL is returned if the allocation fails.
-
-If `p` is not NULL but `size` is 0, the caller wants to free the block pointed to by `p`.
-
-Otherwise, the caller wants to attempt to change the size of the block pointed to by `p`. In this case, if `movable` is true, it is acceptable to physically move the block to satisfy the reallocation request. If `movable` is false, the block **must not** be moved. NULL is returned if reallocation fails for any reason. In this case, the original allocated block must remain allocated.
-
-If `pActual_size` is not NULL, `*pActual_size` should be set to the actual size of the returned block.
-
-### crn\_free\_block function ###
-
-Call this function to free the memory blocks allocated and returned by `crn_compress()`, `crn_decompress_crn_to_dds()`, or `crn_decompress_dds_to_images()`.
-
-## Compression ##
-
-### crn\_compress functions (overloaded) ###
-```
-void *crn_compress(const crn_comp_params &comp_params, 
-  crn_uint32 &compressed_size, crn_uint32 *pActual_quality_level = NULL, float *pActual_bitrate = NULL);
-
-void *crn_compress(const crn_comp_params &comp_params, const crn_mipmap_params &mip_params, 
-  crn_uint32 &compressed_size, crn_uint32 *pActual_quality_level = NULL, float *pActual_bitrate = NULL);
-```
-
-These functions compress a 32-bit/pixel texture to either: a regular DX9-style .DDS file, a "clustered" (or reduced entropy) .DDS file, or a .CRN file in memory.
-
-This function is overloaded. The first variant cannot automatically generate mipmap levels, and the second one can.
-
-Input parameters:
-
-  * **comp\_params** is the [compression parameters struct](API_Docs#enum_crn_comp_params.md).
-
-  * **compressed\_size** will be set to the size of the returned memory block containing the output file. The returned block must be freed by calling `crn_free_block()`.
-
-  * **`*`pActual\_quality\_level** will be set to the actual quality level used to compress the image. May be NULL.
-
-  * **`*`pActual\_bitrate** will be set to the output file's effective bitrate, possibly taking into account LZMA compression. May be NULL.
-
-Return values:
-> A pointer to the compressed file data, or NULL on failure. The returned block must be freed by calling `crn_free_block()`. The **compressed\_size** parameter will be set to the size of the returned memory buffer.
-
-Notes:
-  * A "regular" .DDS file is compressed using normal (plain block by block) DXTn compression at the specified DXT quality level, using multiple threads if threading is enabled.
-  * A "clustered" DDS file is compressed using clustered DXTn compression to either the target bitrate or the specified integer quality factor.
-  * The output file is a standard DX9 format DDS file, except the compressor assumes you will be later losslessly compressing the DDS output file using the LZMA algorithm.
-  * A texture is defined as an array of 1 or 6 "faces" (6 faces=cubemap), where each "face" consists of between [1,cCRNMaxLevels] mipmap levels.
-  * Mipmap levels are simple 32-bit 2D images with a pitch of width\*sizeof(uint32), arranged in the usual raster order (top scanline first). Each pixel is arranged in memory as [R,G,B,A], where R is always first independent of platform endianness.
-  * The image pixels may be grayscale (YYYX), grayscale/alpha (YYYA), 24-bit RGBX, or 32-bit RGBA colors (where "X"=don't care).
-  * If the input is not sRGB, be sure to clear the `cCRNCompFlagPerceptual` flag in the [crn\_comp\_params](API_Docs#enum_crn_comp_params.md) struct.
-
-For a usage example, see [example1.cpp](http://code.google.com/p/crunch/source/browse/trunk/example1/example1.cpp).
-
-## Transcoding ##
-
-### crn\_decompress\_crn\_to\_dds function ###
-```
-void *crn_decompress_crn_to_dds(const void *pCRN_file_data, crn_uint32 &file_size);
-```
-
-`crn_decompress_crn_to_dds()` transcodes an entire .CRN file to .DDS using the [inc/crn\_decomp.h](http://code.google.com/p/crunch/source/browse/trunk/inc/crn_decomp.h) header file library to do most of the heavy lifting. The output .DDS file's format is guaranteed to be one of the DXTn formats in the `crn_format` enum. This is a very fast operation, because the .CRN format is explicitly designed to be efficiently transcodable to DXTn.
-
-For more control over decompression (particularly over memory management, and to implement palette caching), see the lower-level helper functions in [inc/crn\_decomp.h](http://code.google.com/p/crunch/source/browse/trunk/inc/crn_decomp.h), which do not depend at all on crnlib.
-
-For a usage example, see [example1.cpp](http://code.google.com/p/crunch/source/browse/trunk/example1/example1.cpp).
-
-## Decompressing ##
-
-### crn\_decompress\_dds\_to\_images function ###
-```
-struct crn_texture_desc
-{
-   crn_uint32 m_faces;
-   crn_uint32 m_width;
-   crn_uint32 m_height;
-   crn_uint32 m_levels;
-   crn_uint32 m_fmt_fourcc; // Same as crnlib::pixel_format
-};
-bool crn_decompress_dds_to_images(const void *pDDS_file_data, crn_uint32 dds_file_size, 
-  crn_uint32 **ppImages, crn_texture_desc &tex_desc);
-
-void crn_free_all_images(crn_uint32 **ppImages, const crn_texture_desc &desc);
-```
-
-`crn_decompress_dds_to_images()` decompresses an entire .DDS file in any supported compressed/uncompressed pixel format to one or more uncompressed 32-bit/pixel images. See the crnlib::pixel\_format enum in [inc/dds\_defs.h](http://code.google.com/p/crunch/source/browse/trunk/inc/dds_defs.h) for a list of the supported .DDS pixel formats.
-
-The caller is responsible for freeing each returned image, either by calling `crn_free_all_images()` or by manually calling `crn_free_block()` on each image pointer.
-
-For a usage example, see [example1.cpp](http://code.google.com/p/crunch/source/browse/trunk/example1/example1.cpp).
-
-## DXTn Block Compression ##
-
-```
-typedef void *crn_block_compressor_context_t;
-
-crn_block_compressor_context_t crn_create_block_compressor(const crn_comp_params &params);
-
-void crn_compress_block(crn_block_compressor_context_t pContext, const crn_uint32 *pPixels, void *pDst_block);
-
-void crn_free_block_compressor(crn_block_compressor_context_t pContext);
-```
-
-These functions allow the caller to compress 4x4 pixel image blocks to any non-swizzled DXTn format supported by crnlib: DXT1, DXT3, DXT5, DXT5A, DXN\_XY and DXN\_YX (basically BC1-BC5). For a usage example, see [example3.cpp](http://code.google.com/p/crunch/source/browse/trunk/example3/example3.cpp).
-
-Unlike most other DXTn block compressors (such as ATI\_Compress or squish) crnlib's is stateful, so for efficient usage you should call `crn_create_block_compressor()` to create a state object and reuse it as many times as possible. (If you're curious, the state consists of an endpoint cache, and a bunch of heap memory used by the compressor for temporary arrays.) Don't call `crn_create_block_compressor()` once for each block to compress, or performance will be dreadful.
-
-crnlib's DXTn endpoint optimizer actually supports any number of source pixels (i.e. from 1 to thousands, not just 16), but for simplicity this API currently only supports 4x4 texel blocks.
-
-`crn_compress_block()` is thread safe (it may be called in parallel from multiple threads), as long as each thread uses its own state context.
-
-## Helper Functions ##
-
-crnlib exposes a number of straightforward functions to convert the crn-related enums defined above to ANSI/Unicode strings and back. There are also functions to retrieve various bits of info about the supported pixel formats.
-
-They don't seem worth individually listing here, just see the [inc/crnlib.h](http://code.google.com/p/crunch/source/browse/trunk/inc/crnlib.h).
@@ -1,34 +0,0 @@
-# Building #
-
-## Windows ##
-
-`crn.2008.sln` builds crnlib and the command line tool, and `crn_examples.2008.sln` builds the examples. Both are Visual Studio 2008 (VC9) solution files containing projects for Win32 and x64.
-
-crnlib and crunch have also been built with VS2005, VS2010, and gcc 4.5.0 ([TDM GCC+MinGW](http://tdm-gcc.tdragon.net/)).  A Codeblocks 10.05 workspace is also included (but building crnlib this way hasn't been tested a whole lot - it mostly exists to make porting to Linux using gcc a little easier).
-
-## Linux ##
-
-A simple makefile to build only the crunch executable is in crnlib/Makefile. I've only built/tested under 32-bit Ubuntu 12.04, however 64-bit should be easy to get working with minimal tweaks. Alternately, you can use the Codeblocks v10.05 Linux workspace in "crn\_linux.workspace".
-
-**Important**: When compiling with gcc, be sure to use **-fno-strict-aliasing** otherwise crnlib will randomly misbehave. This also applies to the transcoder library in [inc/crn\_decomp.h](http://code.google.com/p/crunch/source/browse/trunk/inc/crn_decomp.h).
-
-## [example1](http://code.google.com/p/crunch/source/browse/trunk/example1/example1.cpp) ##
-Demonstrates how to use crnlib's high-level C-helper
-compression/decompression/transcoding functions in [inc/crnlib.h](http://code.google.com/p/crunch/source/browse/trunk/inc/crnlib.h). It's a
-fairly complete example of crnlib's functionality.
-
-## [example2](http://code.google.com/p/crunch/source/browse/trunk/example2/example2.cpp) ##
-Shows how to transcode .CRN files to .DDS using **only**
-the functionality in [inc/crn\_decomp.h](http://code.google.com/p/crunch/source/browse/trunk/inc/crn_decomp.h). It does not link against
-crnlib.lib or depend on it in any way. (Note: The complete source code,
-approx. 4800 lines, to the CRN transcoder is included in [inc/crn\_decomp.h](http://code.google.com/p/crunch/source/browse/trunk/inc/crn_decomp.h).)
-
-example2 is intended to show how simple it is to integrate CRN textures
-into your application.
-
-## [example3](http://code.google.com/p/crunch/source/browse/trunk/example3/example3.cpp) ##
-Shows how to use the regular, low-level DXTn block compressor
-functions in [inc/crnlib.h](http://code.google.com/p/crunch/source/browse/trunk/inc/crn_decomp.h). This functionality is included for
-completeness. (Your engine or toolchain most likely already has its own
-DXTn compressor. crnlib's compressor is very competitive to most available closed and open source CPU-based
-compressors.)
@@ -0,0 +1,17 @@
+# Change Log
+
+## [0.1.4] - 2012-11-24
+### Added
+* KTX file format
+* Basic ETC1 support
+* Simple makefile
+
+### Fixed
+* Various DDS format fixes
+
+## [0.1.3] - 2012-04-26
+### Added
+* Ported to Linux (tested on Ubuntu x86 w/Codeblocks). Note that a few features of the cmd line tool don't work yet (eg. -timestamp)
+
+[0.1.4]: https://github.com/BinomialLLC/crunch
+[0.1.3]: https://github.com/BinomialLLC/crunch
@@ -1,23 +0,0 @@
-# Known Issues/Bugs #
-
-  * .DDS files written by crunch v1.00 (and output by crnlib v1.00) don't have the pitch/linearsize fields set in the DDS header. Some DDS readers expect valid values here (and expect the DDSD\_LINEARSIZE flag to be set too). This should be fixed in v1.01.
-
-  * You really should provide crnlib with raw, 24/32-bit source textures. Don't provide it with second generation textures that have already been DXTn or JPEG compressed. Of course, you can do so, and I know of one company doing this to repackage existing assets (without source art) so they download more quickly, but obviously don't expect the highest quality.
-
-> crnlib's custom DXT1 endpoint optimizer can detect pixel blocks which have been previously compressed to DXT1 using another DXTn compression library. It attempts to derive the endpoints originally used to compress these blocks in order to reduce artifacts, but it's not always successful.
-
-  * crnlib currently assumes you'll be further losslessly compressing its output .DDS files using LZMA. However, some engines use weaker codecs such as LZO, zlib, etc., so crnlib's bitrate measurements will be inaccurate. It should be easy to allow the caller to plug-in custom lossless compressors for bitrate measurement.
-
-  * Compressing to a desired bitrate can be very (to extremely) time consuming, especially when processing large (2k or 4k) images to the .CRN format. There are several high-level optimizations employed when compressing to clustered DXTn .DDS files using multiple trials, but not so for .CRN.
-
-> The current approach compresses the input image multiple times, using an [interpolation search](http://en.wikipedia.org/wiki/Interpolation_search) to find the quality level index that gets closest to the target bitrate. The lib does have some functionality to save the closest quality level found for later runs, but the command line tool doesn't expose this feature yet.
-
-  * The .CRN compressor doesn't use 3 color (transparent) DXT1 blocks at all, only 4 color blocks. (Supporting both block types would be a major pain at this point.) So it doesn't support DXT1A transparency, and its output quality suffers a little due to this limitation. (Note that the clustered DXTn compressor does not have this limitation.)
-
-  * DXT3 is not supported when writing .CRN or clustered DXTn DDS files. (DXT3 is supported by crnlib when compressing to regular DXTn DDS files.) You'll get DXT5 files if you request DXT3. However, DXT3 is supported by the regular DXTn block compressor.
-
-  * The DXT5\_CCXY format uses a simple YCoCg encoding that seems workable but hasn't been tuned for max. quality yet.
-
-  * Ignore the SSIM statistics printed when using the -imagestats option - it's currently bogus. I've been tuning the codec using PSNR/RMSE so far.
-
-  * The crn\_decomp.h header file library is freaking huge (~4800 lines). It would be nice to port it to C and shrink it.
@@ -1,19 +0,0 @@
-# crunch/crnlib's license #
-
-crnlib uses the (very permissive) open source ZLIB license:
-
-[http://opensource.org/licenses/Zlib](http://opensource.org/licenses/Zlib)
-
-License text from [crnlib.h](http://code.google.com/p/crunch/source/browse/trunk/inc/crnlib.h):
-
-Copyright (c) 2010-2012 Rich Geldreich and Tenacious Software LLC
-
-This software is provided 'as-is', without any express or implied warranty. In no event will the authors be held liable for any damages arising from the use of this software.
-
-Permission is granted to anyone to use this software for any purpose, including commercial applications, and to alter it and redistribute it freely, subject to the following restrictions:
-
-1. The origin of this software must not be misrepresented; you must not claim that you wrote the original software. If you use this software in a product, an acknowledgment in the product documentation would be appreciated but is not required.
-
-2. Altered source versions must be plainly marked as such, and must not be  misrepresented as being the original software.
-
-3. This notice may not be removed or altered from any source distribution.
@@ -1,244 +0,0 @@
-**crunch** is an open source ([ZLIB license](http://www.opensource.org/licenses/Zlib)) lossy texture compression library and command line compression tool for developers that distribute and use
-content in the [DXT1/5/N](http://en.wikipedia.org/wiki/S3_Texture_Compression) or [3DC/BC5](http://en.wikipedia.org/wiki/3Dc) compressed [mipmapped](http://en.wikipedia.org/wiki/Mipmap) GPU texture formats. It consists of a command line tool named "crunch", a compression library named "crnlib", and a single-header file, completely stand alone .CRN->DXTc transcoder C++ class located in [inc/crn\_decomp.h](http://code.google.com/p/crunch/source/browse/trunk/inc/crn_decomp.h). crnlib's results are competitive to transform based recompression approaches, as shown [here](http://code.google.com/p/crunch/wiki/Stats).
-
-If you're going to SIGGRAPH this year, Brandon Jones is going to be showing crunch at the WebGL BoF (Birds of a Feather) event: [Brandon Jones: Crunch/DXT/Rage demo](http://www.khronos.org/news/events/siggraph-los-angeles-2012). Wednesday, August 8, 4-5pm, JW Marriott Los Angeles at LA Live, Gold Ballroom – Salon 3
-
-For background info/history of crunch: [blog post](http://richg42.blogspot.com/2012/07/doug-has-updated-his-blog-hes-now.html) (or the original post [here](http://richg42.blogspot.com/2012/07/the-saga-of-crunch.html)).
-
-
---
-
-
-## Technical Summary ##
-
-crnlib can compress mipmapped 2D textures and cubemaps to
-approximately .8-1.25 bits/texel, and normal maps to 1.75-2 bits/texel with reasonable quality (comparable or better than JPEG followed by real-time or even offline DXTc compression). The actual bitrate is indirectly controllable using an integer quality factor (like JPEG), or directly by specifying a target bitrate. crnlib implements a form of "clustered DXTn" compression, which is ultimately bounded by the quality achievable by DXTn itself. (DXTn's quality is actually [pretty low](http://cbloomrants.blogspot.com/2008/11/11-18-08-dxtc-part-2.html), but it's directly supported by Direct3D and OpenGL, and in hardware by practically every PC/console GPU.)
-
-The approach used by crnlib differs significantly from [other approaches](http://www.intel.com/jp/software/pix/324337_324337.pdf), such as using JPEG decompression followed by compression using a real-time DXTn compressor. Its compressed texture data format was carefully designed to be quickly transcodable directly to DXTn with no intermediate recompression step.
-The single threaded transcode to DXTn rate is approximately 100 (DXT5/3DC) and 250 (DXT5A/DXT1) megatexels/sec. (Core i7 2.6 GHz). Fast random access to individual mipmap levels is supported. No pixel-level operations are performed during transcoding. The core transcode loops operate at the 4x4 block or 8x8 macroblock level.
-
-crnlib can also generate standard .DDS files that, when losslessly post-compressed using LZMA/Deflate/LZO/etc., result in much smaller compressed files. (This is effectively a form of [rate-distortion optimization](http://en.wikipedia.org/wiki/Rate%E2%80%93distortion_optimization) applied to DXT+LZMA.) This feature allows easy integration into any engine or graphics library that already supports .DDS files and applies some form of lossless post-compression to the DXTn bits stored in those files (most engines do). Here's a Windows app that demonstrates this capability: [DDSExport](http://sites.google.com/site/richgel99/ddsexport).
-
-The .CRN file format supports BC1-BC5, corresponding to the following DXTn texture formats: DXT1 (but not DXT1A), DXT5, DXT5A, and DXN/ATI\_3DC (either XY or YX component order).
-
-The library also supports several popular swizzled variants, typically used for normal maps (several are supported by [AMD's Compressonator](http://developer.amd.com/tools/compressonator/pages/default.aspx)):
-DXT5\_XGBR, DXT5\_xGxR, DXT5\_AGBR, and DXT5\_CCxY (experimental luma-chroma YCoCg).
-
-crnlib currently compiles under Linux (using gcc, currently only x86 but x64 support should be easy), and Windows (both x86 and x64) using Visual Studio 2008/2010. It also compiles and has been minimally tested with Codeblocks 10.05 using [TDM-GCC x64/MinGW](http://tdm-gcc.tdragon.net/) under Windows.
-
-crnlib also contains some other possibly useful bits of code, like a multithreaded version of my [image resampler](http://code.google.com/p/imageresampler/) class, and my fast symbol\_codec class.
-
-
---
-
-
-## Upcoming Release ##
-
-Planned features for v1.05 as of 11/25/12.
-
-  * Continue experimenting with PVRTC: Implement a PVRTC decompressor, then a basic compressor.
-  * Add support for "raw" CRN files, assuming the user will post compress using gzip (useful for Javascript/WebGL apps). This is my highest priority after releasing v1.04.
-  * Now that miniz is in the project, add support for DXTc+ZLIB rate distortion optimization, instead of just DXTc+LZMA.
-  * Figure out the most elegant way to add support for writing 555, 565, and 4444 .DDS/.KTX textures (useful for mobile).
-  * Compile with LLVM
-  * Compile and test under 64-bit Linux (I only have 32-bit installed right now). Improve makefile.
-  * Add rate distortion optimization for .KTX files
-
-
---
-
-
-## Release History ##
-
-  * v1.04 (SVN trunk) - Nov. 25, 2012: Currently only checked into SVN trunk:
-    * Added "-fno-strict-aliasing" gcc compiler option, otherwise crnlib randomly crashes in weird spots.
-    * Fixed various DDS reader problems.
-    * Better Linux support. Added makefile with proper command line options, see crnlib/Makefile (thanks alonzakai). Modified gcc compiler options used by Codeblocks Linux projects.
-    * Basic ETC1 support - vanilla 4x4 block packing/unpacking. (More or less complete - see [rg\_etc1](http://code.google.com/p/rg-etc1/).) No support for rate distortion optimized or .CRN ETC1 files, though, just vanilla block by block ETC1.
-    * .KTX file format reading/writing. The .KTX file format is not well supported by any tools yet - I've tested crnlib's KTX writer as best as I can with what's available.
-    * Low-level support for reading/writing/flipping/unflipping Y flipped textures in all possible formats (useful to OpenGL/OpenGL ES devs) (more or less complete).
-    * Integrate miniz and jpeg-compressor to crnlib so crunch can write PNG's and read progressive JPEG's without adding messy external dependencies (completed).
-    * Fixed assertion problems in crn\_threading's "task\_pool" class.
-
-  * v1.03 (SVN tags/v103) - Apr. 26, 2012: Currently only checked in to SVN trunk until I finish the Linux port and fully regression test the codec. If you would like to give the Linux port a spin, you can download prebuilt binaries of v1.03 for 32-bit Linux/Win32/x86 [here](http://www.tenacioussoftware.com/crunch_v103_prerelease_win_linux_execs.7z) (or just build them yourself using Codeblocks v10.05).
-  * v1.02 - Apr. 22, 2012: Full Linux port of crnlib and crunch for Evan Parker to test at Google. Lots of files modified: Got rid of all wchar\_t usage (wasn't worth the effort to port), now using LZHAM's more cross platform multithreading/threadpool code, added platform independent file and directory I/O wrappers. Also, I optimized the task\_pool "join" method a bit (it now uses a semaphore compared to spinning with sleep(1) while waiting for the workers to finish).
-  * v1.01 - Apr. 15, 2012: DDS reader/writer fixes, -adding -usesourceformat command line option, merged over a few minor fixes from the ddsexport branch. Thanks to the devs at [The Happy Cloud](http://www.thehappycloud.com/) for reporting the DDS header problem.
-  * v1.00 - Dec. 27, 2011: Initial release
-
-
---
-
-
-## Applications Using crunch ##
-
-Jean Sabatier reports that the [Fly! Legacy](http://fly.simvol.org/indexus.php) open source flight simulator is using a database of ~30,000 DXT5 .CRN textures as part of its texture streaming system.
-
-[Planetside 2](http://www.planetside2.com/) is using crunch's rate distortion optimized (or "clustered") DXTc compressor and the [LZHAM lossless codec](http://code.google.com/p/lzham/) for most of its texture assets, which greatly reduces the title's download time.
-
-[Evan Parker](http://plus.google.com/104261567553968048744) at Google has compiled the CRN->DXTc transcoder header file library [inc/crn\_decomp.h](http://code.google.com/p/crunch/source/browse/trunk/inc/crn_decomp.h) to Javascript using [Emscripten](https://github.com/kripken/emscripten/wiki) (here's his [post](http://plus.google.com/104261567553968048744/posts/28jEPHtuhq5) with details). This allows him to quickly transcode CRN compressed images/textures directly to DXTc in Javascript (which is ~10x faster than decoding JPG's and packing to DXTc). Here's a [demo](http://www-cs-students.stanford.edu/~eparker/files/crunch/decode_test.html) (needs the latest Chrome beta with DXT texture support to fully function), and here's more [technical info](http://www-cs-students.stanford.edu/~eparker/files/crunch/more_info.html).
-
-Brandon Jones has tested the Javascript emscripten port of crn\_decomp.h and reported his results [here](http://plus.google.com/101501294230020638079/posts/KJ42NGorLTj).
-
-I believe the first shipping product to use crunch/crnlib compressed textures is ["Zombie Track Meat"](http://toucharcade.com/2012/03/10/gdc-2012-a-look-at-the-zombie-track-meat-collaboration/), a free to play NaCL game in the Chrome App Store [here](http://chrome.google.com/webstore/detail/jmfhnfnjfdoplkgbkmibfkdjolnemfdk). More technical info [here](http://fuzzycube.blogspot.com/2012/04/zombie-track-meat-post-mortem.html). ZTM uses .CRN compressed textures and the CRN->DXTc real-time transcoder library.
-
-If you use crunch/crnlib, I would greatly appreciate it if you sent me an email with any feedback, or info on how you're using it in practice. (Credits somewhere would also be much appreciated, but are not required.)
-
-
---
-
-
-## Documents ##
-
-  * [Various RMSE statistics (charts, graphs, etc.), and an example image (kodim14) compressed at various bitrates](http://code.google.com/p/crunch/wiki/Stats)
-  * [Building the examples](http://code.google.com/p/crunch/wiki/Building)
-  * [crnlib API Documentation](http://code.google.com/p/crunch/wiki/API_Docs)
-  * [Supported file formats](http://code.google.com/p/crunch/wiki/SupportedFormats)
-  * [Known problems](http://code.google.com/p/crunch/wiki/KnownProblems)
-  * [Technical details](http://code.google.com/p/crunch/wiki/TechnicalDetails) is a high level description of the CRN data format, the CRN->DXTn transcoding process, and how the current compressor works.
-  * Here's an external website showing the quality achievable with an early version of the lib (called hx/hxc at the time) at various palette (quality) settings: [Kodak test images](http://www.tenacioussoftware.com/hx/kodak/)
-
-
---
-
-
-## Recommended Software ##
-
-[AMD's Compressonator](http://developer.amd.com/gpu/compressonator/pages/default.aspx) tool is recommended to view the .DDS files created by the crunch tool and the included example projects.
-
-Note: Some of the funky swizzled DXTn .DDS output formats (such as DXT5\_xGBR)
-read/written by the crunch tool or examples deviate from the DX9 DDS
-standard, so DXSDK tools such as DXTEX.EXE won't load them at all or
-they won't be properly displayed. AMD's tool can view these files.
-
-
---
-
-
-## Creating Compressed Textures from the Command Line (crunch.exe) ##
-
-The simplest way to create compressed textures using crnlib is to
-integrate the bin\crunch.exe (or bin\crunch\_x64.exe) command line tool
-into your texture build toolchain or export process. It can write DXTn
-compressed 2D/cubemap textures to regular DXTn compressed .DDS,
-clustered (or reduced entropy) DXTn compressed .DDS, or .CRN files. It
-can also transcode or decompress files to several standard image
-formats, such as TGA or BMP. Run crunch.exe with no options for help.
-
-The .CRN files created by crunch.exe can be efficiently transcoded to
-DXTn using the stand-alone CRN transcoding header file library located in `inc/crn_decomp.h`.
-
-Here are a few example crunch.exe command lines:
-
-1. Compress blah.tga to blah.dds using normal DXT1 compression:
-
-`crunch -file blah.tga -fileformat dds -dxt1`
-
-2. Compress blah.tga to blah.dds using clustered DXT1 at an effective bitrate of 1.5 bits/texel (after the .DDS file is post-compressed using LZMA), display image statistic:
-
-`crunch -file blah.tga -fileformat dds -dxt1 -bitrate 1.5 -imagestats`
-
-3. Compress blah.tga to blah.dds using clustered DXT1 at quality level 100 (from [0,255]), with no mipmaps, display LZMA statistics:
-
-`crunch -file blah.tga -fileformat dds -dxt1 -quality 100 -mipmode none -lzmastats`
-
-3. Compress blah.tga to blah.crn using clustered DXT1 at a bitrate of 1.2 bits/texel, no mipmaps:
-
-`crunch -file blah.tga -dxt1 -bitrate 1.2 -mipmode none`
-
-4. Decompress blah.dds to a .tga file:
-
-`crunch -file blah.dds -fileformat tga`
-
-5. Transcode blah.crn to a .dds file:
-
-`crunch -file blah.crn`
-
-6. Decompress blah.crn, writing each mipmap level to a separate .tga file:
-
-`crunch -split -file blah.crn -fileformat tga`
-
-crunch.exe can do a lot more, like rescale/crop images before
-compression, convert images from one file format to another, compare
-images, process multiple images, etc.
-
-
---
-
-
-## Using crnlib ##
-
-The most flexible and powerful way of using crnlib is to integrate the
-library into your editor/toolchain/etc. and directly supply it your
-raw/source texture bits. See the C-style API's and comments in
-[inc/crnlib.h](http://code.google.com/p/crunch/source/browse/trunk/inc/crnlib.h).
-
-To compress, #include "crnlib.h", fill in the `crn_comp_params` struct, and call one function:
-
-```
-  void *crn_compress(const crn_comp_params &comp_params, crn_uint32 &compressed_size,   
-        crn_uint32 *pActual_quality_level = NULL, float *pActual_bitrate = NULL);
-```
-
-The returned pointer will be NULL on failure, or a pointer to the .CRN or .DDS file data.
-
-Or, if you want crnlib to also generate mipmaps, you call this function:
-
-```
-  void *crn_compress(const crn_comp_params &comp_params, const crn_mipmap_params &mip_params, 
-        crn_uint32 &compressed_size, crn_uint32 *pActual_quality_level = NULL, float *pActual_bitrate = NULL);
-```
-
-You can also transcode/uncompress .DDS/.CRN files to raw 32bpp images
-using `crn_decompress_crn_to_dds()` and `crn_decompress_dds_to_images()`.
-
-Internally, crnlib just uses `inc/crn_decomp.h` to transcode textures to
-DXTn. If you only need to transcode .CRN format files to raw DXTn bits
-at runtime (and not compress), you don't actually need to compile or
-link against crnlib at all. Just include inc/crn\_decomp.h, which
-contains a completely self-contained CRN transcoder in the "crnd"
-namespace. The `crnd_get_texture_info()`, `crnd_unpack_begin()`,
-`crnd_unpack_level()`, etc. functions are all you need to efficiently get
-at the raw DXTn bits, which can be directly supplied to whatever API or
-GPU you're using. (See example2.)
-
-
---
-
-
-## Related Links ##
-
-I'm not aware of any other open source libraries that solve this problem in a usable, "out of the box" manner yet, but these links are interesting:
-
-  * [Experiments in Luma-Optimized and Mipmapped DXT1 Compression](http://sites.google.com/site/richgel99/luma_chroma_texture_compression)
-  * [ddsexport](http://sites.google.com/site/richgel99/ddsexport) - A GUI demo of crnlib's ability to create rate-distortion optimized DDS textures.
-  * Strom and Wennersten, [Lossless Compression of Already Compressed Textures](http://www.jacobstrom.com/publications/StromWennerstenHPG2011.pdf)
-  * van Waveren, [Real-Time DXT Compression](http://www.intel.com/jp/software/pix/324337_324337.pdf)
-  * Excellent public domain real-time DXTn compressor: [rygDXT](http://www.farb-rausch.de/~fg/code/) and [stb\_dxt.h](http://nothings.org/stb/stb_dxt.h)
-  * [Charles Bloom's various blog posts on DXT compression](http://cbloomrants.blogspot.com/2009/06/06-17-09-dxtc-more-followup.html)
-  * [FastDXT](http://www.evl.uic.edu/cavern/fastdxt/)
-  * [Spiro's DXT Compression Algorithm Experiments](http://lspiroengine.com/?p=260)
-  * [LSDxt DXT](http://lspiroengine.com/?p=516) - Spiro's texture compression tool
-  * [Variable Bit Rate GPU Texture Compression](http://www.csee.umbc.edu/~olano/papers/#texcompress)
-  * [libsquish](http://code.google.com/p/libsquish/) - Open source (MIT license) DXT compression library.
-  * [Super Simple Texture Compression](http://github.com/divVerent/s2tc/wiki) - Alternative DXT1-compatible compression method that is purposely limited to a subset of DXT1 (only uses 2 colors per block, and is effectively only 1 bit per selector). The [Quality Comparison](http://github.com/divVerent/s2tc/wiki/QualityComparison) page is interesting.
-
-
---
-
-
-## Special Thanks ##
-
-Thanks to [Colt McAnlis](https://plus.google.com/105062545746290691206/posts) at Google for porting the CRN->DXT transcoder library (in crn\_decomp.h) to Native Client, and for mentioning crunch in his [GDC presentation](http://www.youtube.com/watch?v=7bJ-D1xXEeg) on his texture compression R&D.
-
-Some portions of this software make use of public domain code
-originally written by Igor Pavlov (LZMA), RYG's public domain real-time DXTn compressor, and stb\_image.c from [Sean Barrett](http://nothings.org/).
-
-Many thanks to Violet Koppel for funding much of crnlib's development in 2009. Also, thanks to Colt again, and John Brooks at [Blue Shift, Inc.](http://www.blueshiftinc.com/) for helping test and giving feedback on crnlib. Also thanks to Charles Bloom's informative [blog posts](http://cbloomrants.blogspot.com/2008/11/11-18-08-dxtc.html) on his work on DXT compression.
-
-
---
-
-
-## Support Contact ##
-
-For any questions or problems with this software please **email** [Rich Geldreich](http://www.mobygames.com/developer/sheet/view/developerId,190072/) at <richgel99 _at_ gmail _dot_ com>. Here's my [twitter page](http://twitter.com/#!/richgel999).
@@ -0,0 +1,307 @@
+crunch/crnlib v1.04 - Advanced DXTn texture compression library
+Copyright (C) 2010-2017 Richard Geldreich, Jr. and Binomial LLC http://binomial.info 
+
+For bugs or support contact Binomial <info@binomial.info>.
+
+This software uses the ZLIB license, which is located in license.txt.
+http://opensource.org/licenses/Zlib
+
+Portions of this software make use of public domain code originally
+written by Igor Pavlov (LZMA), RYG (crn_ryg_dxt*), and Sean Barrett (stb_image.c).
+
+If you use this software in a product, an acknowledgment in the product 
+documentation would be highly appreciated but is not required.
+
+Note: crunch originally lived on Google Code: https://code.google.com/p/crunch/
+
+## Overview
+
+crnlib is a lossy texture compression library for developers that ship
+content using the DXT1/5/N or 3DC compressed color/normal map/cubemap
+mipmapped texture formats. It was written by the same author as the open
+source [LZHAM compression library](http://code.google.com/p/lzham/).
+
+It can compress mipmapped 2D textures and cubemaps to
+approx. 1-1.25 bits/texel, and normal maps to 1.75-2 bits/texel. The
+actual bitrate depends on the complexity of the texture itself, the
+specified quality factor/target bitrate, and ultimately on the desired
+quality needed for a particular texture. 
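
To put these bitrates in context: plain DXT1 stores 4 bits/texel (8 bytes per 4x4 block) and DXT5 stores 8 bits/texel. The sketch below is only illustrative arithmetic, not crnlib code; the helper names `total_texels` and `bytes_at_bitrate` are hypothetical and appear nowhere in the library.

```cpp
#include <cassert>
#include <cstdint>

// Total texel count for a width x height texture. With mipmaps enabled, each
// level halves both dimensions (clamped to 1) until the 1x1 level is reached.
inline uint64_t total_texels(uint32_t width, uint32_t height, bool with_mips) {
  assert(width > 0 && height > 0);
  uint64_t total = 0;
  for (;;) {
    total += static_cast<uint64_t>(width) * height;
    if (!with_mips || (width == 1 && height == 1)) break;
    if (width > 1) width /= 2;
    if (height > 1) height /= 2;
  }
  return total;
}

// Approximate payload size in bytes at a given bitrate in bits/texel.
// File headers and block padding are ignored (an assumption of this sketch).
inline uint64_t bytes_at_bitrate(uint64_t texels, double bits_per_texel) {
  assert(bits_per_texel >= 0.0);
  return static_cast<uint64_t>(texels * bits_per_texel / 8.0);
}
```

For a single 1024x1024 level, `bytes_at_bitrate(total_texels(1024, 1024, false), 4.0)` gives the 512 KiB of raw DXT1 data, while the same call with 1.25 bits/texel gives 160 KiB.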
+
+crnlib differs significantly from other approaches because its
+compressed texture data format was carefully designed to be quickly
+transcodable directly to DXTn with no intermediate recompression step.
+The typical (single threaded) transcode to DXTn rate is generally
+between 100-250 megatexels/sec. The current library supports PC
+(Win32/x64) and Xbox 360. Fast random access to individual mipmap levels
+is supported.
+
+crnlib can also generate standard .DDS files at a specified quality
+setting, which results in files that are much more compressible by
+LZMA/Deflate/etc. compared to files generated by standard DXTn texture
+tools (see below). This feature allows easy integration into any engine
+or graphics library that already supports .DDS files.
+
+The .CRN file format supports the following core DXTn texture formats:
+DXT1 (but not DXT1A), DXT5, DXT5A, and DXN/3DC
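
The DXT1 versus DXT1A distinction comes from how a decoder interprets the two 16-bit RGB565 endpoints stored in each 8-byte block. The sketch below follows the standard S3TC/BC1 block layout and is not taken from crnlib; the names `Rgba`, `expand565`, `blend`, and `dxt1_palette` are hypothetical, and the exact rounding of interpolated colors varies between decoders.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

struct Rgba { uint8_t r, g, b, a; };

// Expand a packed RGB565 color to 8 bits per channel by bit replication.
inline Rgba expand565(uint16_t c) {
  const unsigned r = (c >> 11) & 31u, g = (c >> 5) & 63u, b = c & 31u;
  return Rgba{ static_cast<uint8_t>((r << 3) | (r >> 2)),
               static_cast<uint8_t>((g << 2) | (g >> 4)),
               static_cast<uint8_t>((b << 3) | (b >> 2)), 255 };
}

// Per-channel weighted blend (wa*a + wb*b) / (wa + wb), truncating, opaque.
inline Rgba blend(Rgba a, Rgba b, unsigned wa, unsigned wb) {
  const unsigned d = wa + wb;
  assert(d > 0);
  return Rgba{ static_cast<uint8_t>((wa * a.r + wb * b.r) / d),
               static_cast<uint8_t>((wa * a.g + wb * b.g) / d),
               static_cast<uint8_t>((wa * a.b + wb * b.b) / d), 255 };
}

// Build the 4-entry palette that the 2-bit selectors of a block index into.
inline std::array<Rgba, 4> dxt1_palette(uint16_t c0, uint16_t c1) {
  const Rgba a = expand565(c0), b = expand565(c1);
  if (c0 > c1) {
    // 4-color mode: two interpolated colors at 1/3 and 2/3.
    return {{ a, b, blend(a, b, 2, 1), blend(a, b, 1, 2) }};
  }
  // 3-color mode (the DXT1A case): one midpoint plus transparent black.
  return {{ a, b, blend(a, b, 1, 1), Rgba{0, 0, 0, 0} }};
}
```

A block whose first endpoint is numerically greater than the second is fully opaque with four usable colors; otherwise selector 3 means "transparent".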
+
+It also supports several popular swizzled variants (several are
+also supported by AMD's Compressonator): 
+DXT5_XGBR, DXT5_xGxR, DXT5_AGBR, and DXT5_CCxY (experimental luma-chroma YCoCg).
+
+## Recommended Software
+
+AMD's [Compressonator tool](https://github.com/GPUOpen-Tools/Compressonator)
+is recommended to view the .DDS files created by the crunch tool and the included example projects.
+
+Note: Some of the swizzled DXTn .DDS output formats (such as DXT5_xGBR)
+read/written by the crunch tool or examples deviate from the DX9 DDS
+standard, so DXSDK tools such as DXTEX.EXE won't load them at all or
+they won't be properly displayed.
+
+## Compression Algorithm Details
+
+The compression process employed in creating both .CRN and
+clustered .DDS files utilizes a very high quality, scalable DXTn
+endpoint optimizer capable of processing any number of pixels (instead
+of the typical hard coded 16), optional adaptive switching between
+several macroblock sizes/configurations (currently any combination of
+4x4, 8x4, 4x8, and 8x8 pixel blocks), endpoint clusterization using
+top-down cluster analysis, vector quantization (VQ) of the selector
+indices, and several custom algorithms for compressing the resulting
+endpoint/selector codebooks and macroblock indices. Multiple feedback
+passes are performed between the clusterization and VQ steps to optimize
+quality, and several steps use a brute force refinement approach to improve 
+quality. The majority of compression steps are multithreaded.
+
+The .CRN format currently utilizes canonical Huffman coding for speed
+(similar to Deflate but with much larger tables), but the next major
+version will also utilize adaptive binary arithmetic coding and higher
+order context modeling using already developed tech from my LZHAM
+compression library.
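
For readers unfamiliar with canonical Huffman coding, the sketch below shows the textbook code assignment used by Deflate (RFC 1951): given only the per-symbol code lengths, codes are handed out in increasing order within each length. This is not crnlib's implementation; the function name `canonical_codes` and the 15-bit length limit are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assign canonical Huffman codes from per-symbol code lengths.
// A length of 0 marks an unused symbol, which keeps code 0 as a placeholder.
inline std::vector<uint32_t> canonical_codes(const std::vector<uint8_t>& lengths) {
  const unsigned kMaxLen = 15;  // hypothetical limit, same as Deflate
  std::vector<uint32_t> count(kMaxLen + 1, 0);
  for (uint8_t len : lengths) {
    assert(len <= kMaxLen);
    if (len) ++count[len];  // count[0] stays 0 on purpose
  }

  // Compute the first (numerically smallest) code of each length.
  std::vector<uint32_t> next(kMaxLen + 1, 0);
  uint32_t code = 0;
  for (unsigned len = 1; len <= kMaxLen; ++len) {
    code = (code + count[len - 1]) << 1;
    next[len] = code;
  }

  // Hand out consecutive codes to symbols of the same length, in symbol order.
  std::vector<uint32_t> codes(lengths.size(), 0);
  for (std::size_t i = 0; i < lengths.size(); ++i) {
    if (lengths[i]) codes[i] = next[lengths[i]]++;
  }
  return codes;
}
```

Because the codes follow from the lengths alone, a decoder only needs the length table to rebuild the whole code, which is what makes the canonical form fast and compact to transmit.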
+
+## Supported File Formats
+
+crnlib supports two compressed texture file formats. The first
+format (clustered .DDS) is simple to integrate into an existing project
+(typically, no code changes are required), but it doesn't offer the
+highest quality/compression ratio that crnlib is capable of. Integrating
+the second, higher quality custom format (.CRN) requires a few
+typically straightforward engine modifications to integrate the
+.CRN->DXTn transcoder header file library into your tools/engine.
+
+### .DDS
+
+crnlib can compress textures to standard DX9-style .DDS files using
+clustered DXTn compression, which is a subset of the approach used to
+create .CRN files. (For completeness, crnlib also supports vanilla, block
+by block DXTn compression too, but that's not very interesting.)
+Clustered DXTn compressed .DDS files are much more compressible than
+files created by other libraries/tools. Apart from increased
+compressibility, the .DDS files generated by this process are completely
+standard so they should be fairly easy to add to a project with little
+to no code changes.
+
+To actually benefit from clustered DXTn .DDS files, your engine needs to
+further losslessly compress the .DDS data generated by crnlib using a
+lossless codec such as zlib, lzo, LZMA, LZHAM, etc. Most likely, your
+engine does this already. (If not, you definitely should because DXTn
+compressed textures generally contain a large amount of highly redundant
+data.)
+
+Clustered .DDS files are intended to be the simplest/fastest way to
+integrate crnlib's tech into a project.
+
+### .CRN
+
+The second, better, option is to compress your textures to .CRN files
+using crnlib. To read the resulting .CRN data, you must add the .CRN
+transcoder library (located in the included single file, stand-alone
+header file library inc/crn_decomp.h) into your application. .CRN files
+provide noticeably higher quality at the same effective bitrate compared
+to clustered DXTn compressed .DDS files. Also, .CRN files don't require
+further lossless compression because they're already highly compressed.
+
+.CRN files are a bit more difficult/risky to integrate into a project, but
+the resulting compression ratio and quality is superior vs. clustered .DDS files.
+
+### .KTX
+
+crnlib and crunch can read/write the .KTX file format in various pixel formats.
+Rate distortion optimization (clustered DXTc compression) is not yet supported
+when writing .KTX files. 
+
+The .KTX file format is just like .DDS, except it's a fairly well specified
+standard created by the Khronos Group. Unfortunately, almost all of the tools I've
+found that support .KTX are fairly (to very) buggy, or are limited to only a handful
+of pixel formats, so there's no guarantee that the .KTX files written by crnlib can
+be reliably read by other tools.
+
+## Building the Examples
+
+This release contains the source code and projects for three simple
+example projects:
+
+crn_examples.2010.sln is a Visual Studio 2010 (VC10) solution file
+containing projects for Win32 and x64. crnlib itself also builds with
+VS2005, VS2008, and gcc 4.5.0 (TDM GCC+MinGW).  A codeblocks 10.05
+workspace and project file is also included, but compiling crnlib this
+way hasn't been tested much.
+
+### example1
+
+Demonstrates how to use crnlib's high-level C-helper
+compression/decompression/transcoding functions in inc/crnlib.h. It's a
+fairly complete example of crnlib's functionality.
+
+### example2
+Shows how to transcode .CRN files to .DDS using **only**
+the functionality in inc/crn_decomp.h. It does not link against
+crnlib.lib or depend on it in any way. (Note: The complete source code,
+approx. 4800 lines, to the CRN transcoder is included in inc/crn_decomp.h.)
+
+example2 is intended to show how simple it is to integrate CRN textures
+into your application.
+
+### example3
+Shows how to use the regular, low-level DXTn block compressor
+functions in inc/crnlib.h. This functionality is included for
+completeness. (Your engine or toolchain most likely already has its own
+DXTn compressor. crnlib's compressor is typically very competitive or
+superior to most available closed and open source CPU-based
+compressors.)
+
+## Creating Compressed Textures from the Command Line (crunch.exe)
+
+The simplest way to create compressed textures using crnlib is to
+integrate the bin\crunch.exe (or bin\crunch_x64.exe) command line tool
+into your texture build toolchain or export process. It can write DXTn
+compressed 2D/cubemap textures to regular DXTn compressed .DDS,
+clustered (or reduced entropy) DXTn compressed .DDS, or .CRN files. It
+can also transcode or decompress files to several standard image
+formats, such as TGA or BMP. Run crunch.exe with no options for help.
+
+The .CRN files created by crunch.exe can be efficiently transcoded to
+DXTn using the included CRN transcoding library, located in full source
+form under inc/crn_decomp.h.
+
+Here are a few example crunch.exe command lines:
+
+1. Compress blah.tga to blah.dds using normal DXT1 compression:
+  * `crunch -file blah.tga -fileformat dds -dxt1`
+
+2. Compress blah.tga to blah.dds using clustered DXT1 at an effective bitrate of 1.5 bits/texel, display image statistics:
+  * `crunch -file blah.tga -fileformat dds -dxt1 -bitrate 1.5 -imagestats`
+
+3. Compress blah.tga to blah.dds using clustered DXT1 at quality level 100 (from [0,255]), with no mipmaps, display LZMA statistics:
+  * `crunch -file blah.tga -fileformat dds -dxt1 -quality 100 -mipmode none -lzmastats`
+
+4. Compress blah.tga to blah.crn using clustered DXT1 at a bitrate of 1.2 bits/texel, no mipmaps:
+  * `crunch -file blah.tga -dxt1 -bitrate 1.2 -mipmode none`
+
+5. Decompress blah.dds to a .tga file:
+  * `crunch -file blah.dds -fileformat tga`
+
+6. Transcode blah.crn to a .dds file:
+  * `crunch -file blah.crn`
+
+7. Decompress blah.crn, writing each mipmap level to a separate .tga file:
+  * `crunch -split -file blah.crn -fileformat tga`
+
+crunch.exe can do a lot more, like rescale/crop images before
+compression, convert images from one file format to another, compare
+images, process multiple images, etc.
+
+Note: I would have included the full source to crunch.exe, but it still
+has some low-level dependencies to crnlib internals which I didn't have
+time to address. This version of crunch.exe has some reduced
+functionality compared to an earlier eval release. For example, XML file
+support is not included in this version.
+
+## Using crnlib
+
+The most flexible and powerful way of using crnlib is to integrate the
+library into your editor/toolchain/etc. and directly supply it your
+raw/source texture bits. See the C-style API's and comments in
+inc/crnlib.h.
+
+To compress, you basically fill in a few structs and call one function:
+
+```cpp
+void *crn_compress( const crn_comp_params &comp_params,
+                    crn_uint32 &compressed_size,
+                    crn_uint32 *pActual_quality_level = NULL,
+                    float *pActual_bitrate = NULL);
+```
+
+Or, if you want crnlib to also generate mipmaps, you call this function:
+
+```cpp
+void *crn_compress( const crn_comp_params &comp_params,
+                    const crn_mipmap_params &mip_params,
+                    crn_uint32 &compressed_size,
+                    crn_uint32 *pActual_quality_level = NULL,
+                    float *pActual_bitrate = NULL);
+```
+
+You can also transcode/uncompress .DDS/.CRN files to raw 32bpp images
+using `crn_decompress_crn_to_dds()` and `crn_decompress_dds_to_images()`.
+
+Internally, crnlib just uses inc/crn_decomp.h to transcode textures to
+DXTn. If you only need to transcode .CRN format files to raw DXTn bits
+at runtime (and not compress), you don't actually need to compile or
+link against crnlib at all. Just include inc/crn_decomp.h, which
+contains a completely self-contained CRN transcoder in the "crnd"
+namespace. The `crnd_get_texture_info()`, `crnd_unpack_begin()`,
+`crnd_unpack_level()`, etc. functions are all you need to efficiently get
+at the raw DXTn bits, which can be directly supplied to whatever API or
+GPU you're using. (See example2.)
+
+Important note: When compiling under Native Client, be sure to define
+the `PLATFORM_NACL` macro before including the `inc/crn_decomp.h` header file library.
+
+## Known Issues/Bugs
+
+* crnlib currently assumes you'll be further losslessly compressing its
+output .DDS files using LZMA. However, some engines use weaker codecs
+such as LZO, zlib, or custom codecs, so crnlib's bitrate measurements
+will be inaccurate. It should be easy to allow the caller to plug in
+custom lossless compressors for bitrate measurement.
+
+* Compressing to a desired bitrate can be time consuming, especially when
+processing large (2k or 4k) images to the .CRN format. There are several
+high-level optimizations employed when compressing to clustered DXTn .DDS
+files using multiple trials, but not so for .CRN.
+
+* The .CRN compressor does not currently use 3 color (transparent) DXT1
+blocks at all, only 4 color blocks. So it doesn't support DXT1A
+transparency, and its output quality suffers a little due to this
+limitation. (Note that the clustered DXTn compressor used when
+writing clustered .DDS files does *not* have this limitation.)
+
+* Clustered DXT5/DXT5A compressor is able to group DXT5A blocks into
+clusters only if they use absolute (black/white) selector indices. This
+hurts performance at very low bitrates, because too many bits are
+effectively given to alpha.
+
+* DXT3 is not supported when writing .CRN or clustered DXTn DDS files;
+you'll get DXT5 files if you request DXT3. However, DXT3 is supported by
+crnlib's regular DXTn block compressor when compressing to regular DXTn
+DDS files. (DXT3's 4bpp fixed alpha sucks versus DXT5 alpha blocks, so I
+don't see this as a big deal.)
+
+* The DXT5_CCXY format uses a simple YCoCg encoding that is workable but
+hasn't been tuned for max. quality yet.
+
+* Clustered (or rate distortion optimized) DXTc compression is only
+supported when writing to .DDS, not .KTX. Also, only plain block by block
+compression is supported when writing to ETC1, and .CRN does not support ETC1.
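
The first issue above suggests letting the caller plug in a lossless compressor for bitrate measurement. Here is a minimal sketch of what that could look like. It assumes a hypothetical callback type `lossless_size_fn` and helper `measure_bitrate`, neither of which exists in crnlib. A toy run-length estimator, `rle_size`, stands in for a real codec such as zlib or LZO:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical callback: takes raw file bytes, returns the compressed size in bytes.
using lossless_size_fn =
    std::function<std::size_t(const std::vector<std::uint8_t>&)>;

// Bits per texel = compressed size in bits / number of texels.
// Assumption: the texel count is that of the top mip level only.
inline double measure_bitrate(const std::vector<std::uint8_t>& file_bytes,
                              std::uint32_t width, std::uint32_t height,
                              const lossless_size_fn& compressed_size) {
  assert(width > 0 && height > 0);
  const double total_bits =
      8.0 * static_cast<double>(compressed_size(file_bytes));
  return total_bits /
         (static_cast<double>(width) * static_cast<double>(height));
}

// Toy stand-in for a real codec: every run of up to 255 identical bytes
// costs two bytes (count + value). For illustration only.
inline std::size_t rle_size(const std::vector<std::uint8_t>& data) {
  std::size_t out = 0;
  std::size_t i = 0;
  while (i < data.size()) {
    std::size_t run = 1;
    while (i + run < data.size() && data[i + run] == data[i] && run < 255) {
      ++run;
    }
    out += 2;
    i += run;
  }
  return out;
}
```

With the toy estimator, a 16-byte all-zero buffer for a 4x4 image collapses into a single two-byte run, so `measure_bitrate` reports 1.0 bits/texel. Swapping in the codec your engine actually ships would make the measurement match real on-disk sizes.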
+
+## Compile to Javascript with Emscripten
+
+Download and install Emscripten:
+    http://kripken.github.io/emscripten-site/docs/getting_started/downloads.html
+
+From the root directory, run:
+```shell
+    emcc -O3 emscripten/crn.cpp -I./inc -s EXPORTED_FUNCTIONS="['_malloc', '_free', '_crn_get_width', '_crn_get_height', '_crn_get_levels', '_crn_get_dxt_format', '_crn_get_bytes_per_block', '_crn_get_uncompressed_size', '_crn_decompress']" -s NO_EXIT_RUNTIME=1 -s NO_FILESYSTEM=1 -s ELIMINATE_DUPLICATE_FUNCTIONS=1 -s ALLOW_MEMORY_GROWTH=1 --memory-init-file 0 -o crunch.js
+```
@@ -1,80 +0,0 @@
-## CRN vs. JPEG+Real-Time DXT1 ##
-
-A popular alternative technique involves bolting some sort of real-time DXT compressor onto the back end of a transform coder, like JPEG. JPEG is far from state of the art, but it's very fast and quite popular. The following data shows that CRN is competitive against transform based solutions.
-
-In this test, I compressed the test corpus with libjpeg at various bitrates using a binary search to find the JPEG quality factor level closest to each test bitrate. The test bitrates were .75-2.0 bpp at .25 bpp increments.
-
-These .JPG files were then unpacked to 24-bit RGB (using the decompressor in [stb\_image.c](http://nothings.org/stb_image.c)), then compressed to DXT1 using [RYG's real-time DXT1 compressor](http://www.farb-rausch.de/~fg/code/). The resulting DXT1 bits were then unpacked using crnlib to 24-bit RGB and compared to the original images to generate this RMSE data:
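
The binary search over the JPEG quality factor described above can be sketched as follows. This is not the test harness that was actually used. The helper name `find_quality_for_bitrate`, the 1..100 quality range, and the assumption that bitrate never decreases as quality increases are all illustrative. The `bpp_at_quality` callback stands in for "encode at this quality and measure bits per pixel":

```cpp
#include <cassert>
#include <cmath>
#include <functional>

// Returns the quality level in [lo, hi] whose measured bitrate is closest to
// target_bpp, among the levels the binary search probes.
// Assumption: bpp_at_quality is non-decreasing in the quality level.
inline int find_quality_for_bitrate(
    double target_bpp, const std::function<double(int)>& bpp_at_quality,
    int lo = 1, int hi = 100) {
  assert(lo <= hi);
  int best = lo;
  double best_err = std::fabs(bpp_at_quality(lo) - target_bpp);
  while (lo <= hi) {
    const int mid = lo + (hi - lo) / 2;
    const double bpp = bpp_at_quality(mid);
    const double err = std::fabs(bpp - target_bpp);
    if (err < best_err) {
      best = mid;
      best_err = err;
    }
    if (bpp < target_bpp) {
      lo = mid + 1;  // too small: try higher quality
    } else {
      hi = mid - 1;  // at or above target: try lower quality
    }
  }
  return best;
}
```

With a synthetic encoder whose bitrate is `quality / 50.0` bpp, a 1.0 bpp target lands on quality 50. A real harness would encode the image inside the callback, which makes each probe expensive, so the logarithmic number of probes matters.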
-
-[Charts at various bitrates](http://www.tenacioussoftware.com/crn_stats/dec29/crn_vs_jpg_stats_dec29.htm)
-
-[Raw Excel spreadsheet](http://www.tenacioussoftware.com/crn_stats/dec29/crn_vs_jpg_stats_dec29.xlsx)
-
-This test shows that, in a RMSE sense, CRN is worse than JPG+RYG\_DXT1 at less than 1.0bpp. Between 1-1.25bpp, CRN is roughly comparable (please excuse the line colors - they are different on each chart):
-
-![http://www.tenacioussoftware.com/crn_stats/dec29/crn_vs_jpg_1_0.png](http://www.tenacioussoftware.com/crn_stats/dec29/crn_vs_jpg_1_0.png)
-
-Beginning at 1.25bpp CRN is usually better than JPEG followed by real-time DXT1 compression:
-
-![http://www.tenacioussoftware.com/crn_stats/dec29/crn_vs_jpg_1_25.png](http://www.tenacioussoftware.com/crn_stats/dec29/crn_vs_jpg_1_25.png)
-
-At 1.5bpp or higher CRN was always equal or better:
-
-![http://www.tenacioussoftware.com/crn_stats/dec29/crn_vs_jpg_1_5.png](http://www.tenacioussoftware.com/crn_stats/dec29/crn_vs_jpg_1_5.png)
-
-These results are similar to [Ström and Wennersten's published results of testing JPEG followed by ETC1](http://www.jacobstrom.com/publications/StromWennerstenHPG2011.pdf):
-
-> "If JPEG is used as the transport format, the textures will need to be compressed on-the-fly to a texture compression format such as ETC1 after download. To find out how much this transcoding will lower the quality, we compressed the JPEG images to ETC1 and measured the average mean square error. The result was an increased error equivalent to a PSNR drop of 2.02 dB than if just ETC1 encoding were used. Thus even if the JPEGs have equal quality to the proposed scheme, after the transcoding there will be a significant quality penalty. The transcoding used slow exhaustive compression—fast transcoding will give an even higher penalty"
-
-Also, these observations from the introduction are relevant to this comparison:
-
-> "[...] This is often a good solution, especially if low bit rates are of interest, but the resulting texture quality suffers for two reasons: First, the final texture will include image artifacts both from JPEG and from the texture codec. Second, to make recompression from JPEG to the texture codec quick enough, shortcuts may be necessary, especially on mobile devices with limited computational power. This lowers quality, especially when compared to slow, perhaps exhaustive compression."
-
-
---
-
-
-## Raw Data: CRN .75-2.0bpp vs. DXT1 (real-time and offline) ##
-
-Here's a [table](http://www.tenacioussoftware.com/crn_stats/dec29/crn_stats_dec_29.htm) containing the RGB [RMSE](http://en.wikipedia.org/wiki/RMSE) of RYG's real-time DXT1 compressor, crnlib's regular DXT1 compressor using uniform or perceptual metrics (which is very similar to ATI\_Compress or squish), and .CRN at .75-2.0bpp at .25bpp increments. This chart also has two columns showing how many bits/pixel are needed by LZMA to compress DXT1 uniform and perceptual .DDS files.
-
-Here's a [chart](http://www.tenacioussoftware.com/crn_stats/dec29/rmse_chart.png) of the above spreadsheet, with the images sorted by DXT1 uniform RMSE, and here's the [raw Excel spreadsheet file](http://www.tenacioussoftware.com/crn_stats/dec29/crn_stats_dec_29.xlsx).
-
-Note that each .CRN column was generated by specifying the -bitrate option, which really only limits the **maximum** bitrate a file is allowed to use. .CRN is a variable block size format, and the front end limits the maximum endpoint/selector palette sizes to 8192 entries each. So on a few of the simpler images here very high bitrates (like 1.75 or 2.0bpp) may not actually be achievable. (You can see this effect clearly on serrano.)
-
-These RMSE values should be comparable to the values in Charles Bloom's very useful [blog post](http://cbloomrants.blogspot.com/2008/11/11-20-08-dxtc-part-3.html) comparing various DXT1 compressors. (My RYG RMSE stats are slightly different from Bloom's, and I don't know exactly why, but I'm guessing our versions of RYG's compressor are slightly different.)
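
For reference, here is a minimal sketch of an RGB RMSE computation like the one these tables report. It assumes the error is averaged over every 8-bit channel sample of two equally sized, tightly packed RGB buffers. The exact definition used by crnlib's image comparison may differ, and `rgb_rmse` is an illustrative name, not a crnlib function:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Root-mean-square error over all channel samples of two packed RGB images.
// Assumption: both buffers hold width * height * 3 bytes in the same order.
inline double rgb_rmse(const std::vector<std::uint8_t>& original,
                       const std::vector<std::uint8_t>& decoded) {
  assert(!original.empty() && original.size() == decoded.size());
  double sum_sq = 0.0;
  for (std::size_t i = 0; i < original.size(); ++i) {
    const double d = static_cast<double>(original[i]) -
                     static_cast<double>(decoded[i]);
    sum_sq += d * d;
  }
  return std::sqrt(sum_sq / static_cast<double>(original.size()));
}
```

A single pixel that is off by 3 in each channel gives an RMSE of exactly 3, and identical images give 0.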
-
-
---
-
-
-## Compressed Images ##
-
-Ordered by highest to lowest quality. It's pretty clear that .CRN's subjective quality starts dropping fast around ~1.0 bpp.
-
-**crnlib DXT1** (uniform colorspace metrics), RMSE: 8.29, .DDS+LZMA: 2.95 bpp
-![http://www.tenacioussoftware.com/crn_stats/kodim14_pics/kodim14_dds_uniform.png](http://www.tenacioussoftware.com/crn_stats/kodim14_pics/kodim14_dds_uniform.png)
-
-**RYG DXT1**, RMSE: 8.97
-![http://www.tenacioussoftware.com/crn_stats/kodim14_pics/kodim14_dds_ryg.png](http://www.tenacioussoftware.com/crn_stats/kodim14_pics/kodim14_dds_ryg.png)
-
-**CRN DXT1 2.0 bpp**, RMSE: 10.36
-![http://www.tenacioussoftware.com/crn_stats/kodim14_pics/kodim14_crn_2_00.png](http://www.tenacioussoftware.com/crn_stats/kodim14_pics/kodim14_crn_2_00.png)
-
-**CRN DXT1 1.75 bpp**, RMSE: 11.23
-![http://www.tenacioussoftware.com/crn_stats/kodim14_pics/kodim14_crn_1_75.png](http://www.tenacioussoftware.com/crn_stats/kodim14_pics/kodim14_crn_1_75.png)
-
-**CRN DXT1 1.5 bpp**, RMSE: 12.44
-![http://www.tenacioussoftware.com/crn_stats/kodim14_pics/kodim14_crn_1_50.png](http://www.tenacioussoftware.com/crn_stats/kodim14_pics/kodim14_crn_1_50.png)
-
-**CRN DXT1 1.25 bpp**, RMSE: 14.2
-![http://www.tenacioussoftware.com/crn_stats/kodim14_pics/kodim14_crn_1_25.png](http://www.tenacioussoftware.com/crn_stats/kodim14_pics/kodim14_crn_1_25.png)
-
-**CRN DXT1 1.0 bpp**, RMSE: 16.75
-![http://www.tenacioussoftware.com/crn_stats/kodim14_pics/kodim14_crn_1_00.png](http://www.tenacioussoftware.com/crn_stats/kodim14_pics/kodim14_crn_1_00.png)
-
-**CRN DXT1 .75bpp**, RMSE: 20.69
-![http://www.tenacioussoftware.com/crn_stats/kodim14_pics/kodim14_crn_0_75.png](http://www.tenacioussoftware.com/crn_stats/kodim14_pics/kodim14_crn_0_75.png)
-
-**CRN DXT1 .65bpp**, RMSE: 22.93
-![http://www.tenacioussoftware.com/crn_stats/kodim14_pics/kodim14_crn_0_65.png](http://www.tenacioussoftware.com/crn_stats/kodim14_pics/kodim14_crn_0_65.png)
@@ -1,41 +0,0 @@
-# Supported File Formats #
-
-crnlib supports two compressed texture file formats. The first
-format (clustered [.DDS](http://en.wikipedia.org/wiki/DirectDraw_Surface)) is simple to integrate into an existing project
-(no code changes are typically required), but it doesn't offer the
-highest quality/compression ratio that crnlib is capable of. Integrating
-the second, higher quality custom format (.CRN) requires a few
-typically straightforward engine modifications to integrate the
-.CRN->DXTn transcoder header file library into your tools/engine.
-
-## .DDS ##
-crnlib can compress textures to standard DX9-style [.DDS](http://en.wikipedia.org/wiki/DirectDraw_Surface) files using
-clustered DXTn compression, which is a subset of the approach used to
-create .CRN files. (For completeness, crnlib also supports vanilla, block by block DXTn compression too, but that's not very interesting.)
-Clustered DXTn compressed .DDS files are much more compressible than
-files created by other libraries/tools. Apart from increased
-compressibility, the .DDS files generated by this process are completely
-standard so they should be fairly easy to add to a project with little
-to no code changes.
-
-To actually benefit from clustered DXTn .DDS files, your engine needs to
-further losslessly compress the .DDS data generated by crnlib using a
-lossless codec such as zlib, lzo, LZMA, LZHAM, etc. Most likely, your
-engine does this already. (If not, you definitely should because DXTn
-compressed textures generally contain a large amount of highly redundant
-data.)
-
-Clustered .DDS files are intended to be the simplest/fastest way to
-integrate crnlib's tech into a project.
-
-## .CRN ##
-The second, better, option is to compress your textures to .CRN files
-using crnlib. To read the resulting .CRN data, you must add the .CRN
-transcoder library (located in the included single file, stand-alone
-header file library inc/crn\_decomp.h) into your application. .CRN files
-provide noticeably higher quality at the same effective bitrate compared
-to clustered DXTn compressed .DDS files. Also, .CRN files don't require
-further lossless compression because they're already highly compressed.
-
-.CRN files are a bit more difficult/risky to integrate into a project, but
-the resulting compression ratio and quality is superior vs. clustered .DDS files.
@@ -1,118 +0,0 @@
-# Compression Algorithm Details #
-
-This is pretty high level and could be much better. I'll improve this over time; for now, I hope this is enough:
-
-
---
-
-
-## Data Format ##
-
-The easiest way to describe how crnlib works is to start at the compressed data stream and the transcoding process and work backwards to the compressor, which also mirrors the design process followed when I designed crnlib.
-
-.CRN DXT1 files consist of a small header, followed by a [DPCM](http://en.wikipedia.org/wiki/DPCM)+[Huffman](http://en.wikipedia.org/wiki/Huffman_Compression) compressed endpoint palette, and a DPCM+Huffman compressed selector palette. (DXT5 files contain two more palettes for alpha endpoints and selectors. Also, I'm not sure that "palette" is the best word. "Codebook" may be more appropriate, but graphics programmers seem more familiar with the concept of palettes.)
-
-Here's a visualization of the DXT1 color selector palette for kodim04.png. It's 2692x16. The 2-bit selectors were scaled to [0,255]. (I think this palette was sorted by similarity, which is one of the palette orderings tested by the compressor's backend.)
-
-![http://crunch.googlecode.com/svn/wiki/crunch_selectors.png](http://crunch.googlecode.com/svn/wiki/crunch_selectors.png)
-
-And here's a visualization of the DXT1 color endpoint palette:
-![http://crunch.googlecode.com/svn/wiki/crunch_color_endpoints.png](http://crunch.googlecode.com/svn/wiki/crunch_color_endpoints.png)
-
-(These are very wide images, so they get downsampled when viewed in the wiki.)
-
-This particular color endpoint palette contains 2415 entries (horizontal axis), where each entry contains a 32-bit integer containing two 565 colors (vertical axis, enlarged by 8x in this image).
-
-Each mipmap is divided up into 8x8 pixel "macroblocks". Each macroblock corresponds to four 4x4 pixel DXTn blocks arranged in a 2x2 checkerboard pattern. Each macroblock is adaptively subdivided by the compressor into one or more "tiles". Very simple macroblocks (say solid ones that use only a single color) can use a single 8x8 pixel tile, but more complex macroblocks can use any non-overlapping combination of 8x4, 4x8, or 4x4 tiles. (There are 9 possible ways of arranging the tiles in a single macroblock.)
-
-In this image, the macroblock tile boundaries are outlined in gray:
-
-![http://crunch.googlecode.com/svn/wiki/crunch_macroblock_tiles.png](http://crunch.googlecode.com/svn/wiki/crunch_macroblock_tiles.png)
-
-Notice that the more complex areas of the image contain smaller tiles, so these image areas get assigned more endpoints. Simpler areas use larger tiles, so the DXT1 blocks in these tiles are constrained to share the same endpoints. Also, a single color endpoint pair can be shared by many tiles, independent of their location in the image.
-
-The endpoint/selector palettes are shared by all mipmap levels present in the .CRN file.
-
-For each tile, a compressed index is sent to select the macroblock tile arrangement, followed by one to four DPCM+Huffman compressed endpoint palette indices. Four selector indices (again coded using DPCM+Huffman) are always sent immediately after the endpoint(s). The macroblock rows are raster scanned in a serpentine order: left->right, then right->left, etc.
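
The serpentine macroblock scan can be illustrated with a short sketch. It is not taken from crn_decomp.h. The function name is hypothetical, and it assumes that mip levels whose dimensions are not multiples of 8 are rounded up to whole macroblocks:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Returns (x, y) macroblock coordinates in visiting order for a mip level of
// the given pixel size: even rows run left->right, odd rows right->left.
// Assumption: partial 8x8 macroblocks at the edges are rounded up.
inline std::vector<std::pair<std::uint32_t, std::uint32_t>>
serpentine_macroblock_order(std::uint32_t width, std::uint32_t height) {
  assert(width > 0 && height > 0);
  const std::uint32_t blocks_x = (width + 7) / 8;
  const std::uint32_t blocks_y = (height + 7) / 8;
  std::vector<std::pair<std::uint32_t, std::uint32_t>> order;
  order.reserve(static_cast<std::size_t>(blocks_x) * blocks_y);
  for (std::uint32_t y = 0; y < blocks_y; ++y) {
    const bool left_to_right = (y % 2) == 0;
    for (std::uint32_t i = 0; i < blocks_x; ++i) {
      const std::uint32_t x = left_to_right ? i : blocks_x - 1 - i;
      order.emplace_back(x, y);
    }
  }
  return order;
}
```

For a 24x16 level (3x2 macroblocks) the order is (0,0), (1,0), (2,0), (2,1), (1,1), (0,1). Consecutive macroblocks are always spatial neighbors, which plausibly keeps the DPCM-coded index deltas small at row boundaries.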
-
-The C++ code for the transcoder's inner loop is in [crn\_decomp.h](http://code.google.com/p/crunch/source/browse/trunk/inc/crn_decomp.h). DXT1 textures are handled by `crn_unpacker::unpack_dxt1()`.
-
-Zeng's technique is used to order the palettes so DPCM coding of the various block palette indices works efficiently. See [An efficient color re-indexing scheme for palette-based compression](http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=899448)
-
-For some textures, it's more efficient to reorder the palettes by similarity (effectively the [traveling salesman problem](http://en.wikipedia.org/wiki/Traveling_salesman_problem)) so they compress more effectively, but this can hurt index compression. The compressor tries several palette orderings and chooses whatever is cheapest overall.
-
-The example1 tool can display a bunch of information about .CRN files, such as the compressed size of each palette, Huffman tables, and mip levels. For example:
-
-```
-   E:\crunch17_3\bin>example1 i kodim04.crn
-   example1 - Version v1.00 Built Dec 27 2011, 17:18:08
-   Loading source file: kodim04.crn
-   crnd_validate_file:
-   File size: 85949
-   ActualDataSize: 85949
-   HeaderSize: 110
-   TotalPaletteSize: 12687
-   TablesSize: 1448
-   Levels: 10
-   LevelCompressedSize: 51830 14460 3968 1050 271 68 26 12 13 6 0 0 0 0 0 0
-   ColorEndpointPaletteSize: 2415
-   ColorSelectorPaletteSize: 2692
-   AlphaEndpointPaletteSize: 0
-   AlphaSelectorPaletteSize: 0
-   crnd_get_texture_info:
-   Dimensions: 512x768
-   Levels: 10
-   Faces: 1
-   BytesPerBlock: 8
-   UserData0: 0
-   UserData1: 0
-   CrnFormat: DXT1
-```
-
-
---
-
-
-## Transcoding to DXTn ##
-
-To transcode a mipmap level to DXTn, the palettes must be first unpacked, either into a temporary array or cache. Currently, all mipmaps in a .CRN file share the same set of endpoint/selector palettes. To generate the DXTn bits, the transcoder iterates through each macroblock and decodes the palette indices. The actual DXTn bits are effectively just memcpy'd from the palette arrays directly into the destination DXTn texture. The transcoder doesn't care at all what the endpoint/selector palette entries actually consist of during transcoding -- it just copies the bits. Transcoding is quite fast because it works at the macroblock/block level, never at the pixel level.
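
The copy-only nature of transcoding can be sketched as follows. This is a simplified illustration, not the code in crn_decomp.h. It assumes the palette indices have already been decoded (the real transcoder recovers them from the DPCM+Huffman stream), and `block_ref` and `assemble_dxt1` are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical pair of already-decoded palette indices for one 4x4 block.
struct block_ref {
  std::size_t endpoint_index;
  std::size_t selector_index;
};

// Builds DXT1 bytes (8 per block) by copying palette entries verbatim:
// 4 bytes of endpoints (two 565 colors) followed by 4 bytes of selectors
// (sixteen 2-bit values). No per-pixel work is done.
// Assumption: the host is little-endian, so the 32-bit palette entries
// already have the byte layout DXT1 expects.
inline std::vector<std::uint8_t> assemble_dxt1(
    const std::vector<std::uint32_t>& endpoint_palette,
    const std::vector<std::uint32_t>& selector_palette,
    const std::vector<block_ref>& blocks) {
  std::vector<std::uint8_t> out(blocks.size() * 8);
  for (std::size_t i = 0; i < blocks.size(); ++i) {
    const block_ref& b = blocks[i];
    assert(b.endpoint_index < endpoint_palette.size());
    assert(b.selector_index < selector_palette.size());
    std::memcpy(&out[i * 8], &endpoint_palette[b.endpoint_index], 4);
    std::memcpy(&out[i * 8 + 4], &selector_palette[b.selector_index], 4);
  }
  return out;
}
```

Because each block is two 4-byte copies, the cost per block is independent of what the palette entries contain, which matches the observation that the transcoder never works at the pixel level.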
-
-
---
-
-
-## Compression ##
-
-The compressor is very complex, partially due to the weird and surprisingly deep properties imposed by the DXTn block format. It consists of two independent parts, called the "frontend" and "backend". The frontend is by far the most complex, and a good chunk of crnlib is devoted to helper classes used by the frontend.
-
-The frontend, located in [dxt\_hc.cpp/h](http://code.google.com/p/crunch/source/browse/trunk/crnlib/crn_dxt_hc.cpp), takes the 24/32bpp source texture mipmaps as inputs. It adaptively subdivides the texture macroblocks into tiles, finds the endpoint and selector clusters, and then generates optimized, but unordered palettes based off these clusters. The backend, located in [crn\_comp.cpp/h](http://code.google.com/p/crunch/source/browse/trunk/crnlib/crn_comp.cpp) takes the raw palettes, macroblock tile layouts, and indices supplied by the frontend and tries to efficiently code them.
-
-The color endpoint palettes are created from their source clusters using a very high quality, scalable DXT1 endpoint optimizer located in [crn\_dxt1.cpp/h](http://code.google.com/p/crunch/source/browse/trunk/crnlib/crn_dxt1.cpp). This custom optimizer is capable of processing any number of source pixels, instead of the typical hard coded 16. crnlib's DXT1 endpoint optimizer's quality (in a PSNR sense) is comparable to ATI's, NVidia's, or squish's. (I verified this while building the endpoint optimizer by randomly extracting millions of 4x4 pixel blocks from a large corpus of game textures and photos, compressing->decompressing them using each compressor, comparing the results, and ruthlessly investigating and fixing any blocks where crnlib's output was lower quality. I hope to eventually release this tool.)
-
-Interestingly, crnlib's DXT1 endpoint optimizer is equal to or better than squish or ATI\_Compress (in a PSNR sense), and of comparable speed, without using a single line of SIMD or assembly code.
-
-The endpoint clusterization step uses top-down [cluster analysis](http://en.wikipedia.org/wiki/Cluster_analysis), and [vector quantization](http://en.wikipedia.org/wiki/Vector_quantization) is used to create the initial selector palette. The frontend performs several feedback passes, in between the clusterization and VQ steps, to optimize quality, and the compressor uses several brute force refinement stages to improve quality even more.
-
-Most of the compression steps are multithreaded in a relatively straightforward way: subdivide the work into independent threadpool tasks, fork to multiple threads, then join. The clusterizer is also multithreaded, where it forks to multiple threads after the initial tree subdivision steps.
-
-The .CRN format currently utilizes [canonical Huffman coding](http://en.wikipedia.org/wiki/Canonical_Huffman_code) for speed. The symbol codelengths for each Huffman table are sent in a simple compressed manner after the header (like Deflate).
-
-
---
-
-
-## The Path Forward ##
-
-Given a fixed amount of additional developer time to improve .CRN's bitrate/quality, I think the backend would benefit the most from more work. (So far, much more effort has been devoted to the DXT1 endpoint optimizer and the frontend stages.) The current format is probably favoring transcoding speed too highly vs. ratio. Also, the Huffman tables contain too many symbols, and alternatives to the DPCM coding should be explored.
-
-Ideas for crunch v2.0:
-  * Use techniques from LZHAM to improve backend coding (mix bitwise arithmetic with semi-adaptive Huffman).
-  * Port transcoder library to plain C vs. C++
-  * Add smarter prediction to the macroblock tile layout selector indices
-  * Native Javascript transcoders
-  * Palette compression improvements
-  * Split mipchain from mip0, so individual mips can be transcoded more quickly
-  * Support "raw" CRN files that use no additional compression (assume they will be post-compressed by the user using gzip or LZMA - useful for Javascript/WebGL)
-  * Support uncompressed palettes, for high speed random access in the transcoder
-  * Investigate 16x16 or 32x32 macroblock sizes. Optimize .CRN for bitrates below 1.0 bpp.
-  * Clustered (rate distortion optimized) DDS: Add support for ZLIB, LZO, and Snappy lossless post-compression
@@ -0,0 +1,52 @@
+
+Microsoft Visual Studio Solution File, Format Version 11.00
+# Visual Studio 2010
+Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "crunch", "crunch\crunch.2010.vcxproj", "{8F645BA1-B996-49EB-859B-970A671DE05D}"
+EndProject
+Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "crnlib", "crnlib\crnlib.2010.vcxproj", "{CF2E70E8-7133-4D96-92C7-68BB406C0664}"
+EndProject
+Global
+	GlobalSection(SolutionConfigurationPlatforms) = preSolution
+		Debug_DLL|Win32 = Debug_DLL|Win32
+		Debug_DLL|x64 = Debug_DLL|x64
+		Debug|Win32 = Debug|Win32
+		Debug|x64 = Debug|x64
+		Release_DLL|Win32 = Release_DLL|Win32
+		Release_DLL|x64 = Release_DLL|x64
+		Release|Win32 = Release|Win32
+		Release|x64 = Release|x64
+	EndGlobalSection
+	GlobalSection(ProjectConfigurationPlatforms) = postSolution
+		{8F645BA1-B996-49EB-859B-970A671DE05D}.Debug_DLL|Win32.ActiveCfg = Debug|Win32
+		{8F645BA1-B996-49EB-859B-970A671DE05D}.Debug_DLL|x64.ActiveCfg = Debug|x64
+		{8F645BA1-B996-49EB-859B-970A671DE05D}.Debug|Win32.ActiveCfg = Debug|Win32
+		{8F645BA1-B996-49EB-859B-970A671DE05D}.Debug|Win32.Build.0 = Debug|Win32
+		{8F645BA1-B996-49EB-859B-970A671DE05D}.Debug|x64.ActiveCfg = Debug|x64
+		{8F645BA1-B996-49EB-859B-970A671DE05D}.Debug|x64.Build.0 = Debug|x64
+		{8F645BA1-B996-49EB-859B-970A671DE05D}.Release_DLL|Win32.ActiveCfg = Release|Win32
+		{8F645BA1-B996-49EB-859B-970A671DE05D}.Release_DLL|x64.ActiveCfg = Release|x64
+		{8F645BA1-B996-49EB-859B-970A671DE05D}.Release|Win32.ActiveCfg = Release|Win32
+		{8F645BA1-B996-49EB-859B-970A671DE05D}.Release|Win32.Build.0 = Release|Win32
+		{8F645BA1-B996-49EB-859B-970A671DE05D}.Release|x64.ActiveCfg = Release|x64
+		{8F645BA1-B996-49EB-859B-970A671DE05D}.Release|x64.Build.0 = Release|x64
+		{CF2E70E8-7133-4D96-92C7-68BB406C0664}.Debug_DLL|Win32.ActiveCfg = Debug_DLL|Win32
+		{CF2E70E8-7133-4D96-92C7-68BB406C0664}.Debug_DLL|Win32.Build.0 = Debug_DLL|Win32
+		{CF2E70E8-7133-4D96-92C7-68BB406C0664}.Debug_DLL|x64.ActiveCfg = Debug_DLL|x64
+		{CF2E70E8-7133-4D96-92C7-68BB406C0664}.Debug_DLL|x64.Build.0 = Debug_DLL|x64
+		{CF2E70E8-7133-4D96-92C7-68BB406C0664}.Debug|Win32.ActiveCfg = Debug|Win32
+		{CF2E70E8-7133-4D96-92C7-68BB406C0664}.Debug|Win32.Build.0 = Debug|Win32
+		{CF2E70E8-7133-4D96-92C7-68BB406C0664}.Debug|x64.ActiveCfg = Debug|x64
+		{CF2E70E8-7133-4D96-92C7-68BB406C0664}.Debug|x64.Build.0 = Debug|x64
+		{CF2E70E8-7133-4D96-92C7-68BB406C0664}.Release_DLL|Win32.ActiveCfg = Release_DLL|Win32
+		{CF2E70E8-7133-4D96-92C7-68BB406C0664}.Release_DLL|Win32.Build.0 = Release_DLL|Win32
+		{CF2E70E8-7133-4D96-92C7-68BB406C0664}.Release_DLL|x64.ActiveCfg = Release_DLL|x64
+		{CF2E70E8-7133-4D96-92C7-68BB406C0664}.Release_DLL|x64.Build.0 = Release_DLL|x64
+		{CF2E70E8-7133-4D96-92C7-68BB406C0664}.Release|Win32.ActiveCfg = Release|Win32
+		{CF2E70E8-7133-4D96-92C7-68BB406C0664}.Release|Win32.Build.0 = Release|Win32
+		{CF2E70E8-7133-4D96-92C7-68BB406C0664}.Release|x64.ActiveCfg = Release|x64
+		{CF2E70E8-7133-4D96-92C7-68BB406C0664}.Release|x64.Build.0 = Release|x64
+	EndGlobalSection
+	GlobalSection(SolutionProperties) = preSolution
+		HideSolutionNode = FALSE
+	EndGlobalSection
+EndGlobal
@@ -0,0 +1,9 @@
+<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
+<CodeBlocks_workspace_file>
+	<Workspace title="Workspace">
+		<Project filename="crunch/crunch.cbp" active="1">
+			<Depends filename="crnlib/crnlib.cbp" />
+		</Project>
+		<Project filename="crnlib/crnlib.cbp" />
+	</Workspace>
+</CodeBlocks_workspace_file>
@@ -0,0 +1,74 @@
+
+Microsoft Visual Studio Solution File, Format Version 11.00
+# Visual Studio 2010
+Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "example1", "example1\example1.2010.vcxproj", "{8F745B42-F996-49EB-859B-970A671DE05D}"
+EndProject
+Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "example2", "example2\example2.2010.vcxproj", "{AF745B42-F996-49EB-859B-970A671DEF5E}"
+EndProject
+Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "example3", "example3\example3.2010.vcxproj", "{AF745B42-E296-46EB-859B-970A671DEF5E}"
+EndProject
+Global
+	GlobalSection(SolutionConfigurationPlatforms) = preSolution
+		Debug_DLL|Win32 = Debug_DLL|Win32
+		Debug_DLL|x64 = Debug_DLL|x64
+		Debug|Win32 = Debug|Win32
+		Debug|x64 = Debug|x64
+		Release_DLL|Win32 = Release_DLL|Win32
+		Release_DLL|x64 = Release_DLL|x64
+		Release|Win32 = Release|Win32
+		Release|x64 = Release|x64
+	EndGlobalSection
+	GlobalSection(ProjectConfigurationPlatforms) = postSolution
+		{8F745B42-F996-49EB-859B-970A671DE05D}.Debug_DLL|Win32.ActiveCfg = Debug_DLL|Win32
+		{8F745B42-F996-49EB-859B-970A671DE05D}.Debug_DLL|Win32.Build.0 = Debug_DLL|Win32
+		{8F745B42-F996-49EB-859B-970A671DE05D}.Debug_DLL|x64.ActiveCfg = Debug_DLL|x64
+		{8F745B42-F996-49EB-859B-970A671DE05D}.Debug_DLL|x64.Build.0 = Debug_DLL|x64
+		{8F745B42-F996-49EB-859B-970A671DE05D}.Debug|Win32.ActiveCfg = Debug|Win32
+		{8F745B42-F996-49EB-859B-970A671DE05D}.Debug|Win32.Build.0 = Debug|Win32
+		{8F745B42-F996-49EB-859B-970A671DE05D}.Debug|x64.ActiveCfg = Debug|x64
+		{8F745B42-F996-49EB-859B-970A671DE05D}.Debug|x64.Build.0 = Debug|x64
+		{8F745B42-F996-49EB-859B-970A671DE05D}.Release_DLL|Win32.ActiveCfg = Release_DLL|Win32
+		{8F745B42-F996-49EB-859B-970A671DE05D}.Release_DLL|Win32.Build.0 = Release_DLL|Win32
+		{8F745B42-F996-49EB-859B-970A671DE05D}.Release_DLL|x64.ActiveCfg = Release_DLL|x64
+		{8F745B42-F996-49EB-859B-970A671DE05D}.Release_DLL|x64.Build.0 = Release_DLL|x64
+		{8F745B42-F996-49EB-859B-970A671DE05D}.Release|Win32.ActiveCfg = Release|Win32
+		{8F745B42-F996-49EB-859B-970A671DE05D}.Release|Win32.Build.0 = Release|Win32
+		{8F745B42-F996-49EB-859B-970A671DE05D}.Release|x64.ActiveCfg = Release|x64
+		{8F745B42-F996-49EB-859B-970A671DE05D}.Release|x64.Build.0 = Release|x64
+		{AF745B42-F996-49EB-859B-970A671DEF5E}.Debug_DLL|Win32.ActiveCfg = Debug_DLL|Win32
+		{AF745B42-F996-49EB-859B-970A671DEF5E}.Debug_DLL|Win32.Build.0 = Debug_DLL|Win32
+		{AF745B42-F996-49EB-859B-970A671DEF5E}.Debug_DLL|x64.ActiveCfg = Debug_DLL|x64
+		{AF745B42-F996-49EB-859B-970A671DEF5E}.Debug_DLL|x64.Build.0 = Debug_DLL|x64
+		{AF745B42-F996-49EB-859B-970A671DEF5E}.Debug|Win32.ActiveCfg = Debug|Win32
+		{AF745B42-F996-49EB-859B-970A671DEF5E}.Debug|Win32.Build.0 = Debug|Win32
+		{AF745B42-F996-49EB-859B-970A671DEF5E}.Debug|x64.ActiveCfg = Debug|x64
+		{AF745B42-F996-49EB-859B-970A671DEF5E}.Debug|x64.Build.0 = Debug|x64
+		{AF745B42-F996-49EB-859B-970A671DEF5E}.Release_DLL|Win32.ActiveCfg = Release_DLL|Win32
+		{AF745B42-F996-49EB-859B-970A671DEF5E}.Release_DLL|Win32.Build.0 = Release_DLL|Win32
+		{AF745B42-F996-49EB-859B-970A671DEF5E}.Release_DLL|x64.ActiveCfg = Release_DLL|x64
+		{AF745B42-F996-49EB-859B-970A671DEF5E}.Release_DLL|x64.Build.0 = Release_DLL|x64
+		{AF745B42-F996-49EB-859B-970A671DEF5E}.Release|Win32.ActiveCfg = Release|Win32
+		{AF745B42-F996-49EB-859B-970A671DEF5E}.Release|Win32.Build.0 = Release|Win32
+		{AF745B42-F996-49EB-859B-970A671DEF5E}.Release|x64.ActiveCfg = Release|x64
+		{AF745B42-F996-49EB-859B-970A671DEF5E}.Release|x64.Build.0 = Release|x64
+		{AF745B42-E296-46EB-859B-970A671DEF5E}.Debug_DLL|Win32.ActiveCfg = Debug_DLL|Win32
+		{AF745B42-E296-46EB-859B-970A671DEF5E}.Debug_DLL|Win32.Build.0 = Debug_DLL|Win32
+		{AF745B42-E296-46EB-859B-970A671DEF5E}.Debug_DLL|x64.ActiveCfg = Debug_DLL|x64
+		{AF745B42-E296-46EB-859B-970A671DEF5E}.Debug_DLL|x64.Build.0 = Debug_DLL|x64
+		{AF745B42-E296-46EB-859B-970A671DEF5E}.Debug|Win32.ActiveCfg = Debug|Win32
+		{AF745B42-E296-46EB-859B-970A671DEF5E}.Debug|Win32.Build.0 = Debug|Win32
+		{AF745B42-E296-46EB-859B-970A671DEF5E}.Debug|x64.ActiveCfg = Debug|x64
+		{AF745B42-E296-46EB-859B-970A671DEF5E}.Debug|x64.Build.0 = Debug|x64
+		{AF745B42-E296-46EB-859B-970A671DEF5E}.Release_DLL|Win32.ActiveCfg = Release_DLL|Win32
+		{AF745B42-E296-46EB-859B-970A671DEF5E}.Release_DLL|Win32.Build.0 = Release_DLL|Win32
+		{AF745B42-E296-46EB-859B-970A671DEF5E}.Release_DLL|x64.ActiveCfg = Release_DLL|x64
+		{AF745B42-E296-46EB-859B-970A671DEF5E}.Release_DLL|x64.Build.0 = Release_DLL|x64
+		{AF745B42-E296-46EB-859B-970A671DEF5E}.Release|Win32.ActiveCfg = Release|Win32
+		{AF745B42-E296-46EB-859B-970A671DEF5E}.Release|Win32.Build.0 = Release|Win32
+		{AF745B42-E296-46EB-859B-970A671DEF5E}.Release|x64.ActiveCfg = Release|x64
+		{AF745B42-E296-46EB-859B-970A671DEF5E}.Release|x64.Build.0 = Release|x64
+	EndGlobalSection
+	GlobalSection(SolutionProperties) = preSolution
+		HideSolutionNode = FALSE
+	EndGlobalSection
+EndGlobal
@@ -0,0 +1,9 @@
+<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
+<CodeBlocks_workspace_file>
+	<Workspace title="Workspace">
+		<Project filename="crunch/crunch_linux.cbp" active="1">
+			<Depends filename="crnlib/crnlib_linux.cbp" />
+		</Project>
+		<Project filename="crnlib/crnlib_linux.cbp" />
+	</Workspace>
+</CodeBlocks_workspace_file>
@@ -1,336 +0,0 @@
-<html xmlns:v="urn:schemas-microsoft-com:vml"
-xmlns:o="urn:schemas-microsoft-com:office:office"
-xmlns:x="urn:schemas-microsoft-com:office:excel"
-xmlns="http://www.w3.org/TR/REC-html40">
-
-<head>
-<meta name="Excel Workbook Frameset">
-<meta http-equiv=Content-Type content="text/html; charset=windows-1252">
-<meta name=ProgId content=Excel.Sheet>
-<meta name=Generator content="Microsoft Excel 12">
-<link rel=File-List href="crn_stats_dec_29_files/filelist.xml">
-<![if !supportTabStrip]>
-<link id="shLink" href="crn_stats_dec_29_files/sheet001.htm">
-<link id="shLink" href="crn_stats_dec_29_files/sheet002.htm">
-<link id="shLink" href="crn_stats_dec_29_files/sheet003.htm">
-
-<link id="shLink">
-
-<script language="JavaScript">
-<!--
- var c_lTabs=3;
-
- var c_rgszSh=new Array(c_lTabs);
- c_rgszSh[0] = "Sheet1";
- c_rgszSh[1] = "Sheet2";
- c_rgszSh[2] = "Sheet3";
-
-
-
- var c_rgszClr=new Array(8);
- c_rgszClr[0]="window";
- c_rgszClr[1]="buttonface";
- c_rgszClr[2]="windowframe";
- c_rgszClr[3]="windowtext";
- c_rgszClr[4]="threedlightshadow";
- c_rgszClr[5]="threedhighlight";
- c_rgszClr[6]="threeddarkshadow";
- c_rgszClr[7]="threedshadow";
-
- var g_iShCur;
- var g_rglTabX=new Array(c_lTabs);
-
-function fnGetIEVer()
-{
- var ua=window.navigator.userAgent
- var msie=ua.indexOf("MSIE")
- if (msie>0 && window.navigator.platform=="Win32")
-  return parseInt(ua.substring(msie+5,ua.indexOf(".", msie)));
- else
-  return 0;
-}
-
-function fnBuildFrameset()
-{
- var szHTML="<frameset rows=\"*,18\" border=0 width=0 frameborder=no framespacing=0>"+
-  "<frame src=\""+document.all.item("shLink")[0].href+"\" name=\"frSheet\" noresize>"+
-  "<frameset cols=\"54,*\" border=0 width=0 frameborder=no framespacing=0>"+
-  "<frame src=\"\" name=\"frScroll\" marginwidth=0 marginheight=0 scrolling=no>"+
-  "<frame src=\"\" name=\"frTabs\" marginwidth=0 marginheight=0 scrolling=no>"+
-  "</frameset></frameset><plaintext>";
-
- with (document) {
-  open("text/html","replace");
-  write(szHTML);
-  close();
- }
-
- fnBuildTabStrip();
-}
-
-function fnBuildTabStrip()
-{
- var szHTML=
-  "<html><head><style>.clScroll {font:8pt Courier New;color:"+c_rgszClr[6]+";cursor:default;line-height:10pt;}"+
-  ".clScroll2 {font:10pt Arial;color:"+c_rgszClr[6]+";cursor:default;line-height:11pt;}</style></head>"+
-  "<body onclick=\"event.returnValue=false;\" ondragstart=\"event.returnValue=false;\" onselectstart=\"event.returnValue=false;\" bgcolor="+c_rgszClr[4]+" topmargin=0 leftmargin=0><table cellpadding=0 cellspacing=0 width=100%>"+
-  "<tr><td colspan=6 height=1 bgcolor="+c_rgszClr[2]+"></td></tr>"+
-  "<tr><td style=\"font:1pt\">&nbsp;<td>"+
-  "<td valign=top id=tdScroll class=\"clScroll\" onclick=\"parent.fnFastScrollTabs(0);\" onmouseover=\"parent.fnMouseOverScroll(0);\" onmouseout=\"parent.fnMouseOutScroll(0);\"><a>&#171;</a></td>"+
-  "<td valign=top id=tdScroll class=\"clScroll2\" onclick=\"parent.fnScrollTabs(0);\" ondblclick=\"parent.fnScrollTabs(0);\" onmouseover=\"parent.fnMouseOverScroll(1);\" onmouseout=\"parent.fnMouseOutScroll(1);\"><a>&lt</a></td>"+
-  "<td valign=top id=tdScroll class=\"clScroll2\" onclick=\"parent.fnScrollTabs(1);\" ondblclick=\"parent.fnScrollTabs(1);\" onmouseover=\"parent.fnMouseOverScroll(2);\" onmouseout=\"parent.fnMouseOutScroll(2);\"><a>&gt</a></td>"+
-  "<td valign=top id=tdScroll class=\"clScroll\" onclick=\"parent.fnFastScrollTabs(1);\" onmouseover=\"parent.fnMouseOverScroll(3);\" onmouseout=\"parent.fnMouseOutScroll(3);\"><a>&#187;</a></td>"+
-  "<td style=\"font:1pt\">&nbsp;<td></tr></table></body></html>";
-
- with (frames['frScroll'].document) {
-  open("text/html","replace");
-  write(szHTML);
-  close();
- }
-
- szHTML =
-  "<html><head>"+
-  "<style>A:link,A:visited,A:active {text-decoration:none;"+"color:"+c_rgszClr[3]+";}"+
-  ".clTab {cursor:hand;background:"+c_rgszClr[1]+";font:9pt Arial;padding-left:3px;padding-right:3px;text-align:center;}"+
-  ".clBorder {background:"+c_rgszClr[2]+";font:1pt;}"+
-  "</style></head><body onload=\"parent.fnInit();\" onselectstart=\"event.returnValue=false;\" ondragstart=\"event.returnValue=false;\" bgcolor="+c_rgszClr[4]+
-  " topmargin=0 leftmargin=0><table id=tbTabs cellpadding=0 cellspacing=0>";
-
- var iCellCount=(c_lTabs+1)*2;
-
- var i;
- for (i=0;i<iCellCount;i+=2)
-  szHTML+="<col width=1><col>";
-
- var iRow;
- for (iRow=0;iRow<6;iRow++) {
-
-  szHTML+="<tr>";
-
-  if (iRow==5)
-   szHTML+="<td colspan="+iCellCount+"></td>";
-  else {
-   if (iRow==0) {
-    for(i=0;i<iCellCount;i++)
-     szHTML+="<td height=1 class=\"clBorder\"></td>";
-   } else if (iRow==1) {
-    for(i=0;i<c_lTabs;i++) {
-     szHTML+="<td height=1 nowrap class=\"clBorder\">&nbsp;</td>";
-     szHTML+=
-      "<td id=tdTab height=1 nowrap class=\"clTab\" onmouseover=\"parent.fnMouseOverTab("+i+");\" onmouseout=\"parent.fnMouseOutTab("+i+");\">"+
-      "<a href=\""+document.all.item("shLink")[i].href+"\" target=\"frSheet\" id=aTab>&nbsp;"+c_rgszSh[i]+"&nbsp;</a></td>";
-    }
-    szHTML+="<td id=tdTab height=1 nowrap class=\"clBorder\"><a id=aTab>&nbsp;</a></td><td width=100%></td>";
-   } else if (iRow==2) {
-    for (i=0;i<c_lTabs;i++)
-     szHTML+="<td height=1></td><td height=1 class=\"clBorder\"></td>";
-    szHTML+="<td height=1></td><td height=1></td>";
-   } else if (iRow==3) {
-    for (i=0;i<iCellCount;i++)
-     szHTML+="<td height=1></td>";
-   } else if (iRow==4) {
-    for (i=0;i<c_lTabs;i++)
-     szHTML+="<td height=1 width=1></td><td height=1></td>";
-    szHTML+="<td height=1 width=1></td><td></td>";
-   }
-  }
-  szHTML+="</tr>";
- }
-
- szHTML+="</table></body></html>";
- with (frames['frTabs'].document) {
-  open("text/html","replace");
-  charset=document.charset;
-  write(szHTML);
-  close();
- }
-}
-
-function fnInit()
-{
- g_rglTabX[0]=0;
- var i;
- for (i=1;i<=c_lTabs;i++)
-  with (frames['frTabs'].document.all.tbTabs.rows[1].cells[fnTabToCol(i-1)])
-   g_rglTabX[i]=offsetLeft+offsetWidth-6;
-}
-
-function fnTabToCol(iTab)
-{
- return 2*iTab+1;
-}
-
-function fnNextTab(fDir)
-{
- var iNextTab=-1;
- var i;
-
- with (frames['frTabs'].document.body) {
-  if (fDir==0) {
-   if (scrollLeft>0) {
-    for (i=0;i<c_lTabs&&g_rglTabX[i]<scrollLeft;i++);
-    if (i<c_lTabs)
-     iNextTab=i-1;
-   }
-  } else {
-   if (g_rglTabX[c_lTabs]+6>offsetWidth+scrollLeft) {
-    for (i=0;i<c_lTabs&&g_rglTabX[i]<=scrollLeft;i++);
-    if (i<c_lTabs)
-     iNextTab=i;
-   }
-  }
- }
- return iNextTab;
-}
-
-function fnScrollTabs(fDir)
-{
- var iNextTab=fnNextTab(fDir);
-
- if (iNextTab>=0) {
-  frames['frTabs'].scroll(g_rglTabX[iNextTab],0);
-  return true;
- } else
-  return false;
-}
-
-function fnFastScrollTabs(fDir)
-{
- if (c_lTabs>16)
-  frames['frTabs'].scroll(g_rglTabX[fDir?c_lTabs-1:0],0);
- else
-  if (fnScrollTabs(fDir)>0) window.setTimeout("fnFastScrollTabs("+fDir+");",5);
-}
-
-function fnSetTabProps(iTab,fActive)
-{
- var iCol=fnTabToCol(iTab);
- var i;
-
- if (iTab>=0) {
-  with (frames['frTabs'].document.all) {
-   with (tbTabs) {
-    for (i=0;i<=4;i++) {
-     with (rows[i]) {
-      if (i==0)
-       cells[iCol].style.background=c_rgszClr[fActive?0:2];
-      else if (i>0 && i<4) {
-       if (fActive) {
-        cells[iCol-1].style.background=c_rgszClr[2];
-        cells[iCol].style.background=c_rgszClr[0];
-        cells[iCol+1].style.background=c_rgszClr[2];
-       } else {
-        if (i==1) {
-         cells[iCol-1].style.background=c_rgszClr[2];
-         cells[iCol].style.background=c_rgszClr[1];
-         cells[iCol+1].style.background=c_rgszClr[2];
-        } else {
-         cells[iCol-1].style.background=c_rgszClr[4];
-         cells[iCol].style.background=c_rgszClr[(i==2)?2:4];
-         cells[iCol+1].style.background=c_rgszClr[4];
-        }
-       }
-      } else
-       cells[iCol].style.background=c_rgszClr[fActive?2:4];
-     }
-    }
-   }
-   with (aTab[iTab].style) {
-    cursor=(fActive?"default":"hand");
-    color=c_rgszClr[3];
-   }
-  }
- }
-}
-
-function fnMouseOverScroll(iCtl)
-{
- frames['frScroll'].document.all.tdScroll[iCtl].style.color=c_rgszClr[7];
-}
-
-function fnMouseOutScroll(iCtl)
-{
- frames['frScroll'].document.all.tdScroll[iCtl].style.color=c_rgszClr[6];
-}
-
-function fnMouseOverTab(iTab)
-{
- if (iTab!=g_iShCur) {
-  var iCol=fnTabToCol(iTab);
-  with (frames['frTabs'].document.all) {
-   tdTab[iTab].style.background=c_rgszClr[5];
-  }
- }
-}
-
-function fnMouseOutTab(iTab)
-{
- if (iTab>=0) {
-  var elFrom=frames['frTabs'].event.srcElement;
-  var elTo=frames['frTabs'].event.toElement;
-
-  if ((!elTo) ||
-   (elFrom.tagName==elTo.tagName) ||
-   (elTo.tagName=="A" && elTo.parentElement!=elFrom) ||
-   (elFrom.tagName=="A" && elFrom.parentElement!=elTo)) {
-
-   if (iTab!=g_iShCur) {
-    with (frames['frTabs'].document.all) {
-     tdTab[iTab].style.background=c_rgszClr[1];
-    }
-   }
-  }
- }
-}
-
-function fnSetActiveSheet(iSh)
-{
- if (iSh!=g_iShCur) {
-  fnSetTabProps(g_iShCur,false);
-  fnSetTabProps(iSh,true);
-  g_iShCur=iSh;
- }
-}
-
- window.g_iIEVer=fnGetIEVer();
- if (window.g_iIEVer>=4)
-  fnBuildFrameset();
-//-->
-</script>
-<![endif]><!--[if gte mso 9]><xml>
- <x:ExcelWorkbook>
-  <x:ExcelWorksheets>
-   <x:ExcelWorksheet>
-    <x:Name>Sheet1</x:Name>
-    <x:WorksheetSource HRef="crn_stats_dec_29_files/sheet001.htm"/>
-   </x:ExcelWorksheet>
-   <x:ExcelWorksheet>
-    <x:Name>Sheet2</x:Name>
-    <x:WorksheetSource HRef="crn_stats_dec_29_files/sheet002.htm"/>
-   </x:ExcelWorksheet>
-   <x:ExcelWorksheet>
-    <x:Name>Sheet3</x:Name>
-    <x:WorksheetSource HRef="crn_stats_dec_29_files/sheet003.htm"/>
-   </x:ExcelWorksheet>
-  </x:ExcelWorksheets>
-  <x:Stylesheet HRef="crn_stats_dec_29_files/stylesheet.css"/>
-  <x:WindowHeight>12915</x:WindowHeight>
-  <x:WindowWidth>28620</x:WindowWidth>
-  <x:WindowTopX>120</x:WindowTopX>
-  <x:WindowTopY>15</x:WindowTopY>
-  <x:ProtectStructure>False</x:ProtectStructure>
-  <x:ProtectWindows>False</x:ProtectWindows>
- </x:ExcelWorkbook>
-</xml><![endif]-->
-</head>
-
-<frameset rows="*,39" border=0 width=0 frameborder=no framespacing=0>
- <frame src="crn_stats_dec_29_files/sheet001.htm" name="frSheet">
- <frame src="crn_stats_dec_29_files/tabstrip.htm" name="frTabs" marginwidth=0 marginheight=0>
- <noframes>
-  <body>
-   <p>This page uses frames, but your browser doesn't support them.</p>
-  </body>
- </noframes>
-</frameset>
-</html>
@@ -0,0 +1,96 @@
+COMPILE_OPTIONS = -O3 -fomit-frame-pointer -ffast-math -fno-math-errno -g -fno-strict-aliasing -Wall -Wno-unused-value -Wno-unused -march=core2
+LINKER_OPTIONS = -lpthread -g
+
+OBJECTS = \
+  crn_arealist.o \
+  crn_assert.o \
+  crn_checksum.o \
+  crn_colorized_console.o \
+  crn_command_line_params.o \
+  crn_comp.o \
+  crn_console.o \
+  crn_core.o \
+  crn_data_stream.o \
+  crn_mipmapped_texture.o \
+  crn_decomp.o \
+  crn_dxt1.o \
+  crn_dxt5a.o \
+  crn_dxt.o \
+  crn_dxt_endpoint_refiner.o \
+  crn_dxt_fast.o \
+  crn_dxt_hc_common.o \
+  crn_dxt_hc.o \
+  crn_dxt_image.o \
+  crn_dynamic_string.o \
+  crn_file_utils.o \
+  crn_find_files.o \
+  crn_hash.o \
+  crn_hash_map.o \
+  crn_huffman_codes.o \
+  crn_image_utils.o \
+  crnlib.o \
+  crn_math.o \
+  crn_mem.o \
+  crn_pixel_format.o \
+  crn_platform.o \
+  crn_prefix_coding.o \
+  crn_qdxt1.o \
+  crn_qdxt5.o \
+  crn_rand.o \
+  crn_resample_filters.o \
+  crn_resampler.o \
+  crn_ryg_dxt.o \
+  crn_sparse_bit_array.o \
+  crn_stb_image.o \
+  crn_strutils.o \
+  crn_symbol_codec.o \
+  crn_texture_file_types.o \
+  crn_threaded_resampler.o \
+  crn_threading_pthreads.o \
+  crn_timer.o \
+  crn_utils.o \
+  crn_value.o \
+  crn_vector.o \
+  crn_zeng.o \
+  crn_texture_comp.o \
+  crn_texture_conversion.o \
+  crn_dds_comp.o \
+  crn_lzma_codec.o \
+  crn_ktx_texture.o \
+  crn_etc.o \
+  crn_rg_etc1.o \
+  crn_miniz.o \
+  crn_jpge.o \
+  crn_jpgd.o \
+  lzma_7zBuf2.o \
+  lzma_7zBuf.o \
+  lzma_7zCrc.o \
+  lzma_7zFile.o \
+  lzma_7zStream.o \
+  lzma_Alloc.o \
+  lzma_Bcj2.o \
+  lzma_Bra86.o \
+  lzma_Bra.o \
+  lzma_BraIA64.o \
+  lzma_LzFind.o \
+  lzma_LzmaDec.o \
+  lzma_LzmaEnc.o \
+  lzma_LzmaLib.o
+
+all: crunch
+
+%.o: %.cpp
+	g++ $< -o $@ -c $(COMPILE_OPTIONS)
+
+crunch.o: ../crunch/crunch.cpp
+	g++ $< -o $@ -c -I../inc -I../crnlib $(COMPILE_OPTIONS)
+
+corpus_gen.o: ../crunch/corpus_gen.cpp
+	g++ $< -o $@ -c -I../inc -I../crnlib $(COMPILE_OPTIONS)
+
+corpus_test.o: ../crunch/corpus_test.cpp
+	g++ $< -o $@ -c -I../inc -I../crnlib $(COMPILE_OPTIONS)
+
+crunch: $(OBJECTS) crunch.o corpus_gen.o corpus_test.o
+	g++ $(OBJECTS) crunch.o corpus_gen.o corpus_test.o -o crunch $(LINKER_OPTIONS)
+
@@ -0,0 +1,625 @@
+// File: crn_arealist.cpp - 2D shape algebra (currently unused)
+// See Copyright Notice and license at the end of inc/crnlib.h
+// Ported from the PowerView DOS image viewer, a product I wrote back in 1993. Not currently used in the open source release of crnlib.
+#include "crn_core.h"
+#include "crn_arealist.h"
+
+#define RECT_DEBUG
+
+namespace crnlib {
+
+static void area_fatal_error(const char*, const char* pMsg, ...) {
+  va_list args;
+  va_start(args, pMsg);
+
+  char buf[512];
+#ifdef _MSC_VER
+  _vsnprintf_s(buf, sizeof(buf), pMsg, args);
+#else
+  vsnprintf(buf, sizeof(buf), pMsg, args);
+#endif
+
+  va_end(args);
+
+  crnlib_fail(buf, __FILE__, __LINE__);  // CRNLIB_FAIL(buf) stringifies its argument and would report "buf"
+}
+
+static Area* delete_area(Area_List* Plist, Area* Parea) {
+  Area *p, *q;
+
+#ifdef RECT_DEBUG
+  if ((Parea == Plist->Phead) || (Parea == Plist->Ptail))
+    area_fatal_error("delete_area", "tried to remove head or tail");
+#endif
+
+  p = Parea->Pprev;
+  q = Parea->Pnext;
+  p->Pnext = q;
+  q->Pprev = p;
+
+  Parea->Pnext = Plist->Pfree;
+  Parea->Pprev = NULL;
+  Plist->Pfree = Parea;
+
+  return (q);
+}
+
+static Area* alloc_area(Area_List* Plist) {
+  Area* p = Plist->Pfree;
+
+  if (p == NULL) {
+    if (Plist->next_free == Plist->total_areas)
+      area_fatal_error("alloc_area", "Out of areas!");
+
+    p = Plist->Phead + Plist->next_free;
+    Plist->next_free++;
+  } else
+    Plist->Pfree = p->Pnext;
+
+  return (p);
+}
+
+static Area* insert_area_before(Area_List* Plist, Area* Parea,
+                                int x1, int y1, int x2, int y2) {
+  Area *p, *Pnew_area = alloc_area(Plist);
+
+  p = Parea->Pprev;
+
+  p->Pnext = Pnew_area;
+
+  Pnew_area->Pprev = p;
+  Pnew_area->Pnext = Parea;
+
+  Parea->Pprev = Pnew_area;
+
+  Pnew_area->x1 = x1;
+  Pnew_area->y1 = y1;
+  Pnew_area->x2 = x2;
+  Pnew_area->y2 = y2;
+
+  return (Pnew_area);
+}
+
+static Area* insert_area_after(Area_List* Plist, Area* Parea,
+                               int x1, int y1, int x2, int y2) {
+  Area *p, *Pnew_area = alloc_area(Plist);
+
+  p = Parea->Pnext;
+
+  p->Pprev = Pnew_area;
+
+  Pnew_area->Pnext = p;
+  Pnew_area->Pprev = Parea;
+
+  Parea->Pnext = Pnew_area;
+
+  Pnew_area->x1 = x1;
+  Pnew_area->y1 = y1;
+  Pnew_area->x2 = x2;
+  Pnew_area->y2 = y2;
+
+  return (Pnew_area);
+}
+
+void Area_List_deinit(Area_List* Pobj_base) {
+  Area_List* Plist = (Area_List*)Pobj_base;
+
+  if (!Plist)
+    return;
+
+  if (Plist->Phead) {
+    crnlib_free(Plist->Phead);
+    Plist->Phead = NULL;
+  }
+
+  crnlib_free(Plist);
+}
+
+Area_List* Area_List_init(int max_areas) {
+  Area_List* Plist = (Area_List*)crnlib_calloc(1, sizeof(Area_List));
+
+  Plist->total_areas = max_areas + 2;
+
+  Plist->Phead = (Area*)crnlib_calloc(max_areas + 2, sizeof(Area));
+  Plist->Ptail = Plist->Phead + 1;
+
+  Plist->Phead->Pprev = NULL;
+  Plist->Phead->Pnext = Plist->Ptail;
+
+  Plist->Ptail->Pprev = Plist->Phead;
+  Plist->Ptail->Pnext = NULL;
+
+  Plist->Pfree = NULL;
+  Plist->next_free = 2;
+
+  return (Plist);
+}
+
+void Area_List_print(Area_List* Plist) {
+  Area* Parea = Plist->Phead->Pnext;
+
+  while (Parea != Plist->Ptail) {
+    printf("%04i %04i : %04i %04i\n", Parea->x1, Parea->y1, Parea->x2, Parea->y2);
+
+    Parea = Parea->Pnext;
+  }
+}
+
+Area_List* Area_List_dup_new(Area_List* Plist,
+                             int x_ofs, int y_ofs) {
+  int i;
+  Area_List* Pnew_list = (Area_List*)crnlib_calloc(1, sizeof(Area_List));
+
+  Pnew_list->total_areas = Plist->total_areas;
+
+  Pnew_list->Phead = (Area*)crnlib_malloc(sizeof(Area) * Plist->total_areas);
+  Pnew_list->Ptail = Pnew_list->Phead + 1;
+
+  Pnew_list->Pfree = (Plist->Pfree) ? ((Plist->Pfree - Plist->Phead) + Pnew_list->Phead) : NULL;
+
+  Pnew_list->next_free = Plist->next_free;
+
+  memcpy(Pnew_list->Phead, Plist->Phead, sizeof(Area) * Plist->total_areas);
+
+  for (i = 0; i < Plist->total_areas; i++) {
+    Pnew_list->Phead[i].Pnext = (Plist->Phead[i].Pnext == NULL) ? NULL : (Plist->Phead[i].Pnext - Plist->Phead) + Pnew_list->Phead;
+    Pnew_list->Phead[i].Pprev = (Plist->Phead[i].Pprev == NULL) ? NULL : (Plist->Phead[i].Pprev - Plist->Phead) + Pnew_list->Phead;
+
+    Pnew_list->Phead[i].x1 += x_ofs;
+    Pnew_list->Phead[i].y1 += y_ofs;
+    Pnew_list->Phead[i].x2 += x_ofs;
+    Pnew_list->Phead[i].y2 += y_ofs;
+  }
+
+  return (Pnew_list);
+}
+
+uint Area_List_get_num(Area_List* Plist) {
+  uint num = 0;
+
+  Area* Parea = Plist->Phead->Pnext;
+
+  while (Parea != Plist->Ptail) {
+    num++;
+
+    Parea = Parea->Pnext;
+  }
+
+  return num;
+}
+
+void Area_List_dup(Area_List* Psrc_list, Area_List* Pdst_list,
+                   int x_ofs, int y_ofs) {
+  int i;
+
+  if (Psrc_list->total_areas != Pdst_list->total_areas)
+    area_fatal_error("Area_List_dup", "Src and Dst total_areas must be equal!");
+
+  Pdst_list->Pfree = (Psrc_list->Pfree) ? ((Psrc_list->Pfree - Psrc_list->Phead) + Pdst_list->Phead) : NULL;
+
+  Pdst_list->next_free = Psrc_list->next_free;
+
+  memcpy(Pdst_list->Phead, Psrc_list->Phead, sizeof(Area) * Psrc_list->total_areas);
+
+  if ((x_ofs) || (y_ofs)) {
+    for (i = 0; i < Psrc_list->total_areas; i++) {
+      Pdst_list->Phead[i].Pnext = (Psrc_list->Phead[i].Pnext == NULL) ? NULL : (Psrc_list->Phead[i].Pnext - Psrc_list->Phead) + Pdst_list->Phead;
+      Pdst_list->Phead[i].Pprev = (Psrc_list->Phead[i].Pprev == NULL) ? NULL : (Psrc_list->Phead[i].Pprev - Psrc_list->Phead) + Pdst_list->Phead;
+
+      Pdst_list->Phead[i].x1 += x_ofs;
+      Pdst_list->Phead[i].y1 += y_ofs;
+      Pdst_list->Phead[i].x2 += x_ofs;
+      Pdst_list->Phead[i].y2 += y_ofs;
+    }
+  } else {
+    for (i = 0; i < Psrc_list->total_areas; i++) {
+      Pdst_list->Phead[i].Pnext = (Psrc_list->Phead[i].Pnext == NULL) ? NULL : (Psrc_list->Phead[i].Pnext - Psrc_list->Phead) + Pdst_list->Phead;
+      Pdst_list->Phead[i].Pprev = (Psrc_list->Phead[i].Pprev == NULL) ? NULL : (Psrc_list->Phead[i].Pprev - Psrc_list->Phead) + Pdst_list->Phead;
+    }
+  }
+}
+
+void Area_List_copy(
+    Area_List* Psrc_list, Area_List* Pdst_list,
+    int x_ofs, int y_ofs) {
+  Area* Parea = Psrc_list->Phead->Pnext;
+
+  Area_List_clear(Pdst_list);
+
+  if ((x_ofs) || (y_ofs)) {
+    Area* Pprev_area = Pdst_list->Phead;
+
+    while (Parea != Psrc_list->Ptail) {
+      //      Area *p, *Pnew_area;
+      Area* Pnew_area;
+
+      if (Pdst_list->next_free == Pdst_list->total_areas)
+        area_fatal_error("Area_List_copy", "Out of areas!");
+
+      Pnew_area = Pdst_list->Phead + Pdst_list->next_free;
+      Pdst_list->next_free++;
+
+      Pnew_area->Pprev = Pprev_area;
+      Pprev_area->Pnext = Pnew_area;
+
+      Pnew_area->x1 = Parea->x1 + x_ofs;
+      Pnew_area->y1 = Parea->y1 + y_ofs;
+      Pnew_area->x2 = Parea->x2 + x_ofs;
+      Pnew_area->y2 = Parea->y2 + y_ofs;
+
+      Pprev_area = Pnew_area;
+
+      Parea = Parea->Pnext;
+    }
+
+    Pprev_area->Pnext = Pdst_list->Ptail;
+  } else {
+#if 0
+         while (Parea != Psrc_list->Ptail)
+         {
+            insert_area_after(Pdst_list, Pdst_list->Phead,
+               Parea->x1,
+               Parea->y1,
+               Parea->x2,
+               Parea->y2);
+
+            Parea = Parea->Pnext;
+         }
+#endif
+
+    Area* Pprev_area = Pdst_list->Phead;
+
+    while (Parea != Psrc_list->Ptail) {
+      //      Area *p, *Pnew_area;
+      Area* Pnew_area;
+
+      if (Pdst_list->next_free == Pdst_list->total_areas)
+        area_fatal_error("Area_List_copy", "Out of areas!");
+
+      Pnew_area = Pdst_list->Phead + Pdst_list->next_free;
+      Pdst_list->next_free++;
+
+      Pnew_area->Pprev = Pprev_area;
+      Pprev_area->Pnext = Pnew_area;
+
+      Pnew_area->x1 = Parea->x1;
+      Pnew_area->y1 = Parea->y1;
+      Pnew_area->x2 = Parea->x2;
+      Pnew_area->y2 = Parea->y2;
+
+      Pprev_area = Pnew_area;
+
+      Parea = Parea->Pnext;
+    }
+
+    Pprev_area->Pnext = Pdst_list->Ptail;
+  }
+}
+
+void Area_List_clear(Area_List* Plist) {
+  Plist->Phead->Pnext = Plist->Ptail;
+  Plist->Ptail->Pprev = Plist->Phead;
+  Plist->Pfree = NULL;
+  Plist->next_free = 2;
+}
+
+void Area_List_set(Area_List* Plist, int x1, int y1, int x2, int y2) {
+  Plist->Pfree = NULL;
+
+  Plist->Phead[2].x1 = x1;
+  Plist->Phead[2].y1 = y1;
+  Plist->Phead[2].x2 = x2;
+  Plist->Phead[2].y2 = y2;
+
+  Plist->Phead[2].Pprev = Plist->Phead;
+  Plist->Phead->Pnext = Plist->Phead + 2;
+
+  Plist->Phead[2].Pnext = Plist->Ptail;
+  Plist->Ptail->Pprev = Plist->Phead + 2;
+
+  Plist->next_free = 3;
+}
+
+void Area_List_remove(Area_List* Plist,
+                      int x1, int y1, int x2, int y2) {
+  int l, h;
+  Area* Parea = Plist->Phead->Pnext;
+
+#ifdef RECT_DEBUG
+  if ((x1 > x2) || (y1 > y2))
+    area_fatal_error("Area_List_remove", "invalid coords: %i %i %i %i", x1, y1, x2, y2);
+#endif
+
+  while (Parea != Plist->Ptail) {
+    // Not touching
+    if ((x2 < Parea->x1) || (x1 > Parea->x2) ||
+        (y2 < Parea->y1) || (y1 > Parea->y2)) {
+      Parea = Parea->Pnext;
+      continue;
+    }
+
+    // Completely covers
+    if ((x1 <= Parea->x1) && (x2 >= Parea->x2) &&
+        (y1 <= Parea->y1) && (y2 >= Parea->y2)) {
+      if ((x1 == Parea->x1) && (x2 == Parea->x2) &&
+          (y1 == Parea->y1) && (y2 == Parea->y2)) {
+        delete_area(Plist, Parea);
+        return;
+      }
+
+      Parea = delete_area(Plist, Parea);
+
+      continue;
+    }
+
+    // top
+    if (y1 > Parea->y1) {
+      insert_area_before(Plist, Parea,
+                         Parea->x1, Parea->y1,
+                         Parea->x2, y1 - 1);
+    }
+
+    // bottom
+    if (y2 < Parea->y2) {
+      insert_area_before(Plist, Parea,
+                         Parea->x1, y2 + 1,
+                         Parea->x2, Parea->y2);
+    }
+
+    l = math::maximum(y1, Parea->y1);
+    h = math::minimum(y2, Parea->y2);
+
+    // left middle
+    if (x1 > Parea->x1) {
+      insert_area_before(Plist, Parea,
+                         Parea->x1, l,
+                         x1 - 1, h);
+    }
+
+    // right middle
+    if (x2 < Parea->x2) {
+      insert_area_before(Plist, Parea,
+                         x2 + 1, l,
+                         Parea->x2, h);
+    }
+
+    // early out - we know there's nothing else to remove, as areas can
+    // never overlap
+    if ((x1 >= Parea->x1) && (x2 <= Parea->x2) &&
+        (y1 >= Parea->y1) && (y2 <= Parea->y2)) {
+      delete_area(Plist, Parea);
+      return;
+    }
+
+    Parea = delete_area(Plist, Parea);
+  }
+}
+
+void Area_List_insert(Area_List* Plist,
+                      int x1, int y1, int x2, int y2,
+                      bool combine) {
+  Area* Parea = Plist->Phead->Pnext;
+
+#ifdef RECT_DEBUG
+  if ((x1 > x2) || (y1 > y2))
+    area_fatal_error("Area_List_insert", "invalid coords: %i %i %i %i", x1, y1, x2, y2);
+#endif
+
+  while (Parea != Plist->Ptail) {
+    // totally covers
+    if ((x1 <= Parea->x1) && (x2 >= Parea->x2) &&
+        (y1 <= Parea->y1) && (y2 >= Parea->y2)) {
+      Parea = delete_area(Plist, Parea);
+      continue;
+    }
+
+    // intersects
+    if ((x2 >= Parea->x1) && (x1 <= Parea->x2) &&
+        (y2 >= Parea->y1) && (y1 <= Parea->y2)) {
+      int ax1, ay1, ax2, ay2;
+
+      ax1 = Parea->x1;
+      ay1 = Parea->y1;
+      ax2 = Parea->x2;
+      ay2 = Parea->y2;
+
+      if (x1 < ax1)
+        Area_List_insert(Plist, x1, math::maximum(y1, ay1), ax1 - 1, math::minimum(y2, ay2), combine);
+
+      if (x2 > ax2)
+        Area_List_insert(Plist, ax2 + 1, math::maximum(y1, ay1), x2, math::minimum(y2, ay2), combine);
+
+      if (y1 < ay1)
+        Area_List_insert(Plist, x1, y1, x2, ay1 - 1, combine);
+
+      if (y2 > ay2)
+        Area_List_insert(Plist, x1, ay2 + 1, x2, y2, combine);
+
+      return;
+    }
+
+    if (combine) {
+      if ((x1 == Parea->x1) && (x2 == Parea->x2)) {
+        if ((y2 == Parea->y1 - 1) || (y1 == Parea->y2 + 1)) {
+          delete_area(Plist, Parea);
+          Area_List_insert(Plist, x1, math::minimum(y1, Parea->y1), x2, math::maximum(y2, Parea->y2), CRNLIB_TRUE);
+          return;
+        }
+      } else if ((y1 == Parea->y1) && (y2 == Parea->y2)) {
+        if ((x2 == Parea->x1 - 1) || (x1 == Parea->x2 + 1)) {
+          delete_area(Plist, Parea);
+          Area_List_insert(Plist, math::minimum(x1, Parea->x1), y1, math::maximum(x2, Parea->x2), y2, CRNLIB_TRUE);
+          return;
+        }
+      }
+    }
+
+    Parea = Parea->Pnext;
+  }
+
+  insert_area_before(Plist, Parea, x1, y1, x2, y2);
+}
+
+void Area_List_intersect_area(Area_List* Plist,
+                              int x1, int y1, int x2, int y2) {
+  Area* Parea = Plist->Phead->Pnext;
+
+  while (Parea != Plist->Ptail) {
+    // doesn't cover
+    if ((x2 < Parea->x1) || (x1 > Parea->x2) ||
+        (y2 < Parea->y1) || (y1 > Parea->y2)) {
+      Parea = delete_area(Plist, Parea);
+      continue;
+    }
+
+    // totally covers
+    if ((x1 <= Parea->x1) && (x2 >= Parea->x2) &&
+        (y1 <= Parea->y1) && (y2 >= Parea->y2)) {
+      Parea = Parea->Pnext;
+      continue;
+    }
+
+    // Oct 21- should insert after, because deleted area will access the NEXT area!
+    //    insert_area_after(Plist, Parea,
+    //                      math::maximum(x1, Parea->x1),
+    //                      math::maximum(y1, Parea->y1),
+    //                      math::minimum(x2, Parea->x2),
+    //                      math::minimum(y2, Parea->y2));
+
+    insert_area_before(Plist, Parea,
+                       math::maximum(x1, Parea->x1),
+                       math::maximum(y1, Parea->y1),
+                       math::minimum(x2, Parea->x2),
+                       math::minimum(y2, Parea->y2));
+
+    Parea = delete_area(Plist, Parea);
+  }
+}
+
+#if 0
+   void Area_List_intersect_Area_List(
+                                      Area_List *Pouter_list,
+                                      Area_List *Pinner_list,
+                                      Area_List *Pdst_list)
+   {
+      Area *Parea1 = Pouter_list->Phead->Pnext;
+
+      while (Parea1 != Pouter_list->Ptail)
+      {
+         Area *Parea2 = Pinner_list->Phead->Pnext;
+         int x1, y1, x2, y2;
+
+         x1 = Parea1->x1; x2 = Parea1->x2;
+         y1 = Parea1->y1; y2 = Parea1->y2;
+
+         while (Parea2 != Pinner_list->Ptail)
+         {
+            if ((x1 <= Parea2->x2) && (x2 >= Parea2->x1) &&
+               (y1 <= Parea2->y2) && (y2 >= Parea2->y1))
+            {
+               insert_area_after(Pdst_list, Pdst_list->Phead,
+                  math::maximum(x1, Parea2->x1),
+                  math::maximum(y1, Parea2->y1),
+                  math::minimum(x2, Parea2->x2),
+                  math::minimum(y2, Parea2->y2));
+            }
+
+            Parea2 = Parea2->Pnext;
+         }
+
+         Parea1 = Parea1->Pnext;
+      }
+   }
+#endif
+
+#if 1
+void Area_List_intersect_Area_List(Area_List* Pouter_list,
+                                   Area_List* Pinner_list,
+                                   Area_List* Pdst_list) {
+  Area* Parea1 = Pouter_list->Phead->Pnext;
+
+  while (Parea1 != Pouter_list->Ptail) {
+    Area* Parea2 = Pinner_list->Phead->Pnext;
+    int x1, y1, x2, y2;
+
+    x1 = Parea1->x1;
+    x2 = Parea1->x2;
+    y1 = Parea1->y1;
+    y2 = Parea1->y2;
+
+    while (Parea2 != Pinner_list->Ptail) {
+      if ((x1 <= Parea2->x2) && (x2 >= Parea2->x1) &&
+          (y1 <= Parea2->y2) && (y2 >= Parea2->y1)) {
+        int nx1, ny1, nx2, ny2;
+
+        nx1 = math::maximum(x1, Parea2->x1);
+        ny1 = math::maximum(y1, Parea2->y1);
+        nx2 = math::minimum(x2, Parea2->x2);
+        ny2 = math::minimum(y2, Parea2->y2);
+
+        if (Pdst_list->Phead->Pnext == Pdst_list->Ptail) {
+          insert_area_after(Pdst_list, Pdst_list->Phead,
+                            nx1, ny1, nx2, ny2);
+        } else {
+          Area_Ptr Ptemp = Pdst_list->Phead->Pnext;
+          if ((Ptemp->x1 == nx1) && (Ptemp->x2 == nx2)) {
+            if (Ptemp->y1 == (ny2 + 1)) {
+              Ptemp->y1 = ny1;
+              goto next;
+            } else if (Ptemp->y2 == (ny1 - 1)) {
+              Ptemp->y2 = ny2;
+              goto next;
+            }
+          } else if ((Ptemp->y1 == ny1) && (Ptemp->y2 == ny2)) {
+            if (Ptemp->x1 == (nx2 + 1)) {
+              Ptemp->x1 = nx1;
+              goto next;
+            } else if (Ptemp->x2 == (nx1 - 1)) {
+              Ptemp->x2 = nx2;
+              goto next;
+            }
+          }
+
+          insert_area_after(Pdst_list, Pdst_list->Phead,
+                            nx1, ny1, nx2, ny2);
+        }
+      }
+
+    next:
+
+      Parea2 = Parea2->Pnext;
+    }
+
+    Parea1 = Parea1->Pnext;
+  }
+}
+#endif
+
+Area_List_Ptr Area_List_create_optimal(Area_List_Ptr Plist) {
+  Area_Ptr Parea = Plist->Phead->Pnext, Parea_after;
+  int num = 2;
+  Area_List_Ptr Pnew_list;
+
+  while (Parea != Plist->Ptail) {
+    num++;
+    Parea = Parea->Pnext;
+  }
+
+  Pnew_list = Area_List_init(num);
+
+  Parea = Plist->Phead->Pnext;
+
+  Parea_after = Pnew_list->Phead;
+
+  while (Parea != Plist->Ptail) {
+    Parea_after = insert_area_after(Pnew_list, Parea_after,
+                                    Parea->x1, Parea->y1,
+                                    Parea->x2, Parea->y2);
+
+    Parea = Parea->Pnext;
+  }
+
+  return (Pnew_list);
+}
+
+}  // namespace crnlib
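Reviewer sketch (not part of the commit): the splitting step in Area_List_remove can be illustrated in isolation. The block below is a minimal standalone sketch, assuming inclusive integer corners as in crn_arealist.cpp. `Rect`, `subtract_rect` and `total_area` are hypothetical names introduced for illustration and are not part of crnlib. It follows the same order as the original: a full-width strip above the cut, a full-width strip below it, then the left and right pieces clamped to the overlapping rows.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for crnlib's Area: an axis-aligned rectangle with
// inclusive integer corners (x1, y1) and (x2, y2).
struct Rect {
  int x1, y1, x2, y2;
};

// Returns the parts of `a` that are not covered by `cut`, as up to four
// non-overlapping rectangles.
std::vector<Rect> subtract_rect(const Rect& a, const Rect& cut) {
  assert(a.x1 <= a.x2 && a.y1 <= a.y2);
  assert(cut.x1 <= cut.x2 && cut.y1 <= cut.y2);

  std::vector<Rect> out;

  // Not touching: `a` survives unchanged.
  if (cut.x2 < a.x1 || cut.x1 > a.x2 || cut.y2 < a.y1 || cut.y1 > a.y2) {
    out.push_back(a);
    return out;
  }

  // Full-width strip above the cut.
  if (cut.y1 > a.y1) out.push_back({a.x1, a.y1, a.x2, cut.y1 - 1});

  // Full-width strip below the cut.
  if (cut.y2 < a.y2) out.push_back({a.x1, cut.y2 + 1, a.x2, a.y2});

  // Rows shared by `a` and `cut`; the side pieces are limited to these rows
  // so they cannot overlap the strips above.
  const int lo = std::max(cut.y1, a.y1);
  const int hi = std::min(cut.y2, a.y2);

  // Piece to the left of the cut.
  if (cut.x1 > a.x1) out.push_back({a.x1, lo, cut.x1 - 1, hi});

  // Piece to the right of the cut.
  if (cut.x2 < a.x2) out.push_back({cut.x2 + 1, lo, a.x2, hi});

  return out;
}

// Sums the pixel counts of a set of inclusive rectangles.
long long total_area(const std::vector<Rect>& rects) {
  long long sum = 0;
  for (const Rect& r : rects)
    sum += static_cast<long long>(r.x2 - r.x1 + 1) * (r.y2 - r.y1 + 1);
  return sum;
}
```

Cutting a 4x4 hole out of the middle of a 10x10 rectangle with this sketch yields four pieces covering 84 pixels, which is 100 minus the 16 removed.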
@@ -0,0 +1,71 @@
+// File: crn_arealist.h - 2D shape algebra
+// See Copyright Notice and license at the end of inc/crnlib.h
+#pragma once
+
+namespace crnlib {
+struct Area {
+  struct Area *Pprev, *Pnext;
+
+  int x1, y1, x2, y2;
+
+  uint get_width() const { return x2 - x1 + 1; }
+  uint get_height() const { return y2 - y1 + 1; }
+  uint get_area() const { return get_width() * get_height(); }
+};
+
+typedef Area* Area_Ptr;
+
+struct Area_List {
+  int total_areas;
+  int next_free;
+
+  Area *Phead, *Ptail, *Pfree;
+};
+
+typedef Area_List* Area_List_Ptr;
+
+Area_List* Area_List_init(int max_areas);
+void Area_List_deinit(Area_List* Pobj_base);
+
+void Area_List_print(Area_List* Plist);
+
+Area_List* Area_List_dup_new(Area_List* Plist,
+                             int x_ofs, int y_ofs);
+
+uint Area_List_get_num(Area_List* Plist);
+
+// src and dst area lists must have the same number of total areas.
+void Area_List_dup(Area_List* Psrc_list,
+                   Area_List* Pdst_list,
+                   int x_ofs, int y_ofs);
+
+void Area_List_copy(Area_List* Psrc_list,
+                    Area_List* Pdst_list,
+                    int x_ofs, int y_ofs);
+
+void Area_List_clear(Area_List* Plist);
+
+void Area_List_set(Area_List* Plist,
+                   int x1, int y1, int x2, int y2);
+
+// logical: x and (not y)
+void Area_List_remove(Area_List* Plist,
+                      int x1, int y1, int x2, int y2);
+
+// logical: x or y
+void Area_List_insert(Area_List* Plist,
+                      int x1, int y1, int x2, int y2,
+                      bool combine);
+
+// logical: x and y
+void Area_List_intersect_area(Area_List* Plist,
+                              int x1, int y1, int x2, int y2);
+
+// logical: x and y
+void Area_List_intersect_Area_List(Area_List* Pouter_list,
+                                   Area_List* Pinner_list,
+                                   Area_List* Pdst_list);
+
+Area_List_Ptr Area_List_create_optimal(Area_List_Ptr Plist);
+
+}  // namespace crnlib
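Reviewer sketch (not part of the commit): the `total_areas`, `next_free` and `Pfree` fields of `Area_List` describe a fixed-capacity pool that reuses released slots through a free list and otherwise hands out the next never-used slot. A minimal standalone sketch of that allocation scheme follows. `Node` and `NodePool` are hypothetical names, and unlike crnlib, which raises a fatal error when the pool is exhausted, this version returns `nullptr`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical node type. `next` doubles as the free-list link while the
// node is released, as Pnext does for a freed Area.
struct Node {
  Node* next = nullptr;
  int value = 0;
};

// Fixed-capacity pool: released slots are reused first through an intrusive
// singly linked free list; otherwise the next never-used slot is handed out.
class NodePool {
 public:
  explicit NodePool(std::size_t capacity) : slots_(capacity) {}

  // Returns a slot, or nullptr when every slot is in use.
  Node* alloc() {
    if (free_ != nullptr) {
      Node* p = free_;
      free_ = p->next;
      p->next = nullptr;
      return p;
    }
    if (next_unused_ == slots_.size())
      return nullptr;
    return &slots_[next_unused_++];
  }

  // Pushes a slot back onto the free list.
  void release(Node* p) {
    assert(p != nullptr);
    p->next = free_;
    free_ = p;
  }

 private:
  std::vector<Node> slots_;      // never resized, so slot addresses stay valid
  std::size_t next_unused_ = 0;  // plays the role of Area_List::next_free
  Node* free_ = nullptr;         // plays the role of Area_List::Pfree
};
```

Because the backing storage is allocated once and never grows, pointers handed out by the pool remain stable for its whole lifetime, which is what lets the list nodes link to each other by raw pointer.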
@@ -0,0 +1,63 @@
+// File: crn_assert.cpp
+// See Copyright Notice and license at the end of inc/crnlib.h
+#include "crn_core.h"
+#if CRNLIB_USE_WIN32_API
+#include "crn_winhdr.h"
+#endif
+
+static bool g_fail_exceptions;
+static bool g_exit_on_failure = true;
+
+void crnlib_enable_fail_exceptions(bool enabled) {
+  g_fail_exceptions = enabled;
+}
+
+void crnlib_assert(const char* pExp, const char* pFile, unsigned line) {
+  char buf[512];
+
+  sprintf_s(buf, sizeof(buf), "%s(%u): Assertion failed: \"%s\"\n", pFile, line, pExp);
+
+  crnlib_output_debug_string(buf);
+
+  fputs(buf, stderr);
+
+  if (crnlib_is_debugger_present())
+    crnlib_debug_break();
+}
+
+void crnlib_fail(const char* pExp, const char* pFile, unsigned line) {
+  char buf[512];
+
+  sprintf_s(buf, sizeof(buf), "%s(%u): Failure: \"%s\"\n", pFile, line, pExp);
+
+  crnlib_output_debug_string(buf);
+
+  fputs(buf, stderr);
+
+  if (crnlib_is_debugger_present())
+    crnlib_debug_break();
+
+#if CRNLIB_USE_WIN32_API
+  if (g_fail_exceptions)
+    RaiseException(CRNLIB_FAIL_EXCEPTION_CODE, 0, 0, NULL);
+  else
+#endif
+  if (g_exit_on_failure)
+    exit(EXIT_FAILURE);
+}
+
+void trace(const char* pFmt, va_list args) {
+  if (crnlib_is_debugger_present()) {
+    char buf[512];
+    vsprintf_s(buf, sizeof(buf), pFmt, args);
+
+    crnlib_output_debug_string(buf);
+  }
+}
+
+void trace(const char* pFmt, ...) {
+  va_list args;
+  va_start(args, pFmt);
+  trace(pFmt, args);
+  va_end(args);
+}
@@ -0,0 +1,67 @@
+// File: crn_assert.h
+// See Copyright Notice and license at the end of inc/crnlib.h
+#pragma once
+
+const unsigned int CRNLIB_FAIL_EXCEPTION_CODE = 256U;
+void crnlib_enable_fail_exceptions(bool enabled);
+
+void crnlib_assert(const char* pExp, const char* pFile, unsigned line);
+void crnlib_fail(const char* pExp, const char* pFile, unsigned line);
+
+#ifdef NDEBUG
+#define CRNLIB_ASSERT(x) ((void)0)
+#undef CRNLIB_ASSERTS_ENABLED
+#else
+#define CRNLIB_ASSERT(_exp) (void)((!!(_exp)) || (crnlib_assert(#_exp, __FILE__, __LINE__), 0))
+#define CRNLIB_ASSERTS_ENABLED
+#endif
+
+#define CRNLIB_VERIFY(_exp) (void)((!!(_exp)) || (crnlib_assert(#_exp, __FILE__, __LINE__), 0))
+
+#define CRNLIB_FAIL(msg)                   \
+  do {                                     \
+    crnlib_fail(#msg, __FILE__, __LINE__); \
+  } while (0)
+
+#define CRNLIB_ASSERT_OPEN_RANGE(x, l, h) CRNLIB_ASSERT(((x) >= (l)) && ((x) < (h)))
+#define CRNLIB_ASSERT_CLOSED_RANGE(x, l, h) CRNLIB_ASSERT(((x) >= (l)) && ((x) <= (h)))
+
+void trace(const char* pFmt, va_list args);
+void trace(const char* pFmt, ...);
+
+// Borrowed from boost libraries.
+template <bool x>
+struct crnlib_assume_failure;
+template <>
+struct crnlib_assume_failure<true> {
+  enum { blah = 1 };
+};
+template <int x>
+struct crnlib_assume_try {};
+
+#define CRNLIB_JOINER_FINAL(a, b) a##b
+#define CRNLIB_JOINER(a, b) CRNLIB_JOINER_FINAL(a, b)
+#define CRNLIB_JOIN(a, b) CRNLIB_JOINER(a, b)
+#define CRNLIB_ASSUME(p) typedef crnlib_assume_try<sizeof(crnlib_assume_failure<(bool)(p)>)> CRNLIB_JOIN(crnlib_assume_typedef, __COUNTER__)
+
+#ifdef NDEBUG
+template <typename T>
+inline T crnlib_assert_range(T i, T) {
+  return i;
+}
+template <typename T>
+inline T crnlib_assert_range_incl(T i, T) {
+  return i;
+}
+#else
+template <typename T>
+inline T crnlib_assert_range(T i, T m) {
+  CRNLIB_ASSERT((i >= 0) && (i < m));
+  return i;
+}
+template <typename T>
+inline T crnlib_assert_range_incl(T i, T m) {
+  CRNLIB_ASSERT((i >= 0) && (i <= m));
+  return i;
+}
+#endif
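Range-check macros such as CRNLIB_ASSERT_OPEN_RANGE paste their arguments textually, so the result depends on how each argument is parenthesized inside the macro body. The sketch below uses two hypothetical macros (IN_RANGE_UNSAFE and IN_RANGE_SAFE, neither is part of crnlib) to show how a conditional expression passed as an argument can change meaning when the parentheses are missing. It is an illustration under those assumptions, not crnlib code.

```cpp
#include <cassert>

// Hypothetical macros for illustration only; they are not part of crnlib.
// UNSAFE pastes its arguments as-is; SAFE wraps each argument in parentheses.
#define IN_RANGE_UNSAFE(x, l, h) ((x >= l) && (x < h))
#define IN_RANGE_SAFE(x, l, h) (((x) >= (l)) && ((x) < (h)))

// Checks whether the chosen value (50 or 5) lies in the half-open range [lo, hi).
inline bool choice_in_range_safe(bool pick_big, int lo, int hi) {
  return IN_RANGE_SAFE(pick_big ? 50 : 5, lo, hi);
}

// Same call through the unsafe macro. `pick_big ? 50 : 5 >= lo` parses as
// `pick_big ? 50 : (5 >= lo)`, so when pick_big is true both clauses
// evaluate to 50 (non-zero) and the range is never actually checked.
inline bool choice_in_range_unsafe(bool pick_big, int lo, int hi) {
  return IN_RANGE_UNSAFE(pick_big ? 50 : 5, lo, hi);
}
```

With lo = 0 and hi = 10, the safe variant rejects 50 while the unsafe variant accepts it, which is the kind of silent miss a range assertion is supposed to prevent.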
@@ -0,0 +1,184 @@
+// File: crn_atomics.h
+#ifndef CRN_ATOMICS_H
+#define CRN_ATOMICS_H
+
+#ifdef WIN32
+#pragma once
+#endif
+
+#ifdef WIN32
+#include "crn_winhdr.h"
+#endif
+
+#if defined(__GNUC__) && CRNLIB_PLATFORM_PC
+extern __inline__ __attribute__((__always_inline__, __gnu_inline__)) void crnlib_yield_processor() {
+  __asm__ __volatile__("pause");
+}
+#else
+CRNLIB_FORCE_INLINE void crnlib_yield_processor() {
+#if CRNLIB_USE_MSVC_INTRINSICS
+#if CRNLIB_PLATFORM_PC_X64
+  _mm_pause();
+#else
+  YieldProcessor();
+#endif
+#else
+// No implementation
+#endif
+}
+#endif
+
+#if CRNLIB_USE_WIN32_ATOMIC_FUNCTIONS
+extern "C" __int64 _InterlockedCompareExchange64(__int64 volatile* Destination, __int64 Exchange, __int64 Comperand);
+#if defined(_MSC_VER)
+#pragma intrinsic(_InterlockedCompareExchange64)
+#endif
+#endif  // CRNLIB_USE_WIN32_ATOMIC_FUNCTIONS
+
+namespace crnlib {
+#if CRNLIB_USE_WIN32_ATOMIC_FUNCTIONS
+typedef LONG atomic32_t;
+typedef LONGLONG atomic64_t;
+
+// Returns the original value.
+inline atomic32_t atomic_compare_exchange32(atomic32_t volatile* pDest, atomic32_t exchange, atomic32_t comparand) {
+  CRNLIB_ASSERT((reinterpret_cast<ptr_bits_t>(pDest) & 3) == 0);
+  return InterlockedCompareExchange(pDest, exchange, comparand);
+}
+
+// Returns the original value.
+inline atomic64_t atomic_compare_exchange64(atomic64_t volatile* pDest, atomic64_t exchange, atomic64_t comparand) {
+  CRNLIB_ASSERT((reinterpret_cast<ptr_bits_t>(pDest) & 7) == 0);
+  return _InterlockedCompareExchange64(pDest, exchange, comparand);
+}
+
+// Returns the resulting incremented value.
+inline atomic32_t atomic_increment32(atomic32_t volatile* pDest) {
+  CRNLIB_ASSERT((reinterpret_cast<ptr_bits_t>(pDest) & 3) == 0);
+  return InterlockedIncrement(pDest);
+}
+
+// Returns the resulting decremented value.
+inline atomic32_t atomic_decrement32(atomic32_t volatile* pDest) {
+  CRNLIB_ASSERT((reinterpret_cast<ptr_bits_t>(pDest) & 3) == 0);
+  return InterlockedDecrement(pDest);
+}
+
+// Returns the original value.
+inline atomic32_t atomic_exchange32(atomic32_t volatile* pDest, atomic32_t val) {
+  CRNLIB_ASSERT((reinterpret_cast<ptr_bits_t>(pDest) & 3) == 0);
+  return InterlockedExchange(pDest, val);
+}
+
+// Returns the resulting value.
+inline atomic32_t atomic_add32(atomic32_t volatile* pDest, atomic32_t val) {
+  CRNLIB_ASSERT((reinterpret_cast<ptr_bits_t>(pDest) & 3) == 0);
+  return InterlockedExchangeAdd(pDest, val) + val;
+}
+
+// Returns the original value.
+inline atomic32_t atomic_exchange_add32(atomic32_t volatile* pDest, atomic32_t val) {
+  CRNLIB_ASSERT((reinterpret_cast<ptr_bits_t>(pDest) & 3) == 0);
+  return InterlockedExchangeAdd(pDest, val);
+}
+#elif CRNLIB_USE_GCC_ATOMIC_BUILTINS
+typedef long atomic32_t;
+typedef long long atomic64_t;
+
+// Returns the original value.
+inline atomic32_t atomic_compare_exchange32(atomic32_t volatile* pDest, atomic32_t exchange, atomic32_t comparand) {
+  CRNLIB_ASSERT((reinterpret_cast<ptr_bits_t>(pDest) & 3) == 0);
+  return __sync_val_compare_and_swap(pDest, comparand, exchange);
+}
+
+// Returns the original value.
+inline atomic64_t atomic_compare_exchange64(atomic64_t volatile* pDest, atomic64_t exchange, atomic64_t comparand) {
+  CRNLIB_ASSERT((reinterpret_cast<ptr_bits_t>(pDest) & 7) == 0);
+  return __sync_val_compare_and_swap(pDest, comparand, exchange);
+}
+
+// Returns the resulting incremented value.
+inline atomic32_t atomic_increment32(atomic32_t volatile* pDest) {
+  CRNLIB_ASSERT((reinterpret_cast<ptr_bits_t>(pDest) & 3) == 0);
+  return __sync_add_and_fetch(pDest, 1);
+}
+
+// Returns the resulting decremented value.
+inline atomic32_t atomic_decrement32(atomic32_t volatile* pDest) {
+  CRNLIB_ASSERT((reinterpret_cast<ptr_bits_t>(pDest) & 3) == 0);
+  return __sync_sub_and_fetch(pDest, 1);
+}
+
+// Returns the original value. Note: GCC documents __sync_lock_test_and_set as an acquire barrier only, not a full barrier.
+inline atomic32_t atomic_exchange32(atomic32_t volatile* pDest, atomic32_t val) {
+  CRNLIB_ASSERT((reinterpret_cast<ptr_bits_t>(pDest) & 3) == 0);
+  return __sync_lock_test_and_set(pDest, val);
+}
+
+// Returns the resulting value.
+inline atomic32_t atomic_add32(atomic32_t volatile* pDest, atomic32_t val) {
+  CRNLIB_ASSERT((reinterpret_cast<ptr_bits_t>(pDest) & 3) == 0);
+  return __sync_add_and_fetch(pDest, val);
+}
+
+// Returns the original value.
+inline atomic32_t atomic_exchange_add32(atomic32_t volatile* pDest, atomic32_t val) {
+  CRNLIB_ASSERT((reinterpret_cast<ptr_bits_t>(pDest) & 3) == 0);
+  return __sync_fetch_and_add(pDest, val);
+}
+#else
+#define CRNLIB_NO_ATOMICS 1
+
+// Atomic ops not supported - but try to do something reasonable. Assumes no threading at all.
+typedef long atomic32_t;
+typedef long long atomic64_t;
+
+inline atomic32_t atomic_compare_exchange32(atomic32_t volatile* pDest, atomic32_t exchange, atomic32_t comparand) {
+  CRNLIB_ASSERT((reinterpret_cast<ptr_bits_t>(pDest) & 3) == 0);
+  atomic32_t cur = *pDest;
+  if (cur == comparand)
+    *pDest = exchange;
+  return cur;
+}
+
+inline atomic64_t atomic_compare_exchange64(atomic64_t volatile* pDest, atomic64_t exchange, atomic64_t comparand) {
+  CRNLIB_ASSERT((reinterpret_cast<ptr_bits_t>(pDest) & 7) == 0);
+  atomic64_t cur = *pDest;
+  if (cur == comparand)
+    *pDest = exchange;
+  return cur;
+}
+
+inline atomic32_t atomic_increment32(atomic32_t volatile* pDest) {
+  CRNLIB_ASSERT((reinterpret_cast<ptr_bits_t>(pDest) & 3) == 0);
+  return (*pDest += 1);
+}
+
+inline atomic32_t atomic_decrement32(atomic32_t volatile* pDest) {
+  CRNLIB_ASSERT((reinterpret_cast<ptr_bits_t>(pDest) & 3) == 0);
+  return (*pDest -= 1);
+}
+
+inline atomic32_t atomic_exchange32(atomic32_t volatile* pDest, atomic32_t val) {
+  CRNLIB_ASSERT((reinterpret_cast<ptr_bits_t>(pDest) & 3) == 0);
+  atomic32_t cur = *pDest;
+  *pDest = val;
+  return cur;
+}
+
+inline atomic32_t atomic_add32(atomic32_t volatile* pDest, atomic32_t val) {
+  CRNLIB_ASSERT((reinterpret_cast<ptr_bits_t>(pDest) & 3) == 0);
+  return (*pDest += val);
+}
+
+inline atomic32_t atomic_exchange_add32(atomic32_t volatile* pDest, atomic32_t val) {
+  CRNLIB_ASSERT((reinterpret_cast<ptr_bits_t>(pDest) & 3) == 0);
+  atomic32_t cur = *pDest;
+  *pDest += val;
+  return cur;
+}
+#endif
+
+}  // namespace crnlib
+
+#endif  // CRN_ATOMICS_H
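The compare-exchange wrappers in this header return the original value, which is the convention a retry loop needs: the caller compares the returned value with what it expected and retries on a mismatch. A minimal sketch of that pattern follows. It assumes std::atomic rather than crnlib's own wrappers, and the names cas_returning_original and atomic_store_max are hypothetical helpers introduced only for this example.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper mirroring the "returns the original value" convention,
// built on std::atomic instead of crnlib's platform-specific wrappers.
inline long cas_returning_original(std::atomic<long>& dest, long exchange, long comparand) {
  long expected = comparand;
  // On failure compare_exchange_strong stores the current value in `expected`;
  // on success `expected` still equals the value that was replaced.
  dest.compare_exchange_strong(expected, exchange);
  return expected;
}

// Raises `dest` to at least `candidate`. Returns the last value observed
// before any update made by this call.
inline long atomic_store_max(std::atomic<long>& dest, long candidate) {
  long seen = dest.load();
  while (seen < candidate) {
    const long original = cas_returning_original(dest, candidate, seen);
    if (original == seen)
      break;          // our swap went through
    seen = original;  // another writer got in first; retry against its value
  }
  return seen;
}
```

The loop only retries when the returned value differs from the one it read, so a stale read costs one extra iteration rather than a lost update.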
@@ -0,0 +1,178 @@
+// File: crn_buffer_stream.h
+// See Copyright Notice and license at the end of inc/crnlib.h
+#pragma once
+#include "crn_data_stream.h"
+
+namespace crnlib {
+class buffer_stream : public data_stream {
+ public:
+  buffer_stream()
+      : data_stream(),
+        m_pBuf(NULL),
+        m_size(0),
+        m_ofs(0) {
+  }
+
+  buffer_stream(void* p, uint size)
+      : data_stream(),
+        m_pBuf(NULL),
+        m_size(0),
+        m_ofs(0) {
+    open(p, size);
+  }
+
+  buffer_stream(const void* p, uint size)
+      : data_stream(),
+        m_pBuf(NULL),
+        m_size(0),
+        m_ofs(0) {
+    open(p, size);
+  }
+
+  virtual ~buffer_stream() {
+  }
+
+  bool open(const void* p, uint size) {
+    CRNLIB_ASSERT(p);
+
+    close();
+
+    if ((!p) || (!size))
+      return false;
+
+    m_opened = true;
+    m_pBuf = const_cast<uint8*>(static_cast<const uint8*>(p));
+    m_size = size;
+    m_ofs = 0;
+    m_attribs = cDataStreamSeekable | cDataStreamReadable;
+    return true;
+  }
+
+  bool open(void* p, uint size) {
+    CRNLIB_ASSERT(p);
+
+    close();
+
+    if ((!p) || (!size))
+      return false;
+
+    m_opened = true;
+    m_pBuf = static_cast<uint8*>(p);
+    m_size = size;
+    m_ofs = 0;
+    m_attribs = cDataStreamSeekable | cDataStreamWritable | cDataStreamReadable;
+    return true;
+  }
+
+  virtual bool close() {
+    if (m_opened) {
+      m_opened = false;
+      m_pBuf = NULL;
+      m_size = 0;
+      m_ofs = 0;
+      return true;
+    }
+
+    return false;
+  }
+
+  const void* get_buf() const { return m_pBuf; }
+  void* get_buf() { return m_pBuf; }
+
+  virtual const void* get_ptr() const { return m_pBuf; }
+
+  virtual uint read(void* pBuf, uint len) {
+    CRNLIB_ASSERT(pBuf && (len <= 0x7FFFFFFF));
+
+    if ((!m_opened) || (!is_readable()) || (!len))
+      return 0;
+
+    CRNLIB_ASSERT(m_ofs <= m_size);
+
+    uint bytes_left = m_size - m_ofs;
+
+    len = math::minimum<uint>(len, bytes_left);
+
+    if (len)
+      memcpy(pBuf, &m_pBuf[m_ofs], len);
+
+    m_ofs += len;
+
+    return len;
+  }
+
+  virtual uint write(const void* pBuf, uint len) {
+    CRNLIB_ASSERT(pBuf && (len <= 0x7FFFFFFF));
+
+    if ((!m_opened) || (!is_writable()) || (!len))
+      return 0;
+
+    CRNLIB_ASSERT(m_ofs <= m_size);
+
+    uint bytes_left = m_size - m_ofs;
+
+    len = math::minimum<uint>(len, bytes_left);
+
+    if (len)
+      memcpy(&m_pBuf[m_ofs], pBuf, len);
+
+    m_ofs += len;
+
+    return len;
+  }
+
+  virtual bool flush() {
+    if (!m_opened)
+      return false;
+
+    return true;
+  }
+
+  virtual uint64 get_size() {
+    if (!m_opened)
+      return 0;
+
+    return m_size;
+  }
+
+  virtual uint64 get_remaining() {
+    if (!m_opened)
+      return 0;
+
+    CRNLIB_ASSERT(m_ofs <= m_size);
+
+    return m_size - m_ofs;
+  }
+
+  virtual uint64 get_ofs() {
+    if (!m_opened)
+      return 0;
+
+    return m_ofs;
+  }
+
+  virtual bool seek(int64 ofs, bool relative) {
+    if ((!m_opened) || (!is_seekable()))
+      return false;
+
+    int64 new_ofs = relative ? (m_ofs + ofs) : ofs;
+
+    if (new_ofs < 0)
+      return false;
+    else if (new_ofs > m_size)
+      return false;
+
+    m_ofs = static_cast<uint>(new_ofs);
+
+    post_seek();
+
+    return true;
+  }
+
+ private:
+  uint8* m_pBuf;
+  uint m_size;
+  uint m_ofs;
+};
+
+}  // namespace crnlib
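The read and seek paths of buffer_stream both rely on clamping against the buffer size: a read copies at most the bytes that remain, and a seek is rejected when the target falls outside the buffer. The sketch below restates that behaviour in a standalone form. The mem_reader type is a hypothetical stand-in written for this example and does not derive from data_stream or use any crnlib types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for a read-only memory stream; not crnlib's buffer_stream.
struct mem_reader {
  const unsigned char* buf;
  std::size_t size;
  std::size_t ofs;

  // Copies at most `len` bytes, clamped to what remains, and returns the count copied.
  std::size_t read(void* dst, std::size_t len) {
    const std::size_t left = size - ofs;
    if (len > left)
      len = left;
    if (len)
      std::memcpy(dst, buf + ofs, len);
    ofs += len;
    return len;
  }

  // Moves the cursor; rejects targets outside [0, size].
  bool seek(long long target, bool relative) {
    const long long new_ofs = relative ? static_cast<long long>(ofs) + target : target;
    if (new_ofs < 0 || new_ofs > static_cast<long long>(size))
      return false;
    ofs = static_cast<std::size_t>(new_ofs);
    return true;
  }
};
```

Seeking to exactly `size` is allowed, which leaves the cursor at end-of-buffer so that the next read returns zero bytes.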
@@ -0,0 +1,215 @@
+// File: crn_cfile_stream.h
+// See Copyright Notice and license at the end of inc/crnlib.h
+#pragma once
+#include "crn_data_stream.h"
+
+namespace crnlib {
+class cfile_stream : public data_stream {
+ public:
+  cfile_stream()
+      : data_stream(), m_pFile(NULL), m_size(0), m_ofs(0), m_has_ownership(false) {
+  }
+
+  cfile_stream(FILE* pFile, const char* pFilename, uint attribs, bool has_ownership)
+      : data_stream(), m_pFile(NULL), m_size(0), m_ofs(0), m_has_ownership(false) {
+    open(pFile, pFilename, attribs, has_ownership);
+  }
+
+  cfile_stream(const char* pFilename, uint attribs = cDataStreamReadable | cDataStreamSeekable, bool open_existing = false)
+      : data_stream(), m_pFile(NULL), m_size(0), m_ofs(0), m_has_ownership(false) {
+    open(pFilename, attribs, open_existing);
+  }
+
+  virtual ~cfile_stream() {
+    close();
+  }
+
+  virtual bool close() {
+    clear_error();
+
+    if (m_opened) {
+      bool status = true;
+      if (m_has_ownership) {
+        if (EOF == fclose(m_pFile))
+          status = false;
+      }
+
+      m_pFile = NULL;
+      m_opened = false;
+      m_size = 0;
+      m_ofs = 0;
+      m_has_ownership = false;
+
+      return status;
+    }
+
+    return false;
+  }
+
+  bool open(FILE* pFile, const char* pFilename, uint attribs, bool has_ownership) {
+    CRNLIB_ASSERT(pFile);
+    CRNLIB_ASSERT(pFilename);
+
+    close();
+
+    set_name(pFilename);
+    m_pFile = pFile;
+    m_has_ownership = has_ownership;
+    m_attribs = static_cast<uint16>(attribs);
+
+    m_ofs = crn_ftell(m_pFile);
+    crn_fseek(m_pFile, 0, SEEK_END);
+    m_size = crn_ftell(m_pFile);
+    crn_fseek(m_pFile, m_ofs, SEEK_SET);
+
+    m_opened = true;
+
+    return true;
+  }
+
+  bool open(const char* pFilename, uint attribs = cDataStreamReadable | cDataStreamSeekable, bool open_existing = false) {
+    CRNLIB_ASSERT(pFilename);
+
+    close();
+
+    m_attribs = static_cast<uint16>(attribs);
+
+    const char* pMode;
+    if ((is_readable()) && (is_writable()))
+      pMode = open_existing ? "r+b" : "w+b";
+    else if (is_writable())
+      pMode = open_existing ? "ab" : "wb";
+    else if (is_readable())
+      pMode = "rb";
+    else {
+      set_error();
+      return false;
+    }
+
+    FILE* pFile = NULL;
+    crn_fopen(&pFile, pFilename, pMode);
+    m_has_ownership = true;
+
+    if (!pFile) {
+      set_error();
+      return false;
+    }
+
+    // TODO: Change stream class to support UCS2 filenames.
+
+    return open(pFile, pFilename, attribs, true);
+  }
+
+  FILE* get_file() const { return m_pFile; }
+
+  virtual uint read(void* pBuf, uint len) {
+    CRNLIB_ASSERT(pBuf && (len <= 0x7FFFFFFF));
+
+    if (!m_opened || (!is_readable()) || (!len))
+      return 0;
+
+    len = static_cast<uint>(math::minimum<uint64>(len, get_remaining()));
+
+    if (fread(pBuf, 1, len, m_pFile) != len) {
+      set_error();
+      return 0;
+    }
+
+    m_ofs += len;
+    return len;
+  }
+
+  virtual uint write(const void* pBuf, uint len) {
+    CRNLIB_ASSERT(pBuf && (len <= 0x7FFFFFFF));
+
+    if (!m_opened || (!is_writable()) || (!len))
+      return 0;
+
+    if (fwrite(pBuf, 1, len, m_pFile) != len) {
+      set_error();
+      return 0;
+    }
+
+    m_ofs += len;
+    m_size = math::maximum(m_size, m_ofs);
+
+    return len;
+  }
+
+  virtual bool flush() {
+    if ((!m_opened) || (!is_writable()))
+      return false;
+
+    if (EOF == fflush(m_pFile)) {
+      set_error();
+      return false;
+    }
+
+    return true;
+  }
+
+  virtual uint64 get_size() {
+    if (!m_opened)
+      return 0;
+
+    return m_size;
+  }
+
+  virtual uint64 get_remaining() {
+    if (!m_opened)
+      return 0;
+
+    CRNLIB_ASSERT(m_ofs <= m_size);
+    return m_size - m_ofs;
+  }
+
+  virtual uint64 get_ofs() {
+    if (!m_opened)
+      return 0;
+
+    return m_ofs;
+  }
+
+  virtual bool seek(int64 ofs, bool relative) {
+    if ((!m_opened) || (!is_seekable()))
+      return false;
+
+    int64 new_ofs = relative ? (m_ofs + ofs) : ofs;
+    if (new_ofs < 0)
+      return false;
+    else if (static_cast<uint64>(new_ofs) > m_size)
+      return false;
+
+    if (static_cast<uint64>(new_ofs) != m_ofs) {
+      if (crn_fseek(m_pFile, new_ofs, SEEK_SET) != 0) {
+        set_error();
+        return false;
+      }
+
+      m_ofs = new_ofs;
+    }
+
+    return true;
+  }
+
+  static bool read_file_into_array(const char* pFilename, vector<uint8>& buf) {
+    cfile_stream in_stream(pFilename);
+    if (!in_stream.is_opened())
+      return false;
+    return in_stream.read_array(buf);
+  }
+
+  static bool write_array_to_file(const char* pFilename, const vector<uint8>& buf) {
+    cfile_stream out_stream(pFilename, cDataStreamWritable | cDataStreamSeekable);
+    if (!out_stream.is_opened())
+      return false;
+    return out_stream.write_array(buf);
+  }
+
+ private:
+  FILE* m_pFile;
+  uint64 m_size, m_ofs;
+  bool m_has_ownership;
+};
+
+}  // namespace crnlib
@@ -0,0 +1,58 @@
+// File: crn_checksum.cpp
+#include "crn_core.h"
+
+namespace crnlib {
+// From the public domain stb.h header.
+uint adler32(const void* pBuf, size_t buflen, uint adler32) {
+  const uint8* buffer = static_cast<const uint8*>(pBuf);
+
+  const unsigned long ADLER_MOD = 65521;
+  unsigned long s1 = adler32 & 0xffff, s2 = adler32 >> 16;
+  size_t blocklen;
+  unsigned long i;
+
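+  blocklen = buflen % 5552;  // Handle the odd-sized remainder first; later blocks are 5552 bytes, the longest run whose sums still fit in 32 bits.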
+  while (buflen) {
+    for (i = 0; i + 7 < blocklen; i += 8) {
+      s1 += buffer[0], s2 += s1;
+      s1 += buffer[1], s2 += s1;
+      s1 += buffer[2], s2 += s1;
+      s1 += buffer[3], s2 += s1;
+      s1 += buffer[4], s2 += s1;
+      s1 += buffer[5], s2 += s1;
+      s1 += buffer[6], s2 += s1;
+      s1 += buffer[7], s2 += s1;
+
+      buffer += 8;
+    }
+
+    for (; i < blocklen; ++i)
+      s1 += *buffer++, s2 += s1;
+
+    s1 %= ADLER_MOD, s2 %= ADLER_MOD;
+    buflen -= blocklen;
+    blocklen = 5552;
+  }
+  return (s2 << 16) + s1;
+}
+
+uint16 crc16(const void* pBuf, size_t len, uint16 crc) {
+  crc = ~crc;
+
+  const uint8* p = reinterpret_cast<const uint8*>(pBuf);
+  while (len) {
+    const uint16 q = *p++ ^ (crc >> 8);
+    crc <<= 8U;
+    uint16 r = (q >> 4) ^ q;
+    crc ^= r;
+    r <<= 5U;
+    crc ^= r;
+    r <<= 7U;
+    crc ^= r;
+    len--;
+  }
+
+  return static_cast<uint16>(~crc);
+}
+
+}  // namespace crnlib
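The unrolled adler32() above defers the modulo to the end of each block for speed. A byte-at-a-time version that reduces after every step is slower but easier to check by hand, and it makes the running-checksum convention explicit: the previous result is passed back in as the seed. The sketch below is such a reference, written under the assumption that it should match the standard Adler-32 definition. The name adler32_ref is hypothetical and the function is not part of crnlib.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical reference implementation of Adler-32; not crnlib's adler32().
// `adler` is the running checksum; 1 is the standard initial value.
inline std::uint32_t adler32_ref(const void* buf, std::size_t len, std::uint32_t adler = 1u) {
  const std::uint8_t* p = static_cast<const std::uint8_t*>(buf);
  std::uint32_t s1 = adler & 0xffffu;  // low half: running byte sum
  std::uint32_t s2 = adler >> 16;      // high half: running sum of s1 values
  for (std::size_t i = 0; i < len; ++i) {
    s1 = (s1 + p[i]) % 65521u;  // 65521 is the largest prime below 2^16
    s2 = (s2 + s1) % 65521u;
  }
  return (s2 << 16) | s1;
}
```

Feeding a buffer in pieces, with each call seeded by the previous result, gives the same checksum as a single call over the whole buffer.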
@@ -0,0 +1,12 @@
+// File: crn_checksum.h
+#pragma once
+
+namespace crnlib {
+const uint cInitAdler32 = 1U;
+uint adler32(const void* pBuf, size_t buflen, uint adler32 = cInitAdler32);
+
+// crc16() intended for small buffers - doesn't use an acceleration table.
+const uint cInitCRC16 = 0;
+uint16 crc16(const void* pBuf, size_t len, uint16 crc = cInitCRC16);
+
+}  // namespace crnlib
@@ -0,0 +1,700 @@
+// File: crn_clusterizer.h
+// See Copyright Notice and license at the end of inc/crnlib.h
+#pragma once
+#include "crn_matrix.h"
+
+namespace crnlib {
+template <typename VectorType>
+class clusterizer {
+ public:
+  clusterizer()
+      : m_overall_variance(0.0f),
+        m_split_index(0),
+        m_heap_size(0),
+        m_quick(false) {
+  }
+
+  void clear() {
+    m_training_vecs.clear();
+    m_codebook.clear();
+    m_nodes.clear();
+    m_overall_variance = 0.0f;
+    m_split_index = 0;
+    m_heap_size = 0;
+    m_quick = false;
+  }
+
+  void reserve_training_vecs(uint num_expected) {
+    m_training_vecs.reserve(num_expected);
+  }
+
+  void add_training_vec(const VectorType& v, uint weight) {
+    m_training_vecs.push_back(std::make_pair(v, weight));
+  }
+
+  typedef bool (*progress_callback_func_ptr)(uint percentage_completed, void* pData);
+
+  bool generate_codebook(uint max_size, progress_callback_func_ptr pProgress_callback = NULL, void* pProgress_data = NULL, bool quick = false) {
+    if (m_training_vecs.empty())
+      return false;
+
+    m_quick = quick;
+
+    double ttsum = 0.0f;
+
+    vq_node root;
+    root.m_vectors.reserve(m_training_vecs.size());
+
+    for (uint i = 0; i < m_training_vecs.size(); i++) {
+      const VectorType& v = m_training_vecs[i].first;
+      const uint weight = m_training_vecs[i].second;
+
+      root.m_centroid += (v * (float)weight);
+      root.m_total_weight += weight;
+      root.m_vectors.push_back(i);
+
+      ttsum += v.dot(v) * weight;
+    }
+
+    root.m_variance = (float)(ttsum - (root.m_centroid.dot(root.m_centroid) / root.m_total_weight));
+
+    root.m_centroid *= (1.0f / root.m_total_weight);
+
+    m_nodes.clear();
+    m_nodes.reserve(max_size * 2 + 1);
+
+    m_nodes.push_back(root);
+
+    m_heap.resize(max_size + 1);
+    m_heap[1] = 0;
+    m_heap_size = 1;
+
+    m_split_index = 0;
+
+    uint total_leaves = 1;
+
+    m_left_children.reserve(m_training_vecs.size() + 1);
+    m_right_children.reserve(m_training_vecs.size() + 1);
+
+    int prev_percentage = -1;
+    while ((total_leaves < max_size) && (m_heap_size)) {
+      int worst_node_index = m_heap[1];
+
+      m_heap[1] = m_heap[m_heap_size];
+      m_heap_size--;
+      if (m_heap_size)
+        down_heap(1);
+
+      split_node(worst_node_index);
+      total_leaves++;
+
+      if ((pProgress_callback) && ((total_leaves & 63) == 0) && (max_size)) {
+        int cur_percentage = (total_leaves * 100U + (max_size / 2U)) / max_size;
+        if (cur_percentage != prev_percentage) {
+          if (!(*pProgress_callback)(cur_percentage, pProgress_data))
+            return false;
+
+          prev_percentage = cur_percentage;
+        }
+      }
+    }
+
+    m_codebook.clear();
+
+    m_overall_variance = 0.0f;
+
+    for (uint i = 0; i < m_nodes.size(); i++) {
+      vq_node& node = m_nodes[i];
+      if (node.m_left != -1) {
+        CRNLIB_ASSERT(node.m_right != -1);
+        continue;
+      }
+
+      CRNLIB_ASSERT((node.m_left == -1) && (node.m_right == -1));
+
+      node.m_codebook_index = m_codebook.size();
+      m_codebook.push_back(node.m_centroid);
+
+      m_overall_variance += node.m_variance;
+    }
+
+    m_heap.clear();
+    m_left_children.clear();
+    m_right_children.clear();
+
+    return true;
+  }
+
+  inline uint get_num_training_vecs() const { return m_training_vecs.size(); }
+  const VectorType& get_training_vec(uint index) const { return m_training_vecs[index].first; }
+  uint get_training_vec_weight(uint index) const { return m_training_vecs[index].second; }
+
+  typedef crnlib::vector<std::pair<VectorType, uint> > training_vec_array;
+
+  const training_vec_array& get_training_vecs() const { return m_training_vecs; }
+  training_vec_array& get_training_vecs() { return m_training_vecs; }
+
+  inline float get_overall_variance() const { return m_overall_variance; }
+
+  inline uint get_codebook_size() const {
+    return m_codebook.size();
+  }
+
+  inline const VectorType& get_codebook_entry(uint index) const {
+    return m_codebook[index];
+  }
+
+  VectorType& get_codebook_entry(uint index) {
+    return m_codebook[index];
+  }
+
+  typedef crnlib::vector<VectorType> vector_vec_type;
+  inline const vector_vec_type& get_codebook() const {
+    return m_codebook;
+  }
+
+  uint find_best_codebook_entry(const VectorType& v) const {
+    uint cur_node_index = 0;
+
+    for (;;) {
+      const vq_node& cur_node = m_nodes[cur_node_index];
+
+      if (cur_node.m_left == -1)
+        return cur_node.m_codebook_index;
+
+      const vq_node& left_node = m_nodes[cur_node.m_left];
+      const vq_node& right_node = m_nodes[cur_node.m_right];
+
+      float left_dist = left_node.m_centroid.squared_distance(v);
+      float right_dist = right_node.m_centroid.squared_distance(v);
+
+      if (left_dist < right_dist)
+        cur_node_index = cur_node.m_left;
+      else
+        cur_node_index = cur_node.m_right;
+    }
+  }
+
+  const VectorType& find_best_codebook_entry(const VectorType& v, uint max_codebook_size) const {
+    uint cur_node_index = 0;
+
+    for (;;) {
+      const vq_node& cur_node = m_nodes[cur_node_index];
+
+      if ((cur_node.m_left == -1) || ((cur_node.m_codebook_index + 1) >= (int)max_codebook_size))
+        return cur_node.m_centroid;
+
+      const vq_node& left_node = m_nodes[cur_node.m_left];
+      const vq_node& right_node = m_nodes[cur_node.m_right];
+
+      float left_dist = left_node.m_centroid.squared_distance(v);
+      float right_dist = right_node.m_centroid.squared_distance(v);
+
+      if (left_dist < right_dist)
+        cur_node_index = cur_node.m_left;
+      else
+        cur_node_index = cur_node.m_right;
+    }
+  }
+
+  uint find_best_codebook_entry_fs(const VectorType& v) const {
+    float best_dist = math::cNearlyInfinite;
+    uint best_index = 0;
+
+    for (uint i = 0; i < m_codebook.size(); i++) {
+      float dist = m_codebook[i].squared_distance(v);
+      if (dist < best_dist) {
+        best_dist = dist;
+        best_index = i;
+        if (best_dist == 0.0f)
+          break;
+      }
+    }
+
+    return best_index;
+  }
+
+  void retrieve_clusters(uint max_clusters, crnlib::vector<crnlib::vector<uint> >& clusters) const {
+    clusters.resize(0);
+    clusters.reserve(max_clusters);
+
+    crnlib::vector<uint> stack;
+    stack.reserve(512);
+
+    uint cur_node_index = 0;
+
+    for (;;) {
+      const vq_node& cur_node = m_nodes[cur_node_index];
+
+      if ((cur_node.is_leaf()) || ((cur_node.m_codebook_index + 2) > (int)max_clusters)) {
+        clusters.resize(clusters.size() + 1);
+        clusters.back() = cur_node.m_vectors;
+
+        if (stack.empty())
+          break;
+        cur_node_index = stack.back();
+        stack.pop_back();
+        continue;
+      }
+
+      cur_node_index = cur_node.m_left;
+      stack.push_back(cur_node.m_right);
+    }
+  }
+
+ private:
+  training_vec_array m_training_vecs;
+
+  struct vq_node {
+    vq_node()
+        : m_centroid(cClear), m_total_weight(0), m_left(-1), m_right(-1), m_codebook_index(-1), m_unsplittable(false) {}
+
+    VectorType m_centroid;
+    uint64 m_total_weight;
+
+    float m_variance;
+
+    crnlib::vector<uint> m_vectors;
+
+    int m_left;
+    int m_right;
+
+    int m_codebook_index;
+
+    bool m_unsplittable;
+
+    bool is_leaf() const { return m_left < 0; }
+  };
+
+  typedef crnlib::vector<vq_node> node_vec_type;
+
+  node_vec_type m_nodes;
+
+  vector_vec_type m_codebook;
+
+  float m_overall_variance;
+
+  uint m_split_index;
+
+  crnlib::vector<uint> m_heap;
+  uint m_heap_size;
+
+  bool m_quick;
+
+  void insert_heap(uint node_index) {
+    const float variance = m_nodes[node_index].m_variance;
+    uint pos = ++m_heap_size;
+
+    if (m_heap_size >= m_heap.size())
+      m_heap.resize(m_heap_size + 1);
+
+    for (;;) {
+      uint parent = pos >> 1;
+      if (!parent)
+        break;
+
+      float parent_variance = m_nodes[m_heap[parent]].m_variance;
+      if (parent_variance > variance)
+        break;
+
+      m_heap[pos] = m_heap[parent];
+
+      pos = parent;
+    }
+
+    m_heap[pos] = node_index;
+  }
+
+  void down_heap(uint pos) {
+    uint child;
+    uint orig = m_heap[pos];
+
+    const float orig_variance = m_nodes[orig].m_variance;
+
+    while ((child = (pos << 1)) <= m_heap_size) {
+      if (child < m_heap_size) {
+        if (m_nodes[m_heap[child]].m_variance < m_nodes[m_heap[child + 1]].m_variance)
+          child++;
+      }
+
+      if (orig_variance > m_nodes[m_heap[child]].m_variance)
+        break;
+
+      m_heap[pos] = m_heap[child];
+
+      pos = child;
+    }
+
+    m_heap[pos] = orig;
+  }
+
+  void compute_split_estimate(VectorType& left_child_res, VectorType& right_child_res, const vq_node& parent_node) {
+    VectorType furthest(0);
+    double furthest_dist = -1.0f;
+
+    for (uint i = 0; i < parent_node.m_vectors.size(); i++) {
+      const VectorType& v = m_training_vecs[parent_node.m_vectors[i]].first;
+
+      double dist = v.squared_distance(parent_node.m_centroid);
+      if (dist > furthest_dist) {
+        furthest_dist = dist;
+        furthest = v;
+      }
+    }
+
+    VectorType opposite(0);
+    double opposite_dist = -1.0f;
+
+    for (uint i = 0; i < parent_node.m_vectors.size(); i++) {
+      const VectorType& v = m_training_vecs[parent_node.m_vectors[i]].first;
+
+      double dist = v.squared_distance(furthest);
+      if (dist > opposite_dist) {
+        opposite_dist = dist;
+        opposite = v;
+      }
+    }
+
+    left_child_res = (furthest + parent_node.m_centroid) * .5f;
+    right_child_res = (opposite + parent_node.m_centroid) * .5f;
+  }
+
+  void compute_split_pca(VectorType& left_child_res, VectorType& right_child_res, const vq_node& parent_node) {
+    if (parent_node.m_vectors.size() == 2) {
+      left_child_res = m_training_vecs[parent_node.m_vectors[0]].first;
+      right_child_res = m_training_vecs[parent_node.m_vectors[1]].first;
+      return;
+    }
+
+    const uint N = VectorType::num_elements;
+
+    matrix<N, N, float> covar;
+    covar.clear();
+
+    for (uint i = 0; i < parent_node.m_vectors.size(); i++) {
+      const VectorType v(m_training_vecs[parent_node.m_vectors[i]].first - parent_node.m_centroid);
+      const VectorType w(v * (float)m_training_vecs[parent_node.m_vectors[i]].second);
+
+      for (uint x = 0; x < N; x++)
+        for (uint y = x; y < N; y++)
+          covar[x][y] = covar[x][y] + v[x] * w[y];
+    }
+
+    float one_over_total_weight = 1.0f / parent_node.m_total_weight;
+
+    for (uint x = 0; x < N; x++)
+      for (uint y = x; y < N; y++)
+        covar[x][y] *= one_over_total_weight;
+
+    for (uint x = 0; x < (N - 1); x++)
+      for (uint y = x + 1; y < N; y++)
+        covar[y][x] = covar[x][y];
+
+    VectorType axis;  //(1.0f);
+    if (N == 1)
+      axis.set(1.0f);
+    else {
+      for (uint i = 0; i < N; i++)
+        axis[i] = math::lerp(.75f, 1.25f, i * (1.0f / math::maximum<int>(N - 1, 1)));
+    }
+
+    VectorType prev_axis(axis);
+
+    for (uint iter = 0; iter < 10; iter++) {
+      VectorType x;
+
+      double max_sum = 0;
+
+      for (uint i = 0; i < N; i++) {
+        double sum = 0;
+
+        for (uint j = 0; j < N; j++)
+          sum += axis[j] * covar[i][j];
+
+        x[i] = static_cast<float>(sum);
+
+        max_sum = math::maximum(max_sum, fabs(sum));
+      }
+
+      if (max_sum != 0.0f)
+        x *= static_cast<float>(1.0f / max_sum);
+
+      VectorType delta_axis(prev_axis - x);
+
+      prev_axis = axis;
+      axis = x;
+
+      if (delta_axis.norm() < .0025f)
+        break;
+    }
+
+    axis.normalize();
+
+    VectorType left_child(0.0f);
+    VectorType right_child(0.0f);
+
+    double left_weight = 0.0f;
+    double right_weight = 0.0f;
+
+    for (uint i = 0; i < parent_node.m_vectors.size(); i++) {
+      const float weight = (float)m_training_vecs[parent_node.m_vectors[i]].second;
+
+      const VectorType& v = m_training_vecs[parent_node.m_vectors[i]].first;
+
+      double t = (v - parent_node.m_centroid) * axis;
+      if (t < 0.0f) {
+        left_child += v * weight;
+        left_weight += weight;
+      } else {
+        right_child += v * weight;
+        right_weight += weight;
+      }
+    }
+
+    if ((left_weight > 0.0f) && (right_weight > 0.0f)) {
+      left_child_res = left_child * (float)(1.0f / left_weight);
+      right_child_res = right_child * (float)(1.0f / right_weight);
+    } else {
+      compute_split_estimate(left_child_res, right_child_res, parent_node);
+    }
+  }
+
+#if 0
+      void compute_split_pca2(VectorType& left_child_res, VectorType& right_child_res, const vq_node& parent_node)
+      {
+         if (parent_node.m_vectors.size() == 2)
+         {
+            left_child_res = m_training_vecs[parent_node.m_vectors[0]].first;
+            right_child_res = m_training_vecs[parent_node.m_vectors[1]].first;
+            return;
+         }
+
+         const uint N = VectorType::num_elements;
+
+         VectorType furthest;
+         double furthest_dist = -1.0f;
+
+         for (uint i = 0; i < parent_node.m_vectors.size(); i++)
+         {
+            const VectorType& v = m_training_vecs[parent_node.m_vectors[i]].first;
+
+            double dist = v.squared_distance(parent_node.m_centroid);
+            if (dist > furthest_dist)
+            {
+               furthest_dist = dist;
+               furthest = v;
+            }
+         }
+
+         VectorType opposite;
+         double opposite_dist = -1.0f;
+
+         for (uint i = 0; i < parent_node.m_vectors.size(); i++)
+         {
+            const VectorType& v = m_training_vecs[parent_node.m_vectors[i]].first;
+
+            double dist = v.squared_distance(furthest);
+            if (dist > opposite_dist)
+            {
+               opposite_dist = dist;
+               opposite = v;
+            }
+         }
+
+         VectorType axis(opposite - furthest);
+         if (axis.normalize() < .000125f)
+         {
+            left_child_res = (furthest + parent_node.m_centroid) * .5f;
+            right_child_res = (opposite + parent_node.m_centroid) * .5f;
+            return;
+         }
+
+         for (uint iter = 0; iter < 2; iter++)
+         {
+            double next_axis[N];
+            utils::zero_object(next_axis);
+
+            for (uint i = 0; i < parent_node.m_vectors.size(); i++)
+            {
+               const double weight = m_training_vecs[parent_node.m_vectors[i]].second;
+
+               VectorType v(m_training_vecs[parent_node.m_vectors[i]].first - parent_node.m_centroid);
+
+               double dot = (v * axis) * weight;
+
+               for (uint j = 0; j < N; j++)
+                  next_axis[j] += dot * v[j];
+            }
+
+            double w = 0.0f;
+            for (uint j = 0; j < N; j++)
+               w += next_axis[j] * next_axis[j];
+
+            if (w > 0.0f)
+            {
+               w = 1.0f / sqrt(w);
+               for (uint j = 0; j < N; j++)
+                  axis[j] = static_cast<float>(next_axis[j] * w);
+            }
+            else
+               break;
+         }
+
+         VectorType left_child(0.0f);
+         VectorType right_child(0.0f);
+
+         double left_weight = 0.0f;
+         double right_weight = 0.0f;
+
+         for (uint i = 0; i < parent_node.m_vectors.size(); i++)
+         {
+            const float weight = (float)m_training_vecs[parent_node.m_vectors[i]].second;
+
+            const VectorType& v = m_training_vecs[parent_node.m_vectors[i]].first;
+
+            double t = (v - parent_node.m_centroid) * axis;
+            if (t < 0.0f)
+            {
+               left_child += v * weight;
+               left_weight += weight;
+            }
+            else
+            {
+               right_child += v * weight;
+               right_weight += weight;
+            }
+         }
+
+         if ((left_weight > 0.0f) && (right_weight > 0.0f))
+         {
+            left_child_res = left_child * (float)(1.0f / left_weight);
+            right_child_res = right_child * (float)(1.0f / right_weight);
+         }
+         else
+         {
+            left_child_res = (furthest + parent_node.m_centroid) * .5f;
+            right_child_res = (opposite + parent_node.m_centroid) * .5f;
+         }
+      }
+#endif
+
+  // Not thread-safe: these scratch buffers are shared by every split_node() call on this object.
+  crnlib::vector<uint> m_left_children;
+  crnlib::vector<uint> m_right_children;
+
+  void split_node(uint index) {
+    vq_node& parent_node = m_nodes[index];
+
+    if (parent_node.m_vectors.size() == 1)
+      return;
+
+    VectorType left_child, right_child;
+    if (m_quick)
+      compute_split_estimate(left_child, right_child, parent_node);
+    else
+      compute_split_pca(left_child, right_child, parent_node);
+
+    uint64 left_weight = 0;
+    uint64 right_weight = 0;
+
+    float prev_total_variance = 1e+10f;
+
+    float left_variance = 0.0f;
+    float right_variance = 0.0f;
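+    // Refine the initial split with a few 2-means (Lloyd) iterations, stopping early once the variance stops improving.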
+    const uint cMaxLoops = m_quick ? 2 : 8;
+    for (uint total_loops = 0; total_loops < cMaxLoops; total_loops++) {
+      m_left_children.resize(0);
+      m_right_children.resize(0);
+
+      VectorType new_left_child(cClear);
+      VectorType new_right_child(cClear);
+
+      double left_ttsum = 0.0f;
+      double right_ttsum = 0.0f;
+
+      left_weight = 0;
+      right_weight = 0;
+
+      for (uint i = 0; i < parent_node.m_vectors.size(); i++) {
+        const VectorType& v = m_training_vecs[parent_node.m_vectors[i]].first;
+        const uint weight = m_training_vecs[parent_node.m_vectors[i]].second;
+
+        double left_dist2 = left_child.squared_distance(v);
+        double right_dist2 = right_child.squared_distance(v);
+
+        if (left_dist2 < right_dist2) {
+          m_left_children.push_back(parent_node.m_vectors[i]);
+
+          new_left_child += (v * (float)weight);
+          left_weight += weight;
+
+          left_ttsum += v.dot(v) * weight;
+        } else {
+          m_right_children.push_back(parent_node.m_vectors[i]);
+
+          new_right_child += (v * (float)weight);
+          right_weight += weight;
+
+          right_ttsum += v.dot(v) * weight;
+        }
+      }
+
+      if ((!left_weight) || (!right_weight)) {
+        parent_node.m_unsplittable = true;
+        return;
+      }
+
+      left_variance = (float)(left_ttsum - (new_left_child.dot(new_left_child) / left_weight));
+      right_variance = (float)(right_ttsum - (new_right_child.dot(new_right_child) / right_weight));
+
+      new_left_child *= (1.0f / left_weight);
+      new_right_child *= (1.0f / right_weight);
+
+      left_child = new_left_child;
+      right_child = new_right_child;
+
+      float total_variance = left_variance + right_variance;
+      if (total_variance < .00001f)
+        break;
+
+      //const float variance_delta_thresh = .00001f;
+      const float variance_delta_thresh = .00125f;
+      if (((prev_total_variance - total_variance) / total_variance) < variance_delta_thresh)
+        break;
+
+      prev_total_variance = total_variance;
+    }
+
+    const uint left_child_index = m_nodes.size();
+    const uint right_child_index = m_nodes.size() + 1;
+
+    parent_node.m_left = m_nodes.size();
+    parent_node.m_right = m_nodes.size() + 1;
+    parent_node.m_codebook_index = m_split_index;
+    m_split_index++;
+
+    m_nodes.resize(m_nodes.size() + 2);
+
+    // parent_node may now dangle: resizing m_nodes can reallocate its storage, so do not use it past this point.
+
+    vq_node& left_child_node = m_nodes[left_child_index];
+    vq_node& right_child_node = m_nodes[right_child_index];
+
+    left_child_node.m_centroid = left_child;
+    left_child_node.m_total_weight = left_weight;
+    left_child_node.m_vectors.swap(m_left_children);
+    left_child_node.m_variance = left_variance;
+    if ((left_child_node.m_vectors.size() > 1) && (left_child_node.m_variance > 0.0f))
+      insert_heap(left_child_index);
+
+    right_child_node.m_centroid = right_child;
+    right_child_node.m_total_weight = right_weight;
+    right_child_node.m_vectors.swap(m_right_children);
+    right_child_node.m_variance = right_variance;
+    if ((right_child_node.m_vectors.size() > 1) && (right_child_node.m_variance > 0.0f))
+      insert_heap(right_child_index);
+  }
+};
+
+}  // namespace crnlib
@@ -0,0 +1,900 @@
+// File: crn_color.h
+// See Copyright Notice and license at the end of inc/crnlib.h
+#pragma once
+#include "crn_core.h"
+
+namespace crnlib {
+template <typename component_type>
+struct color_quad_component_traits {
+  enum {
+    cSigned = false,
+    cFloat = false,
+    cMin = cUINT8_MIN,
+    cMax = cUINT8_MAX
+  };
+};
+
+template <>
+struct color_quad_component_traits<int8> {
+  enum {
+    cSigned = true,
+    cFloat = false,
+    cMin = cINT8_MIN,
+    cMax = cINT8_MAX
+  };
+};
+
+template <>
+struct color_quad_component_traits<int16> {
+  enum {
+    cSigned = true,
+    cFloat = false,
+    cMin = cINT16_MIN,
+    cMax = cINT16_MAX
+  };
+};
+
+template <>
+struct color_quad_component_traits<uint16> {
+  enum {
+    cSigned = false,
+    cFloat = false,
+    cMin = cUINT16_MIN,
+    cMax = cUINT16_MAX
+  };
+};
+
+template <>
+struct color_quad_component_traits<int32> {
+  enum {
+    cSigned = true,
+    cFloat = false,
+    cMin = cINT32_MIN,
+    cMax = cINT32_MAX
+  };
+};
+
+template <>
+struct color_quad_component_traits<uint32> {
+  enum {
+    cSigned = false,
+    cFloat = false,
+    cMin = cUINT32_MIN,
+    cMax = cUINT32_MAX
+  };
+};
+
+template <>
+struct color_quad_component_traits<float> {
+  enum {
+    cSigned = false,
+    cFloat = true,
+    cMin = cINT32_MIN,
+    cMax = cINT32_MAX
+  };
+};
+
+template <>
+struct color_quad_component_traits<double> {
+  enum {
+    cSigned = false,
+    cFloat = true,
+    cMin = cINT32_MIN,
+    cMax = cINT32_MAX
+  };
+};
+
+template <typename component_type, typename parameter_type>
+class color_quad : public helpers::rel_ops<color_quad<component_type, parameter_type> > {
+  template <typename T>
+  static inline parameter_type clamp(T v) {
+    parameter_type result = static_cast<parameter_type>(v);
+    if (!component_traits::cFloat) {
+      if (v < component_traits::cMin)
+        result = static_cast<parameter_type>(component_traits::cMin);
+      else if (v > component_traits::cMax)
+        result = static_cast<parameter_type>(component_traits::cMax);
+    }
+    return result;
+  }
+
+#ifdef _MSC_VER
+  template <>
+  static inline parameter_type clamp(int v) {
+    if (!component_traits::cFloat) {
+      if ((!component_traits::cSigned) && (component_traits::cMin == 0) && (component_traits::cMax == 0xFF)) {
+        if (v & 0xFFFFFF00U)
+          v = (~(static_cast<int>(v) >> 31)) & 0xFF;
+      } else {
+        if (v < component_traits::cMin)
+          v = component_traits::cMin;
+        else if (v > component_traits::cMax)
+          v = component_traits::cMax;
+      }
+    }
+    return static_cast<parameter_type>(v);
+  }
+#endif
+
+ public:
+  typedef component_type component_t;
+  typedef parameter_type parameter_t;
+  typedef color_quad_component_traits<component_type> component_traits;
+
+  enum { cNumComps = 4 };
+
+  union {
+    struct
+    {
+      component_type r;
+      component_type g;
+      component_type b;
+      component_type a;
+    };
+
+    component_type c[cNumComps];
+
+    uint32 m_u32;
+  };
+
+  inline color_quad() {
+  }
+
+  inline color_quad(eClear)
+      : r(0), g(0), b(0), a(0) {
+  }
+
+  inline color_quad(const color_quad& other)
+      : r(other.r), g(other.g), b(other.b), a(other.a) {
+  }
+
+  explicit inline color_quad(parameter_type y, parameter_type alpha = component_traits::cMax) {
+    set(y, alpha);
+  }
+
+  inline color_quad(parameter_type red, parameter_type green, parameter_type blue, parameter_type alpha = component_traits::cMax) {
+    set(red, green, blue, alpha);
+  }
+
+  explicit inline color_quad(eNoClamp, parameter_type y, parameter_type alpha = component_traits::cMax) {
+    set_noclamp_y_alpha(y, alpha);
+  }
+
+  inline color_quad(eNoClamp, parameter_type red, parameter_type green, parameter_type blue, parameter_type alpha = component_traits::cMax) {
+    set_noclamp_rgba(red, green, blue, alpha);
+  }
+
+  template <typename other_component_type, typename other_parameter_type>
+  inline color_quad(const color_quad<other_component_type, other_parameter_type>& other)
+      : r(static_cast<component_type>(clamp(other.r))), g(static_cast<component_type>(clamp(other.g))), b(static_cast<component_type>(clamp(other.b))), a(static_cast<component_type>(clamp(other.a))) {
+  }
+
+  inline void clear() {
+    r = 0;
+    g = 0;
+    b = 0;
+    a = 0;
+  }
+
+  inline color_quad& operator=(const color_quad& other) {
+    r = other.r;
+    g = other.g;
+    b = other.b;
+    a = other.a;
+    return *this;
+  }
+
+  inline color_quad& set_rgb(const color_quad& other) {
+    r = other.r;
+    g = other.g;
+    b = other.b;
+    return *this;
+  }
+
+  template <typename other_component_type, typename other_parameter_type>
+  inline color_quad& operator=(const color_quad<other_component_type, other_parameter_type>& other) {
+    r = static_cast<component_type>(clamp(other.r));
+    g = static_cast<component_type>(clamp(other.g));
+    b = static_cast<component_type>(clamp(other.b));
+    a = static_cast<component_type>(clamp(other.a));
+    return *this;
+  }
+
+  inline color_quad& operator=(parameter_type y) {
+    set(y, component_traits::cMax);
+    return *this;
+  }
+
+  inline color_quad& set(parameter_type y, parameter_type alpha = component_traits::cMax) {
+    y = clamp(y);
+    alpha = clamp(alpha);
+    r = static_cast<component_type>(y);
+    g = static_cast<component_type>(y);
+    b = static_cast<component_type>(y);
+    a = static_cast<component_type>(alpha);
+    return *this;
+  }
+
+  inline color_quad& set_noclamp_y_alpha(parameter_type y, parameter_type alpha = component_traits::cMax) {
+    CRNLIB_ASSERT((y >= component_traits::cMin) && (y <= component_traits::cMax));
+    CRNLIB_ASSERT((alpha >= component_traits::cMin) && (alpha <= component_traits::cMax));
+
+    r = static_cast<component_type>(y);
+    g = static_cast<component_type>(y);
+    b = static_cast<component_type>(y);
+    a = static_cast<component_type>(alpha);
+    return *this;
+  }
+
+  inline color_quad& set(parameter_type red, parameter_type green, parameter_type blue, parameter_type alpha = component_traits::cMax) {
+    r = static_cast<component_type>(clamp(red));
+    g = static_cast<component_type>(clamp(green));
+    b = static_cast<component_type>(clamp(blue));
+    a = static_cast<component_type>(clamp(alpha));
+    return *this;
+  }
+
+  inline color_quad& set_noclamp_rgba(parameter_type red, parameter_type green, parameter_type blue, parameter_type alpha) {
+    CRNLIB_ASSERT((red >= component_traits::cMin) && (red <= component_traits::cMax));
+    CRNLIB_ASSERT((green >= component_traits::cMin) && (green <= component_traits::cMax));
+    CRNLIB_ASSERT((blue >= component_traits::cMin) && (blue <= component_traits::cMax));
+    CRNLIB_ASSERT((alpha >= component_traits::cMin) && (alpha <= component_traits::cMax));
+
+    r = static_cast<component_type>(red);
+    g = static_cast<component_type>(green);
+    b = static_cast<component_type>(blue);
+    a = static_cast<component_type>(alpha);
+    return *this;
+  }
+
+  inline color_quad& set_noclamp_rgb(parameter_type red, parameter_type green, parameter_type blue) {
+    CRNLIB_ASSERT((red >= component_traits::cMin) && (red <= component_traits::cMax));
+    CRNLIB_ASSERT((green >= component_traits::cMin) && (green <= component_traits::cMax));
+    CRNLIB_ASSERT((blue >= component_traits::cMin) && (blue <= component_traits::cMax));
+
+    r = static_cast<component_type>(red);
+    g = static_cast<component_type>(green);
+    b = static_cast<component_type>(blue);
+    return *this;
+  }
+
+  static inline parameter_type get_min_comp() { return component_traits::cMin; }
+  static inline parameter_type get_max_comp() { return component_traits::cMax; }
+  static inline bool get_comps_are_signed() { return component_traits::cSigned; }
+
+  inline component_type operator[](uint i) const {
+    CRNLIB_ASSERT(i < cNumComps);
+    return c[i];
+  }
+  inline component_type& operator[](uint i) {
+    CRNLIB_ASSERT(i < cNumComps);
+    return c[i];
+  }
+
+  inline color_quad& set_component(uint i, parameter_type f) {
+    CRNLIB_ASSERT(i < cNumComps);
+
+    c[i] = static_cast<component_type>(clamp(f));
+
+    return *this;
+  }
+
+  inline color_quad& set_grayscale(parameter_t l) {
+    component_t x = static_cast<component_t>(clamp(l));
+    c[0] = x;
+    c[1] = x;
+    c[2] = x;
+    return *this;
+  }
+
+  inline color_quad& clamp(const color_quad& l, const color_quad& h) {
+    for (uint i = 0; i < cNumComps; i++)
+      c[i] = static_cast<component_type>(math::clamp<parameter_type>(c[i], l[i], h[i]));
+    return *this;
+  }
+
+  inline color_quad& clamp(parameter_type l, parameter_type h) {
+    for (uint i = 0; i < cNumComps; i++)
+      c[i] = static_cast<component_type>(math::clamp<parameter_type>(c[i], l, h));
+    return *this;
+  }
+
+  // Returns CCIR 601 luma (same 16.16 fixed-point weights as color::RGB_to_Y).
+  inline parameter_type get_luma() const {
+    return static_cast<parameter_type>((19595U * r + 38470U * g + 7471U * b + 32768U) >> 16U);
+  }
+
+  // Returns REC 709 luma.
+  inline parameter_type get_luma_rec709() const {
+    return static_cast<parameter_type>((13938U * r + 46869U * g + 4729U * b + 32768U) >> 16U);
+  }
+
+  // Beware of endianness!
+  inline uint32 get_uint32() const {
+    CRNLIB_ASSERT(sizeof(*this) == sizeof(uint32));
+    return *reinterpret_cast<const uint32*>(this);
+  }
+
+  // Beware of endianness!
+  inline uint64 get_uint64() const {
+    CRNLIB_ASSERT(sizeof(*this) == sizeof(uint64));
+    return *reinterpret_cast<const uint64*>(this);
+  }
+
+  inline uint squared_distance(const color_quad& c, bool alpha = true) const {
+    return math::square(r - c.r) + math::square(g - c.g) + math::square(b - c.b) + (alpha ? math::square(a - c.a) : 0);
+  }
+
+  inline bool rgb_equals(const color_quad& rhs) const {
+    return (r == rhs.r) && (g == rhs.g) && (b == rhs.b);
+  }
+
+  inline bool operator==(const color_quad& rhs) const {
+    if (sizeof(color_quad) == sizeof(uint32))
+      return m_u32 == rhs.m_u32;
+    else
+      return (r == rhs.r) && (g == rhs.g) && (b == rhs.b) && (a == rhs.a);
+  }
+
+  inline bool operator<(const color_quad& rhs) const {
+    for (uint i = 0; i < cNumComps; i++) {
+      if (c[i] < rhs.c[i])
+        return true;
+      else if (!(c[i] == rhs.c[i]))
+        return false;
+    }
+    return false;
+  }
+
+  color_quad& operator+=(const color_quad& other) {
+    for (uint i = 0; i < 4; i++)
+      c[i] = static_cast<component_type>(clamp(c[i] + other.c[i]));
+    return *this;
+  }
+
+  color_quad& operator-=(const color_quad& other) {
+    for (uint i = 0; i < 4; i++)
+      c[i] = static_cast<component_type>(clamp(c[i] - other.c[i]));
+    return *this;
+  }
+
+  color_quad& operator*=(parameter_type v) {
+    for (uint i = 0; i < 4; i++)
+      c[i] = static_cast<component_type>(clamp(c[i] * v));
+    return *this;
+  }
+
+  color_quad& operator/=(parameter_type v) {
+    for (uint i = 0; i < 4; i++)
+      c[i] = static_cast<component_type>(c[i] / v);
+    return *this;
+  }
+
+  color_quad get_swizzled(uint x, uint y, uint z, uint w) const {
+    CRNLIB_ASSERT((x | y | z | w) < 4);
+    return color_quad(c[x], c[y], c[z], c[w]);
+  }
+
+  friend color_quad operator+(const color_quad& lhs, const color_quad& rhs) {
+    color_quad result(lhs);
+    result += rhs;
+    return result;
+  }
+
+  friend color_quad operator-(const color_quad& lhs, const color_quad& rhs) {
+    color_quad result(lhs);
+    result -= rhs;
+    return result;
+  }
+
+  friend color_quad operator*(const color_quad& lhs, parameter_type v) {
+    color_quad result(lhs);
+    result *= v;
+    return result;
+  }
+
+  friend color_quad operator/(const color_quad& lhs, parameter_type v) {
+    color_quad result(lhs);
+    result /= v;
+    return result;
+  }
+
+  friend color_quad operator*(parameter_type v, const color_quad& rhs) {
+    color_quad result(rhs);
+    result *= v;
+    return result;
+  }
+
+  inline bool is_grayscale() const {
+    return (c[0] == c[1]) && (c[1] == c[2]);
+  }
+
+  uint get_min_component_index(bool alpha = true) const {
+    uint index = 0;
+    uint limit = alpha ? cNumComps : (cNumComps - 1);
+    for (uint i = 1; i < limit; i++)
+      if (c[i] < c[index])
+        index = i;
+    return index;
+  }
+
+  uint get_max_component_index(bool alpha = true) const {
+    uint index = 0;
+    uint limit = alpha ? cNumComps : (cNumComps - 1);
+    for (uint i = 1; i < limit; i++)
+      if (c[i] > c[index])
+        index = i;
+    return index;
+  }
+
+  operator size_t() const {
+    return (size_t)fast_hash(this, sizeof(*this));
+  }
+
+  void get_float4(float* pDst) {
+    for (uint i = 0; i < 4; i++)
+      pDst[i] = ((*this)[i] - component_traits::cMin) / float(component_traits::cMax - component_traits::cMin);
+  }
+
+  void get_float3(float* pDst) {
+    for (uint i = 0; i < 3; i++)
+      pDst[i] = ((*this)[i] - component_traits::cMin) / float(component_traits::cMax - component_traits::cMin);
+  }
+
+  static color_quad component_min(const color_quad& a, const color_quad& b) {
+    color_quad result;
+    for (uint i = 0; i < 4; i++)
+      result[i] = static_cast<component_type>(math::minimum(a[i], b[i]));
+    return result;
+  }
+
+  static color_quad component_max(const color_quad& a, const color_quad& b) {
+    color_quad result;
+    for (uint i = 0; i < 4; i++)
+      result[i] = static_cast<component_type>(math::maximum(a[i], b[i]));
+    return result;
+  }
+
+  static color_quad make_black() {
+    return color_quad(0, 0, 0, component_traits::cMax);
+  }
+
+  static color_quad make_white() {
+    return color_quad(component_traits::cMax, component_traits::cMax, component_traits::cMax, component_traits::cMax);
+  }
+};  // class color_quad
+
+template <typename c, typename q>
+struct scalar_type<color_quad<c, q> > {
+  enum { cFlag = true };
+  static inline void construct(color_quad<c, q>* p) {}
+  static inline void construct(color_quad<c, q>* p, const color_quad<c, q>& init) { memcpy(p, &init, sizeof(color_quad<c, q>)); }
+  static inline void construct_array(color_quad<c, q>*, uint) {}
+  static inline void destruct(color_quad<c, q>*) {}
+  static inline void destruct_array(color_quad<c, q>*, uint) {}
+};
+
+typedef color_quad<uint8, int> color_quad_u8;
+typedef color_quad<int8, int> color_quad_i8;
+typedef color_quad<int16, int> color_quad_i16;
+typedef color_quad<uint16, int> color_quad_u16;
+typedef color_quad<int32, int> color_quad_i32;
+typedef color_quad<uint32, uint> color_quad_u32;
+typedef color_quad<float, float> color_quad_f;
+typedef color_quad<double, double> color_quad_d;
+
+namespace color {
+inline uint elucidian_distance(uint r0, uint g0, uint b0, uint r1, uint g1, uint b1) {
+  int dr = (int)r0 - (int)r1;
+  int dg = (int)g0 - (int)g1;
+  int db = (int)b0 - (int)b1;
+
+  return static_cast<uint>(dr * dr + dg * dg + db * db);
+}
+
+inline uint elucidian_distance(uint r0, uint g0, uint b0, uint a0, uint r1, uint g1, uint b1, uint a1) {
+  int dr = (int)r0 - (int)r1;
+  int dg = (int)g0 - (int)g1;
+  int db = (int)b0 - (int)b1;
+  int da = (int)a0 - (int)a1;
+
+  return static_cast<uint>(dr * dr + dg * dg + db * db + da * da);
+}
+
+inline uint elucidian_distance(const color_quad_u8& c0, const color_quad_u8& c1, bool alpha) {
+  if (alpha)
+    return elucidian_distance(c0.r, c0.g, c0.b, c0.a, c1.r, c1.g, c1.b, c1.a);
+  else
+    return elucidian_distance(c0.r, c0.g, c0.b, c1.r, c1.g, c1.b);
+}
+
+inline uint weighted_elucidian_distance(uint r0, uint g0, uint b0, uint r1, uint g1, uint b1, uint wr, uint wg, uint wb) {
+  int dr = (int)r0 - (int)r1;
+  int dg = (int)g0 - (int)g1;
+  int db = (int)b0 - (int)b1;
+
+  return static_cast<uint>((wr * dr * dr) + (wg * dg * dg) + (wb * db * db));
+}
+
+inline uint weighted_elucidian_distance(
+    uint r0, uint g0, uint b0, uint a0,
+    uint r1, uint g1, uint b1, uint a1,
+    uint wr, uint wg, uint wb, uint wa) {
+  int dr = (int)r0 - (int)r1;
+  int dg = (int)g0 - (int)g1;
+  int db = (int)b0 - (int)b1;
+  int da = (int)a0 - (int)a1;
+
+  return static_cast<uint>((wr * dr * dr) + (wg * dg * dg) + (wb * db * db) + (wa * da * da));
+}
+
+inline uint weighted_elucidian_distance(const color_quad_u8& c0, const color_quad_u8& c1, uint wr, uint wg, uint wb, uint wa) {
+  return weighted_elucidian_distance(c0.r, c0.g, c0.b, c0.a, c1.r, c1.g, c1.b, c1.a, wr, wg, wb, wa);
+}
+
+//const uint cRWeight = 8;//24;
+//const uint cGWeight = 24;//73;
+//const uint cBWeight = 1;//3;
+
+const uint cRWeight = 8;   //24;
+const uint cGWeight = 25;  //73;
+const uint cBWeight = 1;   //3;
+
+inline uint color_distance(bool perceptual, const color_quad_u8& e1, const color_quad_u8& e2, bool alpha) {
+  if (perceptual) {
+    if (alpha)
+      return weighted_elucidian_distance(e1, e2, cRWeight, cGWeight, cBWeight, cRWeight + cGWeight + cBWeight);
+    else
+      return weighted_elucidian_distance(e1, e2, cRWeight, cGWeight, cBWeight, 0);
+  } else
+    return elucidian_distance(e1, e2, alpha);
+}
+
+inline uint peak_color_error(const color_quad_u8& e1, const color_quad_u8& e2) {
+  return math::maximum<uint>(labs(e1[0] - e2[0]), labs(e1[1] - e2[1]), labs(e1[2] - e2[2]));
+  //return math::square<int>(e1[0] - e2[0]) + math::square<int>(e1[1] - e2[1]) + math::square<int>(e1[2] - e2[2]);
+}
+
+// y - [0,255]
+// co - [-127,127]
+// cg - [-126,127]
+inline void RGB_to_YCoCg(int r, int g, int b, int& y, int& co, int& cg) {
+  y = (r >> 2) + (g >> 1) + (b >> 2);
+  co = (r >> 1) - (b >> 1);
+  cg = -(r >> 2) + (g >> 1) - (b >> 2);
+}
+
+inline void YCoCg_to_RGB(int y, int co, int cg, int& r, int& g, int& b) {
+  int tmp = y - cg;
+  g = y + cg;
+  r = tmp + co;
+  b = tmp - co;
+}
+
+static inline uint8 clamp_component(int i) {
+  if (static_cast<uint>(i) > 255U) {
+    if (i < 0)
+      i = 0;
+    else if (i > 255)
+      i = 255;
+  }
+  return static_cast<uint8>(i);
+}
+
+// RGB->YCbCr constants, scaled by 2^16
+const int YR = 19595, YG = 38470, YB = 7471, CB_R = -11059, CB_G = -21709, CB_B = 32768, CR_R = 32768, CR_G = -27439, CR_B = -5329;
+// YCbCr->RGB constants, scaled by 2^16
+const int R_CR = 91881, B_CB = 116130, G_CR = -46802, G_CB = -22554;
+
+inline int RGB_to_Y(const color_quad_u8& rgb) {
+  const int r = rgb[0], g = rgb[1], b = rgb[2];
+  return (r * YR + g * YG + b * YB + 32768) >> 16;
+}
+
+// RGB to YCbCr (same as JFIF JPEG).
+// Odd default biases account for 565 endpoint packing.
+inline void RGB_to_YCC(color_quad_u8& ycc, const color_quad_u8& rgb, int cb_bias = 123, int cr_bias = 125) {
+  const int r = rgb[0], g = rgb[1], b = rgb[2];
+  ycc.a = static_cast<uint8>((r * YR + g * YG + b * YB + 32768) >> 16);
+  ycc.r = clamp_component(cb_bias + ((r * CB_R + g * CB_G + b * CB_B + 32768) >> 16));
+  ycc.g = clamp_component(cr_bias + ((r * CR_R + g * CR_G + b * CR_B + 32768) >> 16));
+  ycc.b = 0;
+}
+
+// YCbCr to RGB.
+// Odd biases account for 565 endpoint packing.
+inline void YCC_to_RGB(color_quad_u8& rgb, const color_quad_u8& ycc, int cb_bias = 123, int cr_bias = 125) {
+  const int y = ycc.a;
+  const int cb = ycc.r - cb_bias;
+  const int cr = ycc.g - cr_bias;
+  rgb.r = clamp_component(y + ((R_CR * cr + 32768) >> 16));
+  rgb.g = clamp_component(y + ((G_CR * cr + G_CB * cb + 32768) >> 16));
+  rgb.b = clamp_component(y + ((B_CB * cb + 32768) >> 16));
+  rgb.a = 255;
+}
+
+// Float RGB->YCbCr constants
+const float S = 1.0f / 65536.0f;
+const float F_YR = S * YR, F_YG = S * YG, F_YB = S * YB, F_CB_R = S * CB_R, F_CB_G = S * CB_G, F_CB_B = S * CB_B, F_CR_R = S * CR_R, F_CR_G = S * CR_G, F_CR_B = S * CR_B;
+// Float YCbCr->RGB constants
+const float F_R_CR = S * R_CR, F_B_CB = S * B_CB, F_G_CR = S * G_CR, F_G_CB = S * G_CB;
+
+inline void RGB_to_YCC_float(color_quad_f& ycc, const color_quad_u8& rgb) {
+  const int r = rgb[0], g = rgb[1], b = rgb[2];
+  ycc.a = r * F_YR + g * F_YG + b * F_YB;
+  ycc.r = r * F_CB_R + g * F_CB_G + b * F_CB_B;
+  ycc.g = r * F_CR_R + g * F_CR_G + b * F_CR_B;
+  ycc.b = 0;
+}
+
+inline void YCC_float_to_RGB(color_quad_u8& rgb, const color_quad_f& ycc) {
+  float y = ycc.a, cb = ycc.r, cr = ycc.g;
+  rgb.r = color::clamp_component(static_cast<int>(.5f + y + F_R_CR * cr));
+  rgb.g = color::clamp_component(static_cast<int>(.5f + y + F_G_CR * cr + F_G_CB * cb));
+  rgb.b = color::clamp_component(static_cast<int>(.5f + y + F_B_CB * cb));
+  rgb.a = 255;
+}
+
+}  // namespace color
+
+// This class purposely trades speed for extreme flexibility. It can handle any component swizzle, any pixel type from 1-4 components and 1-32 bits/component,
+// any pixel size between 1-16 bytes/pixel, any pixel stride, any color_quad data type (signed/unsigned/float 8/16/32 bits/component), and scaled/non-scaled components.
+// On the downside, it's freaking slow.
+class pixel_packer {
+ public:
+  pixel_packer() {
+    clear();
+  }
+
+  pixel_packer(uint num_comps, uint bits_per_comp, int pixel_stride = -1, bool reversed = false) {
+    init(num_comps, bits_per_comp, pixel_stride, reversed);
+  }
+
+  pixel_packer(const char* pComp_map, int pixel_stride = -1, int force_comp_size = -1) {
+    init(pComp_map, pixel_stride, force_comp_size);
+  }
+
+  void clear() {
+    utils::zero_this(this);
+  }
+
+  inline bool is_valid() const { return m_pixel_stride > 0; }
+
+  inline uint get_pixel_stride() const { return m_pixel_stride; }
+  void set_pixel_stride(uint n) { m_pixel_stride = n; }
+
+  uint get_num_comps() const { return m_num_comps; }
+  uint get_comp_size(uint index) const {
+    CRNLIB_ASSERT(index < 4);
+    return m_comp_size[index];
+  }
+  uint get_comp_ofs(uint index) const {
+    CRNLIB_ASSERT(index < 4);
+    return m_comp_ofs[index];
+  }
+  uint get_comp_max(uint index) const {
+    CRNLIB_ASSERT(index < 4);
+    return m_comp_max[index];
+  }
+  bool get_rgb_is_luma() const { return m_rgb_is_luma; }
+
+  template <typename color_quad_type>
+  const void* unpack(const void* p, color_quad_type& color, bool rescale = true) const {
+    const uint8* pSrc = static_cast<const uint8*>(p);
+
+    for (uint i = 0; i < 4; i++) {
+      const uint comp_size = m_comp_size[i];
+      if (!comp_size) {
+        if (color_quad_type::component_traits::cFloat)
+          color[i] = static_cast<typename color_quad_type::parameter_t>((i == 3) ? 1 : 0);
+        else
+          color[i] = static_cast<typename color_quad_type::parameter_t>((i == 3) ? color_quad_type::component_traits::cMax : 0);
+        continue;
+      }
+
+      uint n = 0, dst_bit_ofs = 0;
+      uint src_bit_ofs = m_comp_ofs[i];
+      while (dst_bit_ofs < comp_size) {
+        const uint byte_bit_ofs = src_bit_ofs & 7;
+        n |= ((pSrc[src_bit_ofs >> 3] >> byte_bit_ofs) << dst_bit_ofs);
+
+        const uint bits_read = 8 - byte_bit_ofs;
+        src_bit_ofs += bits_read;
+        dst_bit_ofs += bits_read;
+      }
+
+      const uint32 mx = m_comp_max[i];
+      n &= mx;
+
+      const uint32 h = static_cast<uint32>(color_quad_type::component_traits::cMax);
+
+      if (color_quad_type::component_traits::cFloat)
+        color.set_component(i, static_cast<typename color_quad_type::parameter_t>(n));
+      else if (rescale)
+        color.set_component(i, static_cast<typename color_quad_type::parameter_t>((static_cast<uint64>(n) * h + (mx >> 1U)) / mx));
+      else if (color_quad_type::component_traits::cSigned)
+        color.set_component(i, static_cast<typename color_quad_type::parameter_t>(math::minimum<uint32>(n, h)));
+      else
+        color.set_component(i, static_cast<typename color_quad_type::parameter_t>(n));
+    }
+
+    if (m_rgb_is_luma) {
+      color[0] = color[1];
+      color[2] = color[1];
+    }
+
+    return pSrc + m_pixel_stride;
+  }
+
+  template <typename color_quad_type>
+  void* pack(const color_quad_type& color, void* p, bool rescale = true) const {
+    uint8* pDst = static_cast<uint8*>(p);
+
+    for (uint i = 0; i < 4; i++) {
+      const uint comp_size = m_comp_size[i];
+      if (!comp_size)
+        continue;
+
+      uint32 mx = m_comp_max[i];
+
+      uint32 n;
+      if (color_quad_type::component_traits::cFloat) {
+        typename color_quad_type::parameter_t t = color[i];
+        if (t < 0.0f)
+          n = 0;
+        else if (t > static_cast<typename color_quad_type::parameter_t>(mx))
+          n = mx;
+        else
+          n = math::minimum<uint32>(static_cast<uint32>(floor(t + .5f)), mx);
+      } else if (rescale) {
+        if (color_quad_type::component_traits::cSigned)
+          n = math::maximum<int>(static_cast<int>(color[i]), 0);
+        else
+          n = static_cast<uint32>(color[i]);
+
+        const uint32 h = static_cast<uint32>(color_quad_type::component_traits::cMax);
+        n = static_cast<uint32>((static_cast<uint64>(n) * mx + (h >> 1)) / h);
+      } else {
+        if (color_quad_type::component_traits::cSigned)
+          n = math::minimum<uint32>(static_cast<uint32>(math::maximum<int>(static_cast<int>(color[i]), 0)), mx);
+        else
+          n = math::minimum<uint32>(static_cast<uint32>(color[i]), mx);
+      }
+
+      uint src_bit_ofs = 0;
+      uint dst_bit_ofs = m_comp_ofs[i];
+      while (src_bit_ofs < comp_size) {
+        const uint cur_byte_bit_ofs = (dst_bit_ofs & 7);
+        const uint cur_byte_bits = 8 - cur_byte_bit_ofs;
+
+        uint byte_val = pDst[dst_bit_ofs >> 3];
+        uint bit_mask = (mx << cur_byte_bit_ofs) & 0xFF;
+        byte_val &= ~bit_mask;
+        byte_val |= (n << cur_byte_bit_ofs);
+        pDst[dst_bit_ofs >> 3] = static_cast<uint8>(byte_val);
+
+        mx >>= cur_byte_bits;
+        n >>= cur_byte_bits;
+
+        dst_bit_ofs += cur_byte_bits;
+        src_bit_ofs += cur_byte_bits;
+      }
+    }
+
+    return pDst + m_pixel_stride;
+  }
+
+  bool init(uint num_comps, uint bits_per_comp, int pixel_stride = -1, bool reversed = false) {
+    clear();
+
+    if ((num_comps < 1) || (num_comps > 4) || (bits_per_comp < 1) || (bits_per_comp > 32)) {
+      CRNLIB_ASSERT(0);
+      return false;
+    }
+
+    for (uint i = 0; i < num_comps; i++) {
+      m_comp_size[i] = bits_per_comp;
+      m_comp_ofs[i] = i * bits_per_comp;
+      if (reversed)
+        m_comp_ofs[i] = ((num_comps - 1) * bits_per_comp) - m_comp_ofs[i];
+    }
+
+    for (uint i = 0; i < 4; i++)
+      m_comp_max[i] = static_cast<uint32>((1ULL << m_comp_size[i]) - 1ULL);
+
+    m_pixel_stride = (pixel_stride >= 0) ? pixel_stride : (num_comps * bits_per_comp + 7) / 8;
+
+    return true;
+  }
+
+  // Format examples:
+  // R16G16B16
+  // B5G6R5
+  // B5G5R5x1
+  // Y8A8
+  // A8R8G8B8
+  // First component is at the LSB in memory. Assumes unsigned integer components of 1-32 bits each (at most two size digits are read).
+  bool init(const char* pComp_map, int pixel_stride = -1, int force_comp_size = -1) {
+    clear();
+
+    uint cur_bit_ofs = 0;
+
+    while (*pComp_map) {
+      char c = *pComp_map++;
+
+      int comp_index = -1;
+      if (c == 'R')
+        comp_index = 0;
+      else if (c == 'G')
+        comp_index = 1;
+      else if (c == 'B')
+        comp_index = 2;
+      else if (c == 'A')
+        comp_index = 3;
+      else if (c == 'Y')
+        comp_index = 4;
+      else if (c != 'x')
+        return false;
+
+      uint comp_size = 0;
+
+      uint n = *pComp_map;
+      if ((n >= '0') && (n <= '9')) {
+        comp_size = n - '0';
+        pComp_map++;
+
+        n = *pComp_map;
+        if ((n >= '0') && (n <= '9')) {
+          comp_size = (comp_size * 10) + (n - '0');
+          pComp_map++;
+        }
+      }
+
+      if (force_comp_size != -1)
+        comp_size = force_comp_size;
+
+      if ((!comp_size) || (comp_size > 32))
+        return false;
+
+      if (comp_index == 4) {
+        if (m_comp_size[0] || m_comp_size[1] || m_comp_size[2])
+          return false;
+
+        //m_comp_ofs[0] = m_comp_ofs[1] = m_comp_ofs[2] = cur_bit_ofs;
+        //m_comp_size[0] = m_comp_size[1] = m_comp_size[2] = comp_size;
+        m_comp_ofs[1] = cur_bit_ofs;
+        m_comp_size[1] = comp_size;
+        m_rgb_is_luma = true;
+        m_num_comps++;
+      } else if (comp_index >= 0) {
+        if (m_comp_size[comp_index])
+          return false;
+
+        m_comp_ofs[comp_index] = cur_bit_ofs;
+        m_comp_size[comp_index] = comp_size;
+        m_num_comps++;
+      }
+
+      cur_bit_ofs += comp_size;
+    }
+
+    for (uint i = 0; i < 4; i++)
+      m_comp_max[i] = static_cast<uint32>((1ULL << m_comp_size[i]) - 1ULL);
+
+    if (pixel_stride >= 0)
+      m_pixel_stride = pixel_stride;
+    else
+      m_pixel_stride = (cur_bit_ofs + 7) / 8;
+    return true;
+  }
+
+ private:
+  uint m_pixel_stride;
+  uint m_num_comps;
+  uint m_comp_size[4];
+  uint m_comp_ofs[4];
+  uint m_comp_max[4];
+  bool m_rgb_is_luma;
+};
+
+}  // namespace crnlib
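The format strings accepted by `init(const char* pComp_map, ...)` above (for example `B5G6R5` or `Y8A8`) assign bit offsets starting from the LSB. The following is a minimal standalone sketch of that mapping, not the crnlib class itself: `comp_layout` and `parse_layout` are hypothetical names introduced only for illustration, and the sketch omits the `pixel_stride` and `force_comp_size` overrides.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the packer's per-component layout (not a crnlib type).
struct comp_layout {
  unsigned ofs[4] = {0, 0, 0, 0};   // bit offset of R, G, B, A
  unsigned size[4] = {0, 0, 0, 0};  // bit width of R, G, B, A (0 = absent)
  unsigned stride = 0;              // bytes per pixel, rounded up
  bool luma = false;                // true when 'Y' was used
};

// Parses strings such as "B5G6R5"; the first component sits at the LSB.
inline bool parse_layout(const std::string& map, comp_layout& out) {
  out = comp_layout();
  unsigned bit = 0;
  std::size_t i = 0;
  while (i < map.size()) {
    const char c = map[i++];
    int idx = -1;  // stays -1 for 'x' (padding bits)
    if (c == 'R') idx = 0;
    else if (c == 'G') idx = 1;
    else if (c == 'B') idx = 2;
    else if (c == 'A') idx = 3;
    else if (c == 'Y') idx = 4;
    else if (c != 'x') return false;

    // Read up to two decimal digits for the component size.
    unsigned size = 0;
    for (int d = 0; d < 2 && i < map.size() && map[i] >= '0' && map[i] <= '9'; d++)
      size = size * 10 + static_cast<unsigned>(map[i++] - '0');
    if (size == 0 || size > 32) return false;

    if (idx == 4) {
      // Luma is stored in the green slot and may not be mixed with R, G or B.
      if (out.size[0] || out.size[1] || out.size[2]) return false;
      out.ofs[1] = bit;
      out.size[1] = size;
      out.luma = true;
    } else if (idx >= 0) {
      if (out.size[idx]) return false;  // component specified twice
      out.ofs[idx] = bit;
      out.size[idx] = size;
    }
    bit += size;
  }
  out.stride = (bit + 7) / 8;
  return true;
}
```

With this sketch, `B5G6R5` places blue at bits 0-4, green at bits 5-10 and red at bits 11-15, giving a two-byte pixel.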
@@ -0,0 +1,109 @@
+// File: crn_colorized_console.cpp
+// See Copyright Notice and license at the end of inc/crnlib.h
+#include "crn_core.h"
+#include "crn_colorized_console.h"
+#ifdef CRNLIB_USE_WIN32_API
+#include "crn_winhdr.h"
+#endif
+
+namespace crnlib {
+void colorized_console::init() {
+  console::init();
+  console::add_console_output_func(console_output_func, NULL);
+}
+
+void colorized_console::deinit() {
+  console::remove_console_output_func(console_output_func);
+  console::deinit();
+}
+
+void colorized_console::tick() {
+}
+
+#ifdef CRNLIB_USE_WIN32_API
+bool colorized_console::console_output_func(eConsoleMessageType type, const char* pMsg, void*) {
+  if (console::get_output_disabled())
+    return true;
+
+  HANDLE cons = GetStdHandle(STD_OUTPUT_HANDLE);
+
+  DWORD attr = FOREGROUND_RED | FOREGROUND_GREEN | FOREGROUND_BLUE;
+  switch (type) {
+    case cDebugConsoleMessage:
+      attr = FOREGROUND_BLUE | FOREGROUND_INTENSITY;
+      break;
+    case cMessageConsoleMessage:
+      attr = FOREGROUND_GREEN | FOREGROUND_BLUE | FOREGROUND_INTENSITY;
+      break;
+    case cWarningConsoleMessage:
+      attr = FOREGROUND_GREEN | FOREGROUND_RED | FOREGROUND_INTENSITY;
+      break;
+    case cErrorConsoleMessage:
+      attr = FOREGROUND_RED | FOREGROUND_INTENSITY;
+      break;
+    default:
+      break;
+  }
+
+  if (INVALID_HANDLE_VALUE != cons)
+    SetConsoleTextAttribute(cons, (WORD)attr);
+
+  if ((console::get_prefixes()) && (console::get_at_beginning_of_line())) {
+    switch (type) {
+      case cDebugConsoleMessage:
+        printf("Debug: %s", pMsg);
+        break;
+      case cWarningConsoleMessage:
+        printf("Warning: %s", pMsg);
+        break;
+      case cErrorConsoleMessage:
+        printf("Error: %s", pMsg);
+        break;
+      default:
+        printf("%s", pMsg);
+        break;
+    }
+  } else {
+    printf("%s", pMsg);
+  }
+
+  if (console::get_crlf())
+    printf("\n");
+
+  if (INVALID_HANDLE_VALUE != cons)
+    SetConsoleTextAttribute(cons, FOREGROUND_RED | FOREGROUND_GREEN | FOREGROUND_BLUE);
+
+  return true;
+}
+#else
+bool colorized_console::console_output_func(eConsoleMessageType type, const char* pMsg, void*) {
+  if (console::get_output_disabled())
+    return true;
+
+  if ((console::get_prefixes()) && (console::get_at_beginning_of_line())) {
+    switch (type) {
+      case cDebugConsoleMessage:
+        printf("Debug: %s", pMsg);
+        break;
+      case cWarningConsoleMessage:
+        printf("Warning: %s", pMsg);
+        break;
+      case cErrorConsoleMessage:
+        printf("Error: %s", pMsg);
+        break;
+      default:
+        printf("%s", pMsg);
+        break;
+    }
+  } else {
+    printf("%s", pMsg);
+  }
+
+  if (console::get_crlf())
+    printf("\n");
+
+  return true;
+}
+#endif
+
+}  // namespace crnlib
@@ -0,0 +1,17 @@
+// File: crn_colorized_console.h
+// See Copyright Notice and license at the end of inc/crnlib.h
+#pragma once
+#include "crn_console.h"
+
+namespace crnlib {
+class colorized_console {
+ public:
+  static void init();
+  static void deinit();
+  static void tick();
+
+ private:
+  static bool console_output_func(eConsoleMessageType type, const char* pMsg, void* pData);
+};
+
+}  // namespace crnlib
@@ -0,0 +1,410 @@
+// File: crn_command_line_params.cpp
+// See Copyright Notice and license at the end of inc/crnlib.h
+#include "crn_core.h"
+#include "crn_command_line_params.h"
+#include "crn_console.h"
+#include "crn_cfile_stream.h"
+
+#ifdef WIN32
+#define CRNLIB_CMD_LINE_ALLOW_SLASH_PARAMS 1
+#endif
+
+#if CRNLIB_USE_WIN32_API
+#include "crn_winhdr.h"
+#endif
+namespace crnlib {
+void get_command_line_as_single_string(dynamic_string& cmd_line, int argc, char* argv[]) {
+  (void)argc; (void)argv;  // Unused when the Win32 API path is taken.
+#if CRNLIB_USE_WIN32_API
+  cmd_line.set(GetCommandLineA());
+#else
+  cmd_line.clear();
+  for (int i = 0; i < argc; i++) {
+    dynamic_string tmp(argv[i]);
+    if ((tmp.front() != '"') && (tmp.front() != '-') && (tmp.front() != '@'))
+      tmp = "\"" + tmp + "\"";
+    if (cmd_line.get_len())
+      cmd_line += " ";
+    cmd_line += tmp;
+  }
+#endif
+}
+
+command_line_params::command_line_params() {
+}
+
+void command_line_params::clear() {
+  m_params.clear();
+
+  m_param_map.clear();
+}
+
+bool command_line_params::split_params(const char* p, dynamic_string_array& params) {
+  bool within_param = false;
+  bool within_quote = false;
+
+  uint ofs = 0;
+  dynamic_string str;
+
+  while (p[ofs]) {
+    const char c = p[ofs];
+
+    if (within_param) {
+      if (within_quote) {
+        if (c == '"')
+          within_quote = false;
+
+        str.append_char(c);
+      } else if ((c == ' ') || (c == '\t')) {
+        if (!str.is_empty()) {
+          params.push_back(str);
+          str.clear();
+        }
+        within_param = false;
+      } else {
+        if (c == '"')
+          within_quote = true;
+
+        str.append_char(c);
+      }
+    } else if ((c != ' ') && (c != '\t')) {
+      within_param = true;
+
+      if (c == '"')
+        within_quote = true;
+
+      str.append_char(c);
+    }
+
+    ofs++;
+  }
+
+  if (within_quote) {
+    console::error("Unmatched quote in command line \"%s\"", p);
+    return false;
+  }
+
+  if (!str.is_empty())
+    params.push_back(str);
+
+  return true;
+}
+
+bool command_line_params::load_string_file(const char* pFilename, dynamic_string_array& strings) {
+  cfile_stream in_stream;
+  if (!in_stream.open(pFilename, cDataStreamReadable | cDataStreamSeekable)) {
+    console::error("Unable to open file \"%s\" for reading!", pFilename);
+    return false;
+  }
+
+  dynamic_string ansi_str;
+
+  for (;;) {
+    if (!in_stream.read_line(ansi_str))
+      break;
+
+    ansi_str.trim();
+    if (ansi_str.is_empty())
+      continue;
+
+    strings.push_back(dynamic_string(ansi_str.get_ptr()));
+  }
+
+  return true;
+}
+
+bool command_line_params::parse(const dynamic_string_array& params, uint n, const param_desc* pParam_desc) {
+  CRNLIB_ASSERT(n && pParam_desc);
+
+  m_params = params;
+
+  uint arg_index = 0;
+  while (arg_index < params.size()) {
+    const uint cur_arg_index = arg_index;
+    const dynamic_string& src_param = params[arg_index++];
+
+    if (src_param.is_empty())
+      continue;
+#if CRNLIB_CMD_LINE_ALLOW_SLASH_PARAMS
+    if ((src_param[0] == '/') || (src_param[0] == '-'))
+#else
+    if (src_param[0] == '-')
+#endif
+    {
+      if (src_param.get_len() < 2) {
+        console::error("Invalid command line parameter: \"%s\"", src_param.get_ptr());
+        return false;
+      }
+
+      dynamic_string key_str(src_param);
+
+      key_str.right(1);
+
+      int modifier = 0;
+      char c = key_str[key_str.get_len() - 1];
+      if (c == '+')
+        modifier = 1;
+      else if (c == '-')
+        modifier = -1;
+
+      if (modifier)
+        key_str.left(key_str.get_len() - 1);
+
+      uint param_index;
+      for (param_index = 0; param_index < n; param_index++)
+        if (key_str == pParam_desc[param_index].m_pName)
+          break;
+
+      if (param_index == n) {
+        console::error("Unrecognized command line parameter: \"%s\"", src_param.get_ptr());
+        return false;
+      }
+
+      const param_desc& desc = pParam_desc[param_index];
+
+      const uint cMaxValues = 16;
+      dynamic_string val_str[cMaxValues];
+      uint num_val_strs = 0;
+      if (desc.m_num_values) {
+        CRNLIB_ASSERT(desc.m_num_values <= cMaxValues);
+
+        if ((arg_index + desc.m_num_values) > params.size()) {
+          console::error("Expected %u value(s) after command line parameter: \"%s\"", desc.m_num_values, src_param.get_ptr());
+          return false;
+        }
+
+        for (uint v = 0; v < desc.m_num_values; v++)
+          val_str[num_val_strs++] = params[arg_index++];
+      }
+
+      dynamic_string_array strings;
+
+      if ((desc.m_support_listing_file) && (val_str[0].get_len() >= 2) && (val_str[0][0] == '@')) {
+        dynamic_string filename(val_str[0]);
+        filename.right(1);
+        filename.unquote();
+
+        if (!load_string_file(filename.get_ptr(), strings)) {
+          console::error("Failed loading listing file \"%s\"!", filename.get_ptr());
+          return false;
+        }
+      } else {
+        for (uint v = 0; v < num_val_strs; v++) {
+          val_str[v].unquote();
+          strings.push_back(val_str[v]);
+        }
+      }
+
+      param_value pv;
+      pv.m_values.swap(strings);
+      pv.m_index = cur_arg_index;
+      pv.m_modifier = (int8)modifier;
+      m_param_map.insert(std::make_pair(key_str, pv));
+    } else {
+      param_value pv;
+      pv.m_values.push_back(src_param);
+      pv.m_values.back().unquote();
+      pv.m_index = cur_arg_index;
+      m_param_map.insert(std::make_pair(g_empty_dynamic_string, pv));
+    }
+  }
+
+  return true;
+}
+
+bool command_line_params::parse(const char* pCmd_line, uint n, const param_desc* pParam_desc, bool skip_first_param) {
+  CRNLIB_ASSERT(n && pParam_desc);
+
+  dynamic_string_array p;
+  if (!split_params(pCmd_line, p))
+    return false;
+
+  if (p.empty())
+    return false;
+
+  if (skip_first_param)
+    p.erase(0U);
+
+  return parse(p, n, pParam_desc);
+}
+
+bool command_line_params::is_param(uint index) const {
+  CRNLIB_ASSERT(index < m_params.size());
+  if (index >= m_params.size())
+    return false;
+
+  const dynamic_string& w = m_params[index];
+  if (w.is_empty())
+    return false;
+
+#if CRNLIB_CMD_LINE_ALLOW_SLASH_PARAMS
+  return (w.get_len() >= 2) && ((w[0] == '-') || (w[0] == '/'));
+#else
+  return (w.get_len() >= 2) && (w[0] == '-');
+#endif
+}
+
+uint command_line_params::find(uint num_keys, const char** ppKeys, crnlib::vector<param_map_const_iterator>* pIterators, crnlib::vector<uint>* pUnmatched_indices) const {
+  CRNLIB_ASSERT(ppKeys);
+
+  if (pUnmatched_indices) {
+    pUnmatched_indices->resize(m_params.size());
+    for (uint i = 0; i < m_params.size(); i++)
+      (*pUnmatched_indices)[i] = i;
+  }
+
+  uint n = 0;
+  for (uint i = 0; i < num_keys; i++) {
+    const char* pKey = ppKeys[i];
+
+    param_map_const_iterator begin, end;
+    find(pKey, begin, end);
+
+    while (begin != end) {
+      if (pIterators)
+        pIterators->push_back(begin);
+
+      if (pUnmatched_indices) {
+        int k = pUnmatched_indices->find(begin->second.m_index);
+        if (k >= 0)
+          pUnmatched_indices->erase_unordered(k);
+      }
+
+      n++;
+      begin++;
+    }
+  }
+
+  return n;
+}
+
+void command_line_params::find(const char* pKey, param_map_const_iterator& begin, param_map_const_iterator& end) const {
+  dynamic_string key(pKey);
+  begin = m_param_map.lower_bound(key);
+  end = m_param_map.upper_bound(key);
+}
+
+uint command_line_params::get_count(const char* pKey) const {
+  param_map_const_iterator begin, end;
+  find(pKey, begin, end);
+
+  uint n = 0;
+
+  while (begin != end) {
+    n++;
+    begin++;
+  }
+
+  return n;
+}
+
+command_line_params::param_map_const_iterator command_line_params::get_param(const char* pKey, uint index) const {
+  param_map_const_iterator begin, end;
+  find(pKey, begin, end);
+
+  if (begin == end)
+    return m_param_map.end();
+
+  uint n = 0;
+
+  while ((begin != end) && (n != index)) {
+    n++;
+    begin++;
+  }
+
+  if (begin == end)
+    return m_param_map.end();
+
+  return begin;
+}
+
+bool command_line_params::has_value(const char* pKey, uint index) const {
+  return get_num_values(pKey, index) != 0;
+}
+
+uint command_line_params::get_num_values(const char* pKey, uint index) const {
+  param_map_const_iterator it = get_param(pKey, index);
+
+  if (it == end())
+    return 0;
+
+  return it->second.m_values.size();
+}
+
+bool command_line_params::get_value_as_bool(const char* pKey, uint index, bool def) const {
+  param_map_const_iterator it = get_param(pKey, index);
+  if (it == end())
+    return def;
+
+  if (it->second.m_modifier)
+    return it->second.m_modifier > 0;
+  else
+    return true;
+}
+
+int command_line_params::get_value_as_int(const char* pKey, uint index, int def, int l, int h, uint value_index) const {
+  param_map_const_iterator it = get_param(pKey, index);
+  if ((it == end()) || (value_index >= it->second.m_values.size()))
+    return def;
+
+  int val;
+  const char* p = it->second.m_values[value_index].get_ptr();
+  if (!string_to_int(p, val)) {
+    crnlib::console::warning("Invalid value specified for parameter \"%s\", using default value of %i", pKey, def);
+    return def;
+  }
+
+  if (val < l) {
+    crnlib::console::warning("Value %i for parameter \"%s\" is out of range, clamping to %i", val, pKey, l);
+    val = l;
+  } else if (val > h) {
+    crnlib::console::warning("Value %i for parameter \"%s\" is out of range, clamping to %i", val, pKey, h);
+    val = h;
+  }
+
+  return val;
+}
+
+float command_line_params::get_value_as_float(const char* pKey, uint index, float def, float l, float h, uint value_index) const {
+  param_map_const_iterator it = get_param(pKey, index);
+  if ((it == end()) || (value_index >= it->second.m_values.size()))
+    return def;
+
+  float val;
+  const char* p = it->second.m_values[value_index].get_ptr();
+  if (!string_to_float(p, val)) {
+    crnlib::console::warning("Invalid value specified for float parameter \"%s\", using default value of %f", pKey, def);
+    return def;
+  }
+
+  if (val < l) {
+    crnlib::console::warning("Value %f for parameter \"%s\" is out of range, clamping to %f", val, pKey, l);
+    val = l;
+  } else if (val > h) {
+    crnlib::console::warning("Value %f for parameter \"%s\" is out of range, clamping to %f", val, pKey, h);
+    val = h;
+  }
+
+  return val;
+}
+
+bool command_line_params::get_value_as_string(const char* pKey, uint index, dynamic_string& value, uint value_index) const {
+  param_map_const_iterator it = get_param(pKey, index);
+  if ((it == end()) || (value_index >= it->second.m_values.size())) {
+    value.empty();
+    return false;
+  }
+
+  value = it->second.m_values[value_index];
+  return true;
+}
+
+const dynamic_string& command_line_params::get_value_as_string_or_empty(const char* pKey, uint index, uint value_index) const {
+  param_map_const_iterator it = get_param(pKey, index);
+  if ((it == end()) || (value_index >= it->second.m_values.size()))
+    return g_empty_dynamic_string;
+
+  return it->second.m_values[value_index];
+}
+
+}  // namespace crnlib
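The tokenizing rule used by `split_params` above (split on spaces and tabs outside double quotes, keep the quote characters, fail on an unmatched quote) can be sketched with standard containers. This is an illustrative reimplementation under those assumptions, not the crnlib code; `split_args` is a hypothetical name.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Splits on spaces/tabs that are outside double quotes. Quote characters are
// kept in the tokens, since unquoting is a separate, later step.
inline bool split_args(const std::string& line, std::vector<std::string>& out) {
  bool in_quote = false;
  std::string cur;
  for (char c : line) {
    if (!in_quote && (c == ' ' || c == '\t')) {
      // Whitespace outside quotes ends the current token, if any.
      if (!cur.empty()) {
        out.push_back(cur);
        cur.clear();
      }
      continue;
    }
    if (c == '"')
      in_quote = !in_quote;
    cur += c;
  }
  if (in_quote)
    return false;  // unmatched quote
  if (!cur.empty())
    out.push_back(cur);
  return true;
}
```

For the illustrative input `crunch -file "my image.png" -out x.ktx`, the sketch yields five tokens, and the third one still carries its surrounding quotes.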
@@ -0,0 +1,83 @@
+// File: crn_command_line_params.h
+// See Copyright Notice and license at the end of inc/crnlib.h
+#pragma once
+#include "crn_value.h"
+#include <map>
+
+namespace crnlib {
+// Returns the command line passed to the app as a string.
+// On systems where this isn't trivial, this function combines together the separate arguments, quoting and adding spaces as needed.
+void get_command_line_as_single_string(dynamic_string& cmd_line, int argc, char* argv[]);
+
+class command_line_params {
+ public:
+  struct param_value {
+    inline param_value()
+        : m_index(0), m_modifier(0) {}
+
+    dynamic_string_array m_values;
+    uint m_index;
+    int8 m_modifier;
+  };
+
+  typedef std::multimap<dynamic_string, param_value> param_map;
+  typedef param_map::const_iterator param_map_const_iterator;
+  typedef param_map::iterator param_map_iterator;
+
+  command_line_params();
+
+  void clear();
+
+  static bool split_params(const char* p, dynamic_string_array& params);
+
+  struct param_desc {
+    const char* m_pName;
+    uint m_num_values;
+    bool m_support_listing_file;
+  };
+
+  bool parse(const dynamic_string_array& params, uint n, const param_desc* pParam_desc);
+  bool parse(const char* pCmd_line, uint n, const param_desc* pParam_desc, bool skip_first_param = true);
+
+  const dynamic_string_array& get_array() const { return m_params; }
+
+  bool is_param(uint index) const;
+
+  const param_map& get_map() const { return m_param_map; }
+
+  uint get_num_params() const { return static_cast<uint>(m_param_map.size()); }
+
+  param_map_const_iterator begin() const { return m_param_map.begin(); }
+  param_map_const_iterator end() const { return m_param_map.end(); }
+
+  uint find(uint num_keys, const char** ppKeys, crnlib::vector<param_map_const_iterator>* pIterators, crnlib::vector<uint>* pUnmatched_indices) const;
+
+  void find(const char* pKey, param_map_const_iterator& begin, param_map_const_iterator& end) const;
+
+  uint get_count(const char* pKey) const;
+
+  // Returns end() if param cannot be found, or index is out of range.
+  param_map_const_iterator get_param(const char* pKey, uint index) const;
+
+  bool has_key(const char* pKey) const { return get_param(pKey, 0) != end(); }
+
+  bool has_value(const char* pKey, uint index) const;
+  uint get_num_values(const char* pKey, uint index) const;
+
+  bool get_value_as_bool(const char* pKey, uint index = 0, bool def = false) const;
+
+  int get_value_as_int(const char* pKey, uint index, int def, int l = INT_MIN, int h = INT_MAX, uint value_index = 0) const;
+  float get_value_as_float(const char* pKey, uint index, float def = 0.0f, float l = -math::cNearlyInfinite, float h = math::cNearlyInfinite, uint value_index = 0) const;
+
+  bool get_value_as_string(const char* pKey, uint index, dynamic_string& value, uint value_index = 0) const;
+  const dynamic_string& get_value_as_string_or_empty(const char* pKey, uint index = 0, uint value_index = 0) const;
+
+ private:
+  dynamic_string_array m_params;
+
+  param_map m_param_map;
+
+  static bool load_string_file(const char* pFilename, dynamic_string_array& strings);
+};
+
+}  // namespace crnlib
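The `m_modifier` field declared above records a trailing `+` or `-` on a switch, which `get_value_as_bool` then maps to true or false. A minimal sketch of that key handling follows; `parsed_key` and `parse_key` are hypothetical names, the option names in the usage note are made up for illustration, and the input is assumed to be at least two characters long and to start with `-` (or `/` on Windows builds).

```cpp
#include <cassert>
#include <string>

// Hypothetical result type: the bare key name plus its +/- modifier.
struct parsed_key {
  std::string name;
  int modifier;  // 1 for a trailing '+', -1 for a trailing '-', else 0
};

// Strips the leading switch character and any trailing '+' or '-'.
inline parsed_key parse_key(const std::string& arg) {
  parsed_key k{arg.substr(1), 0};
  const char last = k.name.back();
  if (last == '+')
    k.modifier = 1;
  else if (last == '-')
    k.modifier = -1;
  if (k.modifier)
    k.name.pop_back();
  return k;
}

// Mirrors the boolean rule: a present key is true unless it ended in '-'.
inline bool key_as_bool(const parsed_key& k) {
  return k.modifier ? (k.modifier > 0) : true;
}
```

Under these assumptions, a made-up switch `-foo-` parses to the key `foo` with a modifier of -1 and reads as false, while `-foo` and `-foo+` both read as true.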
@@ -0,0 +1,125 @@
+// File: crn_comp.h
+// See Copyright Notice and license at the end of inc/crnlib.h
+#pragma once
+
+#include "../inc/crn_defs.h"
+
+#include "../inc/crnlib.h"
+#include "crn_symbol_codec.h"
+#include "crn_dxt_hc.h"
+#include "crn_image.h"
+#include "crn_image_utils.h"
+#include "crn_texture_comp.h"
+
+namespace crnlib {
+class crn_comp : public itexture_comp {
+  CRNLIB_NO_COPY_OR_ASSIGNMENT_OP(crn_comp);
+
+ public:
+  crn_comp();
+  virtual ~crn_comp();
+
+  virtual const char* get_ext() const { return "CRN"; }
+
+  virtual bool compress_init(const crn_comp_params&) { return true; }
+  virtual bool compress_pass(const crn_comp_params& params, float* pEffective_bitrate);
+  virtual void compress_deinit();
+
+  virtual const crnlib::vector<uint8>& get_comp_data() const { return m_comp_data; }
+  virtual crnlib::vector<uint8>& get_comp_data() { return m_comp_data; }
+
+  uint get_comp_data_size() const { return m_comp_data.size(); }
+  const uint8* get_comp_data_ptr() const { return m_comp_data.size() ? &m_comp_data[0] : NULL; }
+
+ private:
+  task_pool m_task_pool;
+  const crn_comp_params* m_pParams;
+
+  image_u8 m_images[cCRNMaxFaces][cCRNMaxLevels];
+
+  enum comp {
+    cColor,
+    cAlpha0,
+    cAlpha1,
+    cNumComps
+  };
+
+  bool m_has_comp[cNumComps];
+  bool m_has_etc_color_blocks;
+  bool m_has_subblocks;
+
+  struct level_details {
+    uint first_block;
+    uint num_blocks;
+    uint block_width;
+  };
+  crnlib::vector<level_details> m_levels;
+
+  uint m_total_blocks;
+  crnlib::vector<uint32> m_color_endpoints;
+  crnlib::vector<uint32> m_alpha_endpoints;
+  crnlib::vector<uint32> m_color_selectors;
+  crnlib::vector<uint64> m_alpha_selectors;
+  crnlib::vector<dxt_hc::endpoint_indices_details> m_endpoint_indices;
+  crnlib::vector<dxt_hc::selector_indices_details> m_selector_indices;
+
+  crnd::crn_header m_crn_header;
+  crnlib::vector<uint8> m_comp_data;
+
+  dxt_hc m_hvq;
+
+  symbol_histogram m_reference_hist;
+  static_huffman_data_model m_reference_dm;
+
+  crnlib::vector<uint16> m_endpoint_remaping[2];
+  symbol_histogram m_endpoint_index_hist[2];
+  static_huffman_data_model m_endpoint_index_dm[2];
+
+  crnlib::vector<uint16> m_selector_remaping[2];
+  symbol_histogram m_selector_index_hist[2];
+  static_huffman_data_model m_selector_index_dm[2];
+
+  crnlib::vector<uint8> m_packed_blocks[cCRNMaxLevels];
+  crnlib::vector<uint8> m_packed_data_models;
+  crnlib::vector<uint8> m_packed_color_endpoints;
+  crnlib::vector<uint8> m_packed_color_selectors;
+  crnlib::vector<uint8> m_packed_alpha_endpoints;
+  crnlib::vector<uint8> m_packed_alpha_selectors;
+
+  bool pack_color_endpoints(crnlib::vector<uint8>& packed_data, const crnlib::vector<uint16>& remapping);
+  bool pack_color_endpoints_etc(crnlib::vector<uint8>& packed_data, const crnlib::vector<uint16>& remapping);
+  bool pack_color_selectors(crnlib::vector<uint8>& packed_data, const crnlib::vector<uint16>& remapping);
+  bool pack_alpha_endpoints(crnlib::vector<uint8>& packed_data, const crnlib::vector<uint16>& remapping);
+  bool pack_alpha_selectors(crnlib::vector<uint8>& packed_data, const crnlib::vector<uint16>& remapping);
+  bool pack_blocks(
+    uint group,
+    bool clear_histograms,
+    symbol_codec* pCodec,
+    const crnlib::vector<uint16>* pColor_endpoint_remap,
+    const crnlib::vector<uint16>* pColor_selector_remap,
+    const crnlib::vector<uint16>* pAlpha_endpoint_remap,
+    const crnlib::vector<uint16>* pAlpha_selector_remap
+  );
+
+  bool alias_images();
+  void clear();
+  bool quantize_images();
+
+  void optimize_color_endpoints_task(uint64 data, void* pData_ptr);
+  void optimize_color_selectors();
+  void optimize_color();
+
+  void optimize_alpha_endpoints_task(uint64 data, void* pData_ptr);
+  void optimize_alpha_selectors();
+  void optimize_alpha();
+
+  bool pack_data_models();
+  static void append_vec(crnlib::vector<uint8>& a, const void* p, uint size);
+  static void append_vec(crnlib::vector<uint8>& a, const crnlib::vector<uint8>& b);
+  bool create_comp_data();
+
+  bool update_progress(uint phase_index, uint subphase_index, uint subphase_total);
+  bool compress_internal();
+};
+
+}  // namespace crnlib
@@ -0,0 +1,200 @@
+// File: crn_console.cpp
+// See Copyright Notice and license at the end of inc/crnlib.h
+#include "crn_core.h"
+#include "crn_console.h"
+#include "crn_data_stream.h"
+#include "crn_threading.h"
+
+namespace crnlib {
+eConsoleMessageType console::m_default_category = cInfoConsoleMessage;
+crnlib::vector<console::console_func> console::m_output_funcs;
+bool console::m_crlf = true;
+bool console::m_prefixes = true;
+bool console::m_output_disabled;
+data_stream* console::m_pLog_stream;
+mutex* console::m_pMutex;
+uint console::m_num_messages[cCMTTotal];
+bool console::m_at_beginning_of_line = true;
+
+const uint cConsoleBufSize = 4096;
+
+void console::init() {
+  if (!m_pMutex) {
+    m_pMutex = crnlib_new<mutex>();
+  }
+}
+
+void console::deinit() {
+  if (m_pMutex) {
+    crnlib_delete(m_pMutex);
+    m_pMutex = NULL;
+  }
+}
+
+void console::disable_crlf() {
+  init();
+
+  m_crlf = false;
+}
+
+void console::enable_crlf() {
+  init();
+
+  m_crlf = true;
+}
+
+void console::vprintf(eConsoleMessageType type, const char* p, va_list args) {
+  init();
+
+  scoped_mutex lock(*m_pMutex);
+
+  m_num_messages[type]++;
+
+  char buf[cConsoleBufSize];
+  vsprintf_s(buf, cConsoleBufSize, p, args);
+
+  bool handled = false;
+
+  if (m_output_funcs.size()) {
+    for (uint i = 0; i < m_output_funcs.size(); i++)
+      if (m_output_funcs[i].m_func(type, buf, m_output_funcs[i].m_pData))
+        handled = true;
+  }
+
+  const char* pPrefix = NULL;
+  if ((m_prefixes) && (m_at_beginning_of_line)) {
+    switch (type) {
+      case cDebugConsoleMessage:
+        pPrefix = "Debug: ";
+        break;
+      case cWarningConsoleMessage:
+        pPrefix = "Warning: ";
+        break;
+      case cErrorConsoleMessage:
+        pPrefix = "Error: ";
+        break;
+      default:
+        break;
+    }
+  }
+
+  if ((!m_output_disabled) && (!handled)) {
+    if (pPrefix)
+      ::printf("%s", pPrefix);
+    ::printf(m_crlf ? "%s\n" : "%s", buf);
+  }
+
+  uint n = static_cast<uint>(strlen(buf));
+  m_at_beginning_of_line = (m_crlf) || ((n) && (buf[n - 1] == '\n'));
+
+  if ((type != cProgressConsoleMessage) && (m_pLog_stream)) {
+    // Copies the message into a temporary string on every call just to translate LF to CRLF for the log stream.
+    dynamic_string tmp_buf(buf);
+
+    tmp_buf.translate_lf_to_crlf();
+
+    m_pLog_stream->printf(m_crlf ? "%s\r\n" : "%s", tmp_buf.get_ptr());
+    m_pLog_stream->flush();
+  }
+}
+
+void console::printf(eConsoleMessageType type, const char* p, ...) {
+  va_list args;
+  va_start(args, p);
+  vprintf(type, p, args);
+  va_end(args);
+}
+
+void console::printf(const char* p, ...) {
+  va_list args;
+  va_start(args, p);
+  vprintf(m_default_category, p, args);
+  va_end(args);
+}
+
+void console::set_default_category(eConsoleMessageType category) {
+  init();
+
+  m_default_category = category;
+}
+
+eConsoleMessageType console::get_default_category() {
+  init();
+
+  return m_default_category;
+}
+
+void console::add_console_output_func(console_output_func pFunc, void* pData) {
+  init();
+
+  scoped_mutex lock(*m_pMutex);
+
+  m_output_funcs.push_back(console_func(pFunc, pData));
+}
+
+void console::remove_console_output_func(console_output_func pFunc) {
+  init();
+
+  scoped_mutex lock(*m_pMutex);
+
+  for (int i = static_cast<int>(m_output_funcs.size()) - 1; i >= 0; i--) {
+    if (m_output_funcs[i].m_func == pFunc) {
+      m_output_funcs.erase(m_output_funcs.begin() + i);
+    }
+  }
+
+  if (!m_output_funcs.size()) {
+    m_output_funcs.clear();
+  }
+}
+
+void console::progress(const char* p, ...) {
+  va_list args;
+  va_start(args, p);
+  vprintf(cProgressConsoleMessage, p, args);
+  va_end(args);
+}
+
+void console::info(const char* p, ...) {
+  va_list args;
+  va_start(args, p);
+  vprintf(cInfoConsoleMessage, p, args);
+  va_end(args);
+}
+
+void console::message(const char* p, ...) {
+  va_list args;
+  va_start(args, p);
+  vprintf(cMessageConsoleMessage, p, args);
+  va_end(args);
+}
+
+void console::cons(const char* p, ...) {
+  va_list args;
+  va_start(args, p);
+  vprintf(cConsoleConsoleMessage, p, args);
+  va_end(args);
+}
+
+void console::debug(const char* p, ...) {
+  va_list args;
+  va_start(args, p);
+  vprintf(cDebugConsoleMessage, p, args);
+  va_end(args);
+}
+
+void console::warning(const char* p, ...) {
+  va_list args;
+  va_start(args, p);
+  vprintf(cWarningConsoleMessage, p, args);
+  va_end(args);
+}
+
+void console::error(const char* p, ...) {
+  va_list args;
+  va_start(args, p);
+  vprintf(cErrorConsoleMessage, p, args);
+  va_end(args);
+}
+
+}  // namespace crnlib
@@ -0,0 +1,121 @@
+// File: crn_console.h
+// See Copyright Notice and license at the end of inc/crnlib.h
+#pragma once
+#include "crn_dynamic_string.h"
+
+#ifdef WIN32
+#include <tchar.h>
+#include <conio.h>
+#endif
+namespace crnlib {
+class dynamic_string;
+class data_stream;
+class mutex;
+
+enum eConsoleMessageType {
+  cDebugConsoleMessage,     // debugging messages
+  cProgressConsoleMessage,  // progress messages
+  cInfoConsoleMessage,      // ordinary messages
+  cConsoleConsoleMessage,   // user console output
+  cMessageConsoleMessage,   // high importance messages
+  cWarningConsoleMessage,   // warnings
+  cErrorConsoleMessage,     // errors
+
+  cCMTTotal
+};
+
+typedef bool (*console_output_func)(eConsoleMessageType type, const char* pMsg, void* pData);
+
+class console {
+ public:
+  static void init();
+  static void deinit();
+
+  static bool is_initialized() { return m_pMutex != NULL; }
+
+  static void set_default_category(eConsoleMessageType category);
+  static eConsoleMessageType get_default_category();
+
+  static void add_console_output_func(console_output_func pFunc, void* pData);
+  static void remove_console_output_func(console_output_func pFunc);
+
+  static void printf(const char* p, ...);
+
+  static void vprintf(eConsoleMessageType type, const char* p, va_list args);
+  static void printf(eConsoleMessageType type, const char* p, ...);
+
+  static void cons(const char* p, ...);
+  static void debug(const char* p, ...);
+  static void progress(const char* p, ...);
+  static void info(const char* p, ...);
+  static void message(const char* p, ...);
+  static void warning(const char* p, ...);
+  static void error(const char* p, ...);
+
+  // FIXME: All console state is currently global!
+  static void disable_prefixes();
+  static void enable_prefixes();
+  static bool get_prefixes() { return m_prefixes; }
+  static bool get_at_beginning_of_line() { return m_at_beginning_of_line; }
+
+  static void disable_crlf();
+  static void enable_crlf();
+  static bool get_crlf() { return m_crlf; }
+
+  static void disable_output() { m_output_disabled = true; }
+  static void enable_output() { m_output_disabled = false; }
+  static bool get_output_disabled() { return m_output_disabled; }
+
+  static void set_log_stream(data_stream* pStream) { m_pLog_stream = pStream; }
+  static data_stream* get_log_stream() { return m_pLog_stream; }
+
+  static uint get_num_messages(eConsoleMessageType type) { return m_num_messages[type]; }
+
+ private:
+  static eConsoleMessageType m_default_category;
+
+  struct console_func {
+    console_func(console_output_func func = NULL, void* pData = NULL)
+        : m_func(func), m_pData(pData) {}
+
+    console_output_func m_func;
+    void* m_pData;
+  };
+  static crnlib::vector<console_func> m_output_funcs;
+
+  static bool m_crlf, m_prefixes, m_output_disabled;
+
+  static data_stream* m_pLog_stream;
+
+  static mutex* m_pMutex;
+
+  static uint m_num_messages[cCMTTotal];
+
+  static bool m_at_beginning_of_line;
+};
+
+#if defined(WIN32)
+inline int crn_getch() {
+  return _getch();
+}
+#elif defined(__GNUC__)
+#include <termios.h>
+#include <unistd.h>
+inline int crn_getch() {
+  struct termios oldt, newt;
+  int ch;
+  tcgetattr(STDIN_FILENO, &oldt);
+  newt = oldt;
+  newt.c_lflag &= ~(ICANON | ECHO);
+  tcsetattr(STDIN_FILENO, TCSANOW, &newt);
+  ch = getchar();
+  tcsetattr(STDIN_FILENO, TCSANOW, &oldt);
+  return ch;
+}
+#else
+inline int crn_getch() {
+  printf("crn_getch: Unimplemented");
+  return 0;
+}
+#endif
+}  // namespace crnlib
@@ -0,0 +1,13 @@
+// File: crn_core.cpp
+// See Copyright Notice and license at the end of inc/crnlib.h
+#include "crn_core.h"
+
+#if CRNLIB_USE_WIN32_API
+#include "crn_winhdr.h"
+#endif
+
+namespace crnlib {
+const char* g_copyright_str = "Copyright (c) 2010-2016 Richard Geldreich, Jr. and Binomial LLC";
+const char* g_sig_str = "C8cfRlaorj0wLtnMSxrBJxTC85rho2L9hUZKHcBL";
+
+}  // namespace crnlib
@@ -0,0 +1,178 @@
+// File: crn_core.h
+// See Copyright Notice and license at the end of inc/crnlib.h
+#pragma once
+
+#if defined(WIN32) && defined(_MSC_VER)
+#pragma warning(disable : 4201)  // nonstandard extension used : nameless struct/union
+#pragma warning(disable : 4127)  // conditional expression is constant
+#pragma warning(disable : 4793)  // function compiled as native
+#pragma warning(disable : 4324)  // structure was padded due to __declspec(align())
+#endif
+
+#if defined(WIN32) && !defined(CRNLIB_ANSI_CPLUSPLUS)
+// MSVC or MinGW, x86 or x64, Win32 APIs for threading and Win32 Interlocked APIs or GCC built-ins for atomic ops.
+#ifdef NDEBUG
+// Ensure checked iterators are disabled. Note: anything else that links against this lib must also #define these macros, otherwise remove these definitions.
+#define _SECURE_SCL 0
+#define _HAS_ITERATOR_DEBUGGING 0
+#endif
+#ifndef _DLL
+// If we're using the DLL form of the run-time libs, we're also going to be enabling exceptions because we'll be building CLR apps.
+// Otherwise, we disable exceptions for a small speed boost.
+#define _HAS_EXCEPTIONS 0
+#endif
+#define NOMINMAX
+
+#define CRNLIB_USE_WIN32_API 1
+
+#if defined(__MINGW32__) || defined(__MINGW64__)
+#define CRNLIB_USE_GCC_ATOMIC_BUILTINS 1
+#else
+#define CRNLIB_USE_WIN32_ATOMIC_FUNCTIONS 1
+#endif
+
+#define CRNLIB_PLATFORM_PC 1
+
+#if defined(_WIN64) || defined(__MINGW64__) || defined(_LP64) || defined(__LP64__)
+#define CRNLIB_PLATFORM_PC_X64 1
+#define CRNLIB_64BIT_POINTERS 1
+#define CRNLIB_CPU_HAS_64BIT_REGISTERS 1
+#define CRNLIB_LITTLE_ENDIAN_CPU 1
+#else
+#define CRNLIB_PLATFORM_PC_X86 1
+#define CRNLIB_64BIT_POINTERS 0
+#define CRNLIB_CPU_HAS_64BIT_REGISTERS 0
+#define CRNLIB_LITTLE_ENDIAN_CPU 1
+#endif
+
+#define CRNLIB_USE_UNALIGNED_INT_LOADS 1
+#define CRNLIB_RESTRICT __restrict
+#define CRNLIB_FORCE_INLINE __forceinline
+
+#if defined(_MSC_VER) || defined(__MINGW32__) || defined(__MINGW64__)
+#define CRNLIB_USE_MSVC_INTRINSICS 1
+#endif
+
+#define CRNLIB_INT64_FORMAT_SPECIFIER "%I64i"
+#define CRNLIB_UINT64_FORMAT_SPECIFIER "%I64u"
+
+#define CRNLIB_STDCALL __stdcall
+#define CRNLIB_MEMORY_IMPORT_BARRIER
+#define CRNLIB_MEMORY_EXPORT_BARRIER
+#elif defined(__GNUC__) && !defined(CRNLIB_ANSI_CPLUSPLUS)
+// GCC x86 or x64, pthreads for threading and GCC built-ins for atomic ops.
+#define CRNLIB_PLATFORM_PC 1
+
+#if defined(_WIN64) || defined(__MINGW64__) || defined(_LP64) || defined(__LP64__)
+#define CRNLIB_PLATFORM_PC_X64 1
+#define CRNLIB_64BIT_POINTERS 1
+#define CRNLIB_CPU_HAS_64BIT_REGISTERS 1
+#else
+#define CRNLIB_PLATFORM_PC_X86 1
+#define CRNLIB_64BIT_POINTERS 0
+#define CRNLIB_CPU_HAS_64BIT_REGISTERS 0
+#endif
+
+#define CRNLIB_USE_UNALIGNED_INT_LOADS 1
+
+#define CRNLIB_LITTLE_ENDIAN_CPU 1
+
+#define CRNLIB_USE_PTHREADS_API 1
+#define CRNLIB_USE_GCC_ATOMIC_BUILTINS 1
+
+#define CRNLIB_RESTRICT
+
+#define CRNLIB_FORCE_INLINE inline __attribute__((__always_inline__, __gnu_inline__))
+
+#define CRNLIB_INT64_FORMAT_SPECIFIER "%lli"
+#define CRNLIB_UINT64_FORMAT_SPECIFIER "%llu"
+
+#define CRNLIB_STDCALL
+#define CRNLIB_MEMORY_IMPORT_BARRIER
+#define CRNLIB_MEMORY_EXPORT_BARRIER
+#else
+// Vanilla ANSI-C/C++
+// No threading support, unaligned loads are NOT okay.
+#if defined(_WIN64) || defined(__MINGW64__) || defined(_LP64) || defined(__LP64__)
+#define CRNLIB_64BIT_POINTERS 1
+#define CRNLIB_CPU_HAS_64BIT_REGISTERS 1
+#else
+#define CRNLIB_64BIT_POINTERS 0
+#define CRNLIB_CPU_HAS_64BIT_REGISTERS 0
+#endif
+
+#define CRNLIB_USE_UNALIGNED_INT_LOADS 0
+
+#if __BIG_ENDIAN__
+#define CRNLIB_BIG_ENDIAN_CPU 1
+#else
+#define CRNLIB_LITTLE_ENDIAN_CPU 1
+#endif
+
+#define CRNLIB_USE_GCC_ATOMIC_BUILTINS 0
+#define CRNLIB_USE_WIN32_ATOMIC_FUNCTIONS 0
+
+#define CRNLIB_RESTRICT
+#define CRNLIB_FORCE_INLINE inline
+
+#define CRNLIB_INT64_FORMAT_SPECIFIER "%lli"
+#define CRNLIB_UINT64_FORMAT_SPECIFIER "%llu"
+
+#define CRNLIB_STDCALL
+#define CRNLIB_MEMORY_IMPORT_BARRIER
+#define CRNLIB_MEMORY_EXPORT_BARRIER
+#endif
+
+#define CRNLIB_SLOW_STRING_LEN_CHECKS 1
+
+#include <stdlib.h>
+#include <stdio.h>
+#include <limits.h>
+#include <math.h>
+#include <stdarg.h>
+#include <string.h>
+#include <algorithm>
+#include <locale>
+#include <memory.h>
+#include <limits.h>
+#include <algorithm>
+#include <errno.h>
+
+#ifdef min
+#undef min
+#endif
+
+#ifdef max
+#undef max
+#endif
+
+#define CRNLIB_FALSE (0)
+#define CRNLIB_TRUE (1)
+#define CRNLIB_MAX_PATH (260)
+
+#ifdef _DEBUG
+#define CRNLIB_BUILD_DEBUG
+#else
+#define CRNLIB_BUILD_RELEASE
+
+#ifndef NDEBUG
+#define NDEBUG
+#endif
+
+#ifdef DEBUG
+#error DEBUG cannot be defined in CRNLIB_BUILD_RELEASE
+#endif
+#endif
+
+#include "crn_types.h"
+#include "crn_assert.h"
+#include "crn_platform.h"
+#include "crn_helpers.h"
+#include "crn_traits.h"
+#include "crn_mem.h"
+#include "crn_math.h"
+#include "crn_utils.h"
+#include "crn_hash.h"
+#include "crn_vector.h"
+#include "crn_timer.h"
+#include "crn_dynamic_string.h"
@@ -0,0 +1,114 @@
+// File: crn_data_stream.cpp
+// See Copyright Notice and license at the end of inc/crnlib.h
+#include "crn_core.h"
+#include "crn_data_stream.h"
+
+namespace crnlib {
+data_stream::data_stream()
+    : m_attribs(0),
+      m_opened(false),
+      m_error(false),
+      m_got_cr(false) {
+}
+
+data_stream::data_stream(const char* pName, uint attribs)
+    : m_name(pName),
+      m_attribs(static_cast<uint16>(attribs)),
+      m_opened(false),
+      m_error(false),
+      m_got_cr(false) {
+}
+
+uint64 data_stream::skip(uint64 len) {
+  uint64 total_bytes_read = 0;
+
+  const uint cBufSize = 1024;
+  uint8 buf[cBufSize];
+
+  while (len) {
+    const uint64 bytes_to_read = math::minimum<uint64>(sizeof(buf), len);
+    const uint64 bytes_read = read(buf, static_cast<uint>(bytes_to_read));
+    total_bytes_read += bytes_read;
+
+    if (bytes_read != bytes_to_read)
+      break;
+
+    len -= bytes_read;
+  }
+
+  return total_bytes_read;
+}
+
+bool data_stream::read_line(dynamic_string& str) {
+  str.empty();
+
+  for (;;) {
+    const int c = read_byte();
+
+    const bool prev_got_cr = m_got_cr;
+    m_got_cr = false;
+
+    if (c < 0) {
+      if (!str.is_empty())
+        break;
+
+      return false;
+    } else if ((26 == c) || (!c))
+      continue;
+    else if (13 == c) {
+      m_got_cr = true;
+      break;
+    } else if (10 == c) {
+      if (prev_got_cr)
+        continue;
+
+      break;
+    }
+
+    str.append_char(static_cast<char>(c));
+  }
+
+  return true;
+}
+
+bool data_stream::printf(const char* p, ...) {
+  va_list args;
+
+  va_start(args, p);
+  dynamic_string buf;
+  buf.format_args(p, args);
+  va_end(args);
+
+  return write(buf.get_ptr(), buf.get_len() * sizeof(char)) == buf.get_len() * sizeof(char);
+}
+
+bool data_stream::write_line(const dynamic_string& str) {
+  if (!str.is_empty())
+    return write(str.get_ptr(), str.get_len()) == str.get_len();
+
+  return true;
+}
+
+bool data_stream::read_array(vector<uint8>& buf) {
+  if (buf.size() < get_remaining()) {
+    if (get_remaining() > 1024U * 1024U * 1024U)
+      return false;
+
+    buf.resize((uint)get_remaining());
+  }
+
+  if (!get_remaining()) {
+    buf.resize(0);
+    return true;
+  }
+
+  return read(&buf[0], buf.size()) == buf.size();
+}
+
+bool data_stream::write_array(const vector<uint8>& buf) {
+  if (!buf.empty())
+    return write(&buf[0], buf.size()) == buf.size();
+  return true;
+}
+
+}  // namespace crnlib
@@ -0,0 +1,98 @@
+// File: crn_data_stream.h
+// See Copyright Notice and license at the end of inc/crnlib.h
+#pragma once
+
+namespace crnlib {
+enum data_stream_attribs {
+  cDataStreamReadable = 1,
+  cDataStreamWritable = 2,
+  cDataStreamSeekable = 4
+};
+
+const uint64 DATA_STREAM_SIZE_UNKNOWN = cINT64_MAX;
+const uint64 DATA_STREAM_SIZE_INFINITE = cUINT64_MAX;
+
+class data_stream {
+  data_stream(const data_stream&);
+  data_stream& operator=(const data_stream&);
+
+ public:
+  data_stream();
+  data_stream(const char* pName, uint attribs);
+
+  virtual ~data_stream() {}
+
+  virtual data_stream* get_parent() { return NULL; }
+
+  virtual bool close() {
+    m_opened = false;
+    m_error = false;
+    m_got_cr = false;
+    return true;
+  }
+
+  typedef uint16 attribs_t;
+  inline attribs_t get_attribs() const { return m_attribs; }
+
+  inline bool is_opened() const { return m_opened; }
+
+  inline bool is_readable() const { return utils::is_bit_set(m_attribs, cDataStreamReadable); }
+  inline bool is_writable() const { return utils::is_bit_set(m_attribs, cDataStreamWritable); }
+  inline bool is_seekable() const { return utils::is_bit_set(m_attribs, cDataStreamSeekable); }
+
+  inline bool get_error() const { return m_error; }
+
+  inline const dynamic_string& get_name() const { return m_name; }
+  inline void set_name(const char* pName) { m_name.set(pName); }
+
+  virtual uint read(void* pBuf, uint len) = 0;
+  virtual uint64 skip(uint64 len);
+
+  virtual uint write(const void* pBuf, uint len) = 0;
+  virtual bool flush() = 0;
+
+  virtual bool is_size_known() const { return true; }
+
+  // Returns DATA_STREAM_SIZE_UNKNOWN if size hasn't been determined yet, or DATA_STREAM_SIZE_INFINITE for infinite streams.
+  virtual uint64 get_size() = 0;
+  virtual uint64 get_remaining() = 0;
+
+  virtual uint64 get_ofs() = 0;
+  virtual bool seek(int64 ofs, bool relative) = 0;
+
+  virtual const void* get_ptr() const { return NULL; }
+
+  inline int read_byte() {
+    uint8 c;
+    if (read(&c, 1) != 1)
+      return -1;
+    return c;
+  }
+  inline bool write_byte(uint8 c) { return write(&c, 1) == 1; }
+
+  bool read_line(dynamic_string& str);
+  bool printf(const char* p, ...);
+  bool write_line(const dynamic_string& str);
+  bool write_bom() {
+    uint16 bom = 0xFEFF;
+    return write(&bom, sizeof(bom)) == sizeof(bom);
+  }
+
+  bool read_array(vector<uint8>& buf);
+  bool write_array(const vector<uint8>& buf);
+
+ protected:
+  dynamic_string m_name;
+
+  attribs_t m_attribs;
+  bool m_opened : 1;
+  bool m_error : 1;
+  bool m_got_cr : 1;
+
+  inline void set_error() { m_error = true; }
+  inline void clear_error() { m_error = false; }
+
+  inline void post_seek() { m_got_cr = false; }
+};
+
+}  // namespace crnlib
@@ -0,0 +1,495 @@
+// File: data_stream_serializer.h
+// See Copyright Notice and license at the end of inc/crnlib.h
+#pragma once
+#include "crn_data_stream.h"
+
+namespace crnlib {
+// Defaults to little endian mode.
+class data_stream_serializer {
+ public:
+  data_stream_serializer()
+      : m_pStream(NULL), m_little_endian(true) {}
+  data_stream_serializer(data_stream* pStream)
+      : m_pStream(pStream), m_little_endian(true) {}
+  data_stream_serializer(data_stream& stream)
+      : m_pStream(&stream), m_little_endian(true) {}
+  data_stream_serializer(const data_stream_serializer& other)
+      : m_pStream(other.m_pStream), m_little_endian(other.m_little_endian) {}
+
+  data_stream_serializer& operator=(const data_stream_serializer& rhs) {
+    m_pStream = rhs.m_pStream;
+    m_little_endian = rhs.m_little_endian;
+    return *this;
+  }
+
+  data_stream* get_stream() const { return m_pStream; }
+  void set_stream(data_stream* pStream) { m_pStream = pStream; }
+
+  const dynamic_string& get_name() const { return m_pStream ? m_pStream->get_name() : g_empty_dynamic_string; }
+
+  bool get_error() { return m_pStream ? m_pStream->get_error() : false; }
+
+  bool get_little_endian() const { return m_little_endian; }
+  void set_little_endian(bool little_endian) { m_little_endian = little_endian; }
+
+  bool write(const void* pBuf, uint len) {
+    return m_pStream->write(pBuf, len) == len;
+  }
+
+  bool read(void* pBuf, uint len) {
+    return m_pStream->read(pBuf, len) == len;
+  }
+
+  // size = size of each element, count = number of elements, returns actual count of elements written
+  uint write(const void* pBuf, uint size, uint count) {
+    uint actual_size = size * count;
+    if (!actual_size)
+      return 0;
+    uint n = m_pStream->write(pBuf, actual_size);
+    if (n == actual_size)
+      return count;
+    return n / size;
+  }
+
+  // size = size of each element, count = number of elements, returns actual count of elements read
+  uint read(void* pBuf, uint size, uint count) {
+    uint actual_size = size * count;
+    if (!actual_size)
+      return 0;
+    uint n = m_pStream->read(pBuf, actual_size);
+    if (n == actual_size)
+      return count;
+    return n / size;
+  }
+
+  bool write_chars(const char* pBuf, uint len) {
+    return write(pBuf, len);
+  }
+
+  bool read_chars(char* pBuf, uint len) {
+    return read(pBuf, len);
+  }
+
+  bool skip(uint len) {
+    return m_pStream->skip(len) == len;
+  }
+
+  template <typename T>
+  bool write_object(const T& obj) {
+    if (m_little_endian == c_crnlib_little_endian_platform)
+      return write(&obj, sizeof(obj));
+    else {
+      uint8 buf[sizeof(T)];
+      uint buf_size = sizeof(T);
+      void* pBuf = buf;
+      utils::write_obj(obj, pBuf, buf_size, m_little_endian);
+
+      return write(buf, sizeof(T));
+    }
+  }
+
+  template <typename T>
+  bool read_object(T& obj) {
+    if (m_little_endian == c_crnlib_little_endian_platform)
+      return read(&obj, sizeof(obj));
+    else {
+      uint8 buf[sizeof(T)];
+      if (!read(buf, sizeof(T)))
+        return false;
+
+      uint buf_size = sizeof(T);
+      const void* pBuf = buf;
+      utils::read_obj(obj, pBuf, buf_size, m_little_endian);
+
+      return true;
+    }
+  }
+
+  template <typename T>
+  bool write_value(T value) {
+    return write_object(value);
+  }
+
+  template <typename T>
+  T read_value(const T& on_error_value = T()) {
+    T result;
+    if (!read_object(result))
+      result = on_error_value;
+    return result;
+  }
+
+  template <typename T>
+  bool write_enum(T e) {
+    int val = static_cast<int>(e);
+    return write_object(val);
+  }
+
+  template <typename T>
+  T read_enum() {
+    return static_cast<T>(read_value<int>());
+  }
+
+  // Writes uint using a simple variable length code (VLC).
+  bool write_uint_vlc(uint val) {
+    do {
+      uint8 c = static_cast<uint8>(val) & 0x7F;
+      if (val <= 0x7F)
+        c |= 0x80;
+
+      if (!write_value(c))
+        return false;
+
+      val >>= 7;
+    } while (val);
+
+    return true;
+  }
+
+  // Reads uint using a simple variable length code (VLC).
+  bool read_uint_vlc(uint& val) {
+    val = 0;
+    uint shift = 0;
+
+    for (;;) {
+      if (shift >= 32)
+        return false;
+
+      uint8 c;
+      if (!read_object(c))
+        return false;
+
+      val |= ((c & 0x7F) << shift);
+      shift += 7;
+
+      if (c & 0x80)
+        break;
+    }
+
+    return true;
+  }
+
+  bool write_c_str(const char* p) {
+    uint len = static_cast<uint>(strlen(p));
+    if (!write_uint_vlc(len))
+      return false;
+
+    return write_chars(p, len);
+  }
+
+  bool read_c_str(char* pBuf, uint buf_size) {
+    uint len;
+    if (!read_uint_vlc(len))
+      return false;
+    if (len >= buf_size)
+      return false;
+
+    pBuf[len] = '\0';
+
+    return read_chars(pBuf, len);
+  }
+
+  bool write_string(const dynamic_string& str) {
+    if (!write_uint_vlc(str.get_len()))
+      return false;
+
+    return write_chars(str.get_ptr(), str.get_len());
+  }
+
+  bool read_string(dynamic_string& str) {
+    uint len;
+    if (!read_uint_vlc(len))
+      return false;
+
+    if (!str.set_len(len))
+      return false;
+
+    if (len) {
+      if (!read_chars(str.get_ptr_raw(), len))
+        return false;
+
+      if (memchr(str.get_ptr(), 0, len) != NULL) {
+        str.truncate(0);
+        return false;
+      }
+    }
+
+    return true;
+  }
+
+  template <typename T>
+  bool write_vector(const T& vec) {
+    if (!write_uint_vlc(vec.size()))
+      return false;
+
+    for (uint i = 0; i < vec.size(); i++) {
+      *this << vec[i];
+      if (get_error())
+        return false;
+    }
+
+    return true;
+  }
+
+  template <typename T>
+  bool read_vector(T& vec, uint num_expected = UINT_MAX) {
+    uint size;
+    if (!read_uint_vlc(size))
+      return false;
+
+    if ((static_cast<uint64>(size) * sizeof(typename T::value_type)) >= 2U * 1024U * 1024U * 1024U)
+      return false;
+
+    if ((num_expected != UINT_MAX) && (size != num_expected))
+      return false;
+
+    vec.resize(size);
+    for (uint i = 0; i < vec.size(); i++) {
+      *this >> vec[i];
+
+      if (get_error())
+        return false;
+    }
+
+    return true;
+  }
+
+  bool read_entire_file(crnlib::vector<uint8>& buf) {
+    return m_pStream->read_array(buf);
+  }
+
+  bool write_entire_file(const crnlib::vector<uint8>& buf) {
+    return m_pStream->write_array(buf);
+  }
+
+  // Got this idea from the Molly Rocket forums.
+  // fmt may contain the characters "1", "2", or "4" (the byte width of each value written); spaces and commas are ignored.
+  bool writef(const char* fmt, ...) {
+    va_list v;
+    va_start(v, fmt);
+
+    while (*fmt) {
+      switch (*fmt++) {
+        case '1': {
+          const uint8 x = static_cast<uint8>(va_arg(v, uint));
+          if (!write_value(x)) {
+            va_end(v);
+            return false;
+          }
+          break;
+        }
+        case '2': {
+          const uint16 x = static_cast<uint16>(va_arg(v, uint));
+          if (!write_value(x)) {
+            va_end(v);
+            return false;
+          }
+          break;
+        }
+        case '4': {
+          const uint32 x = static_cast<uint32>(va_arg(v, uint));
+          if (!write_value(x)) {
+            va_end(v);
+            return false;
+          }
+          break;
+        }
+        case ' ':
+        case ',': {
+          break;
+        }
+        default: {
+          CRNLIB_ASSERT(0);
+          va_end(v);
+          return false;
+        }
+      }
+    }
+
+    va_end(v);
+    return true;
+  }
+
+  // Got this idea from the Molly Rocket forums.
+  // fmt may contain the characters "1", "2", or "4" (the byte width of each value read); spaces and commas are ignored.
+  bool readf(const char* fmt, ...) {
+    va_list v;
+    va_start(v, fmt);
+
+    while (*fmt) {
+      switch (*fmt++) {
+        case '1': {
+          uint8* x = va_arg(v, uint8*);
+          CRNLIB_ASSERT(x);
+          if (!read_object(*x)) {
+            va_end(v);
+            return false;
+          }
+          break;
+        }
+        case '2': {
+          uint16* x = va_arg(v, uint16*);
+          CRNLIB_ASSERT(x);
+          if (!read_object(*x)) {
+            va_end(v);
+            return false;
+          }
+          break;
+        }
+        case '4': {
+          uint32* x = va_arg(v, uint32*);
+          CRNLIB_ASSERT(x);
+          if (!read_object(*x)) {
+            va_end(v);
+            return false;
+          }
+          break;
+        }
+        case ' ':
+        case ',': {
+          break;
+        }
+        default: {
+          CRNLIB_ASSERT(0);
+          va_end(v);
+          return false;
+        }
+      }
+    }
+
+    va_end(v);
+    return true;
+  }
+
+ private:
+  data_stream* m_pStream;
+
+  bool m_little_endian;
+};
+
+// Write operators
+inline data_stream_serializer& operator<<(data_stream_serializer& serializer, bool val) {
+  serializer.write_value(val);
+  return serializer;
+}
+inline data_stream_serializer& operator<<(data_stream_serializer& serializer, int8 val) {
+  serializer.write_value(val);
+  return serializer;
+}
+inline data_stream_serializer& operator<<(data_stream_serializer& serializer, uint8 val) {
+  serializer.write_value(val);
+  return serializer;
+}
+inline data_stream_serializer& operator<<(data_stream_serializer& serializer, int16 val) {
+  serializer.write_value(val);
+  return serializer;
+}
+inline data_stream_serializer& operator<<(data_stream_serializer& serializer, uint16 val) {
+  serializer.write_value(val);
+  return serializer;
+}
+inline data_stream_serializer& operator<<(data_stream_serializer& serializer, int32 val) {
+  serializer.write_value(val);
+  return serializer;
+}
+inline data_stream_serializer& operator<<(data_stream_serializer& serializer, uint32 val) {
+  serializer.write_uint_vlc(val);
+  return serializer;
+}
+inline data_stream_serializer& operator<<(data_stream_serializer& serializer, int64 val) {
+  serializer.write_value(val);
+  return serializer;
+}
+inline data_stream_serializer& operator<<(data_stream_serializer& serializer, uint64 val) {
+  serializer.write_value(val);
+  return serializer;
+}
+inline data_stream_serializer& operator<<(data_stream_serializer& serializer, long val) {
+  serializer.write_value(val);
+  return serializer;
+}
+inline data_stream_serializer& operator<<(data_stream_serializer& serializer, unsigned long val) {
+  serializer.write_value(val);
+  return serializer;
+}
+inline data_stream_serializer& operator<<(data_stream_serializer& serializer, float val) {
+  serializer.write_value(val);
+  return serializer;
+}
+inline data_stream_serializer& operator<<(data_stream_serializer& serializer, double val) {
+  serializer.write_value(val);
+  return serializer;
+}
+inline data_stream_serializer& operator<<(data_stream_serializer& serializer, const char* p) {
+  serializer.write_c_str(p);
+  return serializer;
+}
+
+inline data_stream_serializer& operator<<(data_stream_serializer& serializer, const dynamic_string& str) {
+  serializer.write_string(str);
+  return serializer;
+}
+
+template <typename T>
+inline data_stream_serializer& operator<<(data_stream_serializer& serializer, const crnlib::vector<T>& vec) {
+  serializer.write_vector(vec);
+  return serializer;
+}
+
+template <typename T>
+inline data_stream_serializer& operator<<(data_stream_serializer& serializer, const T* p) {
+  serializer.write_object(*p);
+  return serializer;
+}
+
+// Read operators
+inline data_stream_serializer& operator>>(data_stream_serializer& serializer, bool& val) {
+  serializer.read_object(val);
+  return serializer;
+}
+inline data_stream_serializer& operator>>(data_stream_serializer& serializer, int8& val) {
+  serializer.read_object(val);
+  return serializer;
+}
+inline data_stream_serializer& operator>>(data_stream_serializer& serializer, uint8& val) {
+  serializer.read_object(val);
+  return serializer;
+}
+inline data_stream_serializer& operator>>(data_stream_serializer& serializer, int16& val) {
+  serializer.read_object(val);
+  return serializer;
+}
+inline data_stream_serializer& operator>>(data_stream_serializer& serializer, uint16& val) {
+  serializer.read_object(val);
+  return serializer;
+}
+inline data_stream_serializer& operator>>(data_stream_serializer& serializer, int32& val) {
+  serializer.read_object(val);
+  return serializer;
+}
+inline data_stream_serializer& operator>>(data_stream_serializer& serializer, uint32& val) {
+  serializer.read_uint_vlc(val);
+  return serializer;
+}
+inline data_stream_serializer& operator>>(data_stream_serializer& serializer, int64& val) {
+  serializer.read_object(val);
+  return serializer;
+}
+inline data_stream_serializer& operator>>(data_stream_serializer& serializer, uint64& val) {
+  serializer.read_object(val);
+  return serializer;
+}
+inline data_stream_serializer& operator>>(data_stream_serializer& serializer, long& val) {
+  serializer.read_object(val);
+  return serializer;
+}
+inline data_stream_serializer& operator>>(data_stream_serializer& serializer, unsigned long& val) {
+  serializer.read_object(val);
+  return serializer;
+}
+inline data_stream_serializer& operator>>(data_stream_serializer& serializer, float& val) {
+  serializer.read_object(val);
+  return serializer;
+}
+inline data_stream_serializer& operator>>(data_stream_serializer& serializer, double& val) {
+  serializer.read_object(val);
+  return serializer;
+}
+
+inline data_stream_serializer& operator>>(data_stream_serializer& serializer, dynamic_string& str) {
+  serializer.read_string(str);
+  return serializer;
+}
+
+template <typename T>
+inline data_stream_serializer& operator>>(data_stream_serializer& serializer, crnlib::vector<T>& vec) {
+  serializer.read_vector(vec);
+  return serializer;
+}
+
+template <typename T>
+inline data_stream_serializer& operator>>(data_stream_serializer& serializer, T* p) {
+  serializer.read_object(*p);
+  return serializer;
+}
+
+}  // namespace crnlib
@@ -0,0 +1,227 @@
+// File: crn_dds_comp.cpp
+// See Copyright Notice and license at the end of inc/crnlib.h
+#include "crn_core.h"
+#include "crn_dds_comp.h"
+#include "crn_dynamic_stream.h"
+#include "crn_lzma_codec.h"
+
+namespace crnlib {
+dds_comp::dds_comp()
+    : m_pParams(NULL),
+      m_pixel_fmt(PIXEL_FMT_INVALID),
+      m_pQDXT_state(NULL) {
+}
+
+dds_comp::~dds_comp() {
+  crnlib_delete(m_pQDXT_state);
+}
+
+void dds_comp::clear() {
+  m_src_tex.clear();
+  m_packed_tex.clear();
+  m_comp_data.clear();
+  m_pParams = NULL;
+  m_pixel_fmt = PIXEL_FMT_INVALID;
+  m_task_pool.deinit();
+  if (m_pQDXT_state) {
+    crnlib_delete(m_pQDXT_state);
+    m_pQDXT_state = NULL;
+  }
+}
+
+bool dds_comp::create_dds_tex(mipmapped_texture& dds_tex) {
+  image_u8 images[cCRNMaxFaces][cCRNMaxLevels];
+
+  bool has_alpha = false;
+  for (uint face_index = 0; face_index < m_pParams->m_faces; face_index++) {
+    for (uint level_index = 0; level_index < m_pParams->m_levels; level_index++) {
+      const uint width = math::maximum(1U, m_pParams->m_width >> level_index);
+      const uint height = math::maximum(1U, m_pParams->m_height >> level_index);
+
+      if (!m_pParams->m_pImages[face_index][level_index])
+        return false;
+
+      images[face_index][level_index].alias((color_quad_u8*)m_pParams->m_pImages[face_index][level_index], width, height);
+      if (!has_alpha)
+        has_alpha = image_utils::has_alpha(images[face_index][level_index]);
+    }
+  }
+
+  for (uint face_index = 0; face_index < m_pParams->m_faces; face_index++)
+    for (uint level_index = 0; level_index < m_pParams->m_levels; level_index++)
+      images[face_index][level_index].set_component_valid(3, has_alpha);
+
+  image_utils::conversion_type conv_type = image_utils::get_image_conversion_type_from_crn_format((crn_format)m_pParams->m_format);
+  if (conv_type != image_utils::cConversion_Invalid) {
+    for (uint face_index = 0; face_index < m_pParams->m_faces; face_index++) {
+      for (uint level_index = 0; level_index < m_pParams->m_levels; level_index++) {
+        image_u8 cooked_image(images[face_index][level_index]);
+
+        image_utils::convert_image(cooked_image, conv_type);
+
+        images[face_index][level_index].swap(cooked_image);
+      }
+    }
+  }
+
+  face_vec faces(m_pParams->m_faces);
+
+  for (uint face_index = 0; face_index < m_pParams->m_faces; face_index++) {
+    for (uint level_index = 0; level_index < m_pParams->m_levels; level_index++) {
+      mip_level* pMip = crnlib_new<mip_level>();
+
+      image_u8* pImage = crnlib_new<image_u8>();
+      pImage->swap(images[face_index][level_index]);
+      pMip->assign(pImage);
+
+      faces[face_index].push_back(pMip);
+    }
+  }
+
+  dds_tex.assign(faces);
+#ifdef CRNLIB_BUILD_DEBUG
+  CRNLIB_ASSERT(dds_tex.check());
+#endif
+
+  return true;
+}
+
+static bool progress_callback_func(uint percentage_complete, void* pUser_data_ptr) {
+  const crn_comp_params& params = *(const crn_comp_params*)pUser_data_ptr;
+  return params.m_pProgress_func(0, 1, percentage_complete, 100, params.m_pProgress_func_data) != 0;
+}
+
+static bool progress_callback_func_phase_0(uint percentage_complete, void* pUser_data_ptr) {
+  const crn_comp_params& params = *(const crn_comp_params*)pUser_data_ptr;
+  return params.m_pProgress_func(0, 2, percentage_complete, 100, params.m_pProgress_func_data) != 0;
+}
+
+static bool progress_callback_func_phase_1(uint percentage_complete, void* pUser_data_ptr) {
+  const crn_comp_params& params = *(const crn_comp_params*)pUser_data_ptr;
+  return params.m_pProgress_func(1, 2, percentage_complete, 100, params.m_pProgress_func_data) != 0;
+}
+
+bool dds_comp::convert_to_dxt(const crn_comp_params& params) {
+  if ((params.m_quality_level == cCRNMaxQualityLevel) || (params.m_format == cCRNFmtDXT3)) {
+    m_packed_tex = m_src_tex;
+    if (!m_packed_tex.convert(m_pixel_fmt, false, m_pack_params))
+      return false;
+  } else {
+    const bool hierarchical = (params.m_flags & cCRNCompFlagHierarchical) != 0;
+
+    m_q1_params.m_quality_level = params.m_quality_level;
+    m_q1_params.m_hierarchical = hierarchical;
+
+    m_q5_params.m_quality_level = params.m_quality_level;
+    m_q5_params.m_hierarchical = hierarchical;
+
+    if (!m_pQDXT_state) {
+      m_pQDXT_state = crnlib_new<mipmapped_texture::qdxt_state>(m_task_pool);
+
+      if (params.m_pProgress_func) {
+        m_q1_params.m_pProgress_func = progress_callback_func_phase_0;
+        m_q1_params.m_pProgress_data = (void*)&params;
+        m_q5_params.m_pProgress_func = progress_callback_func_phase_0;
+        m_q5_params.m_pProgress_data = (void*)&params;
+      }
+
+      if (!m_src_tex.qdxt_pack_init(*m_pQDXT_state, m_packed_tex, m_q1_params, m_q5_params, m_pixel_fmt, false))
+        return false;
+
+      if (params.m_pProgress_func) {
+        m_q1_params.m_pProgress_func = progress_callback_func_phase_1;
+        m_q5_params.m_pProgress_func = progress_callback_func_phase_1;
+      }
+    } else {
+      if (params.m_pProgress_func) {
+        m_q1_params.m_pProgress_func = progress_callback_func;
+        m_q1_params.m_pProgress_data = (void*)&params;
+        m_q5_params.m_pProgress_func = progress_callback_func;
+        m_q5_params.m_pProgress_data = (void*)&params;
+      }
+    }
+
+    if (!m_src_tex.qdxt_pack(*m_pQDXT_state, m_packed_tex, m_q1_params, m_q5_params))
+      return false;
+  }
+
+  return true;
+}
+
+bool dds_comp::compress_init(const crn_comp_params& params) {
+  clear();
+
+  m_pParams = &params;
+
+  if ((math::minimum(m_pParams->m_width, m_pParams->m_height) < 1) || (math::maximum(m_pParams->m_width, m_pParams->m_height) > cCRNMaxLevelResolution))
+    return false;
+
+  if (math::minimum(m_pParams->m_faces, m_pParams->m_levels) < 1)
+    return false;
+
+  if (!create_dds_tex(m_src_tex))
+    return false;
+
+  m_pack_params.init(*m_pParams);
+  if (params.m_pProgress_func) {
+    m_pack_params.m_pProgress_callback = progress_callback_func;
+    m_pack_params.m_pProgress_callback_user_data_ptr = (void*)&params;
+  }
+
+  m_pixel_fmt = pixel_format_helpers::convert_crn_format_to_pixel_format(static_cast<crn_format>(m_pParams->m_format));
+  if (m_pixel_fmt == PIXEL_FMT_INVALID)
+    return false;
+  if ((m_pixel_fmt == PIXEL_FMT_DXT1) && (m_src_tex.has_alpha()) && (m_pack_params.m_use_both_block_types) && (m_pParams->m_flags & cCRNCompFlagDXT1AForTransparency))
+    m_pixel_fmt = PIXEL_FMT_DXT1A;
+
+  if (!m_task_pool.init(m_pParams->m_num_helper_threads))
+    return false;
+  m_pack_params.m_pTask_pool = &m_task_pool;
+
+  const bool hierarchical = (params.m_flags & cCRNCompFlagHierarchical) != 0;
+  m_q1_params.init(m_pack_params, params.m_quality_level, hierarchical);
+  m_q5_params.init(m_pack_params, params.m_quality_level, hierarchical);
+
+  return true;
+}
+
+bool dds_comp::compress_pass(const crn_comp_params& params, float* pEffective_bitrate) {
+  if (pEffective_bitrate)
+    *pEffective_bitrate = 0.0f;
+
+  if (!m_pParams)
+    return false;
+
+  if (!convert_to_dxt(params))
+    return false;
+
+  dynamic_stream out_stream;
+  out_stream.reserve(512 * 1024);
+  data_stream_serializer serializer(out_stream);
+
+  if (!m_packed_tex.write_dds(serializer))
+    return false;
+  out_stream.reserve(0);
+
+  m_comp_data.swap(out_stream.get_buf());
+
+  if (pEffective_bitrate) {
+    lzma_codec lossless_codec;
+
+    crnlib::vector<uint8> cmp_tex_bytes;
+    if (lossless_codec.pack(m_comp_data.get_ptr(), m_comp_data.size(), cmp_tex_bytes)) {
+      uint comp_size = cmp_tex_bytes.size();
+      if (comp_size) {
+        *pEffective_bitrate = (comp_size * 8.0f) / m_src_tex.get_total_pixels_in_all_faces_and_mips();
+      }
+    }
+  }
+
+  return true;
+}
+
+void dds_comp::compress_deinit() {
+  clear();
+}
+
+}  // namespace crnlib
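Reviewer note on `dds_comp::compress_pass`: the effective bitrate it reports is the size of the LZMA-packed DDS data in bits, divided by the number of source pixels across all faces and mip levels. The following is a minimal standalone sketch of that arithmetic, not crnlib code. `total_pixels` and `effective_bitrate` are hypothetical helpers, and the mip-chain pixel count (halving each dimension per level, clamped to 1) is an assumption about what `get_total_pixels_in_all_faces_and_mips()` returns.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of crnlib): counts the pixels in a mip chain.
// Assumes each level halves both dimensions, clamped to a minimum of 1.
inline std::uint64_t total_pixels(std::uint32_t width, std::uint32_t height,
                                  std::uint32_t levels, std::uint32_t faces) {
  assert(width >= 1 && height >= 1 && faces >= 1);
  assert(levels >= 1 && levels <= 32);  // keeps the shifts below well defined
  std::uint64_t per_face = 0;
  for (std::uint32_t l = 0; l < levels; ++l) {
    const std::uint64_t w = std::max<std::uint32_t>(1u, width >> l);
    const std::uint64_t h = std::max<std::uint32_t>(1u, height >> l);
    per_face += w * h;
  }
  return per_face * faces;
}

// Bits per source pixel after the output file has been losslessly packed.
// Mirrors the expression (comp_size * 8.0f) / total_pixels used in the diff.
inline float effective_bitrate(std::size_t compressed_bytes, std::uint64_t pixels) {
  assert(pixels > 0);
  return (static_cast<float>(compressed_bytes) * 8.0f) / static_cast<float>(pixels);
}
```

For example, a 4x4 texture with three mip levels has 16 + 4 + 1 = 21 pixels per face, so 21 packed bytes correspond to 8 bits per pixel.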
@@ -0,0 +1,46 @@
+// File: crn_dds_comp.h
+// See Copyright Notice and license at the end of inc/crnlib.h
+#pragma once
+#include "crn_comp.h"
+#include "crn_mipmapped_texture.h"
+#include "crn_texture_comp.h"
+
+namespace crnlib {
+class dds_comp : public itexture_comp {
+  CRNLIB_NO_COPY_OR_ASSIGNMENT_OP(dds_comp);
+
+ public:
+  dds_comp();
+  virtual ~dds_comp();
+
+  virtual const char* get_ext() const { return "DDS"; }
+
+  virtual bool compress_init(const crn_comp_params& params);
+  virtual bool compress_pass(const crn_comp_params& params, float* pEffective_bitrate);
+  virtual void compress_deinit();
+
+  virtual const crnlib::vector<uint8>& get_comp_data() const { return m_comp_data; }
+  virtual crnlib::vector<uint8>& get_comp_data() { return m_comp_data; }
+
+ private:
+  mipmapped_texture m_src_tex;
+  mipmapped_texture m_packed_tex;
+
+  crnlib::vector<uint8> m_comp_data;
+
+  const crn_comp_params* m_pParams;
+
+  pixel_format m_pixel_fmt;
+  dxt_image::pack_params m_pack_params;
+
+  task_pool m_task_pool;
+  qdxt1_params m_q1_params;
+  qdxt5_params m_q5_params;
+  mipmapped_texture::qdxt_state* m_pQDXT_state;
+
+  void clear();
+  bool create_dds_tex(mipmapped_texture& dds_tex);
+  bool convert_to_dxt(const crn_comp_params& params);
+};
+
+}  // namespace crnlib
@@ -0,0 +1,6 @@
+// File: crn_decomp.cpp
+// See Copyright Notice and license at the end of inc/crnlib.h
+#include "crn_core.h"
+
+// Include the single-file header library with no defines, which brings in the full CRN decompressor.
+#include "../inc/crn_decomp.h"
@@ -0,0 +1,377 @@
+// File: crn_dxt.cpp
+// See Copyright Notice and license at the end of inc/crnlib.h
+#include "crn_core.h"
+#include "crn_dxt.h"
+#include "crn_dxt1.h"
+#include "crn_ryg_dxt.hpp"
+#include "crn_dxt_fast.h"
+#include "crn_intersect.h"
+
+namespace crnlib {
+const uint8 g_dxt5_from_linear[cDXT5SelectorValues] = {0U, 2U, 3U, 4U, 5U, 6U, 7U, 1U};
+const uint8 g_dxt5_to_linear[cDXT5SelectorValues] = {0U, 7U, 1U, 2U, 3U, 4U, 5U, 6U};
+
+const uint8 g_dxt5_alpha6_to_linear[cDXT5SelectorValues] = {0U, 5U, 1U, 2U, 3U, 4U, 0U, 0U};
+
+const uint8 g_dxt1_from_linear[cDXT1SelectorValues] = {0U, 2U, 3U, 1U};
+const uint8 g_dxt1_to_linear[cDXT1SelectorValues] = {0U, 3U, 1U, 2U};
+
+const uint8 g_six_alpha_invert_table[cDXT5SelectorValues] = {1, 0, 5, 4, 3, 2, 6, 7};
+const uint8 g_eight_alpha_invert_table[cDXT5SelectorValues] = {1, 0, 7, 6, 5, 4, 3, 2};
+
+const char* get_dxt_format_string(dxt_format fmt) {
+  switch (fmt) {
+    case cDXT1:
+      return "DXT1";
+    case cDXT1A:
+      return "DXT1A";
+    case cDXT3:
+      return "DXT3";
+    case cDXT5:
+      return "DXT5";
+    case cDXT5A:
+      return "DXT5A";
+    case cDXN_XY:
+      return "DXN_XY";
+    case cDXN_YX:
+      return "DXN_YX";
+    case cETC1:
+      return "ETC1";
+    case cETC2:
+      return "ETC2";
+    case cETC2A:
+      return "ETC2A";
+    case cETC1S:
+      return "ETC1S";
+    case cETC2AS:
+      return "ETC2AS";
+    default:
+      break;
+  }
+  CRNLIB_ASSERT(false);
+  return "?";
+}
+
+const char* get_dxt_compressor_name(crn_dxt_compressor_type c) {
+  switch (c) {
+    case cCRNDXTCompressorCRN:
+      return "CRN";
+    case cCRNDXTCompressorCRNF:
+      return "CRNF";
+    case cCRNDXTCompressorRYG:
+      return "RYG";
+#if CRNLIB_SUPPORT_ATI_COMPRESS
+    case cCRNDXTCompressorATI:
+      return "ATI";
+#endif
+    default:
+      break;
+  }
+  CRNLIB_ASSERT(false);
+  return "?";
+}
+
+uint get_dxt_format_bits_per_pixel(dxt_format fmt) {
+  switch (fmt) {
+    case cDXT1:
+    case cDXT1A:
+    case cDXT5A:
+    case cETC1:
+    case cETC2:
+    case cETC1S:
+      return 4;
+    case cDXT3:
+    case cDXT5:
+    case cDXN_XY:
+    case cDXN_YX:
+    case cETC2A:
+    case cETC2AS:
+      return 8;
+    default:
+      break;
+  }
+  CRNLIB_ASSERT(false);
+  return 0;
+}
+
+bool get_dxt_format_has_alpha(dxt_format fmt) {
+  switch (fmt) {
+    case cDXT1A:
+    case cDXT3:
+    case cDXT5:
+    case cDXT5A:
+    case cETC2A:
+    case cETC2AS:
+      return true;
+    default:
+      break;
+  }
+  return false;
+}
+
+uint16 dxt1_block::pack_color(const color_quad_u8& color, bool scaled, uint bias) {
+  uint r = color.r;
+  uint g = color.g;
+  uint b = color.b;
+
+  if (scaled) {
+    r = (r * 31U + bias) / 255U;
+    g = (g * 63U + bias) / 255U;
+    b = (b * 31U + bias) / 255U;
+  }
+
+  r = math::minimum(r, 31U);
+  g = math::minimum(g, 63U);
+  b = math::minimum(b, 31U);
+
+  return static_cast<uint16>(b | (g << 5U) | (r << 11U));
+}
+
+uint16 dxt1_block::pack_color(uint r, uint g, uint b, bool scaled, uint bias) {
+  return pack_color(color_quad_u8(r, g, b, 0), scaled, bias);
+}
+
+color_quad_u8 dxt1_block::unpack_color(uint16 packed_color, bool scaled, uint alpha) {
+  uint b = packed_color & 31U;
+  uint g = (packed_color >> 5U) & 63U;
+  uint r = (packed_color >> 11U) & 31U;
+
+  if (scaled) {
+    b = (b << 3U) | (b >> 2U);
+    g = (g << 2U) | (g >> 4U);
+    r = (r << 3U) | (r >> 2U);
+  }
+
+  return color_quad_u8(cNoClamp, r, g, b, math::minimum(alpha, 255U));
+}
+
+void dxt1_block::unpack_color(uint& r, uint& g, uint& b, uint16 packed_color, bool scaled) {
+  color_quad_u8 c(unpack_color(packed_color, scaled, 0));
+  r = c.r;
+  g = c.g;
+  b = c.b;
+}
+
+void dxt1_block::get_block_colors_NV5x(color_quad_u8* pDst, uint16 packed_col0, uint16 packed_col1, bool color4) {
+  color_quad_u8 col0(unpack_color(packed_col0, false));
+  color_quad_u8 col1(unpack_color(packed_col1, false));
+
+  pDst[0].r = (3 * col0.r * 22) / 8;
+  pDst[0].g = (col0.g << 2) | (col0.g >> 4);
+  pDst[0].b = (3 * col0.b * 22) / 8;
+  pDst[0].a = 0xFF;
+
+  pDst[1].r = (3 * col1.r * 22) / 8;
+  pDst[1].g = (col1.g << 2) | (col1.g >> 4);
+  pDst[1].b = (3 * col1.b * 22) / 8;
+  pDst[1].a = 0xFF;
+
+  int gdiff = pDst[1].g - pDst[0].g;
+
+  if (color4)  //(packed_col0 > packed_col1)
+  {
+    pDst[2].r = static_cast<uint8>(((2 * col0.r + col1.r) * 22) / 8);
+    pDst[2].g = static_cast<uint8>((256 * pDst[0].g + gdiff / 4 + 128 + gdiff * 80) / 256);
+    pDst[2].b = static_cast<uint8>(((2 * col0.b + col1.b) * 22) / 8);
+    pDst[2].a = 0xFF;
+
+    pDst[3].r = static_cast<uint8>(((2 * col1.r + col0.r) * 22) / 8);
+    pDst[3].g = static_cast<uint8>((256 * pDst[1].g - gdiff / 4 + 128 - gdiff * 80) / 256);
+    pDst[3].b = static_cast<uint8>(((2 * col1.b + col0.b) * 22) / 8);
+    pDst[3].a = 0xFF;
+  } else {
+    pDst[2].r = static_cast<uint8>(((col0.r + col1.r) * 33) / 8);
+    pDst[2].g = static_cast<uint8>((256 * pDst[0].g + gdiff / 4 + 128 + gdiff * 128) / 256);
+    pDst[2].b = static_cast<uint8>(((col0.b + col1.b) * 33) / 8);
+    pDst[2].a = 0xFF;
+
+    pDst[3].r = 0x00;
+    pDst[3].g = 0x00;
+    pDst[3].b = 0x00;
+    pDst[3].a = 0x00;
+  }
+}
+
+uint dxt1_block::get_block_colors3(color_quad_u8* pDst, uint16 color0, uint16 color1) {
+  color_quad_u8 c0(unpack_color(color0, true));
+  color_quad_u8 c1(unpack_color(color1, true));
+
+  pDst[0] = c0;
+  pDst[1] = c1;
+  pDst[2].set_noclamp_rgba((c0.r + c1.r) >> 1U, (c0.g + c1.g) >> 1U, (c0.b + c1.b) >> 1U, 255U);
+  pDst[3].set_noclamp_rgba(0, 0, 0, 0);
+
+  return 3;
+}
+
+uint dxt1_block::get_block_colors4(color_quad_u8* pDst, uint16 color0, uint16 color1) {
+  color_quad_u8 c0(unpack_color(color0, true));
+  color_quad_u8 c1(unpack_color(color1, true));
+
+  pDst[0] = c0;
+  pDst[1] = c1;
+
+  // The compiler turns the division by 3 into a multiply by the reciprocal plus a shift.
+  pDst[2].set_noclamp_rgba((c0.r * 2 + c1.r) / 3, (c0.g * 2 + c1.g) / 3, (c0.b * 2 + c1.b) / 3, 255U);
+  pDst[3].set_noclamp_rgba((c1.r * 2 + c0.r) / 3, (c1.g * 2 + c0.g) / 3, (c1.b * 2 + c0.b) / 3, 255U);
+
+  return 4;
+}
+
+uint dxt1_block::get_block_colors3_round(color_quad_u8* pDst, uint16 color0, uint16 color1) {
+  color_quad_u8 c0(unpack_color(color0, true));
+  color_quad_u8 c1(unpack_color(color1, true));
+
+  pDst[0] = c0;
+  pDst[1] = c1;
+  pDst[2].set_noclamp_rgba((c0.r + c1.r + 1) >> 1U, (c0.g + c1.g + 1) >> 1U, (c0.b + c1.b + 1) >> 1U, 255U);
+  pDst[3].set_noclamp_rgba(0, 0, 0, 0);
+
+  return 3;
+}
+
+uint dxt1_block::get_block_colors4_round(color_quad_u8* pDst, uint16 color0, uint16 color1) {
+  color_quad_u8 c0(unpack_color(color0, true));
+  color_quad_u8 c1(unpack_color(color1, true));
+
+  pDst[0] = c0;
+  pDst[1] = c1;
+
+  // 12/14/08 - The DX docs call for rounding here, but that appears to conflict with the OpenGL S3TC spec.
+  // The compiler turns the division by 3 into a multiply by the reciprocal plus a shift.
+  pDst[2].set_noclamp_rgba((c0.r * 2 + c1.r + 1) / 3, (c0.g * 2 + c1.g + 1) / 3, (c0.b * 2 + c1.b + 1) / 3, 255U);
+  pDst[3].set_noclamp_rgba((c1.r * 2 + c0.r + 1) / 3, (c1.g * 2 + c0.g + 1) / 3, (c1.b * 2 + c0.b + 1) / 3, 255U);
+
+  return 4;
+}
+
+uint dxt1_block::get_block_colors(color_quad_u8* pDst, uint16 color0, uint16 color1) {
+  if (color0 > color1)
+    return get_block_colors4(pDst, color0, color1);
+  else
+    return get_block_colors3(pDst, color0, color1);
+}
+
+uint dxt1_block::get_block_colors_round(color_quad_u8* pDst, uint16 color0, uint16 color1) {
+  if (color0 > color1)
+    return get_block_colors4_round(pDst, color0, color1);
+  else
+    return get_block_colors3_round(pDst, color0, color1);
+}
+
+color_quad_u8 dxt1_block::unpack_endpoint(uint32 endpoints, uint index, bool scaled, uint alpha) {
+  CRNLIB_ASSERT(index < 2);
+  return unpack_color(static_cast<uint16>((endpoints >> (index * 16U)) & 0xFFFFU), scaled, alpha);
+}
+
+uint dxt1_block::pack_endpoints(uint lo, uint hi) {
+  CRNLIB_ASSERT((lo <= 0xFFFFU) && (hi <= 0xFFFFU));
+  return lo | (hi << 16U);
+}
+
+void dxt3_block::set_alpha(uint x, uint y, uint value, bool scaled) {
+  CRNLIB_ASSERT((x < cDXTBlockSize) && (y < cDXTBlockSize));
+
+  if (scaled) {
+    CRNLIB_ASSERT(value <= 0xFF);
+    value = (value * 15U + 128U) / 255U;
+  } else {
+    CRNLIB_ASSERT(value <= 0xF);
+  }
+
+  uint ofs = (y << 1U) + (x >> 1U);
+  uint c = m_alpha[ofs];
+
+  c &= ~(0xF << ((x & 1U) << 2U));
+  c |= (value << ((x & 1U) << 2U));
+
+  m_alpha[ofs] = static_cast<uint8>(c);
+}
+
+uint dxt3_block::get_alpha(uint x, uint y, bool scaled) const {
+  CRNLIB_ASSERT((x < cDXTBlockSize) && (y < cDXTBlockSize));
+
+  uint value = m_alpha[(y << 1U) + (x >> 1U)];
+  if (x & 1)
+    value >>= 4;
+  value &= 0xF;
+
+  if (scaled)
+    value = (value << 4U) | value;
+
+  return value;
+}
+
+uint dxt5_block::get_block_values6(color_quad_u8* pDst, uint l, uint h) {
+  pDst[0].a = static_cast<uint8>(l);
+  pDst[1].a = static_cast<uint8>(h);
+  pDst[2].a = static_cast<uint8>((l * 4 + h) / 5);
+  pDst[3].a = static_cast<uint8>((l * 3 + h * 2) / 5);
+  pDst[4].a = static_cast<uint8>((l * 2 + h * 3) / 5);
+  pDst[5].a = static_cast<uint8>((l + h * 4) / 5);
+  pDst[6].a = 0;
+  pDst[7].a = 255;
+  return 6;
+}
+
+uint dxt5_block::get_block_values8(color_quad_u8* pDst, uint l, uint h) {
+  pDst[0].a = static_cast<uint8>(l);
+  pDst[1].a = static_cast<uint8>(h);
+  pDst[2].a = static_cast<uint8>((l * 6 + h) / 7);
+  pDst[3].a = static_cast<uint8>((l * 5 + h * 2) / 7);
+  pDst[4].a = static_cast<uint8>((l * 4 + h * 3) / 7);
+  pDst[5].a = static_cast<uint8>((l * 3 + h * 4) / 7);
+  pDst[6].a = static_cast<uint8>((l * 2 + h * 5) / 7);
+  pDst[7].a = static_cast<uint8>((l + h * 6) / 7);
+  return 8;
+}
+
+uint dxt5_block::get_block_values(color_quad_u8* pDst, uint l, uint h) {
+  if (l > h)
+    return get_block_values8(pDst, l, h);
+  else
+    return get_block_values6(pDst, l, h);
+}
+
+uint dxt5_block::get_block_values6(uint* pDst, uint l, uint h) {
+  pDst[0] = l;
+  pDst[1] = h;
+  pDst[2] = (l * 4 + h) / 5;
+  pDst[3] = (l * 3 + h * 2) / 5;
+  pDst[4] = (l * 2 + h * 3) / 5;
+  pDst[5] = (l + h * 4) / 5;
+  pDst[6] = 0;
+  pDst[7] = 255;
+  return 6;
+}
+
+uint dxt5_block::get_block_values8(uint* pDst, uint l, uint h) {
+  pDst[0] = l;
+  pDst[1] = h;
+  pDst[2] = (l * 6 + h) / 7;
+  pDst[3] = (l * 5 + h * 2) / 7;
+  pDst[4] = (l * 4 + h * 3) / 7;
+  pDst[5] = (l * 3 + h * 4) / 7;
+  pDst[6] = (l * 2 + h * 5) / 7;
+  pDst[7] = (l + h * 6) / 7;
+  return 8;
+}
+
+uint dxt5_block::unpack_endpoint(uint packed, uint index) {
+  CRNLIB_ASSERT(index < 2);
+  return (packed >> (8 * index)) & 0xFF;
+}
+
+uint dxt5_block::pack_endpoints(uint lo, uint hi) {
+  CRNLIB_ASSERT((lo <= 0xFF) && (hi <= 0xFF));
+  return lo | (hi << 8U);
+}
+
+uint dxt5_block::get_block_values(uint* pDst, uint l, uint h) {
+  if (l > h)
+    return get_block_values8(pDst, l, h);
+  else
+    return get_block_values6(pDst, l, h);
+}
+
+}  // namespace crnlib
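Reviewer note on `dxt1_block::pack_color`, `unpack_color` and `get_block_colors4`: the sketch below restates the 5:6:5 quantization, the bit-replicating expansion back to 8 bits, and the four-color palette interpolation as standalone code. It is an illustration under assumptions, not the crnlib implementation: `rgb8`, `pack_565`, `unpack_565` and `block_colors4` are hypothetical names, and only the scaled code path is shown.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for color_quad_u8 (alpha omitted).
struct rgb8 {
  std::uint8_t r, g, b;
};

inline std::uint8_t u8(unsigned v) { return static_cast<std::uint8_t>(v); }

// Quantizes 8-bit RGB to 5:6:5 using the same rounding form as the scaled
// path in the diff: (value * max_level + bias) / 255, with bias 127 by default.
inline std::uint16_t pack_565(rgb8 c, unsigned bias = 127u) {
  assert(bias < 255u);  // keeps every component within its 5- or 6-bit range
  const unsigned r = (c.r * 31u + bias) / 255u;
  const unsigned g = (c.g * 63u + bias) / 255u;
  const unsigned b = (c.b * 31u + bias) / 255u;
  return static_cast<std::uint16_t>(b | (g << 5) | (r << 11));
}

// Expands 5:6:5 back to 8 bits per channel by replicating the top bits
// into the low bits, so that 31 maps to 255 and 0 maps to 0.
inline rgb8 unpack_565(std::uint16_t p) {
  const unsigned b = p & 31u;
  const unsigned g = (p >> 5) & 63u;
  const unsigned r = (p >> 11) & 31u;
  return rgb8{u8((r << 3) | (r >> 2)), u8((g << 2) | (g >> 4)), u8((b << 3) | (b >> 2))};
}

// Four-color palette: the two endpoints plus the 1/3 and 2/3 blends,
// using truncating division as in get_block_colors4.
inline void block_colors4(rgb8 out[4], std::uint16_t color0, std::uint16_t color1) {
  const rgb8 c0 = unpack_565(color0);
  const rgb8 c1 = unpack_565(color1);
  out[0] = c0;
  out[1] = c1;
  out[2] = rgb8{u8((c0.r * 2 + c1.r) / 3), u8((c0.g * 2 + c1.g) / 3), u8((c0.b * 2 + c1.b) / 3)};
  out[3] = rgb8{u8((c1.r * 2 + c0.r) / 3), u8((c1.g * 2 + c0.g) / 3), u8((c1.b * 2 + c0.b) / 3)};
}
```

With pure red (0xF800) and black (0x0000) as endpoints, the two interpolated red values come out as 170 and 85.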
@@ -0,0 +1,325 @@
+// File: crn_dxt.h
+// See Copyright Notice and license at the end of inc/crnlib.h
+#pragma once
+#include "../inc/crnlib.h"
+#include "crn_color.h"
+#include "crn_vec.h"
+#include "crn_rand.h"
+#include "crn_sparse_bit_array.h"
+#include "crn_hash_map.h"
+#include <map>
+
+#define CRNLIB_DXT_ALT_ROUNDING 1
+
+namespace crnlib {
+enum dxt_constants {
+  cDXT1BytesPerBlock = 8U,
+  cDXT5NBytesPerBlock = 16U,
+
+  cDXT5SelectorBits = 3U,
+  cDXT5SelectorValues = 1U << cDXT5SelectorBits,
+  cDXT5SelectorMask = cDXT5SelectorValues - 1U,
+
+  cDXT1SelectorBits = 2U,
+  cDXT1SelectorValues = 1U << cDXT1SelectorBits,
+  cDXT1SelectorMask = cDXT1SelectorValues - 1U,
+
+  cDXTBlockShift = 2U,
+  cDXTBlockSize = 1U << cDXTBlockShift
+};
+
+enum dxt_format {
+  cDXTInvalid = -1,
+
+  // cDXT1/1A must appear first!
+  cDXT1,
+  cDXT1A,
+
+  cDXT3,
+  cDXT5,
+  cDXT5A,
+
+  cDXN_XY,  // inverted relative to standard ATI2, 360's DXN
+  cDXN_YX,  // standard ATI2
+
+  cETC1,
+  cETC2,
+  cETC2A,
+  cETC1S,
+  cETC2AS,
+};
+
+const float cDXT1MaxLinearValue = 3.0f;
+const float cDXT1InvMaxLinearValue = 1.0f / 3.0f;
+
+const float cDXT5MaxLinearValue = 7.0f;
+const float cDXT5InvMaxLinearValue = 1.0f / 7.0f;
+
+// Converts DXT1 raw color selector index to a linear value.
+extern const uint8 g_dxt1_to_linear[cDXT1SelectorValues];
+
+// Converts DXT5 raw alpha selector index to a linear value.
+extern const uint8 g_dxt5_to_linear[cDXT5SelectorValues];
+
+// Converts DXT1 linear color selector index to a raw value (inverse of g_dxt1_to_linear).
+extern const uint8 g_dxt1_from_linear[cDXT1SelectorValues];
+
+// Converts DXT5 linear alpha selector index to a raw value (inverse of g_dxt5_to_linear).
+extern const uint8 g_dxt5_from_linear[cDXT5SelectorValues];
+
+extern const uint8 g_dxt5_alpha6_to_linear[cDXT5SelectorValues];
+
+extern const uint8 g_six_alpha_invert_table[cDXT5SelectorValues];
+extern const uint8 g_eight_alpha_invert_table[cDXT5SelectorValues];
+
+const char* get_dxt_format_string(dxt_format fmt);
+uint get_dxt_format_bits_per_pixel(dxt_format fmt);
+bool get_dxt_format_has_alpha(dxt_format fmt);
+
+const char* get_dxt_quality_string(crn_dxt_quality q);
+
+const char* get_dxt_compressor_name(crn_dxt_compressor_type c);
+
+struct dxt1_block {
+  uint8 m_low_color[2];
+  uint8 m_high_color[2];
+
+  enum { cNumSelectorBytes = 4 };
+  uint8 m_selectors[cNumSelectorBytes];
+
+  inline void clear() {
+    utils::zero_this(this);
+  }
+
+  // These methods assume the in-memory representation is in little-endian byte order.
+  inline uint get_low_color() const {
+    return m_low_color[0] | (m_low_color[1] << 8U);
+  }
+
+  inline uint get_high_color() const {
+    return m_high_color[0] | (m_high_color[1] << 8U);
+  }
+
+  inline void set_low_color(uint16 c) {
+    m_low_color[0] = static_cast<uint8>(c & 0xFF);
+    m_low_color[1] = static_cast<uint8>((c >> 8) & 0xFF);
+  }
+
+  inline void set_high_color(uint16 c) {
+    m_high_color[0] = static_cast<uint8>(c & 0xFF);
+    m_high_color[1] = static_cast<uint8>((c >> 8) & 0xFF);
+  }
+
+  inline bool is_constant_color_block() const { return get_low_color() == get_high_color(); }
+  inline bool is_alpha_block() const { return get_low_color() <= get_high_color(); }
+  inline bool is_non_alpha_block() const { return !is_alpha_block(); }
+
+  inline uint get_selector(uint x, uint y) const {
+    CRNLIB_ASSERT((x < 4U) && (y < 4U));
+    return (m_selectors[y] >> (x * cDXT1SelectorBits)) & cDXT1SelectorMask;
+  }
+
+  inline void set_selector(uint x, uint y, uint val) {
+    CRNLIB_ASSERT((x < 4U) && (y < 4U) && (val < 4U));
+
+    m_selectors[y] &= (~(cDXT1SelectorMask << (x * cDXT1SelectorBits)));
+    m_selectors[y] |= (val << (x * cDXT1SelectorBits));
+  }
+
+  inline void flip_x(uint w = 4, uint h = 4) {
+    for (uint x = 0; x < (w / 2); x++) {
+      for (uint y = 0; y < h; y++) {
+        const uint c = get_selector(x, y);
+        set_selector(x, y, get_selector((w - 1) - x, y));
+        set_selector((w - 1) - x, y, c);
+      }
+    }
+  }
+
+  inline void flip_y(uint w = 4, uint h = 4) {
+    for (uint y = 0; y < (h / 2); y++) {
+      for (uint x = 0; x < w; x++) {
+        const uint c = get_selector(x, y);
+        set_selector(x, y, get_selector(x, (h - 1) - y));
+        set_selector(x, (h - 1) - y, c);
+      }
+    }
+  }
+
+  static uint16 pack_color(const color_quad_u8& color, bool scaled, uint bias = 127U);
+  static uint16 pack_color(uint r, uint g, uint b, bool scaled, uint bias = 127U);
+
+  static color_quad_u8 unpack_color(uint16 packed_color, bool scaled, uint alpha = 255U);
+  static void unpack_color(uint& r, uint& g, uint& b, uint16 packed_color, bool scaled);
+
+  static uint get_block_colors3(color_quad_u8* pDst, uint16 color0, uint16 color1);
+  static uint get_block_colors3_round(color_quad_u8* pDst, uint16 color0, uint16 color1);
+
+  static uint get_block_colors4(color_quad_u8* pDst, uint16 color0, uint16 color1);
+  static uint get_block_colors4_round(color_quad_u8* pDst, uint16 color0, uint16 color1);
+
+  // pDst must point to an array at least cDXT1SelectorValues long.
+  static uint get_block_colors(color_quad_u8* pDst, uint16 color0, uint16 color1);
+
+  static uint get_block_colors_round(color_quad_u8* pDst, uint16 color0, uint16 color1);
+
+  static color_quad_u8 unpack_endpoint(uint32 endpoints, uint index, bool scaled, uint alpha = 255U);
+  static uint pack_endpoints(uint lo, uint hi);
+
+  static void get_block_colors_NV5x(color_quad_u8* pDst, uint16 packed_col0, uint16 packed_col1, bool color4);
+};
+
+CRNLIB_DEFINE_BITWISE_COPYABLE(dxt1_block);
+
+struct dxt3_block {
+  enum { cNumAlphaBytes = 8 };
+  uint8 m_alpha[cNumAlphaBytes];
+
+  void set_alpha(uint x, uint y, uint value, bool scaled);
+  uint get_alpha(uint x, uint y, bool scaled) const;
+
+  inline void flip_x(uint w = 4, uint h = 4) {
+    for (uint x = 0; x < (w / 2); x++) {
+      for (uint y = 0; y < h; y++) {
+        const uint c = get_alpha(x, y, false);
+        set_alpha(x, y, get_alpha((w - 1) - x, y, false), false);
+        set_alpha((w - 1) - x, y, c, false);
+      }
+    }
+  }
+
+  inline void flip_y(uint w = 4, uint h = 4) {
+    for (uint y = 0; y < (h / 2); y++) {
+      for (uint x = 0; x < w; x++) {
+        const uint c = get_alpha(x, y, false);
+        set_alpha(x, y, get_alpha(x, (h - 1) - y, false), false);
+        set_alpha(x, (h - 1) - y, c, false);
+      }
+    }
+  }
+};
+
+CRNLIB_DEFINE_BITWISE_COPYABLE(dxt3_block);
+
+struct dxt5_block {
+  uint8 m_endpoints[2];
+
+  enum { cNumSelectorBytes = 6 };
+  uint8 m_selectors[cNumSelectorBytes];
+
+  inline void clear() {
+    utils::zero_this(this);
+  }
+
+  inline uint get_low_alpha() const {
+    return m_endpoints[0];
+  }
+
+  inline uint get_high_alpha() const {
+    return m_endpoints[1];
+  }
+
+  inline void set_low_alpha(uint i) {
+    CRNLIB_ASSERT(i <= cUINT8_MAX);
+    m_endpoints[0] = static_cast<uint8>(i);
+  }
+
+  inline void set_high_alpha(uint i) {
+    CRNLIB_ASSERT(i <= cUINT8_MAX);
+    m_endpoints[1] = static_cast<uint8>(i);
+  }
+
+  inline bool is_alpha6_block() const { return get_low_alpha() <= get_high_alpha(); }
+
+  uint get_endpoints_as_word() const { return m_endpoints[0] | (m_endpoints[1] << 8); }
+  uint get_selectors_as_word(uint index) const {
+    CRNLIB_ASSERT(index < 3);
+    return m_selectors[index * 2] | (m_selectors[index * 2 + 1] << 8);
+  }
+
+  inline uint get_selector(uint x, uint y) const {
+    CRNLIB_ASSERT((x < 4U) && (y < 4U));
+
+    uint selector_index = (y * 4) + x;
+    uint bit_index = selector_index * cDXT5SelectorBits;
+
+    uint byte_index = bit_index >> 3;
+    uint bit_ofs = bit_index & 7;
+
+    uint v = m_selectors[byte_index];
+    if (byte_index < (cNumSelectorBytes - 1))
+      v |= (m_selectors[byte_index + 1] << 8);
+
+    return (v >> bit_ofs) & 7;
+  }
+
+  inline void set_selector(uint x, uint y, uint val) {
+    CRNLIB_ASSERT((x < 4U) && (y < 4U) && (val < 8U));
+
+    uint selector_index = (y * 4) + x;
+    uint bit_index = selector_index * cDXT5SelectorBits;
+
+    uint byte_index = bit_index >> 3;
+    uint bit_ofs = bit_index & 7;
+
+    uint v = m_selectors[byte_index];
+    if (byte_index < (cNumSelectorBytes - 1))
+      v |= (m_selectors[byte_index + 1] << 8);
+
+    v &= (~(7 << bit_ofs));
+    v |= (val << bit_ofs);
+
+    m_selectors[byte_index] = static_cast<uint8>(v);
+    if (byte_index < (cNumSelectorBytes - 1))
+      m_selectors[byte_index + 1] = static_cast<uint8>(v >> 8);
+  }
+
+  inline void flip_x(uint w = 4, uint h = 4) {
+    for (uint x = 0; x < (w / 2); x++) {
+      for (uint y = 0; y < h; y++) {
+        const uint c = get_selector(x, y);
+        set_selector(x, y, get_selector((w - 1) - x, y));
+        set_selector((w - 1) - x, y, c);
+      }
+    }
+  }
+
+  inline void flip_y(uint w = 4, uint h = 4) {
+    for (uint y = 0; y < (h / 2); y++) {
+      for (uint x = 0; x < w; x++) {
+        const uint c = get_selector(x, y);
+        set_selector(x, y, get_selector(x, (h - 1) - y));
+        set_selector(x, (h - 1) - y, c);
+      }
+    }
+  }
+
+  enum { cMaxSelectorValues = 8 };
+
+  // Results written to alpha channel.
+  static uint get_block_values6(color_quad_u8* pDst, uint l, uint h);
+  static uint get_block_values8(color_quad_u8* pDst, uint l, uint h);
+  static uint get_block_values(color_quad_u8* pDst, uint l, uint h);
+
+  static uint get_block_values6(uint* pDst, uint l, uint h);
+  static uint get_block_values8(uint* pDst, uint l, uint h);
+  // pDst must point to an array at least cDXT5SelectorValues long.
+  static uint get_block_values(uint* pDst, uint l, uint h);
+
+  static uint unpack_endpoint(uint packed, uint index);
+  static uint pack_endpoints(uint lo, uint hi);
+};
+
+CRNLIB_DEFINE_BITWISE_COPYABLE(dxt5_block);
+
+struct dxt_pixel_block {
+  color_quad_u8 m_pixels[cDXTBlockSize][cDXTBlockSize];  // [y][x]
+
+  inline void clear() {
+    utils::zero_object(*this);
+  }
+};
+
+CRNLIB_DEFINE_BITWISE_COPYABLE(dxt_pixel_block);
+
+}  // namespace crnlib
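Reviewer note on `dxt5_block::get_selector` and `set_selector`: sixteen 3-bit selectors are packed into six bytes, so some selectors straddle a byte boundary. The sketch below isolates that bit arithmetic as standalone code. `alpha_selectors` is a hypothetical type introduced only for illustration; it follows the same read-two-bytes, mask, shift, write-back approach as the header, but it is not the crnlib struct.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical standalone model of the 6-byte selector field of a DXT5 alpha block.
struct alpha_selectors {
  std::uint8_t bytes[6] = {0, 0, 0, 0, 0, 0};

  unsigned get(unsigned x, unsigned y) const {
    assert(x < 4u && y < 4u);
    const unsigned bit_index = (y * 4u + x) * 3u;  // 3 bits per selector
    const unsigned byte_index = bit_index >> 3;
    const unsigned bit_ofs = bit_index & 7u;
    unsigned v = bytes[byte_index];
    if (byte_index < 5u)  // the last byte has no successor to read
      v |= static_cast<unsigned>(bytes[byte_index + 1]) << 8;
    return (v >> bit_ofs) & 7u;
  }

  void set(unsigned x, unsigned y, unsigned val) {
    assert(x < 4u && y < 4u && val < 8u);
    const unsigned bit_index = (y * 4u + x) * 3u;
    const unsigned byte_index = bit_index >> 3;
    const unsigned bit_ofs = bit_index & 7u;
    unsigned v = bytes[byte_index];
    if (byte_index < 5u)
      v |= static_cast<unsigned>(bytes[byte_index + 1]) << 8;
    v &= ~(7u << bit_ofs);  // clear the old 3-bit field
    v |= val << bit_ofs;    // insert the new value
    bytes[byte_index] = static_cast<std::uint8_t>(v);
    if (byte_index < 5u)
      bytes[byte_index + 1] = static_cast<std::uint8_t>(v >> 8);
  }
};
```

Selector (2, 0) starts at bit 6, so its value is split between the top two bits of byte 0 and the lowest bit of byte 1. Selector (3, 3) occupies the top three bits of the last byte.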
@@ -0,0 +1,276 @@
+// File: crn_dxt1.h
+// See Copyright Notice and license at the end of inc/crnlib.h
+#pragma once
+#include "crn_dxt.h"
+
+namespace crnlib {
+struct dxt1_solution_coordinates {
+  inline dxt1_solution_coordinates()
+      : m_low_color(0), m_high_color(0) {}
+
+  inline dxt1_solution_coordinates(uint16 l, uint16 h)
+      : m_low_color(l), m_high_color(h) {}
+
+  inline dxt1_solution_coordinates(const color_quad_u8& l, const color_quad_u8& h, bool scaled = true)
+      : m_low_color(dxt1_block::pack_color(l, scaled)),
+        m_high_color(dxt1_block::pack_color(h, scaled)) {
+  }
+
+  inline dxt1_solution_coordinates(vec3F nl, vec3F nh) {
+#if CRNLIB_DXT_ALT_ROUNDING
+    // Alternate rounding: clamp just below 1.0, then floor into equal-width bins (32 for R/B, 64 for G).
+    nl.clamp(0.0f, .999f);
+    nh.clamp(0.0f, .999f);
+    color_quad_u8 l((int)floor(nl[0] * 32.0f), (int)floor(nl[1] * 64.0f), (int)floor(nl[2] * 32.0f), 255);
+    color_quad_u8 h((int)floor(nh[0] * 32.0f), (int)floor(nh[1] * 64.0f), (int)floor(nh[2] * 32.0f), 255);
+#else
+    // Standard rounding: round to the nearest representable 5:6:5 level.
+    color_quad_u8 l((int)floor(.5f + nl[0] * 31.0f), (int)floor(.5f + nl[1] * 63.0f), (int)floor(.5f + nl[2] * 31.0f), 255);
+    color_quad_u8 h((int)floor(.5f + nh[0] * 31.0f), (int)floor(.5f + nh[1] * 63.0f), (int)floor(.5f + nh[2] * 31.0f), 255);
+#endif
+
+    m_low_color = dxt1_block::pack_color(l, false);
+    m_high_color = dxt1_block::pack_color(h, false);
+  }
+
+  uint16 m_low_color;
+  uint16 m_high_color;
+
+  inline void clear() {
+    m_low_color = 0;
+    m_high_color = 0;
+  }
+
+  inline dxt1_solution_coordinates& canonicalize() {
+    if (m_low_color < m_high_color)
+      utils::swap(m_low_color, m_high_color);
+    return *this;
+  }
+
+  inline operator size_t() const { return fast_hash(this, sizeof(*this)); }
+
+  inline bool operator==(const dxt1_solution_coordinates& other) const {
+    uint16 l0 = math::minimum(m_low_color, m_high_color);
+    uint16 h0 = math::maximum(m_low_color, m_high_color);
+
+    uint16 l1 = math::minimum(other.m_low_color, other.m_high_color);
+    uint16 h1 = math::maximum(other.m_low_color, other.m_high_color);
+
+    return (l0 == l1) && (h0 == h1);
+  }
+
+  inline bool operator!=(const dxt1_solution_coordinates& other) const {
+    return !(*this == other);
+  }
+
+  inline bool operator<(const dxt1_solution_coordinates& other) const {
+    uint16 l0 = math::minimum(m_low_color, m_high_color);
+    uint16 h0 = math::maximum(m_low_color, m_high_color);
+
+    uint16 l1 = math::minimum(other.m_low_color, other.m_high_color);
+    uint16 h1 = math::maximum(other.m_low_color, other.m_high_color);
+
+    if (l0 < l1)
+      return true;
+    else if (l0 == l1) {
+      if (h0 < h1)
+        return true;
+    }
+
+    return false;
+  }
+};
+
+typedef crnlib::vector<dxt1_solution_coordinates> dxt1_solution_coordinates_vec;
+
+CRNLIB_DEFINE_BITWISE_COPYABLE(dxt1_solution_coordinates);
+
+struct unique_color {
+  inline unique_color() {}
+  inline unique_color(const color_quad_u8& color, uint weight)
+      : m_color(color), m_weight(weight) {}
+
+  color_quad_u8 m_color;
+  uint m_weight;
+
+  inline bool operator<(const unique_color& c) const {
+    return *reinterpret_cast<const uint32*>(&m_color) < *reinterpret_cast<const uint32*>(&c.m_color);
+  }
+
+  inline bool operator==(const unique_color& c) const {
+    return *reinterpret_cast<const uint32*>(&m_color) == *reinterpret_cast<const uint32*>(&c.m_color);
+  }
+};
+
+CRNLIB_DEFINE_BITWISE_COPYABLE(unique_color);
+
+class dxt1_endpoint_optimizer {
+ public:
+  dxt1_endpoint_optimizer();
+
+  struct params {
+    params()
+        : m_block_index(0),
+          m_pPixels(NULL),
+          m_num_pixels(0),
+          m_dxt1a_alpha_threshold(128U),
+          m_quality(cCRNDXTQualityUber),
+          m_pixels_have_alpha(false),
+          m_use_alpha_blocks(true),
+          m_perceptual(true),
+          m_grayscale_sampling(false),
+          m_endpoint_caching(true),
+          m_use_transparent_indices_for_black(false),
+          m_force_alpha_blocks(false) {
+    }
+
+    uint m_block_index;
+
+    const color_quad_u8* m_pPixels;
+    uint m_num_pixels;
+    uint m_dxt1a_alpha_threshold;
+
+    crn_dxt_quality m_quality;
+
+    bool m_pixels_have_alpha;
+    bool m_use_alpha_blocks;
+    bool m_perceptual;
+    bool m_grayscale_sampling;
+    bool m_endpoint_caching;
+    bool m_use_transparent_indices_for_black;
+    bool m_force_alpha_blocks;
+  };
+
+  struct results {
+    inline results()
+        : m_pSelectors(NULL) {}
+
+    uint64 m_error;
+
+    uint16 m_low_color;
+    uint16 m_high_color;
+
+    uint8* m_pSelectors;
+    bool m_alpha_block;
+    bool m_reordered;
+    bool m_alternate_rounding;
+    bool m_enforce_selector;
+    uint8 m_enforced_selector;
+  };
+
+  bool compute(const params& p, results& r);
+
+ private:
+  const params* m_pParams;
+  results* m_pResults;
+
+  bool m_perceptual;
+  bool m_evaluate_hc;
+
+  typedef crnlib::vector<unique_color> unique_color_vec;
+
+  //typedef crnlib::hash_map<uint32, uint32, bit_hasher<uint32> > unique_color_hash_map;
+  typedef crnlib::hash_map<uint32, uint32> unique_color_hash_map;
+  unique_color_hash_map m_unique_color_hash_map;
+
+  unique_color_vec m_unique_colors;  // excludes transparent colors!
+  unique_color_vec m_evaluated_colors;
+  unique_color_vec m_temp_unique_colors;
+
+  struct {
+    uint64 low, high;
+  } m_rDist[32], m_gDist[64], m_bDist[32];
+
+  uint m_total_unique_color_weight;
+
+  bool m_has_transparent_pixels;
+
+  vec3F_array m_norm_unique_colors;
+  vec3F m_mean_norm_color;
+
+  vec3F_array m_norm_unique_colors_weighted;
+  vec3F m_mean_norm_color_weighted;
+
+  vec3F m_principle_axis;
+
+  crnlib::vector<uint16> m_unique_packed_colors;
+  crnlib::vector<uint8> m_trial_selectors;
+
+  crnlib::vector<vec3F> m_low_coords;
+  crnlib::vector<vec3F> m_high_coords;
+
+  enum { cMaxPrevResults = 4 };
+  dxt1_solution_coordinates m_prev_results[cMaxPrevResults];
+  uint m_num_prev_results;
+
+  crnlib::vector<vec3I> m_lo_cells;
+  crnlib::vector<vec3I> m_hi_cells;
+
+  struct potential_solution {
+    potential_solution()
+        : m_coords(), m_error(cUINT64_MAX), m_alpha_block(false) {
+    }
+
+    dxt1_solution_coordinates m_coords;
+    crnlib::vector<uint8> m_selectors;
+    uint64 m_error;
+    bool m_alpha_block;
+    bool m_alternate_rounding;
+    bool m_enforce_selector;
+    uint8 m_enforced_selector;
+
+    void clear() {
+      m_coords.clear();
+      m_selectors.resize(0);
+      m_error = cUINT64_MAX;
+      m_alpha_block = false;
+    }
+
+    bool are_selectors_all_equal() const {
+      if (m_selectors.empty())
+        return false;
+      const uint s = m_selectors[0];
+      for (uint i = 1; i < m_selectors.size(); i++)
+        if (m_selectors[i] != s)
+          return false;
+      return true;
+    }
+  };
+
+  potential_solution m_trial_solution;
+  potential_solution m_best_solution;
+
+  typedef crnlib::hash_map<uint, empty_type> solution_hash_map;
+  solution_hash_map m_solutions_tried;
+
+  bool refine_solution(int refinement_level = 0);
+
+  bool evaluate_solution(const dxt1_solution_coordinates& coords, bool alternate_rounding = false);
+  bool evaluate_solution_uber(const dxt1_solution_coordinates& coords, bool alternate_rounding);
+  bool evaluate_solution_fast(const dxt1_solution_coordinates& coords, bool alternate_rounding);
+  bool evaluate_solution_hc_perceptual(const dxt1_solution_coordinates& coords, bool alternate_rounding);
+  bool evaluate_solution_hc_uniform(const dxt1_solution_coordinates& coords, bool alternate_rounding);
+  void compute_selectors();
+  void compute_selectors_hc();
+
+  void find_unique_colors();
+  void handle_multicolor_block();
+  void compute_pca(vec3F& axis, const vec3F_array& norm_colors, const vec3F& def);
+  void compute_vectors(const vec3F& perceptual_weights);
+  void return_solution();
+  void try_combinatorial_encoding();
+  void compute_endpoint_component_errors(uint comp_index, uint64 (&error)[4][256], uint64 (&best_remaining_error)[4]);
+  void optimize_endpoint_comps();
+  void optimize_endpoints(vec3F& low_color, vec3F& high_color);
+  bool try_alpha_as_black_optimization();
+  bool try_average_block_as_solid();
+  bool try_median4(const vec3F& low_color, const vec3F& high_color);
+
+  void compute_internal(const params& p, results& r);
+
+  unique_color lerp_color(const color_quad_u8& a, const color_quad_u8& b, float f, int rounding = 1);
+
+  inline uint color_distance(bool perceptual, const color_quad_u8& e1, const color_quad_u8& e2, bool alpha);
+};
+
+}  // namespace crnlib
@@ -0,0 +1,189 @@
+// File: crn_dxt5a.cpp
+// See Copyright Notice and license at the end of inc/crnlib.h
+#include "crn_core.h"
+#include "crn_dxt5a.h"
+#include "crn_ryg_dxt.hpp"
+#include "crn_dxt_fast.h"
+#include "crn_intersect.h"
+
+namespace crnlib {
+dxt5_endpoint_optimizer::dxt5_endpoint_optimizer()
+    : m_pParams(NULL),
+      m_pResults(NULL) {
+  m_unique_values.reserve(16);
+  m_unique_value_weights.reserve(16);
+}
+
+bool dxt5_endpoint_optimizer::compute(const params& p, results& r) {
+  m_pParams = &p;
+  m_pResults = &r;
+
+  if ((!p.m_num_pixels) || (!p.m_pPixels))
+    return false;
+
+  m_unique_values.resize(0);
+  m_unique_value_weights.resize(0);
+
+  for (uint i = 0; i < 256; i++)
+    m_unique_value_map[i] = -1;
+
+  for (uint i = 0; i < p.m_num_pixels; i++) {
+    uint alpha = p.m_pPixels[i][p.m_comp_index];
+
+    int index = m_unique_value_map[alpha];
+
+    if (index == -1) {
+      index = m_unique_values.size();
+
+      m_unique_value_map[alpha] = index;
+
+      m_unique_values.push_back(static_cast<uint8>(alpha));
+      m_unique_value_weights.push_back(0);
+    }
+
+    m_unique_value_weights[index]++;
+  }
+
+  if (m_unique_values.size() == 1) {
+    r.m_block_type = 0;
+    r.m_reordered = false;
+    r.m_error = 0;
+    r.m_first_endpoint = m_unique_values[0];
+    r.m_second_endpoint = m_unique_values[0];
+    memset(r.m_pSelectors, 0, p.m_num_pixels);
+    return true;
+  }
+
+  m_trial_selectors.resize(m_unique_values.size());
+  m_best_selectors.resize(m_unique_values.size());
+
+  r.m_error = cUINT64_MAX;
+
+  for (uint i = 0; i < m_unique_values.size() - 1; i++) {
+    const uint low_endpoint = m_unique_values[i];
+
+    for (uint j = i + 1; j < m_unique_values.size(); j++) {
+      const uint high_endpoint = m_unique_values[j];
+
+      evaluate_solution(low_endpoint, high_endpoint);
+    }
+  }
+
+  if ((m_pParams->m_quality >= cCRNDXTQualityBetter) && (m_pResults->m_error)) {
+    m_flags.resize(256 * 256);
+    m_flags.clear_all_bits();
+
+    const int cProbeAmount = (m_pParams->m_quality == cCRNDXTQualityUber) ? 16 : 8;
+
+    for (int l_delta = -cProbeAmount; l_delta <= cProbeAmount; l_delta++) {
+      const int l = m_pResults->m_first_endpoint + l_delta;
+      if (l < 0)
+        continue;
+      else if (l > 255)
+        break;
+
+      const uint bit_index = l * 256;
+
+      for (int h_delta = -cProbeAmount; h_delta <= cProbeAmount; h_delta++) {
+        const int h = m_pResults->m_second_endpoint + h_delta;
+        if (h < 0)
+          continue;
+        else if (h > 255)
+          break;
+
+        //if (m_flags.get_bit(bit_index + h))
+        //   continue;
+        if ((m_flags.get_bit(bit_index + h)) || (m_flags.get_bit(h * 256 + l)))
+          continue;
+        m_flags.set_bit(bit_index + h);
+
+        evaluate_solution(static_cast<uint>(l), static_cast<uint>(h));
+      }
+    }
+  }
+
+  m_pResults->m_reordered = false;
+  if (m_pResults->m_first_endpoint == m_pResults->m_second_endpoint) {
+    for (uint i = 0; i < m_best_selectors.size(); i++)
+      m_best_selectors[i] = 0;
+  } else if (m_pResults->m_block_type) {
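+    // The decoder selects the mode from the endpoint order:
+    //   first > second  -> eight-alpha block
+    //   otherwise       -> six-alpha block
+    // so a six-alpha block must be stored with first <= second.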
+
+    if (m_pResults->m_first_endpoint > m_pResults->m_second_endpoint) {
+      utils::swap(m_pResults->m_first_endpoint, m_pResults->m_second_endpoint);
+      m_pResults->m_reordered = true;
+      for (uint i = 0; i < m_best_selectors.size(); i++)
+        m_best_selectors[i] = g_six_alpha_invert_table[m_best_selectors[i]];
+    }
+  } else if (!(m_pResults->m_first_endpoint > m_pResults->m_second_endpoint)) {
+    utils::swap(m_pResults->m_first_endpoint, m_pResults->m_second_endpoint);
+    m_pResults->m_reordered = true;
+    for (uint i = 0; i < m_best_selectors.size(); i++)
+      m_best_selectors[i] = g_eight_alpha_invert_table[m_best_selectors[i]];
+  }
+
+  for (uint i = 0; i < m_pParams->m_num_pixels; i++) {
+    uint alpha = m_pParams->m_pPixels[i][m_pParams->m_comp_index];
+
+    int index = m_unique_value_map[alpha];
+
+    m_pResults->m_pSelectors[i] = m_best_selectors[index];
+  }
+
+  return true;
+}
+
+void dxt5_endpoint_optimizer::evaluate_solution(uint low_endpoint, uint high_endpoint) {
+  for (uint block_type = 0; block_type < (m_pParams->m_use_both_block_types ? 2U : 1U); block_type++) {
+    uint selector_values[8];
+
+    if (!block_type)
+      dxt5_block::get_block_values8(selector_values, low_endpoint, high_endpoint);
+    else
+      dxt5_block::get_block_values6(selector_values, low_endpoint, high_endpoint);
+
+    uint64 trial_error = 0;
+
+    for (uint i = 0; i < m_unique_values.size(); i++) {
+      const uint val = m_unique_values[i];
+      const uint weight = m_unique_value_weights[i];
+
+      uint best_selector_error = UINT_MAX;
+      uint best_selector = 0;
+
+      for (uint j = 0; j < 8; j++) {
+        int selector_error = val - selector_values[j];
+        selector_error = selector_error * selector_error * (int)weight;
+
+        if (static_cast<uint>(selector_error) < best_selector_error) {
+          best_selector_error = selector_error;
+          best_selector = j;
+          if (!best_selector_error)
+            break;
+        }
+      }
+
+      m_trial_selectors[i] = static_cast<uint8>(best_selector);
+      trial_error += best_selector_error;
+
+      if (trial_error > m_pResults->m_error)
+        break;
+    }
+
+    if (trial_error < m_pResults->m_error) {
+      m_pResults->m_error = trial_error;
+      m_pResults->m_first_endpoint = static_cast<uint8>(low_endpoint);
+      m_pResults->m_second_endpoint = static_cast<uint8>(high_endpoint);
+      m_pResults->m_block_type = static_cast<uint8>(block_type);
+      m_best_selectors.swap(m_trial_selectors);
+
+      if (!trial_error)
+        break;
+    }
+  }
+}
+
+}  // namespace crnlib
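Review note on `evaluate_solution` above: for each candidate endpoint pair it builds a small palette and, per unique alpha value, keeps the palette entry with the lowest squared error. The sketch below illustrates that inner step for the eight-value mode only. The names `alpha_palette8` and `best_alpha_selector` are hypothetical, and the truncating integer division is an assumption; crnlib's own `dxt5_block::get_block_values8` (not shown in this diff) may round differently.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not crnlib's dxt5_block::get_block_values8.
// Entries 0 and 1 are the endpoints; entries 2..7 step from e0 towards e1.
// Assumption: plain truncating integer division.
inline void alpha_palette8(unsigned e0, unsigned e1, unsigned out[8]) {
  out[0] = e0;
  out[1] = e1;
  for (unsigned k = 1; k <= 6; ++k)
    out[1 + k] = (e0 * (7 - k) + e1 * k) / 7;
}

// Returns the index of the palette entry closest to `value`.
// If sq_error is non-null, it receives the squared distance to that entry.
inline unsigned best_alpha_selector(const unsigned palette[8], unsigned value,
                                    unsigned* sq_error) {
  unsigned best = 0;
  unsigned best_err = ~0u;
  for (unsigned j = 0; j < 8; ++j) {
    const int d = static_cast<int>(value) - static_cast<int>(palette[j]);
    const unsigned err = static_cast<unsigned>(d * d);
    if (err < best_err) {
      best_err = err;
      best = j;
    }
  }
  if (sq_error)
    *sq_error = best_err;
  return best;
}
```

The optimizer in the diff additionally weights each error by how many pixels share the value and stops early once the running total exceeds the best error so far; the sketch leaves both out for brevity.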
@@ -0,0 +1,62 @@
+// File: crn_dxt5a.h
+// See Copyright Notice and license at the end of inc/crnlib.h
+#pragma once
+#include "crn_dxt.h"
+
+namespace crnlib {
+class dxt5_endpoint_optimizer {
+ public:
+  dxt5_endpoint_optimizer();
+
+  struct params {
+    params()
+        : m_block_index(0),
+          m_pPixels(NULL),
+          m_num_pixels(0),
+          m_comp_index(3),
+          m_quality(cCRNDXTQualityUber),
+          m_use_both_block_types(true) {
+    }
+
+    uint m_block_index;
+
+    const color_quad_u8* m_pPixels;
+    uint m_num_pixels;
+    uint m_comp_index;
+
+    crn_dxt_quality m_quality;
+
+    bool m_use_both_block_types;
+  };
+
+  struct results {
+    uint8* m_pSelectors;
+
+    uint64 m_error;
+
+    uint8 m_first_endpoint;
+    uint8 m_second_endpoint;
+
+    uint8 m_block_type;  // 1 if 6-alpha, otherwise 8-alpha
+    bool m_reordered;
+  };
+
+  bool compute(const params& p, results& r);
+
+ private:
+  const params* m_pParams;
+  results* m_pResults;
+
+  crnlib::vector<uint8> m_unique_values;
+  crnlib::vector<uint> m_unique_value_weights;
+
+  crnlib::vector<uint8> m_trial_selectors;
+  crnlib::vector<uint8> m_best_selectors;
+  int m_unique_value_map[256];
+
+  sparse_bit_array m_flags;
+
+  void evaluate_solution(uint low_endpoint, uint high_endpoint);
+};
+
+}  // namespace crnlib
@@ -0,0 +1,209 @@
+// File: crn_dxt_endpoint_refiner.cpp
+// See Copyright Notice and license at the end of inc/crnlib.h
+#include "crn_core.h"
+#include "crn_dxt_endpoint_refiner.h"
+#include "crn_dxt1.h"
+
+namespace crnlib {
+dxt_endpoint_refiner::dxt_endpoint_refiner()
+    : m_pParams(NULL),
+      m_pResults(NULL) {
+}
+
+bool dxt_endpoint_refiner::refine(const params& p, results& r) {
+  if (!p.m_num_pixels)
+    return false;
+
+  m_pParams = &p;
+  m_pResults = &r;
+
+  r.m_error = cUINT64_MAX;
+  r.m_low_color = 0;
+  r.m_high_color = 0;
+
+  double alpha2_sum = 0.0f;
+  double beta2_sum = 0.0f;
+  double alphabeta_sum = 0.0f;
+
+  vec<3, double> alphax_sum(0.0f);
+  vec<3, double> betax_sum(0.0f);
+
+  vec<3, double> first_color(0.0f);
+
+  // This linear solver is from Squish.
+  for (uint i = 0; i < p.m_num_pixels; ++i) {
+    uint8 c = p.m_pSelectors[i];
+
+    double k;
+    if (p.m_dxt1_selectors)
+      k = g_dxt1_to_linear[c] * 1.0f / 3.0f;
+    else
+      k = g_dxt5_to_linear[c] * 1.0f / 7.0f;
+
+    double alpha = 1.0f - k;
+    double beta = k;
+
+    vec<3, double> x;
+
+    if (p.m_dxt1_selectors)
+      x.set(p.m_pPixels[i][0] * 1.0f / 255.0f, p.m_pPixels[i][1] * 1.0f / 255.0f, p.m_pPixels[i][2] * 1.0f / 255.0f);
+    else
+      x.set(p.m_pPixels[i][p.m_alpha_comp_index] / 255.0f);
+
+    if (!i)
+      first_color = x;
+
+    alpha2_sum += alpha * alpha;
+    beta2_sum += beta * beta;
+    alphabeta_sum += alpha * beta;
+    alphax_sum += alpha * x;
+    betax_sum += beta * x;
+  }
+
+  // Solve for a and b; an endpoint left undetermined by the selectors is set to zero.
+  vec<3, double> a, b;
+  if (beta2_sum == 0.0f) {
+    a = alphax_sum / alpha2_sum;
+    b.clear();
+  } else if (alpha2_sum == 0.0f) {
+    a.clear();
+    b = betax_sum / beta2_sum;
+  } else {
+    double factor = alpha2_sum * beta2_sum - alphabeta_sum * alphabeta_sum;
+    if (factor != 0.0f) {
+      a = (alphax_sum * beta2_sum - betax_sum * alphabeta_sum) / factor;
+      b = (betax_sum * alpha2_sum - alphax_sum * alphabeta_sum) / factor;
+    } else {
+      a = first_color;
+      b = first_color;
+    }
+  }
+
+  vec3F l(0.0f), h(0.0f);
+  l = a;
+  h = b;
+
+  l.clamp(0.0f, 1.0f);
+  h.clamp(0.0f, 1.0f);
+
+  if (p.m_dxt1_selectors)
+    optimize_dxt1(l, h);
+  else
+    optimize_dxt5(l, h);
+
+  return r.m_error < p.m_error_to_beat;
+}
+
+void dxt_endpoint_refiner::optimize_dxt5(vec3F low_color, vec3F high_color) {
+  uint8 L0 = math::clamp<int>(low_color[0] * 256.0f, 0, 255);
+  uint8 H0 = math::clamp<int>(high_color[0] * 256.0f, 0, 255);
+
+  uint64 hist[8] = {}, D2[8] = {}, DD[8] = {};
+  for (uint c = m_pParams->m_alpha_comp_index, i = 0; i < m_pParams->m_num_pixels; i++) {
+    uint8 a = m_pParams->m_pPixels[i][c];
+    uint8 s = m_pParams->m_pSelectors[i];
+    hist[s]++;
+    D2[s] += a * 2;
+    DD[s] += a * a;
+  }
+
+  uint16 solutions[529];
+  uint solutions_count = 0;
+  solutions[solutions_count++] = L0 == H0 ? (H0 ? ((H0 - 1) << 8 | L0) : 1) : (L0 > H0 ? (H0 << 8 | L0) : (L0 << 8 | H0));
+  uint8 minL = L0 <= 11 ? 0 : L0 - 11, maxL = L0 >= 244 ? 255 : L0 + 11;
+  uint8 minH = H0 <= 11 ? 0 : H0 - 11, maxH = H0 >= 244 ? 255 : H0 + 11;
+  for (uint16 L = minL; L <= maxL; L++) {
+    for (uint16 H = minH; H <= maxH; H++) {
+      if ((maxH < L || L <= H || H < minL) && (L != L0 || H != H0) && (L != H0 || H != L0))
+        solutions[solutions_count++] = L == H ? (H ? ((H - 1) << 8 | L) : 1) : (L > H ? (H << 8 | L) : (L << 8 | H));
+    }
+  }
+
+  for (uint i = 0; i < solutions_count; i++) {
+    uint8 L = solutions[i] & 0xFF;
+    uint8 H = solutions[i] >> 8;
+    uint values[8];
+    dxt5_block::get_block_values8(values, L, H);
+    uint64 error = 0;
+    for (uint64 s = 0; s < 8; s++)
+      error += hist[s] * values[s] * values[s] - D2[s] * values[s] + DD[s];
+    if (error < m_pResults->m_error) {
+      m_pResults->m_low_color = L;
+      m_pResults->m_high_color = H;
+      m_pResults->m_error = error;
+      if (!m_pResults->m_error)
+        return;
+    }
+  }
+}
+
+void dxt_endpoint_refiner::optimize_dxt1(vec3F low_color, vec3F high_color) {
+  uint16 L0 = math::clamp<int>(low_color[0] * 32.0f, 0, 31) << 11 | math::clamp<int>(low_color[1] * 64.0f, 0, 63) << 5 | math::clamp<int>(low_color[2] * 32.0f, 0, 31);
+  uint16 H0 = math::clamp<int>(high_color[0] * 32.0f, 0, 31) << 11 | math::clamp<int>(high_color[1] * 64.0f, 0, 63) << 5 | math::clamp<int>(high_color[2] * 32.0f, 0, 31);
+
+  uint64 hist[4] = {}, D2[4][3] = {}, DD[4][3] = {};
+  for (uint i = 0; i < m_pParams->m_num_pixels; i++) {
+    const color_quad_u8& pixel = m_pParams->m_pPixels[i];
+    uint8 s = m_pParams->m_pSelectors[i];
+    hist[s]++;
+    for (uint c = 0; c < 3; c++) {
+      D2[s][c] += pixel[c] * 2;
+      DD[s][c] += pixel[c] * pixel[c];
+    }
+  }
+  crnlib::vector<uint> solutions(54);
+  bool preserveL = hist[0] + hist[2] > hist[1] + hist[3];
+  bool improved = true;
+
+  for (uint iterations = 8; improved && iterations; iterations--) {
+    improved = false;
+    uint solutions_count = 0;
+    for (uint16 b0 = L0 & 31, g0 = L0 >> 5 & 63, r0 = L0 >> 11 & 31, b = b0 ? b0 - 1 : b0; b <= b0 + 1 && b <= 31; b++) {
+      for (uint16 g = g0 ? g0 - 1 : g0; g <= g0 + 1 && g <= 63; g++) {
+        for (uint16 r = r0 ? r0 - 1 : r0; r <= r0 + 1 && r <= 31; r++) {
+          uint16 L = r << 11 | g << 5 | b;
+          if (L != L0)
+            solutions[solutions_count++] = L > H0 ? L | H0 << 16 : H0 | L << 16;
+        }
+      }
+    }
+    for (uint16 b0 = H0 & 31, g0 = H0 >> 5 & 63, r0 = H0 >> 11 & 31, b = b0 ? b0 - 1 : b0; b <= b0 + 1 && b <= 31; b++) {
+      for (uint16 g = g0 ? g0 - 1 : g0; g <= g0 + 1 && g <= 63; g++) {
+        for (uint16 r = r0 ? r0 - 1 : r0; r <= r0 + 1 && r <= 31; r++) {
+          uint16 H = r << 11 | g << 5 | b;
+          if (H != H0)
+            solutions[solutions_count++] = H > L0 ? H | L0 << 16 : L0 | H << 16;
+        }
+      }
+    }
+    std::sort(solutions.begin(), solutions.begin() + solutions_count);
+    for (uint i = 0; i < solutions_count; i++) {
+      if (i && solutions[i] == solutions[i - 1])
+        continue;
+      uint16 L = solutions[i] & 0xFFFF;
+      uint16 H = solutions[i] >> 16;
+      if (L == H) {
+        L += !preserveL ? ~L & 0x1F ? 0x1 : ~L & 0xF800 ? 0x800 : ~L & 0x7E0 ? 0x20 : 0 : !L ? 0x1 : 0;
+        H -= preserveL ? H & 0x1F ? 0x1 : H & 0xF800 ? 0x800 : H & 0x7E0 ? 0x20 : 0 : H == 0xFFFF ? 0x1 : 0;
+      }
+      color_quad_u8 block_colors[4];
+      dxt1_block::get_block_colors4(block_colors, L, H);
+      uint64 error = 0;
+      for (uint64 s = 0, d[3]; s < 4; s++) {
+        for (uint c = 0; c < 3; c++)
+          d[c] = hist[s] * block_colors[s][c] * block_colors[s][c] - D2[s][c] * block_colors[s][c] + DD[s][c];
+        error += m_pParams->m_perceptual ? d[0] * 8 + d[1] * 25 + d[2] : d[0] + d[1] + d[2];
+      }
+      if (error < m_pResults->m_error) {
+        m_pResults->m_low_color = L0 = L;
+        m_pResults->m_high_color = H0 = H;
+        m_pResults->m_error = error;
+        if (!m_pResults->m_error)
+          return;
+        improved = true;
+      }
+    }
+  }
+}
+
+}  // namespace crnlib
@@ -0,0 +1,57 @@
+// File: crn_dxt_endpoint_refiner.h
+// See Copyright Notice and license at the end of inc/crnlib.h
+#pragma once
+#include "crn_dxt.h"
+
+namespace crnlib {
+// TODO: Experimental/Not fully implemented
+class dxt_endpoint_refiner {
+ public:
+  dxt_endpoint_refiner();
+
+  struct params {
+    params()
+        : m_block_index(0),
+          m_pPixels(NULL),
+          m_num_pixels(0),
+          m_pSelectors(NULL),
+          m_alpha_comp_index(0),
+          m_error_to_beat(cUINT64_MAX),
+          m_dxt1_selectors(true),
+          m_perceptual(true),
+          m_highest_quality(true) {
+    }
+
+    uint m_block_index;
+
+    const color_quad_u8* m_pPixels;
+    uint m_num_pixels;
+
+    const uint8* m_pSelectors;
+
+    uint m_alpha_comp_index;
+
+    uint64 m_error_to_beat;
+
+    bool m_dxt1_selectors;
+    bool m_perceptual;
+    bool m_highest_quality;
+  };
+
+  struct results {
+    uint16 m_low_color;
+    uint16 m_high_color;
+    uint64 m_error;
+  };
+
+  bool refine(const params& p, results& r);
+
+ private:
+  const params* m_pParams;
+  results* m_pResults;
+
+  void optimize_dxt1(vec3F low_color, vec3F high_color);
+  void optimize_dxt5(vec3F low_color, vec3F high_color);
+};
+
+}  // namespace crnlib
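Review note on `optimize_dxt1` / `optimize_dxt5` in the refiner: both avoid re-reading the pixels for every candidate endpoint pair. They first accumulate, per selector, the pixel count (`hist`), twice the sum of values (`D2`), and the sum of squared values (`DD`); the squared error against a palette value `v` is then `hist * v * v - D2 * v + DD`. The sketch below checks that identity on a single channel. `selector_stats`, `accumulate`, and `selector_error` are hypothetical names, not part of crnlib.

```cpp
#include <cstddef>
#include <cstdint>

// Per-selector running sums, mirroring hist[s], D2[s] and DD[s] in the diff.
struct selector_stats {
  std::uint64_t count;   // number of pixels using this selector
  std::uint64_t sum2;    // 2 * sum of pixel values
  std::uint64_t sum_sq;  // sum of squared pixel values
};

// `stats` must be zero-initialized and large enough for every selector used.
inline void accumulate(selector_stats stats[], const std::uint8_t* values,
                       const std::uint8_t* selectors, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    selector_stats& s = stats[selectors[i]];
    const std::uint64_t a = values[i];
    s.count += 1;
    s.sum2 += 2 * a;
    s.sum_sq += a * a;
  }
}

// Equals the sum of (a - v)^2 over all pixels assigned to this selector.
// The intermediate subtraction may wrap, but unsigned arithmetic is modular
// and the true total is non-negative, so the final value is exact.
inline std::uint64_t selector_error(const selector_stats& s, std::uint64_t v) {
  return s.count * v * v - s.sum2 * v + s.sum_sq;
}
```

With these sums, evaluating one candidate costs a handful of multiplications per selector instead of a pass over all pixels, which is what makes the neighborhood searches in the refiner cheap.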
@@ -0,0 +1,836 @@
+// File: crn_dxt_fast.cpp
+// See Copyright Notice and license at the end of inc/crnlib.h
+// Parts of this module are derived from RYG's excellent public domain DXTx compressor.
+#include "crn_core.h"
+#include "crn_dxt_fast.h"
+#include "crn_ryg_dxt.hpp"
+
+namespace crnlib {
+namespace dxt_fast {
+static inline int mul_8bit(int a, int b) {
+  int t = a * b + 128;
+  return (t + (t >> 8)) >> 8;
+}
+
+static inline color_quad_u8& unpack_color(color_quad_u8& c, uint v) {
+  uint rv = (v & 0xf800) >> 11;
+  uint gv = (v & 0x07e0) >> 5;
+  uint bv = (v & 0x001f) >> 0;
+
+  c.r = ryg_dxt::Expand5[rv];
+  c.g = ryg_dxt::Expand6[gv];
+  c.b = ryg_dxt::Expand5[bv];
+  c.a = 0;
+
+  return c;
+}
+
+static inline uint pack_color(const color_quad_u8& c) {
+  return (mul_8bit(c.r, 31) << 11) + (mul_8bit(c.g, 63) << 5) + mul_8bit(c.b, 31);
+}
+
+static inline void lerp_color(color_quad_u8& result, const color_quad_u8& p1, const color_quad_u8& p2, uint f) {
+  CRNLIB_ASSERT(f <= 255);
+
+  result.r = static_cast<uint8>(p1.r + mul_8bit(p2.r - p1.r, f));
+  result.g = static_cast<uint8>(p1.g + mul_8bit(p2.g - p1.g, f));
+  result.b = static_cast<uint8>(p1.b + mul_8bit(p2.b - p1.b, f));
+}
+
+static inline void eval_colors(color_quad_u8* pColors, uint c0, uint c1) {
+  unpack_color(pColors[0], c0);
+  unpack_color(pColors[1], c1);
+
+#if 0
+  lerp_color(pColors[2], pColors[0], pColors[1], 0x55);
+  lerp_color(pColors[3], pColors[0], pColors[1], 0xAA);
+#else
+  pColors[2].r = (pColors[0].r * 2 + pColors[1].r) / 3;
+  pColors[2].g = (pColors[0].g * 2 + pColors[1].g) / 3;
+  pColors[2].b = (pColors[0].b * 2 + pColors[1].b) / 3;
+
+  pColors[3].r = (pColors[1].r * 2 + pColors[0].r) / 3;
+  pColors[3].g = (pColors[1].g * 2 + pColors[0].g) / 3;
+  pColors[3].b = (pColors[1].b * 2 + pColors[0].b) / 3;
+#endif
+}
+
+// Returns false if all selectors are equal.
+static bool match_block_colors(uint n, const color_quad_u8* pBlock, const color_quad_u8* pColors, uint8* pSelectors) {
+  int dirr = pColors[0].r - pColors[1].r;
+  int dirg = pColors[0].g - pColors[1].g;
+  int dirb = pColors[0].b - pColors[1].b;
+
+  int stops[4];
+  for (int i = 0; i < 4; i++)
+    stops[i] = pColors[i].r * dirr + pColors[i].g * dirg + pColors[i].b * dirb;
+
+  // Palette order along the color0 -> color1 axis: 0 2 3 1
+  int c0Point = stops[1] + stops[3];
+  int halfPoint = stops[3] + stops[2];
+  int c3Point = stops[2] + stops[0];
+
+  //dirr *= 2;
+  //dirg *= 2;
+  //dirb *= 2;
+  c0Point >>= 1;
+  halfPoint >>= 1;
+  c3Point >>= 1;
+
+  bool status = false;
+  for (uint i = 0; i < n; i++) {
+    int dot = pBlock[i].r * dirr + pBlock[i].g * dirg + pBlock[i].b * dirb;
+
+    uint8 s;
+    if (dot < halfPoint)
+      s = (dot < c0Point) ? 1 : 3;
+    else
+      s = (dot < c3Point) ? 2 : 0;
+
+    pSelectors[i] = s;
+
+    if (s != pSelectors[0])
+      status = true;
+  }
+
+  return status;
+}
+
+static bool optimize_block_colors(uint n, const color_quad_u8* block, uint& max16, uint& min16, uint ave_color[3], float axis[3]) {
+  int min[3], max[3];
+
+  for (uint ch = 0; ch < 3; ch++) {
+    const uint8* bp = ((const uint8*)block) + ch;
+    int minv, maxv;
+
+    int64 muv = bp[0];
+    minv = maxv = bp[0];
+
+    const uint l = n << 2;
+    for (uint i = 4; i < l; i += 4) {
+      muv += bp[i];
+      minv = math::minimum<int>(minv, bp[i]);
+      maxv = math::maximum<int>(maxv, bp[i]);
+    }
+
+    ave_color[ch] = static_cast<int>((muv + (n / 2)) / n);
+    min[ch] = minv;
+    max[ch] = maxv;
+  }
+
+  if ((min[0] == max[0]) && (min[1] == max[1]) && (min[2] == max[2]))
+    return false;
+
+  // determine covariance matrix
+  double cov[6];
+  for (int i = 0; i < 6; i++)
+    cov[i] = 0;
+
+  for (uint i = 0; i < n; i++) {
+    double r = (int)block[i].r - (int)ave_color[0];
+    double g = (int)block[i].g - (int)ave_color[1];
+    double b = (int)block[i].b - (int)ave_color[2];
+
+    cov[0] += r * r;
+    cov[1] += r * g;
+    cov[2] += r * b;
+    cov[3] += g * g;
+    cov[4] += g * b;
+    cov[5] += b * b;
+  }
+
+  double covf[6], vfr, vfg, vfb;
+  for (int i = 0; i < 6; i++)
+    covf[i] = cov[i] * (1.0f / 255.0f);
+
+  vfr = max[0] - min[0];
+  vfg = max[1] - min[1];
+  vfb = max[2] - min[2];
+
+  static const uint nIterPower = 4;
+  for (uint iter = 0; iter < nIterPower; iter++) {
+    double r = vfr * covf[0] + vfg * covf[1] + vfb * covf[2];
+    double g = vfr * covf[1] + vfg * covf[3] + vfb * covf[4];
+    double b = vfr * covf[2] + vfg * covf[4] + vfb * covf[5];
+
+    vfr = r;
+    vfg = g;
+    vfb = b;
+  }
+
+  double magn = math::maximum(math::maximum(fabs(vfr), fabs(vfg)), fabs(vfb));
+  int v_r, v_g, v_b;
+
+  if (magn < 4.0f)  // too small, default to luminance
+  {
+    v_r = 148;
+    v_g = 300;
+    v_b = 58;
+
+    axis[0] = (float)v_r;
+    axis[1] = (float)v_g;
+    axis[2] = (float)v_b;
+  } else {
+    magn = 512.0f / magn;
+    vfr *= magn;
+    vfg *= magn;
+    vfb *= magn;
+    v_r = static_cast<int>(vfr);
+    v_g = static_cast<int>(vfg);
+    v_b = static_cast<int>(vfb);
+
+    axis[0] = (float)vfr;
+    axis[1] = (float)vfg;
+    axis[2] = (float)vfb;
+  }
+
+  int mind = block[0].r * v_r + block[0].g * v_g + block[0].b * v_b;
+  int maxd = mind;
+  color_quad_u8 minp(block[0]);
+  color_quad_u8 maxp(block[0]);
+
+  for (uint i = 1; i < n; i++) {
+    int dot = block[i].r * v_r + block[i].g * v_g + block[i].b * v_b;
+
+    if (dot < mind) {
+      mind = dot;
+      minp = block[i];
+    }
+
+    if (dot > maxd) {
+      maxd = dot;
+      maxp = block[i];
+    }
+  }
+
+  max16 = pack_color(maxp);
+  min16 = pack_color(minp);
+
+  return true;
+}
+
+// The refinement function. (Clever code, part 2)
+// Tries to optimize colors to suit block contents better.
+// (By solving a least squares system via normal equations+Cramer's rule)
+static bool refine_block(uint n, const color_quad_u8* block, uint& max16, uint& min16, const uint8* pSelectors) {
+  static const int w1Tab[4] = {3, 0, 2, 1};
+
+  static const int prods_0[4] = {0x00, 0x00, 0x02, 0x02};
+  static const int prods_1[4] = {0x00, 0x09, 0x01, 0x04};
+  static const int prods_2[4] = {0x09, 0x00, 0x04, 0x01};
+
+  double akku_0 = 0;
+  double akku_1 = 0;
+  double akku_2 = 0;
+  double At1_r, At1_g, At1_b;
+  double At2_r, At2_g, At2_b;
+
+  At1_r = At1_g = At1_b = 0;
+  At2_r = At2_g = At2_b = 0;
+  for (uint i = 0; i < n; i++) {
+    double r = block[i].r;
+    double g = block[i].g;
+    double b = block[i].b;
+    int step = pSelectors[i];
+
+    int w1 = w1Tab[step];
+
+    akku_0 += prods_0[step];
+    akku_1 += prods_1[step];
+    akku_2 += prods_2[step];
+    At1_r += w1 * r;
+    At1_g += w1 * g;
+    At1_b += w1 * b;
+    At2_r += r;
+    At2_g += g;
+    At2_b += b;
+  }
+
+  At2_r = 3 * At2_r - At1_r;
+  At2_g = 3 * At2_g - At1_g;
+  At2_b = 3 * At2_b - At1_b;
+
+  double xx = akku_2;
+  double yy = akku_1;
+  double xy = akku_0;
+
+  double t = xx * yy - xy * xy;
+  if (!yy || !xx || (fabs(t) < .0000125f))
+    return false;
+
+  double frb = (3.0f * 31.0f / 255.0f) / t;
+  double fg = frb * (63.0f / 31.0f);
+
+  uint oldMin = min16;
+  uint oldMax = max16;
+
+  // solve.
+  max16 = math::clamp<int>(static_cast<int>((At1_r * yy - At2_r * xy) * frb + 0.5f), 0, 31) << 11;
+  max16 |= math::clamp<int>(static_cast<int>((At1_g * yy - At2_g * xy) * fg + 0.5f), 0, 63) << 5;
+  max16 |= math::clamp<int>(static_cast<int>((At1_b * yy - At2_b * xy) * frb + 0.5f), 0, 31) << 0;
+
+  min16 = math::clamp<int>(static_cast<int>((At2_r * xx - At1_r * xy) * frb + 0.5f), 0, 31) << 11;
+  min16 |= math::clamp<int>(static_cast<int>((At2_g * xx - At1_g * xy) * fg + 0.5f), 0, 63) << 5;
+  min16 |= math::clamp<int>(static_cast<int>((At2_b * xx - At1_b * xy) * frb + 0.5f), 0, 31) << 0;
+
+  return (oldMin != min16) || (oldMax != max16);
+}
+
+// Returns false if all selectors are equal.
+static bool determine_selectors(uint n, const color_quad_u8* block, uint min16, uint max16, uint8* pSelectors) {
+  color_quad_u8 color[4];
+
+  if (max16 != min16) {
+    eval_colors(color, min16, max16);
+
+    return match_block_colors(n, block, color, pSelectors);
+  }
+
+  memset(pSelectors, 0, n);
+  return false;
+}
+
+static uint64 determine_error(uint n, const color_quad_u8* block, uint min16, uint max16, uint64 early_out_error) {
+  color_quad_u8 color[4];
+
+  eval_colors(color, min16, max16);
+
+  int dirr = color[0].r - color[1].r;
+  int dirg = color[0].g - color[1].g;
+  int dirb = color[0].b - color[1].b;
+
+  int stops[4];
+  for (int i = 0; i < 4; i++)
+    stops[i] = color[i].r * dirr + color[i].g * dirg + color[i].b * dirb;
+
+  // Palette order along the color0 -> color1 axis: 0 2 3 1
+  int c0Point = stops[1] + stops[3];
+  int halfPoint = stops[3] + stops[2];
+  int c3Point = stops[2] + stops[0];
+
+  c0Point >>= 1;
+  halfPoint >>= 1;
+  c3Point >>= 1;
+
+  uint64 total_error = 0;
+
+  for (uint i = 0; i < n; i++) {
+    const color_quad_u8& a = block[i];
+
+    uint s = 0;
+    if (min16 != max16) {
+      int dot = a.r * dirr + a.g * dirg + a.b * dirb;
+
+      if (dot < halfPoint)
+        s = (dot < c0Point) ? 1 : 3;
+      else
+        s = (dot < c3Point) ? 2 : 0;
+    }
+
+    const color_quad_u8& b = color[s];
+
+    int e = a[0] - b[0];
+    total_error += e * e;
+
+    e = a[1] - b[1];
+    total_error += e * e;
+
+    e = a[2] - b[2];
+    total_error += e * e;
+
+    if (total_error >= early_out_error)
+      break;
+  }
+
+  return total_error;
+}
+
+static bool refine_endpoints(uint n, const color_quad_u8* pBlock, uint& low16, uint& high16, uint8* pSelectors) {
+  bool optimized = false;
+
+  const int limits[3] = {31, 63, 31};
+
+  for (uint trial = 0; trial < 2; trial++) {
+    color_quad_u8 color[4];
+    eval_colors(color, low16, high16);
+
+    uint64 total_error[3] = {0, 0, 0};
+
+    for (uint i = 0; i < n; i++) {
+      const color_quad_u8& a = pBlock[i];
+
+      const uint s = pSelectors[i];
+      const color_quad_u8& b = color[s];
+
+      int e = a[0] - b[0];
+      total_error[0] += e * e;
+
+      e = a[1] - b[1];
+      total_error[1] += e * e;
+
+      e = a[2] - b[2];
+      total_error[2] += e * e;
+    }
+
+    color_quad_u8 endpoints[2];
+    endpoints[0] = dxt1_block::unpack_color((uint16)low16, false);
+    endpoints[1] = dxt1_block::unpack_color((uint16)high16, false);
+
+    color_quad_u8 expanded_endpoints[2];
+    expanded_endpoints[0] = dxt1_block::unpack_color((uint16)low16, true);
+    expanded_endpoints[1] = dxt1_block::unpack_color((uint16)high16, true);
+
+    bool trial_optimized = false;
+
+    for (uint axis = 0; axis < 3; axis++) {
+      if (!total_error[axis])
+        continue;
+
+      const sU8* const pExpand = (axis == 1) ? ryg_dxt::Expand6 : ryg_dxt::Expand5;
+
+      for (uint e = 0; e < 2; e++) {
+        uint v[4];
+        v[e ^ 1] = expanded_endpoints[e ^ 1][axis];
+
+        for (int t = -1; t <= 1; t += 2) {
+          int a = endpoints[e][axis] + t;
+          if ((a < 0) || (a > limits[axis]))
+            continue;
+
+          v[e] = pExpand[a];
+
+          //int delta = v[1] - v[0];
+          //v[2] = v[0] + mul_8bit(delta, 0x55);
+          //v[3] = v[0] + mul_8bit(delta, 0xAA);
+
+          v[2] = (v[0] * 2 + v[1]) / 3;
+          v[3] = (v[0] + v[1] * 2) / 3;
+
+          uint64 axis_error = 0;
+
+          for (uint i = 0; i < n; i++) {
+            const color_quad_u8& p = pBlock[i];
+
+            int d = v[pSelectors[i]] - p[axis];
+
+            axis_error += d * d;
+
+            if (axis_error >= total_error[axis])
+              break;
+          }
+
+          if (axis_error < total_error[axis]) {
+            //total_error[axis] = axis_error;
+
+            endpoints[e][axis] = (uint8)a;
+            expanded_endpoints[e][axis] = (uint8)v[e];
+
+            if (e)
+              high16 = dxt1_block::pack_color(endpoints[1], false);
+            else
+              low16 = dxt1_block::pack_color(endpoints[0], false);
+
+            determine_selectors(n, pBlock, low16, high16, pSelectors);
+
+            eval_colors(color, low16, high16);
+
+            utils::zero_object(total_error);
+
+            for (uint i = 0; i < n; i++) {
+              const color_quad_u8& a = pBlock[i];
+
+              const uint s = pSelectors[i];
+              const color_quad_u8& b = color[s];
+
+              int d = a[0] - b[0];
+              total_error[0] += d * d;
+
+              d = a[1] - b[1];
+              total_error[1] += d * d;
+
+              d = a[2] - b[2];
+              total_error[2] += d * d;
+            }
+
+            trial_optimized = true;
+          }
+
+        }  // t
+
+      }  // e
+    }    // axis
+
+    if (!trial_optimized)
+      break;
+
+    optimized = true;
+
+  }  // trial
+
+  return optimized;
+}
+
+static void refine_endpoints2(uint n, const color_quad_u8* pBlock, uint& low16, uint& high16, uint8* pSelectors, float axis[3]) {
+  uint64 orig_error = determine_error(n, pBlock, low16, high16, cUINT64_MAX);
+  if (!orig_error)
+    return;
+
+  float l = 1.0f / sqrt(axis[0] * axis[0] + axis[1] * axis[1] + axis[2] * axis[2]);
+  vec3F principle_axis(axis[0] * l, axis[1] * l, axis[2] * l);
+
+  const float dist_per_trial = 0.027063293f;
+
+  const uint cMaxProbeRange = 8;
+  uint probe_low[cMaxProbeRange * 2 + 1];
+  uint probe_high[cMaxProbeRange * 2 + 1];
+
+  int probe_range = 8;
+  uint num_iters = 4;
+
+  const uint num_trials = probe_range * 2 + 1;
+
+  vec3F scaled_principle_axis(principle_axis * dist_per_trial);
+  scaled_principle_axis[0] *= 31.0f;
+  scaled_principle_axis[1] *= 63.0f;
+  scaled_principle_axis[2] *= 31.0f;
+  vec3F initial_ofs(scaled_principle_axis * (float)-probe_range);
+  initial_ofs[0] += .5f;
+  initial_ofs[1] += .5f;
+  initial_ofs[2] += .5f;
+
+  uint64 cur_error = orig_error;
+
+  for (uint iter = 0; iter < num_iters; iter++) {
+    color_quad_u8 endpoints[2];
+
+    endpoints[0] = dxt1_block::unpack_color((uint16)low16, false);
+    endpoints[1] = dxt1_block::unpack_color((uint16)high16, false);
+
+    vec3F low_color(endpoints[0][0], endpoints[0][1], endpoints[0][2]);
+    vec3F high_color(endpoints[1][0], endpoints[1][1], endpoints[1][2]);
+
+    vec3F probe_low_color(low_color + initial_ofs);
+    for (uint i = 0; i < num_trials; i++) {
+      int r = math::clamp((int)floor(probe_low_color[0]), 0, 31);
+      int g = math::clamp((int)floor(probe_low_color[1]), 0, 63);
+      int b = math::clamp((int)floor(probe_low_color[2]), 0, 31);
+      probe_low[i] = b | (g << 5U) | (r << 11U);
+
+      probe_low_color += scaled_principle_axis;
+    }
+
+    vec3F probe_high_color(high_color + initial_ofs);
+    for (uint i = 0; i < num_trials; i++) {
+      int r = math::clamp((int)floor(probe_high_color[0]), 0, 31);
+      int g = math::clamp((int)floor(probe_high_color[1]), 0, 63);
+      int b = math::clamp((int)floor(probe_high_color[2]), 0, 31);
+      probe_high[i] = b | (g << 5U) | (r << 11U);
+
+      probe_high_color += scaled_principle_axis;
+    }
+
+    uint best_l = low16;
+    uint best_h = high16;
+
+    enum { cMaxHash = 4 };
+    uint64 hash[cMaxHash];
+    for (uint i = 0; i < cMaxHash; i++)
+      hash[i] = 0;
+
+    uint c = best_l | (best_h << 16);
+    c = fast_hash(&c, sizeof(c));
+    hash[(c >> 6) & 3] = 1ULL << (c & 63);
+
+    for (uint i = 0; i < num_trials; i++) {
+      for (uint j = 0; j < num_trials; j++) {
+        uint l = probe_low[i];
+        uint h = probe_high[j];
+        if (l < h)
+          utils::swap(l, h);
+
+        uint c = l | (h << 16);
+        c = fast_hash(&c, sizeof(c));
+        uint64 mask = 1ULL << (c & 63);
+        uint ofs = (c >> 6) & 3;
+        if (hash[ofs] & mask)
+          continue;
+
+        hash[ofs] |= mask;
+
+        uint64 new_error = determine_error(n, pBlock, l, h, cur_error);
+        if (new_error < cur_error) {
+          best_l = l;
+          best_h = h;
+          cur_error = new_error;
+        }
+      }
+    }
+
+    bool improved = false;
+
+    if ((best_l != low16) || (best_h != high16)) {
+      low16 = best_l;
+      high16 = best_h;
+
+      determine_selectors(n, pBlock, low16, high16, pSelectors);
+      improved = true;
+    }
+
+    if (refine_endpoints(n, pBlock, low16, high16, pSelectors)) {
+      improved = true;
+
+      cur_error = determine_error(n, pBlock, low16, high16, cUINT64_MAX);
+      if (!cur_error)
+        return;
+    }
+
+    if (!improved)
+      break;
+
+  }  // iter
+
+  //uint64 end_error = determine_error(n, pBlock, low16, high16, UINT64_MAX);
+  //if (end_error > orig_error) DebugBreak();
+}
+
+static void compress_solid_block(uint n, uint ave_color[3], uint& low16, uint& high16, uint8* pSelectors) {
+  uint r = ave_color[0];
+  uint g = ave_color[1];
+  uint b = ave_color[2];
+
+  memset(pSelectors, 2, n);
+
+  low16 = (ryg_dxt::OMatch5[r][0] << 11) | (ryg_dxt::OMatch6[g][0] << 5) | ryg_dxt::OMatch5[b][0];
+  high16 = (ryg_dxt::OMatch5[r][1] << 11) | (ryg_dxt::OMatch6[g][1] << 5) | ryg_dxt::OMatch5[b][1];
+}
+
+void compress_color_block(uint n, const color_quad_u8* block, uint& low16, uint& high16, uint8* pSelectors, bool refine) {
+  CRNLIB_ASSERT((n & 15) == 0);
+
+  uint ave_color[3];
+  float axis[3];
+
+  if (!optimize_block_colors(n, block, low16, high16, ave_color, axis)) {
+    compress_solid_block(n, ave_color, low16, high16, pSelectors);
+  } else {
+    if (!determine_selectors(n, block, low16, high16, pSelectors))
+      compress_solid_block(n, ave_color, low16, high16, pSelectors);
+    else {
+      if (refine_block(n, block, low16, high16, pSelectors))
+        determine_selectors(n, block, low16, high16, pSelectors);
+
+      if (refine)
+        refine_endpoints2(n, block, low16, high16, pSelectors, axis);
+    }
+  }
+
+  if (low16 < high16) {
+    utils::swap(low16, high16);
+    for (uint i = 0; i < n; i++)
+      pSelectors[i] ^= 1;
+  }
+}
+
+void compress_color_block(dxt1_block* pDXT1_block, const color_quad_u8* pBlock, bool refine) {
+  uint8 color_selectors[16];
+  uint low16, high16;
+  dxt_fast::compress_color_block(16, pBlock, low16, high16, color_selectors, refine);
+
+  pDXT1_block->set_low_color(static_cast<uint16>(low16));
+  pDXT1_block->set_high_color(static_cast<uint16>(high16));
+
+  uint mask = 0;
+  for (int i = 15; i >= 0; i--) {
+    mask <<= 2;
+    mask |= color_selectors[i];
+  }
+
+  pDXT1_block->m_selectors[0] = (uint8)(mask & 0xFF);
+  pDXT1_block->m_selectors[1] = (uint8)((mask >> 8) & 0xFF);
+  pDXT1_block->m_selectors[2] = (uint8)((mask >> 16) & 0xFF);
+  pDXT1_block->m_selectors[3] = (uint8)((mask >> 24) & 0xFF);
+}
+
+void compress_alpha_block(uint n, const color_quad_u8* block, uint& low8, uint& high8, uint8* pSelectors, uint comp_index) {
+  int min, max;
+  min = max = block[0][comp_index];
+
+  for (uint i = 1; i < n; i++) {
+    min = math::minimum<int>(min, block[i][comp_index]);
+    max = math::maximum<int>(max, block[i][comp_index]);
+  }
+
+  low8 = max;
+  high8 = min;
+
+  int dist = max - min;
+  int bias = min * 7 - (dist >> 1);
+  int dist4 = dist * 4;
+  int dist2 = dist * 2;
+
+  for (uint i = 0; i < n; i++) {
+    int a = block[i][comp_index] * 7 - bias;
+    int ind, t;
+
+    t = (dist4 - a) >> 31;
+    ind = t & 4;
+    a -= dist4 & t;
+    t = (dist2 - a) >> 31;
+    ind += t & 2;
+    a -= dist2 & t;
+    t = (dist - a) >> 31;
+    ind += t & 1;
+
+    ind = -ind & 7;
+    ind ^= (2 > ind);
+
+    pSelectors[i] = static_cast<uint8>(ind);
+  }
+}
+
+void compress_alpha_block(dxt5_block* pDXT5_block, const color_quad_u8* pBlock, uint comp_index) {
+  uint8 selectors[16];
+  uint low8, high8;
+
+  compress_alpha_block(16, pBlock, low8, high8, selectors, comp_index);
+
+  pDXT5_block->set_low_alpha(low8);
+  pDXT5_block->set_high_alpha(high8);
+
+  uint mask = 0;
+  uint bits = 0;
+  uint8* pDst = pDXT5_block->m_selectors;
+
+  for (uint i = 0; i < 16; i++) {
+    mask |= (selectors[i] << bits);
+
+    if ((bits += 3) >= 8) {
+      *pDst++ = static_cast<uint8>(mask);
+      mask >>= 8;
+      bits -= 8;
+    }
+  }
+}
+
+void find_representative_colors(uint n, const color_quad_u8* pBlock, color_quad_u8& lo, color_quad_u8& hi) {
+  uint64 ave64[3];
+  ave64[0] = 0;
+  ave64[1] = 0;
+  ave64[2] = 0;
+
+  for (uint i = 0; i < n; i++) {
+    ave64[0] += pBlock[i].r;
+    ave64[1] += pBlock[i].g;
+    ave64[2] += pBlock[i].b;
+  }
+
+  uint ave[3];
+  ave[0] = static_cast<uint>((ave64[0] + (n / 2)) / n);
+  ave[1] = static_cast<uint>((ave64[1] + (n / 2)) / n);
+  ave[2] = static_cast<uint>((ave64[2] + (n / 2)) / n);
+
+  int furthest_dist = -1;
+  uint furthest_index = 0;
+  for (uint i = 0; i < n; i++) {
+    int r = pBlock[i].r - ave[0];
+    int g = pBlock[i].g - ave[1];
+    int b = pBlock[i].b - ave[2];
+    int dist = r * r + g * g + b * b;
+    if (dist > furthest_dist) {
+      furthest_dist = dist;
+      furthest_index = i;
+    }
+  }
+
+  color_quad_u8 lo_color(pBlock[furthest_index]);
+
+  int opp_dist = -1;
+  uint opp_index = 0;
+  for (uint i = 0; i < n; i++) {
+    int r = pBlock[i].r - lo_color.r;
+    int g = pBlock[i].g - lo_color.g;
+    int b = pBlock[i].b - lo_color.b;
+    int dist = r * r + g * g + b * b;
+    if (dist > opp_dist) {
+      opp_dist = dist;
+      opp_index = i;
+    }
+  }
+
+  color_quad_u8 hi_color(pBlock[opp_index]);
+
+  for (uint i = 0; i < 3; i++) {
+    lo_color[i] = static_cast<uint8>((lo_color[i] + ave[i]) >> 1);
+    hi_color[i] = static_cast<uint8>((hi_color[i] + ave[i]) >> 1);
+  }
+
+  const uint cMaxIters = 4;
+  for (uint iter_index = 0; iter_index < cMaxIters; iter_index++) {
+    if ((lo_color[0] == hi_color[0]) && (lo_color[1] == hi_color[1]) && (lo_color[2] == hi_color[2]))
+      break;
+
+    uint64 new_color[2][3];
+    uint weight[2];
+
+    utils::zero_object(new_color);
+    utils::zero_object(weight);
+
+    int vec_r = hi_color[0] - lo_color[0];
+    int vec_g = hi_color[1] - lo_color[1];
+    int vec_b = hi_color[2] - lo_color[2];
+
+    int lo_dot = vec_r * lo_color[0] + vec_g * lo_color[1] + vec_b * lo_color[2];
+    int hi_dot = vec_r * hi_color[0] + vec_g * hi_color[1] + vec_b * hi_color[2];
+    int mid_dot = lo_dot + hi_dot;
+
+    vec_r *= 2;
+    vec_g *= 2;
+    vec_b *= 2;
+
+    for (uint i = 0; i < n; i++) {
+      const color_quad_u8& c = pBlock[i];
+
+      const int dot = c[0] * vec_r + c[1] * vec_g + c[2] * vec_b;
+      const uint match_index = (dot > mid_dot);
+
+      new_color[match_index][0] += c.r;
+      new_color[match_index][1] += c.g;
+      new_color[match_index][2] += c.b;
+      weight[match_index]++;
+    }
+
+    if ((!weight[0]) || (!weight[1]))
+      break;
+
+    uint8 new_color8[2][3];
+
+    for (uint j = 0; j < 2; j++)
+      for (uint i = 0; i < 3; i++)
+        new_color8[j][i] = static_cast<uint8>((new_color[j][i] + (weight[j] / 2)) / weight[j]);
+
+    if ((new_color8[0][0] == lo_color[0]) && (new_color8[0][1] == lo_color[1]) && (new_color8[0][2] == lo_color[2]) &&
+        (new_color8[1][0] == hi_color[0]) && (new_color8[1][1] == hi_color[1]) && (new_color8[1][2] == hi_color[2]))
+      break;
+
+    for (uint i = 0; i < 3; i++) {
+      lo_color[i] = new_color8[0][i];
+      hi_color[i] = new_color8[1][i];
+    }
+  }
+
+  uint energy[2] = {0, 0};
+  for (uint i = 0; i < 3; i++) {
+    energy[0] += lo_color[i] * lo_color[i];
+    energy[1] += hi_color[i] * hi_color[i];
+  }
+
+  if (energy[0] > energy[1])
+    utils::swap(lo_color, hi_color);
+
+  lo = lo_color;
+  hi = hi_color;
+}
+
+}  // namespace dxt_fast
+
+}  // namespace crnlib
@@ -0,0 +1,19 @@
+// File: crn_dxt_fast.h
+// See Copyright Notice and license at the end of inc/crnlib.h
+#pragma once
+#include "crn_color.h"
+#include "crn_dxt.h"
+
+namespace crnlib {
+namespace dxt_fast {
+void compress_color_block(uint n, const color_quad_u8* block, uint& low16, uint& high16, uint8* pSelectors, bool refine = false);
+void compress_color_block(dxt1_block* pDXT1_block, const color_quad_u8* pBlock, bool refine = false);
+
+void compress_alpha_block(uint n, const color_quad_u8* block, uint& low8, uint& high8, uint8* pSelectors, uint comp_index);
+void compress_alpha_block(dxt5_block* pDXT5_block, const color_quad_u8* pBlock, uint comp_index);
+
+void find_representative_colors(uint n, const color_quad_u8* pBlock, color_quad_u8& lo, color_quad_u8& hi);
+
+}  // namespace dxt_fast
+
+}  // namespace crnlib
@@ -0,0 +1,213 @@
+// File: crn_dxt_hc.h
+// See Copyright Notice and license at the end of inc/crnlib.h
+#pragma once
+#include "crn_dxt1.h"
+#include "crn_dxt5a.h"
+#include "crn_dxt_endpoint_refiner.h"
+#include "crn_image.h"
+#include "crn_dxt.h"
+#include "crn_image.h"
+#include "crn_dxt_hc_common.h"
+#include "crn_tree_clusterizer.h"
+#include "crn_threading.h"
+
+#define CRN_NO_FUNCTION_DEFINITIONS
+#include "../inc/crnlib.h"
+
+namespace crnlib {
+const uint cTotalCompressionPhases = 25;
+
+class dxt_hc {
+ public:
+  dxt_hc();
+  ~dxt_hc();
+
+  struct endpoint_indices_details {
+    union {
+      struct {
+        uint16 color;
+        uint16 alpha0;
+        uint16 alpha1;
+      };
+      uint16 component[3];
+    };
+    uint8 reference;
+    endpoint_indices_details() { utils::zero_object(*this); }
+  };
+
+  struct selector_indices_details {
+    union {
+      struct {
+        uint16 color;
+        uint16 alpha0;
+        uint16 alpha1;
+      };
+      uint16 component[3];
+    };
+    selector_indices_details() { utils::zero_object(*this); }
+  };
+
+  struct tile_details {
+    crnlib::vector<color_quad_u8> pixels;
+    float weight;
+    vec<6, float> color_endpoint;
+    vec<2, float> alpha_endpoints[2];
+    uint16 cluster_indices[3];
+  };
+  crnlib::vector<tile_details> m_tiles;
+  uint m_num_tiles;
+  float m_color_derating[cCRNMaxLevels][8];
+  float m_alpha_derating[8];
+  float m_uint8_to_float[256];
+
+  color_quad_u8 (*m_blocks)[16];
+  uint m_num_blocks;
+  crnlib::vector<float> m_block_weights;
+  crnlib::vector<uint8> m_block_encodings;
+  crnlib::vector<uint64> m_block_selectors[3];
+  crnlib::vector<uint32> m_color_selectors;
+  crnlib::vector<uint64> m_alpha_selectors;
+  crnlib::vector<bool> m_color_selectors_used;
+  crnlib::vector<bool> m_alpha_selectors_used;
+  crnlib::vector<uint> m_tile_indices;
+  crnlib::vector<endpoint_indices_details> m_endpoint_indices;
+  crnlib::vector<selector_indices_details> m_selector_indices;
+
+  struct params {
+    params()
+        : m_num_blocks(0),
+          m_num_levels(0),
+          m_num_faces(0),
+          m_format(cDXT1),
+          m_perceptual(true),
+          m_hierarchical(true),
+          m_color_endpoint_codebook_size(3072),
+          m_color_selector_codebook_size(3072),
+          m_alpha_endpoint_codebook_size(3072),
+          m_alpha_selector_codebook_size(3072),
+          m_adaptive_tile_color_psnr_derating(2.0f),
+          m_adaptive_tile_alpha_psnr_derating(2.0f),
+          m_adaptive_tile_color_alpha_weighting_ratio(3.0f),
+          m_pTask_pool(0),
+          m_debugging(false),
+          m_pProgress_func(0),
+          m_pProgress_func_data(0) {
+      m_alpha_component_indices[0] = 3;
+      m_alpha_component_indices[1] = 0;
+      for (uint i = 0; i < cCRNMaxLevels; i++) {
+        m_levels[i].m_first_block = 0;
+        m_levels[i].m_num_blocks = 0;
+        m_levels[i].m_block_width = 0;
+        m_levels[i].m_weight = 0.0f;
+      }
+    }
+
+    uint m_num_blocks;
+    uint m_num_levels;
+    uint m_num_faces;
+
+    struct {
+      uint m_first_block;
+      uint m_num_blocks;
+      uint m_block_width;
+      float m_weight;
+    } m_levels[cCRNMaxLevels];
+
+    dxt_format m_format;
+    bool m_perceptual;
+    bool m_hierarchical;
+
+    uint m_color_endpoint_codebook_size;
+    uint m_color_selector_codebook_size;
+    uint m_alpha_endpoint_codebook_size;
+    uint m_alpha_selector_codebook_size;
+
+    float m_adaptive_tile_color_psnr_derating;
+    float m_adaptive_tile_alpha_psnr_derating;
+    float m_adaptive_tile_color_alpha_weighting_ratio;
+    uint m_alpha_component_indices[2];
+
+    task_pool* m_pTask_pool;
+    bool m_debugging;
+    crn_progress_callback_func m_pProgress_func;
+    void* m_pProgress_func_data;
+  };
+
+  void clear();
+  bool compress(
+    color_quad_u8 (*blocks)[16],
+    crnlib::vector<endpoint_indices_details>& endpoint_indices,
+    crnlib::vector<selector_indices_details>& selector_indices,
+    crnlib::vector<uint32>& color_endpoints,
+    crnlib::vector<uint32>& alpha_endpoints,
+    crnlib::vector<uint32>& color_selectors,
+    crnlib::vector<uint64>& alpha_selectors,
+    const params& p
+  );
+
+ private:
+  params m_params;
+
+  uint m_num_alpha_blocks;
+  bool m_has_color_blocks;
+  bool m_has_etc_color_blocks;
+  bool m_has_subblocks;
+
+  enum {
+    cColor = 0,
+    cAlpha0 = 1,
+    cAlpha1 = 2,
+    cNumComps = 3
+  };
+
+  struct color_cluster {
+    color_cluster() : first_endpoint(0), second_endpoint(0) {}
+    crnlib::vector<uint> blocks[3];
+    crnlib::vector<color_quad_u8> pixels;
+    uint first_endpoint;
+    uint second_endpoint;
+    color_quad_u8 color_values[4];
+  };
+  crnlib::vector<color_cluster> m_color_clusters;
+
+  struct alpha_cluster {
+    alpha_cluster() : first_endpoint(0), second_endpoint(0) {}
+    crnlib::vector<uint> blocks[3];
+    crnlib::vector<color_quad_u8> pixels;
+    uint first_endpoint;
+    uint second_endpoint;
+    uint alpha_values[8];
+    bool refined_alpha;
+    uint refined_alpha_values[8];
+  };
+  crnlib::vector<alpha_cluster> m_alpha_clusters;
+
+  crn_thread_id_t m_main_thread_id;
+  bool m_canceled;
+  task_pool* m_pTask_pool;
+
+  int m_prev_phase_index;
+  int m_prev_percentage_complete;
+
+  vec<6, float> palettize_color(color_quad_u8* pixels, uint pixels_count);
+  vec<2, float> palettize_alpha(color_quad_u8* pixels, uint pixels_count, uint comp_index);
+  void determine_tiles_task(uint64 data, void* pData_ptr);
+  void determine_tiles_task_etc(uint64 data, void* pData_ptr);
+
+  void determine_color_endpoint_codebook_task(uint64 data, void* pData_ptr);
+  void determine_color_endpoint_codebook_task_etc(uint64 data, void* pData_ptr);
+  void determine_color_endpoint_clusters_task(uint64 data, void* pData_ptr);
+  void determine_color_endpoints();
+
+  void determine_alpha_endpoint_codebook_task(uint64 data, void* pData_ptr);
+  void determine_alpha_endpoint_clusters_task(uint64 data, void* pData_ptr);
+  void determine_alpha_endpoints();
+
+  void create_color_selector_codebook_task(uint64 data, void* pData_ptr);
+  void create_color_selector_codebook();
+
+  void create_alpha_selector_codebook_task(uint64 data, void* pData_ptr);
+  void create_alpha_selector_codebook();
+
+  bool update_progress(uint phase_index, uint subphase_index, uint subphase_total);
+};
+
+}  // namespace crnlib
@@ -0,0 +1,41 @@
+// File: crn_dxt_hc_common.cpp
+// See Copyright Notice and license at the end of inc/crnlib.h
+#include "crn_core.h"
+#include "crn_dxt_hc_common.h"
+
+namespace crnlib {
+chunk_encoding_desc g_chunk_encodings[cNumChunkEncodings] =
+    {
+        {1, {{0, 0, 8, 8, 0}}},
+
+        {2, {{0, 0, 8, 4, 1}, {0, 4, 8, 4, 2}}},
+        {2, {{0, 0, 4, 8, 3}, {4, 0, 4, 8, 4}}},
+
+        {3, {{0, 0, 8, 4, 1}, {0, 4, 4, 4, 7}, {4, 4, 4, 4, 8}}},
+        {3, {{0, 4, 8, 4, 2}, {0, 0, 4, 4, 5}, {4, 0, 4, 4, 6}}},
+
+        {3, {{0, 0, 4, 8, 3}, {4, 0, 4, 4, 6}, {4, 4, 4, 4, 8}}},
+        {3, {{4, 0, 4, 8, 4}, {0, 0, 4, 4, 5}, {0, 4, 4, 4, 7}}},
+
+        {4, {{0, 0, 4, 4, 5}, {4, 0, 4, 4, 6}, {0, 4, 4, 4, 7}, {4, 4, 4, 4, 8}}}};
+
+chunk_tile_desc g_chunk_tile_layouts[cNumChunkTileLayouts] =
+    {
+        // 2x2
+        {0, 0, 8, 8, 0},
+
+        // 2x1
+        {0, 0, 8, 4, 1},
+        {0, 4, 8, 4, 2},
+
+        // 1x2
+        {0, 0, 4, 8, 3},
+        {4, 0, 4, 8, 4},
+
+        // 1x1
+        {0, 0, 4, 4, 5},
+        {4, 0, 4, 4, 6},
+        {0, 4, 4, 4, 7},
+        {4, 4, 4, 4, 8}};
+
+}  // namespace crnlib
@@ -0,0 +1,40 @@
+// File: crn_dxt_hc_common.h
+// See Copyright Notice and license at the end of inc/crnlib.h
+#pragma once
+
+namespace crnlib {
+struct chunk_tile_desc {
+  // These values are in pixels, and always a multiple of cBlockPixelWidth/cBlockPixelHeight.
+  uint m_x_ofs;
+  uint m_y_ofs;
+  uint m_width;
+  uint m_height;
+  uint m_layout_index;
+};
+
+struct chunk_encoding_desc {
+  uint m_num_tiles;
+  chunk_tile_desc m_tiles[4];
+};
+
+const uint cChunkPixelWidth = 8;
+const uint cChunkPixelHeight = 8;
+const uint cChunkBlockWidth = 2;
+const uint cChunkBlockHeight = 2;
+
+const uint cChunkMaxTiles = 4;
+
+const uint cBlockPixelWidthShift = 2;
+const uint cBlockPixelHeightShift = 2;
+
+const uint cBlockPixelWidth = 4;
+const uint cBlockPixelHeight = 4;
+
+const uint cNumChunkEncodings = 8;
+extern chunk_encoding_desc g_chunk_encodings[cNumChunkEncodings];
+
+const uint cNumChunkTileLayouts = 9;
+const uint cFirst4x4ChunkTileLayout = 5;
+extern chunk_tile_desc g_chunk_tile_layouts[cNumChunkTileLayouts];
+
+}  // namespace crnlib
@@ -0,0 +1,256 @@
+// File: crn_dxt_image.h
+// See Copyright Notice and license at the end of inc/crnlib.h
+#pragma once
+#include "crn_dxt1.h"
+#include "crn_dxt5a.h"
+#include "crn_etc.h"
+#if CRNLIB_SUPPORT_ETC_A1
+#include "crn_etc_a1.h"
+#endif
+#include "crn_image.h"
+
+#define CRNLIB_SUPPORT_ATI_COMPRESS 0
+
+namespace crnlib {
+class task_pool;
+
+class dxt_image {
+ public:
+  dxt_image();
+  dxt_image(const dxt_image& other);
+  dxt_image& operator=(const dxt_image& rhs);
+
+  void clear();
+
+  inline bool is_valid() const { return m_blocks_x > 0; }
+
+  uint get_width() const { return m_width; }
+  uint get_height() const { return m_height; }
+
+  uint get_blocks_x() const { return m_blocks_x; }
+  uint get_blocks_y() const { return m_blocks_y; }
+  uint get_total_blocks() const { return m_blocks_x * m_blocks_y; }
+
+  uint get_elements_per_block() const { return m_num_elements_per_block; }
+  uint get_bytes_per_block() const { return m_bytes_per_block; }
+
+  dxt_format get_format() const { return m_format; }
+
+  bool has_color() const { return (m_format == cDXT1) || (m_format == cDXT1A) || (m_format == cDXT3) || (m_format == cDXT5) || (m_format == cETC1) || (m_format == cETC2) || (m_format == cETC2A) || (m_format == cETC1S) || (m_format == cETC2AS); }
+
+  // Will be pretty slow if the image is DXT1, as this method scans for alpha blocks/selectors.
+  bool has_alpha() const;
+
+  enum element_type {
+    cUnused = 0,
+
+    cColorDXT1,  // DXT1 color block
+
+    cAlphaDXT3,  // DXT3 alpha block (only)
+    cAlphaDXT5,  // DXT5 alpha block (only)
+
+    cColorETC1,  // ETC1 color block
+    cColorETC2,  // ETC2 color block
+
+    cAlphaETC2,  // ETC2 alpha block (only)
+  };
+
+  element_type get_element_type(uint element_index) const {
+    CRNLIB_ASSERT(element_index < m_num_elements_per_block);
+    return m_element_type[element_index];
+  }
+
+  // Returns -1 for RGB, or a component index in [0,3].
+  int8 get_element_component_index(uint element_index) const {
+    CRNLIB_ASSERT(element_index < m_num_elements_per_block);
+    return m_element_component_index[element_index];
+  }
+
+  struct element {
+    uint8 m_bytes[8];
+
+    uint get_le_word(uint index) const {
+      CRNLIB_ASSERT(index < 4);
+      return m_bytes[index * 2] | (m_bytes[index * 2 + 1] << 8);
+    }
+    uint get_be_word(uint index) const {
+      CRNLIB_ASSERT(index < 4);
+      return m_bytes[index * 2 + 1] | (m_bytes[index * 2] << 8);
+    }
+
+    void set_le_word(uint index, uint val) {
+      CRNLIB_ASSERT((index < 4) && (val <= cUINT16_MAX));
+      m_bytes[index * 2] = static_cast<uint8>(val & 0xFF);
+      m_bytes[index * 2 + 1] = static_cast<uint8>((val >> 8) & 0xFF);
+    }
+    void set_be_word(uint index, uint val) {
+      CRNLIB_ASSERT((index < 4) && (val <= cUINT16_MAX));
+      m_bytes[index * 2 + 1] = static_cast<uint8>(val & 0xFF);
+      m_bytes[index * 2] = static_cast<uint8>((val >> 8) & 0xFF);
+    }
+
+    void clear() {
+      memset(this, 0, sizeof(*this));
+    }
+  };
+
+  typedef crnlib::vector<element> element_vec;
+
+  bool init(dxt_format fmt, uint width, uint height, bool clear_elements);
+  bool init(dxt_format fmt, uint width, uint height, uint num_elements, element* pElements, bool create_copy);
+
+  struct pack_params {
+    pack_params() {
+      clear();
+    }
+
+    void clear() {
+      m_quality = cCRNDXTQualityUber;
+      m_perceptual = true;
+      m_dithering = false;
+      m_grayscale_sampling = false;
+      m_use_both_block_types = true;
+      m_endpoint_caching = true;
+      m_compressor = cCRNDXTCompressorCRN;
+      m_pProgress_callback = NULL;
+      m_pProgress_callback_user_data_ptr = NULL;
+      m_dxt1a_alpha_threshold = 128;
+      m_num_helper_threads = 0;
+      m_progress_start = 0;
+      m_progress_range = 100;
+      m_use_transparent_indices_for_black = false;
+      m_pTask_pool = NULL;
+    }
+
+    void init(const crn_comp_params& params) {
+      m_perceptual = (params.m_flags & cCRNCompFlagPerceptual) != 0;
+      m_num_helper_threads = params.m_num_helper_threads;
+      m_use_both_block_types = (params.m_flags & cCRNCompFlagUseBothBlockTypes) != 0;
+      m_use_transparent_indices_for_black = (params.m_flags & cCRNCompFlagUseTransparentIndicesForBlack) != 0;
+      m_dxt1a_alpha_threshold = params.m_dxt1a_alpha_threshold;
+      m_quality = params.m_dxt_quality;
+      m_endpoint_caching = (params.m_flags & cCRNCompFlagDisableEndpointCaching) == 0;
+      m_grayscale_sampling = (params.m_flags & cCRNCompFlagGrayscaleSampling) != 0;
+      m_compressor = params.m_dxt_compressor_type;
+    }
+
+    uint m_dxt1a_alpha_threshold;
+
+    uint m_num_helper_threads;
+
+    crn_dxt_quality m_quality;
+
+    crn_dxt_compressor_type m_compressor;
+
+    bool m_perceptual;
+    bool m_dithering;
+    bool m_grayscale_sampling;
+    bool m_use_both_block_types;
+    bool m_endpoint_caching;
+    bool m_use_transparent_indices_for_black;
+
+    typedef bool (*progress_callback_func)(uint percentage_complete, void* pUser_data_ptr);
+    progress_callback_func m_pProgress_callback;
+    void* m_pProgress_callback_user_data_ptr;
+
+    uint m_progress_start;
+    uint m_progress_range;
+
+    task_pool* m_pTask_pool;
+  };
+
+  bool init(dxt_format fmt, const image_u8& img, const pack_params& p = dxt_image::pack_params());
+
+  bool unpack(image_u8& img) const;
+
+  void endian_swap();
+
+  uint get_total_elements() const { return m_elements.size(); }
+
+  const element_vec& get_element_vec() const { return m_elements; }
+  element_vec& get_element_vec() { return m_elements; }
+
+  const element& get_element(uint block_x, uint block_y, uint element_index) const;
+  element& get_element(uint block_x, uint block_y, uint element_index);
+
+  const element* get_element_ptr() const { return m_pElements; }
+  element* get_element_ptr() { return m_pElements; }
+
+  uint get_size_in_bytes() const { return m_elements.size() * sizeof(element); }
+  uint get_row_pitch_in_bytes() const { return m_blocks_x * m_bytes_per_block; }
+
+  color_quad_u8 get_pixel(uint x, uint y) const;
+  uint get_pixel_alpha(uint x, uint y, uint element_index) const;
+
+  void set_pixel(uint x, uint y, const color_quad_u8& c, bool perceptual = true);
+
+  // get_block_pixels() only sets those components stored in the image!
+  bool get_block_pixels(uint block_x, uint block_y, color_quad_u8* pPixels) const;
+
+  struct set_block_pixels_context {
+    dxt1_endpoint_optimizer m_dxt1_optimizer;
+    dxt5_endpoint_optimizer m_dxt5_optimizer;
+    pack_etc1_block_context m_etc1_optimizer;
+#if CRNLIB_SUPPORT_ETC_A1
+    etc_a1::pack_etc1_block_context m_etc1_a1_optimizer;
+#endif
+  };
+
+  void set_block_pixels(uint block_x, uint block_y, const color_quad_u8* pPixels, const pack_params& p, set_block_pixels_context& context);
+  void set_block_pixels(uint block_x, uint block_y, const color_quad_u8* pPixels, const pack_params& p);
+
+  void get_block_endpoints(uint block_x, uint block_y, uint element_index, uint& packed_low_endpoint, uint& packed_high_endpoint) const;
+
+  // Returns a value representing the component(s) that were actually set, where -1 = RGB.
+  // This method does not always set every component!
+  int get_block_endpoints(uint block_x, uint block_y, uint element_index, color_quad_u8& low_endpoint, color_quad_u8& high_endpoint, bool scaled = true) const;
+
+  // pColors should point to a 16 entry array, to handle DXT3.
+  // Returns the number of block colors: 3, 4, 6, 8, or 16.
+  uint get_block_colors(uint block_x, uint block_y, uint element_index, color_quad_u8* pColors, uint subblock_index = 0);
+
+  uint get_subblock_index(uint x, uint y, uint element_index) const;
+  uint get_total_subblocks(uint element_index) const;
+
+  uint get_selector(uint x, uint y, uint element_index) const;
+
+  void change_dxt1_to_dxt1a();
+
+  bool can_flip(uint axis_index);
+
+  // Returns true if the texture can actually be flipped.
+  bool flip_x();
+  bool flip_y();
+
+ private:
+  element_vec m_elements;
+  element* m_pElements;
+
+  uint m_width;
+  uint m_height;
+
+  uint m_blocks_x;
+  uint m_blocks_y;
+  uint m_total_blocks;
+  uint m_total_elements;
+
+  uint m_num_elements_per_block;  // 1 or 2
+  uint m_bytes_per_block;         // 8 or 16
+
+  int8 m_element_component_index[2];
+  element_type m_element_type[2];
+
+  dxt_format m_format;  // DXT1, 1A, 3, 5, N/3DC, 5A, or an ETC format (ETC1, ETC2, ETC2A, ETC1S, ETC2AS)
+
+  bool init_internal(dxt_format fmt, uint width, uint height);
+  void init_task(uint64 data, void* pData_ptr);
+
+#if CRNLIB_SUPPORT_ATI_COMPRESS
+  bool init_ati_compress(dxt_format fmt, const image_u8& img, const pack_params& p);
+#endif
+
+  void flip_col(uint x);
+  void flip_row(uint y);
+};
+
+}  // namespace crnlib
@@ -0,0 +1,182 @@
+// File: crn_dynamic_stream.h
+// See Copyright Notice and license at the end of inc/crnlib.h
+#pragma once
+#include "crn_data_stream.h"
+
+namespace crnlib {
+class dynamic_stream : public data_stream {
+ public:
+  dynamic_stream(uint initial_size, const char* pName = "dynamic_stream", uint attribs = cDataStreamSeekable | cDataStreamWritable | cDataStreamReadable)
+      : data_stream(pName, attribs),
+        m_ofs(0) {
+    open(initial_size, pName, attribs);
+  }
+
+  dynamic_stream(const void* pBuf, uint size, const char* pName = "dynamic_stream", uint attribs = cDataStreamSeekable | cDataStreamWritable | cDataStreamReadable)
+      : data_stream(pName, attribs),
+        m_ofs(0) {
+    open(pBuf, size, pName, attribs);
+  }
+
+  dynamic_stream()
+      : data_stream(),
+        m_ofs(0) {
+    open();
+  }
+
+  virtual ~dynamic_stream() {
+  }
+
+  bool open(uint initial_size = 0, const char* pName = "dynamic_stream", uint attribs = cDataStreamSeekable | cDataStreamWritable | cDataStreamReadable) {
+    close();
+
+    m_opened = true;
+    m_buf.clear();
+    m_buf.resize(initial_size);
+    m_ofs = 0;
+    m_name.set(pName ? pName : "dynamic_stream");
+    m_attribs = static_cast<attribs_t>(attribs);
+    return true;
+  }
+
+  bool reopen(const char* pName, uint attribs) {
+    if (!m_opened) {
+      return open(0, pName, attribs);
+    }
+
+    m_name.set(pName ? pName : "dynamic_stream");
+    m_attribs = static_cast<attribs_t>(attribs);
+    return true;
+  }
+
+  bool open(const void* pBuf, uint size, const char* pName = "dynamic_stream", uint attribs = cDataStreamSeekable | cDataStreamWritable | cDataStreamReadable) {
+    if (!m_opened) {
+      m_opened = true;
+      m_buf.resize(size);
+      if (size) {
+        CRNLIB_ASSERT(pBuf);
+        memcpy(&m_buf[0], pBuf, size);
+      }
+      m_ofs = 0;
+      m_name.set(pName ? pName : "dynamic_stream");
+      m_attribs = static_cast<attribs_t>(attribs);
+      return true;
+    }
+
+    return false;
+  }
+
+  virtual bool close() {
+    if (m_opened) {
+      m_opened = false;
+      m_buf.clear();
+      m_ofs = 0;
+      return true;
+    }
+
+    return false;
+  }
+
+  const crnlib::vector<uint8>& get_buf() const { return m_buf; }
+  crnlib::vector<uint8>& get_buf() { return m_buf; }
+
+  void reserve(uint size) {
+    if (m_opened) {
+      m_buf.reserve(size);
+    }
+  }
+
+  virtual const void* get_ptr() const { return m_buf.empty() ? NULL : &m_buf[0]; }
+
+  virtual uint read(void* pBuf, uint len) {
+    CRNLIB_ASSERT(pBuf && (len <= 0x7FFFFFFF));
+
+    if ((!m_opened) || (!is_readable()) || (!len))
+      return 0;
+
+    CRNLIB_ASSERT(m_ofs <= m_buf.size());
+
+    uint bytes_left = m_buf.size() - m_ofs;
+
+    len = math::minimum<uint>(len, bytes_left);
+
+    if (len)
+      memcpy(pBuf, &m_buf[m_ofs], len);
+
+    m_ofs += len;
+
+    return len;
+  }
+
+  virtual uint write(const void* pBuf, uint len) {
+    CRNLIB_ASSERT(pBuf && (len <= 0x7FFFFFFF));
+
+    if ((!m_opened) || (!is_writable()) || (!len))
+      return 0;
+
+    CRNLIB_ASSERT(m_ofs <= m_buf.size());
+
+    uint new_ofs = m_ofs + len;
+    if (new_ofs > m_buf.size())
+      m_buf.resize(new_ofs);
+
+    memcpy(&m_buf[m_ofs], pBuf, len);
+    m_ofs = new_ofs;
+
+    return len;
+  }
+
+  virtual bool flush() {
+    if (!m_opened)
+      return false;
+
+    return true;
+  }
+
+  virtual uint64 get_size() {
+    if (!m_opened)
+      return 0;
+
+    return m_buf.size();
+  }
+
+  virtual uint64 get_remaining() {
+    if (!m_opened)
+      return 0;
+
+    CRNLIB_ASSERT(m_ofs <= m_buf.size());
+
+    return m_buf.size() - m_ofs;
+  }
+
+  virtual uint64 get_ofs() {
+    if (!m_opened)
+      return 0;
+
+    return m_ofs;
+  }
+
+  virtual bool seek(int64 ofs, bool relative) {
+    if ((!m_opened) || (!is_seekable()))
+      return false;
+
+    int64 new_ofs = relative ? (m_ofs + ofs) : ofs;
+
+    if (new_ofs < 0)
+      return false;
+    else if (new_ofs > m_buf.size())
+      return false;
+
+    m_ofs = static_cast<uint>(new_ofs);
+
+    post_seek();
+
+    return true;
+  }
+
+ private:
+  crnlib::vector<uint8> m_buf;
+  uint m_ofs;
+};
+
+}  // namespace crnlib
@@ -0,0 +1,583 @@
+// File: crn_dynamic_string.cpp
+// See Copyright Notice and license at the end of inc/crnlib.h
+#include "crn_core.h"
+#include "crn_strutils.h"
+
+namespace crnlib {
+dynamic_string g_empty_dynamic_string;
+
+dynamic_string::dynamic_string(eVarArg, const char* p, ...)
+    : m_buf_size(0), m_len(0), m_pStr(NULL) {
+  CRNLIB_ASSERT(p);
+
+  va_list args;
+  va_start(args, p);
+  format_args(p, args);
+  va_end(args);
+}
+
+dynamic_string::dynamic_string(const char* p)
+    : m_buf_size(0), m_len(0), m_pStr(NULL) {
+  CRNLIB_ASSERT(p);
+  set(p);
+}
+
+dynamic_string::dynamic_string(const char* p, uint len)
+    : m_buf_size(0), m_len(0), m_pStr(NULL) {
+  CRNLIB_ASSERT(p);
+  set_from_buf(p, len);
+}
+
+dynamic_string::dynamic_string(const dynamic_string& other)
+    : m_buf_size(0), m_len(0), m_pStr(NULL) {
+  set(other);
+}
+
+void dynamic_string::clear() {
+  check();
+
+  if (m_pStr) {
+    crnlib_delete_array(m_pStr);
+    m_pStr = NULL;
+
+    m_len = 0;
+    m_buf_size = 0;
+  }
+}
+
+void dynamic_string::empty() {
+  truncate(0);
+}
+
+void dynamic_string::optimize() {
+  if (!m_len)
+    clear();
+  else {
+    uint min_buf_size = math::next_pow2((uint)m_len + 1);
+    if (m_buf_size > min_buf_size) {
+      char* p = crnlib_new_array<char>(min_buf_size);
+      memcpy(p, m_pStr, m_len + 1);
+
+      crnlib_delete_array(m_pStr);
+      m_pStr = p;
+
+      m_buf_size = static_cast<uint16>(min_buf_size);
+
+      check();
+    }
+  }
+}
+
+int dynamic_string::compare(const char* p, bool case_sensitive) const {
+  CRNLIB_ASSERT(p);
+
+  const int result = (case_sensitive ? strcmp : crn_stricmp)(get_ptr_priv(), p);
+
+  if (result < 0)
+    return -1;
+  else if (result > 0)
+    return 1;
+
+  return 0;
+}
+
+int dynamic_string::compare(const dynamic_string& rhs, bool case_sensitive) const {
+  return compare(rhs.get_ptr_priv(), case_sensitive);
+}
+
+dynamic_string& dynamic_string::set(const char* p, uint max_len) {
+  CRNLIB_ASSERT(p);
+
+  const uint len = math::minimum<uint>(max_len, static_cast<uint>(strlen(p)));
+  CRNLIB_ASSERT(len < cUINT16_MAX);
+
+  if ((!len) || (len >= cUINT16_MAX))
+    clear();
+  else if ((m_pStr) && (p >= m_pStr) && (p < (m_pStr + m_buf_size))) {
+    if (m_pStr != p)
+      memmove(m_pStr, p, len);
+    m_pStr[len] = '\0';
+    m_len = static_cast<uint16>(len);
+  } else if (ensure_buf(len, false)) {
+    // p may be longer than len when max_len truncates it, so terminate explicitly.
+    memcpy(m_pStr, p, len);
+    m_pStr[len] = '\0';
+    m_len = static_cast<uint16>(len);
+  }
+
+  check();
+
+  return *this;
+}
+
+dynamic_string& dynamic_string::set(const dynamic_string& other, uint max_len) {
+  if (this == &other) {
+    if (max_len < m_len) {
+      m_pStr[max_len] = '\0';
+      m_len = static_cast<uint16>(max_len);
+    }
+  } else {
+    const uint len = math::minimum<uint>(max_len, other.m_len);
+
+    if (!len)
+      clear();
+    else if (ensure_buf(len, false)) {
+      m_len = static_cast<uint16>(len);
+      memcpy(m_pStr, other.get_ptr_priv(), m_len);
+      m_pStr[len] = '\0';
+    }
+  }
+
+  check();
+
+  return *this;
+}
+
+bool dynamic_string::set_len(uint new_len, char fill_char) {
+  if ((new_len >= cUINT16_MAX) || (!fill_char)) {
+    CRNLIB_ASSERT(0);
+    return false;
+  }
+
+  uint cur_len = m_len;
+
+  if (ensure_buf(new_len, true)) {
+    if (new_len > cur_len)
+      memset(m_pStr + cur_len, fill_char, new_len - cur_len);
+
+    m_pStr[new_len] = 0;
+
+    m_len = static_cast<uint16>(new_len);
+
+    check();
+  }
+
+  return true;
+}
+
+dynamic_string& dynamic_string::set_from_raw_buf_and_assume_ownership(char* pBuf, uint buf_size_in_chars, uint len_in_chars) {
+  CRNLIB_ASSERT(buf_size_in_chars <= cUINT16_MAX);
+  CRNLIB_ASSERT(math::is_power_of_2(buf_size_in_chars) || (buf_size_in_chars == cUINT16_MAX));
+  CRNLIB_ASSERT((len_in_chars + 1) <= buf_size_in_chars);
+
+  clear();
+
+  m_pStr = pBuf;
+  m_buf_size = static_cast<uint16>(buf_size_in_chars);
+  m_len = static_cast<uint16>(len_in_chars);
+
+  check();
+
+  return *this;
+}
+
+dynamic_string& dynamic_string::set_from_buf(const void* pBuf, uint buf_size) {
+  CRNLIB_ASSERT(pBuf);
+
+  if (buf_size >= cUINT16_MAX) {
+    clear();
+    return *this;
+  }
+
+#ifdef CRNLIB_BUILD_DEBUG
+  if ((buf_size) && (memchr(pBuf, 0, buf_size) != NULL)) {
+    CRNLIB_ASSERT(0);
+    clear();
+    return *this;
+  }
+#endif
+
+  if (ensure_buf(buf_size, false)) {
+    if (buf_size)
+      memcpy(m_pStr, pBuf, buf_size);
+
+    m_pStr[buf_size] = 0;
+
+    m_len = static_cast<uint16>(buf_size);
+
+    check();
+  }
+
+  return *this;
+}
+
+dynamic_string& dynamic_string::set_char(uint index, char c) {
+  CRNLIB_ASSERT(index <= m_len);
+
+  if (!c)
+    truncate(index);
+  else if (index < m_len) {
+    m_pStr[index] = c;
+
+    check();
+  } else if (index == m_len)
+    append_char(c);
+
+  return *this;
+}
+
+dynamic_string& dynamic_string::append_char(char c) {
+  if (ensure_buf(m_len + 1)) {
+    m_pStr[m_len] = c;
+    m_pStr[m_len + 1] = '\0';
+    m_len++;
+    check();
+  }
+
+  return *this;
+}
+
+dynamic_string& dynamic_string::truncate(uint new_len) {
+  if (new_len < m_len) {
+    m_pStr[new_len] = '\0';
+    m_len = static_cast<uint16>(new_len);
+    check();
+  }
+  return *this;
+}
+
+dynamic_string& dynamic_string::tolower() {
+  if (m_len) {
+#ifdef _MSC_VER
+    _strlwr_s(get_ptr_priv(), m_buf_size);
+#else
+    strlwr(get_ptr_priv());
+#endif
+  }
+  return *this;
+}
+
+dynamic_string& dynamic_string::toupper() {
+  if (m_len) {
+#ifdef _MSC_VER
+    _strupr_s(get_ptr_priv(), m_buf_size);
+#else
+    strupr(get_ptr_priv());
+#endif
+  }
+  return *this;
+}
+
+dynamic_string& dynamic_string::append(const char* p) {
+  CRNLIB_ASSERT(p);
+
+  uint len = static_cast<uint>(strlen(p));
+  uint new_total_len = m_len + len;
+  if ((new_total_len) && ensure_buf(new_total_len)) {
+    memcpy(m_pStr + m_len, p, len + 1);
+    m_len = static_cast<uint16>(m_len + len);
+    check();
+  }
+
+  return *this;
+}
+
+dynamic_string& dynamic_string::append(const dynamic_string& other) {
+  uint len = other.m_len;
+  uint new_total_len = m_len + len;
+  if ((new_total_len) && ensure_buf(new_total_len)) {
+    memcpy(m_pStr + m_len, other.get_ptr_priv(), len + 1);
+    m_len = static_cast<uint16>(m_len + len);
+    check();
+  }
+
+  return *this;
+}
+
+dynamic_string operator+(const char* p, const dynamic_string& a) {
+  return dynamic_string(p).append(a);
+}
+
+dynamic_string operator+(const dynamic_string& a, const char* p) {
+  return dynamic_string(a).append(p);
+}
+
+dynamic_string operator+(const dynamic_string& a, const dynamic_string& b) {
+  return dynamic_string(a).append(b);
+}
+
+dynamic_string& dynamic_string::format_args(const char* p, va_list args) {
+  CRNLIB_ASSERT(p);
+
+  const uint cBufSize = 4096;
+  char buf[cBufSize];
+
+#ifdef _MSC_VER
+  int l = vsnprintf_s(buf, cBufSize, _TRUNCATE, p, args);
+#else
+  int l = vsnprintf(buf, cBufSize, p, args);
+#endif
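+  // C99 vsnprintf returns the untruncated length; clamp it to what buf actually holds.
+  if (l >= static_cast<int>(cBufSize))
+    l = static_cast<int>(cBufSize) - 1;
+
+  if (l <= 0)
+    clear();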
+  else if (ensure_buf(l, false)) {
+    memcpy(m_pStr, buf, l + 1);
+
+    m_len = static_cast<uint16>(l);
+
+    check();
+  }
+
+  return *this;
+}
+
+dynamic_string& dynamic_string::format(const char* p, ...) {
+  CRNLIB_ASSERT(p);
+
+  va_list args;
+  va_start(args, p);
+  format_args(p, args);
+  va_end(args);
+  return *this;
+}
+
+dynamic_string& dynamic_string::crop(uint start, uint len) {
+  if (start >= m_len) {
+    clear();
+    return *this;
+  }
+
+  len = math::minimum<uint>(len, m_len - start);
+
+  if (start)
+    memmove(get_ptr_priv(), get_ptr_priv() + start, len);
+
+  m_pStr[len] = '\0';
+
+  m_len = static_cast<uint16>(len);
+
+  check();
+
+  return *this;
+}
+
+dynamic_string& dynamic_string::substring(uint start, uint end) {
+  CRNLIB_ASSERT(start <= end);
+  if (start > end)
+    return *this;
+  return crop(start, end - start);
+}
+
+dynamic_string& dynamic_string::left(uint len) {
+  return substring(0, len);
+}
+
+dynamic_string& dynamic_string::mid(uint start, uint len) {
+  return crop(start, len);
+}
+
+dynamic_string& dynamic_string::right(uint start) {
+  return substring(start, get_len());
+}
+
+dynamic_string& dynamic_string::tail(uint num) {
+  return substring(math::maximum<int>(static_cast<int>(get_len()) - static_cast<int>(num), 0), get_len());
+}
+
+dynamic_string& dynamic_string::unquote() {
+  if (m_len >= 2) {
+    if (((*this)[0] == '\"') && ((*this)[m_len - 1] == '\"')) {
+      return mid(1, m_len - 2);
+    }
+  }
+
+  return *this;
+}
+
+int dynamic_string::find_left(const char* p, bool case_sensitive) const {
+  CRNLIB_ASSERT(p);
+
+  const int p_len = (int)strlen(p);
+
+  for (int i = 0; i <= (m_len - p_len); i++)
+    if ((case_sensitive ? strncmp : _strnicmp)(p, &m_pStr[i], p_len) == 0)
+      return i;
+
+  return -1;
+}
+
+bool dynamic_string::contains(const char* p, bool case_sensitive) const {
+  return find_left(p, case_sensitive) >= 0;
+}
+
+uint dynamic_string::count_char(char c) const {
+  uint count = 0;
+  for (uint i = 0; i < m_len; i++)
+    if (m_pStr[i] == c)
+      count++;
+  return count;
+}
+
+int dynamic_string::find_left(char c) const {
+  for (uint i = 0; i < m_len; i++)
+    if (m_pStr[i] == c)
+      return i;
+  return -1;
+}
+
+int dynamic_string::find_right(char c) const {
+  for (int i = (int)m_len - 1; i >= 0; i--)
+    if (m_pStr[i] == c)
+      return i;
+  return -1;
+}
+
+int dynamic_string::find_right(const char* p, bool case_sensitive) const {
+  CRNLIB_ASSERT(p);
+  const int p_len = (int)strlen(p);
+
+  for (int i = m_len - p_len; i >= 0; i--)
+    if ((case_sensitive ? strncmp : _strnicmp)(p, &m_pStr[i], p_len) == 0)
+      return i;
+
+  return -1;
+}
+
+dynamic_string& dynamic_string::trim() {
+  int s, e;
+  for (s = 0; s < (int)m_len; s++)
+    if (!isspace(static_cast<unsigned char>(m_pStr[s])))
+      break;
+
+  for (e = m_len - 1; e > s; e--)
+    if (!isspace(static_cast<unsigned char>(m_pStr[e])))
+      break;
+
+  return crop(s, e - s + 1);
+}
+
+dynamic_string& dynamic_string::trim_crlf() {
+  int s = 0, e;
+
+  for (e = m_len - 1; e >= s; e--)
+    if ((m_pStr[e] != 13) && (m_pStr[e] != 10))
+      break;
+
+  return crop(s, e - s + 1);
+}
+
+dynamic_string& dynamic_string::remap(int from_char, int to_char) {
+  for (uint i = 0; i < m_len; i++)
+    if (m_pStr[i] == from_char)
+      m_pStr[i] = (char)to_char;
+  return *this;
+}
+
+#ifdef CRNLIB_BUILD_DEBUG
+void dynamic_string::check() const {
+  if (!m_pStr) {
+    CRNLIB_ASSERT(!m_buf_size && !m_len);
+  } else {
+    CRNLIB_ASSERT(m_buf_size);
+    CRNLIB_ASSERT((m_buf_size == cUINT16_MAX) || math::is_power_of_2((uint32)m_buf_size));
+    CRNLIB_ASSERT(m_len < m_buf_size);
+    CRNLIB_ASSERT(!m_pStr[m_len]);
+#if CRNLIB_SLOW_STRING_LEN_CHECKS
+    CRNLIB_ASSERT(strlen(m_pStr) == m_len);
+#endif
+  }
+}
+#endif
+
+bool dynamic_string::ensure_buf(uint len, bool preserve_contents) {
+  uint buf_size_needed = len + 1;
+
+  CRNLIB_ASSERT(buf_size_needed <= cUINT16_MAX);
+
+  if (buf_size_needed <= cUINT16_MAX) {
+    if (buf_size_needed > m_buf_size)
+      expand_buf(buf_size_needed, preserve_contents);
+  }
+
+  return m_buf_size >= buf_size_needed;
+}
+
+bool dynamic_string::expand_buf(uint new_buf_size, bool preserve_contents) {
+  new_buf_size = math::minimum<uint>(cUINT16_MAX, math::next_pow2(math::maximum<uint>(m_buf_size, new_buf_size)));
+
+  if (new_buf_size != m_buf_size) {
+    char* p = crnlib_new_array<char>(new_buf_size);
+
+    if (preserve_contents)
+      memcpy(p, get_ptr_priv(), m_len + 1);
+
+    crnlib_delete_array(m_pStr);
+    m_pStr = p;
+
+    m_buf_size = static_cast<uint16>(new_buf_size);
+
+    if (preserve_contents)
+      check();
+  }
+
+  return m_buf_size >= new_buf_size;
+}
+
+void dynamic_string::swap(dynamic_string& other) {
+  utils::swap(other.m_buf_size, m_buf_size);
+  utils::swap(other.m_len, m_len);
+  utils::swap(other.m_pStr, m_pStr);
+}
+
+int dynamic_string::serialize(void* pBuf, uint buf_size, bool little_endian) const {
+  uint buf_left = buf_size;
+
+  //if (m_len > cUINT16_MAX)
+  //   return -1;
+  CRNLIB_ASSUME(sizeof(m_len) == sizeof(uint16));
+
+  if (!utils::write_val((uint16)m_len, pBuf, buf_left, little_endian))
+    return -1;
+
+  if (buf_left < m_len)
+    return -1;
+
+  memcpy(pBuf, get_ptr(), m_len);
+
+  buf_left -= m_len;
+
+  return buf_size - buf_left;
+}
+
+int dynamic_string::deserialize(const void* pBuf, uint buf_size, bool little_endian) {
+  uint buf_left = buf_size;
+
+  if (buf_left < sizeof(uint16))
+    return -1;
+
+  uint16 l;
+  if (!utils::read_obj(l, pBuf, buf_left, little_endian))
+    return -1;
+
+  if (buf_left < l)
+    return -1;
+
+  set_from_buf(pBuf, l);
+
+  buf_left -= l;
+
+  return buf_size - buf_left;
+}
+
+void dynamic_string::translate_lf_to_crlf() {
+  if (find_left(0x0A) < 0)
+    return;
+
+  dynamic_string tmp;
+  tmp.ensure_buf(m_len + 2);
+
+  // normal sequence is 0x0D 0x0A (CR LF, \r\n)
+
+  int prev_char = -1;
+  for (uint i = 0; i < get_len(); i++) {
+    const char cur_char = (*this)[i];
+
+    if ((cur_char == 0x0A) && (prev_char != 0x0D))
+      tmp.append_char(0x0D);
+
+    tmp.append_char(cur_char);
+
+    prev_char = cur_char;
+  }
+
+  swap(tmp);
+}
+
+}  // namespace crnlib
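
The `serialize()`/`deserialize()` pair above stores a 16-bit length followed by the raw characters, with no terminator. The sketch below reproduces that layout on top of `std::string`, assuming the length is written in the byte order selected by `little_endian`; the helper names `serialize_string` and `deserialize_string` are hypothetical and not part of crnlib.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical helpers mirroring the length-prefixed layout; not crnlib functions.

// Writes a 16-bit length followed by the characters (no terminator).
// Returns the number of bytes written, or -1 if the string is too long
// or the buffer is too small.
int serialize_string(const std::string& s, uint8_t* buf, std::size_t buf_size, bool little_endian) {
  if (s.size() >= 0xFFFF || buf_size < 2 + s.size())
    return -1;
  const uint16_t len = static_cast<uint16_t>(s.size());
  buf[0] = static_cast<uint8_t>(little_endian ? (len & 0xFF) : (len >> 8));
  buf[1] = static_cast<uint8_t>(little_endian ? (len >> 8) : (len & 0xFF));
  if (len)
    std::memcpy(buf + 2, s.data(), len);
  return 2 + len;
}

// Reads a string written by serialize_string().
// Returns the number of bytes consumed, or -1 if the buffer is truncated.
int deserialize_string(std::string& out, const uint8_t* buf, std::size_t buf_size, bool little_endian) {
  if (buf_size < 2)
    return -1;
  const uint16_t len = static_cast<uint16_t>(
      little_endian ? (buf[0] | (buf[1] << 8)) : ((buf[0] << 8) | buf[1]));
  if (buf_size - 2 < len)
    return -1;
  out.assign(reinterpret_cast<const char*>(buf + 2), len);
  return 2 + len;
}
```

Both functions report the number of bytes consumed, so a caller can advance through a buffer holding several strings back to back.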
@@ -0,0 +1,185 @@
+// File: crn_dynamic_string.h
+// See Copyright Notice and license at the end of inc/crnlib.h
+#pragma once
+
+namespace crnlib {
+enum { cMaxDynamicStringLen = cUINT16_MAX - 1 };
+class dynamic_string {
+ public:
+  inline dynamic_string()
+      : m_buf_size(0), m_len(0), m_pStr(NULL) {}
+  dynamic_string(eVarArg dummy, const char* p, ...);
+  dynamic_string(const char* p);
+  dynamic_string(const char* p, uint len);
+  dynamic_string(const dynamic_string& other);
+
+  inline ~dynamic_string() {
+    if (m_pStr)
+      crnlib_delete_array(m_pStr);
+  }
+
+  // Truncates the string to 0 chars and frees the buffer.
+  void clear();
+  void optimize();
+
+  // Truncates the string to 0 chars, but does not free the buffer.
+  void empty();
+  inline const char* assume_ownership() {
+    const char* p = m_pStr;
+    m_pStr = NULL;
+    m_len = 0;
+    m_buf_size = 0;
+    return p;
+  }
+
+  inline uint get_len() const { return m_len; }
+  inline bool is_empty() const { return !m_len; }
+
+  inline const char* get_ptr() const { return m_pStr ? m_pStr : ""; }
+  inline const char* c_str() const { return get_ptr(); }
+
+  inline const char* get_ptr_raw() const { return m_pStr; }
+  inline char* get_ptr_raw() { return m_pStr; }
+
+  inline char front() const { return m_len ? m_pStr[0] : '\0'; }
+  inline char back() const { return m_len ? m_pStr[m_len - 1] : '\0'; }
+
+  inline char operator[](uint i) const {
+    CRNLIB_ASSERT(i <= m_len);
+    return get_ptr()[i];
+  }
+
+  inline operator size_t() const { return fast_hash(get_ptr(), m_len) ^ fast_hash(&m_len, sizeof(m_len)); }
+
+  int compare(const char* p, bool case_sensitive = false) const;
+  int compare(const dynamic_string& rhs, bool case_sensitive = false) const;
+
+  inline bool operator==(const dynamic_string& rhs) const { return compare(rhs) == 0; }
+  inline bool operator==(const char* p) const { return compare(p) == 0; }
+
+  inline bool operator!=(const dynamic_string& rhs) const { return compare(rhs) != 0; }
+  inline bool operator!=(const char* p) const { return compare(p) != 0; }
+
+  inline bool operator<(const dynamic_string& rhs) const { return compare(rhs) < 0; }
+  inline bool operator<(const char* p) const { return compare(p) < 0; }
+
+  inline bool operator>(const dynamic_string& rhs) const { return compare(rhs) > 0; }
+  inline bool operator>(const char* p) const { return compare(p) > 0; }
+
+  inline bool operator<=(const dynamic_string& rhs) const { return compare(rhs) <= 0; }
+  inline bool operator<=(const char* p) const { return compare(p) <= 0; }
+
+  inline bool operator>=(const dynamic_string& rhs) const { return compare(rhs) >= 0; }
+  inline bool operator>=(const char* p) const { return compare(p) >= 0; }
+
+  friend inline bool operator==(const char* p, const dynamic_string& rhs) { return rhs.compare(p) == 0; }
+
+  dynamic_string& set(const char* p, uint max_len = UINT_MAX);
+  dynamic_string& set(const dynamic_string& other, uint max_len = UINT_MAX);
+
+  bool set_len(uint new_len, char fill_char = ' ');
+
+  // Sets the string from a buffer that is not zero-terminated; exactly buf_size chars are copied.
+  dynamic_string& set_from_buf(const void* pBuf, uint buf_size);
+
+  dynamic_string& operator=(const dynamic_string& rhs) { return set(rhs); }
+  dynamic_string& operator=(const char* p) { return set(p); }
+
+  dynamic_string& set_char(uint index, char c);
+  dynamic_string& append_char(char c);
+  dynamic_string& append_char(int c) {
+    CRNLIB_ASSERT((c >= 0) && (c <= 255));
+    return append_char(static_cast<char>(c));
+  }
+  dynamic_string& truncate(uint new_len);
+  dynamic_string& tolower();
+  dynamic_string& toupper();
+
+  dynamic_string& append(const char* p);
+  dynamic_string& append(const dynamic_string& other);
+  dynamic_string& operator+=(const char* p) { return append(p); }
+  dynamic_string& operator+=(const dynamic_string& other) { return append(other); }
+
+  friend dynamic_string operator+(const char* p, const dynamic_string& a);
+  friend dynamic_string operator+(const dynamic_string& a, const char* p);
+  friend dynamic_string operator+(const dynamic_string& a, const dynamic_string& b);
+
+  dynamic_string& format_args(const char* p, va_list args);
+  dynamic_string& format(const char* p, ...);
+
+  dynamic_string& crop(uint start, uint len);
+  dynamic_string& substring(uint start, uint end);
+  dynamic_string& left(uint len);
+  dynamic_string& mid(uint start, uint len);
+  dynamic_string& right(uint start);
+  dynamic_string& tail(uint num);
+
+  dynamic_string& unquote();
+
+  uint count_char(char c) const;
+
+  int find_left(const char* p, bool case_sensitive = false) const;
+  int find_left(char c) const;
+
+  int find_right(char c) const;
+  int find_right(const char* p, bool case_sensitive = false) const;
+
+  bool contains(const char* p, bool case_sensitive = false) const;
+
+  dynamic_string& trim();
+  dynamic_string& trim_crlf();
+
+  dynamic_string& remap(int from_char, int to_char);
+
+  void swap(dynamic_string& other);
+
+  // Returns -1 on failure, or the number of bytes written.
+  int serialize(void* pBuf, uint buf_size, bool little_endian) const;
+
+  // Returns -1 on failure, or the number of bytes read.
+  int deserialize(const void* pBuf, uint buf_size, bool little_endian);
+
+  void translate_lf_to_crlf();
+
+  static inline char* create_raw_buffer(uint& buf_size_in_chars);
+  static inline void free_raw_buffer(char* p) { crnlib_delete_array(p); }
+  dynamic_string& set_from_raw_buf_and_assume_ownership(char* pBuf, uint buf_size_in_chars, uint len_in_chars);
+
+ private:
+  uint16 m_buf_size;
+  uint16 m_len;
+  char* m_pStr;
+
+#ifdef CRNLIB_BUILD_DEBUG
+  void check() const;
+#else
+  inline void check() const {}
+#endif
+
+  bool expand_buf(uint new_buf_size, bool preserve_contents);
+
+  const char* get_ptr_priv() const { return m_pStr ? m_pStr : ""; }
+  char* get_ptr_priv() { return (char*)(m_pStr ? m_pStr : ""); }
+
+  bool ensure_buf(uint len, bool preserve_contents = true);
+};
+
+typedef crnlib::vector<dynamic_string> dynamic_string_array;
+
+extern dynamic_string g_empty_dynamic_string;
+
+CRNLIB_DEFINE_BITWISE_MOVABLE(dynamic_string);
+
+inline void swap(dynamic_string& a, dynamic_string& b) {
+  a.swap(b);
+}
+
+inline char* dynamic_string::create_raw_buffer(uint& buf_size_in_chars) {
+  if (buf_size_in_chars > cUINT16_MAX) {
+    CRNLIB_ASSERT(0);
+    return NULL;
+  }
+  buf_size_in_chars = math::minimum<uint>(cUINT16_MAX, math::next_pow2(buf_size_in_chars));
+  return crnlib_new_array<char>(buf_size_in_chars);
+}
+}  // namespace crnlib
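
`create_raw_buffer()` and `expand_buf()` both size the string buffer as a power of two capped at `cUINT16_MAX`. The sketch below shows that sizing rule in isolation, assuming `math::next_pow2` rounds up and leaves exact powers of two unchanged; `next_pow2_u32` and `string_buf_size` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers; the names are illustrative, not crnlib's math:: API.

// Rounds v up to the next power of two (v itself if it already is one).
// Assumes 0 < v <= 2^31.
inline uint32_t next_pow2_u32(uint32_t v) {
  v--;
  v |= v >> 1;
  v |= v >> 2;
  v |= v >> 4;
  v |= v >> 8;
  v |= v >> 16;
  return v + 1;
}

// Buffer size in chars for a string of `len` characters plus its terminator:
// a power of two, capped at 65535. Returns 0 if the string cannot be stored.
inline uint32_t string_buf_size(uint32_t len) {
  const uint32_t kMaxBuf = 0xFFFF;  // plays the role of cUINT16_MAX
  if (len >= kMaxBuf)
    return 0;
  const uint32_t size = next_pow2_u32(len + 1);
  return size < kMaxBuf ? size : kMaxBuf;
}
```

The cap is why `check()` accepts a buffer size of `cUINT16_MAX` even though it is not a power of two: any string needing more than 32768 chars gets the largest size a `uint16` can describe.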
@@ -0,0 +1,543 @@
+// File: crn_etc.h
+// See Copyright Notice and license at the end of inc/crnlib.h
+#pragma once
+#include "../inc/crnlib.h"
+#include "crn_dxt.h"
+
+namespace crnlib {
+enum etc_constants {
+  cETC1BytesPerBlock = 8U,
+
+  cETC1SelectorBits = 2U,
+  cETC1SelectorValues = 1U << cETC1SelectorBits,
+  cETC1SelectorMask = cETC1SelectorValues - 1U,
+
+  cETC1BlockShift = 2U,
+  cETC1BlockSize = 1U << cETC1BlockShift,
+
+  cETC1LSBSelectorIndicesBitOffset = 0,
+  cETC1MSBSelectorIndicesBitOffset = 16,
+
+  cETC1FlipBitOffset = 32,
+  cETC1DiffBitOffset = 33,
+
+  cETC1IntenModifierNumBits = 3,
+  cETC1IntenModifierValues = 1 << cETC1IntenModifierNumBits,
+  cETC1RightIntenModifierTableBitOffset = 34,
+  cETC1LeftIntenModifierTableBitOffset = 37,
+
+  // Base+Delta encoding (5 bit bases, 3 bit delta)
+  cETC1BaseColorCompNumBits = 5,
+  cETC1BaseColorCompMax = 1 << cETC1BaseColorCompNumBits,
+
+  cETC1DeltaColorCompNumBits = 3,
+  cETC1DeltaColorComp = 1 << cETC1DeltaColorCompNumBits,
+  cETC1DeltaColorCompMax = 1 << cETC1DeltaColorCompNumBits,
+
+  cETC1BaseColor5RBitOffset = 59,
+  cETC1BaseColor5GBitOffset = 51,
+  cETC1BaseColor5BBitOffset = 43,
+
+  cETC1DeltaColor3RBitOffset = 56,
+  cETC1DeltaColor3GBitOffset = 48,
+  cETC1DeltaColor3BBitOffset = 40,
+
+  // Absolute (non-delta) encoding (two 4-bit per component bases)
+  cETC1AbsColorCompNumBits = 4,
+  cETC1AbsColorCompMax = 1 << cETC1AbsColorCompNumBits,
+
+  cETC1AbsColor4R1BitOffset = 60,
+  cETC1AbsColor4G1BitOffset = 52,
+  cETC1AbsColor4B1BitOffset = 44,
+
+  cETC1AbsColor4R2BitOffset = 56,
+  cETC1AbsColor4G2BitOffset = 48,
+  cETC1AbsColor4B2BitOffset = 40,
+
+  cETC1ColorDeltaMin = -4,
+  cETC1ColorDeltaMax = 3,
+
+  // Delta3:
+  // 0   1   2   3   4   5   6   7
+  // 000 001 010 011 100 101 110 111
+  // 0   1   2   3   -4  -3  -2  -1
+};
+
+extern const int g_etc1_inten_tables[cETC1IntenModifierValues][cETC1SelectorValues];
+extern const uint8 g_etc1_to_selector_index[cETC1SelectorValues];
+extern const uint8 g_selector_index_to_etc1[cETC1SelectorValues];
+
+struct etc1_coord2 {
+  uint8 m_x, m_y;
+};
+extern const etc1_coord2 g_etc1_pixel_coords[2][2][8];  // [flipped][subblock][subblock_pixel]
+
+struct etc1_block {
+  // big endian uint64:
+  // bit ofs:  56  48  40  32  24  16   8   0
+  // byte ofs: b0, b1, b2, b3, b4, b5, b6, b7
+  union {
+    uint64 m_uint64;
+    uint8 m_bytes[8];
+  };
+
+  uint8 m_low_color[2];
+  uint8 m_high_color[2];
+
+  enum { cNumSelectorBytes = 4 };
+  uint8 m_selectors[cNumSelectorBytes];
+
+  inline void clear() {
+    utils::zero_this(this);
+  }
+
+  inline uint get_general_bits(uint ofs, uint num) const {
+    CRNLIB_ASSERT((ofs + num) <= 64U);
+    CRNLIB_ASSERT(num && (num < 32U));
+    return (utils::read_be64(&m_uint64) >> ofs) & ((1UL << num) - 1UL);
+  }
+
+  inline void set_general_bits(uint ofs, uint num, uint bits) {
+    CRNLIB_ASSERT((ofs + num) <= 64U);
+    CRNLIB_ASSERT(num && (num < 32U));
+
+    uint64 x = utils::read_be64(&m_uint64);
+    uint64 msk = ((1ULL << static_cast<uint64>(num)) - 1ULL) << static_cast<uint64>(ofs);
+    x &= ~msk;
+    x |= (static_cast<uint64>(bits) << static_cast<uint64>(ofs));
+    utils::write_be64(&m_uint64, x);
+  }
+
+  inline uint get_byte_bits(uint ofs, uint num) const {
+    CRNLIB_ASSERT((ofs + num) <= 64U);
+    CRNLIB_ASSERT(num && (num <= 8U));
+    CRNLIB_ASSERT((ofs >> 3) == ((ofs + num - 1) >> 3));
+    const uint byte_ofs = 7 - (ofs >> 3);
+    const uint byte_bit_ofs = ofs & 7;
+    return (m_bytes[byte_ofs] >> byte_bit_ofs) & ((1 << num) - 1);
+  }
+
+  inline void set_byte_bits(uint ofs, uint num, uint bits) {
+    CRNLIB_ASSERT((ofs + num) <= 64U);
+    CRNLIB_ASSERT(num && (num <= 8U));
+    CRNLIB_ASSERT((ofs >> 3) == ((ofs + num - 1) >> 3));
+    CRNLIB_ASSERT(bits < (1U << num));
+    const uint byte_ofs = 7 - (ofs >> 3);
+    const uint byte_bit_ofs = ofs & 7;
+    const uint mask = (1 << num) - 1;
+    m_bytes[byte_ofs] &= ~(mask << byte_bit_ofs);
+    m_bytes[byte_ofs] |= (bits << byte_bit_ofs);
+  }
+
+  // false = left/right subblocks
+  // true = upper/lower subblocks
+  inline bool get_flip_bit() const {
+    return (m_bytes[3] & 1) != 0;
+  }
+
+  inline void set_flip_bit(bool flip) {
+    m_bytes[3] &= ~1;
+    m_bytes[3] |= static_cast<uint8>(flip);
+  }
+
+  inline bool get_diff_bit() const {
+    return (m_bytes[3] & 2) != 0;
+  }
+
+  inline void set_diff_bit(bool diff) {
+    m_bytes[3] &= ~2;
+    m_bytes[3] |= (static_cast<uint>(diff) << 1);
+  }
+
+  // Returns intensity modifier table (0-7) used by subblock subblock_id.
+  // subblock_id=0 left/top (CW 1), 1=right/bottom (CW 2)
+  inline uint get_inten_table(uint subblock_id) const {
+    CRNLIB_ASSERT(subblock_id < 2);
+    const uint ofs = subblock_id ? 2 : 5;
+    return (m_bytes[3] >> ofs) & 7;
+  }
+
+  // Sets intensity modifier table (0-7) used by subblock subblock_id (0 or 1)
+  inline void set_inten_table(uint subblock_id, uint t) {
+    CRNLIB_ASSERT(subblock_id < 2);
+    CRNLIB_ASSERT(t < 8);
+    const uint ofs = subblock_id ? 2 : 5;
+    m_bytes[3] &= ~(7 << ofs);
+    m_bytes[3] |= (t << ofs);
+  }
+
+  // Returned selector value ranges from 0-3 and is a direct index into g_etc1_inten_tables.
+  inline uint get_selector(uint x, uint y) const {
+    CRNLIB_ASSERT((x | y) < 4);
+
+    const uint bit_index = x * 4 + y;
+    const uint byte_bit_ofs = bit_index & 7;
+    const uint8* p = &m_bytes[7 - (bit_index >> 3)];
+    const uint lsb = (p[0] >> byte_bit_ofs) & 1;
+    const uint msb = (p[-2] >> byte_bit_ofs) & 1;
+    const uint val = lsb | (msb << 1);
+
+    return g_etc1_to_selector_index[val];
+  }
+
+  // Selector "val" ranges from 0-3 and is a direct index into g_etc1_inten_tables.
+  inline void set_selector(uint x, uint y, uint val) {
+    CRNLIB_ASSERT((x | y | val) < 4);
+    const uint bit_index = x * 4 + y;
+
+    uint8* p = &m_bytes[7 - (bit_index >> 3)];
+
+    const uint byte_bit_ofs = bit_index & 7;
+    const uint mask = 1 << byte_bit_ofs;
+
+    const uint etc1_val = g_selector_index_to_etc1[val];
+
+    const uint lsb = etc1_val & 1;
+    const uint msb = etc1_val >> 1;
+
+    p[0] &= ~mask;
+    p[0] |= (lsb << byte_bit_ofs);
+
+    p[-2] &= ~mask;
+    p[-2] |= (msb << byte_bit_ofs);
+  }
+
+  inline void set_base4_color(uint idx, uint16 c) {
+    if (idx) {
+      set_byte_bits(cETC1AbsColor4R2BitOffset, 4, (c >> 8) & 15);
+      set_byte_bits(cETC1AbsColor4G2BitOffset, 4, (c >> 4) & 15);
+      set_byte_bits(cETC1AbsColor4B2BitOffset, 4, c & 15);
+    } else {
+      set_byte_bits(cETC1AbsColor4R1BitOffset, 4, (c >> 8) & 15);
+      set_byte_bits(cETC1AbsColor4G1BitOffset, 4, (c >> 4) & 15);
+      set_byte_bits(cETC1AbsColor4B1BitOffset, 4, c & 15);
+    }
+  }
+
+  inline uint16 get_base4_color(uint idx) const {
+    uint r, g, b;
+    if (idx) {
+      r = get_byte_bits(cETC1AbsColor4R2BitOffset, 4);
+      g = get_byte_bits(cETC1AbsColor4G2BitOffset, 4);
+      b = get_byte_bits(cETC1AbsColor4B2BitOffset, 4);
+    } else {
+      r = get_byte_bits(cETC1AbsColor4R1BitOffset, 4);
+      g = get_byte_bits(cETC1AbsColor4G1BitOffset, 4);
+      b = get_byte_bits(cETC1AbsColor4B1BitOffset, 4);
+    }
+    return static_cast<uint16>(b | (g << 4U) | (r << 8U));
+  }
+
+  inline void set_base5_color(uint16 c) {
+    set_byte_bits(cETC1BaseColor5RBitOffset, 5, (c >> 10) & 31);
+    set_byte_bits(cETC1BaseColor5GBitOffset, 5, (c >> 5) & 31);
+    set_byte_bits(cETC1BaseColor5BBitOffset, 5, c & 31);
+  }
+
+  inline uint16 get_base5_color() const {
+    const uint r = get_byte_bits(cETC1BaseColor5RBitOffset, 5);
+    const uint g = get_byte_bits(cETC1BaseColor5GBitOffset, 5);
+    const uint b = get_byte_bits(cETC1BaseColor5BBitOffset, 5);
+    return static_cast<uint16>(b | (g << 5U) | (r << 10U));
+  }
+
+  void set_delta3_color(uint16 c) {
+    set_byte_bits(cETC1DeltaColor3RBitOffset, 3, (c >> 6) & 7);
+    set_byte_bits(cETC1DeltaColor3GBitOffset, 3, (c >> 3) & 7);
+    set_byte_bits(cETC1DeltaColor3BBitOffset, 3, c & 7);
+  }
+
+  inline uint16 get_delta3_color() const {
+    const uint r = get_byte_bits(cETC1DeltaColor3RBitOffset, 3);
+    const uint g = get_byte_bits(cETC1DeltaColor3GBitOffset, 3);
+    const uint b = get_byte_bits(cETC1DeltaColor3BBitOffset, 3);
+    return static_cast<uint16>(b | (g << 3U) | (r << 6U));
+  }
+
+  // Base color 5
+  static uint16 pack_color5(const color_quad_u8& color, bool scaled, uint bias = 127U);
+  static uint16 pack_color5(uint r, uint g, uint b, bool scaled, uint bias = 127U);
+
+  static color_quad_u8 unpack_color5(uint16 packed_color5, bool scaled, uint alpha = 255U);
+  static void unpack_color5(uint& r, uint& g, uint& b, uint16 packed_color, bool scaled);
+
+  static bool unpack_color5(color_quad_u8& result, uint16 packed_color5, uint16 packed_delta3, bool scaled, uint alpha = 255U);
+  static bool unpack_color5(uint& r, uint& g, uint& b, uint16 packed_color5, uint16 packed_delta3, bool scaled, uint alpha = 255U);
+
+  // Delta color 3
+  // Inputs range from -4 to 3 (cETC1ColorDeltaMin to cETC1ColorDeltaMax)
+  static uint16 pack_delta3(const color_quad_i16& color);
+  static uint16 pack_delta3(int r, int g, int b);
+
+  // Results range from -4 to 3 (cETC1ColorDeltaMin to cETC1ColorDeltaMax)
+  static color_quad_i16 unpack_delta3(uint16 packed_delta3);
+  static void unpack_delta3(int& r, int& g, int& b, uint16 packed_delta3);
+
+  // Abs color 4
+  static uint16 pack_color4(const color_quad_u8& color, bool scaled, uint bias = 127U);
+  static uint16 pack_color4(uint r, uint g, uint b, bool scaled, uint bias = 127U);
+
+  static color_quad_u8 unpack_color4(uint16 packed_color4, bool scaled, uint alpha = 255U);
+  static void unpack_color4(uint& r, uint& g, uint& b, uint16 packed_color4, bool scaled);
+
+  // subblock colors
+  static void get_diff_subblock_colors(color_quad_u8* pDst, uint16 packed_color5, uint table_idx);
+  static bool get_diff_subblock_colors(color_quad_u8* pDst, uint16 packed_color5, uint16 packed_delta3, uint table_idx);
+  static void get_abs_subblock_colors(color_quad_u8* pDst, uint16 packed_color4, uint table_idx);
+
+  static inline void unscaled_to_scaled_color(color_quad_u8& dst, const color_quad_u8& src, bool color4) {
+    if (color4) {
+      dst.r = src.r | (src.r << 4);
+      dst.g = src.g | (src.g << 4);
+      dst.b = src.b | (src.b << 4);
+    } else {
+      dst.r = (src.r >> 2) | (src.r << 3);
+      dst.g = (src.g >> 2) | (src.g << 3);
+      dst.b = (src.b >> 2) | (src.b << 3);
+    }
+    dst.a = src.a;
+  }
+};
+
+CRNLIB_DEFINE_BITWISE_COPYABLE(etc1_block);
+
+// Returns false if the block is invalid (it will still be unpacked with clamping).
+bool unpack_etc1(const etc1_block& block, color_quad_u8* pDst, bool preserve_alpha = false);
+
+enum crn_etc_quality {
+  cCRNETCQualityFast,
+  cCRNETCQualityMedium,
+  cCRNETCQualitySlow,
+
+  cCRNETCQualityTotal,
+
+  cCRNETCQualityForceDWORD = 0xFFFFFFFF
+};
+
+struct crn_etc1_pack_params {
+  crn_etc_quality m_quality;
+  bool m_perceptual;
+  bool m_dithering;
+
+  inline crn_etc1_pack_params() {
+    clear();
+  }
+
+  void clear() {
+    m_quality = cCRNETCQualitySlow;
+    m_perceptual = true;
+    m_dithering = false;
+  }
+};
+
+struct etc1_solution_coordinates {
+  inline etc1_solution_coordinates()
+      : m_unscaled_color(0, 0, 0, 0),
+        m_inten_table(0),
+        m_color4(false) {
+  }
+
+  inline etc1_solution_coordinates(uint r, uint g, uint b, uint inten_table, bool color4)
+      : m_unscaled_color(r, g, b, 255),
+        m_inten_table(inten_table),
+        m_color4(color4) {
+  }
+
+  inline etc1_solution_coordinates(const color_quad_u8& c, uint inten_table, bool color4)
+      : m_unscaled_color(c),
+        m_inten_table(inten_table),
+        m_color4(color4) {
+  }
+
+  inline etc1_solution_coordinates(const etc1_solution_coordinates& other) {
+    *this = other;
+  }
+
+  inline etc1_solution_coordinates& operator=(const etc1_solution_coordinates& rhs) {
+    m_unscaled_color = rhs.m_unscaled_color;
+    m_inten_table = rhs.m_inten_table;
+    m_color4 = rhs.m_color4;
+    return *this;
+  }
+
+  inline void clear() {
+    m_unscaled_color.clear();
+    m_inten_table = 0;
+    m_color4 = false;
+  }
+
+  inline color_quad_u8 get_scaled_color() const {
+    int br, bg, bb;
+    if (m_color4) {
+      br = m_unscaled_color.r | (m_unscaled_color.r << 4);
+      bg = m_unscaled_color.g | (m_unscaled_color.g << 4);
+      bb = m_unscaled_color.b | (m_unscaled_color.b << 4);
+    } else {
+      br = (m_unscaled_color.r >> 2) | (m_unscaled_color.r << 3);
+      bg = (m_unscaled_color.g >> 2) | (m_unscaled_color.g << 3);
+      bb = (m_unscaled_color.b >> 2) | (m_unscaled_color.b << 3);
+    }
+    return color_quad_u8(br, bg, bb);
+  }
+
+  inline void get_block_colors(color_quad_u8* pBlock_colors) {
+    int br, bg, bb;
+    if (m_color4) {
+      br = m_unscaled_color.r | (m_unscaled_color.r << 4);
+      bg = m_unscaled_color.g | (m_unscaled_color.g << 4);
+      bb = m_unscaled_color.b | (m_unscaled_color.b << 4);
+    } else {
+      br = (m_unscaled_color.r >> 2) | (m_unscaled_color.r << 3);
+      bg = (m_unscaled_color.g >> 2) | (m_unscaled_color.g << 3);
+      bb = (m_unscaled_color.b >> 2) | (m_unscaled_color.b << 3);
+    }
+    const int* pInten_table = g_etc1_inten_tables[m_inten_table];
+    pBlock_colors[0].set(br + pInten_table[0], bg + pInten_table[0], bb + pInten_table[0]);
+    pBlock_colors[1].set(br + pInten_table[1], bg + pInten_table[1], bb + pInten_table[1]);
+    pBlock_colors[2].set(br + pInten_table[2], bg + pInten_table[2], bb + pInten_table[2]);
+    pBlock_colors[3].set(br + pInten_table[3], bg + pInten_table[3], bb + pInten_table[3]);
+  }
+
+  color_quad_u8 m_unscaled_color;
+  uint m_inten_table;
+  bool m_color4;
+};
+
+class etc1_optimizer {
+  CRNLIB_NO_COPY_OR_ASSIGNMENT_OP(etc1_optimizer);
+
+ public:
+  etc1_optimizer() {
+    clear();
+  }
+
+  void clear() {
+    m_pParams = NULL;
+    m_pResult = NULL;
+    m_pSorted_luma = NULL;
+    m_pSorted_luma_indices = NULL;
+  }
+
+  struct params : crn_etc1_pack_params {
+    params() {
+      clear();
+    }
+
+    params(const crn_etc1_pack_params& base_params)
+        : crn_etc1_pack_params(base_params) {
+      clear_optimizer_params();
+    }
+
+    void clear() {
+      crn_etc1_pack_params::clear();
+      clear_optimizer_params();
+    }
+
+    void clear_optimizer_params() {
+      m_num_src_pixels = 0;
+      m_pSrc_pixels = 0;
+
+      m_use_color4 = false;
+      static const int s_default_scan_delta[] = {0};
+      m_pScan_deltas = s_default_scan_delta;
+      m_scan_delta_size = 1;
+
+      m_base_color5.clear();
+      m_constrain_against_base_color5 = false;
+    }
+
+    uint m_num_src_pixels;
+    const color_quad_u8* m_pSrc_pixels;
+
+    bool m_use_color4;
+    const int* m_pScan_deltas;
+    uint m_scan_delta_size;
+
+    color_quad_u8 m_base_color5;
+    bool m_constrain_against_base_color5;
+  };
+
+  struct results {
+    uint64 m_error;
+    color_quad_u8 m_block_color_unscaled;
+    uint m_block_inten_table;
+    uint m_n;
+    uint8* m_pSelectors;
+    bool m_block_color4;
+
+    inline results& operator=(const results& rhs) {
+      m_block_color_unscaled = rhs.m_block_color_unscaled;
+      m_block_color4 = rhs.m_block_color4;
+      m_block_inten_table = rhs.m_block_inten_table;
+      m_error = rhs.m_error;
+      CRNLIB_ASSERT(m_n == rhs.m_n);
+      memcpy(m_pSelectors, rhs.m_pSelectors, rhs.m_n);
+      return *this;
+    }
+  };
+
+  void init(const params& params, results& result);
+  bool compute();
+
+ private:
+  struct potential_solution {
+    potential_solution()
+        : m_coords(), m_error(cUINT64_MAX), m_valid(false) {
+    }
+
+    etc1_solution_coordinates m_coords;
+    crnlib::vector<uint8> m_selectors;
+    uint64 m_error;
+    bool m_valid;
+
+    void clear() {
+      m_coords.clear();
+      m_selectors.resize(0);
+      m_error = cUINT64_MAX;
+      m_valid = false;
+    }
+
+    bool are_selectors_all_equal() const {
+      if (m_selectors.empty())
+        return false;
+      const uint s = m_selectors[0];
+      for (uint i = 1; i < m_selectors.size(); i++)
+        if (m_selectors[i] != s)
+          return false;
+      return true;
+    }
+  };
+
+  const params* m_pParams;
+  results* m_pResult;
+
+  int m_limit;
+
+  vec3F m_avg_color;
+  int m_br, m_bg, m_bb;
+  crnlib::vector<uint16> m_luma;
+  crnlib::vector<uint32> m_sorted_luma[2];
+  const uint32* m_pSorted_luma_indices;
+  uint32* m_pSorted_luma;
+
+  crnlib::vector<uint8> m_selectors;
+  crnlib::vector<uint8> m_best_selectors;
+
+  potential_solution m_best_solution;
+  potential_solution m_trial_solution;
+  crnlib::vector<uint8> m_temp_selectors;
+
+  bool evaluate_solution(const etc1_solution_coordinates& coords, potential_solution& trial_solution, potential_solution* pBest_solution);
+  bool evaluate_solution_fast(const etc1_solution_coordinates& coords, potential_solution& trial_solution, potential_solution* pBest_solution);
+};
+
+struct pack_etc1_block_context {
+  etc1_optimizer m_optimizer;
+};
+
+void pack_etc1_block_init();
+
+uint64 pack_etc1_block(etc1_block& block, const color_quad_u8* pSrc_pixels, crn_etc1_pack_params& pack_params, pack_etc1_block_context& context);
+uint64 pack_etc1s_block(etc1_block& block, const color_quad_u8* pSrc_pixels, crn_etc1_pack_params& pack_params);
+
+}  // namespace crnlib
@@ -0,0 +1,526 @@
+// File: crn_file_utils.cpp
+// See Copyright Notice and license at the end of inc/crnlib.h
+#include "crn_core.h"
+#include "crn_file_utils.h"
+#include "crn_strutils.h"
+
+#if CRNLIB_USE_WIN32_API
+#include "crn_winhdr.h"
+#endif
+
+#ifdef WIN32
+#include <direct.h>
+#endif
+
+#ifdef __GNUC__
+#include <sys/stat.h>
+#include <limits.h>  // PATH_MAX, used by full_path()
+#include <libgen.h>
+#endif
+
+namespace crnlib {
+#if CRNLIB_USE_WIN32_API
+bool file_utils::is_read_only(const char* pFilename) {
+  uint32 dst_file_attribs = GetFileAttributesA(pFilename);
+  if (dst_file_attribs == INVALID_FILE_ATTRIBUTES)
+    return false;
+  if (dst_file_attribs & FILE_ATTRIBUTE_READONLY)
+    return true;
+  return false;
+}
+
+bool file_utils::disable_read_only(const char* pFilename) {
+  uint32 dst_file_attribs = GetFileAttributesA(pFilename);
+  if (dst_file_attribs == INVALID_FILE_ATTRIBUTES)
+    return false;
+  if (dst_file_attribs & FILE_ATTRIBUTE_READONLY) {
+    dst_file_attribs &= ~FILE_ATTRIBUTE_READONLY;
+    if (SetFileAttributesA(pFilename, dst_file_attribs))
+      return true;
+  }
+  return false;
+}
+
+bool file_utils::is_older_than(const char* pSrcFilename, const char* pDstFilename) {
+  WIN32_FILE_ATTRIBUTE_DATA src_file_attribs;
+  const BOOL src_file_exists = GetFileAttributesExA(pSrcFilename, GetFileExInfoStandard, &src_file_attribs);
+
+  WIN32_FILE_ATTRIBUTE_DATA dst_file_attribs;
+  const BOOL dest_file_exists = GetFileAttributesExA(pDstFilename, GetFileExInfoStandard, &dst_file_attribs);
+
+  if ((dest_file_exists) && (src_file_exists)) {
+    LONG timeComp = CompareFileTime(&src_file_attribs.ftLastWriteTime, &dst_file_attribs.ftLastWriteTime);
+    if (timeComp < 0)
+      return true;
+  }
+  return false;
+}
+
+bool file_utils::does_file_exist(const char* pFilename) {
+  const DWORD fullAttributes = GetFileAttributesA(pFilename);
+
+  if (fullAttributes == INVALID_FILE_ATTRIBUTES)
+    return false;
+
+  if (fullAttributes & FILE_ATTRIBUTE_DIRECTORY)
+    return false;
+
+  return true;
+}
+
+bool file_utils::does_dir_exist(const char* pDir) {
+  //-- Get the file attributes.
+  DWORD fullAttributes = GetFileAttributesA(pDir);
+
+  if (fullAttributes == INVALID_FILE_ATTRIBUTES)
+    return false;
+
+  if (fullAttributes & FILE_ATTRIBUTE_DIRECTORY)
+    return true;
+
+  return false;
+}
+
+bool file_utils::get_file_size(const char* pFilename, uint64& file_size) {
+  file_size = 0;
+
+  WIN32_FILE_ATTRIBUTE_DATA attr;
+
+  if (0 == GetFileAttributesExA(pFilename, GetFileExInfoStandard, &attr))
+    return false;
+
+  if (attr.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)
+    return false;
+
+  file_size = static_cast<uint64>(attr.nFileSizeLow) | (static_cast<uint64>(attr.nFileSizeHigh) << 32U);
+
+  return true;
+}
+#elif defined(__GNUC__)
+bool file_utils::is_read_only(const char* pFilename) {
+  (void)pFilename;
+  // TODO
+  return false;
+}
+
+bool file_utils::disable_read_only(const char* pFilename) {
+  (void)pFilename;
+  // TODO
+  return false;
+}
+
+bool file_utils::is_older_than(const char* pSrcFilename, const char* pDstFilename) {
+  (void)pSrcFilename, (void)pDstFilename;
+  // TODO
+  return false;
+}
+
+bool file_utils::does_file_exist(const char* pFilename) {
+  struct stat stat_buf;
+  int result = stat(pFilename, &stat_buf);
+  if (result)
+    return false;
+  if (S_ISREG(stat_buf.st_mode))
+    return true;
+  return false;
+}
+
+bool file_utils::does_dir_exist(const char* pDir) {
+  struct stat stat_buf;
+  int result = stat(pDir, &stat_buf);
+  if (result)
+    return false;
+  if (S_ISDIR(stat_buf.st_mode) || S_ISLNK(stat_buf.st_mode))
+    return true;
+  return false;
+}
+
+bool file_utils::get_file_size(const char* pFilename, uint64& file_size) {
+  file_size = 0;
+  struct stat stat_buf;
+  int result = stat(pFilename, &stat_buf);
+  if (result)
+    return false;
+  if (!S_ISREG(stat_buf.st_mode))
+    return false;
+  file_size = stat_buf.st_size;
+  return true;
+}
+#else
+bool file_utils::is_read_only(const char* pFilename) {
+  return false;
+}
+
+bool file_utils::disable_read_only(const char* pFilename) {
+  (void)pFilename;
+  // TODO
+  return false;
+}
+
+bool file_utils::is_older_than(const char* pSrcFilename, const char* pDstFilename) {
+  return false;
+}
+
+bool file_utils::does_file_exist(const char* pFilename) {
+  FILE* pFile;
+  crn_fopen(&pFile, pFilename, "rb");
+  if (!pFile)
+    return false;
+  fclose(pFile);
+  return true;
+}
+
+bool file_utils::does_dir_exist(const char* pDir) {
+  return false;
+}
+
+bool file_utils::get_file_size(const char* pFilename, uint64& file_size) {
+  FILE* pFile;
+  crn_fopen(&pFile, pFilename, "rb");
+  if (!pFile)
+    return false;
+  crn_fseek(pFile, 0, SEEK_END);
+  file_size = crn_ftell(pFile);
+  fclose(pFile);
+  return true;
+}
+#endif
+
+bool file_utils::get_file_size(const char* pFilename, uint32& file_size) {
+  uint64 file_size64;
+  if (!get_file_size(pFilename, file_size64)) {
+    file_size = 0;
+    return false;
+  }
+
+  if (file_size64 > cUINT32_MAX)
+    file_size64 = cUINT32_MAX;
+
+  file_size = static_cast<uint32>(file_size64);
+  return true;
+}
+
+bool file_utils::is_path_separator(char c) {
+#ifdef WIN32
+  return (c == '/') || (c == '\\');
+#else
+  return (c == '/');
+#endif
+}
+
+bool file_utils::is_path_or_drive_separator(char c) {
+#ifdef WIN32
+  return (c == '/') || (c == '\\') || (c == ':');
+#else
+  return (c == '/');
+#endif
+}
+
+bool file_utils::is_drive_separator(char c) {
+#ifdef WIN32
+  return (c == ':');
+#else
+  (void)c;
+  return false;
+#endif
+}
+
+bool file_utils::split_path(const char* p, dynamic_string* pDrive, dynamic_string* pDir, dynamic_string* pFilename, dynamic_string* pExt) {
+  CRNLIB_ASSERT(p);
+
+#ifdef WIN32
+  char drive_buf[_MAX_DRIVE];
+  char dir_buf[_MAX_DIR];
+  char fname_buf[_MAX_FNAME];
+  char ext_buf[_MAX_EXT];
+
+#ifdef _MSC_VER
+  // Compiling with MSVC
+  errno_t error = _splitpath_s(p,
+                               pDrive ? drive_buf : NULL, pDrive ? _MAX_DRIVE : 0,
+                               pDir ? dir_buf : NULL, pDir ? _MAX_DIR : 0,
+                               pFilename ? fname_buf : NULL, pFilename ? _MAX_FNAME : 0,
+                               pExt ? ext_buf : NULL, pExt ? _MAX_EXT : 0);
+  if (error != 0)
+    return false;
+#else
+  // Compiling with MinGW
+  _splitpath(p,
+             pDrive ? drive_buf : NULL,
+             pDir ? dir_buf : NULL,
+             pFilename ? fname_buf : NULL,
+             pExt ? ext_buf : NULL);
+#endif
+
+  if (pDrive)
+    *pDrive = drive_buf;
+  if (pDir)
+    *pDir = dir_buf;
+  if (pFilename)
+    *pFilename = fname_buf;
+  if (pExt)
+    *pExt = ext_buf;
+#else
+  char dirtmp[1024];
+  char nametmp[1024];
+  strcpy_safe(dirtmp, sizeof(dirtmp), p);
+  strcpy_safe(nametmp, sizeof(nametmp), p);
+
+  if (pDrive)
+    pDrive->clear();
+
+  const char* pDirName = dirname(dirtmp);
+  if (!pDirName)
+    return false;
+
+  if (pDir) {
+    pDir->set(pDirName);
+    if ((!pDir->is_empty()) && (pDir->back() != '/'))
+      pDir->append_char('/');
+  }
+
+  const char* pBaseName = basename(nametmp);
+  if (!pBaseName)
+    return false;
+
+  if (pFilename) {
+    pFilename->set(pBaseName);
+    remove_extension(*pFilename);
+  }
+
+  if (pExt) {
+    pExt->set(pBaseName);
+    if (get_extension(*pExt))  // only prepend the dot when an extension exists
+      *pExt = "." + *pExt;
+  }
+#endif  // #ifdef WIN32
+
+  return true;
+}
+
+bool file_utils::split_path(const char* p, dynamic_string& path, dynamic_string& filename) {
+  dynamic_string temp_drive, temp_path, temp_ext;
+  if (!split_path(p, &temp_drive, &temp_path, &filename, &temp_ext))
+    return false;
+
+  filename += temp_ext;
+
+  combine_path(path, temp_drive.get_ptr(), temp_path.get_ptr());
+  return true;
+}
+
+bool file_utils::get_pathname(const char* p, dynamic_string& path) {
+  dynamic_string temp_drive, temp_path;
+  if (!split_path(p, &temp_drive, &temp_path, NULL, NULL))
+    return false;
+
+  combine_path(path, temp_drive.get_ptr(), temp_path.get_ptr());
+  return true;
+}
+
+bool file_utils::get_filename(const char* p, dynamic_string& filename) {
+  dynamic_string temp_ext;
+  if (!split_path(p, NULL, NULL, &filename, &temp_ext))
+    return false;
+
+  filename += temp_ext;
+  return true;
+}
+
+void file_utils::combine_path(dynamic_string& dst, const char* pA, const char* pB) {
+  dynamic_string temp(pA);
+  if ((!temp.is_empty()) && (!is_path_separator(pB[0]))) {
+    char c = temp[temp.get_len() - 1];
+    if (!is_path_separator(c))
+      temp.append_char(CRNLIB_PATH_SEPERATOR_CHAR);
+  }
+  temp += pB;
+  dst.swap(temp);
+}
+
+void file_utils::combine_path(dynamic_string& dst, const char* pA, const char* pB, const char* pC) {
+  combine_path(dst, pA, pB);
+  combine_path(dst, dst.get_ptr(), pC);
+}
+
+bool file_utils::full_path(dynamic_string& path) {
+#ifdef WIN32
+  char buf[1024];
+  if (!_fullpath(buf, path.get_ptr(), sizeof(buf)))
+    return false;
+  path.set(buf);  // store the resolved path, matching the non-Win32 branch
+#else
+  char buf[PATH_MAX];
+  char* p;
+  dynamic_string pn, fn;
+  split_path(path.get_ptr(), pn, fn);
+  if ((fn == ".") || (fn == "..")) {
+    p = realpath(path.get_ptr(), buf);
+    if (!p)
+      return false;
+    path.set(buf);
+  } else {
+    if (pn.is_empty())
+      pn = "./";
+    p = realpath(pn.get_ptr(), buf);
+    if (!p)
+      return false;
+    combine_path(path, buf, fn.get_ptr());
+  }
+#endif
+
+  return true;
+}
+
+bool file_utils::get_extension(dynamic_string& filename) {
+  int sep = -1;
+#ifdef WIN32
+  sep = filename.find_right('\\');
+#endif
+  if (sep < 0)
+    sep = filename.find_right('/');
+
+  int dot = filename.find_right('.');
+  if ((dot < 0) || (dot < sep)) {
+    filename.clear();
+    return false;
+  }
+
+  filename.right(dot + 1);
+
+  return true;
+}
+
+bool file_utils::remove_extension(dynamic_string& filename) {
+  int sep = -1;
+#ifdef WIN32
+  sep = filename.find_right('\\');
+#endif
+  if (sep < 0)
+    sep = filename.find_right('/');
+
+  int dot = filename.find_right('.');
+  if ((dot < 0) || (dot < sep))
+    return false;
+
+  filename.left(dot);
+
+  return true;
+}
+
+bool file_utils::create_path(const dynamic_string& fullpath) {
+#ifdef WIN32
+  bool got_unc = false;
+#endif
+  dynamic_string cur_path;
+
+  const int l = fullpath.get_len();
+
+  int n = 0;
+  while (n < l) {
+    const char c = fullpath.get_ptr()[n];
+
+    const bool sep = is_path_separator(c);
+    const bool back_sep = is_path_separator(cur_path.back());
+    const bool is_last_char = (n == (l - 1));
+
+    if (((sep) && (!back_sep)) || (is_last_char)) {
+      if ((is_last_char) && (!sep))
+        cur_path.append_char(c);
+
+      bool valid = !cur_path.is_empty();
+
+#ifdef WIN32
+      // reject obvious stuff (drives, beginning of UNC paths):
+      // c:\b\cool
+      // \\machine\blah
+      // \cool\blah
+      if ((cur_path.get_len() == 2) && (cur_path[1] == ':'))
+        valid = false;
+      else if ((cur_path.get_len() >= 2) && (cur_path[0] == '\\') && (cur_path[1] == '\\')) {
+        if (!got_unc)
+          valid = false;
+        got_unc = true;
+      } else if (cur_path == "\\")
+        valid = false;
+#endif
+      if (cur_path == "/")
+        valid = false;
+
+      if ((valid) && (cur_path.get_len())) {
+#ifdef WIN32
+        _mkdir(cur_path.get_ptr());
+#else
+        mkdir(cur_path.get_ptr(), S_IRWXU | S_IRWXG | S_IRWXO);
+#endif
+      }
+    }
+
+    cur_path.append_char(c);
+
+    n++;
+  }
+
+  return true;
+}
+
+void file_utils::trim_trailing_seperator(dynamic_string& path) {
+  if ((path.get_len()) && (is_path_separator(path.back())))
+    path.truncate(path.get_len() - 1);
+}
+
+// See http://www.codeproject.com/KB/string/wildcmp.aspx
+int file_utils::wildcmp(const char* pWild, const char* pString) {
+  const char *cp = NULL, *mp = NULL;
+
+  while ((*pString) && (*pWild != '*')) {
+    if ((*pWild != *pString) && (*pWild != '?'))
+      return 0;
+    pWild++;
+    pString++;
+  }
+
+  // Either *pString=='\0' or *pWild=='*' here.
+
+  while (*pString) {
+    if (*pWild == '*') {
+      if (!*++pWild)
+        return 1;
+      mp = pWild;
+      cp = pString + 1;
+    } else if ((*pWild == *pString) || (*pWild == '?')) {
+      pWild++;
+      pString++;
+    } else {
+      pWild = mp;
+      pString = cp++;
+    }
+  }
+
+  while (*pWild == '*')
+    pWild++;
+
+  return !*pWild;
+}
+
+bool file_utils::write_buf_to_file(const char* pPath, const void* pData, size_t data_size) {
+  FILE* pFile = NULL;
+
+#ifdef _MSC_VER
+  // Compiling with MSVC
+  if (fopen_s(&pFile, pPath, "wb"))
+    return false;
+#else
+  pFile = fopen(pPath, "wb");
+#endif
+  if (!pFile)
+    return false;
+
+  bool success = fwrite(pData, 1, data_size, pFile) == data_size;
+
+  fclose(pFile);
+
+  return success;
+}
+
+}  // namespace crnlib
@@ -0,0 +1,41 @@
+// File: crn_file_utils.h
+// See Copyright Notice and license at the end of inc/crnlib.h
+#pragma once
+
+namespace crnlib {
+struct file_utils {
+  static bool is_read_only(const char* pFilename);
+  static bool disable_read_only(const char* pFilename);
+  // Returns true if pSrcFilename is older than pDstFilename
+  static bool is_older_than(const char* pSrcFilename, const char* pDstFilename);
+  static bool does_file_exist(const char* pFilename);
+  static bool does_dir_exist(const char* pDir);
+  static bool get_file_size(const char* pFilename, uint64& file_size);
+  static bool get_file_size(const char* pFilename, uint32& file_size);
+
+  static bool is_path_separator(char c);
+  static bool is_path_or_drive_separator(char c);
+  static bool is_drive_separator(char c);
+
+  static bool split_path(const char* p, dynamic_string* pDrive, dynamic_string* pDir, dynamic_string* pFilename, dynamic_string* pExt);
+  static bool split_path(const char* p, dynamic_string& path, dynamic_string& filename);
+
+  static bool get_pathname(const char* p, dynamic_string& path);
+  static bool get_filename(const char* p, dynamic_string& filename);
+
+  static void combine_path(dynamic_string& dst, const char* pA, const char* pB);
+  static void combine_path(dynamic_string& dst, const char* pA, const char* pB, const char* pC);
+
+  static bool full_path(dynamic_string& path);
+  static bool get_extension(dynamic_string& filename);
+  static bool remove_extension(dynamic_string& filename);
+  static bool create_path(const dynamic_string& path);
+  static void trim_trailing_seperator(dynamic_string& path);
+
+  static int wildcmp(const char* pWild, const char* pString);
+
+  static bool write_buf_to_file(const char* pPath, const void* pData, size_t data_size);
+
+};  // struct file_utils
+
+}  // namespace crnlib
@@ -0,0 +1,253 @@
+// File: crn_find_files.cpp
+// See Copyright Notice and license at the end of inc/crnlib.h
+#include "crn_core.h"
+#include "crn_find_files.h"
+#include "crn_file_utils.h"
+#include "crn_strutils.h"
+
+#ifdef CRNLIB_USE_WIN32_API
+#include "crn_winhdr.h"
+
+#elif defined(__GNUC__)
+#include <fnmatch.h>
+#include <dirent.h>
+#endif
+
+namespace crnlib {
+#ifdef CRNLIB_USE_WIN32_API
+bool find_files::find(const char* pBasepath, const char* pFilespec, uint flags) {
+  m_last_error = S_OK;
+  m_files.resize(0);
+
+  return find_internal(pBasepath, "", pFilespec, flags, 0);
+}
+
+bool find_files::find(const char* pSpec, uint flags) {
+  dynamic_string find_name(pSpec);
+
+  if (!file_utils::full_path(find_name))
+    return false;
+
+  dynamic_string find_pathname, find_filename;
+  if (!file_utils::split_path(find_name.get_ptr(), find_pathname, find_filename))
+    return false;
+
+  return find(find_pathname.get_ptr(), find_filename.get_ptr(), flags);
+}
+
+bool find_files::find_internal(const char* pBasepath, const char* pRelpath, const char* pFilespec, uint flags, int level) {
+  WIN32_FIND_DATAA find_data;
+
+  dynamic_string filename;
+
+  dynamic_string_array child_paths;
+  if (flags & cFlagRecursive) {
+    if (strlen(pRelpath))
+      file_utils::combine_path(filename, pBasepath, pRelpath, "*");
+    else
+      file_utils::combine_path(filename, pBasepath, "*");
+
+    HANDLE handle = FindFirstFileA(filename.get_ptr(), &find_data);
+    if (handle == INVALID_HANDLE_VALUE) {
+      HRESULT hres = GetLastError();
+      if ((level == 0) && (hres != NO_ERROR) && (hres != ERROR_FILE_NOT_FOUND)) {
+        m_last_error = hres;
+        return false;
+      }
+    } else {
+      do {
+        const bool is_dir = (find_data.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY) != 0;
+
+        bool skip = !is_dir;
+        if (is_dir)
+          skip = (strcmp(find_data.cFileName, ".") == 0) || (strcmp(find_data.cFileName, "..") == 0);
+
+        if (find_data.dwFileAttributes & (FILE_ATTRIBUTE_SYSTEM | FILE_ATTRIBUTE_TEMPORARY))
+          skip = true;
+
+        if (find_data.dwFileAttributes & FILE_ATTRIBUTE_HIDDEN) {
+          if ((flags & cFlagAllowHidden) == 0)
+            skip = true;
+        }
+
+        if (!skip) {
+          dynamic_string child_path(find_data.cFileName);
+          if ((!child_path.count_char('?')) && (!child_path.count_char('*')))
+            child_paths.push_back(child_path);
+        }
+
+      } while (FindNextFileA(handle, &find_data) != 0);
+
+      HRESULT hres = GetLastError();
+
+      FindClose(handle);
+      handle = INVALID_HANDLE_VALUE;
+
+      if (hres != ERROR_NO_MORE_FILES) {
+        m_last_error = hres;
+        return false;
+      }
+    }
+  }
+
+  if (strlen(pRelpath))
+    file_utils::combine_path(filename, pBasepath, pRelpath, pFilespec);
+  else
+    file_utils::combine_path(filename, pBasepath, pFilespec);
+
+  HANDLE handle = FindFirstFileA(filename.get_ptr(), &find_data);
+  if (handle == INVALID_HANDLE_VALUE) {
+    HRESULT hres = GetLastError();
+    if ((level == 0) && (hres != NO_ERROR) && (hres != ERROR_FILE_NOT_FOUND)) {
+      m_last_error = hres;
+      return false;
+    }
+  } else {
+    do {
+      const bool is_dir = (find_data.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY) != 0;
+
+      bool skip = false;
+      if (is_dir)
+        skip = (strcmp(find_data.cFileName, ".") == 0) || (strcmp(find_data.cFileName, "..") == 0);
+
+      if (find_data.dwFileAttributes & (FILE_ATTRIBUTE_SYSTEM | FILE_ATTRIBUTE_TEMPORARY))
+        skip = true;
+
+      if (find_data.dwFileAttributes & FILE_ATTRIBUTE_HIDDEN) {
+        if ((flags & cFlagAllowHidden) == 0)
+          skip = true;
+      }
+
+      if (!skip) {
+        if (((is_dir) && (flags & cFlagAllowDirs)) || ((!is_dir) && (flags & cFlagAllowFiles))) {
+          m_files.resize(m_files.size() + 1);
+          file_desc& file = m_files.back();
+          file.m_is_dir = is_dir;
+          file.m_base = pBasepath;
+          file.m_name = find_data.cFileName;
+          file.m_rel = pRelpath;
+          if (strlen(pRelpath))
+            file_utils::combine_path(file.m_fullname, pBasepath, pRelpath, find_data.cFileName);
+          else
+            file_utils::combine_path(file.m_fullname, pBasepath, find_data.cFileName);
+        }
+      }
+
+    } while (FindNextFileA(handle, &find_data) != 0);
+
+    HRESULT hres = GetLastError();
+
+    FindClose(handle);
+
+    if (hres != ERROR_NO_MORE_FILES) {
+      m_last_error = hres;
+      return false;
+    }
+  }
+
+  for (uint i = 0; i < child_paths.size(); i++) {
+    dynamic_string child_path;
+    if (strlen(pRelpath))
+      file_utils::combine_path(child_path, pRelpath, child_paths[i].get_ptr());
+    else
+      child_path = child_paths[i];
+
+    if (!find_internal(pBasepath, child_path.get_ptr(), pFilespec, flags, level + 1))
+      return false;
+  }
+
+  return true;
+}
+#elif defined(__GNUC__)
+bool find_files::find(const char* pBasepath, const char* pFilespec, uint flags) {
+  m_files.resize(0);
+  return find_internal(pBasepath, "", pFilespec, flags, 0);
+}
+
+bool find_files::find(const char* pSpec, uint flags) {
+  dynamic_string find_name(pSpec);
+
+  if (!file_utils::full_path(find_name))
+    return false;
+
+  dynamic_string find_pathname, find_filename;
+  if (!file_utils::split_path(find_name.get_ptr(), find_pathname, find_filename))
+    return false;
+
+  return find(find_pathname.get_ptr(), find_filename.get_ptr(), flags);
+}
+
+bool find_files::find_internal(const char* pBasepath, const char* pRelpath, const char* pFilespec, uint flags, int level) {
+  dynamic_string pathname;
+  if (strlen(pRelpath))
+    file_utils::combine_path(pathname, pBasepath, pRelpath);
+  else
+    pathname = pBasepath;
+
+  if (!pathname.is_empty()) {
+    char c = pathname.back();
+    if (c != '/')
+      pathname += "/";
+  }
+
+  DIR* dp = opendir(pathname.get_ptr());
+
+  if (!dp)
+    return level ? true : false;
+
+  dynamic_string_array paths;
+
+  for (;;) {
+    struct dirent* ep = readdir(dp);
+    if (!ep)
+      break;
+    if ((strcmp(ep->d_name, ".") == 0) || (strcmp(ep->d_name, "..") == 0))
+      continue;
+
+    const bool is_directory = (ep->d_type == DT_DIR);  // d_type is an enumeration, not a bitmask
+    const bool is_file = (ep->d_type == DT_REG);
+
+    dynamic_string filename(ep->d_name);
+
+    if (is_directory) {
+      if (flags & cFlagRecursive) {
+        paths.push_back(filename);
+      }
+    }
+
+    if (((is_file) && (flags & cFlagAllowFiles)) || ((is_directory) && (flags & cFlagAllowDirs))) {
+      if (0 == fnmatch(pFilespec, filename.get_ptr(), 0)) {
+        m_files.resize(m_files.size() + 1);
+        file_desc& file = m_files.back();
+        file.m_is_dir = is_directory;
+        file.m_base = pBasepath;
+        file.m_rel = pRelpath;
+        file.m_name = filename;
+        file.m_fullname = pathname + filename;
+      }
+    }
+  }
+
+  closedir(dp);
+  dp = NULL;
+
+  if (flags & cFlagRecursive) {
+    for (uint i = 0; i < paths.size(); i++) {
+      dynamic_string childpath;
+      if (strlen(pRelpath))
+        file_utils::combine_path(childpath, pRelpath, paths[i].get_ptr());
+      else
+        childpath = paths[i];
+
+      if (!find_internal(pBasepath, childpath.get_ptr(), pFilespec, flags, level + 1))
+        return false;
+    }
+  }
+
+  return true;
+}
+#else
+#error Unimplemented
+#endif
+
+}  // namespace crnlib
@@ -0,0 +1,56 @@
+// File: crn_find_files.h
+// See Copyright Notice and license at the end of inc/crnlib.h
+#pragma once
+
+namespace crnlib {
+class find_files {
+ public:
+  struct file_desc {
+    inline file_desc()
+        : m_is_dir(false) {}
+
+    dynamic_string m_fullname;
+    dynamic_string m_base;
+    dynamic_string m_rel;
+    dynamic_string m_name;
+    bool m_is_dir;
+
+    inline bool operator==(const file_desc& other) const { return m_fullname == other.m_fullname; }
+    inline bool operator<(const file_desc& other) const { return m_fullname < other.m_fullname; }
+
+    inline operator size_t() const { return static_cast<size_t>(m_fullname); }
+  };
+
+  typedef crnlib::vector<file_desc> file_desc_vec;
+
+  inline find_files() {
+    m_last_error = 0;  // S_OK;
+  }
+
+  enum flags {
+    cFlagRecursive = 1,
+    cFlagAllowDirs = 2,
+    cFlagAllowFiles = 4,
+    cFlagAllowHidden = 8
+  };
+
+  bool find(const char* pBasepath, const char* pFilespec, uint flags = cFlagAllowFiles);
+
+  bool find(const char* pSpec, uint flags = cFlagAllowFiles);
+
+  // An HRESULT under Win32. FIXME: Abstract this better?
+  inline int64 get_last_error() const { return m_last_error; }
+
+  const file_desc_vec& get_files() const { return m_files; }
+
+ private:
+  file_desc_vec m_files;
+
+  // An HRESULT under Win32
+  int64 m_last_error;
+
+  bool find_internal(const char* pBasepath, const char* pRelpath, const char* pFilespec, uint flags, int level);
+
+};  // class find_files
+
+}  // namespace crnlib
@@ -0,0 +1,144 @@
+// File: crn_freeimage_image_utils.h
+// See Copyright Notice and license at the end of inc/crnlib.h
+// Note: This header file requires FreeImage/FreeImagePlus.
+
+#include "crn_image_utils.h"
+
+#include "FreeImagePlus.h"
+
+namespace crnlib {
+namespace freeimage_image_utils {
+inline bool load_from_file(image_u8& dest, const wchar_t* pFilename, int fi_flag) {
+  fipImage src_image;
+
+  if (!src_image.loadU(pFilename, fi_flag))
+    return false;
+
+  const uint orig_bits_per_pixel = src_image.getBitsPerPixel();
+
+  const FREE_IMAGE_COLOR_TYPE orig_color_type = src_image.getColorType();
+
+  if (!src_image.convertTo32Bits())
+    return false;
+
+  if (src_image.getBitsPerPixel() != 32)
+    return false;
+
+  const uint width = src_image.getWidth();
+  const uint height = src_image.getHeight();
+
+  dest.resize(width, height, width);
+
+  color_quad_u8* pDst = dest.get_ptr();
+
+  bool grayscale = true;
+  bool has_alpha = false;
+  for (uint y = 0; y < height; y++) {
+    const BYTE* pSrc = src_image.getScanLine((WORD)(height - 1 - y));
+    color_quad_u8* pD = pDst;
+
+    for (uint x = width; x; x--) {
+      color_quad_u8 c;
+      c.r = pSrc[FI_RGBA_RED];
+      c.g = pSrc[FI_RGBA_GREEN];
+      c.b = pSrc[FI_RGBA_BLUE];
+      c.a = pSrc[FI_RGBA_ALPHA];
+
+      if (!c.is_grayscale())
+        grayscale = false;
+      has_alpha |= (c.a < 255);
+
+      pSrc += 4;
+      *pD++ = c;
+    }
+
+    pDst += width;
+  }
+
+  dest.reset_comp_flags();
+
+  if (grayscale)
+    dest.set_grayscale(true);
+
+  dest.set_component_valid(3, has_alpha || (orig_color_type == FIC_RGBALPHA) || (orig_bits_per_pixel == 32));
+
+  return true;
+}
+
+const int cSaveLuma = -1;
+
+inline bool save_to_grayscale_file(const wchar_t* pFilename, const image_u8& src, int component, int fi_flag) {
+  fipImage dst_image(FIT_BITMAP, (WORD)src.get_width(), (WORD)src.get_height(), 8);
+
+  RGBQUAD* p = dst_image.getPalette();
+  for (uint i = 0; i < dst_image.getPaletteSize(); i++) {
+    p[i].rgbRed = (BYTE)i;
+    p[i].rgbGreen = (BYTE)i;
+    p[i].rgbBlue = (BYTE)i;
+    p[i].rgbReserved = 255;
+  }
+
+  for (uint y = 0; y < src.get_height(); y++) {
+    const color_quad_u8* pSrc = src.get_scanline(y);
+
+    for (uint x = 0; x < src.get_width(); x++) {
+      BYTE v;
+      if (component == cSaveLuma)
+        v = (BYTE)(*pSrc).get_luma();
+      else
+        v = (*pSrc)[component];
+      dst_image.setPixelIndex(x, src.get_height() - 1 - y, &v);
+
+      pSrc++;
+    }
+  }
+
+  if (!dst_image.saveU(pFilename, fi_flag))
+    return false;
+
+  return true;
+}
+
+inline bool save_to_file(const wchar_t* pFilename, const image_u8& src, int fi_flag, bool ignore_alpha = false) {
+  const bool save_alpha = src.is_component_valid(3);
+  uint bpp = (save_alpha && !ignore_alpha) ? 32 : 24;
+
+  if (bpp == 32) {
+    dynamic_wstring ext(pFilename);
+    get_extension(ext);
+
+    if ((ext == L"jpg") || (ext == L"jpeg") || (ext == L"gif") || (ext == L"jp2"))
+      bpp = 24;
+  }
+
+  if ((bpp == 24) && (src.is_grayscale()))
+    return save_to_grayscale_file(pFilename, src, cSaveLuma, fi_flag);
+
+  fipImage dst_image(FIT_BITMAP, (WORD)src.get_width(), (WORD)src.get_height(), (WORD)bpp);
+
+  for (uint y = 0; y < src.get_height(); y++) {
+    for (uint x = 0; x < src.get_width(); x++) {
+      color_quad_u8 c(src(x, y));
+
+      RGBQUAD quad;
+      quad.rgbRed = c.r;
+      quad.rgbGreen = c.g;
+      quad.rgbBlue = c.b;
+      if (bpp == 32)
+        quad.rgbReserved = c.a;
+      else
+        quad.rgbReserved = 255;
+
+      dst_image.setPixelColor(x, src.get_height() - 1 - y, &quad);
+    }
+  }
+
+  if (!dst_image.saveU(pFilename, fi_flag))
+    return false;
+
+  return true;
+}
+
+}  // namespace freeimage_image_utils
+
+}  // namespace crnlib
@@ -0,0 +1,68 @@
+// File: crn_hash.cpp
+// See Paul Hsieh's page at: http://www.azillionmonkeys.com/qed/hash.html
+// Also see http://www.concentric.net/~Ttwang/tech/inthash.htm,
+// http://burtleburtle.net/bob/hash/integer.html
+#include "crn_core.h"
+
+#undef get16bits
+#if (defined(__GNUC__) && defined(__i386__)) || defined(__WATCOMC__) || defined(_MSC_VER) || defined(__BORLANDC__) || defined(__TURBOC__)
+#define get16bits(d) (*((const uint16*)(d)))
+#endif
+
+#if !defined(get16bits)
+#define get16bits(d) ((((uint32)(((const uint8*)(d))[1])) << 8) + (uint32)(((const uint8*)(d))[0]))
+#endif
+
+namespace crnlib {
+uint32 fast_hash(const void* p, int len) {
+  const char* data = static_cast<const char*>(p);
+
+  uint32 hash = len, tmp;
+  int rem;
+
+  if (len <= 0 || data == NULL)
+    return 0;
+
+  rem = len & 3;
+  len >>= 2;
+
+  /* Main loop */
+  for (; len > 0; len--) {
+    hash += get16bits(data);
+    tmp = (get16bits(data + 2) << 11) ^ hash;
+    hash = (hash << 16) ^ tmp;
+    data += 2 * sizeof(uint16);
+    hash += hash >> 11;
+  }
+
+  /* Handle end cases */
+  switch (rem) {
+    case 3:
+      hash += get16bits(data);
+      hash ^= hash << 16;
+      hash ^= data[sizeof(uint16)] << 18;
+      hash += hash >> 11;
+      break;
+    case 2:
+      hash += get16bits(data);
+      hash ^= hash << 11;
+      hash += hash >> 17;
+      break;
+    case 1:
+      hash += *data;
+      hash ^= hash << 10;
+      hash += hash >> 1;
+  }
+
+  /* Force "avalanching" of final 127 bits */
+  hash ^= hash << 3;
+  hash += hash >> 5;
+  hash ^= hash << 4;
+  hash += hash >> 17;
+  hash ^= hash << 25;
+  hash += hash >> 6;
+
+  return hash;
+}
+
+}  // namespace crnlib
@@ -0,0 +1,31 @@
+// File: crn_hash.h
+// See Copyright Notice and license at the end of inc/crnlib.h
+#pragma once
+
+namespace crnlib {
+uint32 fast_hash(const void* p, int len);
+
+// 4-byte integer hash, full avalanche
+inline uint32 bitmix32c(uint32 a) {
+  a = (a + 0x7ed55d16) + (a << 12);
+  a = (a ^ 0xc761c23c) ^ (a >> 19);
+  a = (a + 0x165667b1) + (a << 5);
+  a = (a + 0xd3a2646c) ^ (a << 9);
+  a = (a + 0xfd7046c5) + (a << 3);
+  a = (a ^ 0xb55a4f09) ^ (a >> 16);
+  return a;
+}
+
+// 4-byte integer hash, full avalanche, no constants
+inline uint32 bitmix32(uint32 a) {
+  a -= (a << 6);
+  a ^= (a >> 17);
+  a -= (a << 9);
+  a ^= (a << 4);
+  a -= (a << 3);
+  a ^= (a << 10);
+  a ^= (a >> 15);
+  return a;
+}
+
+}  // namespace crnlib
@@ -0,0 +1,154 @@
+// File: crn_hash_map.cpp
+// See Copyright Notice and license at the end of inc/crnlib.h
+#include "crn_core.h"
+#include "crn_hash_map.h"
+#include "crn_rand.h"
+
+namespace crnlib {
+#if 0
+   class counted_obj
+   {
+   public:
+      counted_obj(uint v = 0) :
+         m_val(v)
+      {
+         m_count++;
+      }
+
+      counted_obj(const counted_obj& obj) :
+         m_val(obj.m_val)
+      {
+         m_count++;
+      }
+
+      ~counted_obj()
+      {
+         CRNLIB_ASSERT(m_count > 0);
+         m_count--;
+      }
+
+      static uint m_count;
+
+      uint m_val;
+
+      operator size_t() const { return m_val; }
+
+      bool operator== (const counted_obj& rhs) const { return m_val == rhs.m_val; }
+      bool operator== (const uint rhs) const { return m_val == rhs; }
+
+   };
+
+   uint counted_obj::m_count;
+
+   void hash_map_test()
+   {
+      random r0, r1;
+
+      uint seed = 0;
+      for ( ; ; )
+      {
+         seed++;
+
+         typedef crnlib::hash_map<counted_obj, counted_obj> my_hash_map;
+         my_hash_map m;
+
+         const uint n = r0.irand(1, 100000);
+
+         printf("%u\n", n);
+
+         r1.seed(seed);
+
+         crnlib::vector<int> q;
+
+         uint count = 0;
+         for (uint i = 0; i < n; i++)
+         {
+            uint v = r1.urand32() & 0x7FFFFFFF;
+            my_hash_map::insert_result res = m.insert(counted_obj(v), counted_obj(v ^ 0xdeadbeef));
+            if (res.second)
+            {
+               count++;
+               q.push_back(v);
+            }
+         }
+
+         CRNLIB_VERIFY(m.size() == count);
+
+         r1.seed(seed);
+
+         my_hash_map cm(m);
+         m.clear();
+         m = cm;
+         cm.reset();
+
+         for (uint i = 0; i < n; i++)
+         {
+            uint v = r1.urand32() & 0x7FFFFFFF;
+            my_hash_map::const_iterator it = m.find(counted_obj(v));
+            CRNLIB_VERIFY(it != m.end());
+            CRNLIB_VERIFY(it->first == v);
+            CRNLIB_VERIFY(it->second == (v ^ 0xdeadbeef));
+         }
+
+         for (uint t = 0; t < 2; t++)
+         {
+            const uint nd = r0.irand(1, q.size() + 1);
+            for (uint i = 0; i < nd; i++)
+            {
+               uint p = r0.irand(0, q.size());
+
+               int k = q[p];
+               if (k >= 0)
+               {
+                  q[p] = -k - 1;
+
+                  bool s = m.erase(counted_obj(k));
+                  CRNLIB_VERIFY(s);
+               }
+            }
+
+            typedef crnlib::hash_map<uint, empty_type> uint_hash_set;
+            uint_hash_set s;
+
+            for (uint i = 0; i < q.size(); i++)
+            {
+               int v = q[i];
+
+               if (v >= 0)
+               {
+                  my_hash_map::const_iterator it = m.find(counted_obj(v));
+                  CRNLIB_VERIFY(it != m.end());
+                  CRNLIB_VERIFY(it->first == (uint)v);
+                  CRNLIB_VERIFY(it->second == ((uint)v ^ 0xdeadbeef));
+
+                  s.insert(v);
+               }
+               else
+               {
+                  my_hash_map::const_iterator it = m.find(counted_obj(-v - 1));
+                  CRNLIB_VERIFY(it == m.end());
+               }
+            }
+
+            uint found_count = 0;
+            for (my_hash_map::const_iterator it = m.begin(); it != m.end(); ++it)
+            {
+               CRNLIB_VERIFY(it->second == ((uint)it->first ^ 0xdeadbeef));
+
+               uint_hash_set::const_iterator fit(s.find((uint)it->first));
+               CRNLIB_VERIFY(fit != s.end());
+
+               CRNLIB_VERIFY(fit->first == it->first);
+
+               found_count++;
+            }
+
+            CRNLIB_VERIFY(found_count == s.size());
+         }
+
+         CRNLIB_VERIFY(counted_obj::m_count == m.size() * 2);
+      }
+   }
+#endif
+
+}  // namespace crnlib
@@ -0,0 +1,765 @@
+// File: crn_hash_map.h
+// See Copyright Notice and license at the end of inc/crnlib.h
+//
+// Notes:
+// STL-like hash map/hash set with predictable performance across platforms, compilers, and C runtimes.
+// Hash function ref: http://www.brpreiss.com/books/opus4/html/page215.html
+// Compared for performance against VC9's std::hash_map.
+// Open addressing with linear probing; the table doubles once it is ~50% full.
+// Uses Knuth's multiplicative method (Fibonacci hashing).
+#pragma once
+#include "crn_sparse_array.h"
+#include "crn_sparse_bit_array.h"
+#include "crn_hash.h"
+
+namespace crnlib {
+template <typename T>
+struct hasher {
+  inline size_t operator()(const T& key) const { return static_cast<size_t>(key); }
+};
+
+template <typename T>
+struct bit_hasher {
+  inline size_t operator()(const T& key) const { return static_cast<size_t>(fast_hash(&key, sizeof(key))); }
+};
+
+template <typename T>
+struct equal_to {
+  inline bool operator()(const T& a, const T& b) const { return a == b; }
+};
+
+// Important: The Hasher and Equals objects must be bitwise movable!
+template <typename Key, typename Value = empty_type, typename Hasher = hasher<Key>, typename Equals = equal_to<Key> >
+class hash_map {
+  friend class iterator;
+  friend class const_iterator;
+
+  enum state {
+    cStateInvalid = 0,
+    cStateValid = 1
+  };
+
+  enum {
+    cMinHashSize = 4U
+  };
+
+ public:
+  typedef hash_map<Key, Value, Hasher, Equals> hash_map_type;
+  typedef std::pair<Key, Value> value_type;
+  typedef Key key_type;
+  typedef Value referent_type;
+  typedef Hasher hasher_type;
+  typedef Equals equals_type;
+
+  hash_map()
+      : m_hash_shift(32), m_num_valid(0), m_grow_threshold(0) {
+  }
+
+  hash_map(const hash_map& other)
+      : m_values(other.m_values),
+        m_hash_shift(other.m_hash_shift),
+        m_hasher(other.m_hasher),
+        m_equals(other.m_equals),
+        m_num_valid(other.m_num_valid),
+        m_grow_threshold(other.m_grow_threshold) {
+  }
+
+  hash_map& operator=(const hash_map& other) {
+    if (this == &other)
+      return *this;
+
+    clear();
+
+    m_values = other.m_values;
+    m_hash_shift = other.m_hash_shift;
+    m_num_valid = other.m_num_valid;
+    m_grow_threshold = other.m_grow_threshold;
+    m_hasher = other.m_hasher;
+    m_equals = other.m_equals;
+
+    return *this;
+  }
+
+  inline ~hash_map() {
+    clear();
+  }
+
+  const Equals& get_equals() const { return m_equals; }
+  Equals& get_equals() { return m_equals; }
+
+  void set_equals(const Equals& equals) { m_equals = equals; }
+
+  const Hasher& get_hasher() const { return m_hasher; }
+  Hasher& get_hasher() { return m_hasher; }
+
+  void set_hasher(const Hasher& hasher) { m_hasher = hasher; }
+
+  inline void clear() {
+    if (!m_values.empty()) {
+      if (CRNLIB_HAS_DESTRUCTOR(Key) || CRNLIB_HAS_DESTRUCTOR(Value)) {
+        node* p = &get_node(0);
+        node* p_end = p + m_values.size();
+
+        uint num_remaining = m_num_valid;
+        while (p != p_end) {
+          if (p->state) {
+            destruct_value_type(p);
+            num_remaining--;
+            if (!num_remaining)
+              break;
+          }
+
+          p++;
+        }
+      }
+
+      m_values.clear_no_destruction();
+
+      m_hash_shift = 32;
+      m_num_valid = 0;
+      m_grow_threshold = 0;
+    }
+  }
+
+  inline void reset() {
+    if (!m_num_valid)
+      return;
+
+    if (CRNLIB_HAS_DESTRUCTOR(Key) || CRNLIB_HAS_DESTRUCTOR(Value)) {
+      node* p = &get_node(0);
+      node* p_end = p + m_values.size();
+
+      uint num_remaining = m_num_valid;
+      while (p != p_end) {
+        if (p->state) {
+          destruct_value_type(p);
+          p->state = cStateInvalid;
+
+          num_remaining--;
+          if (!num_remaining)
+            break;
+        }
+
+        p++;
+      }
+    } else if (sizeof(node) <= 32) {
+      memset(&m_values[0], 0, m_values.size_in_bytes());
+    } else {
+      node* p = &get_node(0);
+      node* p_end = p + m_values.size();
+
+      uint num_remaining = m_num_valid;
+      while (p != p_end) {
+        if (p->state) {
+          p->state = cStateInvalid;
+
+          num_remaining--;
+          if (!num_remaining)
+            break;
+        }
+
+        p++;
+      }
+    }
+
+    m_num_valid = 0;
+  }
+
+  inline uint size() const {
+    return m_num_valid;
+  }
+
+  inline uint get_table_size() const {
+    return m_values.size();
+  }
+
+  inline bool empty() const {
+    return !m_num_valid;
+  }
+
+  inline void reserve(uint new_capacity) {
+    uint new_hash_size = math::maximum(1U, new_capacity);
+
+    new_hash_size = new_hash_size * 2U;
+
+    if (!math::is_power_of_2(new_hash_size))
+      new_hash_size = math::next_pow2(new_hash_size);
+
+    new_hash_size = math::maximum<uint>(cMinHashSize, new_hash_size);
+
+    if (new_hash_size > m_values.size())
+      rehash(new_hash_size);
+  }
+
+  class const_iterator;
+
+  class iterator {
+    friend class hash_map<Key, Value, Hasher, Equals>;
+    friend class hash_map<Key, Value, Hasher, Equals>::const_iterator;
+
+   public:
+    inline iterator()
+        : m_pTable(NULL), m_index(0) {}
+    inline iterator(hash_map_type& table, uint index)
+        : m_pTable(&table), m_index(index) {}
+    inline iterator(const iterator& other)
+        : m_pTable(other.m_pTable), m_index(other.m_index) {}
+
+    inline iterator& operator=(const iterator& other) {
+      m_pTable = other.m_pTable;
+      m_index = other.m_index;
+      return *this;
+    }
+
+    // post-increment
+    inline iterator operator++(int) {
+      iterator result(*this);
+      ++*this;
+      return result;
+    }
+
+    // pre-increment
+    inline iterator& operator++() {
+      probe();
+      return *this;
+    }
+
+    inline value_type& operator*() const { return *get_cur(); }
+    inline value_type* operator->() const { return get_cur(); }
+
+    inline bool operator==(const iterator& b) const { return (m_pTable == b.m_pTable) && (m_index == b.m_index); }
+    inline bool operator!=(const iterator& b) const { return !(*this == b); }
+    inline bool operator==(const const_iterator& b) const { return (m_pTable == b.m_pTable) && (m_index == b.m_index); }
+    inline bool operator!=(const const_iterator& b) const { return !(*this == b); }
+
+   private:
+    hash_map_type* m_pTable;
+    uint m_index;
+
+    inline value_type* get_cur() const {
+      CRNLIB_ASSERT(m_pTable && (m_index < m_pTable->m_values.size()));
+      CRNLIB_ASSERT(m_pTable->get_node_state(m_index) == cStateValid);
+
+      return &m_pTable->get_node(m_index);
+    }
+
+    inline void probe() {
+      CRNLIB_ASSERT(m_pTable);
+      m_index = m_pTable->find_next(m_index);
+    }
+  };
+
+  class const_iterator {
+    friend class hash_map<Key, Value, Hasher, Equals>;
+    friend class hash_map<Key, Value, Hasher, Equals>::iterator;
+
+   public:
+    inline const_iterator()
+        : m_pTable(NULL), m_index(0) {}
+    inline const_iterator(const hash_map_type& table, uint index)
+        : m_pTable(&table), m_index(index) {}
+    inline const_iterator(const iterator& other)
+        : m_pTable(other.m_pTable), m_index(other.m_index) {}
+    inline const_iterator(const const_iterator& other)
+        : m_pTable(other.m_pTable), m_index(other.m_index) {}
+
+    inline const_iterator& operator=(const const_iterator& other) {
+      m_pTable = other.m_pTable;
+      m_index = other.m_index;
+      return *this;
+    }
+
+    inline const_iterator& operator=(const iterator& other) {
+      m_pTable = other.m_pTable;
+      m_index = other.m_index;
+      return *this;
+    }
+
+    // post-increment
+    inline const_iterator operator++(int) {
+      const_iterator result(*this);
+      ++*this;
+      return result;
+    }
+
+    // pre-increment
+    inline const_iterator& operator++() {
+      probe();
+      return *this;
+    }
+
+    inline const value_type& operator*() const { return *get_cur(); }
+    inline const value_type* operator->() const { return get_cur(); }
+
+    inline bool operator==(const const_iterator& b) const { return (m_pTable == b.m_pTable) && (m_index == b.m_index); }
+    inline bool operator!=(const const_iterator& b) const { return !(*this == b); }
+    inline bool operator==(const iterator& b) const { return (m_pTable == b.m_pTable) && (m_index == b.m_index); }
+    inline bool operator!=(const iterator& b) const { return !(*this == b); }
+
+   private:
+    const hash_map_type* m_pTable;
+    uint m_index;
+
+    inline const value_type* get_cur() const {
+      CRNLIB_ASSERT(m_pTable && (m_index < m_pTable->m_values.size()));
+      CRNLIB_ASSERT(m_pTable->get_node_state(m_index) == cStateValid);
+
+      return &m_pTable->get_node(m_index);
+    }
+
+    inline void probe() {
+      CRNLIB_ASSERT(m_pTable);
+      m_index = m_pTable->find_next(m_index);
+    }
+  };
+
+  inline const_iterator begin() const {
+    if (!m_num_valid)
+      return end();
+
+    return const_iterator(*this, find_next(-1));
+  }
+
+  inline const_iterator end() const {
+    return const_iterator(*this, m_values.size());
+  }
+
+  inline iterator begin() {
+    if (!m_num_valid)
+      return end();
+
+    return iterator(*this, find_next(-1));
+  }
+
+  inline iterator end() {
+    return iterator(*this, m_values.size());
+  }
+
+  // insert_result.first always points to the inserted key/value (or to the already existing key/value).
+  // insert_result.second is true if a new key/value was inserted, or false if the key already existed (in which case first points to the existing value).
+  typedef std::pair<iterator, bool> insert_result;
+
+  inline insert_result insert(const Key& k, const Value& v = Value()) {
+    insert_result result;
+    if (!insert_no_grow(result, k, v)) {
+      grow();
+
+      // This must succeed.
+      if (!insert_no_grow(result, k, v)) {
+        CRNLIB_FAIL("insert() failed");
+      }
+    }
+
+    return result;
+  }
+
+  inline insert_result insert(const value_type& v) {
+    return insert(v.first, v.second);
+  }
+
+  inline const_iterator find(const Key& k) const {
+    return const_iterator(*this, find_index(k));
+  }
+
+  inline iterator find(const Key& k) {
+    return iterator(*this, find_index(k));
+  }
+
+  inline bool erase(const Key& k) {
+    int i = find_index(k);
+
+    if (i >= static_cast<int>(m_values.size()))
+      return false;
+
+    node* pDst = &get_node(i);
+    destruct_value_type(pDst);
+    pDst->state = cStateInvalid;
+
+    m_num_valid--;
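+    // Knuth's Algorithm R: shift later entries of the probe chain into the hole so lookups need no tombstones.
+    for (;;) {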
+      int r, j = i;
+
+      node* pSrc = pDst;
+
+      do {
+        if (!i) {
+          i = m_values.size() - 1;
+          pSrc = &get_node(i);
+        } else {
+          i--;
+          pSrc--;
+        }
+
+        if (!pSrc->state)
+          return true;
+
+        r = hash_key(pSrc->first);
+
+      } while ((i <= r && r < j) || (r < j && j < i) || (j < i && i <= r));
+
+      move_node(pDst, pSrc);
+
+      pDst = pSrc;
+    }
+  }
+
+  inline void swap(hash_map_type& other) {
+    m_values.swap(other.m_values);
+    utils::swap(m_hash_shift, other.m_hash_shift);
+    utils::swap(m_num_valid, other.m_num_valid);
+    utils::swap(m_grow_threshold, other.m_grow_threshold);
+    utils::swap(m_hasher, other.m_hasher);
+    utils::swap(m_equals, other.m_equals);
+  }
+
+ private:
+  struct node : public value_type {
+    uint8 state;
+  };
+
+  static inline void construct_value_type(value_type* pDst, const Key& k, const Value& v) {
+    if (CRNLIB_IS_BITWISE_COPYABLE(Key))
+      memcpy(&pDst->first, &k, sizeof(Key));
+    else
+      scalar_type<Key>::construct(&pDst->first, k);
+
+    if (CRNLIB_IS_BITWISE_COPYABLE(Value))
+      memcpy(&pDst->second, &v, sizeof(Value));
+    else
+      scalar_type<Value>::construct(&pDst->second, v);
+  }
+
+  static inline void construct_value_type(value_type* pDst, const value_type* pSrc) {
+    if ((CRNLIB_IS_BITWISE_COPYABLE(Key)) && (CRNLIB_IS_BITWISE_COPYABLE(Value))) {
+      memcpy(pDst, pSrc, sizeof(value_type));
+    } else {
+      if (CRNLIB_IS_BITWISE_COPYABLE(Key))
+        memcpy(&pDst->first, &pSrc->first, sizeof(Key));
+      else
+        scalar_type<Key>::construct(&pDst->first, pSrc->first);
+
+      if (CRNLIB_IS_BITWISE_COPYABLE(Value))
+        memcpy(&pDst->second, &pSrc->second, sizeof(Value));
+      else
+        scalar_type<Value>::construct(&pDst->second, pSrc->second);
+    }
+  }
+
+  static inline void destruct_value_type(value_type* p) {
+    scalar_type<Key>::destruct(&p->first);
+    scalar_type<Value>::destruct(&p->second);
+  }
+
+  // Moves *pSrc to *pDst efficiently.
+  // pDst should NOT be constructed on entry.
+  static inline void move_node(node* pDst, node* pSrc) {
+    CRNLIB_ASSERT(!pDst->state);
+
+    if (CRNLIB_IS_BITWISE_COPYABLE_OR_MOVABLE(Key) && CRNLIB_IS_BITWISE_COPYABLE_OR_MOVABLE(Value)) {
+      memcpy(pDst, pSrc, sizeof(node));
+    } else {
+      if (CRNLIB_IS_BITWISE_COPYABLE_OR_MOVABLE(Key))
+        memcpy(&pDst->first, &pSrc->first, sizeof(Key));
+      else {
+        scalar_type<Key>::construct(&pDst->first, pSrc->first);
+        scalar_type<Key>::destruct(&pSrc->first);
+      }
+
+      if (CRNLIB_IS_BITWISE_COPYABLE_OR_MOVABLE(Value))
+        memcpy(&pDst->second, &pSrc->second, sizeof(Value));
+      else {
+        scalar_type<Value>::construct(&pDst->second, pSrc->second);
+        scalar_type<Value>::destruct(&pSrc->second);
+      }
+
+      pDst->state = cStateValid;
+    }
+
+    pSrc->state = cStateInvalid;
+  }
+
+  struct raw_node {
+    inline raw_node() {
+      node* p = reinterpret_cast<node*>(this);
+      p->state = cStateInvalid;
+    }
+
+    inline ~raw_node() {
+      node* p = reinterpret_cast<node*>(this);
+      if (p->state)
+        hash_map_type::destruct_value_type(p);
+    }
+
+    inline raw_node(const raw_node& other) {
+      node* pDst = reinterpret_cast<node*>(this);
+      const node* pSrc = reinterpret_cast<const node*>(&other);
+
+      if (pSrc->state) {
+        hash_map_type::construct_value_type(pDst, pSrc);
+        pDst->state = cStateValid;
+      } else
+        pDst->state = cStateInvalid;
+    }
+
+    inline raw_node& operator=(const raw_node& rhs) {
+      if (this == &rhs)
+        return *this;
+
+      node* pDst = reinterpret_cast<node*>(this);
+      const node* pSrc = reinterpret_cast<const node*>(&rhs);
+
+      if (pSrc->state) {
+        if (pDst->state) {
+          pDst->first = pSrc->first;
+          pDst->second = pSrc->second;
+        } else {
+          hash_map_type::construct_value_type(pDst, pSrc);
+          pDst->state = cStateValid;
+        }
+      } else if (pDst->state) {
+        hash_map_type::destruct_value_type(pDst);
+        pDst->state = cStateInvalid;
+      }
+
+      return *this;
+    }
+
+    uint8 m_bits[sizeof(node)];
+  };
+
+  typedef crnlib::vector<raw_node> node_vector;
+
+  node_vector m_values;
+  uint m_hash_shift;
+
+  Hasher m_hasher;
+  Equals m_equals;
+
+  uint m_num_valid;
+
+  uint m_grow_threshold;
+
+  inline int hash_key(const Key& k) const {
+    CRNLIB_ASSERT((1U << (32U - m_hash_shift)) == m_values.size());
+
+    uint hash = static_cast<uint>(m_hasher(k));
+
+    // Fibonacci hashing: multiply by 2^32 / golden ratio and keep the top log2(table size) bits.
+    hash = (2654435769U * hash) >> m_hash_shift;
+
+    CRNLIB_ASSERT(hash < m_values.size());
+    return hash;
+  }
+
+  inline const node& get_node(uint index) const {
+    return *reinterpret_cast<const node*>(&m_values[index]);
+  }
+
+  inline node& get_node(uint index) {
+    return *reinterpret_cast<node*>(&m_values[index]);
+  }
+
+  inline state get_node_state(uint index) const {
+    return static_cast<state>(get_node(index).state);
+  }
+
+  inline void set_node_state(uint index, bool valid) {
+    get_node(index).state = valid;
+  }
+
+  inline void grow() {
+    rehash(math::maximum<uint>(cMinHashSize, m_values.size() * 2U));
+  }
+
+  inline void rehash(uint new_hash_size) {
+    CRNLIB_ASSERT(new_hash_size >= m_num_valid);
+    CRNLIB_ASSERT(math::is_power_of_2(new_hash_size));
+
+    if ((new_hash_size < m_num_valid) || (new_hash_size == m_values.size()))
+      return;
+
+    hash_map new_map;
+    new_map.m_values.resize(new_hash_size);
+    new_map.m_hash_shift = 32U - math::floor_log2i(new_hash_size);
+    CRNLIB_ASSERT(new_hash_size == (1U << (32U - new_map.m_hash_shift)));
+    new_map.m_grow_threshold = UINT_MAX;
+
+    node* pNode = reinterpret_cast<node*>(m_values.begin());
+    node* pNode_end = pNode + m_values.size();
+
+    while (pNode != pNode_end) {
+      if (pNode->state) {
+        new_map.move_into(pNode);
+
+        if (new_map.m_num_valid == m_num_valid)
+          break;
+      }
+
+      pNode++;
+    }
+
+    new_map.m_grow_threshold = (new_hash_size + 1U) >> 1U;
+
+    m_values.clear_no_destruction();
+    m_hash_shift = 32;
+
+    swap(new_map);
+  }
+
+  inline uint find_next(int index) const {
+    index++;
+
+    if (index >= static_cast<int>(m_values.size()))
+      return index;
+
+    const node* pNode = &get_node(index);
+
+    for (;;) {
+      if (pNode->state)
+        break;
+
+      if (++index >= static_cast<int>(m_values.size()))
+        break;
+
+      pNode++;
+    }
+
+    return index;
+  }
+
+  inline uint find_index(const Key& k) const {
+    if (m_num_valid) {
+      int index = hash_key(k);
+      const node* pNode = &get_node(index);
+
+      if (pNode->state) {
+        if (m_equals(pNode->first, k))
+          return index;
+
+        const int orig_index = index;
+
+        for (;;) {
+          if (!index) {
+            index = m_values.size() - 1;
+            pNode = &get_node(index);
+          } else {
+            index--;
+            pNode--;
+          }
+
+          if (index == orig_index)
+            break;
+
+          if (!pNode->state)
+            break;
+
+          if (m_equals(pNode->first, k))
+            return index;
+        }
+      }
+    }
+
+    return m_values.size();
+  }
+
+  inline bool insert_no_grow(insert_result& result, const Key& k, const Value& v = Value()) {
+    if (!m_values.size())
+      return false;
+
+    int index = hash_key(k);
+    node* pNode = &get_node(index);
+
+    if (pNode->state) {
+      if (m_equals(pNode->first, k)) {
+        result.first = iterator(*this, index);
+        result.second = false;
+        return true;
+      }
+
+      const int orig_index = index;
+
+      for (;;) {
+        if (!index) {
+          index = m_values.size() - 1;
+          pNode = &get_node(index);
+        } else {
+          index--;
+          pNode--;
+        }
+
+        if (orig_index == index)
+          return false;
+
+        if (!pNode->state)
+          break;
+
+        if (m_equals(pNode->first, k)) {
+          result.first = iterator(*this, index);
+          result.second = false;
+          return true;
+        }
+      }
+    }
+
+    if (m_num_valid >= m_grow_threshold)
+      return false;
+
+    construct_value_type(pNode, k, v);
+
+    pNode->state = cStateValid;
+
+    m_num_valid++;
+    CRNLIB_ASSERT(m_num_valid <= m_values.size());
+
+    result.first = iterator(*this, index);
+    result.second = true;
+
+    return true;
+  }
+
+  inline void move_into(node* pNode) {
+    int index = hash_key(pNode->first);
+    node* pDst_node = &get_node(index);
+
+    if (pDst_node->state) {
+      const int orig_index = index;
+
+      for (;;) {
+        if (!index) {
+          index = m_values.size() - 1;
+          pDst_node = &get_node(index);
+        } else {
+          index--;
+          pDst_node--;
+        }
+
+        if (index == orig_index) {
+          CRNLIB_ASSERT(false);
+          return;
+        }
+
+        if (!pDst_node->state)
+          break;
+      }
+    }
+
+    move_node(pDst_node, pNode);
+
+    m_num_valid++;
+  }
+};
+
+template <typename Key, typename Value, typename Hasher, typename Equals>
+struct bitwise_movable<hash_map<Key, Value, Hasher, Equals> > {
+  enum { cFlag = true };
+};
+
+template <typename Key, typename Value, typename Hasher, typename Equals>
+inline void swap(hash_map<Key, Value, Hasher, Equals>& a, hash_map<Key, Value, Hasher, Equals>& b) {
+  a.swap(b);
+}
+
+extern void hash_map_test();
+
+}  // namespace crnlib
@@ -0,0 +1,62 @@
+// File: crn_helpers.h
+// See Copyright Notice and license at the end of inc/crnlib.h
+#pragma once
+
+#define CRNLIB_NO_COPY_OR_ASSIGNMENT_OP(c) \
+  c(const c&);                             \
+  c& operator=(const c&);
+#define CRNLIB_NO_HEAP_ALLOC()       \
+ private:                            \
+  static void* operator new(size_t); \
+  static void* operator new[](size_t);
+
+namespace crnlib {
+namespace helpers {
+template <typename T>
+struct rel_ops {
+  friend bool operator!=(const T& x, const T& y) { return (!(x == y)); }
+  friend bool operator>(const T& x, const T& y) { return (y < x); }
+  friend bool operator<=(const T& x, const T& y) { return (!(y < x)); }
+  friend bool operator>=(const T& x, const T& y) { return (!(x < y)); }
+};
+
+template <typename T>
+inline T* construct(T* p) {
+  return new (static_cast<void*>(p)) T;
+}
+
+template <typename T, typename U>
+inline T* construct(T* p, const U& init) {
+  return new (static_cast<void*>(p)) T(init);
+}
+
+template <typename T>
+inline void construct_array(T* p, uint n) {
+  T* q = p + n;
+  for (; p != q; ++p)
+    new (static_cast<void*>(p)) T;
+}
+
+template <typename T, typename U>
+inline void construct_array(T* p, uint n, const U& init) {
+  T* q = p + n;
+  for (; p != q; ++p)
+    new (static_cast<void*>(p)) T(init);
+}
+
+template <typename T>
+inline void destruct(T* p) {
+  (void)p;
+  p->~T();
+}
+
+template <typename T>
+inline void destruct_array(T* p, uint n) {
+  T* q = p + n;
+  for (; p != q; ++p)
+    p->~T();
+}
+
+}  // namespace helpers
+
+}  // namespace crnlib
@@ -0,0 +1,366 @@
+// File: crn_huffman_codes.cpp
+// See Copyright Notice and license at the end of inc/crnlib.h
+#include "crn_core.h"
+#include "crn_huffman_codes.h"
+
+namespace crnlib {
+struct sym_freq {
+  uint m_freq;
+  uint16 m_left;
+  uint16 m_right;
+
+  inline bool operator<(const sym_freq& other) const {
+    return m_freq > other.m_freq;
+  }
+};
+
+static inline sym_freq* radix_sort_syms(uint num_syms, sym_freq* syms0, sym_freq* syms1) {
+  const uint cMaxPasses = 2;  // Two 8-bit passes: only the low 16 bits of m_freq are sorted (inputs are uint16).
+  uint hist[256 * cMaxPasses];
+
+  memset(hist, 0, sizeof(hist[0]) * 256 * cMaxPasses);
+
+  sym_freq* p = syms0;
+  sym_freq* q = syms0 + (num_syms >> 1) * 2;
+
+  for (; p != q; p += 2) {
+    const uint freq0 = p[0].m_freq;
+    const uint freq1 = p[1].m_freq;
+
+    hist[freq0 & 0xFF]++;
+    hist[256 + ((freq0 >> 8) & 0xFF)]++;
+
+    hist[freq1 & 0xFF]++;
+    hist[256 + ((freq1 >> 8) & 0xFF)]++;
+  }
+
+  if (num_syms & 1) {
+    const uint freq = p->m_freq;
+
+    hist[freq & 0xFF]++;
+    hist[256 + ((freq >> 8) & 0xFF)]++;
+  }
+
+  sym_freq* pCur_syms = syms0;
+  sym_freq* pNew_syms = syms1;
+
+  for (uint pass = 0; pass < cMaxPasses; pass++) {
+    const uint* pHist = &hist[pass << 8];
+
+    uint offsets[256];
+
+    uint cur_ofs = 0;
+    for (uint i = 0; i < 256; i += 2) {
+      offsets[i] = cur_ofs;
+      cur_ofs += pHist[i];
+
+      offsets[i + 1] = cur_ofs;
+      cur_ofs += pHist[i + 1];
+    }
+
+    const uint pass_shift = pass << 3;
+
+    sym_freq* p = pCur_syms;
+    sym_freq* q = pCur_syms + (num_syms >> 1) * 2;
+
+    for (; p != q; p += 2) {
+      uint c0 = p[0].m_freq;
+      uint c1 = p[1].m_freq;
+
+      if (pass) {
+        c0 >>= 8;
+        c1 >>= 8;
+      }
+
+      c0 &= 0xFF;
+      c1 &= 0xFF;
+
+      if (c0 == c1) {
+        uint dst_offset0 = offsets[c0];
+
+        offsets[c0] = dst_offset0 + 2;
+
+        pNew_syms[dst_offset0] = p[0];
+        pNew_syms[dst_offset0 + 1] = p[1];
+      } else {
+        uint dst_offset0 = offsets[c0]++;
+        uint dst_offset1 = offsets[c1]++;
+
+        pNew_syms[dst_offset0] = p[0];
+        pNew_syms[dst_offset1] = p[1];
+      }
+    }
+
+    if (num_syms & 1) {
+      uint c = ((p->m_freq) >> pass_shift) & 0xFF;
+
+      uint dst_offset = offsets[c];
+      offsets[c] = dst_offset + 1;
+
+      pNew_syms[dst_offset] = *p;
+    }
+
+    sym_freq* t = pCur_syms;
+    pCur_syms = pNew_syms;
+    pNew_syms = t;
+  }
+
+#ifdef CRNLIB_ASSERTS_ENABLED
+  uint prev_freq = 0;
+  for (uint i = 0; i < num_syms; i++) {
+    CRNLIB_ASSERT(!(pCur_syms[i].m_freq < prev_freq));
+    prev_freq = pCur_syms[i].m_freq;
+  }
+#endif
+
+  return pCur_syms;
+}
+
+struct huffman_work_tables {
+  enum { cMaxInternalNodes = cHuffmanMaxSupportedSyms };
+
+  sym_freq syms0[cHuffmanMaxSupportedSyms + 1 + cMaxInternalNodes];
+  sym_freq syms1[cHuffmanMaxSupportedSyms + 1 + cMaxInternalNodes];
+
+  uint16 queue[cMaxInternalNodes];
+};
+
+void* create_generate_huffman_codes_tables() {
+  return crnlib_new<huffman_work_tables>();
+}
+
+void free_generate_huffman_codes_tables(void* p) {
+  crnlib_delete(static_cast<huffman_work_tables*>(p));
+}
+
+#if USE_CALCULATE_MINIMUM_REDUNDANCY
+/* calculate_minimum_redundancy() written by
+      Alistair Moffat, alistair@cs.mu.oz.au,
+      Jyrki Katajainen, jyrki@diku.dk
+      November 1996.
+   */
+static void calculate_minimum_redundancy(int A[], int n) {
+  int root; /* next root node to be used */
+  int leaf; /* next leaf to be used */
+  int next; /* next value to be assigned */
+  int avbl; /* number of available nodes */
+  int used; /* number of internal nodes */
+  int dpth; /* current depth of leaves */
+
+  /* check for pathological cases */
+  if (n == 0) {
+    return;
+  }
+  if (n == 1) {
+    A[0] = 0;
+    return;
+  }
+
+  /* first pass, left to right, setting parent pointers */
+  A[0] += A[1];
+  root = 0;
+  leaf = 2;
+  for (next = 1; next < n - 1; next++) {
+    /* select first item for a pairing */
+    if (leaf >= n || A[root] < A[leaf]) {
+      A[next] = A[root];
+      A[root++] = next;
+    } else
+      A[next] = A[leaf++];
+
+    /* add on the second item */
+    if (leaf >= n || (root < next && A[root] < A[leaf])) {
+      A[next] += A[root];
+      A[root++] = next;
+    } else
+      A[next] += A[leaf++];
+  }
+
+  /* second pass, right to left, setting internal depths */
+  A[n - 2] = 0;
+  for (next = n - 3; next >= 0; next--)
+    A[next] = A[A[next]] + 1;
+
+  /* third pass, right to left, setting leaf depths */
+  avbl = 1;
+  used = dpth = 0;
+  root = n - 2;
+  next = n - 1;
+  while (avbl > 0) {
+    while (root >= 0 && A[root] == dpth) {
+      used++;
+      root--;
+    }
+    while (avbl > used) {
+      A[next--] = dpth;
+      avbl--;
+    }
+    avbl = 2 * used;
+    dpth++;
+    used = 0;
+  }
+}
+#endif
+
+bool generate_huffman_codes(void* pContext, uint num_syms, const uint16* pFreq, uint8* pCodesizes, uint& max_code_size, uint& total_freq_ret) {
+  if ((!num_syms) || (num_syms > cHuffmanMaxSupportedSyms))
+    return false;
+
+  huffman_work_tables& state = *static_cast<huffman_work_tables*>(pContext);
+
+  uint max_freq = 0;
+  uint total_freq = 0;
+
+  uint num_used_syms = 0;
+  for (uint i = 0; i < num_syms; i++) {
+    uint freq = pFreq[i];
+
+    if (!freq)
+      pCodesizes[i] = 0;
+    else {
+      total_freq += freq;
+      max_freq = math::maximum(max_freq, freq);
+
+      sym_freq& sf = state.syms0[num_used_syms];
+      sf.m_left = (uint16)i;
+      sf.m_right = cUINT16_MAX;
+      sf.m_freq = freq;
+      num_used_syms++;
+    }
+  }
+
+  total_freq_ret = total_freq;
+
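+  if (num_used_syms == 1) {
+    // A lone symbol gets a 1-bit code; report that size so the caller never reads an unset value.
+    pCodesizes[state.syms0[0].m_left] = 1;
+    max_code_size = 1;
+    return true;
+  }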
+
+  sym_freq* syms = radix_sort_syms(num_used_syms, state.syms0, state.syms1);
+
+#if USE_CALCULATE_MINIMUM_REDUNDANCY
+  // calculate_minimum_redundancy() expects ascending frequencies, so read them from the
+  // sorted array returned by radix_sort_syms(), which may be either syms0 or syms1.
+  int x[cHuffmanMaxSupportedSyms];
+  for (uint i = 0; i < num_used_syms; i++)
+    x[i] = syms[i].m_freq;
+
+  calculate_minimum_redundancy(x, num_used_syms);
+
+  uint max_len = 0;
+  for (uint i = 0; i < num_used_syms; i++) {
+    uint len = x[i];
+    max_len = math::maximum(len, max_len);
+    pCodesizes[syms[i].m_left] = static_cast<uint8>(len);
+  }
+
+  max_code_size = max_len;
+
+  return true;
+#else
+  // Dummy node
+  sym_freq& sf = state.syms0[num_used_syms];
+  sf.m_left = cUINT16_MAX;
+  sf.m_right = cUINT16_MAX;
+  sf.m_freq = UINT_MAX;
+
+  uint next_internal_node = num_used_syms + 1;
+
+  uint queue_front = 0;
+  uint queue_end = 0;
+
+  uint next_lowest_sym = 0;
+
+  uint num_nodes_remaining = num_used_syms;
+  do {
+    uint left_freq = syms[next_lowest_sym].m_freq;
+    uint left_child = next_lowest_sym;
+
+    if ((queue_end > queue_front) && (syms[state.queue[queue_front]].m_freq < left_freq)) {
+      left_child = state.queue[queue_front];
+      left_freq = syms[left_child].m_freq;
+
+      queue_front++;
+    } else
+      next_lowest_sym++;
+
+    uint right_freq = syms[next_lowest_sym].m_freq;
+    uint right_child = next_lowest_sym;
+
+    if ((queue_end > queue_front) && (syms[state.queue[queue_front]].m_freq < right_freq)) {
+      right_child = state.queue[queue_front];
+      right_freq = syms[right_child].m_freq;
+
+      queue_front++;
+    } else
+      next_lowest_sym++;
+
+    const uint internal_node_index = next_internal_node;
+    next_internal_node++;
+
+    CRNLIB_ASSERT(next_internal_node < CRNLIB_ARRAYSIZE(state.syms0));
+
+    syms[internal_node_index].m_freq = left_freq + right_freq;
+    syms[internal_node_index].m_left = static_cast<uint16>(left_child);
+    syms[internal_node_index].m_right = static_cast<uint16>(right_child);
+
+    CRNLIB_ASSERT(queue_end < huffman_work_tables::cMaxInternalNodes);
+    state.queue[queue_end] = static_cast<uint16>(internal_node_index);
+    queue_end++;
+
+    num_nodes_remaining--;
+
+  } while (num_nodes_remaining > 1);
+
+  CRNLIB_ASSERT(next_lowest_sym == num_used_syms);
+  CRNLIB_ASSERT((queue_end - queue_front) == 1);
+
+  uint cur_node_index = state.queue[queue_front];
+
+  uint32* pStack = (syms == state.syms0) ? (uint32*)state.syms1 : (uint32*)state.syms0;
+  uint32* pStack_top = pStack;
+
+  uint max_level = 0;
+
+  for (;;) {
+    uint level = cur_node_index >> 16;
+    uint node_index = cur_node_index & 0xFFFF;
+
+    uint left_child = syms[node_index].m_left;
+    uint right_child = syms[node_index].m_right;
+
+    uint next_level = (cur_node_index + 0x10000) & 0xFFFF0000;
+
+    if (left_child < num_used_syms) {
+      max_level = math::maximum(max_level, level);
+
+      pCodesizes[syms[left_child].m_left] = static_cast<uint8>(level + 1);
+
+      if (right_child < num_used_syms) {
+        pCodesizes[syms[right_child].m_left] = static_cast<uint8>(level + 1);
+
+        if (pStack == pStack_top)
+          break;
+        cur_node_index = *--pStack;
+      } else {
+        cur_node_index = next_level | right_child;
+      }
+    } else {
+      if (right_child < num_used_syms) {
+        max_level = math::maximum(max_level, level);
+
+        pCodesizes[syms[right_child].m_left] = static_cast<uint8>(level + 1);
+
+        cur_node_index = next_level | left_child;
+      } else {
+        *pStack++ = next_level | left_child;
+
+        cur_node_index = next_level | right_child;
+      }
+    }
+  }
+
+  max_code_size = max_level + 1;
+#endif
+
+  return true;
+}
+
+}  // namespace crnlib
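The three passes of `calculate_minimum_redundancy()` above can be tried in isolation. The block below is a minimal standalone sketch of the same in-place length computation, assuming the input is already sorted in ascending order (which is what `radix_sort_syms()` provides in the real code). The name `minimum_redundancy_lengths` and the use of `std::vector` are illustrative assumptions and are not part of crnlib.

```cpp
#include <cassert>
#include <vector>

// Sketch of the in-place Moffat-Katajainen code length computation.
// On entry a[] holds symbol frequencies in ascending order;
// on exit a[i] is the code length of the i-th (sorted) symbol.
// Hypothetical helper, not a crnlib function.
inline void minimum_redundancy_lengths(std::vector<int>& a) {
  const int n = static_cast<int>(a.size());
  if (n == 0) return;
  if (n == 1) { a[0] = 0; return; }

  // Pass 1: repeatedly merge the two cheapest nodes, storing parent links.
  a[0] += a[1];
  int root = 0, leaf = 2;
  for (int next = 1; next < n - 1; next++) {
    if (leaf >= n || a[root] < a[leaf]) {
      a[next] = a[root];
      a[root++] = next;
    } else {
      a[next] = a[leaf++];
    }
    if (leaf >= n || (root < next && a[root] < a[leaf])) {
      a[next] += a[root];
      a[root++] = next;
    } else {
      a[next] += a[leaf++];
    }
  }

  // Pass 2: turn parent links into internal node depths.
  a[n - 2] = 0;
  for (int next = n - 3; next >= 0; next--)
    a[next] = a[a[next]] + 1;

  // Pass 3: assign leaf depths, deepest leaves to the rarest symbols.
  int avail = 1, used = 0, depth = 0;
  root = n - 2;
  int next = n - 1;
  while (avail > 0) {
    while (root >= 0 && a[root] == depth) { used++; root--; }
    while (avail > used) { a[next--] = depth; avail--; }
    avail = 2 * used;
    depth++;
    used = 0;
  }
}
```

Note that this routine yields a length of 0 for a single symbol, whereas `generate_huffman_codes()` special-cases that input and assigns a 1-bit code.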
@@ -0,0 +1,13 @@
+// File: crn_huffman_codes.h
+// See Copyright Notice and license at the end of inc/crnlib.h
+#pragma once
+
+namespace crnlib {
+const uint cHuffmanMaxSupportedSyms = 8192;
+
+void* create_generate_huffman_codes_tables();
+void free_generate_huffman_codes_tables(void* p);
+
+bool generate_huffman_codes(void* pContext, uint num_syms, const uint16* pFreq, uint8* pCodesizes, uint& max_code_size, uint& total_freq_ret);
+
+}  // namespace crnlib
@@ -0,0 +1,635 @@
+// File: crn_image.h
+// See Copyright Notice and license at the end of inc/crnlib.h
+#pragma once
+#include "crn_color.h"
+#include "crn_vec.h"
+#include "crn_pixel_format.h"
+#include "crn_rect.h"
+
+namespace crnlib {
+template <typename color_type>
+class image {
+ public:
+  typedef color_type color_t;
+
+  typedef crnlib::vector<color_type> pixel_buf_t;
+
+  image()
+      : m_width(0),
+        m_height(0),
+        m_pitch(0),
+        m_total(0),
+        m_comp_flags(pixel_format_helpers::cDefaultCompFlags),
+        m_pPixels(NULL) {
+  }
+
+  // pitch is in PIXELS, not bytes.
+  image(uint width, uint height, uint pitch = UINT_MAX, const color_type& background = color_type::make_black(), uint flags = pixel_format_helpers::cDefaultCompFlags)
+      : m_comp_flags(flags) {
+    CRNLIB_ASSERT((width > 0) && (height > 0));
+    if (pitch == UINT_MAX)
+      pitch = width;
+
+    m_pixel_buf.resize(pitch * height);
+
+    m_width = width;
+    m_height = height;
+    m_pitch = pitch;
+    m_total = m_pitch * m_height;
+
+    m_pPixels = &m_pixel_buf.front();
+
+    set_all(background);
+  }
+
+  // pitch is in PIXELS, not bytes.
+  image(color_type* pPixels, uint width, uint height, uint pitch = UINT_MAX, uint flags = pixel_format_helpers::cDefaultCompFlags) {
+    alias(pPixels, width, height, pitch, flags);
+  }
+
+  image& operator=(const image& other) {
+    if (this == &other)
+      return *this;
+
+    if (other.m_pixel_buf.empty()) {
+      // This doesn't look very safe - let's make a new instance.
+      //m_pixel_buf.clear();
+      //m_pPixels = other.m_pPixels;
+
+      const uint total_pixels = other.m_pitch * other.m_height;
+      if ((total_pixels) && (other.m_pPixels)) {
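+        m_pixel_buf.resize(total_pixels);
+        // Copy straight into the resized buffer; inserting at index 0 would grow it past total_pixels.
+        memcpy(&m_pixel_buf.front(), other.m_pPixels, total_pixels * sizeof(color_type));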
+        m_pPixels = &m_pixel_buf.front();
+      } else {
+        m_pixel_buf.clear();
+        m_pPixels = NULL;
+      }
+    } else {
+      m_pixel_buf = other.m_pixel_buf;
+      m_pPixels = &m_pixel_buf.front();
+    }
+
+    m_width = other.m_width;
+    m_height = other.m_height;
+    m_pitch = other.m_pitch;
+    m_total = other.m_total;
+    m_comp_flags = other.m_comp_flags;
+
+    return *this;
+  }
+
+  image(const image& other)
+      : m_width(0), m_height(0), m_pitch(0), m_total(0), m_comp_flags(pixel_format_helpers::cDefaultCompFlags), m_pPixels(NULL) {
+    *this = other;
+  }
+
+  // pitch is in PIXELS, not bytes.
+  void alias(color_type* pPixels, uint width, uint height, uint pitch = UINT_MAX, uint flags = pixel_format_helpers::cDefaultCompFlags) {
+    m_pixel_buf.clear();
+
+    m_pPixels = pPixels;
+
+    m_width = width;
+    m_height = height;
+    m_pitch = (pitch == UINT_MAX) ? width : pitch;
+    m_total = m_pitch * m_height;
+    m_comp_flags = flags;
+  }
+
+  // pitch is in PIXELS, not bytes.
+  bool grant_ownership(color_type* pPixels, uint width, uint height, uint pitch = UINT_MAX, uint flags = pixel_format_helpers::cDefaultCompFlags) {
+    if (pitch == UINT_MAX)
+      pitch = width;
+
+    if ((!pPixels) || (!width) || (!height) || (pitch < width)) {
+      CRNLIB_ASSERT(0);
+      return false;
+    }
+
+    if (pPixels == get_ptr()) {
+      CRNLIB_ASSERT(0);
+      return false;
+    }
+
+    clear();
+
+    if (!m_pixel_buf.grant_ownership(pPixels, height * pitch, height * pitch))
+      return false;
+
+    m_pPixels = pPixels;
+
+    m_width = width;
+    m_height = height;
+    m_pitch = pitch;
+    m_total = pitch * height;
+    m_comp_flags = flags;
+
+    return true;
+  }
+
+  void clear() {
+    m_pPixels = NULL;
+    m_pixel_buf.clear();
+    m_width = 0;
+    m_height = 0;
+    m_pitch = 0;
+    m_total = 0;
+    m_comp_flags = pixel_format_helpers::cDefaultCompFlags;
+  }
+
+  inline bool is_valid() const { return m_total > 0; }
+
+  inline pixel_format_helpers::component_flags get_comp_flags() const { return static_cast<pixel_format_helpers::component_flags>(m_comp_flags); }
+  inline void set_comp_flags(pixel_format_helpers::component_flags new_flags) { m_comp_flags = new_flags; }
+  inline void reset_comp_flags() { m_comp_flags = pixel_format_helpers::cDefaultCompFlags; }
+
+  inline bool is_component_valid(uint index) const {
+    CRNLIB_ASSERT(index < 4U);
+    return utils::is_flag_set(m_comp_flags, index);
+  }
+  inline void set_component_valid(uint index, bool state) {
+    CRNLIB_ASSERT(index < 4U);
+    utils::set_flag(m_comp_flags, index, state);
+  }
+
+  inline bool has_rgb() const { return is_component_valid(0) || is_component_valid(1) || is_component_valid(2); }
+  inline bool has_alpha() const { return is_component_valid(3); }
+
+  inline bool is_grayscale() const { return utils::is_bit_set(m_comp_flags, pixel_format_helpers::cCompFlagGrayscale); }
+  inline void set_grayscale(bool state) { utils::set_bit(m_comp_flags, pixel_format_helpers::cCompFlagGrayscale, state); }
+
+  void set_all(const color_type& c) {
+    for (uint i = 0; i < m_total; i++)
+      m_pPixels[i] = c;
+  }
+
+  void flip_x() {
+    const uint half_width = m_width / 2;
+    for (uint y = 0; y < m_height; y++) {
+      for (uint x = 0; x < half_width; x++) {
+        color_type c((*this)(x, y));
+        (*this)(x, y) = (*this)(m_width - 1 - x, y);
+        (*this)(m_width - 1 - x, y) = c;
+      }
+    }
+  }
+
+  void flip_y() {
+    const uint half_height = m_height / 2;
+    for (uint y = 0; y < half_height; y++) {
+      for (uint x = 0; x < m_width; x++) {
+        color_type c((*this)(x, y));
+        (*this)(x, y) = (*this)(x, m_height - 1 - y);
+        (*this)(x, m_height - 1 - y) = c;
+      }
+    }
+  }
+
+  void convert_to_grayscale() {
+    for (uint y = 0; y < m_height; y++)
+      for (uint x = 0; x < m_width; x++) {
+        color_type c((*this)(x, y));
+        typename color_type::component_t l = static_cast<typename color_type::component_t>(c.get_luma());
+        c.r = l;
+        c.g = l;
+        c.b = l;
+        (*this)(x, y) = c;
+      }
+
+    set_grayscale(true);
+  }
+
+  void swizzle(uint r, uint g, uint b, uint a) {
+    for (uint y = 0; y < m_height; y++)
+      for (uint x = 0; x < m_width; x++) {
+        const color_type& c = (*this)(x, y);
+
+        (*this)(x, y) = color_type(c[r], c[g], c[b], c[a]);
+      }
+  }
+
+  void set_alpha_to_luma() {
+    for (uint y = 0; y < m_height; y++)
+      for (uint x = 0; x < m_width; x++) {
+        color_type c((*this)(x, y));
+        typename color_type::component_t l = static_cast<typename color_type::component_t>(c.get_luma());
+        c.a = l;
+        (*this)(x, y) = c;
+      }
+
+    set_component_valid(3, true);
+  }
+
+  bool extract_block(color_type* pDst, uint x, uint y, uint w, uint h, bool flip_xy = false) const {
+    if ((x >= m_width) || (y >= m_height)) {
+      CRNLIB_ASSERT(0);
+      return false;
+    }
+
+    if (flip_xy) {
+      for (uint y_ofs = 0; y_ofs < h; y_ofs++)
+        for (uint x_ofs = 0; x_ofs < w; x_ofs++)
+          pDst[x_ofs * h + y_ofs] = get_clamped(x_ofs + x, y_ofs + y);  // 5/4/12 - this was incorrectly x_ofs * 4
+    } else if (((x + w) > m_width) || ((y + h) > m_height)) {
+      for (uint y_ofs = 0; y_ofs < h; y_ofs++)
+        for (uint x_ofs = 0; x_ofs < w; x_ofs++)
+          *pDst++ = get_clamped(x_ofs + x, y_ofs + y);
+    } else {
+      const color_type* pSrc = get_scanline(y) + x;
+
+      for (uint i = h; i; i--) {
+        memcpy(pDst, pSrc, w * sizeof(color_type));
+        pDst += w;
+
+        pSrc += m_pitch;
+      }
+    }
+
+    return true;
+  }
+
+  // No clipping!
+  void unclipped_fill_box(uint x, uint y, uint w, uint h, const color_type& c) {
+    if (((x + w) > m_width) || ((y + h) > m_height)) {
+      CRNLIB_ASSERT(0);
+      return;
+    }
+
+    color_type* p = get_scanline(y) + x;
+
+    for (uint i = h; i; i--) {
+      color_type* q = p;
+      for (uint j = w; j; j--)
+        *q++ = c;
+      p += m_pitch;
+    }
+  }
+
+  void draw_rect(int x, int y, uint width, uint height, const color_type& c) {
+    draw_line(x, y, x + width - 1, y, c);
+    draw_line(x, y, x, y + height - 1, c);
+    draw_line(x + width - 1, y, x + width - 1, y + height - 1, c);
+    draw_line(x, y + height - 1, x + width - 1, y + height - 1, c);
+  }
+
+  // No clipping!
+  bool unclipped_blit(uint src_x, uint src_y, uint src_w, uint src_h, uint dst_x, uint dst_y, const image& src) {
+    if ((!is_valid()) || (!src.is_valid())) {
+      CRNLIB_ASSERT(0);
+      return false;
+    }
+
+    if (((src_x + src_w) > src.get_width()) || ((src_y + src_h) > src.get_height())) {
+      CRNLIB_ASSERT(0);
+      return false;
+    }
+
+    if (((dst_x + src_w) > get_width()) || ((dst_y + src_h) > get_height())) {
+      CRNLIB_ASSERT(0);
+      return false;
+    }
+
+    const color_type* pS = &src(src_x, src_y);
+    color_type* pD = &(*this)(dst_x, dst_y);
+
+    const uint bytes_to_copy = src_w * sizeof(color_type);
+    for (uint i = src_h; i; i--) {
+      memcpy(pD, pS, bytes_to_copy);
+
+      pS += src.get_pitch();
+      pD += get_pitch();
+    }
+
+    return true;
+  }
+
+  // With clipping.
+  bool blit(int dst_x, int dst_y, const image& src) {
+    if ((!is_valid()) || (!src.is_valid())) {
+      CRNLIB_ASSERT(0);
+      return false;
+    }
+
+    int src_x = 0;
+    int src_y = 0;
+
+    if (dst_x < 0) {
+      src_x = -dst_x;
+      if (src_x >= static_cast<int>(src.get_width()))
+        return false;
+      dst_x = 0;
+    }
+
+    if (dst_y < 0) {
+      src_y = -dst_y;
+      if (src_y >= static_cast<int>(src.get_height()))
+        return false;
+      dst_y = 0;
+    }
+
+    if ((dst_x >= (int)m_width) || (dst_y >= (int)m_height))
+      return false;
+
+    uint width = math::minimum(m_width - dst_x, src.get_width() - src_x);
+    uint height = math::minimum(m_height - dst_y, src.get_height() - src_y);
+
+    bool success = unclipped_blit(src_x, src_y, width, height, dst_x, dst_y, src);
+    (void)success;
+    CRNLIB_ASSERT(success);
+
+    return true;
+  }
+
+  // With clipping.
+  bool blit(int src_x, int src_y, int src_w, int src_h, int dst_x, int dst_y, const image& src) {
+    if ((!is_valid()) || (!src.is_valid())) {
+      CRNLIB_ASSERT(0);
+      return false;
+    }
+
+    rect src_rect(src_x, src_y, src_x + src_w, src_y + src_h);
+    if (!src_rect.intersect(src.get_bounds()))
+      return false;
+
+    rect dst_rect(dst_x, dst_y, dst_x + src_rect.get_width(), dst_y + src_rect.get_height());
+    if (!dst_rect.intersect(get_bounds()))
+      return false;
+
+    bool success = unclipped_blit(
+        src_rect.get_left(), src_rect.get_top(),
+        math::minimum(src_rect.get_width(), dst_rect.get_width()), math::minimum(src_rect.get_height(), dst_rect.get_height()),
+        dst_rect.get_left(), dst_rect.get_top(), src);
+    (void)success;
+    CRNLIB_ASSERT(success);
+
+    return true;
+  }
+
+  // In-place resize of image dimensions (cropping).
+  bool resize(uint new_width, uint new_height, uint new_pitch = UINT_MAX, const color_type background = color_type::make_black()) {
+    if (new_pitch == UINT_MAX)
+      new_pitch = new_width;
+
+    if ((new_width == m_width) && (new_height == m_height) && (new_pitch == m_pitch))
+      return true;
+
+    if ((!new_width) || (!new_height) || (!new_pitch)) {
+      clear();
+      return false;
+    }
+
+    pixel_buf_t existing_pixels;
+    existing_pixels.swap(m_pixel_buf);
+
+    if (!m_pixel_buf.try_resize(new_height * new_pitch)) {
+      clear();
+      return false;
+    }
+
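+    // m_pPixels still addresses the old pixels here: owned storage now lives in existing_pixels,
+    // and aliased storage was never held by m_pixel_buf. Reading through m_pPixels covers both cases.
+    for (uint y = 0; y < new_height; y++) {
+      for (uint x = 0; x < new_width; x++) {
+        if ((x < m_width) && (y < m_height))
+          m_pixel_buf[x + y * new_pitch] = m_pPixels[x + y * m_pitch];
+        else
+          m_pixel_buf[x + y * new_pitch] = background;
+      }
+    }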
+
+    m_width = new_width;
+    m_height = new_height;
+    m_pitch = new_pitch;
+    m_total = new_pitch * new_height;
+    m_pPixels = &m_pixel_buf.front();
+
+    return true;
+  }
+
+  inline uint get_width() const { return m_width; }
+  inline uint get_height() const { return m_height; }
+  inline uint get_total_pixels() const { return m_width * m_height; }
+
+  inline rect get_bounds() const { return rect(0, 0, m_width, m_height); }
+
+  inline uint get_pitch() const { return m_pitch; }
+  inline uint get_pitch_in_bytes() const { return m_pitch * sizeof(color_type); }
+
+  // Returns pitch * height, NOT width * height!
+  inline uint get_total() const { return m_total; }
+
+  inline uint get_block_width(uint block_size) const { return (m_width + block_size - 1) / block_size; }
+  inline uint get_block_height(uint block_size) const { return (m_height + block_size - 1) / block_size; }
+  inline uint get_total_blocks(uint block_size) const { return get_block_width(block_size) * get_block_height(block_size); }
+
+  inline uint get_size_in_bytes() const { return sizeof(color_type) * m_total; }
+
+  inline const color_type* get_pixels() const { return m_pPixels; }
+  inline color_type* get_pixels() { return m_pPixels; }
+
+  inline const color_type& operator()(uint x, uint y) const {
+    CRNLIB_ASSERT((x < m_width) && (y < m_height));
+    return m_pPixels[x + y * m_pitch];
+  }
+
+  inline color_type& operator()(uint x, uint y) {
+    CRNLIB_ASSERT((x < m_width) && (y < m_height));
+    return m_pPixels[x + y * m_pitch];
+  }
+
+  inline const color_type& get_unclamped(uint x, uint y) const {
+    CRNLIB_ASSERT((x < m_width) && (y < m_height));
+    return m_pPixels[x + y * m_pitch];
+  }
+
+  inline const color_type& get_clamped(int x, int y) const {
+    x = math::clamp<int>(x, 0, m_width - 1);
+    y = math::clamp<int>(y, 0, m_height - 1);
+    return m_pPixels[x + y * m_pitch];
+  }
+
+  // Sample image with bilinear filtering.
+  // (x,y) - Continuous coordinates, where pixel centers are at (.5,.5), valid image coords are [0,width] and [0,height].
+  void get_filtered(float x, float y, color_type& result) const {
+    x -= .5f;
+    y -= .5f;
+
+    int ix = (int)floor(x);
+    int iy = (int)floor(y);
+    float wx = x - ix;
+    float wy = y - iy;
+
+    color_type a(get_clamped(ix, iy));
+    color_type b(get_clamped(ix + 1, iy));
+    color_type c(get_clamped(ix, iy + 1));
+    color_type d(get_clamped(ix + 1, iy + 1));
+
+    for (uint i = 0; i < 4; i++) {
+      double top = math::lerp<double>(a[i], b[i], wx);
+      double bot = math::lerp<double>(c[i], d[i], wx);
+      double m = math::lerp<double>(top, bot, wy);
+
+      if (!color_type::component_traits::cFloat)
+        m += .5f;
+
+      result.set_component(i, static_cast<typename color_type::parameter_t>(m));
+    }
+  }
+
+  void get_filtered(float x, float y, vec4F& result) const {
+    x -= .5f;
+    y -= .5f;
+
+    int ix = (int)floor(x);
+    int iy = (int)floor(y);
+    float wx = x - ix;
+    float wy = y - iy;
+
+    color_type a(get_clamped(ix, iy));
+    color_type b(get_clamped(ix + 1, iy));
+    color_type c(get_clamped(ix, iy + 1));
+    color_type d(get_clamped(ix + 1, iy + 1));
+
+    for (uint i = 0; i < 4; i++) {
+      float top = math::lerp<float>(a[i], b[i], wx);
+      float bot = math::lerp<float>(c[i], d[i], wx);
+      float m = math::lerp<float>(top, bot, wy);
+
+      result[i] = m;
+    }
+  }
+
+  inline void set_pixel_unclipped(uint x, uint y, const color_type& c) {
+    CRNLIB_ASSERT((x < m_width) && (y < m_height));
+    m_pPixels[x + y * m_pitch] = c;
+  }
+
+  inline void set_pixel_clipped(int x, int y, const color_type& c) {
+    if ((static_cast<uint>(x) >= m_width) || (static_cast<uint>(y) >= m_height))
+      return;
+
+    m_pPixels[x + y * m_pitch] = c;
+  }
+
+  inline const color_type* get_scanline(uint y) const {
+    CRNLIB_ASSERT(y < m_height);
+    return &m_pPixels[y * m_pitch];
+  }
+
+  inline color_type* get_scanline(uint y) {
+    CRNLIB_ASSERT(y < m_height);
+    return &m_pPixels[y * m_pitch];
+  }
+
+  inline const color_type* get_ptr() const {
+    return m_pPixels;
+  }
+
+  inline color_type* get_ptr() {
+    return m_pPixels;
+  }
+
+  inline void swap(image& other) {
+    utils::swap(m_width, other.m_width);
+    utils::swap(m_height, other.m_height);
+    utils::swap(m_pitch, other.m_pitch);
+    utils::swap(m_total, other.m_total);
+    utils::swap(m_comp_flags, other.m_comp_flags);
+    utils::swap(m_pPixels, other.m_pPixels);
+    m_pixel_buf.swap(other.m_pixel_buf);
+  }
+
+  void draw_line(int xs, int ys, int xe, int ye, const color_type& color) {
+    if (xs > xe) {
+      utils::swap(xs, xe);
+      utils::swap(ys, ye);
+    }
+
+    int dx = xe - xs, dy = ye - ys;
+    if (!dx) {
+      if (ys > ye)
+        utils::swap(ys, ye);
+      for (int i = ys; i <= ye; i++)
+        set_pixel_clipped(xs, i, color);
+    } else if (!dy) {
+      // Include the end point, matching the vertical and sloped cases.
+      for (int i = xs; i <= xe; i++)
+        set_pixel_clipped(i, ys, color);
+    } else if (dy > 0) {
+      if (dy <= dx) {
+        int e = 2 * dy - dx, e_no_inc = 2 * dy, e_inc = 2 * (dy - dx);
+        rasterize_line(xs, ys, xe, ye, 0, 1, e, e_inc, e_no_inc, color);
+      } else {
+        int e = 2 * dx - dy, e_no_inc = 2 * dx, e_inc = 2 * (dx - dy);
+        rasterize_line(xs, ys, xe, ye, 1, 1, e, e_inc, e_no_inc, color);
+      }
+    } else {
+      dy = -dy;
+      if (dy <= dx) {
+        int e = 2 * dy - dx, e_no_inc = 2 * dy, e_inc = 2 * (dy - dx);
+        rasterize_line(xs, ys, xe, ye, 0, -1, e, e_inc, e_no_inc, color);
+      } else {
+        int e = 2 * dx - dy, e_no_inc = (2 * dx), e_inc = 2 * (dx - dy);
+        rasterize_line(xe, ye, xs, ys, 1, -1, e, e_inc, e_no_inc, color);
+      }
+    }
+  }
+
+  const pixel_buf_t& get_pixel_buf() const { return m_pixel_buf; }
+  pixel_buf_t& get_pixel_buf() { return m_pixel_buf; }
+
+ private:
+  uint m_width;
+  uint m_height;
+  uint m_pitch;
+  uint m_total;
+  uint m_comp_flags;
+
+  color_type* m_pPixels;
+
+  pixel_buf_t m_pixel_buf;
+
+  void rasterize_line(int xs, int ys, int xe, int ye, int pred, int inc_dec, int e, int e_inc, int e_no_inc, const color_type& color) {
+    int start, end, var;
+
+    if (pred) {
+      start = ys;
+      end = ye;
+      var = xs;
+      for (int i = start; i <= end; i++) {
+        set_pixel_clipped(var, i, color);
+        if (e < 0)
+          e += e_no_inc;
+        else {
+          var += inc_dec;
+          e += e_inc;
+        }
+      }
+    } else {
+      start = xs;
+      end = xe;
+      var = ys;
+      for (int i = start; i <= end; i++) {
+        set_pixel_clipped(i, var, color);
+        if (e < 0)
+          e += e_no_inc;
+        else {
+          var += inc_dec;
+          e += e_inc;
+        }
+      }
+    }
+  }
+};
+
+typedef image<color_quad_u8> image_u8;
+typedef image<color_quad_i16> image_i16;
+typedef image<color_quad_u16> image_u16;
+typedef image<color_quad_i32> image_i32;
+typedef image<color_quad_u32> image_u32;
+typedef image<color_quad_f> image_f;
+
+template <typename color_type>
+inline void swap(image<color_type>& a, image<color_type>& b) {
+  a.swap(b);
+}
+
+}  // namespace crnlib
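The `image` class above addresses pixels as `x + y * m_pitch`, with the pitch counted in pixels, and `extract_block()` falls back to `get_clamped()` when a block overhangs the image edge. The block below is a minimal standalone sketch of those two ideas, assuming 32-bit pixels. `tiny_image` and this free-standing `extract_block` are hypothetical stand-ins, not crnlib types.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for crnlib::image: rows are `pitch` pixels apart,
// and only the first `width` pixels of each row are visible.
struct tiny_image {
  unsigned width, height, pitch;  // pitch is in pixels, not bytes
  std::vector<std::uint32_t> pixels;

  tiny_image(unsigned w, unsigned h, unsigned p)
      : width(w), height(h), pitch(p), pixels(static_cast<std::size_t>(p) * h, 0u) {}

  std::uint32_t& at(unsigned x, unsigned y) { return pixels[x + y * pitch]; }

  // Out-of-range coordinates are pulled back to the nearest edge pixel.
  std::uint32_t get_clamped(int x, int y) const {
    x = std::min(std::max(x, 0), static_cast<int>(width) - 1);
    y = std::min(std::max(y, 0), static_cast<int>(height) - 1);
    return pixels[static_cast<unsigned>(x) + static_cast<unsigned>(y) * pitch];
  }
};

// Copies a w x h block starting at (x, y) in row-major order,
// replicating edge pixels where the block overhangs the image.
inline std::vector<std::uint32_t> extract_block(const tiny_image& img, unsigned x, unsigned y,
                                                unsigned w, unsigned h) {
  std::vector<std::uint32_t> out;
  out.reserve(static_cast<std::size_t>(w) * h);
  for (unsigned y_ofs = 0; y_ofs < h; y_ofs++)
    for (unsigned x_ofs = 0; x_ofs < w; x_ofs++)
      out.push_back(img.get_clamped(static_cast<int>(x + x_ofs), static_cast<int>(y + y_ofs)));
  return out;
}
```

Edge replication is what lets a block compressor process images whose dimensions are not multiples of the block size. The real class additionally uses a `memcpy` fast path when the block lies fully inside the image.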
@@ -0,0 +1,183 @@
+// File: crn_image_utils.h
+// See Copyright Notice and license at the end of inc/crnlib.h
+#pragma once
+#include "crn_image.h"
+#include "crn_data_stream_serializer.h"
+
+namespace crnlib {
+enum pixel_format;
+
+namespace image_utils {
+enum read_flags_t {
+  cReadFlagForceSTB = 1,
+
+  cReadFlagsAllFlags = 1
+};
+
+bool read_from_stream_stb(data_stream_serializer& serializer, image_u8& img);
+bool read_from_stream_jpgd(data_stream_serializer& serializer, image_u8& img);
+bool read_from_stream(image_u8& dest, data_stream_serializer& serializer, uint read_flags = 0);
+bool read_from_file(image_u8& dest, const char* pFilename, uint read_flags = 0);
+
+// Reads a texture from memory; results are returned stb_image.c style.
+// *pActualComps is set to 1, 3, or 4. req_comps must range from 1-4.
+uint8* read_from_memory(const uint8* pImage, int nSize, int* pWidth, int* pHeight, int* pActualComps, int req_comps, const char* pFilename);
+
+enum {
+  cWriteFlagIgnoreAlpha = 0x00000001,
+  cWriteFlagGrayscale = 0x00000002,
+
+  cWriteFlagJPEGH1V1 = 0x00010000,
+  cWriteFlagJPEGH2V1 = 0x00020000,
+  cWriteFlagJPEGH2V2 = 0x00040000,
+  cWriteFlagJPEGTwoPass = 0x00080000,
+  cWriteFlagJPEGNoChromaDiscrim = 0x00100000,
+  cWriteFlagJPEGQualityLevelMask = 0xFF000000,
+  cWriteFlagJPEGQualityLevelShift = 24,
+};
+
+const int cLumaComponentIndex = -1;
+
+inline uint create_jpeg_write_flags(uint base_flags, uint quality_level) {
+  CRNLIB_ASSERT(quality_level <= 100);
+  return base_flags | ((quality_level << cWriteFlagJPEGQualityLevelShift) & cWriteFlagJPEGQualityLevelMask);
+}
+
+bool write_to_file(const char* pFilename, const image_u8& img, uint write_flags = 0, int grayscale_comp_index = cLumaComponentIndex);
+
+bool has_alpha(const image_u8& img);
+bool is_normal_map(const image_u8& img, const char* pFilename = NULL);
+void renorm_normal_map(image_u8& img);
+
+struct resample_params {
+  resample_params()
+      : m_dst_width(0),
+        m_dst_height(0),
+        m_pFilter("lanczos4"),
+        m_filter_scale(1.0f),
+        m_srgb(true),
+        m_wrapping(false),
+        m_first_comp(0),
+        m_num_comps(4),
+        m_source_gamma(2.2f),  // 1.75f
+        m_multithreaded(true) {
+  }
+
+  uint m_dst_width;
+  uint m_dst_height;
+  const char* m_pFilter;
+  float m_filter_scale;
+  bool m_srgb;
+  bool m_wrapping;
+  uint m_first_comp;
+  uint m_num_comps;
+  float m_source_gamma;
+  bool m_multithreaded;
+};
+
+bool resample_single_thread(const image_u8& src, image_u8& dst, const resample_params& params);
+bool resample_multithreaded(const image_u8& src, image_u8& dst, const resample_params& params);
+bool resample(const image_u8& src, image_u8& dst, const resample_params& params);
+
+bool compute_delta(image_u8& dest, image_u8& a, image_u8& b, uint scale = 2);
+
+class error_metrics {
+ public:
+  error_metrics() { utils::zero_this(this); }
+
+  void print(const char* pName) const;
+
+  // If num_channels==0, luma error is computed.
+  bool compute(const image_u8& a, const image_u8& b, uint first_channel, uint num_channels, bool average_component_error = true);
+
+  uint mMax;
+  double mMean;
+  double mMeanSquared;
+  double mRootMeanSquared;
+  double mPeakSNR;
+
+  inline bool operator==(const error_metrics& other) const {
+    return mPeakSNR == other.mPeakSNR;
+  }
+
+  inline bool operator<(const error_metrics& other) const {
+    return mPeakSNR < other.mPeakSNR;
+  }
+
+  inline bool operator>(const error_metrics& other) const {
+    return mPeakSNR > other.mPeakSNR;
+  }
+};
+
+void print_image_metrics(const image_u8& src_img, const image_u8& dst_img);
+
+double compute_block_ssim(uint n, const uint8* pX, const uint8* pY);
+double compute_ssim(const image_u8& a, const image_u8& b, int channel_index);
+void print_ssim(const image_u8& src_img, const image_u8& dst_img);
+
+enum conversion_type {
+  cConversion_Invalid = -1,
+
+  cConversion_To_CCxY,
+  cConversion_From_CCxY,
+
+  cConversion_To_xGxR,
+  cConversion_From_xGxR,
+
+  cConversion_To_xGBR,
+  cConversion_From_xGBR,
+
+  cConversion_To_AGBR,
+  cConversion_From_AGBR,
+
+  cConversion_XY_to_XYZ,
+
+  cConversion_Y_To_A,
+
+  cConversion_A_To_RGBA,
+  cConversion_Y_To_RGB,
+
+  cConversion_To_Y,
+
+  cConversionTotal
+};
+
+void convert_image(image_u8& img, conversion_type conv_type);
+
+template <typename image_type>
+inline uint8* pack_image(const image_type& img, const pixel_packer& packer, uint& n) {
+  n = 0;
+
+  if (!packer.is_valid())
+    return NULL;
+
+  const uint width = img.get_width(), height = img.get_height();
+  uint dst_pixel_stride = packer.get_pixel_stride();
+  uint dst_pitch = width * dst_pixel_stride;
+
+  n = dst_pitch * height;
+
+  uint8* pImage = static_cast<uint8*>(crnlib_malloc(n));
+
+  uint8* pDst = pImage;
+  for (uint y = 0; y < height; y++) {
+    const typename image_type::color_t* pSrc = img.get_scanline(y);
+    for (uint x = 0; x < width; x++)
+      pDst = (uint8*)packer.pack(*pSrc++, pDst);
+  }
+
+  return pImage;
+}
+
+image_utils::conversion_type get_conversion_type(bool cooking, pixel_format fmt);
+
+image_utils::conversion_type get_image_conversion_type_from_crn_format(crn_format fmt);
+
+double compute_std_dev(uint n, const color_quad_u8* pPixels, uint first_channel, uint num_channels);
+
+uint8* read_image_from_memory(const uint8* pImage, int nSize, int* pWidth, int* pHeight, int* pActualComps, int req_comps, const char* pFilename);
+
+}  // namespace image_utils
+
+}  // namespace crnlib
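`create_jpeg_write_flags()` in the header above packs the JPEG quality level into the top byte of the write flags word. The block below is a standalone sketch of that packing, with the constant values copied from the enum. `make_jpeg_write_flags` mirrors the header function under a different name, and `jpeg_quality_from_flags` is a hypothetical inverse that is an assumption of this example, not something the header declares.

```cpp
#include <cassert>
#include <cstdint>

// Values copied from the write-flag enum in crn_image_utils.h.
const std::uint32_t kWriteFlagJPEGH2V2 = 0x00040000u;
const std::uint32_t kQualityLevelMask = 0xFF000000u;
const unsigned kQualityLevelShift = 24;

// Same arithmetic as create_jpeg_write_flags(): quality occupies bits 24..31.
inline std::uint32_t make_jpeg_write_flags(std::uint32_t base_flags, std::uint32_t quality_level) {
  return base_flags | ((quality_level << kQualityLevelShift) & kQualityLevelMask);
}

// Hypothetical inverse: recover the quality level from a packed flags word.
inline std::uint32_t jpeg_quality_from_flags(std::uint32_t flags) {
  return (flags & kQualityLevelMask) >> kQualityLevelShift;
}
```

For example, combining the H2V2 subsampling flag with quality 90 gives `0x5A040000`, since 90 is `0x5A`.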
@@ -0,0 +1,104 @@
+// File: crn_intersect.h
+// See Copyright Notice and license at the end of inc/crnlib.h
+#pragma once
+#include "crn_ray.h"
+
+namespace crnlib {
+namespace intersection {
+enum result {
+  cBackfacing = -1,
+  cFailure = 0,
+  cSuccess,
+  cParallel,
+  cInside,
+};
+
+// Returns cInside, cSuccess, or cFailure.
+// Algorithm: Graphics Gems 1
+template <typename vector_type, typename scalar_type, typename ray_type, typename aabb_type>
+result ray_aabb(vector_type& coord, scalar_type& t, const ray_type& ray, const aabb_type& box) {
+  enum {
+    cNumDim = vector_type::num_elements,
+    cRight = 0,
+    cLeft = 1,
+    cMiddle = 2
+  };
+
+  bool inside = true;
+  int quadrant[cNumDim];
+  scalar_type candidate_plane[cNumDim];
+
+  for (int i = 0; i < cNumDim; i++) {
+    if (ray.get_origin()[i] < box[0][i]) {
+      quadrant[i] = cLeft;
+      candidate_plane[i] = box[0][i];
+      inside = false;
+    } else if (ray.get_origin()[i] > box[1][i]) {
+      quadrant[i] = cRight;
+      candidate_plane[i] = box[1][i];
+      inside = false;
+    } else {
+      quadrant[i] = cMiddle;
+    }
+  }
+
+  if (inside) {
+    coord = ray.get_origin();
+    t = 0.0f;
+    return cInside;
+  }
+
+  scalar_type max_t[cNumDim];
+  for (int i = 0; i < cNumDim; i++) {
+    if ((quadrant[i] != cMiddle) && (ray.get_direction()[i] != 0.0f))
+      max_t[i] = (candidate_plane[i] - ray.get_origin()[i]) / ray.get_direction()[i];
+    else
+      max_t[i] = -1.0f;
+  }
+
+  int which_plane = 0;
+  for (int i = 1; i < cNumDim; i++)
+    if (max_t[which_plane] < max_t[i])
+      which_plane = i;
+
+  if (max_t[which_plane] < 0.0f)
+    return cFailure;
+
+  for (int i = 0; i < cNumDim; i++) {
+    if (i != which_plane) {
+      coord[i] = ray.get_origin()[i] + max_t[which_plane] * ray.get_direction()[i];
+
+      if ((coord[i] < box[0][i]) || (coord[i] > box[1][i]))
+        return cFailure;
+    } else {
+      coord[i] = candidate_plane[i];
+    }
+
+    CRNLIB_ASSERT(coord[i] >= box[0][i] && coord[i] <= box[1][i]);
+  }
+
+  t = max_t[which_plane];
+  return cSuccess;
+}
+
+template <typename vector_type, typename scalar_type, typename ray_type, typename aabb_type>
+result ray_aabb(bool& started_within, vector_type& coord, scalar_type& t, const ray_type& ray, const aabb_type& box) {
+  if (!box.contains(ray.get_origin())) {
+    started_within = false;
+    return ray_aabb(coord, t, ray, box);
+  }
+
+  started_within = true;
+
+  float diag_dist = box.diagonal_length() * 1.5f;
+  ray_type outside_ray(ray.eval(diag_dist), -ray.get_direction());
+
+  result res(ray_aabb(coord, t, outside_ray, box));
+  if (res != cSuccess)
+    return res;
+
+  t = math::maximum(0.0f, diag_dist - t);
+  return cSuccess;
+}
+}
+}
@@ -0,0 +1,349 @@
+// jpgd.h - C++ class for JPEG decompression.
+// Public domain, Rich Geldreich <richgel99@gmail.com>
+#ifndef JPEG_DECODER_H
+#define JPEG_DECODER_H
+
+#include <stdlib.h>
+#include <stdio.h>
+#include <setjmp.h>
+
+#ifdef _MSC_VER
+#define JPGD_NORETURN __declspec(noreturn)
+#elif defined(__GNUC__)
+#define JPGD_NORETURN __attribute__((noreturn))
+#else
+#define JPGD_NORETURN
+#endif
+
+namespace jpgd {
+typedef unsigned char uint8;
+typedef signed short int16;
+typedef unsigned short uint16;
+typedef unsigned int uint;
+typedef signed int int32;
+
+// Loads a JPEG image from a memory buffer or a file.
+// req_comps can be 1 (grayscale), 3 (RGB), or 4 (RGBA).
+// On return, width/height will be set to the image's dimensions, and actual_comps will be set to either 1 (grayscale) or 3 (RGB).
+// Notes: For more control over where and how the source data is read, see the decompress_jpeg_image_from_stream() function below, or call the jpeg_decoder class directly.
+// Requesting an 8 or 32bpp image is currently a little faster than 24bpp because the jpeg_decoder class itself currently always unpacks to either 8 or 32bpp.
+unsigned char* decompress_jpeg_image_from_memory(const unsigned char* pSrc_data, int src_data_size, int* width, int* height, int* actual_comps, int req_comps);
+unsigned char* decompress_jpeg_image_from_file(const char* pSrc_filename, int* width, int* height, int* actual_comps, int req_comps);
+
+// Success/failure error codes.
+enum jpgd_status {
+  JPGD_SUCCESS = 0,
+  JPGD_FAILED = -1,
+  JPGD_DONE = 1,
+  JPGD_BAD_DHT_COUNTS = -256,
+  JPGD_BAD_DHT_INDEX,
+  JPGD_BAD_DHT_MARKER,
+  JPGD_BAD_DQT_MARKER,
+  JPGD_BAD_DQT_TABLE,
+  JPGD_BAD_PRECISION,
+  JPGD_BAD_HEIGHT,
+  JPGD_BAD_WIDTH,
+  JPGD_TOO_MANY_COMPONENTS,
+  JPGD_BAD_SOF_LENGTH,
+  JPGD_BAD_VARIABLE_MARKER,
+  JPGD_BAD_DRI_LENGTH,
+  JPGD_BAD_SOS_LENGTH,
+  JPGD_BAD_SOS_COMP_ID,
+  JPGD_W_EXTRA_BYTES_BEFORE_MARKER,
+  JPGD_NO_ARITHMITIC_SUPPORT,
+  JPGD_UNEXPECTED_MARKER,
+  JPGD_NOT_JPEG,
+  JPGD_UNSUPPORTED_MARKER,
+  JPGD_BAD_DQT_LENGTH,
+  JPGD_TOO_MANY_BLOCKS,
+  JPGD_UNDEFINED_QUANT_TABLE,
+  JPGD_UNDEFINED_HUFF_TABLE,
+  JPGD_NOT_SINGLE_SCAN,
+  JPGD_UNSUPPORTED_COLORSPACE,
+  JPGD_UNSUPPORTED_SAMP_FACTORS,
+  JPGD_DECODE_ERROR,
+  JPGD_BAD_RESTART_MARKER,
+  JPGD_ASSERTION_ERROR,
+  JPGD_BAD_SOS_SPECTRAL,
+  JPGD_BAD_SOS_SUCCESSIVE,
+  JPGD_STREAM_READ,
+  JPGD_NOTENOUGHMEM
+};
+
+// Input stream interface.
+// Derive from this class to read input data from sources other than files or memory. Set *pEOF_flag to true in read() when no more data is available.
+// The decoder is rather greedy: it will keep on calling this method until its internal input buffer is full, or until the EOF flag is set.
+// If the input stream contains data after the JPEG stream's EOI (end of image) marker, it will probably be pulled into the internal buffer.
+// Call the get_total_bytes_read() method to determine the actual size of the JPEG stream after successful decoding.
+class jpeg_decoder_stream {
+ public:
+  jpeg_decoder_stream() {}
+  virtual ~jpeg_decoder_stream() {}
+
+  // The read() method is called when the internal input buffer is empty.
+  // Parameters:
+  // pBuf - input buffer
+  // max_bytes_to_read - maximum bytes that can be written to pBuf
+  // pEOF_flag - set this to true if at end of stream (no more bytes remaining)
+  // Returns -1 on error, otherwise return the number of bytes actually written to the buffer (which may be 0).
+  // Notes: This method will be called in a loop until you set *pEOF_flag to true or the internal buffer is full.
+  virtual int read(uint8* pBuf, int max_bytes_to_read, bool* pEOF_flag) = 0;
+};
+
+// stdio FILE stream class.
+class jpeg_decoder_file_stream : public jpeg_decoder_stream {
+  jpeg_decoder_file_stream(const jpeg_decoder_file_stream&);
+  jpeg_decoder_file_stream& operator=(const jpeg_decoder_file_stream&);
+
+  FILE* m_pFile;
+  bool m_eof_flag, m_error_flag;
+
+ public:
+  jpeg_decoder_file_stream();
+  virtual ~jpeg_decoder_file_stream();
+
+  bool open(const char* Pfilename);
+  void close();
+
+  virtual int read(uint8* pBuf, int max_bytes_to_read, bool* pEOF_flag);
+};
+
+// Memory stream class.
+class jpeg_decoder_mem_stream : public jpeg_decoder_stream {
+  const uint8* m_pSrc_data;
+  uint m_ofs, m_size;
+
+ public:
+  jpeg_decoder_mem_stream()
+      : m_pSrc_data(NULL), m_ofs(0), m_size(0) {}
+  jpeg_decoder_mem_stream(const uint8* pSrc_data, uint size)
+      : m_pSrc_data(pSrc_data), m_ofs(0), m_size(size) {}
+
+  virtual ~jpeg_decoder_mem_stream() {}
+
+  bool open(const uint8* pSrc_data, uint size);
+  void close() {
+    m_pSrc_data = NULL;
+    m_ofs = 0;
+    m_size = 0;
+  }
+
+  virtual int read(uint8* pBuf, int max_bytes_to_read, bool* pEOF_flag);
+};
+
+// Loads a JPEG image from a jpeg_decoder_stream.
+unsigned char* decompress_jpeg_image_from_stream(jpeg_decoder_stream* pStream, int* width, int* height, int* actual_comps, int req_comps);
+
+enum {
+  JPGD_IN_BUF_SIZE = 8192,
+  JPGD_MAX_BLOCKS_PER_MCU = 10,
+  JPGD_MAX_HUFF_TABLES = 8,
+  JPGD_MAX_QUANT_TABLES = 4,
+  JPGD_MAX_COMPONENTS = 4,
+  JPGD_MAX_COMPS_IN_SCAN = 4,
+  JPGD_MAX_BLOCKS_PER_ROW = 8192,
+  JPGD_MAX_HEIGHT = 16384,
+  JPGD_MAX_WIDTH = 16384
+};
+
+typedef int16 jpgd_quant_t;
+typedef int16 jpgd_block_t;
+
+class jpeg_decoder {
+ public:
+  // Call get_error_code() after constructing to determine if the stream is valid or not. You may call the get_width(), get_height(), etc.
+  // methods after the constructor is called. You may then either destruct the object, or begin decoding the image by calling begin_decoding(), then decode() on each scanline.
+  jpeg_decoder(jpeg_decoder_stream* pStream);
+
+  ~jpeg_decoder();
+
+  // Call this method after constructing the object to begin decompression.
+  // If JPGD_SUCCESS is returned you may then call decode() on each scanline.
+  int begin_decoding();
+
+  // Returns the next scan line.
+  // For grayscale images, pScan_line will point to a buffer containing 8-bit pixels (get_bytes_per_pixel() will return 1).
+  // Otherwise, it will always point to a buffer containing 32-bit RGBA pixels (A will always be 255, and get_bytes_per_pixel() will return 4).
+  // Returns JPGD_SUCCESS if a scan line has been returned.
+  // Returns JPGD_DONE if all scan lines have been returned.
+  // Returns JPGD_FAILED if an error occurred. Call get_error_code() for more info.
+  int decode(const void** pScan_line, uint* pScan_line_len);
+
+  inline jpgd_status get_error_code() const { return m_error_code; }
+
+  inline int get_width() const { return m_image_x_size; }
+  inline int get_height() const { return m_image_y_size; }
+
+  inline int get_num_components() const { return m_comps_in_frame; }
+
+  inline int get_bytes_per_pixel() const { return m_dest_bytes_per_pixel; }
+  inline int get_bytes_per_scan_line() const { return m_image_x_size * get_bytes_per_pixel(); }
+
+  // Returns the total number of bytes actually consumed by the decoder (which should equal the actual size of the JPEG file).
+  inline int get_total_bytes_read() const { return m_total_bytes_read; }
+
+ private:
+  jpeg_decoder(const jpeg_decoder&);
+  jpeg_decoder& operator=(const jpeg_decoder&);
+
+  typedef void (*pDecode_block_func)(jpeg_decoder*, int, int, int);
+
+  struct huff_tables {
+    bool ac_table;
+    uint look_up[256];
+    uint look_up2[256];
+    uint8 code_size[256];
+    uint tree[512];
+  };
+
+  struct coeff_buf {
+    uint8* pData;
+    int block_num_x, block_num_y;
+    int block_len_x, block_len_y;
+    int block_size;
+  };
+
+  struct mem_block {
+    mem_block* m_pNext;
+    size_t m_used_count;
+    size_t m_size;
+    char m_data[1];
+  };
+
+  jmp_buf m_jmp_state;
+  mem_block* m_pMem_blocks;
+  int m_image_x_size;
+  int m_image_y_size;
+  jpeg_decoder_stream* m_pStream;
+  int m_progressive_flag;
+  uint8 m_huff_ac[JPGD_MAX_HUFF_TABLES];
+  uint8* m_huff_num[JPGD_MAX_HUFF_TABLES];       // pointer to number of Huffman codes per bit size
+  uint8* m_huff_val[JPGD_MAX_HUFF_TABLES];       // pointer to Huffman codes per bit size
+  jpgd_quant_t* m_quant[JPGD_MAX_QUANT_TABLES];  // pointer to quantization tables
+  int m_scan_type;                               // Gray, Yh1v1, Yh1v2, Yh2v1, Yh2v2 (CMYK111, CMYK4114 no longer supported)
+  int m_comps_in_frame;                          // # of components in frame
+  int m_comp_h_samp[JPGD_MAX_COMPONENTS];        // component's horizontal sampling factor
+  int m_comp_v_samp[JPGD_MAX_COMPONENTS];        // component's vertical sampling factor
+  int m_comp_quant[JPGD_MAX_COMPONENTS];         // component's quantization table selector
+  int m_comp_ident[JPGD_MAX_COMPONENTS];         // component's ID
+  int m_comp_h_blocks[JPGD_MAX_COMPONENTS];
+  int m_comp_v_blocks[JPGD_MAX_COMPONENTS];
+  int m_comps_in_scan;                      // # of components in scan
+  int m_comp_list[JPGD_MAX_COMPS_IN_SCAN];  // components in this scan
+  int m_comp_dc_tab[JPGD_MAX_COMPONENTS];   // component's DC Huffman coding table selector
+  int m_comp_ac_tab[JPGD_MAX_COMPONENTS];   // component's AC Huffman coding table selector
+  int m_spectral_start;                     // spectral selection start
+  int m_spectral_end;                       // spectral selection end
+  int m_successive_low;                     // successive approximation low
+  int m_successive_high;                    // successive approximation high
+  int m_max_mcu_x_size;                     // MCU's max. X size in pixels
+  int m_max_mcu_y_size;                     // MCU's max. Y size in pixels
+  int m_blocks_per_mcu;
+  int m_max_blocks_per_row;
+  int m_mcus_per_row, m_mcus_per_col;
+  int m_mcu_org[JPGD_MAX_BLOCKS_PER_MCU];
+  int m_total_lines_left;  // total # lines left in image
+  int m_mcu_lines_left;    // total # lines left in this MCU
+  int m_real_dest_bytes_per_scan_line;
+  int m_dest_bytes_per_scan_line;  // rounded up
+  int m_dest_bytes_per_pixel;      // 4 (RGBA) or 1 (Y)
+  huff_tables* m_pHuff_tabs[JPGD_MAX_HUFF_TABLES];
+  coeff_buf* m_dc_coeffs[JPGD_MAX_COMPONENTS];
+  coeff_buf* m_ac_coeffs[JPGD_MAX_COMPONENTS];
+  int m_eob_run;
+  int m_block_y_mcu[JPGD_MAX_COMPONENTS];
+  uint8* m_pIn_buf_ofs;
+  int m_in_buf_left;
+  int m_tem_flag;
+  bool m_eof_flag;
+  uint8 m_in_buf_pad_start[128];
+  uint8 m_in_buf[JPGD_IN_BUF_SIZE + 128];
+  uint8 m_in_buf_pad_end[128];
+  int m_bits_left;
+  uint m_bit_buf;
+  int m_restart_interval;
+  int m_restarts_left;
+  int m_next_restart_num;
+  int m_max_mcus_per_row;
+  int m_max_blocks_per_mcu;
+  int m_expanded_blocks_per_mcu;
+  int m_expanded_blocks_per_row;
+  int m_expanded_blocks_per_component;
+  bool m_freq_domain_chroma_upsample;
+  int m_max_mcus_per_col;
+  uint m_last_dc_val[JPGD_MAX_COMPONENTS];
+  jpgd_block_t* m_pMCU_coefficients;
+  int m_mcu_block_max_zag[JPGD_MAX_BLOCKS_PER_MCU];
+  uint8* m_pSample_buf;
+  int m_crr[256];
+  int m_cbb[256];
+  int m_crg[256];
+  int m_cbg[256];
+  uint8* m_pScan_line_0;
+  uint8* m_pScan_line_1;
+  jpgd_status m_error_code;
+  bool m_ready_flag;
+  int m_total_bytes_read;
+
+  void free_all_blocks();
+  JPGD_NORETURN void stop_decoding(jpgd_status status);
+  void* alloc(size_t n, bool zero = false);
+  void word_clear(void* p, uint16 c, uint n);
+  void prep_in_buffer();
+  void read_dht_marker();
+  void read_dqt_marker();
+  void read_sof_marker();
+  void skip_variable_marker();
+  void read_dri_marker();
+  void read_sos_marker();
+  int next_marker();
+  int process_markers();
+  void locate_soi_marker();
+  void locate_sof_marker();
+  int locate_sos_marker();
+  void init(jpeg_decoder_stream* pStream);
+  void create_look_ups();
+  void fix_in_buffer();
+  void transform_mcu(int mcu_row);
+  void transform_mcu_expand(int mcu_row);
+  coeff_buf* coeff_buf_open(int block_num_x, int block_num_y, int block_len_x, int block_len_y);
+  inline jpgd_block_t* coeff_buf_getp(coeff_buf* cb, int block_x, int block_y);
+  void load_next_row();
+  void decode_next_row();
+  void make_huff_table(int index, huff_tables* pH);
+  void check_quant_tables();
+  void check_huff_tables();
+  void calc_mcu_block_order();
+  int init_scan();
+  void init_frame();
+  void process_restart();
+  void decode_scan(pDecode_block_func decode_block_func);
+  void init_progressive();
+  void init_sequential();
+  void decode_start();
+  void decode_init(jpeg_decoder_stream* pStream);
+  void H2V2Convert();
+  void H2V1Convert();
+  void H1V2Convert();
+  void H1V1Convert();
+  void gray_convert();
+  void expanded_convert();
+  void find_eoi();
+  inline uint get_char();
+  inline uint get_char(bool* pPadding_flag);
+  inline void stuff_char(uint8 q);
+  inline uint8 get_octet();
+  inline uint get_bits(int num_bits);
+  inline uint get_bits_no_markers(int numbits);
+  inline int huff_decode(huff_tables* pH);
+  inline int huff_decode(huff_tables* pH, int& extrabits);
+  static inline uint8 clamp(int i);
+  static void decode_block_dc_first(jpeg_decoder* pD, int component_id, int block_x, int block_y);
+  static void decode_block_dc_refine(jpeg_decoder* pD, int component_id, int block_x, int block_y);
+  static void decode_block_ac_first(jpeg_decoder* pD, int component_id, int block_x, int block_y);
+  static void decode_block_ac_refine(jpeg_decoder* pD, int component_id, int block_x, int block_y);
+};
+
+}  // namespace jpgd
+
+#endif  // JPEG_DECODER_H
@@ -0,0 +1,171 @@
+// jpge.h - C++ class for JPEG compression.
+// Public domain, Rich Geldreich <richgel99@gmail.com>
+// Alex Evans: Added RGBA support, linear memory allocator.
+#ifndef JPEG_ENCODER_H
+#define JPEG_ENCODER_H
+
+namespace jpge {
+typedef unsigned char uint8;
+typedef signed short int16;
+typedef signed int int32;
+typedef unsigned short uint16;
+typedef unsigned int uint32;
+typedef unsigned int uint;
+
+// JPEG chroma subsampling factors. Y_ONLY (grayscale images) and H2V2 (color images) are the most common.
+enum subsampling_t { Y_ONLY = 0,
+                     H1V1 = 1,
+                     H2V1 = 2,
+                     H2V2 = 3 };
+
+// JPEG compression parameters structure.
+struct params {
+  inline params()
+      : m_quality(85), m_subsampling(H2V2), m_no_chroma_discrim_flag(false), m_two_pass_flag(false) {}
+
+  inline bool check() const {
+    if ((m_quality < 1) || (m_quality > 100))
+      return false;
+    if ((uint)m_subsampling > (uint)H2V2)
+      return false;
+    return true;
+  }
+
+  // Quality: 1-100, higher is better. Typical values are around 50-95.
+  int m_quality;
+
+  // m_subsampling:
+  // 0 = Y (grayscale) only
+  // 1 = YCbCr, no subsampling (H1V1, YCbCr 1x1x1, 3 blocks per MCU)
+  // 2 = YCbCr, H2V1 subsampling (YCbCr 2x1x1, 4 blocks per MCU)
+  // 3 = YCbCr, H2V2 subsampling (YCbCr 4x1x1, 6 blocks per MCU-- very common)
+  subsampling_t m_subsampling;
+
+  // Disables CbCr discrimination - only intended for testing.
+  // If true, the Y quantization table is also used for the CbCr channels.
+  bool m_no_chroma_discrim_flag;
+
+  bool m_two_pass_flag;
+};
+
+// Writes JPEG image to a file.
+// num_channels must be 1 (Y) or 3 (RGB), image pitch must be width*num_channels.
+bool compress_image_to_jpeg_file(const char* pFilename, int width, int height, int num_channels, const uint8* pImage_data, const params& comp_params = params());
+
+// Writes JPEG image to memory buffer.
+// On entry, buf_size is the size of the output buffer pointed at by pBuf, which should be at least ~1024 bytes.
+// If return value is true, buf_size will be set to the size of the compressed data.
+bool compress_image_to_jpeg_file_in_memory(void* pBuf, int& buf_size, int width, int height, int num_channels, const uint8* pImage_data, const params& comp_params = params());
+
+// Output stream abstract class - used by the jpeg_encoder class to write to the output stream.
+// put_buf() is generally called with len==JPGE_OUT_BUF_SIZE bytes, but for headers it'll be called with smaller amounts.
+class output_stream {
+ public:
+  virtual ~output_stream(){};
+  virtual bool put_buf(const void* Pbuf, int len) = 0;
+  template <class T>
+  inline bool put_obj(const T& obj) { return put_buf(&obj, sizeof(T)); }
+};
+
+// Lower level jpeg_encoder class - useful if more control is needed than the above helper functions.
+class jpeg_encoder {
+ public:
+  jpeg_encoder();
+  ~jpeg_encoder();
+
+  // Initializes the compressor.
+  // pStream - The stream object to use for writing compressed data.
+  // comp_params - Compression parameters structure, defined above.
+  // width, height - Image dimensions.
+  // src_channels - May be 1 or 3. 1 indicates grayscale, 3 indicates RGB source data.
+  // Returns false on out of memory or if a stream write fails.
+  bool init(output_stream* pStream, int width, int height, int src_channels, const params& comp_params = params());
+
+  const params& get_params() const { return m_params; }
+
+  // Deinitializes the compressor, freeing any allocated memory. May be called at any time.
+  void deinit();
+
+  uint get_total_passes() const { return m_params.m_two_pass_flag ? 2 : 1; }
+  inline uint get_cur_pass() { return m_pass_num; }
+
+  // Call this method with each source scanline.
+  // width * src_channels bytes per scanline is expected (RGB or Y format).
+  // You must call with NULL after all scanlines are processed to finish compression.
+  // Returns false on out of memory or if a stream write fails.
+  bool process_scanline(const void* pScanline);
+
+ private:
+  jpeg_encoder(const jpeg_encoder&);
+  jpeg_encoder& operator=(const jpeg_encoder&);
+
+  typedef int32 sample_array_t;
+
+  output_stream* m_pStream;
+  params m_params;
+  uint8 m_num_components;
+  uint8 m_comp_h_samp[3], m_comp_v_samp[3];
+  int m_image_x, m_image_y, m_image_bpp, m_image_bpl;
+  int m_image_x_mcu, m_image_y_mcu;
+  int m_image_bpl_xlt, m_image_bpl_mcu;
+  int m_mcus_per_row;
+  int m_mcu_x, m_mcu_y;
+  uint8* m_mcu_lines[16];
+  uint8 m_mcu_y_ofs;
+  sample_array_t m_sample_array[64];
+  int16 m_coefficient_array[64];
+  int32 m_quantization_tables[2][64];
+  uint m_huff_codes[4][256];
+  uint8 m_huff_code_sizes[4][256];
+  uint8 m_huff_bits[4][17];
+  uint8 m_huff_val[4][256];
+  uint32 m_huff_count[4][256];
+  int m_last_dc_val[3];
+  enum { JPGE_OUT_BUF_SIZE = 2048 };
+  uint8 m_out_buf[JPGE_OUT_BUF_SIZE];
+  uint8* m_pOut_buf;
+  uint m_out_buf_left;
+  uint32 m_bit_buffer;
+  uint m_bits_in;
+  uint8 m_pass_num;
+  bool m_all_stream_writes_succeeded;
+
+  void optimize_huffman_table(int table_num, int table_len);
+  void emit_byte(uint8 i);
+  void emit_word(uint i);
+  void emit_marker(int marker);
+  void emit_jfif_app0();
+  void emit_dqt();
+  void emit_sof();
+  void emit_dht(uint8* bits, uint8* val, int index, bool ac_flag);
+  void emit_dhts();
+  void emit_sos();
+  void emit_markers();
+  void compute_huffman_table(uint* codes, uint8* code_sizes, uint8* bits, uint8* val);
+  void compute_quant_table(int32* dst, int16* src);
+  void adjust_quant_table(int32* dst, int32* src);
+  void first_pass_init();
+  bool second_pass_init();
+  bool jpg_open(int p_x_res, int p_y_res, int src_channels);
+  void load_block_8_8_grey(int x);
+  void load_block_8_8(int x, int y, int c);
+  void load_block_16_8(int x, int c);
+  void load_block_16_8_8(int x, int c);
+  void load_quantized_coefficients(int component_num);
+  void flush_output_buffer();
+  void put_bits(uint bits, uint len);
+  void code_coefficients_pass_one(int component_num);
+  void code_coefficients_pass_two(int component_num);
+  void code_block(int component_num);
+  void process_mcu_row();
+  bool terminate_pass_one();
+  bool terminate_pass_two();
+  bool process_end_of_image();
+  void load_mcu(const void* src);
+  void clear();
+  void init();
+};
+
+}  // namespace jpge
+
+#endif  // JPEG_ENCODER_H
@@ -0,0 +1,834 @@
+// File: crn_ktx_texture.cpp
+#include "crn_core.h"
+#include "crn_ktx_texture.h"
+#include "crn_console.h"
+
+// Set CRNLIB_KTX_PVRTEX_WORKAROUNDS to 1 to enable various workarounds for oddball KTX files written by PVRTexTool.
+#define CRNLIB_KTX_PVRTEX_WORKAROUNDS 1
+
+namespace crnlib {
+const uint8 s_ktx_file_id[12] = {0xAB, 0x4B, 0x54, 0x58, 0x20, 0x31, 0x31, 0xBB, 0x0D, 0x0A, 0x1A, 0x0A};
+
+bool is_packed_pixel_ogl_type(uint32 ogl_type) {
+  switch (ogl_type) {
+    case KTX_UNSIGNED_BYTE_3_3_2:
+    case KTX_UNSIGNED_BYTE_2_3_3_REV:
+    case KTX_UNSIGNED_SHORT_5_6_5:
+    case KTX_UNSIGNED_SHORT_5_6_5_REV:
+    case KTX_UNSIGNED_SHORT_4_4_4_4:
+    case KTX_UNSIGNED_SHORT_4_4_4_4_REV:
+    case KTX_UNSIGNED_SHORT_5_5_5_1:
+    case KTX_UNSIGNED_SHORT_1_5_5_5_REV:
+    case KTX_UNSIGNED_INT_8_8_8_8:
+    case KTX_UNSIGNED_INT_8_8_8_8_REV:
+    case KTX_UNSIGNED_INT_10_10_10_2:
+    case KTX_UNSIGNED_INT_2_10_10_10_REV:
+    case KTX_UNSIGNED_INT_24_8:
+    case KTX_UNSIGNED_INT_10F_11F_11F_REV:
+    case KTX_UNSIGNED_INT_5_9_9_9_REV:
+      return true;
+  }
+  return false;
+}
+
+uint get_ogl_type_size(uint32 ogl_type) {
+  switch (ogl_type) {
+    case KTX_UNSIGNED_BYTE:
+    case KTX_BYTE:
+      return 1;
+    case KTX_HALF_FLOAT:
+    case KTX_UNSIGNED_SHORT:
+    case KTX_SHORT:
+      return 2;
+    case KTX_FLOAT:
+    case KTX_UNSIGNED_INT:
+    case KTX_INT:
+      return 4;
+    case KTX_UNSIGNED_BYTE_3_3_2:
+    case KTX_UNSIGNED_BYTE_2_3_3_REV:
+      return 1;
+    case KTX_UNSIGNED_SHORT_5_6_5:
+    case KTX_UNSIGNED_SHORT_5_6_5_REV:
+    case KTX_UNSIGNED_SHORT_4_4_4_4:
+    case KTX_UNSIGNED_SHORT_4_4_4_4_REV:
+    case KTX_UNSIGNED_SHORT_5_5_5_1:
+    case KTX_UNSIGNED_SHORT_1_5_5_5_REV:
+      return 2;
+    case KTX_UNSIGNED_INT_8_8_8_8:
+    case KTX_UNSIGNED_INT_8_8_8_8_REV:
+    case KTX_UNSIGNED_INT_10_10_10_2:
+    case KTX_UNSIGNED_INT_2_10_10_10_REV:
+    case KTX_UNSIGNED_INT_24_8:
+    case KTX_UNSIGNED_INT_10F_11F_11F_REV:
+    case KTX_UNSIGNED_INT_5_9_9_9_REV:
+      return 4;
+  }
+  return 0;
+}
+
+uint32 get_ogl_base_internal_fmt(uint32 ogl_fmt) {
+  switch (ogl_fmt) {
+    case KTX_ETC1_RGB8_OES:
+    case KTX_COMPRESSED_RGB8_ETC2:
+    case KTX_RGB_S3TC:
+    case KTX_RGB4_S3TC:
+    case KTX_COMPRESSED_RGB_S3TC_DXT1_EXT:
+    case KTX_COMPRESSED_SRGB_S3TC_DXT1_EXT:
+      return KTX_RGB;
+    case KTX_COMPRESSED_RGBA8_ETC2_EAC:
+    case KTX_COMPRESSED_RGBA_S3TC_DXT1_EXT:
+    case KTX_COMPRESSED_SRGB_ALPHA_S3TC_DXT1_EXT:
+    case KTX_RGBA_S3TC:
+    case KTX_RGBA4_S3TC:
+    case KTX_COMPRESSED_RGBA_S3TC_DXT3_EXT:
+    case KTX_COMPRESSED_SRGB_ALPHA_S3TC_DXT3_EXT:
+    case KTX_COMPRESSED_RGBA_S3TC_DXT5_EXT:
+    case KTX_COMPRESSED_SRGB_ALPHA_S3TC_DXT5_EXT:
+    case KTX_RGBA_DXT5_S3TC:
+    case KTX_RGBA4_DXT5_S3TC:
+      return KTX_RGBA;
+    case 1:
+    case KTX_RED:
+    case KTX_RED_INTEGER:
+    case KTX_GREEN:
+    case KTX_GREEN_INTEGER:
+    case KTX_BLUE:
+    case KTX_BLUE_INTEGER:
+    case KTX_R8:
+    case KTX_R8UI:
+    case KTX_LUMINANCE8:
+    case KTX_ALPHA:
+    case KTX_LUMINANCE:
+    case KTX_COMPRESSED_RED_RGTC1_EXT:
+    case KTX_COMPRESSED_SIGNED_RED_RGTC1_EXT:
+    case KTX_COMPRESSED_LUMINANCE_LATC1_EXT:
+    case KTX_COMPRESSED_SIGNED_LUMINANCE_LATC1_EXT:
+      return KTX_RED;
+    case 2:
+    case KTX_RG:
+    case KTX_RG8:
+    case KTX_RG_INTEGER:
+    case KTX_LUMINANCE_ALPHA:
+    case KTX_COMPRESSED_RED_GREEN_RGTC2_EXT:
+    case KTX_COMPRESSED_SIGNED_RED_GREEN_RGTC2_EXT:
+    case KTX_COMPRESSED_LUMINANCE_ALPHA_LATC2_EXT:
+    case KTX_COMPRESSED_SIGNED_LUMINANCE_ALPHA_LATC2_EXT:
+      return KTX_RG;
+    case 3:
+    case KTX_SRGB:
+    case KTX_RGB:
+    case KTX_RGB_INTEGER:
+    case KTX_BGR:
+    case KTX_BGR_INTEGER:
+    case KTX_RGB8:
+    case KTX_SRGB8:
+      return KTX_RGB;
+    case 4:
+    case KTX_RGBA:
+    case KTX_BGRA:
+    case KTX_RGBA_INTEGER:
+    case KTX_BGRA_INTEGER:
+    case KTX_SRGB_ALPHA:
+    case KTX_SRGB8_ALPHA8:
+    case KTX_RGBA8:
+      return KTX_RGBA;
+  }
+  return 0;
+}
+
+bool get_ogl_fmt_desc(uint32 ogl_fmt, uint32 ogl_type, uint& block_dim, uint& bytes_per_block) {
+  uint ogl_type_size = get_ogl_type_size(ogl_type);
+
+  block_dim = 1;
+  bytes_per_block = 0;
+
+  switch (ogl_fmt) {
+    case KTX_COMPRESSED_RED_RGTC1_EXT:
+    case KTX_COMPRESSED_SIGNED_RED_RGTC1_EXT:
+    case KTX_COMPRESSED_LUMINANCE_LATC1_EXT:
+    case KTX_COMPRESSED_SIGNED_LUMINANCE_LATC1_EXT:
+    case KTX_ETC1_RGB8_OES:
+    case KTX_COMPRESSED_RGB8_ETC2:
+    case KTX_RGB_S3TC:
+    case KTX_RGB4_S3TC:
+    case KTX_COMPRESSED_RGB_S3TC_DXT1_EXT:
+    case KTX_COMPRESSED_RGBA_S3TC_DXT1_EXT:
+    case KTX_COMPRESSED_SRGB_S3TC_DXT1_EXT:
+    case KTX_COMPRESSED_SRGB_ALPHA_S3TC_DXT1_EXT: {
+      block_dim = 4;
+      bytes_per_block = 8;
+      break;
+    }
+    case KTX_COMPRESSED_RGBA8_ETC2_EAC:
+    case KTX_COMPRESSED_LUMINANCE_ALPHA_LATC2_EXT:
+    case KTX_COMPRESSED_SIGNED_LUMINANCE_ALPHA_LATC2_EXT:
+    case KTX_COMPRESSED_RED_GREEN_RGTC2_EXT:
+    case KTX_COMPRESSED_SIGNED_RED_GREEN_RGTC2_EXT:
+    case KTX_RGBA_S3TC:
+    case KTX_RGBA4_S3TC:
+    case KTX_COMPRESSED_RGBA_S3TC_DXT3_EXT:
+    case KTX_COMPRESSED_SRGB_ALPHA_S3TC_DXT3_EXT:
+    case KTX_COMPRESSED_RGBA_S3TC_DXT5_EXT:
+    case KTX_COMPRESSED_SRGB_ALPHA_S3TC_DXT5_EXT:
+    case KTX_RGBA_DXT5_S3TC:
+    case KTX_RGBA4_DXT5_S3TC: {
+      block_dim = 4;
+      bytes_per_block = 16;
+      break;
+    }
+    case 1:
+    case KTX_ALPHA:
+    case KTX_RED:
+    case KTX_GREEN:
+    case KTX_BLUE:
+    case KTX_RED_INTEGER:
+    case KTX_GREEN_INTEGER:
+    case KTX_BLUE_INTEGER:
+    case KTX_LUMINANCE: {
+      bytes_per_block = ogl_type_size;
+      break;
+    }
+    case KTX_R8:
+    case KTX_R8UI:
+    case KTX_ALPHA8:
+    case KTX_LUMINANCE8: {
+      bytes_per_block = 1;
+      break;
+    }
+    case 2:
+    case KTX_RG:
+    case KTX_RG_INTEGER:
+    case KTX_LUMINANCE_ALPHA: {
+      bytes_per_block = 2 * ogl_type_size;
+      break;
+    }
+    case KTX_RG8:
+    case KTX_LUMINANCE8_ALPHA8: {
+      bytes_per_block = 2;
+      break;
+    }
+    case 3:
+    case KTX_SRGB:
+    case KTX_RGB:
+    case KTX_BGR:
+    case KTX_RGB_INTEGER:
+    case KTX_BGR_INTEGER: {
+      bytes_per_block = is_packed_pixel_ogl_type(ogl_type) ? ogl_type_size : (3 * ogl_type_size);
+      break;
+    }
+    case KTX_RGB8:
+    case KTX_SRGB8: {
+      bytes_per_block = 3;
+      break;
+    }
+    case 4:
+    case KTX_RGBA:
+    case KTX_BGRA:
+    case KTX_RGBA_INTEGER:
+    case KTX_BGRA_INTEGER:
+    case KTX_SRGB_ALPHA: {
+      bytes_per_block = is_packed_pixel_ogl_type(ogl_type) ? ogl_type_size : (4 * ogl_type_size);
+      break;
+    }
+    case KTX_SRGB8_ALPHA8:
+    case KTX_RGBA8: {
+      bytes_per_block = 4;
+      break;
+    }
+    default:
+      return false;
+  }
+  return true;
+}
+
+bool ktx_texture::compute_pixel_info() {
+  if ((!m_header.m_glType) || (!m_header.m_glFormat)) {
+    if ((m_header.m_glType) || (m_header.m_glFormat))
+      return false;
+
+    // Must be a compressed format.
+    if (!get_ogl_fmt_desc(m_header.m_glInternalFormat, m_header.m_glType, m_block_dim, m_bytes_per_block)) {
+#if CRNLIB_KTX_PVRTEX_WORKAROUNDS
+      if ((!m_header.m_glInternalFormat) && (!m_header.m_glType) && (!m_header.m_glTypeSize) && (!m_header.m_glBaseInternalFormat)) {
+        // PVRTexTool writes bogus headers when outputting ETC1.
+        console::warning("ktx_texture::compute_pixel_info: Header doesn't specify any format, assuming ETC1 and hoping for the best");
+        m_header.m_glBaseInternalFormat = KTX_RGB;
+        m_header.m_glInternalFormat = KTX_ETC1_RGB8_OES;
+        m_header.m_glTypeSize = 1;
+        m_block_dim = 4;
+        m_bytes_per_block = 8;
+        return true;
+      }
+#endif
+      return false;
+    }
+
+    if (m_block_dim == 1)
+      return false;
+  } else {
+    // Must be an uncompressed format.
+    if (!get_ogl_fmt_desc(m_header.m_glFormat, m_header.m_glType, m_block_dim, m_bytes_per_block))
+      return false;
+
+    if (m_block_dim > 1)
+      return false;
+  }
+  return true;
+}
+
+bool ktx_texture::read_from_stream(data_stream_serializer& serializer) {
+  clear();
+
+  // Read header
+  if (serializer.read(&m_header, 1, sizeof(m_header)) != sizeof(ktx_header))
+    return false;
+
+  // Check header
+  if (memcmp(s_ktx_file_id, m_header.m_identifier, sizeof(m_header.m_identifier)))
+    return false;
+
+  if ((m_header.m_endianness != KTX_OPPOSITE_ENDIAN) && (m_header.m_endianness != KTX_ENDIAN))
+    return false;
+
+  m_opposite_endianness = (m_header.m_endianness == KTX_OPPOSITE_ENDIAN);
+  if (m_opposite_endianness) {
+    m_header.endian_swap();
+
+    if ((m_header.m_glTypeSize != sizeof(uint8)) && (m_header.m_glTypeSize != sizeof(uint16)) && (m_header.m_glTypeSize != sizeof(uint32)))
+      return false;
+  }
+
+  if (!check_header())
+    return false;
+
+  if (!compute_pixel_info())
+    return false;
+
+  uint8 pad_bytes[3];
+
+  // Read the key value entries
+  uint num_key_value_bytes_remaining = m_header.m_bytesOfKeyValueData;
+  while (num_key_value_bytes_remaining) {
+    if (num_key_value_bytes_remaining < sizeof(uint32))
+      return false;
+
+    uint32 key_value_byte_size;
+    if (serializer.read(&key_value_byte_size, 1, sizeof(uint32)) != sizeof(uint32))
+      return false;
+
+    num_key_value_bytes_remaining -= sizeof(uint32);
+
+    if (m_opposite_endianness)
+      key_value_byte_size = utils::swap32(key_value_byte_size);
+
+    if (key_value_byte_size > num_key_value_bytes_remaining)
+      return false;
+
+    uint8_vec key_value_data;
+    if (key_value_byte_size) {
+      key_value_data.resize(key_value_byte_size);
+      if (serializer.read(&key_value_data[0], 1, key_value_byte_size) != key_value_byte_size)
+        return false;
+    }
+
+    m_key_values.push_back(key_value_data);
+
+    uint padding = 3 - ((key_value_byte_size + 3) % 4);
+    if (padding) {
+      if (serializer.read(pad_bytes, 1, padding) != padding)
+        return false;
+    }
+
+    num_key_value_bytes_remaining -= key_value_byte_size;
+    if (num_key_value_bytes_remaining < padding)
+      return false;
+    num_key_value_bytes_remaining -= padding;
+  }
+
+  // Now read the mip levels
+  uint total_faces = get_num_mips() * get_array_size() * get_num_faces() * get_depth();
+  if ((!total_faces) || (total_faces > 65535))
+    return false;
+
+// See Section 2.8 of the KTX file format: no rounding to block sizes should be applied for block-compressed textures.
+// OK, I'm going to break that rule; otherwise KTX can only store a subset of the textures that DDS can handle, for no good reason.
+#if 0
+      const uint mip0_row_blocks = m_header.m_pixelWidth / m_block_dim;
+      const uint mip0_col_blocks = CRNLIB_MAX(1, m_header.m_pixelHeight) / m_block_dim;
+#else
+  const uint mip0_row_blocks = (m_header.m_pixelWidth + m_block_dim - 1) / m_block_dim;
+  const uint mip0_col_blocks = (CRNLIB_MAX(1, m_header.m_pixelHeight) + m_block_dim - 1) / m_block_dim;
+#endif
+  if ((!mip0_row_blocks) || (!mip0_col_blocks))
+    return false;
+
+  bool has_valid_image_size_fields = true;
+  bool disable_mip_and_cubemap_padding = false;
+
+#if CRNLIB_KTX_PVRTEX_WORKAROUNDS
+  {
+    // PVRTexTool has a bogus KTX writer that doesn't write any imageSize fields. Nice.
+    size_t expected_bytes_remaining = 0;
+    for (uint mip_level = 0; mip_level < get_num_mips(); mip_level++) {
+      uint mip_width, mip_height, mip_depth;
+      get_mip_dim(mip_level, mip_width, mip_height, mip_depth);
+
+      const uint mip_row_blocks = (mip_width + m_block_dim - 1) / m_block_dim;
+      const uint mip_col_blocks = (mip_height + m_block_dim - 1) / m_block_dim;
+      if ((!mip_row_blocks) || (!mip_col_blocks))
+        return false;
+
+      expected_bytes_remaining += sizeof(uint32);
+
+      if ((!m_header.m_numberOfArrayElements) && (get_num_faces() == 6)) {
+        for (uint face = 0; face < get_num_faces(); face++) {
+          uint slice_size = mip_row_blocks * mip_col_blocks * m_bytes_per_block;
+          expected_bytes_remaining += slice_size;
+
+          uint num_cube_pad_bytes = 3 - ((slice_size + 3) % 4);
+          expected_bytes_remaining += num_cube_pad_bytes;
+        }
+      } else {
+        uint total_mip_size = 0;
+        for (uint array_element = 0; array_element < get_array_size(); array_element++) {
+          for (uint face = 0; face < get_num_faces(); face++) {
+            for (uint zslice = 0; zslice < mip_depth; zslice++) {
+              uint slice_size = mip_row_blocks * mip_col_blocks * m_bytes_per_block;
+              total_mip_size += slice_size;
+            }
+          }
+        }
+        expected_bytes_remaining += total_mip_size;
+
+        uint num_mip_pad_bytes = 3 - ((total_mip_size + 3) % 4);
+        expected_bytes_remaining += num_mip_pad_bytes;
+      }
+    }
+
+    if (serializer.get_stream()->get_remaining() < expected_bytes_remaining) {
+      has_valid_image_size_fields = false;
+      disable_mip_and_cubemap_padding = true;
+      console::warning("ktx_texture::read_from_stream: KTX file size is smaller than expected - trying to read anyway without imageSize fields");
+    }
+  }
+#endif
+
+  for (uint mip_level = 0; mip_level < get_num_mips(); mip_level++) {
+    uint mip_width, mip_height, mip_depth;
+    get_mip_dim(mip_level, mip_width, mip_height, mip_depth);
+
+    const uint mip_row_blocks = (mip_width + m_block_dim - 1) / m_block_dim;
+    const uint mip_col_blocks = (mip_height + m_block_dim - 1) / m_block_dim;
+    if ((!mip_row_blocks) || (!mip_col_blocks))
+      return false;
+
+    uint32 image_size = 0;
+    if (!has_valid_image_size_fields)
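+      image_size = mip_row_blocks * mip_col_blocks * m_bytes_per_block * (((!m_header.m_numberOfArrayElements) && (get_num_faces() == 6)) ? 1 : (mip_depth * get_array_size() * get_num_faces()));  // imageSize is per-face for non-array cubemaps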
+    else {
+      if (serializer.read(&image_size, 1, sizeof(image_size)) != sizeof(image_size))
+        return false;
+
+      if (m_opposite_endianness)
+        image_size = utils::swap32(image_size);
+    }
+
+    if (!image_size)
+      return false;
+
+    uint total_mip_size = 0;
+
+    if ((!m_header.m_numberOfArrayElements) && (get_num_faces() == 6)) {
+      // plain non-array cubemap
+      for (uint face = 0; face < get_num_faces(); face++) {
+        CRNLIB_ASSERT(m_image_data.size() == get_image_index(mip_level, 0, face, 0));
+
+        m_image_data.push_back(uint8_vec());
+        uint8_vec& image_data = m_image_data.back();
+
+        image_data.resize(image_size);
+        if (serializer.read(&image_data[0], 1, image_size) != image_size)
+          return false;
+
+        if (m_opposite_endianness)
+          utils::endian_swap_mem(&image_data[0], image_size, m_header.m_glTypeSize);
+
+        uint num_cube_pad_bytes = disable_mip_and_cubemap_padding ? 0 : (3 - ((image_size + 3) % 4));
+        if (serializer.read(pad_bytes, 1, num_cube_pad_bytes) != num_cube_pad_bytes)
+          return false;
+
+        total_mip_size += image_size + num_cube_pad_bytes;
+      }
+    } else {
+      // 1D, 2D, 3D (normal or array texture), or array cubemap
+      uint num_image_bytes_remaining = image_size;
+
+      for (uint array_element = 0; array_element < get_array_size(); array_element++) {
+        for (uint face = 0; face < get_num_faces(); face++) {
+          for (uint zslice = 0; zslice < mip_depth; zslice++) {
+            CRNLIB_ASSERT(m_image_data.size() == get_image_index(mip_level, array_element, face, zslice));
+
+            uint slice_size = mip_row_blocks * mip_col_blocks * m_bytes_per_block;
+            if ((!slice_size) || (slice_size > num_image_bytes_remaining))
+              return false;
+
+            m_image_data.push_back(uint8_vec());
+            uint8_vec& image_data = m_image_data.back();
+
+            image_data.resize(slice_size);
+            if (serializer.read(&image_data[0], 1, slice_size) != slice_size)
+              return false;
+
+            if (m_opposite_endianness)
+              utils::endian_swap_mem(&image_data[0], slice_size, m_header.m_glTypeSize);
+
+            num_image_bytes_remaining -= slice_size;
+
+            total_mip_size += slice_size;
+          }
+        }
+      }
+
+      if (num_image_bytes_remaining)
+        return false;
+    }
+
+    uint num_mip_pad_bytes = disable_mip_and_cubemap_padding ? 0 : (3 - ((total_mip_size + 3) % 4));
+    if (serializer.read(pad_bytes, 1, num_mip_pad_bytes) != num_mip_pad_bytes)
+      return false;
+  }
+  return true;
+}
+
+bool ktx_texture::write_to_stream(data_stream_serializer& serializer, bool no_keyvalue_data) {
+  if (!consistency_check()) {
+    CRNLIB_ASSERT(0);
+    return false;
+  }
+
+  memcpy(m_header.m_identifier, s_ktx_file_id, sizeof(m_header.m_identifier));
+  m_header.m_endianness = m_opposite_endianness ? KTX_OPPOSITE_ENDIAN : KTX_ENDIAN;
+
+  if (m_block_dim == 1) {
+    m_header.m_glTypeSize = get_ogl_type_size(m_header.m_glType);
+    m_header.m_glBaseInternalFormat = m_header.m_glFormat;
+  } else {
+    m_header.m_glBaseInternalFormat = get_ogl_base_internal_fmt(m_header.m_glInternalFormat);
+  }
+
+  m_header.m_bytesOfKeyValueData = 0;
+  if (!no_keyvalue_data) {
+    for (uint i = 0; i < m_key_values.size(); i++)
+      m_header.m_bytesOfKeyValueData += sizeof(uint32) + ((m_key_values[i].size() + 3) & ~3);
+  }
+
+  if (m_opposite_endianness)
+    m_header.endian_swap();
+
+  bool success = (serializer.write(&m_header, sizeof(m_header), 1) == 1);
+
+  if (m_opposite_endianness)
+    m_header.endian_swap();
+
+  if (!success)
+    return success;
+
+  uint total_key_value_bytes = 0;
+  const uint8 padding[3] = {0, 0, 0};
+
+  if (!no_keyvalue_data) {
+    for (uint i = 0; i < m_key_values.size(); i++) {
+      uint32 key_value_size = m_key_values[i].size();
+
+      if (m_opposite_endianness)
+        key_value_size = utils::swap32(key_value_size);
+
+      success = (serializer.write(&key_value_size, sizeof(key_value_size), 1) == 1);
+      total_key_value_bytes += sizeof(key_value_size);
+
+      if (m_opposite_endianness)
+        key_value_size = utils::swap32(key_value_size);
+
+      if (!success)
+        return false;
+
+      if (key_value_size) {
+        if (serializer.write(&m_key_values[i][0], key_value_size, 1) != 1)
+          return false;
+        total_key_value_bytes += key_value_size;
+
+        uint num_padding = 3 - ((key_value_size + 3) % 4);
+        if ((num_padding) && (serializer.write(padding, num_padding, 1) != 1))
+          return false;
+        total_key_value_bytes += num_padding;
+      }
+    }
+    (void)total_key_value_bytes;
+  }
+
+  CRNLIB_ASSERT(total_key_value_bytes == m_header.m_bytesOfKeyValueData);
+
+  for (uint mip_level = 0; mip_level < get_num_mips(); mip_level++) {
+    uint mip_width, mip_height, mip_depth;
+    get_mip_dim(mip_level, mip_width, mip_height, mip_depth);
+
+    const uint mip_row_blocks = (mip_width + m_block_dim - 1) / m_block_dim;
+    const uint mip_col_blocks = (mip_height + m_block_dim - 1) / m_block_dim;
+    if ((!mip_row_blocks) || (!mip_col_blocks))
+      return false;
+
+    uint32 image_size = mip_row_blocks * mip_col_blocks * m_bytes_per_block;
+    if ((m_header.m_numberOfArrayElements) || (get_num_faces() == 1))
+      image_size *= (get_array_size() * get_num_faces() * mip_depth);
+
+    if (!image_size)
+      return false;
+
+    if (m_opposite_endianness)
+      image_size = utils::swap32(image_size);
+
+    success = (serializer.write(&image_size, sizeof(image_size), 1) == 1);
+
+    if (m_opposite_endianness)
+      image_size = utils::swap32(image_size);
+
+    if (!success)
+      return false;
+
+    uint total_mip_size = 0;
+
+    if ((!m_header.m_numberOfArrayElements) && (get_num_faces() == 6)) {
+      // plain non-array cubemap
+      for (uint face = 0; face < get_num_faces(); face++) {
+        const uint8_vec& image_data = get_image_data(get_image_index(mip_level, 0, face, 0));
+        if ((!image_data.size()) || (image_data.size() != image_size))
+          return false;
+
+        if (m_opposite_endianness) {
+          uint8_vec tmp_image_data(image_data);
+          utils::endian_swap_mem(&tmp_image_data[0], tmp_image_data.size(), m_header.m_glTypeSize);
+          if (serializer.write(&tmp_image_data[0], tmp_image_data.size(), 1) != 1)
+            return false;
+        } else if (serializer.write(&image_data[0], image_data.size(), 1) != 1)
+          return false;
+
+        uint num_cube_pad_bytes = 3 - ((image_data.size() + 3) % 4);
+        if ((num_cube_pad_bytes) && (serializer.write(padding, num_cube_pad_bytes, 1) != 1))
+          return false;
+
+        total_mip_size += image_size + num_cube_pad_bytes;
+      }
+    } else {
+      // 1D, 2D, 3D (normal or array texture), or array cubemap
+      for (uint array_element = 0; array_element < get_array_size(); array_element++) {
+        for (uint face = 0; face < get_num_faces(); face++) {
+          for (uint zslice = 0; zslice < mip_depth; zslice++) {
+            const uint8_vec& image_data = get_image_data(get_image_index(mip_level, array_element, face, zslice));
+            if (!image_data.size())
+              return false;
+
+            if (m_opposite_endianness) {
+              uint8_vec tmp_image_data(image_data);
+              utils::endian_swap_mem(&tmp_image_data[0], tmp_image_data.size(), m_header.m_glTypeSize);
+              if (serializer.write(&tmp_image_data[0], tmp_image_data.size(), 1) != 1)
+                return false;
+            } else if (serializer.write(&image_data[0], image_data.size(), 1) != 1)
+              return false;
+
+            total_mip_size += image_data.size();
+          }
+        }
+      }
+
+      uint num_mip_pad_bytes = 3 - ((total_mip_size + 3) % 4);
+      if ((num_mip_pad_bytes) && (serializer.write(padding, num_mip_pad_bytes, 1) != 1))
+        return false;
+      total_mip_size += num_mip_pad_bytes;
+    }
+    CRNLIB_ASSERT((total_mip_size & 3) == 0);
+  }
+
+  return true;
+}
+
+bool ktx_texture::init_2D(uint width, uint height, uint num_mips, uint32 ogl_internal_fmt, uint32 ogl_fmt, uint32 ogl_type) {
+  clear();
+
+  m_header.m_pixelWidth = width;
+  m_header.m_pixelHeight = height;
+  m_header.m_numberOfMipmapLevels = num_mips;
+  m_header.m_glInternalFormat = ogl_internal_fmt;
+  m_header.m_glFormat = ogl_fmt;
+  m_header.m_glType = ogl_type;
+  m_header.m_numberOfFaces = 1;
+
+  if (!compute_pixel_info())
+    return false;
+
+  return true;
+}
+
+bool ktx_texture::init_2D_array(uint width, uint height, uint num_mips, uint array_size, uint32 ogl_internal_fmt, uint32 ogl_fmt, uint32 ogl_type) {
+  clear();
+
+  m_header.m_pixelWidth = width;
+  m_header.m_pixelHeight = height;
+  m_header.m_numberOfMipmapLevels = num_mips;
+  m_header.m_numberOfArrayElements = array_size;
+  m_header.m_glInternalFormat = ogl_internal_fmt;
+  m_header.m_glFormat = ogl_fmt;
+  m_header.m_glType = ogl_type;
+  m_header.m_numberOfFaces = 1;
+
+  if (!compute_pixel_info())
+    return false;
+
+  return true;
+}
+
+bool ktx_texture::init_3D(uint width, uint height, uint depth, uint num_mips, uint32 ogl_internal_fmt, uint32 ogl_fmt, uint32 ogl_type) {
+  clear();
+
+  m_header.m_pixelWidth = width;
+  m_header.m_pixelHeight = height;
+  m_header.m_pixelDepth = depth;
+  m_header.m_numberOfMipmapLevels = num_mips;
+  m_header.m_glInternalFormat = ogl_internal_fmt;
+  m_header.m_glFormat = ogl_fmt;
+  m_header.m_glType = ogl_type;
+  m_header.m_numberOfFaces = 1;
+
+  if (!compute_pixel_info())
+    return false;
+
+  return true;
+}
+
+bool ktx_texture::init_cubemap(uint dim, uint num_mips, uint32 ogl_internal_fmt, uint32 ogl_fmt, uint32 ogl_type) {
+  clear();
+
+  m_header.m_pixelWidth = dim;
+  m_header.m_pixelHeight = dim;
+  m_header.m_numberOfMipmapLevels = num_mips;
+  m_header.m_glInternalFormat = ogl_internal_fmt;
+  m_header.m_glFormat = ogl_fmt;
+  m_header.m_glType = ogl_type;
+  m_header.m_numberOfFaces = 6;
+
+  if (!compute_pixel_info())
+    return false;
+
+  return true;
+}
+
+bool ktx_texture::check_header() const {
+  if (((get_num_faces() != 1) && (get_num_faces() != 6)) || (!m_header.m_pixelWidth))
+    return false;
+
+  if ((!m_header.m_pixelHeight) && (m_header.m_pixelDepth))
+    return false;
+
+  if ((get_num_faces() == 6) && ((m_header.m_pixelDepth) || (!m_header.m_pixelHeight)))
+    return false;
+
+  if (m_header.m_numberOfMipmapLevels) {
+    const uint max_mipmap_dimension = (m_header.m_numberOfMipmapLevels <= 32U) ? (1U << (m_header.m_numberOfMipmapLevels - 1U)) : 0U;
+    if ((!max_mipmap_dimension) || (max_mipmap_dimension > (CRNLIB_MAX(CRNLIB_MAX(m_header.m_pixelWidth, m_header.m_pixelHeight), m_header.m_pixelDepth))))
+      return false;
+  }
+
+  return true;
+}
+
+bool ktx_texture::consistency_check() const {
+  if (!check_header())
+    return false;
+
+  uint block_dim = 0, bytes_per_block = 0;
+  if ((!m_header.m_glType) || (!m_header.m_glFormat)) {
+    if ((m_header.m_glType) || (m_header.m_glFormat))
+      return false;
+    if (!get_ogl_fmt_desc(m_header.m_glInternalFormat, m_header.m_glType, block_dim, bytes_per_block))
+      return false;
+    if (block_dim == 1)
+      return false;
+    //if ((get_width() % block_dim) || (get_height() % block_dim))
+    //   return false;
+  } else {
+    if (!get_ogl_fmt_desc(m_header.m_glFormat, m_header.m_glType, block_dim, bytes_per_block))
+      return false;
+    if (block_dim > 1)
+      return false;
+  }
+  if ((m_block_dim != block_dim) || (m_bytes_per_block != bytes_per_block))
+    return false;
+
+  if (m_image_data.size() != get_total_images())
+    return false;
+
+  for (uint mip_level = 0; mip_level < get_num_mips(); mip_level++) {
+    uint mip_width, mip_height, mip_depth;
+    get_mip_dim(mip_level, mip_width, mip_height, mip_depth);
+
+    const uint mip_row_blocks = (mip_width + m_block_dim - 1) / m_block_dim;
+    const uint mip_col_blocks = (mip_height + m_block_dim - 1) / m_block_dim;
+    if ((!mip_row_blocks) || (!mip_col_blocks))
+      return false;
+
+    for (uint array_element = 0; array_element < get_array_size(); array_element++) {
+      for (uint face = 0; face < get_num_faces(); face++) {
+        for (uint zslice = 0; zslice < mip_depth; zslice++) {
+          const uint8_vec& image_data = get_image_data(get_image_index(mip_level, array_element, face, zslice));
+
+          uint expected_image_size = mip_row_blocks * mip_col_blocks * m_bytes_per_block;
+          if (image_data.size() != expected_image_size)
+            return false;
+        }
+      }
+    }
+  }
+
+  return true;
+}
+
+const uint8_vec* ktx_texture::find_key(const char* pKey) const {
+  const size_t n = strlen(pKey) + 1;
+  for (uint i = 0; i < m_key_values.size(); i++) {
+    const uint8_vec& v = m_key_values[i];
+    if ((v.size() >= n) && (!memcmp(&v[0], pKey, n)))
+      return &v;
+  }
+
+  return NULL;
+}
+
+bool ktx_texture::get_key_value_as_string(const char* pKey, dynamic_string& str) const {
+  const uint8_vec* p = find_key(pKey);
+  if (!p) {
+    str.clear();
+    return false;
+  }
+
+  const uint ofs = (static_cast<uint>(strlen(pKey)) + 1);
+  const uint8* pValue = p->get_ptr() + ofs;
+  const uint n = p->size() - ofs;
+
+  uint i;
+  for (i = 0; i < n; i++)
+    if (!pValue[i])
+      break;
+
+  str.set_from_buf(pValue, i);
+  return true;
+}
+
+uint ktx_texture::add_key_value(const char* pKey, const void* pVal, uint val_size) {
+  const uint idx = m_key_values.size();
+  m_key_values.resize(idx + 1);
+  uint8_vec& v = m_key_values.back();
+  v.append(reinterpret_cast<const uint8*>(pKey), static_cast<uint>(strlen(pKey)) + 1);
+  v.append(static_cast<const uint8*>(pVal), val_size);
+  return idx;
+}
+
+}  // namespace crnlib
@@ -0,0 +1,291 @@
+// File: crn_ktx_texture.h
+#ifndef _KTX_TEXTURE_H_
+#define _KTX_TEXTURE_H_
+#ifdef _MSC_VER
+#pragma once
+#endif
+
+#include "crn_data_stream_serializer.h"
+
+#define KTX_ENDIAN 0x04030201
+#define KTX_OPPOSITE_ENDIAN 0x01020304
+
+namespace crnlib {
+extern const uint8 s_ktx_file_id[12];
+
+struct ktx_header {
+  uint8 m_identifier[12];
+  uint32 m_endianness;
+  uint32 m_glType;
+  uint32 m_glTypeSize;
+  uint32 m_glFormat;
+  uint32 m_glInternalFormat;
+  uint32 m_glBaseInternalFormat;
+  uint32 m_pixelWidth;
+  uint32 m_pixelHeight;
+  uint32 m_pixelDepth;
+  uint32 m_numberOfArrayElements;
+  uint32 m_numberOfFaces;
+  uint32 m_numberOfMipmapLevels;
+  uint32 m_bytesOfKeyValueData;
+
+  void clear() {
+    memset(this, 0, sizeof(*this));
+  }
+
+  void endian_swap() {
+    utils::endian_swap_mem32(&m_endianness, (sizeof(*this) - sizeof(m_identifier)) / sizeof(uint32));
+  }
+};
+
+typedef crnlib::vector<uint8_vec> ktx_key_value_vec;
+typedef crnlib::vector<uint8_vec> ktx_image_data_vec;
+
+// Compressed pixel data formats: ETC1, ETC2, S3TC (DXT1, DXT3, DXT5), RGTC, LATC
+enum {
+  KTX_ETC1_RGB8_OES = 0x8D64,
+  KTX_COMPRESSED_RGB8_ETC2 = 0x9274,
+  KTX_COMPRESSED_RGBA8_ETC2_EAC = 0x9278,
+  KTX_RGB_S3TC = 0x83A0,
+  KTX_RGB4_S3TC = 0x83A1,
+  KTX_COMPRESSED_RGB_S3TC_DXT1_EXT = 0x83F0,
+  KTX_COMPRESSED_RGBA_S3TC_DXT1_EXT = 0x83F1,
+  KTX_COMPRESSED_SRGB_S3TC_DXT1_EXT = 0x8C4C,
+  KTX_COMPRESSED_SRGB_ALPHA_S3TC_DXT1_EXT = 0x8C4D,
+  KTX_RGBA_S3TC = 0x83A2,
+  KTX_RGBA4_S3TC = 0x83A3,
+  KTX_COMPRESSED_RGBA_S3TC_DXT3_EXT = 0x83F2,
+  KTX_COMPRESSED_SRGB_ALPHA_S3TC_DXT3_EXT = 0x8C4E,
+  KTX_COMPRESSED_RGBA_S3TC_DXT5_EXT = 0x83F3,
+  KTX_COMPRESSED_SRGB_ALPHA_S3TC_DXT5_EXT = 0x8C4F,
+  KTX_RGBA_DXT5_S3TC = 0x83A4,
+  KTX_RGBA4_DXT5_S3TC = 0x83A5,
+  KTX_COMPRESSED_RED_RGTC1_EXT = 0x8DBB,
+  KTX_COMPRESSED_SIGNED_RED_RGTC1_EXT = 0x8DBC,
+  KTX_COMPRESSED_RED_GREEN_RGTC2_EXT = 0x8DBD,
+  KTX_COMPRESSED_SIGNED_RED_GREEN_RGTC2_EXT = 0x8DBE,
+  KTX_COMPRESSED_LUMINANCE_LATC1_EXT = 0x8C70,
+  KTX_COMPRESSED_SIGNED_LUMINANCE_LATC1_EXT = 0x8C71,
+  KTX_COMPRESSED_LUMINANCE_ALPHA_LATC2_EXT = 0x8C72,
+  KTX_COMPRESSED_SIGNED_LUMINANCE_ALPHA_LATC2_EXT = 0x8C73
+};
+
+// Pixel formats (various internal, base, and base internal formats)
+enum {
+  KTX_R8 = 0x8229,
+  KTX_R8UI = 0x8232,
+  KTX_RGB8 = 0x8051,
+  KTX_SRGB8 = 0x8C41,
+  KTX_SRGB = 0x8C40,
+  KTX_SRGB_ALPHA = 0x8C42,
+  KTX_SRGB8_ALPHA8 = 0x8C43,
+  KTX_RGBA8 = 0x8058,
+  KTX_STENCIL_INDEX = 0x1901,
+  KTX_DEPTH_COMPONENT = 0x1902,
+  KTX_DEPTH_STENCIL = 0x84F9,
+  KTX_RED = 0x1903,
+  KTX_GREEN = 0x1904,
+  KTX_BLUE = 0x1905,
+  KTX_ALPHA = 0x1906,
+  KTX_RG = 0x8227,
+  KTX_RGB = 0x1907,
+  KTX_RGBA = 0x1908,
+  KTX_BGR = 0x80E0,
+  KTX_BGRA = 0x80E1,
+  KTX_RED_INTEGER = 0x8D94,
+  KTX_GREEN_INTEGER = 0x8D95,
+  KTX_BLUE_INTEGER = 0x8D96,
+  KTX_ALPHA_INTEGER = 0x8D97,
+  KTX_RGB_INTEGER = 0x8D98,
+  KTX_RGBA_INTEGER = 0x8D99,
+  KTX_BGR_INTEGER = 0x8D9A,
+  KTX_BGRA_INTEGER = 0x8D9B,
+  KTX_LUMINANCE = 0x1909,
+  KTX_LUMINANCE_ALPHA = 0x190A,
+  KTX_RG_INTEGER = 0x8228,
+  KTX_RG8 = 0x822B,
+  KTX_ALPHA8 = 0x803C,
+  KTX_LUMINANCE8 = 0x8040,
+  KTX_LUMINANCE8_ALPHA8 = 0x8045
+};
+
+// Pixel data types
+enum {
+  KTX_UNSIGNED_BYTE = 0x1401,
+  KTX_BYTE = 0x1400,
+  KTX_UNSIGNED_SHORT = 0x1403,
+  KTX_SHORT = 0x1402,
+  KTX_UNSIGNED_INT = 0x1405,
+  KTX_INT = 0x1404,
+  KTX_HALF_FLOAT = 0x140B,
+  KTX_FLOAT = 0x1406,
+  KTX_UNSIGNED_BYTE_3_3_2 = 0x8032,
+  KTX_UNSIGNED_BYTE_2_3_3_REV = 0x8362,
+  KTX_UNSIGNED_SHORT_5_6_5 = 0x8363,
+  KTX_UNSIGNED_SHORT_5_6_5_REV = 0x8364,
+  KTX_UNSIGNED_SHORT_4_4_4_4 = 0x8033,
+  KTX_UNSIGNED_SHORT_4_4_4_4_REV = 0x8365,
+  KTX_UNSIGNED_SHORT_5_5_5_1 = 0x8034,
+  KTX_UNSIGNED_SHORT_1_5_5_5_REV = 0x8366,
+  KTX_UNSIGNED_INT_8_8_8_8 = 0x8035,
+  KTX_UNSIGNED_INT_8_8_8_8_REV = 0x8367,
+  KTX_UNSIGNED_INT_10_10_10_2 = 0x8036,
+  KTX_UNSIGNED_INT_2_10_10_10_REV = 0x8368,
+  KTX_UNSIGNED_INT_24_8 = 0x84FA,
+  KTX_UNSIGNED_INT_10F_11F_11F_REV = 0x8C3B,
+  KTX_UNSIGNED_INT_5_9_9_9_REV = 0x8C3E,
+  KTX_FLOAT_32_UNSIGNED_INT_24_8_REV = 0x8DAD
+};
+
+bool is_packed_pixel_ogl_type(uint32 ogl_type);
+uint get_ogl_type_size(uint32 ogl_type);
+bool get_ogl_fmt_desc(uint32 ogl_fmt, uint32 ogl_type, uint& block_dim, uint& bytes_per_block);
+uint get_ogl_type_size(uint32 ogl_type);
+uint32 get_ogl_base_internal_fmt(uint32 ogl_fmt);
+
+class ktx_texture {
+ public:
+  ktx_texture() {
+    clear();
+  }
+
+  ktx_texture(const ktx_texture& other) {
+    *this = other;
+  }
+
+  ktx_texture& operator=(const ktx_texture& rhs) {
+    if (this == &rhs)
+      return *this;
+
+    clear();
+
+    m_header = rhs.m_header;
+    m_key_values = rhs.m_key_values;
+    m_image_data = rhs.m_image_data;
+    m_block_dim = rhs.m_block_dim;
+    m_bytes_per_block = rhs.m_bytes_per_block;
+    m_opposite_endianness = rhs.m_opposite_endianness;
+
+    return *this;
+  }
+
+  void clear() {
+    m_header.clear();
+    m_key_values.clear();
+    m_image_data.clear();
+
+    m_block_dim = 0;
+    m_bytes_per_block = 0;
+
+    m_opposite_endianness = false;
+  }
+
+  // High level methods
+  bool read_from_stream(data_stream_serializer& serializer);
+  bool write_to_stream(data_stream_serializer& serializer, bool no_keyvalue_data = false);
+
+  bool init_2D(uint width, uint height, uint num_mips, uint32 ogl_internal_fmt, uint32 ogl_fmt, uint32 ogl_type);
+  bool init_2D_array(uint width, uint height, uint num_mips, uint array_size, uint32 ogl_internal_fmt, uint32 ogl_fmt, uint32 ogl_type);
+  bool init_3D(uint width, uint height, uint depth, uint num_mips, uint32 ogl_internal_fmt, uint32 ogl_fmt, uint32 ogl_type);
+  bool init_cubemap(uint dim, uint num_mips, uint32 ogl_internal_fmt, uint32 ogl_fmt, uint32 ogl_type);
+
+  bool check_header() const;
+  bool consistency_check() const;
+
+  // General info
+
+  bool is_valid() const { return (m_header.m_pixelWidth > 0) && (m_image_data.size() > 0); }
+
+  uint get_width() const { return m_header.m_pixelWidth; }
+  uint get_height() const { return CRNLIB_MAX(m_header.m_pixelHeight, 1); }
+  uint get_depth() const { return CRNLIB_MAX(m_header.m_pixelDepth, 1); }
+  uint get_num_mips() const { return CRNLIB_MAX(m_header.m_numberOfMipmapLevels, 1); }
+  uint get_array_size() const { return CRNLIB_MAX(m_header.m_numberOfArrayElements, 1); }
+  uint get_num_faces() const { return m_header.m_numberOfFaces; }
+
+  uint32 get_ogl_type() const { return m_header.m_glType; }
+  uint32 get_ogl_fmt() const { return m_header.m_glFormat; }
+  uint32 get_ogl_base_fmt() const { return m_header.m_glBaseInternalFormat; }
+  uint32 get_ogl_internal_fmt() const { return m_header.m_glInternalFormat; }
+
+  uint get_total_images() const { return get_num_mips() * (get_depth() * get_num_faces() * get_array_size()); }
+
+  bool is_compressed() const { return m_block_dim > 1; }
+  bool is_uncompressed() const { return !is_compressed(); }
+
+  bool get_opposite_endianness() const { return m_opposite_endianness; }
+  void set_opposite_endianness(bool flag) { m_opposite_endianness = flag; }
+
+  uint32 get_block_dim() const { return m_block_dim; }
+  uint32 get_bytes_per_block() const { return m_bytes_per_block; }
+
+  const ktx_header& get_header() const { return m_header; }
+
+  // Key values
+  const ktx_key_value_vec& get_key_value_vec() const { return m_key_values; }
+  ktx_key_value_vec& get_key_value_vec() { return m_key_values; }
+
+  const uint8_vec* find_key(const char* pKey) const;
+  bool get_key_value_as_string(const char* pKey, dynamic_string& str) const;
+
+  uint add_key_value(const char* pKey, const void* pVal, uint val_size);
+  uint add_key_value(const char* pKey, const char* pVal) { return add_key_value(pKey, pVal, static_cast<uint>(strlen(pVal)) + 1); }
+
+  // Image data
+  uint get_num_images() const { return m_image_data.size(); }
+
+  const uint8_vec& get_image_data(uint image_index) const { return m_image_data[image_index]; }
+  uint8_vec& get_image_data(uint image_index) { return m_image_data[image_index]; }
+
+  const uint8_vec& get_image_data(uint mip_index, uint array_index, uint face_index, uint zslice_index) const { return get_image_data(get_image_index(mip_index, array_index, face_index, zslice_index)); }
+  uint8_vec& get_image_data(uint mip_index, uint array_index, uint face_index, uint zslice_index) { return get_image_data(get_image_index(mip_index, array_index, face_index, zslice_index)); }
+
+  const ktx_image_data_vec& get_image_data_vec() const { return m_image_data; }
+  ktx_image_data_vec& get_image_data_vec() { return m_image_data; }
+
+  void add_image(uint face_index, uint mip_index, const void* pImage, uint image_size) {
+    const uint image_index = get_image_index(mip_index, 0, face_index, 0);
+    if (image_index >= m_image_data.size())
+      m_image_data.resize(image_index + 1);
+    if (image_size) {
+      uint8_vec& v = m_image_data[image_index];
+      v.resize(image_size);
+      memcpy(&v[0], pImage, image_size);
+    }
+  }
+
+  uint get_image_index(uint mip_index, uint array_index, uint face_index, uint zslice_index) const {
+    CRNLIB_ASSERT((mip_index < get_num_mips()) && (array_index < get_array_size()) && (face_index < get_num_faces()) && (zslice_index < get_depth()));
+    return zslice_index + (face_index * get_depth()) + (array_index * (get_depth() * get_num_faces())) + (mip_index * (get_depth() * get_num_faces() * get_array_size()));
+  }
+
+  void get_mip_dim(uint mip_index, uint& mip_width, uint& mip_height) const {
+    CRNLIB_ASSERT(mip_index < get_num_mips());
+    mip_width = CRNLIB_MAX(get_width() >> mip_index, 1);
+    mip_height = CRNLIB_MAX(get_height() >> mip_index, 1);
+  }
+
+  void get_mip_dim(uint mip_index, uint& mip_width, uint& mip_height, uint& mip_depth) const {
+    CRNLIB_ASSERT(mip_index < get_num_mips());
+    mip_width = CRNLIB_MAX(get_width() >> mip_index, 1);
+    mip_height = CRNLIB_MAX(get_height() >> mip_index, 1);
+    mip_depth = CRNLIB_MAX(get_depth() >> mip_index, 1);
+  }
+
+ private:
+  ktx_header m_header;
+
+  ktx_key_value_vec m_key_values;
+  ktx_image_data_vec m_image_data;
+
+  uint32 m_block_dim;
+  uint32 m_bytes_per_block;
+
+  bool m_opposite_endianness;
+
+  bool compute_pixel_info();
+};
+
+}  // namespace crnlib
+
+#endif  // #ifndef _KTX_TEXTURE_H_
@@ -0,0 +1,132 @@
+// File: crn_lzma_codec.cpp
+// See Copyright Notice and license at the end of inc/crnlib.h
+#include "crn_core.h"
+#include "crn_lzma_codec.h"
+#include "crn_strutils.h"
+#include "crn_checksum.h"
+#include "lzma_LzmaLib.h"
+#include "crn_threading.h"
+
+namespace crnlib {
+lzma_codec::lzma_codec()
+    : m_pCompress(LzmaCompress),
+      m_pUncompress(LzmaUncompress) {
+  CRNLIB_ASSUME(cLZMAPropsSize == LZMA_PROPS_SIZE);
+}
+
+lzma_codec::~lzma_codec() {
+}
+
+bool lzma_codec::pack(const void* p, uint n, crnlib::vector<uint8>& buf) {
+  if (n > 1024U * 1024U * 1024U)
+    return false;
+
+  uint max_comp_size = n + math::maximum<uint>(128, n >> 8);
+  buf.resize(sizeof(header) + max_comp_size);
+
+  header* pHDR = reinterpret_cast<header*>(&buf[0]);
+  uint8* pComp_data = &buf[sizeof(header)];
+
+  utils::zero_object(*pHDR);
+
+  pHDR->m_uncomp_size = n;
+  pHDR->m_adler32 = adler32(p, n);
+
+  if (n) {
+    size_t destLen = 0;
+    size_t outPropsSize = 0;
+    int status = SZ_ERROR_INPUT_EOF;
+
+    for (uint trial = 0; trial < 3; trial++) {
+      destLen = max_comp_size;
+      outPropsSize = cLZMAPropsSize;
+
+      status = (*m_pCompress)(pComp_data, &destLen, reinterpret_cast<const unsigned char*>(p), n,
+                              pHDR->m_lzma_props, &outPropsSize,
+                              -1, /* 0 <= level <= 9, default = 5 */
+                              0,  /* default = (1 << 24) */
+                              -1, /* 0 <= lc <= 8, default = 3  */
+                              -1, /* 0 <= lp <= 4, default = 0  */
+                              -1, /* 0 <= pb <= 4, default = 2  */
+                              -1, /* 5 <= fb <= 273, default = 32 */
+#ifdef WIN32
+                              (g_number_of_processors > 1) ? 2 : 1
+#else
+                              1
+#endif
+                              );
+
+      if (status != SZ_ERROR_OUTPUT_EOF)
+        break;
+
+      max_comp_size += ((n + 1) / 2);
+      buf.resize(sizeof(header) + max_comp_size);
+      pHDR = reinterpret_cast<header*>(&buf[0]);
+      pComp_data = &buf[sizeof(header)];
+    }
+
+    if (status != SZ_OK) {
+      buf.clear();
+      return false;
+    }
+
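+    pHDR->m_comp_size = static_cast<uint>(destLen);
+    buf.resize(CRNLIB_SIZEOF_U32(header) + static_cast<uint32>(destLen));
+    pHDR = reinterpret_cast<header*>(&buf[0]);  // defensive: re-fetch in case resize() moved the buffer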
+  }
+
+  pHDR->m_sig = header::cSig;
+  pHDR->m_checksum = static_cast<uint8>(adler32((uint8*)pHDR + header::cChecksumSkipBytes, sizeof(header) - header::cChecksumSkipBytes));
+
+  return true;
+}
+
+bool lzma_codec::unpack(const void* p, uint n, crnlib::vector<uint8>& buf) {
+  buf.resize(0);
+
+  if (n < sizeof(header))
+    return false;
+
+  const header& hdr = *static_cast<const header*>(p);
+  if (hdr.m_sig != header::cSig)
+    return false;
+
+  if (static_cast<uint8>(adler32((const uint8*)&hdr + header::cChecksumSkipBytes, sizeof(hdr) - header::cChecksumSkipBytes)) != hdr.m_checksum)
+    return false;
+
+  if (!hdr.m_uncomp_size)
+    return true;
+
+  if (!hdr.m_comp_size)
+    return false;
+
+  if (hdr.m_uncomp_size > 1024U * 1024U * 1024U)
+    return false;
+
+  if (!buf.try_resize(hdr.m_uncomp_size))
+    return false;
+
+  const uint8* pComp_data = static_cast<const uint8*>(p) + sizeof(header);
+  size_t srcLen = n - sizeof(header);
+  if (srcLen < hdr.m_comp_size)
+    return false;
+
+  size_t destLen = hdr.m_uncomp_size;
+
+  int status = (*m_pUncompress)(&buf[0], &destLen, pComp_data, &srcLen,
+                                hdr.m_lzma_props, cLZMAPropsSize);
+
+  if ((status != SZ_OK) || (destLen != hdr.m_uncomp_size)) {
+    buf.clear();
+    return false;
+  }
+
+  if (adler32(&buf[0], buf.size()) != hdr.m_adler32) {
+    buf.clear();
+    return false;
+  }
+
+  return true;
+}
+
+}  // namespace crnlib
@@ -0,0 +1,57 @@
+// File: crn_lzma_codec.h
+// See Copyright Notice and license at the end of inc/crnlib.h
+#pragma once
+#include "crn_packed_uint.h"
+
+namespace crnlib {
+class lzma_codec {
+ public:
+  lzma_codec();
+  ~lzma_codec();
+
+  // Always available, because we're statically linking in lzmalib now vs. dynamically loading the DLL.
+  bool is_initialized() const { return true; }
+
+  bool pack(const void* p, uint n, crnlib::vector<uint8>& buf);
+
+  bool unpack(const void* p, uint n, crnlib::vector<uint8>& buf);
+
+ private:
+  typedef int(CRNLIB_STDCALL* LzmaCompressFuncPtr)(unsigned char* dest, size_t* destLen, const unsigned char* src, size_t srcLen,
+                                                   unsigned char* outProps, size_t* outPropsSize, /* *outPropsSize must be = 5 */
+                                                   int level,                                     /* 0 <= level <= 9, default = 5 */
+                                                   unsigned dictSize,                             /* default = (1 << 24) */
+                                                   int lc,                                        /* 0 <= lc <= 8, default = 3  */
+                                                   int lp,                                        /* 0 <= lp <= 4, default = 0  */
+                                                   int pb,                                        /* 0 <= pb <= 4, default = 2  */
+                                                   int fb,                                        /* 5 <= fb <= 273, default = 32 */
+                                                   int numThreads                                 /* 1 or 2, default = 2 */
+                                                   );
+
+  typedef int(CRNLIB_STDCALL* LzmaUncompressFuncPtr)(unsigned char* dest, size_t* destLen, const unsigned char* src, size_t* srcLen,
+                                                     const unsigned char* props, size_t propsSize);
+
+  LzmaCompressFuncPtr m_pCompress;
+  LzmaUncompressFuncPtr m_pUncompress;
+
+  enum { cLZMAPropsSize = 5 };
+
+#pragma pack(push)
+#pragma pack(1)
+  struct header {
+    enum { cSig = 'L' | ('0' << 8),
+           cChecksumSkipBytes = 3 };
+    packed_uint<2> m_sig;
+    uint8 m_checksum;
+
+    uint8 m_lzma_props[cLZMAPropsSize];
+
+    packed_uint<4> m_comp_size;
+    packed_uint<4> m_uncomp_size;
+
+    packed_uint<4> m_adler32;
+  };
+#pragma pack(pop)
+};
+
+}  // namespace crnlib
@@ -0,0 +1,67 @@
+// File: crn_math.cpp
+// See Copyright Notice and license at the end of inc/crnlib.h
+#include "crn_core.h"
+
+namespace crnlib {
+namespace math {
+uint g_bitmasks[32] =
+    {
+        1U << 0U, 1U << 1U, 1U << 2U, 1U << 3U,
+        1U << 4U, 1U << 5U, 1U << 6U, 1U << 7U,
+        1U << 8U, 1U << 9U, 1U << 10U, 1U << 11U,
+        1U << 12U, 1U << 13U, 1U << 14U, 1U << 15U,
+        1U << 16U, 1U << 17U, 1U << 18U, 1U << 19U,
+        1U << 20U, 1U << 21U, 1U << 22U, 1U << 23U,
+        1U << 24U, 1U << 25U, 1U << 26U, 1U << 27U,
+        1U << 28U, 1U << 29U, 1U << 30U, 1U << 31U};
+
+double compute_entropy(const uint8* p, uint n) {
+  uint hist[256];
+  utils::zero_object(hist);
+
+  for (uint i = 0; i < n; i++)
+    hist[*p++]++;
+
+  double entropy = 0.0;
+
+  const double invln2 = 1.0 / log(2.0);
+  for (uint i = 0; i < 256; i++) {
+    if (!hist[i])
+      continue;
+
+    double prob = static_cast<double>(hist[i]) / n;
+    entropy += (-log(prob) * invln2) * hist[i];
+  }
+
+  return entropy;
+}
+
+void compute_lower_pow2_dim(int& width, int& height) {
+  const int tex_width = width;
+  const int tex_height = height;
+
+  width = 1;
+  for (;;) {
+    if ((width * 2) > tex_width)
+      break;
+    width *= 2;
+  }
+
+  height = 1;
+  for (;;) {
+    if ((height * 2) > tex_height)
+      break;
+    height *= 2;
+  }
+}
+
+void compute_upper_pow2_dim(int& width, int& height) {
+  if (!math::is_power_of_2((uint32)width))
+    width = math::next_pow2((uint32)width);
+
+  if (!math::is_power_of_2((uint32)height))
+    height = math::next_pow2((uint32)height);
+}
+
+}  // namespace math
+}  // namespace crnlib
@@ -0,0 +1,280 @@
+// File: crn_math.h
+// See Copyright Notice and license at the end of inc/crnlib.h
+#pragma once
+
+#if defined(_M_IX86) && defined(_MSC_VER)
+#include <intrin.h>
+#pragma intrinsic(__emulu)
+unsigned __int64 __emulu(unsigned int a, unsigned int b);
+#endif
+
+namespace crnlib {
+namespace math {
+const float cNearlyInfinite = 1.0e+37f;
+
+const float cDegToRad = 0.01745329252f;
+const float cRadToDeg = 57.29577951f;
+
+extern uint g_bitmasks[32];
+
+template <typename T>
+inline bool within_closed_range(T a, T b, T c) {
+  return (a >= b) && (a <= c);
+}
+
+template <typename T>
+inline bool within_open_range(T a, T b, T c) {
+  return (a >= b) && (a < c);  // Note: tests the half-open range [b, c), despite the name.
+}
+
+// Yes I know these should probably be pass by ref, not val:
+// http://www.stepanovpapers.com/notes.pdf
+// Just don't use them on non-simple (non built-in) types!
+template <typename T>
+inline T minimum(T a, T b) {
+  return (a < b) ? a : b;
+}
+
+template <typename T>
+inline T minimum(T a, T b, T c) {
+  return minimum(minimum(a, b), c);
+}
+
+template <typename T>
+inline T maximum(T a, T b) {
+  return (a > b) ? a : b;
+}
+
+template <typename T>
+inline T maximum(T a, T b, T c) {
+  return maximum(maximum(a, b), c);
+}
+
+template <typename T, typename U>
+inline T lerp(T a, T b, U c) {
+  return a + (b - a) * c;
+}
+
+template <typename T>
+inline T clamp(T value, T low, T high) {
+  return (value < low) ? low : ((value > high) ? high : value);
+}
+
+template <typename T>
+inline T saturate(T value) {
+  return (value < 0.0f) ? 0.0f : ((value > 1.0f) ? 1.0f : value);
+}
+
+inline int float_to_int(float f) {
+  return static_cast<int>(f);
+}
+
+inline uint float_to_uint(float f) {
+  return static_cast<uint>(f);
+}
+
+inline int float_to_int(double f) {
+  return static_cast<int>(f);
+}
+
+inline uint float_to_uint(double f) {
+  return static_cast<uint>(f);
+}
+
+inline int float_to_int_round(float f) {
+  return static_cast<int>((f < 0.0f) ? -floor(-f + .5f) : floor(f + .5f));
+}
+
+inline uint float_to_uint_round(float f) {
+  return static_cast<uint>((f < 0.0f) ? 0.0f : floor(f + .5f));
+}
+
+template <typename T>
+inline int sign(T value) {
+  return (value < 0) ? -1 : ((value > 0) ? 1 : 0);
+}
+
+template <typename T>
+inline T square(T value) {
+  return value * value;
+}
+
+inline bool is_power_of_2(uint32 x) {
+  return x && ((x & (x - 1U)) == 0U);
+}
+inline bool is_power_of_2(uint64 x) {
+  return x && ((x & (x - 1U)) == 0U);
+}
+
+template <typename T>
+inline T align_up_value(T x, uint alignment) {
+  CRNLIB_ASSERT(is_power_of_2(alignment));
+  // Do the arithmetic in T so that 64-bit values are not truncated to 32 bits.
+  const T mask = static_cast<T>(alignment - 1);
+  return static_cast<T>((x + mask) & ~mask);
+}
+
+template <typename T>
+inline T align_down_value(T x, uint alignment) {
+  CRNLIB_ASSERT(is_power_of_2(alignment));
+  // Do the arithmetic in T so that 64-bit values are not truncated to 32 bits.
+  const T mask = static_cast<T>(alignment - 1);
+  return static_cast<T>(x & ~mask);
+}
+
+template <typename T>
+inline T get_align_up_value_delta(T x, uint alignment) {
+  return align_up_value(x, alignment) - x;
+}
+
+// From "Hacker's Delight"
+inline uint32 next_pow2(uint32 val) {
+  val--;
+  val |= val >> 16;
+  val |= val >> 8;
+  val |= val >> 4;
+  val |= val >> 2;
+  val |= val >> 1;
+  return val + 1;
+}
+
+inline uint64 next_pow2(uint64 val) {
+  val--;
+  val |= val >> 32;
+  val |= val >> 16;
+  val |= val >> 8;
+  val |= val >> 4;
+  val |= val >> 2;
+  val |= val >> 1;
+  return val + 1;
+}
+
+inline uint floor_log2i(uint v) {
+  uint l = 0;
+  while (v > 1U) {
+    v >>= 1;
+    l++;
+  }
+  return l;
+}
+
+inline uint ceil_log2i(uint v) {
+  uint l = floor_log2i(v);
+  if ((l != cIntBits) && (v > (1U << l)))
+    l++;
+  return l;
+}
+
+// Returns the total number of bits needed to encode v.
+inline uint total_bits(uint v) {
+  uint l = 0;
+  while (v > 0U) {
+    v >>= 1;
+    l++;
+  }
+  return l;
+}
+
+// Actually counts the number of set bits, but hey
+inline uint bitmask_size(uint mask) {
+  uint size = 0;
+  while (mask) {
+    mask &= (mask - 1U);
+    size++;
+  }
+  return size;
+}
+
+inline uint bitmask_ofs(uint mask) {
+  if (!mask)
+    return 0;
+  uint ofs = 0;
+  while ((mask & 1U) == 0) {
+    mask >>= 1U;
+    ofs++;
+  }
+  return ofs;
+}
+
+// See Bit Twiddling Hacks (public domain)
+// http://www-graphics.stanford.edu/~seander/bithacks.html
+inline uint count_trailing_zero_bits(uint v) {
+  uint c = 32;  // c will be the number of zero bits on the right
+
+  static const unsigned int B[] = {0x55555555, 0x33333333, 0x0F0F0F0F, 0x00FF00FF, 0x0000FFFF};
+  static const unsigned int S[] = {1, 2, 4, 8, 16};  // Our Magic Binary Numbers
+
+  for (int i = 4; i >= 0; --i)  // unroll for more speed
+  {
+    if (v & B[i]) {
+      v <<= S[i];
+      c -= S[i];
+    }
+  }
+
+  if (v) {
+    c--;
+  }
+
+  return c;
+}
+
+inline uint count_leading_zero_bits(uint v) {
+  uint temp;
+  uint result = 32U;
+
+  temp = (v >> 16U);
+  if (temp) {
+    result -= 16U;
+    v = temp;
+  }
+  temp = (v >> 8U);
+  if (temp) {
+    result -= 8U;
+    v = temp;
+  }
+  temp = (v >> 4U);
+  if (temp) {
+    result -= 4U;
+    v = temp;
+  }
+  temp = (v >> 2U);
+  if (temp) {
+    result -= 2U;
+    v = temp;
+  }
+  temp = (v >> 1U);
+  if (temp) {
+    result -= 1U;
+    v = temp;
+  }
+
+  if (v & 1U)
+    result--;
+
+  return result;
+}
+
+inline uint64 emulu(uint32 a, uint32 b) {
+#if defined(_M_IX86) && defined(_MSC_VER)
+  return __emulu(a, b);
+#else
+  return static_cast<uint64>(a) * static_cast<uint64>(b);
+#endif
+}
+
+double compute_entropy(const uint8* p, uint n);
+
+void compute_lower_pow2_dim(int& width, int& height);
+void compute_upper_pow2_dim(int& width, int& height);
+
+inline bool equal_tol(float a, float b, float t) {
+  return fabs(a - b) < ((maximum(fabs(a), fabs(b)) + 1.0f) * t);
+}
+
+inline bool equal_tol(double a, double b, double t) {
+  return fabs(a - b) < ((maximum(fabs(a), fabs(b)) + 1.0f) * t);
+}
+}  // namespace math
+
+}  // namespace crnlib
@@ -0,0 +1,494 @@
+// File: crn_matrix.h
+// See Copyright Notice and license at the end of inc/crnlib.h
+#pragma once
+
+#include "crn_vec.h"
+
+namespace crnlib {
+template <class X, class Y, class Z>
+Z& matrix_mul_helper(Z& result, const X& lhs, const Y& rhs) {
+  CRNLIB_ASSUME(Z::num_rows == X::num_rows);
+  CRNLIB_ASSUME(Z::num_cols == Y::num_cols);
+  CRNLIB_ASSUME(X::num_cols == Y::num_rows);
+  CRNLIB_ASSERT((&result != &lhs) && (&result != &rhs));
+  for (int r = 0; r < X::num_rows; r++)
+    for (int c = 0; c < Y::num_cols; c++) {
+      typename Z::scalar_type s = lhs(r, 0) * rhs(0, c);
+      for (uint i = 1; i < X::num_cols; i++)
+        s += lhs(r, i) * rhs(i, c);
+      result(r, c) = s;
+    }
+  return result;
+}
+
+template <class X, class Y, class Z>
+Z& matrix_mul_helper_transpose_lhs(Z& result, const X& lhs, const Y& rhs) {
+  CRNLIB_ASSUME(Z::num_rows == X::num_cols);
+  CRNLIB_ASSUME(Z::num_cols == Y::num_cols);
+  CRNLIB_ASSUME(X::num_rows == Y::num_rows);
+  for (int r = 0; r < X::num_cols; r++)
+    for (int c = 0; c < Y::num_cols; c++) {
+      typename Z::scalar_type s = lhs(0, r) * rhs(0, c);
+      for (uint i = 1; i < X::num_rows; i++)
+        s += lhs(i, r) * rhs(i, c);
+      result(r, c) = s;
+    }
+  return result;
+}
+
+template <class X, class Y, class Z>
+Z& matrix_mul_helper_transpose_rhs(Z& result, const X& lhs, const Y& rhs) {
+  CRNLIB_ASSUME(Z::num_rows == X::num_rows);
+  CRNLIB_ASSUME(Z::num_cols == Y::num_rows);
+  CRNLIB_ASSUME(X::num_cols == Y::num_cols);
+  for (int r = 0; r < X::num_rows; r++)
+    for (int c = 0; c < Y::num_rows; c++) {
+      typename Z::scalar_type s = lhs(r, 0) * rhs(c, 0);
+      for (uint i = 1; i < X::num_cols; i++)
+        s += lhs(r, i) * rhs(c, i);
+      result(r, c) = s;
+    }
+  return result;
+}
+
+template <uint R, uint C, typename T>
+class matrix {
+ public:
+  typedef T scalar_type;
+  enum { num_rows = R,
+         num_cols = C };
+
+  typedef vec<R, T> col_vec;
+  typedef vec<(R > 1) ? (R - 1) : 0, T> subcol_vec;
+
+  typedef vec<C, T> row_vec;
+  typedef vec<(C > 1) ? (C - 1) : 0, T> subrow_vec;
+
+  inline matrix() {}
+
+  inline matrix(eClear) { clear(); }
+
+  inline matrix(const T* p) { set(p); }
+
+  inline matrix(const matrix& other) {
+    for (uint i = 0; i < R; i++)
+      m_rows[i] = other.m_rows[i];
+  }
+
+  inline matrix& operator=(const matrix& rhs) {
+    if (this != &rhs)
+      for (uint i = 0; i < R; i++)
+        m_rows[i] = rhs.m_rows[i];
+    return *this;
+  }
+
+  inline matrix(T val00, T val01,
+                T val10, T val11) {
+    set(val00, val01, val10, val11);
+  }
+
+  inline matrix(T val00, T val01, T val02,
+                T val10, T val11, T val12,
+                T val20, T val21, T val22) {
+    set(val00, val01, val02, val10, val11, val12, val20, val21, val22);
+  }
+
+  inline matrix(T val00, T val01, T val02, T val03,
+                T val10, T val11, T val12, T val13,
+                T val20, T val21, T val22, T val23,
+                T val30, T val31, T val32, T val33) {
+    set(val00, val01, val02, val03, val10, val11, val12, val13, val20, val21, val22, val23, val30, val31, val32, val33);
+  }
+
+  inline void set(const T* p) {
+    for (uint i = 0; i < R; i++) {
+      m_rows[i].set(p);
+      p += C;
+    }
+  }
+
+  inline void set(T val00, T val01,
+                  T val10, T val11) {
+    m_rows[0].set(val00, val01);
+    if (R >= 2) {
+      m_rows[1].set(val10, val11);
+
+      for (uint i = 2; i < R; i++)
+        m_rows[i].clear();
+    }
+  }
+
+  inline void set(T val00, T val01, T val02,
+                  T val10, T val11, T val12,
+                  T val20, T val21, T val22) {
+    m_rows[0].set(val00, val01, val02);
+    if (R >= 2) {
+      m_rows[1].set(val10, val11, val12);
+      if (R >= 3) {
+        m_rows[2].set(val20, val21, val22);
+
+        for (uint i = 3; i < R; i++)
+          m_rows[i].clear();
+      }
+    }
+  }
+
+  inline void set(T val00, T val01, T val02, T val03,
+                  T val10, T val11, T val12, T val13,
+                  T val20, T val21, T val22, T val23,
+                  T val30, T val31, T val32, T val33) {
+    m_rows[0].set(val00, val01, val02, val03);
+    if (R >= 2) {
+      m_rows[1].set(val10, val11, val12, val13);
+      if (R >= 3) {
+        m_rows[2].set(val20, val21, val22, val23);
+
+        if (R >= 4) {
+          m_rows[3].set(val30, val31, val32, val33);
+
+          for (uint i = 4; i < R; i++)
+            m_rows[i].clear();
+        }
+      }
+    }
+  }
+
+  inline T operator()(uint r, uint c) const {
+    CRNLIB_ASSERT((r < R) && (c < C));
+    return m_rows[r][c];
+  }
+
+  inline T& operator()(uint r, uint c) {
+    CRNLIB_ASSERT((r < R) && (c < C));
+    return m_rows[r][c];
+  }
+
+  inline const row_vec& operator[](uint r) const {
+    CRNLIB_ASSERT(r < R);
+    return m_rows[r];
+  }
+
+  inline row_vec& operator[](uint r) {
+    CRNLIB_ASSERT(r < R);
+    return m_rows[r];
+  }
+
+  inline const row_vec& get_row(uint r) const { return (*this)[r]; }
+  inline row_vec& get_row(uint r) { return (*this)[r]; }
+
+  inline col_vec get_col(uint c) const {
+    CRNLIB_ASSERT(c < C);
+    col_vec result;
+    for (uint i = 0; i < R; i++)
+      result[i] = m_rows[i][c];
+    return result;
+  }
+
+  inline void set_col(uint c, const col_vec& col) {
+    CRNLIB_ASSERT(c < C);
+    for (uint i = 0; i < R; i++)
+      m_rows[i][c] = col[i];
+  }
+
+  inline void set_col(uint c, const subcol_vec& col) {
+    CRNLIB_ASSERT(c < C);
+    for (uint i = 0; i < (R - 1); i++)
+      m_rows[i][c] = col[i];
+
+    m_rows[R - 1][c] = 0.0f;
+  }
+
+  inline const row_vec& get_translate() const {
+    return m_rows[R - 1];
+  }
+
+  inline matrix& set_translate(const row_vec& r) {
+    m_rows[R - 1] = r;
+    return *this;
+  }
+
+  inline matrix& set_translate(const subrow_vec& r) {
+    m_rows[R - 1] = row_vec(r).as_point();
+    return *this;
+  }
+
+  inline const T* get_ptr() const { return reinterpret_cast<const T*>(&m_rows[0]); }
+  inline T* get_ptr() { return reinterpret_cast<T*>(&m_rows[0]); }
+
+  inline matrix& operator+=(const matrix& other) {
+    for (uint i = 0; i < R; i++)
+      m_rows[i] += other.m_rows[i];
+    return *this;
+  }
+
+  inline matrix& operator-=(const matrix& other) {
+    for (uint i = 0; i < R; i++)
+      m_rows[i] -= other.m_rows[i];
+    return *this;
+  }
+
+  inline matrix& operator*=(T val) {
+    for (uint i = 0; i < R; i++)
+      m_rows[i] *= val;
+    return *this;
+  }
+
+  inline matrix& operator/=(T val) {
+    for (uint i = 0; i < R; i++)
+      m_rows[i] /= val;
+    return *this;
+  }
+
+  inline matrix& operator*=(const matrix& other) {
+    matrix result;
+    matrix_mul_helper(result, *this, other);
+    *this = result;
+    return *this;
+  }
+
+  friend inline matrix operator+(const matrix& lhs, const matrix& rhs) {
+    matrix result;
+    for (uint i = 0; i < R; i++)
+      result[i] = lhs.m_rows[i] + rhs.m_rows[i];
+    return result;
+  }
+
+  friend inline matrix operator-(const matrix& lhs, const matrix& rhs) {
+    matrix result;
+    for (uint i = 0; i < R; i++)
+      result[i] = lhs.m_rows[i] - rhs.m_rows[i];
+    return result;
+  }
+
+  friend inline matrix operator*(const matrix& lhs, T val) {
+    matrix result;
+    for (uint i = 0; i < R; i++)
+      result[i] = lhs.m_rows[i] * val;
+    return result;
+  }
+
+  friend inline matrix operator/(const matrix& lhs, T val) {
+    matrix result;
+    for (uint i = 0; i < R; i++)
+      result[i] = lhs.m_rows[i] / val;
+    return result;
+  }
+
+  friend inline matrix operator*(T val, const matrix& rhs) {
+    matrix result;
+    for (uint i = 0; i < R; i++)
+      result[i] = val * rhs.m_rows[i];
+    return result;
+  }
+
+  friend inline matrix operator*(const matrix& lhs, const matrix& rhs) {
+    matrix result;
+    return matrix_mul_helper(result, lhs, rhs);
+  }
+
+  friend inline row_vec operator*(const col_vec& a, const matrix& b) {
+    return transform(a, b);
+  }
+
+  inline matrix operator+() const {
+    return *this;
+  }
+
+  inline matrix operator-() const {
+    matrix result;
+    for (uint i = 0; i < R; i++)
+      result[i] = -m_rows[i];
+    return result;
+  }
+
+  inline void clear(void) {
+    for (uint i = 0; i < R; i++)
+      m_rows[i].clear();
+  }
+
+  inline void set_zero_matrix() {
+    clear();
+  }
+
+  inline void set_identity_matrix() {
+    for (uint i = 0; i < R; i++) {
+      m_rows[i].clear();
+      m_rows[i][i] = 1.0f;
+    }
+  }
+
+  inline matrix& set_scale_matrix(float s) {
+    clear();
+    for (int i = 0; i < (R - 1); i++)
+      m_rows[i][i] = s;
+    m_rows[R - 1][C - 1] = 1.0f;
+    return *this;
+  }
+
+  inline matrix& set_scale_matrix(const row_vec& s) {
+    clear();
+    for (uint i = 0; i < R; i++)
+      m_rows[i][i] = s[i];
+    return *this;
+  }
+
+  inline matrix& set_translate_matrix(const row_vec& s) {
+    set_identity_matrix();
+    set_translate(s);
+    return *this;
+  }
+
+  inline matrix& set_translate_matrix(float x, float y) {
+    set_identity_matrix();
+    set_translate(row_vec(x, y).as_point());
+    return *this;
+  }
+
+  inline matrix& set_translate_matrix(float x, float y, float z) {
+    set_identity_matrix();
+    set_translate(row_vec(x, y, z).as_point());
+    return *this;
+  }
+
+  inline matrix get_transposed(void) const {
+    matrix result;
+    for (uint i = 0; i < R; i++)
+      for (uint j = 0; j < C; j++)
+        result.m_rows[i][j] = m_rows[j][i];
+    return result;
+  }
+
+  inline matrix& transpose_in_place(void) {
+    matrix result;
+    for (uint i = 0; i < R; i++)
+      for (uint j = 0; j < C; j++)
+        result.m_rows[i][j] = m_rows[j][i];
+    *this = result;
+    return *this;
+  }
+
+  // This method transforms a column vec by a matrix (D3D-style).
+  static inline row_vec transform(const col_vec& a, const matrix& b) {
+    row_vec result(b[0] * a[0]);
+    for (uint r = 1; r < R; r++)
+      result += b[r] * a[r];
+    return result;
+  }
+
+  // This method transforms a column vec by a matrix. Last component of vec is assumed to be 1.
+  static inline row_vec transform_point(const col_vec& a, const matrix& b) {
+    row_vec result(0);
+    for (int r = 0; r < (R - 1); r++)
+      result += b[r] * a[r];
+    result += b[R - 1];
+    return result;
+  }
+
+  // This method transforms a column vec by a matrix. Last component of vec is assumed to be 0.
+  static inline row_vec transform_vector(const col_vec& a, const matrix& b) {
+    row_vec result(0);
+    for (int r = 0; r < (R - 1); r++)
+      result += b[r] * a[r];
+    return result;
+  }
+
+  static inline subcol_vec transform_point(const subcol_vec& a, const matrix& b) {
+    subcol_vec result(0);
+    for (int r = 0; r < R; r++) {
+      const T s = (r < subcol_vec::num_elements) ? a[r] : 1.0f;
+      for (int c = 0; c < (C - 1); c++)
+        result[c] += b[r][c] * s;
+    }
+    return result;
+  }
+
+  static inline subcol_vec transform_vector(const subcol_vec& a, const matrix& b) {
+    subcol_vec result(0);
+    for (int r = 0; r < (R - 1); r++) {
+      const T s = a[r];
+      for (int c = 0; c < (C - 1); c++)
+        result[c] += b[r][c] * s;
+    }
+    return result;
+  }
+
+  // This method transforms a column vec by the transpose of a matrix.
+  static inline col_vec transform_transposed(const matrix& b, const col_vec& a) {
+    CRNLIB_ASSUME(R == C);
+    col_vec result;
+    for (uint r = 0; r < R; r++)
+      result[r] = b[r] * a;
+    return result;
+  }
+
+  // This method transforms a column vec by the transpose of a matrix. Last component of vec is assumed to be 0.
+  static inline col_vec transform_vector_transposed(const matrix& b, const col_vec& a) {
+    CRNLIB_ASSUME(R == C);
+    col_vec result;
+    for (uint r = 0; r < R; r++) {
+      T s = 0;
+      for (uint c = 0; c < (C - 1); c++)
+        s += b[r][c] * a[c];
+
+      result[r] = s;
+    }
+    return result;
+  }
+
+  // This method transforms a matrix by a row vector (OGL style).
+  static inline col_vec transform(const matrix& b, const row_vec& a) {
+    col_vec result;
+    for (int r = 0; r < R; r++)
+      result[r] = b[r] * a;
+    return result;
+  }
+
+  static inline matrix& multiply(matrix& result, const matrix& lhs, const matrix& rhs) {
+    return matrix_mul_helper(result, lhs, rhs);
+  }
+
+  static inline matrix make_scale_matrix(float s) {
+    return matrix().set_scale_matrix(s);
+  }
+
+  static inline matrix make_scale_matrix(const row_vec& s) {
+    return matrix().set_scale_matrix(s);
+  }
+
+  static inline matrix make_scale_matrix(float x, float y) {
+    CRNLIB_ASSUME(R >= 3 && C >= 3);
+    matrix result;
+    result.clear();
+    result.m_rows[0][0] = x;
+    result.m_rows[1][1] = y;
+    result.m_rows[2][2] = 1.0f;
+    return result;
+  }
+
+  static inline matrix make_scale_matrix(float x, float y, float z) {
+    CRNLIB_ASSUME(R >= 4 && C >= 4);
+    matrix result;
+    result.clear();
+    result.m_rows[0][0] = x;
+    result.m_rows[1][1] = y;
+    result.m_rows[2][2] = z;
+    result.m_rows[3][3] = 1.0f;
+    return result;
+  }
+
+ private:
+  row_vec m_rows[R];
+};
+
+typedef matrix<2, 2, float> matrix22F;
+typedef matrix<2, 2, double> matrix22D;
+
+typedef matrix<3, 3, float> matrix33F;
+typedef matrix<3, 3, double> matrix33D;
+
+typedef matrix<4, 4, float> matrix44F;
+typedef matrix<4, 4, double> matrix44D;
+
+typedef matrix<8, 8, float> matrix88F;
+
+}  // namespace crnlib
@@ -0,0 +1,257 @@
+// File: crn_mem.cpp
+// See Copyright Notice and license at the end of inc/crnlib.h
+#include "crn_core.h"
+#include "crn_console.h"
+#include "../inc/crnlib.h"
+#include <malloc.h>
+#if CRNLIB_USE_WIN32_API
+#include "crn_winhdr.h"
+#endif
+
+#define CRNLIB_MEM_STATS 0
+
+#if !CRNLIB_USE_WIN32_API
+#define _msize malloc_usable_size
+#endif
+
+namespace crnlib {
+#if CRNLIB_MEM_STATS
+#if CRNLIB_64BIT_POINTERS
+typedef LONGLONG mem_stat_t;
+#define CRNLIB_MEM_COMPARE_EXCHANGE InterlockedCompareExchange64
+#else
+typedef LONG mem_stat_t;
+#define CRNLIB_MEM_COMPARE_EXCHANGE InterlockedCompareExchange
+#endif
+
+static volatile mem_stat_t g_total_blocks;
+static volatile mem_stat_t g_total_allocated;
+static volatile mem_stat_t g_max_allocated;
+
+static mem_stat_t update_total_allocated(int block_delta, mem_stat_t byte_delta) {
+  mem_stat_t cur_total_blocks;
+  for (;;) {
+    cur_total_blocks = (mem_stat_t)g_total_blocks;
+    mem_stat_t new_total_blocks = static_cast<mem_stat_t>(cur_total_blocks + block_delta);
+    CRNLIB_ASSERT(new_total_blocks >= 0);
+    if (CRNLIB_MEM_COMPARE_EXCHANGE(&g_total_blocks, new_total_blocks, cur_total_blocks) == cur_total_blocks)
+      break;
+  }
+
+  mem_stat_t cur_total_allocated, new_total_allocated;
+  for (;;) {
+    cur_total_allocated = g_total_allocated;
+    new_total_allocated = static_cast<mem_stat_t>(cur_total_allocated + byte_delta);
+    CRNLIB_ASSERT(new_total_allocated >= 0);
+    if (CRNLIB_MEM_COMPARE_EXCHANGE(&g_total_allocated, new_total_allocated, cur_total_allocated) == cur_total_allocated)
+      break;
+  }
+  for (;;) {
+    mem_stat_t cur_max_allocated = g_max_allocated;
+    mem_stat_t new_max_allocated = CRNLIB_MAX(new_total_allocated, cur_max_allocated);
+    if (CRNLIB_MEM_COMPARE_EXCHANGE(&g_max_allocated, new_max_allocated, cur_max_allocated) == cur_max_allocated)
+      break;
+  }
+  return new_total_allocated;
+}
+#endif  // CRNLIB_MEM_STATS
+
+static void* crnlib_default_realloc(void* p, size_t size, size_t* pActual_size, bool movable, void*) {
+  void* p_new;
+
+  if (!p) {
+    p_new = ::malloc(size);
+    CRNLIB_ASSERT((reinterpret_cast<ptr_bits_t>(p_new) & (CRNLIB_MIN_ALLOC_ALIGNMENT - 1)) == 0);
+
+    if (!p_new) {
+      printf("WARNING: ::malloc() of size %llu failed!\n", (unsigned long long)size);
+    }
+
+    if (pActual_size)
+      *pActual_size = p_new ? ::_msize(p_new) : 0;
+  } else if (!size) {
+    ::free(p);
+    p_new = NULL;
+
+    if (pActual_size)
+      *pActual_size = 0;
+  } else {
+    void* p_final_block = p;
+#ifdef WIN32
+    p_new = ::_expand(p, size);
+#else
+    p_new = NULL;
+#endif
+
+    if (p_new) {
+      CRNLIB_ASSERT((reinterpret_cast<ptr_bits_t>(p_new) & (CRNLIB_MIN_ALLOC_ALIGNMENT - 1)) == 0);
+      p_final_block = p_new;
+    } else if (movable) {
+      p_new = ::realloc(p, size);
+
+      if (p_new) {
+        CRNLIB_ASSERT((reinterpret_cast<ptr_bits_t>(p_new) & (CRNLIB_MIN_ALLOC_ALIGNMENT - 1)) == 0);
+        p_final_block = p_new;
+      } else {
+        printf("WARNING: ::realloc() of size %llu failed!\n", (unsigned long long)size);
+      }
+    }
+
+    if (pActual_size)
+      *pActual_size = ::_msize(p_final_block);
+  }
+
+  return p_new;
+}
+
+static size_t crnlib_default_msize(void* p, void*) {
+  return p ? _msize(p) : 0;
+}
+
+static crn_realloc_func g_pRealloc = crnlib_default_realloc;
+static crn_msize_func g_pMSize = crnlib_default_msize;
+static void* g_pUser_data;
+
+void crnlib_mem_error(const char* p_msg) {
+  crnlib_assert(p_msg, __FILE__, __LINE__);
+}
+void* crnlib_malloc(size_t size) {
+  return crnlib_malloc(size, NULL);
+}
+
+void* crnlib_malloc(size_t size, size_t* pActual_size) {
+  size = (size + sizeof(uint32) - 1U) & ~(sizeof(uint32) - 1U);
+  if (!size)
+    size = sizeof(uint32);
+
+  if (size > CRNLIB_MAX_POSSIBLE_BLOCK_SIZE) {
+    crnlib_mem_error("crnlib_malloc: size too big");
+    return NULL;
+  }
+
+  size_t actual_size = size;
+  uint8* p_new = static_cast<uint8*>((*g_pRealloc)(NULL, size, &actual_size, true, g_pUser_data));
+
+  if (pActual_size)
+    *pActual_size = actual_size;
+
+  if ((!p_new) || (actual_size < size)) {
+    crnlib_mem_error("crnlib_malloc: out of memory");
+    return NULL;
+  }
+
+  CRNLIB_ASSERT((reinterpret_cast<ptr_bits_t>(p_new) & (CRNLIB_MIN_ALLOC_ALIGNMENT - 1)) == 0);
+
+#if CRNLIB_MEM_STATS
+  CRNLIB_ASSERT((*g_pMSize)(p_new, g_pUser_data) == actual_size);
+  update_total_allocated(1, static_cast<mem_stat_t>(actual_size));
+#endif
+
+  return p_new;
+}
+
+void* crnlib_realloc(void* p, size_t size, size_t* pActual_size, bool movable) {
+  if ((ptr_bits_t)p & (CRNLIB_MIN_ALLOC_ALIGNMENT - 1)) {
+    crnlib_mem_error("crnlib_realloc: bad ptr");
+    return NULL;
+  }
+
+  if (size > CRNLIB_MAX_POSSIBLE_BLOCK_SIZE) {
+    crnlib_mem_error("crnlib_realloc: size too big");
+    return NULL;
+  }
+
+#if CRNLIB_MEM_STATS
+  size_t cur_size = p ? (*g_pMSize)(p, g_pUser_data) : 0;
+  CRNLIB_ASSERT(!p || (cur_size >= sizeof(uint32)));
+#endif
+  if ((size) && (size < sizeof(uint32)))
+    size = sizeof(uint32);
+
+  size_t actual_size = size;
+  void* p_new = (*g_pRealloc)(p, size, &actual_size, movable, g_pUser_data);
+
+  if (pActual_size)
+    *pActual_size = actual_size;
+
+  CRNLIB_ASSERT((reinterpret_cast<ptr_bits_t>(p_new) & (CRNLIB_MIN_ALLOC_ALIGNMENT - 1)) == 0);
+
+#if CRNLIB_MEM_STATS
+  CRNLIB_ASSERT(!p_new || ((*g_pMSize)(p_new, g_pUser_data) == actual_size));
+
+  int num_new_blocks = 0;
+  if (p) {
+    if (!p_new)
+      num_new_blocks = -1;
+  } else if (p_new) {
+    num_new_blocks = 1;
+  }
+  update_total_allocated(num_new_blocks, static_cast<mem_stat_t>(actual_size) - static_cast<mem_stat_t>(cur_size));
+#endif
+
+  return p_new;
+}
+
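+void* crnlib_calloc(size_t count, size_t size, size_t* pActual_size) {
+  // Reject requests whose byte count would overflow size_t or exceed the block limit.
+  if (size && (count > (CRNLIB_MAX_POSSIBLE_BLOCK_SIZE / size))) {
+    crnlib_mem_error("crnlib_calloc: size too big");
+    return NULL;
+  }
+
+  size_t total = count * size;
+  void* p = crnlib_malloc(total, pActual_size);
+  if (p)
+    memset(p, 0, total);
+  return p;
+}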
+
+void crnlib_free(void* p) {
+  if (!p)
+    return;
+
+  if (reinterpret_cast<ptr_bits_t>(p) & (CRNLIB_MIN_ALLOC_ALIGNMENT - 1)) {
+    crnlib_mem_error("crnlib_free: bad ptr");
+    return;
+  }
+
+#if CRNLIB_MEM_STATS
+  size_t cur_size = (*g_pMSize)(p, g_pUser_data);
+  CRNLIB_ASSERT(cur_size >= sizeof(uint32));
+  update_total_allocated(-1, -static_cast<mem_stat_t>(cur_size));
+#endif
+
+  (*g_pRealloc)(p, 0, NULL, true, g_pUser_data);
+}
+
+size_t crnlib_msize(void* p) {
+  if (!p)
+    return 0;
+
+  if (reinterpret_cast<ptr_bits_t>(p) & (CRNLIB_MIN_ALLOC_ALIGNMENT - 1)) {
+    crnlib_mem_error("crnlib_msize: bad ptr");
+    return 0;
+  }
+
+  return (*g_pMSize)(p, g_pUser_data);
+}
+
+void crnlib_print_mem_stats() {
+#if CRNLIB_MEM_STATS
+  if (console::is_initialized()) {
+    console::debug("crnlib_print_mem_stats:");
+    console::debug("Current blocks: " CRNLIB_INT64_FORMAT_SPECIFIER ", allocated: " CRNLIB_INT64_FORMAT_SPECIFIER ", max ever allocated: " CRNLIB_INT64_FORMAT_SPECIFIER, (int64)g_total_blocks, (int64)g_total_allocated, (int64)g_max_allocated);
+  } else {
+    printf("crnlib_print_mem_stats:\n");
+    printf("Current blocks: " CRNLIB_INT64_FORMAT_SPECIFIER ", allocated: " CRNLIB_INT64_FORMAT_SPECIFIER ", max ever allocated: " CRNLIB_INT64_FORMAT_SPECIFIER "\n", (int64)g_total_blocks, (int64)g_total_allocated, (int64)g_max_allocated);
+  }
+#endif
+}
+
+}  // namespace crnlib
+
+void crn_set_memory_callbacks(crn_realloc_func pRealloc, crn_msize_func pMSize, void* pUser_data) {
+  if ((!pRealloc) || (!pMSize)) {
+    crnlib::g_pRealloc = crnlib::crnlib_default_realloc;
+    crnlib::g_pMSize = crnlib::crnlib_default_msize;
+    crnlib::g_pUser_data = NULL;
+  } else {
+    crnlib::g_pRealloc = pRealloc;
+    crnlib::g_pMSize = pMSize;
+    crnlib::g_pUser_data = pUser_data;
+  }
+}
@@ -0,0 +1,181 @@
+// File: crn_mem.h
+// See Copyright Notice and license at the end of inc/crnlib.h
+#pragma once
+
+#ifndef CRNLIB_MIN_ALLOC_ALIGNMENT
+#define CRNLIB_MIN_ALLOC_ALIGNMENT (sizeof(size_t) * 2)
+#endif
+
+namespace crnlib {
+#if CRNLIB_64BIT_POINTERS
+const uint64 CRNLIB_MAX_POSSIBLE_BLOCK_SIZE = 0x400000000ULL;
+#else
+const uint32 CRNLIB_MAX_POSSIBLE_BLOCK_SIZE = 0x7FFF0000U;
+#endif
+
+void* crnlib_malloc(size_t size);
+void* crnlib_malloc(size_t size, size_t* pActual_size);
+void* crnlib_realloc(void* p, size_t size, size_t* pActual_size = NULL, bool movable = true);
+void* crnlib_calloc(size_t count, size_t size, size_t* pActual_size = NULL);
+void crnlib_free(void* p);
+size_t crnlib_msize(void* p);
+void crnlib_print_mem_stats();
+void crnlib_mem_error(const char* p_msg);
+
+// omfg - there must be a better way
+
+template <typename T>
+inline T* crnlib_new() {
+  T* p = static_cast<T*>(crnlib_malloc(sizeof(T)));
+  if (CRNLIB_IS_SCALAR_TYPE(T))
+    return p;
+  return helpers::construct(p);
+}
+
+template <typename T, typename A>
+inline T* crnlib_new(const A& init0) {
+  T* p = static_cast<T*>(crnlib_malloc(sizeof(T)));
+  return new (static_cast<void*>(p)) T(init0);
+}
+
+template <typename T, typename A>
+inline T* crnlib_new(A& init0) {
+  T* p = static_cast<T*>(crnlib_malloc(sizeof(T)));
+  return new (static_cast<void*>(p)) T(init0);
+}
+
+template <typename T, typename A, typename B>
+inline T* crnlib_new(const A& init0, const B& init1) {
+  T* p = static_cast<T*>(crnlib_malloc(sizeof(T)));
+  return new (static_cast<void*>(p)) T(init0, init1);
+}
+
+template <typename T, typename A, typename B, typename C>
+inline T* crnlib_new(const A& init0, const B& init1, const C& init2) {
+  T* p = static_cast<T*>(crnlib_malloc(sizeof(T)));
+  return new (static_cast<void*>(p)) T(init0, init1, init2);
+}
+
+template <typename T, typename A, typename B, typename C, typename D>
+inline T* crnlib_new(const A& init0, const B& init1, const C& init2, const D& init3) {
+  T* p = static_cast<T*>(crnlib_malloc(sizeof(T)));
+  return new (static_cast<void*>(p)) T(init0, init1, init2, init3);
+}
+
+template <typename T, typename A, typename B, typename C, typename D, typename E>
+inline T* crnlib_new(const A& init0, const B& init1, const C& init2, const D& init3, const E& init4) {
+  T* p = static_cast<T*>(crnlib_malloc(sizeof(T)));
+  return new (static_cast<void*>(p)) T(init0, init1, init2, init3, init4);
+}
+
+template <typename T, typename A, typename B, typename C, typename D, typename E, typename F>
+inline T* crnlib_new(const A& init0, const B& init1, const C& init2, const D& init3, const E& init4, const F& init5) {
+  T* p = static_cast<T*>(crnlib_malloc(sizeof(T)));
+  return new (static_cast<void*>(p)) T(init0, init1, init2, init3, init4, init5);
+}
+
+template <typename T, typename A, typename B, typename C, typename D, typename E, typename F, typename G>
+inline T* crnlib_new(const A& init0, const B& init1, const C& init2, const D& init3, const E& init4, const F& init5, const G& init6) {
+  T* p = static_cast<T*>(crnlib_malloc(sizeof(T)));
+  return new (static_cast<void*>(p)) T(init0, init1, init2, init3, init4, init5, init6);
+}
+
+template <typename T, typename A, typename B, typename C, typename D, typename E, typename F, typename G, typename H>
+inline T* crnlib_new(const A& init0, const B& init1, const C& init2, const D& init3, const E& init4, const F& init5, const G& init6, const H& init7) {
+  T* p = static_cast<T*>(crnlib_malloc(sizeof(T)));
+  return new (static_cast<void*>(p)) T(init0, init1, init2, init3, init4, init5, init6, init7);
+}
+
+template <typename T, typename A, typename B, typename C, typename D, typename E, typename F, typename G, typename H, typename I>
+inline T* crnlib_new(const A& init0, const B& init1, const C& init2, const D& init3, const E& init4, const F& init5, const G& init6, const H& init7, const I& init8) {
+  T* p = static_cast<T*>(crnlib_malloc(sizeof(T)));
+  return new (static_cast<void*>(p)) T(init0, init1, init2, init3, init4, init5, init6, init7, init8);
+}
+
+template <typename T, typename A, typename B, typename C, typename D, typename E, typename F, typename G, typename H, typename I, typename J>
+inline T* crnlib_new(const A& init0, const B& init1, const C& init2, const D& init3, const E& init4, const F& init5, const G& init6, const H& init7, const I& init8, const J& init9) {
+  T* p = static_cast<T*>(crnlib_malloc(sizeof(T)));
+  return new (static_cast<void*>(p)) T(init0, init1, init2, init3, init4, init5, init6, init7, init8, init9);
+}
+
+template <typename T, typename A, typename B, typename C, typename D, typename E, typename F, typename G, typename H, typename I, typename J, typename K>
+inline T* crnlib_new(const A& init0, const B& init1, const C& init2, const D& init3, const E& init4, const F& init5, const G& init6, const H& init7, const I& init8, const J& init9, const K& init10) {
+  T* p = static_cast<T*>(crnlib_malloc(sizeof(T)));
+  return new (static_cast<void*>(p)) T(init0, init1, init2, init3, init4, init5, init6, init7, init8, init9, init10);
+}
+
+template <typename T, typename A, typename B, typename C, typename D, typename E, typename F, typename G, typename H, typename I, typename J, typename K, typename L>
+inline T* crnlib_new(const A& init0, const B& init1, const C& init2, const D& init3, const E& init4, const F& init5, const G& init6, const H& init7, const I& init8, const J& init9, const K& init10, const L& init11) {
+  T* p = static_cast<T*>(crnlib_malloc(sizeof(T)));
+  return new (static_cast<void*>(p)) T(init0, init1, init2, init3, init4, init5, init6, init7, init8, init9, init10, init11);
+}
+
+template <typename T>
+inline T* crnlib_new_array(uint32 num) {
+  if (!num)
+    num = 1;
+
+  uint64 total = CRNLIB_MIN_ALLOC_ALIGNMENT + sizeof(T) * num;
+  if (total > CRNLIB_MAX_POSSIBLE_BLOCK_SIZE) {
+    crnlib_mem_error("crnlib_new_array: Array too large!");
+    return NULL;
+  }
+  uint8* q = static_cast<uint8*>(crnlib_malloc(static_cast<size_t>(total)));
+  if (!q)
+    return NULL;
+
+  T* p = reinterpret_cast<T*>(q + CRNLIB_MIN_ALLOC_ALIGNMENT);
+
+  reinterpret_cast<uint32*>(p)[-1] = num;
+  reinterpret_cast<uint32*>(p)[-2] = ~num;
+
+  if (!CRNLIB_IS_SCALAR_TYPE(T)) {
+    helpers::construct_array(p, num);
+  }
+  return p;
+}
+
+template <typename T>
+inline void crnlib_delete(T* p) {
+  if (p) {
+    if (!CRNLIB_IS_SCALAR_TYPE(T)) {
+      helpers::destruct(p);
+    }
+    crnlib_free(p);
+  }
+}
+
+template <typename T>
+inline void crnlib_delete_array(T* p) {
+  if (p) {
+    const uint32 num = reinterpret_cast<uint32*>(p)[-1];
+    const uint32 num_check = reinterpret_cast<uint32*>(p)[-2];
+    CRNLIB_ASSERT(num && (num == ~num_check));
+    if (num == ~num_check) {
+      if (!CRNLIB_IS_SCALAR_TYPE(T)) {
+        helpers::destruct_array(p, num);
+      }
+
+      crnlib_free(reinterpret_cast<uint8*>(p) - CRNLIB_MIN_ALLOC_ALIGNMENT);
+    }
+  }
+}
+
+}  // namespace crnlib
+#define CRNLIB_DEFINE_NEW_DELETE                                \
+  void* operator new(size_t size) {                             \
+    void* p = crnlib::crnlib_malloc(size);                      \
+    if (!p)                                                     \
+      crnlib_fail("new: Out of memory!", __FILE__, __LINE__);   \
+    return p;                                                   \
+  }                                                             \
+  void* operator new[](size_t size) {                           \
+    void* p = crnlib::crnlib_malloc(size);                      \
+    if (!p)                                                     \
+      crnlib_fail("new[]: Out of memory!", __FILE__, __LINE__); \
+    return p;                                                   \
+  }                                                             \
+  void operator delete(void* p_block) {                         \
+    crnlib::crnlib_free(p_block);                               \
+  }                                                             \
+  void operator delete[](void* p_block) {                       \
+    crnlib::crnlib_free(p_block);                               \
+  }
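crnlib_new_array() and crnlib_delete_array() in crn_mem.h above rely on a small header placed just below the first element: the element count and its bitwise complement, so that the delete path can recover the count and detect a damaged or mismatched pointer. The following is a minimal sketch of that layout under stated assumptions: it uses std::malloc/std::free instead of crnlib_malloc/crnlib_free, memcpy instead of the negative-index casts, and every name in the sketch namespace is hypothetical. The size overflow check against CRNLIB_MAX_POSSIBLE_BLOCK_SIZE is omitted for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>
#include <stdint.h>

namespace sketch {

// Room reserved in front of the elements; same formula as CRNLIB_MIN_ALLOC_ALIGNMENT.
const size_t kHeaderBytes = sizeof(size_t) * 2;

// Stores the count and its complement in the 8 bytes just below the first element.
inline void write_header(unsigned char* elems, uint32_t num) {
  const uint32_t check = ~num;
  std::memcpy(elems - sizeof(uint32_t), &num, sizeof(uint32_t));
  std::memcpy(elems - 2 * sizeof(uint32_t), &check, sizeof(uint32_t));
}

// Returns the stored count, or 0 if the count and its complement disagree.
inline uint32_t read_header(const unsigned char* elems) {
  uint32_t num, check;
  std::memcpy(&num, elems - sizeof(uint32_t), sizeof(uint32_t));
  std::memcpy(&check, elems - 2 * sizeof(uint32_t), sizeof(uint32_t));
  return (num == ~check) ? num : 0;
}

template <typename T>
T* new_array(uint32_t num) {
  if (!num)
    num = 1;  // like crnlib_new_array, never allocate zero elements
  unsigned char* raw = static_cast<unsigned char*>(std::malloc(kHeaderBytes + sizeof(T) * num));
  if (!raw)
    return NULL;
  unsigned char* elems = raw + kHeaderBytes;
  write_header(elems, num);
  T* p = reinterpret_cast<T*>(elems);
  for (uint32_t i = 0; i < num; ++i)
    new (static_cast<void*>(p + i)) T();  // default-construct each element in place
  return p;
}

template <typename T>
uint32_t array_count(const T* p) {
  return read_header(reinterpret_cast<const unsigned char*>(p));
}

// Returns false (and frees nothing) if the header fails validation.
template <typename T>
bool delete_array(T* p) {
  if (!p)
    return true;
  unsigned char* elems = reinterpret_cast<unsigned char*>(p);
  const uint32_t num = read_header(elems);
  if (!num)
    return false;
  for (uint32_t i = num; i > 0; --i)
    p[i - 1].~T();  // destroy in reverse order of construction
  std::free(elems - kHeaderBytes);
  return true;
}

// Helper type that tracks how many instances are currently alive.
struct tracked {
  static int& live() {
    static int n = 0;
    return n;
  }
  tracked() { ++live(); }
  ~tracked() { --live(); }
};

}  // namespace sketch
```

Storing the complement next to the count is a cheap integrity check: a pointer that did not come from the array allocator is unlikely to have two adjacent words that are exact bitwise inverses, so the delete path can refuse to free it.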
@@ -0,0 +1,934 @@
+/* miniz.c v1.14 - public domain deflate/inflate, zlib-subset, ZIP reading/writing/appending, PNG writing
+   See "unlicense" statement at the end of this file.
+   Rich Geldreich <richgel99@gmail.com>, last updated May 20, 2012
+   Implements RFC 1950: http://www.ietf.org/rfc/rfc1950.txt and RFC 1951: http://www.ietf.org/rfc/rfc1951.txt
+
+   Most API's defined in miniz.c are optional. For example, to disable the archive related functions just define
+   MINIZ_NO_ARCHIVE_APIS, or to get rid of all stdio usage define MINIZ_NO_STDIO (see the list below for more macros).
+
+   * Change History
+     5/20/12 v1.14 - MinGW32/64 GCC 4.6.1 compiler fixes: added MZ_FORCEINLINE, #include <time.h> (thanks fermtect).
+     5/19/12 v1.13 - From jason@cornsyrup.org and kelwert@mtu.edu - Fix mz_crc32() so it doesn't compute the wrong CRC-32's when mz_ulong is 64-bit.
+       Temporarily/locally slammed in "typedef unsigned long mz_ulong" and re-ran a randomized regression test on ~500k files.
+       Eliminated a bunch of warnings when compiling with GCC 32-bit/64.
+       Ran all examples, miniz.c, and tinfl.c through MSVC 2008's /analyze (static analysis) option and fixed all warnings (except for the silly
+       "Use of the comma-operator in a tested expression.." analysis warning, which I purposely use to work around a MSVC compiler warning).
+       Created 32-bit and 64-bit Codeblocks projects/workspace. Built and tested Linux executables. The codeblocks workspace is compatible with Linux+Win32/x64.
+       Added miniz_tester solution/project, which is a useful little app derived from LZHAM's tester app that I use as part of the regression test.
+       Ran miniz.c and tinfl.c through another series of regression testing on ~500,000 files and archives.
+       Modified example5.c so it purposely disables a bunch of high-level functionality (MINIZ_NO_STDIO, etc.). (Thanks to corysama for the MINIZ_NO_STDIO bug report.)
+       Fix ftell() usage in examples so they exit with an error on files which are too large (a limitation of the examples, not miniz itself).
+     4/12/12 v1.12 - More comments, added low-level example5.c, fixed a couple minor level_and_flags issues in the archive API's.
+      level_and_flags can now be set to MZ_DEFAULT_COMPRESSION. Thanks to Bruce Dawson <bruced@valvesoftware.com> for the feedback/bug report.
+     5/28/11 v1.11 - Added statement from unlicense.org
+     5/27/11 v1.10 - Substantial compressor optimizations:
+      Level 1 is now ~4x faster than before. The L1 compressor's throughput now varies between 70-110MB/sec. on a
+      Core i7 (actual throughput varies depending on the type of data, and x64 vs. x86).
+      Improved baseline L2-L9 compression perf. Also, greatly improved compression perf. issues on some file types.
+      Refactored the compression code for better readability and maintainability.
+      Added level 10 compression level (L10 has slightly better ratio than level 9, but could have a potentially large
+      drop in throughput on some files).
+     5/15/11 v1.09 - Initial stable release.
+
+   * Low-level Deflate/Inflate implementation notes:
+
+     Compression: Use the "tdefl" API's. The compressor supports raw, static, and dynamic blocks, lazy or
+     greedy parsing, match length filtering, RLE-only, and Huffman-only streams. It performs and compresses
+     approximately as well as zlib.
+
+     Decompression: Use the "tinfl" API's. The entire decompressor is implemented as a single function
+     coroutine: see tinfl_decompress(). It supports decompression into a 32KB (or larger power of 2) wrapping buffer, or into a memory
+     block large enough to hold the entire file.
+
+     The low-level tdefl/tinfl API's do not make any use of dynamic memory allocation.
+
+   * zlib-style API notes:
+
+     miniz.c implements a fairly large subset of zlib. There's enough functionality present for it to be a drop-in
+     zlib replacement in many apps:
+        The z_stream struct, optional memory allocation callbacks
+        deflateInit/deflateInit2/deflate/deflateReset/deflateEnd/deflateBound
+        inflateInit/inflateInit2/inflate/inflateEnd
+        compress, compress2, compressBound, uncompress
+        CRC-32, Adler-32 - Using modern, minimal code size, CPU cache friendly routines.
+        Supports raw deflate streams or standard zlib streams with adler-32 checking.
+
+     Limitations:
+      The callback API's are not implemented yet. No support for gzip headers or zlib static dictionaries.
+      I've tried to closely emulate zlib's various flavors of stream flushing and return status codes, but
+      there are no guarantees that miniz.c pulls this off perfectly.
+
+   * PNG writing: See the tdefl_write_image_to_png_file_in_memory() function, originally written by
+     Alex Evans. Supports 1-4 bytes/pixel images.
+
+   * ZIP archive API notes:
+
+     The ZIP archive API's were designed with simplicity and efficiency in mind, with just enough abstraction to
+     get the job done with minimal fuss. There are simple API's to retrieve file information, read files from
+     existing archives, create new archives, append new files to existing archives, or clone archive data from
+     one archive to another. It supports archives located in memory or the heap, on disk (using stdio.h),
+     or you can specify custom file read/write callbacks.
+
+     - Archive reading: Just call this function to read a single file from a disk archive:
+
+      void *mz_zip_extract_archive_file_to_heap(const char *pZip_filename, const char *pArchive_name,
+        size_t *pSize, mz_uint zip_flags);
+
+     For more complex cases, use the "mz_zip_reader" functions. Upon opening an archive, the entire central
+     directory is located and read as-is into memory, and subsequent file access only occurs when reading individual files.
+
+     - Archive file scanning: The simple way is to use this function to scan a loaded archive for a specific file:
+
+     int mz_zip_reader_locate_file(mz_zip_archive *pZip, const char *pName, const char *pComment, mz_uint flags);
+
+     The locate operation can optionally check file comments too, which (as one example) can be used to identify
+     multiple versions of the same file in an archive. This function uses a simple linear search through the central
+     directory, so it's not very fast.
+
+     Alternately, you can iterate through all the files in an archive (using mz_zip_reader_get_num_files()) and
+     retrieve detailed info on each file by calling mz_zip_reader_file_stat().
+
+     - Archive creation: Use the "mz_zip_writer" functions. The ZIP writer immediately writes compressed file data
+     to disk and builds an exact image of the central directory in memory. The central directory image is written
+     all at once at the end of the archive file when the archive is finalized.
+
+     The archive writer can optionally align each file's local header and file data to any power of 2 alignment,
+     which can be useful when the archive will be read from optical media. Also, the writer supports placing
+     arbitrary data blobs at the very beginning of ZIP archives. Archives written using either feature are still
+     readable by any ZIP tool.
+
+     - Archive appending: The simple way to add a single file to an archive is to call this function:
+
+      mz_bool mz_zip_add_mem_to_archive_file_in_place(const char *pZip_filename, const char *pArchive_name,
+        const void *pBuf, size_t buf_size, const void *pComment, mz_uint16 comment_size, mz_uint level_and_flags);
+
+     The archive will be created if it doesn't already exist, otherwise it'll be appended to.
+     Note the appending is done in-place and is not an atomic operation, so if something goes wrong
+     during the operation it's possible the archive could be left without a central directory (although the local
+     file headers and file data will be fine, so the archive will be recoverable).
+
+     For more complex archive modification scenarios:
+     1. The safest way is to use an mz_zip_reader to read the existing archive, cloning only those bits you want to
+     preserve into a new archive using the mz_zip_writer_add_from_zip_reader() function (which copies the
+     compressed file data as-is). When you're done, delete the old archive and rename the newly written archive.
+     This is safe but requires a bunch of temporary disk space or heap memory.
+
+     2. Or, you can convert an mz_zip_reader in-place to an mz_zip_writer using mz_zip_writer_init_from_reader(),
+     append new files as needed, then finalize the archive which will write an updated central directory to the
+     original archive. (This is basically what mz_zip_add_mem_to_archive_file_in_place() does.) There's a
+     possibility that the archive's central directory could be lost with this method if anything goes wrong, though.
+
+     - ZIP archive support limitations:
+     No zip64 or spanning support. Extraction functions can only handle unencrypted, stored or deflated files.
+     Requires streams capable of seeking.
+
+   * This is a header file library, like stb_image.c. To get only a header file, either cut and paste the
+     below header, or create miniz.h, #define MINIZ_HEADER_FILE_ONLY, and then include miniz.c from it.
+
+   * Important: For best perf. be sure to customize the below macros for your target platform:
+     #define MINIZ_USE_UNALIGNED_LOADS_AND_STORES 1
+     #define MINIZ_LITTLE_ENDIAN 1
+     #define MINIZ_HAS_64BIT_REGISTERS 1
+*/
+#pragma once
+
+#ifndef MINIZ_HEADER_INCLUDED
+#define MINIZ_HEADER_INCLUDED
+
+#include <stdlib.h>
+
+#if !defined(MINIZ_NO_TIME) && !defined(MINIZ_NO_ARCHIVE_APIS)
+#include <time.h>
+#endif
+
+// Defines to completely disable specific portions of miniz.c:
+// If all macros here are defined the only functionality remaining will be CRC-32, adler-32, tinfl, and tdefl.
+
+// Define MINIZ_NO_STDIO to disable all usage and any functions which rely on stdio for file I/O.
+//#define MINIZ_NO_STDIO
+
+// If MINIZ_NO_TIME is specified then the ZIP archive functions will not be able to get the current time, or
+// get/set file times.
+//#define MINIZ_NO_TIME
+
+// Define MINIZ_NO_ARCHIVE_APIS to disable all ZIP archive API's.
+//#define MINIZ_NO_ARCHIVE_APIS
+
+// Define MINIZ_NO_ARCHIVE_WRITING_APIS to disable all writing related ZIP archive API's.
+//#define MINIZ_NO_ARCHIVE_WRITING_APIS
+
+// Define MINIZ_NO_ZLIB_APIS to remove all ZLIB-style compression/decompression API's.
+//#define MINIZ_NO_ZLIB_APIS
+
+// Define MINIZ_NO_ZLIB_COMPATIBLE_NAMES to disable zlib names, to prevent conflicts against stock zlib.
+//#define MINIZ_NO_ZLIB_COMPATIBLE_NAMES
+
+// Define MINIZ_NO_MALLOC to disable all calls to malloc, free, and realloc.
+// Note if MINIZ_NO_MALLOC is defined then the user must always provide custom user alloc/free/realloc
+// callbacks to the zlib and archive API's, and a few stand-alone helper API's which don't provide custom user
+// functions (such as tdefl_compress_mem_to_heap() and tinfl_decompress_mem_to_heap()) won't work.
+//#define MINIZ_NO_MALLOC
+
+#if defined(_M_IX86) || defined(_M_X64) || defined(__i386__) || defined(__i386) || defined(__i486__) || defined(__i486) || defined(i386) || defined(__ia64__) || defined(__x86_64__)
+// MINIZ_X86_OR_X64_CPU is only used to help set the below macros.
+#define MINIZ_X86_OR_X64_CPU 1
+#endif
+
+#if (defined(__BYTE_ORDER__) && defined(__ORDER_LITTLE_ENDIAN__) && (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__)) || MINIZ_X86_OR_X64_CPU
+// Set MINIZ_LITTLE_ENDIAN to 1 if the processor is little endian.
+#define MINIZ_LITTLE_ENDIAN 1
+#endif
+
+#if MINIZ_X86_OR_X64_CPU
+// Set MINIZ_USE_UNALIGNED_LOADS_AND_STORES to 1 on CPU's that permit efficient integer loads and stores from unaligned addresses.
+#define MINIZ_USE_UNALIGNED_LOADS_AND_STORES 1
+#endif
+
+#if defined(_M_X64) || defined(_WIN64) || defined(__MINGW64__) || defined(_LP64) || defined(__LP64__) || defined(__ia64__) || defined(__x86_64__)
+// Set MINIZ_HAS_64BIT_REGISTERS to 1 if operations on 64-bit integers are reasonably fast (and don't involve compiler generated calls to helper functions).
+#define MINIZ_HAS_64BIT_REGISTERS 1
+#endif
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+// ------------------- zlib-style API Definitions.
+
+// For more compatibility with zlib, miniz.c uses unsigned long for some parameters/struct members. Beware: mz_ulong can be either 32 or 64-bits!
+typedef unsigned long mz_ulong;
+
+// mz_free() internally uses the MZ_FREE() macro (which by default calls free() unless you've modified the MZ_MALLOC macro) to release a block allocated from the heap.
+void mz_free(void* p);
+
+#define MZ_ADLER32_INIT (1)
+// mz_adler32() returns the initial adler-32 value to use when called with ptr==NULL.
+mz_ulong mz_adler32(mz_ulong adler, const unsigned char* ptr, size_t buf_len);
+
+#define MZ_CRC32_INIT (0)
+// mz_crc32() returns the initial CRC-32 value to use when called with ptr==NULL.
+mz_ulong mz_crc32(mz_ulong crc, const unsigned char* ptr, size_t buf_len);
+
+// Compression strategies.
+enum { MZ_DEFAULT_STRATEGY = 0,
+       MZ_FILTERED = 1,
+       MZ_HUFFMAN_ONLY = 2,
+       MZ_RLE = 3,
+       MZ_FIXED = 4 };
+
+// Method
+#define MZ_DEFLATED 8
+
+#ifndef MINIZ_NO_ZLIB_APIS
+
+// Heap allocation callbacks.
+// Note that mz_alloc_func parameter types purposely differ from zlib's: items/size is size_t, not unsigned long.
+typedef void* (*mz_alloc_func)(void* opaque, size_t items, size_t size);
+typedef void (*mz_free_func)(void* opaque, void* address);
+typedef void* (*mz_realloc_func)(void* opaque, void* address, size_t items, size_t size);
+
+#define MZ_VERSION "9.1.14"
+#define MZ_VERNUM 0x91E0
+#define MZ_VER_MAJOR 9
+#define MZ_VER_MINOR 1
+#define MZ_VER_REVISION 14
+#define MZ_VER_SUBREVISION 0
+
+// Flush values. For typical usage you only need MZ_NO_FLUSH and MZ_FINISH. The other values are for advanced use (refer to the zlib docs).
+enum { MZ_NO_FLUSH = 0,
+       MZ_PARTIAL_FLUSH = 1,
+       MZ_SYNC_FLUSH = 2,
+       MZ_FULL_FLUSH = 3,
+       MZ_FINISH = 4,
+       MZ_BLOCK = 5 };
+
+// Return status codes. MZ_PARAM_ERROR is non-standard.
+enum { MZ_OK = 0,
+       MZ_STREAM_END = 1,
+       MZ_NEED_DICT = 2,
+       MZ_ERRNO = -1,
+       MZ_STREAM_ERROR = -2,
+       MZ_DATA_ERROR = -3,
+       MZ_MEM_ERROR = -4,
+       MZ_BUF_ERROR = -5,
+       MZ_VERSION_ERROR = -6,
+       MZ_PARAM_ERROR = -10000 };
+
+// Compression levels: 0-9 are the standard zlib-style levels, 10 is best possible compression (not zlib compatible, and may be very slow), MZ_DEFAULT_COMPRESSION=MZ_DEFAULT_LEVEL.
+enum { MZ_NO_COMPRESSION = 0,
+       MZ_BEST_SPEED = 1,
+       MZ_BEST_COMPRESSION = 9,
+       MZ_UBER_COMPRESSION = 10,
+       MZ_DEFAULT_LEVEL = 6,
+       MZ_DEFAULT_COMPRESSION = -1 };
+
+// Window bits
+#define MZ_DEFAULT_WINDOW_BITS 15
+
+struct mz_internal_state;
+
+// Compression/decompression stream struct.
+typedef struct mz_stream_s {
+  const unsigned char* next_in;  // pointer to next byte to read
+  unsigned int avail_in;         // number of bytes available at next_in
+  mz_ulong total_in;             // total number of bytes consumed so far
+
+  unsigned char* next_out;  // pointer to next byte to write
+  unsigned int avail_out;   // number of bytes that can be written to next_out
+  mz_ulong total_out;       // total number of bytes produced so far
+
+  char* msg;                        // error msg (unused)
+  struct mz_internal_state* state;  // internal state, allocated by zalloc/zfree
+
+  mz_alloc_func zalloc;  // optional heap allocation function (defaults to malloc)
+  mz_free_func zfree;    // optional heap free function (defaults to free)
+  void* opaque;          // heap alloc function user pointer
+
+  int data_type;      // data_type (unused)
+  mz_ulong adler;     // adler32 of the source or uncompressed data
+  mz_ulong reserved;  // not used
+} mz_stream;
+
+typedef mz_stream* mz_streamp;
+
+// Returns the version string of miniz.c.
+const char* mz_version(void);
+
+// mz_deflateInit() initializes a compressor with default options:
+// Parameters:
+//  pStream must point to an initialized mz_stream struct.
+//  level must be between [MZ_NO_COMPRESSION, MZ_BEST_COMPRESSION].
+//  level 1 enables a specially optimized compression function that's been optimized purely for performance, not ratio.
+//  (This special func. is currently only enabled when MINIZ_USE_UNALIGNED_LOADS_AND_STORES and MINIZ_LITTLE_ENDIAN are defined.)
+// Return values:
+//  MZ_OK on success.
+//  MZ_STREAM_ERROR if the stream is bogus.
+//  MZ_PARAM_ERROR if the input parameters are bogus.
+//  MZ_MEM_ERROR on out of memory.
+int mz_deflateInit(mz_streamp pStream, int level);
+
+// mz_deflateInit2() is like mz_deflateInit(), except with more control:
+// Additional parameters:
+//   method must be MZ_DEFLATED
+//   window_bits must be MZ_DEFAULT_WINDOW_BITS (to wrap the deflate stream with zlib header/adler-32 footer) or -MZ_DEFAULT_WINDOW_BITS (raw deflate/no header or footer)
+//   mem_level must be between [1, 9] (it's checked but ignored by miniz.c)
+int mz_deflateInit2(mz_streamp pStream, int level, int method, int window_bits, int mem_level, int strategy);
+
+// Quickly resets a compressor without having to reallocate anything. Same as calling mz_deflateEnd() followed by mz_deflateInit()/mz_deflateInit2().
+int mz_deflateReset(mz_streamp pStream);
+
+// mz_deflate() compresses the input to output, consuming as much of the input and producing as much output as possible.
+// Parameters:
+//   pStream is the stream to read from and write to. You must initialize/update the next_in, avail_in, next_out, and avail_out members.
+//   flush may be MZ_NO_FLUSH, MZ_PARTIAL_FLUSH/MZ_SYNC_FLUSH, MZ_FULL_FLUSH, or MZ_FINISH.
+// Return values:
+//   MZ_OK on success (when flushing, or if more input is needed but not available, and/or there's more output to be written but the output buffer is full).
+//   MZ_STREAM_END if all input has been consumed and all output bytes have been written. Don't call mz_deflate() on the stream anymore.
+//   MZ_STREAM_ERROR if the stream is bogus.
+//   MZ_PARAM_ERROR if one of the parameters is invalid.
+//   MZ_BUF_ERROR if no forward progress is possible because the input and/or output buffers are empty. (Fill up the input buffer or free up some output space and try again.)
+int mz_deflate(mz_streamp pStream, int flush);
+
+// mz_deflateEnd() deinitializes a compressor:
+// Return values:
+//  MZ_OK on success.
+//  MZ_STREAM_ERROR if the stream is bogus.
+int mz_deflateEnd(mz_streamp pStream);
+
+// mz_deflateBound() returns a (very) conservative upper bound on the amount of data that could be generated by deflate(), assuming flush is set to only MZ_NO_FLUSH or MZ_FINISH.
+mz_ulong mz_deflateBound(mz_streamp pStream, mz_ulong source_len);
+
+// Single-call compression functions mz_compress() and mz_compress2():
+// Returns MZ_OK on success, or one of the error codes from mz_deflate() on failure.
+int mz_compress(unsigned char* pDest, mz_ulong* pDest_len, const unsigned char* pSource, mz_ulong source_len);
+int mz_compress2(unsigned char* pDest, mz_ulong* pDest_len, const unsigned char* pSource, mz_ulong source_len, int level);
+
+// mz_compressBound() returns a (very) conservative upper bound on the amount of data that could be generated by calling mz_compress().
+mz_ulong mz_compressBound(mz_ulong source_len);
+
+// Initializes a decompressor.
+int mz_inflateInit(mz_streamp pStream);
+
+// mz_inflateInit2() is like mz_inflateInit() with an additional option that controls the window size and whether or not the stream has been wrapped with a zlib header/footer:
+// window_bits must be MZ_DEFAULT_WINDOW_BITS (to parse zlib header/footer) or -MZ_DEFAULT_WINDOW_BITS (raw deflate).
+int mz_inflateInit2(mz_streamp pStream, int window_bits);
+
+// Decompresses the input stream to the output, consuming only as much of the input as needed, and writing as much to the output as possible.
+// Parameters:
+//   pStream is the stream to read from and write to. You must initialize/update the next_in, avail_in, next_out, and avail_out members.
+//   flush may be MZ_NO_FLUSH, MZ_SYNC_FLUSH, or MZ_FINISH.
+//   On the first call, if flush is MZ_FINISH it's assumed the input and output buffers are both sized large enough to decompress the entire stream in a single call (this is slightly faster).
+//   MZ_FINISH implies that there are no more source bytes available beside what's already in the input buffer, and that the output buffer is large enough to hold the rest of the decompressed data.
+// Return values:
+//   MZ_OK on success. Either more input is needed but not available, and/or there's more output to be written but the output buffer is full.
+//   MZ_STREAM_END if all needed input has been consumed and all output bytes have been written. For zlib streams, the adler-32 of the decompressed data has also been verified.
+//   MZ_STREAM_ERROR if the stream is bogus.
+//   MZ_DATA_ERROR if the deflate stream is invalid.
+//   MZ_PARAM_ERROR if one of the parameters is invalid.
+//   MZ_BUF_ERROR if no forward progress is possible because the input buffer is empty but the inflater needs more input to continue, or if the output buffer is not large enough. Call mz_inflate() again
+//   with more input data, or with more room in the output buffer (except when using single call decompression, described above).
+int mz_inflate(mz_streamp pStream, int flush);
+
+// Deinitializes a decompressor.
+int mz_inflateEnd(mz_streamp pStream);
+
+// Single-call decompression.
+// Returns MZ_OK on success, or one of the error codes from mz_inflate() on failure.
+int mz_uncompress(unsigned char* pDest, mz_ulong* pDest_len, const unsigned char* pSource, mz_ulong source_len);
+
+// Returns a string description of the specified error code, or NULL if the error code is invalid.
+const char* mz_error(int err);
+
+// Redefine zlib-compatible names to miniz equivalents, so miniz.c can be used as a drop-in replacement for the subset of zlib that miniz.c supports.
+// Define MINIZ_NO_ZLIB_COMPATIBLE_NAMES to disable zlib-compatibility if you use zlib in the same project.
+#ifndef MINIZ_NO_ZLIB_COMPATIBLE_NAMES
+typedef unsigned char Byte;
+typedef unsigned int uInt;
+typedef mz_ulong uLong;
+typedef Byte Bytef;
+typedef uInt uIntf;
+typedef char charf;
+typedef int intf;
+typedef void* voidpf;
+typedef uLong uLongf;
+typedef void* voidp;
+typedef void* const voidpc;
+#define Z_NULL 0
+#define Z_NO_FLUSH MZ_NO_FLUSH
+#define Z_PARTIAL_FLUSH MZ_PARTIAL_FLUSH
+#define Z_SYNC_FLUSH MZ_SYNC_FLUSH
+#define Z_FULL_FLUSH MZ_FULL_FLUSH
+#define Z_FINISH MZ_FINISH
+#define Z_BLOCK MZ_BLOCK
+#define Z_OK MZ_OK
+#define Z_STREAM_END MZ_STREAM_END
+#define Z_NEED_DICT MZ_NEED_DICT
+#define Z_ERRNO MZ_ERRNO
+#define Z_STREAM_ERROR MZ_STREAM_ERROR
+#define Z_DATA_ERROR MZ_DATA_ERROR
+#define Z_MEM_ERROR MZ_MEM_ERROR
+#define Z_BUF_ERROR MZ_BUF_ERROR
+#define Z_VERSION_ERROR MZ_VERSION_ERROR
+#define Z_PARAM_ERROR MZ_PARAM_ERROR
+#define Z_NO_COMPRESSION MZ_NO_COMPRESSION
+#define Z_BEST_SPEED MZ_BEST_SPEED
+#define Z_BEST_COMPRESSION MZ_BEST_COMPRESSION
+#define Z_DEFAULT_COMPRESSION MZ_DEFAULT_COMPRESSION
+#define Z_DEFAULT_STRATEGY MZ_DEFAULT_STRATEGY
+#define Z_FILTERED MZ_FILTERED
+#define Z_HUFFMAN_ONLY MZ_HUFFMAN_ONLY
+#define Z_RLE MZ_RLE
+#define Z_FIXED MZ_FIXED
+#define Z_DEFLATED MZ_DEFLATED
+#define Z_DEFAULT_WINDOW_BITS MZ_DEFAULT_WINDOW_BITS
+#define alloc_func mz_alloc_func
+#define free_func mz_free_func
+#define internal_state mz_internal_state
+#define z_stream mz_stream
+#define deflateInit mz_deflateInit
+#define deflateInit2 mz_deflateInit2
+#define deflateReset mz_deflateReset
+#define deflate mz_deflate
+#define deflateEnd mz_deflateEnd
+#define deflateBound mz_deflateBound
+#define compress mz_compress
+#define compress2 mz_compress2
+#define compressBound mz_compressBound
+#define inflateInit mz_inflateInit
+#define inflateInit2 mz_inflateInit2
+#define inflate mz_inflate
+#define inflateEnd mz_inflateEnd
+#define uncompress mz_uncompress
+#define crc32 mz_crc32
+#define adler32 mz_adler32
+#define MAX_WBITS 15
+#define MAX_MEM_LEVEL 9
+#define zError mz_error
+#define ZLIB_VERSION MZ_VERSION
+#define ZLIB_VERNUM MZ_VERNUM
+#define ZLIB_VER_MAJOR MZ_VER_MAJOR
+#define ZLIB_VER_MINOR MZ_VER_MINOR
+#define ZLIB_VER_REVISION MZ_VER_REVISION
+#define ZLIB_VER_SUBREVISION MZ_VER_SUBREVISION
+#define zlibVersion mz_version
+#define zlib_version mz_version()
+#endif  // #ifndef MINIZ_NO_ZLIB_COMPATIBLE_NAMES
+
+#endif  // MINIZ_NO_ZLIB_APIS
+
+// ------------------- Types and macros
+
+typedef unsigned char mz_uint8;
+typedef signed short mz_int16;
+typedef unsigned short mz_uint16;
+typedef unsigned int mz_uint32;
+typedef unsigned int mz_uint;
+typedef long long mz_int64;
+typedef unsigned long long mz_uint64;
+typedef int mz_bool;
+
+#define MZ_FALSE (0)
+#define MZ_TRUE (1)
+
+// Works around MSVC's spammy "warning C4127: conditional expression is constant" message.
+#ifdef _MSC_VER
+#define MZ_MACRO_END while (0, 0)
+#else
+#define MZ_MACRO_END while (0)
+#endif
+
+// ------------------- ZIP archive reading/writing
+
+#ifndef MINIZ_NO_ARCHIVE_APIS
+
+enum {
+  MZ_ZIP_MAX_IO_BUF_SIZE = 64 * 1024,
+  MZ_ZIP_MAX_ARCHIVE_FILENAME_SIZE = 260,
+  MZ_ZIP_MAX_ARCHIVE_FILE_COMMENT_SIZE = 256
+};
+
+typedef struct
+{
+  mz_uint32 m_file_index;
+  mz_uint32 m_central_dir_ofs;
+  mz_uint16 m_version_made_by;
+  mz_uint16 m_version_needed;
+  mz_uint16 m_bit_flag;
+  mz_uint16 m_method;
+#ifndef MINIZ_NO_TIME
+  time_t m_time;
+#endif
+  mz_uint32 m_crc32;
+  mz_uint64 m_comp_size;
+  mz_uint64 m_uncomp_size;
+  mz_uint16 m_internal_attr;
+  mz_uint32 m_external_attr;
+  mz_uint64 m_local_header_ofs;
+  mz_uint32 m_comment_size;
+  char m_filename[MZ_ZIP_MAX_ARCHIVE_FILENAME_SIZE];
+  char m_comment[MZ_ZIP_MAX_ARCHIVE_FILE_COMMENT_SIZE];
+} mz_zip_archive_file_stat;
+
+typedef size_t (*mz_file_read_func)(void* pOpaque, mz_uint64 file_ofs, void* pBuf, size_t n);
+typedef size_t (*mz_file_write_func)(void* pOpaque, mz_uint64 file_ofs, const void* pBuf, size_t n);
+
+struct mz_zip_internal_state_tag;
+typedef struct mz_zip_internal_state_tag mz_zip_internal_state;
+
+typedef enum {
+  MZ_ZIP_MODE_INVALID = 0,
+  MZ_ZIP_MODE_READING = 1,
+  MZ_ZIP_MODE_WRITING = 2,
+  MZ_ZIP_MODE_WRITING_HAS_BEEN_FINALIZED = 3
+} mz_zip_mode;
+
+typedef struct
+{
+  mz_uint64 m_archive_size;
+  mz_uint64 m_central_directory_file_ofs;
+  mz_uint m_total_files;
+  mz_zip_mode m_zip_mode;
+
+  mz_uint m_file_offset_alignment;
+
+  mz_alloc_func m_pAlloc;
+  mz_free_func m_pFree;
+  mz_realloc_func m_pRealloc;
+  void* m_pAlloc_opaque;
+
+  mz_file_read_func m_pRead;
+  mz_file_write_func m_pWrite;
+  void* m_pIO_opaque;
+
+  mz_zip_internal_state* m_pState;
+
+} mz_zip_archive;
+
+typedef enum {
+  MZ_ZIP_FLAG_CASE_SENSITIVE = 0x0100,
+  MZ_ZIP_FLAG_IGNORE_PATH = 0x0200,
+  MZ_ZIP_FLAG_COMPRESSED_DATA = 0x0400,
+  MZ_ZIP_FLAG_DO_NOT_SORT_CENTRAL_DIRECTORY = 0x0800
+} mz_zip_flags;
+
+// ZIP archive reading
+
+// Inits a ZIP archive reader.
+// These functions read and validate the archive's central directory.
+mz_bool mz_zip_reader_init(mz_zip_archive* pZip, mz_uint64 size, mz_uint32 flags);
+mz_bool mz_zip_reader_init_mem(mz_zip_archive* pZip, const void* pMem, size_t size, mz_uint32 flags);
+
+#ifndef MINIZ_NO_STDIO
+mz_bool mz_zip_reader_init_file(mz_zip_archive* pZip, const char* pFilename, mz_uint32 flags);
+#endif
+
+// Returns the total number of files in the archive.
+mz_uint mz_zip_reader_get_num_files(mz_zip_archive* pZip);
+
+// Returns detailed information about an archive file entry.
+mz_bool mz_zip_reader_file_stat(mz_zip_archive* pZip, mz_uint file_index, mz_zip_archive_file_stat* pStat);
+
+// Determines if an archive file entry is a directory entry.
+mz_bool mz_zip_reader_is_file_a_directory(mz_zip_archive* pZip, mz_uint file_index);
+mz_bool mz_zip_reader_is_file_encrypted(mz_zip_archive* pZip, mz_uint file_index);
+
+// Retrieves the filename of an archive file entry.
+// Returns the number of bytes written to pFilename, or if filename_buf_size is 0 this function returns the number of bytes needed to fully store the filename.
+mz_uint mz_zip_reader_get_filename(mz_zip_archive* pZip, mz_uint file_index, char* pFilename, mz_uint filename_buf_size);
+
+// Attempts to locate a file in the archive's central directory.
+// Valid flags: MZ_ZIP_FLAG_CASE_SENSITIVE, MZ_ZIP_FLAG_IGNORE_PATH
+// Returns -1 if the file cannot be found.
+int mz_zip_reader_locate_file(mz_zip_archive* pZip, const char* pName, const char* pComment, mz_uint flags);
+
+// Extracts an archive file to a memory buffer using no memory allocation.
+mz_bool mz_zip_reader_extract_to_mem_no_alloc(mz_zip_archive* pZip, mz_uint file_index, void* pBuf, size_t buf_size, mz_uint flags, void* pUser_read_buf, size_t user_read_buf_size);
+mz_bool mz_zip_reader_extract_file_to_mem_no_alloc(mz_zip_archive* pZip, const char* pFilename, void* pBuf, size_t buf_size, mz_uint flags, void* pUser_read_buf, size_t user_read_buf_size);
+
+// Extracts an archive file to a memory buffer.
+mz_bool mz_zip_reader_extract_to_mem(mz_zip_archive* pZip, mz_uint file_index, void* pBuf, size_t buf_size, mz_uint flags);
+mz_bool mz_zip_reader_extract_file_to_mem(mz_zip_archive* pZip, const char* pFilename, void* pBuf, size_t buf_size, mz_uint flags);
+
+// Extracts an archive file to a dynamically allocated heap buffer.
+void* mz_zip_reader_extract_to_heap(mz_zip_archive* pZip, mz_uint file_index, size_t* pSize, mz_uint flags);
+void* mz_zip_reader_extract_file_to_heap(mz_zip_archive* pZip, const char* pFilename, size_t* pSize, mz_uint flags);
+
+// Extracts an archive file using a callback function to output the file's data.
+mz_bool mz_zip_reader_extract_to_callback(mz_zip_archive* pZip, mz_uint file_index, mz_file_write_func pCallback, void* pOpaque, mz_uint flags);
+mz_bool mz_zip_reader_extract_file_to_callback(mz_zip_archive* pZip, const char* pFilename, mz_file_write_func pCallback, void* pOpaque, mz_uint flags);
+
+#ifndef MINIZ_NO_STDIO
+// Extracts an archive file to a disk file and sets its last accessed and modified times.
+// This function only extracts files, not archive directory records.
+mz_bool mz_zip_reader_extract_to_file(mz_zip_archive* pZip, mz_uint file_index, const char* pDst_filename, mz_uint flags);
+mz_bool mz_zip_reader_extract_file_to_file(mz_zip_archive* pZip, const char* pArchive_filename, const char* pDst_filename, mz_uint flags);
+#endif
+
+// Ends archive reading, freeing all allocations, and closing the input archive file if mz_zip_reader_init_file() was used.
+mz_bool mz_zip_reader_end(mz_zip_archive* pZip);
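+
+// Example (illustrative sketch, not part of the original miniz source; compiled out).
+// Lists every entry of an archive and extracts each file to the heap using only the reader calls declared above.
+// Assumptions: MINIZ_NO_STDIO is not defined, <stdio.h> and <string.h> are included, and the helper name
+// example_list_and_extract is hypothetical.
+#if 0
+static mz_bool example_list_and_extract(const char* pZip_filename) {
+  mz_zip_archive zip;
+  mz_uint i;
+  memset(&zip, 0, sizeof(zip));  // the archive struct must be zeroed before any init call
+  if (!mz_zip_reader_init_file(&zip, pZip_filename, 0))
+    return MZ_FALSE;
+  for (i = 0; i < mz_zip_reader_get_num_files(&zip); i++) {
+    mz_zip_archive_file_stat stat;
+    size_t size = 0;
+    void* p;
+    // Skip entries that cannot be queried and pure directory records.
+    if (!mz_zip_reader_file_stat(&zip, i, &stat) || mz_zip_reader_is_file_a_directory(&zip, i))
+      continue;
+    p = mz_zip_reader_extract_to_heap(&zip, i, &size, 0);
+    if (!p) {
+      mz_zip_reader_end(&zip);
+      return MZ_FALSE;
+    }
+    printf("%s: %u bytes\n", stat.m_filename, (unsigned)size);
+    mz_free(p);  // heap blocks returned by the reader are released with mz_free()
+  }
+  return mz_zip_reader_end(&zip);
+}
+#endif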
+
+// ZIP archive writing
+
+#ifndef MINIZ_NO_ARCHIVE_WRITING_APIS
+
+// Inits a ZIP archive writer.
+mz_bool mz_zip_writer_init(mz_zip_archive* pZip, mz_uint64 existing_size);
+mz_bool mz_zip_writer_init_heap(mz_zip_archive* pZip, size_t size_to_reserve_at_beginning, size_t initial_allocation_size);
+
+#ifndef MINIZ_NO_STDIO
+mz_bool mz_zip_writer_init_file(mz_zip_archive* pZip, const char* pFilename, mz_uint64 size_to_reserve_at_beginning);
+#endif
+
+// Converts a ZIP archive reader object into a writer object, to allow efficient in-place file appends to occur on an existing archive.
+// For archives opened using mz_zip_reader_init_file, pFilename must be the archive's filename so it can be reopened for writing. If the file can't be reopened, mz_zip_reader_end() will be called.
+// For archives opened using mz_zip_reader_init_mem, the memory block must be growable using the realloc callback (which defaults to realloc unless you've overridden it).
+// Finally, for archives opened using mz_zip_reader_init, the mz_zip_archive's user provided m_pWrite function cannot be NULL.
+// Note: In-place archive modification is not recommended unless you know what you're doing, because if execution stops or something goes wrong before
+// the archive is finalized the file's central directory will be hosed.
+mz_bool mz_zip_writer_init_from_reader(mz_zip_archive* pZip, const char* pFilename);
+
+// Adds the contents of a memory buffer to an archive. These functions record the current local time into the archive.
+// To add a directory entry, call this method with an archive name ending in a forward slash and an empty buffer.
+// level_and_flags - compression level (0-10, see MZ_BEST_SPEED, MZ_BEST_COMPRESSION, etc.) logically OR'd with zero or more mz_zip_flags, or just set to MZ_DEFAULT_COMPRESSION.
+mz_bool mz_zip_writer_add_mem(mz_zip_archive* pZip, const char* pArchive_name, const void* pBuf, size_t buf_size, mz_uint level_and_flags);
+mz_bool mz_zip_writer_add_mem_ex(mz_zip_archive* pZip, const char* pArchive_name, const void* pBuf, size_t buf_size, const void* pComment, mz_uint16 comment_size, mz_uint level_and_flags, mz_uint64 uncomp_size, mz_uint32 uncomp_crc32);
+
+#ifndef MINIZ_NO_STDIO
+// Adds the contents of a disk file to an archive. This function also records the disk file's modified time into the archive.
+// level_and_flags - compression level (0-10, see MZ_BEST_SPEED, MZ_BEST_COMPRESSION, etc.) logically OR'd with zero or more mz_zip_flags, or just set to MZ_DEFAULT_COMPRESSION.
+mz_bool mz_zip_writer_add_file(mz_zip_archive* pZip, const char* pArchive_name, const char* pSrc_filename, const void* pComment, mz_uint16 comment_size, mz_uint level_and_flags);
+#endif
+
+// Adds a file to an archive by fully cloning the data from another archive.
+// This function fully clones the source file's compressed data (no recompression), along with its full filename, extra data, and comment fields.
+mz_bool mz_zip_writer_add_from_zip_reader(mz_zip_archive* pZip, mz_zip_archive* pSource_zip, mz_uint file_index);
+
+// Finalizes the archive by writing the central directory records followed by the end of central directory record.
+// After an archive is finalized, the only valid call on the mz_zip_archive struct is mz_zip_writer_end().
+// An archive must be manually finalized by calling this function for it to be valid.
+mz_bool mz_zip_writer_finalize_archive(mz_zip_archive* pZip);
+mz_bool mz_zip_writer_finalize_heap_archive(mz_zip_archive* pZip, void** pBuf, size_t* pSize);
+
+// Ends archive writing, freeing all allocations, and closing the output file if mz_zip_writer_init_file() was used.
+// Note: for the archive to be valid, it must have been finalized before ending.
+mz_bool mz_zip_writer_end(mz_zip_archive* pZip);
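+
+// Example (illustrative sketch, not part of the original miniz source; compiled out).
+// Builds a one-file archive entirely in memory, following the init / add / finalize / end order described above.
+// Assumptions: <string.h> is included, the default allocators are in use, and both the helper name
+// example_make_zip_in_memory and the entry name "data.bin" are hypothetical.
+#if 0
+static void* example_make_zip_in_memory(const void* pData, size_t data_size, size_t* pZip_size) {
+  mz_zip_archive zip;
+  void* pZip_buf = NULL;
+  memset(&zip, 0, sizeof(zip));  // the archive struct must be zeroed before any init call
+  if (!mz_zip_writer_init_heap(&zip, 0, 0))
+    return NULL;
+  if (!mz_zip_writer_add_mem(&zip, "data.bin", pData, data_size, MZ_BEST_COMPRESSION) ||
+      !mz_zip_writer_finalize_heap_archive(&zip, &pZip_buf, pZip_size)) {
+    mz_zip_writer_end(&zip);
+    return NULL;
+  }
+  mz_zip_writer_end(&zip);  // still required after finalizing, to release the writer's internal state
+  return pZip_buf;          // the caller releases this block with mz_free()
+}
+#endif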
+
+// Misc. high-level helper functions:
+
+// mz_zip_add_mem_to_archive_file_in_place() efficiently (but not atomically) appends a memory blob to a ZIP archive.
+// level_and_flags - compression level (0-10, see MZ_BEST_SPEED, MZ_BEST_COMPRESSION, etc.) logically OR'd with zero or more mz_zip_flags, or just set to MZ_DEFAULT_COMPRESSION.
+mz_bool mz_zip_add_mem_to_archive_file_in_place(const char* pZip_filename, const char* pArchive_name, const void* pBuf, size_t buf_size, const void* pComment, mz_uint16 comment_size, mz_uint level_and_flags);
+
+// Reads a single file from an archive into a heap block.
+// If pComment is not NULL, only the file with the specified comment will be extracted.
+// Returns NULL on failure.
+void* mz_zip_extract_archive_file_to_heap(const char* pZip_filename, const char* pArchive_name, const char* pComment, size_t* pSize, mz_uint flags);
+
+#endif  // #ifndef MINIZ_NO_ARCHIVE_WRITING_APIS
+
+#endif  // #ifndef MINIZ_NO_ARCHIVE_APIS
+
+// ------------------- Low-level Decompression API Definitions
+
+// Decompression flags used by tinfl_decompress().
+// TINFL_FLAG_PARSE_ZLIB_HEADER: If set, the input has a valid zlib header and ends with an adler32 checksum (it's a valid zlib stream). Otherwise, the input is a raw deflate stream.
+// TINFL_FLAG_HAS_MORE_INPUT: If set, there are more input bytes available beyond the end of the supplied input buffer. If clear, the input buffer contains all remaining input.
+// TINFL_FLAG_USING_NON_WRAPPING_OUTPUT_BUF: If set, the output buffer is large enough to hold the entire decompressed stream. If clear, the output buffer is at least the size of the dictionary (typically 32KB).
+// TINFL_FLAG_COMPUTE_ADLER32: Force adler-32 checksum computation of the decompressed bytes.
+enum {
+  TINFL_FLAG_PARSE_ZLIB_HEADER = 1,
+  TINFL_FLAG_HAS_MORE_INPUT = 2,
+  TINFL_FLAG_USING_NON_WRAPPING_OUTPUT_BUF = 4,
+  TINFL_FLAG_COMPUTE_ADLER32 = 8
+};
+
+// High level decompression functions:
+// tinfl_decompress_mem_to_heap() decompresses a block in memory to a heap block allocated via malloc().
+// On entry:
+//  pSrc_buf, src_buf_len: Pointer and size of the Deflate or zlib source data to decompress.
+// On return:
+//  Function returns a pointer to the decompressed data, or NULL on failure.
+//  *pOut_len will be set to the decompressed data's size, which could be larger than src_buf_len on uncompressible data.
+//  The caller must call mz_free() on the returned block when it's no longer needed.
+void* tinfl_decompress_mem_to_heap(const void* pSrc_buf, size_t src_buf_len, size_t* pOut_len, int flags);
+
+// tinfl_decompress_mem_to_mem() decompresses a block in memory to another block in memory.
+// Returns TINFL_DECOMPRESS_MEM_TO_MEM_FAILED on failure, or the number of bytes written on success.
+#define TINFL_DECOMPRESS_MEM_TO_MEM_FAILED ((size_t)(-1))
+size_t tinfl_decompress_mem_to_mem(void* pOut_buf, size_t out_buf_len, const void* pSrc_buf, size_t src_buf_len, int flags);
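+
+// Example (illustrative sketch, not part of the original miniz source; compiled out).
+// Inflates a zlib stream into a caller-supplied buffer and checks the failure sentinel rather than testing for 0.
+// The helper name example_inflate_zlib is hypothetical.
+#if 0
+static mz_bool example_inflate_zlib(const void* pSrc, size_t src_len, void* pDst, size_t dst_capacity, size_t* pDst_len) {
+  size_t n = tinfl_decompress_mem_to_mem(pDst, dst_capacity, pSrc, src_len, TINFL_FLAG_PARSE_ZLIB_HEADER);
+  if (n == TINFL_DECOMPRESS_MEM_TO_MEM_FAILED)
+    return MZ_FALSE;  // corrupt input, or pDst is too small for the decompressed data
+  *pDst_len = n;
+  return MZ_TRUE;
+}
+#endif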
+
+// tinfl_decompress_mem_to_callback() decompresses a block in memory to an internal 32KB buffer, and a user provided callback function will be called to flush the buffer.
+// Returns 1 on success or 0 on failure.
+typedef int (*tinfl_put_buf_func_ptr)(const void* pBuf, int len, void* pUser);
+int tinfl_decompress_mem_to_callback(const void* pIn_buf, size_t* pIn_buf_size, tinfl_put_buf_func_ptr pPut_buf_func, void* pPut_buf_user, int flags);
+
+struct tinfl_decompressor_tag;
+typedef struct tinfl_decompressor_tag tinfl_decompressor;
+
+// Max size of LZ dictionary.
+#define TINFL_LZ_DICT_SIZE 32768
+
+// Return status.
+typedef enum {
+  TINFL_STATUS_BAD_PARAM = -3,
+  TINFL_STATUS_ADLER32_MISMATCH = -2,
+  TINFL_STATUS_FAILED = -1,
+  TINFL_STATUS_DONE = 0,
+  TINFL_STATUS_NEEDS_MORE_INPUT = 1,
+  TINFL_STATUS_HAS_MORE_OUTPUT = 2
+} tinfl_status;
+
+// Initializes the decompressor to its initial state.
+#define tinfl_init(r) \
+  do {                \
+    (r)->m_state = 0; \
+  }                   \
+  MZ_MACRO_END
+#define tinfl_get_adler32(r) (r)->m_check_adler32
+
+// Main low-level decompressor coroutine function. This is the only function actually needed for decompression. All the other functions are just high-level helpers for improved usability.
+// This is a universal API, i.e. it can be used as a building block to build any desired higher level decompression API. In the limit case, it can be called once per every byte input or output.
+tinfl_status tinfl_decompress(tinfl_decompressor* r, const mz_uint8* pIn_buf_next, size_t* pIn_buf_size, mz_uint8* pOut_buf_start, mz_uint8* pOut_buf_next, size_t* pOut_buf_size, const mz_uint32 decomp_flags);
+
+// Internal/private bits follow.
+enum {
+  TINFL_MAX_HUFF_TABLES = 3,
+  TINFL_MAX_HUFF_SYMBOLS_0 = 288,
+  TINFL_MAX_HUFF_SYMBOLS_1 = 32,
+  TINFL_MAX_HUFF_SYMBOLS_2 = 19,
+  TINFL_FAST_LOOKUP_BITS = 10,
+  TINFL_FAST_LOOKUP_SIZE = 1 << TINFL_FAST_LOOKUP_BITS
+};
+
+typedef struct
+{
+  mz_uint8 m_code_size[TINFL_MAX_HUFF_SYMBOLS_0];
+  mz_int16 m_look_up[TINFL_FAST_LOOKUP_SIZE], m_tree[TINFL_MAX_HUFF_SYMBOLS_0 * 2];
+} tinfl_huff_table;
+
+#if MINIZ_HAS_64BIT_REGISTERS
+#define TINFL_USE_64BIT_BITBUF 1
+#endif
+
+#if TINFL_USE_64BIT_BITBUF
+typedef mz_uint64 tinfl_bit_buf_t;
+#define TINFL_BITBUF_SIZE (64)
+#else
+typedef mz_uint32 tinfl_bit_buf_t;
+#define TINFL_BITBUF_SIZE (32)
+#endif
+
+struct tinfl_decompressor_tag {
+  mz_uint32 m_state, m_num_bits, m_zhdr0, m_zhdr1, m_z_adler32, m_final, m_type, m_check_adler32, m_dist, m_counter, m_num_extra, m_table_sizes[TINFL_MAX_HUFF_TABLES];
+  tinfl_bit_buf_t m_bit_buf;
+  size_t m_dist_from_out_buf_start;
+  tinfl_huff_table m_tables[TINFL_MAX_HUFF_TABLES];
+  mz_uint8 m_raw_header[4], m_len_codes[TINFL_MAX_HUFF_SYMBOLS_0 + TINFL_MAX_HUFF_SYMBOLS_1 + 137];
+};
+
+// ------------------- Low-level Compression API Definitions
+
+// Set TDEFL_LESS_MEMORY to 1 to use less memory (compression will be slightly slower, and raw/dynamic blocks will be output more frequently).
+#define TDEFL_LESS_MEMORY 0
+
+// tdefl_init() compression flags logically OR'd together (low 12 bits contain the max. number of probes per dictionary search):
+// TDEFL_DEFAULT_MAX_PROBES: The compressor defaults to 128 dictionary probes per dictionary search. 0=Huffman only, 1=Huffman+LZ (fastest/crap compression), 4095=Huffman+LZ (slowest/best compression).
+enum {
+  TDEFL_HUFFMAN_ONLY = 0,
+  TDEFL_DEFAULT_MAX_PROBES = 128,
+  TDEFL_MAX_PROBES_MASK = 0xFFF
+};
+
+// TDEFL_WRITE_ZLIB_HEADER: If set, the compressor outputs a zlib header before the deflate data, and the Adler-32 of the source data at the end. Otherwise, you'll get raw deflate data.
+// TDEFL_COMPUTE_ADLER32: Always compute the adler-32 of the input data (even when not writing zlib headers).
+// TDEFL_GREEDY_PARSING_FLAG: Set to use faster greedy parsing, instead of more efficient lazy parsing.
+// TDEFL_NONDETERMINISTIC_PARSING_FLAG: Enable to decrease the compressor's initialization time to the minimum, but the output may vary from run to run given the same input (depending on the contents of memory).
+// TDEFL_RLE_MATCHES: Only look for RLE matches (matches with a distance of 1)
+// TDEFL_FILTER_MATCHES: Discards matches <= 5 chars if enabled.
+// TDEFL_FORCE_ALL_STATIC_BLOCKS: Disable usage of optimized Huffman tables.
+// TDEFL_FORCE_ALL_RAW_BLOCKS: Only use raw (uncompressed) deflate blocks.
+enum {
+  TDEFL_WRITE_ZLIB_HEADER = 0x01000,
+  TDEFL_COMPUTE_ADLER32 = 0x02000,
+  TDEFL_GREEDY_PARSING_FLAG = 0x04000,
+  TDEFL_NONDETERMINISTIC_PARSING_FLAG = 0x08000,
+  TDEFL_RLE_MATCHES = 0x10000,
+  TDEFL_FILTER_MATCHES = 0x20000,
+  TDEFL_FORCE_ALL_STATIC_BLOCKS = 0x40000,
+  TDEFL_FORCE_ALL_RAW_BLOCKS = 0x80000
+};
+
+// High level compression functions:
+// tdefl_compress_mem_to_heap() compresses a block in memory to a heap block allocated via malloc().
+// On entry:
+//  pSrc_buf, src_buf_len: Pointer and size of source block to compress.
+//  flags: The max match finder probes (default is 128) logically OR'd against the above flags. Higher probes are slower but improve compression.
+// On return:
+//  Function returns a pointer to the compressed data, or NULL on failure.
+//  *pOut_len will be set to the compressed data's size, which could be larger than src_buf_len on uncompressible data.
+//  The caller must mz_free() the returned block when it's no longer needed.
+void* tdefl_compress_mem_to_heap(const void* pSrc_buf, size_t src_buf_len, size_t* pOut_len, int flags);
+
+// tdefl_compress_mem_to_mem() compresses a block in memory to another block in memory.
+// Returns 0 on failure.
+size_t tdefl_compress_mem_to_mem(void* pOut_buf, size_t out_buf_len, const void* pSrc_buf, size_t src_buf_len, int flags);
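+
+// Example (illustrative sketch, not part of the original miniz source; compiled out).
+// Shows how the flags argument is composed: the probe count occupies the low 12 bits (TDEFL_MAX_PROBES_MASK)
+// and the option flags are OR'd in above it. The helper name example_deflate_zlib is hypothetical.
+#if 0
+static void* example_deflate_zlib(const void* pSrc, size_t src_len, size_t* pOut_len) {
+  int flags = TDEFL_WRITE_ZLIB_HEADER | TDEFL_DEFAULT_MAX_PROBES;  // zlib wrapper, 128 probes
+  return tdefl_compress_mem_to_heap(pSrc, src_len, pOut_len, flags);  // NULL on failure
+}
+#endif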
+
+// Compresses an image to a compressed PNG file in memory.
+// On entry:
+//  pImage, w, h, and num_chans describe the image to compress. num_chans may be 1, 2, 3, or 4.
+//  The image pitch in bytes per scanline will be w*num_chans. The leftmost pixel on the top scanline is stored first in memory.
+// On return:
+//  Function returns a pointer to the compressed data, or NULL on failure.
+//  *pLen_out will be set to the size of the PNG image file.
+//  The caller must mz_free() the returned heap block (which will typically be larger than *pLen_out) when it's no longer needed.
+void* tdefl_write_image_to_png_file_in_memory(const void* pImage, int w, int h, int num_chans, size_t* pLen_out);
+
+// Output stream interface. The compressor uses this interface to write compressed data. It'll typically be called with TDEFL_OUT_BUF_SIZE bytes at a time.
+typedef mz_bool (*tdefl_put_buf_func_ptr)(const void* pBuf, int len, void* pUser);
+
+// tdefl_compress_mem_to_output() compresses a block to an output stream. The above helpers use this function internally.
+mz_bool tdefl_compress_mem_to_output(const void* pBuf, size_t buf_len, tdefl_put_buf_func_ptr pPut_buf_func, void* pPut_buf_user, int flags);
+
+enum { TDEFL_MAX_HUFF_TABLES = 3,
+       TDEFL_MAX_HUFF_SYMBOLS_0 = 288,
+       TDEFL_MAX_HUFF_SYMBOLS_1 = 32,
+       TDEFL_MAX_HUFF_SYMBOLS_2 = 19,
+       TDEFL_LZ_DICT_SIZE = 32768,
+       TDEFL_LZ_DICT_SIZE_MASK = TDEFL_LZ_DICT_SIZE - 1,
+       TDEFL_MIN_MATCH_LEN = 3,
+       TDEFL_MAX_MATCH_LEN = 258 };
+
+// TDEFL_OUT_BUF_SIZE MUST be large enough to hold a single entire compressed output block (using static/fixed Huffman codes).
+#if TDEFL_LESS_MEMORY
+enum { TDEFL_LZ_CODE_BUF_SIZE = 24 * 1024,
+       TDEFL_OUT_BUF_SIZE = (TDEFL_LZ_CODE_BUF_SIZE * 13) / 10,
+       TDEFL_MAX_HUFF_SYMBOLS = 288,
+       TDEFL_LZ_HASH_BITS = 12,
+       TDEFL_LEVEL1_HASH_SIZE_MASK = 4095,
+       TDEFL_LZ_HASH_SHIFT = (TDEFL_LZ_HASH_BITS + 2) / 3,
+       TDEFL_LZ_HASH_SIZE = 1 << TDEFL_LZ_HASH_BITS };
+#else
+enum { TDEFL_LZ_CODE_BUF_SIZE = 64 * 1024,
+       TDEFL_OUT_BUF_SIZE = (TDEFL_LZ_CODE_BUF_SIZE * 13) / 10,
+       TDEFL_MAX_HUFF_SYMBOLS = 288,
+       TDEFL_LZ_HASH_BITS = 15,
+       TDEFL_LEVEL1_HASH_SIZE_MASK = 4095,
+       TDEFL_LZ_HASH_SHIFT = (TDEFL_LZ_HASH_BITS + 2) / 3,
+       TDEFL_LZ_HASH_SIZE = 1 << TDEFL_LZ_HASH_BITS };
+#endif
+
+// The low-level tdefl functions below may be used directly if the above helper functions aren't flexible enough. The low-level functions don't make any heap allocations, unlike the above helper functions.
+typedef enum {
+  TDEFL_STATUS_BAD_PARAM = -2,
+  TDEFL_STATUS_PUT_BUF_FAILED = -1,
+  TDEFL_STATUS_OKAY = 0,
+  TDEFL_STATUS_DONE = 1,
+} tdefl_status;
+
+// Must map to MZ_NO_FLUSH, MZ_SYNC_FLUSH, etc. enums
+typedef enum {
+  TDEFL_NO_FLUSH = 0,
+  TDEFL_SYNC_FLUSH = 2,
+  TDEFL_FULL_FLUSH = 3,
+  TDEFL_FINISH = 4
+} tdefl_flush;
+
+// tdefl's compression state structure.
+typedef struct
+{
+  tdefl_put_buf_func_ptr m_pPut_buf_func;
+  void* m_pPut_buf_user;
+  mz_uint m_flags, m_max_probes[2];
+  int m_greedy_parsing;
+  mz_uint m_adler32, m_lookahead_pos, m_lookahead_size, m_dict_size;
+  mz_uint8 *m_pLZ_code_buf, *m_pLZ_flags, *m_pOutput_buf, *m_pOutput_buf_end;
+  mz_uint m_num_flags_left, m_total_lz_bytes, m_lz_code_buf_dict_pos, m_bits_in, m_bit_buffer;
+  mz_uint m_saved_match_dist, m_saved_match_len, m_saved_lit, m_output_flush_ofs, m_output_flush_remaining, m_finished, m_block_index, m_wants_to_finish;
+  tdefl_status m_prev_return_status;
+  const void* m_pIn_buf;
+  void* m_pOut_buf;
+  size_t *m_pIn_buf_size, *m_pOut_buf_size;
+  tdefl_flush m_flush;
+  const mz_uint8* m_pSrc;
+  size_t m_src_buf_left, m_out_buf_ofs;
+  mz_uint8 m_dict[TDEFL_LZ_DICT_SIZE + TDEFL_MAX_MATCH_LEN - 1];
+  mz_uint16 m_huff_count[TDEFL_MAX_HUFF_TABLES][TDEFL_MAX_HUFF_SYMBOLS];
+  mz_uint16 m_huff_codes[TDEFL_MAX_HUFF_TABLES][TDEFL_MAX_HUFF_SYMBOLS];
+  mz_uint8 m_huff_code_sizes[TDEFL_MAX_HUFF_TABLES][TDEFL_MAX_HUFF_SYMBOLS];
+  mz_uint8 m_lz_code_buf[TDEFL_LZ_CODE_BUF_SIZE];
+  mz_uint16 m_next[TDEFL_LZ_DICT_SIZE];
+  mz_uint16 m_hash[TDEFL_LZ_HASH_SIZE];
+  mz_uint8 m_output_buf[TDEFL_OUT_BUF_SIZE];
+} tdefl_compressor;
+
+// Initializes the compressor.
+// There is no corresponding deinit() function because the tdefl API's do not dynamically allocate memory.
+// pPut_buf_func: If non-NULL, output data will be supplied to the specified callback. In this case, the user should call the tdefl_compress_buffer() API for compression.
+// If pPut_buf_func is NULL the user should always call the tdefl_compress() API.
+// flags: See the above enums (TDEFL_HUFFMAN_ONLY, TDEFL_WRITE_ZLIB_HEADER, etc.)
+tdefl_status tdefl_init(tdefl_compressor* d, tdefl_put_buf_func_ptr pPut_buf_func, void* pPut_buf_user, int flags);
+
+// Compresses a block of data, consuming as much of the specified input buffer as possible, and writing as much compressed data to the specified output buffer as possible.
+tdefl_status tdefl_compress(tdefl_compressor* d, const void* pIn_buf, size_t* pIn_buf_size, void* pOut_buf, size_t* pOut_buf_size, tdefl_flush flush);
+
+// tdefl_compress_buffer() is only usable when the tdefl_init() is called with a non-NULL tdefl_put_buf_func_ptr.
+// tdefl_compress_buffer() always consumes the entire input buffer.
+tdefl_status tdefl_compress_buffer(tdefl_compressor* d, const void* pIn_buf, size_t in_buf_size, tdefl_flush flush);
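+
+// Example (illustrative sketch, not part of the original miniz source; compiled out).
+// Callback-driven compression: tdefl_init() receives a non-NULL put-buf callback, so tdefl_compress_buffer() is used.
+// Assumptions: <stdio.h> and <stdlib.h> are included; the names example_write_cb and example_deflate_to_file are hypothetical.
+#if 0
+static mz_bool example_write_cb(const void* pBuf, int len, void* pUser) {
+  return fwrite(pBuf, 1, (size_t)len, (FILE*)pUser) == (size_t)len;
+}
+
+static mz_bool example_deflate_to_file(const void* pSrc, size_t src_len, FILE* pFile) {
+  // tdefl_compressor embeds its dictionary, hash and output buffers (roughly 300 KB with TDEFL_LESS_MEMORY set to 0),
+  // so it is heap-allocated here instead of being placed on the stack.
+  tdefl_compressor* pComp = (tdefl_compressor*)malloc(sizeof(tdefl_compressor));
+  mz_bool ok;
+  if (!pComp)
+    return MZ_FALSE;
+  ok = (tdefl_init(pComp, example_write_cb, pFile, TDEFL_WRITE_ZLIB_HEADER | TDEFL_DEFAULT_MAX_PROBES) == TDEFL_STATUS_OKAY) &&
+       (tdefl_compress_buffer(pComp, pSrc, src_len, TDEFL_FINISH) == TDEFL_STATUS_DONE);
+  free(pComp);  // no tdefl deinit call exists; only the caller's own allocation is released
+  return ok;
+}
+#endif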
+
+tdefl_status tdefl_get_prev_return_status(tdefl_compressor* d);
+mz_uint32 tdefl_get_adler32(tdefl_compressor* d);
+
+// Can't use tdefl_create_comp_flags_from_zip_params if MINIZ_NO_ZLIB_APIS is defined, because it uses some of its macros.
+#ifndef MINIZ_NO_ZLIB_APIS
+// Create tdefl_compress() flags given zlib-style compression parameters.
+// level may range from [0,10] (where 10 is absolute max compression, but may be much slower on some files)
+// window_bits may be -15 (raw deflate) or 15 (zlib)
+// strategy may be either MZ_DEFAULT_STRATEGY, MZ_FILTERED, MZ_HUFFMAN_ONLY, MZ_RLE, or MZ_FIXED
+mz_uint tdefl_create_comp_flags_from_zip_params(int level, int window_bits, int strategy);
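+
+// Example (illustrative sketch, not part of the original miniz source; compiled out).
+// Derives tdefl flags from zlib-style parameters and passes them to a high-level helper.
+// The helper name example_deflate_level9 is hypothetical.
+#if 0
+static void* example_deflate_level9(const void* pSrc, size_t src_len, size_t* pOut_len) {
+  // Level 9, positive window bits (zlib header and Adler-32 are written), default strategy.
+  mz_uint flags = tdefl_create_comp_flags_from_zip_params(MZ_BEST_COMPRESSION, 15, MZ_DEFAULT_STRATEGY);
+  return tdefl_compress_mem_to_heap(pSrc, src_len, pOut_len, (int)flags);
+}
+#endif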
+#endif  // #ifndef MINIZ_NO_ZLIB_APIS
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif  // MINIZ_HEADER_INCLUDED
@@ -0,0 +1,337 @@
+// File: crn_mipmapped_texture.h
+// See Copyright Notice and license at the end of inc/crnlib.h
+#pragma once
+#include "crn_dxt_image.h"
+#include "../inc/dds_defs.h"
+#include "crn_pixel_format.h"
+#include "crn_image.h"
+#include "crn_resampler.h"
+#include "crn_data_stream_serializer.h"
+#include "crn_qdxt1.h"
+#include "crn_qdxt5.h"
+#include "crn_texture_file_types.h"
+#include "crn_image_utils.h"
+
+namespace crnlib {
+extern const vec2I g_vertical_cross_image_offsets[6];
+
+enum orientation_flags_t {
+  cOrientationFlagXFlipped = 1,
+  cOrientationFlagYFlipped = 2,
+
+  cDefaultOrientationFlags = 0
+};
+
+enum unpack_flags_t {
+  cUnpackFlagUncook = 1,
+  cUnpackFlagUnflip = 2
+};
+
+class mip_level {
+  friend class mipmapped_texture;
+
+ public:
+  mip_level();
+  ~mip_level();
+
+  mip_level(const mip_level& other);
+  mip_level& operator=(const mip_level& rhs);
+
+  // Assumes ownership.
+  void assign(image_u8* p, pixel_format fmt = PIXEL_FMT_INVALID, orientation_flags_t orient_flags = cDefaultOrientationFlags);
+  void assign(dxt_image* p, pixel_format fmt = PIXEL_FMT_INVALID, orientation_flags_t orient_flags = cDefaultOrientationFlags);
+
+  void clear();
+
+  inline uint get_width() const { return m_width; }
+  inline uint get_height() const { return m_height; }
+  inline uint get_total_pixels() const { return m_width * m_height; }
+
+  orientation_flags_t get_orientation_flags() const { return m_orient_flags; }
+  void set_orientation_flags(orientation_flags_t flags) { m_orient_flags = flags; }
+
+  inline image_u8* get_image() const { return m_pImage; }
+  inline dxt_image* get_dxt_image() const { return m_pDXTImage; }
+
+  image_u8* get_unpacked_image(image_u8& tmp, uint unpack_flags) const;
+
+  inline bool is_packed() const { return m_pDXTImage != NULL; }
+
+  inline bool is_valid() const { return (m_pImage != NULL) || (m_pDXTImage != NULL); }
+
+  inline pixel_format_helpers::component_flags get_comp_flags() const { return m_comp_flags; }
+  inline void set_comp_flags(pixel_format_helpers::component_flags comp_flags) { m_comp_flags = comp_flags; }
+
+  inline pixel_format get_format() const { return m_format; }
+  inline void set_format(pixel_format fmt) { m_format = fmt; }
+
+  bool convert(pixel_format fmt, bool cook, const dxt_image::pack_params& p);
+
+  bool pack_to_dxt(const image_u8& img, pixel_format fmt, bool cook, const dxt_image::pack_params& p, orientation_flags_t orient_flags = cDefaultOrientationFlags);
+  bool pack_to_dxt(pixel_format fmt, bool cook, const dxt_image::pack_params& p);
+
+  bool unpack_from_dxt(bool uncook = true);
+
+  // Returns true if flipped on either axis.
+  bool is_flipped() const;
+
+  bool is_x_flipped() const;
+  bool is_y_flipped() const;
+
+  bool can_unflip_without_unpacking() const;
+
+  // Returns true if unflipped on either axis.
+  // Will try to flip packed (DXT/ETC) data in-place. If this isn't possible, it'll unpack/uncook the mip level and then unflip.
+  bool unflip(bool allow_unpacking_to_flip, bool uncook_during_unpack);
+
+  bool set_alpha_to_luma();
+  bool convert(image_utils::conversion_type conv_type);
+
+  bool flip_x();
+  bool flip_y();
+
+ private:
+  uint m_width;
+  uint m_height;
+
+  pixel_format_helpers::component_flags m_comp_flags;
+  pixel_format m_format;
+
+  image_u8* m_pImage;
+  dxt_image* m_pDXTImage;
+
+  orientation_flags_t m_orient_flags;
+
+  void cook_image(image_u8& img) const;
+  void uncook_image(image_u8& img) const;
+};
+
+// A face is an array of mip_level ptr's.
+typedef crnlib::vector<mip_level*> mip_ptr_vec;
+
+// And an array of one, six, or N faces makes up a texture.
+typedef crnlib::vector<mip_ptr_vec> face_vec;
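+
+// Example (illustrative sketch, not part of the original crnlib source; compiled out).
+// Walks the face/mip layout described above through the mipmapped_texture accessors declared further down in
+// this header. The helper name example_count_packed_levels is hypothetical.
+#if 0
+static uint example_count_packed_levels(const mipmapped_texture& tex) {
+  uint num_packed = 0;
+  for (uint face = 0; face < tex.get_num_faces(); face++) {
+    for (uint mip = 0; mip < tex.get_num_levels(); mip++) {
+      const mip_level* pLevel = tex.get_level(face, mip);
+      // A level is packed when it holds a dxt_image instead of an image_u8.
+      if (pLevel->is_valid() && pLevel->is_packed())
+        num_packed++;
+    }
+  }
+  return num_packed;
+}
+#endif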
+
+class mipmapped_texture {
+ public:
+  // Construction/destruction
+  mipmapped_texture();
+  ~mipmapped_texture();
+
+  mipmapped_texture(const mipmapped_texture& other);
+  mipmapped_texture& operator=(const mipmapped_texture& rhs);
+
+  void clear();
+
+  void init(uint width, uint height, uint levels, uint faces, pixel_format fmt, const char* pName, orientation_flags_t orient_flags);
+
+  // Assumes ownership.
+  void assign(face_vec& faces);
+  void assign(mip_level* pLevel);
+  void assign(image_u8* p, pixel_format fmt = PIXEL_FMT_INVALID, orientation_flags_t orient_flags = cDefaultOrientationFlags);
+  void assign(dxt_image* p, pixel_format fmt = PIXEL_FMT_INVALID, orientation_flags_t orient_flags = cDefaultOrientationFlags);
+
+  void set(texture_file_types::format source_file_type, const mipmapped_texture& mipmapped_texture);
+
+  // Accessors
+  image_u8* get_level_image(uint face, uint level, image_u8& img, uint unpack_flags = cUnpackFlagUncook | cUnpackFlagUnflip) const;
+
+  inline bool is_valid() const { return m_faces.size() > 0; }
+
+  const dynamic_string& get_name() const { return m_name; }
+  void set_name(const dynamic_string& name) { m_name = name; }
+
+  const dynamic_string& get_source_filename() const { return get_name(); }
+  texture_file_types::format get_source_file_type() const { return m_source_file_type; }
+
+  inline uint get_width() const { return m_width; }
+  inline uint get_height() const { return m_height; }
+  inline uint get_total_pixels() const { return m_width * m_height; }
+  uint get_total_pixels_in_all_faces_and_mips() const;
+
+  inline uint get_num_faces() const { return m_faces.size(); }
+  inline uint get_num_levels() const {
+    if (m_faces.empty())
+      return 0;
+    else
+      return m_faces[0].size();
+  }
+
+  inline pixel_format_helpers::component_flags get_comp_flags() const { return m_comp_flags; }
+  inline pixel_format get_format() const { return m_format; }
+
+  inline bool is_unpacked() const {
+    if (get_num_faces()) {
+      return get_level(0, 0)->get_image() != NULL;
+    }
+    return false;
+  }
+
+  inline const mip_ptr_vec& get_face(uint face) const { return m_faces[face]; }
+  inline mip_ptr_vec& get_face(uint face) { return m_faces[face]; }
+
+  inline const mip_level* get_level(uint face, uint mip) const { return m_faces[face][mip]; }
+  inline mip_level* get_level(uint face, uint mip) { return m_faces[face][mip]; }
+
+  bool has_alpha() const;
+  bool is_normal_map() const;
+  bool is_vertical_cross() const;
+  bool is_packed() const;
+  texture_type determine_texture_type() const;
+
+  const dynamic_string& get_last_error() const { return m_last_error; }
+  void clear_last_error() { m_last_error.clear(); }
+
+  // Reading/writing
+  bool read_dds(data_stream_serializer& serializer);
+  bool write_dds(data_stream_serializer& serializer) const;
+
+  bool read_ktx(data_stream_serializer& serializer);
+  bool write_ktx(data_stream_serializer& serializer) const;
+
+  bool read_crn(data_stream_serializer& serializer);
+  bool read_crn_from_memory(const void* pData, uint data_size, const char* pFilename);
+
+  // If file_format is texture_file_types::cFormatInvalid, the format will be determined from the filename's extension.
+  bool read_from_file(const char* pFilename, texture_file_types::format file_format = texture_file_types::cFormatInvalid);
+  bool read_from_stream(data_stream_serializer& serializer, texture_file_types::format file_format = texture_file_types::cFormatInvalid);
+
+  bool write_to_file(
+      const char* pFilename,
+      texture_file_types::format file_format = texture_file_types::cFormatInvalid,
+      crn_comp_params* pComp_params = NULL,
+      uint32* pActual_quality_level = NULL, float* pActual_bitrate = NULL,
+      uint32 image_write_flags = 0);
+
+  // Conversion
+  bool convert(pixel_format fmt, bool cook, const dxt_image::pack_params& p);
+  bool convert(pixel_format fmt, const dxt_image::pack_params& p);
+  bool convert(pixel_format fmt, bool cook, const dxt_image::pack_params& p, int qdxt_quality, bool hierarchical = true);
+  bool convert(image_utils::conversion_type conv_type);
+
+  bool unpack_from_dxt(bool uncook = true);
+
+  bool set_alpha_to_luma();
+
+  void discard_mipmaps();
+
+  void discard_mips();
+
+  struct resample_params {
+    resample_params()
+        : m_pFilter("kaiser"),
+          m_wrapping(false),
+          m_srgb(false),
+          m_renormalize(false),
+          m_filter_scale(.9f),
+          m_gamma(1.75f),  // or 2.2f
+          m_multithreaded(true) {
+    }
+
+    const char* m_pFilter;
+    bool m_wrapping;
+    bool m_srgb;
+    bool m_renormalize;
+    float m_filter_scale;
+    float m_gamma;
+    bool m_multithreaded;
+  };
+
+  bool resize(uint new_width, uint new_height, const resample_params& params);
+
+  struct generate_mipmap_params : public resample_params {
+    generate_mipmap_params()
+        : resample_params(),
+          m_min_mip_size(1),
+          m_max_mips(0) {
+    }
+
+    uint m_min_mip_size;
+    uint m_max_mips;  // actually the max # of total levels
+  };
+
+  bool generate_mipmaps(const generate_mipmap_params& params, bool force);
+
+  bool crop(uint x, uint y, uint width, uint height);
+
+  bool vertical_cross_to_cubemap();
+
+  // Low-level clustered DXT (QDXT) compression
+  struct qdxt_state {
+    qdxt_state(task_pool& tp)
+        : m_fmt(PIXEL_FMT_INVALID), m_qdxt1(tp), m_qdxt5a(tp), m_qdxt5b(tp) {
+    }
+
+    pixel_format m_fmt;
+    qdxt1 m_qdxt1;
+    qdxt5 m_qdxt5a;
+    qdxt5 m_qdxt5b;
+    crnlib::vector<dxt_pixel_block> m_pixel_blocks;
+
+    qdxt1_params m_qdxt1_params;
+    qdxt5_params m_qdxt5_params[2];
+    bool m_has_blocks[3];
+
+    void clear() {
+      m_fmt = PIXEL_FMT_INVALID;
+      m_qdxt1.clear();
+      m_qdxt5a.clear();
+      m_qdxt5b.clear();
+      m_pixel_blocks.clear();
+      m_qdxt1_params.clear();
+      m_qdxt5_params[0].clear();
+      m_qdxt5_params[1].clear();
+      utils::zero_object(m_has_blocks);
+    }
+  };
+  bool qdxt_pack_init(qdxt_state& state, mipmapped_texture& dst_tex, const qdxt1_params& dxt1_params, const qdxt5_params& dxt5_params, pixel_format fmt, bool cook);
+  bool qdxt_pack(qdxt_state& state, mipmapped_texture& dst_tex, const qdxt1_params& dxt1_params, const qdxt5_params& dxt5_params);
+
+  void swap(mipmapped_texture& img);
+
+  bool check() const;
+
+  void set_orientation_flags(orientation_flags_t flags);
+
+  // Returns true if any face/miplevel is flipped.
+  bool is_flipped() const;
+  bool is_x_flipped() const;
+  bool is_y_flipped() const;
+  bool can_unflip_without_unpacking() const;
+  bool unflip(bool allow_unpacking_to_flip, bool uncook_if_necessary_to_unpack);
+
+  bool flip_y(bool update_orientation_flags);
+
+ private:
+  dynamic_string m_name;
+
+  uint m_width;
+  uint m_height;
+
+  pixel_format_helpers::component_flags m_comp_flags;
+  pixel_format m_format;
+
+  face_vec m_faces;
+
+  texture_file_types::format m_source_file_type;
+
+  mutable dynamic_string m_last_error;
+
+  inline void clear_last_error() const { m_last_error.clear(); }
+  inline void set_last_error(const char* p) const { m_last_error = p; }
+
+  void free_all_mips();
+  bool read_regular_image(data_stream_serializer& serializer);
+  bool write_regular_image(const char* pFilename, uint32 image_write_flags);
+  bool read_dds_internal(data_stream_serializer& serializer);
+  void print_crn_comp_params(const crn_comp_params& p);
+  bool write_comp_texture(const char* pFilename, const crn_comp_params& comp_params, uint32* pActual_quality_level, float* pActual_bitrate);
+  void change_dxt1_to_dxt1a();
+  bool flip_y_helper();
+};
+
+inline void swap(mipmapped_texture& a, mipmapped_texture& b) {
+  a.swap(b);
+}
+
+}  // namespace crnlib
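Note on generate_mipmap_params above: the comment on m_max_mips says it is really a cap on the total number of levels, and both fields default to "no extra limit" (m_min_mip_size = 1, m_max_mips = 0). The sketch below is one plausible reading of how those two fields bound a mip chain. count_mip_levels is a hypothetical helper, not a crnlib function, and the real generate_mipmaps() may differ in its details.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper (not part of crnlib): counts the levels in a mip chain.
// Assumption: halving continues while either dimension is still larger than
// min_mip_size, and max_levels == 0 means "no cap on the total level count".
unsigned int count_mip_levels(unsigned int width, unsigned int height,
                              unsigned int min_mip_size, unsigned int max_levels) {
  // Treat a minimum size of 0 as 1 so the loop always terminates.
  const unsigned int floor_size = std::max(1U, min_mip_size);

  unsigned int levels = 1;  // level 0 is the full-size image
  while (width > floor_size || height > floor_size) {
    width = std::max(1U, width >> 1U);
    height = std::max(1U, height >> 1U);
    levels++;
  }

  // The cap applies to the total number of levels, including level 0.
  if (max_levels > 0 && levels > max_levels)
    levels = max_levels;
  return levels;
}
```

Under these assumptions a 256x256 texture with the default parameters gets a full chain down to 1x1, while passing 4 as the cap keeps only levels 0 through 3.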
@@ -0,0 +1,79 @@
+// File: crn_packed_uint.h
+// See Copyright Notice and license at the end of inc/crnlib.h
+#pragma once
+
+namespace crnlib {
+template <unsigned int N>
+struct packed_uint {
+  inline packed_uint() {}
+
+  inline packed_uint(unsigned int val) { *this = val; }
+
+  inline packed_uint(const packed_uint& other) { *this = other; }
+
+  inline packed_uint& operator=(const packed_uint& rhs) {
+    if (this != &rhs)
+      memcpy(m_buf, rhs.m_buf, sizeof(m_buf));
+    return *this;
+  }
+
+  inline packed_uint& operator=(unsigned int val) {
+#ifdef CRNLIB_BUILD_DEBUG
+    if (N == 1) {
+      CRNLIB_ASSERT(val <= 0xFFU);
+    } else if (N == 2) {
+      CRNLIB_ASSERT(val <= 0xFFFFU);
+    } else if (N == 3) {
+      CRNLIB_ASSERT(val <= 0xFFFFFFU);
+    }
+#endif
+
+    val <<= (8U * (4U - N));
+
+    for (unsigned int i = 0; i < N; i++) {
+      m_buf[i] = static_cast<unsigned char>(val >> 24U);
+      val <<= 8U;
+    }
+
+    return *this;
+  }
+
+  inline operator unsigned int() const {
+    switch (N) {
+      case 1:
+        return m_buf[0];
+      case 2:
+        return (m_buf[0] << 8U) | m_buf[1];
+      case 3:
+        return (m_buf[0] << 16U) | (m_buf[1] << 8U) | (m_buf[2]);
+      default:
+        return (static_cast<unsigned int>(m_buf[0]) << 24U) | (m_buf[1] << 16U) | (m_buf[2] << 8U) | (m_buf[3]);
+    }
+  }
+
+  unsigned char m_buf[N];
+};
+template <typename T>
+class packed_value {
+ public:
+  packed_value() {}
+  packed_value(T val) { *this = val; }
+
+  inline operator T() const {
+    T result = 0;
+    for (int i = sizeof(T) - 1; i >= 0; i--)
+      result = static_cast<T>((result << 8) | m_bytes[i]);
+    return result;
+  }
+  packed_value& operator=(T val) {
+    for (unsigned int i = 0; i < sizeof(T); i++) {
+      m_bytes[i] = static_cast<uint8>(val);
+      val >>= 8;
+    }
+    return *this;
+  }
+
+ private:
+  uint8 m_bytes[sizeof(T)];
+};
+}  // namespace crnlib
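Note on the two helpers above: they use opposite byte orders. packed_uint<N> writes the most significant byte first (big-endian), while packed_value<T> writes the least significant byte first (little-endian). The sketch below re-implements both layouts in a standalone form so the difference can be checked without the crnlib headers. be_uint and le_value are hypothetical stand-ins written for illustration, not the crnlib types themselves.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for crnlib::packed_uint<N> (N in 1..4):
// stores the low N bytes of a value, most significant byte first.
template <unsigned int N>
struct be_uint {
  unsigned char buf[N];

  void store(std::uint32_t val) {
    for (unsigned int i = 0; i < N; i++)
      buf[i] = static_cast<unsigned char>(val >> (8U * (N - 1U - i)));
  }

  std::uint32_t load() const {
    std::uint32_t result = 0;
    for (unsigned int i = 0; i < N; i++)
      result = (result << 8U) | buf[i];
    return result;
  }
};

// Hypothetical stand-in for crnlib::packed_value<T> (unsigned integer T):
// stores sizeof(T) bytes, least significant byte first.
template <typename T>
struct le_value {
  unsigned char bytes[sizeof(T)];

  void store(T val) {
    for (std::size_t i = 0; i < sizeof(T); i++) {
      bytes[i] = static_cast<unsigned char>(val & 0xFF);
      val = static_cast<T>(val >> 8);
    }
  }

  T load() const {
    T result = 0;
    for (std::size_t i = sizeof(T); i-- > 0;)
      result = static_cast<T>((result << 8) | bytes[i]);
    return result;
  }
};
```

Storing 0x123456 in a be_uint<3> leaves the bytes 12 34 56 in memory, whereas storing 0x1234 in an le_value<std::uint16_t> leaves 34 12. Both round-trip through load() regardless of the host's endianness, which is the point of serializing byte by byte.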