This change significantly improves compression speed.
Explanation:
The main ideas used for the endpoint and selector sorting optimization:
- unpacked color and alpha endpoints can be cached
- pixel selectors can be processed in groups, while the intermediate error results for those groups can be precalculated
- instead of maintaining the mask of the processed elements, the remaining elements can be reorganized to form a continuous block on each iteration (the last remaining element is moved into the position of the processed element)
- after optimization, endpoint sorting works significantly faster than endpoint reordering, so the overall performance can be improved by moving selector optimization into the endpoint sorting thread
Testing:
The modified algorithm has been tested on the Kodak test set using 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). All the decompressed test images are identical to the images being compressed and decompressed using original version of Crunch.
[Compressing Kodak set without mipmaps]
Original: 1582222 bytes / 28.863 sec
Modified: 1482780 bytes / 14.564 sec
Improvement: 6.28% (compression ratio) / 49.54% (compression time)
[Compressing Kodak set with mipmaps]
Original: 2065243 bytes / 36.968 sec
Modified: 1931586 bytes / 19.717 sec
Improvement: 6.47% (compression ratio) / 46.66% (compression time)
This change significantly improves the compression ratio and compression speed.
Explanation:
After the endpoint codebook has been determined, the endpoints can be reordered in order to improve the compression ratio. On the one hand, endpoint indices of the neighbor blocks should be similar, as the encoder compresses the deltas between those neighbour indices. On the other hand, the neighbor endpoints in the codebook should be also similar, as the encoder compresses the deltas between the color components of those neighbor endpoints. The optimization is based on the Zeng's technique, using a weighted function which takes into account both similarity of the endpoint indices for the neighbor blocks and similarity of the neighbor endpoints in the codebook.
The similarity of the endpoint indices is optimized using the combined neighborhood frequency of the candidate endpoint and all the currently selected endpoints in the list. The similarity of the neighbor endpoints in the codebook is optimized using euclidian distance from the candidate endpoint to the extremity of selected endpoints list. The original optimization function for the endpoint candidate (i) can be represented as:
F(i) = (total_neighborhood_frequency(i) + 1) * (endpoint_similarity(i) + 1)
The problem with this approach is the following. While the endpoint_similarity(i) has a limited range of values, the total_neighborhood_frequency(i) grows rapidly with the increasing size of the selected endpoints list. With each iteration this introduces additional disbalance for the weighted function. In order to minimize this effect, is it proposed to normalize the total_neighborhood_frequency(i) on each iteration. For computational simplicity, the normalizer is computed as the optimal total_neighborhood_frequency value from the previous iteration, multiplied by a constant. The modified optimization function can be represented as:
F(i) = (total_neighborhood_frequency(i) + total_neighborhood_frequency_normalizer) * (endpoint_similarity(i) + 1)
The main ideas used for endpoint reordering optimization:
- all the computations, which are common for the endpoint reordering threads, have been moved outside of the threads
- the ordering histogram offsets, which point to the neighborhood frequency values for a specific endpoint, are now cached, which reduces the number of multiplications when accessing the histogram
- floating point operations have been replaced with integer operations
Testing:
The modified algorithm has been tested on the Kodak test set using 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). All the decompressed test images are identical to the images being compressed and decompressed using original version of Crunch.
[Compressing Kodak set without mipmaps]
Original: 1582222 bytes / 28.873 sec
Modified: 1482726 bytes / 15.791 sec
Improvement: 6.29% (compression ratio) / 45.31% (compression time)
[Compressing Kodak set with mipmaps]
Original: 2065243 bytes / 36.925 sec
Modified: 1931475 bytes / 20.970 sec
Improvement: 6.48% (compression ratio) / 43.21% (compression time)
This change slightly improves compression speed and simplifies further modification of the code.
Explanation:
Additional performance boost is achieved by using linear representation for selectors and storing block selectors in a single uint32/uint64.
Testing:
The modified algorithm has been tested on the Kodak test set using 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). All the decompressed test images are identical to the images being compressed and decompressed using original version of Crunch.
[Compressing Kodak set without mipmaps]
Original: 1582222 bytes / 28.927 sec
Modified: 1494501 bytes / 17.301 sec
Improvement: 5.54% (compression ratio) / 40.19% (compression time)
[Compressing Kodak set with mipmaps]
Original: 2065243 bytes / 36.992 sec
Modified: 1945365 bytes / 22.548 sec
Improvement: 5.80% (compression ratio) / 39.05% (compression time)
This change improves compression speed and simplifies further modification of the code.
Testing:
The modified algorithm has been tested on the Kodak test set using 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). All the decompressed test images are identical to the images being compressed and decompressed using original version of Crunch.
[Compressing Kodak set without mipmaps]
Original: 1582222 bytes / 28.947 sec
Modified: 1494501 bytes / 17.642 sec
Improvement: 5.54% (compression ratio) / 39.05% (compression time)
[Compressing Kodak set with mipmaps]
Original: 2065243 bytes / 36.965 sec
Modified: 1945365 bytes / 22.989 sec
Improvement: 5.80% (compression ratio) / 37.81% (compression time)
This change improves compression speed and simplifies further modification of the code.
Explanation:
This change is required for further optimization of the tile computation code. Additional performance boost is achieved by moving the tile palettizing into the tile computation thread.
Testing:
The modified algorithm has been tested on the Kodak test set using 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). All the decompressed test images are identical to the images being compressed and decompressed using original version of Crunch.
[Compressing Kodak set without mipmaps]
Original: 1582222 bytes / 28.928 sec
Modified: 1494501 bytes / 18.259 sec
Improvement: 5.54% (compression ratio) / 36.88% (compression time)
[Compressing Kodak set with mipmaps]
Original: 2065243 bytes / 36.978 sec
Modified: 1945365 bytes / 23.857 sec
Improvement: 5.80% (compression ratio) / 35.48% (compression time)
This change significantly improves compression speed.
Explanation:
The main ideas used for selector computations optimization:
- possible pixel values for each endpoint can be cached
- the distances between the possible pixel values and the actual pixels values within a block can be cached for fast error computation during selector assignment
- selector refinement can be efficiently integrated with the selector assignment, as it is based on the same set of cached error values
- using block encoding instead of chunk encoding for both endpoints and selectors eliminates extra levels of indirection
Testing:
The modified algorithm has been tested on the Kodak test set using 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). All the decompressed test images are identical to the images being compressed and decompressed using original version of Crunch.
[Compressing Kodak set without mipmaps]
Original: 1582222 bytes / 28.953 sec
Modified: 1494501 bytes / 19.667 sec
Improvement: 5.54% (compression ratio) / 32.07% (compression time)
[Compressing Kodak set with mipmaps]
Original: 2065243 bytes / 36.998 sec
Modified: 1945365 bytes / 25.642 sec
Improvement: 5.80% (compression ratio) / 30.69% (compression time)
This change simplifies further modification of the code.
Explanation:
This change is required for further optimization of the quantization code.
Testing:
The modified algorithm has been tested on the Kodak test set using 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). All the decompressed test images are identical to the images being compressed and decompressed using original version of Crunch.
[Compressing Kodak set without mipmaps]
Original: 1582222 bytes / 28.935 sec
Modified: 1494501 bytes / 24.528 sec
Improvement: 5.54% (compression ratio) / 15.23% (compression time)
[Compressing Kodak set with mipmaps]
Original: 2065243 bytes / 36.982 sec
Modified: 1945365 bytes / 32.308 sec
Improvement: 5.80% (compression ratio) / 12.64% (compression time)
This change improves compression speed when using alpha channel.
Explanation:
As the alpha endpoint refinement does not depend on the alpha selector codebook computation, it can be safely moved into the alpha endpoint optimization thread.
Testing:
The modified algorithm has been tested on the Kodak test set using 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). All the decompressed test images are identical to the images being compressed and decompressed using original version of Crunch.
[Compressing Kodak set without mipmaps]
Original: 1582222 bytes / 28.912 sec
Modified: 1494501 bytes / 24.128 sec
Improvement: 5.54% (compression ratio) / 16.55% (compression time)
[Compressing Kodak set with mipmaps]
Original: 2065243 bytes / 36.985 sec
Modified: 1945365 bytes / 31.741 sec
Improvement: 5.80% (compression ratio) / 14.18% (compression time)
This change significantly improves compression speed.
Explanation:
If we take a closer look at the color endpoint refinement, we can see that the input for the color refinement comes directly from the color endpoint optimization step, while the selector codebook computation does not affect the color endpoint refinement at all. Therefore color endpoint refinement can be safely moved into the endpoint optimization thread.
Testing:
The modified algorithm has been tested on the Kodak test set using 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). All the decompressed test images are identical to the images being compressed and decompressed using original version of Crunch.
[Compressing Kodak set without mipmaps]
Original: 1582222 bytes / 28.899 sec
Modified: 1494501 bytes / 24.043 sec
Improvement: 5.54% (compression ratio) / 16.80% (compression time)
[Compressing Kodak set with mipmaps]
Original: 2065243 bytes / 36.884 sec
Modified: 1945365 bytes / 31.586 sec
Improvement: 5.80% (compression ratio) / 14.36% (compression time)
This change improves compression speed.
Explanation:
While trying different remappings for the endpoint indices, there is no need to perform full pack simulation when using Huffman coding. Once the delta index histogram is generated, it is sufficient to simply multiply the code sizes by the corresponding frequences in order to get the total size of the compressed endpoint indices stream. There is also no need to compute the rest of the compressed stream, as its size does not depend on the endpoint remapping and therefore is always constant, so it will not affect the size comparison during endpoint optimization.
Testing:
The modified algorithm has been tested on the Kodak test set using 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). All the decompressed test images are identical to the images being compressed and decompressed using original version of Crunch.
[Compressing Kodak set without mipmaps]
Original: 1582222 bytes / 28.864 sec
Modified: 1494501 bytes / 25.317 sec
Improvement: 5.54% (compression ratio) / 12.29% (compression time)
[Compressing Kodak set with mipmaps]
Original: 2065243 bytes / 36.927 sec
Modified: 1945365 bytes / 33.151 sec
Improvement: 5.80% (compression ratio) / 10.23% (compression time)
This change simplifies further modification of the code.
Explanation:
Considering that chunks are no longer used in the output format, it makes sense to also remove chunk related code from the intermediate processing. This modification also allows to use endpoint references from the leftmost block to the rightmost block in the previous scanline (wrapped reference to the left).
Testing:
The modified algorithm has been tested on the Kodak test set using 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). All the decompressed test images are identical to the images being compressed and decompressed using original version of Crunch.
[Compressing Kodak set without mipmaps]
Original: 1582222 bytes / 28.846 sec
Modified: 1494501 bytes / 25.628 sec
Improvement: 5.54% (compression ratio) / 11.16% (compression time)
[Compressing Kodak set with mipmaps]
Original: 2065243 bytes / 36.869 sec
Modified: 1945365 bytes / 33.497 sec
Improvement: 5.80% (compression ratio) / 9.15% (compression time)
This change significantly improves the compression ratio.
Explanation:
By default, the size of the endpoint and selector codebooks is calculated based on the number of blocks in the image and the quality parameter, while the actual complexity of the image does not affect the initial codebook size. So the target codebook size is selected in such a way, that even complex images can be approximated well enough. At the same time, normally, the lower is the complexity of the image, the higher is the density of the quantized vectors. Considering that vector quantization is performed using floating point computations, and the quantized endpoints have integer components, high density of quantized vectors will result in large number of duplicate endpoints. As the result, some identical endpoints are being represented by multiple different indices, which significantly affects the compression ratio. Note that this is not the case for selectors, as their corresponding vector components are rounded after quantization, but instead it leads to some duplicate selectors in the codebook being not used. In the modified version of the algorithm all the duplicate codebook entries are merged together, unused entries are removed from the codebooks, the endpoint and selector indices are updated accordingly.
Testing:
The modified algorithm has been tested on the Kodak test set using 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). All the decompressed test images are identical to the images being compressed and decompressed using original version of Crunch.
[Compressing Kodak set without mipmaps]
Original: 1582222 bytes / 28.835 sec
Modified: 1494630 bytes / 25.637 sec
Improvement: 5.54% (compression ratio) / 11.09% (compression time)
[Compressing Kodak set with mipmaps]
Original: 2065243 bytes / 36.875 sec
Modified: 1946533 bytes / 33.546 sec
Improvement: 5.75% (compression ratio) / 9.03% (compression time)
This change significantly improves compression ratio and compression speed.
Explanation:
The original version of Crunch encodes the differences between the neighbour indices in order to get advantage of the neighbour indices similarity. The efficiency of such approach highly depends on the continuity of the encoded data. While neighbour color and alpha endpoints are usualy similar, this is usually not the case for selectors. Of course, in some situations, encoding deltas for selector indices makes sense, for example, when the image contains a lot of regular patterns (except the special case of completely flat areas, where using selector deltas does not bring much advantage). In any case, such situations are relatively rare, so it usually appears to be more efficient to encode raw selector indices. Note that when not using deltas for selector indices, the remapping of the selector indices no longer affects the size of the encoded selector indices stream (at least when using Huffman coding). This makes the Zeng optimization step unnecessary, and it is sufficient to simply optimize the size of the packed selector codebook.
Note:
This modification alters the output file format and makes it incompatible with the previous revisions.
Testing:
The modified algorithm has been tested on the Kodak test set using 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). All the decompressed test images are identical to the images being compressed and decompressed using original version of Crunch.
[Compressing Kodak set without mipmaps]
Original: 1582222 bytes / 28.845 sec
Modified: 1521167 bytes / 26.048 sec
Improvement: 3.86% (compression ratio) / 9.70% (compression time)
[Compressing Kodak set with mipmaps]
Original: 2065243 bytes / 36.949 sec
Modified: 1977373 bytes / 33.889 sec
Improvement: 4.25% (compression ratio) / 8.28% (compression time)
This change improves the compression ratio.
Explanation:
In the original version of Crunch all the blocks are grouped into chunks of 2x2 blocks. Each chunk can have one of 8 different types. The type of the chunk determines which blocks inside the chunk share the same endpoints (for example, all the blocks inside the chunk share the same endpoints, or blocks in the right column share the same endpoints, or all the blocks have different endpoints, etc.). Encoding of endpoints equality is usually cheaper than encoding of duplicate endpoint indices. The used 8 chunk types do not cover all the possibilities, but they can be efficiently encoded using 0.75 bits per block (uncompressed).
The modified algorithm no longer uses the concept of chunks in the output file format and is based on an alternative approach. Endpoints for each block can be either copied from the left nearest block (reference to the left), copied from the upper nearest block (reference to the top), or decoded from the stream (reference to itself). Note that this is a superset of the original encoding, so all the images previously encoded with the original algorithm can be losslessly transcoded into the new format, but not vice versa. Even though the new endpoint equality encoding is more expensive (about 1.58 bits per block, uncompressed), it provides more flexibility for endpoint matching inside the former "chunks", and more importantly, it allows to inherit endpoints from outside the former "chunks" (which is not possible when using the original chunk encoding). The blocks are no longer grouped together and are encoded in the same order as they appear on the image.
Note:
This modification alters the output file format and makes it incompatible with the previous revisions.
Testing:
The modified algorithm has been tested on the Kodak test set using 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). All the decompressed test images are identical to the images being compressed and decompressed using original version of Crunch.
[Compressing Kodak set without mipmaps]
Original: 1582222 bytes / 28.903 sec
Modified: 1548791 bytes / 28.818 sec
Improvement: 2.11% (compression ratio) / 0.29% (compression time)
[Compressing Kodak set with mipmaps]
Original: 2065243 bytes / 36.978 sec
Modified: 2017245 bytes / 36.846 sec
Improvement: 2.32% (compression ratio) / 0.36% (compression time)
Explanation:
After switching to ordering histograms, the linear lists of endpoint and selector indices are no longer used in Zeng function, and therefore can be removed.
Testing:
The modified algorithm has been tested on the Kodak test set using 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). All the decompressed test images are identical to the images being compressed and decompressed using original version of Crunch.
[Compressing Kodak set without mipmaps]
Original: 1582222 bytes / 28.872 sec
Modified: 1561622 bytes / 28.434 sec
Improvement: 1.30% (compression ratio) / 1.52% (compression time)
[Compressing Kodak set with mipmaps]
Original: 2065243 bytes / 36.910 sec
Modified: 2033151 bytes / 36.369 sec
Improvement: 1.55% (compression ratio) / 1.47% (compression time)
This change improves compression ratio.
Explanation:
In the original algorithm the relative position of the block, used for prediction of the selector index for the currently decoded block, depends on the position of the current block in the chunk. It can be a horizontal neighbour or a diagonal neighbour. Using left nearest neighbour for selector index prediction for each block (except the blocks at the image borders) minimizes the average distance to the prediction block and therefore usually improves the selector index prediction. Similarly to the endpoint index processing, the selector ordering histogram in now generated based on the selector index prediction order.
Note:
This modification alters the output file format and makes it incompatible with the previous revisions.
Testing:
The modified algorithm has been tested on the Kodak test set using 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). All the decompressed test images are identical to the images being compressed and decompressed using original version of Crunch.
[Compressing Kodak set without mipmaps]
Original: 1582222 bytes / 28.869 sec
Modified: 1561622 bytes / 28.522 sec
Improvement: 1.30% (compression ratio) / 1.20% (compression time)
[Compressing Kodak set with mipmaps]
Original: 2065243 bytes / 37.038 sec
Modified: 2033151 bytes / 36.407 sec
Improvement: 1.55% (compression ratio) / 1.70% (compression time)
This change improves compression ratio.
Explanation:
The original histogram has been generated based on the linear order of encoded endpoint indexes. In the modified version of the algorithm, endpoint indexes are predicted using the nearest left block on the image, which is not necessarily the preceding block in the encoded sequence. Using the same block ordering both for prediction and Zeng optimization normally improves the compression ratio.
Testing:
The modified algorithm has been tested on the Kodak test set using 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). All the decompressed test images are identical to the images being compressed and decompressed using original version of Crunch.
[Compressing Kodak set without mipmaps]
Original: 1582222 bytes / 28.905 sec
Modified: 1566133 bytes / 28.457 sec
Improvement: 1.02% (compression ratio) / 1.55% (compression time)
[Compressing Kodak set with mipmaps]
Original: 2065243 bytes / 37.021 sec
Modified: 2040086 bytes / 36.300 sec
Improvement: 1.22% (compression ratio) / 1.95% (compression time)
This change makes the compression scheme more flexible.
Explanation:
In the original scheme, indexes are encoded in linear order, which means that each index uses the previously encoded index for prediction. However, more sophisticated schemes might require arbitrary references into the stream of already encoded indexes. For this reason, Zeng function has been modified to accept the ordering histogram as an input, instead of the linear array of indexes. Note that Zeng function itself does not rely on the indexes being encoded in linear order.
Testing:
The modified algorithm has been tested on the Kodak test set using 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). All the decompressed test images are identical to the images being compressed and decompressed using original version of Crunch.
[Compressing Kodak set without mipmaps]
Original: 1582222 bytes / 28.867 sec
Modified: 1570534 bytes / 28.524 sec
Improvement: 0.74% (compression ratio) / 1.19% (compression time)
[Compressing Kodak set with mipmaps]
Original: 2065243 bytes / 37.001 sec
Modified: 2051509 bytes / 36.388 sec
Improvement: 0.67% (compression ratio) / 1.66% (compression time)
This change improves compression ratio.
Explanation:
In the original algorithm the relative position of the block, used for prediction of the endpoint index for the currently decoded block, depends on the chunk encoding type. It can be a horizontal neighbour, a vertical neighbour, a diagonal neighbour, or in some rare cases even a block at relative position (-2, 0) or (-3, 0). Using left nearest neighbour for endpoint index prediction for each block (except the blocks at the image borders) minimizes the average distance to the prediction block and therefore usually improves the endpoint index prediction.
Note:
This modification alters the output file format and makes it incompatible with the previous revisions.
Testing:
The modified algorithm has been tested on the Kodak test set using 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). All the decompressed test images are identical to the images being compressed and decompressed using original version of Crunch.
[Compressing Kodak set without mipmaps]
Original: 1582222 bytes / 28.838 sec
Modified: 1570534 bytes / 28.629 sec
Improvement: 0.74% (compression ratio) / 0.72% (compression time)
[Compressing Kodak set with mipmaps]
Original: 2065243 bytes / 36.977 sec
Modified: 2051509 bytes / 36.568 sec
Improvement: 0.67% (compression ratio) / 1.11% (compression time)
This change slightly improves compression ratio and compression time.
Explanation:
The efficiency of the Crunch encoding scheme depends on the similarity between the neighbour chunks. For this reason in original version of Crunch the order of chunks is reversed after each scanline, so that there is no jump from one side of the image to another at the image borders. The problem here is that inside of each chunk, the blocks are normally ordered in a usual up-to-down-left-to-right manner, regardless of the chunk scanning order. While on the forward scan we normally need to perform diagonal jumps (+1, +1) in order to get to the next chunk, on the reverse scan we normally need to perform much larger (-3, +1) jumps, which usually defeats the advantage of not having discontinuity at the image borders.
Note:
This modification alters the output format and makes it incompatible with the previous revisions.
Testing:
The modified algorithm has been tested on the Kodak test set using 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). All the decompressed test images are identical to the images being compressed and decompressed using original version of Crunch.
[Compressing Kodak set without mipmaps]
Original: 1582222 bytes / 28.882 sec
Modified: 1579618 bytes / 28.743 sec
Improvement: 0.16% (compression ratio) / 0.48% (compression time)
[Compressing Kodak set with mipmaps]
Original: 2065243 bytes / 36.920 sec
Modified: 2061499 bytes / 36.833 sec
Improvement: 0.18% (compression ratio) / 0.24% (compression time)
Moved the version info in top of readme to a changelog. Converted the
version identifiers to more semantic versioning, but it is easy to
change if semantic versioning is not interesting, or my conversion
doesn't make sense.