unity

Author	SHA1	Message	Date
Alexander Suvorov	7143913032	Optimize DXT endpoints computation This change improves the compression speed for DXT encoding. Explanation: When performing per-component endpoint optimization, the trial solutions are generated using all possible combinations of the component values. Then the error boundary computation is performed for each block color of the trial solution in order to check the possibility of early out. The important observation here is that some component values are present in several trial solutions and therefore are processed multiple times. The overall performance can therefore be improved by computing and caching the errors for all the possible component values in advance. DXT Testing: The modified algorithm has been tested on the Kodak test set using 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). All the decompressed test images are identical to the images being compressed and decompressed using original version of Crunch (revision `ea9b8d8`). [Compressing Kodak set without mipmaps using DXT1 encoding] Original: 1582222 bytes / 28.843 sec Modified: 1468204 bytes / 6.067 sec Improvement: 7.21% (compression ratio) / 78.97% (compression time) [Compressing Kodak set with mipmaps using DXT1 encoding] Original: 2065243 bytes / 36.983 sec Modified: 1914805 bytes / 8.080 sec Improvement: 7.28% (compression ratio) / 78.15% (compression time) ETC Testing: The modified algorithm has been tested on the Kodak test set using 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). The ETC1 quantization parameters have been selected in such a way, so that ETC1 compression gives approximately the same average Luma PSNR as the corresponding DXT1 compression (which is equal to 34.044 dB for the Kodak test set compressed without mipmaps using DXT1 encoding and default quality settings). 
[Compressing Kodak set without mipmaps using ETC1 encoding] Total size: 1607858 bytes Total time: 13.421 sec Average bitrate: 1.363 bpp Average Luma PSNR: 34.050 dB	2017-10-24 19:48:37 +02:00
Alexander Suvorov	a14a313361	Optimize color endpoint solution evaluation This change improves the compression speed for DXT encoding. Explanation: In order to evaluate an endpoint solution, it is necessary to compute the sum of the squared distances from the source pixels to their nearest block colors, defined by the evaluated endpoint solution. Such computation is quite complicated, so before it is performed, we can compute the sum of the squared distances from the source pixels to the axis-aligned bounding box enclosing all the evaluated block colors (if the source pixel appears to be inside the AABB of the evaluated solution, then the distance is considered to be 0). If the sum of the squared distances to the AABB of the current solution is already bigger than the sum of the squared distances computed for the previously found best solution, then the current solution does not need to be evaluated. The actual trick here is that the sum of the squared distances to the AABB of the current solution can be computed in constant time using the following approach. The sums of the squared distances for each color component can be computed separately. For each color component the AABB determines 2 planes: the "lower" plane, defined by the lower boundary of the AABB, and the "upper" plane, defined by the upper boundary of the AABB. The sum for each color component is combined from two parts: the sum of the squared distances from the lower plane to all the source pixels which are below the lower plane, and the sum of the squared distances from the upper plane to all the source pixels which are above the upper plane. Considering that the endpoints of the evaluated solution are encoded as RGB565, there are 32 possible planes for the red and blue components, and 64 possible planes for the green component. 
For each plane it is sufficient to precompute the following two values: the sum of the squared distances from the plane to all the source pixels which are "below" this plane, and the sum of the squared distances from the plane to all the source pixels which are "above" this plane. The total sum of the squared distances from the source pixels to any evaluated AABB can then be represented as a sum of 6 precomputed values, while all the used values can be precomputed in linear time with dynamic programming. Note: The AABB check seems to work faster than inserting a solution into the hash map. For this reason the AABB check is performed first. Additional improvements: A few minor adjustments have been made in order to make sure that the texture decompression gives identical result to the original version of Crunch also for 32-bit builds (original Crunch library uses different floating point models for 32-bit and 64-bit builds). DXT Testing: The modified algorithm has been tested on the Kodak test set using 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). All the decompressed test images are identical to the images being compressed and decompressed using original version of Crunch (revision `ea9b8d8`). [Compressing Kodak set without mipmaps using DXT1 encoding] Original: 1582222 bytes / 28.861 sec Modified: 1468204 bytes / 8.622 sec Improvement: 7.21% (compression ratio) / 70.13% (compression time) [Compressing Kodak set with mipmaps using DXT1 encoding] Original: 2065243 bytes / 36.980 sec Modified: 1914805 bytes / 11.294 sec Improvement: 7.28% (compression ratio) / 69.46% (compression time) ETC Testing: The modified algorithm has been tested on the Kodak test set using 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). 
The ETC1 quantization parameters have been selected in such a way, so that ETC1 compression gives approximately the same average Luma PSNR as the corresponding DXT1 compression (which is equal to 34.044 dB for the Kodak test set compressed without mipmaps using DXT1 encoding and default quality settings). [Compressing Kodak set without mipmaps using ETC1 encoding] Total size: 1607858 bytes Total time: 15.529 sec Average bitrate: 1.363 bpp Average Luma PSNR: 34.050 dB	2017-10-13 17:20:31 +02:00
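The AABB bound described in commit a14a313361 can be illustrated for a single color component. This is a minimal sketch under stated assumptions, not the Crunch implementation: the names `Sample`, `PlaneSums`, `expand_level`, `precompute_plane_sums`, `aabb_bound` and `direct_bound` are hypothetical, and prefix sums over a 256-entry histogram stand in for the "dynamic programming" precomputation mentioned in the message. The full RGB bound would add the lower and upper terms of all three components, which gives the 6 precomputed values.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical types for illustration; not taken from the Crunch sources.
struct Sample {
  uint8_t value;    // one 8-bit color component of a unique source pixel
  uint32_t weight;  // number of source pixels sharing this value
};

struct PlaneSums {
  std::vector<int64_t> below;  // below[k]: sum of w * (plane_k - p)^2 over p < plane_k
  std::vector<int64_t> above;  // above[k]: sum of w * (p - plane_k)^2 over p > plane_k
};

// Expand a 5-bit or 6-bit level to 8 bits by bit replication (RGB565 style).
inline int expand_level(int level, int bits) {
  return (level << (8 - bits)) | (level >> (2 * bits - 8));
}

// One pass over the samples plus one pass over the 256 possible values.
PlaneSums precompute_plane_sums(const std::vector<Sample>& samples, int bits) {
  std::array<int64_t, 256> hist{};
  for (const Sample& s : samples) hist[s.value] += s.weight;

  // Prefix sums: entry v covers all component values strictly less than v.
  std::array<int64_t, 257> w{}, wp{}, wpp{};
  for (int v = 0; v < 256; ++v) {
    w[v + 1] = w[v] + hist[v];
    wp[v + 1] = wp[v] + hist[v] * v;
    wpp[v + 1] = wpp[v] + hist[v] * v * v;
  }

  const int levels = 1 << bits;
  PlaneSums out;
  out.below.resize(levels);
  out.above.resize(levels);
  for (int k = 0; k < levels; ++k) {
    const int64_t p = expand_level(k, bits);
    // sum(w * (p - x)^2) = sum(w) * p^2 - 2 * p * sum(w * x) + sum(w * x^2)
    out.below[k] = w[p] * p * p - 2 * p * wp[p] + wpp[p];
    const int64_t aw = w[256] - w[p + 1];
    const int64_t awp = wp[256] - wp[p + 1];
    const int64_t awpp = wpp[256] - wpp[p + 1];
    out.above[k] = aw * p * p - 2 * p * awp + awpp;
  }
  return out;
}

// Constant-time bound for one component of an AABB spanning levels [lo, hi].
int64_t aabb_bound(const PlaneSums& sums, int lo, int hi) {
  return sums.below[lo] + sums.above[hi];
}

// Reference computation that visits every sample, used only for checking.
int64_t direct_bound(const std::vector<Sample>& samples, int lo, int hi, int bits) {
  const int64_t plo = expand_level(lo, bits);
  const int64_t phi = expand_level(hi, bits);
  int64_t total = 0;
  for (const Sample& s : samples) {
    int64_t d = 0;
    if (s.value < plo) d = plo - s.value;
    else if (s.value > phi) d = s.value - phi;
    total += s.weight * d * d;
  }
  return total;
}
```

A trial solution would be skipped when the sum of the three per-component bounds is already no better than the best error found so far.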
Alexander Suvorov	205829da99	Optimize DXT color endpoint solution evaluation This change improves the compression speed for DXT encoding. Explanation: In order to evaluate an endpoint solution, it is necessary to compute the sum of the squared distances from the source pixels to their nearest block colors, defined by the evaluated endpoint solution. Considering that we are looking for a solution with a minimal sum, the computation can be stopped as soon as the current sum is greater than or equal to the previously found best sum. An interesting observation here is that the performance improvement achieved by such an early-out approach depends on the order in which the source pixels are processed. It makes sense to process the pixels with the highest introduced errors first, as this significantly increases the chances of exiting the computation earlier. On the one hand, equal source pixels are grouped together, so the computed distance from each unique source pixel is multiplied by its weight. For this reason it makes sense to first process the pixels with the highest weights, as their errors have the highest multipliers. On the other hand, the pixels which project onto the middle part of the endpoint interval have a higher chance of being close to one of the block colors. For this reason it makes sense to first process the pixels whose projections are the most distant from the middle of the endpoint interval, as those pixels will normally introduce the highest errors. In order to combine those two aspects, it is proposed to sort the pixels according to the product of the weight and the distance from the projected pixel to the center of the endpoint interval. Of course, reordering the pixels on each iteration would be very expensive and is not considered. However, there is a high chance that most endpoint intervals will be aligned similarly to the principal axis, as well as have their centers close to the mean color. 
As soon as the principal axis is computed, it can be used for approximation of all the future endpoint intervals. So the projection and reordering of the source pixels are performed only once. Two approaches have been considered. In the first approach, the pixels have been sorted by the product of the weight and the absolute distance in decreasing order. In the second approach, the pixels have been sorted by the product of the weight and the signed distance, and then interleaved starting from the opposite sides of the ordered sequence. When tested on the Kodak image set, the interleaving approach shows better results. Additional optimization: perceptual and uniform versions of the evaluation function are now implemented separately, which slightly improves the performance. DXT Testing: The modified algorithm has been tested on the Kodak test set using 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). All the decompressed test images are identical to the images being compressed and decompressed using original version of Crunch (revision `ea9b8d8`). [Compressing Kodak set without mipmaps using DXT1 encoding] Original: 1582222 bytes / 28.845 sec Modified: 1468204 bytes / 10.071 sec Improvement: 7.21% (compression ratio) / 65.09% (compression time) [Compressing Kodak set with mipmaps using DXT1 encoding] Original: 2065243 bytes / 36.929 sec Modified: 1914805 bytes / 13.248 sec Improvement: 7.28% (compression ratio) / 64.13% (compression time) ETC Testing: The modified algorithm has been tested on the Kodak test set using 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). The ETC1 quantization parameters have been selected in such a way, so that ETC1 compression gives approximately the same average Luma PSNR as the corresponding DXT1 compression (which is equal to 34.044 dB for the Kodak test set compressed without mipmaps using DXT1 encoding and default quality settings). 
[Compressing Kodak set without mipmaps using ETC1 encoding] Total size: 1607858 bytes Total time: 17.126 sec Average bitrate: 1.363 bpp Average Luma PSNR: 34.050 dB	2017-10-04 17:33:32 +02:00
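The ordering and early-out idea from commit 205829da99 can be sketched in one dimension. The code below is an illustrative assumption rather than the Crunch code: `Sample`, `interleaved_order` and `evaluate_with_early_out` are hypothetical names, the pixel projection onto the principal axis is replaced by a scalar `value`, and the interleaving is assumed to start from the high end of the sorted sequence (the commit message does not say which side comes first).

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <numeric>
#include <vector>

// Hypothetical type: a unique source pixel already projected onto the axis.
struct Sample {
  int value;        // scalar projection of the pixel
  uint32_t weight;  // number of equal source pixels
};

// Sort by weight * signed distance from the center, then interleave the
// two ends of the sorted sequence so that extreme samples come first.
std::vector<size_t> interleaved_order(const std::vector<Sample>& samples, int center) {
  std::vector<size_t> idx(samples.size());
  std::iota(idx.begin(), idx.end(), size_t{0});
  auto key = [&](size_t i) {
    return static_cast<int64_t>(samples[i].weight) * (samples[i].value - center);
  };
  std::stable_sort(idx.begin(), idx.end(),
                   [&](size_t a, size_t b) { return key(a) < key(b); });

  std::vector<size_t> order;
  order.reserve(idx.size());
  size_t lo = 0, hi = idx.size();
  while (lo < hi) {
    order.push_back(idx[--hi]);               // assumed: high end first
    if (lo < hi) order.push_back(idx[lo++]);  // then the low end
  }
  return order;
}

// Accumulates the weighted squared error in the given order and stops as
// soon as the partial sum reaches the best error found so far.
bool evaluate_with_early_out(const std::vector<Sample>& samples,
                             const std::vector<size_t>& order,
                             const std::array<int, 4>& block_colors,
                             uint64_t best_error, uint64_t& error_out) {
  uint64_t error = 0;
  for (size_t i : order) {
    int nearest = std::abs(samples[i].value - block_colors[0]);
    for (int c = 1; c < 4; ++c)
      nearest = std::min(nearest, std::abs(samples[i].value - block_colors[c]));
    error += static_cast<uint64_t>(samples[i].weight) * nearest * nearest;
    if (error >= best_error) return false;  // cannot beat the best solution
  }
  error_out = error;
  return true;
}
```

The order is computed once per cluster and reused for every trial solution, which matches the commit's point that re-sorting on each iteration would be too expensive.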
Alexander Suvorov	4bd4355683	Optimize DXT endpoints computation This change improves the compression speed for DXT encoding. Explanation: When performing per-component endpoint optimization, it is not necessary to go through all the source pixels on every iteration in order to calculate the total weighted squared error for a specific trial endpoint. The computation can be optimized in the following way: sum(w(i) * (x - p(i)) * (x - p(i))) = sum(w(i) * x * x) - sum(w(i) * 2 * x * p(i)) + sum(w(i) * p(i) * p(i)) = sum(w(i)) * x * x - sum(2 * w(i) * p(i)) * x + sum(w(i) * p(i) * p(i)) The values of sum(w(i)), sum(2 * w(i) * p(i)), sum(w(i) * p(i) * p(i)) can be precalculated for each of 4 selectors, and only have to be updated when the solution improves. This way the error computation on each iteration can be performed using 12 multiplications instead of 2 * N (where N is the number of pixels in the processed cluster). DXT Testing: The modified algorithm has been tested on the Kodak test set using 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). All the decompressed test images are identical to the images being compressed and decompressed using original version of Crunch (revision `ea9b8d8`). [Compressing Kodak set without mipmaps using DXT1 encoding] Original: 1582222 bytes / 28.811 sec Modified: 1468204 bytes / 10.520 sec Improvement: 7.21% (compression ratio) / 63.49% (compression time) [Compressing Kodak set with mipmaps using DXT1 encoding] Original: 2065243 bytes / 36.936 sec Modified: 1914805 bytes / 13.902 sec Improvement: 7.28% (compression ratio) / 62.36% (compression time) ETC Testing: The modified algorithm has been tested on the Kodak test set using 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). 
The ETC1 quantization parameters have been selected in such a way, so that ETC1 compression gives approximately the same average Luma PSNR as the corresponding DXT1 compression (which is equal to 34.044 dB for the Kodak test set compressed without mipmaps using DXT1 encoding and default quality settings). [Compressing Kodak set without mipmaps using ETC1 encoding] Total size: 1607858 bytes Total time: 17.121 sec Average bitrate: 1.363 bpp Average Luma PSNR: 34.050 dB	2017-09-13 14:10:00 +02:00
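The identity used in commit 4bd4355683 can be checked with a short sketch. The names `WeightedSample`, `ErrorSums`, `accumulate`, `fast_error` and `direct_error` are hypothetical and only illustrate the expansion for a single selector; per the commit message, Crunch keeps such sums for each of the 4 selectors and updates them when the solution improves.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical type: one color component value p(i) with its weight w(i).
struct WeightedSample {
  int64_t value;
  int64_t weight;
};

// The three sums from the commit message, computed once per selector.
struct ErrorSums {
  int64_t w = 0;    // sum(w(i))
  int64_t wp2 = 0;  // sum(2 * w(i) * p(i))
  int64_t wpp = 0;  // sum(w(i) * p(i) * p(i))
};

ErrorSums accumulate(const std::vector<WeightedSample>& samples) {
  ErrorSums s;
  for (const WeightedSample& x : samples) {
    s.w += x.weight;
    s.wp2 += 2 * x.weight * x.value;
    s.wpp += x.weight * x.value * x.value;
  }
  return s;
}

// Error for trial value x in constant time: 3 multiplications per selector.
int64_t fast_error(const ErrorSums& s, int64_t x) {
  return s.w * x * x - s.wp2 * x + s.wpp;
}

// Reference: sum(w(i) * (x - p(i)) * (x - p(i))) over all samples.
int64_t direct_error(const std::vector<WeightedSample>& samples, int64_t x) {
  int64_t total = 0;
  for (const WeightedSample& s : samples)
    total += s.weight * (x - s.value) * (x - s.value);
  return total;
}
```

With 4 selectors, evaluating one trial value costs 4 * 3 = 12 multiplications regardless of the number of pixels, which is where the figure in the commit message comes from.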
Alexander Suvorov	3053c9dd93	Optimize DXT endpoints computation This change improves the compression speed for DXT encoding. Explanation: The main ideas used for the DXT endpoints computation optimization: - Instead of using map in tree clusterizer, the source vectors can be stored in an array and sorted before the quantization. This might increase the amount of used memory, but is much more efficient in terms of memory reallocation. - Endpoint caching can be used throughout the color endpoint computation, and not just within the optimize_endpoints function. The only place where endpoint caching can not be used is the final step of the try_combinatorial_encoding function, where alternate rounding is used. - When computing endpoint codebooks, endpoint optimizer and endpoint refiner can be reused, which eliminates unnecessary memory reallocations. DXT Testing: The modified algorithm has been tested on the Kodak test set using 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). All the decompressed test images are identical to the images being compressed and decompressed using original version of Crunch (revision `ea9b8d8`). [Compressing Kodak set without mipmaps using DXT1 encoding] Original: 1582222 bytes / 28.879 sec Modified: 1468204 bytes / 11.099 sec Improvement: 7.21% (compression ratio) / 61.57% (compression time) [Compressing Kodak set with mipmaps using DXT1 encoding] Original: 2065243 bytes / 36.919 sec Modified: 1914805 bytes / 14.621 sec Improvement: 7.28% (compression ratio) / 60.40% (compression time) ETC Testing: The modified algorithm has been tested on the Kodak test set using 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). 
The ETC1 quantization parameters have been selected in such a way, so that ETC1 compression gives approximately the same average Luma PSNR as the corresponding DXT1 compression (which is equal to 34.044 dB for the Kodak test set compressed without mipmaps using DXT1 encoding and default quality settings). [Compressing Kodak set without mipmaps using ETC1 encoding] Total size: 1607858 bytes Total time: 17.108 sec Average bitrate: 1.363 bpp Average Luma PSNR: 34.050 dB	2017-09-12 13:03:56 +02:00
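The first idea in commit 3053c9dd93, replacing a map with a sorted array, can be sketched as follows. This is an assumption-laden illustration, not the tree clusterizer itself: `group_equal_vectors` is a hypothetical helper, and source vectors are represented as packed 32-bit values for brevity.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Collapses equal source vectors into (value, weight) pairs by sorting a
// flat array and run-length grouping it, instead of inserting each vector
// into a std::map (one node allocation per unique key).
std::vector<std::pair<uint32_t, uint32_t>> group_equal_vectors(std::vector<uint32_t> packed) {
  std::sort(packed.begin(), packed.end());
  std::vector<std::pair<uint32_t, uint32_t>> grouped;
  for (uint32_t v : packed) {
    if (!grouped.empty() && grouped.back().first == v)
      ++grouped.back().second;      // same vector again: bump its weight
    else
      grouped.emplace_back(v, 1u);  // first occurrence of this vector
  }
  return grouped;
}
```

The array holds every input vector, including duplicates, so peak memory can be higher than with a map, which is the trade-off the commit message mentions.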
Alexander Suvorov	3e12aff909	Fix miscellaneous compiler warnings DXT Testing: The modified algorithm has been tested on the Kodak test set using 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). All the decompressed test images are identical to the images being compressed and decompressed using original version of Crunch (revision `ea9b8d8`). [Compressing Kodak set without mipmaps using DXT1 encoding] Original: 1582222 bytes / 28.866 sec Modified: 1468204 bytes / 11.858 sec Improvement: 7.21% (compression ratio) / 58.92% (compression time) [Compressing Kodak set with mipmaps using DXT1 encoding] Original: 2065243 bytes / 36.878 sec Modified: 1914805 bytes / 15.625 sec Improvement: 7.28% (compression ratio) / 57.63% (compression time) ETC Testing: The modified algorithm has been tested on the Kodak test set using 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). The ETC1 quantization parameters have been selected in such a way, so that ETC1 compression gives approximately the same average Luma PSNR as the corresponding DXT1 compression (which is equal to 34.044 dB for the Kodak test set compressed without mipmaps using DXT1 encoding and default quality settings). [Compressing Kodak set without mipmaps using ETC1 encoding] Total size: 1607858 bytes Total time: 17.181 sec Average bitrate: 1.363 bpp Average Luma PSNR: 34.050 dB	2017-09-11 13:52:21 +02:00
Alexander Suvorov	6b3172f793	Optimize DXT color endpoints computation This change significantly improves the compression speed for DXT encoding. Explanation: The main ideas used for the DXT color endpoints computation optimization: - When the DXT endpoint computation function is called from the quantization algorithm, almost all of its input parameters (except the color metrics) are hardcoded in the quantization code. This makes it possible to optimize the endpoint evaluation function (which is the bottleneck of the endpoint computation algorithm) for this specific set of parameters. - In the original version of the evaluation function, selectors are computed each time a new endpoint is evaluated. In fact, this is not necessary, because some selector values are never used, so they can be computed lazily, based on the previously determined optimal endpoint values. This approach significantly reduces the amount of computation. Other improvements: - The original version of Crunch has a minor bug: the counter for the cached endpoint values does not get initialized. This results in nondeterministic DXT conversion of large textures, as the counter overflow can occur at a random moment. The issue is now fixed in the current branch. DXT Testing: The modified algorithm has been tested on the Kodak test set using 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). All the decompressed test images are identical to the images being compressed and decompressed using original version of Crunch (revision `ea9b8d8`). 
[Compressing Kodak set without mipmaps using DXT1 encoding] Original: 1582222 bytes / 28.893 sec Modified: 1468204 bytes / 11.882 sec Improvement: 7.21% (compression ratio) / 58.88% (compression time) [Compressing Kodak set with mipmaps using DXT1 encoding] Original: 2065243 bytes / 36.946 sec Modified: 1914805 bytes / 15.628 sec Improvement: 7.28% (compression ratio) / 57.70% (compression time) ETC Testing: The modified algorithm has been tested on the Kodak test set using 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). The ETC1 quantization parameters have been selected in such a way, so that ETC1 compression gives approximately the same average Luma PSNR as the corresponding DXT1 compression (which is equal to 34.044 dB for the Kodak test set compressed without mipmaps using DXT1 encoding and default quality settings). [Compressing Kodak set without mipmaps using ETC1 encoding] Total size: 1607858 bytes Total time: 17.352 sec Average bitrate: 1.363 bpp Average Luma PSNR: 34.050 dB	2017-08-11 13:12:44 +02:00
Alexander Suvorov	b8349dfac8	Use block encoding to store intermediate selectors after endpoint quantization This change simplifies further modification of the code. Explanation: This change is required for further optimization of the quantization code. Testing: The modified algorithm has been tested on the Kodak test set using 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). All the decompressed test images are identical to the images being compressed and decompressed using original version of Crunch. [Compressing Kodak set without mipmaps] Original: 1582222 bytes / 28.935 sec Modified: 1494501 bytes / 24.528 sec Improvement: 5.54% (compression ratio) / 15.23% (compression time) [Compressing Kodak set with mipmaps] Original: 2065243 bytes / 36.982 sec Modified: 1945365 bytes / 32.308 sec Improvement: 5.80% (compression ratio) / 12.64% (compression time)	2017-05-18 13:44:04 +02:00
Alexander Suvorov	7c02055d05	Reformat the source files. The source files have been reformatted using: clang-format.exe -style="{BasedOnStyle: Google, AllowAllParametersOfDeclarationOnNextLine: false, AllowShortFunctionsOnASingleLine: Inline, AllowShortIfStatementsOnASingleLine: false, AllowShortLoopsOnASingleLine: false, ColumnLimit: 0, DerivePointerAlignment: false, SortIncludes: false}"	2017-04-26 11:41:07 +02:00
Rich Geldreich	91fbf1fcc4	Fixing integer overflow problem, which can rarely cause serious artifacts.	2015-11-19 18:55:22 -08:00
richgel99@gmail.com	f71b49be60	Initial checkin of v1.04 - KTX file format support, basic ETC1 compression/decompression, Linux makefile with proper gcc options, lots of high-level improvements to get crnlib into a state where I can more easily add additional formats.	2012-11-25 08:41:25 +00:00
richgel99@gmail.com	f63e26aee6	v1.03 prerelease - Full Linux port of crnlib/crunch, in progress - still more testing to do, and some cmd line options (such as -timestamp) don't work under linux yet, but the core stuff (compression/decompression/transcoding) should work fine and performance under Linux is comparable to Windows. The 3 examples haven't been ported yet.	2012-04-26 07:14:21 +00:00
richgel99@gmail.com	9f98ea7e22		2011-12-27 21:18:07 +00:00

13 Commits