Optimize color endpoint solution evaluation
This change improves the compression speed for DXT encoding.
Explanation:
To evaluate an endpoint solution, the encoder has to compute the sum of the squared distances from the source pixels to their nearest block colors, as defined by that solution. This computation is comparatively expensive, so it is preceded by a cheaper test: compute the sum of the squared distances from the source pixels to the axis-aligned bounding box (AABB) enclosing all block colors of the evaluated solution (a source pixel inside the AABB contributes a distance of 0). Every block color lies inside the AABB, so a pixel is never closer to its nearest block color than to the AABB, and this sum is a lower bound on the actual error of the solution. If the lower bound is already no smaller than the error of the best solution found so far, the current solution cannot improve on it and does not need to be evaluated.
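The rejection test described above can be sketched directly, before any precomputation is introduced. This is a minimal illustration, not the Crunch implementation: `Rgb`, `aabb_lower_bound` and `can_skip` are hypothetical names, and the pixels are unweighted for brevity.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical pixel type for this sketch (not a Crunch type).
struct Rgb { int r, g, b; };

// Squared distance from x to the interval [lo, hi]; 0 when x lies inside it.
inline uint64_t dist2_to_interval(int x, int lo, int hi) {
  int nearest = std::min(std::max(x, lo), hi);
  int d = x - nearest;
  return static_cast<uint64_t>(d * d);
}

// Sum of squared distances from every pixel to the box [lo, hi].
// Because all block colors lie inside the box, this never exceeds the true error.
uint64_t aabb_lower_bound(const std::vector<Rgb>& pixels, Rgb lo, Rgb hi) {
  assert(lo.r <= hi.r && lo.g <= hi.g && lo.b <= hi.b);
  uint64_t sum = 0;
  for (const Rgb& p : pixels) {
    sum += dist2_to_interval(p.r, lo.r, hi.r);
    sum += dist2_to_interval(p.g, lo.g, hi.g);
    sum += dist2_to_interval(p.b, lo.b, hi.b);
  }
  return sum;
}

// True when the full evaluation can be skipped: the lower bound already
// reaches the error of the best solution found so far.
bool can_skip(const std::vector<Rgb>& pixels, Rgb lo, Rgb hi, uint64_t best_error) {
  return aabb_lower_bound(pixels, lo, hi) >= best_error;
}
```

This direct version costs one pass over the pixels per candidate; the precomputation described next is what removes that per-candidate pass.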
The actual trick here is that, after a one-time precomputation per block, the sum of the squared distances to the AABB of any solution can be computed in constant time. The sums for the individual color components can be computed separately. For each color component the AABB determines 2 planes: the "lower" plane, defined by the lower boundary of the AABB, and the "upper" plane, defined by the upper boundary of the AABB. The sum for each color component consists of two parts: the sum of the squared distances from the lower plane to all the source pixels below the lower plane, and the sum of the squared distances from the upper plane to all the source pixels above the upper plane.
Since the endpoints of the evaluated solution are encoded as RGB565, there are 32 possible planes for the red and blue components, and 64 possible planes for the green component. For each plane it is sufficient to precompute two values: the sum of the squared distances from the plane to all the source pixels "below" this plane, and the sum of the squared distances from the plane to all the source pixels "above" this plane. The total sum of the squared distances from the source pixels to any evaluated AABB is then a sum of 6 precomputed values.
All of these values can be precomputed in linear time with dynamic programming: for a plane at position C, the sum of w * (x - C)^2 over a set of pixels equals sum(w * x^2) - 2 * C * sum(w * x) + C^2 * sum(w), so it is enough to keep running (prefix) sums of the weights, the weighted colors and the weighted squared colors over the planes.
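The precomputation for a single 5-bit component can be sketched as follows. This is a simplified illustration that mirrors the structure of the diff further down but is not the Crunch code itself: `WeightedValue`, `PlaneDist`, `expand5`, `precompute_plane_dists5` and `brute_force_outside` are hypothetical names, and the brute-force function exists only to check the table lookups.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct WeightedValue {
  uint8_t value;    // 8-bit channel value of a unique source color
  uint32_t weight;  // number of pixels sharing that color
};

struct PlaneDist {
  uint64_t low;   // weighted squared distance to all pixels at or below the plane
  uint64_t high;  // weighted squared distance to all pixels above the plane
};

// Expand a 5-bit level to 8 bits by bit replication, as in RGB565 decoding.
inline uint32_t expand5(uint32_t c) { return (c << 3) | (c >> 2); }

std::vector<PlaneDist> precompute_plane_dists5(const std::vector<WeightedValue>& pixels) {
  struct Sums { uint64_t w = 0, wx = 0, wxx = 0; };
  Sums sums[32];
  for (const WeightedValue& p : pixels) {
    uint32_t x = p.value;
    // Index of the first level whose expanded value is >= x.
    uint32_t level = (x >> 3) + ((x & 7) > (x >> 5) ? 1 : 0);
    sums[level].w += p.weight;
    sums[level].wx += static_cast<uint64_t>(p.weight) * x;
    sums[level].wxx += static_cast<uint64_t>(p.weight) * x * x;
  }
  // Prefix sums: sums[c] now covers every pixel at or below plane c.
  for (int c = 1; c < 32; c++) {
    sums[c].w += sums[c - 1].w;
    sums[c].wx += sums[c - 1].wx;
    sums[c].wxx += sums[c - 1].wxx;
  }
  std::vector<PlaneDist> dist(32);
  for (uint32_t c = 0; c < 32; c++) {
    uint64_t C = expand5(c);
    // sum w*(x - C)^2 = sum w*x^2 + C^2 * sum w - 2*C * sum w*x
    dist[c].low = sums[c].wxx + C * C * sums[c].w - 2 * C * sums[c].wx;
    uint64_t total = sums[31].wxx + C * C * sums[31].w - 2 * C * sums[31].wx;
    dist[c].high = total - dist[c].low;
  }
  return dist;
}

// Reference: direct sum over the pixels outside [expand5(lo), expand5(hi)].
uint64_t brute_force_outside(const std::vector<WeightedValue>& pixels, uint32_t lo, uint32_t hi) {
  assert(lo <= hi && hi < 32);
  uint64_t L = expand5(lo), H = expand5(hi), sum = 0;
  for (const WeightedValue& p : pixels) {
    uint64_t x = p.value;
    if (x < L) sum += p.weight * (L - x) * (L - x);
    else if (x > H) sum += p.weight * (x - H) * (x - H);
  }
  return sum;
}
```

With the table in place, the per-component bound for a box with lower level `lo` and upper level `hi` is `dist[lo].low + dist[hi].high`. When the two endpoints are given in arbitrary order, the smaller level takes the `low` role and the larger one the `high` role.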
Note: The AABB check appears to be cheaper than inserting a solution into the hash map, so the AABB check is performed first.
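The ordering in this note can be illustrated with a small filter that runs the cheap bound test before touching the set of tried solutions. This is a sketch under assumptions: `Candidate` and `CandidateFilter` are hypothetical, `std::unordered_set` stands in for Crunch's own hash map, and the lower bound is passed in rather than looked up.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_set>

// Hypothetical candidate: two RGB565 endpoints plus a precomputed AABB lower bound.
struct Candidate {
  uint16_t low565;
  uint16_t high565;
  uint64_t aabb_lower_bound;
};

class CandidateFilter {
 public:
  explicit CandidateFilter(uint64_t best_error) : best_error_(best_error) {}

  // Returns true when the candidate still needs a full evaluation.
  bool should_evaluate(const Candidate& c) {
    // Cheap test first: a few table lookups and additions in the real encoder.
    if (c.aabb_lower_bound >= best_error_)
      return false;
    // Only candidates that pass the bound pay for the hash set insertion.
    uint32_t key = static_cast<uint32_t>(c.low565) | (static_cast<uint32_t>(c.high565) << 16);
    return tried_.insert(key).second;
  }

  std::size_t tried_count() const { return tried_.size(); }

 private:
  uint64_t best_error_;
  std::unordered_set<uint32_t> tried_;
};
```

A side effect of this ordering is that candidates rejected by the bound are never recorded as tried, so the set stays smaller.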
Additional improvements: A few minor adjustments have been made to ensure that texture decompression produces results identical to the original version of Crunch for 32-bit builds as well (the original Crunch library uses different floating point models for 32-bit and 64-bit builds).
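Why intermediate precision matters can be shown with a generic example. It is not taken from Crunch: the function names are illustrative, and the behavior described in the note assumes that `float` arithmetic is evaluated in IEEE single precision, as is typical with SSE on x86-64.

```cpp
#include <vector>

// Sums the inputs in a float accumulator; every addition rounds to single precision.
float sum_in_float(const std::vector<float>& values) {
  float sum = 0.0f;
  for (float v : values) sum += v;
  return sum;
}

// Sums the same inputs in a double accumulator; small terms are not lost as early.
double sum_in_double(const std::vector<float>& values) {
  double sum = 0.0;
  for (float v : values) sum += v;
  return sum;
}
```

For the inputs 16777216, 1, 1 the float accumulator stays at 16777216 on such a target, while the double accumulator reaches 16777218. A build that keeps intermediates at a higher precision may round differently, which is presumably the kind of 32-bit versus 64-bit discrepancy the adjustments in this commit are meant to remove.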
DXT Testing:
The modified algorithm has been tested on the Kodak test set using a 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). All decompressed test images are identical to the images compressed and decompressed with the original version of Crunch (revision ea9b8d8).
[Compressing Kodak set without mipmaps using DXT1 encoding]
Original: 1582222 bytes / 28.861 sec
Modified: 1468204 bytes / 8.622 sec
Improvement: 7.21% (compression ratio) / 70.13% (compression time)
[Compressing Kodak set with mipmaps using DXT1 encoding]
Original: 2065243 bytes / 36.980 sec
Modified: 1914805 bytes / 11.294 sec
Improvement: 7.28% (compression ratio) / 69.46% (compression time)
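The improvement figures above are consistent with a relative reduction, (original - modified) / original, applied to both size and time. A small check, where `percent_reduction` is an illustrative helper and not part of Crunch:

```cpp
#include <cassert>
#include <cmath>

// Relative reduction from `before` to `after`, in percent, rounded to two decimals.
double percent_reduction(double before, double after) {
  assert(before > 0.0);
  double percent = (before - after) / before * 100.0;
  return std::round(percent * 100.0) / 100.0;
}
```

For example, `percent_reduction(1582222, 1468204)` gives 7.21 and `percent_reduction(28.861, 8.622)` gives 70.13, matching the figures reported without mipmaps.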
ETC Testing:
The modified algorithm has been tested on the Kodak test set using a 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). The ETC1 quantization parameters have been selected so that ETC1 compression gives approximately the same average Luma PSNR as the corresponding DXT1 compression (34.044 dB for the Kodak test set compressed without mipmaps using DXT1 encoding and default quality settings).
[Compressing Kodak set without mipmaps using ETC1 encoding]
Total size: 1607858 bytes
Total time: 15.529 sec
Average bitrate: 1.363 bpp
Average Luma PSNR: 34.050 dB
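The reported bitrate can be cross-checked from the total size. The sketch below assumes the standard Kodak set of 24 images at 768x512 pixels each; the image count and dimensions are not stated in this commit, and the names are illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Assumption (not stated in the commit): 24 Kodak images of 768 x 512 pixels each.
constexpr uint64_t kKodakPixels = 24ull * 768ull * 512ull;

// Bits per pixel for a compressed set: total bits divided by total pixels.
double bits_per_pixel(uint64_t total_bytes, uint64_t total_pixels) {
  assert(total_pixels > 0);
  return static_cast<double>(total_bytes) * 8.0 / static_cast<double>(total_pixels);
}
```

Under that assumption, `bits_per_pixel(1607858, kKodakPixels)` comes out at roughly 1.363, in line with the average bitrate listed above.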
Binary file not shown.
@@ -1115,6 +1115,13 @@ bool dxt1_endpoint_optimizer::try_median4(const vec3F& low_color, const vec3F& h
 // Given candidate low/high endpoints, find the optimal selectors for 3 and 4 color blocks, compute the resulting error,
 // and use the candidate if it results in less error than the best found result so far.
 bool dxt1_endpoint_optimizer::evaluate_solution(const dxt1_solution_coordinates& coords, bool alternate_rounding) {
+  color_quad_u8 c0 = dxt1_block::unpack_color(coords.m_low_color, false);
+  color_quad_u8 c1 = dxt1_block::unpack_color(coords.m_high_color, false);
+  uint64 rError = c0.r < c1.r ? m_rDist[c0.r].low + m_rDist[c1.r].high : m_rDist[c0.r].high + m_rDist[c1.r].low;
+  uint64 gError = c0.g < c1.g ? m_gDist[c0.g].low + m_gDist[c1.g].high : m_gDist[c0.g].high + m_gDist[c1.g].low;
+  uint64 bError = c0.b < c1.b ? m_bDist[c0.b].low + m_bDist[c1.b].high : m_bDist[c0.b].high + m_bDist[c1.b].low;
+  if (rError + gError + bError >= m_best_solution.m_error)
+    return false;
   if (!alternate_rounding) {
     solution_hash_map::insert_result solution_res(m_solutions_tried.insert(coords.m_low_color | coords.m_high_color << 16));
     if (!solution_res.second)
@@ -1752,6 +1759,68 @@ void dxt1_endpoint_optimizer::compute_internal(const params& p, results& r) {
   m_has_transparent_pixels = m_total_unique_color_weight != m_pParams->m_num_pixels;
   m_evaluated_colors = m_unique_colors;
 
+  struct {
+    uint64 weight, weightedColor, weightedSquaredColor;
+  } rPlane[32] = {}, gPlane[64] = {}, bPlane[32] = {};
+
+  for (uint i = 0; i < m_unique_colors.size(); i++) {
+    const unique_color& color = m_unique_colors[i];
+    uint8 R = color.m_color.r, r = (R >> 3) + ((R & 7) > (R >> 5) ? 1 : 0);
+    rPlane[r].weight += color.m_weight;
+    rPlane[r].weightedColor += (uint64)color.m_weight * R;
+    rPlane[r].weightedSquaredColor += (uint64)color.m_weight * R * R;
+    uint8 G = color.m_color.g, g = (G >> 2) + ((G & 3) > (G >> 6) ? 1 : 0);
+    gPlane[g].weight += color.m_weight;
+    gPlane[g].weightedColor += (uint64)color.m_weight * G;
+    gPlane[g].weightedSquaredColor += (uint64)color.m_weight * G * G;
+    uint8 B = color.m_color.b, b = (B >> 3) + ((B & 7) > (B >> 5) ? 1 : 0);
+    bPlane[b].weight += color.m_weight;
+    bPlane[b].weightedColor += (uint64)color.m_weight * B;
+    bPlane[b].weightedSquaredColor += (uint64)color.m_weight * B * B;
+  }
+
+  if (m_perceptual) {
+    for (uint c = 0; c < 32; c++) {
+      rPlane[c].weight *= 8;
+      rPlane[c].weightedColor *= 8;
+      rPlane[c].weightedSquaredColor *= 8;
+    }
+    for (uint c = 0; c < 64; c++) {
+      gPlane[c].weight *= 25;
+      gPlane[c].weightedColor *= 25;
+      gPlane[c].weightedSquaredColor *= 25;
+    }
+  }
+
+  for (uint c = 1; c < 32; c++) {
+    rPlane[c].weight += rPlane[c - 1].weight;
+    rPlane[c].weightedColor += rPlane[c - 1].weightedColor;
+    rPlane[c].weightedSquaredColor += rPlane[c - 1].weightedSquaredColor;
+    bPlane[c].weight += bPlane[c - 1].weight;
+    bPlane[c].weightedColor += bPlane[c - 1].weightedColor;
+    bPlane[c].weightedSquaredColor += bPlane[c - 1].weightedSquaredColor;
+  }
+
+  for (uint c = 1; c < 64; c++) {
+    gPlane[c].weight += gPlane[c - 1].weight;
+    gPlane[c].weightedColor += gPlane[c - 1].weightedColor;
+    gPlane[c].weightedSquaredColor += gPlane[c - 1].weightedSquaredColor;
+  }
+
+  for (uint c = 0; c < 32; c++) {
+    uint8 C = c << 3 | c >> 2;
+    m_rDist[c].low = rPlane[c].weightedSquaredColor + C * C * rPlane[c].weight - 2 * C * rPlane[c].weightedColor;
+    m_rDist[c].high = rPlane[31].weightedSquaredColor + C * C * rPlane[31].weight - 2 * C * rPlane[31].weightedColor - m_rDist[c].low;
+    m_bDist[c].low = bPlane[c].weightedSquaredColor + C * C * bPlane[c].weight - 2 * C * bPlane[c].weightedColor;
+    m_bDist[c].high = bPlane[31].weightedSquaredColor + C * C * bPlane[31].weight - 2 * C * bPlane[31].weightedColor - m_bDist[c].low;
+  }
+
+  for (uint c = 0; c < 64; c++) {
+    uint8 C = c << 2 | c >> 4;
+    m_gDist[c].low = gPlane[c].weightedSquaredColor + C * C * gPlane[c].weight - 2 * C * gPlane[c].weightedColor;
+    m_gDist[c].high = gPlane[63].weightedSquaredColor + C * C * gPlane[63].weight - 2 * C * gPlane[63].weightedColor - m_gDist[c].low;
+  }
+
   if (!m_unique_colors.size()) {
     m_pResults->m_low_color = 0;
     m_pResults->m_high_color = 0;
@@ -177,6 +177,10 @@ class dxt1_endpoint_optimizer {
   unique_color_vec m_evaluated_colors;
   unique_color_vec m_temp_unique_colors;
 
+  struct {
+    uint64 low, high;
+  } m_rDist[32], m_gDist[64], m_bDist[32];
+
   uint m_total_unique_color_weight;
 
   bool m_has_transparent_pixels;
@@ -640,16 +640,16 @@ void dxt_hc::determine_color_endpoint_clusters_task(uint64 data, void* pData_ptr
   for (uint i = 0; i < codebook.size(); i++) {
     const vec6F& c = codebook[i];
     float dist = 0;
-    dist += (c[0] - v[0]) * (c[0] - v[0]);
-    dist += (c[1] - v[1]) * (c[1] - v[1]);
+    float d0 = c[0] - v[0]; dist += d0 * d0;
+    float d1 = c[1] - v[1]; dist += d1 * d1;
     if (dist > node_dist)
       continue;
-    dist += (c[2] - v[2]) * (c[2] - v[2]);
-    dist += (c[3] - v[3]) * (c[3] - v[3]);
+    float d2 = c[2] - v[2]; dist += d2 * d2;
+    float d3 = c[3] - v[3]; dist += d3 * d3;
     if (dist > node_dist)
       continue;
-    dist += (c[4] - v[4]) * (c[4] - v[4]);
-    dist += (c[5] - v[5]) * (c[5] - v[5]);
+    float d4 = c[4] - v[4]; dist += d4 * d4;
+    float d5 = c[5] - v[5]; dist += d5 * d5;
     if (dist < best_dist) {
       best_dist = dist;
       best_index = i;
@@ -12,7 +12,7 @@ class tree_clusterizer {
   struct VectorInfo {
     uint index;
     uint weight;
-    float weightedDotProduct;
+    double weightedDotProduct;
   };
 
   void clear() {