Optimize DXT color endpoint solution evaluation
This change improves the compression speed for DXT encoding.
Explanation:
Evaluating an endpoint solution requires computing the sum of the squared distances from the source pixels to their nearest block colors, as defined by that solution. Because we are looking for the solution with the minimal sum, the computation can stop as soon as the running sum becomes greater than or equal to the best sum found so far. An interesting observation is that the speedup achieved by this early-out approach depends on the order in which the source pixels are processed. It makes sense to process the pixels that introduce the highest errors first, as this significantly increases the chance of exiting the computation early.
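The early-out evaluation can be sketched as follows. This is a minimal illustration rather than the Crunch implementation: the names Rgb, WeightedColor, squared_distance, and evaluate_candidate are hypothetical, and a plain squared RGB distance stands in for the library's color metric.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <vector>

// Hypothetical types for illustration only.
struct Rgb {
  int r, g, b;
};

// A unique source color plus the number of identical pixels it represents.
struct WeightedColor {
  Rgb color;
  uint32_t weight;
};

// Squared Euclidean distance between two RGB colors (uniform metric).
static uint32_t squared_distance(const Rgb& a, const Rgb& b) {
  const int dr = a.r - b.r, dg = a.g - b.g, db = a.b - b.b;
  return static_cast<uint32_t>(dr * dr + dg * dg + db * db);
}

// Returns true and updates best_error only if the candidate block colors give
// a strictly smaller weighted error than the best one found so far.
bool evaluate_candidate(const std::vector<WeightedColor>& pixels,
                        const std::array<Rgb, 4>& block_colors,
                        uint64_t& best_error) {
  uint64_t error = 0;
  for (const WeightedColor& p : pixels) {
    uint32_t nearest = squared_distance(p.color, block_colors[0]);
    for (int i = 1; i < 4; i++)
      nearest = std::min(nearest, squared_distance(p.color, block_colors[i]));
    error += static_cast<uint64_t>(nearest) * p.weight;
    if (error >= best_error)
      break;  // Early out: this candidate can no longer win.
  }
  if (error >= best_error)
    return false;
  best_error = error;
  return true;
}
```

The earlier the large contributions are added to the running sum, the sooner the break is reached for losing candidates, which is why the processing order matters.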
On the one hand, equal source pixels are grouped together, so the distance computed for each unique source pixel is multiplied by its weight. It therefore makes sense to process the pixels with the highest weights first, as their errors have the highest multipliers. On the other hand, pixels that project onto the middle part of the endpoint interval are more likely to be close to one of the block colors. It therefore makes sense to first process the pixels whose projections are farthest from the middle of the endpoint interval, as those pixels will normally introduce the highest errors. To combine these two aspects, it is proposed to sort the pixels by the product of the weight and the distance from the projected pixel to the center of the endpoint interval.
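A minimal sketch of such a sort key is shown below. The names Rgb, WeightedColor, signed_projection, and sort_key are hypothetical, and the axis and center are passed in as plain integer vectors, which is an assumption made for illustration.

```cpp
#include <cstdint>

// Hypothetical types for illustration only.
struct Rgb {
  int r, g, b;
};

struct WeightedColor {
  Rgb color;
  uint32_t weight;
};

// Signed offset of `color` from `center`, measured along `axis`. The result is
// scaled by the axis length, which does not change the relative order.
int64_t signed_projection(const Rgb& color, const Rgb& axis, const Rgb& center) {
  return static_cast<int64_t>(axis.r) * (color.r - center.r) +
         static_cast<int64_t>(axis.g) * (color.g - center.g) +
         static_cast<int64_t>(axis.b) * (color.b - center.b);
}

// Sort key: the weight multiplied by the signed distance from the center.
int64_t sort_key(const WeightedColor& pixel, const Rgb& axis, const Rgb& center) {
  return signed_projection(pixel.color, axis, center) *
         static_cast<int64_t>(pixel.weight);
}
```

Keys with a large magnitude belong to heavily weighted pixels far from the center of the interval, which are the ones expected to contribute the most error.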
Of course, reordering the pixels on each iteration would be very expensive and is not considered. However, there is a high chance that most endpoint intervals will be aligned similarly to the principal axis and will have their centers close to the mean color. Once the principal axis is computed, it can be used to approximate all future endpoint intervals, so the projection and reordering of the source pixels is performed only once.
Two approaches were considered. In the first approach, the pixels were sorted in decreasing order by the product of the weight and the absolute distance. In the second approach, the pixels were sorted by the product of the weight and the signed distance, and then interleaved starting from the opposite ends of the ordered sequence. When tested on the Kodak image set, the interleaving approach shows better results.
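The interleaving in the second approach can be sketched as follows. The function name interleave_from_ends is hypothetical, and the sketch reorders bare keys rather than pixels to stay short. The index expression mirrors the one used in this commit.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sorts the keys in ascending order, then emits them alternately from the
// high end and the low end of the sorted sequence, moving toward the middle.
std::vector<int64_t> interleave_from_ends(std::vector<int64_t> keys) {
  std::sort(keys.begin(), keys.end());
  const std::size_t n = keys.size();
  std::vector<int64_t> out(n);
  for (std::size_t i = 0; i < n; i++)
    out[i] = keys[(i & 1) ? (i >> 1) : (n - 1 - (i >> 1))];
  return out;
}
```

For the keys {-6, 12, 1, -2, 5} this yields 12, -6, 5, -2, 1, so the pixels at both extremes of the axis are visited before those near the center.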
Additional optimization: the perceptual and uniform versions of the evaluation function are now implemented separately, which slightly improves performance.
DXT Testing:
The modified algorithm has been tested on the Kodak test set using a 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). All decompressed test images are identical to the images compressed and decompressed using the original version of Crunch (revision ea9b8d8).
[Compressing Kodak set without mipmaps using DXT1 encoding]
Original: 1582222 bytes / 28.845 sec
Modified: 1468204 bytes / 10.071 sec
Improvement: 7.21% (compression ratio) / 65.09% (compression time)
[Compressing Kodak set with mipmaps using DXT1 encoding]
Original: 2065243 bytes / 36.929 sec
Modified: 1914805 bytes / 13.248 sec
Improvement: 7.28% (compression ratio) / 64.13% (compression time)
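The improvement figures above are consistent with a relative reduction computed against the original value. A small sketch of that arithmetic, using a hypothetical helper name:

```cpp
// Relative reduction, in percent, from `original` to `modified`.
double improvement_percent(double original, double modified) {
  return 100.0 * (original - modified) / original;
}
```

For example, improvement_percent(1582222, 1468204) is about 7.21 and improvement_percent(28.845, 10.071) is about 65.09, matching the run without mipmaps.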
ETC Testing:
The modified algorithm has been tested on the Kodak test set using a 64-bit build with default settings (running on Windows 10, i7-4790, 3.6GHz). The ETC1 quantization parameters were selected so that ETC1 compression gives approximately the same average Luma PSNR as the corresponding DXT1 compression (which is equal to 34.044 dB for the Kodak test set compressed without mipmaps using DXT1 encoding and default quality settings).
[Compressing Kodak set without mipmaps using ETC1 encoding]
Total size: 1607858 bytes
Total time: 17.126 sec
Average bitrate: 1.363 bpp
Average Luma PSNR: 34.050 dB
Binary file not shown.
+60
-15
@@ -30,6 +30,14 @@ static const uint cBetterProbeTableSize = sizeof(g_better_probe_table) / sizeof(
 static const int16 g_uber_probe_table[] = {0, 1, 2, 3, 5, 7, 9, 10, 13, 15, 19, 27, 43, 59, 91};
 static const uint cUberProbeTableSize = sizeof(g_uber_probe_table) / sizeof(g_uber_probe_table[0]);
 
+struct unique_color_projection {
+  unique_color color;
+  int64 projection;
+};
+static struct {
+  bool operator()(unique_color_projection a, unique_color_projection b) const { return a.projection < b.projection; }
+} g_unique_color_projection_sort;
+
 //-----------------------------------------------------------------------------------------------------------------------------------------
 
 dxt1_endpoint_optimizer::dxt1_endpoint_optimizer()
@@ -693,6 +701,20 @@ void dxt1_endpoint_optimizer::optimize_endpoints(vec3F& low_color, vec3F& high_c
   high_color[1] = math::clamp(high_color[1] * 63.0f, 0.0f, 63.0f);
   high_color[2] = math::clamp(high_color[2] * 31.0f, 0.0f, 31.0f);
 
+  int d[3];
+  for (uint c = 0; c < 3; c++)
+    d[c] = math::float_to_int_round((high_color[c] - low_color[c]) * (c == 0 ? m_perceptual ? 16 : 2 : c == 1 ? m_perceptual ? 25 : 1 : 2));
+  crnlib::vector<unique_color_projection> evaluated_color_projections(m_evaluated_colors.size());
+  int64 average_projection = d[0] * (high_color[0] + low_color[0]) * 4 + d[1] * (high_color[1] + low_color[1]) * 2 + d[2] * (high_color[2] + low_color[2]) * 4;
+  for (uint i = 0; i < m_evaluated_colors.size(); i++) {
+    int64 delta = d[0] * m_evaluated_colors[i].m_color[0] + d[1] * m_evaluated_colors[i].m_color[1] + d[2] * m_evaluated_colors[i].m_color[2] - average_projection;
+    evaluated_color_projections[i].projection = delta * m_evaluated_colors[i].m_weight;
+    evaluated_color_projections[i].color = m_evaluated_colors[i];
+  }
+  std::sort(evaluated_color_projections.begin(), evaluated_color_projections.end(), g_unique_color_projection_sort);
+  for (uint i = 0, iEnd = m_evaluated_colors.size(); i < iEnd; i++)
+    m_evaluated_colors[i] = evaluated_color_projections[i & 1 ? i >> 1 : iEnd - 1 - (i >> 1)].color;
+
   for (uint pass = 0; pass < num_passes; pass++) {
     // Now separately sweep or probe the low and high colors along the principle axis, both positively and negatively.
     // This results in two arrays of candidate low/high endpoints. Every unique combination of candidate endpoints is tried as a potential solution.
@@ -1099,7 +1121,7 @@ bool dxt1_endpoint_optimizer::evaluate_solution(const dxt1_solution_coordinates&
     return false;
   }
   if (m_evaluate_hc)
-    return evaluate_solution_hc(coords, alternate_rounding);
+    return m_perceptual ? evaluate_solution_hc_perceptual(coords, alternate_rounding) : evaluate_solution_hc_uniform(coords, alternate_rounding);
   if (m_pParams->m_quality >= cCRNDXTQualityBetter)
     return evaluate_solution_uber(coords, alternate_rounding);
   return evaluate_solution_fast(coords, alternate_rounding);
@@ -1434,27 +1456,49 @@ bool dxt1_endpoint_optimizer::evaluate_solution_fast(const dxt1_solution_coordin
   return false;
 }
 
-bool dxt1_endpoint_optimizer::evaluate_solution_hc(const dxt1_solution_coordinates& coords, bool alternate_rounding) {
+bool dxt1_endpoint_optimizer::evaluate_solution_hc_perceptual(const dxt1_solution_coordinates& coords, bool alternate_rounding) {
   color_quad_u8 c0 = dxt1_block::unpack_color(coords.m_low_color, true);
   color_quad_u8 c1 = dxt1_block::unpack_color(coords.m_high_color, true);
   color_quad_u8 c2((c0.r * 2 + c1.r + alternate_rounding) / 3, (c0.g * 2 + c1.g + alternate_rounding) / 3, (c0.b * 2 + c1.b + alternate_rounding) / 3, 0);
   color_quad_u8 c3((c1.r * 2 + c0.r + alternate_rounding) / 3, (c1.g * 2 + c0.g + alternate_rounding) / 3, (c1.b * 2 + c0.b + alternate_rounding) / 3, 0);
-  unique_color* color = m_unique_colors.get_ptr();
-  uint count = m_unique_colors.size();
   uint64 error = 0;
-  if (m_perceptual) {
-    for (; count; color++, error < m_best_solution.m_error ? count-- : count = 0) {
-      uint e01 = math::minimum(color::color_distance(true, color->m_color, c0, false), color::color_distance(true, color->m_color, c1, false));
-      uint e23 = math::minimum(color::color_distance(true, color->m_color, c2, false), color::color_distance(true, color->m_color, c3, false));
-      error += math::minimum(e01, e23) * (uint64)color->m_weight;
-    }
-  } else {
-    for (; count; color++, error < m_best_solution.m_error ? count-- : count = 0) {
-      uint e01 = math::minimum(color::color_distance(false, color->m_color, c0, false), color::color_distance(false, color->m_color, c1, false));
-      uint e23 = math::minimum(color::color_distance(false, color->m_color, c2, false), color::color_distance(false, color->m_color, c3, false));
-      error += math::minimum(e01, e23) * (uint64)color->m_weight;
-    }
+  unique_color* color = m_evaluated_colors.get_ptr();
+  for (uint count = m_evaluated_colors.size(); count; color++, error < m_best_solution.m_error ? count-- : count = 0) {
+    uint e01 = math::minimum(color::color_distance(true, color->m_color, c0, false), color::color_distance(true, color->m_color, c1, false));
+    uint e23 = math::minimum(color::color_distance(true, color->m_color, c2, false), color::color_distance(true, color->m_color, c3, false));
+    error += math::minimum(e01, e23) * (uint64)color->m_weight;
   }
   if (error >= m_best_solution.m_error)
     return false;
   m_best_solution.m_coords = coords;
+  m_best_solution.m_error = error;
+  m_best_solution.m_alpha_block = false;
+  m_best_solution.m_alternate_rounding = alternate_rounding;
+  m_best_solution.m_enforce_selector = m_best_solution.m_coords.m_low_color == m_best_solution.m_coords.m_high_color;
+  if (m_best_solution.m_enforce_selector) {
+    if ((m_best_solution.m_coords.m_low_color & 31) != 31) {
+      m_best_solution.m_coords.m_low_color++;
+      m_best_solution.m_enforced_selector = 1;
+    } else {
+      m_best_solution.m_coords.m_high_color--;
+      m_best_solution.m_enforced_selector = 0;
+    }
+  }
+  return true;
+}
+
+bool dxt1_endpoint_optimizer::evaluate_solution_hc_uniform(const dxt1_solution_coordinates& coords, bool alternate_rounding) {
+  color_quad_u8 c0 = dxt1_block::unpack_color(coords.m_low_color, true);
+  color_quad_u8 c1 = dxt1_block::unpack_color(coords.m_high_color, true);
+  color_quad_u8 c2((c0.r * 2 + c1.r + alternate_rounding) / 3, (c0.g * 2 + c1.g + alternate_rounding) / 3, (c0.b * 2 + c1.b + alternate_rounding) / 3, 0);
+  color_quad_u8 c3((c1.r * 2 + c0.r + alternate_rounding) / 3, (c1.g * 2 + c0.g + alternate_rounding) / 3, (c1.b * 2 + c0.b + alternate_rounding) / 3, 0);
+  uint64 error = 0;
+  unique_color* color = m_evaluated_colors.get_ptr();
+  for (uint count = m_evaluated_colors.size(); count; color++, error < m_best_solution.m_error ? count-- : count = 0) {
+    uint e01 = math::minimum(color::color_distance(false, color->m_color, c0, false), color::color_distance(false, color->m_color, c1, false));
+    uint e23 = math::minimum(color::color_distance(false, color->m_color, c2, false), color::color_distance(false, color->m_color, c3, false));
+    error += math::minimum(e01, e23) * (uint64)color->m_weight;
+  }
+  if (error >= m_best_solution.m_error)
+    return false;
+  m_best_solution.m_coords = coords;
@@ -1706,6 +1750,7 @@ void dxt1_endpoint_optimizer::compute_internal(const params& p, results& r) {
     }
   }
   m_has_transparent_pixels = m_total_unique_color_weight != m_pParams->m_num_pixels;
+  m_evaluated_colors = m_unique_colors;
 
   if (!m_unique_colors.size()) {
     m_pResults->m_low_color = 0;
+3
-1
@@ -174,6 +174,7 @@ class dxt1_endpoint_optimizer {
   unique_color_hash_map m_unique_color_hash_map;
 
   unique_color_vec m_unique_colors;  // excludes transparent colors!
+  unique_color_vec m_evaluated_colors;
   unique_color_vec m_temp_unique_colors;
 
   uint m_total_unique_color_weight;
@@ -243,7 +244,8 @@ class dxt1_endpoint_optimizer {
   bool evaluate_solution(const dxt1_solution_coordinates& coords, bool alternate_rounding = false);
   bool evaluate_solution_uber(const dxt1_solution_coordinates& coords, bool alternate_rounding);
   bool evaluate_solution_fast(const dxt1_solution_coordinates& coords, bool alternate_rounding);
-  bool evaluate_solution_hc(const dxt1_solution_coordinates& coords, bool alternate_rounding);
+  bool evaluate_solution_hc_perceptual(const dxt1_solution_coordinates& coords, bool alternate_rounding);
+  bool evaluate_solution_hc_uniform(const dxt1_solution_coordinates& coords, bool alternate_rounding);
   void compute_selectors();
   void compute_selectors_hc();
 