Optimization Turns Out to Be Pessimization!

Benchmark is hosted at Github

I have doing this optimization in my hobby OpenGL projects for years. My only sin is I did not measure their performance. I just assumed the one with less operation must be the faster one. Whenever a programmer wants to do get a fractional value of a maximum integer value, he/she cast the numerator and denominator to float before doing division and get the product of the division result and the max value and then cast back the result to integer. See below.

// With float conversion
int value = (int)(((float)numerator / denominator) * max_value);

Being the smart ass I am, I get the product of max value and numerator before divided by denominator. In that way, I can get away with casting to float and back. but with this method, care must be taken that product of max value and numerator must never exceed the maximum limit of integer!

// Without float conversion
int value = max_value * numerator / denominator;

To my dismay, I discovered today that the one with float conversion is the faster one. The culprit could be integer division is several times slower than float division. See Floating Point and Integer Arithmetic Benchmark.

It turns out that it is also the faster than a constant literal divisor.

// Without float conversion with constant divisor
int value = max_value * numerator / 1000;

The benchmark result is as follows.

C# 7, .NET Framework 4.6.1
===========================
                             With float conversion timing: 1827ms
                          Without float conversion timing: 2326ms
    Without float conversion with constant divisor timing: 1813ms

G++ 7.4.0, -O3
===========================
                             With float conversion timing:  298ms
                          Without float conversion timing: 2067ms
    Without float conversion with constant divisor timing:  869ms

Clang++ 6.0.0, -O3
===========================
                             With float conversion timing:  273ms
                          Without float conversion timing: 1169ms
    Without float conversion with constant divisor timing:  470ms

VC++, update 16.4, /Ox
===========================
                             With float conversion timing:  951ms
                          Without float conversion timing: 2087ms
    Without float conversion with constant divisor timing: 1022ms

The lesson here is to always measure, measure and measure!

As a point of interest, Windows SDK provides a MulDiv() that closely mirrors the assembly instructions by multiplying two 32-bit values and then divides the 64-bit result by a third 32-bit value.

int MulDiv(
  int nNumber,
  int nNumerator,
  int nDenominator
);

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: