It's been a shocker to find out the hard way that complicated expressions with floating-point operands take as much time to evaluate as those with integer operands. Now, is memory access and retrieval faster than either of the two? I'm thinking some results are redundant, so I might as well cache them and reuse them.
It's been a shocker to find out the hard way that complicated expressions with floating-point operands take as much time to evaluate as those with integer operands.
You should expect integer arithmetic to be faster. Your observation that the two run at the same speed on your kit just goes to show how much development has gone into FPUs and code generators.
Now, is memory access and retrieval faster than either of the two?
It depends on the calculation you're comparing it to.
I'm thinking some results are redundant, so I might as well cache them and reuse them.
Again, it depends on what you're doing. Performance tuning always depends on what you're doing.
Generally speaking, it's normal to store the results of expensive computations. But if your kit can do what you need done on the fly, then nuf respect. The rest of us tend to store these things.
It's been a shocker to find out the hard way that complicated expressions with floating-point operands take as much time to evaluate as those with integer operands.
I find this hard to believe.
Could I see the code you used to make the measurement?
On the subject of lookup table efficiency: one time I was optimizing a function, and I had reduced it to its near-minimal expression, save for one thing: there was an integer division I had no way of eliminating. Since its domain was small (both of its operands were in the [0, 255] range), I tried building a lookup table such that a/b == table[(a<<8)+b] (or something to that effect).
The speedup was rather small, but it was still large enough to be measurable.
Memory access is fast enough and some operations are slow enough to make lookup tables worth considering in many situations.
However, an equality check takes only a subtraction and a comparison to zero. Both of these operations are what modern processors are best at. There's no way a lookup table would make them faster.
So, basically, if your hot data set is bigger than the L2 cache, it is only worth caching results of operations that take longer than several hundred CPU cycles each. Caching results of simple operations (comparisons, additions), whether integer or FP, is pointless, because the simple ones take less than one CPU cycle on most modern CPUs.
I was working on a function that does convolution and some local image enhancement in the range [0, 1]. But it is called so often that I thought if I worked in the range [0, 255], i.e. data type Byte, then rescaled at the end, it would definitely be better. The run time was almost the same.
Maybe I didn't make myself clear about how I will be using LUTs. The function I'm using will be called twice with the same parameters almost 80% of the time, so I was thinking I'd cache its return value and find an appropriate way to retrieve it (not a very smart way, though). The index into the LUT would be the two parameters of the function, which are integers.
Thanks all for the help, I'll definitely take a look at the links provided.
I was working on a function that does convolution and some local image enhancement in the range [0, 1]. But it is called so often that I thought if I worked in the range [0, 255], i.e. data type Byte, then rescaled at the end, it would definitely be better. The run time was almost the same.
In my scenario, the operands were already integers, so the lookup table made sense in that case.
You probably wasted too much time in the float-to-integer conversion. That is an operation you should avoid at all costs in high-performance code.
Maybe I didn't make myself clear about how I will be using LUTs. The function I'm using will be called twice with the same parameters almost 80% of the time, so I was thinking I'd cache its return value and find an appropriate way to retrieve it (not a very smart way, though). The index into the LUT would be the two parameters of the function, which are integers.
That's memoization, not lookup. Lookup involves storing in advance a large number of common results, maybe all possible results. Memoization only remembers results that have already been calculated.
There are many possible solutions for your problem. Which one is the best one depends on how exactly your code is structured. Could you post a simplified version?
int main()
{
    char *in, *out;     /* both need to be pointers; "char * in, out;" declares out as a plain char */
    int width, height;  /* image dimensions, set when loading */
    // Load image
    int i, j;
    for (i = 0; i < width*height; i++)
    {
        for (j = 0; j < width*height; j++)
        {
            // ... some code here
            some_function(out, in, i, j);
        }
    }
    // Store out
    return 0;
}
Now, I can't just return an integer variant from some_function, because it involves normalization and whatnot, but I am trying to restrict myself to calling some_function with integer-variant operands. In other words, I am only converting to double on one line of the code: not the line in main where some_function (which, by the way, is inlined) is called, but rather one line within some_function.
Yes, indeed, I could split it in two: let the first function return a value, do something with that value, then pass it (or something else) to the second function.