Operation time or clock cycles

So I am making a game and I want to push performance to the limit. That's why I really want to know how many clock cycles every operation takes - casts, memory allocations, EVERYTHING. Or at least approximate time ratios between them, anything like that.

I tried doing it myself: I created a timer based on clock-cycle counting, measured the time of an empty loop and then of the same loop with various operations inside, but the results were extremely inconsistent and confusing: the empty loop would sometimes take more time than the same loop with an addition in it, the times would vary greatly from run to run, and so on. I guess it's because of background processes using up some of the CPU...
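
For reference, what I tried looked roughly like this (a simplified sketch - my real version read the cycle counter instead of std::chrono, and the loop counts here are made up):

#include <chrono>
#include <cstdio>

int main() {
    using std::chrono::steady_clock;
    using std::chrono::duration_cast;
    using std::chrono::microseconds;

    volatile int x = 0;    // volatile so the compiler keeps the addition

    auto t0 = steady_clock::now();
    for (int i = 0; i < 100000000; ++i) { /* empty */ }
    auto t1 = steady_clock::now();
    for (int i = 0; i < 100000000; ++i) { x = x + 1; }    // same loop, with an addition
    auto t2 = steady_clock::now();

    std::printf("empty: %lld us, with addition: %lld us\n",
                (long long)duration_cast<microseconds>(t1 - t0).count(),
                (long long)duration_cast<microseconds>(t2 - t1).count());
}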

Since I didn't manage to find anything on the internet, I guess there might be something I'm missing: maybe it depends on the processor?...

Anyways, if anyone could at least point me in the right direction I'd be grateful.
Here's some advice you may not want to hear.
Judging by your other threads, I say don't worry about it at this point.

Here's a path I recommend:

1.) Learn
2.) Do fun things
3.) Learn some more
4.) Write your ambitious project
5.) Optimize

It's very likely that, through the learning process, you'll change directions/ambitions/project ideas several times anyways.
Thanks xismn, I must say you have a point. Right now I'll do just that, but still, in the future I'll want to optimize. And I'm sure I'll get there, because the current project is pretty simple - not the one I discussed in other threads.

Plus, I'm puzzled by questions like "should I pass this argument from somewhere far away by pointer, or should I just do a simple calculation to recompute it?". I run into this kind of thing everywhere, so I might as well start writing efficient code right away - it will save a lot of time later.
The system needs time to read the clock-cycle counter, and your thread can also be pre-empted by the scheduler just before it reads the current CPU time stamp. If you want a true speed check you would need to count clock cycles per assembly instruction, but even that would not be 100% accurate because of pipelining and branch (mis)prediction.
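
For example, reading the time-stamp counter directly looks roughly like this (a minimal sketch for x86 with GCC/Clang; MSVC exposes the same intrinsic via <intrin.h>, and the loop body is just a placeholder):

#include <cstdint>
#include <cstdio>
#include <x86intrin.h>   // __rdtsc on GCC/Clang; MSVC has it in <intrin.h>

int main() {
    volatile int x = 0;

    uint64_t start = __rdtsc();               // read the time-stamp counter
    for (int i = 0; i < 1000; ++i) x = x + 1;
    uint64_t stop = __rdtsc();

    // The difference includes the cost of __rdtsc itself, any reordering
    // the CPU does around it, and whatever the OS did in between.
    std::printf("elapsed cycles (approx): %llu\n",
                (unsigned long long)(stop - start));
}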

With PC (x86/64) programming you can skip this kind of optimizing for now if you are a beginner, since it is mostly irrelevant at current hardware speeds. It's the other way around with microcontroller architectures (like ARM Cortex-M0 to M4).

If you still want to read more about it, here are some of the best free tutorials on the topic:
http://www.agner.org/optimize/
I agree with xismn that you really don't need to concern yourself with these matters just yet, so this is just FYI:

For the purposes of optimization, taking a simple operation, putting it in a long loop, and timing the loop is useless. It doesn't tell you how expensive that operation is in your actual program, just how fast the computer can perform it in isolation. It's useless for two reasons:
1. Compilers are often very clever, and can do rather unexpected things to your code. That one line you're so worried about might not even exist in the generated executable (see the sketch after the example below).
2. Modern CPUs are complex beasts. Because of pipelining and caching, the performance of two copies of the same sequence of instructions may be different depending on what was previously executed. So, in this example:
void f();    // some other work done before g()
void g();    // the operation we care about

void a(){
    for (/*...*/){
        g();
    }
}

void b(){
    f();
    g();
}
timing a() may not tell you anything about how long b() will take.
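
To illustrate point 1, here's a minimal sketch (what actually happens depends on your compiler and optimization level):

#include <chrono>
#include <cstdio>

int main() {
    using std::chrono::steady_clock;
    using std::chrono::duration_cast;
    using std::chrono::microseconds;

    auto start = steady_clock::now();

    long long sum = 0;
    for (int i = 0; i < 100000000; ++i)
        sum += i;    // the result is never used...

    auto stop = steady_clock::now();

    // ...so an optimizing compiler is allowed to throw the whole loop away,
    // in which case this prints a time near zero regardless of what the loop "does".
    std::printf("%lld us\n",
                (long long)duration_cast<microseconds>(stop - start).count());
}
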
When optimizing, one normally uses a profiler. A profiler is a tool that's capable of measuring the performance of a program at various levels of precision (function, statement, instruction, etc.) without modifying the source code.

Now, timing isn't completely useless, either. If you want to optimize a complex operation (e.g. compressing a file), timing the operation as a whole is useful to know if your changes are overall improvements or not.
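
That kind of whole-operation timing can be as simple as this sketch (compress_file is just a hypothetical stand-in for whatever operation you're trying to improve):

#include <chrono>
#include <cstdio>

// Hypothetical stand-in for the complex operation being optimized,
// e.g. compressing a file.
void compress_file() { /* ... */ }

int main() {
    auto start = std::chrono::steady_clock::now();
    compress_file();
    auto stop = std::chrono::steady_clock::now();

    auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
    std::printf("whole operation: %lld ms\n", (long long)ms);
}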