different time with same input

Hi everyone,

I wrote code for a reverse nearest neighbor query and ran an experiment to check the performance of my algorithm. The metric is CPU cost in seconds. However, I have a problem: when I run the code with the same input, it gives me different results in CPU cost. The difference is quite high (up to 15 seconds). What is likely to be the problem?

thanks
What's the size of the input? Where does the input come from? Is the algorithm deterministic?
I give 100K users moving every second for 100 seconds. The input is from a file, and the algorithm is deterministic.
Is any other process using the disk? You could try loading the entire file to memory before starting the clock.
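For example (a sketch, not your code; "input.txt" is just a placeholder for your input file, and the query itself is left as a comment):

#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <ctime>

int main()
{
    // read the whole input file into memory first ("input.txt" is a placeholder)
    std::ifstream file( "input.txt" ) ;
    std::ostringstream buffer ;
    buffer << file.rdbuf() ;
    const std::string input = buffer.str() ;

    // start the clock only after the data is in memory
    const auto start = std::clock() ;
    // ... run the reverse nearest neighbour query on 'input' ...
    const auto finish = std::clock() ;

    std::cout << ( finish - start ) * 1000.0 / CLOCKS_PER_SEC << " processor milliseconds\n" ;
}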
Hi,

Thanks for your response. I'm sure there was no other process using the disk. Actually, I subtract the disk reading time from the total time, so I guess that is not the problem. Could you please suggest another possible cause? Thanks
Are you measuring the times in a single run of the program or in successive runs? In the former case, the cache might be warmed up after the first measurement, so later measurements will be shorter.
Can't really think of anything else.
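For what it's worth, the usual way to control for warm-up effects when measuring within a single run is to do one untimed warm-up pass and then report the median of several timed repetitions. A rough sketch (run_query() is just a placeholder, not your code):

#include <iostream>
#include <vector>
#include <algorithm>
#include <ctime>

// placeholder for one run of the query being measured
void run_query() { /* ... */ }

int main()
{
    run_query() ; // untimed warm-up run

    std::vector<double> times_ms ;
    for( int i = 0 ; i < 5 ; ++i ) // several timed repetitions
    {
        const auto start = std::clock() ;
        run_query() ;
        const auto finish = std::clock() ;
        times_ms.push_back( ( finish - start ) * 1000.0 / CLOCKS_PER_SEC ) ;
    }

    // report the median, which is less sensitive to one-off outliers
    std::sort( times_ms.begin(), times_ms.end() ) ;
    std::cout << "median: " << times_ms[ times_ms.size() / 2 ] << " ms\n" ;
}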
I did it successively through a Linux script. Before each experiment unit, I clear the cache using "sync && echo 3 > /proc/sys/vm/drop_caches"
Well, I'm talking about the CPU cache, not the disk cache. You've already said you omit the input time from the total, anyway.

I'm out of ideas, then.
What does time(1) report for real/user/sys?
Does the processor time (measured with std::clock() ) show a wide variance?

> I subtract the disk reading time from the total time, so I guess that is not the problem.

Not relevant here perhaps; but to clear the disk cache, see: http://stackoverflow.com/a/14614407
Thanks JLBorges,

> What does time(1) report for real/user/sys?
> Does the processor time (measured with std::clock() ) show a wide variance?
I don't know how to answer these questions; I am quite new to C++. Could you please explain what I should do to get these answers?

Thanks for the link. I will implement it.
time(1)
The time command runs the specified program command with the given arguments. When command finishes, time writes a message to standard error giving timing statistics about this program run.
http://linux.die.net/man/1/time


std::clock() http://en.cppreference.com/w/cpp/chrono/c/clock

Here is an example:
#include <iostream>
#include <ctime>
#include <vector>
#include <random>
#include <algorithm>

int main()
{
    // generate a vector containing 16 million random numbers
    const std::size_t N = 16'000'000 ;
    std::vector< std::size_t > numbers(N) ;
    std::generate_n( numbers.begin(), N, std::mt19937( std::time(nullptr) ) ) ;
    
    // sort ascending on last digit, print out the processor time used for the sort
    {
        const auto start = std::clock() ;
        std::sort( numbers.begin(), numbers.end(), [] ( auto a, auto b ) { return a%10 < b%10 ; } ) ;
        const auto finish = std::clock() ;
        std::cout << ( finish - start ) * 1000.0 / CLOCKS_PER_SEC << " processor milliseconds\n" ;
    }
}

clang++ -std=c++14 -stdlib=libc++ -O3 -Wall -Wextra -pedantic-errors -omy_program main.cpp -lsupc++ #compile and link (create my_program)
time ./my_program #run the program 'my_program' and when it finishes, display timing statistics


490 processor milliseconds

real	0m1.307s
user	0m0.796s
sys	0m0.236s
Thanks for the explanation.

I use someone's code to measure the time.

// requires <sys/time.h> for gettimeofday() and <cstdio> for fprintf();
// qStart and qEnd are presumably struct timeval members, m_suspended a double member of CUtility

// start counting
inline void CUtility::startCT()
{
    m_suspended = 0.0;
    gettimeofday( &qStart, NULL );
}

// stop counting and return the elapsed wall clock time in milliseconds
inline double CUtility::endCT( const bool show )
{
    gettimeofday( &qEnd, NULL );
    double qTime = (double)( qEnd.tv_usec - qStart.tv_usec ) / 1000 +
                   (double)( qEnd.tv_sec - qStart.tv_sec ) * 1000;

    qTime -= m_suspended;

    if ( show )
        fprintf( stderr, "%.3lf #Process Time (ms) ( %.2lf s)\n", qTime, qTime / 1000 );

    return qTime;
}

Whenever I want to measure the time, I call startCT(), call the function, then call endCT(). So far the results I get are similar to the actual (wall clock) time in seconds.
gettimeofday() measures elapsed wall clock time "affected by discontinuous jumps in the system time" http://linux.die.net/man/2/gettimeofday

This would not be appropriate for measuring the performance of the algorithm. For instance, the process may be waiting for processor time-slices; the NTP daemon may call ntp_adjtime()

The POSIX function clock_gettime() with a suitable clock (CLOCK_PROCESS_CPUTIME_ID, CLOCK_THREAD_CPUTIME_ID) could be used. This may give a better resolution than that provided by the standard C function clock().
http://linux.die.net/man/2/clock_gettime
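
For example, a minimal sketch (POSIX only; the loop is just a placeholder for the actual query workload, and older glibc versions may need linking with -lrt):

#include <cstdio>
#include <time.h> // clock_gettime(), CLOCK_PROCESS_CPUTIME_ID (POSIX)

// per-process CPU time in milliseconds
static double process_cpu_time_ms()
{
    timespec ts {} ;
    clock_gettime( CLOCK_PROCESS_CPUTIME_ID, &ts ) ;
    return ts.tv_sec * 1000.0 + ts.tv_nsec / 1'000'000.0 ;
}

int main()
{
    const double start = process_cpu_time_ms() ;

    // ... placeholder workload; replace with the reverse nearest neighbour query ...
    volatile unsigned long long sum = 0 ;
    for( unsigned long long i = 0 ; i < 200'000'000 ; ++i ) sum += i ;

    const double finish = process_cpu_time_ms() ;
    std::printf( "%.3f processor milliseconds\n", finish - start ) ;
}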
Hi,

After trying many things, I found that I can get consistent results if I put "sleep 60" between experiment units in the Linux script that calls the program. Could someone please explain how this can happen? What is the effect of this "pausing" on the CPU and/or disk cache such that I get consistent results? Thanks
> I found that I can get consistent results if I put "sleep 60" between experiment units
> What is the effect of this "pausing" on the CPU and/or disk cache such that I get consistent results?

I presume that you are still measuring elapsed wall clock time with gettimeofday(). That will be roughly equal to the processor time used only if the system is otherwise in a reasonably quiescent state. My guess is that the "sleep 60" gives the kernel and the storage driver threads some time to finish purging and flushing the disk cache, so that when your program runs it is not pre-empted and does not have to wait for processor time slices.
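
One way to check this yourself would be to record both the wall clock time and the processor time in the same run and compare them. A minimal sketch (work() is just a stand-in for one experiment unit):

#include <iostream>
#include <ctime>
#include <chrono>
#include <numeric>
#include <vector>

// placeholder for one experiment unit
void work()
{
    std::vector<double> v( 10'000'000, 1.5 ) ;
    volatile double sum = std::accumulate( v.begin(), v.end(), 0.0 ) ;
    (void)sum ;
}

int main()
{
    const auto wall_start = std::chrono::steady_clock::now() ;
    const auto cpu_start = std::clock() ;

    work() ;

    const auto cpu_finish = std::clock() ;
    const auto wall_finish = std::chrono::steady_clock::now() ;

    const double cpu_ms = ( cpu_finish - cpu_start ) * 1000.0 / CLOCKS_PER_SEC ;
    const double wall_ms = std::chrono::duration<double, std::milli>( wall_finish - wall_start ).count() ;

    std::cout << "processor: " << cpu_ms << " ms   wall clock: " << wall_ms << " ms\n" ;
    // if the wall clock time is much larger than the processor time,
    // the process spent a significant part of the run waiting (pre-empted, blocked on i/o, etc.)
}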