Compile time array generation as global value

Can I generate my array at compile time and place it as a global array?
Or should I leave the array empty at first?

The reason: I have a very large array that should be global, so that functions can access it.
I want to push this build_array function entirely into compile time.
And I don't want to enter the values by hand, because the array is large and computable by a little function.

Yes, I need runtime performance, so I push as many things as possible to compile time.



Example:
Assume we have int array[256].
At index 0, we have FFFFFF00 (-256) in array[0].
At index 1, we have FFFFFF01 (-255) in array[1].
At index 2, we have FFFFFF02 (-254) in array[2].
At index 3, we have FFFFFF03 (-253) in array[3].
Such a simple pattern, but too much effort to enter one by one by hand.
I hope this "simple computation" can be done at compile time.

int array[256]; // to be filled by build_array() at compile time

int main() {
  // build_array is done at compile time, so when the program runs
  // (at run time) we can simply access the array, which was already
  // filled at compile time.
  access_index_of_array(); // pseudocode: any function that reads the array
  array[0]; // should be -256 here
}

Additional question: in this case, should my array be const? If I build it, I would not change it. But if I declare it const, how can I build it?
My question refers to this thread: http://www.cplusplus.com/forum/beginner/249023/.
You can make anything global.

There are multiple ways to control globals to avoid problems.
You can wrap it as a static member of a class, and/or put it in a namespace, as two examples of improvement over a pure global. I don't consider constants to be variables, so personally I am fine with global constants, but they should still be in a namespace in a program of any real size.
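For example, something along these lines keeps the constant out of the global namespace (the names are just placeholders):

namespace tables
{
    const int small_example[4] = { -256, -255, -254, -253 };
}

int first() { return tables::small_example[0]; } // access goes through the namespace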

What you can do to get a constant array of computed values is make a little junk program that writes the C++ code to initialize it. Then just copy and paste those values into the real program.



jonnin wrote:
make a little junk program that writes the C++ code to initialize it.

I don't understand what you mean.
You could declare the array as constexpr and use a constexpr function to create the array.

#include <array>

constexpr int array_size = 256;

constexpr auto create_array()
{
	std::array<int, array_size> array {};
	for (int i = 0; i < array_size; ++i)
	{
		array[i] = i - array_size;
	}
	return array;
}

constexpr auto array = create_array();


Or, if you don't want to introduce a separate function you could use an immediately-invoked lambda.

#include <array>

constexpr int array_size = 256;

constexpr auto array = []
{
	std::array<int, array_size> array {};
	for (int i = 0; i < array_size; ++i)
	{
		array[i] = i - array_size;
	}
	return array;
}();
Do what he said.
I was saying to write a program that writes the variable's initializer in a constant {1,2,3} format, then paste that output in. I haven't gotten that deep into constexpr.
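For example, a throwaway generator along these lines would print an initializer that can be pasted into the real project (just a sketch; it reproduces the values from the example at the top of the thread):

#include <iostream>

// Junk program: prints "int array[256] = { -256, -255, ..., -1, };"
// Compile it, run it once, paste the output into the real program.
int main()
{
    std::cout << "int array[256] = {\n";
    for (int i = 0; i < 256; ++i)
        std::cout << "    " << (i - 256) << ",\n";
    std::cout << "};\n";
}
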
But... why do all of this at all? Why not just fill the array in the main function? It's just being filled with integer constants, no real computation involved. I'm confused.

Grime wrote:
I don't get it. Whether the computation is explicitly marked to be done at compile time, or the assigning is left to be done at run time, what is the difference?

Both should do the same thing right? Both would produce assembly code that would say "allocate memory and put value so and so".

Computations can be done at compile time, but you can't allocate and assign values at compile time. So what is the difference between a for-loop and using templates and constexpr? Do they even produce different results at all?

Somebody pls enlighten me. 0_0
quote from: http://www.cplusplus.com/forum/beginner/249023/
I don't know why the OP wants this done at compile time but there are two main motivations in general.

1. Runtime performance
If the array values have been calculated at compile time, the program doesn't need to do the looping, calculating, and assigning of the values at startup. Instead the precomputed values are just read in when the program is loaded. The compiler might also have an easier time optimizing the code since it knows what the values are and that they cannot change.

This might however slow down the compilation time and make the executable file bigger, so it's a tradeoff.

2. Things that require constexpr values
There are things that you can only do with constexpr values, such as specifying the size of a local array, passing non-type template arguments, and calculating other constexpr values.
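A minimal sketch of what that enables, reusing create_array() from the earlier post (the names table and buffer are just for illustration):

#include <array>

constexpr auto table = create_array();   // computed entirely at compile time

static_assert(table[0] == -256);         // checked by the compiler, no runtime cost

void f()
{
    int buffer[-table[0]];               // a local array needs a constant size (256 here)
    (void)buffer;                        // silence "unused variable" warnings
}
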
If it were an optimizing compiler, the values would probably be evaluated at compile time for the OP anyway.

Assume no optimizing happens. Now we have to do extra work, that is, take an integer from an address, and we also have an extra 'jump' because it's a loop. I know it's probably more complex than that, but that's about all the assembly I know, so bear with me.

Now instead, we are defining a template function. Template functions can be called at run-time too. So we're adding those 10 more lines of assembly which we were trying to avoid.

You get what I'm saying, right? What is being gained by using compile time deduction in THIS case?
I am writing a database project, which requires a lot of run-time performance.
This array is like a look-up dictionary and is not changeable.
If it were an optimizing compiler, the values would probably be evaluated at compile time for the OP anyway.

This is not what I observe when comparing with and without constexpr in Compiler Explorer (https://godbolt.org/z/CtrApn). The constexpr version generates much less code.

I don't think compilers generally go all in and try to precompute large data structures like this because it can turn out to be impossible, or add too much to the code size, or take too much time to be worth it.

Now instead, we are defining a template function. Template functions can be called at run-time too. So we're adding those 10 more lines of assembly which we were trying to avoid.

There is no need for the compiler to generate assembly instructions for templated/inline/constexpr functions unless they are actually needed at runtime.
Thanks Peter. How does the compiler know whether a constexpr or template function is going to be used at runtime or not?

By checking if variables are passed?

Is it guaranteed by the standard that the template function won't be generated if it's not required?
> I am writing a database project, which requires a lot of run-time performance.
An array initialised just ONCE at program startup will have just the same look-up performance as an array generated at compile time and then loaded into memory.
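For instance, the one-time initialisation could be as small as this (a sketch; build_array is just the name from the OP's description):

int array[256];            // global, filled once at startup

void build_array()
{
    for (int i = 0; i < 256; ++i)
        array[i] = i - 256;
}

int main()
{
    build_array();         // runs once, in well under a microsecond
    // ... the rest of the program only reads array[] ...
}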

So now you have another problem. Your program image on disk is now megabytes larger.
https://en.wikipedia.org/wiki/Hard_disk_drive_performance_characteristics
Great, you replace the nanosecond instruction and memory access times of your 'little initialisation function' with microsecond (1000× worse) disk transfer speeds and millisecond (1,000,000× worse) disk seek times.



This whole exercise looks like premature optimisation. You're trying to optimise something without knowing whether it's even a problem, the cause of a problem, or just a side effect of a much bigger problem.
salem c, I'm confused. Seek from what? RAM? Both should allocate the same 'data', and seeking from RAM takes constant time... right? Just that the constexpr version has an extra function. But how should a function that is not called at run time affect the execution of the rest of the program's instructions?
Grime wrote:
How does the compiler know whether a constexpr or templates function is going to be used at runtime or not?

What inline functions, constexpr functions and function templates have in common is that they need to be defined in each translation unit that uses them. If the compiler sees that the function is not needed within the translation unit (because it's not called, or it's only called at compile time, or it's always inlined) it doesn't have to emit the code. Other translation units might need the function, but since the function definitions are always available and will be recompiled for each translation unit, those translation units can always emit the function code on their own if necessary. This might lead to multiple copies of the same function, but don't worry, they will be merged in the linking step.

Grime wrote:
Is it guaranteed by standard that the template function won't be generated if it's not required?

A function template will not be instantiated for a type unless it is used with that type. This means you don't need to worry that a function template which expects an integer type will suddenly fail to compile because it was instantiated with a string type, unless you actually use it with a string type.

#include <iostream>

template <typename T>
T square(T value)
{
    return value * value;
}

int main()
{
    std::cout << square(2); 
}

In the above example square<int> will be instantiated, but square<std::string> is guaranteed not to be instantiated because square is never used with a string. This is fortunate because the instantiation of square<std::string> would have led to a compilation error.

Whether or not the code for square<int> is actually included in the binary is up to the compiler. The C++ standard doesn't go into this much detail. With optimizations turned on the compiler will probably inline the call to square<int>, because it's small, so there is no need for it to emit code for square<int> as a separate function.

EDIT: I changed sum to square to get something that actually fails to compile with std::string.
So in that case the template for int is indeed generated for runtime use.

Can we explicitly make a function that is used only at compile time and doesn't exist at run time?
> salem c I'm confused. Seek from what? RAM?
No, seek from disk.

Somewhere buried in the OS is the program loader, which at its heart is going to be some code to open your executable file and read all those bytes into memory.

$ cat foo.cpp
#include <iostream>
int a[1000000] = {
    0,1,2,3,4,5,6,7,8,9,
    0,1,2,3,4,5,6,7,8,9,
    0,1,2,3,4,5,6,7,8,9,
    0,1,2,3,4,5,6,7,8,9,
    0,1,2,3,4,5,6,7,8,9,
    0,1,2,3,4,5,6,7,8,9,
    0,1,2,3,4,5,6,7,8,9,
    0,1,2,3,4,5,6,7,8,9,
    0,1,2,3,4,5,6,7,8,9,
    // rest are 0
};
int main()
{
    std::cout << "hello\n";
}
$ g++ foo.cpp
$ size ./a.out 
   text	   data	    bss	    dec	    hex	filename
   1946	4000616	    280	4002842	 3d141a	./a.out
$ ls -l a.out 
-rwxrwxr-x 1 sc sc 4009088 Jan 25 08:49 a.out


vs

$ cat foo.cpp
#include <iostream>
int main()
{
    std::cout << "hello\n";
}
$ g++ foo.cpp
$ size ./a.out 
   text	   data	    bss	    dec	    hex	filename
   1946	    600	    280	   2826	    b0a	./a.out
$ ls -l a.out 
-rwxrwxr-x 1 sc sc 9048 Jan 25 08:49 a.out


That process of physically reading the executable from disk is going to be a lot quicker when the executable is measured in KB and not MB.
So it comes down to the fact that it takes longer to load the program. Would loading the array a[1000000] take longer than computing it (assigning values from 1 to n)? Interesting.

edit:

But wait, so does that mean

int a[1000];
for (int i = 0; i < 1000; i++)
    a[i] = i + 1;

is better than

int a[1000] = {1, 2, 3, ...};??????

But then the for-loop has some additional computations. So how do you strike a balance between loading and computing?

The constexpr approach for the OP is the same as the second case... right?
Grime wrote:
So in that case the template for int is indeed generated for runtime use.

Well, it depends on what you mean. My point is that it might end up having no runtime overhead at all.

Grime wrote:
Can we explicitly make a function that is used only at compile time and doesn't exist at run time?

C++20 adds consteval for this purpose.
https://en.cppreference.com/w/cpp/language/consteval
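Roughly like this (a sketch; it needs a C++20 compiler):

// consteval: every call must be evaluated at compile time, so no run-time
// version of this function needs to exist in the binary.
consteval int square_ct(int x) { return x * x; }

int main()
{
    constexpr int a = square_ct(4);   // OK: evaluated at compile time
    static_assert(a == 16);
    // int n = 4;
    // int b = square_ct(n);          // error: n is not a compile-time constant
}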
> So it comes down to the fact that it takes longer to load the program. Would loading a[1000000]
> the array take longer than computing a[1000000] the array (assigning values from 1 to n)? Interesting.

Since we have "I don't want to enter the values by hand, because the array is large and computable by a little function", and "if I build it, I would not change it",
the most efficient option may be to avoid creating a large array of constants (compute the values in situ, at compile-time where appropriate).

For example:

#include <cstddef>
#include <stdexcept>

template < std::size_t SIZE = 1'000'000, int FIRST = -256 > struct array_like
{
    constexpr std::size_t size() const { return SIZE ; }

    // I don't want to give the entries by hand, because the array is large and 
    // computable by a little function. 
    // (the following little function, as an example)
    constexpr int operator[] ( std::size_t pos ) const { return FIRST+pos ; }

    int at( std::size_t pos ) const
    { return pos < SIZE ? FIRST+pos : throw std::out_of_range( "bad array index" ) ; }
};

int foo()
{
    constexpr array_like<> array ;

    constexpr int i = array[0] ; // -256
    constexpr int j = array[72] ; // -184
    
    return i+j ; // return -440 
}

int foo_rewritten_by_the_compiler() { return -440 ; }

int bar( std::size_t pos )
{
    constexpr array_like<> array ;
    return pos < array.size() ? array[pos] : 0 ;
    // return pos < 1'000'000 ? pos-256 : 0 ;
}

int bar_rewritten_by_the_compiler( std::size_t pos ) 
{ return pos < 1'000'000 ? pos-256 : 0 ; }

https://gcc.godbolt.org/z/3WW-Lv
> But then the for-loop has some additional computations.
How many for loops do you think your multi-GHz machine can do in the time it takes for a disk to visit all the sectors of your bloated executable?

Back of the envelope calculation, with some idealised (but still representative) numbers for easy calculation.
1. Disk spins at 6000 rpm, which is 100 times per second (10 milliseconds per revolution).
2. Disk has 1000 sectors per track, so the head sees a new sector every 10 microseconds.
So a single track holds approx. 0.5 MB of data (1000 sectors × 512 bytes each).

Ah, but that 4MB executable I showed would need a full 8 tracks to store, and thus a full 80 milliseconds to read in.

And this ignores any seek times (which can be well into milliseconds) as well.

Compare that to the ~1 ms it would take a 1 GHz machine to run a simple for loop 1 million times.

It's a different calculation if you have a fast SSD instead of a spinning disk, but you're still transferring lots of data over a bus that is slower than DRAM.