Generate Code Depending on User Input

Pages: 12
Is there a way to generate code depending of user input?

For example, if I have this code:
1
2
3
4
5
6
7
8
switch (a)
{
	case 1: a(); break;
	case 2: b(); break;
	case 3: c(); break;
	case 4: d(); break;
	case 5: e(); break;
}


Now, the user input this:

"2,5,4,2"

Which will be reused millions of times, is there a way to take that, and generate code directly like this:

1
2
3
4
5
// Generated code:
b();  // from 2
e();  // from 5
d();  // from 4
b();  // from 2 


Without going to the switch statement every time?
You have that code. You compile a binary. N users run the same binary, with different inputs. They cannot change the binary.

You did not know that the input will be 2542 millions of times. If you had known, you would have written and compiled a different program. A program less suitable for other inputs.


Nevertheless, the question that you should ask is: does it matter? Is the switch statement computationally so heavy that replacing it would gain you substantial benefit? Is that really the bottleneck of your code?

There might be alternative constructs to the switch that are slightly more efficient.
Yes, it is important, without this optimization, it will be uselessly slow. The standard way to do this is to write an emitter, which involves reading the input, and emitting assembly code blocks.

Of course, that is not portable, so I wish for a way to do this directly in c++ - if possible - so I don't have to write an emitter for every supported architecture.
Last edited on
Don't know if this would be measurably faster than the switch-case; need to measure the performance:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
#include <iostream>
#include <vector>

template< char TAG > constexpr void fn() { std::cout << TAG ; }

struct call_sequence
{
    using fn_type = void() ;

    call_sequence( const std::vector<std::size_t>& user_input )
    { for( std::size_t v : user_input ) call_seq.push_back( functions[ v%N ] ) ; }

    void operator() () const { for( auto fn : call_seq ) fn() ; }

    std::vector<fn_type*> call_seq ;
    static constexpr std::size_t N = 6 ;
    static constexpr fn_type* functions[N] { &fn<'a'>, &fn<'b'>, &fn<'c'>, &fn<'d'>, &fn<'e'>, &fn<'f'> };
};

constexpr call_sequence::fn_type* call_sequence::functions[call_sequence::N] ;

int main()
{
    const call_sequence fn( { 3, 4, 2, 0, 0, 5, 1 } ) ;
    for( int i = 0 ; i < 5 ; ++i ) { fn() ; std::cout << '\n' ; }

    std::vector<std::size_t> seq ;
    std::cout << "enter sequence of calls: " ;
    std::size_t n ;
    while( std::cin >> n ) seq.push_back(n) ;

    const call_sequence fn2(seq) ;
    for( int i = 0 ; i < 5 ; ++i ) { fn2() ; std::cout << '\n' ; }
}

http://coliru.stacked-crooked.com/a/84028b243467736c
http://rextester.com/HYSCP46938
If the input is known only at run-time, there's no portable way to generate code. You'd have to invoke some compiler or interpreter just-in-time or use some other similar technique. Any sort of code generation technique is useless if there's no way to execute the generated code: you need eval for that.

If the range of inputs is really small, you could generate each permutation of a sequence of calls (similar to JLBorges' post) and dispatch to one of those -- although that's clearly not scalable. With only n functions you're looking at factorial(n) different permutations. Not good, neither for code size nor compilation speed.

I expect with clever enough optimization (see, e.g., at the arXiv https://arxiv.org/abs/0902.1038 ) you could reduce the number of required call sequences substantially at the cost of adding more indirection.

This could (theoretically) be done using templates if you're a masochist and have a lot of memory and free time, but I think a better approach to this solution would be to emit C++ using another program before your build, and compile the output into your program.
Last edited on
Yeah, I expected as much unfortunately. My interpreter is 5000+ lines, not something you'd use these techniques with. There are other techniques other than just in time compilation, such as ahead of time compilation, not a real option since it can take 15 min to do so, and the temporary file is quite huge.

Multiple emitters it is then, just need to minimize the architecture specific parts.
closed account (48T7M4Gy)
C# had/has "Reflection Emit" which "supports the dynamic creation of new types at runtime". Who knows what C++ can do in a similar vein? I don't.
You might get better help if you describe what your end goal is.

If your problem is a interpreter loop's opcode dispatch, there are ways to significantly increase the speed of the inner loop, including one particular technique that borrows from the concept of continuation passing (called "handler chaining") that can eliminate most of the dispatch overhead, replacing it mostly with indirect function invocations.
Last edited on
I am aware that there are many ways to optimize interpreters, but no matter how optimized an interpreter gets, it won't reach the speed of direct binary translators and recompilers. Still, I've seen like a dozen or three ways to optimize an interpreter, if you went through this before, and have an idea of which one is superior, I'd appreciate a recommendation.

If there is a way to implement a recompiler completely in c++, that would be the ideal solution, but I've went through a lot of programs source code, and never seen anything like that, so I asked here to confirm that it is not possible.
closed account (48T7M4Gy)
so I asked here to confirm that it is not possible
@Houd, no harm in asking, but it looks like you've drawn a blank :(
OP: even the switch cases as you have them now won't be able to run more than one function at a time because of the break statements. Here's a suggestion based on std::shared_future with a deferred launch policy:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
#include <iostream>
#include <future>
#include <vector>
#include <limits>

void func1(){std::cout << "One \n";}
void func2(){std::cout << "Two \n";}
void func3(){std::cout << "Three \n";}
void func4(){std::cout << "Four \n";}
void func5(){std::cout << "Five \n";}

std::shared_future<void> f1 (std::async(std::launch::deferred, func1));
std::shared_future<void> f2 (std::async(std::launch::deferred, func2));
std::shared_future<void> f3 (std::async(std::launch::deferred, func3));
std::shared_future<void> f4 (std::async(std::launch::deferred, func4));
std::shared_future<void> f5 (std::async(std::launch::deferred, func5));
//can use auto, using (typedef) to reduce typing

std::vector<std::shared_future<void>> vec_f {f1, f2, f3, f4, f5};

void runFunctions();

int main()
{
    bool fQuit = false;
    while (!fQuit)
    {
        std::cout << "1. Run functions \t2. Exit \n";
        int choice;
        std::cin >> choice;//input validation here and throughout
        std::cin.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
        switch (choice)
        {
             case 1:
                runFunctions();
                break;
            case 2:
                fQuit = true;
                break;
            default:
                std::cout << "Wrong choice, try again \n";
                break;
        }
    }
}
void runFunctions()
{
    std::cout << "Enter # of functions to run \n";
    int runs{};
    std::cin >> runs;
    std::cin.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
    std::vector<int> funcChoice;

    for (int i = 0; i < runs; ++i)
    {
        std::cout << runs - i << " choices left, enter function to run (1-5) \n";
        int choice;
        std::cin >> choice;
        funcChoice.push_back(choice);
    }
    std::cin.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
    for (const auto& elem : funcChoice)
    {
        vec_f[elem - 1].get();
    }
}

Sample Output
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19

1. Run functions        2. Exit
1
Enter # of functions to run
3
3 choices left, enter function to run (1-5)
2
2 choices left, enter function to run (1-5)
4
1 choices left, enter function to run (1-5)
5
Two
Four
Five
1. Run functions        2. Exit
2

Process returned 0 (0x0)   execution time : 18.830 s
Press any key to continue.

I should have clarified that the reason I need this is performance, most/good compilers implement long switch statements as a combination of jump tables and other techniques, which is usually pretty damn efficient. What you are doing is just reimplementing it in a different way, and it was 10% slower than a switch, and if the cases were much more, I'd wager that number would be much higher as well.

I've seen people implement function pointers or something else as a switch replacement, just to find it run slower than the code the compiler produced from the switch statement.
Last edited on
You could certainly write a program that emits the code for b(); e(); d(); p(); and then compile it. But does the code need to be generated in real-time or is this an offline question?

If this is a translator and b(), e(), e() and p() are functions to implement individual instructions then you really want to emit the code inline. That way the optimizer can get rid of all the functionality that you don't actually need (like setting flags that are never tested). Be careful though. First you need to be certain that the code sequence always executes a single block. If the program can jump into the middle of that block then the optimization will be invalid.
But does the code need to be generated in real-time or is this an offline question?


Real time.

See
http://www.emulators.com/docs/nx25_nostradamus.htm


Thanks, I've seen this long time ago and completely forgot about it.
Ok, here is how it is done - properly I hope? - if anyone is interested.

This will generate code, compile it as a dll, load the dll and run it:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
#define _CRT_SECURE_NO_WARNINGS

#include <iostream>
#include <fstream>
#include <random>
#include <cmath>
#include <string>
#include <chrono>
#include <vector>
#include <windows.h>


using namespace std;

// Instructions implementation
void add(int &a, int &b, int &c) {  c = a + b; }
void sub(int &a, int &b, int &c) { b = c - b; }
void mult(int &a, int &b, int &c) { a = c * b; }
void divide(int &a, int &b, int &c) { if (a!=0) c = b / a; }
void inc(int &a, int &b, int &c) { a++; b++; c++; }
void print(int c) { cout << c << endl; }


typedef void(*funcPointer)(int&,int&,int&); // function pointer used to get the function from the dll

int main()
{
	int a = 50;  // some random initialization
	int b = 200;
	int c = 3;

	std::random_device seedDevice;
	std::mt19937 rEngine(seedDevice());
	std::uniform_int_distribution<int> InstNumDist(100000, 200000); 
	std::uniform_int_distribution<int> InstOpcodeDist(1, 5);

	int numberOfInst = InstNumDist(rEngine);

	vector<int> input;

	for (int i = 0; i != numberOfInst; i++) // emulate user input with random numbers
		input.push_back(InstOpcodeDist(rEngine));
        

        /// Interpreter

	auto interpreterStart = chrono::steady_clock::now();
        
	for (int i = 0; i != input.size(); i++)
	{
		switch (input[i])
		{
		case 1: add(a, b, c); break;
		case 2: sub(a, b, c); break;
		case 3: mult(a, b, c); break;
		case 4: divide(a, b, c); break;
		case 5: inc(a, b, c); break;
		case 6: print(c); break;
		}
	}

	cout << "Interpreter: " << endl << "a = " << a << endl; 	cout << "b = " << b << endl; 	cout << "c = " << c << endl;

	auto interpreterEnd = chrono::steady_clock::now();
	cout << "Interpreter took " << chrono::duration <double, milli>(interpreterEnd - interpreterStart).count() << " ms" << endl;


	/// Recompiler:

	a = 50;  // Same initial state
	b = 200;
	c = 3;

	ofstream CompileFile;
	CompileFile.open("GeneratedCode.cpp");
	CompileFile << R"(
	#include <iostream>
	extern "C"
	{
		void __declspec(dllexport) GeneratedCode(int &a, int &b, int &c)
		{
	
         )" << '\n';

        // Code Generation
	for (int i = 0; i != input.size(); i++)
	{
		switch (input[i])
		{
		case 1: CompileFile << "c = a + b;\n"; break;
		case 2: CompileFile << "b = c - b;\n"; break;
		case 3: CompileFile << "a = c * b;\n"; break;
		case 4: CompileFile << "if (a != 0) c = b / a;\n"; break;
		case 5: CompileFile << "a++; b++; c++;\n"; break;
		case 6: print(c); break;
		}
	}

	CompileFile << R"( 
		}
	}	
        )" << '\n';

	CompileFile.close(); // call before compiling

	auto compilerStart = chrono::steady_clock::now();
       
        // compile the generated code as dll
	system(R"(call "C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\vcvarsall.bat"  && cl /LD GeneratedCode.cpp)"); 

	auto compilerEnd = chrono::steady_clock::now();
	cout << "Compilation took " << chrono::duration <double, milli>(compilerEnd - compilerStart).count() << " ms" << endl;

	HINSTANCE calledCode = LoadLibrary("GeneratedCode.dll");

	if (!calledCode) {
		cout << "Import dll failed" << endl;
		system("pause");
		return EXIT_FAILURE;
	}
	else 
		std::cout << "Import dll succeeded" << std::endl;

	funcPointer fp = (funcPointer)GetProcAddress(calledCode, "GeneratedCode"); // get the function from the dll

	if (!fp)
	{
		cout << "dll function get failed" << endl;
		system("pause");
		return EXIT_FAILURE;
	}

	auto recompilerStart = chrono::steady_clock::now();

	fp(a,b,c); // execute the dll function

	auto recompilerEnd = chrono::steady_clock::now();
	cout << "Recompiler took " << chrono::duration <double, milli>(recompilerEnd - recompilerStart).count() << " ms" << endl;


	cout << "Recompiler: " << endl << "a = " << a << endl; 	cout << "b = " << b << endl; 	cout << "c = " << c << endl;

	system("pause");
	return 0;

}


Here is the output:


Interpreter:
a = 19
b = 4
c = 1
Interpreter took 74.4546 ms

*several lines of compiler log*

out:GeneratedCode.dll

Compilation took 12320.2 ms
Import dll succeeded
Recompiler took 3.27327 ms
Recompiler:
a = 19
b = 4
c = 1


Interpreter = 74.4546 ms, recompiler = 3.27327 ms, that's an insane improvement, but ... compilation time = 12320.2 ms. Pretty fast after a huge initial step, don't know how practical this will be with this huge of an initial step.

Not very portable, but the parts that need to be rewritten to port it are MUCH less than porting an assembly JIT.
Last edited on
It's good to see you figured it out.
...Hopefully you won't run into much self-modifying code :)
Well, that's a problem for another month, or year, if ever ʕ•ᴥ•ʔ . Not terribly worried about that, self-modifying code is not a very common thing thankfully, so you can reasonably work with them case by case.

Plus, this method isn't static, even a general solution is possible with checks, hopefully with not too much stutter.
So to recap, you executed 100,000-200,000 statements (why a random number). With interpreted code it took 78ms and with compiled code it took a total of 12,323ms. That's 157 times slower with compiled code. Also, you're ignoring the time to load the DLL, which I'll guess will be 10's to 100's of milliseconds.

Try running it with different (fixed) input sizes. I think you'll find that the time to compile a statement is always a lot longer than the time to execute it with the interpreter. So unless you're going to execute the compiled code many many times, the interpreter is faster.

Also, the compiler solution assumes that the compiler is available on the target machine, not just the development machine.
Not quite, the one time compilation is the 12320.2 ms, running it after that takes just 3.2 ms, around 23 times faster than the interpreter, and yes, I'll be running it millions if not billions of times, so it is definitely worth it. Random number is their because code blocks lengths are variable from each other, even if they are usually much shorter than that.

Plus, I plan on running the compilation stage on a different thread, while the main thread runs on the interpreter on the untranslated blocks, until the block is translated, then it will shift to the recompiler once it is done. Hopefully the translation thread will finish the block before the main thread even reaches it.

I ignored the dll loading because - mainly, I forgot, but also - 99% of the time of the one time stage comes from the compilation itself, but yeah, I should have put it in just for completion sake.

About the compiler on client machine, I'm not sure which way I'll do it yet, but it will be either a portable - and hopefully a faster lightweight - compiler that will be included, or will just make the Microsoft Visual C++ Redistributable Package a requirement on PC.
Pages: 12