Architecture help

Hi,
We have a simulation environment that reads data from a text file (hex data) and runs an algorithm on it.
At present the code looks like this.

Simulation Object directly calls the algorithm functions.

Now we want to enhance this to run the simulation with different versions of the algorithm.

The input data is huge (gigabytes), so we have to read it only once and then run the simulation with every available version of the algorithm.

The scenario might look like this:
Simulation Tool
Algorithm Version-1
Algorithm Version-2
Algorithm Version-3
.
Algorithm Version-n

Can someone help with a high-level architecture for this?
Thanks in advance.
To store the data I would recommend putting it in a separate class and passing a reference to the algorithms.
To choose between algorithms you might consider the Strategy pattern:
https://en.wikipedia.org/wiki/Strategy_pattern
Thanks for your reply.
With the Strategy pattern I can run only one algorithm at a time, but here I need to run the collected data through all the available versions of the algorithm.
One question is how is the data consumed by the algorithms? Do the algorithms make a single pass on the data, processing 1 value (or block or chunk) at a time before moving on to the next, never to return to previous values? Or, do the algorithms need to look forward and back, jumping around the data such that some of it needs to persist in memory?

Assuming single pass algorithms, I would create a vector of algorithm subclass objects. I would then read in data one value (block / chunk) at a time and give the data as input to each of the algorithms to process. All of the algorithms would run in parallel, and the data would only be read once.

Assuming the need for persistent memory, you will have to read in the data and store it in memory somehow.
Does the algorithm change the input data?

How many gigs? Current PCs average 32-64 GB of RAM; some have much more, a few have less. And is this running on a PC or on a server/workhorse with even more (a TB+ machine)?

A few GB is a non-issue these days. I routinely open 4-10 GB XML files in a single chunk on 64 GB machines and just process them in memory. Are you talking a few GB or hundreds/thousands of GB? Give us a sense of scale vs. the hardware involved.

Hopefully your simulator is something like:
while (read some data) {
    call some algo functions
}


Step 1: modify the algo functions so that they are virtual methods of a base algo class. You want to be able to do this:

class Algo1 : public Algo {
    // override virtual methods
};
Algo1 *myAlgo = new Algo1;
while (read some data) {
    myAlgo->call_virtual_methods();
}

Deal with the memory management however you like. You'll see why myAlgo is a pointer in a moment.

Once this is working:

Step 2: Create the other algos as other classes:
class Algo2 : public Algo { /* override virtual methods */ };
class Algo3 : public Algo { /* override virtual methods */ };
// etc.

vector<Algo *> algos;
algos.push_back(new Algo1);
algos.push_back(new Algo2);
algos.push_back(new Algo3);
while (read some data) {
    for (auto algo : algos) {
        algo->call_virtual_methods();
    }
}


As others have mentioned, be sure that the algos don't interfere with each other or with the input data.
Sorry for not putting my question properly.

The simulation tool has to scan a specified directory and, based on the number of algorithms available there, process the data it has read.

I don't know how to handle this at run time.

Any help ???
Any suggestions?