Storing related inputs from four datafiles

Hey guys, I am at the final stage of a model I am working on, where I am minimizing the variance to get the best parameter fit of my results to clinical data sets. I am using Powell's optimization method, which I have already got working for a simple case with just four data files of one row each.

However, my goal is to solve the model with the Runge-Kutta method for each dataset (with parameter values taken from four different input files that all have the same number of rows, one row per dataset), optimize the model parameters, and then increment the variance as appropriate.

My challenge now is that I have 4 files that contain parameters unique to each data set, and each line in these files represents a data set. Each file has the same number of rows, equal to the total number of datasets that I have, but a varying number of columns, depending on the parameters it holds.

Say:

X0.dat: Contains the 30 initial values for the dependent variables on each row, which I store as thirty valarray elements


X0 X1 X2 ............ X29

X0 X1 X2 ............ X29

X0 X1 X2 ............ X29

X0 X1 X2 ............ X29

X0 X1 X2 ............ X29

X0 X1 X2 ............ X29



Inf.dat: Each row contains the species 'i', the location 'j', the value of I[ij], the time it starts, t_start, and the time it stops, t_stop. I have set all other I[ij] = 0 except the one declared in each row of this file.

Again, Inf.dat has the same number of rows as the X0.dat file above



i j Iij t_start t_stop \\All for data set 1

i j Iij t_start t_stop \\ Dataset 2

i j Iij t_start t_stop

i j Iij t_start t_stop

i j Iij t_start t_stop

i j Iij t_start t_stop



time.dat: Contains the time points at which the concentrations were taken, which I digitized from each clinical data set. I store all the time points as a vector of times. Here, the first entry of each row is the number of data points in the dataset, and the time values follow.



8 t0 t1 t2 t3 t4 t5 t6 t7 \\All for data set 1

4 t0 t1 t2 t3 \\ Dataset 2

5 t0 t1 t2 t3 t4

6 t0 t1 t2 t3 t4 t5

2 t0 t1

8 t0 t1 t2 t3 t4 t5 t6 t7
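
A row of this shape can be read by taking the count first and then exactly that many values. The following is only a minimal sketch: `parseCountedRow` is a hypothetical helper name, and it assumes each dataset sits on one line with whitespace-separated numbers. Anything after the n-th value (such as the trailing dataset notes shown above) is ignored.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Parse one line of the form "n v0 v1 ... v(n-1)".
// Returns false if the count is missing or negative,
// or if fewer than n values follow it.
bool parseCountedRow(const std::string& line, std::vector<double>& values)
{
    std::istringstream ss(line);
    int n = 0;
    if (!(ss >> n) || n < 0) return false;

    values.clear();
    double v;
    while (static_cast<int>(values.size()) < n && ss >> v)
        values.push_back(v);

    return static_cast<int>(values.size()) == n;
}
```

Checking the count against the number of values actually read catches rows where the two disagree before they reach the solver.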



Lastly, I have a set of experimental concentrations from the clinical data, in a file with the same number of rows and columns as the time file above. In this file, each row represents a clinical data set, and the first entry on each row is the species number (0-29 from my X0) whose concentrations over time were available from that dataset.

Xclin.dat


1 X1(t0) X1(t1) X1(t2) X1(t3) X1(t4) X1(t5) X1(t6) X1(t7) \\Dataset 1

1 X1(t0) X1(t1) X1(t2) X1(t3) \\ Dataset 2

3 X3(t0) X3(t1) X3(t2) X3(t3) X3(t4)

2 X2(t0) X2(t1) X2(t2) X2(t3) X2(t4) X2(t5)

4 X4(t0) X4(t1)

5 X5(t0) X5(t1) X5(t2) X5(t3) X5(t4) X5(t5) X5(t6) X5(t7)



Currently, I can do the single-dataset case, solve the RK4, and optimize the transfer variable (not included here), but I need your kind suggestions as to how I can do this for all the data sets in each file.

I have tried a vector of structs, but I can only store the row entries of a single file.

#include <iostream>
#include <fstream>
#include <iomanip>
#include <valarray>
#include <vector>
#include<string>
#include<sstream>

using namespace std;

typedef valarray<double> val;

void read_X(ifstream &infile, val(&x));
void read_I(val(&I));

struct Inf
{
    int i, j;
    double Iij;
    double ton, toff;
};

int ijIndex( int i, int j )   // Casting Dimensions from 2 to 1
{
    return 5 * ( i - 1 ) + ( j );
}


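istream& operator >> (istream &is, Inf &s)
{
    string line;
    while (getline(is, line))          // keep reading until a line parses
    {
        istringstream ss(line);
        if (ss >> s.i >> s.j >> s.Iij >> s.ton >> s.toff) break;   // blank or malformed lines are skipped
    }
    return is;                         // fails only when the file runs out of lines
}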

vector<Inf> Xdata;

val Iij(20);

int main()
{
  Iij = 0;

  ifstream infile("Data.txt");

  Inf I;

  if(infile.is_open()){
     while (infile >> I) Xdata.push_back(I);
    }
   else cerr << "The file cannot be opened" << endl;

  for(Inf s : Xdata) cout << s.ton << " " << s.Iij << endl;

  infile.close();
  return 0;
}


data.txt

1   1  0.1204 0.0  1.170

1   2  0.142  1.0  1.170

1   3 0.523  5.0   5.170

1   4  0.200  2.0  4.170

2   1  0.4    3.0  4.170

2   2  0.1204 0.0  1.170

2   3  0.1204 0.0  1.170

2   4  0.1204 0.0  1.170

3   1  0.1204 0.0  1.170

3   2  0.1204 0.0  1.170

3   3  0.1204 0.0  1.170

3   4  0.1204 0.0  1.170

4   1  0.1204 0.0  1.170

4   2  0.1204 0.0  1.170

4   3  0.1204 0.0  1.170

4   4  0.1204 0.0  1.170


Basically, my algorithm would be:

** in main

1. Open all the files

2. Call Powell's method

3. Take the user-supplied number of datasets in the four files, say N;

4. Start i = 0;

5. Read row i from each of the 4 files and store the values as appropriate;

6. Solve the RK4 equation and update the tolerance;

7. Increment i (i++);

8. If i < N, go to 5;

9. When i == N, stop and print the final value of tolerance.
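
Because Powell's method evaluates the objective many times, the loop over datasets is easier to write against rows that are already in memory than against open files. Below is a rough sketch of that loop. The names `Observed`, `Model` and `totalVariance` are hypothetical, the model callback merely stands in for the RK4 solve (whose real signature is not shown in this thread), and "variance" is assumed to mean the sum of squared differences between model and clinical values.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical container for the observed part of one dataset.
struct Observed
{
    std::vector<double> times;   // sampling times (one row of time.dat)
    std::vector<double> xclin;   // measured concentrations (one row of Xclin.dat)
};

// Placeholder for the RK4 solve: returns the model value at time t
// for the parameter vector P. This signature is an assumption.
using Model = std::function<double(const std::vector<double>& P, double t)>;

// Accumulate the squared residuals over every dataset.
double totalVariance(const std::vector<Observed>& data,
                     const std::vector<double>& P,
                     const Model& model)
{
    double var = 0.0;
    for (const Observed& d : data)                 // one pass per dataset
    {
        for (std::size_t k = 0; k < d.times.size() && k < d.xclin.size(); ++k)
        {
            const double r = model(P, d.times[k]) - d.xclin[k];
            var += r * r;                          // update the running total
        }
    }
    return var;
}
```

A function of this shape can be handed to the optimizer as the quantity to minimize, since it depends only on P once the data are loaded.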






@Kloppite,

I have to admit that I find your description of the problem unfathomable. I can't work out what files are the same for each test run and which are not. In the circumstances I take a risk posting anything on this topic. However, I can offer three general suggestions without any code. FWIW, I usually do the third if I have a lot of data cases to run.


Suggestion 1. (Not good if you have a lot of runcases).
Set up the input file(s) and output file(s) for each runcase in a struct, say:
struct Runcase
{
    string infile1, infile2, infile3, infile4;     // choose better names, and the appropriate number
    string outfile1, outfile2;
};

Then you can have a vector of Runcases:
vector<Runcase> runs;
and you can simply cycle through runs:
for ( Runcase R : runs ) { /* do your processing for this case */ }



Suggestion 2.
Read the names of the input and output files for each run case from file. Do your processing for each.
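
One way this could look, purely as a sketch: a hypothetical list file holds one run case per line, with four input names followed by two output names. The `Runcase` layout is the one from Suggestion 1; the list-file format and the function name `readRuncases` are assumptions.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

struct Runcase
{
    std::string infile1, infile2, infile3, infile4;
    std::string outfile1, outfile2;
};

// Read one Runcase per line: four input names, then two output names.
// Blank or incomplete lines are skipped.
std::vector<Runcase> readRuncases(std::istream& list)
{
    std::vector<Runcase> runs;
    std::string line;
    while (std::getline(list, line))
    {
        std::istringstream ss(line);
        Runcase r;
        if (ss >> r.infile1 >> r.infile2 >> r.infile3 >> r.infile4
               >> r.outfile1 >> r.outfile2)
            runs.push_back(r);
    }
    return runs;
}
```

Taking a `std::istream&` rather than a file name lets the same function read from an `ifstream` in the program and from an `istringstream` when testing.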



Suggestion 3. (Which is what I tend to do.)
Have an external batch file (Windows) or script file (unix) which, for each runcase you want to do:
- copies the relevant input files into those named by the program;
- runs the program;
- copies the output files produced by the program into appropriately named files for that run.

This way you separate your program (which doesn't change) from your data (which does).



I can't provide any code for any of this because I don't follow what you are trying to do.

Hi, lastchance.

Your first case above is probably what I would like to do. Ideally, I would like to use a bash script for the files (option 3), but the people I am working with insist that having all the datasets pertaining to one parameter in a single file, divided into rows, is a better idea.

As for what I am trying to do in code form, here is a basic idea with one line of input from the 4 files.

#include<iostream>
#include<iomanip>
#include<fstream>
#include<valarray>
#include<vector>

using namespace std;

valarray<double>X(30);       // The dependent variables initial conc
valarray<double>Inf(30);     // The source items for all the dependent variables
vector<double>t_points;      // The timepoints in the clinical data
vector<double>P;             // This is my one-D vector for Optimization
vector<vector<double>>Xopt;  // N * N vector of P above for use in Optimization



void conc(ifstream&infile, valarray<double>(&X));
void source(ifstream&infile, valarray<double>(&I));
void timefile(ifstream&infile, vector<double>(&t), int &n);


int main()
{

    ifstream input1("X0.dat");
    ifstream input2("Inf.dat");
    ifstream input3("time.dat");
    ifstream input4("Xclin.dat");


    Inf = 0.0;                         // Set all the source items to zero

    int n;                             // n = number of times

    if (input1.is_open() && input2.is_open()
        && input3.is_open() && input4.is_open())
    {
        conc(input1, X);                // Update conc
        source(input2, Inf);            // Update source
        timefile(input3, t_points, n);  // vector of times for each data sets,
                                        // and number of timepoints(n)
    }
    else
    {
        cerr << "File(s) not open" << endl;
        return 1;
    }


    int n_opt = 3;                      // This is an assumed number of minimizable rate constants

    P = {5.5, 2.0, 0.5};                // The initial values of the rate constants (P starts empty, so assign rather than index)

    double var = 0.0;                   // which is supposed to update for all the datasets

     // Optim(P, Xopt, n_opt, var, min_Rk4(P));
    /* This function calls the min_RK4 function
       which in turn calls the RK4 functions where
       I have calculated the 30 dependent variables in time.
       Then minimized the desired number of rate constants Kijk's, in
       this case 3, and returns the updated value of var.
    */


    return 0;

}

//read the initial concentrations of 30 items
void conc(ifstream&infile, valarray<double>(&X))
{
    double xi;

    for (int i = 0; i < 30; ++i)
    {
        infile >> xi;
        X[i] = xi;
    }

}

void source(ifstream&infile, valarray<double>(&I))
{
    double Iij;

    for (int i = 0; i < 30; ++i)
    {
        infile >> Iij;
        I[i] = Iij;
    }

}

void timefile(ifstream&infile, vector<double>(&t), int &n)
{
    double ti;

    infile >> n;

    for (int i = 0; i < n; ++i)
    {
        infile >> ti;
        t.push_back(ti);
    }

}


X0.dat
0 1.0 0.0 ..... 0.0 0.0 0.0


Inf.dat
1 1 0.12 5.71 11.32


time.dat
8 -1 0 1.32 4.32 5.71 11.32 33.45 45.00 60.00



I want to store the inputs from each line of the files in the appropriate variables just before the call to Optim(), and be able to loop through the vector to get the inputs.

Would I be able to do this with your first suggestion? If yes, can you suggest an example that I can then adapt to suit my case?

Thanks again.
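
For what it is worth, one possible shape for this is a struct that holds row k of all four files, filled by reading the files in lockstep. This is only a sketch under assumptions: `Dataset`, `numbers` and `loadDatasets` are hypothetical names, every dataset is assumed to occupy exactly one line with no blank lines in between, and the first entry of a time.dat row is taken to be the number of values in both that row and the matching Xclin.dat row.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <valarray>
#include <vector>

struct Dataset
{
    std::valarray<double> x0;                   // initial values (X0.dat row)
    int i = 0, j = 0;                           // source species and location (Inf.dat row)
    double Iij = 0.0, ton = 0.0, toff = 0.0;    // source value, start and stop times
    std::vector<double> times;                  // sampling times (time.dat row)
    int species = 0;                            // observed species (Xclin.dat row)
    std::vector<double> xclin;                  // measured concentrations
};

// Read every remaining number on a line; stops at the first non-number.
static std::vector<double> numbers(std::istringstream& ss)
{
    std::vector<double> v;
    for (double x; ss >> x; ) v.push_back(x);
    return v;
}

// Read row k of each stream into element k of the result. Stops at the
// first stream that runs out of lines or at the first row that does not parse.
std::vector<Dataset> loadDatasets(std::istream& x0In, std::istream& infIn,
                                  std::istream& timeIn, std::istream& xclinIn)
{
    std::vector<Dataset> all;
    std::string lx, li, lt, lc;
    while (std::getline(x0In, lx) && std::getline(infIn, li) &&
           std::getline(timeIn, lt) && std::getline(xclinIn, lc))
    {
        Dataset d;
        std::istringstream sx(lx), si(li), st(lt), sc(lc);

        const std::vector<double> x = numbers(sx);
        d.x0 = std::valarray<double>(x.data(), x.size());

        int n = 0;
        if (!(si >> d.i >> d.j >> d.Iij >> d.ton >> d.toff)) break;
        if (!(st >> n) || n < 0 || !(sc >> d.species)) break;
        d.times = numbers(st);
        d.xclin = numbers(sc);
        if (d.times.size() != static_cast<std::size_t>(n) ||
            d.xclin.size() != d.times.size()) break;

        all.push_back(d);
    }
    return all;
}
```

With the four `ifstream` objects passed in once before the optimizer is called, the objective can then loop over the returned vector with `for (const Dataset& d : all)` without touching the files again.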


