Replace RAM with hard-disk storage

Dear all,

I think my question may seem stupid, but I don't have a place to start.

In my simulation, I have a huge storage structure, which is a std::vector<std::vector<std::vector<float> > > _Storage.

Unfortunately, this storage gets very large; it might grow far above a GB. So I would like to move it to a file on the hard disk. I am aware that this will cost me a lot of computation time.

I have two operations: read and write.
During one pass, I read only from _Storage[0], but write (potentially) everywhere (except [0]). Then I do a cyclic shift, [1]-->[0] ... [0]-->[end-1], and start again reading from [0], and so on.

Therefore, I think it is reasonable to keep [0] in RAM, do my processing, and write the results for [1..end-1] directly to disk. But how do I do that?

Let's assume an example:

{[(0 1 2) (3 4 5) (6 7 8)] [(9 10 11) (12 13 14) (15 16 17)] [(18 19 20) (21 22 23) (24 25 26)]}


- With which method do I write the elements one after another into a file, such that ...
- ... I can read and write a single element, e.g. (2,2,2), without having to read/write the others? (See the small offset sketch below for what I mean.)
- It should be very fast (within the limits of reading/writing to disk).
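To make concrete what I mean by "without having to read/write the others" (only my guess at a possible layout, nothing I have implemented yet): if the 3x3x3 example above is written flat to a binary file in row-major order, every element sits at a fixed byte offset, so a single element can be touched without reading the rest.

#include <cstddef>

// byte offset of element (i,j,k) in a flat, row-major file of floats
std::size_t byteOffset(std::size_t i, std::size_t j, std::size_t k)
{
    const std::size_t J = 3, K = 3;                 // dimensions of the 3x3x3 example
    return ((i * J + j) * K + k) * sizeof(float);   // (2,2,2) -> index 26 -> byte 104
}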

Sorry for so much text, but I tried to make the problem clear.

regards,
curator
I fear the problem is not very clear.

In short, you can only hope to get decent performance from file storage if you read and write it sequentially. If you instead read and write at scattered places, out of order, it will be very slow.

Such things need to be tuned very carefully anyway.


Unfortunately, this storage gets very large; it might grow far above a GB

But you usually have memory swap for this case, so part of the RAM is paged out to disk without your asking...

However, it would be better if you explained exactly how much data you are trying to store ("very large" does not say much)... Perhaps some way of optimizing could still be found...
Memory-mapped files.
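For instance, on a POSIX system the idea could look roughly like this (an untested sketch; the file name and the dimensions are placeholders, and for data in the terabyte range you would map smaller windows of the file rather than the whole thing at once):

#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <cstddef>

int main()
{
    const std::size_t I = 100, J = 1000, K = 1000;        // placeholder dimensions
    const std::size_t bytes = I * J * K * sizeof(float);  // size of the flat file

    int fd = open("storage.bin", O_RDWR | O_CREAT, 0644);
    if (fd < 0) return 1;
    if (ftruncate(fd, bytes) != 0) { close(fd); return 1; } // reserve the file size

    // Map the file into the address space; the OS pages data in and out on demand.
    void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { close(fd); return 1; }
    float* data = static_cast<float*>(p);

    // Element (i,j,k) sits at a fixed linear index, e.g. (2,2,2):
    std::size_t idx = (2 * J + 2) * K + 2;
    data[idx] = 3.14f;           // the OS writes the dirty page back to disk
    float v = data[idx];         // read it back without touching other elements
    (void)v;

    munmap(p, bytes);
    close(fd);
}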
Hi and thanks for your replies.

rodiongork, the amount of memory I need is determined by an accuracy parameter. Currently, I think the worst case is somewhere around 5 TB; more common cases are in the region of approximately 500 GB.

I'll try to write a bit of code that explains my procedure. It is not tested, so please forgive some syntax errors; the original code is far too complex to post:
std::vector<std::vector<std::vector<float> > > _Storage;
_Storage.resize(1000);
for (unsigned i = 0; i < _Storage.size(); ++i)
{
     _Storage[i].resize(100000);
     for (unsigned j = 0; j < _Storage[i].size(); ++j)
     {
            _Storage[i][j].resize(100000);
            for (unsigned k = 0; k < _Storage[i][j].size(); ++k)
                 _Storage[i][j][k] = (float)rand() / 1000.0f;
     }
}
unsigned int _currentI = 0;
while (!someFancyAbortCriterion())
{
     for (unsigned j = 0; j < _Storage[_currentI].size(); ++j)
     {
          for (unsigned k = 0; k < _Storage[_currentI][j].size(); ++k)
          {
               float c = _Storage[_currentI][j][k];
               _Storage[_currentI][j][k] = 0.0f;
               // pick a random target element
               unsigned int _randI = rand() % 999;
               if (_randI == _currentI) _randI++; // all but the current one
               unsigned int _randJ = rand() % 100000;
               unsigned int _randK = rand() % 100000;
               // store by chance
               if (rand() % 2 == 0)
                    _Storage[_randI][_randJ][_randK] += c;
          }
     }
     _currentI = (_currentI + 1) % 1000;
}


Well, my idea would be something like this:

for (unsigned i = 0; i < 1000; ++i)
{
     for (unsigned j = 0; j < 100000; ++j)
     {
            for (unsigned k = 0; k < 100000; ++k)
                saveToDisk(i, j, k, (float)rand() / 1000.0f);
     }
}
unsigned int _currentI = 0;
std::vector<std::vector<float> > _OneLineStorage = readOneLineFromDisk(_currentI);
while (!someFancyAbortCriterion())
{
     for (unsigned j = 0; j < _OneLineStorage.size(); ++j)
     {
          for (unsigned k = 0; k < _OneLineStorage[j].size(); ++k)
          {
               float c = _OneLineStorage[j][k];
               // pick a random target element
               unsigned int _randI = rand() % 999;
               if (_randI == _currentI) _randI++; // all but the current one
               unsigned int _randJ = rand() % 100000;
               unsigned int _randK = rand() % 100000;
               // store by chance
               if (rand() % 2 == 0)
                    addToDisk(_randI, _randJ, _randK, c);
          }
     }
     _currentI = (_currentI + 1) % 1000;
     _OneLineStorage = readOneLineFromDisk(_currentI);
}


or

for (unsigned i = 0; i < 1000; ++i)
{
     for (unsigned j = 0; j < 100000; ++j)
     {
            for (unsigned k = 0; k < 100000; ++k)
                saveToDisk(i, j, k, (float)rand() / 1000.0f);
     }
}
unsigned int _currentI = 0;
while (!someFancyAbortCriterion())
{
     for (unsigned j = 0; j < 100000; ++j)
     {
          for (unsigned k = 0; k < 100000; ++k)
          {
               float c = readFromDisk(_currentI, j, k);
               // pick a random target element
               unsigned int _randI = rand() % 999;
               if (_randI == _currentI) _randI++; // all but the current one
               unsigned int _randJ = rand() % 100000;
               unsigned int _randK = rand() % 100000;
               // store by chance
               if (rand() % 2 == 0)
                    addToDisk(_randI, _randJ, _randK, c);
          }
     }
     _currentI = (_currentI + 1) % 1000;
}


But what would these functions look like?
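Maybe something along these lines, keeping everything in one flat binary file of floats? This is completely untested, just my guess at how the helpers could look: the file name is made up, J and K are the inner dimensions from my pseudocode, the file would have to be created once with its full size before this works, and 64-bit file offsets are needed for sizes like mine.

#include <fstream>
#include <vector>
#include <cstddef>

const std::size_t J = 100000, K = 100000;   // inner dimensions from the pseudocode;
                                            // one [i]-slice must fit in RAM for readOneLineFromDisk
std::fstream file("storage.bin", std::ios::in | std::ios::out | std::ios::binary);

// byte offset of element (i,j,k) in the flat, row-major file
std::size_t offsetOf(std::size_t i, std::size_t j, std::size_t k)
{
    return ((i * J + j) * K + k) * sizeof(float);
}

void saveToDisk(std::size_t i, std::size_t j, std::size_t k, float v)
{
    file.seekp(offsetOf(i, j, k));
    file.write(reinterpret_cast<const char*>(&v), sizeof v);
}

float readFromDisk(std::size_t i, std::size_t j, std::size_t k)
{
    float v = 0.0f;
    file.seekg(offsetOf(i, j, k));
    file.read(reinterpret_cast<char*>(&v), sizeof v);
    return v;
}

void addToDisk(std::size_t i, std::size_t j, std::size_t k, float c)
{
    float v = readFromDisk(i, j, k);   // read-modify-write of a single record
    saveToDisk(i, j, k, v + c);
}

std::vector<std::vector<float> > readOneLineFromDisk(std::size_t i)
{
    std::vector<std::vector<float> > line(J, std::vector<float>(K));
    file.seekg(offsetOf(i, 0, 0));     // slice [i] is contiguous on disk
    for (std::size_t j = 0; j < J; ++j)
        file.read(reinterpret_cast<char*>(&line[j][0]), K * sizeof(float));
    return line;
}

Would all the seeking around in addToDisk be hopelessly slow, or is this roughly the right direction?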
You realize 5 TB is much bigger than most people's hard disks? Even 500 GB is probably more space than most people have free on their HD... What on Earth requires this level of accuracy?
It is not important that everyone can run this simulation. It is more than sufficient that I can run it on my machine, where I can make 5 TB available. Let's say it has to be done, and it is easier to get 5 TB of hard disk than 5 TB of RAM.
I will rephrase my question. What on Earth are you simulating? (not trying to derail the thread here, just curious).
Have you considered using an SQL database?
Not really. Where do you think the advantages are? Performance or programming effort? I haven't worked with SQL yet, but that might be changed :-)
Well, SQL databases have to cope with situations where there are humongous tables on disk, and they can update single records.
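For illustration only (SQLite here is just one possible engine, and a one-row-per-float table would carry a lot of overhead for terabytes of data): each cell becomes a row keyed by (i,j,k), and the primary key lets the engine find and update a single record without touching the rest of the table.

#include <sqlite3.h>

int main()
{
    sqlite3* db = nullptr;
    if (sqlite3_open("storage.db", &db) != SQLITE_OK) return 1;

    // One record per cell, keyed by its indices.
    sqlite3_exec(db,
        "CREATE TABLE IF NOT EXISTS cells ("
        " i INTEGER, j INTEGER, k INTEGER, value REAL,"
        " PRIMARY KEY (i, j, k));",
        nullptr, nullptr, nullptr);

    // Insert or update a single element, e.g. (2,2,2).
    sqlite3_stmt* stmt = nullptr;
    sqlite3_prepare_v2(db,
        "INSERT OR REPLACE INTO cells (i, j, k, value) VALUES (?, ?, ?, ?);",
        -1, &stmt, nullptr);
    sqlite3_bind_int(stmt, 1, 2);
    sqlite3_bind_int(stmt, 2, 2);
    sqlite3_bind_int(stmt, 3, 2);
    sqlite3_bind_double(stmt, 4, 3.14);
    sqlite3_step(stmt);
    sqlite3_finalize(stmt);

    sqlite3_close(db);
}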