How do I transfer contents of text file to a database?

Hi,

I am using C++ Builder 6 and am attempting to read the contents of a text file
and output it to a file that can be read by a standard database program. The
goal is to transfer several old DOS database files (PFS:Professional File) to
a modern Windows database to which we can continue to add records. The text file
was generated by the 'print to file' option and can be viewed by a text editor,
such as Notepad. Each 'record' contains the field name as well as the data. My
idea is to do a search for the position of each field name, then copy the number
of characters equal to the space between the current field name and the next
field name, and create a database file one record at a time using SQL statements
as the original file is read. I think it can be done but I don't know how to
approach the problem. Any suggestions will be appreciated.
The text format looks something like this:

FIELD1: data
FIELD2: data
FIELD3: data




FIELD1: data
FIELD2: data
FIELD3: data


etc...
Do I need to pose my question differently to get a reply?
I don't know how well C++ Builder 6 conforms to current c++ standards. I was going to suggest using std::string and std::getline in order to read the data from the file. A very simple example of reading a file line by line:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
#include <iostream>
#include <fstream>
#include <string>

int main()
{
    std::ifstream data("file.txt");
    if (!data)
    { 
        std::cout << "could not open file" << std::endl;
        return 1;
    }    
    
    std::string str;
    while (std::getline(data, str))
    {
            std::cout << str << std::endl;
    }
   
}

You could then use built-in functions such as find() to look for the delimiting ": " and substr() to extract the field name and data parts of each line.
perhaps like this:
1
2
3
4
5
6
7
8
const std::string delim = ": ";

    size_t pos = str.find(delim);
    if (pos != std::string::npos)
    {
        std::string name = str.substr(0, pos);
        std::string data = str.substr(pos+delim.size());
    }


The full details will depend on what the input file contains, and what sort of output you wish to generate. For example I looked at a solution which took an input looking like this:
FIELD1: dataA
FIELD2: dataB
FIELD3: dataC




FIELD1: dataD
FIELD2: dataE
FIELD3: dataF


FIELD1: dataG
FIELD2: dataH
FIELD3: dataI


... and generated output which looked like this:
INSERT INTO CPHoff01 (FIELD1,FIELD2,FIELD3) VALUES ('dataA','dataB','dataC');
INSERT INTO CPHoff01 (FIELD1,FIELD2,FIELD3) VALUES ('dataD','dataE','dataF');
INSERT INTO CPHoff01 (FIELD1,FIELD2,FIELD3) VALUES ('dataG','dataH','dataI');
Thanks for the response, Chervil. I will try that route as well as look for a newer compiler.
code::blocks is often recommended as a good IDE, which comes with an up-to-date compiler. There's also the Orwell DevC++ (important to go for the Orwell version, others are too old) which comes with the same compiler.

If you need more help feel free to ask for clarification.
I am back on this project and am having trouble figuring out another issue. (I had downloaded and installed code::blocks just before I read your last reply.) The output is close to what I am looking for but some of the fields (in the original database file) are blank, and those fields are skipped when the program below encounters them. I need to be able to output those blank fields in order to keep all the fields properly aligned. The file I am experimenting on is small - 37 records with three fields per record, so I will know when I get the output I want. The file that really matters, and is appended often, contains five fields (some of which are also blank), and is quickly approaching 5000 records. That brings up another question: will a file of that size (2.41KB) be a problem for what I am trying to do?
Thanks again for your assistance.
Here is the code:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
#include <iostream>
#include <fstream>
#include <string>



int main()
{
    int rec_count = 0;
    std::ifstream data("C:\\PFS\\Experimental\\AB2");
    if (!data)
    {
        std::cout << "could not open file" << std::endl;
        return 1;
    }

    std::string str;
    while (std::getline(data, str))
    {

    const std::string delim = ": ";

    size_t pos = str.find(delim);

    if (pos != std::string::npos)
    {
        ++rec_count;
        std::cout << "Record# " << rec_count << std::endl; //actually counts lines, not records

        std::string name = str.substr(0, pos);
        std::string data = str.substr(pos+delim.size());

        std::cout << name << ": ";
        std::cout << data << std::endl;

    }
    }

}
It might be helpful if you could share an example of the actual file data (possibly edited to avoid privacy concerns, if necessary). Not the entire 5000 records here, just a few to show typical data as well as the data which gives problems.

As for the file size, it certainly isn't a problem for C++ to handle, and I don't expect it to cause problems later on but you'd have to try that and see. Initially you might experiment with a smaller sample to prove the concept will work.

As for your code, I've not really studied it in detail yet, as it really needs to be assessed together with some actual data.
Here are the first few records of the file I am using to prove the concept:

                       ALLEN-BRADLEY CROSS REFERENCE
 
AB PART NUMBER: 741214-27
 
PART NUMBER: 150MF 15V 10% A E
 
REMARKS: AXIAL ELECTROLYTIC CAPACITOR.
 
 
 
 
                       ALLEN-BRADLEY CROSS REFERENCE
 
AB PART NUMBER: 625874-38
 
PART NUMBER: 1.5MF 35V R T
 
REMARKS: RADIAL TANTALUM
 
 
 
 
                     ALLEN-BRADLEY CROSS REFERENCE
 
AB PART NUMBER: 626629-01
 
PART NUMBER:
 
REMARKS: 0 OHM 1/4W JUMPER.
 
 
 
 
                       ALLEN-BRADLEY CROSS REFERENCE
 
AB PART NUMBER: 625874-22
 
PART NUMBER: 4.7MD 25V R T
 
REMARKS: RADIAL TANTALUM



I deleted a lot of white space between each record so the post wouldn't be huge. Looking at the file with a hex editor, all that white space is lots of CR, LF and <space> (0D 0A 20). The header 'ALLEN-BRADLEY CROSS...' is not needed and has not caused a problem. As you can see, the third record contains a blank field, and that field is skipped on the output.
Thank you for that response - it may take a while longer before I digest what it means now.

edit As a first very quick and simple response, changing the delimiter from ": " to ":" (without the space) might be sufficient to get past the immediate problem.
 
     const std::string delim = ":";


Though I should add that now I've seen the data, a lot more ideas arise, about how to group the lines into records. For example if the heading line "ALLEN-BRADLEY CROSS REFERENCE" is always present before each record, it could be used to determine the correct grouping.
Last edited on
Thanks for the quick response, Chervil. I will try eliminating the space. Also, in the large file that I am more interested in converting, there is no heading like in this file.
Ok, but is there any clear way to identify the end of one record, and the start of the next? Again, a data sample would be useful.

For what it's worth, here's a quick attempt to group the data as it is read in from the file. Datarow will hold one line of data. The vector (basically a dynamically allocated array) will hold all the data for one record.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
#include <iostream>
#include <iomanip>
#include <fstream>
#include <string>
#include <vector>

struct Datarow {
    std::string name;
    std::string data;
};

void process(std::vector<Datarow> & rows, std::ostream & os=std::cout);

int main()
{
    std::ifstream data("input_data.txt");
    if (!data)
    {
        std::cout << "could not open file" << std::endl;
        return 1;
    }

    std::vector<Datarow> rows;
    
    const std::string header("ALLEN-BRADLEY CROSS REFERENCE");
    const std::string delim = ":";
        
    std::string str;
    while (std::getline(data, str))
    {
        // First check for header
        size_t pos = str.find(header);
        if (pos != std::string::npos)
        {        
            if (rows.size())
                process(rows); 
            continue; // finished with this line, 
        }
        
        pos = str.find(delim);
        if (pos != std::string::npos)
        {
            Datarow row;
            row.name = str.substr(0, pos);
            row.data = str.substr(pos+delim.size());
            rows.push_back(row);
        }
    }

    if (rows.size())
        process(rows); 

    return 0;
}


void process(std::vector<Datarow> & rows, std::ostream & os)
{
    static int rec_count = 0;
    os << "Record# " << ++rec_count << std::endl;
    
    for (auto row : rows)
    {
        os << std::left << std::setw(20) << row.name
           << " " << row.data << '\n';
    }
    os << "--------------------------------------\n";
    rows.clear();
}


Function process() by default displays the output on the screen, but can easily send the output to a file instead. Example:
 
    std::ofstream fout("output.txt");

At lines 36 and 51:
 
    process(rows, fout);


Example output:
Record# 1
AB PART NUMBER        741214-27
PART NUMBER           150MF 15V 10% A E
REMARKS               AXIAL ELECTROLYTIC CAPACITOR.
--------------------------------------
Record# 2
AB PART NUMBER        625874-38
PART NUMBER           1.5MF 35V R T
REMARKS               RADIAL TANTALUM
--------------------------------------
Record# 3
AB PART NUMBER        626629-01
PART NUMBER          
REMARKS               0 OHM 1/4W JUMPER.
--------------------------------------
Record# 4
AB PART NUMBER        625874-22
PART NUMBER           4.7MD 25V R T
REMARKS               RADIAL TANTALUM
--------------------------------------

Last edited on
You make this look easy. Eliminating the space in the delimiter did give me all of the fields in the output, so now I need to read up on what I don't understand in your latest post. Since I am home now and I haven't put this project on a thumbdrive yet, the rest will have to wait until tomorrow. Have a good evening.
Another idea for the output, put in a comma-separated format, save it as a .csv and then open it in a spreadsheet (e.g. excel).
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
void process(std::vector<Datarow> & rows, std::ostream & os)
{  
    static bool firsttime = true;
    if (firsttime)
    {
        // Outout the heading line once only
        firsttime = false;
        std::string prefix = "";
        for (auto row : rows)
        {
            os << prefix << '\"' << row.name << '\"';
            prefix = ",";
        }        
        os << '\n';    
    }
    
    // output the data values
    std::string prefix = "";
    for (auto row : rows)
    {
        os << prefix << '\"' << row.data << '\"';
        prefix = ",";
    }        
    os << '\n';    
    
    // Empty the vector ready for next use
    rows.clear();
}

Output, I put everything in quotes in case any data itself might contain a comma.
"AB PART NUMBER","PART NUMBER","REMARKS"
"741214-27","150MF 15V 10% A E","AXIAL ELECTROLYTIC CAPACITOR."
"625874-38","1.5MF 35V R T","RADIAL TANTALUM"
"626629-01","","0 OHM 1/4W JUMPER."
"625874-22","4.7MD 25V R T","RADIAL TANTALUM"
Topic archived. No new replies allowed.