Removing NUL values in text file

Hello,

I have a measuring board that writes a .txt file in the following format:

board number;type;NUL;channel;measured value;date and time
DS207;5000007;NUL;0;20251;11.07.2013 12:30:02 MESZ
DS207;5000007;NUL;1;10159;11.07.2013 12:30:02 MESZ
DS207;5000007;NUL;4;27.18;11.07.2013 12:30:02 MESZ
DS207;5000007;NUL;0;20233;11.07.2013 12:35:02 MESZ
DS207;5000007;NUL;1;10149;11.07.2013 12:35:02 MESZ
DS207;5000007;NUL;4;27.31;11.07.2013 12:35:02 MESZ
DS207;5000007;NUL;0;20256;11.07.2013 12:40:02 MESZ
...

I would like to extract and analyse the data, but I cannot access anything that comes after the NUL entry, possibly because NUL normally marks the end of a string. Is there a way to remove the NUL entries from this text file?

Thanks a lot!

Martin
I presume you wrote the code that generated the txt file? As such I would suggest you check for the condition that generated the NUL entries and don't allow it to write anything to txt file.
If you cannot change the code, it does look like the record size is fixed. So you might be able to open the file as binary and then read fixed size blocks in using ifstream::read()
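
A minimal sketch of the binary-mode idea, assuming the NUL column really is a single '\0' byte. Instead of reading fixed-size blocks with ifstream::read(), this variant reads the whole file at once and removes the bytes with the erase/remove idiom. The names strip_nul and clean_file are made up for illustration.

```cpp
#include <algorithm>
#include <fstream>
#include <iterator>
#include <string>

// Returns a copy of `data` with every NUL byte ('\0') removed.
std::string strip_nul(std::string data)
{
    data.erase(std::remove(data.begin(), data.end(), '\0'), data.end());
    return data;
}

// Hypothetical helper: reads `in_path` in binary mode, strips the NUL
// bytes and writes the result to `out_path`.
// Returns false if a file cannot be opened or written.
bool clean_file(const std::string& in_path, const std::string& out_path)
{
    std::ifstream in(in_path, std::ios::binary);
    if (!in)
        return false;

    // Binary mode keeps every byte as it is, including '\0' and CR LF.
    std::string data((std::istreambuf_iterator<char>(in)),
                     std::istreambuf_iterator<char>());

    std::ofstream out(out_path, std::ios::binary);
    if (!out)
        return false;

    const std::string cleaned = strip_nul(data);
    out.write(cleaned.data(), static_cast<std::streamsize>(cleaned.size()));
    return static_cast<bool>(out);
}
```

Note that this leaves an empty field behind (";;") where the NUL used to be, so the number of columns stays the same.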

Andy


Thank you very much for your replies so far.

I presume you wrote the code that generated the txt file? As such I would suggest you check for the condition that generated the NUL entries and don't allow it to write anything to txt file.


Unfortunately I do not have access to the code that generated the txt file, and I already have hundreds of files formatted this way.
I will try to open the file as binary as Andy suggested.
If you have any other ideas, please post them.

Martin

Is there a method to remove the NUL entries in this text file?


You could write a small utility program that just opens each file, reads each line and parses the fields delimited by the semi-colon and effectively filter out the NUL; fields. Then just output to a new file.
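
A rough sketch of such a filter, assuming each line is read with std::getline into a std::string (which keeps embedded '\0' characters) and that the unwanted column consists of exactly one NUL character. The function names split_without_nul and join_fields are invented for this example.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Splits `line` at ';' and drops every field that consists of a single
// NUL character.
std::vector<std::string> split_without_nul(const std::string& line)
{
    std::vector<std::string> fields;
    std::istringstream stream(line);
    std::string field;
    while (std::getline(stream, field, ';'))
    {
        if (field.size() == 1 && field[0] == '\0')
            continue;  // skip the bogus NUL column
        fields.push_back(field);
    }
    return fields;
}

// Joins the fields with ';' again so the line can be written to a new file.
std::string join_fields(const std::vector<std::string>& fields)
{
    std::string joined;
    for (std::size_t i = 0; i < fields.size(); ++i)
    {
        if (i != 0)
            joined += ';';
        joined += fields[i];
    }
    return joined;
}
```

A caller would read each line from a std::ifstream, pass it through both functions and write the result to a std::ofstream.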


That's what I tried but if I use the following command

file.getline(row, 1024);

the variable "row" only contains the string "DS207;5000007;" and nothing else.
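
A possible explanation, sketched under the assumption that the file contains a real '\0' byte: istream::getline does store the whole line in the buffer, but anything that treats the buffer as a C string (strlen, operator<<) stops at the first '\0'. The hypothetical helper below compares the length a C string function reports with the number of characters that were actually stored.

```cpp
#include <cstddef>
#include <cstring>
#include <sstream>
#include <string>

// Lengths of one line read into a char buffer.
struct LineLengths
{
    std::size_t visible;  // length as seen by strlen() or operator<<
    std::size_t stored;   // characters really stored before the terminator
};

// Reads the first line of `text` with istream::getline and reports both
// lengths. Assumes the line is terminated by '\n' and is shorter than
// 1024 characters.
LineLengths measure_first_line(const std::string& text)
{
    std::istringstream in(text);
    char row[1024] = {0};
    in.getline(row, 1024);

    LineLengths result;
    result.visible = std::strlen(row);

    // gcount() also counts the extracted '\n', hence the -1.
    const std::streamsize extracted = in.gcount();
    result.stored = extracted > 0 ? static_cast<std::size_t>(extracted - 1) : 0;
    return result;
}
```

If stored is larger than visible, the rest of the line is in the buffer and only hidden behind an embedded NUL.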

Is this exactly what you have on an example line in your txt files:

DS207;5000007;NUL;0;20251;11.07.2013 12:30:02 MESZ

or do you have:

DS207;5000007;\0;0;20251;11.07.2013 12:30:02 MESZ

i.e. a null terminator character?
When I open the file with Notepad++ it looks like this:

DS207;5000007;NUL;0;20251;11.07.2013 12:30:02 MESZ CR LF

Opening the file in binary mode does not yield better results.

I am not very experienced in working with text files so I don't really know how to proceed.
Unfortunately I have 80 of the measuring boards I mentioned above. Their software has a small bug that makes it write this column containing NUL values.



sed "s/\0//g" < input > output
If you open one of the files in a binary editor, what do you see for the NUL; field? Or you could PM me to arrange to send me an example file.
closed account (z05DSL3A)
What do you get if you try something like:
#include <algorithm>
#include <fstream>
#include <iostream>
#include <iterator>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

int main()
{
    std::ifstream infile("c:\\test\\test.txt");
    std::string line;
    const std::regex reg(";");   // field separator
    const std::string fmt(" ");  // what each separator is replaced with
    if (infile.is_open())
    {
        // std::getline into a std::string keeps embedded '\0' characters
        while (std::getline(infile, line))
        {
            // turn the separators into spaces so the stream can split on them
            line = std::regex_replace(line, reg, fmt);
            std::stringstream strstr(line);

            // split on whitespace (this also splits the date/time field)
            std::istream_iterator<std::string> it(strstr);
            std::istream_iterator<std::string> end;
            std::vector<std::string> results(it, end);

            // print the fields separated by a space
            std::ostream_iterator<std::string> oit(std::cout, " ");
            std::copy(results.begin(), results.end(), oit);

            std::cout << std::endl;
        }
    }
    return 0;
}


Edit:

or go real simple:
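#include <algorithm>
#include <fstream>
#include <iostream>

int main()
{
    std::ifstream infile("c:\\test\\test.txt");
    char row[1024];
    if (infile.is_open())
    {
        while (infile.getline(row, 1024))
        {
            // gcount() includes the extracted newline, so stopping one
            // short keeps the search inside what was actually read
            std::streamsize len = infile.gcount();
            char * end = row + (len > 0 ? len - 1 : 0);

            // replace the first embedded NUL, if there is one
            char * ptr = std::find(row, end, '\0');
            if (ptr != end)
                *ptr = ' ';
            std::cout << row << std::endl;
        }
    }
    return 0;
}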
NB: very crude code to show the idea.
You could use std::replace ??

This code works with char buffers and uses istream::gcount to find out how many chars were extracted by the last read. The count includes the newline, whose place in the buffer is taken by the terminating null, hence the -1 adjustment.

If you use std::replace with std::string you don't have to worry about gcount, you just use begin() and end() as usual.

Andy

#include <iostream>
#include <iomanip>   // for boolalpha
#include <fstream>
#include <string>
#include <algorithm> // for replace

void test_with_char_buff(bool use_replace)
{
    std::cout << "test_with_char_buff (use_replace = "
              << use_replace << ")\n\n";

    std::ifstream infile("c:\\test\\test.txt");
    if (!infile.is_open())
    {
        std::cerr << "error : open file failed\n";
        return;
    }

    const size_t buffer_size = 1024;
    char buffer[buffer_size] = {0};
    while (infile.getline(buffer, buffer_size))
    {
        if (use_replace)
        {
            size_t count = static_cast<size_t>(infile.gcount());
            if (0 < count)
            {
                std::replace(buffer, (buffer + count - 1), '\0', '*');
                // -1 as don't want to replace final null
            }
        }

        std::cout << buffer << "\n";
    }

    std::cout << "\n";
}

int main () 
{
    std::cout << std::boolalpha;

    test_with_char_buff(false);
    test_with_char_buff(true);

    return 0; 
}


test_with_char_buff (use_replace = false)

DS207;5000007;
DS207;5000007;
DS207;5000007;
DS207;5000007;
DS207;5000007;
DS207;5000007;
DS207;5000007;

test_with_char_buff (use_replace = true)

DS207;5000007;*;0;20251;11.07.2013 12:30:02 MESZ
DS207;5000007;*;1;10159;11.07.2013 12:30:02 MESZ
DS207;5000007;*;4;27.18;11.07.2013 12:30:02 MESZ
DS207;5000007;*;0;20233;11.07.2013 12:35:02 MESZ
DS207;5000007;*;1;10149;11.07.2013 12:35:02 MESZ
DS207;5000007;*;4;27.31;11.07.2013 12:35:02 MESZ
DS207;5000007;*;0;20256;11.07.2013 12:40:02 MESZ
Hello 'Grey Wolf' and Andy,

your solutions both work very well!
Thanks a lot all of you for your help.
This saves a lot of time for me as I will not have to manually format several hundred text files.

Thank you very much again.

Martin