extract some integer data from a txt file

I have a txt file from which I need to extract some specific integer data. It looks as follows. I need C++ code that extracts only the integers mentioned below.

<Descriptor xsi:type = "EdgeHistogramType"><BinCounts>0 2 5 1 4 0 2 0 1 2 0 5 0 2 3 0 5 0 1 2 0 0 0 0 2 1 0 0 0 4 0 1 0 3 3 0 1 0 1 2 0 1 3 3 5 0 0 2 2 5 1 0 2 3 5 0 0 1 2 4 1 0 1 1 6 0 2 3 3 5 0 1 2 4 6 1 0 2 2 5 </BinCounts>
</Descriptor>

<Descriptor xsi:type = "EdgeHistogramType"><BinCounts>0 2 5 2 2 0 2 0 0 1 0 5 0 2 2 0 5 1 1 2 0 0 0 0 2 0 0 0 1 3 0 1 0 3 4 0 1 0 3 2 0 0 1 2 4 0 1 2 1 5 0 1 1 2 5 1 0 2 3 4 0 2 2 2 5 0 1 4 4 6 0 1 5 3 6 0 0 2 4 6 </BinCounts>
</Descriptor>
.
.
.
There are many entries like these. I just want to extract this part from every line into an array and save it to another text file.
0 2 5 1 4 0 2 0 1 2 0 5 0 2 3 0 5 0 1 2 0 0 0 0 2 1 0 0 0 4 0 1 0 3 3 0 1 0 1 2 0 1 3 3 5 0 0 2 2 5 1 0 2 3 5 0 0 1 2 4 1 0 1 1 6 0 2 3 3 5 0 1 2 4 6 1 0 2 2 5
Please help.
That looks like XML. You can use an existing third-party XML parsing library to extract the data, rather than writing one yourself.
Okay, thank you. I will try that.
Something like this, perhaps:

#include <iostream>
#include <string>
#include <fstream>

std::string extract_counts( std::string line )
{
    static const std::string prefix = "<Descriptor xsi:type = \"EdgeHistogramType\"><BinCounts>" ;
    static const std::string suffix = "</BinCounts></Descriptor>" ;

    const auto pos = line.find(suffix) ;
    if( line.find(prefix) == 0 && pos == line.size() - suffix.size() )
        return { line.begin() + prefix.size(), line.begin() + pos } ;
    else return {} ;
}

void extract_counts( std::istream& input, std::ostream& output )
{
    std::string line ;
    while( std::getline( input, line ) )
    {
        const std::string counts = extract_counts(line) ;
        if( !counts.empty() ) output << counts << '\n' ;
    }
}

int main()
{
    std::ifstream in_file( "input.xml" ) ;
    std::ofstream out_file( "counts.txt" ) ;
    extract_counts( in_file, out_file ) ;
}
Nope, not working. Thank you for the effort, JLBorges. Only an empty txt file is generated.
It looks like the end tag </Descriptor> is on a separate line.

That means in the code above, line 8 (the definition of suffix)
 
    static const std::string suffix = "</BinCounts></Descriptor>" ;


would be better with that part removed:
 
    static const std::string suffix = "</BinCounts>" ;
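
With that change applied, a minimal self-contained sketch of the line-based extraction might look like the following. It assumes, as in the sample, that each data line starts with the `<Descriptor ...><BinCounts>` prefix and that `</BinCounts>` closes on the same line. Trimming the trailing whitespace is an extra step added here for illustration; it is not part of the code posted above.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Returns the text between the prefix and "</BinCounts>", or an empty
// string if the line does not have that shape.
std::string extract_counts( const std::string& line )
{
    static const std::string prefix = "<Descriptor xsi:type = \"EdgeHistogramType\"><BinCounts>" ;
    static const std::string suffix = "</BinCounts>" ;

    // The prefix must be at the very start of the line.
    if( line.compare( 0, prefix.size(), prefix ) != 0 ) return {} ;

    // The closing tag must appear somewhere after the prefix.
    const auto pos = line.find( suffix, prefix.size() ) ;
    if( pos == std::string::npos ) return {} ;

    std::string counts = line.substr( prefix.size(), pos - prefix.size() ) ;

    // Assumption added for this sketch: drop trailing blanks before the tag.
    const auto last = counts.find_last_not_of( " \t\r" ) ;
    return last == std::string::npos ? std::string{} : counts.substr( 0, last + 1 ) ;
}

// Copies the counts of every matching line from input to output, one per line.
void extract_counts( std::istream& input, std::ostream& output )
{
    std::string line ;
    while( std::getline( input, line ) )
    {
        const std::string counts = extract_counts( line ) ;
        if( !counts.empty() ) output << counts << '\n' ;
    }
}
```

Calling `extract_counts( in_file, out_file )` with the same `std::ifstream` and `std::ofstream` as in the `main()` above should then write one line of counts per `<BinCounts>` element, while lines holding only `</Descriptor>` are skipped.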

If I can extract it to a txt file, can you help me improve the code so that each line looks like this?

+1 1:0 2:2 3:5 4:1 5:4 6:0 7:2 8:0 9:1 10:2 11:0 12:5 .........
I need to print +1 at the start of each line, then number the values 1:, 2: and so on.

The input text file contains lines of extracted numbers like this:

0 2 5 1 4 0 2 0 1 2 0 5 0 2 3 0 5 0 1 2 0 0 0 0 2 1 0 0 0 4 0 1 0 3 3 0 1 0 1 2 0 1 3 3 5 0 0 2 2 5 1 0 2 3 5 0 0 1 2 4 1 0 1 1 6 0 2 3 3 5 0 1 2 4 6 1 0 2 2 5

0 2 5 2 2 0 2 0 0 1 0 5 0 2 2 0 5 1 1 2 0 0 0 0 2 0 0 0 1 3 0 1 0 3 4 0 1 0 3 2 0 0 1 2 4 0 1 2 1 5 0 1 1 2 5 1 0 2 3 4 0 2 2 2 5 0 1 4 4 6 0 1 5 3 6 0 0 2 4 6

0 2 5 1 4 0 2 0 1 2 0 5 0 2 3 0 5 0 1 2 0 0 0 0 2 1 0 0 0 4 0 1 0 3 3 0 1 0 1 2 0 1 3 3 5 0 0 2 2 5 1 0 2 3 5 0 0 1 2 4 1 0 1 1 6 0 2 3 3 5 0 1 2 4 6 1 0 2 2 5
.
.
.
The output must again be written to a txt file. Please help. Thank you.
@Chervil thank you for the guidance. I will check it. Thank you.
#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>
#include <cassert>
using namespace std;


//======================================================================


void getWholeFile( string filename, string &buffer )       // Get the whole file, replacing line breaks by a space
{
   ifstream in( filename );   assert( in );                // Assertion failure ==> couldn't open file
   string line;
   while ( getline( in, line ) ) buffer += line + " ";
   in.close();
}


//======================================================================


void extractData( const string& buffer, vector<string> &allData )
{
   string prefix = "<BinCounts>";
   string suffix = "</BinCounts>";
   size_t p = 0, q = 0;

   while( true )
   {
      p = buffer.find( prefix, p );   if ( p == string::npos) break;
      p += prefix.size();
      q = buffer.find( suffix, p );   if ( q == string::npos) break;   // Unterminated tag: stop instead of wrapping around
      allData.push_back( buffer.substr( p, q - p ) );
      p = q + suffix.size();
   }
}


//======================================================================


void output( const vector<string> &allData, const string &filename )
{
   int count;

   ofstream out( filename );
   for ( string line : allData )
   {
      int n = 1;
      out << "+" << 1 << " ";          // Did you mean +1 every time, or a new line number?
      stringstream ss( line );
      while ( ss >> count )
      {
         out << n << ':' << count << " ";
         n++;
      }
      out << endl;
   }
   out.close();
}


//======================================================================


int main()
{
   string buffer = "";
   vector<string> allData;
   string infileName = "input.xml";
   string outfileName = "output.dat";

   getWholeFile( infileName, buffer );
   extractData( buffer, allData );
   output( allData, outfileName );
}


//====================================================================== 

+1 every time. I will check this. Thank you for the support, lastchance.
It is working perfectly. Thank you all for this great help.
Please, one more request. If I need to collect the above data from 2 different text files, put it into arrays and print it as described above, what should I do?
Example:
Data from input 1:
0 2 5 1 4 0 2 0 1 2 0 5 0 2 3 0 5 0 1 2 0 0 0 0 2 1 0 0 0 4 0 1 0 3 3 0 1 0 1 2 0 1 3 3 5 0 0 2 2 5 1 0 2 3 5 0 0 1 2 4 1 0 1 1 6 0 2 3 3 5 0 1 2 4 6 1 0 2 2 5
Data from input 2:
0 2 5 1 4 0 2 0 1 2 0 5 0 2 3 0 5 0 1 2 0 0 0 0 2 1 0 0 0 4 0 1 0 3 3 0 1 0 1 2 0 1 3 3 5 0 0 2 2 5 1 0 2 3 5 0 0 1 2 4 1 0 1 1 6 0 2 3 3 5 0 1 2 4 6 1 0 2 2 5
Input 1 followed by input 2, printed in the format mentioned above.

The only difference from the above is that I now have two inputs that need to be printed as above. Note that the order of input 1 and input 2 cannot be changed.
You can call getWholeFile() again with a second input file and second buffer.
Then call extractData() again with the second buffer and a second vector of strings to hold the other data.
Then write an alternate output routine to take both vectors and combine their output as you want it (which has changed somewhat during this thread).
Another variant to read the file. With this solution, you can even concatenate as many XML files as you like and then read them in one pass.
/**
* @file		ExtractIntegers.cpp
* @author	blongho
* @date		12-Nov-2017
* @brief	A program to read an xml file containing some integers.
* @details	This program reads the xml file, extracts the numbers and saves the numbers
*			in a text file
*/
#include <iostream>
#include <string>
#include <vector>
#include <fstream>

const std::string dataFile = "bigCounts.xml"; /**< File containing data */
const std::string savedFile = "output.txt";	 /**< File where data will be saved in */

/**
* @brief Read file and save number strings into a vector
* @param numVec Vector holding strings of numbers
* @param infile File containing the data
*/
void extractNumbers(std::vector<std::string> &numVec, const std::string &infile);

/**
* @brief Saves data into a file
* @param numbers Vector containing string of numbers
* @param outFile File to contain the data from the vector
*/
void saveNumbers(const std::vector<std::string> &numbers, const std::string &outFile);

int main()
{
	std::vector<std::string> numbers;
	extractNumbers(numbers, dataFile);
	saveNumbers(numbers, savedFile);

	return 0;
}

void extractNumbers(std::vector<std::string> &numVector, const std::string & infile)
{
	std::ifstream readFile(infile);
	const std::string numbers = "0123456789";

	std::string content;

	// Loop on getline itself so that a failed read is never processed
	while (std::getline(readFile, content)){

		auto numPos = content.find_first_of(numbers); 

		if (numPos != std::string::npos){

			std::string numContent = content.substr(numPos); 

			auto endNum = numContent.find_last_of(numbers); 

			if (endNum != std::string::npos){
				std::string onlyNumbers = content.substr(numPos, endNum + 1); 
				numVector.push_back(onlyNumbers);
			}
		}
	}
	readFile.close();
}

void saveNumbers(const std::vector<std::string> &numbers, const std::string & outFile)
{
	std::ofstream save(outFile);
	for (const std::string &str : numbers){	// iterate by reference to avoid copying each string
		save << str << std::endl;
	}
	save.close();
}



@lastchance can you please upload the updated code? I am new to coding. I tried what you said but it doesn't work.
@dsblaster: maybe you would like to upload what you have changed the code to: then we can assist in the learning process.
Is this how you asked me to do it, @lastchance? Please help. My final output should be combined into one txt file.




#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>
#include <cassert>
using namespace std;


//======================================================================


void getWholeFile(string filename, string &buffer)       // Get the whole file, replacing line breaks by a space
{
	ifstream in(filename);   assert(in);                // Assertion failure ==> couldn't open file
	string line;
	while (getline(in, line)) buffer += line + " ";
	in.close();
}


void getWholeFile2(string filename2, string &buffer2)       // Get the whole file, replacing line breaks by a space
{
	ifstream in(filename2);   assert(in);                // Assertion failure ==> couldn't open file
	string line;
	while (getline(in, line)) buffer2 += line + " ";
	in.close();
}
//======================================================================


void extractData(const string& buffer, vector<string> &allData)
{
	string prefix = "<BinCounts>";
	string suffix = "</BinCounts>";
	size_t p = 0, q = 0;

	while (true)
	{
		p = buffer.find(prefix, p);   if (p == string::npos) break;
		p += prefix.size();
		q = buffer.find(suffix, p);
		allData.push_back(buffer.substr(p, q - p));
		p = q + suffix.size();
	}
}

void extractData2(const string& buffer2, vector<string> &allData2)
{
	string prefix2 = "<Values>";
	string suffix2 = "</Values>";
	size_t p = 0, q = 0;

	while (true)
	{
		p = buffer2.find(prefix2, p);   if (p == string::npos) break;
		p += prefix2.size();
		q = buffer2.find(suffix2, p);
		allData2.push_back(buffer2.substr(p, q - p));
		p = q + suffix2.size();
	}
}

//======================================================================


void output(const vector<string> &allData, const string &filename)
{
	int count;

	ofstream out(filename);
	for (string line : allData)
	{
		int n = 1;
		out << "+" << 1 << " ";          // Did you mean +1 every time,
										 //    out << "+" << n << " ";          //           or +n?
		stringstream ss(line);
		while (ss >> count)
		{
			out << n << ':' << count << " ";
			n++;
		}
		out << endl;
	}
	out.close();
}


//======================================================================


int main()
{
	string buffer = "";
	vector<string> allData;
	string infileName = "output.xml";
	string outfileName = "output.txt";
	//string infileName1 = "output2.xml";
	//string outfileName1 = "output2.txt";
	getWholeFile(infileName, buffer);
	//getWholeFile(infileName1, buffer);
	extractData(buffer, allData);
	output(allData, outfileName);
	//output(allData, outfileName1);
}


//====================================================================== 

@dsblaster,
You shouldn't have to make nearly identical copies of functions - the idea of functions is that they are reusable with different parameters.

Please explain exactly and carefully what you are trying to do. You seem to be using prefixes of BinCounts in one file and Values in another. Did you really intend that?

Please give a very precise form of the input and a very precise form of the output. It is currently impossible to deduce what you want.
The code you gave me works for a single txt file input; it extracts the data I need. There is another txt file on which I have to run the same process, except that the prefix and suffix are different:

string prefix2 = "<Values>";
string suffix2 = "</Values>";

The first process produces 80 numbers per array.
What I want is to continue the numbering from there (81) and print to the same txt file.
Inputs from txt1 look like this:
<Descriptor xsi:type = "EdgeHistogramType"><BinCounts>1 3 4 3 5 1 2 0 0 2 0 0 0 0 1 0 1 0 0 2 0 0 0 0 3 0 0 2 1 2 0 1 0 0 3 0 0 0 1 3 0 1 2 3 5 1 1 0 3 5 0 1 0 1 6 0 0 0 0 5 0 1 1 0 7 3 1 3 1 5 0 1 1 3 7 1 1 3 1 6 </BinCounts>
</Descriptor>

Inputs from txt2 look like this:

<MultimediaContent xsi:type = "ImageType"><Image><VisualDescriptor xsi:type = "ColorStructureType" colorQuant = "1"><Values>0 0 0 0 0 0 0 0 17 161 208 25 2 12 3 1 1 9 6 1 3 18 14 0 2 8 23 177 255 77 25 2 </Values>
</VisualDescriptor>

Now I want to extract the data from both files and write it out as one array:
+1 1:1 2:3 3:4 4:4 5:4 6:0 7:2 8:0 9:0 10:2 11:0 12:0 13:0 14:0 15:1 16:0 17:1 18:0 19:0 20:2 21:0 22:0 23:0 24:0 25:3 26:0 27:0 28:2 29:1 30:2 31:0 32:0 33:0 34:0 35:4 36:0 37:0 38:0 39:2 40:2 41:1 42:0 43:1 44:4 45:5 46:0 47:1 48:0 49:3 50:6 51:0 52:0 53:0 54:1 55:6 56:1 57:1 58:1 59:1 60:5 61:2 62:1 63:0 64:1 65:6 66:2 67:1 68:3 69:0 70:5 71:2 72:1 73:0 74:2 75:6 76:1 77:0 78:2 79:1 80:6 81:0 82:0 83:0 84:0 85:0 86:0 87:0 88:0 89:15 90:163 91:220 92:25 93:1 94:12 95:1 96:1 97:1 98:8 99:12 100:1 101:4 102:18 103:15 104:1 105:1 106:7 107:22 108:164 109:255 110:69 111:20 112:0
Entries 1 to 80 are from txt1 and the rest are from txt2.
It would really help if your sample inputs corresponded (numerically) to your sample output. I'm still having to guess what you meant. From the inputs given how do the last few combined counts end up as 255, 69, 20, 0 for example? I think they should end 255, 77, 25, 2.

If I was absolutely sure that file formats weren't going to change yet again then there are much more efficient ways of doing this - e.g. reading two files and outputting in parallel.
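
For illustration only, a rough sketch of that parallel approach could look like this. It assumes each opening tag and its closing tag sit on the same line (as in the samples) and that records appear in the same order in both files. The names `nextRecord`, `writeNumbered` and `combineStreams` are made up for this sketch.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Advance 'in' to the next line containing prefix...suffix and store the text
// between them in 'record'. Returns false when no further record is found.
// Assumption: the opening and closing tags are on the same line.
bool nextRecord( std::istream& in, const std::string& prefix, const std::string& suffix, std::string& record )
{
    std::string line;
    while ( std::getline( in, line ) )
    {
        const auto p = line.find( prefix );
        if ( p == std::string::npos ) continue;      // no opening tag on this line
        const auto start = p + prefix.size();
        const auto q = line.find( suffix, start );
        if ( q == std::string::npos ) continue;      // no closing tag on this line
        record = line.substr( start, q - start );
        return true;
    }
    return false;
}

// Write the integers in 'record' as "n:value " pairs, continuing the numbering from n.
void writeNumbered( std::ostream& out, const std::string& record, int& n )
{
    std::istringstream ss( record );
    int value;
    while ( ss >> value ) out << n++ << ':' << value << ' ';
}

// Read one record from each stream in turn and emit one combined line per pair.
// Stops as soon as either stream runs out of records.
void combineStreams( std::istream& in1, std::istream& in2, std::ostream& out )
{
    std::string rec1, rec2;
    while ( nextRecord( in1, "<BinCounts>", "</BinCounts>", rec1 ) &&
            nextRecord( in2, "<Values>", "</Values>", rec2 ) )
    {
        int n = 1;
        out << "+1 ";
        writeNumbered( out, rec1, n );   // 1..80 for the BinCounts sample
        writeNumbered( out, rec2, n );   // numbering continues from where rec1 stopped
        out << '\n';
    }
}
```

Only one record from each file is held in memory at a time, which is the main attraction; the price is that it breaks if a tag pair is ever split across lines.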

However, I'm taking no chances and, at the expense of quite a lot of computer memory, reading the whole lot into string buffers, extracting into separate vectors and then recombining.

I'm assuming equal number of lines of BinCounts/Values from each file. If not, then you will have to adapt the combination lines and/or output accordingly.
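
If the counts can differ, one possible adaptation (a sketch only) is to pair up entries as far as both vectors go and report how many were left over. Dropping the unpaired entries is a policy chosen here for illustration, and `combinePairs` is a hypothetical helper name.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Join corresponding entries of a and b with a space and append them to 'combined'.
// Only the first min(a.size(), b.size()) entries are paired; the return value is
// the number of unpaired entries left over in the longer vector.
std::size_t combinePairs( const std::vector<std::string>& a,
                          const std::vector<std::string>& b,
                          std::vector<std::string>& combined )
{
    const std::size_t n = std::min( a.size(), b.size() );
    for ( std::size_t i = 0; i < n; i++ ) combined.push_back( a[i] + " " + b[i] );
    return std::max( a.size(), b.size() ) - n;
}
```

A non-zero return value could then be reported as a warning instead of stopping the program with an assertion.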

If you want to change input/output file names then this should be obvious in int main().

#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>
#include <cassert>
using namespace std;


//======================================================================


void getWholeFile( string filename, string &buffer )   //Get the whole file, replacing line breaks by a space
{
   ifstream in( filename );   assert( in );
   string line;
   while ( getline( in, line ) ) buffer += line + " ";
   in.close();
}


//======================================================================


void extractData( const string& buffer, string prefix, string suffix, vector<string> &allData )  // Extract data from buffer
{
   size_t p = 0, q = 0;

   while( true )
   {
      p = buffer.find( prefix, p );   if ( p == string::npos) break;
      p += prefix.size();
      q = buffer.find( suffix, p );   if ( q == string::npos) break;   // Unterminated tag: stop instead of wrapping around
      allData.push_back( buffer.substr( p, q - p ) );
      p = q + suffix.size();
   }
}


//======================================================================


void output( const vector<string> &allData, const string &filename )
{
   ofstream out( filename );
   int lineNum = 0;    // just in case ...
   for ( string line : allData )
   {
      lineNum++;       // still just in case you change your mind ...
      out << "+1 ";
      stringstream ss( line );
      int n = 1, value;
      while ( ss >> value )
      {
         out << n << ':' << value << " ";
         n++;
      }
      out << endl;
   }
   out.close();
}


//======================================================================


int main()
{
   vector<string> allData1, allData2;
   string buffer1, buffer2;
   string infile1 = "input1.xml", prefix1 = "<BinCounts>", suffix1 = "</BinCounts>";
   string infile2 = "input2.xml", prefix2 = "<Values>", suffix2 = "</Values>";
   string outfile = "output.dat";

   getWholeFile( infile1, buffer1 );
   extractData( buffer1, prefix1, suffix1, allData1 );

   getWholeFile( infile2, buffer2 );
   extractData( buffer2, prefix2, suffix2, allData2 );

   // Combine corresponding lines into the strings of allData1 - assumes equal numbers
   assert( allData1.size() == allData2.size() );   // Taking no chances
   for ( size_t i = 0; i < allData1.size(); i++ ) allData1[i] += " " + allData2[i];

   output( allData1, outfile );
}


//====================================================================== 

