Count first digit on each line of a text file

My program takes a filename and opens the file. For each line of the .txt file, I need to find the first non-zero digit, skipping whitespace, letters, zeros, and special characters. My text file could look like this:

   1435                 //1, next line
    0                   //skip, next line
                        //skip, next line
    (*Hi 245*) 2        //skip the comment, count the 2 after it, next line
    345 556             //3 and count, next line
    4                   //4, next line

My desired output would go all the way up to nine, but I condensed it:

 Digit Count Frequency
    1:      1     .25
    2:      1     .25
    3:      1     .25
    4:      1     .25

My code is as follows:

#include <iostream>
#include <fstream>
#include <string>
using namespace std;

int main() {

    int digit = 1; //used for cout later on
    int array[9] = {0}; //will store how many times each digit 1-9 is used
    string filename;
    //cout for getting user path
    //the compiler parses string literals differently so use a double backslash or a forward slash
    cout << "Enter the path of the data file, be sure to include extension." << endl;
    cout << "You can use either of the following:" << endl;
    cout << "A forwardslash or double backslash to separate each directory." << endl;
    getline(cin,filename);

    ifstream input_file(filename.c_str());

    if (input_file.is_open()) { //if file is open
        cout << "open" << endl; //just a coding check to make sure it works ignore
        string line;
        while (getline(input_file, line)) {
            //how could I write if char go on to next
            //or if "(" skip until next ")" then if int count or skip
            //ex if 1 then array[0] = array[0] + 1 but not limited to this
        }
    }
    else {
        cout << "Error opening file check path or file extension" << endl;
    }
    return 0;
}


In this file format, `(*` signals the beginning of a comment, so everything from there to a matching `*)` should be ignored (even if it contains a digit). For example, given input of `(*Hi 245*) 6`, the `6` should be counted, not the `2`.

How do I iterate over the file only finding the first integer and counting it, while ignoring comments?
I condensed it only to make it easier to read and to show what I am trying to do. If the example works for digits up to 4, I can easily change the numbers to cover my real data set of 1-9. The main problem is either skipping over the comments or removing them, and then getting the first number on each line and counting it.
A comment is anything starting with (* and ending with *).
#include <iostream>
#include <fstream>
#include <string>
using namespace std;

int main() {

    int digit = 1; //used for cout later on
    int array[9] = {0}; //will store how many times each digit 1-9 is used
    string filename;
    //cout for getting user path
    //the compiler parses string literals differently so use a double backslash or a forward slash
    cout << "Enter the path of the data file, be sure to include extension." << endl;
    cout << "You can use either of the following:" << endl;
    cout << "A forwardslash or double backslash to separate each directory." << endl;
    getline(cin,filename);

    ifstream input_file(filename.c_str());

    if (input_file.is_open()) { //if file is open
        cout << "open" << endl; //just a coding check to make sure it works ignore
        string fileContents;
        string temp;
        while (getline(input_file, temp)) { //stops when no more lines can be read
            fileContents.append(temp);
        }
    }
    else {
        cout << "Error opening file check path or file extension" << endl;
    }
    return 0;
}


I believe these are the correct changes, as you mentioned above. Also, is there an easier way to format code here? It shifts right every time.
You need to implement a token stream that detects (* and *) and "eats" the data between.
1435
14350
14350
14350(*Hi 245*) 2
14350(*Hi 245*) 2345 556
14350(*Hi 245*) 2345 5564


This was my output. Now I have a string containing the whole file. It's starting to make sense and I'm seeing it differently. I have some thoughts on the next step, but I am not sure which is the most efficient or how to loop through it. Should I use a vector, or can I just loop through the string and pick things out with isdigit()? Or, if I separate the lines with whitespace, can't I get the number after each space as well?
This is my loop but I have no idea how to fix this error:
error: no matching function for call to 'isdigit()'
Also what argument do I pass into isdigit in this case?

Above I added:
int digitCount[10] = {0};

Loop:

for (int i = 0; i < fileContents.length(); i++) {
    if (isdigit()) {
        digitCount[fileContents[i] - '0']++;
    }
    else {

    }
}
I did include cctype, but the issue was this:

for (int i = 0; i < fileContents.length(); i++) {
    if (isdigit(fileContents[i])) {
        digitCount[fileContents[i] - '0']++;
    }
}
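On the question of what to pass to isdigit: it takes the character to test, as in the loop above. A slightly more defensive version is sketched below. The function name countDigits is invented for this example. The cast to unsigned char is there because std::isdigit is only defined for values representable as unsigned char (or EOF), so a plain char with a negative value is not safe to pass directly.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: tallies every decimal digit found in `text`,
// so counts[d] ends up holding how many times digit d appeared.
void countDigits(const std::string& text, int (&counts)[10]) {
    for (char c : text) {
        // Cast first: std::isdigit expects an unsigned char value (or EOF).
        if (std::isdigit(static_cast<unsigned char>(c))) {
            ++counts[c - '0'];   // '0'..'9' are contiguous, so this maps to 0..9
        }
    }
}
```

Called with a zero-initialised int counts[10], this counts all digits, including those inside comments, so it only covers the isdigit part of the problem.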
Here is my current code and output. The output is counting all the numbers which is progress!

#include <iostream>
#include <fstream>
#include <string>
#include <cctype>
using namespace std;

int main() {

    int digit = 1;
    int digitCount[10] = {0};

    string filename;
    //cout for getting user path
    //the compiler parses string literals differently so use a double backslash or a forward slash
    cout << "Enter the path of the data file, be sure to include extension." << endl;
    cout << "You can use either of the following:" << endl;
    cout << "A forwardslash or double backslash to separate each directory." << endl;
    getline(cin,filename);

    ifstream input_file(filename.c_str());

    if (input_file.is_open()) { //if file is open
        string fileContents;
        string temp;
        while (getline(input_file, temp)) { //stops when no more lines can be read
            fileContents.append(temp);
        }
        cout << fileContents << endl;
        for (size_t i = 0; i < fileContents.length(); i++) {
            if (isdigit(fileContents[i])) {
                digitCount[fileContents[i] - '0']++;
            }
        }
    }
    else {
        cout << "Error opening file, check path or file extension." << endl;
    }
    cout << "Digit  Count  Frequency" << endl; //print column titles
    for (int p = 0; p < 9; p++) {
        cout << "  " << digit << "      " << digitCount[p + 1] << endl;
        digit++;
    }
    return 0;
}


14350(*Hi 245*) 2345 5564
Digit  Count  Frequency
  1      1
  2      2
  3      2
  4      4
  5      5
  6      1
  7      0
  8      0
  9      0
I edited the above code as well. It now picks up and counts all numbers.

Output:

14350(*Hi 779*)14350(*Hi 889*)23455564
Digit  Count  Frequency
  1      2
  2      1
  3      3
  4      4
  5      5
  6      1
  7      2
  8      2
  9      2
I apologize, I am just not seeing where that fits into the code. I understand what you are saying but not where it belongs.
My newest code:

#include <iostream>
#include <fstream>
#include <string>
#include <cctype>
#include <numeric>
#include <iterator>
#include <iomanip>
using namespace std;

int main() {

    int digit = 1;
    int digitCount[10] = {0};
    float frequency[10] = {0};
    int total = 0;

    string filename;
    //cout for getting user path
    //the compiler parses string literals differently so use a double backslash or a forward slash
    cout << "Enter the path of the data file, be sure to include extension." << endl;
    cout << "You can use either of the following:" << endl;
    cout << "A forwardslash or double backslash to separate each directory." << endl;
    getline(cin,filename);

    ifstream input_file(filename.c_str());

    if (input_file.is_open()) { //if file is open
        string fileContents;
        string temp;
        while (getline(input_file, temp)) { //copies file contents to string
            fileContents.append(temp);
        }
        cout << fileContents << endl; //prints contents for testing purposes
        for (size_t i = 0; i < fileContents.length(); i++) {
            if (isdigit(fileContents[i])) {
                digitCount[fileContents[i] - '0']++;
            }
        }
    }
    else {
        cout << "Error opening file, check path or file extension." << endl;
    }
    total = accumulate(begin(digitCount), end(digitCount), 0); //gets number total in array
    total = total - digitCount[0]; //zeros are not part of the total

    for (int r = 0; r < 9; r++) { //calculates frequency into new array
        if (total > 0) { //avoids dividing by zero when nothing was counted
            frequency[r] = static_cast<float>(digitCount[r + 1]) / total;
        }
    }

    cout << "Digit  Count  Frequency" << endl; //print column titles
    for (int p = 0; p < 9; p++) {
        cout << "  " << digit << "      " << digitCount[p + 1] << "      " << setprecision(1) << frequency[p] << endl;
        digit++;
    }
    return 0;
}


14350(*Hi 245*) 2345 5564
Digit  Count  Frequency
  1      1      0.07
  2      2      0.1
  3      2      0.1
  4      4      0.3
  5      5      0.3
  6      1      0.07
  7      0      0
  8      0      0
  9      0      0
Consider this code:
#include <iostream>
#include <sstream>
#include <string>

std::istringstream in("14350(*Hi 245*) 2345 5564");

int main() {

    bool inParen = false;
    char ch;

    std::string content;

    while (in.get(ch)) {
        if (inParen && ch == ')') { inParen = false; }

        if (inParen) { content += ch; }
        else if (ch == '(') { inParen = true; }

        if (!inParen && !content.empty()) {
            std::cout << '"' << content << "\"\n";
            content.clear();
        }
    }
}


It prints out only the characters between parentheses. Your code must do something similar. The differences are that you want the stuff outside the parentheses, and you need to account for a two-character initiator/terminator (which requires keeping track of the last two characters extracted instead of just the one).

Hopefully analyzing this will get you a little closer to where you need to be.
Doing it on a buffer should be a million times easier, @cire.

No, it would not.
closed account (48T7M4Gy)
<string> functionality enables the opening and closing substrings, (* and *) , to be easily found in any line and thus enable the line to be parsed accordingly and using a parse-on/parse-off flag system. (Character by character will work but is not best suited to the double character delimiters)

http://www.cplusplus.com/reference/string/

Of course, if there are no intermediate words between the two delimiters (e.g. the "intermediates" in (*start intermediates end*)), then the task is very simple: treat the input as whitespace-separated strings, reading with cin instead of getline, and ignore any string that contains a delimiter substring. As far as I can see, nothing in the sample input so far precludes this.
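For comparison, here is a minimal sketch of the <string>-based approach, with invented function names. It uses std::string::find to locate (* and *) on a line, keeps only the text outside them, and then uses find_first_of to pick the first non-zero digit. It assumes comments do not nest or span lines, and it drops the rest of the line if a comment is never closed.

```cpp
#include <string>

// Hypothetical helper: returns a copy of `line` with every (* ... *) span removed.
std::string eraseComments(const std::string& line) {
    std::string out;
    std::string::size_type pos = 0;
    while (true) {
        const std::string::size_type open = line.find("(*", pos);
        if (open == std::string::npos) {
            out.append(line, pos, std::string::npos);  // no more comments: keep the tail
            break;
        }
        out.append(line, pos, open - pos);             // keep text before the comment
        const std::string::size_type close = line.find("*)", open + 2);
        if (close == std::string::npos) {
            break;                                     // unterminated comment: drop the rest
        }
        pos = close + 2;                               // resume after the closing "*)"
    }
    return out;
}

// Hypothetical helper: first digit 1-9 outside comments, or 0 if there is none.
int firstDigit(const std::string& line) {
    const std::string cleaned = eraseComments(line);
    const std::string::size_type pos = cleaned.find_first_of("123456789");
    return pos == std::string::npos ? 0 : cleaned[pos] - '0';
}
```

Calling firstDigit on each line read with getline and incrementing a counter for any non-zero result would give the per-digit counts. Whether this or the character-by-character version is preferable is the design question debated in the replies here.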
(Character by character will work but is not best suited to the double character delimiters)

If you mean you can use strings and do more work while increasing complexity and needlessly fragmenting memory... yeah - I guess that is more suited to the problem than introducing a single char variable to track the previous character in the stream. Why didn't I think of that?

Perhaps we should introduce std::regex?