reading bytes from a file

I have another online "fill in the ..." programming example I'm having trouble with. It has to do with reading binary files, and this is the first problem they give:

Open the file with the given name as a binary file. Count how often each byte value (between 0 and 255) occurs in the given file. A byte value returned by infile.get outside the range from 0 to 255 indicates the end of the file. Also compute the length of the file.
The program will then print all byte values that occur at least 2 percent of the time. (Horstmann, 2017-08-12, p. 287)


#include <iostream>
#include <fstream>
#include <string>

using namespace std;

int main()
{
	int byte_counts[256];
	for (int i = 0; i < 256; i++)
	{
		byte_counts[i] = 0;
	}
	fstream infile;
	string filename = "queen-mary.bmp";
	int length = 0;
	// full here
	while (input > 0 && input < 256)
	{
			int input = infile.get();

		length++;
	}

	for (int i = 0; i < 256; i++)
	{
		if (byte_counts[i] >= 0.02 * length)
		{
			cout << i << ": " << byte_counts[i] << endl;
		}
	}
	return 0;
}


I have no idea how to go about this, and I'm not even sure what they are asking. I'm not sure why they have an array set up either. I believe they are asking me to count how many times each byte value shows up in a file?

Anyway, this is all I managed to come up with:

#include <iostream>
#include <fstream>
#include <string>

using namespace std;

int main()
{
	int byte_counts[256];
	for (int i = 0; i < 256; i++)
	{
		byte_counts[i] = 0;
	}
	fstream infile;
	string filename = "queen-mary.bmp";
	int length = 0;
	// full here
	infile.open(filename, ios::in | ios::out | ios::binary);
	int input = infile.get();
	while (input > 0 && input < 256)
	{
			int input = infile.get();

		length++;
	}

	for (int i = 0; i < 256; i++)
	{
		if (byte_counts[i] >= 0.02 * length)
		{
			cout << i << ": " << byte_counts[i] << endl;
		}
	}
	return 0;
}





Look closely at your instructions, this part in particular: "A byte value returned by infile.get outside the range from 0 to 255 indicates the end of the file"
Your condition tests input > 0, so a byte value of 0 ends the loop even though 0 is inside the valid range; the lower bound should be input >= 0. Note: you would probably be better off testing the state of the stream instead of looking at the actual value.

Where are you "Also compute the length of the file." and "The program will then print all byte values that occur at least 2 percent of the time."?
A byte value returned by infile.get outside the range from 0 to 255 indicates the end of the file.

This makes no sense. Either it's a signed byte, in which case it's -128 to 127, or it's unsigned, and it's 0-255. There are no 8-bit bytes outside of 0-255. Either infile.get returns something other than a byte, or it's not an 8-bit system, or someone is on drugs.

I would forget all this, and just open the file, read all the bytes (you can seek to the end to get the file's size in bytes if you want an old school way) and count them. To do that, you can make an array of 256 unsigned ints and increment each one as you find it, treating the bytes of the file as the array index, e.g. if the first thing in the file is 0xFF then countarray[255]++ and so on.
@OP
This will get you started.

You need to read char's not int's otherwise you'll go nowhere.

It looks also like the end of file comment in the question is wrong because you will be very lucky to go past the first line.

The program is a little bit like counting characters: use the array for that. If you have an input char whose value (ASCII) is 253, then byte_counts[253]++, and so on. That goes inside the while loop.

I reckon the 0.2 refers to printing out only 20% of the input chars as the while loop proceeds.

At the end outside the while loop display length then the array totals.

#include <iostream>
#include <fstream>
#include <string>

using namespace std;

int main()
{
    int byte_counts[256]{0}; // <-- KEEP BUT USE THIS LATER
    
    string filename = "queen-mary.bmp"; // <- GET YOURSELF A PICTURE
    ifstream infile (filename, ios::in | ios::out | ios::binary);
    
    int length{0};
    char input;
    
    while (!infile.eof()) // <-- eof() SORT OF BAD BUT WORKS FOR DEMO
    {
        input = infile.get();
        std::cout << input << '\n';
        
        length++;
    }
    std::cout << "Length: " << length << '\n';
    
//    for (int i = 0; i < 256; i++)
//    {
//        if (byte_counts[i] >= 0.02 * length)
//        {
//            cout << i << ": " << byte_counts[i] << endl;
//        }
//    }
    
    return 0;
}



Lots of stuff ...

\266
\261
\305
A
\322
0
&
\360
\363
\267
\272
\361
\273
\317
W
\270
\310
\205
K
\374
}
\321
\363
\217
\233
F
&
\317
\377
\331
\377
Length: 7163
Program ended with exit code: 0



BTW
This might also help and provide some vague insight into eof and the textbook out of range comment. Doesn't go much anywhere beyond interest though.
https://stackoverflow.com/questions/13830338/377-character-in-c
Hello JamesHelp,

I worked on the program yesterday and did get it to work properly. At least based on the file I was using.

It would help to be able to use the file that you are using and to know if there is any expected output based on that file.

I changed some of the variable names and added some parts. You may not want to use everything I added and that is fine.

#include <iostream>
#include <iomanip>  // <--- Added.
#include <fstream>
#include <string>

using namespace std;

int main()
{
    int byte_counts[256]{};

    //string filename = "queen-mary.bmp";
    std::string inFileName{ "wmpnss_color32.bmp" };

    ifstream inFile{ inFileName, std::ios::binary };  // <--- Changed.

    if (!inFile)  // <--- Added.
    {
        std::cerr << "\n File " << std::quoted(inFileName) << " did not open" << std::endl;

        return 1;
    }

    int length{};
    int index = inFile.get();  // <--- Changed.

    // full here

    while (index >= 0 && index < 256)  // <--- Changed.
    {
        byte_counts[index]++;

        length++;

        index = inFile.get();
    }

    for (int idx = 0; idx < 256; idx++)
    {
        if (byte_counts[idx] >= 0.02 * length)
        {
            cout << std::setw(3) << idx << ": " << byte_counts[idx] << endl;
        }
    }

    return 0;
}


Based on the file I used it produced this output:

  0: 545
128: 852
255: 462

Length = 4152


The last line is not coded in the program, but added just to show its value.

Your second code is a good start, but lacking some of what it needs to work.

Changing "input", or "index" as I used, is not necessary.

Before seeing againtry's link I did find that "inFile.get()" did return a "-1", which causes the while condition to fail when end of file is reached.

Andy
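
As a worked check of the 2 percent rule against the figures above: with a length of 4152 the cutoff is 0.02 * 4152 = 83.04, so a byte value needs a count of 84 or more to be printed. The three values shown make up roughly 13.1%, 20.5% and 11.1% of the file. The sketch below does the same filter as the final loop in a function; frequent_bytes is a made-up name and the fraction is passed in as a parameter.

```cpp
#include <array>
#include <vector>

// Hypothetical helper: byte values whose count is at least `fraction` of `length`.
std::vector<int> frequent_bytes(const std::array<long, 256>& counts,
                                long length, double fraction = 0.02)
{
    std::vector<int> result;
    for (int value = 0; value < 256; ++value)
    {
        // Same comparison as the exercise: count >= 0.02 * length.
        if (counts[value] >= fraction * length)
        {
            result.push_back(value);
        }
    }
    return result;
}
```

Note that for an empty file the cutoff is 0, so every value passes the test; the exercise code behaves the same way.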
@OP
Take your pick. Best choose a smaller picture of the Queen Mary than I did:

#include <iostream>
#include <fstream>
#include <string>
#include <iomanip>

int main()
{
    int byte_counts[256]{0}; // <-- KEEP BUT USE THIS LATER
    
    std::string filename = "queen-mary.bmp"; // <- GET YOURSELF A PICTURE
    std::ifstream infile (filename,
                          std::ios::in | std::ios::out | std::ios::binary);
    
    int length{0};
    char input;
    
    while (!infile.eof()) // <-- eof() SORT OF BAD BUT WORKS FOR DEMO
    {
        input = infile.get();
        if(length % 5 == 0)
            std::cout
            << "No: "
            << std::setw(5) << std::right << length
            << ' ' << input << ' ' << (int)(input) << '\n';
        
        if( (int)input >= 0 and (int)input <= 255 )
            byte_counts[(int)input]++;
        
        length++;
    }
    infile.close();
    
    
    for (int i = 0; i < 255; i++)
    {
        std::cout
        << i << ' '  << char(i)<< ": " << byte_counts[i] << " times\n";
    }
    
    std::cout << "File length: " << length << '\n';
    
    return 0;
}


No:     0 \377 -1
No:     5  16
No:    10  0
No:    15  1
No:    20 \377 -1
No:    25 	 9
No:    30  18
No:    35 [ 22
No:    40 & 21
No:    45 < 24
No:    50 & 21
No:    55 & 21
No:    60  29
No:    65 % 37
No:    70 1 49
No:    75 . 46
No:    80 3 51
No:    85 ( 40
No:    90 
 10
No:    95  14
No:   100 - 45
No:   105 - 45
No:   110 0 48
No:   115 - 45
No:   120 - 45
No:   125 - 45
No:   130 - 45
No:   135 - 45
No:   140 - 45
No:   145 - 45
No:   150 - 45
No:   155 \300 -64
No:   160 \276 -66
No:   165 " 34
No:   170  3
No:   175  0
No:   180  5
No:   185  0
No:   190  0
No:   195  0
No:   200  6
No:   205  0
No:   210  1
No:   215  6
No:   220  7
No:   225  2
No:   230  18
No:   235 Q 81
No:   240 2 50
No:   245 \360 -16
No:   250  7
No:   255 S 83
No:   260 \341 -31
No:   265 \242 -94
No:   270 % 37
No:   275 \302 -62
No:   280 > 25
No:   285  1
No:   290  0
No:   295  0
No:   300  2
No:   305 \304 -60
No:   310  2
No:   315  4
No:   320  0
No:   325  0
No:   330  3
No:   335 A 65
No:   340 R 82
No:   345 " 34
No:   350 C 67
No:   355 \377 -1
No:   360  1
No:   365  17
No:   370 \200 -128

No:  7075 \347 -25
No:  7080 \257 -81
No:  7085 \374 -4
No:  7090 \252 -86
No:  7095 \375 -3
No:  7100 } 125
No:  7105 \357 -17
No:  7110 \232 -102
No:  7115 Q 81
No:  7120 l 108
No:  7125 \363 -13
No:  7130 \326 -42
No:  7135 A 65
No:  7140 \363 -13
No:  7145 \317 -49
No:  7150 K 75
No:  7155 \217 -113
No:  7160 \377 -1


0 : 107 times
1 : 51 times
2 : 37 times
3 : 34 times
4 : 35 times
5 : 31 times
6 : 29 times
7 : 25 times
8 : 36 times
9 	: 20 times
10 
: 26 times
11 : 14 times
12 : 22 times
13 
: 23 times
14 : 27 times
15 : 31 times
16 : 29 times
17 : 32 times
18 : 35 times
19 : 26 times
20 : 26 times
21 &: 37 times
22 [: 34 times
23 ]: 29 times
24 <: 40 times
25 >: 23 times
26 
: 26 times
27 : 32 times
28 : 25 times
29 : 25 times
30 : 33 times
31 : 17 times
32  : 29 times
33 !: 28 times
34 ": 20 times
35 #: 23 times
36 $: 28 times
37 %: 31 times
38 &: 34 times
39 ': 22 times
40 (: 22 times
41 ): 29 times
42 *: 31 times
43 +: 38 times
44 ,: 24 times
45 -: 89 times
46 .: 25 times
47 /: 28 times
48 0: 16 times
49 1: 35 times
50 2: 34 times
51 3: 30 times
52 4: 26 times
53 5: 30 times
54 6: 26 times
55 7: 30 times
56 8: 20 times
57 9: 25 times
58 :: 41 times
59 ;: 26 times
60 <: 23 times
61 =: 24 times
62 >: 36 times
63 ?: 12 times
64 @: 24 times
65 A: 42 times
66 B: 24 times
67 C: 22 times
68 D: 23 times
69 E: 31 times
70 F: 35 times
71 G: 32 times
72 H: 29 times
73 I: 32 times
74 J: 30 times
75 K: 26 times
76 L: 26 times
77 M: 32 times
78 N: 23 times
79 O: 21 times
80 P: 23 times
81 Q: 40 times
82 R: 31 times
83 S: 32 times
84 T: 34 times
85 U: 32 times
86 V: 25 times
87 W: 25 times
88 X: 29 times
89 Y: 26 times
90 Z: 38 times
91 [: 27 times
92 \: 31 times
93 ]: 23 times
94 ^: 28 times
95 _: 35 times
96 `: 29 times
97 a: 36 times
98 b: 39 times
99 c: 22 times
100 d: 28 times
101 e: 25 times
102 f: 20 times
103 g: 18 times
104 h: 26 times
105 i: 32 times
106 j: 34 times
107 k: 27 times
108 l: 30 times
109 m: 30 times
110 n: 31 times
111 o: 34 times
112 p: 19 times
113 q: 27 times
114 r: 29 times
115 s: 28 times
116 t: 34 times
117 u: 25 times
118 v: 22 times
119 w: 12 times
120 x: 24 times
121 y: 29 times
122 z: 22 times
123 {: 20 times
124 |: 35 times
125 }: 26 times
126 ~: 13 times
127 : 21 times
128 \200: 0 times
129 \201: 0 times
130 \202: 0 times
131 \203: 0 times
132 \204: 0 times
133 \205: 0 times
134 \206: 0 times
135 \207: 0 times
136 \210: 0 times
137 \211: 0 times
138 \212: 0 times
139 \213: 0 times
140 \214: 0 times
141 \215: 0 times
142 \216: 0 times
143 \217: 0 times
144 \220: 0 times
145 \221: 0 times
146 \222: 0 times
147 \223: 0 times
148 \224: 0 times
149 \225: 0 times
150 \226: 0 times
151 \227: 0 times
152 \230: 0 times
153 \231: 0 times
154 \232: 0 times
155 \233: 0 times
156 \234: 0 times
157 \235: 0 times
158 \236: 0 times
159 \237: 0 times
160 \240: 0 times
161 \241: 0 times
162 \242: 0 times
163 \243: 0 times
164 \244: 0 times
165 \245: 0 times
166 \246: 0 times
167 \247: 0 times
168 \250: 0 times
169 \251: 0 times
170 \252: 0 times
171 \253: 0 times
172 \254: 0 times
173 \255: 0 times
174 \256: 0 times
175 \257: 0 times
176 \260: 0 times
177 \261: 0 times
178 \262: 0 times
179 \263: 0 times
180 \264: 0 times
181 \265: 0 times
182 \266: 0 times
183 \267: 0 times
184 \270: 0 times
185 \271: 0 times
186 \272: 0 times
187 \273: 0 times
188 \274: 0 times
189 \275: 0 times
190 \276: 0 times
191 \277: 0 times
192 \300: 0 times
193 \301: 0 times
194 \302: 0 times
195 \303: 0 times
196 \304: 0 times
197 \305: 0 times
198 \306: 0 times
199 \307: 0 times
200 \310: 0 times
201 \311: 0 times
202 \312: 0 times
203 \313: 0 times
204 \314: 0 times
205 \315: 0 times
206 \316: 0 times
207 \317: 0 times
208 \320: 0 times
209 \321: 0 times
210 \322: 0 times
211 \323: 0 times
212 \324: 0 times
213 \325: 0 times
214 \326: 0 times
215 \327: 0 times
216 \330: 0 times
217 \331: 0 times
218 \332: 0 times
219 \333: 0 times
220 \334: 0 times
221 \335: 0 times
222 \336: 0 times
223 \337: 0 times
224 \340: 0 times
225 \341: 0 times
226 \342: 0 times
227 \343: 0 times
228 \344: 0 times
229 \345: 0 times
230 \346: 0 times
231 \347: 0 times
232 \350: 0 times
233 \351: 0 times
234 \352: 0 times
235 \353: 0 times
236 \354: 0 times
237 \355: 0 times
238 \356: 0 times
239 \357: 0 times
240 \360: 0 times
241 \361: 0 times
242 \362: 0 times
243 \363: 0 times
244 \364: 0 times
245 \365: 0 times
246 \366: 0 times
247 \367: 0 times
248 \370: 0 times
249 \371: 0 times
250 \372: 0 times
251 \373: 0 times
252 \374: 0 times
253 \375: 0 times
254 \376: 0 times
File length: 7163
Program ended with exit code: 0
quite confused by this whole mess.

@OP: your two code snips are too similar, I see that you open the file and «fixed» the compile error
¿why did you post the first one?

> what they are asking (...)
> I believe they are asking for me to count how many types a byte value shows
> up in a file?
the problem statement says «Count how often each byte value (between 0 and 255) occurs in the given file»
¿what part don't you understand?
¿can you do «count how many 'a' are in this sentence»? then count each vowel, then instead of a sentence a whole file, then count each letter
it's all the same idea
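
The progression described above could be sketched like this, working on a std::string instead of a file. count_char and count_all are made-up names.

```cpp
#include <array>
#include <string>

// Step 1: how many times does one particular character occur?
int count_char(const std::string& text, char wanted)
{
    int n = 0;
    for (char c : text)
    {
        if (c == wanted) n++;
    }
    return n;
}

// Step 2: the same tally for every possible character at once,
// using the character's value as the array index.
std::array<int, 256> count_all(const std::string& text)
{
    std::array<int, 256> counts{};
    for (char c : text)
    {
        counts[static_cast<unsigned char>(c)]++;
    }
    return counts;
}
```

Replacing the string with bytes read from a file is then the only remaining step.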


> Where are you "Also compute the length of the file." and "The program will
> then print all byte values that occur at least 2 percent of the time."?
@jlb: ¿are you testing the OP? those are the only things that are done because (I guess) they were in the original question
sure, it's not counting, but the print part is done.


> You need to read char's not int's otherwise you'll go nowhere.
¿ever heard of overflow?

> while (!infile.eof()) // <-- eof() SORT OF BAD BUT WORKS FOR DEMO
indeed, it shows the off-by-one error, which is the reason you shouldn't check for eof that way


@Handy Andy: nice, let's just solve the OP's homework, that'll teach them.


@againtry's second code: it's like you are seeing all the blood but don't realise that you just blew off your foot.