Relationship between int and char

Hi there,
I am currently dealing with an issue that comes from reading text containing characters with codes > 127, i.e. outside the ASCII range. But the original problem is not what I am asking about here...

So I have a file that contains such characters and I want to display and process them, but I have trouble doing so, which comes down to how the compiler treats char and int. Here is my code for isolating the issue:
#include <cstring>   // strncmp, strlen, memset, memcpy, strncpy_s (MSVC)
#include <iostream>
#include <string>
#include <fstream>

int main()
{
	std::ifstream file;
	file.open("info.txt", std::fstream::binary);
	if (!file) {
		std::cout << "Unable to open file!\n";
		return __LINE__;
	}
	file.seekg(0, file.end);
	size_t length = file.tellg();
	file.seekg(0, file.beg);

	char* buffer = new char[length + 1];
	file.read(buffer, length);
	buffer[length] = '\0';	// terminate the data so the while (*cp) scan below stops at its end

	char* cp = buffer;

	int int_var;

	while (*cp) {
		if (strncmp(cp, "searchstring", strlen("searchstring")) == 0) {
			char text[50];
			memset(text, 0, 50);
			strncpy_s(text, 50, cp, 49);
			std::cout << text << std::endl;
			for (int i = 0; i < 49; i++) {
				memset(&int_var, 0, sizeof(int));
				memcpy(&int_var, cp+i, sizeof(char));
				std::cout << text[i] << " (" << int_var << ", ";
				std::cout << static_cast<int>(*(cp+i)) << ") ";
			}
			std::cout << "\n";
		}
		cp++;
	}

	file.close();
	delete[] buffer;
	return 0;
}

My expectation would be that int_var and static_cast<int>(*(cp+i)) show the same value, but int_var does show a value between 127 and 255 while the other one shows a negative number. Why is that?

Doing this on Windows, Visual Studio Community 2019, 64-bit.

(I am doing the strncmp because I know that the characters I am interested in follow my search string.)
Thanks!
"I have a file that contains such characters"
A good starting point would be to determine the character set of these characters. It's probably UTF-8.
In this implementation, I'd guess that char is a signed 8-bit integer.

Which means it has seven value bits, with the top bit acting as the sign bit (two's complement).

So this bit pattern, interpreted as an unsigned char: 1111 1111 , is the value 255.
The same, 1111 1111 , as a signed char, is -1. A negative number.

The binary 0111 1111 means 127 as an unsigned char, and 127 as a signed char.
The binary 1000 0000 means 128 as an unsigned char, and -128 as a signed char.

*(cp+i), being a char, is a signed integer value, so it can be negative.
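
To make the two readings of one byte concrete, here is a minimal sketch. The helper names as_signed and as_unsigned are made up for illustration, and it assumes an 8-bit, two's-complement signed char (which is what you will have on Visual Studio).

```cpp
#include <cstring>

// Reinterpret one raw byte as a signed char and widen it to int.
// Assumes two's complement: the pattern 1111 1111 then reads as -1.
int as_signed(unsigned char raw) {
    signed char s;
    std::memcpy(&s, &raw, 1);  // copy the bit pattern unchanged
    return s;                  // widening keeps the value, so it can be negative
}

// Widen the same byte as an unsigned char: the result is always 0..255.
int as_unsigned(unsigned char raw) {
    return raw;
}
```

Calling as_signed(0xFF) and as_unsigned(0xFF) should give -1 and 255 for the same eight bits, which matches the difference between the two numbers printed in the first post.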

If you change all your use of char to unsigned char, do you see the behaviour change?



That would be the next question and part of the original issue, but at the moment I am confused about the specific int / char issue (it will be the Windows default, whatever its name is).

(My starting point in my original code was to search for characters 127 < c < 255, but that failed due to the above issue.)
ASCII is only 7-bit, but of course most modern computer systems use 8-bit bytes. Character sets for values 128 to 255 are not ASCII; they may be different on Windows vs. Linux (for example), and I think they depend on your locale.

On Windows, the character set will be Windows-1252 (CP-1252) [at least, by default?].

Also, note that char may be signed or unsigned, depending on the implementation. If you want to be sure that 128 is a valid char value, use unsigned char (edit: as Repeater said after I refreshed the page :p)
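
If you want to see what your own compiler does, a small sketch like this can report it. The function names plain_char_is_signed and plain_char_min are hypothetical; std::numeric_limits and CHAR_MIN are standard.

```cpp
#include <climits>
#include <limits>

// True if plain char is a signed type for this compiler and these flags.
constexpr bool plain_char_is_signed() {
    return std::numeric_limits<char>::is_signed;
}

// Smallest value a plain char can hold: 0 if char is unsigned,
// negative (typically -128) if char is signed.
constexpr int plain_char_min() {
    return CHAR_MIN;
}
```

Printing both values once at program start is usually enough to tell whether a test like c > 127 can ever be true for a plain char.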
"..., but at the moment I am confused about the specific int / char issue (it will be the default Windows, what ever its name is)."


Can you be more specific? What is the int / char issue?

Looks to me like you've got a signed char, represented in binary as 1000 0001 , and then you're telling the compiler to take that value (minus 127) and create a signed int of the same value, minus 127.

I'd expect the char to be 1000 0001 , being an 8 bit value, and a 32 bit int to be 11111111 11111111 11111111 10000001.

Let's check:

https://ideone.com/VMK3pc

Seems about right. Different implementations might have char as signed by default, or unsigned by default, but it seems clear what's going on in your code.
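
For anyone who cannot open the link, the same check can be sketched roughly like this. This is not the code behind the link; widened_bits is a made-up helper, and it assumes a 32-bit, two's-complement int.

```cpp
#include <bitset>
#include <string>

// Bit pattern of a signed char after it has been converted to int.
// Assumption: int is 32 bits wide and uses two's complement.
std::string widened_bits(signed char c) {
    int widened = c;  // the value is preserved, so negative values are sign-extended
    return std::bitset<32>(static_cast<unsigned int>(widened)).to_string();
}
```

With those assumptions, widened_bits(-127) should come out as 24 one-bits followed by 10000001, and widened_bits(127) as 24 zero-bits followed by 01111111.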
The unsigned char is the solution... I actually tried it before, but I changed everything to unsigned char, which gave me compile errors because the file operations expect char*. With the following it behaves as I would expect:

	char* buffer = new char[length + 1];
	file.read(buffer, length);
	buffer[length] = '\0';	// terminate the data so the while (*cp) scan below stops at its end

	unsigned char* cp = reinterpret_cast<unsigned char*>(buffer);

	int int_var;

	while (*cp) {
		if (strncmp(reinterpret_cast<char*>(cp), "searchstring", strlen("searchstring")) == 0) {
			char text[50];
			memset(text, 0, 50);
			strncpy_s(text, 50, reinterpret_cast<char*>(cp), 30);
			std::cout << text << std::endl;
			for (int i = 0; i < 30; i++) {
				memset(&int_var, 0, sizeof(int));
				memcpy(&int_var, reinterpret_cast<char*>(cp+i), sizeof(char));
				std::cout << text[i] << " (" << int_var << ", ";
				std::cout << static_cast<unsigned int>(*(cp+i)) << ") ";
			}
			std::cout << "\n";
		}
		cp++;
	}
Something to be aware of is that char, signed char, and unsigned char are all distinct types.

The signedness of plain char is implementation-defined.

Major compilers, however, let you choose the signedness of char with a compile option (for example -funsigned-char in GCC and Clang, or /J in MSVC).

Either way, if you wish to use the integer value of a character, you should always explicitly cast it first to an unsigned char, then to whatever integer range you wish.

(Otherwise you may find yourself dealing with sign-extend issues like in your first post.)
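
A minimal sketch of that double cast, with made-up helper names char_code and is_high_byte, could look like this. It should behave the same whether plain char is signed or unsigned.

```cpp
// Character code in the range 0..255, regardless of the signedness of char.
int char_code(char c) {
    // Cast to unsigned char first, then widen; this avoids sign extension.
    return static_cast<int>(static_cast<unsigned char>(c));
}

// True for bytes above the 7-bit ASCII range (128..255).
bool is_high_byte(char c) {
    return char_code(c) > 127;
}
```

With this, the buffer can stay a plain char* for the file and string functions, and only the places that need the numeric value call char_code(*(cp+i)).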

Hope this helps.