std::string gethex(char c)
{
/* EXAMPLE
if (c == 'é')
return "%c3%a9";
etc...
I need a function that converts chars like "á, é, í, ã" to UTF-8 hexadecimal strings...
*/
}
std::string encode(std::string str)
{
static std::string unreserved = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz-_.~";
std::string r;
for (std::size_t i = 0; i < str.length(); i++ )
{
char c = str.at(i);
if (unreserved.find(c) != std::string::npos)
r+=c;
else
r+=gethex(c);
}
return r;
}
> I need a function that converts chars like "á, é, í, ã" to UTF-8 hexadecimal strings...
The thing is... characters in a string are already encoded as something. They have to be. So you have to ask yourself whether or not the string is already UTF-8 encoded.
If it isn't... you'll have to find out what encoding it's in, and convert that to UTF-8.
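As a rough illustration of that conversion step, here is a minimal sketch that assumes the input happens to be ISO-8859-1 (Latin-1), which is only one possibility. The name `latin1_to_utf8` is made up for this example; any other source encoding would need a proper conversion library instead.

```cpp
#include <string>

// Hypothetical helper: converts ISO-8859-1 (Latin-1) bytes to UTF-8.
// Assumption: the input really is Latin-1. This does NOT work for other encodings.
std::string latin1_to_utf8(const std::string& in)
{
    std::string out;
    for (unsigned char c : in)
    {
        if (c < 0x80)
        {
            out += static_cast<char>(c);                 // ASCII maps to itself
        }
        else
        {
            out += static_cast<char>(0xC0 | (c >> 6));   // lead byte: 110xxxxx
            out += static_cast<char>(0x80 | (c & 0x3F)); // continuation byte: 10xxxxxx
        }
    }
    return out;
}
```

With this, the Latin-1 byte 0xE9 ('é') becomes the two UTF-8 bytes 0xC3 0xA9, which matches the "%c3%a9" in the original example.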
Once you have a UTF-8 string, it's just a matter of looking at (and printing) the values as integers rather than as chars:
char example = 'a';
cout << hex << static_cast<int>(example); // prints '61'
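The same idea extends to a whole string. This is only a sketch, and `hex_bytes` is a name invented here: it walks the bytes of a string that is assumed to be UTF-8 already and formats each one as two hex digits.

```cpp
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: returns the bytes of `s` as space-separated hex values.
std::string hex_bytes(const std::string& s)
{
    std::ostringstream out;
    out << std::hex << std::setfill('0');
    for (std::size_t i = 0; i < s.size(); ++i)
    {
        if (i != 0)
            out << ' ';
        // Convert to unsigned char first so bytes >= 0x80 are not sign-extended.
        out << std::setw(2) << static_cast<int>(static_cast<unsigned char>(s[i]));
    }
    return out.str();
}
```

For the UTF-8 string "a\xC3\xA9" (that is, "aé") this gives "61 c3 a9".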
Yes, but 'c' in this case is just going to be an integer. The computer represents every character as an integer.
The char data type is an integral type just like int, only smaller (typically one byte). What it holds is really the numeric code of a character.
So this:
char c = 'a';
if(c == 0x61) // <- this will be true, because 'a'==0x61
So if all you want is to print the character as an integer... then that is the code I already posted:
char example = 'a';
cout << hex << static_cast<int>(example); // prints '61'
But the real question here is: how is your 'c' encoded? Is it UTF-8 or is it some other encoding?
There is no way to solve this problem unless you know what kind of characters you're dealing with. In the end you just have a bunch of numbers, and in order to do this properly you need to know what those numbers represent.
So where are you getting 'c' from? A file? The user?
To get lower case characters, change line 8
// stm << '%' << std::hex << std::uppercase << c ;
stm << '%' << std::hex << std::nouppercase << c ;
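To see the effect of the two manipulators side by side, here is a small sketch. The function `percent_byte` and its `upper` flag are invented for this illustration and are not part of the code being discussed.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: formats one byte as "%XX" (upper) or "%xx" (lower).
std::string percent_byte(unsigned char c, bool upper)
{
    std::ostringstream stm;
    if (upper)
        stm << std::uppercase;    // hex digits A-F
    else
        stm << std::nouppercase;  // hex digits a-f (the stream default)
    // Widen to int so the byte is printed as a number, not as a character.
    stm << '%' << std::hex << std::setw(2) << std::setfill('0') << static_cast<int>(c);
    return stm.str();
}
```

Here `percent_byte(0xE1, true)` gives "%E1" and `percent_byte(0xE1, false)` gives "%e1".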
> not "%FFFFFFE1".
Treat each byte in the UTF-8 encoded string as an unsigned char;
the default char may be a signed integral type.
for( unsigned char c : str ) { /* ... */ }
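Putting the pieces from this thread together, a percent-encoder could look roughly like the sketch below. It reuses the `encode` name and the unreserved set from the original question, assumes `str` is already UTF-8, and is one possible arrangement rather than the only correct one.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Sketch: percent-encodes every byte that is not in the unreserved set.
// Assumption: `str` is already UTF-8 encoded.
std::string encode(const std::string& str)
{
    static const std::string unreserved =
        "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz-_.~";

    std::ostringstream stm;
    stm << std::hex << std::uppercase << std::setfill('0');
    for (unsigned char c : str)   // unsigned, so 0xE1 is never printed as FFFFFFE1
    {
        if (unreserved.find(static_cast<char>(c)) != std::string::npos)
            stm << static_cast<char>(c);                        // copy unreserved bytes as-is
        else
            stm << '%' << std::setw(2) << static_cast<int>(c);  // e.g. %C3
    }
    return stm.str();
}
```

For example, `encode("a b")` gives "a%20b", and the UTF-8 bytes of 'á' ("\xC3\xA1") come out as "%C3%A1". Swapping `std::uppercase` for `std::nouppercase` produces lower-case hex digits instead.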