Locales

Hello all,

Recently, I have been trying to write a small program to practice C++. Since I'm from a French-speaking region, I would like my program to be able to adjust to different locales.

To test my understanding of locale basics, I have written this code to test whether a char "a" is a letter or not (in the locale):

#include<iostream>
#include<locale>

using namespace std;

bool letter(char a); // Test if "a" is a letter in the locale's language.

int main(void)
{
	cout.imbue(locale("fr_CA.utf8")); //French canadian locale.
	cout << letter('è'); // A letter in French...

	return 0;
}

bool letter(char a)
{
	locale userLocale("fr_CA.utf8"); // Construct locale object using the French canadian locale.

	bool isAlpha = use_facet< ctype<char> >(userLocale).is(ctype_base::alpha, a);

	return isAlpha;
}


The program compiles fine, but returns "0" (thus, "è" is not a letter for the program).

What am I doing wrong here? It seems like the program is ignoring the locale.

FYI, I am programming on Ubuntu using GCC. The fr_CA.UTF-8 locale is installed on my computer but is not my default locale, which is en_CA.UTF-8.

Thank you!
What encoding do you use? I believe you need a single-byte encoding for both your file and locale to work properly.
MiiNiPaa,

Thanks for your reply. My .cpp file is encoded in UTF-8, which is not single-byte, according to this source: https://en.wikipedia.org/wiki/SBCS.

When I try using the two single-byte encodings offered in Eclipse, the letter 'è' is replaced by the weird placeholder sign for missing characters.

I'm not exactly sure I understand what you mean...
Try to use wide characters:
#include<iostream>
#include<locale>

using namespace std;

bool letter(wchar_t a); // Test if "a" is a letter in the locale's language.

int main(void)
{
	wcout.imbue(locale("fr_CA.utf8")); //French canadian locale.
	wcout << letter(L'è'); // A letter in French...

	return 0;
}

bool letter(wchar_t a)
{
	locale userLocale("fr_CA.utf8"); // Construct locale object using the French canadian locale.

	bool isAlpha = use_facet< ctype<wchar_t> >(userLocale).is(ctype_base::alpha, a);

	return isAlpha;
}
The problem with UTF-8 is that a character is neither a single char nor a wchar_t. One character may take up to four chars:

https://en.wikipedia.org/wiki/UTF-8

hence isalpha is unable to deal with this.
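To make the "up to four char" point concrete, here is a minimal sketch (not from the posts above). The helper name count_code_points is hypothetical, and the code assumes its input is valid UTF-8. It spells 'è' with its two UTF-8 bytes, 0xC3 0xA8, so the example does not depend on how the source file is saved.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: count the code points in a UTF-8 string.
// Continuation bytes have the bit pattern 10xxxxxx, so every byte
// that does NOT match that pattern starts a new code point.
// Assumes the input is valid UTF-8; no validation is done.
std::size_t count_code_points(const std::string& utf8)
{
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

With this, std::string("\xC3\xA8").size() is 2 while count_code_points("\xC3\xA8") is 1. A test on a single char only ever sees one of those two bytes, never the whole letter.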
coder777,

Thanks for the comment. From what you're telling me, it seems like I am not using the right tool to reach my goal... Would you have any suggestion as to what I could use to achieve this?

How could I test for French, Russian or Japanese characters (depending on the user's locale)?

@MiiNiPaa: thanks for your suggestion, but sadly using wide chars is not working...
Unfortunately, it seems that there is no simple solution, especially since it is language dependent.

Take a look at boost:

http://www.boost.org/doc/libs/1_58_0/libs/locale/doc/html/index.html

and some unicode libraries:

http://unicode.org/resources/libraries.html


If there are not many alphabetic characters outside standard ASCII (which fits in a char and works), then you might consider hard-coding them.
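As a rough illustration of the hard-coding idea, here is a sketch under some assumptions: the function name is_french_letter is made up, the character works on decoded code points (char32_t) rather than raw UTF-8 bytes, and the list of accented letters is my own selection for French and may be incomplete.

```cpp
#include <string>

// Hypothetical hard-coded test: ASCII letters plus a fixed list of
// accented letters commonly used in French. The list is an assumption
// for illustration, not an authoritative alphabet.
bool is_french_letter(char32_t c)
{
    if ((c >= U'a' && c <= U'z') || (c >= U'A' && c <= U'Z')) {
        return true;
    }
    // Lowercase: à â æ ç è é ê ë î ï ô œ ù û ü ÿ
    // Uppercase: À Â Æ Ç È É Ê Ë Î Ï Ô Œ Ù Û Ü Ÿ
    static const std::u32string extra =
        U"\u00E0\u00E2\u00E6\u00E7\u00E8\u00E9\u00EA\u00EB"
        U"\u00EE\u00EF\u00F4\u0153\u00F9\u00FB\u00FC\u00FF"
        U"\u00C0\u00C2\u00C6\u00C7\u00C8\u00C9\u00CA\u00CB"
        U"\u00CE\u00CF\u00D4\u0152\u00D9\u00DB\u00DC\u0178";
    return extra.find(c) != std::u32string::npos;
}
```

This only scales to a handful of languages. For Russian or Japanese the lists get long quickly, which is where the libraries linked above start to pay off.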
coder777,

Ok. The links you provided are really interesting, I will definitely look into that. I thought there was a straightforward way to do this...

I will keep your last suggestion in mind.
You don't need Boost to do what you're trying to do. Linux actually supports Unicode in C++ as the language intended (Windows is the outlier here).

The problem with the first program is that a character literal (something enclosed in single quotes) must be a single byte, and è is not a single byte in UTF-8, which is the encoding Linux saves your source file in, assuming a normal configuration.

In fact, clang gives an error here:
test.cc:11:17: error: character too large for enclosing character literal type
        cout << letter('è'); // A letter in French... 


and gcc gives a somewhat worse-worded warning:
test.cc:11:17: warning: multi-character character constant [-Wmultichar]
  cout << letter('è'); // A letter in French...
                 ^


MiiNiPaa is correct: you have to use a wide character or, alternatively, a string.

Also, don't use that hideous use_facet; std::isalpha(c, loc) from <locale> does the same thing.
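For anyone curious how the two spellings relate, here is a small side-by-side sketch. The wrapper names is_alpha_facet and is_alpha_short are hypothetical; the calls inside them are the standard ones from <locale>.

```cpp
#include <locale>

// Long form: fetch the ctype facet from the locale and query it.
bool is_alpha_facet(wchar_t c, const std::locale& loc)
{
    return std::use_facet<std::ctype<wchar_t>>(loc).is(std::ctype_base::alpha, c);
}

// Short form: the std::isalpha overload from <locale> that takes a locale.
// It performs the same facet lookup internally.
bool is_alpha_short(wchar_t c, const std::locale& loc)
{
    return std::isalpha(c, loc);
}
```

Both should agree for any locale you pass in. With std::locale::classic() they accept L'a' and reject L'1', and that locale is always available, so it is a handy way to check the wiring before involving fr_CA.utf8.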

The following works for me:

using wchar_t:
#include <iostream>
#include <locale>
#include <clocale>

bool letter(wchar_t a)
{
    return std::isalpha(a, std::locale("fr_CA.utf8"));
}

int main()
{
    // either this:
//    std::setlocale(LC_ALL, "fr_CA.utf8");
    // or this:
    std::ios::sync_with_stdio(false);

    std::wcout.imbue(std::locale("fr_CA.utf8"));
    std::wcout << "letter(" << L'è' << ") returns "
              << std::boolalpha << letter(L'è') << '\n';
}


using UTF-8 strings:

#include <iostream>
#include <locale>
#include <clocale>
#include <algorithm>
#include <codecvt>
#include <string>

// test if every character in a UTF-8 string is a letter
bool letters(const std::string& str)
{
    // this part of C++11 didn't come to gcc until version 5.0
    // (Microsoft and Clang had it in 2010 though)
    std::wstring_convert<std::codecvt_utf8<wchar_t>> conv;
    std::wstring wstr = conv.from_bytes(str);
    // construct the locale once instead of once per character
    const std::locale loc("fr_CA.utf8");
    return std::all_of(wstr.begin(), wstr.end(),
             [&loc](wchar_t c){return std::isalpha(c, loc);});
}

int main()
{
    // either this:
//    std::setlocale(LC_ALL, "fr_CA.utf8");
    // or this:
    std::ios::sync_with_stdio(false);

    std::cout.imbue(std::locale("fr_CA.utf8"));
    std::cout << "letters(" << "è" << ") returns "
              << std::boolalpha << letters("è") << '\n';
}



By the way, you're not using anything French-Canadian in here; any UTF-8 locale will work. Here's how these programs run on coliru with American English UTF-8:

wide char:
http://coliru.stacked-crooked.com/a/0bce63e9beb5ce05
strings:
http://coliru.stacked-crooked.com/a/2d88de701da9b81f
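Since any UTF-8 locale will do, one option is to try a few names and use the first one that is installed. This is only a sketch: the function name first_available_locale is hypothetical, and which names exist depends entirely on the machine. Constructing std::locale from an unknown name throws std::runtime_error, which is what the loop relies on.

```cpp
#include <locale>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: return the first locale in `names` that can be
// constructed on this system, or the classic "C" locale if none can.
std::locale first_available_locale(const std::vector<std::string>& names)
{
    for (const std::string& name : names) {
        try {
            return std::locale(name);
        } catch (const std::runtime_error&) {
            // This locale is not installed here; try the next candidate.
        }
    }
    return std::locale::classic();
}
```

A call such as first_available_locale({"fr_CA.utf8", "en_US.utf8"}) could then replace the hard-coded name in the programs above. Note that the "C" fallback will not classify è as a letter, so check the result's name() if that matters.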