1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124
|
/*
* Copyright (c) 2009, Helios (helios.vmg@gmail.com)
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are met:
* * Redistributions of source code must retain the above copyright notice,
* this list of conditions and the following disclaimer.
* * Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
*
* THIS SOFTWARE IS PROVIDED BY HELIOS "AS IS" AND ANY EXPRESS OR IMPLIED
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
* MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO
* EVENT SHALL HELIOS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
* EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
* PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
* OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
* WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
* OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
* OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#include "utf8.h"
wchar_t *UTF8_to_WChar(const char *string){
long b=0,
c=0;
if ((uchar)string[0]==BOM8A && (uchar)string[1]==BOM8B && (uchar)string[2]==BOM8C)
string+=3;
for (const char *a=string;*a;a++)
if (((uchar)*a)<128 || (*a&192)==192)
c++;
wchar_t *res=new wchar_t[c+1];
res[c]=0;
for (uchar *a=(uchar*)string;*a;a++){
if (!(*a&128))
//Byte represents an ASCII character. Direct copy will do.
res[b]=*a;
else if ((*a&192)==128)
//Byte is the middle of an encoded character. Ignore.
continue;
else if ((*a&224)==192)
//Byte represents the start of an encoded character in the range
//U+0080 to U+07FF
res[b]=((*a&31)<<6)|(a[1]&63);
else if ((*a&240)==224)
//Byte represents the start of an encoded character in the range
//U+07FF to U+FFFF
res[b]=((*a&15)<<12)|((a[1]&63)<<6)|(a[2]&63);
else if ((*a&248)==240){
//Byte represents the start of an encoded character beyond the
//U+FFFF limit of 16-bit integers
res[b]='?';
}
b++;
}
return res;
}
//Do not call me.
long getUTF8size(const wchar_t *string){
if (!string)
return 0;
long res=0;
for (;*string;string++){
if (*string<0x80)
res++;
else if (*string<0x800)
res+=2;
else
res+=3;
}
return res;
}
char *WChar_to_UTF8(const wchar_t *string){
long fSize=getUTF8size(string);
char *res=new char[fSize+1];
res[fSize]=0;
if (!string)
return res;
long b=0;
for (;*string;string++,b++){
if (*string<0x80)
res[b]=(char)*string;
else if (*string<0x800){
res[b++]=(*string>>6)|192;
res[b]=(*string&63)|128;
}else{
res[b++]=(*string>>12)|224;
res[b++]=((*string&4095)>>6)|128;
res[b]=(*string&63)|128;
}
}
return res;
}
//my helper function
void loadContent(const string& filename, wstring& dest)
{
ifstream fin(filename.c_str(), ios::binary);
if(!fin)
{
cout<<"Cannot open file\n";
exit(1);
}
fin.seekg(0, ios::end);
int len = fin.tellg();
fin.seekg(0, ios::beg);
char *content = new char[len+1];
fin.read(content, len);
/**/fin.close();
content[len] = '\0';
wchar_t *wcontent = UTF8_to_WChar(content);
dest = wcontent;
/**/delete [] content;
/**/delete [] wcontent;
}
|