duthomhas::utf8_console

Something that has bothered me for a while is that *nix programs natively support UTF-8 cin, cout, etc. and UTF-8 command-line arguments, but Windows programs do not, and the usual methods to go Unicode on Windows mean a lot of code butchery.

C++ actually makes it pretty easy to fix the standard streams, but less so for the arguments to main().

Why can’t code just compile and work correctly on both systems?


Well, it can.

I wrote a little library to do this and finally tweaked it up nice enough to post on GitHub.

https://github.com/Duthomhas/utf8_console

If you are on Windows, please give it a quick spin and tell me what you think and what issues you encounter.
Windows programming as in command line on a Windows machine? People do that?

Why not just PuTTY into a Linux server?

(That's a genuine question b.t.w).

I just now learned Unix command line compilation, so if I figure out Windows too, I'll check it out.
@jjordan33,

https://docs.microsoft.com/en-us/cpp/build/building-on-the-command-line?view=vs-2019

MS has IMO made it easier to access build tools from the command line with VS 2017 and now 2019.

Maybe I just became more comfortable with command line builds since MS released 2017.

Either way, it isn't rocket science any more. :Þ
I'll check it out after work.
Thanks for the link.
Yeah, let me know how it works for you. Don’t forget to play around with the code page as you do. (CP 65001 is UTF-8.)
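If you want to see from inside a program which code page the console is using, something like the following rough sketch might help. It calls the Win32 functions GetConsoleCP() and GetConsoleOutputCP() when built on Windows. The struct and helper names are made up for illustration, and reporting UTF-8 on non-Windows systems is just an assumption so the code compiles everywhere.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// Code page identifier for UTF-8 (the "CP 65001" mentioned above).
constexpr unsigned kUtf8CodePage = 65001;

// Input and output code pages of the attached console (hypothetical helper type).
struct ConsoleCodePages {
  unsigned input;
  unsigned output;
};

// Query the current console code pages.
// Assumption: on non-Windows systems we simply report UTF-8 for both.
inline ConsoleCodePages query_code_pages() {
#ifdef _WIN32
  return { GetConsoleCP(), GetConsoleOutputCP() };
#else
  return { kUtf8CodePage, kUtf8CodePage };
#endif
}

// True only when both input and output use UTF-8.
constexpr bool uses_utf8(ConsoleCodePages cp) {
  return cp.input == kUtf8CodePage && cp.output == kUtf8CodePage;
}
```

At the prompt itself, "chcp" shows the active code page and "chcp 65001" switches to UTF-8, so you can compare what the program reports against what you set.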

And yeah, from the very beginning I’ve been using C and C++ from the command terminal, so it seems odd to me that so few people are familiar with it, or even reject it outright. PuTTY is a whole lot of overhead to not use my machine directly.

I suppose I ought to finish my Windows Command-Line How-To FAQ. It really isn’t very hard. The key is:

    Control Your PATH

That’s it!

Of course, there is a little more to it than that.

I keep a little directory under my account: C:\Users\Michael\bin.
In that directory I keep a file called prompt.bat, which is more or less equivalent to a .profile or .bashrc file in *nix. Here is the current content:

@echo off
cls
path C:\Users\Michael\bin;%PATH%

set DIRCMD=/ogn

:: Macro Aliases

doskey deltree=rd/s/q $*
doskey dira=dir/a $*
doskey dird=dir/a-hd $*
doskey mcd=@md $* $T @cd $*
doskey mpath=for %%A in ("%%PATH:;=";"%%") do @if not "%%~A"=="" @echo   %%~A
doskey rmq=del *.~*

doskey 7z="C:\Program Files (x86)\7-Zip\7z.exe" $*
doskey astyle=C:\m\bin\AStyle\bin\astyle.exe -A1 -s2 $*
doskey cppdocs=start "" C:\Users\Michael\Documents\Programming\cpp.reference.com\reference\en\index.html
doskey gimp="C:\Program Files\GIMP 2\bin\gimp-2.10.exe" $*
doskey hxd="C:\Program Files (x86)\HxD\HxD.exe" $*
doskey md5=CertUtil -hashfile $1 MD5
doskey npp="C:\Program Files (x86)\Notepad++\Notepad++" $*
doskey sdx=tclsh C:\Users\Michael\bin\sdx.kit $*
doskey sha256=CertUtil -hashfile $1 SHA256
doskey tcldocs=start "" /max C:\Users\Michael\Documents\Programming\TclTk86.chm

Inside my little “bin” directory, in addition to programs like less.exe and grep.exe (GNU Tools), I also have a number of little batch files to manipulate the environment. For example:

markdown_edit.bat:
@echo off
setlocal
:: Skip the path change when C:\Python37 is already one of the PATH entries
for %%A in ("%PATH:;=";"%") do if /i "%%~A"=="C:\Python37" goto :PYOK
path C:\Python37\Scripts;C:\Python37;%PATH%
:PYOK

if "%1"=="" (
  set F=README.md
) else (
  set F="%~f1"
)

if not exist %F% echo #title > %F%

start "" markdown_edit %F%

This particular one is a little messy, but straightforward. Add Python 3.7 to the path temporarily (but only if it isn’t already in the path*) and start Mike Ward’s Markdown Edit Python script without disrupting my use of the existing prompt. (Just like starting any other normal GUI program.)

* This is because I keep my path very clean. If a utility is not in use, it isn’t in my path. I will explain raisins in just a moment.
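For anyone who would rather see the "only add it if it isn’t already there" idea outside of batch syntax, here is a minimal C++ sketch of the same logic working on a PATH-style string. The function names are hypothetical, and the ASCII-only case-insensitive comparison is a simplifying assumption (Windows paths are case-insensitive, but real comparisons are more involved).

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Split a PATH-style string on `sep`, skipping empty entries.
std::vector<std::string> split_path(const std::string& path, char sep = ';') {
  std::vector<std::string> dirs;
  std::string::size_type start = 0;
  while (start <= path.size()) {
    auto end = path.find(sep, start);
    if (end == std::string::npos) end = path.size();
    if (end > start) dirs.push_back(path.substr(start, end - start));
    start = end + 1;
  }
  return dirs;
}

// ASCII-only, case-insensitive comparison of two directory names.
bool same_dir(const std::string& a, const std::string& b) {
  return a.size() == b.size() &&
         std::equal(a.begin(), a.end(), b.begin(),
                    [](unsigned char x, unsigned char y) {
                      return std::tolower(x) == std::tolower(y);
                    });
}

// Return `path` with `dir` at the head, unless `dir` is already an entry.
std::string prepend_if_missing(const std::string& path, const std::string& dir,
                               char sep = ';') {
  const auto dirs = split_path(path, sep);
  const bool present =
      std::any_of(dirs.begin(), dirs.end(),
                  [&](const std::string& d) { return same_dir(d, dir); });
  if (present) return path;
  return path.empty() ? dir : dir + sep + path;
}
```

Comparing whole entries rather than searching for a substring means that C:\Python37\Scripts does not count as a match for C:\Python37.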

Another example is python.bat, which simply adds Python 3 and the scripts directory to the head of the path and activates the python interpreter with the arguments I provided.

python.bat:
@echo off
path C:\Python37\Scripts;C:\Python37;%PATH%
python.exe %*

This kind of thinking allows transparent path management at the prompt. I can type “python quux.py” at any time and it will work properly, bringing the entire Python 3.7 environment to instant use.

When I am done with Python, I can either restart the prompt or use another little utility to remove it. What I have is actually fairly complex, but a line like the following would do:

 
@path %PATH:C:\Python37\Scripts;C:\Python37;=%
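Note that the one-liner above is a plain substring substitution, so it only works while those two entries sit next to each other in exactly that order. A slightly more forgiving approach removes entries one at a time. The following is only a sketch of that idea in C++, with a made-up function name and an exact, case-sensitive match as a simplifying assumption.

```cpp
#include <string>

// Return `path` with every entry exactly equal to `dir` removed.
// Empty entries (from doubled or trailing separators) are dropped as well.
std::string remove_from_path(const std::string& path, const std::string& dir,
                             char sep = ';') {
  std::string result;
  std::string::size_type start = 0;
  while (start <= path.size()) {
    auto end = path.find(sep, start);
    if (end == std::string::npos) end = path.size();
    const std::string entry = path.substr(start, end - start);
    if (!entry.empty() && entry != dir) {
      if (!result.empty()) result += sep;
      result += entry;
    }
    start = end + 1;
  }
  return result;
}
```

Calling it once for C:\Python37\Scripts and once for C:\Python37 cleans up both entries regardless of where they ended up in the path.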

For C++ compilers that do a lot of PATH modification (like MSVC), I also tend to nest a call to cmd.exe, so that I can start and use the compiler and then remove it by simply typing “exit”.

Path manipulation matters.

On Windows, especially, this is important because there may otherwise be more than one version of a tool sitting around in your path, and mixing stuff results in Bad Things Happening™.

For example, you may have a variant copy of MinGW sitting around from each of MSYS2, Code::Blocks, Strawberry Perl, TDM-MinGW-w64, plus other random stuff. Each version is incompatible, and utilities that come with each version may or may not choose to run the version you think you are using or some version you don’t think you are running.
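To check whether you are in that situation, you can list every PATH directory that holds a copy of a given executable. Here is a hedged sketch of that check in C++17 using std::filesystem. The function names are hypothetical, and the existence test is passed in as a parameter purely so the logic can be exercised without touching the real disk.

```cpp
#include <filesystem>
#include <functional>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

using ExistsFn = std::function<bool(const fs::path&)>;

// Default check: ask the real filesystem, treating errors as "not found".
inline bool file_exists(const fs::path& p) {
  std::error_code ec;
  return fs::exists(p, ec);
}

// Return, in PATH order, every directory in `path_value` that contains
// a file named `exe_name`. The first hit is the one the shell would run.
std::vector<std::string> find_all_in_path(const std::string& path_value,
                                          const std::string& exe_name,
                                          const ExistsFn& exists = file_exists,
                                          char sep = ';') {
  std::vector<std::string> hits;
  std::string::size_type start = 0;
  while (start <= path_value.size()) {
    auto end = path_value.find(sep, start);
    if (end == std::string::npos) end = path_value.size();
    const std::string dir = path_value.substr(start, end - start);
    if (!dir.empty() && exists(fs::path(dir) / exe_name)) hits.push_back(dir);
    start = end + 1;
  }
  return hits;
}
```

More than one hit for something like g++.exe is the warning sign. At the prompt, "where g++" gives a similar listing.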

Set your Windows Console Properties Defaults

I keep a shortcut on my TaskBar for the Windows Console. From that I have modified the Windows Console Properties to make it use the proper font (Lucida Console Unicode) and position itself nicely on the screen, and have a much larger space than the tiny 80 by 25 default.

I can tell a properly-initialized console from something the system or another program generates easily enough, but you could just modify the global Windows Console Properties so that all consoles start in a way you like as well.

ConEmu

For the last couple of years I have been using Maximus5’s ConEmu as my main terminal when programming. Setting it up to work much like the Windows Console is easy, and it does a few things to make life a whole lot easier as well. But that is a blog for another day, LOL.

I have yet to play with Microsoft’s latest Windows Terminal (or whatever it is called), and I can’t stand the PowerShell, so...

That’s all folks!
Just an update. I haven't forgotten about my commitment here.

I'm just trying to catch up on school after putting so much off for the hackathon weekend. I should get to this sometime over the weekend.
Eh, you have no obligation to me.


But if you do play with it, feedback would be nice. :O)
jjordan33 wrote:
Windows programming as in command line on a Windows machine? People do that?

All the time, actually!


Also, like @Duthomhas, I have plenty of simple batch files set up to modify the path temporarily: setgcc.bat, setNAG.bat, setpython.bat, setMPI.bat etc. and doskey is pretty useful.

I've even been known to start and stop services with the command line (net start or sc start) as well.

You learn about filesystems pretty quickly.
Okay, I tap out for now.
This is a bit above my current skill and understanding to properly field test and offer anything resembling useful feedback. However, I always keep my word, so Monday I will direct some of the upper division students to this post and see if I can conscript some professionals.
You don’t have to understand it, just use BUILD.bat to create an object file, then link it with your programs like any other .obj or .o file.

Complain if the standard streams or command-line arguments don’t properly handle UTF-8 data.
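One way to tell whether the bytes arriving in your program really are UTF-8 is to run them through a well-formedness check. The following is a minimal sketch of such a check (C++17, the function name is made up). It is not part of utf8_console; it is just something you could paste into a test program and call on each argument or input line.

```cpp
#include <cstddef>
#include <string_view>

// Return true when `s` is well-formed UTF-8: correct lead/continuation bytes,
// no overlong encodings, no surrogates, nothing above U+10FFFF.
bool is_valid_utf8(std::string_view s) {
  // Smallest code point that legitimately needs a sequence of each length.
  static constexpr unsigned long min_cp[] = {0, 0, 0x80, 0x800, 0x10000};

  std::size_t i = 0;
  while (i < s.size()) {
    const unsigned char b0 = static_cast<unsigned char>(s[i]);
    std::size_t len = 0;
    unsigned long cp = 0;
    if (b0 < 0x80)                { len = 1; cp = b0; }
    else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; }
    else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; }
    else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; }
    else return false;                        // stray continuation or invalid lead byte

    if (i + len > s.size()) return false;     // truncated sequence
    for (std::size_t k = 1; k < len; ++k) {
      const unsigned char b = static_cast<unsigned char>(s[i + k]);
      if ((b & 0xC0) != 0x80) return false;   // not a continuation byte
      cp = (cp << 6) | (b & 0x3F);
    }
    if (cp < min_cp[len]) return false;              // overlong encoding
    if (cp > 0x10FFFF) return false;                 // outside Unicode range
    if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // UTF-16 surrogate
    i += len;
  }
  return true;
}
```

A typical failure on an unpatched Windows console is a lone byte like 0xE9 (é in a legacy code page), which this check rejects.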
Windows programming as in command line on a Windows machine? People do that?

Why not just PuTTY into a Linux server?

--------------------------
If I did that, I would have to constantly transfer my files to unix, putty in, run the program, transfer the files back to my windows machine …. what is wrong with keeping it local?

I have literally 50 little 1-page programs using Cygwin on my work laptop that automate dumb little tasks, turning manual 5-10 min processes into subsecond processes, at the cost of writing a few lines of code... one of them was a big deal and cut a 1-2 day process down to a few min: my co-workers were manually inventorying a dependency tree to figure out what to move from one environment to another and my program got 95% of the work done instantly and left just a tiny bit for the humans...

Why spend hours tying a GUI to that kind of thing? What would it be, pop up a file open dialog and a 'go' button and a second file dialog for the result... 3 slowdowns, vs. drag and drop the input file onto the exe, get an output file of the same name with a different extension out, no interaction to slow you down...

I've got command-line JS, Java, C++, Python and Fortran on my Windows box, and with Cygwin can write both .bat and Unix bash scripts as well, and can also use the Unix tools grep/etc. at will. Don't need to go get permission from my employer to do weirdness on the server, I can do it locally where I have more freedom and can't break anything important.
>> jonnin

Would you have to do all those extra steps if you downloaded that linux on windows thing?
I downloaded an Ubuntu terminal a little while back for school and to mess around with. So far, I haven't seen a benefit to terminal programming over using a good compiler, but I am willing to concede that there are probably advantages to command line programming.

>> Duthomhas

Okay, I'm ready to do this. So, I create a batch file, fill it with your provided code.

Then this is the part I'm unclear on.

Create a makefile and do the same thing I would do for a .cpp and a .h?


Will I need to do any sort of #include at the top of my files?
er, none of that.

Download the archive. Unzip. Open a command prompt with the compiler available. (If MSVC, use the Native Tools Command Prompt for VS. If MinGW, make sure MinGW's bin directory is in the path. For example, for MSYS2 add it with “path C:\msys64\mingw64\bin;%PATH%”.) Change to the unzipped directory. Type “build” and wait.

Now you have an object file you can use. You need that object file and nothing else.

Copy the utf8_console.obj or utf8_console.o file to your program’s source directory. When you compile, include it in your list of things to compile with.

MSVC: 
 
cl /EHsc /W4 /utf-8 /Ox /std:c++17 myprog.cpp utf8_console.obj


Clang-CL: 
 
clang++ -Wall -pedantic-errors -O3 -std=c++17 myprog.cpp utf8_console.o -o myprog.exe


MinGW-w64: 
 
g++ -Wall -pedantic-errors -O3 -std=c++17 myprog.cpp utf8_console.o -municode -mconsole


Clang-w64: 
 
clang++ -Wall -pedantic-errors -O3 -std=c++17 myprog.cpp utf8_console.o -municode -mconsole -o myprog.exe


MinGW (not recommended): 
 
g++ -Wall -pedantic-errors -O3 -std=c++17 myprog.cpp utf8_console.o


Other compilers: don’t bother (or know how to read the docs to use ‘wmain()’ as the entry point).
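For the curious: with wmain() the arguments arrive as wide (UTF-16) strings on Windows, so getting UTF-8 out of them needs a conversion step. Below is a portable sketch of that conversion using char16_t so it compiles anywhere. It is an illustration only, not necessarily how utf8_console does it (on Windows itself the Win32 WideCharToMultiByte function with CP_UTF8 is the usual tool). The function name is made up, and replacing unpaired surrogates with U+FFFD is my own assumption about error handling.

```cpp
#include <cstddef>
#include <string>

// Convert UTF-16 text to UTF-8.
// Assumption: unpaired surrogates are replaced with U+FFFD.
std::string utf16_to_utf8(const std::u16string& in) {
  std::string out;
  for (std::size_t i = 0; i < in.size(); ++i) {
    char32_t cp = in[i];
    const bool high = cp >= 0xD800 && cp <= 0xDBFF;
    if (high && i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
      // Combine a surrogate pair into one code point above U+FFFF.
      cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
      ++i;
    } else if (cp >= 0xD800 && cp <= 0xDFFF) {
      cp = 0xFFFD;  // lone surrogate
    }

    if (cp < 0x80) {
      out += static_cast<char>(cp);
    } else if (cp < 0x800) {
      out += static_cast<char>(0xC0 | (cp >> 6));
      out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
      out += static_cast<char>(0xE0 | (cp >> 12));
      out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
      out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
      out += static_cast<char>(0xF0 | (cp >> 18));
      out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
      out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
      out += static_cast<char>(0x80 | (cp & 0x3F));
    }
  }
  return out;
}
```

A wmain() written by hand would run each argument through something like this and then hand the resulting UTF-8 strings to the rest of the program.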

The README.md states this all in the first 20 lines.

[edits] stupid BBCode...
Windows Command-Line: Introducing the Windows Pseudo Console (ConPTY)
The introduction of the ConPTY API is perhaps one of the most fundamental, and liberating, changes that’s happened to the Windows Command-Line in several years … if not decades!

We, the Console team, have already ported some of Microsoft’s tools to use the ConPTY API. We’re also working with several teams inside Microsoft (Windows Subsystem for Linux (WSL), Windows Containers, VSCode, Visual Studio, etc.), and with several external parties including @ConEmuMaximus5 – creator of the awesome ConEmu 3rd party Console for Windows.

But we need your help to raise awareness of, and to start adopting the new ConPTY API
https://devblogs.microsoft.com/commandline/windows-command-line-introducing-the-windows-pseudo-console-conpty/


Windows Command-Line: Unicode and UTF-8 Output Text Buffer
If you’re running Windows 10 October 2018 Update (build 1809), you’re already running this new buffer!
https://devblogs.microsoft.com/commandline/windows-command-line-unicode-and-utf-8-output-text-buffer/
Can you help me past this bump?

>>Open command prompt with the compiler available<<
I assumed having VS installed would just sort of automatically do this for me.

11/07/2019  10:58 PM             4,930 BUILD.bat
11/07/2019  10:58 PM             1,846 example.cpp
11/07/2019  10:58 PM                37 input-example.txt
11/07/2019  10:58 PM             1,361 LICENSE_1_0.txt
11/07/2019  10:58 PM             2,939 README.md
11/07/2019  10:58 PM             9,604 utf8_console.cpp
               6 File(s)         20,717 bytes
               2 Dir(s)  310,807,941,120 bytes free

C:\Users\jeffj\Documents\ut8test\utf8_console-master>build
usage:
  BUILD.bat [COMPILER [ARCH]]

where COMPILER is one of:
  msvc     -- Make sure you are running at a Native C++ Visual Studio prompt
              (either x86 or x64)
  clang    -- Make sure that Clang is in the path
  mingw    -- Make sure that MinGW is in the path
  clean    -- Removes all generated obj and exe files.

and where ARCH is one of:
  -m32
  -m64

If you do not specify a compiler, then the first one found in the %PATH%
will be used, preferring Clang over MinGW.

If you do not specify an architecture, the compiler default is used.
Valid only for Clang and MinGW targets. (If supported. For example,
Clang-CL supports it, but MSYS2's Clang-w64 does not.) 
Would you have to do all those extra steps if you downloaded that linux on windows thing?
I downloaded an Ubuntu terminal a little while back for school and to mess around with. So far, I haven't seen a benefit to terminal programming over using a good compiler, but I am willing to concede that there are probably advantages to command line programming.

There are ways to do fewer steps if it were a free-for-all environment. Our Unix systems are our SERVERS. We can't just cross map it so that the Unix box can edit the files on my laptop directly, so yes, I would have to do a lot of file copies to do it on Unix instead of on my laptop.

benefit to terminal programming over using a good compiler
?? I don't know what that means. Coding in the terminal uses a good compiler. Did you mean IDE?

I don't use an IDE for little things. The only benefit is less aggravation: the very tools that make an IDE sweet for a larger project make them a pain in the behind for a 1 page program. I can literally write hello world and have it execute before Visual Studio is done opening on my system, due to enterprise slowness (does he have permission to use this? is it up to date? ok, what project did he load last? None, ok, so what folder are we going to write 100 project files to? ok, now let's load up the hard disk's folder tree. and on and on and on it goes and ALL IT HAD TO DO WAS OPEN A 50 byte text file!!!!!). For Eclipse, I can write a full program, not just hello world, before it starts.

It's not the Unix. It's not the IDEs. It's the combination of all of it on a secure network/enterprise level setup.
Yes, I meant IDE.

I did a quick Google search, and realized my error. I thought IDEs were for interpreted languages that don't need to be compiled (like Python), and that programs like CLion were compilers.

VS is certainly slow.

I suppose I can't escape what everyone has been telling me this entire semester: terminal programming is a "pro move".

I'll learn it eventually.
They're wrong. Terminal programming takes as much effort to learn as does using <insert favorite IDE here>.

The thing that trips people up these days is that they are used to the shiny GUI stuff and afraid of the command-line, especially after watching decades of ‘elite hackers’ saving the universe with command-line magic in movies and on television.

My favorite was the, “Oh, this is Unix!” scene in Jurassic Park, which was totally wrong in the other direction. (Yes, I know that that program exists. No one in their right mind uses it.)


@jonnin
Linux already does UTF-8 I/O. It doesn't need the library.

All utf8_console does is make Windows console programs minimally behave like on Linux.


Although, I am reconsidering the design. Would everyone get along with it better if it were just an #include file?
Duthomhas wrote:
Would everyone get along with it better if it were just an #include file?

Oh, hell, YES! :)
But that would likely bifurcate people's code base. Why would people on Linux care about utf8_console?

The only people who would care are those trying to compile on Windows, and using an #include would mean figuring out how to modify the sources. This, I would think, is more confusing than simply running a batch file to make a linkable object file — something that is downright mundane.

@jjordan33
Having VS simply exist somewhere on your machine is not enough, even for using the command line without utf8_console. You have to follow the instructions and start a VS command prompt.

Otherwise there are just too many possibilities for desired compiler locations and compiler configurations.