Questions about C++, syscalls, and compilers

Hi,
How exactly does libc (and the Windows version of the runtime library) implement syscalls?

Because in C++ (AFAIK) you can't directly modify registers, but don't you need to do that in order to perform syscalls?

And another question:
AFAIK the way C / C++ work is that the platform vendor (e.g. Windows) provides a runtime library that implements things like file handling etc., using syscalls.

But say you wanted to create a new language: would you have to manually implement a runtime library with wrappers for every single syscall (for every single architecture) for every single platform you wanted to run on?

AFAIK Windows only provides a runtime library for C / C++,
so does that mean every other language implements its own runtime library? Isn't that a HUGE job, and super hard, since platforms like Windows don't provide much documentation on their internal workings?
Because in C++ (AFAIK) you can't directly modify registers, but don't you need to do that in order to perform syscalls?
Using C++, the programmer doesn't have control over the registers, but the compiler does (obviously, otherwise computation would be impossible).
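A minimal sketch of that division of labor, assuming Linux with glibc (it will not build on Windows): the function below asks the kernel for the process ID once through the dedicated getpid() wrapper and once through glibc's generic syscall() function. In neither case does the C++ code touch a register; libc's own code does that for you. The function name getpid_paths_agree is made up for illustration.

```cpp
#include <sys/syscall.h>  // SYS_getpid: Linux syscall numbers
#include <unistd.h>       // getpid(), syscall()

// Ask the kernel for our process ID twice, through two different libc entry points.
// Assumes Linux with glibc; syscall() is a glibc/Linux extension, not standard C++.
bool getpid_paths_agree() {
    // 1. The dedicated wrapper: we never see which register holds what.
    pid_t via_wrapper = getpid();

    // 2. glibc's generic syscall(): the syscall number is an ordinary C++
    //    argument, and libc copies it into the register the kernel expects
    //    before executing the kernel-entry instruction.
    long via_generic = syscall(SYS_getpid);

    return static_cast<long>(via_wrapper) == via_generic;
}
```

Both paths end in the same kernel call; the only difference is how much of the register shuffling is hidden behind a named function.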

But say, you wanted to create a new language, would you have to manually implement a runtime library with wrappers for every single syscall (for every single architecture) for every single platform you wanted to run on?
Kind of.

First of all, "syscall" is a fairly low-level operation. It doesn't necessarily have a one-to-one correspondence with a high-level action. For example, when you use a language's file system library to query the size of a file, zero, one, or more syscalls may be performed (the runtime might conceivably skip the syscall if it knows for a fact the file couldn't have changed size). Ultimately, a syscall is nothing more than a transition from user mode to kernel mode. Some operations on some systems might be implemented such that they don't need this transition, or it might even be the case that there's no distinction between user-mode code and kernel-mode code.

Second, you would have to provide wrappers for your new language around every high-level action that you want to support, rather than around every low-level system call. You don't really care that to open a file on system Foo you need to do X and on system Bar you need to do Y. All you care about is opening a file. This can be as easy as just calling C's fopen(), if you want.
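As a rough sketch of that approach, here is what a runtime builtin for a new language might look like if it simply delegates to C's stdio. The name rt_read_file is invented here. Which syscalls actually happen, and how many, is left to the C library on each platform.

```cpp
#include <cstdio>
#include <optional>
#include <string>

// Hypothetical runtime function backing a new language's "read whole file" builtin.
// It delegates to C's stdio, so the C library, not our runtime, decides which
// syscalls (open, read, close, ...) are needed on the current OS.
std::optional<std::string> rt_read_file(const char *path) {
    std::FILE *f = std::fopen(path, "rb");
    if (!f) return std::nullopt;           // could not open: report failure to the language

    std::string contents;
    char buf[4096];
    std::size_t n;
    while ((n = std::fread(buf, 1, sizeof buf, f)) > 0)
        contents.append(buf, n);           // accumulate whatever stdio hands back

    bool ok = !std::ferror(f);             // distinguish "end of file" from "read error"
    std::fclose(f);
    if (!ok) return std::nullopt;
    return contents;
}
```

The same source compiles on any platform with a hosted C library, which is the whole point: the per-OS work was already done by whoever wrote that library.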
Thanks for the answer

Using C++, the programmer doesn't have control over the registers, but the compiler does (obviously, otherwise computation would be impossible).


But what about self-hosting compilers?
Consider a language specific to x86, with only two operations: if the next character is '0', it copies the value of eax into edx; otherwise it adds the value of ebx into ecx.
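#include <string>
#include <vector>

// Toy x86 "compiler": each input character becomes one two-byte instruction.
std::vector<unsigned char> compile(const std::string &input){
    std::vector<unsigned char> output;
    for (auto c : input){
        if (c == '0'){
            // 89 C2 = mov edx, eax
            output.push_back(0x89);
            output.push_back(0xC2);
        }else{
            // 01 D9 = add ecx, ebx
            output.push_back(0x01);
            output.push_back(0xD9);
        }
    }
    return output;
}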

The compiler doesn't itself need to be able to execute the operations it will generate; it just needs to be able to write them to the output.
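To illustrate, here is a self-contained sketch that repeats the toy compile() function from the post above and adds a hypothetical write_code() helper that saves the generated bytes to a file. The host program never executes those bytes; it only needs to know which values to write.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Same toy x86 "compiler": '0' -> mov edx, eax (89 C2); anything else -> add ecx, ebx (01 D9).
std::vector<unsigned char> compile(const std::string &input) {
    std::vector<unsigned char> output;
    for (auto c : input) {
        if (c == '0') {
            output.push_back(0x89);
            output.push_back(0xC2);
        } else {
            output.push_back(0x01);
            output.push_back(0xD9);
        }
    }
    return output;
}

// Hypothetical helper: dump the raw machine code to disk in binary mode.
// Nothing here runs the generated instructions; they are just bytes.
bool write_code(const std::string &path, const std::vector<unsigned char> &code) {
    std::ofstream out(path, std::ios::binary);
    out.write(reinterpret_cast<const char *>(code.data()),
              static_cast<std::streamsize>(code.size()));
    return static_cast<bool>(out);
}
```

For example, compile("01") yields the bytes 89 C2 01 D9, i.e. mov edx, eax followed by add ecx, ebx. A self-hosting compiler works the same way, just with a much larger table of "which bytes mean what".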
Ah, of course. Thank you very much.
C++ will indeed let you directly tap a register. It's going to tie your code to one computer type, but you can embed assembly language in C++ either by compiling it alongside the C++ code or by embedding it (a language extension that almost all compilers support), such as with the __asm keyword to make an assembly block. You can use this to invoke system calls. There are probably APIs out there that let you make low-level system calls without having to DIY, but I don't know anymore... I only use assembly very sparingly; the one case where it pops up over and over for me is endian-flipping large amounts of backwards integers.
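For the endian-flipping case specifically, inline assembly is often avoidable. A sketch in plain C++ (the names flip32 and flip_all are hypothetical): GCC and Clang typically recognize this shift-and-mask pattern and compile it to a single byte-swap instruction with optimization enabled, and C++23 adds std::byteswap in <bit> for the same job.

```cpp
#include <cstdint>
#include <vector>

// Reverse the byte order of one 32-bit value using only shifts and masks.
constexpr std::uint32_t flip32(std::uint32_t v) {
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}

// Flip a whole buffer of "backwards" integers in place.
void flip_all(std::vector<std::uint32_t> &values) {
    for (auto &v : values) v = flip32(v);
}
```

flip32(0x12345678) gives 0x78563412, and the code stays portable across compilers and CPUs.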
C++ will indeed let you directly tap a register [...] by compiling it alongside the C++ code
If you're assembling code independently of the C++ compiler and then linking it with the rest of the program, how exactly is that something C++ lets you do?

embedding it (language extension that almost all compilers support) such as the __asm keyword to make an assembly block.
Although many compilers support inline Assembly in one form or another, the syntax is not necessarily portable from one compiler to another. And anyway it's still not C++, for two reasons: first, the inline Assembly syntax is not part of the C++ standard; and second, you have to write in Assembly, not in C++.
stav wrote:
How exactly does libc (and the Windows version of the runtime library) implement syscalls?

Minor point: libc doesn't actually "implement" system calls; the OS kernel implements them. libc invokes system calls as needed.

For example, libc implements the C library function strlen, which doesn't invoke any syscalls. libc also implements the C library function fopen, which invokes a syscall. On a Unix-like system (macOS, Linux, etc.), that syscall will likely be invoked through the POSIX API open().
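A small POSIX-only sketch of that layering (the function name first_byte_both_ways is invented): it reads the first byte of a file once through C's fopen()/fgetc() and once through the POSIX open()/read() wrappers. fileno() exposes the file descriptor that stdio obtained from the kernel underneath its buffered FILE object. On current Linux with glibc, the underlying syscall is typically openat(), a close relative of open().

```cpp
#include <cstdio>     // std::fopen, std::fgetc, std::fclose
#include <stdio.h>    // fileno() (POSIX)
#include <fcntl.h>    // open(), O_RDONLY (POSIX)
#include <unistd.h>   // read(), close() (POSIX)

// Reads the first byte of a file two ways. Assumes a POSIX system.
// Returns -1 on failure or disagreement, otherwise the byte value.
int first_byte_both_ways(const char *path) {
    // Path 1: C stdio. fopen() asks the kernel for a file descriptor
    // and wraps it in a buffered FILE object.
    std::FILE *f = std::fopen(path, "rb");
    if (!f) return -1;
    int fd_inside_file = fileno(f);       // the descriptor stdio uses underneath
    int c1 = std::fgetc(f);
    std::fclose(f);

    // Path 2: POSIX directly, one thin wrapper per syscall.
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    unsigned char b = 0;
    ssize_t n = read(fd, &b, 1);
    close(fd);

    if (fd_inside_file < 0 || n != 1 || c1 != b) return -1;
    return b;
}
```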

stav wrote:
say, you wanted to create a new language, would you have to manually implement a runtime library with wrappers for every single syscall (for every single architecture) for every single platform you wanted to run on?

Yes, and since the number and meaning of syscalls is totally different between operating systems and changes from version to version, you'd have to write some abstractions to make your language portable. Or just implement C interop and make your runtime call C or POSIX/WinAPI, which already did that work.
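One way such an abstraction can look, sketched under the assumption that the runtime is written in C++ and leans on existing OS APIs instead of raw syscalls: a hypothetical rt_print() primitive that the new language exposes, with WinAPI's WriteFile() on Windows and POSIX write() everywhere else.

```cpp
#include <cstddef>
#include <string_view>

#ifdef _WIN32
#include <windows.h>
#else
#include <unistd.h>
#endif

// Hypothetical runtime primitive: "print this text to standard output".
// The language only ever sees this one function; the per-OS details live
// behind the #ifdef, and both branches call an existing OS API.
bool rt_print(std::string_view text) {
#ifdef _WIN32
    HANDLE out = GetStdHandle(STD_OUTPUT_HANDLE);
    if (out == INVALID_HANDLE_VALUE || out == nullptr) return false;
    DWORD written = 0;
    return WriteFile(out, text.data(), static_cast<DWORD>(text.size()), &written, nullptr)
           && written == text.size();
#else
    std::size_t done = 0;
    while (done < text.size()) {          // write() may accept fewer bytes than asked
        ssize_t n = write(STDOUT_FILENO, text.data() + done, text.size() - done);
        if (n < 0) return false;          // real code would retry on EINTR
        done += static_cast<std::size_t>(n);
    }
    return true;
#endif
}
```

Adding another platform means adding another branch here, not touching the language itself.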
The operating system supports a binary interface to make system calls. As with higher-level functions, this involves putting the arguments somewhere well known (on the stack, in specific CPU registers, etc.), and leaving the results somewhere equally well known. But the big difference is that the call itself is usually invoked via an interrupt of some sort (or a dedicated instruction that serves the same purpose, such as syscall on x86-64). This causes the processor to change to a privileged state and jump to some specific address set up by the OS. The OS then implements the call and returns from the interrupt.

The key here is that it's a different binary interface from a normal function call in an application, necessitated by the need to switch privilege levels and enter the OS code.

So yes, you need a small assembly wrapper of some sort to translate between the language API and the OS API. If the language and the OS use the same conventions to pass parameters and return values, then the wrapper might be as simple as just triggering the appropriate interrupt.
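A minimal sketch of such a wrapper, assuming x86-64 Linux and GCC/Clang extended inline assembly. On that platform the kernel is entered with the dedicated syscall instruction rather than a software interrupt (32-bit Linux traditionally used int 0x80); the syscall number goes in rax, the first three arguments in rdi, rsi and rdx, and the result comes back in rax (a negative errno on failure). Those first three argument registers happen to match the normal x86-64 C calling convention, which is the "same conventions" case described above. The names raw_syscall3 and raw_write are made up for this example; 1 is the x86-64 Linux number for write.

```cpp
#include <cstddef>

#if !defined(__x86_64__) || !defined(__linux__)
#error "This sketch assumes x86-64 Linux and GCC/Clang extended asm"
#endif

// x86-64 Linux syscall number for write.
constexpr long kSysWrite = 1;

// Minimal "assembly wrapper": put the number in rax and the first three
// arguments in rdi, rsi, rdx, then execute the syscall instruction.
// The kernel returns the result in rax and clobbers rcx and r11.
long raw_syscall3(long number, long a1, long a2, long a3) {
    long ret;
    asm volatile("syscall"
                 : "=a"(ret)
                 : "a"(number), "D"(a1), "S"(a2), "d"(a3)
                 : "rcx", "r11", "memory");
    return ret;
}

// Language-level API on top: convert C++ types into plain integers.
long raw_write(int fd, const void *buf, std::size_t len) {
    return raw_syscall3(kSysWrite, fd, reinterpret_cast<long>(buf),
                        static_cast<long>(len));
}
```

Calling raw_write(1, "hi\n", 3) should print hi on standard output and return 3; passing an invalid descriptor returns a negative value directly, since there is no libc in between to translate it into errno.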

If you create a new language, then you might have your programs link against the C library and use its interface to the OS. Or you might have to write your own. Is this tedious? Sure, but you only have to do it once.
helios, this is where I get into trouble... The Visual Studio compiler is where I have the most experience, and it is very, very casual about injecting assembly.

As far as having to do it once goes, a single wrapper around an 'emit' function in assembly would let you do anything you needed to do, at the cost of extreme ugliness. (In Intel CPU assembly, emit lets you insert raw bytes directly into the instruction stream, i.e. give the CPU a command in binary.)