How are operating systems written in C

Hi guys,

This concept baffles me. It may take a book to explain in detail, but if you could sum it up that would be awesome.

It amazes me that operating systems are developed in C and/or C++ as well as assembly (assembly less so, since it is lower level).

I mean, wouldn't you need the power of lower-level constructs such as machine code or assembler? After all, you are controlling the hardware directly.

C and C++, although lower level than most high-level programming languages, are still high level. So how is an operating system, or a component of one, written with the simple constructs of a language like C or C++? For example, there are no GUIs and no standard libraries. But even so, how do you go about writing an operating system with ints, doubles, chars, iteration, functions, conditional statements, etc.?

Obviously, writing higher-level applications such as games or console applications in a high-level language is plausible. But when you are starting from scratch, how is it possible that a whole operating system is developed using C, C++ and assembly? Note: I do know that operating systems are not fully developed in C or C++; some components, such as the bootstrapping program, need to be done in assembly.

thanks!
I mean, wouldn't you need the power of lower-level constructs such as machine code or assembler? After all, you are controlling the hardware directly.

Of course. Therefore such parts must be written directly in assembly.
You might enjoy this short tutorial:
https://wiki.osdev.org/Bare_Bones

thanks mbozzi I'll check it out :)
A lot of hardware is talked to by just loading up a pile of bytes in the right format and sending it to the device. Your code fills the array of bytes and sends it; the hardware receives it and knows what to do, provided you got the format right. You can do this in C/C++ all day long.
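
To make that concrete, here is a minimal C++ sketch of "load up the bytes and send them". Everything in it is an assumption for illustration: the DeviceRegs layout, the kReady bit and the send_packet helper are hypothetical, and a real device's registers and addresses come from its datasheet. The key ingredient is volatile, which tells the compiler that every read and write must really happen.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical register layout for an imaginary device; a real layout
// comes from the hardware's datasheet.
struct DeviceRegs {
    volatile std::uint8_t data;    // each write hands one byte to the device
    volatile std::uint8_t status;  // bit 0 set = ready to accept a byte
};

constexpr std::uint8_t kReady = 0x01;

// Send a packet byte by byte, polling the ready bit before each write.
// Returns the number of bytes written.
std::size_t send_packet(DeviceRegs* dev, const std::uint8_t* bytes, std::size_t len) {
    std::size_t sent = 0;
    for (; sent < len; ++sent) {
        while ((dev->status & kReady) == 0) {
            // Busy-wait; on real hardware the device updates this bit itself.
        }
        dev->data = bytes[sent];
    }
    return sent;
}
```

In a kernel, dev would point at the device's memory-mapped register block. In an ordinary program you can try the function by pointing it at a plain DeviceRegs object whose status already has kReady set. Nothing here goes beyond ints, pointers, a loop and a comparison.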

As said, there are some bits you need to do at a lower level, but the above covers an awful lot of situations as well.

Then you have the fun fact that most C and C++ compilers support inline assembly (directly in the middle of your code), and separately assembled files can be linked in at build time.
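
A rough sketch of what inline assembly looks like, assuming GCC or Clang extended asm syntax on an x86 target. The syntax is a compiler extension rather than portable C++, and add_with_asm is just an illustrative name, so the block falls back to plain C++ on other compilers and CPUs.

```cpp
#include <cstdint>

// Add two numbers with a single x86 "add" instruction when possible.
std::uint32_t add_with_asm(std::uint32_t a, std::uint32_t b) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    // AT&T operand order: "addl source, destination".
    // %0 is a (read and written), %1 is b (read only); "cc" says the
    // instruction changes the CPU's condition flags.
    asm("addl %1, %0" : "+r"(a) : "r"(b) : "cc");
    return a;
#else
    return a + b;  // portable fallback for other compilers or CPUs
#endif
}
```

An OS uses the same mechanism for the few instructions that have no C or C++ equivalent, such as enabling interrupts or loading special CPU registers.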
@adam2016,

The history of C is informative here.

It is intimately tied to the history of UNIX.

There's a long backstory about AT&T (a controlled monopoly in that era), back in the 1960s, when mechanical dialing systems were to be transformed into all-electronic systems. It would be the first time a company required thousands (tens of thousands) of computers all functioning in the same fashion, but in that epoch computers were all but hand built, and it was rare that any two were alike.

In that era, computer hardware was commonly delivered without an operating system. It was the purchaser's responsibility to fashion one. There is no comparable situation today. AT&T built an early version of UNIX in assembler as a start, but they knew that by the time they could purchase a few hundred computers, the next batch (working toward those 10,000+ systems) would be different (sometimes very different).

The solution was to develop a portable assembler: an assembly-like language with no dependence on a particular CPU platform.

The language chosen as a starting point was BCPL. The result, created by Ken Thompson, was a language called B. It wasn't sufficient.

Ritchie is largely credited with creating the language (Kernighan later co-authored the well-known book on it) in order to port UNIX from its PDP-11 assembler implementation into a new, more portable language. Consider the task. There is an OS in assembler, running. The goal is to create a language suitable for translating that example into a form portable to other CPUs and machines. As they made the attempt(s), they could see what was not working in that main task of translating the assembler example code. Naturally this guided the development of the destination language into a suitable result from all perspectives, including speed and space efficiency as well as access to CPU hardware features.

The result was C. From there the plan was that what was written in C could be ported to new architectures by implementing the back end of the compiler to target new CPUs in the future. Ritchie and Johnson expanded the C language to make that work better as new targets were attempted.

In the 70's and 80's, C was considered an assembler by many professionals, though academics had plenty of detailed arguments on the point. Largely, however, many of C's basic language constructs map to individual machine language instructions, and where they map to two or more, they are frequently those required in assembler results anyway, merely making the writing simpler, clearer and more readable.

These early compilers ran on computers with limited RAM, and came from more primitive engineering than we expect today, yet they still performed well enough to be effective. C was purpose built, from its inception, for the very act of building an operating system.

...how do you go about writing an operating system with ints, doubles, chars, iteration, functions, conditional statements, etc.?


Actually, that's what assembler does. On a side note here, there's not much distinction between assembler and machine language (machine code) - assembler is merely the mnemonic representation of the machine's codes for various operations. With the exception of HLA (high level assembly, not popular), assembler languages are unique for each CPU, and are human readable text representing a 1:1 correspondence to the machine's underlying numeric codes for the same instructions (machine code).

In assembler, the various types you mention (chars, ints, doubles) are either native 'types' to the CPU, or very closely related (doubles may be emulated in CPU's without floating point, for example).

Iteration is merely implemented with assembler jumps, usually conditional jumps. Conditionals (C's 'if', among other boolean tests) are merely conditional jumps which act dependent on status flags in the CPU indicating the result of some comparison. Most comparisons are subtraction, with subsequent conditional jump instructions based on the result. If the result is zero, the comparison is "equal to", and the conditional jump is taken if the zero flag is set. If the result is negative the comparison is "less than", and similarly the conditional jump is taken if the sign flag is set (indicating a negative value).
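
To illustrate, here is a small sketch (the function names are mine) of a loop written with nothing but a comparison, a conditional jump and an unconditional jump, next to the usual for loop. It is only meant to show the shape a compiler lowers a loop into, not the exact instructions any particular compiler emits.

```cpp
#include <cstddef>

// Sum an array using only labels, a comparison, and jumps.
int sum_with_jumps(const int* values, std::size_t count) {
    int total = 0;
    std::size_t i = 0;
loop_top:
    if (i == count) goto loop_done;  // compare, then jump if equal
    total += values[i];
    ++i;
    goto loop_top;                   // unconditional jump back to the test
loop_done:
    return total;
}

// The same logic written the usual way.
int sum_with_for(const int* values, std::size_t count) {
    int total = 0;
    for (std::size_t i = 0; i < count; ++i) {
        total += values[i];
    }
    return total;
}
```

Both functions compute the same result. The first one just spells out the jumps that the for statement hides.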

C's constructs map so closely to the CPU's most basic features that, with some careful writing aware of this, a large portion of the resulting assembler is nearly identical to that formed by hand. Where C or C++ uses higher level expressions this moves to the responsibility of the optimizer (a more recent notion than the origin of the language), but any writer can carefully choose C constructs mapping so closely to the CPU's raw, primitive operation so as to generate near or exact assembler results.

The original design of the language took that path based on its initial purpose of porting the assembler example of UNIX into a portable language for all CPU's.

A related point is that a higher language level does not automatically translate into less optimized assembler output. That is a common misunderstanding. It is not the level of the language that matters so much as the information encoded within the language's constructs, upon which the object code generator and the optimizer can operate.

This point is illustrated by Stroustrup with sorting, comparing C to C++. C has the qsort standard library function, which requires a pointer to a function to perform the comparison. Pointers to functions are fast, but passing the comparison as a pointer typically forces the compiler's back end to generate a function call for every comparison. The pointer carries no information about the comparison itself, so unless the compiler can see through both qsort and the pointer, optimization beyond the function call is not available.

In C++, however, the std::sort function takes a function object, which could be a lambda in recent C++ or a function object built as a class. This parameter to std::sort can be a plain pointer to a function, just as in C. However, the parameter's type allows more information to be encoded in the call to std::sort. It can carry a full description of the comparison, which gives the optimizer what it needs to inline the comparison and avoid a function call.
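
The two styles side by side, as a minimal sketch (compare_ints, sort_c_style and sort_cpp_style are illustrative names of my own). Whether the lambda version actually ends up faster depends on the compiler and its optimization settings. The point is only what information each call hands to the optimizer.

```cpp
#include <algorithm>
#include <cstdlib>
#include <vector>

// C style: the comparison reaches qsort only as a function pointer.
int compare_ints(const void* a, const void* b) {
    const int x = *static_cast<const int*>(a);
    const int y = *static_cast<const int*>(b);
    return (x > y) - (x < y);  // negative, zero, or positive
}

void sort_c_style(std::vector<int>& v) {
    std::qsort(v.data(), v.size(), sizeof(int), compare_ints);
}

// C++ style: the lambda's type is part of the std::sort instantiation,
// so the compiler sees the comparison body at the call site and may
// inline it.
void sort_cpp_style(std::vector<int>& v) {
    std::sort(v.begin(), v.end(), [](int a, int b) { return a < b; });
}
```

Both functions leave the vector in ascending order. The difference is in what the compiler knows while generating code for the sort.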

This is one simple, limited example of how a higher level construct can lead to faster execution. It is only possible in the modern era because it depends on the optimizer, but optimizers are mature enough to be relied upon to know the CPU better than a human with respect to the day to day writing of code. Optimizers can, essentially, build what the code means, not just what it says.

This doesn't mean high level languages produce faster code. Clearly that's not true. What matters is what information is encoded in the higher level constructs feeding into the optimizer. Where that is not well engineered in language design, opportunities for optimized output are lost, exchanged for convenience or rapid authorship instead of performance. When higher level constructs are well engineered from their expression in the language all the way to the optimizer and the emitted object code, the result can (frequently) be faster than what is practical to fashion by hand in assembler or in the lower level constructs of languages like C (or Pascal).

Bottom line, C was built to make operating systems as its first objective, and it worked in part because it is, to a great extent, a platform independent assembler if used that way.