How do I create a compiler?

From the text, we need the tokens (the Lexic step):
- numbers;
- identifiers;
- operators;
- keywords;
- parentheses...

Next we need to identify each line/instruction and its rules (the Syntactic step):
- if it is an 'if', check its rules;
- if it is a function/procedure, check its rules...
Now I'm confused, which is why I convert the new language to C++...
But normally we create an exe file...
How can we convert a line instruction to an exe file?
(Please try to keep it simple so I can understand the basics... and yes, I may be mistaken about how exe files are created.)
"how we can convert a line instruction to an exe file?"

I'm going to focus on this part, if I understand the question correctly.

"A line instruction" I assume is something like this:

int a;
int b = 5;
int c = 6;

a = b * c;


The "line" in this case is "a = b * c;"

This is assumed to be a multiplication of "b" times "c". The answer will be stored in "a".

The purpose of compiling this is to generate machine instructions (written in human-readable form as assembly language). These are the instructions the CPU hardware recognizes.

If the CPU can perform multiplication (not all CPUs can), this may be translated into machine code for the multiply instruction. That instruction may have requirements: it may require that one or both of the operands be stored in CPU registers, or it may require only one of them to be in a register while the other comes from memory.

The purpose of the compiler will be to figure out how to fashion those instructions.

These instructions differ for each CPU design. They are different for ARM CPUs than for PowerPC CPUs, which in turn differ from Intel/AMD CPUs.

After the multiply instruction executes, the compiler must also generate code that moves the answer into the memory for "a", the result. This is usually a move instruction, which copies the answer from a CPU register into the memory for "a".

This highly detailed process is just for a simple multiply instruction.

If the CPU does not have multiplication instructions (some didn't in the past, and some still don't), then the compiler may be required to generate instructions that perform multiplication using whatever primitive features the CPU does have.

Now, leaving that focus for a moment, there is a much larger field of study around parsing (which builds on what you called the Lexic step). Parsing is how the compiler "reads" and "understands" the text. It involves, as you put it, discovering keywords, operators, etc.

To do that, one must design a grammar. That is its own field of computer science.

The grammar takes care of handling all the ways humans might write code, while enforcing the syntactic rules of the language.

"now i'm confused, that's why i convert the new language to C++..."

I don't really understand this text. I have no idea what "why I convert the new language to C++" means.

My point here is that this is a complex, deep subject with many layers.

There are books on the subject, and no post will do the subject justice.

What I've tried to do is introduce the concept of machine language to you. The CPU has its own language, but it is extremely primitive.

The CPU is an electronic version of simple mechanical devices.

Imagine you have two gears. One gear has 10 teeth. The second has 30 teeth. When they mesh, the small gear must turn 3 times to move the large gear one rotation.

This is a mechanical divider. If you count the rotations of the large gear, as a result of turning the small gear, this mechanical device performs division.

If, instead, you turn the large gear, the small gear turns 3 times for every rotation of the large gear, making this a multiplication device.

The circuits of a CPU are related to this kind of simple, mechanical idea. They are more complex, of course, and work in base 2 (binary math).

That said, machine instructions are not really a language so much as a way of firing specific circuits in the CPU that perform the various primitive functions the CPU can do.

To make a compiler, one must know those primitive features: what the CPU can do, and which instructions invoke them. That is the last stage of the work, and appears to be what you are referring to when you write "I'm confused".


I don't really understand this text. I have no idea what "why I convert the new language to C++" means.
What OP is saying is "[avoiding machine code generation] is why I transpile my language to C++".
I mean that I convert code written in my own language to C++...
C++ uses 'void' for a procedure; I use 'procedure', so I convert 'procedure' to 'void'... I do this and more...

My language:
procedure multiplication(operator1 as integer, operator2 as integer)

Converted to C++:
void multiplication(int operator1, int operator2)
Something like that.
So you're asking how to do that?

Start by separating the text into tokens.
For more complicated functionality, you're going to need to build an abstract syntax tree (AST).
https://en.wikipedia.org/wiki/Abstract_syntax_tree

I scrolled through this guy's blog, and it looks OK:
https://ruslanspivak.com/lsbasi-part7/

Parsing is complicated, and can be very hard to get right, depending on how complicated the syntax of your language is (C++, for instance, has very complicated syntax/parsing).

If you're just doing mostly simple text replacement, e.g. "[name] as [type]" becomes "[type] [name]", then doing a search-match-and-replace is fairly easy.
But I get what you both mean: instead of converting to C++, I must convert to assembly using AMD/Intel CPU instructions, and of course that also depends on the operating system.
Using Notepad (the Windows text editor), can I create an exe that just says 'hello'?
(It's just for testing and getting started.)
ganado:
1 - I convert all the text into a character array and check that each character is valid;
2 - from step 1, I get the tokens (their start and end positions);
3 - after getting the tokens, I check the first token on the line to see the instruction type;
4 - from step 3, I can check that instruction's rules;
5 - I do this for all lines and save the variable/function names to avoid repeated names;
6 - after checking the rules/errors, if everything is OK, I convert it to C++.
Not that I am an expert, but it looks like you are creating a preprocessor for your language, not a compiler.
Furry Guy: maybe you're right... but I do use the Lexic step and the others.
You might find it interesting to know that in the early days of C++, Stroustrup's compiler, Cfront, compiled (early) C++ into C.

That C was then compiled for each platform by standard C compilers.

Today, however, what you've described might be built on the LLVM platform.

It is designed as infrastructure for building compilers for many languages.
Using LLVM, can I also use Win32 and use/create some classes?
using the Note(windows text editor), can i create an exe just for say 'hello'?

No. Exe files are in binary and have bytes that cannot be typed in via a text editor.
To see this, take the smallest possible hello-world assembly program, assemble it, and look at the result both in Notepad and in a hex editor. You will see, for example, that byte value 0 is used frequently but cannot be typed or displayed in a text editor. Most exe files do contain text (and that is often a hackable vulnerability), but they contain much, much more than text.

You could type in a hello-world exe in a hex editor. I can't imagine how frustrating that would be.

You don't have to write asm and assemble it. You could write machine code directly, but you need to know an awful lot to do it. (Someone had to write the first assembler, after all.)
@Cambalinho,

I'm not saying you are doing something wrong or not worthy of learning from. If I were to do what you are doing I simply wouldn't call it a compiler.

As Niccolo pointed out, when Stroustrup was first working on C++ he wrote a preprocessor that converted his C++ code into C. No one had created a compiler for C++ yet.
I'm starting from what I know.
And thank you all so much.
i'm starting from what i know.

You're following in the foot-steps of Bjarne Stroustrup. Nothing wrong with that. :)

Keep on learning, pushing the limits of what you understand.
;)
using LLVM, can i use, too, win32

Yes, as long as you have the Win32 API/headers/libraries installed.

If you have Visual Studio 2017/2019 those are already installed.

With VS 2017 you can install the LLVM Compiler Toolchain extension:
https://marketplace.visualstudio.com/items?itemName=LLVMExtensions.llvm-toolchain

With VS 2019 you can install LLVM/Clang support directly by selecting it in the installer. It isn't installed as part of a regular 2019 install.
https://devblogs.microsoft.com/cppblog/clang-llvm-support-in-visual-studio/