Creating my own programming language

What's with all the killjoys posting here? The first program I designed and wrote myself was a video game. Not a very good one, mind you. OOOoooh, maybe I should have practiced programming a few years more and then asked questions.

You can't learn by not doing.
Yeah. I'm not good with game kinda stuff. I tried using Allegro and failed because I can't make graphics for shit.
@Duoas

High-five. That's not only for programming, but any skill.
What's different between a newbie asking "I want to make a programming language, but I don't have the first inkling of how" and a newbie asking "I want to make a game, but I don't have the first inkling of how"? Because the first response around here to the latter is always "don't."

Making a programming language is no easier than making a video game--arguably harder.
I'm interested in making my own compiler too, but people are just so vague.
1. How do you get from C++ to machine code?

2. How is linking solved? Is it done by the compiler or by another program?

3. Do you need assembly knowledge?

Until I understand more, I've decided to try to make a language that translates its code to C++ and then uses the C++ compiler.
I'm still working on the framework idea.

This is what I thought of: build commands on the fly.

block_command range autocast(int) variable at start to end
translate = "for(cast variable = start; variable < end; variable++) { block }"
endblock = end_range
end_block_command

command print args
mark = iostream
arg_expand = <<
translate = "std::cout << args << std::endl"
end_command

range a at 1 to 50
print a
end_range
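A minimal sketch of that translation idea in C++, assuming the three commands (range, print, end_range) are hard-coded rather than defined on the fly, and that the loop variable is always an int. The function name translate is hypothetical, not something from the post. Like the template above, the generated loop stops before the end value.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>

// Translate the toy commands into C++ source text, one input line at a time.
// Assumption: range / print / end_range are built in, not user-defined.
std::string translate(const std::string& source) {
    std::istringstream lines(source);
    std::ostringstream out;
    std::string line;
    int depth = 0;  // number of currently open "range" blocks

    while (std::getline(lines, line)) {
        std::istringstream words(line);
        std::string command;
        if (!(words >> command)) continue;  // skip blank lines

        assert(depth >= 0);  // invariant: never more end_range than range
        const std::string indent(depth * 4, ' ');

        if (command == "range") {
            // Expected shape: range <variable> at <start> to <end>
            std::string variable, at, start, to, end;
            if (!(words >> variable >> at >> start >> to >> end) ||
                at != "at" || to != "to")
                throw std::runtime_error("malformed range: " + line);
            out << indent << "for (int " << variable << " = " << start << "; "
                << variable << " < " << end << "; " << variable << "++) {\n";
            ++depth;
        } else if (command == "end_range") {
            if (depth == 0) throw std::runtime_error("end_range without range");
            --depth;
            out << std::string(depth * 4, ' ') << "}\n";
        } else if (command == "print") {
            // Chain every argument with <<, like arg_expand in the post.
            out << indent << "std::cout";
            std::string arg;
            while (words >> arg) out << " << " << arg;
            out << " << std::endl;\n";
        } else {
            throw std::runtime_error("unknown command: " + command);
        }
    }
    if (depth != 0) throw std::runtime_error("missing end_range");
    return out.str();
}
```

Feeding it the three-line example above should give a for loop wrapping one std::cout statement. A real translator would still have to emit the #include <iostream> line (the "mark" idea) and a main() around the result.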
They're vague because the question would take a book to answer.

Here is my "10 easy steps to making a programming language":

1) Define the syntax of your language, preferably in Backus-Naur Form (BNF). The easiest type of grammar to parse is a context-free grammar. You'll want to look up examples of how to get operator precedence right so that multiplication and division are performed before addition and subtraction. If you don't understand any of what I just said, then stop here. (Note: this step will take weeks to months if you expect the language to be even the least bit useful)
2) Make a list of all the tokens and keywords that your language uses. If you don't understand what this means, you don't understand the most fundamental concept of a lexical analyzer, and I'd recommend stopping here.
3) Decide on what toolset you are going to use to create your compiler. You might choose lex/yacc (flex/bison), in which case you should download and read the couple-hundred-page programmer's guide to them. Or you might choose boost::spirit, though it has an enormous learning curve and you can't really find many real-world non-trivial examples, so I wouldn't recommend that. Or you might choose to hand code the lexer and parser yourself. I'll assume the last of these.
3) Write the tokenizer. You may want to have the tokenizer deal with comments itself rather than modifying the grammar to accommodate them -- it is far easier this way.
4) Decide on how you want to create the parser. You can learn about how bison generates parsers and do it that way (state machine), or you can choose a recursive-descent approach. The recursive-descent approach is simplest.
5) Figure out what your compiler's internal symbol table needs to look like -- what it needs to store regarding declarations, how you will handle scoping, etc.
6) Figure out your target output. Are you creating an executable? If so, then what architecture and platform? Assuming Linux/Intel, you should download the ELF specification, which tells you the format of the executable that you must follow so the Linux loader can load your program. This is about an 80-page document that contains tons of information. You'll also want to learn either Intel assembly language or just learn the opcodes instead. If you learn the assembly language, you could output assembler files and use GNU's assembler to create the final executable. This is probably easiest, since it avoids having to learn ELF.
7) Implement the parser for each production in your grammar so that you can at least recognize a syntactically correct program.
8) Implement semantic checks, such as assignments of strings to ints, that violate your language's rules. To do this, you'll need your symbol table code.
9) Generate the output code. You'll have to understand how compilers today manage the stack in order to pass parameters and call functions.
10) OPTIONAL: If you want your language to be performant, you'll need to optimize the output code you generate. You should research peephole optimization and instruction pattern recognition and other optimization techniques.
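
To make the grammar, tokenizer, and recursive-descent steps above a bit more concrete, here is a rough sketch for a toy arithmetic language. Everything in it is an assumption made for illustration: the three-rule grammar, the "#" comment syntax, and the names Token, tokenize, Parser, and evaluate. It evaluates the expression while parsing instead of generating code, and it ignores integer overflow.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Toy grammar in BNF style, lowest precedence first:
//   expr   ::= term   { ("+" | "-") term }
//   term   ::= factor { ("*" | "/") factor }
//   factor ::= NUMBER | "(" expr ")"

enum class TokenKind { Number, Symbol, End };

// value is used by Number tokens; symbol (one of + - * / ( )) by Symbol tokens.
struct Token { TokenKind kind; long value; char symbol; };

// Tokenizer: drops whitespace and "# ..." comments so the grammar never sees them.
std::vector<Token> tokenize(const std::string& text) {
    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < text.size()) {
        const char c = text[i];
        if (std::isspace(static_cast<unsigned char>(c))) {
            ++i;
        } else if (c == '#') {
            while (i < text.size() && text[i] != '\n') ++i;
        } else if (std::isdigit(static_cast<unsigned char>(c))) {
            long value = 0;
            while (i < text.size() && std::isdigit(static_cast<unsigned char>(text[i])))
                value = value * 10 + (text[i++] - '0');
            tokens.push_back({TokenKind::Number, value, '\0'});
        } else if (std::string("+-*/()").find(c) != std::string::npos) {
            tokens.push_back({TokenKind::Symbol, 0, c});
            ++i;
        } else {
            throw std::runtime_error(std::string("unexpected character: ") + c);
        }
    }
    tokens.push_back({TokenKind::End, 0, '\0'});
    return tokens;
}

// Recursive-descent parser: one member function per grammar rule. It evaluates
// while parsing instead of building a tree or emitting code.
class Parser {
public:
    explicit Parser(std::vector<Token> tokens) : tokens_(std::move(tokens)) {}
    long parse() {
        const long result = expr();
        if (peek().kind != TokenKind::End) throw std::runtime_error("trailing input");
        return result;
    }

private:
    const Token& peek() const { return tokens_[pos_]; }
    Token next() {
        assert(peek().kind != TokenKind::End);  // callers check before consuming
        return tokens_[pos_++];
    }
    bool at(char s) const { return peek().kind == TokenKind::Symbol && peek().symbol == s; }

    long expr() {
        long value = term();
        while (at('+') || at('-')) {
            const char op = next().symbol;
            const long rhs = term();
            value = (op == '+') ? value + rhs : value - rhs;
        }
        return value;
    }
    long term() {
        long value = factor();
        while (at('*') || at('/')) {
            const char op = next().symbol;
            const long rhs = factor();
            if (op == '/' && rhs == 0) throw std::runtime_error("division by zero");
            value = (op == '*') ? value * rhs : value / rhs;
        }
        return value;
    }
    long factor() {
        if (peek().kind == TokenKind::Number) return next().value;
        if (!at('(')) throw std::runtime_error("expected a number or '('");
        next();  // consume "("
        const long value = expr();
        if (!at(')')) throw std::runtime_error("expected ')'");
        next();  // consume ")"
        return value;
    }

    std::vector<Token> tokens_;
    std::size_t pos_ = 0;
};

long evaluate(const std::string& text) { return Parser(tokenize(text)).parse(); }
```

Precedence falls out of the grammar's layering: expr only ever sees whole terms, so evaluate("2 + 3 * 4") multiplies before it adds. In a real compiler each rule would build a tree node or emit code rather than return a number.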


Alternatively, you could use LLVM to generate the actual machine code/assembly code for you. The advantage to this is you don't have to learn in depth the Intel architecture, nor do you have to worry about optimization -- LLVM takes care of this for you. You will, however, need to learn LLVM's intermediate format, which is now what your compiler will generate instead of assembly or machine code. I don't know anything about that to say whether it is easy or hard, a 50-page document or a 500-page document or no documentation at all.

Now that you have your compiler working, it's time to start working on a library. As it stands now, you can't even get input from the user or write output to the terminal, since you have no library to do so, unless you build these as language primitives. For this, you'll have to learn how to make system calls on your platform, so if you're using Linux, you can start by looking at the syscall interface.
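
As a rough illustration of what the bottom layer of such a library might look like on Linux, the sketch below issues the write system call through glibc's generic syscall() helper. It is Linux-only, and the names raw_write and print_line are made up for this example. A compiler that generates its own machine code would emit the system call instruction directly instead of going through libc at all.

```cpp
#include <cassert>
#include <cstring>
#include <sys/syscall.h>
#include <unistd.h>

// Write a C string to a file descriptor with the raw Linux write system call,
// skipping the C library's write() wrapper and all stdio buffering.
// Returns the number of bytes written, or -1 on error (errno is set).
// Assumption: short writes are not retried here; a real library would loop.
long raw_write(int fd, const char* text) {
    assert(text != nullptr);
    return syscall(SYS_write, fd, text, std::strlen(text));
}

// Hypothetical "print" primitive for the new language's runtime:
// the text followed by a newline, both sent to standard output.
long print_line(const char* text) {
    const long written = raw_write(STDOUT_FILENO, text);
    if (written < 0) return written;
    const long newline = raw_write(STDOUT_FILENO, "\n");
    return newline < 0 ? newline : written + newline;
}
```

Reading input would follow the same pattern with SYS_read and file descriptor 0.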
FWIW, just recently I decided as a challenge to write a "compiler" of my own, for reasons I won't go into. Initially I started by figuring out what features I wanted in my language. I started writing a grammar, but quickly decided that the point of my exercise was to write a compiler, not a language, so I chose instead to write a Pascal compiler simply because I know Pascal and parsing it isn't too terribly bad. Essentially, I skipped steps 1 and 2. For 3, I chose recursive descent for a couple of reasons, one of which is that it is easy to do. [One of the drawbacks of recursive descent is that changes to the language have relatively major effects on the parser, but since Pascal is Pascal, I don't have to worry about it.]

Then I wanted to decide whether to target Windows or Linux, but since my assembler knowledge post-8088 is a bit lacking, rather than learn Intel assembler I chose to have my compiler actually transliterate Pascal into C/C++, which avoids having to learn assembler, executable file formats, optimization techniques, and LLVM.

Even with the knowledge I have of how compilers work, even with over 20 years of programming experience, even with complete confidence that my C++ is well more than adequate for the task, I've taken many shortcuts, and all I'm doing is creating a transliterator--not even really a compiler per se. Why? Because I know what the time investment will be to "shoot the moon", and frankly, it isn't worth it.

[And before anyone asks why I'm doing this, I'll quote the first line of this response again:]
...for reasons I won't go into.




I agree with jsmith. Unless you are paid to create a programming language, doing it yourself just isn't worth the effort, unless the goal is your own satisfaction or a small number of supporters.

I believe the creators of C, C++, and Java were paid full-time to develop their "new" programming languages, which were supposed to bring in good money for the companies that devoted so much time and effort to such an endeavor. Or, in universities, perhaps researchers are paid to write papers and also to create programming languages?

Look back at the history of C (and arguably C++). Kernighan and Ritchie wanted to create an operating system, but there wasn't a high-level programming language available that gave enough direct access to hardware. So they created C. Somehow, C caught on...

Nowadays, C and C++ are owned by "committees" of individuals from all over the world. These individuals represent their companies and, in some cases, their countries. Largely, the evolution of C and C++ is now a world-wide tug-of-war amongst big companies who have vested interests in the directions that the languages take.


Correct me if I'm wrong, but the creators of C (Brian Kernighan, Dennis Ritchie, Ken Thompson) were employees of Bell Labs, weren't they? So they were paid employees, right?

As for Java, James Gosling was at Sun Microsystems (now part of Oracle) and was a paid employee, wasn't he?

As for C++, Bjarne Stroustrup was at AT&T Labs and was a paid employee, wasn't he?

So all these programming language creators were paid employees when they created their languages, weren't they?

In fact, I am trying hard to find a programming language creator who created the language as a hobby instead of as a full-time job. Maybe PHP? Python? Hmmm
That's true. C, C++, Python, Java, all languages created by people who, at the time, were working for a company, and the invention of the language was to satisfy a business need.
Take a look at Rasmus Lerdorf PHP creator and Guido van Rossum Python creator, it seems they created the language as a hobby to occupy time ? That is if wikipedia information is correct :)
jsmith wrote:
That's true. C, C++, Python, Java, all languages created by people who, at the time, were working for a company, and the invention of the language was to satisfy a business need.
That is misleading... While it is true that the creation of C and C++ were to satisfy a business need, those business needs could have been satisfied in other ways... Stroustrup was given a task which he preferred to attack in a certain way -- which led him to the task of creating (what eventually became) C++ -- and his employers were kind enough to allow him the leeway to spend time on it. C was written to build the Unix kernel -- but another, existing language could have been used. K&R just thought they could improve on things... and again had the leeway to do it. No one told them to invent a language. It was, essentially, a side-project that was included in the task. I don't know about Python or Java.

sohguanh wrote:
Take a look at Rasmus Lerdorf PHP creator and Guido van Rossum Python creator, it seems they created the language as a hobby to occupy time ? That is if wikipedia information is correct :)
It is correct, of course, but I wouldn't use PHP as a model for anything. It is a mess and still suffers from some bad design decisions early on. While efforts have been made to mature the language, it is still a rough hack created by people who didn't really know what they were getting into. While very usable (obviously) it is not an example of a well-designed language.

It is correct, of course, but I wouldn't use PHP as a model for anything. It is a mess and still suffers from some bad design decisions early on. While efforts have been made to mature the language, it is still a rough hack created by people who didn't really know what they were getting into. While very usable (obviously) it is not an example of a well-designed language.


Note that the same can be told about C++. When making such statements, you should always keep in mind the purpose of the language and constraints that were present at the time of its creation. There is nothing wrong with PHP as long as it is used for small dynamic websites - the purpose it was originally designed for.
I agree with rapidcoder. Some programming languages are created for a specific purpose, so it is not "correct" to write them off. I like Perl, and there are lots of developers who are against Perl's free-form syntax and its so-called OOP implementation, but that does not mean Perl is no good, does it?

Python is quite a different beast, though. It seems to have a well-thought-out design. It has abundant libraries, and many of its data types are almost equivalent to those of the Standard C++ STL. In fact, some Google engineers use it, for example in the Google App Engine SDK. But sometimes reality sinks in, and soon there was a Java SDK for Google App Engine too.

I think if Python cannot take off successfully in the commercial realm, it may be a good choice to let it become a university teaching language, just like Pascal was years ago. Imagine sets, tuples, etc. being part of the language constructs; that is very useful for certain teaching topics in computer science.
rapidcoder wrote:
Note that the same can be told about C++.
Perhaps, but saying so would be wrong.
When making such statements, you should always keep in mind the purpose of the language and constraints that were present at the time of its creation.
Duh.

I'm only interested in objective truth here. I already said that PHP was a useful and successful language. But the conversation against the OP's personal language was that he did not have the experience or credentials to make a great language. No one does at the start.

PHP was not carefully designed. It has many inconsistencies and mixed idioms.

C++ was very carefully designed. Stroustrup had a great deal of experience in the field when he began, and he has had help from a great many people in making C++ what it is today. I would say that most of the design inconsistencies in C++ are inherited from C, not from poor design choices at the outset.

If you want to argue a point, you must be consistent. Given your predilection for PHP as a perfectly good example of an acceptable language, but given its origins, do you still wish to discourage the OP from writing himself a programming language?

BTW. Careful how you dis Pascal there... Modern language design is only now starting to do things Pascal has always done... ;-)

PHP was not carefully designed. It has many inconsistencies and mixed idioms.



Name a few. I doubt you will end up with a longer list than it is possible to create for C++.
Many C++ inconsistencies have nothing to do with C (e.g. the famous export keyword, or overloaded operators that are not first-class operators, or that you can override a public method with a private one, violating the Liskov substitution principle, etc.). Actually, C is a very consistent language.

BTW: The number of people with a PhD on a committee designing the language really doesn't correlate with the quality of the design. The more people designing some software, the worse the design usually is.

