Simple compiler/language

I am trying to make a simple "language" and compiler that will let me do a "hello world" project, and I have a few questions.
Here is my syntax:
1) All statements end with .
2) Whitespace doesn't matter: 1+2 is the same as 1 + 2
3) Comments start with ~ and end with ~ and can span multiple lines
4) The only keywords are var , in , out
    a) var - creates a new variable.
    b) in  - input console
    c) out - output console
5) Variable names can't start with numbers
6) There must be only one statement per line
7) The program will start with begin and end with stop
<DIGIT>           ::= '0' .. '9'
<LETTER>          ::= 'A' .. 'Z' | 'a' .. 'z'

<VARIABLE>        ::= ( <LETTER> | '_' ) ( <LETTER> | <DIGIT> )*

<COMMENT>         ::= '~' <CHAR>* '~'

<DECLARATION>     ::= "var" ":" <VARIABLE>
<STATEMENT>       ::= ( <DECLARATION> | ( "out" <VARIABLE> ) | ( "in" <VARIABLE> ) ) '.'
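
A minimal sketch of how source text following the rules above could be split into tokens. The names TokenKind, Token and tokenize are hypothetical, and double-quoted string literals are an assumption taken from the hello world example rather than from the grammar. Only var, in and out are treated as keywords here.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token categories for the rules above.
enum class TokenKind { Keyword, Identifier, String, Symbol };

struct Token
{
    TokenKind kind;
    std::string text;
};

// Splits source text into tokens, skipping whitespace and ~comments~.
// Any other character becomes a one-character Symbol token.
std::vector<Token> tokenize( const std::string &src )
{
    std::vector<Token> tokens;
    std::size_t i = 0;
    while( i < src.size() )
    {
        const unsigned char c = static_cast<unsigned char>( src[ i ] );
        if( std::isspace( c ) ) { ++i; continue; }
        if( c == '~' )  // comment: skip to the closing '~' (or to the end of input)
        {
            const std::size_t end = src.find( '~' , i + 1 );
            i = ( end == std::string::npos ) ? src.size() : end + 1;
            continue;
        }
        if( c == '"' )  // assumed string literal: keep the text between the quotes
        {
            const std::size_t end = src.find( '"' , i + 1 );
            const std::size_t stop = ( end == std::string::npos ) ? src.size() : end;
            tokens.push_back( { TokenKind::String , src.substr( i + 1 , stop - i - 1 ) } );
            i = ( end == std::string::npos ) ? src.size() : end + 1;
            continue;
        }
        if( std::isalpha( c ) || c == '_' )  // keyword or variable name
        {
            std::size_t j = i + 1;
            while( j < src.size() && std::isalnum( static_cast<unsigned char>( src[ j ] ) ) ) ++j;
            const std::string word = src.substr( i , j - i );
            const bool keyword = ( word == "var" || word == "in" || word == "out" );
            tokens.push_back( { keyword ? TokenKind::Keyword : TokenKind::Identifier , word } );
            i = j;
            continue;
        }
        tokens.push_back( { TokenKind::Symbol , std::string( 1 , src[ i ] ) } );
        ++i;
    }
    return tokens;
}
```

Calling tokenize on a line such as `out word.` should give a keyword, an identifier and a '.' symbol, which a later stage can check against the <STATEMENT> rule.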

1) Would the "Dragon Book" be a good thing to read for this, or is it a bit too complicated for something like this?
2) Any suggestions as to where I could start? I am thinking I have to create a C++ project that reads in a document written in the other "language" and then does different things based on what each line says.

Here is an example of what my "hello world" would look like:
~Creating a simple Hello
World program in my language~.
var : word "Hello World!".
var : input.
out word.
in input.
out input.

Any suggestions/advice would be greatly appreciated.

*PS: this is just something fun to try as a learning experience.
Are you compiling this to an exe or interpreting it?

As far as I know the dragon book is very good, but if you only want to do something basic like this, then I think it would be a waste of time.

Also, "begin" and "stop" should be considered keywords.
Also, if you have a terminating character ('.') then surely multiple statements per line should be easy to implement.
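
As a rough sketch of that idea, the function below splits a program at each '.' regardless of line breaks. The name splitStatements is hypothetical, and treating double-quoted text as a string literal is an assumption based on the hello world example. A '.' inside a string or a ~comment~ does not end a statement.

```cpp
#include <string>
#include <vector>

// Splits a program into statements at each '.', so several statements may
// share a line. Comment text is dropped. Text after the last '.' is ignored
// in this sketch.
std::vector<std::string> splitStatements( const std::string &src )
{
    std::vector<std::string> statements;
    std::string current;
    bool inString = false;
    bool inComment = false;
    for( const char c : src )
    {
        if( inComment ) { if( c == '~' ) inComment = false; continue; }
        if( inString )
        {
            current += c;
            if( c == '"' ) inString = false;
            continue;
        }
        if( c == '~' ) { inComment = true; continue; }
        if( c == '"' ) { inString = true; current += c; continue; }
        if( c == '.' )  // end of statement
        {
            if( !current.empty() ) statements.push_back( current );
            current.clear();
            continue;
        }
        // Skip whitespace at the start of a statement; keep it elsewhere.
        if( current.empty() && ( c == ' ' || c == '\t' || c == '\n' || c == '\r' ) ) continue;
        current += c;
    }
    return statements;
}
```

With this, `out word. in input.` on a single line yields the two statements `out word` and `in input`.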

As a whole, I like the idea. It's not original, but it's a very good project and starting point.
I was thinking about compiling it, though I could have the two mixed up. Basically, I will create a program and then compile the "language" from the command prompt using command-line arguments:

#include <iostream>
int main( int argc , char **argv )
{
    if( argc < 2 ) return 1; // no file to compile was given
    std::cout << "Compiler name - " << argv[ 0 ] << "\nProgram to be compiled - " << argv[ 1 ] << std::endl;
}
C:\Users\Giblit\Simple_Language>g program_to_compile.g

Compiler name - g.exe
Program to be compiled - program_to_compile.g

Thanks for the quick response. Yeah, I was just trying to do something very basic to start with. I suppose I could have multiple statements per line, but I figured I would start with just one statement per line. Also, I'm not sure why I forgot to list begin and stop as keywords. I will probably have to work on the syntax a little more later; for right now I am just trying to make it as simple as possible.
Also, you should try to get basic math operations going, and then allow multiple operations, etc.

EDIT: Compiling could be very difficult (exe file format, etc.), although you could compile to C or ASM and then have another program do the compiling for you.
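
A minimal sketch of the "compile to another language" route, here emitting C++ rather than C. The function name translate is hypothetical, every variable is assumed to be a string, and tokens are assumed to be separated by spaces (so `var:word` is not handled). The generated lines would still need to be wrapped in a main function and passed to an existing compiler.

```cpp
#include <sstream>
#include <string>

// Translates one statement (without its trailing '.') into a line of C++.
// Returns an empty string if the statement is not recognised.
std::string translate( const std::string &statement )
{
    std::istringstream in( statement );
    std::string keyword;
    in >> keyword;
    if( keyword == "out" || keyword == "in" )
    {
        std::string name;
        if( !( in >> name ) ) return "";
        return keyword == "out" ? "std::cout << " + name + " << std::endl;"
                                : "std::getline( std::cin , " + name + " );";
    }
    if( keyword == "var" )
    {
        std::string colon , name , value;
        if( !( in >> colon >> name ) || colon != ":" ) return "";
        std::getline( in >> std::ws , value );  // optional initialiser, e.g. "Hello World!"
        return value.empty() ? "std::string " + name + ";"
                             : "std::string " + name + " = " + value + ";";
    }
    return "";
}
```

For the hello world example, `out word` would become `std::cout << word << std::endl;` under these assumptions.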
Yeah, I was planning on doing the basic math stuff after basic I/O works properly. I know very little C and no assembly, but I am pretty confident in my C++ (not saying I am the best or know everything). What if I write my program in a text editor, say Notepad or Sublime (for the looks :P), and then make a C++ program that takes command-line arguments, so I can open the program written in my "language" and read it all in? If anything breaks my syntax, it does an assertion or writes to cerr. If everything follows the syntax, it does different things based on what each statement says. I am guessing it will be pretty slow this way, though, and look very awful.

I have not yet decided whether I should use cerr or assert to report errors in the program. What do you think? Here is what I have done so far; it's still a WIP.
#include <string>
#include <vector>
#include <cassert>
#include <fstream>
#include <iostream>

// Reports basic structural errors in the list of statements.
void syntax( const std::vector<std::string> &data )
{
    if( !data.empty() )
    {
        if( data.front() != "begin" ) std::cerr << "Error - Missing \"begin\" at the beginning of the file." << std::endl;
        if( data.back() != "stop" ) std::cerr << "Error - Missing \"stop\" at the end of file." << std::endl;
    }
    for( const auto &it : data )
    {
        if( it.empty() ) continue; // front()/back() are undefined on an empty string
        if( ( it.front() == '~' && it.back() != '~' ) || ( it.front() != '~' && it.back() == '~' ) ) std::cerr << "Error - Missing a '~'." << std::endl;
    }
}

int main( int argc , char **argv )
{
    if( argc < 2 ) { std::cerr << "Usage: " << argv[ 0 ] << " <file>" << std::endl; return 1; }
    std::string temp;
    std::vector<std::string> data;
    std::ifstream in( argv[ 1 ] );
    while( std::getline( in , temp , '.' ) ) // split the file into statements at each '.'
    {
        // Strip leading newlines; check empty() first because front() is undefined on an empty string.
        while( !temp.empty() && temp.front() == '\n' ) temp.erase( temp.begin() );
        if( !temp.empty() ) data.emplace_back( temp );
    }
    syntax( data );
}

**Edit: since I am only doing one statement per line, I think I should use std::getline( in , temp , '\n' ); instead, and then in syntax() I can check that each item ends with a '.'
You're going to want to look into tree structures for evaluating expressions, they make it very easy. Of course, first you have to convert the expressions into a tree...
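
A small sketch of what that could look like for arithmetic, assuming only numbers, parentheses and the four basic operators. Node, Parser and evaluate are hypothetical names, and the recursive-descent approach shown here is just one common way to build the tree, not necessarily what was meant above.

```cpp
#include <cctype>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// A node is either a number (op == 0) or an operator with two children.
struct Node
{
    char op;       // '+', '-', '*', '/' or 0 for a number
    double value;  // used only when op == 0
    std::unique_ptr<Node> left , right;
};

// Recursive-descent parser: one function per precedence level.
class Parser
{
public:
    explicit Parser( const std::string &source ) : text( source ) , pos( 0 ) {}

    // expression ::= term ( ( '+' | '-' ) term )*
    std::unique_ptr<Node> expression()
    {
        std::unique_ptr<Node> node = term();
        while( peek() == '+' || peek() == '-' )
        {
            const char op = text[ pos++ ];
            std::unique_ptr<Node> rhs = term();
            node = make( op , 0.0 , std::move( node ) , std::move( rhs ) );
        }
        return node;
    }

private:
    std::string text;
    std::size_t pos;

    // Skips whitespace and returns the next character, or 0 at the end.
    char peek()
    {
        while( pos < text.size() && std::isspace( static_cast<unsigned char>( text[ pos ] ) ) ) ++pos;
        return pos < text.size() ? text[ pos ] : '\0';
    }

    static std::unique_ptr<Node> make( char op , double value , std::unique_ptr<Node> l , std::unique_ptr<Node> r )
    {
        return std::unique_ptr<Node>( new Node{ op , value , std::move( l ) , std::move( r ) } );
    }

    // term ::= factor ( ( '*' | '/' ) factor )*
    std::unique_ptr<Node> term()
    {
        std::unique_ptr<Node> node = factor();
        while( peek() == '*' || peek() == '/' )
        {
            const char op = text[ pos++ ];
            std::unique_ptr<Node> rhs = factor();
            node = make( op , 0.0 , std::move( node ) , std::move( rhs ) );
        }
        return node;
    }

    // factor ::= number | '(' expression ')'
    std::unique_ptr<Node> factor()
    {
        if( peek() == '(' )
        {
            ++pos;  // consume '('
            std::unique_ptr<Node> node = expression();
            if( peek() == ')' ) ++pos;  // consume ')'
            return node;
        }
        std::size_t used = 0;
        const double value = std::stod( text.substr( pos ) , &used );  // throws on bad input
        pos += used;
        return make( 0 , value , nullptr , nullptr );
    }
};

// Evaluates the tree bottom-up: children first, then the operator.
double evaluate( const Node &node )
{
    if( node.op == 0 ) return node.value;
    const double l = evaluate( *node.left );
    const double r = evaluate( *node.right );
    switch( node.op )
    {
        case '+': return l + r;
        case '-': return l - r;
        case '*': return l * r;
        default:  return l / r;
    }
}
```

Because multiplication is parsed one level below addition, `1 + 2 * 3` ends up as a '+' node whose right child is the '*' node, so evaluating the tree respects operator precedence without any extra work.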
Okay, cool, thanks. I will look into that right now and post some code after I investigate it a bit. Then maybe you can tell me if that is what you meant.
Thanks for the response.
Well if I understand you correctly, you are planning on interpreting your code. Not a bad idea for your first language. By tree, I think he is referring to
Oh, all right, cool, thanks. I was actually just looking at a binary tree that I found on Google. I still have to investigate some more, including the wiki page you sent me.

From the looks of it, it is something along the lines of this:
class Tree
{
public:
    int data; // data here (int is only a placeholder type)
    Tree *left , *right; // nodes that point to left and right in the tree.
    Tree( int data ) : data( data ) , left( nullptr ) , right( nullptr ) {}
};

Or something along those lines. I will get back soon. Thanks for all your time and patience.
One approach to writing a compiler is to use LLVM to generate the machine code from your AST (abstract syntax tree).

I had a go at this "toy compiler" and it was straightforward enough. It should/might be possible to use the same approach for your more involved language. Unless you want to do everything from the ground up, that is.

Writing Your Own Toy Compiler Using Flex, Bison and LLVM

Cool, thanks. That looks like a very good place to start. I'm busy right now, but I'll give it a go later tonight and get back with the code I come up with after following it step by step.
Thanks again.
Okay, I am trying to do this, and this may seem like a really dumb question: I linked the LLVM library just like any other library, but how would I link or install (or whatever I am supposed to do with) bison and flex?

I tried reading the install/readme files for both of them, but they don't say much about how to actually get them working, and they do not have an include folder.

The instructions just say "use the shell command `./configure; make; make install'", but when I went to the command prompt and changed to the folder that has the configure and make files, it didn't work. I even tried running configure on its own, but Windows treats it as an "unknown" type of file, so I don't have a program that can open it, unless I open it in something like Notepad, and then it just shows the script's code.

Could anyone help me to use these two programs please? I am going to keep trying.
bison and flex are tools you need to install; they take .y and .l files respectively and use the information in them to generate source files, which you then build and link into your exe (I have and generated from parser.y; and generated from tokens.l)

When I had a look at the toy compiler, as I was working on Windows, I had to manually install these tools (e.g. I downloaded them and copied them into the folder where my GNU tools live). I then wrote custom build rules for Visual Studio so it knew what to do with .y and .l files when I added them to a project.

But if you're using Linux, I would have thought you could obtain these tools using your distro's package manager (if I've remembered the Linux jargon right; like Synaptic ?), and they'd be installed along with all the existing build tools.


PS For what it's worth, I see the build commands I'm using for .y and .l files are:

BISON --define $(Input).y


FLEX -o$(Input).cc $(Input).l

I think I had to look at the man pages for the Bison and Flex tools to ensure I was using them correctly.
Well I just downloaded and installed GnuWin32 tools / bison at

I can't figure out how to manually install flex though.

PS: I am using Code::Blocks with the latest MinGW GCC compiler.

Also, I have no idea what build commands are; I will have to look into that, as I have never used them before. I am guessing they are like flags (-std=c++11) in the compiler settings, but I could be wrong.

Thanks again for your time and patience I appreciate it a lot.
That's the same place I got my tools. GnuWin do provide installers which should put things in the right place for you?

Well, flags are part of build commands. A compiler command will include the compiler name, assorted flags (like -std=c++11 ) plus the name of one or more source files.


PS Are you using Code::Blocks via MSYS, or like a normal Windows app?
Like a normal Windows app. Also, yeah, under the flags I have C++11 enabled. Then under the compiler search directories I included the "include" folder, and under the linker I included the "lib" folder, but I do not understand where I am supposed to put the --define $... or -o$... commands. I am guessing the same place where I have C++11 enabled, or am I completely wrong? I am about to download autoconf and automake so I can finish installing bison/flex now.