Project Idea: Toy Language Challenge

Level
Intermediate to Advanced

Synopsis
Create a programming language of your own design.

Restrictions - No external tools like YACC, LEX, etc are allowed.
- Must be programmed in C or C++ (the latter is preferred).
- No "what I would do to C++/Pascal/FORTRAN/whatever if I could"
stuff. Make your own language design.

Requirements - Must compile on Win32 and POSIX platforms (at least)
- Signed and unsigned integer data types (including characters and booleans)
and mathematical operators: at minimum + - * /
- Arrays (homogeneous data) and Lists (heterogeneous data)
- Functions and procedures
- Simple text I/O
- Flow control statements: at minimum if and while and logical operators: at minimum and, or and not - A floating point data type

Optional - Bitwise operators
- User-defined data types
- Binary file I/O (that is, files must be opened in binary mode;
the I/O methods themselves can be textual or binary or both)
- Operator overloading
- Exceptions or Monads or some other form error handling
- SDL graphics

Methods - Make an interpreter for your language
- Make a preprocessor to produce C or C++ compilable code

Details
This is a very challenging project (hence the rating), but taken step by step it can be done by anyone who wants to. The hardest part will be understanding certain concepts fundamental to programming languages.

This challenge will be accompanied by a short series of tutorial-lectures that introduce and explore the aspects of this challenge. Those of you who take it up should feel free to post questions, comments, ideas, and expositions relating to the challenge in this thread. Also feel free to post your finished (and working) code as one of:
- a hypertext link
- a compressed and MIME-encoded copy if less than 8000 characters.

I prefer to keep to .zip, .7z, or .tar.gz compression archives so that everyone can open them easily.
Use base64 to MIME encode.
Downloadable source:
http://www.fourmilab.ch/webtools/base64/
Online encoders:
http://gtools.org/tool/base64-encode-decode/
http://makcoder.sourceforge.net/demo/base64.php

You can use all the facilities of C and C++ that you are aware of, but you may not simply pass them into your toy language. For example:

cout << "foo\n"

is forbidden in your language, because it requires no effort on your part. Make the language yours.

Hints 1. Decide which method you wish to use to implement your language.
(Unless you are a total wiz, I recommend you stick to creating an interpreter.)
2. Start with the requirements and work your way down the list.
(That is the order in which I will present the tutorials.)
3. Before you actually start coding, design what your language should look like.

For example, I may want to create a simple calculator language. It might be as simple as:

integer x y
print "Enter two integer numbers: "
x y = input integer integer
print "The sum is " + x y

This is pretty easy to parse (though there are a few interesting twists in there).

Programs
Finally, you should write a program (in your language) which demonstrates the features of your language. Suggestions are:
- (easy) A simple calculator
- (easy) A tick-tack-toe/noughts and crosses game
- (medium) A program that adds, multiplies, and simplifies fractions
- (hard) A program that takes a list of integer pairs and a number
and determines whether the number can be summed by taking one or
zero of the integers from each pair in the list.
SDL Graphics (very hard)
- The tick-tack-toe/noughts and crosses game
- The "pong" game
- A fractal image generator

Well, that's all for now. The first tutorial will be coming soon!

Feedback is most welcome!

[2008-08-22 22:12 revised the requirements as indicated below]
Last edited on
That's a very nice project, when I get back home I'll start it.
I have a quick question for now:
What is the difference between functions and procedures ?
It is part of the requirements.
Answers to Questions
Mitsakos

All in due time :-)

There is no difference, really. A procedure is generally technospeak for a function that does not return any value. You can read more here
http://en.wikipedia.org/wiki/Subroutine
But we will get to that later. ;-)

Let me know if this format works for you all. I think I will re-arrange a couple of things up in the first post (2008-08-22 21:15) as a matter of concept flow. I will treat this entire topic as if everyone were creating an interpreter. Those of you who want to go above and beyond that can do so, but I will not directly address the issue except in response to questions.

Tutorial Zero: Forward

I am not going to devote time to teaching a University 350-level course. This is going to be a very simplified foray into the world of PLC. If you really want to learn the deep magic, take a university course or buy a good book. I can't really suggest any, but Google is a wonderful thing. (Try "programming language concepts books shopping results".)

Notes
Stay away from the one with the Rosetta Stone on the cover.

The one by Sebasta is highly rated, though it suffers from translation issues. I believe it is up to the 7th edition.

The one by Sethi (with the teddy bear on the front) is also supposedly pretty good, but again, I can't comment having never read it myself.

You will also benefit from some background with
Computational Theory http://en.wikipedia.org/wiki/Theory_of_computation
and Algorithms http://en.wikipedia.org/wiki/Algorithm

However, neither are absolutely required to benefit from these tutorials.

Books

Introduction to Computer Theory, 2nd ed.
This is a very readable book. I highly recommend it.
http://www.wiley.com/WileyCDA/WileyTitle/productCd-0471137723.html

Introduction to Algorithms
This is a very good, very complete book, but it is fairly technical and terse -- not for the light-hearted. Nevertheless I highly recommend it also.
http://mitpress.mit.edu/algorithms/


Well, I'm tired. I'll update the requirements above, surf a second, then retire until next time. :-)
Oh dear, I don't consider this a good idea. However, to be constructive, I urge anyone trying this to do the following first:

Write a lexical analyzer & parser to parse user-provided mathematical formulas (obeying brackets and precedence rules), in the steps (I want to parse things like
(1-6*(3.5+35/2)*(34-0.12))):
(1) write down the expression grammer in BNF (Backus-Naur-Form) *Don't look anything up*. You wont't be able to look something up with your language, so think about it yourself and experience how hard it actually is.
(2) implement it in Lex&Yacc. Your grammer most likely has flaws, and you want a high-level interface to detect them and experiment with changes. I am talking here about grammer conflicts like shift/reduce conflicts and unexpected behavior alike.
(3) then, if you feel the need, implement it in C++. I deem this step especially useless in most cases, but that was the original idea, so if you want to, do it.

What if you leave one step out? Infinite amounts of pain will be the result. You Have Been Warned.

Let me finish with a quote:
Anyway, there's plenty of room for doubt. It might seem easy enough,
but computer language design is just like a stroll in the park.

Jurassic Park, that is.

-- Larry Wall, creator of Pearl
Last edited on
Why do you feel the need to discourage people from trying this? You can't learn anything by not doing it.

Also, it is counter-productive to instruct people against the requirements of the project. There is no need to mangle people's brains over a toy language.

Finally, I don't think that quoting Larry Wall, is very friendly. Pearl is designed for a very different purpose.

Please, wait a bit for the tutorials to come together before spreading FUD.
Last edited on
OK, maybe it wasn't so productive after all. But I really think that "Intermediate to Advanced" in "Level" should be "Advanced and beyond". Perhaps I'm excessively stupid, but *I* needed about half a year *after* hearing "Building Compilers" (argh, I don't know the proper name, here it is called "Compilerbau") and "Language Theory" courses just to figure out the most basic stuff about actual language design. Yes, it might be a different scope, but also, the people doing this task most likely don't have a MSc CompSci already. I think that at least a warning about the fact that it is really time and work intensive would be appropriate, and that one should be familiar enough with C++ that one doesn't have to worry about the language but can concentrate on the task.
That being said, I hope that you aren't too angry about my comments. If you want, I will edit them accordingly - this *is* your topic, after all, and if you are willing to help the ones trying, I think it is a Good Thing.
Oh, heh, I think you are worried about the scope of the challenge. My rating levels are a little expanded over that of commercial booksellers

beginner
competent
intermediate
advanced
wizard (or sage)
master

No, I'm not planning to do anything as intensive as compiler design (that's a wizard-level challenge). Syntactical analysis and program transformation are far and beyond the scope of this thread.

This is only a very basic introduction to PLC, touching briefly on just data store and types, structure, evaluation by substitution and deferred evaluation, functions (and maybe first-class functions) and recursion, and flow control and sugar (or preprocessing).

The language I will develop through the tutorials will be a mix between imperative and functional. Details about how the language is actually implemented will be minimal (hence the intermediate level --you'll need to be comfortable with C or C++), and all data will be dynamically managed by the interpreter (much like BASIC, but with strong types and required declaration).

The finished toy language(s) will be very simple, but they will be complete. The tutorial-lecture structure is designed to walk through the lot of it in relatively easy steps.

But, exception's comment about work is correct, this is a challenge --some brain bending will be involved.
Topic archived. No new replies allowed.