Best syntax?

If you had to specify a binary format, which of these syntaxes would you prefer to use?

format little twoscomp
begin namespace Zip
    begin type EndOfCentralDirectoryRecord
        u32 signature require == 0x06054b50
        u16 disk_number
        u16 central_directory_start_disk
        u16 local_central_directory_entry_count
        u16 central_directory_entry_count
        u32 central_directory_size
        u32 central_directory_offset
        u16 zip_comment_length
        string zip_comment length zip_comment_length
    end
end

<format end="little" neg="twoscomp"/>
<namespace name="Zip">
    <type name="EndOfCentralDirectoryRecord">
        <u32 name="signature">
            <require equals="0x06054b50"/>
        </u32>
        <u16 name="disk_number"/>
        <u16 name="central_directory_start_disk"/>
        <u16 name="local_central_directory_entry_count"/>
        <u16 name="central_directory_entry_count"/>
        <u32 name="central_directory_size"/>
        <u32 name="central_directory_offset"/>
        <u16 name="zip_comment_length"/>
        <string name="zip_comment" length="zip_comment_length"/>
    </type>
</namespace>

Catfish666 (666)

If I had to choose between the two above, I would probably choose the XML-esque one, despite the symbol noise and verbosity.

That said, how about a C style format?
(Disclaimer: I don't know if the "grammar" of this thing is unambiguous.)

format(little, twoscomp);

Zip
{
EndOfCentralDirectoryRecord:
    u32     signature == 0x06054b50;
    u16     disk_number;
    u16     central_directory_start_disk;
    u16     local_central_directory_entry_count;
    u16     central_directory_entry_count;
    u32     central_directory_size;
    u32     central_directory_offset;
    u16     zip_comment_length;
    string  zip_comment[zip_comment_length];
}

James Parsons (181)

I kinda like the first. If I may ask, what is this for?

helios (17506)

James: It's for a binary parser generator I'm working on. Deserializing specific binary formats is very tedious and error-prone. The idea is to automate at least part of the process by having a program generate the code from a human-readable specification:

//This is just a conceptual example.
namespace Zip{
    struct EndOfCentralDirectoryRecord{
        u32 signature,
            central_directory_size,
            central_directory_offset;
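        u16 disk_number,
            central_directory_start_disk,
            local_central_directory_entry_count,
            central_directory_entry_count,
            zip_comment_length;
        std::string zip_comment;
        //Static, so a record can be read without an existing instance.
        static EndOfCentralDirectoryRecord *read(std::istream &stream){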
            EndOfCentralDirectoryRecord *ret = new EndOfCentralDirectoryRecord;
            ret->signature = read_little_dword(stream);
            if (ret->signature != 0x06054b50){
                delete ret;
                return 0;
            }
            ret->disk_number = read_little_word(stream);
            ret->central_directory_start_disk = read_little_word(stream);
            ret->local_central_directory_entry_count = read_little_word(stream);
            ret->central_directory_entry_count = read_little_word(stream);
            ret->central_directory_size = read_little_dword(stream);
            ret->central_directory_offset = read_little_dword(stream);
            ret->zip_comment_length = read_little_word(stream);
            ret->zip_comment = read_sized_string(stream, ret->zip_comment_length);
            return ret;
        }
    };
}
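The conceptual example above calls read_little_word, read_little_dword and read_sized_string without defining them. A minimal sketch of what such helpers might look like follows. The names are taken from the example, but the bodies and the choice to throw on a short read are assumptions, not helios's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helpers matching the names used in the conceptual example.
// Each one throws if the stream ends before enough bytes were read.
inline std::uint16_t read_little_word(std::istream &stream){
    unsigned char b[2];
    if (!stream.read(reinterpret_cast<char *>(b), sizeof b))
        throw std::runtime_error("unexpected end of stream");
    // Little-endian: the least significant byte comes first.
    return static_cast<std::uint16_t>(b[0] | (b[1] << 8));
}

inline std::uint32_t read_little_dword(std::istream &stream){
    unsigned char b[4];
    if (!stream.read(reinterpret_cast<char *>(b), sizeof b))
        throw std::runtime_error("unexpected end of stream");
    return  static_cast<std::uint32_t>(b[0])
         | (static_cast<std::uint32_t>(b[1]) << 8)
         | (static_cast<std::uint32_t>(b[2]) << 16)
         | (static_cast<std::uint32_t>(b[3]) << 24);
}

inline std::string read_sized_string(std::istream &stream, std::size_t length){
    std::string ret(length, '\0');
    if (length && !stream.read(&ret[0], static_cast<std::streamsize>(length)))
        throw std::runtime_error("unexpected end of stream");
    return ret;
}

// Usage check: the signature bytes, a 16-bit length, then a 2-byte string.
inline void helpers_example(){
    std::istringstream s(std::string("\x50\x4b\x05\x06\x02\x00hi", 8));
    assert(read_little_dword(s) == 0x06054b50u);
    assert(read_little_word(s) == 2);
    assert(read_sized_string(s, 2) == "hi");
}
```

Assembling the value byte by byte keeps the result independent of the host's own endianness, which is presumably why the specification states the byte order explicitly.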

Catfish: Why do you prefer the XML?
C-like syntaxes are difficult to parse. I don't know ahead of time what features I'll want to add, so if I have to extend the syntax later anyway, I may as well start from a completely new syntax or from something more generic like XML.
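For comparison, a field line in the first (keyword-based) syntax can be taken apart by splitting on whitespace alone. The sketch below assumes one field per line and treats everything after the name as opaque trailing tokens. FieldLine and parse_field_line are hypothetical names, not part of the generator.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical in-memory form of one field line from the first syntax.
struct FieldLine{
    std::string type;                // e.g. "u16"
    std::string name;                // e.g. "disk_number"
    std::vector<std::string> extras; // trailing tokens, e.g. {"length", "zip_comment_length"}
};

// Splits a line on whitespace.
// Returns false (and leaves out untouched) if the line has fewer than two tokens.
inline bool parse_field_line(const std::string &line, FieldLine &out){
    std::istringstream tokens(line);
    FieldLine field;
    if (!(tokens >> field.type >> field.name))
        return false;
    std::string token;
    while (tokens >> token)
        field.extras.push_back(token);
    out = field;
    return true;
}

// Usage check on one line from the specification at the top of the thread.
inline void parse_field_line_example(){
    FieldLine f;
    assert(parse_field_line("        u32 signature require == 0x06054b50", f));
    assert(f.type == "u32" && f.name == "signature");
    assert(f.extras.size() == 3 && f.extras[0] == "require");
}
```

Lines such as begin and end would need separate handling. This only illustrates that the field lines themselves need no real grammar.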

James Parsons (181)

What is this parser for? It seems like it is for directory/zip operations. Are you designing some piece of software, or are you going deeper?

Catfish666 (666)

helios wrote:
Catfish: Why do you prefer the XML?

To be honest, it's nothing more than a gut feeling that it's the "right way".
I can give no solid reasoning, sorry.

CodeGazer (163)

I personally prefer the XML-ish format as well, if for nothing else than that I like the way it looks and find it easier to read. I also think it would be easier to parse in that format.

helios (17506)

I originally came up with the idea when I got pissed off while writing a ZIP parser, but it can be used to generate parsers for a large variety of formats.
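The generation step itself can be sketched in a few lines: given a list of field descriptions, emit one line of reader code per field. This is only an illustration under stated assumptions. FieldSpec and generate_reads are hypothetical names, only u16 and u32 are handled, and the emitted helper names are the ones from the conceptual example earlier in the thread.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical description of one integer field in a specification.
struct FieldSpec{
    std::string type; // "u16" or "u32" in this sketch
    std::string name;
};

// Emits one line of reader code per field.
// Fields with a type this sketch does not know are skipped.
inline std::string generate_reads(const std::vector<FieldSpec> &fields){
    std::string out;
    for (const FieldSpec &f : fields){
        const char *helper = nullptr;
        if (f.type == "u16")
            helper = "read_little_word";
        else if (f.type == "u32")
            helper = "read_little_dword";
        if (!helper)
            continue;
        out += "ret->" + f.name + " = " + helper + "(stream);\n";
    }
    return out;
}

// Usage check with two fields from the ZIP record.
inline void generate_reads_example(){
    std::vector<FieldSpec> fields;
    fields.push_back(FieldSpec{"u16", "disk_number"});
    fields.push_back(FieldSpec{"u32", "central_directory_size"});
    assert(generate_reads(fields) ==
        "ret->disk_number = read_little_word(stream);\n"
        "ret->central_directory_size = read_little_dword(stream);\n");
}
```

Because the field list is plain data, the same loop works no matter which of the proposed syntaxes the list was parsed from.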

closed account (3hM2Nwbp)

I'll third the XML-like solution. In fact, if it was XML that would be even better. In such a case, various formats (xml, json, ini) could be used to feed your generator with boost's property_tree library (if you so choose to utilize it).

IWishIKnew (1364)

Judging by the format, I would also choose XML. At first glance, there are obvious assumptions you can make that eliminate the possibility of screwing up parsing.

helios (17506)

Luc Lieber: How is that not XML?
Would it make it easier to feed that info into the generator if it was implemented as a library and you could specify the binary format by creating instances of various classes? For example,

Type EndOfCentralDirectoryRecord("EndOfCentralDirectoryRecord");
//...
EndOfCentralDirectoryRecord.add(String("zip_comment", zip_comment_length));
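One way the library idea above could be fleshed out is sketched below. Only the names Type, String and add come from the snippet. The Field base class, the U16 class, the describe() methods and the choice to pass the length field as an object are all assumptions made for illustration.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical base class for anything that can appear inside a Type.
struct Field{
    std::string name;
    explicit Field(std::string n): name(std::move(n)){}
    virtual ~Field(){}
    virtual std::string describe() const = 0;
};

struct U16 : Field{
    explicit U16(std::string n): Field(std::move(n)){}
    std::string describe() const override{ return "u16 " + name; }
};

// A string whose length is given by a previously declared U16 field.
struct String : Field{
    std::string length_field;
    String(std::string n, const U16 &length):
        Field(std::move(n)), length_field(length.name){}
    std::string describe() const override{
        return "string " + name + " length " + length_field;
    }
};

class Type{
    std::string name_;
    std::vector<std::unique_ptr<Field>> fields_;
public:
    explicit Type(std::string n): name_(std::move(n)){}
    // Stores a copy of the field and returns *this so calls can be chained.
    template <typename T>
    Type &add(T field){
        fields_.push_back(std::unique_ptr<Field>(new T(std::move(field))));
        return *this;
    }
    // Renders the type in the style of the first syntax, one field per line.
    std::string describe() const{
        std::string out = "type " + name_ + "\n";
        for (const auto &f : fields_)
            out += "    " + f->describe() + "\n";
        return out;
    }
};

// Usage check mirroring the snippet above.
inline void type_builder_example(){
    U16 zip_comment_length("zip_comment_length");
    Type record("EndOfCentralDirectoryRecord");
    record.add(zip_comment_length).add(String("zip_comment", zip_comment_length));
    assert(record.describe().find("string zip_comment length zip_comment_length")
        != std::string::npos);
}
```

Passing the length as a U16 object rather than a string lets the compiler reject a reference to a field that was never declared, which a text specification can only check at load time.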

IWishIKnew: What do you mean? Parsing of what?

LB (13399)

XML does not support same-tag closing, that is <tag />

I would advocate for a JSON format.

helios (17506)

LB wrote:
XML does not support same-tag closing, that is <tag />

Oh. Well, I'm going to be using tinyxml, which does support those tags. The user is free not to use them if they don't want to. The specification file is read-only, after all.

closed account (3hM2Nwbp)

@helios - sorry for the delay...were you looking at something like this?

* Quick and dirty -- I'm multitasking tonight!


<format endianess='little' neg='twoscomp'>
    <namespace name="Zip">
        <type name="EndOfCentralDirectoryRecord">
            <u16 name="disk_number"/>
            <u16 name="central_directory_start_disk"/>
            <u16 name="local_central_directory_entry_count"/>
            <u16 name="central_directory_entry_count"/>
            <u32 name="central_directory_size"/>
            <u32 name="central_directory_offset"/>
            <u16 name="zip_comment_length"/>
            <string name="zip_comment" length="zip_comment_length"/>
        </type>
    </namespace>
</format>

#include <cstdint>
#include <iostream>
#include <map>
#include <string>
#include <utility>
#include <boost/variant.hpp>
#include <boost/property_tree/ptree.hpp>
#include <boost/property_tree/xml_parser.hpp>

typedef boost::variant<uint8_t, uint16_t, uint32_t, uint64_t, int8_t, int16_t, int32_t, int64_t, std::string> XMLAttribute;

struct Type
{
	/// Name of the type
	std::string name;
	
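	/// Variables of the type, keyed by variable name (the value is the variable's type)
	std::map<std::string, std::string> vars;

	void addType(const std::string& type, const std::string& name)
	{
		// Key by name: keying by type would keep only the first u16, the first u32, etc.
		vars.insert(std::make_pair(name, type));
	}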
};

struct Namespace
{
	/// Types contained in this namespace
	std::map<std::string, Type> types;
	
	/// Sub-namespaces
	std::map<std::string, Namespace> namespaces;
		
	Namespace& addNamespace(const std::string& namespace_)
	{
		namespaces.insert(std::make_pair(namespace_, (::Namespace())));
		return namespaces[namespace_];
	}

};

struct Format
{

	/// Format Attributes
	std::map<std::string, XMLAttribute> attributes;
	
	/// Top level namespaces
	std::map<std::string, Namespace> namespaces;

	template<typename T>
	void addAttribute(const std::string& key, T val)
	{
		attributes.insert(std::make_pair(key, val));
	}
	
	Namespace& addNamespace(const std::string& namespace_)
	{
		namespaces.insert(std::make_pair(namespace_, (::Namespace())));
		return namespaces[namespace_];
	}
};

struct Model
{
	/// Formats found in this file
	std::map<std::string, Format> formats;

	void addFormat(std::string name, Format&& format)
	{
		formats.insert(std::make_pair(name, std::move(format)));
	}
};

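#define DEBUG
#if defined(DEBUG)
// Wrapped in do/while so the macro behaves as a single statement in both modes.
#	define debug(message) do { std::cout << message << std::endl; } while(false)
#else
#	define debug(message) do {} while(false)
#endif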

void iterate_namespace(Namespace& namespace_, const boost::property_tree::ptree& tree)
{
	for(const auto& child : tree)
	{
		if(child.first == "namespace")
		{
			// Read the name from the nested element, not from the enclosing tree.
			const std::string name = child.second.get<std::string>("<xmlattr>.name");
			debug("\tAdded Namespace: " << name);
			iterate_namespace(namespace_.addNamespace(name), child.second);
		}
		else if(child.first == "type")
		{
			Type type;
			type.name = child.second.get<std::string>("<xmlattr>.name");
			debug("\t\tAdded Type: " << type.name);
			for(const auto& var : child.second)
			{
				// Skip the attribute node of the <type> element itself.
				if(var.first != "<xmlattr>")
				{
					debug("\t\t\tAdded Variable: " << var.second.get<std::string>("<xmlattr>.name") << "(" << var.first << ")");
					type.addType(var.first, var.second.get<std::string>("<xmlattr>.name"));
				}
			}
			// Store the finished type; otherwise the parsed data is discarded.
			namespace_.types.insert(std::make_pair(type.name, type));
		}
	}
}

int main(int argc, char** argv)
{

	boost::property_tree::ptree tree;
	try
	{
		Model model;
		std::string file = "data.xml";
		// Swap out for read_json or read_ini if need be.
		boost::property_tree::xml_parser::read_xml(file, tree);

		debug("Loading format file: " << file);
		for(const auto& iter : tree)
		{
			if(iter.first != "format")
			{
				debug("Unknown Field: " << iter.first);
				continue;
			}

			Format format;
			/// Read Format Attributes
			for(auto attrib : iter.second.get_child("<xmlattr>"))
			{
				format.addAttribute(attrib.first, attrib.second.data());
				debug("\tAdded Attribute: " << attrib.first  << ": " <<  attrib.second.data());
			}
			
			debug("");
			
			/// Read Format
			for(auto namespace_ : iter.second)
			{
				if(namespace_.first != "namespace") continue;
				debug("\tAdded Namespace: " << namespace_.second.get<std::string>("<xmlattr>.name"));
				iterate_namespace(format.addNamespace(namespace_.second.get<std::string>("<xmlattr>.name", "")), namespace_.second);
			}
		}

	}
	catch(const boost::property_tree::ptree_error& error)
	{
		debug(error.what());
	}
}


Loading format file: data.xml
        Added Attribute: endianess: little
        Added Attribute: neg: twoscomp

        Added Namespace: Zip
                Added Type: EndOfCentralDirectoryRecord
                        Added Variable: disk_number(u16)
                        Added Variable: central_directory_start_disk(u16)
                        Added Variable: local_central_directory_entry_count(u16)
                        Added Variable: central_directory_entry_count(u16)
                        Added Variable: central_directory_size(u32)
                        Added Variable: central_directory_offset(u32)
                        Added Variable: zip_comment_length(u16)
                        Added Variable: zip_comment(string)

RUN FINISHED; exit value 0; real time: 10ms; user: 0ms; system: 0ms

helios (17506)

Oh! Sorry, I misunderstood you earlier. Ignore my previous question.

Lowest0ne (1536)

What is wrong with straight XML that you choose not to use it ( or any other existing format )?

helios (17506)

Personally, I think self-closing tags are alright, and save some typing when writing in an already overly verbose language, so any library that claims to parse XML should at least be configurable to accept them.
But, like I said, the library merely supports those tags. If you want to write the above as
<u16 name="disk_number"></u16>
then that's fine, too. The program won't particularly care either way.

LB (13399)

If you are having empty tags it means you're misusing the language. For instance, it makes more sense for the name to be between the open and close tags rather than being an attribute.

helios (17506)

I always use attributes for simple members (numbers, strings, etc.) and elements for objects and object members. I don't see how one can say that one way makes more sense than the other. To oneself, maybe, but objectively?

LB (13399)

I don't know, it just feels like misuse. The syntax requires it but you're explicitly ignoring it, which means there's not a good syntax-to-use matchup. That's just my opinion, though; I guess it's not that huge of a deal.

Topic archived. No new replies allowed.