Read nested structures from file.

Hi guys, I'm having trouble reading structures from a text file containing:

// this is example.txt
structname1{
    hp 50
    mp 10

    structname2{
        hp 90
        mp 50
        structname3{
            hp 1
            mp 9
        }
    }
}
//

My code so far:


#include <iostream>
#include <fstream>
#include <list>

class dataBlock{
public:
    long locationstart;
    long locationend;
    dataBlock(long,long);
};

dataBlock::dataBlock (long a, long b) {
  locationstart = a;
  locationend = b;
}

int main()
{
    std::list<dataBlock *> blocklist;
    std::streampos size1;
    char *memblock;
    char *pos;

    std::ifstream file ("example.txt", std::ios::in|std::ios::binary|std::ios::ate);
    if (file.is_open())
    {
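        size1 = file.tellg();
        // one extra byte so the buffer can be null-terminated for the while(*pos) scan below
        memblock = new char [static_cast<std::size_t>(size1) + 1];
        file.seekg (0, std::ios::beg);
        file.read (memblock, size1);
        memblock[static_cast<std::size_t>(size1)] = '\0';
        file.close();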

        std::cout << "The entire file content is in memory:\n\n";

        pos = memblock;
    }
    else { std::cout << "Unable to open file\n"; return 1; }  // pos would be uninitialized below

    int openBracket = 0;
    int closedBracket = 0;

    while(*pos){
        if(*pos == '{'){++openBracket;}
        if(*pos == '}'){++closedBracket;}
        ++pos;
    }
    //Can't figure out which open bracket belong to which closed bracket.

    //So i can use this function at the right time.
    blocklist.push_back(new dataBlock(blockStart, blockFinish));


    delete[] memblock;
    while(!blocklist.empty()){delete blocklist.front(); blocklist.pop_front();}

    return 0;
}

All help is much appreciated.

Duthomhas (13131)

Before I bother, I have a concern about your data structure.

The file is clearly a tree. But you are storing items in a list.

Let me know, and I’ll demonstrate the magic of recursive descent for you. :O)

PacR (97)

I want to print dataBlock like this:


    int i2 = 1;
    for(std::list<dataBlock*>::iterator it = blocklist.begin(); it != blocklist.end(); it++){
        std::cout << "structname" << i2 << " begin at memblock[" << (*it)->locationstart <<"] and end at memblock[ " << (*it)->locationend << "] " <<std::endl;
        i2++;
    }

If you know a better way, I'm interested to see it.

Duthomhas (13131)

I assume begin is the open brace and not the name?

Give me a few minutes.

PacR (97)

Yes.

jonnin (11333)

If the file format is your doing, you need to redo it.
If you're trying to read this junk that someone else produced, ... sigh.

Tell us any weird (school?) requirements... can you not use std::string here?
What is the expected output from your test file?

Is this an 'outsmart the bunghole' problem? I mean, you can read the whole file into memory and use built-in C++ to find the { markers; each one is the start of a record, minus the leading name text, so a little backwards iteration from the { characters recovers the name. Matching the } braces can be done with a counter/tracker, unless they can be mismatched or bungled in a 'bad' file and you have to detect or handle that (?).
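
As a rough illustration of the "backwards iteration off the { characters" idea, here is a minimal sketch. The helper name `name_before_brace` is made up for this example and is not from anyone's posted code; it assumes a name is a run of characters that contains no whitespace or braces.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (not from the thread): given the index of a '{' in
// `data`, walk backwards to recover the name that precedes it.
std::string name_before_brace(const std::string& data, std::size_t brace_pos)
{
    assert(brace_pos < data.size() && data[brace_pos] == '{');

    // Skip any whitespace between the name and the brace.
    std::size_t end = brace_pos;
    while (end > 0 && std::isspace(static_cast<unsigned char>(data[end - 1])))
        --end;

    // Walk back until whitespace, another brace, or the start of the text.
    std::size_t begin = end;
    while (begin > 0) {
        const char c = data[begin - 1];
        if (std::isspace(static_cast<unsigned char>(c)) || c == '{' || c == '}')
            break;
        --begin;
    }
    return data.substr(begin, end - begin);
}
```

Calling it with the position returned by `data.find('{')` would give the name of the first block; an empty result means the brace had no name in front of it.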

mbozzi (3911)

If all you want is to find the begin + end of the structures labeled with {}, use a stack.

When a { is encountered, push its address on the stack.
When a } is encountered, pop the address of the most recent { off the top of the stack. The address you popped off is the start and the address of the current } is the end.

If you reach the end of the file and there's still {s on the stack, there were too few }s.
If you underflow the stack there were too many }s.
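
The stack procedure described above can be sketched as a single function. This is only one possible shape, under the assumption that the whole file is already in a std::string; the name `match_braces` and the choice to throw on mismatches are illustrative, not part of the original suggestion.

```cpp
#include <cassert>
#include <cstddef>
#include <stack>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Returns (open, close) index pairs in the order the blocks are closed,
// so the innermost block comes first.
std::vector<std::pair<std::size_t, std::size_t>> match_braces(const std::string& text)
{
    std::stack<std::size_t> open;  // positions of '{' not yet matched
    std::vector<std::pair<std::size_t, std::size_t>> result;

    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == '{') {
            open.push(i);
        } else if (text[i] == '}') {
            if (open.empty())                 // stack underflow
                throw std::runtime_error("too many }");
            assert(open.top() < i);           // a '{' always precedes its '}'
            result.emplace_back(open.top(), i);
            open.pop();
        }
    }
    if (!open.empty())                        // leftovers at end of input
        throw std::runtime_error("too few }");
    return result;
}
```

Because a block is recorded when its } is seen, nested blocks appear before the blocks that contain them; reverse the result if outermost-first order is wanted.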

seeplus (6458)

Do you want something like this:

#include <iostream>
#include <fstream>
#include <list>
#include <stack>
#include <iterator>
#include <iomanip>
#include <string>

struct dataBlock {
	size_t locationstart {};
	size_t locationend {};
};

int main() {
	std::ifstream file("example.txt", std::ios::binary);

	if (!file.is_open())
		return (std::cout << "Unable to open file\n"), 1;

	file >> std::noskipws;

	const std::string data {std::istream_iterator<char>(file), std::istream_iterator<char>()};
	std::stack<size_t> block;
	std::list<dataBlock> blocklist;

	for (size_t f {}; f = data.find_first_of("{}", f), f != std::string::npos; ++f)
		if (data[f] == '{')
			block.emplace(f);
		else
			if (block.empty())
				return (std::cout << "Missing {\n"), 2;
			else {
				blocklist.emplace_front(block.top(), f);
				block.pop();
			}

	if (!block.empty())
		return (std::cout << "Missing }\n"), 3;

	for (size_t blk {}; const auto& [blstart, blend] : blocklist)
		std::cout << "structname" << ++blk << " begins at position " << blstart << " and ends at " << blend << "\n";
}

Output is (my test file excludes the comment line):


structname1 begins at position 11 and ends at 162
structname2 begins at position 53 and ends at 159
structname3 begins at position 105 and ends at 152

Duthomhas (13131)

Sooo, I don’t know if you are a student or just a newbie or if you actually have a good idea with some older languages and are just tripping over C++.

So I’m going to pretend you’re learning and give you something that looks like total overload.

The way I am presenting it is tailored pretty close to your posting (though not exactly), but is designed simply to point out a few concepts.

(I will typically have a class that wraps a stream for tracking line and column values, but here we will do it the old way and try to track them ourselves. It doesn’t work perfectly in all cases, but is good enough.)

Here’s the code:

#include <ciso646>
#include <fstream>    // for reading a file
#include <iomanip>    // for printing pretty output
#include <iostream>
#include <limits>     // for skipping comments in the input
#include <sstream>    // for treating a string like a file
#include <stdexcept>  // for error handling
#include <string>
#include <vector>     // (unless you have a SPECIFIC and QUANTIFIABLE need for a linked list.)


//-------------------------------------------------------------------------------------------------
// Read an entire file and return it as a string (defaults to text mode)
//
std::string load_file_as_text(
	const std::string & filename,
	std::ios::openmode mode = std::ios::in )
{
	std::ifstream f( filename, mode );
	std::ostringstream ss;
	ss << f.rdbuf();
	return ss.str();
}


//-------------------------------------------------------------------------------------------------
// This error reports failure to parse blocks
//
struct parse_to_blocks_error : public std::runtime_error
{
	std::size_t line, column;
	
	parse_to_blocks_error( std::size_t line, std::size_t column, const char * message )
		: std::runtime_error( message )
		, line{line}
		, column{column}
		{ }
};


//-------------------------------------------------------------------------------------------------
// Our block data and list of blocks type
//
struct block
{
	std::string name;
	std::size_t begin, end;  // inclusive, exclusive ::= positions of '{', '}'
};

using block_list = std::vector <block> ;


//-------------------------------------------------------------------------------------------------
// Block parsing helper functions
//
namespace helper
{
	// - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
	// Extracts whitespace and updates our line position counters
	// (If line, column do not matter when reporting errors, you 
	// can replace all calls to this function with `ins >> std::ws`.)
	//
	void extract_whitespace( std::istream & ins, std::size_t & line, std::size_t & position )
	{
		while (true)
		{
			auto c = ins.peek();
			if (c != std::istream::traits_type().eof())
				switch (c)
				{
					case ' ': case '\t': case '\f': case '\v': case '\r': ins.get(); continue;
					case '\n': ins.get(); line += 1; position = ins.tellg(); continue;
				}
			break;
		}
	}
	
	// - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
	// Returns true if a comment was extracted
	// Also updates our line position counters
	//
	bool extract_comment( std::istream & ins, std::size_t & line, std::size_t & position )
	{
		if ((ins.get() == '/') and (ins.peek() == '/'))
		{
			ins.ignore( std::numeric_limits <std::streamsize> ::max(), '\n' );
			line += 1;
			position = ins.tellg();
			return true;
		}
		ins.unget();
		return false;
	}


	// - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
	// Returns true if the argument character was extracted
	// Does not expect `c` to ever be a whitespace character.
	// If that changes, make sure to fix this function 
	// to update the line position counters properly!
	//
	int extract_character( std::istream & ins, int c )
	{
		return (ins.peek() == c)
			? ins.get()
			: 0;
	}

	
	// - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
	// Returns true if a 'name' was extracted
	// A name is any set of characters except /, {, }, space, tab, newline, or carriage return.
	// You can tweak this function however you like.
	// (Just don't read past EOL, else you'll need to update the line position counters.)
	//
	bool extract_name( std::istream & ins, std::string & name )
	{
		name.clear();
		while (true)
		{
			auto c = ins.peek();
			if ( (c == std::istream::traits_type().eof())
					or (c == '/') or (c == '{') or (c == '}')
					or (c == ' ') or (c == '\t') or (c == '\n') or (c == '\r') )
				break;
			name.push_back( ins.get() );
		}
		return !name.empty();
	}


	// - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
	// Returns true if a 'value' was extracted
	// Currently a value has the same semantics as a 'name', 
	// so we just pass it off to extract_name().
	// Tweak as desired
	//
	bool extract_value( std::istream & ins, std::string & value )
	{
		return extract_name( ins, value );
	}

(to be continued)

Duthomhas (13131)


	// - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
	// This is the function that RECURSIVELY extracts blocks of name/value pairs.
	// 
	void parse_to_blocks(
		std::istream & ins,       // text to parse
		block_list &   blocks,    // resulting list of blocks being created
		bool           first,     // true if this is the top-level block
		std::size_t &  line,      // line number
		std::size_t &  position ) // absolute position in text of first character in current line
	{
		// Helper to throw parsing errors with line and column information
		auto error = [&]( auto message )
		{
			// heh, I learned something:
			// https://stackoverflow.com/questions/13732338/
			ins.clear();
			return parse_to_blocks_error( 
				line, (std::size_t) ins.tellg() - position + 1, message );
		};
		
		// Our loop has two sequential actions:
		//   first Extract a name, 
		//   then extract either a value or a nested block
		// The two actions are fairly alike, but require different enough 
		// behaviors that we just unroll the action into two similar pieces of code.
		while (true)
		{
			// . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
			// (1) Extract the name
			
			// Leading whitespace?
			extract_whitespace( ins, line, position );
			
			// EOF?
			if (ins.eof())
			{
				if (first) return;
				throw error( "unexpected EOF" );
			}
			
			// Comment?
			if (extract_comment( ins, line, position )) 
				continue;
			
			// End of block?
			if (ins.peek() == '}')
			{
				if (first) throw error( "unexpected }" );
				return;
			}
			
			// There must be a name here
			std::string name;
			if (!extract_name( ins, name ))
				throw error( "expected name" );

			// . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
			// (2) Extract the associated block or value

			// Leading whitespace?
			extract_whitespace( ins, line, position );
			
			// EOF?
			if (ins.eof())
				throw error( "unexpected EOF" );
			
			// Comment?
			if (extract_comment( ins, line, position ))
				extract_whitespace( ins, line, position );

			// End of block?
			if (ins.peek() == '}')
				throw error( "unexpected }" );
			
			// Begin block data?
			if (extract_character( ins, '{' ))
			{
				auto n = blocks.size();
				blocks.push_back({ name, (std::size_t) ins.tellg(), 0 });
				parse_to_blocks( ins, blocks, false, line, position );
				if (!extract_character( ins, '}' ))
					throw error( "missing }" );
				blocks[n].end = (std::size_t) ins.tellg() + 1;
				continue;
			}
			
			// Not block data, must be a value
			std::string value;
			if (!extract_value( ins, value ))
				throw error( "expected value" );
		}
	}

} // namespace helper


//-------------------------------------------------------------------------------------------------
// Parse a string into a list of blocks
//
auto parse_to_blocks( const std::string & text )
{
	block_list blocks;
	std::size_t line = 1;
	std::size_t position = 0;
	std::istringstream iss( text, std::ios::binary );
	helper::parse_to_blocks( iss, blocks, true, line, position );
	return blocks;
}


//-------------------------------------------------------------------------------------------------
int main( int argc, char ** argv )
//-------------------------------------------------------------------------------------------------
	try
	{
		// (This just helped me play around with it)
		const char * filename = (argc > 1)
			? argv[1]
			: "example.txt";

		// As per your example, we wish to load the file into memory
		// (presumably for reference later)
		//
		// We load the file as TEXT (with newline conversions to '\n'),
		// but then parse it as binary.
		//
		// We could technically avoid the whole in-memory buffer and
		// just parse directly from stream, but you'd have to make sure
		// to open a binary stream for the error reporting to be accurate.
		//
		auto raw_data = load_file_as_text( filename );
		auto blocks   = parse_to_blocks( raw_data );

		// Let's print our results.
		std::cout
			<< "block begin end   name\n"
			<< "----- ----- ----- -------------\n";
		std::size_t n = 1;
		for (auto block : blocks)
			std::cout
				<< std::setw(5) << (n++)       << " "
				<< std::setw(5) << block.begin << " "
				<< std::setw(5) << block.end   << " "
				<< block.name << "\n";
	}

	// Errors that we got from indexing the blocks with our parse function
	catch (const parse_to_blocks_error & e)
	{
		std::cerr << "[line:" << e.line << ", column:" << e.column << "]: " << e.what() << "\n";
		return 1;
	}

	// Any other random exception, just to be complete
	catch (const std::exception & e)
	{
		std::cerr << e.what() << "\n";
		return 1;
	}
	
	// meh, if being complete, then exceptions that aren't std::exception-derived objects
	// (like if you `throw 42;`)
	catch (...)
	{
		std::cerr << "something failed and I don't know what it is!\n";
		return 1;
	}

The idea is to use recursion to parse through these kinds of data structures.
What made this weird was the desire to parse out ONLY the beginning and end of the nested structures.

--> It would be a whole lot simpler to simply parse the data itself, and just annotate it with its position in the source file.
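
To make that suggestion concrete, here is a minimal sketch of parsing the data into a tree and annotating each block with its position. It is not the code above: the `node` type, `parse_tree`, and the `detail` helpers are hypothetical names, it works on an in-memory string rather than a stream, and it deliberately ignores // comments and line/column tracking to stay short.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical tree node: one per "name { ... }" block.
struct node
{
    std::string name;
    std::size_t begin = 0, end = 0;  // positions of '{' and of the matching '}'
    std::vector<std::pair<std::string, std::string>> values;  // e.g. {"hp", "50"}
    std::vector<node> children;      // nested blocks
};

namespace detail
{
    void skip_space(const std::string& s, std::size_t& i)
    {
        while (i < s.size() && std::isspace(static_cast<unsigned char>(s[i]))) ++i;
    }

    // A word is a run of characters other than whitespace and braces.
    std::string read_word(const std::string& s, std::size_t& i)
    {
        const std::size_t start = i;
        while (i < s.size() && !std::isspace(static_cast<unsigned char>(s[i]))
               && s[i] != '{' && s[i] != '}')
            ++i;
        return s.substr(start, i - start);
    }

    // Parses the items of one block into `parent`; stops at '}' or end of text.
    void parse_items(const std::string& s, std::size_t& i, node& parent)
    {
        while (true) {
            skip_space(s, i);
            if (i == s.size() || s[i] == '}') return;

            std::string name = read_word(s, i);
            if (name.empty()) throw std::runtime_error("expected name");

            skip_space(s, i);
            if (i < s.size() && s[i] == '{') {
                node child;
                child.name  = std::move(name);
                child.begin = i++;
                parse_items(s, i, child);  // recursion handles the nesting
                if (i == s.size()) throw std::runtime_error("missing }");
                assert(s[i] == '}');       // the only other way parse_items returns
                child.end = i++;
                parent.children.push_back(std::move(child));
            } else {
                std::string value = read_word(s, i);
                if (value.empty()) throw std::runtime_error("expected value");
                parent.values.emplace_back(std::move(name), std::move(value));
            }
        }
    }
}

// Parses the whole text; the returned root is an unnamed container for the
// top-level blocks.
node parse_tree(const std::string& text)
{
    node root;
    std::size_t i = 0;
    detail::parse_items(text, i, root);
    if (i != text.size()) throw std::runtime_error("unexpected }");
    return root;
}
```

With a tree like this, the begin/end indexes are still available on every node, but the hp/mp values and the nesting come along for free, so nothing has to be re-parsed later.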

The advantages to handling the data as a parse tree instead of just getting a few indexes out of it (for later re-parsing?) are manifold.

Oh, notice also that I added the 'name' to the data being extracted. This was just to show the possibilities, but also just because it made it easy for me to test a few variations on your file structure.

Anyhow... that’s it. The accept/expect and recursive design here is a very old one, and is kind of the OG of structured parsing. There are certainly other ways to do it, but I figured, eh, IDK, I fell asleep halfway through this last night.
And discovered a weirdness about istream::tellg() that I didn’t know about, lol.

Duthomhas (13131)

Oh, and, results on your original data:

block begin end   name
----- ----- ----- -------------
    1    35   175 structname1
    2    73   173 structname2
    3   122   167 structname3

PacR (97)

Thank you all for your answers. I'm still new to C++ and have so much more to learn.
For now I will use seeplus's and mbozzi's approach.
Here is my code using the logic provided by seeplus & mbozzi:


#include <iostream>
#include <fstream>
#include <list>
#include <iterator>
#include <stack>

class dataBlock{
public:
    long locationstart;
    long locationend;
    dataBlock(long,long);
};

dataBlock::dataBlock (long a, long b) {
  locationstart = a;
  locationend = b;
}

int main()
{
    char c;
    std::stack<size_t> block;
    std::list<dataBlock *> blocklist;
    std::streampos size1;
    char *memblock;
    char *pos;

    std::ifstream file ("example.txt", std::ios::in|std::ios::binary|std::ios::ate);
    if (file.is_open())
    {
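        size1 = file.tellg();
        // one extra byte so the buffer can be null-terminated for the while(*pos) scan below
        memblock = new char [static_cast<std::size_t>(size1) + 1];
        file.seekg (0, std::ios::beg);
        file.read (memblock, size1);
        memblock[static_cast<std::size_t>(size1)] = '\0';
        file.close();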

        std::cout << "The entire file content is in memory:\n\n";

        pos = memblock;
    }
    else { std::cout << "Unable to open file\n"; return 1; }  // pos would be uninitialized below

    int openBracket = 0;
    int closedBracket = 0;
    long num = 0;

    while(*pos){
        if(*pos == '{'){
            ++openBracket;
            block.emplace(num);
        }
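        if(*pos == '}'){
            ++closedBracket;
            // a '}' with no pending '{' would make block.top() undefined behaviour
            if(block.empty()){
                std::cout << "Missing {\n";
                break;
            }
            blocklist.emplace_front(new dataBlock(block.top(), num));
            block.pop();
        }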
        ++pos;
        ++num;
    }

    int i2 = 1;
    for(std::list<dataBlock*>::iterator it = blocklist.begin(); it != blocklist.end(); it++){
        std::cout << "structname" << i2 << " begin at memblock[" << (*it)->locationstart <<"] and end at memblock[ " << (*it)->locationend << "] " <<std::endl;
        i2++;
    }

    delete[] memblock;
    while(!blocklist.empty()){delete blocklist.front(); blocklist.pop_front();}
    std::cin>>c;
    return 0;
}

seeplus, I didn't understand why you declare the string on line 21 of your code like this:

const std::string data {std::istream_iterator<char>(file), std::istream_iterator<char>()};

and I also didn't understand line 39:

for (size_t blk {}; const auto& [blstart, blend] : blocklist)

Duthomhas, your code was too complicated for me; however, I will continue to study it.

seeplus (6458)

Note that <ciso646> is removed in C++20. <iso646.h> is still available
https://en.cppreference.com/w/cpp/header/ciso646

seeplus (6458)

L21 - reads the whole file into the std::string. std::string has a constructor that takes a start and end iterator. See 7) of https://cplusplus.com/reference/string/string/string/ and
https://cplusplus.com/reference/iterator/istream_iterator/

L39. This is structured binding. See
https://en.cppreference.com/w/cpp/language/structured_binding

Simply, for each element of blocklist, blstart is bound to that element's locationstart member and blend to its locationend member.

This is a range-for loop. See:
https://en.cppreference.com/w/cpp/language/range-for

It iterates over all the elements in blocklist without needing to use explicit iterators.
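
A small stand-alone sketch of both features may help. It assumes C++17 or later for the structured binding; `read_all` and `total_length` are names invented for this example, and `dataBlock` mirrors the two-member struct from the earlier listing.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <iterator>
#include <list>
#include <sstream>
#include <string>

struct dataBlock {
    std::size_t locationstart {};
    std::size_t locationend {};
};

// Reads every character of a stream, whitespace included, into a string
// using the iterator-range constructor of std::string.
std::string read_all(std::istream& in)
{
    in >> std::noskipws;  // without this, istream_iterator<char> would drop whitespace
    return std::string(std::istream_iterator<char>(in), std::istream_iterator<char>());
}

// Sums the block lengths; the structured binding names the two members of
// each list element in declaration order.
std::size_t total_length(const std::list<dataBlock>& blocks)
{
    std::size_t total = 0;
    for (const auto& [blstart, blend] : blocks) {
        assert(blstart <= blend);
        total += blend - blstart;
    }
    return total;
}
```

Inside the loop, `blstart` and `blend` behave like references to the current element's members, so there is no `(*it)->` or `it->` to write.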

Why do you use std::list<dataBlock*> instead of std::list<dataBlock>? Using a list of pointers means you have to allocate/delete memory whereas using a list of dataBlock doesn't have all that overhead.

seeplus (6458)

I'm still new to C++ and have so much more to learn

What resources are you using? For an on-line learning resource, consider:
https://www.learncpp.com/

kigar64551 (784)

Unless your goal is to write your own writer/parser, e.g. as an exercise, I highly recommend going with a standardized format for storing "structured" data, such as JSON or XML, and using one of the existing writer/parser libraries:

https://github.com/json-c/json-c

https://libexpat.github.io/

seeplus (6458)

however, I will continue to study it

What are you trying to achieve with the code? Your/my code just details the positions of the {} found. It assumes the blocks are named structname1, structname2 etc. But what if the blocks were called foobar, barfoo and qwerty? The output would be the same showing the names structname1 etc instead of the actual names. Duthomhas's code also obtains the struct name - at the expense of more complexity - and so would correctly display foobar, barfoo in the output.

The design/code of a program depends upon what is required. Just the position of the braces/level number is much easier to obtain than also having to obtain the names.

If with code like yours/mine you then decide later you also need the actual names to be displayed then that would require almost a total rewrite as the design just doesn't provide for this.

jonnin (11333)

Also, not to ruin your day, but if you are new to C++, and especially if you are new to C++, write C++.

The first thing to learn and do, IMHO, is to take that code you posted (assuming it works; if it does not, make it work first) and rewrite it using std::string instead of char*. Get rid of all the new/delete stuff (raw pointers) and clean it up that way. It's a small step, but it will teach you much and get rid of some bad habits. There are exceptions, but in general, avoid char* (C-style) strings.
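
As a hedged starting point for that rewrite, the sketch below shows only the memory-handling part: the file contents live in a std::string, and the scan is bounded by the string's size instead of relying on a terminating '\0'. The names `read_text`, `brace_count`, and `count_braces` are invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Reads a whole stream into a std::string: no new[], no delete[], and no
// reliance on a terminating '\0'.
std::string read_text(std::istream& in)
{
    std::ostringstream buffer;
    buffer << in.rdbuf();
    return buffer.str();
}

struct brace_count { std::size_t open = 0, closed = 0; };

// Same counting loop as in the posted code, but bounded by the string's size.
brace_count count_braces(const std::string& text)
{
    brace_count n;
    for (const char c : text) {
        if (c == '{') ++n.open;
        if (c == '}') ++n.closed;
    }
    assert(n.open + n.closed <= text.size());
    return n;
}
```

For a real file, an `std::ifstream` opened on "example.txt" can be passed to `read_text` in place of a string stream; the string then cleans up after itself when it goes out of scope.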

Duthomhas (13131)

~~I guess if we’re not gonna accept advice on how to NOT do things WRONG, then here’s a thought:~~
[EDIT] Sorry, I was being a jerk. I’ve done plenty of things in a less-than-superb way before — still do — and I shouldn’t be treating anyone badly for trying to learn. I’ll leave this post up, but please know I wish I hadn’t had the knee-jerk reaction to be unkind and rude. I’m working on that. [/EDIT]

This assignment is your basic “match da parens” program, just without the obnoxious “match da parens in a C prog” having to deal with quoted strings.

It does have line comments, though, so I guess there is that wrinkle.

So, not really being interested in actually parsing the data, we can easily match parentheses with a couple of vectors:

#include <fstream>
#include <iomanip>
#include <iostream>
#include <sstream>
#include <stdexcept>  // for std::runtime_error
#include <string>
#include <vector>

//-------------------------------------------------------------------------------------------------
// Read an entire file and return it as a string (defaults to text mode)
//
std::string load_file_as_text(
	const std::string & filename,
	std::ios::openmode mode = std::ios::in )
{
	std::ifstream f( filename, mode );
	std::ostringstream ss;
	ss << f.rdbuf();
	return ss.str();
}


//-------------------------------------------------------------------------------------------------
struct data_block
{
	long location_start, location_end;
};


//-------------------------------------------------------------------------------------------------
std::vector <data_block>
find_locations( const std::string & s )
{
	std::vector <data_block> locations;
	std::vector <unsigned>   indices;

	for (std::size_t n = 0;  n < s.size();  n++)
		switch (s[n])
		{
			case '{':
				indices.push_back( locations.size() );
				locations.push_back({ (long)n, 0 });
				break;

			case '}':
				if (indices.empty())
					throw std::runtime_error( "unexpected }" );
				locations[ indices.back() ].location_end = n;
				indices.pop_back();
				break;

			case '/': // might as well keep the ability to skip line comments
				if (s.c_str()[n+1] == '/')
					while ((n < s.size()) and (s[n] != '\n'))
						n += 1;
				break;
		}

	if (!indices.empty())
		throw std::runtime_error( "missing }" );

	return locations;
}


//-------------------------------------------------------------------------------------------------
int main( int argc, char ** argv )
//-------------------------------------------------------------------------------------------------
	try
	{
		const char * filename = (argc > 1)
			? argv[1]
			: "example.txt";

		auto data      = load_file_as_text( filename, std::ios::binary );
		auto locations = find_locations( data );

		std::cout
			<< "block begin end\n"
			<< "----- ----- -----\n";
		std::size_t n = 1;
		for (auto block : locations)
			std::cout
				<< std::setw(5) << (n++)                << " "
				<< std::setw(5) << block.location_start << " "
				<< std::setw(5) << block.location_end   << "\n";
	}
	catch (const std::exception & e)
	{
		std::cerr << e.what() << "\n";
		return 1;
	}

That thar indices vector is the part that does the magic. It makes the iterative version of what would otherwise be a recursive function.

seeplus (6458)

As another take on this, consider:

#include <iostream>
#include <fstream>
#include <list>
#include <stack>
#include <iterator>
#include <string>
#include <tuple>
#include <format>

struct dataBlock {
	std::string name;
	size_t blstart {};
	size_t blend {};
	size_t level {};
};

int main() {
	std::ifstream inFile("example.txt", std::ios::binary);

	if (!inFile.is_open())
		return (std::cout << "Unable to open file\n"), 1;

	std::list<dataBlock> blocks;
	std::stack<dataBlock> db;

	for (auto [chi, name] {std::tuple { int {}, std::string{} }}; (chi = inFile.peek()) != EOF; inFile.get())
		switch (const auto ch { static_cast<char>(chi) }; ch) {
			case '}':
				if (db.empty())
					return (std::cout << "Missing {\n"), 2;

				blocks.push_front(db.top());
				blocks.front().blend = inFile.tellg();
				db.pop();
				break;

			case '{':
				db.emplace(name, inFile.tellg(), 0, db.size() + 1);
				break;

			case '/':
				if (inFile.get(); inFile.peek() == '/')
					for (inFile.get(); inFile.peek() != EOF && inFile.peek() != '\n'; inFile.get());

				break;

			case ' ':
			case '\t':
			case '\r':
				break;

			case '\n':
				name.clear();
				break;

			default:
				name += ch;
				break;
		}

	if (!db.empty())
		return (std::cout << "Missing }\n"), 3;

	const std::string form {"{:<15}  {:>5}  {:>5}  {:>5}\n"};
	const auto fmt_to { std::ostream_iterator<char>(std::cout) };

	std::vformat_to(fmt_to, form, std::make_format_args("Name", "Level", "Start", "End"));

	for (const auto& [nam, blstart, blend, level] : blocks)
		std::vformat_to(fmt_to, form, std::make_format_args(nam, level, blstart, blend));
}

which also extracts and shows the block name and displays the level associated with each block.

Given:


// this is example.txt
foobar{
    hp 50
    mp 10

    barfoo{
        hp 90
        mp 50
        qwerty{
            hp 1
            mp 9
        }
    }
}

this displays:


Name             Level  Start    End
foobar               1     30    171
barfoo               2     67    168
qwerty               3    114    161
