Handling Errors for Big Projects

Hey everyone,

I'm looking for some general advice. When working at a small scale in C++, handling errors with or without exceptions is of little consequence. However, as I have never worked on a really large project, I would love to hear people's thoughts on whether it is generally regarded as best practice to use C++'s built-in exception handling, or whether it is better to handle errors in some custom manner.

It might even be that some mixture of both is preferred. Leave your thoughts below. In what situations is it best to use exceptions and when is it best to deal with errors in another way? Pros? Cons? Thoughts?

EDIT: When I say "using exceptions", I'm referring to the throw -> catch paradigm. When I say "sans-exceptions", I'm referring to the classic return-false-if-something-goes-wrong-I'll-deal-with-it-one-layer-up paradigm. (e.g. if (!myFunc()) cout << "Error" << endl;)

Best wishes,
Tresky
If you want your program to crash without knowing the reason, then go ahead and trust the programmers and C's basic error handling.

I think it's best to build in some kind of error reporting. Call it debug mode: turn it off when you want best performance and turn it on when you want to debug an issue. Hopefully you can hit it again.

There will be bugs, the question is how do you want to spend your days and nights looking for them.
@SamuelAdams: Thanks for the reply. Any suggestions as to how to build that in? What I mean is: are you referring to creating a class that inherits from std::exception, or something of the like? Or do you have some specifics in mind that you find helpful?
Let me preface this by saying I don't do things as I expect most C++ coders do. I still follow the older technique of checking all return error codes, i.e., your 2nd scenario. I don't use try - catch - throw.

Having said that, error handling and debugging are so critically important that before I even begin to write any new code my first thoughts are of how I'm going to debug it. In fact, I usually write debugging code even before I start coding something. What I do specifically is log everything to debug output files. I create an equate/#define in my programs such as this...

 
#define MyDebug 


...and in any code I write I code something like this....

HWND hWnd = CreateWindow(....);
if(hWnd==NULL)
{
    // do something
}
#ifdef MyDebug
fprintf(fp, "  hWnd = %p\n", hWnd);   // fp: debug log file opened earlier
#endif 


For Release builds I just comment out the #define symbol.

I log all critical variables in an organized fashion, as well as the entrances and exits of all function calls, so that I always know which procedures I'm in and what the values are of all the critical variables. Booleans/flags are especially important because their state controls the logic flow and branches. Everything has to be done in such a manner that if something goes wrong, one can quickly trace through the log file to see what data was read in, what decisions the code took based on the read-in values, and what the values of variables were. A lot of the work I do is data-processing work, so a lot of the computations involve tables of values, and a lot of tabular data is output in my log files. That's why stepping debuggers are not of as much use to me as tabular outputs that show what's going on with lots of values.

C++ purists will fault me, I'm sure, for not using C++ exception handling. Perhaps it's justified; I don't know. All I can say is that I'm 64 years old and what I've been doing has worked for me for a long time. I don't care for code built with C++ exception handling enabled, as it adds too much to the executable size to suit me. Otherwise I might use it; it's an interesting idea. Finally, let me state that the programs I deal with are usually in the 5,000 to 50,000 lines of code category. I know from reading Bjarne Stroustrup's various commentaries that at AT&T, when he and others were developing C++, they were dealing with programs of millions of lines of code. Perhaps in that context what I do would break down. Don't know.



Basic information on error handling is available in
CppCoreGuidelines: https://github.com/isocpp/CppCoreGuidelines/blob/master/CppCoreGuidelines.md#S-errors
Super-Faq: https://isocpp.org/wiki/faq/exceptions

Extracted from 'C++ Coding Standards' (Alexandrescu and Sutter):
The basic, strong, and no-fail guarantees were originally described in ... with respect to exception safety. They apply to all error handling regardless of the specific method used, and so we will use them to describe error handling safety in general. ...

In general, every function should provide the strongest guarantee that it can provide without needlessly penalizing calling code that doesn't need the guarantee....

Ideally, we write functions that always succeed and therefore can provide the no-fail guarantee. Certain functions must always provide the no-fail guarantee, notably destructors, deallocation functions, and swap functions...

Most functions, however, can fail. When errors are possible, the safest approach is to ensure that a function supports a transactional behavior: Either it totally succeeds and takes the program from the original valid state to the desired target valid state, or it fails and leaves the program in the state it was before the call - any object's visible state before the failed call is the same after the failed call (e.g., a global int's value won't be changed from 42 to 43) and any action that the calling code would have been able to take before the failed call is still possible with the same meaning after the failed call (e.g., no iterators into containers have been invalidated, performing ++ on the aforementioned global int will yield 43 not 44). This is the strong guarantee.

Finally, if providing the strong guarantee is difficult or needlessly expensive, provide the basic guarantee: Either the function totally succeeds and reaches the intended target state, or it does not completely succeed and leaves the program in a state that is valid (preserves the invariants that the function knows about and is responsible for preserving) but not predictable (it might or might not be the original state, and none, some, or all of the postconditions could be met; but note that all invariants must still be reestablished). The design of your application must prepare for handling that state appropriately.

That's it; there is no lower level. A failure to meet at least the basic guarantee is always a program bug. Correct programs meet at least the basic guarantee for all functions; even those few correct programs that deliberately leak resources by design, particularly in situations where the program immediately aborts, do so knowing that they will be reclaimed by the operating system. Always structure code so that resources are correctly freed and data is in a consistent state even in the presence of errors, unless the error is so severe that graceful or ungraceful termination is the only option.

When deciding which guarantee to support, consider also versioning: It's always easy to strengthen the guarantee in a later release, whereas loosening a guarantee later will break calling code that has come to rely on the stronger guarantee. Remember that "error-unsafe" and "poor design" go hand in hand: If it is difficult to make a piece of code satisfy even the basic guarantee, that almost always is a signal of its poor design. ...


The process of reporting an error itself should be fail-safe. For instance, it is a bad idea for an exception class to have a data member (say std::string) or base class whose copy constructor could throw.

Particularly when writing libraries, it is useful to remember that robust programs have to deal with run-time errors that occur in production code; logging error information is primarily useful only in debugging scenarios where the user of the library (or the one who looks at the error log) is the programmer who wrote the library.
Like Freddie1, I prefer to avoid exceptions because if you want to report decent diagnostics about where an error occurred, then you have to check for errors wherever they occurred and also report on them all the way up the call stack until you can do something about it:
try {
    func1();
    // do some stuff
    func2();
    // do some more stuff
    func3();
} catch (SomeClass &ex) {
    cerr << "uh, " << ex.text() << " happened somewhere. Not sure where or why\n";
}


In real production-ready code, the error handling is about 80% of the lines of code. At work, we use a thread-specific error variable that contains both numeric values and a text description of the error and, more importantly, the context where it occurs. If you call a function that returns an error, then you can add your context and return an error yourself. Eventually this will bubble up to a point where the code, or a person reading the log file, can do something about it. E.g., a program might fail to start and the error log shows:
Call to config server failed: keyword not found.
Can't get config keyword "logFileName"
Fatal error: can't initialize program.

The error happens in main() -> init() -> get_config(). Each function adds something to the error object so that when it's finally printed, it basically leads you right to the problem (the program won't start because it failed to initialize, and it failed to initialize because it went looking for the "logFileName" keyword in the config server and the config server couldn't find it).

If you use exceptions, you'd probably be tempted to throw the exception way down in get_config() and catch it way up in main(). In that case you're likely to just get a "keyword not found" message. What good is that? Which keyword? Where wasn't it found?

If a function returns a value then be careful about how you report errors. For example:
int getTheAnswer(some args);
If an error occurs inside getTheAnswer(), then what will you return? 0? -1? But could these also be returned as legal values? It's sometimes better to write the code like:
bool getTheAnswer(some args, int &result);

Just my 2 cents.
Thanks JLBorges and DHayden for posting that. Both excellent reads.

Error handling and recovery is interesting, and it's the sort of thing one can get creative with. For example, DHayden's example of passing error codes or error objects back through reference parameters has always interested me, and I actually have something like that implemented in a partially complete application I started working on quite some time ago. I say only partially implemented because I got the application up and running, but I didn't implement the error codes yet. Still, the last parameter of all the functions that get called is the by-reference error indicator, so whatever happens can be passed back up the stack to a point where the error can be reported.

It occurs to me now though that an error object would be more useful than a single integral error code. Possibly a struct/class containing a text buffer where diagnostic information could be accumulated at various points where 'irregularities' occurred in the processing.

Likely not too many folks are familiar with this specific Api, but I always appreciate the SQLGetDiagRec()...

https://msdn.microsoft.com/en-us/library/ms716256(v=vs.85).aspx

...setup in the ODBC API. It's C-based, but various error codes and textual messages are returned.
> It occurs to me now though that an error object would be more useful than a single integral error code.
> Possibly a struct/class containing a text buffer where diagnostic information could be
> accumulated at various points where 'irregularities' occurred in the processing.

Certainly useful for debugging error handling code, but quite useless for actual robust error recovery for a large code base in a production environment.

Stroustrup in 'The C++ Programming Language' (emphasis added):
Not every function should be a firewall. That is, not every function can test its preconditions well enough to ensure that no errors could possibly stop it from meeting its postcondition. The reasons that this will not work vary from program to program and from programmer to programmer.
However, for larger programs:
[1] The amount of work needed to ensure this notion of "reliability" is too great to be done consistently.
[2] The overhead in time and space is too great for the system to run acceptably (there will be a tendency to check for the same errors, such as invalid arguments, over and over again).
[3] Functions written in other languages won't obey the rules.
[4] This purely local notion of "reliability" leads to complexities that actually become a burden to overall system reliability.

However, separating the program into distinct subsystems that either complete successfully or fail in well-defined ways is essential, feasible, and economical. Thus, major libraries, subsystems, and key interface functions should be designed in this way. Furthermore, in most systems, it is feasible to design every function to ensure that it always either completes successfully or fails in a well-defined manner.


An obvious drawback of textual error information being dynamically accumulated at various points is that it violates a fundamental principle of robust error handling: the process of reporting a failure should itself be fail-safe. Attempting to circumvent this by, say, implementing fail-safe dynamic allocation mechanisms for error information would 'lead to complexities that actually become a burden to overall system reliability'.
dhayden wrote:
If you use exceptions, you'd probably be tempted to throw the exception way down in get_config() and catch it way up in main(). In that case you're likely to just get a "keyword not found" message. What good is that? Which keyword? Where wasn't it found?


@dhayden

I just wanted to point out something, but I first need to acknowledge the vastly superior knowledge and experience of yourself and others like JLBorges, compared to a backyard basher like myself :+). I am not sure whether you are aware of what I have written below.

C++11 has polymorphic nested exceptions, there is a really good example here:
http://en.cppreference.com/w/cpp/error/nested_exception

Given that one can create one's own exception classes, one should be able to at least log and/or preferably actually handle exceptions at multiple levels of depth, with context info at each level.

Maybe the reason for your answer is that (IIRC) you have worked on existing code bases where it was impossible to implement exceptions - I think you said it was a firable offence :+). Also, IIRC, you have worked or currently work with old compilers - C++03?



For me, the most important part of JLBorges' quote from Stroustrup was this part:

However, separating the program into distinct subsystems that either complete successfully or fail in well-defined ways is essential, feasible, and economical. Thus, major libraries, subsystems, and key interface functions should be designed in this way. Furthermore, in most systems, it is feasible to design every function to ensure that it always either completes successfully or fails in a well-defined manner.


With this part:
JLBorges wrote:
An obvious drawback of textual error information being dynamically accumulated at various points is that it violates a fundamental principle of robust error handling: the process of reporting a failure should itself be fail-safe. Attempting to circumvent this by, say, implementing fail-safe dynamic allocation mechanisms for error information would 'lead to complexities that actually become a burden to overall system reliability'


I imagine it wouldn't be worth it to have fail-safe allocation for a typical exception string; I am having a hard time imagining how a typical exception object might fail in this regard anyway. I guess there is a possibility, but what is the real probability? Maybe I am thinking about it too simplistically? But even then, shouldn't an exception object be a simple thing: this part has failed; send a message to that effect? On the other hand, I imagine an exception handler could be quite involved and complex.

Any way, I hope everyone is well - and Seasons Greetings to you all :+)

> wouldn't be worth it to have fail safe allocation for a typical exception string

From the IS (International Standard):
Class exception
namespace std {
    class exception {
        public:
            exception() noexcept;
            exception(const exception&) noexcept;
            exception& operator=(const exception&) noexcept;
            virtual ~exception();
            virtual const char* what() const noexcept;
    };
}


The class exception defines the base class for the types of objects thrown as exceptions by C++ standard library components, and certain expressions, to report errors detected during program execution.

Each standard library class T that derives from class exception shall have a publicly accessible copy constructor and a publicly accessible copy assignment operator that do not exit with an exception. These member functions shall meet the following postcondition: If two objects lhs and rhs both have dynamic type T and lhs is a copy of rhs, then strcmp(lhs.what(), rhs.what()) shall equal 0.


For example, re. copying objects of type std::runtime_error:
Because copying std::exception is not permitted to throw exceptions, this message is typically stored internally as a separately-allocated reference-counted string. This is also why there is no constructor taking std::string&&: it would have to copy the content anyway.
http://en.cppreference.com/w/cpp/error/runtime_error
@JLBorges

Perhaps I worded that badly, I meant in relation to:

..... Attempting to circumvent this by, say, implementing fail-safe dynamic allocation mechanisms for error information .....


I was trying to see how that applied to exceptions - they still deal with textual info from various locations. But I probably misconstrued this part:

freddie1 wrote:
It occurs to me now though that an error object would be more useful than a single integral error code.


Maybe freddie1 was talking about a single global error object, as opposed to an exception object derived from std::exception. Sorry for the confusion.

So I guess it's obvious one should design the system as per your quote from Stroustrup, and if exceptions can be utilised, use an exception object derived from std::exception to communicate/handle problems.

As per always thanks for your expert knowledge and effort in replying.

Regards :+)

I do defer to JLBorges' superior knowledge, and I have to admit I've not studied the theoretical writings of Stroustrup and the others at Bell Labs who developed C++ (that's why I appreciated the info JLBorges provided), but I don't see how this, as a simplistic model example, could fail or increase program complexity in actual application...

// Demo31.cpp
// cl Demo31.cpp /O1 /Os /GS- TCLib.lib kernel32.lib user32.lib
// cl Demo31.cpp /O1 /Os /MT
// 56,320 bytes MS Windows 7, x64, LIBCMT, VC15
//  3,584 bytes MS Windows 7, x64, TCLib,  VC15
#define TCLib
#ifdef TCLib
   #include <windows.h>
   #include "stdio.h"
#else   
   #include <string.h>
   #include <cstdio>
#endif   
#define MAX_BUFFER 1024
 
 
class ErrorObject
{
  public:
  ErrorObject()
  {
    this->szError[0]=0;
    this->Errors=0;  
  }
 
  size_t iErrLen()
  {
    return strlen(this->szError);   
  }

  bool AddErrorMessage(const char* pMsg)
  {
    if(strlen(pMsg) + iErrLen() < MAX_BUFFER)
    {      
       strcat(this->szError,pMsg);
       return true;
    }
    else
       return false;      
  }
 
  void Output()
  {
    printf("We've Got %u Diagnostic Messages...\n\n",(unsigned)this->Errors);   
    printf("%s",this->szError);   
  }    
 
  public: 
  char szError[MAX_BUFFER]; 
  size_t Errors;
}; 


size_t Foo3(ErrorObject& Err)
{
  Err.AddErrorMessage("S***!  Stuff Has Gone Bad And Destruction Is Here!");
  Err.Errors++;
  return Err.Errors;
}  


size_t Foo2(ErrorObject& Err)
{
  Err.AddErrorMessage("Hmmm.  This, That, And The Other Thing Doesn't Look So Good.  But We'll Try To Proceed!\n");
  Err.Errors++;
  return Foo3(Err);
}    


size_t Foo1(ErrorObject& Err)
{
  Err.AddErrorMessage("Everything Looks OK Here!\n");    
  Err.Errors++;
  return Foo2(Err); 
} 


void DoProcessing(ErrorObject& Err)
{
  if(Foo1(Err))
     printf("\nErrors Were Encountered!\n");
  else
     printf("No Errors Were Encountered!\n");     
}    

 
int main()
{
  ErrorObject Err;

  DoProcessing(Err);
  if(Err.Errors)
     Err.Output();
  getchar();
 
  return 0;
}

#if 0

C:\Code\VStudio\VC++9\Silvah\TCLib>cl Demo31.cpp /O1 /Os /GS- TCLib.lib kernel32.lib user32.lib
Microsoft (R) C/C++ Optimizing Compiler Version 15.00.21022.08 for x64
Copyright (C) Microsoft Corporation.  All rights reserved.

Demo31.cpp
Microsoft (R) Incremental Linker Version 9.00.21022.08
Copyright (C) Microsoft Corporation.  All rights reserved.

/out:Demo31.exe
Demo31.obj
TCLib.lib
kernel32.lib
user32.lib

C:\Code\VStudio\VC++9\Silvah\TCLib>Demo31

Errors Were Encountered!
We've Got 3 Diagnostic Messages...

Everything Looks OK Here!
Hmmm.  This, That, And The Other Thing Doesn't Look So Good.  But We'll Try To Proceed!
S***!  Stuff Has Gone Bad And Destruction Is Here!

#endif 


There are no globals, statics, or dynamic memory allocations involved in the code above. Of course, my ErrorObject class is simply a wrapper around a 1024-byte buffer allocated by the compiler in stack memory in main().

I've never implemented anything exactly like this in any of my programs; I'm just presenting it for discussion purposes. It seems to me it's a workable way of allowing both error codes and error or diagnostic messages to percolate back up to main() through deep recursions into the stack. In the above code we're about four levels deep before we finally back out, and the error/diagnostic messages accumulated up to that point are output. Not sure if it would scale or not, but my guess is that it would. It's also lightweight: it builds to 3.5K for me in x64 using my own custom version of the C standard library.
freddie1 wrote:
....... , but I don't see how this as a simplistic model example could fail or increase program complexity in actual application


As JLBorges mentioned, this scenario seems to do the job of reporting errors, but what about otherwise actually handling the errors? You have the DoProcessing function, but the code doesn't mention an actual object, apart from the Error object.

Could your system have termination semantics like exceptions do?

I am wondering why one would go to the expense of inventing something new, when all the required facilities already exist with exceptions?

Regards :+)
What we found many years ago was that it's futile to check for out-of-memory conditions. Either you'll forget to check somewhere, or your recovery code won't work properly, or, even if the recovery code's design is valid, it will contain a bug. Most bugs are in a program's error handling logic, probably because that's the code that's hardest to test.

Instead, we check for low-memory conditions at strategic points in the programs and deal with them there. Generally this means restarting the process.

So the problem of having an error object that itself might fail when allocating memory has never been a problem in our system and the diagnostics that it provides are highly valuable.

If you do need to handle out-of-memory conditions, the best way that I've seen was in the Borland libraries way back in the day. They would allocate a small buffer, about 4k I think, when the program started. If they ran out of memory, they'd free the buffer, thus providing 4k that the recovery code could use.

As for the value of text messages in error recovery, JLBorges is completely correct: they are nearly useless. What this really means is that you need both text and program-checkable data in the error object. Our error object contains:
- A human-readable string for the error
- A human-readable string for the context where it occurred
- A numeric value for the library or subsystem where it occurred
- A subsystem-specific error code
- A flag that says whether the operation might succeed if retried

The flag is useful because our application is highly distributed, so a network glitch produces an error where the operation might well succeed if retried.

Regarding how to pass the error object, freddie1 mentioned passing it as a parameter. We also tried adding an error code to objects. What we found was that it was easier to use a thread-specific global. This is the approach taken with the C library and errno. Passing an error object as a parameter gets tedious, and if you have the error in an object, then you're assuming that the object will exist and be in scope at the point where you can do something about the problem. That turns out to be unlikely.
dhayden wrote:
What we found many years ago was that it's futile to check for out-of-memory conditions.

Different people - different stories. I found many years ago that C++ is the only language where out-of-memory conditions can be handled well.

Either you'll forget to check somewhere

If you forget to catch bad_alloc, you get a termination, which may not be what I like to see, but is still actionable response to an error condition.

Most bugs are in a program's error handling logic, probably because that's the code that's hardest to test.

That is a good point. Error handling tests in general and allocation failure tests in particular are very important, and only core libraries (like Intel TBB), some databases, and some other high-reliability software actually do allocation failure tests. But "we can't be bothered writing a test for this" does not justify unreliable software. Especially if we're talking about big projects, which is what this thread started off about.
Everyone's comments are helpful, I think.

When I think of a large project, I'm thinking 20+ programmers and people's life's work. One day the program crashes, or gives the wrong answer, or the computer reboots.

How are you going to tell your boss, this is the problem and it will take 8 hours to fix ?

Building in logs and a way to trigger a crash dump if you hit a certain condition is critical to finding the problem quickly. Otherwise you're going to be guessing and, in my opinion, looking for a new job soon...

As far as how: every project and programmer is different. The information you want a user to see will be different. If you're programming for a bank, you might dump the data in hex and convert it when you get the logs. If it's a mom-and-pop shop, you can print it so it's in a readable format.

Say you're building your own project and you want something built in to help you debug later; then put that in your flowchart before you ever write a line of code.

SamuelAdams wrote:
When i think of a large project i'm thinking 20+ programmers

When I think of a large project, I'm thinking 2000+ programmers. 20 is just a couple teams.

one day the program crashes, or gives the wrong answer or the computer reboots. How are you going to tell your boss, this is the problem and it will take 8 hours to fix ?

Large projects that really care about error handling (like a power grid or a stock exchange or an airplane) are designed with those possibilities in mind: health checks, load shedding, hot redundant copies, etc. Errors don't interrupt service (and here in post-Sandy New York, even natural disasters won't interrupt many services). In the morning the programmer comes to work, looks at the reported failures, and investigates.

Building in logs and a way to trigger a crash dump if you hit a certain condition

Good logs are everything, especially event logs that can be replayed later. In some cases you can't just abort and dump core without rolling back transactions, but yes, core dumps, minidumps, stacks, crash fingerprints, etc are some of the things that have to be made available for post-mortem investigations.