Commenting Scheme on Large Projects

One of the things I've been reading a lot about lately is how many people fail to comment code on large projects (not only C++ programmers), so their replacements have no clue what a function does. This makes troubleshooting a pain, but so is writing a bunch of comments for functions if you don't have a set style for writing them. Since reading all of these articles, I've been itching to comment everything, but the first thing I realized was that I have no standard way of commenting, and that is the best place to start.

I have looked at others' code and their comments; some have made me go WTF?, while others have impressed me. My question is: what styles do you use on large projects so that you, or your replacement, will know exactly what each function does? I've grown to like the style of the Doxygen comments. It requires a little more typing, but if you have the Doxygen plugin installed, it can convert your comments into a beautiful HTML document.
I do Doxygen comments, but only in header files. Source files should have as few comments as possible.
I don't know that I agree with source files having as few comments as possible, mainly because you could write a fairly complex function. Granted, properly separating functions into external source files would be ideal, but that isn't always the case, nor always feasible. I ask mainly because I'm in the middle of recreating the STL list container and would like to find an effective way to comment my code. It's all in one file at the moment, which lets me edit the source quickly and change what I need when I want to.

In the meantime, while I'm adding features and making stuff faster, I've been trying to think of a way to comment it correctly, and eventually to go back and comment my other large files (I say large relative to the number of small programs I have; my largest file doesn't exceed 1000 lines, however). Here is what my list currently looks like: http://pastebin.com/jCgsky0Q

As you can guess, since it's based on the STL list, you already know how the functions should act, etc. But it's not always going to be so apparent to others, especially someone who is new or has never used the STL list. I also want to understand why I did what I did, and when I did it, so commenting only seems right. I also want to understand why I picked algorithm X over Y, or why I didn't write simpler code, e.g. because writing several extra lines turned out faster than writing just two.

Mainly, the documentation should be for my own use, but I have been thinking about creating a blog-style website that lets me share my code and the generated documentation, and post updates on what I'm currently working on. This would be ideal for sharing with a potential employer instead of bringing in my laptop or, worst-case scenario, printing out hundreds of pages to share my code.

As of now, looking at the code I just shared, it wouldn't make sense to put the comments at the declaration of the list and all of its private and public members, so I am going to try placing them at each function definition. I also plan to do the same with my other header files. What thoughts do you have on this, and on the overall commenting scheme, which is obviously nonexistent in my code?
Not all documentation needs to be in comment form. You could have a personal wiki where you record some of your design decisions and the whys behind those. Comments should strictly be there to describe the use of functions, with expected arguments and return values. Design choices should be left to some external documentation.
closed account (iw0XoG1T)
My question is, what styles do you use on large projects so that you, or your replacement, will know exactly what each function does?


I am careful to keep my functions short: 7 to 9 lines. And I regularly show my work to others I work with and ask, "Can you tell what I am doing here?" (and they don't have to be programmers).

When I look at the documentation in someone else's code, what I look for is who they are and when they wrote it. If you know the person, or know someone who knew them, talking to either one helps.

Comments have never been as useful as clear names and short functions that do only one thing. And too much commenting definitely makes it worse.
closed account (3hM2Nwbp)
It'd be very nice if the C++ committee came up with something akin to Javadoc to push. I've found in the past that C++ is one of the worst languages when it comes to documentation. Doxygen is nice and all, but it's not being pushed as the de-facto documentation system for C++.

Comments should strictly be there to describe the use of functions, with expected arguments and return values


and preconditions, postconditions, thread safety info, exceptions, and anything else that is important enough for the end-user to know. So long as your doc tool generates snazzy output, there's never such a thing as too much information.
Documentation comments should be such that I can implement the function myself just by reading the comment and seeing the type signature.
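
A minimal sketch of a Doxygen comment written to that standard, i.e. complete enough that the function could be implemented from the comment and signature alone. The function take_front and its behavior are hypothetical; it exists only to show the tags (@param, @return, @pre, @post, @throws, @note) in one place:

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <stdexcept>

/**
 * @brief Removes the first element of @p items and returns its value.
 *
 * @param[in,out] items  List to take the element from.
 * @return The value that was stored at the front of @p items.
 *
 * @pre  None. An empty list is reported through the exception below.
 * @post On success, @p items holds one element fewer and the remaining
 *       elements keep their relative order.
 * @throws std::out_of_range if @p items is empty; @p items is unchanged.
 *
 * @note Thread safety: not safe to call while another thread reads or
 *       modifies the same list.
 */
int take_front(std::list<int>& items)
{
    if (items.empty())
        throw std::out_of_range("take_front: list is empty");

    const std::size_t size_before = items.size();
    const int value = items.front();
    items.pop_front();

    // Check the documented postcondition in debug builds.
    assert(items.size() == size_before - 1);
    return value;
}
```

Whether this much detail belongs on every function is a judgment call; the point is only that each tag answers a question a caller would otherwise have to read the body to answer.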
@chwsks
I am careful to keep my functions short: 7 to 9 lines.


I can't help thinking that would be overkill in some situations, especially for mathematical functions. I do agree with your idea of having functions that do one thing, though; I think that is the better concept.

For example, take a function that converts decimal degrees to degrees, minutes, and seconds. Of course one can always split functions even further, but that is a bit pointless in this case. Having a function that is 20 - 30 LOC is not unreasonable. And 80 LOC (approx. 1 page) might be a worst-case limit, if only for readability / printability, apart from other valid reasons.
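
As a rough sketch of that example, the conversion could be written as one function that does one thing. The Dms struct, the to_dms name, and the choice to carry the sign in a separate flag are all assumptions made for illustration:

```cpp
#include <cassert>
#include <cmath>

// Hypothetical result type: sign kept separately so that the three
// magnitude fields are always non-negative.
struct Dms {
    bool negative;   // true when the input was below zero
    int degrees;     // whole degrees of the magnitude
    int minutes;     // 0..59
    double seconds;  // 0 <= seconds < 60
};

// Converts decimal degrees to degrees, minutes, and seconds.
// Assumed precondition: the input is a finite number.
Dms to_dms(double decimal_degrees)
{
    assert(std::isfinite(decimal_degrees));

    const double magnitude = std::fabs(decimal_degrees);
    const double whole_degrees = std::floor(magnitude);

    // The fractional degree becomes minutes; the fractional minute
    // becomes seconds.
    const double total_minutes = (magnitude - whole_degrees) * 60.0;
    const double whole_minutes = std::floor(total_minutes);
    const double seconds = (total_minutes - whole_minutes) * 60.0;

    return Dms{decimal_degrees < 0.0,
               static_cast<int>(whole_degrees),
               static_cast<int>(whole_minutes),
               seconds};
}
```

Calling to_dms(30.5) gives 30 degrees, 30 minutes, 0 seconds. The body runs to about a dozen statements, and splitting it further into separate "minutes" and "seconds" helpers would add names without making it any easier to follow.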

closed account (iw0XoG1T)
There are limits to how many variables a human mind can keep track of -- I personally start to lose track after three (thank god for mnemonic tricks). So the 7 to 9 line rule seems reasonable to me.

Having a function that is 20 - 30 LOC is not unreasonable.

30 is too much for me; something that complicated would take more than a few minutes and multiple readings to grasp.

I did a quick google and found that I am not unusual, see below:
http://www.psychologicalscience.org/media/releases/2005/pr050308.cfm

It may be overkill, but when I look at programs written by others, with functions so long that they need comments inside them just so the authors could keep track of what they were doing, I feel that I am erring on the correct side of the problem.
I agree with the general rule to keep functions short. I don't know about 9 lines short.... but I guess it depends what you consider a line of code.

Anyway, there are no absolutes. Sometimes you need to have a big function because breaking it up might not be practical. IMO, putting comment blocks to explain portions of the function, in that case, is as good as splitting the code up into actual functions.
I will have to disagree with the comments saying you should break up functions if they exceed x number of lines. One example (albeit not a pretty one in any regard):
template<class T>
void vp::list<T>::sort() {
   if (size() <= 1)
      return;
 
   for (int merges = 0, k = 1; merges != 1; k *= 2) {
      node *p = firstNode, *q = p;
 
      for (int qsize = merges = 0; p; merges ++, p = q) {
         for (qsize = 0; qsize < k && q; qsize ++, q = q->next);
 
         while (qsize && q) {
            if (q->value <= p->value) {
               if (q->next) {
                  q = q->next;
                  q->prev->insert_before(p);
               }
               else {
                  q->insert_before(p);
                  q = NULL;
                  lastNode = lastNode->prev;
               }
 
               if (p == firstNode)
                  firstNode = firstNode->prev;
 
               qsize --;
            }
            else
               p = p->next;
         }
      }
   }
}


This was about as short as I could possibly make this function, and as you can see, it is still hard to read. Granted, I could change some variable names and separate some things onto their own lines, but this is basically as simple as this algorithm gets. There is another version of the merge sort which relies on a separate split function, and possibly a merge function, which would reduce the number of lines per function, but then you detract from the overall readability of the program, in my eyes.
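
For comparison, here is a rough sketch of what that split/merge variant could look like. It is not the vp::list implementation: it assumes a bare singly linked Node holding an int, and the names split_list, merge_lists, and merge_sort_list are made up for this example.

```cpp
#include <cassert>

// Hypothetical minimal node; the real list is doubly linked and templated.
struct Node {
    int value;
    Node* next;
};

// Cuts the list in two at its midpoint and returns the head of the second
// half. The first half stays reachable through `head`.
// Precondition: head is not null.
Node* split_list(Node* head)
{
    assert(head != nullptr);
    Node* slow = head;
    Node* fast = head->next;
    while (fast && fast->next) {   // fast moves two steps per slow step
        slow = slow->next;
        fast = fast->next->next;
    }
    Node* second = slow->next;
    slow->next = nullptr;          // terminate the first half
    return second;
}

// Merges two sorted lists and returns the head of the combined list.
// Ties are taken from `a` first, which keeps the sort stable.
Node* merge_lists(Node* a, Node* b)
{
    Node dummy{0, nullptr};        // placeholder so tail is never null
    Node* tail = &dummy;
    while (a && b) {
        if (b->value < a->value) {
            tail->next = b;
            b = b->next;
        } else {
            tail->next = a;
            a = a->next;
        }
        tail = tail->next;
    }
    tail->next = a ? a : b;        // append whichever half is left over
    return dummy.next;
}

// Recursive top-down merge sort; returns the new head of the list.
Node* merge_sort_list(Node* head)
{
    if (!head || !head->next)
        return head;               // zero or one element: already sorted
    Node* second = split_list(head);
    return merge_lists(merge_sort_list(head), merge_sort_list(second));
}
```

Each piece is short, but a reader now has to hold three functions in mind to follow one sort, which is the readability trade-off in question.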

I am still in the process of commenting all of my functions (I've only gotten to the Node methods) and need to rewrite some of them (the above being one) for better readability and performance. However, no matter what your general rule of thumb is, I believe there are always going to be exceptions. Also, you have to understand that there comes a point where increasing readability can really take away from performance. It may seem small at first, but running my list through the sort algorithm takes only 2x longer than it does to add 1 million items to said list. I'm fairly happy with the performance; however, I would like to improve it, using a rewritten function that cire has presented to me.

I'm also looking into the recursive version, which in his example, knocks about 200ms off of the time to run the entire function.

I'm not above putting comments in functions. Heck, before I sat down with this list again, I had a comment for every line, letting me know exactly what each line was doing and why. It makes the code harder to read for some people, but it also leaves no doubt as to why everything is the way it is. My new approach is going to be to write a paragraph or so before each function to explain all of that: the algorithm used, if there is one, the reason for choosing the method I used, and the params and return values. I'm also looking into adding pre- and postconditions to my code to express when and how a function should be used.

I'm also looking into how I want to structure each of my functions, as well as possible ways to improve the overall speed of everything. I've learned the hard way, more recently than before, that fewer lines doesn't mean faster code. However, there comes a time when you have to decide what's more important. For all intents and purposes, I want to completely optimize this class to get a complete and thorough understanding of why this and not that. I also want to share my code with others to demonstrate that, even though my code has several hundred lines (well over 1000 by the time I'm completely done), it has speed and efficiency in mind. My code will also, in the future, be very easily read and understood, even with the extra lines in it.

I'm interested to see the people's opinion on my function who said x number of lines is the max that should be in a function. What do you believe would make this more efficient, easier to read (aside from replacing the names and breaking down the for loop)? I'm also interested to see where you stand on other seemingly long functions that deal with x algorithm when y algorithm would have been shorter to write, but overall slower.
closed account (iw0XoG1T)
A line of code ends with a ';'.

You are showing a bit of code which is 11 lines and can be viewed on one screen (80 x 40).

7-9 is a rule I use like this: if I go over it, I take a serious look at what I have written. What I believe you are saying is that this rule would be detrimental in this case. (Who am I to argue with you over such a trivial example of more than nine lines? If I did, that would make me a first-rate fool.)

I would suggest when a rule causes more problems than it solves it be ignored. That doesn't make it a bad rule--what it means is that it is not absolute--and no rule should be absolute.

I do believe you would be hard pressed to find 20 to 30 lines of code that don't fit on one screen (80 x 40) and could not obviously be rewritten to be easier to read and maintain.
Do you really restrict yourself to 80 columns? Because I think that's a little extreme. It may have made sense 10 years ago when screen resolutions were half of what they are now.
closed account (iw0XoG1T)
Yeah I do -- I am old now and need the bigger font. Also at home I work on a laptop and find it difficult to have more than one window open if I don't restrict myself to 80 (actually I use 72 with 8 columns of wiggle room).

I am surprised that someone who likes pretty print would not limit themselves to 80.
My rule is that it must not be too wide for my laptop (1376 x 768).

You should never do something like shorten variable names to make a line of code shorter, unless it doesn't reduce clarity.

Also never restrict yourself with absolutes. I've written quite a few functions where not commenting in the source code would be very foolish.

I document what a function does and how to use it in the header file. In the source files I document why I did something the way I did, efficiency trade-offs, and mathematical formulas which would otherwise look obfuscated by optimizations. I also add search tags so that I can find stuff for different purposes (like code I'm not satisfied with and want to rewrite), etc. I keep those comments in the body as close to where they apply as possible.
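
A small sketch of that split, shown in one block for brevity. The function evaluate_polynomial, the use of Horner's rule, and the REVISIT search tag are all illustrative assumptions, not taken from anyone's actual code:

```cpp
#include <vector>

// ---- would live in the header: what it does and how to use it ----

/**
 * @brief Evaluates a polynomial at @p x.
 * @param coefficients  Coefficients ordered from the highest power down
 *                      to the constant term.
 * @param x             Point at which to evaluate the polynomial.
 * @return The value of the polynomial at @p x, or 0 if @p coefficients
 *         is empty.
 */
double evaluate_polynomial(const std::vector<double>& coefficients, double x);

// ---- would live in the source file: why it is written this way ----

double evaluate_polynomial(const std::vector<double>& coefficients, double x)
{
    // WHY: Horner's rule rewrites a*x^2 + b*x + c as (a*x + b)*x + c.
    // That costs one multiply and one add per coefficient and avoids
    // pow(), which is why the loop does not resemble the textbook sum.
    //
    // REVISIT: hypothetical search tag for code to reconsider later,
    // e.g. whether a fixed-size array would serve callers better.
    double result = 0.0;
    for (double c : coefficients)
        result = result * x + c;
    return result;
}
```

With coefficients {2, -3, 1} and x = 4 this evaluates 2*16 - 3*4 + 1 = 21. The header comment never mentions Horner's rule, because a caller does not need it; the body comment never restates the parameter list, because the header already covers it.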
@iseeplusplus
So then your lines will be too long on any screen smaller than that and on the same screen with a larger font size.
I am surprised that someone who likes pretty print would not limit themselves to 80.


I'm surprised you're surprised. Pretty print is often the reason my lines go beyond 80 (added whitespace to make it pretty).

80 characters is like a third of my screen:

http://i49.tinypic.com/2md29s6.jpg
So then your lines will be too long on any screen smaller than that and on the same screen with a larger font size.


I think it's a reasonable limit. Any smaller and I have to start making names really short.

How in the heck were you able to maintain columns <= 80 while you were using 8 space indentations?
iseeplusplus wrote:
How in the heck were you able to maintain columns <= 80 while you were using 8 space indentations?

By moving nested blocks out into functions. For example, instead of
void function()
{
        if (condition1) {
                switch (variable) {
                case value:
                        while (condition2) {
                                if (condition3) {
                                        do_something();
                                        do_a_second_thing();
                                }
                                always_do_this();
                        }
                        break;
                default:
                        break;
                }
        }
}

I would write
static void helper_function()
{
        while (condition2) {
                if (condition3) {
                        do_something();
                        do_a_second_thing();
                }
                always_do_this();
        }
}

void function()
{
        if (condition1) {
                switch (variable) {
                case value:
                        helper_function();
                        break;
                default:
                        break;
                }
        }
}


Namespaces and classes made it harder, but I managed. Even now that I've gone back to using 4 spaces for indents, I still move nested blocks into helper functions. It's good because it keeps functions short (< 10 lines being good, and < 5 being ideal).
@chrisname
You prefer the overhead over the extra lines? What about code that shouldn't be broken up? Don't multiple nested for loops perform much faster than adding calls to the stack? (I believe function calls go on the stack.)

I wrote a long post right after C++.com came back online, but my internet cut out (seems to happen to me lately) and I lost my post to the void of C++ servers. Anyway, to sum it up, I am essentially forced to abide by the 80-character line width on my computer since I'm using a netbook, whose highest resolution (which is what I have it at) is only 1024x600. In C::B with the project manager open, my 80-character line is only about .5 inch from the edge of my screen in full-screen mode. Without the project manager (I don't use it much except for wxSmith projects), the line rests about 2/3 of the way across my screen.

Copying and pasting others' code into my environment is brutal when they refuse to limit their line lengths. Thankfully, AStyle helps alleviate the tabbing issues, but it can't modify long cout statements or mathematical equations, not to mention comments that seem to go on for miles.

Another reason I prefer the 80 character limit is because when pasting code 80 characters or less on the forum, the div boxes around each post are visible. Let's say someone posts code that's 85 characters wide, I'm forced to have a scroll bar on my web browser (Chrome) which reduces overall visible height and means more scrolling. Now imagine lines 120+ characters wide. I very rarely will look at the code, and when I do, I'm either bored or just refuse to scroll. If there is valuable code off to the right, it's unknown to me.

This is why I had the code formatting thread a while back, and we got onto the topic of indentation there. I prefer 3-space indentation over anything else. Granted, tab indentation is fine in C::B since tabs are converted to 3 spaces anyway, but it gets messy on the forums, especially with inconsistent indentation.