How to Write Short Clean Code?

I've run into a big problem.

I was told to write a function, strrep(), which replaces a substring with another string. I made quite a few slower versions of this function, but I finally made a version that is fast. When I wrote the pseudo code the function looked very good.

Pseudo code:

strrep (string, word, replacement)

word->size = replacement->size                                                                                               
overwrite word with replacement                                                                                              
                                                                                                                             
word->size > replacement->size                                                                                               
search for word (store end of word in m0)                                                                                    
loop:                                                                                                                        
exit if not found                                                                                                            
overwrite with replacement                                                                                                   
search for next word                                                                                                         
if found, store end of word in m1, otherwise store end of string in m1                                                       
copy memory from m0 to m1 and overwrite it to end of replacement                                                             
m0 = m1                                                                                                                      
go to loop                                                                                                                   
                                                                                                                             
word->size < replacement->size                                                                                               
difference   = replacement->size - word->size                                                                                
displacement = no. of instances of word in string multiplied by difference
...
...


Looks neat? Wait for it... When I wrote the code, everything seemed fine except for one thing. It looks TERRIBLE. My Tutor had a really hard time understanding it. I've added loads of comments to make things easier now.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
#define string_count		strcnt
#define string_find		strstr
#define string_shift_left	strmovl
#define string_shift_right	strmovr
#define string_write		strwrt

char *strrep_test(char *hstr, char *substring, char *replacement, size_t string_size)
{
	int substr_length  = strlen(substring);
	int repstr_length  = strlen(replacement);
	int len   = strlen(hstr);
	char *string = hstr;
	int diff;

	if (substr_length == repstr_length) {
		while (string = string_find(string, substring))
			string = string_write(string, replacement);
	} else if (substr_length > repstr_length) {
		char *start, *end; /* memory location #0 and #1 */

		end = string = string_find(hstr, substring);
		start = string+substr_length;

		while (string) {
			diff = start - string_write(end, replacement); /* overwrite substring with replacement string */
			string = string_find(start, substring); /* find substring */
			end = string? string: hstr+len;
			string_shift_left(start, end, diff); /* move memory chunk from start to end leftwards */
			start = end+substr_length; /* next pick position */
			end -= diff;   /* next write position */
		}
	} else { /* repstr_length > substr_length */
		char **substr_instances; /* saved locations of substring */
		int n; /* no. of instances of substring */
		int displacement; /* displacement value */
		int errval = errno;

		n = string_count(hstr, substring, &substr_instances);
		if (!substr_instances) {
			errno = errval; /* preserve userspace (to us, no error occured) */
			strrep_old(hstr, substring, replacement, string_size); /* <- slower version */
			return hstr;
		}

		diff = repstr_length - substr_length;
		displacement = diff * n;
		string = hstr + len;
		if  (displacement+len > string_size) { /* should we set errno = EOVERFLOW? */
			free(substr_instances);
			return NULL;
		}

		while (n-- > 0) {
			string_shift_right(substr_instances[n]+substr_length, string, displacement);
			string = substr_instances[n]-1;
			displacement -= diff;
			string_write(substr_instances[n]+displacement, replacement);
		}
		free(substr_instances);
	}

	return hstr;
}


Just to give you an idea, this is how the function works on a line that says:
strrep_test("eight two eight two eight", "eight", "one"); /* replace eight with one */
output:

0->[oneht two eight two eight]
       ^
1->[one two **eight two eight]
       ^--<--^       
0->[one two oneight two eight]
              ^
1->[one two one two ****eight]
               ^--^<---^
0->[one two one two one*eight]
                      ^
1->[one two one two one******]
[DONE]


It wors but, the function is now very long, and looks ugly. Maybe I'm doing it the wrong way?
Any advice on how to make this ugly code into something beautiful?
I really want to make this thing shorter. I think this has turned out to be the longest function I've written.

How do you guys write code that is both short and clean? That others can understand?

Edit: I edited this function somewhat. Does look neater now? But it's still hard to understand for someone else right?
Last edited on
I can't read that. I would say your variable names are too abstract. you might loose time in writing a long variable name, but you'll gain it back in clarity.

As far as simpler code, try to write more generic functions. For example, this whole thing can be done with three functions, find(), erase() and insert()
1
2
3
4
5
6
7
int found = find(myString, whatImLookingFor);
while (found != -1)
{
  erase(myString, found, lengthOfWhatImLookingFor);
  insert(myString, found, myReplacement);
  found = find(myString, whatImLookingFor);
}


You just need to write those functions (or I see you're using strlen, perhaps you can use other lib functions).
I agree with LowestOne. It's next to impossible to read. Needs to broken up into simpler functions.

Some other comments.
1) I see calls to functions you haven't included (strwrt, strmov1, strcnt, strmovr). No way to tell what they do, so therefore it's impossible to tell what the provided code is doing.
2) I see a call to free, but no matching call to malloc. Can't tell where you're allocating anything.
3) Why are you calling strrep_old? Isn't this function a replacement for it?
4) It's easy to confuse your function names (see #1) with the standard C library routines. You should have a different naming convention. Don't be afraid to make your function names meaningful to the reader.


Thanks for the advice guys. I'll expand the variable names and such.

AbstractAnon, strcnt() (string count) counts all the instances of substr and saves all found locations in svdlocs. strcnt() allocates enough memory to store any number of instances found - infinite if we have enough memory. On line 33, we just check if allocation failed. If it did, then we call strrep_old because it doesn't allocate any memory from heap. Yeah, it's a mess.

You're right, strwrt (string write), strmovl (string move left), all seem to similar to C library functions like strcpy (string copy). It made me more comfortable working with them.

I'm working on fixing it. It looks worse if I expand the var names LOL. I'll post the new fixed code once I'm back. Thanks guys.

Major Edit:
I just fixed the variable names. I've edited the original post.
Does it look all right?

Any thought on how I should divide this function?
I thought about making one for handling word->size < replacement->size
and another for replacement->size > word->size but somehow it doesn't feel right.

For those who are curious, here is the implementation of strmovl, strmovr, strcnt and strwrt.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
/* str-move-left: move sequence (start to end) leftwards by len bytes. */
void strmovl(char *start, char *end, int len)
{
  do
    *(start-len) = *start;
  while (start++ != end);
}

void strmovr(char *start, char *end, int len)
{
  do
    *(end+len) = *end;
  while (start != end--);
}

/* strcnt: count occurences of @substr in @string */
int strcnt(char *string, char *substr, char ***memblk)
{
	size_t cnt = 0, len = strlen(substr), sz = 8;
	char *p = string;
	char **svdlocs;
	int save = memblk != NULL;
	
	if (save) {
		*memblk = svdlocs = calloc(sizeof(char *), sz);
		if (!svdlocs)
			save = 0;

	}
	while ((p = strstr(p, substr)))
	{
		if (save) {
			svdlocs[cnt] = p;
			if (cnt >= sz) {
				sz *= 2;
				if (!realloc(svdlocs, sizeof(char *) * sz)) {
					save = 0;
					*memblk = NULL;
				}
			}
		}
		p += len;
		++cnt;
	}
	return cnt;
}


strwrt() is just strcpy() but doesn't append '\0' at the end. Kind of like strncpy(str, otherstr, strlen(otherstr));
Last edited on
LowestOne:

I tried that, but it gets complicated if substr->size is not equal to repstr->size.

I can easily stretch or contract the string, but to do it the fastest way I'll have to do it the way I've shown in the OP

The function needs to:

if slen > rlen
loop:
find(substr)
replace(substr, otherstring)
end = find(substr) /* again */
if (end == 0) end = end of string;
shift_left(substr.location+substring.length, end);
goto loop


Basically this is what I'm doing with:

1
2
3
4
5
6
diff = start - string_write(end, replacement); /* overwrite substring with replacement string */
			string = string_find(start, substring); /* find substring */
			end = string? string: hstr+len;
			string_shift_left(start, end, diff); /* move memory chunk from start to end leftwards */
			start = end+substr_length; /* next pick position */
			end -= diff;   /* next write position */


The extra boilerplate is just calculating the difference between the first instance of @substr and the other instance, and making @start and @end ready for next loop iterations.

Of course, I need a different approach entirely for replacing string if replacement size is greater than substring size to make it fastest as possible. For that, basically, we search backwards... or rather, write backwards while extending the string from the end, to make sure we don't overwrite unnecessarily.

Yeah, this is where it's getting long, 3 stuff to do for 3 different conditions. Maybe I'll really divide this thing into 3 functions.

----------------------------------------------------------------------------------------
I did come up with a cleaner version but it's slower than the ugly version.
This is the other "shorter" version:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
char *
strrep(string, substr, repstr)
char *string, substr, repstr;
{
         char *hstr = string;
        int rlen = strlen(repstr);
        int slen = strlen(substr);
        char *htmp;
        register char *tmp;

        htmp = tmp = malloc((abs(rlen - slen)*strcnt(string,substr,NULL))+strlen(string));
        if (!tmp)
                return NULL;
        while ((pos = strpos(string, substr)) != -1)
        {
                strncpy(tmp, string, pos);
                tmp += pos;
                strcpy(tmp, repstr);
                tmp += rlen;
                string += pos + slen;
        }
        strcpy(tmp, string);
        strcpy(hstr, htmp);
        free(htmp);
        return hstr;
}


Okay, this is much much shorter and that does exactly the same job but is slower and allocates far too much memory.

Should I switch the fast but long function with this slow but simple function?
Last edited on
Some comments on you variable naming. Just my opinion here.
1) _len is a common suffix for length variables. No need to write out _length.
2) str_ would have been a suitable prefix for your string routines. Shorter that string_. string_ also tends to imply compatability with or extension to std::string which your routines are not. It's also shorter.
3) Line 12. char * string; Although you're not using std::string, this does imply a level of confusion with std::string.

strwrt() is just strcpy() but doesn't append '\0'

Why not use memcpy? Probably faster than writing your own.

Should I switch the fast but long function with this slow but simple function?

This is the kind of tradeoff that professional programmers have to make all the time. It comes down to how your code is going to be used. If you're writing something that is going to be used in a text processor that has to repeatedly rip through thouands of pages of documents, or performance is part of the grading of homework assignment, then yes, performance matters. Otherwise, I tend to favor clarity over performance.

p.s. I wish you'd posted your updated code separately rather than editing the OP. I wasn't able to compare the two.
Thanks for the tip on variable names and memcpy(). I thought it would take too much space if I posted the code all over again.

I was writing this function just for exercise but lol I made it so complicated just because I wanted it to be fast. Ah well. I'll figure something out to fix this bad boy.
In the code in this latest post, line 23 runs the potential of being out of bounds.

How fast/slow are we talking about? Do you have a file that you could post somewhere?


I think this may be faster for when the replacement string is less than the length of the string to be replaced. There is no "move left" function, the intent is that it does the replacing while doing the move left. Not sure if it works.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
char* insertPtr = str;
char* findPtr = str;

while (*findPtr)
{
  if (*findPtr == *subStr && stringsMatch(findPtr, subStr))
  {
    insert(insertPtr, repStr);
    insertPtr += repStr_len;
    findPtr += subStr_len;
  }
  else
  {
    *insertPtr++ = *findPtr++;
  }
}
*insertPtr = '\0';


If this works, then I imagine it works for when they are the same length, no need for special case there.

When the other string is shorter than the replacement string, then yes, the program will take longer. The resulting string will be longer, so first you have to count the amount of replacements you will be doing.

In this case, you do everything backwards. InsertPtr equals the end of the allocated memory, findPtr is the null terminator of the string, and decriment pointers.
Topic archived. No new replies allowed.