C: recursive merge sort for a linked list

I apologize for the upcoming wall of code.

The merge sort for a linked list has to be of return type void and must take as its parameter the header to a linked list containing a head pointer and tail pointer (intialized to NULL if there are zero nodes).

Otherwise, the supplementary functions merge and length can be of any return type.

Right now, it seems to get down to the base case of one node, then merges into a list of 2 nodes correctly but then moving up the recursive chain, reverses itself. For example, let's say the base case is integers 5 and 6 (in a list containing nodes whose values are integers 0 to 8). It will merge 5 and 6 into {5,6} correctly but then merge is called again almost immediately after (because it's recursive I think), and then it enters the merge function as {6,5}.

Here it is:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
#include <stdio.h>
#include <stdlib.h>
#include "list.h"
#include <assert.h>

//swap values rather than making sublists

//left and right are name of unsorted sublists from original list 
//left = 0 to mid, right= mid+1 to last
void merge(list_t* left, list_t* right)
{
    printf("in merge\n");
    //list_t* result = list_create();
    //assert(result);
    
    element_t* curr1 = left->head;
    element_t* curr2 = right->head;
    printf("left and right lists to merge: \n");
    list_print(left);
    list_print(right);
    int temp =0;
    //sublist A will be larger, Sub B is smaller
    while(curr1!= NULL && curr2!= NULL)
    {   printf("in merge while compare\n");
        if(curr1->val <= curr2->val)
        {
            printf("left val < right val\n");
            //swap values
            temp=curr1->val;
            curr1->val=curr2->val;
            curr2->val=temp;
            curr1 = curr1->next;
            curr2 = curr2->next;
        }
        
        else//curr1->val > curr2->val
        {
            //printf("left val > right val\n");
          	curr1 = curr1->next;
            curr2 = curr2->next;
        }
    }
    //Concantenate sublists
    right->tail->next = left->head;
    right->tail = left->tail;

    list_print(right);

   	list_destroy(left);
}

/*
Given a linked list head pointer, compute
and return the number of nodes in the list.
*/
int Length(list_t* list)//node 1 is 1 
{   
    printf("in length\n");
    element_t* current = list->head;
    int count = 0;
    while (current != NULL) 
    {
        printf("in length while\n");
        count++;
        current = current->next;
    }
    printf("length is %d\n", count);
    return count;
}

void list_sort(list_t* list)
{
printf("in sort 1\n");
    
    //base case of 0 or 1 element
    if(list->head ==NULL || list->head->next == NULL)
    {
    	return;
    }
   
    list_t* sublistA = list_create();
    assert(sublistA);
    list_t* sublistB = list_create();
    assert(sublistB);

    int len = Length(list);
	int mid = (len)/2;
	printf("mid is %d\n", mid);
    int i=0;
    element_t* current = list->head;
    //make sublist A
    while(i < mid)
    {
        printf("making sublistA\n");
        list_append(sublistA, current->val);
        //sublistA->head = current;
        i++;
        current = current->next;
    }
    list_print(sublistA);
    //make sublistB
    element_t* current2 = current;
    printf("current2->val is %d\n", current2->val);
   // int j= i+1;
    while(current2!=NULL)//j<length)
    {
        printf("making sublistB\n");
        list_append(sublistB, current2->val);
        
        current2 = current2->next;
    }
    list_print(sublistB);
    printf("going to sort A\n");
	list_sort(sublistA);
    printf("going to sort B\n");
    list_sort(sublistB);
    
    merge(sublistA, sublistB);

    //list_destroy(sublistA);
    //list_destroy(sublistB);
}




Here is a sample of the output on the terminal:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
in sort 1
in merge
left and right lists to merge:
{ 8, }
{ 7, }
in merge while 1
in merge else 1
{ 7, }
final result
{ 7, 8, }//this is correct
//going back up the recursive chain here?
in merge
left and right lists to merge:
{ 9, }
{ 8, 7, }//what happened??? 


It merges the base case correctly but then it seems like going back up the recursive chain, it is getting reversed.
Last edited on
Using pointers on lists and mixing them (sort) can sometimes lead your code to suffer from high complexity will eventually can lead to bugs and errors. There tools who help to deal with those as checkmarx but it is recommended to make sure you code correctly and detect mistakes as you code to avoid a big time waste.
Good luck.
The structure of your sort is off: sort() and merge() should be mutually recursive:

1
2
3
4
5
6
7
8
9
sort(xs):
  split xs into lefts, rights
  merge(lefts,rights)

merge(lefts,rights):
  if (length of lefts > 1) sort(lefts)
  if (length of rights > 1) sort(rights)
  for (left:lefts, right:rights) 
    append->result(min(left,right))

This is very simplified pseudocode, You'll have to think hardest about how to handle that merge loop.

An important consideration is the way you manage the list. Right now you are iterating over every element of the list to move it (or worse, copy it) to another list. Don't do that; Just cut and paste nodes.

For example, code that splits the list should look something like this:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
  int N, n, temp;
  element_t* node = list->head;

  N = list_length( list );

  // Don't sort an unsortable list
  if (N < 2) return;

  // just swap the two elements if necessary
  if (N == 2)
  {
    if (list->head->val > list->tail->val)
    {
      temp = list->head->val;
      list->head->val = list->tail->val;
      list->tail->val = temp;
    }
    return
  }

  // Now for the real work
  n = N / 2;

  // node <-- node PRIOR to the last n or n-1 elements
  while (n--) node = node->next;

  // Split the list.
  list_t* left  = list_create();
  list_t* right = list_create();

  // splice out the right side
  right->head = node->next;
  right->tail = list->tail;
  right->length = N - n;  // (see below)

  // splice out the left side
  left->head = list->head;
  left->tail = node;
  left->length = n;  // (see below)

  // finish breaking all the connections from the old list
  node->next = NULL;
  list->head = list->tail = NULL;
  list->length = 0;  // (see below)

  // Now you can merge, making sure to place the result in some good place. 
  // I recommend a third argument to merge() for the result:
  merge( left, right, list );

  // At this point, left and right should again be empty lists, because all their nodes 
  // were moved into 'list'.
  left  = list_destroy( left );
  right = list_destroy( right );


-----

The length function could also be reduced to O(1) complexity. Since you are taking the time to track head and tail of your list, there is no reason why you can't track its length also.

1
2
3
4
  int list_length( list_t* list )
  {
     return list ? list->length : 0;
  }

(I do understand that you might not have the ability to modify the list structure, though. If you can't then just get rid of lines 34, 39, and 44.)

-----

A quick primer on complexity.

The point of a merge sort is to do things efficiently, which in the case of sorting, means touching each element of the list as few times as possible. That is why you should:
  1. not copy elements one-by-one from the old to temporary lists.
  2. avoid a length function that has to count elements.
The only counting complexity you should have to put up with is simply finding the middle element in the list. Beware fencepost errors.

-----

Finally, a technical detail. POSIX reserves names ending in "_t", meaning that the "list_t" and "element_t" are in potential conflict with stuff in the OS API, which is bad.

Hope this helps.
Topic archived. No new replies allowed.