Longest Common Subsequence Problem

Ok, so I was going through the Longest Common Subsequence problem in CLRS, and I understand it for the most part. What I don't understand is why it is so needlessly complicated.

The algorithm uses a normal nested for-loop to go through both sequences, and every time it finds a matching element, it does a bunch of crap with tables and arrows. Why not just add this matching element to a vector, thus ending up with a vector containing the subsequence in order?

Whats the point of making two extra tables and wasting so much space?
THe brute-force version (for two sequences) is exceedingly simple:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
std::string head( std::string str ) // invariant: !str.empty()
{
    str.pop_back() ;
    return str ;
}

// algorithm: https://en.wikipedia.org/wiki/Longest_common_subsequence_problem#LCS_function_defined
std::string lcs( std::string a, std::string b )
{
    if( a.empty() || b.empty() ) return {} ;

    std::string ah = head(a) ; 
    std::string bh = head(b) ;

    if( a.back() == b.back() ) return lcs( ah, bh ) + a.back() ;

    std::string first = lcs( a, bh ) ;
    std::string second = lcs( b, ah ) ;
    return first.size() > second.size() ? first : second ;
}
I'll have to look at the code later, but my question still stands: Why does the algorithm presented in CLRS needlessly waste so much space and is so complicated when the solution I presented in the OP works just as well?
when the solution I presented in the OP works just as well?

you did not present any solution in the OP, just some thoughts on an alternative approach. code it and then we can compare
Thought my explanation was clear enough, but okay, here is the pseudocode below
1
2
3
4
5
6
7
8
9
10
11
12
int LCS_LENGTH(X, Y)
{
    int m = X.length;
    int n = Y.length;
    
    for(i = 1 to m )
        for(j = 1 to n)
            if(X[i] == Y[j])
                push X[i] into a vector
    
    return vector.size();
}


In the book, once they are inside the inner for-loop, they start adding stuff to the two auxillary tables (i.e. matrices) they've made in the function. What im saying is skip all that, and simply add the matching element to the vector.
Last edited on
You may well be right but as it stands the pseudo-code might need a bit more polishing (or at least my interpretation of it):
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
#include <iostream>
#include <string>
#include <vector>

size_t LCS_LENGTH(const std::string& X, const std::string& Y)
{
    auto m = X.size();
    auto n = Y.size();
    std::vector<char> common{};

    for(auto i = 0; i < m; ++m)
    {
        for (auto j = 0; j < n; ++j)
        {
            if(X[i] == Y[j])
            {
                common.push_back(X[i]);
            }
        }
    }
    return common.size();
}

int main()
{
    std::string X = "ABCDGH";
    std::string Y = "AEDFHR";
    std::cout << LCS_LENGTH(X, Y);
}
^ thats not pseudocode you wrote, thats actual code. I only presented the pseudocode in the style of CLRS, there was nothing to polish really.
OK why don't you post your super-efficient code on the lines of your pseudocode and that'll put the matter to rest
My pseudocode is exactly what your real code is. Thats exactly my point. Its simple and straightforward code, but the book uses all this extra space by using not one, but TWO 2-dimensional arrays to store positions and arrows.
that means you haven't tried running the program do that and come back here ... (hint: my program which, as you say, is based on your psuedocode doesn't work)
Last edited on
Have you walked through your algorithm at least once?
Had you done so, you would have realised instantly that it is broken.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
#include <iostream>
#include <string>
#include <vector>
#include <iomanip>

std::string head( std::string str ) // invariant: !str.empty()
{
    str.pop_back() ;
    return str ;
}

// algorithm: https://en.wikipedia.org/wiki/Longest_common_subsequence_problem#LCS_function_defined
std::string lcs( std::string a, std::string b )
{
    if( a.empty() || b.empty() ) return {} ;

    std::string ah = head(a) ;
    std::string bh = head(b) ;

    if( a.back() == b.back() ) return lcs( ah, bh ) + a.back() ;

    std::string first = lcs( a, bh ) ;
    std::string second = lcs( b, ah ) ;
    return first.size() > second.size() ? first : second ;
}

size_t lcs_length(const std::string& X, const std::string& Y)
{
    auto m = X.size();
    auto n = Y.size();
    std::vector<char> common{};

    for( std::size_t i = 0; i < m; ++i )
    {
        for ( std::size_t j = 0; j < n; ++j )
        {
            if(X[i] == Y[j])
            {
                common.push_back(X[i]);
            }
        }
    }

    return common.size();
}

int main()
{
    std::string x = "h-e-l-l-l-l-l-l-o";
    std::string y = "h.e.l.l.l.l.l.l.o";
    std::cout << "lcs: " << std::quoted( lcs(x,y) )
              << " (length:" << lcs(x,y).size() << ")\n"
              // lcs has: 1 (h) + 1 (e) + 6 (l) + 1 (o) "hellllllo" length == 9

              << "lcs_length yielded: " << lcs_length(x,y) << '\n' ;
              // lcs_length's computation is this : 1 (h) + 1 (e) + 36 (l) + 1 (o) == 39
}

http://coliru.stacked-crooked.com/a/f7f4997b37d7d61b
Have you walked through your algorithm at least once?
Had you done so, you would have realised instantly that it is broken.

indeed that's the point i'm making ad infinitum to OP but s/he doesn't seem to get it
Oh snap! Didn't even realize that. Okay, I was wrong, I should have coded and ran my algorithm first.

@JLBorges: It seems there are 2 algorithms for this problem. You presented the "naive" recursive approach that I was unfamiliar with. The one with tables and arrows uses dynamic programming, and I get now why that is necessary.

Thank you both and apologies for not running my program before posting here.

Topic archived. No new replies allowed.