Bug hunt, need another set of eyes.

The following code is meat for an online judge. When submitted, it results in a segmentation fault prior to any output being generated. Having tinkered around with the judge, I know that the last 3 of the string pairs that are processed in this code result in a segmentation fault when submitted to the judge.

I cannot, however, reproduce the issue on my end or pin down the logic error, so I'm looking for someone with better (or at least fresher eyes) to help me spot what I'm doing wrong.

TIA.

The code:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
#include <string>
#include <iostream>
#include <fstream>
#include <climits>
#include <array>
#include <vector>

using iter_type = std::string::const_iterator;

iter_type find_pattern_in(iter_type txt_beg, iter_type txt_end, 
                          iter_type pat_beg, iter_type pat_end)
{
    const unsigned txt_len = txt_end - txt_beg;
    const unsigned pat_len = pat_end - pat_beg;

    // boyer-moore-horspool
    if (txt_len >= pat_len)
    {
        std::array<int, UCHAR_MAX>  shift;
        shift.fill(-1);

        for (unsigned i = 0; i < pat_len - 1; ++i)
            shift[static_cast<unsigned char>(pat_beg[i])] = i;

        unsigned i = 0;
        unsigned m = pat_len;

        while (i <= txt_len - pat_len)
        {
            int j = pat_len - 1;

            while (j >= 0 && pat_beg[j] == txt_beg[i + j])
                --j;

            if (j < 0)
                return txt_beg + i;

            i += pat_len - 1;
            i -= shift[static_cast<unsigned char>(txt_beg[i])];
        }
    }

    return txt_end;
}

iter_type find_pattern(const std::string& text, std::string pattern)
{
    // * in the pattern indicates a match of 0 to any number of characters.
    // \* indicates a literal match for the asterisk.

    std::vector<std::string> pats;  // list of literal patterns which must occur in
                                    // text, consecutively and with no overlap

    {
        unsigned last_pos = 0;
        unsigned pos = 0;
        while ((pos = pattern.find('*', pos)) != std::string::npos)
        {
            if (pos == 0 || pattern[pos - 1] == '\\')
                pattern.erase(pos ? pos - 1 : pos, 1);
            else
            {
                pats.emplace_back(pattern.substr(last_pos, pos - last_pos));
                last_pos = ++pos;
            }
        }

        if (last_pos < pattern.length())
            pats.emplace_back(pattern.substr(last_pos));
    }

    iter_type last_loc = text.end();
    iter_type loc = text.begin();
    for (auto& pat : pats)
    {
        last_loc = loc = find_pattern_in(loc, text.end(), pat.begin(), pat.end());
        if (loc == text.end())
            return text.end();

        loc += pat.length();
    }

    return last_loc;
}

#include <sstream>
std::istringstream in(  // text,pattern
R"(Hello,ell
This is good, is
Mothra,M*ra
Old,Young
bY T,Y T
lnHLp5kGMHeYsoA7NXFrL8Ij Su,S
f3SFw5trxxZIaKec,wa3fZceFxS
)"
);

int main(int argc, char* argv[])
{
    // std::ifstream in(argv[1]);

    std::string text, pattern;

    while (std::getline(in, text, ',') && std::getline(in, pattern))
        std::cout << (text.end() != find_pattern(text, pattern) ? "true\n" : "false\n");
}


http://ideone.com/lWCUrs
1
2
3
unsigned pos = 0;
//...
(pos = pattern.find('*', pos)) != std::string::npos
npos is often implemented as maximum value of string::size_type, which is unsigned long long on 64bit machines.
unsigned is often 32 bit in size
When you are asigning result of search to unsigned int it gets truncated and subsequent comparsion for inequality returns true even if element is not found. Then you are trying to access element UINT_MAX
Last edited on
Thanks for the assist, MiiNiPaa. That was indeed the (most frustrating) issue. ;)
Curious about what compiler ideone.com uses since your code executes.

On my Debian box using g++-4-8 or g++-4.9 it compiles without complaint but segfaults.

clang++ v3.4.1 yields a warning but segfaults also.
main.cpp:58:47: warning: comparison of constant 18446744073709551615 with expression of type
'unsigned int' is always true [-Wtautological-constant-out-of-range-compare]
while ((pos = pattern.find('*', pos)) != std::string::npos)
yields a warning
That depends on your compiler settings. I found the isssue by setting max warning levels (and having similar issues before)

Segfault vs not depends on std::string::size_type. if its size equals to unsigned size (common on 32bit platform) then everything will probably work. Otherwise it segfaults.
Thanks MiiNiPaa!
Registered users can post here. Sign in or register to post.