Breadth First Search - marking problem

I'm not really having a coding issue or compiling problem . I am having more or less a design choice. Rather I want to see if there are more (or better) ways to do it than I thought of.

I have a data structure of a directed graph, represented by a kind of adjacency list.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
template < typename T >
class Location
{
  public:
    T                            m_name;
    LinkedList<Location<T>* >    m_neighbors;

    Location(T l):m_name(l) {}
    Location(){}
    ~Location();
};

template < typename T >
class Graph
{
  public:
    string                     m_name;
    LinkedList<Location<T> >   m_graph;

    Graph(const string& s = ""):m_name(s) {}
    ~Graph();
};


Here is the pseudo-code I found on wikipedia for Breadth-First search.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
1  procedure BFS(G,v):
2      create a queue Q
3      enqueue v onto Q
4      mark v
5      while Q is not empty:
6          t ← Q.dequeue()
7          if t is what we are looking for:
8              return t
9          for all edges e in G.adjacentEdges(t) do
10             u ← G.adjacentVertex(t,e)
11             if u is not marked:
12                  mark u
13                  enqueue u onto Q
14     return none

Each location has a re-sizable list of location pointers. So I guess locations are the node class. The Graph object is just helps to encapsulate and provide functionality.

My problem is that with breadth first or even the depth first search algorithm require that a location/node/whatever be marked if it hasn't been marked. I understand by marking each location, the code won't potentially infinity loop.

My first thought was put a bool in every location object. But then after the Breadth first search was over, I would have to make a function that sets each bool in each location to false.

Another thought was to put a check before pushing locations or locations pointers on the queue. Search the queue to see if the location is there or not. if it is not, then simply push that location on the list. But this could hurt speed performance, if there is a lot of data so it makes me feel a little uneasy.

The whole reason I tried the whole adjacency list with pointers is because I wanted a re-sizable, memory efficient data structure. Adjacency matrices are easy, and very speed efficient but there value in speed quickly down when trying to map data that is not inherently numeric in nature.

If you have a graph of ints represented by a 2d-array, and you know node 5 has an edge that goes to node 100 then it is all you have to do to access that neighbor of 5 is array[100]. Instant almost. But if you have a graph of lets say strings represented by a 2d array adjacency matrix, And you figure out that node "St. Louis" is connected to "Chicago", how do you get to Chicago's neighbors? Iterate through the entire list until you find Chicago. That is the only way I figure you can do it with an adjacency matrix anyway.

So if there are any other ways I could go about it, I would like to hear them. Or if the ways I suggest are basically the only practical ways to do it, I guess that is fine too.
> My first thought was put a bool in every location object.
> But then after the Breadth first search was over,
> I would have to make a function that sets each bool in each location to false
you could have std::vector<bool> visited; in the scope of the function.

> Another thought was to put a check before pushing locations or locations pointers on the queue.
> Search the queue to see if the location is there or not.
That approach is incorrect.
When you analyze a node, you remove it from the queue.
Later, that same node may become a candidate, but it shouldn't be pushed to the queue as it was already checked.
>That approach is incorrect.
>When you analyze a node, you remove it from the queue.
>Later, that same node may become a candidate, but it shouldn't be pushed >to the queue as it was already checked.
true, I would need a separate list to make that work, but finding each location would be something I would like to avoid.

Vectors are essentially arrays, so if I use one, then don't I have to correlate positions in the vector<bool> to locations/nodes? And if I did would that not amount to a list I have to check anyway?
> My first thought was put a bool in every location object. But then after the Breadth first search was over,
> I would have to make a function that sets each bool in each location to false.

It also violates the semantics in two ways. The bool flag does not have anything to do with the logical state of the vertex; it is just an artifact used by the search algorithm. And a const-correct search on a graph should not modify the graph in any way.


> Search the queue to see if the location is there or not. if it is not, then simply push that location on the list.
> But this could hurt speed performance, if there is a lot of data so it makes me feel a little uneasy.

To achieve a worst-case time of O( |V| + |E| ), we need to use an auxiliary hash table to keep track of visited nodes (look up in amortized constant time).


**** CAUTION: Don't read this part right now****

Untested code:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
#include <unordered_map>
#include <unordered_set>
#include <stack>
#include <queue>
#include <iostream>
#include <string>
#include <sstream>

template< typename T, typename REF = T >
using adjacency_list_t = std::unordered_map< T, std::unordered_set<REF> > ;

using adjacency_list = adjacency_list_t<int> ;

// input of pairs of adjacent vertices of a directed graph
// 1 7 => edge from 1 -------> 7
// 2 7 => edge from 2 -------> 7
// 1 5 => edge from 1 -------> 5
// etc.

adjacency_list make_adjacency_list( std::istream& stm )
{
    adjacency_list graph ;
    int a, b ;
    while( stm >> a >> b ) { graph[a].insert(b) ; graph[b] ; }
    return graph ;
}

template< typename STACK_OR_QUEUE, typename FN_NEXT >
bool graph_search( const adjacency_list& graph, int start, int target, FN_NEXT next )
{
    STACK_OR_QUEUE stk_or_queue ;
    std::unordered_set<int> visited ;
    stk_or_queue.push(start) ;
    while( !stk_or_queue.empty() )
    {
        int id =  next(stk_or_queue) ;
        std::cout << id << ' ' ;
        visited.insert(id) ;
        if( id == target ) { std::cout << " <= target\n" ; return true ; }
        else
        {
            stk_or_queue.pop() ;
            auto iter = graph.find(id) ;
            if( iter != graph.end() )
                for( auto& n : iter->second )
                    if( visited.find(n) == visited.end() ) stk_or_queue.push(n) ;
        }
    }
    std::cout << "  could not find target\n" ;
    return false ;
}

bool depth_first_search( const adjacency_list& graph, int start, int target )
{
    using stack = std::stack<int> ;
    return graph_search<stack>( graph, start, target, []( stack& s ) { return s.top() ; } ) ;
}

bool breadth_first_search( const adjacency_list& graph, int start, int target )
{
    using queue = std::queue<int> ;
    return graph_search<queue>( graph, start, target, []( queue& q ) { return q.front() ; } ) ;
}

int main()
{
    const std::string str = "1 2   1 7   1 8   2 3   2 6   2 10  3 4   3 5   3 11   "
                            "6 10   6 12  8 9   8 12   9 10   9 11   11 7   12 5" ;
    std::istringstream stm(str) ;
    const adjacency_list graph = make_adjacency_list(stm) ;

    std::cout << "DFS 1 to 5 => " ; depth_first_search( graph, 1, 5 ) ;
    std::cout << "BFS 1 to 5 => " ; breadth_first_search( graph, 1, 5 ) ;
    std::cout << "DFS 1 to 12 => " ; depth_first_search( graph, 1, 12 ) ;
    std::cout << "BFS 1 to 12 => " ; breadth_first_search( graph, 1, 12 ) ;
}

http://ideone.com/vcUBkM
I like the hash table idea. I don't know why I didn't think of that. Hash tables can be a bit on the memory intensive side, but sometimes, (and really a lot of the time) you can't have speed and low memory usage everywhere in the code.
Oh I know there a recursive version of breadthfirst search. Does the recursive version use queues?
Yes, either directly or indirectly. The simplest recursive versions just pass the queue along as an extra parameter to the recursive call.

Unlike DFS, which can naturally use the call stack in a recursive implementation, BFS is not the ideal candidate for showcasing the elegance of recursion.
Topic archived. No new replies allowed.