Binary Search Tree Duplicates Question

Hello,

I had a simple question on BST's, on dealing with duplicate entries. I see things about whether its good to put them on the left, right, or not allow them. I was wondering, if instead of each node only storing the value, it also stored the number of occurrences, then there wouldn't be any duplicate nodes but you would also be able to track duplicate entries. I was wondering, is there any downside to this? I understand that if there are not many duplicated, it could increase memory uses, but if you reasonably expect a large number of duplicate entries, I would think it could shrink the size of the tree significantly, therefore reducing search time as well as memory? Thanks
Last edited on
It depends.

Say you are storing numbers. Sure, you store one "1" and you have stored them all. In this case, a tree that counts occurrences would be fine.

Say you are storing a Car class:
1
2
3
4
5
6
7
class Car
{
  std::string make;
  std::string model;
  int year;
  int mileage;
};


One typically should not change the data used to map the value into the tree, so you probably shouldn't use mileage as part of the key. When you add cars to the tree that happen to have the same make, model, and year, are they really the same car? For this situation, you would want a tree that puts duplicates to the left or the right.

But, back to counting occurrences. This is were I believe you may rethink the design of your data structure. Associative containers map a key to a value, and in your case, I believe you are using the value as the key. This is totally fine, but it isn't helping you count. You have made a set:
http://www.cplusplus.com/reference/set/set/

Another type of associative container uses a key that is separate from the value, a map:
http://www.cplusplus.com/reference/map/map

If all you are trying to do is maintain a duplicate system, then a map is your friend:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
#include <iostream>
#include <map>
#include <string>

int main( void )
{
  std::map< std::string, int > word_counter;

  std::string temp;
  while( std::cin >> temp )  // read words until EOF
    word_counter[ temp ]++;  

  std::cout << "The counts are:\n";
  for ( const auto& pair : word_counter )
    std::cout << pair.first << ": " << pair.second << '\n';

  return 0;
}
It depends what problem you are trying to solve. Usually BST is used to store key value pair data where the key is used for searching. Assuming BST is using a balanced tree (splay, red-black tree, etc..), we allow duplicates because it guarantees of O(log n) look up time when looking for several duplicated keys with unique data. Using your method, the tree will not be balanced and may cause the node to be really fat.
Topic archived. No new replies allowed.