The Most Repeated Character In a String - Google Interview Question

I am starting to practice programming puzzles to prepare myself for my future interview questions.

I saw a question on YouTube about finding the most repeated character in a string in O(n), and not O(n^2).
https://www.youtube.com/watch?v=GJdiM-muYqc

It is easy to find O(n^2) algorithm, you simply have to compare every single character with other characters and increment the count, then return the most repeated character.

I found this algorithm online:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39

#include <iostream>
#include <string>
using namespace std;

string mostFrequent(string text){
    int max = 0;
    int count = 0;
    string maxCharacter;
    for(char q=' ';q<='~';q++)
        
    {
        count = 0;
        for(unsigned int i=0; i<text.length();i++)
        {
            if(text[i]==q)
                count++;
        }
        
        if(count == max)
        {
            maxCharacter += q;
        }
        
        if(count>max)
        {
            max=count;
            maxCharacter=q;
        }
    }
    
    return maxCharacter;
}

int main(){
    cout << mostFrequent("DBCABAEEE");
    return 0;
}


It appears the time complexity to be O(n^2) since there are two for loops.

Is there any way we can implement this algorithm in O(n) running time?
Last edited on
It appears the time complexity to be O(n^2) since there are two for loops.


There are two loops, but each does not loop n times. The outer loop happens a constant number of times. The inner loop text.length() times. So it is O(n).
Last edited on
Just because there's a loop inside another doesn't mean that the time complexity is O(n^2). The time complexity of this algorithm is O(1):
1
2
3
4
for (int i = 0; i < 1000000; i++){
    for (int j = 0; j < 1000000; j++){
    }
}

The inner loop runs for text.length() iterations, and the outer loop runs for k iterations. In total, that's k*text.length(), or just k*n. Since big O notation compares the asymptotic behavior of algorithms, constant factors are ignored, therefore the time complexity is just O(n).

These two algorithms have the same time complexity:
 
for (int i = 0; i < n; i++);

1
2
for (int i = 0; i < n; i++)
    for (int j = 0; j < 1000000; j++);
Ok, but the outer for loop is running the ASCII character and symbols (we can consider this set of ASCII characters and symbols to be of a certain size n). That indicates for every iteration of outer loop we are comparing the second loop text.length()-1 times, so I think it is O(n^2).

Could you please explain what you mean by
"outer loop happens a constant number of times"
?
Last edited on
Ok, but the outer for loop is running the ASCII character and symbols.
we can consider this set of ASCII characters and symbols to be of a certain size n


About 95 of them. Not n. In this case, n is the variable size of the string to be examined.

So the outer loop will run 95 times. Not n. Not anyhting to do with length of the text. A fixed amount.

So we will loop 95*n times in total, which is O(n).
Last edited on
I don't know why my edited response went lower, but I think I am understanding what you mean.


@helois
I believe you mean every loop is running on its own, so that is O(n). Can you explain to me how would the algorithm in the question would become O(n^2)? What conditions make the time complexity to change?

@Repeater, yeah that is true, some running times are like 3n^2 + 2n + 5, and we only take the greatest leading variable so n^2, and we ignore constants. I understand what you mean now :)

I just started studying about running times and efficiency, so I'm getting there.
Last edited on
The Standard Library really is your friend. O(n):
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
#include <iostream>
#include <string>
#include <unordered_map>

int main()
{
  std::string s;
  getline( std::cin, s );
  
  std::unordered_map <char, std::size_t> histogram;
  for (char c : s) histogram[ c ] += 1;  // O(n) × O(1)
  
  std::size_t count = 0;
  for (auto p : histogram)  // O(n)
    if      (p.second  > count) { count = p.second; s = p.first; }
    else if (p.second == count) { s += p.first; }
    
  if (s.size() != 1) std::cout << "No one character most common.\n";
  else               std::cout << "Most common character is '" << s << "'.\n";
}
Last edited on
I believe std::unordered_map has O(n) lookup.
It is easy to find O(n^2) algorithm, you simply have to compare every single character with other characters and increment the count, then return the most repeated character.
That isn't O(n^2) because the second search isn't dependent on n, it's dependent on the number of characters in your alphabet, so it's O(n).
It seems fairly easy to do it in O(n).

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
int occurenceTable[0xff] = {0}; //O(1) space complexity

for(char c: someString)   //O(n) loop
 ++occurenceTable[int(c)];

char mostOccurent;
int max = 0;

for(unsigned i = 0u; i < 0xff; ++i) //find max in O(1)
{
    if(occurenceTable[i] > max)
    {
        max = occurenceTable[i];
        mostOccurent = char(i);
    }
}


Also, OP, your algo is O(n)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
    for(char q=' ';q<='~';q++) // this is constant so O(1)
        
    {
        count = 0;
        for(unsigned int i=0; i<text.length();i++) //this is linear, so O(n)
        {
            if(text[i]==q)
                count++;
        }
        
        if(count == max)
        {
            maxCharacter += q;
        }
        
        if(count>max)
        {
            max=count;
            maxCharacter=q;
        }
    }
Last edited on
> I believe std::unordered_map has O(n) lookup.

In pathogenic cases. It should be f(1) normally. (It is supposed to be a hash table lookup.)


Everyone is falling into a too-tricky/too-neat problem: What is the most common character in

    abba 

?
Topic archived. No new replies allowed.