Find the first nonrepeating character in a string?

Pages: 12
Given a string "teeter", the first non repeating character would be 'r'.
in "toothless", it would be 'h'.

I'm wondering about the most efficient way to get this done? One option is to use a hash table, with the characters in the string as keys, and frequencies of each character (key) as values. Then return the first key with the value "1" (meaning it only appears once).

This seems to be a good way of doing it, but how exactly would I implement a hash table or array in which the indexes are chars?

What would be the big-O complexity of this algorithm?

Thanks a lot!
This sounds like an "algorithms" class problem.

Hints:

1 - How many times do you need to traverse the string to determine if a letter is repeated?

2 - For practical purposes, stick to the ASCII set (and a simple array). Time complexity won't significantly increase with a hash table, but coding complexity does.

3 - Big-O counts repetitions. See hint 1.

Hope this helps.
No it's actually from the book "Programming Interviews Exposed." I'm preparing for an interview, and don't exactly get the explanation the book gave for this problem...
The book implemented the proposed solution in Java, using the HashTable class. I was wondering how to to this in C++.

I don't know how to use the ASCII set (I just know that it is a set of integers representing each character), or how to use that as the INDEX of the array.

The book also mentions that an array is good for ASCII strings that are large, but for Unicode strings it would be inefficient, and that Hash Tables would be great for small strings of any character set...

How would I even make an array where I can use the chars as indexes for look up?

Thanks!
I think the trick is to write a program that starts from the center character and moves outwards, therefore 'r' would be the first non repeating character in teeter, and 'h' would be the first non repeating character in toothless, 'l' being the 2nd, 'e' being the 3rd. a single pass should do it
Last edited on
A chararacter is just another number.

For example, here is a simple substitution cipher:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
#include <cctype>
#include <iostream>
#include <string>
using namespace std;

const string k = "LBXZCEGFHPIJKTMVRNOSQDUAWY";

string encode( string s )
  {
  for (unsigned n = 0; n < s.length(); n++)
    if (isalpha( s[ n ] ))
      s[ n ] = k[ toupper( s[ n ] ) - 'A' ];

  return s;
  }

string decode( string s )
  {
  for (unsigned n = 0; n < s.length(); n++)
    if (isalpha( s[ n ] ))
      s[ n ] = k.find( toupper( s[ n ] ) ) + 'A';
  return s;
  }

Notice how the character itself is used as an index into k.

Also, since you only need to operate on one pass through the source, and one pass through the array (or hash), your algorithm is linear, or O(n).

Good luck!

[edit] Why add complexity by starting at some random point in the string?
Last edited on
But how would you know if a character is repeating? More importantly, what about a case such as "151823440"? If I understood properly, your algorithm would return '2' as the first nonrepeating char, but the correct answer is '5'.
I guess you wouldnt have to start at the center as h is the first non repeating character from left to right, jeesh, i really need to focus
@Duas: I see, thanks!!!
But how would you know if a character is repeating?

You are going to have to count them. Let's take an "algorithms class" approach to the problem:

Let's start with a simple algorithm:

for each character,
traverse the string counting its number of occurrences
if it was not repeated, break, otherwise repeat

This is an algorithm, with quadratic efficiency or O(n2), similar to a bubble sort where there would be two nested for loops. It can solve the problem but is not the most efficient.

A better algorithm is to make a single pass counting characters and storing their occurrences in some kind of container, such as a hash table, in one for loop and then extracting the data from the container. As Duoas said, this would be linear efficiency or O(n).

It's often useful to think about an algorithm in simple terms and then optimizing it to improve its efficiency.
Last edited on
I was looking for code to practice today and remembered this topic, I wrote the following code, and even though it has goto statements all over the place, it works. I could have used functions but I was just messing around with routing possibilities. It has a single pass through the string that will first check the characters already disqualified, then check against the characters not yet tested for non repeating status. Once a non repeating character is encountered, the program outputs the result and exits, it will preserve the case of the character, and it will also output the appropriate message if there are no non repeating characters present in the string. Its kinda neat. Anyhow, here it is;

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
#include <iostream>
#include <cstring>
using namespace std;

int main()
{
int a,b,c,y;
char s1[80], s2[80], s3[3];

  cout << "Enter your word or sentence.";
gets(s1);

  cout << "Your word or sentence is: " << s1 << "\n\n";
  cout << "The length of your string is: " << strlen(s1)<< "\n\n";

strcpy(s3,"xxx"); strcpy(s2,s1); strcat(s1,s3);

for(c=0;s1[c]; c++) s1[c] = tolower(s1[c]);

a=0;b=a+1;

la1:
  if(a==strlen(s1)-3) goto last_char;
  if(a>=1)goto previous_char;

la2:
  if(s1[a]==s1[b]){a++;b=a+1;}
  if(b==strlen(s1)-1)goto finis1;
  if(s1[a]!= s1[b]){b=b+1;goto la1;}
    else {a=a+1;b=a+1;goto la1;}

previous_char:
  for(y=a-1;y>=0;y--)if (s1[a]==s1[y]){a=a+1;b=a+1;goto la2;}
  goto la2; 

finis1:
  if(a==strlen(s1)-2) goto last_char; {
    cout  << "Your first non repeating character is '"<< s2[a] << "'.\n";;return 0;}

last_char:
  cout << "There are no non repeating characters \n";return 0;

return 0;

}






Your word or sentence is: Happy, Happy, Joy, Joy, We're Not Happy Enough!!!

The length of your string is: 46

Your first non repeating character is 'W'






Argh! That's madness!

But since we're doing the OP's homework for him, lets have him fail properly, shall we?
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
#include <algorithm>
#include <cctype>
#include <functional>
#include <iostream>
#include <iterator>
#include <map>
#include <string>
using namespace std;

struct count_t: public map <char, int>
  {
  // This container associates each insert()ed character with 
  // the number of times it has been inserted.
  //
  // You can test whether or not a character is listed in this container
  // with an insert count of one using the match1() method.
  //
  typedef map <char, int> ::iterator iterator;
  void insert( char c ) {                      (operator [] ( c )) += 1;          }
  bool match1( char c ) { return count( c ) ? ((operator [] ( c )) == 1) : false; }
  };

int main()
  {
  count_t counts;

  string s, t;
  cout << "This program finds the first alphanumeric character that has no repeat.\n"
          "Please enter the text to evaluate:\n"
          "> "
       << flush;
  getline( cin, s );

  // We are only interested in alphanumeric characters, so we'll populate our
  // counts using a copy of the user's input string where all unwanted
  // characters have been removed.
  remove_copy_if(
    s.begin(),
    s.end(),
    back_inserter( t ),
    not1( ptr_fun <int, int> ( isalnum ) )
    );

  // Count the occurrances of each char in the modified string
  for_each( t.begin(), t.end(), bind1st( mem_fun( &count_t::insert ), &counts ) );

  // Now find the first character in the original string whose count == 1
  string::iterator found = find_if( s.begin(), s.end(), bind1st( mem_fun( &count_t::match1 ), &counts ) );
  if (found == s.end())
    {
    cout << "Your text does not contain any non-repeating alphanumeric characters.\n";
    }
  else 
    {
    cout << string( distance( s.begin(), found ) + 2, ' ' ) << "^here\n"
         << "The first non-repeating character is '" << *found << "'.\n";
    }

  return 0;
  }

This program finds the first alphanumeric character that has no repeat.
Please enter the text to evaluate:
> Happy, Happy, Joy, Joy, We're not happy enough!
                          ^here
The first non-repeating character is 'W'.

A few notes about the algorithm. It is not possible to solve this problem without some form of lookback. What that means is that the input string cannot be traversed only once. At worst, it must be traversed at least twice. Thats O(n+n) == O(2n) == O(n).

My program above traverses the string three times, because of the extra step on lines 37-42 to remove all non-alphanumeric characters. This could have been combined into the next step (line 45) by adding the alphanumeric condition into the insert method (line 19), but that is less general. Either way would have been fine. It is just that this way allows more convenient control over which characters are ignored and which are not -- it keeps the two considerations (which characters to ignore and counting given characters) separate.

My program also makes full use of the STL to do stuff for me, instead of doing it myself, so it is much easier to read and maintain. The purpose of a high-level language is to enable programmers to think high-level thoughts instead of pretending to be an assembler. Two months from now I (or anyone else) can still come back and instantly understand what it is that this code does.

My generous use of whitespace and commentary makes it look longer, but it is actually shorter code also.

Hope this helps.
cool, thanks for the code, I like that you are only interested in alphanumeric characters, I didnt think of that. It was a fun project in any event Duaos
this is madness!!
http://www.youtube.com/watch?v=AZPRf8qL8h0
also I appreciate and understand the concept and importance of object oriented programming, polymorphism, encapsulation, & inheritence. I am just beginning to learn about classes now, so I'm still early on, so even though my code is often way out there, it helps me get more familiar with the syntax and flow of the simpler commands and arguements. And also I totally agree with your assesment that the goal is high level language needs high level thoughts, so thank you for that reminder i need a kick in the butt from time to time
:)
@Duoas:

You could store, in addition to the count, the offset in the string of the first occurrence of the character. That would
eliminate the extra traversal, but then you'd need a boost::multi_index_container so you avoid a linear search to
find the "left-most" unique character.

Just a thought...

And since I particularly like celebrating boost when it comes to providing homework solutions, you could also
eliminate some of the mem_fun stuff by using boost::lambda :)

Hmm, that's an interesting thought. (Which I had not thunk.)

Hey, BettyBoopTS, don't take it too hard. I'm not trying to beat-up on you. I get a good kick in the pants often enough myself.


For example, I am currently working on my wife's business webpage. I did something beautiful, using pure CSS and HTML (no stinkin' JavaScript). Then I began testing it in major browsers. Guess what?

IE proved its worth to me again.

Firefox and Opera http://home.comcast.net/~mlyn7576/temp/ff-opera.png
Internet Exploder 6 http://home.comcast.net/~mlyn7576/temp/ie6.png

What the heck!? I still don't know how to make IE stop goobering my images. So now I'm rewriting using stupid image maps and js for IE. "Internet Standards" indeed. Loopy bunch of nonsense... @#$()*!@&#(*@&%)!!&^%&#(@!

(Oh, and here's Safari http://home.comcast.net/~mlyn7576/temp/safari.png )

What I'm trying to decide right now is whether it is worthwhile to maintain two separate versions of the home page (which automatically switch to the JS version for IE users), or whether I should give up the beautiful CSS and just use the JS version for everyone. (The vast majority of her clientel will be likely be using IE with JavaScript in its default-enabled state.)

Too bad I'm not an HTML/CSS guru -- I barely know what I am doing to begin with and random nonsense like this just ticks me off. I mean, cummon, it isn't like my code isn't valid CSS3 and strict HTML DTD 4.01 or anything... That ought to count for something...

[/rant]

Alas.
:-)
I don't take it hard but i am always willing to take advice : ) .

If this doesnt work out I can always open a restaurant/ bar called "The Last Byte" with user stations built into the table at every seat, and all along the bar,,serve drinks like "The infinite loop" that could be much likea long island ice tea",the "Semantic sugar"(gotta check if Helios has a copywrite on that statement first though)which would be like a Mint Julep,, the "srand', where the bartender adds random liquors and ingredients to the drink, stuff like that. Then of course the food,,, starting of course with the signature "The Last Byte ",, a 16 ounce steak with the fixings; "Spaghetti Logic ", spaghetti of course,'' order of french fries would be called 'an array of 'integers', and onion rings would be 'string of characters', a kiddie meal could be called a 'newbie', and many others!.

hmmmm,, probably have to move to Cambridge or the Silicon Valley to have it truly be succesful.

; )
Last edited on
I think it's brilliant! :-)
I read the other posts...
Here is a simple code but given the purpose exposed by "stingblah" in his posts, i think it woud do or at least give u a hint, I hope.... :)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28

#include <iostream>
using namespace std;

int main ()
{
	char str[20] = "eeeiorganization", nRepeat = ' ';
	
	for(int i = 0; i < (signed) strlen(str); i++)
	{
		bool rep = false;
		int cont = 0;
		if( i > 0 && str[i] == str[i-1]) continue;
		for(int j = i + 1; j < (signed) strlen(str); j++)
		{			
			rep = (str[i] == str[j]);      // Reduntant I acknowledge a simple if would 
			if (rep) cont++;		 //	do, but feeling kinda lazy!
		}
		if (cont == 0)
		{
				nRepeat = str[i];
				goto answer;
		}
	}
answer:
	cout << "El character that aint repeat is: " << nRepeat << '\n';
	return 0;
}


You would add some function to compare if lower or capital letters maybe....
Also if someone knows a better way to get out of the inner loop without the goto-label it'd be highly appreciated.... Cottections are welcomed!
That makes 4 different implementations posted on this thread...

Anyone for a fifth????
OK. :-]
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

int main()
  {
  char     s[ 1024 ];
  unsigned slen;
  unsigned tries;
  int      found = -1;

  srand( time( NULL ) );

  printf( "Enter> " );
  fgets( s, sizeof( s ), stdin );
  slen = strlen( s );
  if (s[ slen - 1 ] == '\n') s[ slen - 1 ] = '\0';

  tries = 1000000;

  while (tries--)
    {
    unsigned    index = rand() % slen;
    const char* p     = strchr( s, s[ index ] );

    if ((p - s) == index) p = strchr( p + 1, s[ index ] );

    if (p == NULL)
      {
      found = index;
      }
    }

  if (found < 0)
    {
    puts( "I do not think that your string has non-repeating characters." );
    }
  else
    {
    printf( "%7s%*s%s\n", " ", found, " ", "^here" );
    printf( "The first non-repeating character might be '%c'.\n", s[ found ] );
    }

  return 0;
  }

I find that it is occasionally correct. XD
Pages: 12