is there another faster way to delete a string

//
// main.cpp
// characterRemover
//
// Created by Van Ha on 6/7/16.
// Copyright © 2016 Van Ha. All rights reserved.
//
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
#include <iostream>


void removeDuplicateStr(char * str);
int main() {
    
    std::cout << "Hello, World!\n";
    char  testString [] = "computer is fun and intersting";
    
    removeDuplicateStr(testString);
    return 0;
}

void removeDuplicateStr(char *str)
{
    long len = strlen(str);
    bool foundIt;
    for (int i = 1; i < len; i++)
    {
        foundIt = false;
        for (int j = 0; j < i; j++)
        {
            if (str[i] == str[j])
            {
                foundIt = true;
                break;
            }
        }
        if (foundIt)
        {
            for (int j = i + 1; j < len; j++)
            {
                str[j - 1] = str[j];
            }
            len = len - 1;
            i--;
            str[len] = 0;
        }
    }
}
Last edited on
For those who are curious, this program removes all but the first occurrence of every character from a string.

Just for fun:

1
2
3
4
5
6
7
8
9
10
11
12
13
# include <array>
# include <algorithm>
# include <iostream>
# include <iterator>

int main() {
  std::cin.unsetf(std::ios::skipws);

  using it = std::istream_iterator<char>;
  std::copy_if(it{std::cin}, it{}, std::ostream_iterator<char>(std::cout),
               [found=std::array<int, 0xFF>{}](char c) mutable 
               { return !found[c]++; });
}

http://coliru.stacked-crooked.com/a/7fc43a1d1927c4f9
Last edited on
Perhaps.

Your algorithm, pseudo:
for each character in the input
  if it is not a keeper
    move all the remaining characters left by one

The thing is that you might do a lot of moving. That is not necessary.

For example the input is 'aaaaa'. We know that they are all 'a', so lets refer to them as '123450'
The second a exists, so you move the rest into:
134500
then 145000
then 150000
then 100000
done.

Look at
http://www.cplusplus.com/reference/algorithm/remove/
There is no str[j - 1] = str[j];
There is *result = *first;. They don't need to be consecutive.

That will get from 123450 to 100000 in one step.

However, for input 'aabaa' you have to
abaa
ab
done.

Therefore, adapt the idea of std::remove:
for each character in the input
  if it is a keeper
    append it to the result
Do you actually have to remove the chars from the original string?
It's easier to copy the unique char to a separate string - and might have better performance.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
// assumes that dest is at least the length of src
char* removeDupliateChars(const char *src, char *dest)
{
  int j = 0;
  for (int i = 0; src[i] != '\0'; i++)
  {
    if (strchr(dest, src[i]) == nullptr)
    {
      dest[j] = src[i];
      j++;
    }
  }
  dest[strlen(src)] = '\0';
  return dest;
}
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
#include <iostream>
#include <cstring>
using namespace std;

void removeDuplicates( char *str );


int main()
{
   char testString[] = "computer is fun and interesting";
    
   cout << "Test: \"" << testString << "\"\n";
   removeDuplicates( testString );
   cout << "Result: \"" << testString << "\"\n";
}


void removeDuplicates( char *str )
{
   int len = strlen( str );
   int pos = 0;                        // where the next distinct character will be put
   
   for ( int i = 0; i < len; i++ )
   {
      char ch = str[i];                // test character
      if ( !ch ) continue;             // already blanked; ignore and go to next character
      str[pos] = ch;                   // move (forward) to current end of result - the ONLY MOVE ON THIS PASS
      pos++;                           // ready for next distinct character

      for ( int j = i + 1; j < len; j++ )
      {
         if ( str[j] == ch ) str[j] = '\0';      // blank out any matches in the rest of the string
      }
   }

   str[pos] = '\0';                    // null terminator for distinct characters at front
}


Test string is "computer is fun and interesting"
New string is "computer isfnadg"
O(N) time, O(1) space
(same idea as mbozzi's, but remove duplicates in situ and take care of default char being a signed type):

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
#include <limits>

char* remove_dups( char* cstr )
{
    if( cstr != nullptr )
    {
        // *** note: unsigned char (default char may be a signed integer type)
        bool found_earlier[ std::numeric_limits< unsigned char >::max() + 1 ] {} ;

        char* dest = cstr ;
        unsigned char ch ; // *** note: unsigned char
        for( char* srce = cstr ; ( ch = *srce ) != 0 ; ++srce )
        {
            if( !found_earlier[ch] )
            {
                *dest++ = ch ;  // invariant: dest <= srce
                found_earlier[ch] = true ;
            }
        }

        *dest = 0 ; // null terminate
    }

    return cstr ;
}

http://coliru.stacked-crooked.com/a/1b7f11a7330dae89

Note: would be able to use std::experimental::ranges::copy_if for this if the Ranges TS is available.
minor but is making the bool table static and clearing faster than recreating it? Not sure. It would take a LOT of strings to notice any difference if there were one.
> is making the bool table static and clearing faster than recreating it?

This would typically be slower.

Allocating space for one extra object on the stack is usually a zero cost operation:
for example, sub rsp, 32 vs. sub rsp, 288

The cost of stack allocation can be saved only if this array is the sole object that needs to be allocated on the stack. Even here, allocating the object every time on the stack may be faster; memory on the stack tends to have much better cache-locality than memory elsewhere.

As a general rule, use a local object with a static storage duration if and only if its previous state must be remembered and reused.
Topic archived. No new replies allowed.