the use of string methods

Hi guys, I'm doing some exercises from the "Jumping into C++" book, and this chapter is about strings. The problem I'm trying to solve asks me to write a program that reads in two strings: a needle and a haystack. Then I have to count how many times the needle appears in the longer sentence (the haystack). Here's what I've done:

 #include <iostream>
#include <string>

using namespace std;

int main ()
{
    string needle;
    string haystack;
    int needle_appearances;
    int i = 0;
    cout<< "Enter the haystack: \n";
    getline(cin, haystack, '\n');
    cout<< "Enter the needle: \n";
    cin>>needle;
    for (i = haystack.find (needle, 0);
         i!= string ::npos;
         haystack.find (needle, i))
    {
         needle_appearances++;
         i++;
    }
    cout << "The word " << needle << " has appeared " << needle_appearances << " times in your haystack.";
}

But when I run it and type in the needle, it sits there for about 30 seconds and then the program quits. Could you tell me why?
In VS 2017 I get the following warning:
warning C6295: Ill-defined for-loop: 'int' values are always of range '-2147483648' to '2147483647'. Loop executes infinitely.

A better way to do it:
  int needle_appearances = 0; // important to give an initial value
  auto pos = haystack.find(needle, 0);
  while (pos != string::npos)
  {
    needle_appearances++;
    pos = haystack.find(needle, pos +needle.length());
  }


Enter the haystack:
red green blue green blue red
Enter the needle:
red
The word red has appeared 2 times in your haystack.
Sorry to change the subject here, but I see in your code, Thomas, the use of auto.

auto pos = haystack.find(needle, 0);

I wanted to know what the type of pos was and found out it is size_t;

When I change the word auto in your code to type int, the program compiles and runs but with a warning. If I replace auto with size_t it runs without any warning. In reading about size_t, I see it is an alias for unsigned int. But if I change auto to unsigned int, the program crashes. Could you explain why?
I am not an expert on the C++ standard, but AFAIK it's not guaranteed that size_t is an unsigned int; it could also be an unsigned long or ...
In VS 2017 it works with unsigned int.
Best practice is to use auto so the variable always gets exactly the type that find returns.
Yes, a size_t is an implementation-defined unsigned type, usually either an unsigned int or an unsigned long, but it could be any unsigned type that meets the rest of the requirements for the type (i.e. it must be large enough to represent the "size" of any object).

But if I change auto to unsigned int, the program crashes. Could you explain why?

Probably because your compiler isn't using an unsigned int for a size_t.

In VS 2017 it works with unsigned int.

While it may "work" if you use an unsigned int, you should use either size_t or auto for the type. Both size_t and auto will ensure you are using the correct type for this variable. By the way, are you compiling as a 32-bit or a 64-bit program? The size of size_t may differ between these two architectures (it is implementation defined).

Ok, I am learning a lot here. So, a size_t is an unsigned something, maybe int, maybe long, maybe something else, but definitely unsigned. And at the risk of making everyone laugh, I was not at my normal computer today so I was using http://cpp.sh/ to compile. :P
closed account (E0p9LyTq)
http://www.cplusplus.com/reference/cstddef/size_t/?kw=size_t

https://en.cppreference.com/w/cpp/types/size_t
Thanks FurryGuy, but I have read that a few times already and haven't quite got my mind wrapped around it. Let's say a size_t is the size in bytes of an object. Then I understand that the output of sizeof(int) is 4, and that 4 is not an int but a size_t. But then what is its use in the example posted by Thomas, where pos is a position in the string? I thought a position in a string was an unsigned int, but apparently not. Is it a size in bytes?
closed account (E0p9LyTq)
The size of size_t is NOT always 4, it depends on the compiler.

#include <cstddef>  // std::size_t
#include <iostream>

int main()
{
   std::cout << "The sizeof size_t is: " << sizeof(std::size_t) << '\n';
}


Visual Studio 2017, 32 bit:
The sizeof size_t is: 4


Visual Studio 2017, 64 bit:
The sizeof size_t is: 8


MinGW (GCC) 32 & 64 bit compilation has the same output.
Thanks again FurryGuy. So then the size of a size_t will change depending on the compiler and its settings. But sizeof(int) remains unchanged. Is this true? If so, that explains why you cannot always substitute an unsigned int for a size_t.
closed account (E0p9LyTq)
#include <iostream>

int main()
{
   std::cout << "Computing the size of some C++ built-in variable types\n\n";

   std::cout << "Size of bool:               " << sizeof(bool) << '\n';
   std::cout << "Size of char:               " << sizeof(char) << '\n';
   std::cout << "Size of unsigned short int: " << sizeof(unsigned short) << '\n';
   std::cout << "Size of short:              " << sizeof(short) << '\n';
   std::cout << "Size of unsigned long int:  " << sizeof(unsigned long) << '\n';
   std::cout << "Size of long:               " << sizeof(long) << '\n';
   std::cout << "Size of int:                " << sizeof(int) << '\n';
   std::cout << "Size of unsigned int:       " << sizeof(unsigned int) << '\n';
   std::cout << "Size of float:              " << sizeof(float) << '\n';
   std::cout << "Size of double:             " << sizeof(double) << "\n\n";

   std::cout << "The output can change with compiler, processor type and OS\n\n";

   std::cout << "C++11 added four new standard variable types.\n";
   std::cout << "The values can change whether compiled as 32 or 64 bit.\n\n";

   std::cout << "Size of unsigned long long: " << sizeof(unsigned long long) << '\n';
   std::cout << "Size of long long:          " << sizeof(long long) << '\n';
   std::cout << "Size of long double:        " << sizeof(long double) << '\n';
   std::cout << "Size of nullptr:            " << sizeof(nullptr) << '\n';
}

VS2017, 32-bit:
Computing the size of some C++ built-in variable types

Size of bool:               1
Size of char:               1
Size of unsigned short int: 2
Size of short:              2
Size of unsigned long int:  4
Size of long:               4
Size of int:                4
Size of unsigned int:       4
Size of float:              4
Size of double:             8

The output can change with compiler, processor type and OS

C++11 added four new standard variable types.
The values can change whether compiled as 32 or 64 bit.

Size of unsigned long long: 8
Size of long long:          8
Size of long double:        8
Size of nullptr:            4

VS2017, 64-bit (some output omitted):
The values can change whether compiled as 32 or 64 bit.

Size of unsigned long long: 8
Size of long long:          8
Size of long double:        8
Size of nullptr:            8

MinGW, 32-bit:
The values can change whether compiled as 32 or 64 bit.

Size of unsigned long long: 8
Size of long long:          8
Size of long double:        12
Size of nullptr:            4

MinGW, 64-bit:
The values can change whether compiled as 32 or 64 bit.

Size of unsigned long long: 8
Size of long long:          8
Size of long double:        16
Size of nullptr:            8

Two different compilers, major differences in variable type sizes.

Never assume any built-in data type will be a certain size, period. It can create hard to find bugs.

POD sizes are supposed to be AT LEAST a certain size, the C++ standard doesn't specify the exact size they have to be. That is up to each implementation.
OK guys, I don't know what long or size_t is either, because I haven't got there yet, so I shouldn't even use them. But anyway, I tried initializing needle_appearances and I still get the same issue... I don't really get why, though... It gives me 0 problems and 0 warnings :/
Some history, hopefully I am getting this right :)

A long time ago someone decided that the word 'word' meant the 'size of the default instruction or data chunk that a CPU can consume', roughly. So way back when I was getting into it, computers were 16 bit and a word was 2 bytes. Then 32 bit machines came out, and words were 4 bytes. Now we have 64 bit, but because of backward-compatibility confusion a word might be 32 or 64 bits depending on the compiler, OS, and settings. Soon we should have 128 bits; many PCs already have registers this big but not a bus or instruction this size (yet).


All that to say 'size_t' probably "should be" the size of the word of the CPU, but it may not be for various reasons. It's easy to check it, but it's easier to use auto. Given the backwards-compatible focus, if we get 128 bit machines tomorrow it would not surprise me if they supported 4 byte words just as 64 bit machines do now.

Neither long nor size_t has a fixed size. Both can change with compiler settings, OS, etc. I like to use the explicit ones or auto (there are explicitly named integers that tell how many bytes or bits they have, like __int64 in Visual Studio, or the C99 ones). I'll also use int when I don't care at all.

I am not sure how to fix what you have without a bit of a rewrite. Anyway, try this version/approach?


#include <iostream>
#include <string>

using namespace std;

int main ()
{
    string needle = "needle";
    string haystack = "blahneedlexneedlexxneedleneedlen ";
    int needle_appearances = 0;
    size_t i = 0; // unsigned, like the value find() returns, so the npos test is exact
    //cout<< "Enter the haystack: \n";
    //getline(cin, haystack, '\n');
    //cout<< "Enter the needle: \n";
    //cin>>needle;
    while (i != string::npos)
    {
        i = haystack.find(needle, i);
        if (i != string::npos)
        {
            needle_appearances++;
            i++; // step past this match only when one was found, so npos ends the loop
        }
    }
    cout << "The word " << needle << " has appeared " << needle_appearances << " times in your haystack.";
}




OK guys, I don't know what long or size_t is either, because I haven't got there yet, so I shouldn't even use them.

With this logic you shouldn't be using std::string::find() either, because you probably haven't studied that function yet: it returns a std::string::size_type, which is also an implementation-defined unsigned type (in fact it is often just another typedef for size_t).



closed account (E0p9LyTq)
I don't know what long or size_t is either

If you don't know what a long is then your instructor has been criminally negligent.

long, also known as long int, is one of the fundamental data types in C and C++.
http://www.cplusplus.com/doc/tutorial/variables/
int main ()
{
    string needle = "needle";
    string haystack = "blahneedlexneedlexxneedleneedlen ";
    int needle_appearances = 0;
    int i = 0;
    //cout<< "Enter the haystack: \n";
    //getline(cin, haystack, '\n');
    //cout<< "Enter the needle: \n";
    //cin>>needle;
    while(i!= string ::npos)
  {
		 i = haystack.find (needle, i);
	 if(i!= string ::npos)
         needle_appearances++;      
                i++;   
    }
    cout << "The word " << needle << " has appeared " << needle_appearances << " times in your haystack.";
}

If I use this approach it doesn't even compile, because there's an error on the if statement within the while loop.

If you don't know what a long is then your instructor has been criminally negligent.

long, also known as long int, is one of the fundamental data types in C and C++.
http://www.cplusplus.com/doc/tutorial/variables/

I'm just reading the book I mentioned earlier, and it hasn't talked much about bits, size_t and all this stuff. I've read the article you linked, but even there it doesn't really talk about size_t or long...
Anyway, going back to the program I've written, this is how it looks now:
#include <iostream>
#include <string>

using namespace std;

int main ()
{
    string needle = "needle";
    string haystack = "fgndndndtjnedj";
    int needle_appearances =0;
    int i = 0;
    cout<< "Enter the haystack: \n";
    getline(cin, haystack, '\n');
    cout<< "Enter the needle: \n";
    cin>>needle;
  for (i = haystack.find (needle, i);
         i!= string ::npos;
         haystack.find (needle, i))
    {
         needle_appearances++;
         i++;
    }
    cout << "The word " << needle << " has appeared " << needle_appearances << " times in your haystack.";
}

but when I run it here's what happens
Enter the haystack:
cat and cat
Enter the needle:
cat
The word cat has appeared -1 times in your haystack.

and it takes a veeeeery long time to output the last line :/
Line 18 (the third clause of the for statement): add "i=" at the start:
i=haystack.find (needle, i))

Just a niggle: it's actually counting character sequences, not words.
and it takes a veeeeery long time to output the last line :/

You're lucky you got any output at all. You're invoking undefined behavior when your variable i overflows.

Do you understand what value std::string::npos holds?

Do you understand what std::string find returns when it fails to find the value?

You've already been told why you're having problems and how to fix the issue, but you refuse to take that advice, so the only solution is to stop using std::string::find and std::string::npos and write your own function to find the value that doesn't use the standard methods. Good luck.
Lines 16-18:
	for (i = haystack.find(needle, 0);
		i != string::npos;
		i = haystack.find(needle, i + 1))


Check the changed parts: the 0 in the first clause, and the "i =" and "i + 1" in the third. The first of the three parts in the for statement specifies what happens to initialize the loop. The thing we want to do initially is look for the first occurrence of the needle, so our offset when we look will be 0.

In the third part we specify a step we want to run at the conclusion of every iteration of the loop. In this case, we want to search again to see if we can find another occurrence of the string. However, where do we want to start looking for the next occurrence? If we start from the same place we found the last one (which is i, since that's where we stored what find returned) then find will give us back the same value, since it finds it right away! We need to start on the character after that, hence the i + 1.
booradley wrote:
We need to start on the character after that, hence the i + 1.

He is already accounting for that on his line 21
i++;
It definitely shouldn't be done twice, or it will miss single-character needles.

It only needs the "i=" part on line 18, to match that already on line 16 (and, ideally, making i a size_t).