How to get all the words of the same length from a string of thousands of words

How can I get the words of the same length from a string of thousands of words?

Please explain with code.
#include <iostream>
#include <string>
// #include <string_view>
#include <map>
#include <vector>
// #include <set>
#include <regex>

std::map< std::size_t, std::vector<std::string> > map_word_sizes( const std::string& text )
{
    // word: sequence of one or more alphanumeric characters (\w also matches '_')
    // http://en.cppreference.com/w/cpp/regex
    static const std::regex word_re( "\\b\\w+\\b" ) ;

    std::map< std::size_t, std::vector<std::string> > map ;

    // iterate through each word in text
    // http://en.cppreference.com/w/cpp/regex/regex_iterator
    std::sregex_iterator iter( text.begin(), text.end(), word_re ), end ;
    // and add the word to the map
    for( ; iter != end ; ++iter ) map[ iter->str().size() ].push_back( iter->str() ) ;

    return map ;
}

int main()
{
    const std::string text = "How to get the words of same length from a string of thousands of words.\n"
                             "Please explain with the code.\n"
                             "\n"
                             "get to the individual words in the string; std::regex would be handy for this.\n"
                             "insert each word that is encountered into a map where the key is the\n"
                             "length of the word and the mapped value is the word itself.\n"
                             "\n"
                             "note: with C++17, to avoid the overhead of creating one string per word,\n"
                             "we can use std::string_view ie. std::map< std::size_t, std::vector<std::string_view> >\n"
                             "In the example, we use std::map< std::size_t, std::vector<std::string> > because if the\n"
                             "for almost all the words, small string optimisation would be applied.\n"
                             "\n"
                             "to get only unique words, use std:set instead of std::vector:\n"
                             "ie. std::map< std::size_t, std::set<std::string> >\n" ;

   std::cout << text << "\n\n---------------\n\n" ;

   // for each key-value pair in the map
   for( const auto& pair : map_word_sizes(text) )
   {
       // print the key (the word length)
       std::cout << "words of length " << pair.first << " [ " ;

       // print each word in the vector associated with this key
       for( const auto& word : pair.second ) std::cout << word << ' ' ;
       std::cout << "] (" << pair.second.size() << ")\n" ;
   }
}

http://coliru.stacked-crooked.com/a/6d704c677da1ad07
JLBorges, thanks for the reply, but I am still a beginner and I am finding it difficult to understand your code. Can you please explain it in simple detail?
Let us take it one small step at a time.

We know that a string is a sequence of characters. Given a long string, how do we get to the words in the string?
To start, we need a working definition of what a 'word' is.

So what would your definition of a word be? For example, if we have a string:
"I am still a beginner, and I am finding difficulty in understanding your code.",
what would be the words in this string? For instance, would beginner, and code. be words?
Or would the words be the characters without punctuation? ie. beginner and code

Would numeric characters be part of a word?
For instance, if we have "T-800 Model 101", would T-800 and 101 be words?

So what would your definition of a word be?
Once we have that, we can then look at how to access the different words in a string.
I have code like the one below. I need to extract the dictionary of words from the text and display the number of words.

Then I need to prompt the user for the length of the word.

For example, I have ["ape apple alphabet army back cook drink"] in my text file.
If the user enters 4, it should print the number of words that contain 4 letters; here it is 3.
Numeric characters will not be part of a word.

#include <fstream>
#include <iostream>
#include <string>
using namespace std;

int main()
{
    ifstream file("dict.txt");
    string word;
    int wordCount = 0;

    // extract the words from the file one at a time and count them
    while (file >> word)
    {
        ++wordCount;
    }
    cout << wordCount;

    // How do I print the number of words in the text that have a particular length?
}


Please help me with this.
if (word.length() == 4)
{
    wordCount++;
}
Note: this code uses C++11 features.

#include <iostream>
#include <string>
#include <fstream>

int main()
{
    const std::string file_name = "dict.txt" ;

    {
        // create the file containing the words.
        // you can omit this step since you already have the file
        std::ofstream(file_name) << "ape apple alphabet army back cook drink\n" ;
    }

    if( std::ifstream file{file_name} ) // if the file could be opened for input
    {

        // prompt the user for length of the word.
        std::size_t word_length ;
        std::cout << "word length? " ;
        std::cin >> word_length ;

        std::size_t total_words = 0 ;
        std::size_t words_of_required_length = 0 ;

        std::string word ;
        while( file >> word ) // for each word in the file
        {
            ++total_words ; // increment the count of total words

            // if the word is of the required length,
            // increment the count of words of the required length
            if( word.size() == word_length ) ++words_of_required_length ;
        }

        std::cout << "\nthere are " << words_of_required_length << " words of length "
                  << word_length << "\n(in all there are " << total_words
                  << " words in the file)\n" ;
    }

    else std::cout << "failed to open file '" << file_name << "' for input\n" ;
}

http://coliru.stacked-crooked.com/a/4bd946c39b4ea9cc
Thanks a lot, JLBorges.