### stirngs

How many distinct substrings does a given string S have?
For example, if S = "abc", S has 7 distinct substrings: {"","a","b","c","ab","bc","abc"}. Note that
the empty string and S itself are considered substrings of S.
On the other hand, if S = "aaa", S has only 4 distinct substrings: {"","a","aa","aaa"}.
The first line of the input file contains N, the number of test cases. For each test case, a line
follows giving S, a string of from 1 to 1000 alphanumeric characters. Your output consists of one
line per case, giving the number of distinct substrings of S. Try to write an efficient program.
Sample Input
2
abc
aaa
Output for Sample Input
7
4

This is how I would check for a single string. You can write the reader for the sample input

 ``123456789101112131415161718192021222324252627282930`` ``````#include #include #include #include int main() { std::string toCheck("aaa"); //Find all substrings std::vector substrings; for(std::size_t i=0;i<=toCheck.size();i++) for(std::size_t j=i;j<=toCheck.size();j++) substrings.push_back(toCheck.substr(i,j-i)); //sort the substrings alphabetically - this step is required for the unique algorithm std::sort(substrings.begin(), substrings.end()); //eliminate duplicates std::vector::iterator it; it = std::unique(substrings.begin(), substrings.end()); substrings.resize(std::distance(substrings.begin(),it)); //answer std::cout<

hi:
i think i have an idea about the solution, try searching for the combinatorial function in maths, this function calculates the number of subsets of a given set of values, you can then make an easy algorithm to subtract the identical subsets.
i'm not putting any code, you have to do that part yourself, programming is about experimenting.
Last edited on
@ats15

Nice idea, however, I don't understand the following statement:

 ``123`` `````` for(std::size_t i=0;i<=toCheck.size();i++) for(std::size_t j=i;j<=toCheck.size();j++) substrings.push_back(toCheck.substr(i,j-i));``````

According to me, it should be something like:

 ``123`` `````` for (std::size_t i=0; i

sorry but i can't use vector...can u tell me a simple way to do it?
First, understand the program logic (the algorithm) from this code.
Then write a program which is your own program.

If you skip the second step, you would have learned nothing.
To learn even more: How much time does the program take? Can you make it faster?

 ``123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778`` ``````#include #include namespace no_stl { constexpr std::size_t MAX_UNIQUE = 1024*128 ; std::string uniques[MAX_UNIQUE] ; std::size_t n_unique = 0 ; bool is_unique( const std::string& str ) { for( std::size_t i = 0 ; i < n_unique ; ++i ) if( uniques[i] == str ) return false ; return true ; } std::size_t insert_unique( const std::string& str ) { if( is_unique(str) && n_uniquestd::size_t distinct_substrings( SET& set, const std::string& str ) { if( !str.empty() ) { for( std::size_t i = 0 ; i <= str.size() ; ++i ) set.insert( str.substr(0,i) ) ; distinct_substrings( set, str.substr(1) ) ; // O(N*N) } return set.size() ; } } #include int main() { { using namespace no_stl ; const std::string str = "abababc" ; distinct_substrings(str) ; std::cout << "#distinct substrings of '" << str << "' " << n_unique << '\n' ; for( std::size_t i = 0 ; i < n_unique ; ++i ) std::cout << "'" << uniques[i] << "'\n" ; } std::cout << "\n---------------------------\n" ; { std::set set ; // or use std::unordered_set const std::string str = "abababc" ; use_stdlib::distinct_substrings( set, str ) ; std::cout << "#distinct substrings of '" << str << "' " << set.size() << '\n' ; for( const auto& s : set ) std::cout << "'" << s << "'\n" ; } }``````

http://ideone.com/aOEVhB
@abhishekm71
 ``123`` `````` for(std::size_t i=0;i<=toCheck.size();i++) for(std::size_t j=i;j<=toCheck.size();j++) substrings.push_back(toCheck.substr(i,j-i));``````

In the first line i is the beginning of substring, in the second line j is the end position. It cannot be less than i. If j=i, then you have the empty string. You can start from i+1, but then you need to add it to your unique substrings. Of course, you are generating the empty substring a lot of times, but the program is supposed to keep only one of them. And it does :)
In the 3rd line j-i is the length of the substring. But I think the two loops are similar
ok.. thanks for the explanation ats15. I actually got confused earlier.

and thanks JLBorges for giving us the 2 options for this same operation without the need for sorting. Although please clarify the following:

 ``12`` `````` constexpr std::size_t MAX_UNIQUE = 1024*128 ; std::string uniques[MAX_UNIQUE] ;``````

1. Why have you used array instead of vector? Is it to optimize on time complexity (removes the need for multiple memory allocation in code)?
2. What is the significance of the number 1024*128? Why couldn't you just multiply it and define MAX_UNIQUE as ` const std::size_t MAX_UNIQUE = 131072 ; ` ? Any specific reason?
> Why have you used array instead of vector?

Because
 sorry but i can't use vector...can u tell me a simple way to do it?
and
 ****please dont use pointer. i haven't learnt them yet.

> without the need for sorting

The versions I posted are not more efficient. Standard library algorithms were not allowed; the intent was to make the code easy to understand; ergo linear search and append instead of sort/unique/erase.

> What is the significance of the number 1024*128? Why couldn't you just multiply it and define MAX_UNIQUE as `const std::size_t MAX_UNIQUE = 131072 ;` ?

I find it simpler to visualize the magnitude of the number as 128K if it is written as `1024*128` (or `128*1024` )
ok.. :)
A string with distinct components has exactly s*(s+1)/2+1 substrings where s is the string size(e.g. the string "abc" has 3*4/2+1=7 substrings and the string "abc...xyz" i.e. english alphabet with 26 characters has 26*27/2+1=352 substrings). This fact can be proved as follows:

Let S be a string with s distinct characters.

S = "C1C2...Cs"

Now, selecting substrings whith one component, we have exactly s substrings:

"C1", "C2",...,"Cs".

Now substrings of two components:

"C1C2", "C2C3",...,"Cs-1Cs".

Important note: the index of the first component of each substring gives its rank in the sequence and, of course, the last one gives the number of substrings, s-1.

So, the last substring with three components will be Cs-2Cs-1Cs and its rank will be s-2 who is the very number of substrings whith 3 components, s-2.

Continuing this process we have for the penultimate selection by s-1:

"C1C2...Cs-1", "C2C3...Cs" and, obviously, 2 substrings whith s-1 components each one.
And finally, the last substring is S itself:

"C1C2...Cs", one substring.

Reviewing the entire process we have the final result, adding all the obtained results:

Total_number = s + s-1 + ... + 2 + 1 or:
Total_number = 1 + 2 + ... + s-1 + s and adding the two sums we have:
-------------------------------------------------------------------------------
2*Total_number =(s+1)+(s+1) + ... + (s+1)+(s+1) here we have s times (s+1) and finally:

Total_number = s*(s+1)/2.

Adding the empty string we have the number of distinct substrings of a given string S whith s distinct components.

Important note: this process may be used as a guide to write the necessary code for the final step: removing duplicates if S has some identical components.
Last edited on
Topic archived. No new replies allowed.